Title: | Bayesian Deep Gaussian Processes using MCMC |
---|---|
Description: | Performs Bayesian posterior inference for deep Gaussian processes following Sauer, Gramacy, and Higdon (2023, <doi:10.48550/arXiv.2012.08015>). See Sauer (2023, <http://hdl.handle.net/10919/114845>) for comprehensive methodological details and <https://bitbucket.org/gramacylab/deepgp-ex/> for a variety of coding examples. Models are trained through MCMC including elliptical slice sampling of latent Gaussian layers and Metropolis-Hastings sampling of kernel hyperparameters. Vecchia-approximation for faster computation is implemented following Sauer, Cooper, and Gramacy (2023, <doi:10.48550/arXiv.2204.02904>). Optional monotonic warpings are implemented following Barnett et al. (2024, <doi:10.48550/arXiv.2408.01540>). Downstream tasks include sequential design through active learning Cohn/integrated mean squared error (ALC/IMSE; Sauer, Gramacy, and Higdon, 2023), optimization through expected improvement (EI; Gramacy, Sauer, and Wycoff, 2022 <doi:10.48550/arXiv.2112.07457>), and contour location through entropy (Booth, Renganathan, and Gramacy, 2024 <doi:10.48550/arXiv.2308.04420>). Models extend up to three layers deep; a one layer model is equivalent to typical Gaussian process regression. Incorporates OpenMP and SNOW parallelization and utilizes C/C++ under the hood. |
Authors: | Annie S. Booth [aut, cre] |
Maintainer: | Annie S. Booth <[email protected]> |
License: | LGPL |
Version: | 1.1.3 |
Built: | 2024-11-01 11:49:13 UTC |
Source: | CRAN |
fit_one_layer: conducts MCMC sampling of hyperparameters for a one layer GP

fit_two_layer: conducts MCMC sampling of hyperparameters and hidden layer for a two layer deep GP

fit_three_layer: conducts MCMC sampling of hyperparameters and hidden layers for a three layer deep GP

continue: collects additional MCMC samples

trim: cuts off burn-in and optionally thins samples

predict: calculates posterior mean and variance over a set of input locations (optionally calculates EI or entropy)

plot: produces trace plots, hidden layer plots, and posterior predictive plots

ALC: calculates active learning Cohn over a set of input locations using a reference grid

IMSE: calculates integrated mean-squared error over a set of input locations
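As a quick orientation, here is a minimal end-to-end sketch of this workflow on toy one-dimensional data (nmcmc is kept small here only for speed; real analyses warrant longer chains):

library(deepgp)

# Toy data: noisy one-dimensional response
x <- seq(0, 1, length = 20)
y <- sin(2 * pi * x) + rnorm(20, 0, 0.05)

# Fit a two layer deep GP, inspect trace plots, trim burn-in, then predict
fit <- fit_two_layer(x, y, nmcmc = 2000)
plot(fit)                          # trace plots to assess burn-in
fit <- trim(fit, 1000, 2)          # remove burn-in and thin
xx <- seq(0, 1, length = 100)      # predictive locations
fit <- predict(fit, xx, cores = 1)
plot(fit)                          # posterior mean and 90% intervals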
Annie S. Booth [email protected]
Sauer, A. (2023). Deep Gaussian process surrogates for computer experiments.
*Ph.D. Dissertation, Department of Statistics, Virginia Polytechnic Institute and State University.*
http://hdl.handle.net/10919/114845
Sauer, A., Gramacy, R.B., & Higdon, D. (2023). Active learning for deep
Gaussian process surrogates. *Technometrics, 65,* 4-18. arXiv:2012.08015
Sauer, A., Cooper, A., & Gramacy, R. B. (2023). Vecchia-approximated deep Gaussian
processes for computer experiments.
*Journal of Computational and Graphical Statistics, 32*(3), 824-837. arXiv:2204.02904
Gramacy, R. B., Sauer, A. & Wycoff, N. (2022). Triangulation candidates for Bayesian
optimization. *Advances in Neural Information Processing Systems (NeurIPS), 35,*
35933-35945. arXiv:2112.07457
Booth, A., Renganathan, S. A. & Gramacy, R. B. (2024). Contour location for
reliability in airfoil simulation experiments using deep Gaussian
processes. *In Review.* arXiv:2308.04420
Barnett, S., Beesley, L. J., Booth, A. S., Gramacy, R. B., & Osthus D. (2024). Monotonic warpings for additive and deep Gaussian processes. *In Review.* arXiv:2408.01540
# See vignette, ?fit_one_layer, ?fit_two_layer, ?fit_three_layer,
# ?ALC, or ?IMSE for examples
# Many more examples including real-world computer experiments are available at:
# https://bitbucket.org/gramacylab/deepgp-ex/
Acts on a gp, dgp2, or dgp3 object. Current version requires squared exponential covariance (cov = "exp2"). Calculates ALC over the input locations x_new using the specified reference grid. If no reference grid is specified, x_new is used as the reference. Optionally utilizes SNOW parallelization. The user should select the point with the highest ALC to add to the design.
ALC(object, x_new, ref, cores)

## S3 method for class 'gp'
ALC(object, x_new = NULL, ref = NULL, cores = 1)

## S3 method for class 'dgp2'
ALC(object, x_new = NULL, ref = NULL, cores = 1)

## S3 method for class 'dgp3'
ALC(object, x_new = NULL, ref = NULL, cores = 1)
object | object of class gp, dgp2, or dgp3 |
x_new | matrix of possible input locations, if object has been run through predict the previously stored x_new is used |
ref | optional reference grid for ALC approximation, if ref = NULL then x_new is used |
cores | number of cores to utilize in parallel, by default no parallelization is used |
Not yet implemented for Vecchia-approximated fits or Matern kernels.

All iterations in the object are used in the calculation, so samples should be burned-in. Thinning the samples using trim will speed up computation. This function may be used in two ways:

Option 1: called on an object with only MCMC iterations, in which case x_new must be specified

Option 2: called on an object that has been predicted over, in which case the x_new from predict is used

In Option 2, it is recommended to set store_latent = TRUE for dgp2 and dgp3 objects so latent mappings do not have to be re-calculated. Through predict, the user may specify a mean mapping (mean_map = TRUE) or a full sample from the MVN distribution over w_new (mean_map = FALSE). When the object has not yet been predicted over (Option 1), the mean mapping is used.
SNOW parallelization reduces computation time but requires more memory storage. C code derived from the "laGP" package (Robert B Gramacy and Furong Sun).
list with elements:

value: vector of ALC values, indices correspond to x_new
time: computation time in seconds
Sauer, A., Gramacy, R.B., & Higdon, D. (2023). Active learning for deep
Gaussian process surrogates. *Technometrics, 65,* 4-18. arXiv:2012.08015
Seo, S, M Wallat, T Graepel, and K Obermayer. 2000. Gaussian Process Regression:
Active Data Selection and Test Point Rejection. In Mustererkennung 2000,
27-34. New York, NY: Springer-Verlag.
Gramacy, RB and F Sun. (2016). laGP: Large-Scale Spatial Modeling via Local
Approximate Gaussian Processes in R. Journal of Statistical Software
72 (1), 1-46. doi:10.18637/jss.v072.i01
# Additional examples including real-world computer experiments are available at:
# https://bitbucket.org/gramacylab/deepgp-ex/

# --------------------------------------------------------
# Example 1: toy step function, runs in less than 5 seconds
# --------------------------------------------------------

f <- function(x) {
    if (x <= 0.4) return(-1)
    if (x >= 0.6) return(1)
    if (x > 0.4 & x < 0.6) return(10*(x-0.5))
}

x <- seq(0.05, 0.95, length = 7)
y <- sapply(x, f)
x_new <- seq(0, 1, length = 100)

# Fit model and calculate ALC
fit <- fit_two_layer(x, y, nmcmc = 100, cov = "exp2")
fit <- trim(fit, 50)
fit <- predict(fit, x_new, cores = 1, store_latent = TRUE)
alc <- ALC(fit)

# --------------------------------------------------------
# Example 2: damped sine wave
# --------------------------------------------------------

f <- function(x) {
    exp(-10*x) * (cos(10*pi*x - 1) + sin(10*pi*x - 1)) * 5 - 0.2
}

# Training data
x <- seq(0, 1, length = 30)
y <- f(x) + rnorm(30, 0, 0.05)

# Testing data
xx <- seq(0, 1, length = 100)
yy <- f(xx)

plot(xx, yy, type = "l")
points(x, y, col = 2)

# Conduct MCMC (can replace fit_two_layer with fit_one_layer/fit_three_layer)
fit <- fit_two_layer(x, y, D = 1, nmcmc = 2000, cov = "exp2")
plot(fit)
fit <- trim(fit, 1000, 2)

# Option 1 - calculate ALC from MCMC iterations
alc <- ALC(fit, xx)

# Option 2 - calculate ALC after predictions
fit <- predict(fit, xx, cores = 1, store_latent = TRUE)
alc <- ALC(fit)

# Visualize fit
plot(fit)
par(new = TRUE) # overlay ALC
plot(xx, alc$value, type = 'l', lty = 2,
     axes = FALSE, xlab = '', ylab = '')

# Select next design point
x_new <- xx[which.max(alc$value)]
Acts on a gp, gpvec, dgp2, dgp2vec, dgp3, or dgp3vec object. Continues MCMC sampling of hyperparameters and hidden layers using settings from the original object. Appends new samples to existing samples. When vecchia = TRUE, this function provides the option to update Vecchia ordering/conditioning sets based on latent layer warpings through the specification of re_approx = TRUE.
continue(object, new_mcmc, verb, re_approx, ...)

## S3 method for class 'gp'
continue(object, new_mcmc = 1000, verb = TRUE, ...)

## S3 method for class 'dgp2'
continue(object, new_mcmc = 1000, verb = TRUE, ...)

## S3 method for class 'dgp3'
continue(object, new_mcmc = 1000, verb = TRUE, ...)

## S3 method for class 'gpvec'
continue(object, new_mcmc = 1000, verb = TRUE, re_approx = FALSE, ...)

## S3 method for class 'dgp2vec'
continue(object, new_mcmc = 1000, verb = TRUE, re_approx = FALSE, ...)

## S3 method for class 'dgp3vec'
continue(object, new_mcmc = 1000, verb = TRUE, re_approx = FALSE, ...)
object | object from fit_one_layer, fit_two_layer, or fit_three_layer |
new_mcmc | number of new MCMC iterations to conduct and append |
verb | logical indicating whether to print iteration progress |
re_approx | logical indicating whether to re-randomize the ordering and update Vecchia nearest-neighbor conditioning sets (only for fits with vecchia = TRUE) |
... | N/A |
See fit_one_layer, fit_two_layer, or fit_three_layer for details on MCMC. The resulting object will have nmcmc equal to the previous nmcmc plus new_mcmc. It is recommended to start an MCMC fit then investigate trace plots to assess burn-in. The primary use of this function is to gather more MCMC iterations in order to obtain burned-in samples.

Specifying re_approx = TRUE updates random orderings and nearest-neighbor conditioning sets (only for vecchia = TRUE fits). In one-layer models, there is no latent warping, but the Vecchia approximation is still re-randomized and nearest-neighbors are adjusted accordingly. In two- and three-layer models, the latest samples of the hidden layers are used to update nearest-neighbors. If you update the Vecchia approximation, you should later remove previous samples (updating the approximation effectively starts a new chain). When re_approx = FALSE the previous orderings and conditioning sets are used (maintaining the continuity of the previous chain).
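A minimal sketch of the bookkeeping (x and y are assumed training data, as in the fit_two_layer examples):

fit <- fit_two_layer(x, y, nmcmc = 1000)  # initial chain of 1000 iterations
fit <- continue(fit, new_mcmc = 1000)     # appends 1000 more; fit$nmcmc is now 2000
fit <- trim(fit, 1000)                    # once burned-in, drop the first 1000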
object of the same class with the new iterations appended
# See ?fit_two_layer for an example
Calculates continuous ranked probability score (lower CRPS indicates a better fit and better uncertainty quantification).
crps(y, mu, s2)
y |
response vector |
mu |
predicted mean |
s2 |
predicted point-wise variances |
Gneiting, T, and AE Raftery. 2007. Strictly Proper Scoring Rules, Prediction, and Estimation. Journal of the American Statistical Association 102 (477), 359-378.
# Additional examples including real-world computer experiments are available at:
# https://bitbucket.org/gramacylab/deepgp-ex/
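A minimal sketch with toy values (in practice y holds held-out responses and mu/s2 come from predict):

y_true <- c(0.1, -0.3, 0.7)       # held-out responses
mu_hat <- c(0.12, -0.25, 0.65)    # posterior predictive means
s2_hat <- c(0.010, 0.020, 0.015)  # posterior predictive point-wise variances
crps(y_true, mu_hat, s2_hat)      # lower values indicate a better fit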
Conducts MCMC sampling of hyperparameters for a one layer GP. Length scale parameter theta governs the strength of the correlation and nugget parameter g governs noise. In Matern covariance, v governs smoothness.
fit_one_layer(x, y, nmcmc = 10000, sep = FALSE, verb = TRUE, g_0 = 0.001,
              theta_0 = 0.1, true_g = NULL, settings = NULL,
              cov = c("matern", "exp2"), v = 2.5, vecchia = FALSE,
              m = min(25, length(y) - 1), ordering = NULL)
x | vector or matrix of input locations |
y | vector of response values |
nmcmc | number of MCMC iterations |
sep | logical indicating whether to use separable (sep = TRUE) or isotropic (sep = FALSE) lengthscales |
verb | logical indicating whether to print iteration progress |
g_0 | initial value for g |
theta_0 | initial value for theta |
true_g | if true nugget is known it may be specified here (set to a small value to make fit deterministic). Note - values that are too small may cause numerical issues in matrix inversions. |
settings | hyperparameters for proposals and priors (see details) |
cov | covariance kernel, either Matern or squared exponential ("matern" or "exp2") |
v | Matern smoothness parameter (only used if cov = "matern") |
vecchia | logical indicating whether to use Vecchia approximation |
m | size of Vecchia conditioning sets (only used if vecchia = TRUE) |
ordering | optional ordering for Vecchia approximation, must correspond to rows of x, defaults to random |
Utilizes Metropolis Hastings sampling of the length scale and nugget parameters with proposals and priors controlled by settings. When true_g is set to a specific value, the nugget is not estimated. When vecchia = TRUE, all calculations leverage the Vecchia approximation with specified conditioning set size m. Vecchia approximation is only implemented for cov = "matern".
NOTE on OpenMP: The Vecchia implementation relies on OpenMP parallelization for efficient computation. This function will produce a warning message if the package was installed without OpenMP (this is the default for CRAN packages installed on Apple machines). To set up OpenMP parallelization, download the package source code and install using the gcc/g++ compiler.
Proposals for g and theta follow a uniform sliding window scheme, e.g. g_star <- runif(1, l * g_t / u, u * g_t / l), with defaults l = 1 and u = 2 provided in settings. To adjust these, set settings = list(l = new_l, u = new_u).

Priors on g and theta follow Gamma distributions with shape parameters (alpha) and rate parameters (beta) controlled within the settings list object. Defaults have been updated with package version 1.1.3. Default priors differ for noisy/deterministic settings and depend on whether monowarp = TRUE. All default values are visible in the internal deepgp:::check_settings function. These priors are designed for x scaled to [0, 1] and y scaled to have mean 0 and variance 1. These may be adjusted using the settings input.
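For example, a minimal sketch of widening the proposal window (the values shown are illustrative; x and y are assumed training data):

fit <- fit_one_layer(x, y, nmcmc = 2000,
                     settings = list(l = 1, u = 3))  # wider window than the default u = 2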
The output object of class gp is designed for use with continue, trim, and predict.
a list of the S3 class gp or gpvec with elements:

x: copy of input matrix
y: copy of response vector
nmcmc: number of MCMC iterations
settings: copy of proposal/prior settings
v: copy of Matern smoothness parameter (v = 999 indicates cov = "exp2")
g: vector of MCMC samples for g
theta: vector of MCMC samples for theta
tau2: vector of MLE estimates for tau2 (scale parameter)
ll: vector of MVN log likelihood for each Gibbs iteration
time: computation time in seconds
Sauer, A. (2023). Deep Gaussian process surrogates for computer experiments.
*Ph.D. Dissertation, Department of Statistics, Virginia Polytechnic Institute and State University.*
Sauer, A., Gramacy, R.B., & Higdon, D. (2023). Active learning for deep
Gaussian process surrogates. *Technometrics, 65,* 4-18. arXiv:2012.08015
Sauer, A., Cooper, A., & Gramacy, R. B. (2023). Vecchia-approximated deep Gaussian
processes for computer experiments.
*Journal of Computational and Graphical Statistics, 32*(3), 824-837. arXiv:2204.02904
# Additional examples including real-world computer experiments are available at:
# https://bitbucket.org/gramacylab/deepgp-ex/

# Booth function (inspired by the Higdon function)
f <- function(x) {
    i <- which(x <= 0.58)
    x[i] <- sin(pi * x[i] * 6) + cos(pi * x[i] * 12)
    x[-i] <- 5 * x[-i] - 4.9
    return(x)
}

# Training data
x <- seq(0, 1, length = 25)
y <- f(x)

# Testing data
xx <- seq(0, 1, length = 200)
yy <- f(xx)

plot(xx, yy, type = "l")
points(x, y, col = 2)

# Example 1: full model (nugget fixed)
fit <- fit_one_layer(x, y, nmcmc = 2000, true_g = 1e-6)
plot(fit)
fit <- trim(fit, 1000, 2)
fit <- predict(fit, xx, cores = 1)
plot(fit)

# Example 2: full model (nugget estimated, EI calculated)
fit <- fit_one_layer(x, y, nmcmc = 2000)
plot(fit)
fit <- trim(fit, 1000, 2)
fit <- predict(fit, xx, cores = 1, EI = TRUE)
plot(fit)
par(new = TRUE) # overlay EI
plot(xx[order(xx)], fit$EI[order(xx)], type = 'l', lty = 2,
     axes = FALSE, xlab = '', ylab = '')

# Example 3: Vecchia approximated model (nugget estimated)
fit <- fit_one_layer(x, y, nmcmc = 2000, vecchia = TRUE, m = 10)
plot(fit)
fit <- trim(fit, 1000, 2)
fit <- predict(fit, xx, cores = 1)
plot(fit)
Conducts MCMC sampling of hyperparameters, hidden layer z, and hidden layer w for a three layer deep GP. Separate length scale parameters theta_z, theta_w, and theta_y govern the correlation strength of the inner layer, middle layer, and outer layer respectively. Nugget parameter g governs noise on the outer layer. In Matern covariance, v governs smoothness.
fit_three_layer(x, y, nmcmc = 10000, D = ifelse(is.matrix(x), ncol(x), 1),
                verb = TRUE, w_0 = NULL, z_0 = NULL, g_0 = 0.001,
                theta_y_0 = 0.1, theta_w_0 = 0.1, theta_z_0 = 0.1,
                true_g = NULL, settings = NULL, cov = c("matern", "exp2"),
                v = 2.5, vecchia = FALSE, m = min(25, length(y) - 1),
                ordering = NULL)
x | vector or matrix of input locations |
y | vector of response values |
nmcmc | number of MCMC iterations |
D | integer designating dimension of hidden layers, defaults to dimension of x |
verb | logical indicating whether to print iteration progress |
w_0 | initial value for hidden layer w (matrix of dimension nrow(x) by D or nrow(x) - 1 by D; see details) |
z_0 | initial value for hidden layer z (matrix of dimension nrow(x) by D or nrow(x) - 1 by D; see details) |
g_0 | initial value for g |
theta_y_0 | initial value for theta_y (length scale of outer layer) |
theta_w_0 | initial value for theta_w (length scale of middle layer) |
theta_z_0 | initial value for theta_z (length scale of inner layer) |
true_g | if true nugget is known it may be specified here (set to a small value to make fit deterministic). Note - values that are too small may cause numerical issues in matrix inversions. |
settings | hyperparameters for proposals and priors (see details) |
cov | covariance kernel, either Matern or squared exponential ("matern" or "exp2") |
v | Matern smoothness parameter (only used if cov = "matern") |
vecchia | logical indicating whether to use Vecchia approximation |
m | size of Vecchia conditioning sets (only used if vecchia = TRUE) |
ordering | optional ordering for Vecchia approximation, must correspond to rows of x, defaults to random |
pmx = TRUE option not yet implemented for three-layer DGP.
Maps inputs x through hidden layer z then hidden layer w to outputs y. Conducts sampling of the hidden layers using Elliptical Slice sampling. Utilizes Metropolis Hastings sampling of the length scale and nugget parameters with proposals and priors controlled by settings. When true_g is set to a specific value, the nugget is not estimated. When vecchia = TRUE, all calculations leverage the Vecchia approximation with specified conditioning set size m. Vecchia approximation is only implemented for cov = "matern".
NOTE on OpenMP: The Vecchia implementation relies on OpenMP parallelization for efficient computation. This function will produce a warning message if the package was installed without OpenMP (this is the default for CRAN packages installed on Apple machines). To set up OpenMP parallelization, download the package source code and install using the gcc/g++ compiler.
Proposals for g, theta_y, theta_w, and theta_z follow a uniform sliding window scheme, e.g. g_star <- runif(1, l * g_t / u, u * g_t / l), with defaults l = 1 and u = 2 provided in settings. To adjust these, set settings = list(l = new_l, u = new_u).

Priors on g, theta_y, theta_w, and theta_z follow Gamma distributions with shape parameters (alpha) and rate parameters (beta) controlled within the settings list object. Defaults have been updated with package version 1.1.3. Default priors differ for noisy/deterministic settings and depend on whether monowarp = TRUE. All default values are visible in the internal deepgp:::check_settings function. These priors are designed for x scaled to [0, 1] and y scaled to have mean 0 and variance 1. These may be adjusted using the settings input.
In the current version, the three-layer model does not have any equivalent setting for monowarp = TRUE or pmx = TRUE as in fit_two_layer.
When w_0 = NULL and/or z_0 = NULL, the hidden layers are initialized at x (i.e. the identity mapping). The default prior mean of the inner hidden layer z is zero, but may be adjusted to x using settings = list(z_prior_mean = x). The prior mean of the middle hidden layer w is set at zero and is not user adjustable. If w_0 and/or z_0 is of dimension nrow(x) - 1 by D, the final row is predicted using kriging. This is helpful in sequential design when adding a new input location and starting the MCMC at the place where the previous MCMC left off.
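A hedged sketch of that warm start (x_add and y_add denote a hypothetical new input location and its response; fit is a previously trimmed three-layer fit):

last <- length(fit$w)    # index of the final retained MCMC iteration
fit_new <- fit_three_layer(rbind(x, x_add), c(y, y_add), nmcmc = 1000,
                           w_0 = fit$w[[last]], z_0 = fit$z[[last]],
                           theta_y_0 = fit$theta_y[last], g_0 = fit$g[last])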
The output object of class dgp3 or dgp3vec is designed for use with continue, trim, and predict.
a list of the S3 class dgp3 or dgp3vec with elements:

x: copy of input matrix
y: copy of response vector
nmcmc: number of MCMC iterations
settings: copy of proposal/prior settings
v: copy of Matern smoothness parameter (v = 999 indicates cov = "exp2")
g: vector of MCMC samples for g
theta_y: vector of MCMC samples for theta_y (length scale of outer layer)
theta_w: matrix of MCMC samples for theta_w (length scale of middle layer)
theta_z: matrix of MCMC samples for theta_z (length scale of inner layer)
tau2: vector of MLE estimates for tau2 (scale parameter of outer layer)
w: list of MCMC samples for middle hidden layer w
z: list of MCMC samples for inner hidden layer z
ll: vector of MVN log likelihood of the outer layer for each Gibbs iteration
time: computation time in seconds
Sauer, A. (2023). Deep Gaussian process surrogates for computer experiments.
*Ph.D. Dissertation, Department of Statistics, Virginia Polytechnic Institute and State University.*
Sauer, A., Gramacy, R.B., & Higdon, D. (2023). Active learning for deep
Gaussian process surrogates. *Technometrics, 65,* 4-18. arXiv:2012.08015
Sauer, A., Cooper, A., & Gramacy, R. B. (2023). Vecchia-approximated deep Gaussian
processes for computer experiments.
*Journal of Computational and Graphical Statistics, 32*(3), 824-837. arXiv:2204.02904
# Additional examples including real-world computer experiments are available at:
# https://bitbucket.org/gramacylab/deepgp-ex/

# G function in 2 dimensions (https://www.sfu.ca/~ssurjano/gfunc.html)
f <- function(xx, a = (c(1:length(xx)) - 1) / 2) {
    new1 <- abs(4 * xx - 2) + a
    new2 <- 1 + a
    prod <- prod(new1 / new2)
    return((prod - 1) / 0.86)
}

# Training data
d <- 2
n <- 30
x <- matrix(runif(n * d), ncol = d)
y <- apply(x, 1, f)

# Testing data
n_test <- 500
xx <- matrix(runif(n_test * d), ncol = d)
yy <- apply(xx, 1, f)

i <- interp::interp(xx[, 1], xx[, 2], yy)
image(i, col = heat.colors(128))
contour(i, add = TRUE)
contour(i, level = -0.5, col = 4, add = TRUE) # potential failure limit
points(x)

# Example 1: full model (nugget estimated, entropy calculated)
fit <- fit_three_layer(x, y, nmcmc = 2000)
plot(fit)
fit <- trim(fit, 1000, 2)
fit <- predict(fit, xx, entropy_limit = -0.5, cores = 1)
plot(fit)
i <- interp::interp(xx[, 1], xx[, 2], fit$entropy)
image(i, col = heat.colors(128), main = "Entropy")

# Example 2: Vecchia approximated model (nugget fixed)
# (Vecchia approximation is faster for larger data sizes)
fit <- fit_three_layer(x, y, nmcmc = 2000, vecchia = TRUE,
                       m = 10, true_g = 1e-6)
plot(fit)
fit <- trim(fit, 1000, 2)
fit <- predict(fit, xx, cores = 1)
plot(fit)
Conducts MCMC sampling of hyperparameters and hidden layer w for a two layer deep GP. Separate length scale parameters theta_w and theta_y govern the correlation strength of the hidden layer and outer layer respectively. Nugget parameter g governs noise on the outer layer. In Matern covariance, v governs smoothness.
fit_two_layer(x, y, nmcmc = 10000, D = ifelse(is.matrix(x), ncol(x), 1),
              monowarp = FALSE, pmx = FALSE, verb = TRUE, w_0 = NULL,
              g_0 = 0.001, theta_y_0 = 0.1, theta_w_0 = 0.1, true_g = NULL,
              settings = NULL, cov = c("matern", "exp2"), v = 2.5,
              vecchia = FALSE, m = min(25, length(y) - 1), ordering = NULL)
x | vector or matrix of input locations |
y | vector of response values |
nmcmc | number of MCMC iterations |
D | integer designating dimension of hidden layer, defaults to dimension of x |
monowarp | indicates whether warpings should be forced to be monotonic. Input may be a matrix of grid points (or a vector which will be applied to every dimension) for interpolation of the cumulative sum, an integer specifying the number of grid points to use over the range [0, 1], or simply the boolean TRUE which triggers the default grid of 50 points over [0, 1] (see details) |
pmx | "prior mean x", logical indicating whether the prior mean of the hidden layer should be set at x; may also be specified as a numeric scale value (see details) |
verb | logical indicating whether to print iteration progress |
w_0 | initial value for hidden layer w (matrix of dimension nrow(x) by D or nrow(x) - 1 by D; see details) |
g_0 | initial value for g |
theta_y_0 | initial value for theta_y (length scale of outer layer) |
theta_w_0 | initial value for theta_w (length scale of inner layer) |
true_g | if true nugget is known it may be specified here (set to a small value to make fit deterministic). Note - values that are too small may cause numerical issues in matrix inversions. |
settings | hyperparameters for proposals and priors (see details) |
cov | covariance kernel, either Matern or squared exponential ("matern" or "exp2") |
v | Matern smoothness parameter (only used if cov = "matern") |
vecchia | logical indicating whether to use Vecchia approximation |
m | size of Vecchia conditioning sets (only used if vecchia = TRUE) |
ordering | optional ordering for Vecchia approximation, must correspond to rows of x, defaults to random |
Maps inputs x through hidden layer w to outputs y. Conducts sampling of the hidden layer using Elliptical Slice sampling. Utilizes Metropolis Hastings sampling of the length scale and nugget parameters with proposals and priors controlled by settings. When true_g is set to a specific value, the nugget is not estimated. When vecchia = TRUE, all calculations leverage the Vecchia approximation with specified conditioning set size m. Vecchia approximation is only implemented for cov = "matern".
When monowarp = TRUE, each input dimension is warped separately and monotonically. This requires D = ncol(x). Monotonic warpings trigger separable lengthscales on the outer layer (theta_y). As a default, monotonic warpings use the reference grid seq(0, 1, length = 50). The grid size may be controlled by passing a numeric integer to monowarp (i.e., monowarp = 100 uses the grid seq(0, 1, length = 100)). Alternatively, any user-specified grid may be passed as the argument to monowarp.
When pmx = TRUE, the prior on the latent layer is set at x (rather than the default of zero). This requires D = ncol(x). If pmx is set to a numeric value, then that value is used as the scale parameter on the latent layer. Specifying a small value here encourages an identity mapping.
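A brief sketch of these two options (x is assumed to be a matrix of inputs; the pmx scale of 0.1 is illustrative):

# Monotonic warpings interpolated on a 100-point reference grid
fit <- fit_two_layer(x, y, nmcmc = 2000, D = ncol(x), monowarp = 100)

# Prior mean of the latent layer set at x, with a small scale encouraging
# a near-identity warping
fit <- fit_two_layer(x, y, nmcmc = 2000, D = ncol(x), pmx = 0.1)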
NOTE on OpenMP: The Vecchia implementation relies on OpenMP parallelization for efficient computation. This function will produce a warning message if the package was installed without OpenMP (this is the default for CRAN packages installed on Apple machines). To set up OpenMP parallelization, download the package source code and install using the gcc/g++ compiler.
Proposals for g, theta_y, and theta_w follow a uniform sliding window scheme, e.g. g_star <- runif(1, l * g_t / u, u * g_t / l), with defaults l = 1 and u = 2 provided in settings. To adjust these, set settings = list(l = new_l, u = new_u).

Priors on g, theta_y, and theta_w follow Gamma distributions with shape parameters (alpha) and rate parameters (beta) controlled within the settings list object. Defaults have been updated with package version 1.1.3. Default priors differ for noisy/deterministic settings and depend on whether monowarp = TRUE. All default values are visible in the internal deepgp:::check_settings function. These priors are designed for x scaled to [0, 1] and y scaled to have mean 0 and variance 1. These may be adjusted using the settings input.
When w_0 = NULL, the hidden layer is initialized at x (i.e. the identity mapping). If w_0 is of dimension nrow(x) - 1 by D, the final row is predicted using kriging. This is helpful in sequential design when adding a new input location and starting the MCMC at the place where the previous MCMC left off.
The output object of class dgp2 or dgp2vec is designed for use with continue, trim, and predict.
a list of the S3 class dgp2 or dgp2vec with elements:

x: copy of input matrix
y: copy of response vector
nmcmc: number of MCMC iterations
settings: copy of proposal/prior settings
v: copy of Matern smoothness parameter (v = 999 indicates cov = "exp2")
g: vector of MCMC samples for g
theta_y: vector of MCMC samples for theta_y (length scale of outer layer)
theta_w: matrix of MCMC samples for theta_w (length scale of inner layer)
tau2: vector of MLE estimates for tau2 (scale parameter of outer layer)
w: list of MCMC samples for hidden layer w
ll: vector of MVN log likelihood of the outer layer for each Gibbs iteration
time: computation time in seconds
Sauer, A. (2023). Deep Gaussian process surrogates for computer experiments.
*Ph.D. Dissertation, Department of Statistics, Virginia Polytechnic Institute and State University.*
Sauer, A., Gramacy, R.B., & Higdon, D. (2023). Active learning for deep
Gaussian process surrogates. *Technometrics, 65,* 4-18. arXiv:2012.08015
Sauer, A., Cooper, A., & Gramacy, R. B. (2023). Vecchia-approximated deep Gaussian
processes for computer experiments.
*Journal of Computational and Graphical Statistics, 32*(3), 824-837. arXiv:2204.02904
Barnett, S., Beesley, L. J., Booth, A. S., Gramacy, R. B., & Osthus D. (2024). Monotonic
warpings for additive and deep Gaussian processes. *In Review.* arXiv:2408.01540
# Additional examples including real-world computer experiments are available at:
# https://bitbucket.org/gramacylab/deepgp-ex/

# Booth function (inspired by the Higdon function)
f <- function(x) {
    i <- which(x <= 0.58)
    x[i] <- sin(pi * x[i] * 6) + cos(pi * x[i] * 12)
    x[-i] <- 5 * x[-i] - 4.9
    return(x)
}

# Training data
x <- seq(0, 1, length = 25)
y <- f(x)

# Testing data
xx <- seq(0, 1, length = 200)
yy <- f(xx)

plot(xx, yy, type = "l")
points(x, y, col = 2)

# Example 1: full model (nugget estimated, using continue)
fit <- fit_two_layer(x, y, nmcmc = 1000)
plot(fit)
fit <- continue(fit, 1000)
plot(fit, hidden = TRUE) # trace plots and ESS samples
fit <- trim(fit, 1000, 2)
fit <- predict(fit, xx, cores = 1)
plot(fit)

# Example 2: Vecchia approximated model (nugget estimated)
# (Vecchia approximation is faster for larger data sizes)
fit <- fit_two_layer(x, y, nmcmc = 2000, vecchia = TRUE, m = 10)
plot(fit, hidden = TRUE) # trace plots and ESS samples
fit <- trim(fit, 1000, 2)
fit <- predict(fit, xx, cores = 1)
plot(fit)

# Example 3: Vecchia approximated model, re-approximated after burn-in
fit <- fit_two_layer(x, y, nmcmc = 1000, vecchia = TRUE, m = 10)
fit <- continue(fit, 1000, re_approx = TRUE)
plot(fit, hidden = TRUE) # trace plots and ESS samples
fit <- trim(fit, 1000, 2)
fit <- predict(fit, xx, cores = 1)
plot(fit)

# Example 4: full model with monotonic warpings (nugget estimated)
fit <- fit_two_layer(x, y, nmcmc = 2000, monowarp = TRUE)
plot(fit, hidden = TRUE) # trace plots and ESS samples
fit <- trim(fit, 1000, 2)
fit <- predict(fit, xx, cores = 1)
plot(fit)
Acts on a gp, dgp2, or dgp3 object. Current version requires squared exponential covariance (cov = "exp2"). Calculates IMSE over the input locations x_new. Optionally utilizes SNOW parallelization. The user should select the point with the lowest IMSE to add to the design.
IMSE(object, x_new, cores)

## S3 method for class 'gp'
IMSE(object, x_new = NULL, cores = 1)

## S3 method for class 'dgp2'
IMSE(object, x_new = NULL, cores = 1)

## S3 method for class 'dgp3'
IMSE(object, x_new = NULL, cores = 1)
object | object of class gp, dgp2, or dgp3 |
x_new | matrix of possible input locations, if object has been run through predict the previously stored x_new is used |
cores | number of cores to utilize in parallel, by default no parallelization is used |
Not yet implemented for Vecchia-approximated fits or Matern kernels.

All iterations in the object are used in the calculation, so samples should be burned-in. Thinning the samples using trim will speed up computation. This function may be used in two ways:

Option 1: called on an object with only MCMC iterations, in which case x_new must be specified

Option 2: called on an object that has been predicted over, in which case the x_new from predict is used

In Option 2, it is recommended to set store_latent = TRUE for dgp2 and dgp3 objects so latent mappings do not have to be re-calculated. Through predict, the user may specify a mean mapping (mean_map = TRUE) or a full sample from the MVN distribution over w_new (mean_map = FALSE). When the object has not yet been predicted over (Option 1), the mean mapping is used.
SNOW parallelization reduces computation time but requires more memory storage.
list with elements:

value: vector of IMSE values, indices correspond to x_new
time: computation time in seconds
Sauer, A., Gramacy, R.B., & Higdon, D. (2023). Active learning for
deep Gaussian process surrogates. *Technometrics, 65,* 4-18. arXiv:2012.08015
Binois, M, J Huang, RB Gramacy, and M Ludkovski. 2019. "Replication or Exploration?
Sequential Design for Stochastic Simulation Experiments." Technometrics
61, 7-23. Taylor & Francis. doi:10.1080/00401706.2018.1469433
# Additional examples including real-world computer experiments are available at:
# https://bitbucket.org/gramacylab/deepgp-ex/

# --------------------------------------------------------
# Example 1: toy step function, runs in less than 5 seconds
# --------------------------------------------------------

f <- function(x) {
    if (x <= 0.4) return(-1)
    if (x >= 0.6) return(1)
    if (x > 0.4 & x < 0.6) return(10*(x-0.5))
}

x <- seq(0.05, 0.95, length = 7)
y <- sapply(x, f)
x_new <- seq(0, 1, length = 100)

# Fit model and calculate IMSE
fit <- fit_one_layer(x, y, nmcmc = 100, cov = "exp2")
fit <- trim(fit, 50)
fit <- predict(fit, x_new, cores = 1, store_latent = TRUE)
imse <- IMSE(fit)

# --------------------------------------------------------
# Example 2: Higdon function
# --------------------------------------------------------

f <- function(x) {
    i <- which(x <= 0.48)
    x[i] <- 2 * sin(pi * x[i] * 4) + 0.4 * cos(pi * x[i] * 16)
    x[-i] <- 2 * x[-i] - 1
    return(x)
}

# Training data
x <- seq(0, 1, length = 30)
y <- f(x) + rnorm(30, 0, 0.05)

# Testing data
xx <- seq(0, 1, length = 100)
yy <- f(xx)

plot(xx, yy, type = "l")
points(x, y, col = 2)

# Conduct MCMC (can replace fit_three_layer with fit_one_layer/fit_two_layer)
fit <- fit_three_layer(x, y, D = 1, nmcmc = 2000, cov = "exp2")
plot(fit)
fit <- trim(fit, 1000, 2)

# Option 1 - calculate IMSE from only MCMC iterations
imse <- IMSE(fit, xx)

# Option 2 - calculate IMSE after predictions
fit <- predict(fit, xx, cores = 1, store_latent = TRUE)
imse <- IMSE(fit)

# Visualize fit
plot(fit)
par(new = TRUE) # overlay IMSE
plot(xx, imse$value, col = 2, type = 'l', lty = 2,
     axes = FALSE, xlab = '', ylab = '')

# Select next design point
x_new <- xx[which.min(imse$value)]
Acts on a gp, gpvec, dgp2, dgp2vec, dgp3, or dgp3vec object. Generates trace plots for outer log likelihood, length scale, and nugget hyperparameters. Generates plots of hidden layers for one-dimensional inputs or monotonic warpings. Generates plots of the posterior mean and estimated 90% prediction intervals for one-dimensional inputs; generates heat maps of the posterior mean and point-wise variance for two-dimensional inputs.
## S3 method for class 'gp'
plot(x, trace = NULL, predict = NULL, ...)

## S3 method for class 'gpvec'
plot(x, trace = NULL, predict = NULL, ...)

## S3 method for class 'dgp2'
plot(x, trace = NULL, hidden = NULL, predict = NULL, ...)

## S3 method for class 'dgp2vec'
plot(x, trace = NULL, hidden = NULL, predict = NULL, ...)

## S3 method for class 'dgp3'
plot(x, trace = NULL, hidden = NULL, predict = NULL, ...)

## S3 method for class 'dgp3vec'
plot(x, trace = NULL, hidden = NULL, predict = NULL, ...)
x | object of class gp, gpvec, dgp2, dgp2vec, dgp3, or dgp3vec |
trace | logical indicating whether to generate trace plots (default is TRUE if the object has not been through predict) |
hidden | logical indicating whether to generate plots of hidden layers (two or three layer only, default is FALSE) |
predict | logical indicating whether to generate posterior predictive plot (default is TRUE if the object has been through predict) |
... | N/A |
Trace plots are useful in assessing burn-in. If there are too many hyperparameters to plot them all, then it is most useful to visualize the log likelihood (e.g., plot(fit$ll, type = "l")).
Hidden layer plots are colored on a gradient - red lines represent earlier iterations and yellow lines represent later iterations - to help assess burn-in of the hidden layers. Only every 100th sample is plotted.
# See ?fit_one_layer, ?fit_two_layer, or ?fit_three_layer
# for examples
Acts on a gp, dgp2, or dgp3 object. Calculates posterior mean and variance/covariance over specified input locations. Optionally calculates expected improvement (EI) or entropy over candidate inputs. Optionally utilizes SNOW parallelization.
## S3 method for class 'gp'
predict(object, x_new, lite = TRUE, return_all = FALSE, EI = FALSE,
        entropy_limit = NULL, cores = 1, ...)

## S3 method for class 'dgp2'
predict(object, x_new, lite = TRUE, store_latent = FALSE, mean_map = TRUE,
        return_all = FALSE, EI = FALSE, entropy_limit = NULL, cores = 1, ...)

## S3 method for class 'dgp3'
predict(object, x_new, lite = TRUE, store_latent = FALSE, mean_map = TRUE,
        return_all = FALSE, EI = FALSE, entropy_limit = NULL, cores = 1, ...)

## S3 method for class 'gpvec'
predict(object, x_new, m = object$m, ordering_new = NULL, lite = TRUE,
        return_all = FALSE, EI = FALSE, entropy_limit = NULL, cores = 1, ...)

## S3 method for class 'dgp2vec'
predict(object, x_new, m = object$m, ordering_new = NULL, lite = TRUE,
        store_latent = FALSE, mean_map = TRUE, return_all = FALSE, EI = FALSE,
        entropy_limit = NULL, cores = 1, ...)

## S3 method for class 'dgp3vec'
predict(object, x_new, m = object$m, ordering_new = NULL, lite = TRUE,
        store_latent = FALSE, mean_map = TRUE, return_all = FALSE, EI = FALSE,
        entropy_limit = NULL, cores = 1, ...)
object | object from fit_one_layer, fit_two_layer, or fit_three_layer |
x_new | matrix of predictive input locations |
lite | logical indicating whether to calculate only point-wise variances (lite = TRUE) or the full covariance matrix (lite = FALSE) |
return_all | logical indicating whether to return mean and point-wise variance prediction for ALL samples (only available for lite = TRUE) |
EI | logical indicating whether to calculate expected improvement (for minimizing the response) |
entropy_limit | optional limit state for entropy calculations (separating passes and failures), default value of NULL bypasses entropy calculations |
cores | number of cores to utilize in parallel |
... | N/A |
store_latent | logical indicating whether to store and return mapped values of latent layers (two or three layer models only) |
mean_map | logical indicating whether to map hidden layers using the conditional mean (mean_map = TRUE) or a full sample from the MVN distribution (mean_map = FALSE) |
m | size of Vecchia conditioning sets (only for fits with vecchia = TRUE), defaults to the m of the original object |
ordering_new | optional ordering for Vecchia approximation, must correspond to rows of x_new, defaults to random |
All iterations in the object are used for prediction, so samples should be burned-in. Thinning the samples using trim will speed up computation. Posterior moments are calculated using conditional expectation and variance. As a default, only point-wise variance is calculated. Full covariance may be calculated using lite = FALSE.
Expected improvement is calculated with the goal of minimizing the response. See Chapter 7 of Gramacy (2020) for details. Entropy is calculated based on two classes separated by the specified limit. See Sauer (2023, Chapter 3) for details.
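A brief sketch of requesting these quantities (fit and xx are assumed from the fitting examples; the limit state of 0 is illustrative):

fit <- predict(fit, xx, EI = TRUE, cores = 1)          # expected improvement
head(fit$EI)                                           # indexed by rows of xx

fit <- predict(fit, xx, entropy_limit = 0, cores = 1)  # entropy about the limit state
head(fit$entropy)                                      # indexed by rows of xx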
SNOW parallelization reduces computation time but requires more memory storage.
object of the same class with the following additional elements:

x_new: copy of predictive input locations
mean: predicted posterior mean, indices correspond to x_new locations
s2: predicted point-wise variances, indices correspond to x_new locations (only returned when lite = TRUE)
mean_all: predicted posterior mean for each sample (column indices), only returned when return_all = TRUE
s2_all: predicted point-wise variances for each sample (column indices), only returned when return_all = TRUE
Sigma: predicted posterior covariance, indices correspond to x_new locations (only returned when lite = FALSE)
EI: vector of expected improvement values, indices correspond to x_new locations (only returned when EI = TRUE)
entropy: vector of entropy values, indices correspond to x_new locations (only returned when entropy_limit is numeric)
w_new: list of hidden layer mappings (only returned when store_latent = TRUE), list index corresponds to iteration and row index corresponds to x_new location (two or three layer models only)
z_new: list of hidden layer mappings (only returned when store_latent = TRUE), list index corresponds to iteration and row index corresponds to x_new location (three layer models only)
Computation time is added to the computation time of the existing object.
Sauer, A. (2023). Deep Gaussian process surrogates for computer experiments.
*Ph.D. Dissertation, Department of Statistics, Virginia Polytechnic Institute and State University.*
Sauer, A., Gramacy, R.B., & Higdon, D. (2023). Active learning for deep
Gaussian process surrogates. *Technometrics, 65,* 4-18. arXiv:2012.08015
Sauer, A., Cooper, A., & Gramacy, R. B. (2023). Vecchia-approximated deep Gaussian
processes for computer experiments.
*Journal of Computational and Graphical Statistics, 32*(3), 824-837. arXiv:2204.02904
Barnett, S., Beesley, L. J., Booth, A. S., Gramacy, R. B., & Osthus D. (2024). Monotonic
warpings for additive and deep Gaussian processes. *In Review.* arXiv:2408.01540
# See ?fit_one_layer, ?fit_two_layer, or ?fit_three_layer
# for examples
Calculates root mean square error (lower RMSE indicates a better fit).
rmse(y, mu)
y |
response vector |
mu |
predicted mean |
# Additional examples including real-world computer experiments are available at:
# https://bitbucket.org/gramacylab/deepgp-ex/
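A minimal sketch with toy values (in practice mu comes from predict):

y_true <- c(0.1, -0.3, 0.7)     # held-out responses
mu_hat <- c(0.12, -0.25, 0.65)  # posterior predictive means
rmse(y_true, mu_hat)            # lower values indicate a better fit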
Calculates score, proportional to the multivariate normal log likelihood. Higher scores indicate better fits. Only applicable to noisy data. Requires the full covariance matrix (e.g. predict with lite = FALSE).
score(y, mu, sigma)
y |
response vector |
mu |
predicted mean |
sigma |
predicted covariance |
Gneiting, T, and AE Raftery. 2007. Strictly Proper Scoring Rules, Prediction, and Estimation. Journal of the American Statistical Association 102 (477), 359-378.
# Additional examples including real-world computer experiments are available at:
# https://bitbucket.org/gramacylab/deepgp-ex/
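A brief sketch (fit, xx, and held-out responses yy are assumed from the fitting examples; the full covariance from lite = FALSE is required):

fit <- predict(fit, xx, lite = FALSE, cores = 1)  # returns full covariance in fit$Sigma
score(yy, fit$mean, fit$Sigma)                    # higher scores indicate a better fit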
Calculates squared pairwise Euclidean distances using C.
sq_dist(X1, X2 = NULL)
X1 | matrix of input locations |
X2 | matrix of second input locations (if NULL, distances are calculated between X1 and itself) |
C code derived from the "laGP" package (Robert B Gramacy and Furong Sun).
symmetric matrix of squared Euclidean distances
Gramacy, RB and F Sun. (2016). laGP: Large-Scale Spatial Modeling via Local Approximate Gaussian Processes in R. Journal of Statistical Software 72 (1), 1-46. doi:10.18637/jss.v072.i01
x <- seq(0, 1, length = 10)
d2 <- sq_dist(x)
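A short sketch of cross distances between two sets of locations:

x1 <- matrix(seq(0, 1, length = 10), ncol = 1)
x2 <- matrix(seq(0, 1, length = 5), ncol = 1)
d2_cross <- sq_dist(x1, x2)  # 10 x 5 matrix of squared distances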
Acts on a gp, gpvec, dgp2, dgp2vec, dgp3, or dgp3vec object. Removes the specified number of MCMC iterations (starting at the first iteration). After these samples are removed, the remaining samples are optionally thinned.
trim(object, burn, thin)

## S3 method for class 'gp'
trim(object, burn, thin = 1)

## S3 method for class 'gpvec'
trim(object, burn, thin = 1)

## S3 method for class 'dgp2'
trim(object, burn, thin = 1)

## S3 method for class 'dgp2vec'
trim(object, burn, thin = 1)

## S3 method for class 'dgp3'
trim(object, burn, thin = 1)

## S3 method for class 'dgp3vec'
trim(object, burn, thin = 1)
object | object from fit_one_layer, fit_two_layer, or fit_three_layer |
burn | integer specifying number of iterations to cut off as burn-in |
thin | integer specifying amount of thinning (thin = 1 keeps all iterations, thin = 2 keeps every other iteration, etc.) |
The resulting object will have nmcmc equal to the previous nmcmc minus burn, divided by thin. It is recommended to start an MCMC fit then investigate trace plots to assess burn-in. Once burn-in has been achieved, use this function to remove the starting iterations. Thinning reduces the size of the resulting object while accounting for the high correlation between consecutive iterations.
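For example, a minimal sketch of the bookkeeping (x and y are assumed training data):

fit <- fit_one_layer(x, y, nmcmc = 2000)  # 2000 total iterations
fit <- trim(fit, burn = 1000, thin = 2)   # keeps (2000 - 1000) / 2 = 500 iterations
fit$nmcmc                                 # 500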
object of the same class with the selected iterations removed
# See ?fit_one_layer, ?fit_two_layer, or ?fit_three_layer
# for examples