Title: | Tools for Gaussian Process Modeling in Uncertainty Quantification |
---|---|
Description: | Gaussian processes ('GPs') have been widely used to model spatial data, 'spatio'-temporal data, and computer experiments in diverse areas of statistics including spatial statistics, 'spatio'-temporal statistics, uncertainty quantification, and machine learning. This package creates basic tools for fitting and prediction based on 'GPs' with spatial data, 'spatio'-temporal data, and computer experiments. Key characteristics for this GP tool include: (1) the comprehensive implementation of various covariance functions including the 'Matérn' family and the Confluent 'Hypergeometric' family with isotropic form, tensor form, and automatic relevance determination form, where the isotropic form is widely used in spatial statistics, the tensor form is widely used in design and analysis of computer experiments and uncertainty quantification, and the automatic relevance determination form is widely used in machine learning; (2) implementations via Markov chain Monte Carlo ('MCMC') algorithms and optimization algorithms for GP models with all the implemented covariance functions. The methods for fitting and prediction are mainly implemented in a Bayesian framework; (3) model evaluation via Fisher information and predictive metrics such as predictive scores; (4) built-in functionality for simulating 'GPs' with all the implemented covariance functions; (5) unified implementation to allow easy specification of various 'GPs'. |
Authors: | Pulong Ma [aut, cre] |
Maintainer: | Pulong Ma <[email protected]> |
License: | GPL (>= 2) |
Version: | 0.1.0-6 |
Built: | 2024-11-22 06:48:14 UTC |
Source: | CRAN |
Gaussian processes (GPs) have been widely used to model spatial data, spatio-temporal data, and computer experiments in diverse areas of statistics including spatial statistics, spatio-temporal statistics, uncertainty quantification, and machine learning. This package creates basic tools for fitting and prediction based on GPs with spatial data, spatio-temporal data, and computer experiments. Key characteristics for this GP tool include: (1) the comprehensive implementation of various covariance functions including the Matérn family and the Confluent Hypergeometric family with isotropic form, tensor form, and automatic relevance determination form, where the isotropic form is widely used in spatial statistics, the tensor form is widely used in design and analysis of computer experiments and uncertainty quantification, and the automatic relevance determination form is widely used in machine learning; (2) implementations via Markov chain Monte Carlo (MCMC) algorithms and optimization algorithms for GP models with all the implemented covariance functions. The methods for fitting and prediction are mainly implemented in a Bayesian framework; (3) model evaluation via Fisher information and predictive metrics such as predictive scores; (4) built-in functionality for simulating GPs with all the implemented covariance functions; (5) unified implementation to allow easy specification of various GPs.
Data types: For many scientific applications, spatial data, spatio-temporal data, and computer experiments arise naturally. This package provides a comprehensive set of basic tools to fit GaSP models for univariate and multivariate spatial data, spatio-temporal data, computer experiments. Various covariance functions have been implemented including the Confluent Hypergeometric covariance functions, the Matérn covariance functions, the Gaussian covariance function, the generalized Cauchy covariance function. These covariance families can be in isotropic form, in tensor form, or in automatic relevance determination form. The routines kernel
and ikernel
contain the details of implementation.
Model simulation: This package can simulate realizations from GaSP for different types of data including spatial data, spatio-temporal data, and computer experiments. This feature is quite useful in part because benchmarks are used to evaluate the performance of GaSP models. This functionality is implemented in the routine gp.sim
for unconditional simulation and gp.condsim
for conditional simulation.
Model fitting: Both maximum likelihood methods (or its variants) and Bayes estimation methods such as maximum a posterior (MAP) and Markov chain Monte Carlo (MCMC) methods are implemented. In this package, the nugget parameter is included in the model by default for the sake of better prediction performance and stable computation in practice. In addition, the smoothness parameter in covariance functions such as the Matérn class and the Confluent Hypergeometric class can be estimated. The routine gp.optim
provides optimization based estimation approaches and the routine gp.mcmc
provides MCMC algorithms based estimation approaches.
Model prediction: Prediction is made based on the parameter estimation procedure. If maximum likelihood estimation (MLE) methods are used for parameter estimation, the plug-in approach is used for prediction in the sense that MLEs of parameters are plugged into posterior predictive distributions. If partial Bayes methods (e.g., maximum a posterior) are used, the plug-in approach is used for prediction as well. If fully Bayes methods via MCMC algorithms are used, posterior samples are drawn from posterior predictive distributions. The routine gp.mcmc
allows prediction to be made within the MCMC algorithms, while the routine gp.predict
generates prediction with estimated parameters.
Model assessment: Tools for assessing model adequacy are included in a Bayesian context. Deviance information criteria (DIC), log pointwise predictive density, and log joint predictive density can be computed via the routine gp.model.adequacy
.
Pulong Ma [email protected]
Cressie, N. (1993). “Statistics for Spatial Data.” John Wiley & Sons, New York, revised edition.
Ma and Bhadra (2023). “Beyond Matérn: On a Class of Interpretable Confluent Hypergeometric Covariance Functions.” Journal of the American Statistical Association 118(543), 2045-2058.
Sacks, Jerome, William J Welch, Toby J Mitchell, and Henry P Wynn. (1989). “Design and Analysis of Computer Experiments.” Statistical Science 4(4). Institute of Mathematical Statistics: 409–435.
Santner, Thomas J., Brian J. Williams, and William I. Notz. (2018). “The Design and Analysis of Computer Experiments”; 2nd Ed. New York: Springer.
Stein, Michael L. (1999). “Interpolation of Spatial Data.” Springer Science & Business Media, New York.
##################################################################### ##################################################################### ############## Examples for fitting univariate GP models ############ ## Set up the Sine example from the tgp package code = function(x){ y = (sin(pi*x/5) + 0.2*cos(4*pi*x/5))*(x<=9.6) + (x/10-1)*(x>9.6) } n=100 input = seq(0, 20, length=n) XX = seq(0, 20, length=99) Ztrue = code(input) set.seed(1234) output = Ztrue + rnorm(length(Ztrue), sd=0.1) df.data = data.frame(x=c(input), y=output, y.true=Ztrue) ## fitting a GaSP model with the Cauchy prior fit = GaSP(formula=~1, output, input, param=list(range=3, nugget=0.1, nu=2.5), smooth.est=FALSE, input.new=XX, cov.model=list(family="matern", form="isotropic"), proposal=list(range=.35, nugget=.8, nu=0.8), dtype="Euclidean", model.fit="Cauchy_prior", nsample=3000, burnin=500, verbose=TRUE) ## fitting a GaSP model with the beta prior fit = GaSP(formula=~1, output, input, param=list(range=3, nugget=0.1, nu=2.5), smooth.est=FALSE, input.new=XX, cov.model=list(family="matern", form="isotropic"), prior=list(range=list(a=1,b=1,lb=0,ub=20), nugget=list(a=1,b=1,lb=0,ub=var(output)), proposal=list(range=.35, nugget=.8, nu=0.8), dtype="Euclidean", model.fit="Beta_prior", nsample=3000, burnin=500, verbose=TRUE)) ## fitting a GaSP model with the marginal maximum likelihood approach fit = GaSP(formula=~1, output, input, param=list(range=3, nugget=0.1, nu=2.5), smooth.est=FALSE, input.new=XX, cov.model=list(family="matern", form="isotropic"), dtype="Euclidean", model.fit="MMLE", verbose=TRUE) ## fitting a GaSP model with the profile maximum likelihood approach fit = GaSP(formula=~1, output, input, param=list(range=3, nugget=0.1, nu=2.5), smooth.est=FALSE, input.new=XX, cov.model=list(family="matern", form="isotropic"), dtype="Euclidean", model.fit="MPLE", verbose=TRUE)
##################################################################### ##################################################################### ############## Examples for fitting univariate GP models ############ ## Set up the Sine example from the tgp package code = function(x){ y = (sin(pi*x/5) + 0.2*cos(4*pi*x/5))*(x<=9.6) + (x/10-1)*(x>9.6) } n=100 input = seq(0, 20, length=n) XX = seq(0, 20, length=99) Ztrue = code(input) set.seed(1234) output = Ztrue + rnorm(length(Ztrue), sd=0.1) df.data = data.frame(x=c(input), y=output, y.true=Ztrue) ## fitting a GaSP model with the Cauchy prior fit = GaSP(formula=~1, output, input, param=list(range=3, nugget=0.1, nu=2.5), smooth.est=FALSE, input.new=XX, cov.model=list(family="matern", form="isotropic"), proposal=list(range=.35, nugget=.8, nu=0.8), dtype="Euclidean", model.fit="Cauchy_prior", nsample=3000, burnin=500, verbose=TRUE) ## fitting a GaSP model with the beta prior fit = GaSP(formula=~1, output, input, param=list(range=3, nugget=0.1, nu=2.5), smooth.est=FALSE, input.new=XX, cov.model=list(family="matern", form="isotropic"), prior=list(range=list(a=1,b=1,lb=0,ub=20), nugget=list(a=1,b=1,lb=0,ub=var(output)), proposal=list(range=.35, nugget=.8, nu=0.8), dtype="Euclidean", model.fit="Beta_prior", nsample=3000, burnin=500, verbose=TRUE)) ## fitting a GaSP model with the marginal maximum likelihood approach fit = GaSP(formula=~1, output, input, param=list(range=3, nugget=0.1, nu=2.5), smooth.est=FALSE, input.new=XX, cov.model=list(family="matern", form="isotropic"), dtype="Euclidean", model.fit="MMLE", verbose=TRUE) ## fitting a GaSP model with the profile maximum likelihood approach fit = GaSP(formula=~1, output, input, param=list(range=3, nugget=0.1, nu=2.5), smooth.est=FALSE, input.new=XX, cov.model=list(family="matern", form="isotropic"), dtype="Euclidean", model.fit="MPLE", verbose=TRUE)
This function calls the GSL scientific library to evaluate the modified Bessel function of the second kind.
BesselK(nu, z)
BesselK(nu, z)
nu |
a real positive value |
z |
a real positive value |
a numerical value
Pulong Ma [email protected]
This function computes the generalized Cauchy correlation function given a distance matrix. The generalized Cauchy covariance is given by
where is the range parameter.
is the tail decay parameter.
is the smoothness parameter.
The case where
corresponds to the Cauchy covariance model, which is infinitely differentiable.
cauchy(d, range, tail, nu)
cauchy(d, range, tail, nu)
d |
a matrix of distances |
range |
a numerical value containing the range parameter |
tail |
a numerical value containing the tail decay parameter |
nu |
a numerical value containing the smoothness parameter |
a numerical matrix
Pulong Ma [email protected]
This function computes the Confluent Hypergeometric correlation function given a distance matrix. The Confluent Hypergeometric correlation function is given by
where is the tail decay parameter.
is the range parameter.
is the smoothness parameter.
is the confluent hypergeometric
function of the second kind. Note that this parameterization of the CH covariance
is different from the one in Ma and Bhadra (2023). For details about this covariance,
see Ma and Bhadra (2023; doi:10.1080/01621459.2022.2027775).
CH(d, range, tail, nu)
CH(d, range, tail, nu)
d |
a matrix of distances |
range |
a numerical value containing the range parameter |
tail |
a numerical value containing the tail decay parameter |
nu |
a numerical value containing the smoothness parameter |
a numerical matrix
Pulong Ma [email protected]
GPBayes-package, GaSP
, gp, matern
, kernel
, ikernel
This function finds the correlation parameter given effective range
cor.to.par( d, param, family = "CH", cor.target = 0.05, lower = NULL, upper = NULL, tol = .Machine$double.eps )
cor.to.par( d, param, family = "CH", cor.target = 0.05, lower = NULL, upper = NULL, tol = .Machine$double.eps )
d |
a numerical value containing the effective range |
param |
a list containing correlation parameters. The specification of
param should depend on the covariance model. If the parameter value is
|
family |
a string indicating the type of covariance structure. The following correlation functions are implemented:
|
cor.target |
a numerical value. The default value is 0.05, which means that correlation parameters are searched such that the correlation is approximately 0.05. |
lower |
a numerical value. This sets the lower bound to find the
correlation parameter via the |
upper |
a numerical value. This sets the upper bound to find the
correlation parameter via the |
tol |
a numerical value. This sets the precision of the solution with default value
specified as the machine precision |
a numerical value of correlation parameters
Pulong Ma [email protected]
GPBayes-package, GaSP
, kernel
, ikernel
range = cor.to.par(1,param=list(tail=0.5,nu=2.5), family="CH") tail = cor.to.par(1,param=list(range=0.5,nu=2.5), family="CH") range = cor.to.par(1,param=list(nu=2.5),family="matern")
range = cor.to.par(1,param=list(tail=0.5,nu=2.5), family="CH") tail = cor.to.par(1,param=list(range=0.5,nu=2.5), family="CH") range = cor.to.par(1,param=list(nu=2.5),family="matern")
This function wraps existing built-in routines to construct the derivative of correlation matrix with respect to correlation parameters.
deriv_kernel(d, range, tail, nu, covmodel)
deriv_kernel(d, range, tail, nu, covmodel)
d |
a matrix or a list of distances returned from |
range |
a vector of range parameters |
tail |
a vector of tail decay parameters |
nu |
a vector of smoothness parameters |
covmodel |
a list of two strings: family, form, where family indicates the family of covariance functions including the Confluent Hypergeometric class, the Matérn class, the Cauchy class, the powered-exponential class. form indicates the specific form of covariance structures including the isotropic form, tensor form, automatic relevance determination form.
|
a list of matrices
Pulong Ma [email protected]
CH
, matern
, kernel
, GPBayes-package, GaSP
input = seq(0,1,length=10) d = distance(input,input,type="isotropic",dtype="Euclidean") dR = deriv_kernel(d,range=0.5,tail=0.2,nu=2.5, covmodel=list(family="CH",form="isotropic"))
input = seq(0,1,length=10) d = distance(input,input,type="isotropic",dtype="Euclidean") dR = deriv_kernel(d,range=0.5,tail=0.2,nu=2.5, covmodel=list(family="CH",form="isotropic"))
This function computes distances for two sets of inputs and returns
a R
object.
distance(input1, input2, type = "isotropic", dtype = "Euclidean")
distance(input1, input2, type = "isotropic", dtype = "Euclidean")
input1 |
a matrix of inputs |
input2 |
a matrix of inputs |
type |
a string indicating the form of distances with three froms supported currently: isotropic, tensor, ARD. |
dtype |
a string indicating distance type: Euclidean, GCD, where the latter indicates great circle distance. |
a R object holding distances for two sets of inputs. If type is isotropic, a matrix of distances is returned; if type is tensor or ARD, a list of distance matrices along each input dimension is returned.
a numeric vector or matrix of distances
Pulong Ma [email protected]
input = seq(0,1,length=20) d = distance(input, input, type="isotropic", dtype="Euclidean")
input = seq(0,1,length=20) d = distance(input, input, type="isotropic", dtype="Euclidean")
This function serves as a wrapper to build, fit, and make prediction
for a Gaussian process model. It calls on functions gp
, gp.mcmc
,
gp.optim
, gp.predict
.
GaSP( formula = ~1, output, input, param, smooth.est = FALSE, input.new = NULL, cov.model = list(family = "CH", form = "isotropic"), model.fit = "Cauchy_prior", prior = list(), proposal = list(range = 0.35, tail = 2, nugget = 0.8, nu = 0.8), nsample = 5000, burnin = 1000, opt = NULL, bound = NULL, dtype = "Euclidean", verbose = TRUE )
GaSP( formula = ~1, output, input, param, smooth.est = FALSE, input.new = NULL, cov.model = list(family = "CH", form = "isotropic"), model.fit = "Cauchy_prior", prior = list(), proposal = list(range = 0.35, tail = 2, nugget = 0.8, nu = 0.8), nsample = 5000, burnin = 1000, opt = NULL, bound = NULL, dtype = "Euclidean", verbose = TRUE )
formula |
an object of |
output |
a numerical vector including observations or outputs in a GaSP |
input |
a matrix including inputs in a GaSP |
param |
a list including values for regression parameters, covariance parameters, and nugget variance parameter. The specification of param should depend on the covariance model.
|
smooth.est |
a logical value indicating whether smoothness parameter will be estimated. |
input.new |
a matrix of new input locations |
cov.model |
a list of two strings: family, form, where family indicates the family of covariance functions including the Confluent Hypergeometric class, the Matérn class, the Cauchy class, the powered-exponential class. form indicates the specific form of covariance structures including the isotropic form, tensor form, automatic relevance determination form.
|
model.fit |
a string indicating the choice of priors on correlation parameters:
|
prior |
a list containing tuning parameters in prior distribution. This is used only if a subjective Bayes estimation method with informative priors is used. |
proposal |
a list containing tuning parameters in proposal distribution. This is used only if a Bayes estimation method is used. |
nsample |
an integer indicating the number of MCMC samples. |
burnin |
an integer indicating the burn-in period. |
opt |
a list of arguments to setup the
|
bound |
Default value is
for the Confluent Hypergeometric covariance and the Cauchy covariance.
|
dtype |
a string indicating the type of distance:
|
verbose |
a logical value. If it is |
a list containing the S4
object gp and prediction results
Pulong Ma [email protected]
GPBayes-package
, gp
, gp.mcmc
, gp.optim
, gp.predict
code = function(x){ y = (sin(pi*x/5) + 0.2*cos(4*pi*x/5))*(x<=9.6) + (x/10-1)*(x>9.6) return(y) } n=100 input = seq(0, 20, length=n) XX = seq(0, 20, length=99) Ztrue = code(input) set.seed(1234) output = Ztrue + rnorm(length(Ztrue), sd=0.1) # fitting a GaSP model with the objective Bayes approach fit = GaSP(formula=~1, output, input, param=list(range=3, nugget=0.1, nu=2.5), smooth.est=FALSE, input.new=XX, cov.model=list(family="matern", form="isotropic"), proposal=list(range=.35, nugget=.8, nu=0.8), dtype="Euclidean", model.fit="Cauchy_prior", nsample=50, burnin=10, verbose=TRUE)
code = function(x){ y = (sin(pi*x/5) + 0.2*cos(4*pi*x/5))*(x<=9.6) + (x/10-1)*(x>9.6) return(y) } n=100 input = seq(0, 20, length=n) XX = seq(0, 20, length=99) Ztrue = code(input) set.seed(1234) output = Ztrue + rnorm(length(Ztrue), sd=0.1) # fitting a GaSP model with the objective Bayes approach fit = GaSP(formula=~1, output, input, param=list(range=3, nugget=0.1, nu=2.5), smooth.est=FALSE, input.new=XX, cov.model=list(family="matern", form="isotropic"), proposal=list(range=.35, nugget=.8, nu=0.8), dtype="Euclidean", model.fit="Cauchy_prior", nsample=50, burnin=10, verbose=TRUE)
S4
object gp
This function constructs the S4
object gp that is used for Gaussian process
model fitting and prediction.
gp( formula = ~1, output, input, param, smooth.est = FALSE, cov.model = list(family = "CH", form = "isotropic"), dtype = "Euclidean" )
gp( formula = ~1, output, input, param, smooth.est = FALSE, cov.model = list(family = "CH", form = "isotropic"), dtype = "Euclidean" )
formula |
an object of |
output |
a numerical vector including observations or outputs in a GaSP |
input |
a matrix including inputs in a GaSP |
param |
a list including values for regression parameters, covariance parameters, and nugget variance parameter. The specification of param should depend on the covariance model.
|
smooth.est |
a logical value indicating whether smoothness parameter will be estimated. |
cov.model |
a list of two strings: family, form, where family indicates the family of covariance functions including the Confluent Hypergeometric class, the Matérn class, the Cauchy class, the powered-exponential class. form indicates the specific form of covariance structures including the isotropic form, tensor form, automatic relevance determination form.
|
dtype |
a string indicating the type of distance:
|
an S4
object of gp class
Pulong Ma [email protected]
code = function(x){ y = (sin(pi*x/5) + 0.2*cos(4*pi*x/5))*(x<=9.6) + (x/10-1)*(x>9.6) return(y) } n=100 input = seq(0, 20, length=n) XX = seq(0, 20, length=99) Ztrue = code(input) set.seed(1234) output = Ztrue + rnorm(length(Ztrue), sd=0.1) obj = gp(formula=~1, output, input, param=list(range=4, nugget=0.1,nu=2.5), smooth.est=FALSE, cov.model=list(family="matern", form="isotropic"))
code = function(x){ y = (sin(pi*x/5) + 0.2*cos(4*pi*x/5))*(x<=9.6) + (x/10-1)*(x>9.6) return(y) } n=100 input = seq(0, 20, length=n) XX = seq(0, 20, length=99) Ztrue = code(input) set.seed(1234) output = Ztrue + rnorm(length(Ztrue), sd=0.1) obj = gp(formula=~1, output, input, param=list(range=4, nugget=0.1,nu=2.5), smooth.est=FALSE, cov.model=list(family="matern", form="isotropic"))
gp
classThis is an S4 class definition for gp
in the GaSP
package.
formula
an object of formula
class that specifies regressors; see formula
for details.
output
a numerical vector including observations or outputs in a GaSP
input
a matrix including inputs in a GaSP
param
a list including values for regression parameters, correlation parameters, and nugget variance parameter. The specification of param should depend on the covariance model.
The regression parameters are denoted by coeff. Default value is .
The marginal variance or partial sill is denoted by sig2. Default value is 1.
The nugget variance parameter is denoted by nugget for all covariance models. Default value is 0.
For the Confluent Hypergeometric class, range is used to denote the range parameter .
tail is used to denote the tail decay parameter
. nu is used to denote the
smoothness parameter
.
For the generalized Cauchy class, range is used to denote the range parameter .
tail is used to denote the tail decay parameter
. nu is used to denote the
smoothness parameter
.
For the Matérn class, range is used to denote the range parameter .
nu is used to denote the smoothness parameter
. When
, the
Matérn class corresponds to the exponential covariance.
For the powered-exponential class, range is used to denote the range parameter .
nu is used to denote the smoothness parameter. When
, the powered-exponential class
corresponds to the Gaussian covariance.
cov.model
a list of two strings: family, form, where family indicates the family of covariance functions including the Confluent Hypergeometric class, the Matérn class, the Cauchy class, the powered-exponential class. form indicates the specific form of covariance structures including the isotropic form, tensor form, automatic relevance determination form.
The Confluent Hypergeometric correlation function is given by
where is the tail decay parameter.
is the range parameter.
is the smoothness parameter.
is the confluent hypergeometric
function of the second kind. Note that this parameterization of the CH covariance is different from the one in Ma and Bhadra (2023). For details about this covariance,
see Ma and Bhadra (2023; doi:10.1080/01621459.2022.2027775).
The generalized Cauchy covariance is given by
where is the range parameter.
is the tail decay parameter.
is the smoothness parameter with default value at 2.
The Matérn correlation function is given by
where is the range parameter.
is the smoothness parameter.
is the modified Bessel function of the second kind of order
.
The exponential correlation function is given by
where is the range parameter. This is the Matérn correlation with
.
The Matérn correlation with .
The Matérn correlation with .
The powered-exponential correlation function is given by
where is the range parameter.
is the smoothness parameter.
The Gaussian correlation function is given by
where is the range parameter.
This indicates the isotropic form of covariance functions. That is,
where denotes the
Euclidean distance or the great circle distance for data on sphere.
denotes
any isotropic covariance family specified in family.
This indicates the tensor product of correlation functions. That is,
where is the dimension of input space.
is the distance along the
th input dimension. This type of covariance structure has been often used in Gaussian process emulation for computer experiments.
This indicates the automatic relevance determination form. That is,
where denotes the range parameter along the
th input dimension.
smooth.est
a logical value. If it is TRUE
, the smoothness parameter
will be estimated; otherwise the smoothness is not estimated.
dtype
a string indicating the type of distance:
Euclidean distance is used. This is the default choice.
Great circle distance is used for data on sphere.
loglik
a numerical value containing the log-likelihood with current
gp
object.
mcmc
a list containing MCMC samples if available.
prior
a list containing tuning parameters in prior distribution. This is used only if a Bayes estimation method with informative priors is used.
proposal
a list containing tuning parameters in proposal distribution. This is used only if a Bayes estimation method is used.
info
a list containing the maximum distance in the input space. It should be a vector if isotropic covariance is used, otherwise it is vector of maximum distances along each input dimension
Pulong Ma [email protected]
This function simulate from the Gaussian process model conditional on existing data (i.e., locations, observations). This is known as conditional simulation in geostatistics, which simulates realizations from the (posterior) predictive distribution of the process given the data.
gp.condsim(obj, XX, nsample = 1, seed = NULL)
gp.condsim(obj, XX, nsample = 1, seed = NULL)
obj |
an |
XX |
a matrix of new locations where conditional simulation is performed. |
nsample |
number of conditional simulations default at 1 |
seed |
random generation seed default to be |
an array (vector or matrix) of conditional simulations
This function computes the Fisher information matrix for a
Gaussian process model
, where
with correlation function
and nugget parameter
;
is a vector of regression coefficients,
is the variance parameter (or partial sill).
Given data points that are assumed to be realizations from the GP model,
the standard likelihood is defined as
where is a vector of
observations.
is a matrix of covariates,
contains correlation
parameters and nugget parameter,
denotes the
-by-
correlation matrix.
The integrated likelihood is defined as
where is the conditional Jeffreys-rule (or reference prior)
in the model with the above standard likelihood when
is assumed to be known.
For the Matérn class, current implementation only computes Fisher information matrix
for variance parameter , range parameter
, and nugget variance
parameter
. That is,
.
For the Confluent Hypergeometric class, current implementation computes Fisher information matrix
for variance parameter , range parameter
, tail decay parameter
, smoothness parameter
and nugget variance
parameter
. That is,
.
gp.fisher( obj = NULL, intloglik = FALSE, formula = ~1, input = NULL, param = NULL, cov.model = NULL, dtype = "Euclidean" )
gp.fisher( obj = NULL, intloglik = FALSE, formula = ~1, input = NULL, param = NULL, cov.model = NULL, dtype = "Euclidean" )
obj |
a |
intloglik |
a logical value with default value |
formula |
an object of |
input |
a matrix including inputs in a GaSP |
param |
a list including values for regression parameters, covariance parameters, and nugget variance parameter. The specification of param should depend on the covariance model.
|
cov.model |
a list of two strings: family, form, where family indicates the family of covariance functions including the Confluent Hypergeometric class, the Matérn class, the Cauchy class, the powered-exponential class. form indicates the specific form of covariance structures including the isotropic form, tensor form, automatic relevance determination form.
|
dtype |
a string indicating the type of distance:
|
a numerical matrix of Fisher information
Pulong Ma [email protected]
GPBayes-package, GaSP
, gp
, kernel
, ikernel
,
n=100 input = seq(0, 20, length=n) range = 1 tail = .5 nu = 1.5 sig2 = 1 nugget = 0.01 coeff = 0 par = list(range=range, tail=tail, nu=nu, sig2=sig2, nugget=nugget, coeff=coeff) I = gp.fisher(formula=~1, input=input, param=list(range=4, nugget=0.1,nu=2.5), cov.model=list(family="CH", form="isotropic"))
n=100 input = seq(0, 20, length=n) range = 1 tail = .5 nu = 1.5 sig2 = 1 nugget = 0.01 coeff = 0 par = list(range=range, tail=tail, nu=nu, sig2=sig2, nugget=nugget, coeff=coeff) I = gp.fisher(formula=~1, input=input, param=list(range=4, nugget=0.1,nu=2.5), cov.model=list(family="CH", form="isotropic"))
This function processes posterior samples in the gp
object.
gp.get.mcmc(obj, burnin = 500)
gp.get.mcmc(obj, burnin = 500)
obj |
a |
burnin |
a numerical value specifying the burn-in period for calculating posterior summaries. |
a list of posterior summaries
GPBayes-package, GaSP
, gp
, gp.mcmc
This function is a wraper to estimate parameters via MCMC algorithms in the GaSP model with different choices of priors.
gp.mcmc( obj, input.new = NULL, method = "Cauchy_prior", prior = list(), proposal = list(), nsample = 10000, verbose = TRUE )
gp.mcmc( obj, input.new = NULL, method = "Cauchy_prior", prior = list(), proposal = list(), nsample = 10000, verbose = TRUE )
obj |
an |
input.new |
a matrix of prediction locations. Default value is |
method |
a string indicating the Bayes estimation approaches with different choices of priors on correlation parameters:
|
prior |
a list containing tuning parameters in prior distributions. This is used only if a Bayes estimation method with subjective priors is used. |
proposal |
a list containing tuning parameters in proposal distributions. This is used only if a Bayes estimation method is used. |
nsample |
an integer indicating the number of MCMC samples. |
verbose |
a logical value. If it is |
a gp
object with prior, proposal, MCMC samples included.
Pulong Ma [email protected]
GPBayes-package, GaSP
, gp, gp.optim
code = function(x){ y = (sin(pi*x/5) + 0.2*cos(4*pi*x/5))*(x<=9.6) + (x/10-1)*(x>9.6) return(y) } n=100 input = seq(0, 20, length=n) XX = seq(0, 20, length=99) Ztrue = code(input) set.seed(1234) output = Ztrue + rnorm(length(Ztrue), sd=0.1) obj = gp(formula=~1, output, input, param=list(range=4, nugget=0.1,nu=2.5), smooth.est=FALSE, cov.model=list(family="matern", form="isotropic")) fit.mcmc = gp.mcmc(obj, method="Cauchy_prior", proposal=list(range=0.3, nugget=0.8), nsample=100, verbose=TRUE)
code = function(x){ y = (sin(pi*x/5) + 0.2*cos(4*pi*x/5))*(x<=9.6) + (x/10-1)*(x>9.6) return(y) } n=100 input = seq(0, 20, length=n) XX = seq(0, 20, length=99) Ztrue = code(input) set.seed(1234) output = Ztrue + rnorm(length(Ztrue), sd=0.1) obj = gp(formula=~1, output, input, param=list(range=4, nugget=0.1,nu=2.5), smooth.est=FALSE, cov.model=list(family="matern", form="isotropic")) fit.mcmc = gp.mcmc(obj, method="Cauchy_prior", proposal=list(range=0.3, nugget=0.8), nsample=100, verbose=TRUE)
This function computes effective number of parameters (pD), deviance information criterion (DIC), logarithmic pointwise predictive density (lppd), and logarithmic joint predictive density (ljpd). For detailed introduction of these metrics, see Chapter 7 of Gelman et al. (2013).
The deviance function for a model with a vector of parameters
is defined as
where is a vector of
observations.
The effective number of parameters (see p.172 of Gelman et al. 2013) is defined as
where
The interpretation is that the effective number of parameters is the “expected"
deviance minus the “fitted" deviance. Higher
implies more over-fitting with estimate
.
The Deviance information criteria (DIC) (see pp. 172-173 of Gelman et al. 2013) is
DIC approximates Akaike information criterion (AIC) and is more appropriate for hierarchical models than AIC and BIC.
The log predictive density (lpd) is defined as
where is a vector of
observations.
contains correlation
parameters and nugget parameter. This predictive density should be understood as an update of the likelihood since data is treated as prior information now.
With a set of prediction locations
.
The log pointwise predictive density (lppd) is defined as
The log joint predictive density (ljpd) is defined as
The lppd
is connected to cross-validation, while the ljpd
measures joint uncertainty across prediction locations.
gp.model.adequacy( obj, testing.input, testing.output, pointwise = TRUE, joint = TRUE )
gp.model.adequacy( obj, testing.input, testing.output, pointwise = TRUE, joint = TRUE )
obj |
a |
testing.input |
a matrix of testing inputs |
testing.output |
a vector of testing outputs |
pointwise |
a logical value with default value |
joint |
a logical value with default value |
a list containing pD, DIC, lppd, ljpd.
Pulong Ma [email protected]
Gelman, Andrew, John B. Carlin, Hal S. Stern, David B. Dunson, Aki Vehtari, and Donald B. Rubin (2013). Bayesian Data Analysis, Third Edition. CRC Press.
This function is a wraper to estimate parameters in the GaSP model with different choices of estimation methods using numerical optimization methods.
gp.optim(obj, method = "MMLE", opt = NULL, bound = NULL)
gp.optim(obj, method = "MMLE", opt = NULL, bound = NULL)
obj |
an |
method |
a string indicating the parameter estimation method:
|
opt |
a list of arguments to setup the
|
bound |
Default value is
for the Confluent Hypergeometric covariance and the Cauchy covariance.
|
a list of updated gp
object obj and
fitted information fit
Pulong Ma [email protected]
GPBayes-package, GaSP
, gp, gp.mcmc
code = function(x){ y = (sin(pi*x/5) + 0.2*cos(4*pi*x/5))*(x<=9.6) + (x/10-1)*(x>9.6) return(y) } n=100 input = seq(0, 20, length=n) XX = seq(0, 20, length=99) Ztrue = code(input) set.seed(1234) output = Ztrue + rnorm(length(Ztrue), sd=0.1) obj = gp(formula=~1, output, input, param=list(range=4, nugget=0.1,nu=2.5), smooth.est=FALSE, cov.model=list(family="matern", form="isotropic")) fit.optim = gp.optim(obj, method="MPLE")
code = function(x){ y = (sin(pi*x/5) + 0.2*cos(4*pi*x/5))*(x<=9.6) + (x/10-1)*(x>9.6) return(y) } n=100 input = seq(0, 20, length=n) XX = seq(0, 20, length=99) Ztrue = code(input) set.seed(1234) output = Ztrue + rnorm(length(Ztrue), sd=0.1) obj = gp(formula=~1, output, input, param=list(range=4, nugget=0.1,nu=2.5), smooth.est=FALSE, cov.model=list(family="matern", form="isotropic")) fit.optim = gp.optim(obj, method="MPLE")
This function provides the capability to make prediction based on a GaSP when different estimation methods are employed.
gp.predict(obj, input.new, method = "Bayes")
gp.predict(obj, input.new, method = "Bayes")
obj |
an |
input.new |
a matrix of new input lomessageions |
method |
a string indicating the parameter estimation method:
|
a list of predictive mean, predictive standard deviation, 95% predictive intervals
Pulong Ma [email protected]
GPBayes-package, GaSP
, gp, gp.mcmc
, gp.optim
code = function(x){ y = (sin(pi*x/5) + 0.2*cos(4*pi*x/5))*(x<=9.6) + (x/10-1)*(x>9.6) return(y) } n=100 input = seq(0, 20, length=n) XX = seq(0, 20, length=99) Ztrue = code(input) set.seed(1234) output = Ztrue + rnorm(length(Ztrue), sd=0.1) obj = gp(formula=~1, output, input, param=list(range=4, nugget=0.1,nu=2.5), smooth.est=FALSE, cov.model=list(family="matern", form="isotropic")) fit.optim = gp.optim(obj, method="MMLE") obj = fit.optim$obj pred = gp.predict(obj, input.new=XX, method="MMLE")
code = function(x){ y = (sin(pi*x/5) + 0.2*cos(4*pi*x/5))*(x<=9.6) + (x/10-1)*(x>9.6) return(y) } n=100 input = seq(0, 20, length=n) XX = seq(0, 20, length=99) Ztrue = code(input) set.seed(1234) output = Ztrue + rnorm(length(Ztrue), sd=0.1) obj = gp(formula=~1, output, input, param=list(range=4, nugget=0.1,nu=2.5), smooth.est=FALSE, cov.model=list(family="matern", form="isotropic")) fit.optim = gp.optim(obj, method="MMLE") obj = fit.optim$obj pred = gp.predict(obj, input.new=XX, method="MMLE")
This function simulates realizations from Gaussian processes.
gp.sim( formula = ~1, input, param, cov.model = list(family = "CH", form = "isotropic"), dtype = "Euclidean", nsample = 1, seed = NULL )
gp.sim( formula = ~1, input, param, cov.model = list(family = "CH", form = "isotropic"), dtype = "Euclidean", nsample = 1, seed = NULL )
formula |
an object of |
input |
a matrix including inputs in a GaSP |
param |
a list including values for regression parameters, covariance parameters, and nugget variance parameter. The specification of param should depend on the covariance model.
|
cov.model |
a list of two strings: family, form, where family indicates the family of covariance functions including the Confluent Hypergeometric class, the Matérn class, the Cauchy class, the powered-exponential class. form indicates the specific form of covariance structures including the isotropic form, tensor form, automatic relevance determination form.
|
dtype |
a string indicating the type of distance:
|
nsample |
an integer indicating the number of realizations from a Gaussian process |
seed |
a number specifying random number seed |
a numerical vector or a matrix
Pulong Ma [email protected]
n=50 y.sim = gp.sim(input=seq(0,1,length=n), param=list(range=0.5,nugget=0.1,nu=2.5), cov.model=list(family="matern",form="isotropic"), seed=123)
n=50 y.sim = gp.sim(input=seq(0,1,length=n), param=list(range=0.5,nugget=0.1,nu=2.5), cov.model=list(family="matern",form="isotropic"), seed=123)
This function calls the GSL scientific library to evaluate the confluent hypergeometric function of the second kind; see Abramowitz and Stegun 1972, p.505.
HypergU(a, b, x)
HypergU(a, b, x)
a |
a real value |
b |
a real value |
x |
a real value |
a numerical value
Pulong Ma [email protected]
This function wraps existing built-in routines to construct a covariance
matrix for two input matrices based on data type, covariance type, and distance type. The constructed
covariance matrix can be directly used for GaSP fitting and and prediction for spatial
data, spatio-temporal data, and computer experiments. This function explicitly takes inputs as arguments.
The prefix “i" in ikernel
standards for “input".
ikernel(input1, input2, range, tail, nu, covmodel, dtype = "Euclidean")
ikernel(input1, input2, range, tail, nu, covmodel, dtype = "Euclidean")
input1 |
a matrix of input locations |
input2 |
a matrix of input locations |
range |
a vector of range parameters, which could be a scalar. |
tail |
a vector of tail decay parameters, which could be a scalar. |
nu |
a vector of smoothness parameters, which could be a scalar. |
covmodel |
a list of two strings: family, form, where family indicates the family of covariance functions including the Confluent Hypergeometric class, the Matérn class, the Cauchy class, the powered-exponential class. form indicates the specific form of covariance structures including the isotropic form, tensor form, automatic relevance determination form.
|
dtype |
a string indicating distance type: Euclidean, GCD, where the latter indicates great circle distance. |
a correlation matrix
Pulong Ma [email protected]
CH
, matern
, kernel
, GPBayes-package, GaSP
input = seq(0,1,length=10) cormat = ikernel(input,input,range=0.5,tail=0.2,nu=2.5, covmodel=list(family="CH",form="isotropic"))
input = seq(0,1,length=10) cormat = ikernel(input,input,range=0.5,tail=0.2,nu=2.5, covmodel=list(family="CH",form="isotropic"))
This function wraps existing built-in routines to construct a covariance matrix based on data type, covariance type, and distance type with distances as inputs. The constructed covariance matrix can be directly used for GaSP fitting and and prediction for spatial data, spatio-temporal data, and computer experiments.
kernel(d, range, tail, nu, covmodel)
kernel(d, range, tail, nu, covmodel)
d |
a matrix or a list of distances |
range |
a vector of range parameters, which could be a scalar. |
tail |
a vector of tail decay parameters, which could be a scalar. |
nu |
a vector of smoothness parameters, which could be a scalar. |
covmodel |
a list of two strings: family, form, where family indicates the family of covariance functions including the Confluent Hypergeometric class, the Matérn class, the Cauchy class, the powered-exponential class. form indicates the specific form of covariance structures including the isotropic form, tensor form, automatic relevance determination form.
|
a correlation matrix
Pulong Ma [email protected]
CH
, matern
, ikernel
, GPBayes-package, GaSP
input = seq(0,1,length=10) d = distance(input,input,type="isotropic",dtype="Euclidean") cormat = kernel(d,range=0.5,tail=0.2,nu=2.5, covmodel=list(family="CH",form="isotropic"))
input = seq(0,1,length=10) d = distance(input,input,type="isotropic",dtype="Euclidean") cormat = kernel(d,range=0.5,tail=0.2,nu=2.5, covmodel=list(family="CH",form="isotropic"))
This function wraps existing built-in routines to construct the natural logarithm of the integrated likelihood function. The constructed loglikelihood can be directly used for numerical optimization
loglik(par, output, H, d, covmodel, smooth, smoothness_est)
loglik(par, output, H, d, covmodel, smooth, smoothness_est)
par |
a numerical vector, with which numerical optimization routine such as |
output |
a matrix of outputs |
H |
a matrix of regressors in the mean function of a GaSP model. |
d |
an R object holding the distances. It should be a distance matrix for constructing isotropic correlation matrix, or a list of distance matrices along each input dimension for constructing tensor or ARD types of correlation matrix. |
covmodel |
a list of two strings: family, form, where family indicates the family of covariance functions including the Confluent Hypergeometric class, the Matérn class, the Cauchy class, the powered-exponential class. form indicates the specific form of covariance structures including the isotropic form, tensor form, automatic relevance determination form.
|
smooth |
The smoothness parameter |
smoothness_est |
a logical value indicating whether the smoothness parameter is estimated. |
The natural logarithm of marginal or integrated likelihood
Pulong Ma [email protected]
CH
, matern
, gp.optim
, GPBayes-package, GaSP
This function computes the Matérn correlation function given a distance matrix. The Matérn correlation function is given by
where is the range parameter.
is the smoothness parameter.
is the modified Bessel function of the second kind of order
.
The form of covariance includes the following special cases by specifying
to be 0.5, 1.5, 2.5.
corresponds to the exponential correlation function (exp)
of the form
corresponds to the Matérn correlation function with smoothness parameter 1.5 (matern_3_2)
of the form
corresponds to the Matérn correlation function with smoothness parameter 2.5 (matern_5_2)
of the form
matern(d, range, nu)
matern(d, range, nu)
d |
a matrix of distances |
range |
a numerical value containing the range parameter |
nu |
a numerical value containing the smoothness parameter |
a numerical matrix
Pulong Ma [email protected]
GPBayes-package, GaSP
, gp, CH
, kernel
, ikernel
This function computes the powered-exponential correlation function given a distance matrix. The powered-exponential correlation function is given by
where is the range parameter.
is the smoothness parameter.
The case
corresponds to the well-known Gaussian correlation.
powexp(d, range, nu)
powexp(d, range, nu)
d |
a matrix of distances |
range |
a numerical value containing the range parameter |
nu |
a numerical value containing the smoothness parameter |
a numerical matrix
Pulong Ma [email protected]
gp
classPrint the information an object of the gp
class
## S4 method for signature 'gp' show(object)
## S4 method for signature 'gp' show(object)
object |
an object of |