Title: | Spatial Regression Models for Large Datasets using Nearest Neighbor Gaussian Processes |
---|---|
Description: | Fits univariate Bayesian spatial regression models for large datasets using Nearest Neighbor Gaussian Processes (NNGP) detailed in Finley, Datta, Banerjee (2022) <doi:10.18637/jss.v103.i05>, Finley, Datta, Cook, Morton, Andersen, and Banerjee (2019) <doi:10.1080/10618600.2018.1537924>, and Datta, Banerjee, Finley, and Gelfand (2016) <doi:10.1080/01621459.2015.1044091>. |
Authors: | Andrew Finley [aut, cre], Abhirup Datta [aut], Sudipto Banerjee [aut] |
Maintainer: | Andrew Finley <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.0.1 |
Built: | 2024-12-23 06:41:15 UTC |
Source: | CRAN |
Forest canopy height (FCH) estimates from NASA Goddard's LiDAR Hyperspectral and Thermal (G-LiHT; Cook et al. 2013) Airborne Imager and percent tree cover (Hansen et al. 2013) over a subset of Bonanza Creek Experimental Forest, AK, collected in Summer 2014.
The BCEF
matrix columns are longitude (x), latitude (y), forest
canopy height (FCH) in meters from ground, and Landsat derived percent tree cover
(PTC) for 188,717 observations. Longitude and latitude are
in Albers Equal Area (proj4string
"+proj=aea +lat_1=55 +lat_2=65 +lat_0=50 +lon_0=-154 +x_0=0 +y_0=0 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=km +no_defs"). The last column (holdout) identifies a subset of data useful for assessing wall-to-wall predictive performance.
data(BCEF)
data(BCEF)
A matrix containing 188,717 rows and 6 columns named longitude, latitude, FCH, PTC, and holdout.
G-LiHT data were downloaded from https://gliht.gsfc.nasa.gov.
Cook, B.D., L.W. Corp, R.F. Nelson, E.M. Middleton, D.C. Morton, J.T. McCorkel, J.G. Masek, K.J. Ranson, and V. Ly. (2013) NASA Goddard's Lidar, Hyperspectral and Thermal (G-LiHT) airborne imager. Remote Sensing 5:4045-4066.
Hansen, M.C., Potapov, P.V., Moore, R., Hancher, M., Turubanova, S.A., Tyukavina, A.,Thau, D., Stehman, S.V., Goetz, S.J., Loveland, T.R., Kommareddy, A., Egorov, A., Chini, L., Justice, C.O., and Townshend, J.R.G. (2013), High-Resolution Global Mapsof 21st-Century Forest Cover Change, Science, 342, 850-853.
Eastern hemlock (Tsuga canadensis L.) analyzed in Lany et al. (2019). The date comprise hemlock occurrence (binomial outcome) on 17,743 forest stands across Michigan, USA. A set of covariates were also observed at each stand and can be used to explain the probability of hemlock occurrence. Covariates included minimum winter temperature (MIN), maximum summer temperature (MAX), total precipitation in the coldest quarter of the year (WIP), total precipitation in the warmest quarter of the year (SUP), annual actual evapotranspiration (AET) and annual climatic water deficit (DEF). Spatial coordinates are recorded in Longitude (long) and latitude (lat) which are Albers Equal Area (proj4string "+proj=aea +lat_1=55 +lat_2=65 +lat_0=50 +lon_0=-154 +x_0=0 +y_0=0 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs"). See Lany et al. (2019) for details.
data(MI_TSCA)
data(MI_TSCA)
A data frame containing 17,743 rows and 9 columns.
Lany, N. K., Zarnetske, P. L., Finley, A. O. and McCullough, D. G. 2019. Complimentary strengths of spatially-explicit and multi-species distribution models. – Ecography doi: 10.1111/ecog.04728
The function PGLogit
fits logistic models to binomial data using Polya-Gamma latent variables.
PGLogit(formula, weights = 1, data = parent.frame(), n.samples, n.omp.threads = 1, fit.rep = FALSE, sub.sample, verbose = TRUE, ...)
PGLogit(formula, weights = 1, data = parent.frame(), n.samples, n.omp.threads = 1, fit.rep = FALSE, sub.sample, verbose = TRUE, ...)
formula |
a symbolic description of the regression model to be fit. See example below. |
weights |
specifies the number of trials for each observation. The
default is 1 trial for each observation. Valid arguments are a
scalar value that specifies the number of trials used if all
observations have the same number of trials, and a vector of length |
data |
an optional data frame containing the variables in the
model. If not found in data, the variables are taken from
|
n.samples |
the number of posterior samples to collect. |
n.omp.threads |
a positive integer indicating
the number of threads to use for SMP parallel processing. The package must
be compiled for OpenMP support. For most Intel-based machines, we
recommend setting |
fit.rep |
if |
sub.sample |
an optional list that specifies the samples used
for |
verbose |
if |
... |
currently no additional arguments. |
An object of class PGLogit
which is a list comprising:
p.beta.samples |
a |
y.hat.samples |
if |
y.hat.quants |
if |
y.rep.samples |
if |
y.rep.quants |
if |
s.indx |
if |
run.time |
MCMC sampler execution time reported using |
The return object will include additional objects used for subsequent prediction and/or model fit evaluation.
Some of the underlying code used for generating random number from the Polya-Gamma distribution is taken from the pgdraw package written by Daniel F. Schmidt and Enes Makalic. Their code implements Algorithm 6 in PhD thesis of Jesse Bennett Windle (2013) https://repositories.lib.utexas.edu/handle/2152/21842.
Andrew O. Finley [email protected],
Abhirup Datta [email protected],
Sudipto Banerjee [email protected]
Polson, N.G., J.G. Scott, and J. Windle. (2013) Bayesian Inference for Logistic Models Using Polya-Gamma Latent Variables. Journal of the American Statistical Association, 108:1339-1349.
##Generate binary data set.seed(1) n <- 100 x <- cbind(1, rnorm(n), runif(n,0,1)) beta <- c(0.1,-5, 5) p <- 1/(1+exp(-(x%*%beta))) ##Assume 5 trials per outcome weights <- rep(5, n) y <- rbinom(n, size=weights, prob=p) m <- PGLogit(y~x-1, weights = rep(5, n), n.samples = 1000) summary(m)
##Generate binary data set.seed(1) n <- 100 x <- cbind(1, rnorm(n), runif(n,0,1)) beta <- c(0.1,-5, 5) p <- 1/(1+exp(-(x%*%beta))) ##Assume 5 trials per outcome weights <- rep(5, n) y <- rbinom(n, size=weights, prob=p) m <- PGLogit(y~x-1, weights = rep(5, n), n.samples = 1000) summary(m)
NNGP
models.The function predict
collects posterior predictive samples
for a set of new locations given an object of
class NNGP
.
## S3 method for class 'NNGP' predict(object, X.0, coords.0, sub.sample, n.omp.threads = 1, verbose=TRUE, n.report=100, ...)
## S3 method for class 'NNGP' predict(object, X.0, coords.0, sub.sample, n.omp.threads = 1, verbose=TRUE, n.report=100, ...)
object |
an object of class |
X.0 |
the design matrix for prediction locations. An
intercept should be provided in the first column if one is specified
in |
coords.0 |
the spatial coordinates corresponding to
|
sub.sample |
an optional list that specifies the samples to included in
the composition sampling a non-Conjugate model. Valid tags are |
n.omp.threads |
a positive integer indicating
the number of threads to use for SMP parallel processing. The package must
be compiled for OpenMP support. For most Intel-based machines, we recommend setting
|
verbose |
if |
n.report |
the interval to report sampling progress. |
... |
currently no additional arguments. |
An object of class predict.NNGP
which is a list comprising:
p.y.0 |
a matrix that holds the response variable posterior
predictive samples where rows are locations corresponding to
|
p.w.0 |
a matrix that holds the random effect posterior
predictive samples where rows are locations corresponding to
|
run.time |
execution time reported using |
Andrew O. Finley [email protected],
Abhirup Datta [email protected],
Sudipto Banerjee [email protected]
Datta, A., S. Banerjee, A.O. Finley, and A.E. Gelfand. (2016) Hierarchical Nearest-Neighbor Gaussian process models for large geostatistical datasets. Journal of the American Statistical Association, doi:10.1080/01621459.2015.1044091.
Finley, A.O., A. Datta, B.D. Cook, D.C. Morton, H.E. Andersen, and S. Banerjee. (2019) Efficient algorithms for Bayesian Nearest Neighbor Gaussian Processes. Jurnal of Computational and Graphical Statistics, doi:10.1080/10618600.2018.1537924.
rmvn <- function(n, mu=0, V = matrix(1)){ p <- length(mu) if(any(is.na(match(dim(V),p)))) stop("Dimension problem!") D <- chol(V) t(matrix(rnorm(n*p), ncol=p)%*%D + rep(mu,rep(n,p))) } ##Make some data set.seed(1) n <- 100 coords <- cbind(runif(n,0,1), runif(n,0,1)) x <- cbind(1, rnorm(n)) B <- as.matrix(c(1,5)) sigma.sq <- 5 tau.sq <- 1 phi <- 3/0.5 D <- as.matrix(dist(coords)) R <- exp(-phi*D) w <- rmvn(1, rep(0,n), sigma.sq*R) y <- rnorm(n, x%*%B + w, sqrt(tau.sq)) ho <- sample(1:n, 50) y.ho <- y[ho] x.ho <- x[ho,,drop=FALSE] w.ho <- w[ho] coords.ho <- coords[ho,] y <- y[-ho] x <- x[-ho,,drop=FALSE] w <- w[-ho,,drop=FALSE] coords <- coords[-ho,] ##Fit a Response, Latent, and Conjugate NNGP model n.samples <- 500 starting <- list("phi"=phi, "sigma.sq"=5, "tau.sq"=1) tuning <- list("phi"=0.5, "sigma.sq"=0.5, "tau.sq"=0.5) priors <- list("phi.Unif"=c(3/1, 3/0.01), "sigma.sq.IG"=c(2, 5), "tau.sq.IG"=c(2, 1)) cov.model <- "exponential" n.report <- 500 ##Latent m.s <- spNNGP(y~x-1, coords=coords, starting=starting, method="latent", n.neighbors=10, tuning=tuning, priors=priors, cov.model=cov.model, n.samples=n.samples, n.omp.threads=1, n.report=n.report) p.s <- predict(m.s, X.0 = x.ho, coords.0 = coords.ho, n.omp.threads=1) plot(apply(p.s$p.w.0, 1, mean), w.ho) plot(apply(p.s$p.y.0, 1, mean), y.ho) ##Response m.r <- spNNGP(y~x-1, coords=coords, starting=starting, method="response", n.neighbors=10, tuning=tuning, priors=priors, cov.model=cov.model, n.samples=n.samples, n.omp.threads=1, n.report=n.report) p.r <- predict(m.r, X.0 = x.ho, coords.0 = coords.ho, n.omp.threads=1) points(apply(p.r$p.y.0, 1, mean), y.ho, pch=19, col="blue") ##Conjugate theta.alpha <- c(phi, tau.sq/sigma.sq) names(theta.alpha) <- c("phi", "alpha") m.c <- spConjNNGP(y~x-1, coords=coords, n.neighbors=10, theta.alpha=theta.alpha, sigma.sq.IG=c(2, sigma.sq), cov.model=cov.model, n.omp.threads=1) p.c <- predict(m.c, X.0 = x.ho, coords.0 = coords.ho, n.omp.threads=1) points(p.c$y.0.hat, y.ho, pch=19, col="orange")
rmvn <- function(n, mu=0, V = matrix(1)){ p <- length(mu) if(any(is.na(match(dim(V),p)))) stop("Dimension problem!") D <- chol(V) t(matrix(rnorm(n*p), ncol=p)%*%D + rep(mu,rep(n,p))) } ##Make some data set.seed(1) n <- 100 coords <- cbind(runif(n,0,1), runif(n,0,1)) x <- cbind(1, rnorm(n)) B <- as.matrix(c(1,5)) sigma.sq <- 5 tau.sq <- 1 phi <- 3/0.5 D <- as.matrix(dist(coords)) R <- exp(-phi*D) w <- rmvn(1, rep(0,n), sigma.sq*R) y <- rnorm(n, x%*%B + w, sqrt(tau.sq)) ho <- sample(1:n, 50) y.ho <- y[ho] x.ho <- x[ho,,drop=FALSE] w.ho <- w[ho] coords.ho <- coords[ho,] y <- y[-ho] x <- x[-ho,,drop=FALSE] w <- w[-ho,,drop=FALSE] coords <- coords[-ho,] ##Fit a Response, Latent, and Conjugate NNGP model n.samples <- 500 starting <- list("phi"=phi, "sigma.sq"=5, "tau.sq"=1) tuning <- list("phi"=0.5, "sigma.sq"=0.5, "tau.sq"=0.5) priors <- list("phi.Unif"=c(3/1, 3/0.01), "sigma.sq.IG"=c(2, 5), "tau.sq.IG"=c(2, 1)) cov.model <- "exponential" n.report <- 500 ##Latent m.s <- spNNGP(y~x-1, coords=coords, starting=starting, method="latent", n.neighbors=10, tuning=tuning, priors=priors, cov.model=cov.model, n.samples=n.samples, n.omp.threads=1, n.report=n.report) p.s <- predict(m.s, X.0 = x.ho, coords.0 = coords.ho, n.omp.threads=1) plot(apply(p.s$p.w.0, 1, mean), w.ho) plot(apply(p.s$p.y.0, 1, mean), y.ho) ##Response m.r <- spNNGP(y~x-1, coords=coords, starting=starting, method="response", n.neighbors=10, tuning=tuning, priors=priors, cov.model=cov.model, n.samples=n.samples, n.omp.threads=1, n.report=n.report) p.r <- predict(m.r, X.0 = x.ho, coords.0 = coords.ho, n.omp.threads=1) points(apply(p.r$p.y.0, 1, mean), y.ho, pch=19, col="blue") ##Conjugate theta.alpha <- c(phi, tau.sq/sigma.sq) names(theta.alpha) <- c("phi", "alpha") m.c <- spConjNNGP(y~x-1, coords=coords, n.neighbors=10, theta.alpha=theta.alpha, sigma.sq.IG=c(2, sigma.sq), cov.model=cov.model, n.omp.threads=1) p.c <- predict(m.c, X.0 = x.ho, coords.0 = coords.ho, n.omp.threads=1) points(p.c$y.0.hat, y.ho, pch=19, col="orange")
Methods for extracting information from spDiag
.
## S3 method for class 'spDiag' print(x, ...)
## S3 method for class 'spDiag' print(x, ...)
x |
object of class |
... |
currently not used. |
A standard extractor function for printing objects of class spDiag
.
The function spConjNNGP
fits Gaussian univariate Bayesian conjugate spatial
regression models using Nearest Neighbor Gaussian Processes (NNGP).
spConjNNGP(formula, data = parent.frame(), coords, knots, n.neighbors = 15, theta.alpha, sigma.sq.IG, cov.model = "exponential", k.fold = 5, score.rule = "crps", X.0, coords.0, n.omp.threads = 1, search.type = "cb", ord, return.neighbor.info = TRUE, neighbor.info, fit.rep = FALSE, n.samples, verbose = TRUE, ...)
spConjNNGP(formula, data = parent.frame(), coords, knots, n.neighbors = 15, theta.alpha, sigma.sq.IG, cov.model = "exponential", k.fold = 5, score.rule = "crps", X.0, coords.0, n.omp.threads = 1, search.type = "cb", ord, return.neighbor.info = TRUE, neighbor.info, fit.rep = FALSE, n.samples, verbose = TRUE, ...)
formula |
a symbolic description of the regression model to be fit. See example below. |
data |
an optional data frame containing the variables in the
model. If not found in data, the variables are taken from
|
coords |
an |
knots |
an |
n.neighbors |
number of neighbors used in the NNGP. |
theta.alpha |
a vector or matrix of parameter values for
|
sigma.sq.IG |
a vector of length two that holds the
hyperparameters, shape and scale respectively, for the
inverse-Gamma prior on |
cov.model |
a quoted keyword that specifies the covariance
function used to model the spatial dependence structure among the
observations. Supported covariance model key words are:
|
k.fold |
specifies the number of
k folds for cross-validation. If |
score.rule |
a quoted keyword |
X.0 |
the design matrix for prediction locations. An
intercept should be provided in the first column if one is specified
in |
coords.0 |
the spatial coordinates corresponding to
|
n.omp.threads |
a positive integer indicating
the number of threads to use for SMP parallel processing. The package must
be compiled for OpenMP support. For most Intel-based machines, we recommend setting
|
search.type |
a quoted keyword that specifies type of nearest
neighbor search algorithm. Supported method key words are: |
ord |
an index vector of length |
return.neighbor.info |
if |
neighbor.info |
see the |
fit.rep |
if |
n.samples |
gives the number of posterior samples
returned. Note, point and associated variance estimates for model
parameters are not based on posterior samples. Only specify
|
verbose |
if |
... |
currently no additional arguments. |
An object of class NNGP
and conjugate
, and, if knots
is provided, SLGP
. Among other elements, the object contains:
theta.alpha |
the input |
beta.hat |
a matrix of regression coefficient estimates
corresponding to the returned |
beta.var |
|
sigma.sq.hat |
estimate of |
sigma.sq.var |
|
k.fold.scores |
results from the k-fold cross-validation if
|
y.0.hat |
prediction if |
y.0.var.hat |
|
n.neighbors |
number of neighbors used in the NNGP. |
neighbor.info |
returned if |
run.time |
execution time for parameter estimation reported using
|
Andrew O. Finley [email protected],
Abhirup Datta [email protected],
Sudipto Banerjee [email protected]
Datta, A., S. Banerjee, A.O. Finley, and A.E. Gelfand. (2016) Hierarchical Nearest-Neighbor Gaussian process models for large geostatistical datasets. Journal of the American Statistical Association, doi:10.1080/01621459.2015.1044091.
Finley, A.O., A. Datta, S. Banerjee (2022) spNNGP R Package for Nearest Neighbor Gaussian Process Models. Journal of Statistical Software, doi:10.18637/jss.v103.i05.
Finley, A.O., A. Datta, B.D. Cook, D.C. Morton, H.E. Andersen, and S. Banerjee. (2019) Efficient algorithms for Bayesian Nearest Neighbor Gaussian Processes. Journal of Computational and Graphical Statistics, doi:10.1080/10618600.2018.1537924.
Gneiting, T and A.E. Raftery. (2007) Strictly proper scoring rules, prediction, and estimation. Journal of the American Statistical Association, doi:10.1198/016214506000001437.
Shirota, S., A.O. Finley, B.D. Cook, and S. Banerjee (2019) Conjugate Nearest Neighbor Gaussian Process models for efficient statistical interpolation of large spatial data. https://arxiv.org/abs/1907.10109.
rmvn <- function(n, mu=0, V = matrix(1)){ p <- length(mu) if(any(is.na(match(dim(V),p)))) stop("Dimension problem!") D <- chol(V) t(matrix(rnorm(n*p), ncol=p)%*%D + rep(mu,rep(n,p))) } ##Make some data set.seed(1) n <- 2000 coords <- cbind(runif(n,0,1), runif(n,0,1)) x <- cbind(1, rnorm(n)) B <- as.matrix(c(1,5)) sigma.sq <- 5 tau.sq <- 1 phi <- 3/0.5 D <- as.matrix(dist(coords)) R <- exp(-phi*D) w <- rmvn(1, rep(0,n), sigma.sq*R) y <- rnorm(n, x%*%B + w, sqrt(tau.sq)) ho <- sample(1:n, 1000) y.ho <- y[ho] x.ho <- x[ho,,drop=FALSE] w.ho <- w[ho] coords.ho <- coords[ho,] y <- y[-ho] x <- x[-ho,,drop=FALSE] w <- w[-ho,,drop=FALSE] coords <- coords[-ho,] ##Fit a Conjugate NNGP model and predict for the holdout sigma.sq.IG <- c(2, sigma.sq) cov.model <- "exponential" g <- 10 theta.alpha <- cbind(seq(phi,30,length.out=g), seq(tau.sq/sigma.sq,5,length.out=g)) colnames(theta.alpha) <- c("phi", "alpha") m.c <- spConjNNGP(y~x-1, coords=coords, n.neighbors = 10, X.0 = x.ho, coords.0 = coords.ho, k.fold = 5, score.rule = "crps", n.omp.threads = 1, theta.alpha = theta.alpha, sigma.sq.IG = sigma.sq.IG, cov.model = cov.model) m.c
rmvn <- function(n, mu=0, V = matrix(1)){ p <- length(mu) if(any(is.na(match(dim(V),p)))) stop("Dimension problem!") D <- chol(V) t(matrix(rnorm(n*p), ncol=p)%*%D + rep(mu,rep(n,p))) } ##Make some data set.seed(1) n <- 2000 coords <- cbind(runif(n,0,1), runif(n,0,1)) x <- cbind(1, rnorm(n)) B <- as.matrix(c(1,5)) sigma.sq <- 5 tau.sq <- 1 phi <- 3/0.5 D <- as.matrix(dist(coords)) R <- exp(-phi*D) w <- rmvn(1, rep(0,n), sigma.sq*R) y <- rnorm(n, x%*%B + w, sqrt(tau.sq)) ho <- sample(1:n, 1000) y.ho <- y[ho] x.ho <- x[ho,,drop=FALSE] w.ho <- w[ho] coords.ho <- coords[ho,] y <- y[-ho] x <- x[-ho,,drop=FALSE] w <- w[-ho,,drop=FALSE] coords <- coords[-ho,] ##Fit a Conjugate NNGP model and predict for the holdout sigma.sq.IG <- c(2, sigma.sq) cov.model <- "exponential" g <- 10 theta.alpha <- cbind(seq(phi,30,length.out=g), seq(tau.sq/sigma.sq,5,length.out=g)) colnames(theta.alpha) <- c("phi", "alpha") m.c <- spConjNNGP(y~x-1, coords=coords, n.neighbors = 10, X.0 = x.ho, coords.0 = coords.ho, k.fold = 5, score.rule = "crps", n.omp.threads = 1, theta.alpha = theta.alpha, sigma.sq.IG = sigma.sq.IG, cov.model = cov.model) m.c
The function spDiag
calculates measurements of model fit for
objects of class NNGP
and PGLogit
.
spDiag(object, sub.sample, ...)
spDiag(object, sub.sample, ...)
object |
an object of class |
sub.sample |
an optional list that specifies the samples to included in
the computations. Valid tags are |
... |
currently no additional arguments. |
A list with the following tags:
DIC |
a data frame holding Deviance information criterion (DIC) and associated values. Values in |
GPD |
a data frame holding D=G+P and associated values. Values in
|
GRS |
a scoring rule, see Equation 27 in Gneiting and Raftery (2007) for details. |
WAIC |
a data frame hold Watanabe-Akaike information criteria (WAIC) and associated values. Values in
|
y.rep.samples |
if |
y.fit.samples |
if |
s.indx |
the index of samples used for the computations. |
Andrew O. Finley [email protected],
Sudipto Banerjee [email protected]
Finley, A.O., A. Datta, S. Banerjee (2022) spNNGP R Package for Nearest Neighbor Gaussian Process Models. Journal of Statistical Software, doi:10.18637/jss.v103.i05.
Gelfand A.E. and Ghosh, S.K. (1998). Model choice: a minimum posterior predictive loss approach. Biometrika, 85:1-11.
Gelman, A., Hwang, J., and Vehtari, A. (2014). Understanding predictive information criteria for Bayesian models. Statistics and Computing, 24:997-1016.
Gneiting, T. and Raftery, A.E. (2007). Strictly proper scoring rules, prediction, and estimation. Journal of the American Statistical Association, 102:359-378.
Spiegelhalter, D.J., Best, N.G., Carlin, B.P., van der Linde, A. (2002). Bayesian measures of model complexity and fit (with discussion). Journal of the Royal Statistical Society, Series B., 64:583-639.
The function spNNGP
fits Gaussian and non-Gaussian univariate Bayesian spatial
regression models using Nearest Neighbor Gaussian Processes (NNGP).
spNNGP(formula, data = parent.frame(), coords, method = "response", family="gaussian", weights, n.neighbors = 15, starting, tuning, priors, cov.model = "exponential", n.samples, n.omp.threads = 1, search.type = "cb", ord, return.neighbor.info = FALSE, neighbor.info, fit.rep = FALSE, sub.sample, verbose = TRUE, n.report = 100, ...)
spNNGP(formula, data = parent.frame(), coords, method = "response", family="gaussian", weights, n.neighbors = 15, starting, tuning, priors, cov.model = "exponential", n.samples, n.omp.threads = 1, search.type = "cb", ord, return.neighbor.info = FALSE, neighbor.info, fit.rep = FALSE, sub.sample, verbose = TRUE, n.report = 100, ...)
formula |
a symbolic description of the regression model to be fit. See example below. |
data |
an optional data frame containing the variables in the
model. If not found in data, the variables are taken from
|
coords |
an |
method |
a quoted keyword that specifies the NNGP sampling
algorithm. Supported method keywords are: |
family |
a quoted keyword that specifies the data likelihood. Choices are "gaussian" for continuous outcome and "binomial" for discrete outcome which assumes a logistic link modeled using Polya-Gamma latent variables. |
weights |
specifies the number of trials for each observation when
|
n.neighbors |
number of neighbors used in the NNGP. |
starting |
a list with each tag corresponding to a parameter name. Valid tags are |
tuning |
a list with each tag corresponding to a parameter
name. Valid tags are |
priors |
a list with each tag corresponding to a
parameter name. Valid tags are |
cov.model |
a quoted keyword that specifies the covariance
function used to model the spatial dependence structure among the
observations. Supported covariance model key words are:
|
n.samples |
the number of posterior samples to collect. |
n.omp.threads |
a positive integer indicating
the number of threads to use for SMP parallel processing. The package must
be compiled for OpenMP support. For most Intel-based machines, we
recommend setting |
fit.rep |
if |
sub.sample |
an optional list that specifies the samples used
for |
verbose |
if |
search.type |
a quoted keyword that specifies type of nearest
neighbor search algorithm. Supported method key words are: |
ord |
an index vector of length |
return.neighbor.info |
if |
neighbor.info |
see the |
n.report |
the interval to report Metropolis sampler acceptance and MCMC progress. |
... |
currently no additional arguments. |
Model parameters can be fixed at their starting
values by setting their
tuning
values to zero.
The no nugget model is specified by setting tau.sq
to zero
in the starting
and tuning
lists.
An object of class NNGP
with additional class designations for
method
and family
. The return object is a list comprising:
p.beta.samples |
a |
p.theta.samples |
a |
p.w.samples |
is a matrix of posterior samples for the spatial
random effects, where rows correspond to locations in |
y.hat.samples |
if |
y.hat.quants |
if |
y.rep.samples |
if |
y.rep.quants |
if |
s.indx |
if |
neighbor.info |
returned if |
run.time |
execution time for parameter estimation reported using
|
The return object will include additional objects used for subsequent prediction and/or model fit evaluation.
Andrew O. Finley [email protected],
Abhirup Datta [email protected],
Sudipto Banerjee [email protected]
Finley, A.O., A. Datta, S. Banerjee (2022) spNNGP R Package for Nearest Neighbor Gaussian Process Models. Journal of Statistical Software, doi:10.18637/jss.v103.i05.
Datta, A., S. Banerjee, A.O. Finley, and A.E. Gelfand. (2016) Hierarchical Nearest-Neighbor Gaussian process models for large geostatistical datasets. Journal of the American Statistical Association, doi:10.1080/01621459.2015.1044091.
Finley, A.O., A. Datta, B.D. Cook, D.C. Morton, H.E. Andersen, and S. Banerjee. (2019) Efficient algorithms for Bayesian Nearest Neighbor Gaussian Processes. Journal of Computational and Graphical Statistics, doi:10.1080/10618600.2018.1537924.
rmvn <- function(n, mu=0, V = matrix(1)){ p <- length(mu) if(any(is.na(match(dim(V),p)))) stop("Dimension problem!") D <- chol(V) t(matrix(rnorm(n*p), ncol=p)%*%D + rep(mu,rep(n,p))) } ##Make some data set.seed(1) n <- 100 coords <- cbind(runif(n,0,1), runif(n,0,1)) x <- cbind(1, rnorm(n)) B <- as.matrix(c(1,5)) sigma.sq <- 5 tau.sq <- 1 phi <- 3/0.5 D <- as.matrix(dist(coords)) R <- exp(-phi*D) w <- rmvn(1, rep(0,n), sigma.sq*R) y <- rnorm(n, x%*%B + w, sqrt(tau.sq)) ##Fit a Response and Latent NNGP model n.samples <- 500 starting <- list("phi"=phi, "sigma.sq"=5, "tau.sq"=1) tuning <- list("phi"=0.5, "sigma.sq"=0.5, "tau.sq"=0.5) priors <- list("phi.Unif"=c(3/1, 3/0.01), "sigma.sq.IG"=c(2, 5), "tau.sq.IG"=c(2, 1)) cov.model <- "exponential" m.s <- spNNGP(y~x-1, coords=coords, starting=starting, method="latent", n.neighbors=10, tuning=tuning, priors=priors, cov.model=cov.model, n.samples=n.samples, n.omp.threads=1) summary(m.s) plot(apply(m.s$p.w.samples, 1, median), w) m.r <- spNNGP(y~x-1, coords=coords, starting=starting, method="response", n.neighbors=10, tuning=tuning, priors=priors, cov.model=cov.model, n.samples=n.samples, n.omp.threads=1) summary(m.r) ##Fit with some user defined neighbor ordering ##ord <- order(coords[,2]) ##y-axis ord <- order(coords[,1]+coords[,2]) ##x+y-axis ##ord <- sample(1:n, n) ##random m.r.xy <- spNNGP(y~x-1, coords=coords, starting=starting, method="response", n.neighbors=10, tuning=tuning, priors=priors, cov.model=cov.model, ord=ord, return.neighbor.info=TRUE, n.samples=n.samples, n.omp.threads=1) summary(m.r.xy) ## Not run: ##Visualize the neighbor sets and ordering constraint n.indx <- m.r.xy$neighbor.info$n.indx ord <- m.r.xy$neighbor.info$ord ##This is how the data are ordered internally for model fitting coords.ord <- coords[ord,] for(i in 1:n){ plot(coords.ord, cex=1, xlab="Easting", ylab="Northing") points(coords.ord[i,,drop=FALSE], col="blue", pch=19, cex=1) points(coords.ord[n.indx[[i]],,drop=FALSE], col="red", pch=19, cex=1) readline(prompt = "Pause. Press <Enter> to continue...") } ## End(Not run)
rmvn <- function(n, mu=0, V = matrix(1)){ p <- length(mu) if(any(is.na(match(dim(V),p)))) stop("Dimension problem!") D <- chol(V) t(matrix(rnorm(n*p), ncol=p)%*%D + rep(mu,rep(n,p))) } ##Make some data set.seed(1) n <- 100 coords <- cbind(runif(n,0,1), runif(n,0,1)) x <- cbind(1, rnorm(n)) B <- as.matrix(c(1,5)) sigma.sq <- 5 tau.sq <- 1 phi <- 3/0.5 D <- as.matrix(dist(coords)) R <- exp(-phi*D) w <- rmvn(1, rep(0,n), sigma.sq*R) y <- rnorm(n, x%*%B + w, sqrt(tau.sq)) ##Fit a Response and Latent NNGP model n.samples <- 500 starting <- list("phi"=phi, "sigma.sq"=5, "tau.sq"=1) tuning <- list("phi"=0.5, "sigma.sq"=0.5, "tau.sq"=0.5) priors <- list("phi.Unif"=c(3/1, 3/0.01), "sigma.sq.IG"=c(2, 5), "tau.sq.IG"=c(2, 1)) cov.model <- "exponential" m.s <- spNNGP(y~x-1, coords=coords, starting=starting, method="latent", n.neighbors=10, tuning=tuning, priors=priors, cov.model=cov.model, n.samples=n.samples, n.omp.threads=1) summary(m.s) plot(apply(m.s$p.w.samples, 1, median), w) m.r <- spNNGP(y~x-1, coords=coords, starting=starting, method="response", n.neighbors=10, tuning=tuning, priors=priors, cov.model=cov.model, n.samples=n.samples, n.omp.threads=1) summary(m.r) ##Fit with some user defined neighbor ordering ##ord <- order(coords[,2]) ##y-axis ord <- order(coords[,1]+coords[,2]) ##x+y-axis ##ord <- sample(1:n, n) ##random m.r.xy <- spNNGP(y~x-1, coords=coords, starting=starting, method="response", n.neighbors=10, tuning=tuning, priors=priors, cov.model=cov.model, ord=ord, return.neighbor.info=TRUE, n.samples=n.samples, n.omp.threads=1) summary(m.r.xy) ## Not run: ##Visualize the neighbor sets and ordering constraint n.indx <- m.r.xy$neighbor.info$n.indx ord <- m.r.xy$neighbor.info$ord ##This is how the data are ordered internally for model fitting coords.ord <- coords[ord,] for(i in 1:n){ plot(coords.ord, cex=1, xlab="Easting", ylab="Northing") points(coords.ord[i,,drop=FALSE], col="blue", pch=19, cex=1) points(coords.ord[n.indx[[i]],,drop=FALSE], col="red", pch=19, cex=1) readline(prompt = "Pause. Press <Enter> to continue...") } ## End(Not run)
Methods for extracting information from fitted NNGP model of class
NNGP
and predict.NNGP
objects from predict
.
## S3 method for class 'NNGP' summary(object, sub.sample, quantiles = c(0.025, 0.25, 0.5, 0.75, 0.975), digits = max(3L, getOption("digits") - 3L), ...) ## S3 method for class 'NNGP' print(x, ...) ## S3 method for class 'predict.NNGP' print(x, ...)
## S3 method for class 'NNGP' summary(object, sub.sample, quantiles = c(0.025, 0.25, 0.5, 0.75, 0.975), digits = max(3L, getOption("digits") - 3L), ...) ## S3 method for class 'NNGP' print(x, ...) ## S3 method for class 'predict.NNGP' print(x, ...)
object , x
|
object of class |
sub.sample |
an optional list that specifies the samples to included in
the summary or composition sampling. Valid tags are |
quantiles |
for |
digits |
for |
... |
currently not used. |
A set of standard extractor functions for fitted model objects of
class NNGP
and prediction object of class predict.NNGP
, including methods to the generic functions print
and summary
.
Methods for extracting information from fitted PGLogit
model.
## S3 method for class 'PGLogit' summary(object, sub.sample, quantiles =c(0.025, 0.25, 0.5, 0.75, 0.975), digits = max(3L, getOption("digits") - 3L), ...) ## S3 method for class 'PGLogit' print(x, ...)
## S3 method for class 'PGLogit' summary(object, sub.sample, quantiles =c(0.025, 0.25, 0.5, 0.75, 0.975), digits = max(3L, getOption("digits") - 3L), ...) ## S3 method for class 'PGLogit' print(x, ...)
object , x
|
object of class |
sub.sample |
an optional list that specifies the samples to included in
the summary or composition sampling. Valid tags are |
quantiles |
for |
digits |
for |
... |
currently not used. |
A set of standard extractor functions for fitted model objects of
class PGLogit
, including methods to the generic functions print
and summary
.