Title: | Single-Species, Multi-Species, and Integrated Spatial Occupancy Models |
---|---|
Description: | Fits single-species, multi-species, and integrated non-spatial and spatial occupancy models using Markov Chain Monte Carlo (MCMC). Models are fit using Polya-Gamma data augmentation detailed in Polson, Scott, and Windle (2013) <doi:10.1080/01621459.2013.829001>. Spatial models are fit using either Gaussian processes or Nearest Neighbor Gaussian Processes (NNGP) for large spatial datasets. Details on NNGP models are given in Datta, Banerjee, Finley, and Gelfand (2016) <doi:10.1080/01621459.2015.1044091> and Finley, Datta, and Banerjee (2022) <doi:10.18637/jss.v103.i05>. Provides functionality for data integration of multiple single-species occupancy data sets using a joint likelihood framework. Details on data integration are given in Miller, Pacifici, Sanderlin, and Reich (2019) <doi:10.1111/2041-210X.13110>. Details on single-species and multi-species models are found in MacKenzie, Nichols, Lachman, Droege, Royle, and Langtimm (2002) <doi:10.1890/0012-9658(2002)083[2248:ESORWD]2.0.CO;2> and Dorazio and Royle (2005) <doi:10.1198/016214505000000015>, respectively. |
Authors: | Jeffrey Doser [aut, cre], Andrew Finley [aut], Marc Kery [ctb] |
Maintainer: | Jeffrey Doser <[email protected]> |
License: | GPL (>= 3) |
Version: | 0.8.0 |
Built: | 2024-12-14 12:49:23 UTC |
Source: | CRAN |
Fits single-species, multi-species, and integrated non-spatial and spatial
occupancy models using Markov Chain Monte Carlo (MCMC). Models are fit using
Polya-Gamma data augmentation detailed in Polson, Scott, and Windle (2013).
Spatial models are fit using either Gaussian processes or Nearest Neighbor
Gaussian Processes (NNGP) for large spatial datasets. Details on NNGPs are
given in Datta, Banerjee, Finley, and Gelfand (2016). Provides functionality
for data integration of multiple occupancy data sets using a
joint likelihood framework. Details on data integration are given in
Miller, Pacifici, Sanderlin, and Reich (2019). Details on single-species and
multi-species models are found in MacKenzie et al. (2002) and Dorazio and Royle (2005),
respectively. Details on the package functionality are given in Doser et al. (2022),
Doser, Finley, and Banerjee (2023), and Doser et al. (2024a,b).
See citation('spOccupancy') for how to cite spOccupancy in publications.
Single-species models
PGOcc fits single-species occupancy models.
spPGOcc fits single-species spatial occupancy models.
intPGOcc fits single-species integrated occupancy models (i.e., an occupancy model with multiple data sources).
spIntPGOcc fits single-species integrated spatial occupancy models.
tPGOcc fits a multi-season single-species occupancy model.
stPGOcc fits a multi-season single-species spatial occupancy model.
svcPGBinom fits a single-species spatially-varying coefficient GLM.
svcPGOcc fits a single-species spatially-varying coefficient occupancy model.
svcTPGBinom fits a single-species spatially-varying coefficient multi-season GLM.
svcTPGOcc fits a single-species spatially-varying coefficient multi-season occupancy model.
Multi-species models
msPGOcc fits multi-species occupancy models.
spMsPGOcc fits multi-species spatial occupancy models.
lfJSDM fits a joint species distribution model without imperfect detection.
sfJSDM fits a spatial joint species distribution model without imperfect detection.
lfMsPGOcc fits a joint species distribution model with imperfect detection (i.e., a multi-species occupancy model with residual species correlations).
sfMsPGOcc fits a spatial joint species distribution model with imperfect detection.
svcMsPGOcc fits a multi-species spatially-varying coefficient occupancy model.
tMsPGOcc fits a multi-season multi-species occupancy model.
stMsPGOcc fits a multi-season multi-species spatial occupancy model.
svcTMsPGOcc fits a multi-season multi-species spatially-varying coefficient occupancy model.
Goodness of Fit and Model Assessment Functions
ppcOcc performs posterior predictive checks.
waicOcc computes the Widely Applicable Information Criterion for spOccupancy model objects.
Data Simulation Functions
simOcc simulates single-species occupancy data.
simTOcc simulates single-species multi-season occupancy data.
simBinom simulates detection-nondetection data with perfect detection.
simTBinom simulates multi-season detection-nondetection data with perfect detection.
simMsOcc simulates multi-species occupancy data.
simIntOcc simulates single-species occupancy data from multiple data sources.
simTMsOcc simulates multi-species multi-season occupancy data.
Miscellaneous
postHocLM fits post-hoc linear (mixed) models.
getSVCSamples extracts spatially varying coefficient MCMC samples.
updateMCMC updates an spOccupancy or spAbundance model object with more MCMC iterations.
All objects returned by the model-fitting functions are supported by the summary
function for displaying a concise summary of model results, the fitted
function for extracting model fitted values, and the predict
function for predicting occupancy and/or detection across an area of interest.
Jeffrey W. Doser, Andrew O. Finley, Marc Kery
Doser, J. W., Finley, A. O., Kery, M., & Zipkin, E. F. (2022). spOccupancy: An R package for single-species, multi-species, and integrated spatial occupancy models. Methods in Ecology and Evolution, 13, 1670-1678. doi:10.1111/2041-210X.13897.
Doser, J. W., Finley, A. O., & Banerjee, S. (2023). Joint species distribution models with imperfect detection for high-dimensional spatial data. Ecology, 104(9), e4137. doi:10.1002/ecy.4137.
Doser, J. W., Finley, A. O., Saunders, S. P., Kery, M., Weed, A. S., & Zipkin, E. F. (2024a). Modeling complex species-environment relationships through spatially-varying coefficient occupancy models. Journal of Agricultural, Biological and Environmental Statistics. doi:10.1007/s13253-023-00595-6.
Doser, J. W., Kery, M., Saunders, S. P., Finley, A. O., Bateman, B. L., Grand, J., Reault, S., Weed, A. S., & Zipkin, E. F. (2024b). Guidelines for the use of spatially varying coefficients in species distribution models. Global Ecology and Biogeography, 33(4), e13814. doi:10.1111/geb.13814.
Method for extracting model fitted values and detection probability values from a fitted single-species integrated occupancy (intPGOcc) model.
## S3 method for class 'intPGOcc'
fitted(object, ...)
object |
object of class |
... |
currently no additional arguments |
A method to the generic fitted function to extract fitted values and detection probability values for fitted model objects of class intPGOcc.
A list comprised of:
y.rep.samples |
A list of three-dimensional numeric arrays of fitted values for each individual data source for use in Goodness of Fit assessments. |
p.samples |
A list of three-dimensional numeric arrays of detection probability values. |
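Because the return value for integrated models is a list with one array per data source, summaries typically loop over that list. The sketch below is a minimal illustration that uses simulated stand-in arrays instead of a fitted model. The dimension order within each array (MCMC samples, sites, replicates) and the array sizes are assumptions, since the text above only states that each array is three-dimensional.

```r
set.seed(10)
n.samples <- 50
# Stand-in for fitted(out)$y.rep.samples with two data sources.
# Assumed dimension order: MCMC samples x sites x replicates.
y.rep.samples <- list(
  array(rbinom(n.samples * 6 * 3, 1, 0.3), dim = c(n.samples, 6, 3)),
  array(rbinom(n.samples * 4 * 2, 1, 0.5), dim = c(n.samples, 4, 2))
)
# Total replicated detections per MCMC sample, one column per data source
total.det <- sapply(y.rep.samples, function(a) apply(a, 1, sum, na.rm = TRUE))
dim(total.det)
```

Keeping one column per data source makes it easy to compare each source against its own observed total.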
Method for extracting model fitted values and probability values from a fitted latent factor joint species distribution model (lfJSDM).
## S3 method for class 'lfJSDM'
fitted(object, ...)
object |
object of class |
... |
currently no additional arguments |
A method to the generic fitted function to extract fitted values and probability values for fitted model objects of class lfJSDM.
A list comprised of:
z.samples |
A three-dimensional numeric array of fitted values for use in Goodness of Fit assessments. Array dimensions correspond to MCMC samples, species, and sites. |
psi.samples |
A three-dimensional numeric array of probability values. Array dimensions correspond to MCMC samples, species, and sites. |
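As a rough illustration of how arrays with this layout might be summarized, the sketch below builds a small stand-in array with the documented dimension order (MCMC samples, species, sites) rather than fitting a model. The name psi.samples mirrors the list element above; the sizes and the random values are assumptions made only for the example.

```r
set.seed(1)
n.samples <- 100
n.sp <- 3
J <- 5
# Stand-in for fitted(out)$psi.samples (simulated values, not model output)
psi.samples <- array(runif(n.samples * n.sp * J),
                     dim = c(n.samples, n.sp, J))
# Posterior mean for each species (rows) and site (columns)
psi.mean <- apply(psi.samples, c(2, 3), mean)
# 95% credible interval bounds: a 2 x species x sites array
psi.ci <- apply(psi.samples, c(2, 3), quantile, probs = c(0.025, 0.975))
dim(psi.mean)
dim(psi.ci)
```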
Method for extracting model fitted values and detection probability values from a fitted latent factor multi-species occupancy (lfMsPGOcc) model.
## S3 method for class 'lfMsPGOcc'
fitted(object, ...)
object |
object of class |
... |
currently no additional arguments |
A method to the generic fitted function to extract fitted values and detection probability values for fitted model objects of class lfMsPGOcc.
A list comprised of:
y.rep.samples |
A four-dimensional numeric array of fitted values for use in Goodness of Fit assessments. Array dimensions correspond to MCMC samples, species, sites, and replicates. |
p.samples |
A four-dimensional numeric array of detection probability values. Array dimensions correspond to MCMC samples, species, sites, and replicates. |
Method for extracting model fitted values and detection probability values from a fitted multi-species occupancy (msPGOcc) model.
## S3 method for class 'msPGOcc'
fitted(object, ...)
object |
object of class |
... |
currently no additional arguments |
A method to the generic fitted function to extract fitted values and detection probability values for fitted model objects of class msPGOcc.
A list comprised of:
y.rep.samples |
A four-dimensional numeric array of fitted values for use in Goodness of Fit assessments. Array dimensions correspond to MCMC samples, species, sites, and replicates. |
p.samples |
A four-dimensional numeric array of detection probability values. Array dimensions correspond to MCMC samples, species, sites, and replicates. |
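For multi-species output, a common first step is to collapse the site and replicate dimensions to obtain one detection probability summary per species. The sketch below shows one way this could be done on a simulated stand-in for p.samples. The array sizes are arbitrary, and the missing values inserted for an unsurveyed site-replicate combination are an assumption about how unbalanced designs might appear, not a documented property of the returned array.

```r
set.seed(2)
n.samples <- 50
n.sp <- 4
J <- 6
K <- 3
# Stand-in for fitted(out)$p.samples (samples x species x sites x replicates)
p.samples <- array(runif(n.samples * n.sp * J * K),
                   dim = c(n.samples, n.sp, J, K))
# Assumption for illustration: site 1 was not surveyed on replicate 3
p.samples[, , 1, 3] <- NA
# Average over sites and replicates within each MCMC sample and species
p.by.sample <- apply(p.samples, c(1, 2), mean, na.rm = TRUE)
# Posterior mean and 95% interval per species (one row per species)
p.summary <- t(apply(p.by.sample, 2, function(x) {
  c(mean = mean(x), quantile(x, c(0.025, 0.975)))
}))
p.summary
```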
Method for extracting model fitted values and detection probabilities from a fitted single-species occupancy (PGOcc) model.
## S3 method for class 'PGOcc'
fitted(object, ...)
object |
object of class |
... |
currently no additional arguments |
A method to the generic fitted function to extract fitted values and detection probabilities for fitted model objects of class PGOcc.
A list comprised of:
y.rep.samples |
A three-dimensional numeric array of fitted values for use in Goodness of Fit assessments. Array dimensions correspond to MCMC samples, sites, and replicates. |
p.samples |
A three-dimensional numeric array of detection probability values. Array dimensions correspond to MCMC samples, sites, and replicates. |
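To make the "Goodness of Fit" use of y.rep.samples concrete, the sketch below computes a simple Bayesian p-value by hand. It is only an illustration: the observed data y and the replicated array are simulated stand-ins, and the discrepancy (total number of detections) was chosen for brevity. ppcOcc is the package function listed for posterior predictive checks, and this sketch is not meant to reproduce what it does.

```r
set.seed(3)
n.samples <- 200
J <- 10
K <- 3
# Stand-in observed detection-nondetection data (sites x replicates)
y <- matrix(rbinom(J * K, 1, 0.4), J, K)
# Stand-in for fitted(out)$y.rep.samples (samples x sites x replicates)
y.rep.samples <- array(rbinom(n.samples * J * K, 1, 0.4),
                       dim = c(n.samples, J, K))
# Discrepancy: total number of detections in observed and replicated data
T.obs <- sum(y, na.rm = TRUE)
T.rep <- apply(y.rep.samples, 1, sum, na.rm = TRUE)
# Proportion of replicated data sets with at least as many detections
bayes.p <- mean(T.rep >= T.obs)
bayes.p
```

Values of bayes.p near 0 or 1 would suggest the replicated data differ systematically from the observed data under this particular discrepancy.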
Method for extracting model fitted values and probability values from a fitted spatial factor joint species distribution model (sfJSDM).
## S3 method for class 'sfJSDM'
fitted(object, ...)
object |
object of class |
... |
currently no additional arguments |
A method to the generic fitted function to extract fitted values and probability values for fitted model objects of class sfJSDM.
A list comprised of:
z.samples |
A three-dimensional numeric array of fitted values for use in Goodness of Fit assessments. Array dimensions correspond to MCMC samples, species, and sites. |
psi.samples |
A three-dimensional numeric array of probability values. Array dimensions correspond to MCMC samples, species, and sites. |
Method for extracting model fitted values and detection probability values from a fitted spatial factor multi-species occupancy (sfMsPGOcc) model.
## S3 method for class 'sfMsPGOcc'
fitted(object, ...)
object |
object of class |
... |
currently no additional arguments |
A method to the generic fitted function to extract fitted values and detection probability values for fitted model objects of class sfMsPGOcc.
A list comprised of:
y.rep.samples |
A four-dimensional numeric array of fitted values for use in Goodness of Fit assessments. Array dimensions correspond to MCMC samples, species, sites, and replicates. |
p.samples |
A four-dimensional numeric array of detection probability values. Array dimensions correspond to MCMC samples, species, sites, and replicates. |
Method for extracting model fitted values and detection probability values from a fitted single-species integrated spatial occupancy (spIntPGOcc) model.
## S3 method for class 'spIntPGOcc'
fitted(object, ...)
object |
object of class |
... |
currently no additional arguments |
A method to the generic fitted function to extract fitted values and detection probability values for fitted model objects of class spIntPGOcc.
A list comprised of:
y.rep.samples |
A list of three-dimensional numeric arrays of fitted values for each individual data source for use in Goodness of Fit assessments. |
p.samples |
A list of three-dimensional numeric arrays of detection probability values. |
Method for extracting model fitted values and detection probability values from a fitted multi-species spatial occupancy (spMsPGOcc) model.
## S3 method for class 'spMsPGOcc'
fitted(object, ...)
object |
object of class |
... |
currently no additional arguments |
A method to the generic fitted function to extract fitted values and detection probability values for fitted model objects of class spMsPGOcc.
A list comprised of:
y.rep.samples |
A four-dimensional numeric array of fitted values for use in Goodness of Fit assessments. Array dimensions correspond to MCMC samples, species, sites, and replicates. |
p.samples |
A four-dimensional numeric array of detection probability values. Array dimensions correspond to MCMC samples, species, sites, and replicates. |
Method for extracting model fitted values and detection probabilities from a fitted single-species spatial occupancy (spPGOcc) model.
## S3 method for class 'spPGOcc'
fitted(object, ...)
object |
object of class |
... |
currently no additional arguments |
A method to the generic fitted function to extract fitted values and detection probabilities for fitted model objects of class spPGOcc.
A list comprised of:
y.rep.samples |
A three-dimensional numeric array of fitted values for use in Goodness of Fit assessments. Array dimensions correspond to MCMC samples, sites, and replicates. |
p.samples |
A three-dimensional numeric array of detection probability values. Array dimensions correspond to MCMC samples, sites, and replicates. |
Method for extracting model fitted values and detection probabilities from a fitted multi-season single-species spatial integrated occupancy (stIntPGOcc) model.
## S3 method for class 'stIntPGOcc'
fitted(object, ...)
object |
object of class |
... |
currently no additional arguments |
A method to the generic fitted function to extract fitted values and detection probabilities for fitted model objects of class stIntPGOcc.
A list comprised of:
y.rep.samples |
a list of four-dimensional numeric arrays of fitted values for each data set for use in Goodness of Fit assessments. Array dimensions correspond to MCMC samples, sites, primary time periods, and replicates. |
p.samples |
a list of four-dimensional numeric arrays of detection probability values for each data set. Array dimensions correspond to MCMC samples, sites, primary time periods, and replicates. |
Method for extracting model fitted values and detection probability values from a fitted multi-species multi-season spatial occupancy (stMsPGOcc) model.
## S3 method for class 'stMsPGOcc'
fitted(object, ...)
object |
object of class |
... |
currently no additional arguments |
A method to the generic fitted function to extract fitted values and detection probability values for fitted model objects of class stMsPGOcc.
A list comprised of:
y.rep.samples |
A five-dimensional numeric array of fitted values for use in Goodness of Fit assessments. Array dimensions correspond to MCMC samples, species, sites, primary time period, and replicates. |
p.samples |
A five-dimensional numeric array of detection probability values. Array dimensions correspond to MCMC samples, species, sites, primary time period, and replicates. |
Method for extracting model fitted values and detection probabilities from a fitted multi-season single-species spatial occupancy (stPGOcc) model.
## S3 method for class 'stPGOcc'
fitted(object, ...)
object |
object of class |
... |
currently no additional arguments |
A method to the generic fitted function to extract fitted values and detection probabilities for fitted model objects of class stPGOcc.
A list comprised of:
y.rep.samples |
A four-dimensional numeric array of fitted values for use in Goodness of Fit assessments. Array dimensions correspond to MCMC samples, sites, primary time periods, and replicates. |
p.samples |
A four-dimensional numeric array of detection probability values. Array dimensions correspond to MCMC samples, sites, primary time periods, and replicates. |
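With multi-season output, the extra primary time period dimension allows summaries by year. The sketch below uses a simulated stand-in for y.rep.samples with the documented dimension order (MCMC samples, sites, primary time periods, replicates) and computes, for each sample and period, the proportion of sites with at least one replicated detection. The sizes and the detection rate are arbitrary assumptions, and the stand-in contains no missing values.

```r
set.seed(4)
n.samples <- 40
J <- 5
n.years <- 3
K <- 2
# Stand-in for fitted(out)$y.rep.samples
# (samples x sites x primary time periods x replicates)
y.rep.samples <- array(rbinom(n.samples * J * n.years * K, 1, 0.3),
                       dim = c(n.samples, J, n.years, K))
# 1 if a site has at least one replicated detection in a primary period.
# Note: max() over an all-NA slice would return -Inf, so unsurveyed
# site-period combinations would need separate handling.
any.det <- apply(y.rep.samples, c(1, 2, 3), max)
# Proportion of sites with a detection, per MCMC sample and primary period
naive.occ <- apply(any.det, c(1, 3), mean)
# Average across MCMC samples for each primary period
colMeans(naive.occ)
```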
Method for extracting model fitted values and detection probability values from a fitted multi-species spatially varying coefficient occupancy (svcMsPGOcc) model.
## S3 method for class 'svcMsPGOcc'
fitted(object, ...)
object |
object of class |
... |
currently no additional arguments |
A method to the generic fitted function to extract fitted values and detection probability values for fitted model objects of class svcMsPGOcc.
A list comprised of:
y.rep.samples |
A four-dimensional numeric array of fitted values for use in Goodness of Fit assessments. Array dimensions correspond to MCMC samples, species, sites, and replicates. |
p.samples |
A four-dimensional numeric array of detection probability values. Array dimensions correspond to MCMC samples, species, sites, and replicates. |
Method for extracting model fitted values from a fitted single-species spatially-varying coefficients binomial model (svcPGBinom).
## S3 method for class 'svcPGBinom'
fitted(object, ...)
object |
object of class |
... |
currently no additional arguments |
A method to the generic fitted function to extract fitted values for fitted model objects of class svcPGBinom.
A two-dimensional matrix of fitted values for use in Goodness of Fit assessments. Dimensions correspond to MCMC samples and sites.
Method for extracting model fitted values and detection probabilities from a fitted single-species spatially-varying coefficients occupancy (svcPGOcc) model.
## S3 method for class 'svcPGOcc'
fitted(object, ...)
object |
object of class |
... |
currently no additional arguments |
A method to the generic fitted function to extract fitted values and detection probabilities for fitted model objects of class svcPGOcc.
A list comprised of:
y.rep.samples |
A three-dimensional numeric array of fitted values for use in Goodness of Fit assessments. Array dimensions correspond to MCMC samples, sites, and replicates. |
p.samples |
A three-dimensional numeric array of detection probability values. Array dimensions correspond to MCMC samples, sites, and replicates. |
Method for extracting model fitted values and detection probabilities from a fitted multi-season single-species spatially-varying coefficient integrated occupancy (svcTIntPGOcc) model.
## S3 method for class 'svcTIntPGOcc'
fitted(object, ...)
object |
object of class |
... |
currently no additional arguments |
A method to the generic fitted function to extract fitted values and detection probabilities for fitted model objects of class svcTIntPGOcc.
A list comprised of:
y.rep.samples |
a list of four-dimensional numeric arrays of fitted values for each data set for use in Goodness of Fit assessments. Array dimensions correspond to MCMC samples, sites, primary time periods, and replicates. |
p.samples |
a list of four-dimensional numeric arrays of detection probability values for each data set. Array dimensions correspond to MCMC samples, sites, primary time periods, and replicates. |
Method for extracting model fitted values and detection probability values from a fitted multi-species multi-season spatially varying coefficient occupancy (svcTMsPGOcc) model.
## S3 method for class 'svcTMsPGOcc'
fitted(object, ...)
object |
object of class |
... |
currently no additional arguments |
A method to the generic fitted function to extract fitted values and detection probability values for fitted model objects of class svcTMsPGOcc.
A list comprised of:
y.rep.samples |
A five-dimensional numeric array of fitted values for use in Goodness of Fit assessments. Array dimensions correspond to MCMC samples, species, sites, primary time period, and replicates. |
p.samples |
A five-dimensional numeric array of detection probability values. Array dimensions correspond to MCMC samples, species, sites, primary time period, and replicates. |
Method for extracting model fitted values from a fitted multi-season single-species spatially-varying coefficients binomial model (svcTPGBinom).
## S3 method for class 'svcTPGBinom'
fitted(object, ...)
object |
object of class |
... |
currently no additional arguments |
A method to the generic fitted function to extract fitted values for fitted model objects of class svcTPGBinom.
A three-dimensional array of fitted values for use in Goodness of Fit assessments. Dimensions correspond to MCMC samples, sites, and primary time periods.
Method for extracting model fitted values and detection probabilities from a fitted multi-season single-species spatially-varying coefficients occupancy (svcTPGOcc) model.
## S3 method for class 'svcTPGOcc'
fitted(object, ...)
object |
object of class |
... |
currently no additional arguments |
A method to the generic fitted function to extract fitted values and detection probabilities for fitted model objects of class svcTPGOcc.
A list comprised of:
y.rep.samples |
A four-dimensional numeric array of fitted values for use in Goodness of Fit assessments. Array dimensions correspond to MCMC samples, sites, primary time periods, and replicates. |
p.samples |
A four-dimensional numeric array of detection probability values. Array dimensions correspond to MCMC samples, sites, primary time periods, and replicates. |
Method for extracting model fitted values and detection probabilities from a fitted multi-season single-species integrated occupancy (tIntPGOcc) model.
## S3 method for class 'tIntPGOcc'
fitted(object, ...)
object |
object of class |
... |
currently no additional arguments |
A method to the generic fitted function to extract fitted values and detection probabilities for fitted model objects of class tIntPGOcc.
A list comprised of:
y.rep.samples |
a list of four-dimensional numeric arrays of fitted values for each data set for use in Goodness of Fit assessments. Array dimensions correspond to MCMC samples, sites, primary time periods, and replicates. |
p.samples |
a list of four-dimensional numeric arrays of detection probability values for each data set. Array dimensions correspond to MCMC samples, sites, primary time periods, and replicates. |
Method for extracting model fitted values and detection probability values from a fitted multi-species multi-season occupancy (tMsPGOcc) model.
## S3 method for class 'tMsPGOcc'
fitted(object, ...)
object |
object of class |
... |
currently no additional arguments |
A method to the generic fitted function to extract fitted values and detection probability values for fitted model objects of class tMsPGOcc.
A list comprised of:
y.rep.samples |
A five-dimensional numeric array of fitted values for use in Goodness of Fit assessments. Array dimensions correspond to MCMC samples, species, sites, primary time period, and replicates. |
p.samples |
A five-dimensional numeric array of detection probability values. Array dimensions correspond to MCMC samples, species, sites, primary time period, and replicates. |
Method for extracting model fitted values and detection probabilities from a fitted multi-season single-species occupancy (tPGOcc) model.
## S3 method for class 'tPGOcc'
fitted(object, ...)
object |
object of class |
... |
currently no additional arguments |
A method to the generic fitted function to extract fitted values and detection probabilities for fitted model objects of class tPGOcc.
A list comprised of:
y.rep.samples |
A four-dimensional numeric array of fitted values for use in Goodness of Fit assessments. Array dimensions correspond to MCMC samples, sites, primary time periods, and replicates. |
p.samples |
A four-dimensional numeric array of detection probability values. Array dimensions correspond to MCMC samples, sites, primary time periods, and replicates. |
Function for extracting the full spatially-varying coefficient MCMC samples from an spOccupancy model object.
getSVCSamples(object, pred.object, ...)
object |
an object of class |
pred.object |
a prediction object from a spatially-varying coefficient
model fit using spOccupancy. Should be of class |
... |
currently no additional arguments |
A list of coda::mcmc objects of the spatially-varying coefficient MCMC samples
for all spatially-varying coefficients estimated in the model (including the
intercept if specified). Note that these values correspond to the sum of the estimated
spatial and non-spatial effects, which gives the overall effect of the covariate at
each location. Each element of the list is a two-dimensional matrix
whose dimensions correspond to MCMC sample and site. If pred.object is specified,
values are returned for the prediction locations instead of the sampled locations.
For multi-species models, the value of the SVC is returned at all
spatial locations for each species, even when range.ind is specified
in the data list when fitting the model. This may not be desirable for complete
summaries of the SVC for each species, so if you specify range.ind in
the data list, you may want to subsequently restrict the SVC samples for each species
to that species' range.
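The statement that each returned value is the sum of a non-spatial and a spatial effect can be sketched in a few lines of base R. The example below does not call spOccupancy. The objects beta.draws and w.draws are hypothetical stand-ins for posterior draws of a non-spatial coefficient and its site-level spatial effect, and the sizes and distributions are assumptions chosen only to show the arithmetic and a typical per-site summary.

```r
set.seed(5)
n.samples <- 100
J <- 8
# Hypothetical draws of the non-spatial coefficient (one per MCMC sample)
beta.draws <- rnorm(n.samples, mean = 2, sd = 0.3)
# Hypothetical draws of the spatial effect (MCMC samples x sites)
w.draws <- matrix(rnorm(n.samples * J, sd = 0.5), n.samples, J)
# Overall effect at each site: beta.draws is recycled down each column,
# so every site column receives the same sample-specific beta value
svc.draws <- beta.draws + w.draws
# Per-site summaries of a samples x sites matrix like those described above
svc.mean <- colMeans(svc.draws)
svc.prob.pos <- colMeans(svc.draws > 0)
round(cbind(mean = svc.mean, prob.positive = svc.prob.pos), 2)
```

The same column-wise summaries could be applied to each element of the returned list with lapply or sapply.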
Jeffrey W. Doser
set.seed(400)
# Simulate Data -----------------------------------------------------------
J.x <- 8
J.y <- 8
J <- J.x * J.y
n.rep <- sample(2:4, J, replace = TRUE)
beta <- c(0.5, 2)
p.occ <- length(beta)
alpha <- c(0, 1)
p.det <- length(alpha)
phi <- c(3 / .6, 3 / .8)
sigma.sq <- c(1.2, 0.7)
svc.cols <- c(1, 2)
dat <- simOcc(J.x = J.x, J.y = J.y, n.rep = n.rep, beta = beta, alpha = alpha,
              sigma.sq = sigma.sq, phi = phi, sp = TRUE, cov.model = 'exponential',
              svc.cols = svc.cols)
# Detection-nondetection data
y <- dat$y
# Occupancy covariates
X <- dat$X
# Detection covariates
X.p <- dat$X.p
# Spatial coordinates
coords <- dat$coords
# Package all data into a list
occ.covs <- X[, -1, drop = FALSE]
colnames(occ.covs) <- c('occ.cov')
det.covs <- list(det.cov.1 = X.p[, , 2])
data.list <- list(y = y,
                  occ.covs = occ.covs,
                  det.covs = det.covs,
                  coords = coords)
# Number of batches
n.batch <- 10
# Batch length
batch.length <- 25
n.iter <- n.batch * batch.length
# Priors
prior.list <- list(beta.normal = list(mean = 0, var = 2.72),
                   alpha.normal = list(mean = 0, var = 2.72),
                   sigma.sq.ig = list(a = 2, b = 1),
                   phi.unif = list(a = 3/1, b = 3/.1))
# Initial values
inits.list <- list(alpha = 0, beta = 0,
                   phi = 3 / .5,
                   sigma.sq = 2,
                   w = matrix(0, nrow = length(svc.cols), ncol = nrow(X)),
                   z = apply(y, 1, max, na.rm = TRUE))
# Tuning
tuning.list <- list(phi = 1)
out <- svcPGOcc(occ.formula = ~ occ.cov,
                det.formula = ~ det.cov.1,
                data = data.list,
                inits = inits.list,
                n.batch = n.batch,
                batch.length = batch.length,
                accept.rate = 0.43,
                priors = prior.list,
                cov.model = 'exponential',
                svc.cols = c(1, 2),
                tuning = tuning.list,
                n.omp.threads = 1,
                verbose = TRUE,
                NNGP = TRUE,
                n.neighbors = 5,
                search.type = 'cb',
                n.report = 10,
                n.burn = 50,
                n.thin = 1)
svc.samples <- getSVCSamples(out)
str(svc.samples)
Detection-nondetection data of 12 foliage gleaning bird species in 2015 in the Hubbard Brook Experimental Forest (HBEF) in New Hampshire, USA. Data were collected at 373 sites over three replicate point counts each of 10 minutes in length, with a detection radius of 100 m. Some sites were not visited for all three replicates. The 12 species included in the data set are as follows: (1) AMRE: American Redstart; (2) BAWW: Black-and-white Warbler; (3) BHVI: Blue-headed Vireo; (4) BLBW: Blackburnian Warbler; (5) BLPW: Blackpoll Warbler; (6) BTBW: Black-throated Blue Warbler; (7) BTNW: Black-throated Green Warbler; (8) CAWA: Canada Warbler; (9) MAWA: Magnolia Warbler; (10) NAWA: Nashville Warbler; (11) OVEN: Ovenbird; (12) REVI: Red-eyed Vireo.
data(hbef2015)
hbef2015 is a list with four elements:
y: a three-dimensional array of detection-nondetection data with dimensions of species (12), sites (373) and replicates (3).
occ.covs: a numeric matrix with 373 rows and one column consisting of the elevation at each site.
det.covs: a list of two numeric matrices with 373 rows and 3 columns. The first element is the day of year when the survey was conducted for a given site and replicate. The second element is the time of day when the survey was conducted.
coords: a numeric matrix with 373 rows and two columns containing the site coordinates (Easting and Northing) in UTM Zone 19. The proj4string is "+proj=utm +zone=19 +units=m +datum=NAD83".
Rodenhouse, N. and S. Sillett. 2019. Valleywide Bird Survey, Hubbard Brook Experimental Forest, 1999-2016 (ongoing) ver 3. Environmental Data Initiative. doi:10.6073/pasta/faca2b2cf2db9d415c39b695cc7fc217 (Accessed 2021-09-07)
Doser, J. W., Leuenberger, W., Sillett, T. S., Hallworth, M. T. & Zipkin, E. F. (2022). Integrated community occupancy models: A framework to assess occurrence and biodiversity dynamics using multiple data sources. Methods in Ecology and Evolution, 00, 1-14. doi:10.1111/2041-210X.13811
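Since hbef2015 stores all 12 species in one array, fitting a single-species model requires pulling out one slice of y. The sketch below shows how that subsetting might look on a simulated list with the same documented structure. The list hbef.like, the species dimnames on y, and the covariate names are assumptions for illustration and should be checked against the real object with str(hbef2015).

```r
set.seed(6)
J <- 373
K <- 3
# Four-letter codes taken from the species list in the data description
sp.codes <- c("AMRE", "BAWW", "BHVI", "BLBW", "BLPW", "BTBW",
              "BTNW", "CAWA", "MAWA", "NAWA", "OVEN", "REVI")
# Simulated stand-in with the documented element names and dimensions
hbef.like <- list(
  y = array(rbinom(12 * J * K, 1, 0.2), dim = c(12, J, K),
            dimnames = list(sp.codes, NULL, NULL)),  # dimnames are assumed
  occ.covs = matrix(rnorm(J), J, 1),
  det.covs = list(day = matrix(rnorm(J * K), J, K),   # names are assumed
                  tod = matrix(rnorm(J * K), J, K)),
  coords = matrix(runif(J * 2), J, 2)
)
# Keep one species: y becomes a sites x replicates matrix
oven <- hbef.like
oven$y <- hbef.like$y["OVEN", , ]
dim(oven$y)
```

The covariates and coordinates are shared across species, so only y needs to change.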
Elevation in meters extracted at a 30m resolution of the Hubbard Brook Experimental Forest. Data come from the National Elevation Dataset.
data(hbefElev)
hbefElev is a data frame with three columns:
val
: the elevation value in meters.
Easting
: the x coordinate of the point. The proj4string is
"+proj=utm +zone=19 +units=m +datum=NAD83".
Northing
: the y coordinate of the point. The proj4string is
"+proj=utm +zone=19 +units=m +datum=NAD83".
Gesch, D., Oimoen, M., Greenlee, S., Nelson, C., Steuck, M., & Tyler, D. (2002). The national elevation dataset. Photogrammetric engineering and remote sensing, 68(1), 5-32.
Detection-nondetection data of 12 foliage-gleaning bird species in 2010-2018 in the Hubbard Brook Experimental Forest (HBEF) in New Hampshire, USA. Data were collected at 373 sites over three replicate point counts, each 10 minutes in length, with a detection radius of 100 m. Some sites were not visited for all three replicates. The 12 species included in the data set are as follows: (1) AMRE: American Redstart; (2) BAWW: Black-and-white Warbler; (3) BHVI: Blue-headed Vireo; (4) BLBW: Blackburnian Warbler; (5) BLPW: Blackpoll Warbler; (6) BTBW: Black-throated Blue Warbler; (7) BTNW: Black-throated Green Warbler; (8) CAWA: Canada Warbler; (9) MAWA: Magnolia Warbler; (10) NAWA: Nashville Warbler; (11) OVEN: Ovenbird; (12) REVI: Red-eyed Vireo.
data(hbefTrends)
hbefTrends is a list with four elements:
y
: a four-dimensional array of detection-nondetection data with
dimensions of species (12), sites (373), years (9), and replicates (3).
occ.covs
: a list of potential covariates for inclusion in the
occurrence portion of an occupancy model. There are two covariates:
elevation (a site-level covariate) and years (a temporal covariate).
det.covs
: a list of two numeric three-dimensional arrays with
dimensions corresponding to sites (373), years (9), and replicates (3).
The first element is the day of year when the survey was
conducted for a given site, year, and replicate. The second element is the
time of day when the survey was conducted.
coords
: a numeric matrix with 373 rows and two columns containing the
site coordinates (Easting and Northing) in UTM Zone 19. The proj4string is
"+proj=utm +zone=19 +units=m +datum=NAD83".
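The four-dimensional layout described above can be awkward to index at first. The sketch below uses a mock array with the stated dimensions (random placeholder values, not the HBEF data) to compute, for one species, the proportion of sites with at least one detection in each year. The pattern of missing visits is invented for illustration.

```r
set.seed(3)
n.sp <- 12
J <- 373
n.years <- 9
K <- 3

# Mock array: species x sites x years x replicates
y <- array(rbinom(n.sp * J * n.years * K, 1, 0.25),
           dim = c(n.sp, J, n.years, K))
# Pretend 40 sites were never visited for the third replicate
y[, sample(J, 40), , 3] <- NA

# For species 1: any detection at each site in each year,
# ignoring missing replicates (NA only if no replicate was surveyed)
det.any <- apply(y[1, , , ], c(1, 2), function(v) {
  if (all(is.na(v))) NA else max(v, na.rm = TRUE)
})

# Naive (detection-unadjusted) proportion of sites with a detection, by year
naive.by.year <- colMeans(det.any, na.rm = TRUE)
names(naive.by.year) <- 2010:2018
naive.by.year
```

These raw proportions ignore imperfect detection, which is exactly what an occupancy model is meant to correct for, so they are best treated as a quick data check.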
Rodenhouse, N. and S. Sillett. 2019. Valleywide Bird Survey, Hubbard Brook Experimental Forest, 1999-2016 (ongoing) ver 3. Environmental Data Initiative. doi:10.6073/pasta/faca2b2cf2db9d415c39b695cc7fc217 (Accessed 2021-09-07)
Doser, J. W., Leuenberger, W., Sillett, T. S., Hallworth, M. T. & Zipkin, E. F. (2022). Integrated community occupancy models: A framework to assess occurrence and biodiversity dynamics using multiple data sources. Methods in Ecology and Evolution, 00, 1-14. doi:10.1111/2041-210X.13811
Function for fitting integrated multi-species occupancy models using Polya-Gamma latent variables.
intMsPGOcc(occ.formula, det.formula, data, inits, priors, n.samples, n.omp.threads = 1, verbose = TRUE, n.report = 100, n.burn = round(.10 * n.samples), n.thin = 1, n.chains = 1, ...)
occ.formula |
a symbolic description of the model to be fit for the occurrence portion of the model using R's model syntax. Only right-hand side of formula is specified. See example below. Random intercepts are allowed using lme4 syntax (Bates et al. 2015). |
det.formula |
a list of symbolic descriptions of the models to be fit for the detection portion of the model using R's model syntax for each data set. Each element in the list is a formula for the detection model of a given data set. Only right-hand side of formula is specified. Random effects are not currently supported. See example below. |
data |
a list containing data necessary for model fitting.
Valid tags are |
inits |
a list with each tag corresponding to a parameter name.
Valid tags are |
priors |
a list with each tag corresponding to a parameter name.
Valid tags are |
n.samples |
the number of posterior samples to collect in each chain. |
n.omp.threads |
a positive integer indicating the number of threads
to use for SMP parallel processing within chains. This will have no impact on
model run times for non-spatial models. The package must be compiled for
OpenMP support. For most Intel-based machines, we recommend setting
|
verbose |
if |
n.report |
the interval to report MCMC progress. |
n.burn |
the number of samples out of the total |
n.thin |
the thinning interval for collection of MCMC samples. The
thinning occurs after the |
n.chains |
the number of chains to run in sequence. |
... |
currently no additional arguments |
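As a rough guide to how n.samples, n.burn, n.thin, and n.chains interact, the sketch below counts the draws that remain after burn-in and thinning. The helper `n.post.kept()` is hypothetical (it is not part of the package) and assumes that every n.thin-th draw after burn-in is kept in each chain and that chains are pooled afterwards.

```r
# Hypothetical helper: number of posterior draws retained across all chains
n.post.kept <- function(n.samples, n.burn = round(0.10 * n.samples),
                        n.thin = 1, n.chains = 1) {
  stopifnot(n.burn < n.samples, n.thin >= 1, n.chains >= 1)
  # Indices of the draws kept within a single chain
  kept.per.chain <- length(seq(n.burn + 1, n.samples, by = n.thin))
  kept.per.chain * n.chains
}

# Settings similar to a short test run
small.run <- n.post.kept(n.samples = 100, n.burn = 50, n.thin = 1, n.chains = 1)
# A longer run with thinning and three chains
long.run <- n.post.kept(n.samples = 5000, n.burn = 1000, n.thin = 4, n.chains = 3)
c(small.run, long.run)
```

Working out this number before fitting helps confirm that enough draws will remain for stable posterior summaries.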
An object of class intMsPGOcc that is a list comprised of:
beta.comm.samples |
a |
alpha.comm.samples |
a |
tau.sq.beta.samples |
a |
tau.sq.alpha.samples |
a |
beta.samples |
a |
alpha.samples |
a |
z.samples |
a three-dimensional array of posterior samples for the latent occurrence values for each species. |
psi.samples |
a three-dimensional array of posterior samples for the latent occurrence probability values for each species. |
sigma.sq.psi.samples |
a |
beta.star.samples |
a |
like.samples |
a three-dimensional array of posterior samples for the likelihood value associated with each site and species. Used for calculating WAIC. |
rhat |
a list of Gelman-Rubin diagnostic values for some of the model parameters. |
ESS |
a list of effective sample sizes for some of the model parameters. |
run.time |
MCMC sampler execution time reported using |
The return object will include additional objects used for subsequent prediction and/or model fit evaluation.
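The like.samples element is described above as being used to calculate WAIC. A minimal base R sketch of that calculation follows, using the standard WAIC definition on a mock array. The array ordering (posterior samples x species x sites) and the random values are assumptions for illustration, and the package's own WAIC routine may differ in detail.

```r
set.seed(20)
n.post <- 200
N <- 3   # species
J <- 25  # sites

# Mock likelihood values, assumed ordered as samples x species x sites
like.samples <- array(runif(n.post * N * J, 0.2, 0.9),
                      dim = c(n.post, N, J))

# Log pointwise predictive density: log of the posterior mean likelihood
lppd <- log(apply(like.samples, c(2, 3), mean))
# Effective number of parameters: posterior variance of the log-likelihood
p.waic <- apply(log(like.samples), c(2, 3), var)

# WAIC on the deviance scale (smaller is better)
waic <- -2 * (sum(lppd) - sum(p.waic))
waic
```

Summing over the site dimension only, rather than over both species and sites, would give a separate value for each species.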
Jeffrey W. Doser
Polson, N.G., J.G. Scott, and J. Windle. (2013) Bayesian Inference for Logistic Models Using Polya-Gamma Latent Variables. Journal of the American Statistical Association, 108:1339-1349.
Bates, Douglas, Martin Maechler, Ben Bolker, Steve Walker (2015). Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software, 67(1), 1-48. doi:10.18637/jss.v067.i01.
Dorazio, R. M., and Royle, J. A. (2005). Estimating size and composition of biological communities by modeling the occurrence of species. Journal of the American Statistical Association, 100(470), 389-398.
Doser, J. W., Leuenberger, W., Sillett, T. S., Hallworth, M. T. & Zipkin, E. F. (2022). Integrated community occupancy models: A framework to assess occurrence and biodiversity dynamics using multiple data sources. Methods in Ecology and Evolution, 00, 1-14. doi:10.1111/2041-210X.13811
set.seed(91)
J.x <- 10
J.y <- 10
# Total number of sites across the study region
J.all <- J.x * J.y
# Number of data sources.
n.data <- 2
# Sites for each data source.
J.obs <- sample(ceiling(0.2 * J.all):ceiling(0.5 * J.all), n.data, replace = TRUE)
n.rep <- list()
n.rep[[1]] <- rep(3, J.obs[1])
n.rep[[2]] <- rep(4, J.obs[2])
# Number of species observed in each data source
N <- c(8, 3)
# Community-level covariate effects
# Occurrence
beta.mean <- c(0.2, 0.5)
p.occ <- length(beta.mean)
tau.sq.beta <- c(0.4, 0.3)
# Detection
# Detection covariates
alpha.mean <- list()
tau.sq.alpha <- list()
# Number of detection parameters in each data source
p.det.long <- c(4, 3)
for (i in 1:n.data) {
  alpha.mean[[i]] <- runif(p.det.long[i], -1, 1)
  tau.sq.alpha[[i]] <- runif(p.det.long[i], 0.1, 1)
}
# Random effects
psi.RE <- list()
p.RE <- list()
# Draw species-level effects from community means.
beta <- matrix(NA, nrow = max(N), ncol = p.occ)
for (i in 1:p.occ) {
  beta[, i] <- rnorm(max(N), beta.mean[i], sqrt(tau.sq.beta[i]))
}
alpha <- list()
for (i in 1:n.data) {
  alpha[[i]] <- matrix(NA, nrow = N[i], ncol = p.det.long[i])
  for (t in 1:p.det.long[i]) {
    alpha[[i]][, t] <- rnorm(N[i], alpha.mean[[i]][t], sqrt(tau.sq.alpha[[i]])[t])
  }
}
sp <- FALSE
factor.model <- FALSE
# Simulate occupancy data
dat <- simIntMsOcc(n.data = n.data, J.x = J.x, J.y = J.y,
                   J.obs = J.obs, n.rep = n.rep, N = N, beta = beta,
                   alpha = alpha, psi.RE = psi.RE, p.RE = p.RE, sp = sp,
                   factor.model = factor.model, n.factors = n.factors)
J <- nrow(dat$coords.obs)
y <- dat$y
X <- dat$X.obs
X.p <- dat$X.p
X.re <- dat$X.re.obs
X.p.re <- dat$X.p.re
sites <- dat$sites
species <- dat$species
# Package all data into a list
occ.covs <- cbind(X)
colnames(occ.covs) <- c('int', 'occ.cov.1')
#colnames(occ.covs) <- c('occ.cov')
det.covs <- list()
# Add covariates one by one
det.covs[[1]] <- list(det.cov.1.1 = X.p[[1]][, , 2],
                      det.cov.1.2 = X.p[[1]][, , 3],
                      det.cov.1.3 = X.p[[1]][, , 4])
det.covs[[2]] <- list(det.cov.2.1 = X.p[[2]][, , 2],
                      det.cov.2.2 = X.p[[2]][, , 3])
data.list <- list(y = y,
                  occ.covs = occ.covs,
                  det.covs = det.covs,
                  sites = sites,
                  species = species)
# Take a look at the data.list structure for integrated multi-species
# occupancy models.
# Priors
prior.list <- list(beta.comm.normal = list(mean = 0, var = 2.73),
                   alpha.comm.normal = list(mean = list(0, 0),
                                            var = list(2.72, 2.72)),
                   tau.sq.beta.ig = list(a = 0.1, b = 0.1),
                   tau.sq.alpha.ig = list(a = list(0.1, 0.1),
                                          b = list(0.1, 0.1)))
inits.list <- list(alpha.comm = list(0, 0),
                   beta.comm = 0,
                   tau.sq.beta = 1,
                   tau.sq.alpha = list(1, 1),
                   alpha = list(a = matrix(rnorm(p.det.long[1] * N[1]), N[1], p.det.long[1]),
                                b = matrix(rnorm(p.det.long[2] * N[2]), N[2], p.det.long[2])),
                   beta = 0)
# Fit the model.
# Note that this is just a test case and more iterations/chains may need to
# be run to ensure convergence.
out <- intMsPGOcc(occ.formula = ~ occ.cov.1,
                  det.formula = list(f.1 = ~ det.cov.1.1 + det.cov.1.2 + det.cov.1.3,
                                     f.2 = ~ det.cov.2.1 + det.cov.2.2),
                  inits = inits.list,
                  priors = prior.list,
                  data = data.list,
                  n.samples = 100,
                  n.omp.threads = 1,
                  verbose = TRUE,
                  n.report = 10,
                  n.burn = 50,
                  n.thin = 1,
                  n.chains = 1)
summary(out, level = 'community')
Function for fitting single-species integrated occupancy models using Polya-Gamma latent variables. Data integration is done using a joint likelihood framework, assuming distinct detection models for each data source that are each conditional on a single latent occurrence process.
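The joint likelihood idea described above can be sketched for a single site in base R. The function `site.lik()` is a hypothetical illustration, not package code: it marginalizes over the latent occurrence state, whereas the sampler itself works with the latent state directly. It assumes detections are conditionally independent across replicates and data sources given occurrence, and that there are no false positives.

```r
# Hypothetical helper: marginal likelihood of all detection histories at one site
# psi:    occurrence probability at the site
# y.list: one 0/1 detection vector per data source (NA = replicate not surveyed)
# p.list: matching vectors of detection probabilities
site.lik <- function(psi, y.list, p.list) {
  # Probability of the observed histories if the site is occupied,
  # multiplied across data sources
  lik.occ <- prod(mapply(function(y, p) {
    keep <- !is.na(y)
    prod(dbinom(y[keep], size = 1, prob = p[keep]))
  }, y.list, p.list))
  # If the site is unoccupied, only all-zero histories are possible
  all.zero <- all(unlist(y.list) == 0, na.rm = TRUE)
  psi * lik.occ + (1 - psi) * as.numeric(all.zero)
}

# Two data sources sharing one occurrence probability, different detection models
p.list <- list(rep(0.5, 3), rep(0.2, 2))
lik.detected <- site.lik(0.6, list(c(0, 1, 0), c(0, 0)), p.list)
lik.never <- site.lik(0.6, list(c(0, 0, 0), c(0, 0)), p.list)
c(lik.detected, lik.never)
```

A detection in any one data source rules out the unoccupied state for the site, which is how the data sources inform each other through the shared occurrence process.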
intPGOcc(occ.formula, det.formula, data, inits, priors, n.samples, n.omp.threads = 1, verbose = TRUE, n.report = 1000, n.burn = round(.10 * n.samples), n.thin = 1, n.chains = 1, k.fold, k.fold.threads = 1, k.fold.seed, k.fold.data, k.fold.only = FALSE, ...)
occ.formula |
a symbolic description of the model to be fit for the occurrence portion of the model using R's model syntax. Only right-hand side of formula is specified. See example below. Random intercepts are allowed using lme4 syntax (Bates et al. 2015). |
det.formula |
a list of symbolic descriptions of the models to be fit for the detection portion of the model using R's model syntax for each data set. Each element in the list is a formula for the detection model of a given data set. Only right-hand side of formula is specified. See example below. Random intercepts are allowed using lme4 syntax (Bates et al. 2015). |
data |
a list containing data necessary for model fitting.
Valid tags are |
inits |
a list with each tag corresponding to a parameter name.
Valid tags are |
priors |
a list with each tag corresponding to a parameter name.
Valid tags are |
n.samples |
the number of posterior samples to collect in each chain. |
n.omp.threads |
a positive integer indicating the number of threads
to use for SMP parallel processing within chains. This will have no
impact on model run times for non-spatial models. The package must be compiled for
OpenMP support. For most Intel-based machines, we recommend setting
|
verbose |
if |
n.report |
the interval to report MCMC progress. |
n.burn |
the number of samples out of the total |
n.thin |
the thinning interval for collection of MCMC samples. The
thinning occurs after the |
n.chains |
the number of chains to run. |
k.fold |
specifies the number of k folds for cross-validation.
If not specified as an argument, then cross-validation is not performed
and |
k.fold.threads |
number of threads to use for cross-validation. If
|
k.fold.seed |
seed used to split data set into |
k.fold.data |
an integer specifying the specific data set to hold out values from. If not specified, data from all data set locations will be incorporated into the k-fold cross-validation. |
k.fold.only |
a logical value indicating whether to only perform
cross-validation ( |
... |
currently no additional arguments |
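To illustrate what splitting a data set into k folds with a fixed seed involves, the sketch below deals shuffled site indices into roughly equal folds. The helper `k.fold.split()` and its seed value are hypothetical. The package's internal splitting may differ, particularly in how it handles a single held-out data source.

```r
# Hypothetical helper: assign J sites to k.fold hold-out sets reproducibly
k.fold.split <- function(J, k.fold, seed = 100) {
  set.seed(seed)
  # Shuffle the site indices, then deal them into k roughly equal folds
  shuffled <- sample(seq_len(J))
  split(shuffled, rep(seq_len(k.fold), length.out = J))
}

folds <- k.fold.split(J = 50, k.fold = 4)
# Each site appears in exactly one fold
lengths(folds)

# Sites used for fitting when fold 1 is held out
train.sites <- setdiff(seq_len(50), folds[[1]])
```

Fixing the seed makes the fold assignment reproducible, so candidate models can be compared on the same hold-out sets.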
An object of class intPGOcc that is a list comprised of:
beta.samples |
a |
alpha.samples |
a |
z.samples |
a |
psi.samples |
a |
rhat |
a list of Gelman-Rubin diagnostic values for some of the model parameters. |
ESS |
a list of effective sample sizes for some of the model parameters. |
run.time |
execution time reported using |
k.fold.deviance |
scoring rule (deviance) from k-fold cross-validation. A
separate deviance value is returned for each data source. Only included if
|
The return object will include additional objects used for
subsequent prediction and/or model fit evaluation. Note that detection
probability estimated values are not included in the model object, but can be
extracted using fitted().
Some of the underlying code used for generating random numbers from the Polya-Gamma distribution is taken from the pgdraw package written by Daniel F. Schmidt and Enes Makalic. Their code implements Algorithm 6 in the PhD thesis of Jesse Bennett Windle (2013) https://repositories.lib.utexas.edu/handle/2152/21842.
Jeffrey W. Doser,
Andrew O. Finley
Polson, N.G., J.G. Scott, and J. Windle. (2013) Bayesian Inference for Logistic Models Using Polya-Gamma Latent Variables. Journal of the American Statistical Association, 108:1339-1349.
Hooten, M. B., and Hobbs, N. T. (2015). A guide to Bayesian model selection for ecologists. Ecological monographs, 85(1), 3-28.
Finley, A. O., Datta, A., and Banerjee, S. (2020). spNNGP R package for nearest neighbor Gaussian process models. arXiv preprint arXiv:2001.09111.
set.seed(1008)
# Simulate Data -----------------------------------------------------------
J.x <- 15
J.y <- 15
J.all <- J.x * J.y
# Number of data sources.
n.data <- 4
# Sites for each data source.
J.obs <- sample(ceiling(0.2 * J.all):ceiling(0.5 * J.all), n.data, replace = TRUE)
# Replicates for each data source.
n.rep <- list()
for (i in 1:n.data) {
  n.rep[[i]] <- sample(1:4, size = J.obs[i], replace = TRUE)
}
# Occupancy covariates
beta <- c(0.5, 1)
p.occ <- length(beta)
# Detection covariates
alpha <- list()
for (i in 1:n.data) {
  alpha[[i]] <- runif(2, -1, 1)
}
p.det.long <- sapply(alpha, length)
p.det <- sum(p.det.long)
# Simulate occupancy data.
dat <- simIntOcc(n.data = n.data, J.x = J.x, J.y = J.y, J.obs = J.obs,
                 n.rep = n.rep, beta = beta, alpha = alpha, sp = FALSE)
y <- dat$y
X <- dat$X.obs
X.p <- dat$X.p
sites <- dat$sites
# Package all data into a list
occ.covs <- X[, 2, drop = FALSE]
colnames(occ.covs) <- c('occ.cov')
det.covs <- list()
# Add covariates one by one
det.covs[[1]] <- list(det.cov.1.1 = X.p[[1]][, , 2])
det.covs[[2]] <- list(det.cov.2.1 = X.p[[2]][, , 2])
det.covs[[3]] <- list(det.cov.3.1 = X.p[[3]][, , 2])
det.covs[[4]] <- list(det.cov.4.1 = X.p[[4]][, , 2])
data.list <- list(y = y,
                  occ.covs = occ.covs,
                  det.covs = det.covs,
                  sites = sites)
J <- length(dat$z.obs)
# Initial values
inits.list <- list(alpha = list(0, 0, 0, 0),
                   beta = 0,
                   z = rep(1, J))
# Priors
prior.list <- list(beta.normal = list(mean = 0, var = 2.72),
                   alpha.normal = list(mean = list(0, 0, 0, 0),
                                       var = list(2.72, 2.72, 2.72, 2.72)))
n.samples <- 5000
# Note that this is just a test case and more iterations/chains may need to
# be run to ensure convergence.
out <- intPGOcc(occ.formula = ~ occ.cov,
                det.formula = list(f.1 = ~ det.cov.1.1,
                                   f.2 = ~ det.cov.2.1,
                                   f.3 = ~ det.cov.3.1,
                                   f.4 = ~ det.cov.4.1),
                data = data.list,
                inits = inits.list,
                n.samples = n.samples,
                priors = prior.list,
                n.omp.threads = 1,
                verbose = TRUE,
                n.report = 1000,
                n.burn = 1000,
                n.thin = 1,
                n.chains = 1)
summary(out)
Function for fitting a joint species distribution model with species correlations. This model does not explicitly account for imperfect detection (see lfMsPGOcc() for the analogous model that does). We use Polya-Gamma latent variables and a factor modeling approach.
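The factor modeling approach mentioned above can be pictured with a small base R sketch: a handful of latent factors, weighted by species-specific loadings, induce correlations among species. All values are simulated, the object names are illustrative, and the constraints on the loadings matrix (zeros above the diagonal, ones on it) are a common identifiability convention assumed here rather than taken from the text above.

```r
set.seed(10)
N <- 6    # species
q <- 2    # latent factors (small relative to N)
J <- 100  # sites

# Species-specific factor loadings (N x q)
lambda <- matrix(rnorm(N * q), N, q)
# Assumed identifiability constraints: upper triangle 0, diagonal 1
lambda[upper.tri(lambda)] <- 0
diag(lambda) <- 1

# Latent factors at each site (q x J)
w <- matrix(rnorm(q * J), q, J)

# Species-by-site latent effect added to each species' linear predictor
w.star <- lambda %*% w

# Implied residual covariance among species: rank at most q
Sigma <- lambda %*% t(lambda)
round(cov2cor(Sigma), 2)
```

Only N x q loadings are estimated rather than a full N x N covariance matrix, which is the source of the dimension reduction.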
lfJSDM(formula, data, inits, priors, n.factors, n.samples, n.omp.threads = 1, verbose = TRUE, n.report = 100, n.burn = round(.10 * n.samples), n.thin = 1, n.chains = 1, k.fold, k.fold.threads = 1, k.fold.seed, k.fold.only = FALSE, ...)
formula |
a symbolic description of the model to be fit for the model using R's model syntax. Only right-hand side of formula is specified. See example below. Random intercepts are allowed using lme4 syntax (Bates et al. 2015). |
data |
a list containing data necessary for model fitting.
Valid tags are |
inits |
a list with each tag corresponding to a parameter name.
Valid tags are |
priors |
a list with each tag corresponding to a parameter name.
Valid tags are |
n.factors |
the number of factors to use in the latent factor model approach.
Typically, the number of factors is set to be small (e.g., 4-5) relative to the
total number of species in the community, which will lead to substantial
decreases in computation time. However, the value can be anywhere
between 0 and N (the number of species in the community). When set to 0, the model
assumes there are no residual species correlations, which is equivalent to the
|
n.samples |
the number of posterior samples to collect in each chain. |
n.omp.threads |
a positive integer indicating the number of threads
to use for SMP parallel processing within chains. This will have no impact
on model run times for non-spatial models. The package must be compiled for
OpenMP support. For most Intel-based machines, we recommend setting
|
verbose |
if |
n.report |
the interval to report MCMC progress. |
n.burn |
the number of samples out of the total |
n.thin |
the thinning interval for collection of MCMC samples. The
thinning occurs after the |
n.chains |
the number of chains to run in sequence. |
k.fold |
specifies the number of k folds for cross-validation.
If not specified as an argument, then cross-validation is not performed
and |
k.fold.threads |
number of threads to use for cross-validation. If
|
k.fold.seed |
seed used to split data set into |
k.fold.only |
a logical value indicating whether to only perform
cross-validation ( |
... |
currently no additional arguments |
An object of class lfJSDM that is a list comprised of:
beta.comm.samples |
a |
tau.sq.beta.samples |
a |
beta.samples |
a |
lambda.samples |
a |
psi.samples |
a three-dimensional array of posterior samples for the latent probability of occurrence/detection values for each species. |
sigma.sq.psi.samples |
a |
w.samples |
a three-dimensional array of posterior samples for the latent effects for each latent factor. |
beta.star.samples |
a |
like.samples |
a three-dimensional array of posterior samples for the likelihood value associated with each site and species. Used for calculating WAIC. |
rhat |
a list of Gelman-Rubin diagnostic values for some of the model parameters. |
ESS |
a list of effective sample sizes for some of the model parameters. |
run.time |
MCMC sampler execution time reported using |
k.fold.deviance |
vector of scoring rules (deviance) from k-fold cross-validation.
A separate value is reported for each species.
Only included if |
The return object will include additional objects used for
subsequent prediction and/or model fit evaluation. Note that detection probability
estimated values are not included in the model object, but can be extracted
using fitted().
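The rhat element listed above holds Gelman-Rubin diagnostics. For intuition, the sketch below computes the basic (uncorrected) form of that statistic for one parameter from several chains. The function `gelman.rubin()` is a hypothetical illustration on simulated draws. The package's own values may include corrections that this version omits, and the diagnostic is only meaningful when more than one chain is run.

```r
# Hypothetical helper: basic potential scale reduction factor for one parameter
# chains: matrix with iterations in rows and chains in columns
gelman.rubin <- function(chains) {
  n <- nrow(chains)
  W <- mean(apply(chains, 2, var))    # mean within-chain variance
  B <- n * var(colMeans(chains))      # between-chain variance
  var.hat <- (n - 1) / n * W + B / n  # pooled estimate of posterior variance
  sqrt(var.hat / W)
}

set.seed(30)
# Three well-mixed chains sampling the same distribution
good <- matrix(rnorm(3000), 1000, 3)
# Three chains stuck around different values
bad <- good + matrix(c(0, 2, 4), 1000, 3, byrow = TRUE)

rhat.good <- gelman.rubin(good)
rhat.bad <- gelman.rubin(bad)
c(rhat.good, rhat.bad)
```

Values close to 1 suggest the chains agree, while values well above 1 indicate that more iterations or better initial values are needed.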
Some of the underlying code used for generating random numbers from the Polya-Gamma distribution is taken from the pgdraw package written by Daniel F. Schmidt and Enes Makalic. Their code implements Algorithm 6 in the PhD thesis of Jesse Bennett Windle (2013) https://repositories.lib.utexas.edu/handle/2152/21842.
Jeffrey W. Doser,
Andrew O. Finley
Polson, N.G., J.G. Scott, and J. Windle. (2013) Bayesian Inference for Logistic Models Using Polya-Gamma Latent Variables. Journal of the American Statistical Association, 108:1339-1349.
Bates, Douglas, Martin Maechler, Ben Bolker, Steve Walker (2015). Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software, 67(1), 1-48. doi:10.18637/jss.v067.i01.
Hooten, M. B., and Hobbs, N. T. (2015). A guide to Bayesian model selection for ecologists. Ecological monographs, 85(1), 3-28.
set.seed(400)
J.x <- 10
J.y <- 10
J <- J.x * J.y
n.rep <- rep(1, J)
N <- 10
# Community-level covariate effects
# Occurrence
beta.mean <- c(0.2, 0.6, 1.5)
p.occ <- length(beta.mean)
tau.sq.beta <- c(0.6, 1.2, 1.7)
# Detection
# Fix this to be constant and really close to 1.
alpha.mean <- c(9)
tau.sq.alpha <- c(0.05)
p.det <- length(alpha.mean)
# Random effects
# Include a single random effect
psi.RE <- list(levels = c(20),
               sigma.sq.psi = c(2))
p.RE <- list()
# Draw species-level effects from community means.
beta <- matrix(NA, nrow = N, ncol = p.occ)
alpha <- matrix(NA, nrow = N, ncol = p.det)
for (i in 1:p.occ) {
  beta[, i] <- rnorm(N, beta.mean[i], sqrt(tau.sq.beta[i]))
}
for (i in 1:p.det) {
  alpha[, i] <- rnorm(N, alpha.mean[i], sqrt(tau.sq.alpha[i]))
}
alpha.true <- alpha
# Factor model
factor.model <- TRUE
n.factors <- 4
dat <- simMsOcc(J.x = J.x, J.y = J.y, n.rep = n.rep, N = N, beta = beta,
                alpha = alpha, psi.RE = psi.RE, p.RE = p.RE, sp = FALSE,
                factor.model = TRUE, n.factors = 4)
X <- dat$X
y <- dat$y
X.re <- dat$X.re
coords <- dat$coords
occ.covs <- cbind(X, X.re)
colnames(occ.covs) <- c('int', 'occ.cov.1', 'occ.cov.2', 'occ.re.1')
data.list <- list(y = y[, , 1],
                  covs = occ.covs,
                  coords = coords)
# Priors
prior.list <- list(beta.comm.normal = list(mean = 0, var = 2.72),
                   tau.sq.beta.ig = list(a = 0.1, b = 0.1))
inits.list <- list(beta.comm = 0, beta = 0, tau.sq.beta = 1)
# Note that this is just a test case and more iterations/chains may need to
# be run to ensure convergence.
out <- lfJSDM(formula = ~ occ.cov.1 + occ.cov.2 + (1 | occ.re.1),
              data = data.list,
              inits = inits.list,
              priors = prior.list,
              n.factors = 4,
              n.samples = 1000,
              n.report = 500,
              n.burn = 500,
              n.thin = 2,
              n.chains = 1)
summary(out)
Function for fitting multi-species occupancy models with species correlations (i.e., a joint species distribution model with imperfect detection). We use Polya-Gamma latent variables and a factor modeling approach for dimension reduction.
lfMsPGOcc(occ.formula, det.formula, data, inits, priors, n.factors, n.samples, n.omp.threads = 1, verbose = TRUE, n.report = 100, n.burn = round(.10 * n.samples), n.thin = 1, n.chains = 1, k.fold, k.fold.threads = 1, k.fold.seed, k.fold.only = FALSE, ...)
occ.formula |
a symbolic description of the model to be fit for the occurrence portion of the model using R's model syntax. Only right-hand side of formula is specified. See example below. Random intercepts are allowed using lme4 syntax (Bates et al. 2015). |
det.formula |
a symbolic description of the model to be fit for the detection portion of the model using R's model syntax. Only right-hand side of formula is specified. See example below. Random intercepts are allowed using lme4 syntax (Bates et al. 2015). |
data |
a list containing data necessary for model fitting.
Valid tags are |
inits |
a list with each tag corresponding to a parameter name.
Valid tags are |
priors |
a list with each tag corresponding to a parameter name.
Valid tags are |
n.factors |
the number of factors to use in the latent factor model approach. Typically, the number of factors is set to be small (e.g., 4-5) relative to the total number of species in the community, which will lead to substantial decreases in computation time. However, the value can be anywhere between 1 and N (the number of species in the community). |
n.samples |
the number of posterior samples to collect in each chain. |
n.omp.threads |
a positive integer indicating the number of threads
to use for SMP parallel processing within chains. This will have no impact
on model run times for non-spatial models. The package must be compiled for
OpenMP support. For most Intel-based machines, we recommend setting
|
verbose |
if |
n.report |
the interval to report MCMC progress. |
n.burn |
the number of samples out of the total |
n.thin |
the thinning interval for collection of MCMC samples. The
thinning occurs after the |
n.chains |
the number of chains to run in sequence. |
k.fold |
specifies the number of k folds for cross-validation.
If not specified as an argument, then cross-validation is not performed
and |
k.fold.threads |
number of threads to use for cross-validation. If
|
k.fold.seed |
seed used to split data set into |
k.fold.only |
a logical value indicating whether to only perform
cross-validation ( |
... |
currently no additional arguments |
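The interaction of n.samples, n.burn, and n.thin can be sketched in base R. This is an illustrative helper only, not a function from the package, and it assumes that burn-in draws are discarded first and that every n.thin-th remaining draw is kept; n_retained is a hypothetical name.

```r
# Hypothetical helper (not part of the package): number of posterior samples
# kept per chain, assuming burn-in is dropped first and then every n.thin-th
# draw is retained.
n_retained <- function(n.samples, n.burn = round(0.10 * n.samples), n.thin = 1) {
  stopifnot(n.burn < n.samples, n.thin >= 1)
  length(seq(from = n.burn + 1, to = n.samples, by = n.thin))
}

n_retained(300, n.burn = 200, n.thin = 1)    # small test-case settings
n_retained(3000, n.burn = 2000, n.thin = 5)  # heavier thinning
n_retained(1000)                             # default burn-in of 10%
```

Under this assumption, the total number of retained draws across chains is the per-chain count multiplied by n.chains.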
An object of class lfMsPGOcc that is a list comprised of:
beta.comm.samples |
a |
alpha.comm.samples |
a |
tau.sq.beta.samples |
a |
tau.sq.alpha.samples |
a |
beta.samples |
a |
alpha.samples |
a |
lambda.samples |
a |
z.samples |
a three-dimensional array of posterior samples for the latent occurrence values for each species. |
psi.samples |
a three-dimensional array of posterior samples for the latent occurrence probability values for each species. |
sigma.sq.psi.samples |
a |
sigma.sq.p.samples |
a |
w.samples |
a three-dimensional array of posterior samples for the latent effects for each latent factor. |
beta.star.samples |
a |
alpha.star.samples |
a |
like.samples |
a three-dimensional array of posterior samples for the likelihood value associated with each site and species. Used for calculating WAIC. |
rhat |
a list of Gelman-Rubin diagnostic values for some of the model parameters. |
ESS |
a list of effective sample sizes for some of the model parameters. |
run.time |
MCMC sampler execution time reported using |
k.fold.deviance |
vector of scoring rules (deviance) from k-fold cross-validation.
A separate value is reported for each species.
Only included if |
The return object will include additional objects used for
subsequent prediction and/or model fit evaluation. Note that detection
probability estimated values are not included in the model object, but can
be extracted using fitted().
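The role of like.samples in computing WAIC can be illustrated with a small base R sketch. The array below is simulated and merely stands in for real output; the dimension order (samples, species, sites) is an assumption, and waic_sketch is a hypothetical name. For fitted models, the package's own waicOcc() function is the appropriate tool.

```r
set.seed(1)
n.post <- 100  # retained posterior samples
N <- 3         # species
J <- 5         # sites
# Simulated stand-in for like.samples; assumed dims: samples x species x sites
like.samples <- array(runif(n.post * N * J, 0.2, 0.9), dim = c(n.post, N, J))

waic_sketch <- function(like) {
  # Flatten so that each column holds the draws for one species-site pair
  like.mat <- matrix(like, nrow = dim(like)[1])
  lppd <- sum(log(colMeans(like.mat)))       # log pointwise predictive density
  p.d <- sum(apply(log(like.mat), 2, var))   # effective number of parameters
  c(elpd = lppd, pD = p.d, WAIC = -2 * (lppd - p.d))
}

waic_sketch(like.samples)
```

Lower WAIC values indicate better expected predictive performance when comparing models fit to the same data.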
Some of the underlying code used for generating random numbers from the Polya-Gamma distribution is taken from the pgdraw package written by Daniel F. Schmidt and Enes Makalic. Their code implements Algorithm 6 in the PhD thesis of Jesse Bennett Windle (2013) https://repositories.lib.utexas.edu/handle/2152/21842.
Jeffrey W. Doser [email protected],
Andrew O. Finley [email protected]
Polson, N.G., J.G. Scott, and J. Windle. (2013) Bayesian Inference for Logistic Models Using Polya-Gamma Latent Variables. Journal of the American Statistical Association, 108:1339-1349.
Bates, Douglas, Martin Maechler, Ben Bolker, Steve Walker (2015). Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software, 67(1), 1-48. doi:10.18637/jss.v067.i01.
Hooten, M. B., and Hobbs, N. T. (2015). A guide to Bayesian model selection for ecologists. Ecological monographs, 85(1), 3-28.
Dorazio, R. M., and Royle, J. A. (2005). Estimating size and composition of biological communities by modeling the occurrence of species. Journal of the American Statistical Association, 100(470), 389-398.
set.seed(400)
J.x <- 8
J.y <- 8
J <- J.x * J.y
n.rep <- sample(2:4, size = J, replace = TRUE)
N <- 8
# Community-level covariate effects
# Occurrence
beta.mean <- c(0.2, 0.5)
p.occ <- length(beta.mean)
tau.sq.beta <- c(0.6, 0.3)
# Detection
alpha.mean <- c(0.5, 0.2, -0.1)
tau.sq.alpha <- c(0.2, 0.3, 1)
p.det <- length(alpha.mean)
# Draw species-level effects from community means.
beta <- matrix(NA, nrow = N, ncol = p.occ)
alpha <- matrix(NA, nrow = N, ncol = p.det)
# Include a random intercept on detection
p.RE <- list(levels = c(40), sigma.sq.p = c(2))
for (i in 1:p.occ) {
  beta[, i] <- rnorm(N, beta.mean[i], sqrt(tau.sq.beta[i]))
}
for (i in 1:p.det) {
  alpha[, i] <- rnorm(N, alpha.mean[i], sqrt(tau.sq.alpha[i]))
}
n.factors <- 4
dat <- simMsOcc(J.x = J.x, J.y = J.y, n.rep = n.rep, N = N, beta = beta,
                alpha = alpha, sp = FALSE, factor.model = TRUE,
                n.factors = n.factors, p.RE = p.RE)
y <- dat$y
X <- dat$X
X.p <- dat$X.p
X.p.re <- dat$X.p.re
# Package all data into a list
occ.covs <- X[, 2, drop = FALSE]
colnames(occ.covs) <- c('occ.cov')
det.covs <- list(det.cov.1 = X.p[, , 2],
                 det.cov.2 = X.p[, , 3],
                 det.re = X.p.re[, , 1])
data.list <- list(y = y, occ.covs = occ.covs, det.covs = det.covs,
                  coords = dat$coords)
# Priors
prior.list <- list(beta.comm.normal = list(mean = 0, var = 2.72),
                   alpha.comm.normal = list(mean = 0, var = 2.72),
                   tau.sq.beta.ig = list(a = 0.1, b = 0.1),
                   tau.sq.alpha.ig = list(a = 0.1, b = 0.1))
# Initial values
lambda.inits <- matrix(0, N, n.factors)
diag(lambda.inits) <- 1
lambda.inits[lower.tri(lambda.inits)] <- rnorm(sum(lower.tri(lambda.inits)))
inits.list <- list(alpha.comm = 0, beta.comm = 0, beta = 0, alpha = 0,
                   tau.sq.beta = 1, tau.sq.alpha = 1, lambda = lambda.inits,
                   z = apply(y, c(1, 2), max, na.rm = TRUE))
n.samples <- 300
n.burn <- 200
n.thin <- 1
# Note that this is just a test case and more iterations/chains may need to
# be run to ensure convergence.
out <- lfMsPGOcc(occ.formula = ~ occ.cov,
                 det.formula = ~ det.cov.1 + det.cov.2 + (1 | det.re),
                 data = data.list,
                 inits = inits.list,
                 n.samples = n.samples,
                 priors = prior.list,
                 n.factors = n.factors,
                 n.omp.threads = 1,
                 verbose = TRUE,
                 n.report = 100,
                 n.burn = n.burn,
                 n.thin = n.thin,
                 n.chains = 1)
summary(out, level = 'community')
Function for fitting multi-species occupancy models using Polya-Gamma latent variables.
msPGOcc(occ.formula, det.formula, data, inits, priors, n.samples, n.omp.threads = 1, verbose = TRUE, n.report = 100, n.burn = round(.10 * n.samples), n.thin = 1, n.chains = 1, k.fold, k.fold.threads = 1, k.fold.seed, k.fold.only = FALSE, ...)
occ.formula |
a symbolic description of the model to be fit for the occurrence portion of the model using R's model syntax. Only right-hand side of formula is specified. See example below. Random intercepts are allowed using lme4 syntax (Bates et al. 2015). |
det.formula |
a symbolic description of the model to be fit for the detection portion of the model using R's model syntax. Only right-hand side of formula is specified. See example below. Random intercepts are allowed using lme4 syntax (Bates et al. 2015). |
data |
a list containing data necessary for model fitting.
Valid tags are |
inits |
a list with each tag corresponding to a parameter name.
Valid tags are |
priors |
a list with each tag corresponding to a parameter name.
Valid tags are |
n.samples |
the number of posterior samples to collect in each chain. |
n.omp.threads |
a positive integer indicating the number of threads
to use for SMP parallel processing within chains. This will have no impact
on model run times for non-spatial models. The package must be compiled for
OpenMP support. For most Intel-based machines, we recommend setting
|
verbose |
if |
n.report |
the interval to report MCMC progress. |
n.burn |
the number of samples out of the total |
n.thin |
the thinning interval for collection of MCMC samples. The
thinning occurs after the |
n.chains |
the number of chains to run. |
k.fold |
specifies the number of k folds for cross-validation.
If not specified as an argument, then cross-validation is not performed
and |
k.fold.threads |
number of threads to use for cross-validation. If
|
k.fold.seed |
seed used to split data set into |
k.fold.only |
a logical value indicating whether to only perform
cross-validation ( |
... |
currently no additional arguments |
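The way k.fold and k.fold.seed work together can be pictured with a minimal base R sketch that assigns sites to folds reproducibly. This only illustrates the general technique and is not the package's internal code; make_folds is a hypothetical name and the default seed value here is arbitrary.

```r
# Hypothetical helper (not part of the package): assign J sites to k folds of
# near-equal size, reproducibly for a given seed.
make_folds <- function(J, k.fold, k.fold.seed = 1) {
  set.seed(k.fold.seed)
  # Shuffle the site indices, then deal them out to the folds in turn
  split(sample(seq_len(J)), rep(seq_len(k.fold), length.out = J))
}

folds <- make_folds(J = 64, k.fold = 4)
lengths(folds)

# For fold 1: fit on the remaining sites, then score on the held-out sites
train.sites <- setdiff(seq_len(64), folds[[1]])
```

Repeating the fit-and-score step for each fold and summing the held-out deviance yields one scoring-rule value per species, which matches the shape of k.fold.deviance described for this function.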
An object of class msPGOcc that is a list comprised of:
beta.comm.samples |
a |
alpha.comm.samples |
a |
tau.sq.beta.samples |
a |
tau.sq.alpha.samples |
a |
beta.samples |
a |
alpha.samples |
a |
z.samples |
a three-dimensional array of posterior samples for the latent occurrence values for each species. |
psi.samples |
a three-dimensional array of posterior samples for the latent occurrence probability values for each species. |
sigma.sq.psi.samples |
a |
sigma.sq.p.samples |
a |
beta.star.samples |
a |
alpha.star.samples |
a |
like.samples |
a three-dimensional array of posterior samples for the likelihood value associated with each site and species. Used for calculating WAIC. |
rhat |
a list of Gelman-Rubin diagnostic values for some of the model parameters. |
ESS |
a list of effective sample sizes for some of the model parameters. |
run.time |
MCMC sampler execution time reported using |
k.fold.deviance |
vector of scoring rules (deviance) from k-fold cross-validation.
A separate value is reported for each species.
Only included if |
The return object will include additional objects used for
subsequent prediction and/or model fit evaluation. Note that detection probability
estimated values are not included in the model object, but can be extracted
using fitted().
Some of the underlying code used for generating random numbers from the Polya-Gamma distribution is taken from the pgdraw package written by Daniel F. Schmidt and Enes Makalic. Their code implements Algorithm 6 in the PhD thesis of Jesse Bennett Windle (2013) https://repositories.lib.utexas.edu/handle/2152/21842.
Jeffrey W. Doser [email protected],
Andrew O. Finley [email protected]
Polson, N.G., J.G. Scott, and J. Windle. (2013) Bayesian Inference for Logistic Models Using Polya-Gamma Latent Variables. Journal of the American Statistical Association, 108:1339-1349.
Bates, Douglas, Martin Maechler, Ben Bolker, Steve Walker (2015). Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software, 67(1), 1-48. doi:10.18637/jss.v067.i01.
Hooten, M. B., and Hobbs, N. T. (2015). A guide to Bayesian model selection for ecologists. Ecological monographs, 85(1), 3-28.
Dorazio, R. M., and Royle, J. A. (2005). Estimating size and composition of biological communities by modeling the occurrence of species. Journal of the American Statistical Association, 100(470), 389-398.
set.seed(400)
J.x <- 8
J.y <- 8
J <- J.x * J.y
n.rep <- sample(2:4, size = J, replace = TRUE)
N <- 6
# Community-level covariate effects
# Occurrence
beta.mean <- c(0.2, 0.5)
p.occ <- length(beta.mean)
tau.sq.beta <- c(0.6, 0.3)
# Detection
alpha.mean <- c(0.5, 0.2, -0.1)
tau.sq.alpha <- c(0.2, 0.3, 1)
p.det <- length(alpha.mean)
# Draw species-level effects from community means.
beta <- matrix(NA, nrow = N, ncol = p.occ)
alpha <- matrix(NA, nrow = N, ncol = p.det)
for (i in 1:p.occ) {
  beta[, i] <- rnorm(N, beta.mean[i], sqrt(tau.sq.beta[i]))
}
for (i in 1:p.det) {
  alpha[, i] <- rnorm(N, alpha.mean[i], sqrt(tau.sq.alpha[i]))
}
dat <- simMsOcc(J.x = J.x, J.y = J.y, n.rep = n.rep, N = N, beta = beta,
                alpha = alpha, sp = FALSE)
y <- dat$y
X <- dat$X
X.p <- dat$X.p
# Package all data into a list
occ.covs <- X[, 2, drop = FALSE]
colnames(occ.covs) <- c('occ.cov')
det.covs <- list(det.cov.1 = X.p[, , 2],
                 det.cov.2 = X.p[, , 3])
data.list <- list(y = y, occ.covs = occ.covs, det.covs = det.covs)
# Priors
prior.list <- list(beta.comm.normal = list(mean = 0, var = 2.72),
                   alpha.comm.normal = list(mean = 0, var = 2.72),
                   tau.sq.beta.ig = list(a = 0.1, b = 0.1),
                   tau.sq.alpha.ig = list(a = 0.1, b = 0.1))
# Initial values
inits.list <- list(alpha.comm = 0, beta.comm = 0, beta = 0, alpha = 0,
                   tau.sq.beta = 1, tau.sq.alpha = 1,
                   z = apply(y, c(1, 2), max, na.rm = TRUE))
n.samples <- 3000
n.burn <- 2000
n.thin <- 1
# Note that this is just a test case and more iterations/chains may need to
# be run to ensure convergence.
out <- msPGOcc(occ.formula = ~ occ.cov,
               det.formula = ~ det.cov.1 + det.cov.2,
               data = data.list,
               inits = inits.list,
               n.samples = n.samples,
               priors = prior.list,
               n.omp.threads = 1,
               verbose = TRUE,
               n.report = 1000,
               n.burn = n.burn,
               n.thin = n.thin,
               n.chains = 1)
summary(out, level = 'community')
Detection-nondetection data of 12 foliage-gleaning bird species in 2015 in the Bartlett Experimental Forest in New Hampshire, USA. These data were collected as part of the National Ecological Observatory Network (NEON). Data were collected at 80 sites where observers recorded the number of all bird species observed during a six-minute, 125 m radius point count survey once during the breeding season. The six-minute survey was split into three two-minute intervals following a removal design: the observer recorded the interval during which a species was first observed (if any) with a 1, intervals prior to observation with a 0, and then mentally removed the species from subsequent intervals (marked with NA). This enables modeling of the data in an occupancy modeling framework. The 12 species included in the data set are as follows: (1) AMRE: American Redstart; (2) BAWW: Black-and-white Warbler; (3) BHVI: Blue-headed Vireo; (4) BLBW: Blackburnian Warbler; (5) BLPW: Blackpoll Warbler; (6) BTBW: Black-throated Blue Warbler; (7) BTNW: Black-throated Green Warbler; (8) CAWA: Canada Warbler; (9) MAWA: Magnolia Warbler; (10) NAWA: Nashville Warbler; (11) OVEN: Ovenbird; (12) REVI: Red-eyed Vireo.
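The removal-design encoding described above can be sketched for a single species at a single site. The helper name removal_history and its arguments are hypothetical and are not part of the package or the data set.

```r
# Hypothetical helper: build the detection history for one species at one
# site under a removal design. first.obs is the interval in which the species
# was first observed, or NA if it was never observed.
removal_history <- function(first.obs, n.intervals = 3) {
  if (is.na(first.obs)) return(rep(0, n.intervals))  # never detected
  h <- rep(NA_real_, n.intervals)  # intervals after first detection stay NA
  h[seq_len(first.obs - 1)] <- 0   # intervals before first detection
  h[first.obs] <- 1                # interval of first detection
  h
}

removal_history(1)   # 1 NA NA
removal_history(2)   # 0  1 NA
removal_history(NA)  # 0  0  0
```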
data(neon2015)
neon2015 is a list with four elements:
y: a three-dimensional array of detection-nondetection data with dimensions of species (12), sites (80) and replicates (3).
occ.covs: a numeric matrix with 80 rows and one column consisting of the elevation at each site.
det.covs: a list of two numeric vectors with 80 elements. The first element is the day of year when the survey was conducted for a given site. The second element is the time of day when the survey began.
coords: a numeric matrix with 80 rows and two columns containing the site coordinates (Easting and Northing) in UTM Zone 19. The proj4string is "+proj=utm +zone=19 +units=m +datum=NAD83".
NEON (National Ecological Observatory Network). Breeding landbird point counts, RELEASE-2021 (DP1.10003.001). https://doi.org/10.48443/s730-dy13. Dataset accessed from https://data.neonscience.org on October 10, 2021
Doser, J. W., Leuenberger, W., Sillett, T. S., Hallworth, M. T. & Zipkin, E. F. (2022). Integrated community occupancy models: A framework to assess occurrence and biodiversity dynamics using multiple data sources. Methods in Ecology and Evolution, 00, 1-14. doi:10.1111/2041-210X.13811
Barnett, D. T., Duffy, P. A., Schimel, D. S., Krauss, R. E., Irvine, K. M., Davis, F. W., Gross, J. E., Azuaje, E. I., Thorpe, A. S., Gudex-Cross, D., et al. (2019). The terrestrial organism and biogeochemistry spatial sampling design for the National Ecological Observatory Network. Ecosphere, 10(2):e02540.
Function for fitting single-species occupancy models using Polya-Gamma latent variables.
PGOcc(occ.formula, det.formula, data, inits, priors, n.samples, n.omp.threads = 1, verbose = TRUE, n.report = 100, n.burn = round(.10 * n.samples), n.thin = 1, n.chains = 1, k.fold, k.fold.threads = 1, k.fold.seed, k.fold.only = FALSE, ...)
occ.formula |
a symbolic description of the model to be fit for the occurrence portion of the model using R's model syntax. Only right-hand side of formula is specified. See example below. Random intercepts are allowed using lme4 syntax (Bates et al. 2015). |
det.formula |
a symbolic description of the model to be fit for the detection portion of the model using R's model syntax. Only right-hand side of formula is specified. See example below. Random intercepts are allowed using lme4 syntax (Bates et al. 2015). |
data |
a list containing data necessary for model fitting.
Valid tags are |
inits |
a list with each tag corresponding to a parameter name.
Valid tags are |
priors |
a list with each tag corresponding to a parameter name.
Valid tags are |
n.samples |
the number of posterior samples to collect in each chain. |
n.omp.threads |
a positive integer indicating the number of threads
to use for SMP parallel processing within chains. This will have no impact
on model run time for non-spatial models. The package must be compiled for
OpenMP support. For most Intel-based machines, we recommend setting
|
verbose |
if |
n.report |
the interval to report MCMC progress. |
n.burn |
the number of samples out of the total |
n.thin |
the thinning interval for collection of MCMC samples. The
thinning occurs after the |
n.chains |
the number of chains to run. |
k.fold |
specifies the number of k folds for cross-validation.
If not specified as an argument, then cross-validation is not performed
and |
k.fold.threads |
number of threads to use for cross-validation. If
|
k.fold.seed |
seed used to split data set into |
k.fold.only |
a logical value indicating whether to only perform
cross-validation ( |
... |
currently no additional arguments |
An object of class PGOcc that is a list comprised of:
beta.samples |
a |
alpha.samples |
a |
z.samples |
a |
psi.samples |
a |
sigma.sq.psi.samples |
a |
sigma.sq.p.samples |
a |
beta.star.samples |
a |
alpha.star.samples |
a |
like.samples |
a |
rhat |
a list of Gelman-Rubin diagnostic values for some of the model parameters. |
ESS |
a list of effective sample sizes for some of the model parameters. |
run.time |
execution time reported using |
k.fold.deviance |
scoring rule (deviance) from k-fold cross-validation.
Only included if |
The return object will include additional objects used for
subsequent prediction and/or model fit evaluation. Note that detection
probability estimated values are not included in the model object, but can be
extracted using fitted().
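As background for the rhat element of the return object, the basic Gelman-Rubin statistic for a single parameter can be sketched in base R. This is the textbook form without the degrees-of-freedom correction that some implementations apply, so values may differ slightly from those reported by the package; rhat_basic is a hypothetical name. The statistic needs more than one chain.

```r
# Basic Gelman-Rubin statistic for one parameter.
# chains: matrix with one row per retained sample and one column per chain.
rhat_basic <- function(chains) {
  n <- nrow(chains)
  W <- mean(apply(chains, 2, var))  # mean within-chain variance
  B <- n * var(colMeans(chains))    # between-chain variance
  sqrt(((n - 1) / n * W + B / n) / W)
}

set.seed(2)
mixed <- matrix(rnorm(3000), ncol = 3)          # chains exploring the same target
stuck <- mixed + rep(c(-2, 0, 2), each = 1000)  # chains centred in different places
rhat_basic(mixed)  # close to 1
rhat_basic(stuck)  # well above 1
```

Values close to 1 are consistent with convergence, while larger values suggest the chains have not mixed.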
Some of the underlying code used for generating random numbers from the Polya-Gamma distribution is taken from the pgdraw package written by Daniel F. Schmidt and Enes Makalic. Their code implements Algorithm 6 in the PhD thesis of Jesse Bennett Windle (2013) https://repositories.lib.utexas.edu/handle/2152/21842.
Jeffrey W. Doser [email protected],
Andrew O. Finley [email protected]
Polson, N.G., J.G. Scott, and J. Windle. (2013) Bayesian Inference for Logistic Models Using Polya-Gamma Latent Variables. Journal of the American Statistical Association, 108:1339-1349.
Bates, Douglas, Martin Maechler, Ben Bolker, Steve Walker (2015). Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software, 67(1), 1-48. doi:10.18637/jss.v067.i01.
Hooten, M. B., and Hobbs, N. T. (2015). A guide to Bayesian model selection for ecologists. Ecological monographs, 85(1), 3-28.
MacKenzie, D. I., J. D. Nichols, G. B. Lachman, S. Droege, J. Andrew Royle, and C. A. Langtimm. 2002. Estimating Site Occupancy Rates When Detection Probabilities Are Less Than One. Ecology 83: 2248-2255.
set.seed(400)
J.x <- 10
J.y <- 10
J <- J.x * J.y
n.rep <- sample(2:4, J, replace = TRUE)
beta <- c(0.5, -0.15)
p.occ <- length(beta)
alpha <- c(0.7, 0.4)
p.det <- length(alpha)
dat <- simOcc(J.x = J.x, J.y = J.y, n.rep = n.rep, beta = beta, alpha = alpha,
              sp = FALSE)
occ.covs <- dat$X[, 2, drop = FALSE]
colnames(occ.covs) <- c('occ.cov')
det.covs <- list(det.cov = dat$X.p[, , 2])
# Data bundle
data.list <- list(y = dat$y,
                  occ.covs = occ.covs,
                  det.covs = det.covs)
# Priors
prior.list <- list(beta.normal = list(mean = 0, var = 2.72),
                   alpha.normal = list(mean = 0, var = 2.72))
# Initial values
inits.list <- list(alpha = 0, beta = 0,
                   z = apply(data.list$y, 1, max, na.rm = TRUE))
n.samples <- 5000
n.report <- 1000
# Note that this is just a test case and more iterations/chains may need to
# be run to ensure convergence.
out <- PGOcc(occ.formula = ~ occ.cov,
             det.formula = ~ det.cov,
             data = data.list,
             inits = inits.list,
             n.samples = n.samples,
             priors = prior.list,
             n.omp.threads = 1,
             verbose = TRUE,
             n.report = n.report,
             n.burn = 1000,
             n.thin = 1,
             n.chains = 1)
summary(out)
Function for fitting a linear (mixed) model as a second-stage model where the response variable itself comes from a previous model fit and has uncertainty associated with it. The response variable is assumed to be a set of estimates from a previous model fit, where each value in the response variable has a posterior MCMC sample of estimates. This function is useful for doing "posthoc" analyses of model estimates (e.g., exploring how species traits relate to species-specific parameter estimates from a multi-species occupancy model). Such analyses are sometimes referred to as "two-stage" analyses.
postHocLM(formula, data, inits, priors, verbose = FALSE, n.report = 100, n.samples, n.chains = 1, ...)
formula |
a symbolic description of the model to be fit for the model using R's model syntax. Only right-hand side of formula is specified. See example below. Random intercepts are allowed using lme4 syntax (Bates et al. 2015). |
data |
a list containing data necessary for model fitting.
Valid tags are |
inits |
a list with each tag corresponding to a parameter name.
Valid tags are |
priors |
a list with each tag corresponding to a parameter name.
Valid tags are |
verbose |
if |
n.report |
the interval to report MCMC progress. |
n.samples |
the number of posterior samples to collect in each chain. Note that
by default, the same number of MCMC samples fit in the first stage model is
assumed to be fit for the second stage model. If |
n.chains |
the number of chains to run in sequence. |
... |
currently no additional arguments |
An object of class postHocLM that is a list comprised of:
beta.samples |
a |
tau.sq.samples |
a |
y.hat.samples |
a |
sigma.sq.samples |
a |
beta.star.samples |
a |
rhat |
a list of Gelman-Rubin diagnostic values for some of the model parameters. |
ESS |
a list of effective sample sizes for some of the model parameters. |
run.time |
execution time reported using |
bayes.R2 |
a |
The return object will include additional objects used for subsequent summarization.
Jeffrey W. Doser [email protected]
Bates, Douglas, Martin Maechler, Ben Bolker, Steve Walker (2015). Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software, 67(1), 1-48. doi:10.18637/jss.v067.i01.
# Simulate Data -----------------------------------------------------------
set.seed(100)
N <- 100
beta <- c(0, 0.5, 1.2)
tau.sq <- 1
p <- length(beta)
X <- matrix(1, nrow = N, ncol = p)
if (p > 1) {
  for (i in 2:p) {
    X[, i] <- rnorm(N)
  } # i
}
mu <- X[, 1] * beta[1] + X[, 2] * beta[2] + X[, 3] * beta[3]
y <- rnorm(N, mu, sqrt(tau.sq))
# Replicate y n.samples times and add a small amount of noise that corresponds
# to uncertainty from a first stage model.
n.samples <- 1000
y <- matrix(y, n.samples, N, byrow = TRUE)
y <- y + rnorm(length(y), 0, 0.25)

# Package data for use with postHocLM -------------------------------------
colnames(X) <- c('int', 'cov.1', 'cov.2')
data.list <- list(y = y, covs = X)
data <- data.list
inits <- list(beta = 0, tau.sq = 1)
priors <- list(beta.normal = list(mean = 0, var = 10000),
               tau.sq.ig = c(0.001, 0.001))

# Run the model -----------------------------------------------------------
out <- postHocLM(formula = ~ cov.1 + cov.2,
                 inits = inits,
                 data = data.list,
                 priors = priors,
                 verbose = FALSE,
                 n.chains = 1)
summary(out)
Function for performing posterior predictive checks on spOccupancy model objects.
ppcOcc(object, fit.stat, group, ...)
object |
an object of class |
fit.stat |
a quoted keyword that specifies the fit statistic
to use in the posterior predictive check. Supported fit statistics are
|
group |
a positive integer indicating the way to group the detection-nondetection data for the posterior predictive check. Value 1 will group values by row (site) and value 2 will group values by column (replicate). |
... |
currently no additional arguments |
Standard GoF assessments are not valid for binary data, and posterior predictive checks must be performed on some sort of binned data.
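The binning idea can be sketched in base R for a single posterior draw. Everything here is simulated for illustration: p and z stand in for one draw of detection probabilities and latent occupancy states, chisq_grouped is a hypothetical name, and the small constant eps added to the denominator is an assumption made to avoid division by zero. This is not the package's internal implementation.

```r
set.seed(3)
J <- 6  # sites
K <- 3  # replicates
p <- matrix(0.4, J, K)                      # assumed detection probabilities
z <- c(1, 1, 0, 1, 0, 1)                    # assumed latent occupancy states
y <- matrix(rbinom(J * K, 1, p * z), J, K)  # simulated detection histories
y[2, 3] <- NA                               # a visit that was not surveyed

# Chi-squared discrepancy on binned data.
# group = 1 bins by row (site); group = 2 bins by column (replicate).
chisq_grouped <- function(y, p, z, group = 1, eps = 1e-4) {
  E.cell <- p * z                # expected detection for every site-replicate
  E.cell[is.na(y)] <- NA         # ignore cells with no survey
  obs <- apply(y, group, sum, na.rm = TRUE)        # binned observed counts
  expd <- apply(E.cell, group, sum, na.rm = TRUE)  # binned expected counts
  sum((obs - expd)^2 / (expd + eps))
}

chisq_grouped(y, p, z, group = 1)  # bins are sites
chisq_grouped(y, p, z, group = 2)  # bins are replicates
```

In a full check, the same statistic would be computed for every posterior draw on both the observed data and a replicate data set generated from the model, and the two sets of values compared.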
An object of class ppcOcc that is a list comprised of:
fit.y |
a numeric vector of posterior samples for the
fit statistic calculated on the observed data when |
fit.y.rep |
a numeric vector of posterior samples for the
fit statistic calculated on a replicate data set generated from the
model when |
fit.y.group.quants |
a matrix consisting of posterior quantiles
for the fit statistic using the observed data for each unique element
the fit statistic is calculated for (i.e., sites when group = 1,
replicates when group = 2) when |
fit.y.rep.group.quants |
a matrix consisting of posterior quantiles
for the fit statistic using the model replicated data for each unique element
the fit statistic is calculated for (i.e., sites when group = 1,
replicates when group = 2) when |
The return object will include additional objects used for standard extractor functions.
Jeffrey W. Doser [email protected]
set.seed(400)
# Simulate Data -----------------------------------------------------------
J.x <- 8
J.y <- 8
J <- J.x * J.y
n.rep <- sample(2:4, J, replace = TRUE)
beta <- c(0.5, -0.15)
p.occ <- length(beta)
alpha <- c(0.7, 0.4)
p.det <- length(alpha)
dat <- simOcc(J.x = J.x, J.y = J.y, n.rep = n.rep, beta = beta, alpha = alpha,
              sp = FALSE)
occ.covs <- dat$X[, 2, drop = FALSE]
colnames(occ.covs) <- c('occ.cov')
det.covs <- list(det.cov = dat$X.p[, , 2])
# Data bundle
data.list <- list(y = dat$y,
                  occ.covs = occ.covs,
                  det.covs = det.covs)
# Priors
prior.list <- list(beta.normal = list(mean = 0, var = 2.72),
                   alpha.normal = list(mean = 0, var = 2.72))
# Initial values
inits.list <- list(alpha = 0, beta = 0,
                   z = apply(data.list$y, 1, max, na.rm = TRUE))
n.samples <- 5000
n.report <- 1000
out <- PGOcc(occ.formula = ~ occ.cov,
             det.formula = ~ det.cov,
             data = data.list,
             inits = inits.list,
             n.samples = n.samples,
             priors = prior.list,
             n.omp.threads = 1,
             verbose = TRUE,
             n.report = n.report,
             n.burn = 4000,
             n.thin = 1)
# Posterior predictive check
ppc.out <- ppcOcc(out, fit.stat = 'chi-squared', group = 1)
summary(ppc.out)
The function predict collects posterior predictive samples for a set of new locations given an object of class 'intMsPGOcc'. Prediction is currently possible only for the latent occupancy state.
## S3 method for class 'intMsPGOcc'
predict(object, X.0, ignore.RE = FALSE, ...)
object |
an object of class intMsPGOcc |
X.0 |
the design matrix of covariates at the prediction locations. This should include a column of 1s for the intercept if an intercept is included in the model. If random effects are included in the occupancy (or detection if |
ignore.RE |
a logical value indicating whether to ignore unstructured random effects during prediction. If TRUE, random effects are ignored and prediction uses only the fixed effects. If FALSE, random effects are included in the prediction for both observed and unobserved levels of the random effect. |
... |
currently no additional arguments |
A list object of class predict.intMsPGOcc consisting of:
psi.0.samples |
a three-dimensional array of posterior predictive samples for the latent occurrence probability values. |
z.0.samples |
a three-dimensional array of posterior predictive samples for the latent occurrence values. |
The return object will include additional objects used for standard extractor functions.
When ignore.RE = FALSE, both sampled levels and non-sampled levels of random effects are supported for prediction. For sampled levels, the posterior distribution for the random intercept corresponding to that level of the random effect will be used in the prediction. For non-sampled levels, random values are drawn from a normal distribution using the posterior samples of the random effect variance, which results in fully propagated uncertainty in predictions with models that incorporate random effects.
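The treatment of non-sampled levels can be sketched in base R. The example below is illustrative only: beta.0 and sigma.sq.psi are hypothetical posterior samples of an intercept and a random effect variance, not output from a fitted model. For each posterior sample, one random intercept is drawn from a zero-mean normal distribution whose variance is that sample's variance value, so uncertainty in the variance carries through to the predicted probability.

```r
set.seed(3)
n.post <- 500
# Hypothetical posterior samples (assumed for illustration).
beta.0 <- rnorm(n.post, 0.3, 0.1)         # occupancy intercept
sigma.sq.psi <- 1 / rgamma(n.post, 5, 4)  # random effect variance

# One draw of the random intercept per posterior sample for a new,
# non-sampled level of the random effect.
re.new <- rnorm(n.post, mean = 0, sd = sqrt(sigma.sq.psi))

# Occupancy probability with and without the random effect.
psi.new <- plogis(beta.0 + re.new)  # analogous to ignore.RE = FALSE
psi.fixed <- plogis(beta.0)         # analogous to ignore.RE = TRUE
```

The predictions that include the random effect are more dispersed than the fixed-effects-only predictions, which reflects the added uncertainty for a level never seen during fitting.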
Jeffrey W. Doser [email protected],
Andrew O. Finley [email protected]
set.seed(91)
J.x <- 10
J.y <- 10
# Total number of sites across the study region
J.all <- J.x * J.y
# Number of data sources.
n.data <- 2
# Sites for each data source.
J.obs <- sample(ceiling(0.2 * J.all):ceiling(0.5 * J.all), n.data, replace = TRUE)
n.rep <- list()
n.rep[[1]] <- rep(3, J.obs[1])
n.rep[[2]] <- rep(4, J.obs[2])
# Number of species observed in each data source
N <- c(8, 3)
# Community-level covariate effects
# Occurrence
beta.mean <- c(0.2, 0.5)
p.occ <- length(beta.mean)
tau.sq.beta <- c(0.4, 0.3)
# Detection
# Detection covariates
alpha.mean <- list()
tau.sq.alpha <- list()
# Number of detection parameters in each data source
p.det.long <- c(4, 3)
for (i in 1:n.data) {
  alpha.mean[[i]] <- runif(p.det.long[i], -1, 1)
  tau.sq.alpha[[i]] <- runif(p.det.long[i], 0.1, 1)
}
# Random effects
psi.RE <- list()
p.RE <- list()
beta <- matrix(NA, nrow = max(N), ncol = p.occ)
for (i in 1:p.occ) {
  beta[, i] <- rnorm(max(N), beta.mean[i], sqrt(tau.sq.beta[i]))
}
alpha <- list()
for (i in 1:n.data) {
  alpha[[i]] <- matrix(NA, nrow = N[i], ncol = p.det.long[i])
  for (t in 1:p.det.long[i]) {
    alpha[[i]][, t] <- rnorm(N[i], alpha.mean[[i]][t], sqrt(tau.sq.alpha[[i]])[t])
  }
}
sp <- FALSE
factor.model <- FALSE
# Simulate occupancy data
dat <- simIntMsOcc(n.data = n.data, J.x = J.x, J.y = J.y,
                   J.obs = J.obs, n.rep = n.rep, N = N, beta = beta, alpha = alpha,
                   psi.RE = psi.RE, p.RE = p.RE, sp = sp, factor.model = factor.model,
                   n.factors = n.factors)
J <- nrow(dat$coords.obs)
y <- dat$y
X <- dat$X.obs
X.p <- dat$X.p
X.re <- dat$X.re.obs
X.p.re <- dat$X.p.re
sites <- dat$sites
species <- dat$species
# Package all data into a list
occ.covs <- cbind(X)
colnames(occ.covs) <- c('int', 'occ.cov.1')
#colnames(occ.covs) <- c('occ.cov')
det.covs <- list()
# Add covariates one by one
det.covs[[1]] <- list(det.cov.1.1 = X.p[[1]][, , 2],
                      det.cov.1.2 = X.p[[1]][, , 3],
                      det.cov.1.3 = X.p[[1]][, , 4])
det.covs[[2]] <- list(det.cov.2.1 = X.p[[2]][, , 2],
                      det.cov.2.2 = X.p[[2]][, , 3])
data.list <- list(y = y, occ.covs = occ.covs, det.covs = det.covs,
                  sites = sites, species = species)
# Take a look at the data.list structure for integrated multi-species
# occupancy models.
# Priors
prior.list <- list(beta.comm.normal = list(mean = 0, var = 2.73),
                   alpha.comm.normal = list(mean = list(0, 0),
                                            var = list(2.72, 2.72)),
                   tau.sq.beta.ig = list(a = 0.1, b = 0.1),
                   tau.sq.alpha.ig = list(a = list(0.1, 0.1),
                                          b = list(0.1, 0.1)))
inits.list <- list(alpha.comm = list(0, 0),
                   beta.comm = 0,
                   tau.sq.beta = 1,
                   tau.sq.alpha = list(1, 1),
                   alpha = list(a = matrix(rnorm(p.det.long[1] * N[1]), N[1], p.det.long[1]),
                                b = matrix(rnorm(p.det.long[2] * N[2]), N[2], p.det.long[2])),
                   beta = 0)
# Fit the model.
# Note that this is just a test case and more iterations/chains may need to
# be run to ensure convergence.
out <- intMsPGOcc(occ.formula = ~ occ.cov.1,
                  det.formula = list(f.1 = ~ det.cov.1.1 + det.cov.1.2 + det.cov.1.3,
                                     f.2 = ~ det.cov.2.1 + det.cov.2.2),
                  inits = inits.list,
                  priors = prior.list,
                  data = data.list,
                  n.samples = 100,
                  n.omp.threads = 1,
                  verbose = TRUE,
                  n.report = 10,
                  n.burn = 50,
                  n.thin = 1,
                  n.chains = 1)
# Predict at new locations.
X.0 <- dat$X.pred
psi.0 <- dat$psi.pred
out.pred <- predict(out, X.0, ignore.RE = TRUE)
# Create prediction for one species.
curr.sp <- 2
psi.hat.quants <- apply(out.pred$psi.0.samples[, curr.sp, ], 2, quantile,
                        c(0.025, 0.5, 0.975))
plot(psi.0[curr.sp, ], psi.hat.quants[2, ], pch = 19, xlab = 'True',
     ylab = 'Predicted', ylim = c(min(psi.hat.quants), max(psi.hat.quants)),
     main = paste("Species ", curr.sp, sep = ''))
segments(psi.0[curr.sp, ], psi.hat.quants[1, ], psi.0[curr.sp, ], psi.hat.quants[3, ])
lines(psi.0[curr.sp, ], psi.0[curr.sp, ])
The function predict collects posterior predictive samples for a set of new locations given an object of class 'intPGOcc'.
## S3 method for class 'intPGOcc'
predict(object, X.0, ignore.RE = FALSE, type = 'occupancy', ...)
object |
an object of class intPGOcc |
X.0 |
the design matrix for prediction locations. This should include a column of 1s for the intercept. Covariates should have the same column names as those used when fitting the model with |
ignore.RE |
logical value that specifies whether or not to remove random occurrence (or detection if |
type |
a quoted keyword indicating what type of prediction to produce. Valid keywords are 'occupancy' to predict latent occupancy probability and latent occupancy values (this is the default), or 'detection' to predict detection probability given new values of detection covariates. Note that prediction of detection probability is not currently supported for integrated models. |
... |
currently no additional arguments |
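As an illustration of the X.0 argument, the sketch below builds a prediction design matrix with a leading column of 1s for the intercept, followed by a covariate column. The covariate name occ.cov and its values are hypothetical; in practice the column names should match those used when the model was fit. Two equivalent constructions are shown, one with cbind and one with model.matrix.

```r
set.seed(2)
n.pred <- 5
# Hypothetical covariate values at the prediction locations.
occ.cov <- rnorm(n.pred)

# Manual construction: intercept column of 1s plus the covariate.
X.0 <- cbind(1, occ.cov)
colnames(X.0) <- c('(Intercept)', 'occ.cov')

# Equivalent construction from a formula.
X.0.mm <- model.matrix(~ occ.cov, data = data.frame(occ.cov = occ.cov))
```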
An object of class predict.intPGOcc that is a list comprised of:
psi.0.samples |
a |
z.0.samples |
a |
The return object will include additional objects used for standard extractor functions.
Jeffrey W. Doser [email protected],
Andrew O. Finley [email protected]
set.seed(1008)
# Simulate Data -----------------------------------------------------------
J.x <- 10
J.y <- 10
J.all <- J.x * J.y
# Number of data sources.
n.data <- 4
# Sites for each data source.
J.obs <- sample(ceiling(0.2 * J.all):ceiling(0.5 * J.all), n.data, replace = TRUE)
# Replicates for each data source.
n.rep <- list()
for (i in 1:n.data) {
  n.rep[[i]] <- sample(1:4, size = J.obs[i], replace = TRUE)
}
# Occupancy covariates
beta <- c(0.5, 1)
p.occ <- length(beta)
# Detection covariates
alpha <- list()
for (i in 1:n.data) {
  alpha[[i]] <- runif(2, -1, 1)
}
p.det.long <- sapply(alpha, length)
p.det <- sum(p.det.long)
# Simulate occupancy data.
dat <- simIntOcc(n.data = n.data, J.x = J.x, J.y = J.y, J.obs = J.obs,
                 n.rep = n.rep, beta = beta, alpha = alpha, sp = FALSE)
y <- dat$y
X <- dat$X.obs
X.p <- dat$X.p
sites <- dat$sites
# Package all data into a list
occ.covs <- X[, 2, drop = FALSE]
colnames(occ.covs) <- c('occ.cov')
det.covs <- list()
# Add covariates one by one
det.covs[[1]] <- list(det.cov.1.1 = X.p[[1]][, , 2])
det.covs[[2]] <- list(det.cov.2.1 = X.p[[2]][, , 2])
det.covs[[3]] <- list(det.cov.3.1 = X.p[[3]][, , 2])
det.covs[[4]] <- list(det.cov.4.1 = X.p[[4]][, , 2])
data.list <- list(y = y,
                  occ.covs = occ.covs,
                  det.covs = det.covs,
                  sites = sites)
J <- length(dat$z.obs)
# Initial values
inits.list <- list(alpha = list(0, 0, 0, 0),
                   beta = 0,
                   z = rep(1, J))
# Priors
prior.list <- list(beta.normal = list(mean = 0, var = 2.72),
                   alpha.normal = list(mean = list(0, 0, 0, 0),
                                       var = list(2.72, 2.72, 2.72, 2.72)))
# Note that this is just a test case and more iterations/chains may need to
# be run to ensure convergence.
n.samples <- 5000
out <- intPGOcc(occ.formula = ~ occ.cov,
                det.formula = list(f.1 = ~ det.cov.1.1,
                                   f.2 = ~ det.cov.2.1,
                                   f.3 = ~ det.cov.3.1,
                                   f.4 = ~ det.cov.4.1),
                data = data.list,
                inits = inits.list,
                n.samples = n.samples,
                priors = prior.list,
                n.omp.threads = 1,
                verbose = TRUE,
                n.report = 1000,
                n.burn = 4000,
                n.thin = 1)
summary(out)
# Prediction
X.0 <- dat$X.pred
psi.0 <- dat$psi.pred
out.pred <- predict(out, X.0)
psi.hat.quants <- apply(out.pred$psi.0.samples, 2, quantile, c(0.025, 0.5, 0.975))
plot(psi.0, psi.hat.quants[2, ], pch = 19, xlab = 'True',
     ylab = 'Fitted', ylim = c(min(psi.hat.quants), max(psi.hat.quants)))
segments(psi.0, psi.hat.quants[1, ], psi.0, psi.hat.quants[3, ])
lines(psi.0, psi.0)
The function predict collects posterior predictive samples for a set of new locations given an object of class 'lfJSDM'.
## S3 method for class 'lfJSDM'
predict(object, X.0, coords.0, ignore.RE = FALSE, ...)
object |
an object of class lfJSDM |
X.0 |
the design matrix of covariates at the prediction locations. This should include a column of 1s for the intercept if an intercept is included in the model. If random effects are included in the model, the levels of the random effects at the new locations should be included as a column in the design matrix. The ordering of the levels should match the ordering used to fit the data in |
coords.0 |
the spatial coordinates corresponding to |
ignore.RE |
a logical value indicating whether to ignore unstructured random effects during prediction. If TRUE, random effects are ignored and prediction uses only the fixed effects. If FALSE, random effects are included in the prediction for both observed and unobserved levels of the random effect. |
... |
currently no additional arguments |
A list object of class predict.lfJSDM that consists of:
psi.0.samples |
a three-dimensional array of posterior predictive samples for the latent occurrence probability values. |
z.0.samples |
a three-dimensional array of posterior predictive samples for the latent occurrence values. |
w.0.samples |
a three-dimensional array of posterior predictive samples for the latent factors. |
The return object will include additional objects used for standard extractor functions.
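The three-dimensional arrays listed above can be summarized with apply. The sketch below assumes the dimension order is posterior sample, species, prediction site, which matches the indexing used in the examples for the multi-species predict methods in this manual; the array itself is simulated and stands in for psi.0.samples.

```r
set.seed(4)
n.post <- 200  # posterior samples
N <- 3         # species
J.0 <- 4       # prediction sites
# Simulated stand-in for psi.0.samples (assumed order: sample, species, site).
psi.0.samples <- array(runif(n.post * N * J.0), dim = c(n.post, N, J.0))

# Posterior mean for each species and site (N x J.0 matrix).
psi.0.mean <- apply(psi.0.samples, c(2, 3), mean)

# 2.5%, 50%, and 97.5% quantiles for each species and site
# (array with dimensions 3 x N x J.0).
psi.0.quants <- apply(psi.0.samples, c(2, 3), quantile, c(0.025, 0.5, 0.975))
dim(psi.0.quants)
```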
When ignore.RE = FALSE, both sampled levels and non-sampled levels of random effects are supported for prediction. For sampled levels, the posterior distribution for the random intercept corresponding to that level of the random effect will be used in the prediction. For non-sampled levels, random values are drawn from a normal distribution using the posterior samples of the random effect variance, which results in fully propagated uncertainty in predictions with models that incorporate random effects.
Jeffrey W. Doser [email protected],
Andrew O. Finley [email protected]
set.seed(400)
J.x <- 8
J.y <- 8
J <- J.x * J.y
n.rep <- sample(2:4, size = J, replace = TRUE)
N <- 6
# Community-level covariate effects
# Occurrence
beta.mean <- c(0.2, 0.5)
p.occ <- length(beta.mean)
tau.sq.beta <- c(0.6, 0.3)
# Detection
alpha.mean <- c(0.5, 0.2, -0.1)
tau.sq.alpha <- c(0.2, 0.3, 1)
p.det <- length(alpha.mean)
# Draw species-level effects from community means.
beta <- matrix(NA, nrow = N, ncol = p.occ)
alpha <- matrix(NA, nrow = N, ncol = p.det)
for (i in 1:p.occ) {
  beta[, i] <- rnorm(N, beta.mean[i], sqrt(tau.sq.beta[i]))
}
for (i in 1:p.det) {
  alpha[, i] <- rnorm(N, alpha.mean[i], sqrt(tau.sq.alpha[i]))
}
n.factors <- 3
dat <- simMsOcc(J.x = J.x, J.y = J.y, n.rep = n.rep, N = N, beta = beta, alpha = alpha,
                sp = FALSE, factor.model = TRUE, n.factors = n.factors)
n.samples <- 5000
# Split into fitting and prediction data set
pred.indx <- sample(1:J, round(J * .25), replace = FALSE)
# Summarize the multiple replicates into a single value for use in a JSDM
y <- apply(dat$y[, -pred.indx, ], c(1, 2), max, na.rm = TRUE)
# Covariates
X <- dat$X[-pred.indx, ]
# Spatial coordinates
coords <- dat$coords[-pred.indx, ]
# Prediction values
X.0 <- dat$X[pred.indx, ]
psi.0 <- dat$psi[, pred.indx]
coords.0 <- dat$coords[pred.indx, ]
# Package all data into a list
covs <- X[, 2, drop = FALSE]
colnames(covs) <- c('occ.cov')
data.list <- list(y = y,
                  covs = covs,
                  coords = coords)
# Priors
prior.list <- list(beta.comm.normal = list(mean = 0, var = 2.72),
                   tau.sq.beta.ig = list(a = 0.1, b = 0.1))
# Initial values
lambda.inits <- matrix(0, N, n.factors)
diag(lambda.inits) <- 1
lambda.inits[lower.tri(lambda.inits)] <- rnorm(sum(lower.tri(lambda.inits)))
inits.list <- list(alpha.comm = 0,
                   beta.comm = 0,
                   beta = 0,
                   tau.sq.beta = 1,
                   lambda = lambda.inits)
# Note that this is just a test case and more iterations/chains may need to
# be run to ensure convergence.
out <- lfJSDM(formula = ~ occ.cov,
              data = data.list,
              inits = inits.list,
              n.samples = n.samples,
              n.factors = 3,
              priors = prior.list,
              n.omp.threads = 1,
              verbose = TRUE,
              n.report = 1000,
              n.burn = 4000)
summary(out)
# Predict at new locations ------------------------------------------------
out.pred <- predict(out, X.0, coords.0)
The function predict collects posterior predictive samples for a set of new locations given an object of class 'lfMsPGOcc'. Prediction is possible for both the latent occupancy state and detection.
## S3 method for class 'lfMsPGOcc'
predict(object, X.0, coords.0, ignore.RE = FALSE, type = 'occupancy', include.w = TRUE, ...)
object |
an object of class lfMsPGOcc |
X.0 |
the design matrix of covariates at the prediction locations. This should include a column of 1s for the intercept if an intercept is included in the model. If random effects are included in the occupancy (or detection if |
coords.0 |
the spatial coordinates corresponding to |
ignore.RE |
a logical value indicating whether to ignore unstructured random effects during prediction. If TRUE, random effects are ignored and prediction uses only the fixed effects. If FALSE, random effects are included in the prediction for both observed and unobserved levels of the random effect. |
... |
currently no additional arguments |
type |
a quoted keyword indicating what type of prediction to produce. Valid keywords are 'occupancy' to predict latent occupancy probability and latent occupancy values (this is the default), or 'detection' to predict detection probability given new values of detection covariates. |
include.w |
a logical value used to indicate whether the latent factors should be included in the predictions. By default, this is set to |
A list object of class predict.lfMsPGOcc. When type = 'occupancy', the list consists of:
psi.0.samples |
a three-dimensional array of posterior predictive samples for the latent occurrence probability values. |
z.0.samples |
a three-dimensional array of posterior predictive samples for the latent occurrence values. |
w.0.samples |
a three-dimensional array of posterior predictive samples for the latent factors. |
When type = 'detection', the list consists of:
p.0.samples |
a three-dimensional array of posterior predictive samples for the detection probability values. |
The return object will include additional objects used for standard extractor functions.
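To illustrate what including or excluding the latent factors (the include.w argument) means for occupancy predictions, the sketch below assumes the usual latent factor formulation, in which the logit-scale occupancy probability for each species is the covariate effect plus the species-specific factor loadings times the latent factors at each site. All objects (beta, lambda, w.0) are simulated for a single hypothetical posterior sample; this is not the package's internal code.

```r
set.seed(5)
N <- 4    # species
q <- 2    # latent factors
J.0 <- 3  # prediction sites
# Hypothetical values for one posterior sample (assumed for illustration).
X.0 <- cbind(1, rnorm(J.0))             # J.0 x 2 design matrix
beta <- matrix(rnorm(N * 2), N, 2)      # species-level coefficients
lambda <- matrix(rnorm(N * q), N, q)    # factor loadings
w.0 <- matrix(rnorm(q * J.0), q, J.0)   # latent factors at new sites

# Logit-scale linear predictor (N x J.0) without and with the factors.
eta.fixed <- beta %*% t(X.0)
eta.full <- eta.fixed + lambda %*% w.0

psi.without.w <- plogis(eta.fixed)  # analogous to include.w = FALSE
psi.with.w <- plogis(eta.full)      # analogous to include.w = TRUE
```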
When ignore.RE = FALSE, both sampled levels and non-sampled levels of random effects are supported for prediction. For sampled levels, the posterior distribution for the random intercept corresponding to that level of the random effect will be used in the prediction. For non-sampled levels, random values are drawn from a normal distribution using the posterior samples of the random effect variance, which results in fully propagated uncertainty in predictions with models that incorporate random effects.
Jeffrey W. Doser [email protected],
Andrew O. Finley [email protected]
set.seed(400)
J.x <- 8
J.y <- 8
J <- J.x * J.y
n.rep <- sample(2:4, size = J, replace = TRUE)
N <- 6
# Community-level covariate effects
# Occurrence
beta.mean <- c(0.2, 0.5)
p.occ <- length(beta.mean)
tau.sq.beta <- c(0.6, 0.3)
# Detection
alpha.mean <- c(0.5, 0.2, -0.1)
tau.sq.alpha <- c(0.2, 0.3, 1)
p.det <- length(alpha.mean)
# Draw species-level effects from community means.
beta <- matrix(NA, nrow = N, ncol = p.occ)
alpha <- matrix(NA, nrow = N, ncol = p.det)
for (i in 1:p.occ) {
  beta[, i] <- rnorm(N, beta.mean[i], sqrt(tau.sq.beta[i]))
}
for (i in 1:p.det) {
  alpha[, i] <- rnorm(N, alpha.mean[i], sqrt(tau.sq.alpha[i]))
}
n.factors <- 3
dat <- simMsOcc(J.x = J.x, J.y = J.y, n.rep = n.rep, N = N, beta = beta, alpha = alpha,
                sp = FALSE, factor.model = TRUE, n.factors = n.factors)
n.samples <- 5000
# Split into fitting and prediction data set
pred.indx <- sample(1:J, round(J * .25), replace = FALSE)
y <- dat$y[, -pred.indx, ]
# Occupancy covariates
X <- dat$X[-pred.indx, ]
# Spatial coordinates
coords <- dat$coords[-pred.indx, ]
# Detection covariates
X.p <- dat$X.p[-pred.indx, , ]
# Prediction values
X.0 <- dat$X[pred.indx, ]
psi.0 <- dat$psi[, pred.indx]
coords.0 <- dat$coords[pred.indx, ]
# Package all data into a list
occ.covs <- X[, 2, drop = FALSE]
colnames(occ.covs) <- c('occ.cov')
det.covs <- list(det.cov.1 = X.p[, , 2],
                 det.cov.2 = X.p[, , 3])
data.list <- list(y = y,
                  occ.covs = occ.covs,
                  det.covs = det.covs,
                  coords = coords)
# Priors
prior.list <- list(beta.comm.normal = list(mean = 0, var = 2.72),
                   alpha.comm.normal = list(mean = 0, var = 2.72),
                   tau.sq.beta.ig = list(a = 0.1, b = 0.1),
                   tau.sq.alpha.ig = list(a = 0.1, b = 0.1))
# Initial values
lambda.inits <- matrix(0, N, n.factors)
diag(lambda.inits) <- 1
lambda.inits[lower.tri(lambda.inits)] <- rnorm(sum(lower.tri(lambda.inits)))
inits.list <- list(alpha.comm = 0,
                   beta.comm = 0,
                   beta = 0,
                   alpha = 0,
                   tau.sq.beta = 1,
                   tau.sq.alpha = 1,
                   lambda = lambda.inits,
                   z = apply(y, c(1, 2), max, na.rm = TRUE))
# Note that this is just a test case and more iterations/chains may need to
# be run to ensure convergence.
out <- lfMsPGOcc(occ.formula = ~ occ.cov,
                 det.formula = ~ det.cov.1 + det.cov.2,
                 data = data.list,
                 inits = inits.list,
                 n.samples = n.samples,
                 n.factors = 3,
                 priors = prior.list,
                 n.omp.threads = 1,
                 verbose = TRUE,
                 n.report = 1000,
                 n.burn = 4000)
summary(out, level = 'community')
# Predict at new locations ------------------------------------------------
out.pred <- predict(out, X.0, coords.0)
The function predict collects posterior predictive samples for a set of new locations given an object of class 'msPGOcc'. Prediction is possible for both the latent occupancy state and detection.
## S3 method for class 'msPGOcc'
predict(object, X.0, ignore.RE = FALSE, type = 'occupancy', ...)
object |
an object of class msPGOcc |
X.0 |
the design matrix of covariates at the prediction locations. This should include a column of 1s for the intercept if an intercept is included in the model. If random effects are included in the occupancy (or detection if |
ignore.RE |
a logical value indicating whether to ignore unstructured random effects during prediction. If TRUE, random effects are ignored and prediction uses only the fixed effects. If FALSE, random effects are included in the prediction for both observed and unobserved levels of the random effect. |
... |
currently no additional arguments |
type |
a quoted keyword indicating what type of prediction to produce. Valid keywords are 'occupancy' to predict latent occupancy probability and latent occupancy values (this is the default), or 'detection' to predict detection probability given new values of detection covariates. |
A list object of class predict.msPGOcc. When type = 'occupancy', the list consists of:
psi.0.samples |
a three-dimensional array of posterior predictive samples for the latent occurrence probability values. |
z.0.samples |
a three-dimensional array of posterior predictive samples for the latent occurrence values. |
When type = 'detection', the list consists of:
p.0.samples |
a three-dimensional array of posterior predictive samples for the detection probability values. |
The return object will include additional objects used for standard extractor functions.
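The three-dimensional arrays described above are usually collapsed over the posterior-sample dimension before plotting or mapping. The sketch below uses a simulated stand-in array rather than a fitted model, and it assumes the array is laid out as posterior samples by species by prediction sites; that layout is an assumption to check against dim() on the actual return object.

```r
# Stand-in for out.pred$psi.0.samples (simulated, not from a fitted model).
# Assumed layout: posterior samples x species x prediction sites.
set.seed(1)
n.post <- 200
N <- 6
J.0 <- 16
psi.0.samples <- array(runif(n.post * N * J.0), dim = c(n.post, N, J.0))

# Posterior mean occupancy probability for each species at each site (N x J.0).
psi.0.mean <- apply(psi.0.samples, c(2, 3), mean)

# 95% credible interval bounds (2 x N x J.0: lower and upper).
psi.0.ci <- apply(psi.0.samples, c(2, 3), quantile, probs = c(0.025, 0.975))

dim(psi.0.mean)
dim(psi.0.ci)
```

The same pattern applies to z.0.samples and p.0.samples if they share this layout.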
When ignore.RE = FALSE, both sampled levels and non-sampled levels of random effects are supported for prediction. For sampled levels, the posterior distribution for the random intercept corresponding to that level of the random effect will be used in the prediction. For non-sampled levels, random values are drawn from a normal distribution using the posterior samples of the random effect variance, which results in fully propagated uncertainty in predictions with models that incorporate random effects.
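The treatment of sampled versus non-sampled random effect levels can be illustrated with a small base R sketch. This is not the package's internal code: all object names (sigma.sq.psi.samples, beta.star.sampled, eta.fixed) and the simulated posterior samples are hypothetical, and a logit link is assumed.

```r
# Hand-rolled illustration with simulated "posterior" samples.
set.seed(1)
n.post <- 1000

# Hypothetical posterior samples of the random effect variance.
sigma.sq.psi.samples <- 1 / rgamma(n.post, shape = 5, rate = 4)
# Hypothetical posterior samples of the random intercept for a sampled level.
beta.star.sampled <- rnorm(n.post, mean = 0.8, sd = 0.2)
# Hypothetical fixed-effect part of the linear predictor at one new site.
eta.fixed <- rep(0.3, n.post)

# Sampled level: reuse the posterior samples of that level's intercept.
psi.sampled <- plogis(eta.fixed + beta.star.sampled)

# Non-sampled level: one normal draw per posterior sample, with the
# variance taken from the matching posterior sample.
beta.star.new <- rnorm(n.post, mean = 0, sd = sqrt(sigma.sq.psi.samples))
psi.new <- plogis(eta.fixed + beta.star.new)

# Analogue of ignore.RE = TRUE: fixed effects only.
psi.fixed.only <- plogis(eta.fixed)

c(sd(psi.sampled), sd(psi.new), sd(psi.fixed.only))
```

In this sketch the prediction for the non-sampled level is noticeably more variable than for the sampled level, which is the propagated uncertainty the paragraph above refers to.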
Jeffrey W. Doser, Andrew O. Finley
set.seed(400)
J.x <- 8
J.y <- 8
J <- J.x * J.y
n.rep <- sample(2:4, size = J, replace = TRUE)
N <- 6
# Community-level covariate effects
# Occurrence
beta.mean <- c(0.2, 0.5)
p.occ <- length(beta.mean)
tau.sq.beta <- c(0.6, 0.3)
# Detection
alpha.mean <- c(0.5, 0.2, -0.1)
tau.sq.alpha <- c(0.2, 0.3, 1)
p.det <- length(alpha.mean)
# Draw species-level effects from community means.
beta <- matrix(NA, nrow = N, ncol = p.occ)
alpha <- matrix(NA, nrow = N, ncol = p.det)
for (i in 1:p.occ) {
  beta[, i] <- rnorm(N, beta.mean[i], sqrt(tau.sq.beta[i]))
}
for (i in 1:p.det) {
  alpha[, i] <- rnorm(N, alpha.mean[i], sqrt(tau.sq.alpha[i]))
}
dat <- simMsOcc(J.x = J.x, J.y = J.y, n.rep = n.rep, N = N, beta = beta,
                alpha = alpha, sp = FALSE)
n.samples <- 5000
# Split into fitting and prediction data set
pred.indx <- sample(1:J, round(J * .25), replace = FALSE)
y <- dat$y[, -pred.indx, ]
# Occupancy covariates
X <- dat$X[-pred.indx, ]
# Detection covariates
X.p <- dat$X.p[-pred.indx, , ]
# Prediction values
X.0 <- dat$X[pred.indx, ]
psi.0 <- dat$psi[, pred.indx]
# Package all data into a list
occ.covs <- X[, 2, drop = FALSE]
colnames(occ.covs) <- c('occ.cov')
det.covs <- list(det.cov.1 = X.p[, , 2], det.cov.2 = X.p[, , 3])
data.list <- list(y = y, occ.covs = occ.covs, det.covs = det.covs)
# Priors
prior.list <- list(beta.comm.normal = list(mean = 0, var = 2.72),
                   alpha.comm.normal = list(mean = 0, var = 2.72),
                   tau.sq.beta.ig = list(a = 0.1, b = 0.1),
                   tau.sq.alpha.ig = list(a = 0.1, b = 0.1))
# Initial values
inits.list <- list(alpha.comm = 0, beta.comm = 0, beta = 0, alpha = 0,
                   tau.sq.beta = 1, tau.sq.alpha = 1,
                   z = apply(y, c(1, 2), max, na.rm = TRUE))
# Note that this is just a test case and more iterations/chains may need to
# be run to ensure convergence.
out <- msPGOcc(occ.formula = ~ occ.cov,
               det.formula = ~ det.cov.1 + det.cov.2,
               data = data.list,
               inits = inits.list,
               n.samples = n.samples,
               priors = prior.list,
               n.omp.threads = 1,
               verbose = TRUE,
               n.report = 1000,
               n.burn = 4000)
summary(out, level = 'community')
# Predict at new locations ------------------------------------------------
out.pred <- predict(out, X.0)
The function predict collects posterior predictive samples for a set of new locations given an object of class 'PGOcc'. Prediction is possible for both the latent occupancy state and detection.
## S3 method for class 'PGOcc'
predict(object, X.0, ignore.RE = FALSE, type = 'occupancy', ...)
object |
an object of class PGOcc |
X.0 |
the design matrix of covariates at the prediction locations. This should include a column of 1s for the intercept if an intercept is included in the model. If random effects are included in the occupancy (or detection if |
ignore.RE |
logical value that specifies whether or not to remove random occurrence (or detection if |
type |
a quoted keyword indicating what type of prediction to produce. Valid keywords are 'occupancy' to predict latent occupancy probability and latent occupancy values (this is the default), or 'detection' to predict detection probability given new values of detection covariates. |
... |
currently no additional arguments |
A list object of class predict.PGOcc. When type = 'occupancy', the list consists of:
psi.0.samples |
a |
z.0.samples |
a |
When type = 'detection', the list consists of:
p.0.samples |
a |
The return object will include additional objects used for standard extractor functions.
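To make the relationship between the occupancy outputs concrete, the following base R sketch shows how occupancy probabilities and latent occupancy states could be generated by hand from posterior samples of the occupancy coefficients under a logit link. It is a conceptual illustration only, not the package's implementation; beta.samples is simulated here, and the samples-by-sites layout is an assumption suggested by the apply(..., 2, quantile, ...) call in this page's example.

```r
# Conceptual sketch with simulated inputs (no fitted model required).
set.seed(1)
n.post <- 500
J.0 <- 25

# Hypothetical posterior samples of (intercept, occ.cov) coefficients.
beta.samples <- cbind(rnorm(n.post, 0.5, 0.2), rnorm(n.post, 2, 0.3))
# Design matrix at the prediction sites, with a column of 1s for the intercept.
X.0 <- cbind(1, rnorm(J.0))

# Occupancy probability: inverse logit of the linear predictor (n.post x J.0).
psi.0.samples <- plogis(beta.samples %*% t(X.0))

# Latent occupancy state: one Bernoulli draw per posterior sample and site.
z.0.samples <- matrix(rbinom(n.post * J.0, 1, psi.0.samples), n.post, J.0)

# Site-level posterior medians and 95% credible intervals.
psi.0.quants <- apply(psi.0.samples, 2, quantile, c(0.025, 0.5, 0.975))
dim(psi.0.quants)
```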
When ignore.RE = FALSE, both sampled levels and non-sampled levels of random effects are supported for prediction. For sampled levels, the posterior distribution for the random intercept corresponding to that level of the random effect will be used in the prediction. For non-sampled levels, random values are drawn from a normal distribution using the posterior samples of the random effect variance, which results in fully propagated uncertainty in predictions with models that incorporate random effects.
Jeffrey W. Doser, Andrew O. Finley
set.seed(400)
# Simulate Data -----------------------------------------------------------
J.x <- 10
J.y <- 10
J <- J.x * J.y
n.rep <- sample(2:4, J, replace = TRUE)
beta <- c(0.5, 2)
p.occ <- length(beta)
alpha <- c(0, 1)
p.det <- length(alpha)
dat <- simOcc(J.x = J.x, J.y = J.y, n.rep = n.rep, beta = beta, alpha = alpha,
              sp = FALSE)
# Split into fitting and prediction data set
pred.indx <- sample(1:J, round(J * .25), replace = FALSE)
y <- dat$y[-pred.indx, ]
# Occupancy covariates
X <- dat$X[-pred.indx, ]
# Prediction covariates
X.0 <- dat$X[pred.indx, ]
# Detection covariates
X.p <- dat$X.p[-pred.indx, , ]
# Package all data into a list
occ.covs <- X[, 2, drop = FALSE]
colnames(occ.covs) <- c('occ.cov')
det.covs <- list(det.cov = X.p[, , 2])
data.list <- list(y = y, occ.covs = occ.covs, det.covs = det.covs)
# Priors
prior.list <- list(beta.normal = list(mean = rep(0, p.occ),
                                      var = rep(2.72, p.occ)),
                   alpha.normal = list(mean = rep(0, p.det),
                                       var = rep(2.72, p.det)))
# Initial values
inits.list <- list(alpha = rep(0, p.det),
                   beta = rep(0, p.occ),
                   z = apply(y, 1, max, na.rm = TRUE))
n.samples <- 5000
n.report <- 1000
# Note that this is just a test case and more iterations/chains may need to
# be run to ensure convergence.
out <- PGOcc(occ.formula = ~ occ.cov,
             det.formula = ~ det.cov,
             data = data.list,
             inits = inits.list,
             n.samples = n.samples,
             priors = prior.list,
             n.omp.threads = 1,
             verbose = TRUE,
             n.report = n.report,
             n.burn = 4000,
             n.thin = 1)
summary(out)
# Predict at new locations ------------------------------------------------
colnames(X.0) <- c('intercept', 'occ.cov')
out.pred <- predict(out, X.0)
psi.0.quants <- apply(out.pred$psi.0.samples, 2, quantile,
                      c(0.025, 0.5, 0.975))
plot(dat$psi[pred.indx], psi.0.quants[2, ], pch = 19, xlab = 'True',
     ylab = 'Fitted', ylim = c(min(psi.0.quants), max(psi.0.quants)))
segments(dat$psi[pred.indx], psi.0.quants[1, ],
         dat$psi[pred.indx], psi.0.quants[3, ])
lines(dat$psi[pred.indx], dat$psi[pred.indx])
The function predict collects posterior predictive samples for a set of new locations given an object of class 'sfJSDM'.
## S3 method for class 'sfJSDM'
predict(object, X.0, coords.0, n.omp.threads = 1, verbose = TRUE,
        n.report = 100, ignore.RE = FALSE, ...)
object |
an object of class sfJSDM |
X.0 |
the design matrix of covariates at the prediction locations. This should include a column of 1s for the intercept if an intercept is included in the model. If random effects are included in the model, the levels of the random effects at the new locations should be included as a column in the design matrix. The ordering of the levels should match the ordering used to fit the data in |
coords.0 |
the spatial coordinates corresponding to |
n.omp.threads |
a positive integer indicating the number of threads to use for SMP parallel processing. The package must be compiled for OpenMP support. For most Intel-based machines, we recommend setting |
verbose |
if |
n.report |
the interval to report sampling progress. |
ignore.RE |
a logical value indicating whether to include unstructured random effects for prediction. If TRUE, unstructured random effects will be ignored and prediction will only use the fixed effects and the spatial random effects. If FALSE, random effects will be included in the prediction for both observed and unobserved levels of the unstructured random effects. |
... |
currently no additional arguments |
A list object of class predict.sfJSDM that consists of:
psi.0.samples |
a three-dimensional array of posterior predictive samples for the latent occurrence probability values. |
z.0.samples |
a three-dimensional array of posterior predictive samples for the latent occurrence values. |
w.0.samples |
a three-dimensional array of posterior predictive samples for the latent spatial factors. |
run.time |
execution time reported using |
The return object will include additional objects used for standard extractor functions.
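The latent spatial factors in w.0.samples enter the occurrence probabilities through species-specific factor loadings. The sketch below illustrates that relationship in base R with simulated stand-ins. It is an illustration under assumptions, not the package's code: the object names lambda.samples and beta.samples, the array layouts (samples first), and the logit link are all assumed here.

```r
# Simulated stand-ins for posterior samples (no fitted model required).
set.seed(1)
n.post <- 100
N <- 5          # species
n.factors <- 3  # latent spatial factors
J.0 <- 12       # prediction sites

# Assumed layouts: samples x species x factors, samples x factors x sites,
# and samples x species x coefficients.
lambda.samples <- array(rnorm(n.post * N * n.factors),
                        dim = c(n.post, N, n.factors))
w.0.samples <- array(rnorm(n.post * n.factors * J.0),
                     dim = c(n.post, n.factors, J.0))
beta.samples <- array(rnorm(n.post * N * 2, 0, 0.5), dim = c(n.post, N, 2))
X.0 <- cbind(1, rnorm(J.0))

psi.0.samples <- array(NA_real_, dim = c(n.post, N, J.0))
for (l in seq_len(n.post)) {
  # Species-specific spatial effect: loadings times factors (N x J.0).
  w.star <- lambda.samples[l, , ] %*% w.0.samples[l, , ]
  # Linear predictor: covariate effects plus spatial effect (N x J.0).
  eta <- beta.samples[l, , ] %*% t(X.0) + w.star
  psi.0.samples[l, , ] <- plogis(eta)
}
dim(psi.0.samples)
```

Because every species shares the same n.factors spatial factors, only those factors need to be predicted at new locations, rather than one spatial process per species.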
When ignore.RE = FALSE, both sampled levels and non-sampled levels of random effects are supported for prediction. For sampled levels, the posterior distribution for the random intercept corresponding to that level of the random effect will be used in the prediction. For non-sampled levels, random values are drawn from a normal distribution using the posterior samples of the random effect variance, which results in fully propagated uncertainty in predictions with models that incorporate random effects.
Jeffrey W. Doser, Andrew O. Finley
set.seed(400)
# Simulate Data -----------------------------------------------------------
J.x <- 7
J.y <- 7
J <- J.x * J.y
n.rep <- sample(2:4, size = J, replace = TRUE)
N <- 5
# Community-level covariate effects
# Occurrence
beta.mean <- c(0.2, -0.15)
p.occ <- length(beta.mean)
tau.sq.beta <- c(0.6, 0.3)
# Detection
alpha.mean <- c(0.5, 0.2, -.2)
tau.sq.alpha <- c(0.2, 0.3, 0.8)
p.det <- length(alpha.mean)
# Draw species-level effects from community means.
beta <- matrix(NA, nrow = N, ncol = p.occ)
alpha <- matrix(NA, nrow = N, ncol = p.det)
for (i in 1:p.occ) {
  beta[, i] <- rnorm(N, beta.mean[i], sqrt(tau.sq.beta[i]))
}
for (i in 1:p.det) {
  alpha[, i] <- rnorm(N, alpha.mean[i], sqrt(tau.sq.alpha[i]))
}
n.factors <- 3
phi <- runif(n.factors, 3/1, 3/.4)
sp <- TRUE
dat <- simMsOcc(J.x = J.x, J.y = J.y, n.rep = n.rep, N = N, beta = beta,
                alpha = alpha, phi = phi, sigma.sq = sigma.sq, sp = TRUE,
                cov.model = 'exponential', factor.model = TRUE,
                n.factors = n.factors)
# Number of batches
n.batch <- 10
# Batch length
batch.length <- 25
n.samples <- n.batch * batch.length
# Split into fitting and prediction data set
pred.indx <- sample(1:J, round(J * .25), replace = FALSE)
# Summarize the multiple replicates into a single value for use in a JSDM
y <- apply(dat$y[, -pred.indx, ], c(1, 2), max, na.rm = TRUE)
# Occupancy covariates
X <- dat$X[-pred.indx, ]
# Coordinates
coords <- as.matrix(dat$coords[-pred.indx, ])
# Prediction values
X.0 <- dat$X[pred.indx, ]
coords.0 <- as.matrix(dat$coords[pred.indx, ])
psi.0 <- dat$psi[, pred.indx]
# Package all data into a list
covs <- X[, 2, drop = FALSE]
colnames(covs) <- c('occ.cov')
data.list <- list(y = y, covs = covs, coords = coords)
# Priors
prior.list <- list(beta.comm.normal = list(mean = 0, var = 2.72),
                   tau.sq.beta.ig = list(a = 0.1, b = 0.1),
                   phi.unif = list(a = 3/1, b = 3/.1))
# Starting values
lambda.inits <- matrix(0, N, n.factors)
diag(lambda.inits) <- 1
lambda.inits[lower.tri(lambda.inits)] <- rnorm(sum(lower.tri(lambda.inits)))
inits.list <- list(beta.comm = 0, beta = 0, tau.sq.beta = 1,
                   phi = 3 / .5, sigma.sq = 2, lambda = lambda.inits)
# Tuning
tuning.list <- list(phi = 1)
# Note that this is just a test case and more iterations/chains may need to
# be run to ensure convergence.
out <- sfJSDM(formula = ~ occ.cov,
              data = data.list,
              inits = inits.list,
              n.batch = n.batch,
              batch.length = batch.length,
              accept.rate = 0.43,
              n.factors = 3,
              priors = prior.list,
              cov.model = "exponential",
              tuning = tuning.list,
              n.omp.threads = 1,
              verbose = TRUE,
              NNGP = TRUE,
              n.neighbors = 5,
              search.type = 'cb',
              n.report = 10,
              n.burn = 100,
              n.thin = 1)
summary(out, level = 'both')
# Predict at new locations ------------------------------------------------
out.pred <- predict(out, X.0, coords.0, verbose = FALSE)
The function predict collects posterior predictive samples for a set of new locations given an object of class 'sfMsPGOcc'. Prediction is possible for both the latent occupancy state and detection.
## S3 method for class 'sfMsPGOcc'
predict(object, X.0, coords.0, n.omp.threads = 1, verbose = TRUE,
        n.report = 100, ignore.RE = FALSE, type = 'occupancy',
        grid.index.0, ...)
object |
an object of class sfMsPGOcc |
X.0 |
the design matrix of covariates at the prediction locations. This should include a column of 1s for the intercept if an intercept is included in the model. If random effects are included in the occupancy (or detection if |
coords.0 |
the spatial coordinates corresponding to |
n.omp.threads |
a positive integer indicating the number of threads to use for SMP parallel processing. The package must be compiled for OpenMP support. For most Intel-based machines, we recommend setting |
verbose |
if |
n.report |
the interval to report sampling progress. |
ignore.RE |
a logical value indicating whether to include unstructured random effects for prediction. If TRUE, unstructured random effects will be ignored and prediction will only use the fixed effects and the spatial random effects. If FALSE, random effects will be included in the prediction for both observed and unobserved levels of the unstructured random effects. |
type |
a quoted keyword indicating what type of prediction to produce. Valid keywords are 'occupancy' to predict latent occupancy probability and latent occupancy values (this is the default), or 'detection' to predict detection probability given new values of detection covariates. |
grid.index.0 |
an indexing vector used to specify how each row in |
... |
currently no additional arguments |
A list object of class predict.sfMsPGOcc. When type = 'occupancy', the list consists of:
psi.0.samples |
a three-dimensional array of posterior predictive samples for the latent occurrence probability values. |
z.0.samples |
a three-dimensional array of posterior predictive samples for the latent occurrence values. |
w.0.samples |
a three-dimensional array of posterior predictive samples for the latent spatial factors. |
run.time |
execution time reported using |
When type = 'detection', the list consists of:
p.0.samples |
a three-dimensional array of posterior predictive samples for the detection probability values. |
run.time |
execution time reported using |
The return object will include additional objects used for standard extractor functions.
When ignore.RE = FALSE, both sampled levels and non-sampled levels of random effects are supported for prediction. For sampled levels, the posterior distribution for the random intercept corresponding to that level of the random effect will be used in the prediction. For non-sampled levels, random values are drawn from a normal distribution using the posterior samples of the random effect variance, which results in fully propagated uncertainty in predictions with models that incorporate random effects.
Jeffrey W. Doser, Andrew O. Finley
set.seed(400)
# Simulate Data -----------------------------------------------------------
J.x <- 7
J.y <- 7
J <- J.x * J.y
n.rep <- sample(2:4, size = J, replace = TRUE)
N <- 5
# Community-level covariate effects
# Occurrence
beta.mean <- c(0.2, -0.15)
p.occ <- length(beta.mean)
tau.sq.beta <- c(0.6, 0.3)
# Detection
alpha.mean <- c(0.5, 0.2, -.2)
tau.sq.alpha <- c(0.2, 0.3, 0.8)
p.det <- length(alpha.mean)
# Draw species-level effects from community means.
beta <- matrix(NA, nrow = N, ncol = p.occ)
alpha <- matrix(NA, nrow = N, ncol = p.det)
for (i in 1:p.occ) {
  beta[, i] <- rnorm(N, beta.mean[i], sqrt(tau.sq.beta[i]))
}
for (i in 1:p.det) {
  alpha[, i] <- rnorm(N, alpha.mean[i], sqrt(tau.sq.alpha[i]))
}
n.factors <- 3
phi <- runif(n.factors, 3/1, 3/.4)
sp <- TRUE
dat <- simMsOcc(J.x = J.x, J.y = J.y, n.rep = n.rep, N = N, beta = beta,
                alpha = alpha, phi = phi, sigma.sq = sigma.sq, sp = TRUE,
                cov.model = 'exponential', factor.model = TRUE,
                n.factors = n.factors)
# Number of batches
n.batch <- 10
# Batch length
batch.length <- 25
n.samples <- n.batch * batch.length
# Split into fitting and prediction data set
pred.indx <- sample(1:J, round(J * .25), replace = FALSE)
y <- dat$y[, -pred.indx, ]
# Occupancy covariates
X <- dat$X[-pred.indx, ]
# Coordinates
coords <- as.matrix(dat$coords[-pred.indx, ])
# Detection covariates
X.p <- dat$X.p[-pred.indx, , ]
# Prediction values
X.0 <- dat$X[pred.indx, ]
coords.0 <- as.matrix(dat$coords[pred.indx, ])
psi.0 <- dat$psi[, pred.indx]
# Package all data into a list
occ.covs <- X[, 2, drop = FALSE]
colnames(occ.covs) <- c('occ.cov')
det.covs <- list(det.cov.1 = X.p[, , 2], det.cov.2 = X.p[, , 3])
data.list <- list(y = y, occ.covs = occ.covs, det.covs = det.covs,
                  coords = coords)
# Priors
prior.list <- list(beta.comm.normal = list(mean = 0, var = 2.72),
                   alpha.comm.normal = list(mean = 0, var = 2.72),
                   tau.sq.beta.ig = list(a = 0.1, b = 0.1),
                   tau.sq.alpha.ig = list(a = 0.1, b = 0.1),
                   phi.unif = list(a = 3/1, b = 3/.1),
                   sigma.sq.ig = list(a = 2, b = 2))
# Starting values
lambda.inits <- matrix(0, N, n.factors)
diag(lambda.inits) <- 1
lambda.inits[lower.tri(lambda.inits)] <- rnorm(sum(lower.tri(lambda.inits)))
inits.list <- list(alpha.comm = 0, beta.comm = 0, beta = 0, alpha = 0,
                   tau.sq.beta = 1, tau.sq.alpha = 1, phi = 3 / .5,
                   sigma.sq = 2, lambda = lambda.inits,
                   z = apply(y, c(1, 2), max, na.rm = TRUE))
# Tuning
tuning.list <- list(phi = 1)
# Note that this is just a test case and more iterations/chains may need to
# be run to ensure convergence.
out <- sfMsPGOcc(occ.formula = ~ occ.cov,
                 det.formula = ~ det.cov.1 + det.cov.2,
                 data = data.list,
                 inits = inits.list,
                 n.batch = n.batch,
                 batch.length = batch.length,
                 accept.rate = 0.43,
                 n.factors = 3,
                 priors = prior.list,
                 cov.model = "exponential",
                 tuning = tuning.list,
                 n.omp.threads = 1,
                 verbose = TRUE,
                 NNGP = TRUE,
                 n.neighbors = 5,
                 search.type = 'cb',
                 n.report = 10,
                 n.burn = 100,
                 n.thin = 1)
summary(out, level = 'both')
# Predict at new locations ------------------------------------------------
out.pred <- predict(out, X.0, coords.0, verbose = FALSE)
The function predict collects posterior predictive samples for a set of new locations given an object of class 'spIntPGOcc'.
## S3 method for class 'spIntPGOcc'
predict(object, X.0, coords.0, n.omp.threads = 1, verbose = TRUE,
        n.report = 100, ignore.RE = FALSE, type = 'occupancy', ...)
object |
an object of class |
X.0 |
the design matrix for prediction locations. This should include a column of 1s for the intercept. Covariates should have the same column names as those used when fitting the model with |
coords.0 |
the spatial coordinates corresponding to |
n.omp.threads |
a positive integer indicating the number of threads to use for SMP parallel processing. The package must be compiled for OpenMP support. For most Intel-based machines, we recommend setting |
verbose |
if |
n.report |
the interval to report sampling progress. |
ignore.RE |
logical value that specifies whether or not to remove random occurrence (or detection if |
type |
a quoted keyword indicating what type of prediction to produce. Valid keywords are 'occupancy' to predict latent occupancy probability and latent occupancy values (this is the default), or 'detection' to predict detection probability given new values of detection covariates. Note that prediction of detection probability is not currently supported for integrated models. |
... |
currently no additional arguments |
An object of class predict.spIntPGOcc that is a list composed of:
psi.0.samples |
a |
z.0.samples |
a |
The return object will include additional objects used for standard extractor functions.
Jeffrey W. Doser, Andrew O. Finley
Hooten, M. B., and Hefley, T. J. (2019). Bringing Bayesian models to life. CRC Press.
The function predict collects posterior predictive samples for a set of new locations given an object of class 'spMsPGOcc'. Prediction is possible for both the latent occupancy state and detection.
## S3 method for class 'spMsPGOcc'
predict(object, X.0, coords.0, n.omp.threads = 1, verbose = TRUE,
        n.report = 100, ignore.RE = FALSE, type = 'occupancy', ...)
object |
an object of class spMsPGOcc |
X.0 |
the design matrix of covariates at the prediction locations. This should include a column of 1s for the intercept if an intercept is included in the model. If random effects are included in the occupancy (or detection if |
coords.0 |
the spatial coordinates corresponding to X.0. |
n.omp.threads |
a positive integer indicating
the number of threads to use for SMP parallel processing. The package must
be compiled for OpenMP support. For most Intel-based machines, we recommend setting
|
verbose |
if |
n.report |
the interval to report sampling progress. |
ignore.RE |
a logical value indicating whether to ignore unstructured random effects for prediction. If TRUE, unstructured random effects will be ignored and prediction will only use the fixed effects and the spatial random effects. If FALSE, random effects will be included in the prediction for both observed and unobserved levels of the unstructured random effects. |
type |
a quoted keyword indicating what type of prediction to produce. Valid keywords are 'occupancy' to predict latent occupancy probability and latent occupancy values (this is the default), or 'detection' to predict detection probability given new values of detection covariates. |
... |
currently no additional arguments |
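The requirement that X.0 carry a column of 1s for the intercept can be met with base R's model.matrix. The following is a minimal sketch, assuming a model fit with occ.formula = ~ occ.cov; the covariate values and coordinates are made up for illustration and are not package output.

```r
# Hypothetical covariate values at five prediction locations (illustrative only).
pred.covs <- data.frame(occ.cov = c(-1.2, 0.3, 0.8, -0.4, 1.5))

# model.matrix() prepends the "(Intercept)" column of 1s, so the columns line
# up with a model fit using occ.formula = ~ occ.cov.
X.0 <- model.matrix(~ occ.cov, data = pred.covs)

# Matching coordinates: one row per prediction location, in the same row
# order as X.0 (values are made up).
coords.0 <- cbind(x = c(0.1, 0.4, 0.5, 0.7, 0.9),
                  y = c(0.2, 0.9, 0.5, 0.1, 0.6))

dim(X.0)
```

The rows of X.0 and coords.0 must refer to the same locations in the same order, since the prediction pairs them row by row.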
A list object of class predict.spMsPGOcc. When type = 'occupancy', the list consists of:
psi.0.samples |
a three-dimensional array of posterior predictive samples for the latent occurrence probability values. |
z.0.samples |
a three-dimensional array of posterior predictive samples for the latent occurrence values. |
w.0.samples |
a three-dimensional array of posterior predictive samples for the latent spatial random effects. |
run.time |
execution time reported using |
When type = 'detection', the list consists of:
p.0.samples |
a three-dimensional array of posterior predictive samples for the detection probability values. |
run.time |
execution time reported using |
The return object will include additional objects used for standard extractor functions.
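Three-dimensional arrays of posterior predictive samples are usually reduced to point estimates and intervals before mapping. The sketch below uses a simulated stand-in for psi.0.samples and assumes the dimension order is MCMC sample, species, site; that order is an assumption here, so check it with dim() on a real prediction object.

```r
set.seed(1)
n.samples <- 200  # number of posterior predictive samples (made up)
N <- 3            # number of species (made up)
J.0 <- 4          # number of prediction sites (made up)

# Stand-in for out.pred$psi.0.samples; real values come from predict().
psi.0.samples <- array(plogis(rnorm(n.samples * N * J.0)),
                       dim = c(n.samples, N, J.0))

# Posterior mean occupancy probability for each species at each site.
psi.0.mean <- apply(psi.0.samples, c(2, 3), mean)

# 95% credible interval bounds; the first dimension indexes lower/upper.
psi.0.ci <- apply(psi.0.samples, c(2, 3), quantile, probs = c(0.025, 0.975))

dim(psi.0.mean)
dim(psi.0.ci)
```

The same pattern applies to z.0.samples, where the mean over samples gives the posterior probability that each species occupies each site.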
When ignore.RE = FALSE, both sampled levels and non-sampled levels of random effects are supported for prediction. For sampled levels, the posterior distribution for the random intercept corresponding to that level of the random effect will be used in the prediction. For non-sampled levels, random values are drawn from a normal distribution using the posterior samples of the random effect variance, which results in fully propagated uncertainty in predictions with models that incorporate random effects.
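The treatment of non-sampled random effect levels can be illustrated outside the package. This is a minimal sketch, not the package's internal code: it assumes a logit link and uses made-up stand-ins (sigma.sq.samples, linear.pred.samples) for the posterior samples of the random effect variance and of the fixed-effect linear predictor at one new site.

```r
set.seed(42)
n.samples <- 1000

# Hypothetical posterior samples of the random effect variance.
sigma.sq.samples <- 1 / rgamma(n.samples, shape = 5, rate = 4)
# Hypothetical posterior samples of the fixed-effect linear predictor.
linear.pred.samples <- rnorm(n.samples, mean = 0.3, sd = 0.2)

# For a non-sampled level, draw one random intercept per posterior sample,
# using that sample's variance. This carries variance uncertainty forward.
re.draws <- rnorm(n.samples, mean = 0, sd = sqrt(sigma.sq.samples))

# Occupancy probability with and without the random effect.
psi.0.with.re <- plogis(linear.pred.samples + re.draws)
psi.0.without.re <- plogis(linear.pred.samples)

c(with.re = sd(psi.0.with.re), without.re = sd(psi.0.without.re))
```

The wider spread of psi.0.with.re reflects the extra uncertainty from predicting at a level of the random effect that was never observed.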
Jeffrey W. Doser, Andrew O. Finley
set.seed(400)

# Simulate Data -----------------------------------------------------------
J.x <- 7
J.y <- 7
J <- J.x * J.y
n.rep <- sample(2:4, size = J, replace = TRUE)
N <- 5
# Community-level covariate effects
# Occurrence
beta.mean <- c(0.2, -0.15)
p.occ <- length(beta.mean)
tau.sq.beta <- c(0.6, 0.3)
# Detection
alpha.mean <- c(0.5, 0.2, -.2)
tau.sq.alpha <- c(0.2, 0.3, 0.8)
p.det <- length(alpha.mean)
# Draw species-level effects from community means.
beta <- matrix(NA, nrow = N, ncol = p.occ)
alpha <- matrix(NA, nrow = N, ncol = p.det)
for (i in 1:p.occ) {
  beta[, i] <- rnorm(N, beta.mean[i], sqrt(tau.sq.beta[i]))
}
for (i in 1:p.det) {
  alpha[, i] <- rnorm(N, alpha.mean[i], sqrt(tau.sq.alpha[i]))
}
phi <- runif(N, 3/1, 3/.4)
sigma.sq <- runif(N, 0.3, 3)
sp <- TRUE

dat <- simMsOcc(J.x = J.x, J.y = J.y, n.rep = n.rep, N = N, beta = beta,
                alpha = alpha, phi = phi, sigma.sq = sigma.sq, sp = TRUE,
                cov.model = 'exponential')

# Number of batches
n.batch <- 30
# Batch length
batch.length <- 25
n.samples <- n.batch * batch.length

# Split into fitting and prediction data set
pred.indx <- sample(1:J, round(J * .25), replace = FALSE)
y <- dat$y[, -pred.indx, ]
# Occupancy covariates
X <- dat$X[-pred.indx, ]
# Coordinates
coords <- as.matrix(dat$coords[-pred.indx, ])
# Detection covariates
X.p <- dat$X.p[-pred.indx, , ]
# Prediction values
X.0 <- dat$X[pred.indx, ]
coords.0 <- as.matrix(dat$coords[pred.indx, ])
psi.0 <- dat$psi[, pred.indx]

# Package all data into a list
occ.covs <- X[, 2, drop = FALSE]
colnames(occ.covs) <- c('occ.cov')
det.covs <- list(det.cov.1 = X.p[, , 2],
                 det.cov.2 = X.p[, , 3])
data.list <- list(y = y,
                  occ.covs = occ.covs,
                  det.covs = det.covs,
                  coords = coords)

# Priors
prior.list <- list(beta.comm.normal = list(mean = 0, var = 2.72),
                   alpha.comm.normal = list(mean = 0, var = 2.72),
                   tau.sq.beta.ig = list(a = 0.1, b = 0.1),
                   tau.sq.alpha.ig = list(a = 0.1, b = 0.1),
                   phi.unif = list(a = 3/1, b = 3/.1),
                   sigma.sq.ig = list(a = 2, b = 2))
# Starting values
inits.list <- list(alpha.comm = 0,
                   beta.comm = 0,
                   beta = 0,
                   alpha = 0,
                   tau.sq.beta = 1,
                   tau.sq.alpha = 1,
                   phi = 3 / .5,
                   sigma.sq = 2,
                   w = matrix(0, nrow = N, ncol = nrow(X)),
                   z = apply(y, c(1, 2), max, na.rm = TRUE))
# Tuning
tuning.list <- list(phi = 1)

# Note that this is just a test case and more iterations/chains may need to
# be run to ensure convergence.
out <- spMsPGOcc(occ.formula = ~ occ.cov,
                 det.formula = ~ det.cov.1 + det.cov.2,
                 data = data.list,
                 inits = inits.list,
                 n.batch = n.batch,
                 batch.length = batch.length,
                 accept.rate = 0.43,
                 priors = prior.list,
                 cov.model = "exponential",
                 tuning = tuning.list,
                 n.omp.threads = 1,
                 verbose = TRUE,
                 NNGP = TRUE,
                 n.neighbors = 5,
                 search.type = 'cb',
                 n.report = 10,
                 n.burn = 500,
                 n.thin = 1)
summary(out, level = 'both')

# Predict at new locations ------------------------------------------------
out.pred <- predict(out, X.0, coords.0, verbose = FALSE)
The function predict collects posterior predictive samples for a set of new locations given an object of class 'spPGOcc'. Prediction is possible for both the latent occupancy state and detection.
## S3 method for class 'spPGOcc'
predict(object, X.0, coords.0, n.omp.threads = 1, verbose = TRUE,
        n.report = 100, ignore.RE = FALSE, type = 'occupancy',
        grid.index.0, ...)
object |
an object of class spPGOcc |
X.0 |
the design matrix of covariates at the prediction locations. This should include a column of 1s for the intercept if an intercept is included in the model. If random effects are included in the occupancy (or detection if |
coords.0 |
the spatial coordinates corresponding to X.0. |
n.omp.threads |
a positive integer indicating
the number of threads to use for SMP parallel processing. The package must
be compiled for OpenMP support. For most Intel-based machines, we recommend setting
|
verbose |
if |
ignore.RE |
a logical value indicating whether to ignore unstructured random effects for prediction. If TRUE, unstructured random effects will be ignored and prediction will only use the fixed effects and the spatial random effects. If FALSE, random effects will be included in the prediction for both observed and unobserved levels of the unstructured random effects. |
n.report |
the interval to report sampling progress. |
type |
a quoted keyword indicating what type of prediction to produce. Valid keywords are 'occupancy' to predict latent occupancy probability and latent occupancy values (this is the default), or 'detection' to predict detection probability given new values of detection covariates. |
grid.index.0 |
an indexing vector used to specify how each row in |
... |
currently no additional arguments |
A list object of class predict.spPGOcc. When type = 'occupancy', the list consists of:
psi.0.samples |
a |
z.0.samples |
a |
w.0.samples |
a |
run.time |
execution time reported using |
When type = 'detection', the list consists of:
p.0.samples |
a |
run.time |
execution time reported using |
The return object will include additional objects used for standard extractor functions.
When ignore.RE = FALSE, both sampled levels and non-sampled levels of random effects are supported for prediction. For sampled levels, the posterior distribution for the random intercept corresponding to that level of the random effect will be used in the prediction. For non-sampled levels, random values are drawn from a normal distribution using the posterior samples of the random effect variance, which results in fully propagated uncertainty in predictions with models that incorporate random effects.
Jeffrey W. Doser, Andrew O. Finley
Hooten, M. B., and Hefley, T. J. (2019). Bringing Bayesian models to life. CRC Press.
set.seed(400)

# Simulate Data -----------------------------------------------------------
J.x <- 8
J.y <- 8
J <- J.x * J.y
n.rep <- sample(2:4, J, replace = TRUE)
beta <- c(0.5, 2)
p.occ <- length(beta)
alpha <- c(0, 1)
p.det <- length(alpha)
phi <- 3 / .6
sigma.sq <- 2
dat <- simOcc(J.x = J.x, J.y = J.y, n.rep = n.rep, beta = beta, alpha = alpha,
              sigma.sq = sigma.sq, phi = phi, sp = TRUE,
              cov.model = 'exponential')

# Split into fitting and prediction data set
pred.indx <- sample(1:J, round(J * .5), replace = FALSE)
y <- dat$y[-pred.indx, ]
# Occupancy covariates
X <- dat$X[-pred.indx, ]
# Prediction covariates
X.0 <- dat$X[pred.indx, ]
# Detection covariates
X.p <- dat$X.p[-pred.indx, , ]
coords <- as.matrix(dat$coords[-pred.indx, ])
coords.0 <- as.matrix(dat$coords[pred.indx, ])
psi.0 <- dat$psi[pred.indx]
w.0 <- dat$w[pred.indx]

# Package all data into a list
occ.covs <- X[, -1, drop = FALSE]
colnames(occ.covs) <- c('occ.cov')
det.covs <- list(det.cov.1 = X.p[, , 2])
data.list <- list(y = y,
                  occ.covs = occ.covs,
                  det.covs = det.covs,
                  coords = coords)

# Number of batches
n.batch <- 10
# Batch length
batch.length <- 25
n.iter <- n.batch * batch.length
# Priors
prior.list <- list(beta.normal = list(mean = 0, var = 2.72),
                   alpha.normal = list(mean = 0, var = 2.72),
                   sigma.sq.ig = c(2, 2),
                   phi.unif = c(3/1, 3/.1))
# Initial values
inits.list <- list(alpha = 0, beta = 0,
                   phi = 3 / .5,
                   sigma.sq = 2,
                   w = rep(0, nrow(X)),
                   z = apply(y, 1, max, na.rm = TRUE))
# Tuning
tuning.list <- list(phi = 1)

# Note that this is just a test case and more iterations/chains may need to
# be run to ensure convergence.
out <- spPGOcc(occ.formula = ~ occ.cov,
               det.formula = ~ det.cov.1,
               data = data.list,
               inits = inits.list,
               n.batch = n.batch,
               batch.length = batch.length,
               accept.rate = 0.43,
               priors = prior.list,
               cov.model = 'exponential',
               tuning = tuning.list,
               n.omp.threads = 1,
               verbose = TRUE,
               NNGP = FALSE,
               n.neighbors = 15,
               search.type = 'cb',
               n.report = 10,
               n.burn = 50,
               n.thin = 1)
summary(out)

# Predict at new locations ------------------------------------------------
out.pred <- predict(out, X.0, coords.0, verbose = FALSE)
The function predict collects posterior predictive samples for a set of new locations given an object of class 'stIntPGOcc'. Prediction is currently only possible for the latent occupancy state. Predictions are currently only possible for sampled primary time periods.
## S3 method for class 'stIntPGOcc'
predict(object, X.0, coords.0, t.cols, n.omp.threads = 1, verbose = TRUE,
        n.report = 100, ignore.RE = FALSE, type = 'occupancy',
        forecast = FALSE, ...)
object |
an object of class stIntPGOcc |
X.0 |
the design matrix of covariates at the prediction locations. This should be a three-dimensional array, with dimensions corresponding to site, primary time period, and covariate, respectively. Note that the first covariate should consist of all 1s for the intercept if an intercept is included in the model. If random effects are included in the occupancy (or detection if |
coords.0 |
the spatial coordinates corresponding to X.0. |
t.cols |
an indexing vector used to denote which primary time periods are contained in the design matrix of covariates at the prediction locations ( |
n.omp.threads |
a positive integer indicating
the number of threads to use for SMP parallel processing. The package must
be compiled for OpenMP support. For most Intel-based machines, we recommend setting
|
verbose |
if |
ignore.RE |
logical value that specifies whether or not to remove random unstructured occurrence (or detection if |
n.report |
the interval to report sampling progress. |
type |
a quoted keyword indicating what type of prediction to produce. Valid keywords are 'occupancy' to predict latent occupancy probability and latent occupancy values (this is the default), or 'detection' to predict detection probability given new values of detection covariates. Currently only occupancy prediction is supported for integrated models. |
forecast |
a logical value indicating whether prediction is occurring at non-sampled primary time periods (e.g., forecasting). |
... |
currently no additional arguments |
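For the multi-season models, X.0 is a three-dimensional array (site, primary time period, covariate) and t.cols records which primary time periods its second dimension represents. The sketch below builds such an array by hand with base R. All sizes, the chosen periods, and the covariate definitions (a scaled period index and a random site-by-period covariate) are hypothetical.

```r
set.seed(10)
J.0 <- 6               # number of prediction sites (made up)
n.time.total <- 10     # primary time periods in the fitted model (made up)
t.cols <- c(2, 5, 9)   # periods for which predictions are wanted (made up)
p.occ <- 3             # intercept + two covariates

# Dimensions: site, primary time period, covariate.
X.0 <- array(NA_real_, dim = c(J.0, length(t.cols), p.occ))

# First covariate slice is all 1s for the intercept.
X.0[, , 1] <- 1

# Second slice: a covariate that varies only by period (a scaled period
# index), subset to the periods listed in t.cols.
period.scaled <- as.numeric(scale(1:n.time.total))
X.0[, , 2] <- matrix(period.scaled[t.cols], nrow = J.0,
                     ncol = length(t.cols), byrow = TRUE)

# Third slice: a covariate that varies by site and period.
X.0[, , 3] <- rnorm(J.0 * length(t.cols))

dim(X.0)
```

The second dimension of X.0 has one column per entry of t.cols, in the same order, so the two objects must always be subset together.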
A list object of class predict.stIntPGOcc. When type = 'occupancy', the list consists of:
psi.0.samples |
a three-dimensional object of posterior predictive samples for the latent occupancy probability values with dimensions corresponding to posterior predictive sample, site, and primary time period. |
z.0.samples |
a three-dimensional object of posterior predictive samples for the latent occupancy values with dimensions corresponding to posterior predictive sample, site, and primary time period. |
w.0.samples |
a |
The return object will include additional objects used for standard extractor functions.
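Because z.0.samples is indexed by posterior predictive sample, site, and primary time period, derived quantities such as the proportion of prediction sites occupied in each period can be computed sample by sample, which keeps the uncertainty intact. The following sketch uses a simulated stand-in for z.0.samples with made-up sizes.

```r
set.seed(7)
n.samples <- 300  # posterior predictive samples (made up)
J.0 <- 20         # prediction sites (made up)
n.time <- 4       # primary time periods predicted (made up)

# Stand-in for out.pred$z.0.samples: sample x site x primary time period.
z.0.samples <- array(rbinom(n.samples * J.0 * n.time, 1, 0.4),
                     dim = c(n.samples, J.0, n.time))

# Proportion of sites occupied, for every sample and period.
prop.occ.samples <- apply(z.0.samples, c(1, 3), mean)

# Posterior median and 95% credible interval for each period.
prop.occ.summary <- apply(prop.occ.samples, 2, quantile,
                          probs = c(0.025, 0.5, 0.975))

dim(prop.occ.samples)
dim(prop.occ.summary)
```

Each column of prop.occ.summary describes one primary time period, so the columns can be plotted against the periods in t.cols to show change over time.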
When ignore.RE = FALSE, both sampled levels and non-sampled levels of unstructured random effects are supported for prediction. For sampled levels, the posterior distribution for the random intercept corresponding to that level of the random effect will be used in the prediction. For non-sampled levels, random values are drawn from a normal distribution using the posterior samples of the random effect variance, which results in fully propagated uncertainty in predictions with models that incorporate random effects.
Occurrence predictions at sites that are only sampled for a subset of the total number of primary time periods are obtained directly when fitting the model. See the psi.samples and z.samples portions of the output list from the model object of class stIntPGOcc.
Jeffrey W. Doser, Andrew O. Finley
set.seed(332)

# Simulate Data -----------------------------------------------------------
# Number of locations in each direction. This is the total region of interest
# where some sites may or may not have a data source.
J.x <- 15
J.y <- 15
J.all <- J.x * J.y
# Number of data sources.
n.data <- 3
# Sites for each data source.
J.obs <- sample(ceiling(0.2 * J.all):ceiling(0.4 * J.all), n.data, replace = TRUE)
# Maximum number of years for each data set
n.time.max <- c(4, 8, 10)
# Number of years each site in each data set is sampled
n.time <- list()
for (i in 1:n.data) {
  n.time[[i]] <- sample(1:n.time.max[i], J.obs[i], replace = TRUE)
}
# Replicates for each data source.
n.rep <- list()
for (i in 1:n.data) {
  n.rep[[i]] <- matrix(NA, J.obs[i], n.time.max[i])
  for (j in 1:J.obs[i]) {
    n.rep[[i]][j, sample(1:n.time.max[i], n.time[[i]][j], replace = FALSE)] <-
      sample(1:4, n.time[[i]][j], replace = TRUE)
  }
}
# Total number of years across all data sets
n.time.total <- 10
# List denoting the specific years each data set was sampled during.
data.seasons <- list()
for (i in 1:n.data) {
  data.seasons[[i]] <- sort(sample(1:n.time.total, n.time.max[i], replace = FALSE))
}
# Occupancy covariates
beta <- c(0, 0.4, 0.3)
trend <- TRUE
# Random occupancy covariates
psi.RE <- list()
p.occ <- length(beta)
# Detection covariates
alpha <- list()
alpha[[1]] <- c(0, 0.2, -0.5)
alpha[[2]] <- c(-1, 0.5, 0.3, -0.8)
alpha[[3]] <- c(-0.5, 1)
p.RE <- list()
p.det.long <- sapply(alpha, length)
p.det <- sum(p.det.long)
# Spatial parameters
sigma.sq <- 0.9
phi <- 3 / .5

# Simulate occupancy data.
dat <- simTIntOcc(n.data = n.data, J.x = J.x, J.y = J.y, J.obs = J.obs,
                  n.time = n.time, data.seasons = data.seasons, n.rep = n.rep,
                  beta = beta, alpha = alpha, trend = trend,
                  psi.RE = psi.RE, p.RE = p.RE, sp = TRUE,
                  sigma.sq = sigma.sq, phi = phi, cov.model = 'exponential')
y <- dat$y
X <- dat$X.obs
X.p <- dat$X.p
sites <- dat$sites
coords <- dat$coords.obs

# Package all data into a list
occ.covs <- list(trend = X[, , 2],
                 occ.cov.1 = X[, , 3])
det.covs <- list()
# Add covariates one by one
det.covs[[1]] <- list(det.cov.1.1 = X.p[[1]][, , , 2],
                      det.cov.1.2 = X.p[[1]][, , , 3])
det.covs[[2]] <- list(det.cov.2.1 = X.p[[2]][, , , 2],
                      det.cov.2.2 = X.p[[2]][, , , 3],
                      det.cov.2.3 = X.p[[2]][, , , 4])
det.covs[[3]] <- list(det.cov.3.1 = X.p[[3]][, , , 2])
data.list <- list(y = y, occ.covs = occ.covs, det.covs = det.covs,
                  sites = sites, seasons = data.seasons, coords = coords)

# Testing
occ.formula <- ~ trend + occ.cov.1
# Note that the names are not necessary.
det.formula <- list(f.1 = ~ det.cov.1.1 + det.cov.1.2,
                    f.2 = ~ det.cov.2.1 + det.cov.2.2 + det.cov.2.3,
                    f.3 = ~ det.cov.3.1)

# NOTE: this is a short run of the model, in reality we would run the
# model for much longer.
out <- stIntPGOcc(occ.formula = occ.formula,
                  det.formula = det.formula,
                  data = data.list,
                  NNGP = TRUE,
                  n.neighbors = 15,
                  cov.model = 'exponential',
                  n.batch = 3,
                  batch.length = 25,
                  n.report = 1,
                  n.burn = 25,
                  n.thin = 1,
                  n.chains = 1)

t.cols <- 1:n.time.total
out.pred <- predict(out, X.0 = dat$X.pred, coords.0 = dat$coords.pred,
                    t.cols = t.cols, type = 'occupancy')
str(out.pred)
The function predict collects posterior predictive samples for a set of new locations given an object of class 'stMsPGOcc'. Prediction is possible for both the latent occupancy state and detection. Predictions are currently only possible for sampled primary time periods.
## S3 method for class 'stMsPGOcc'
predict(object, X.0, coords.0, t.cols, n.omp.threads = 1, verbose = TRUE,
        n.report = 100, ignore.RE = FALSE, type = 'occupancy',
        grid.index.0, ...)
object |
an object of class stMsPGOcc |
X.0 |
the design matrix of covariates at the prediction locations. This should be a three-dimensional array, with dimensions corresponding to site, primary time period, and covariate, respectively. Note that the first covariate should consist of all 1s for the intercept if an intercept is included in the model. If random effects are included in the occupancy (or detection if |
coords.0 |
the spatial coordinates corresponding to X.0. |
t.cols |
an indexing vector used to denote which primary time periods are contained in the design matrix of covariates at the prediction locations ( |
n.omp.threads |
a positive integer indicating
the number of threads to use for SMP parallel processing. The package must
be compiled for OpenMP support. For most Intel-based machines, we recommend setting
|
verbose |
if |
ignore.RE |
logical value that specifies whether or not to remove random unstructured occurrence (or detection if |
n.report |
the interval to report sampling progress. |
type |
a quoted keyword indicating what type of prediction to produce. Valid keywords are 'occupancy' to predict latent occupancy probability and latent occupancy values (this is the default), or 'detection' to predict detection probability given new values of detection covariates. |
grid.index.0 |
an indexing vector used to specify how each row in |
... |
currently no additional arguments |
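As a rough illustration of how X.0 and t.cols relate, the sketch below builds a site by primary time period by covariate array and subsets it to the time periods being predicted. The object names (J.0, n.time.max, X.full) and the simulated covariate values are hypothetical and not part of the package; only base R is used.

```r
# Hypothetical dimensions: 4 prediction sites, 6 primary time periods,
# and an intercept plus 2 covariates.
set.seed(1)
J.0 <- 4
n.time.max <- 6
p.occ <- 3
X.full <- array(NA_real_, dim = c(J.0, n.time.max, p.occ))
X.full[, , 1] <- 1                         # intercept: all 1s
X.full[, , 2] <- rnorm(J.0 * n.time.max)   # simulated covariate 1
X.full[, , 3] <- rnorm(J.0 * n.time.max)   # simulated covariate 2

# Keep only the primary time periods to predict for. drop = FALSE keeps the
# array three-dimensional even when a single period is requested.
t.cols <- c(1, 2, 5)
X.pred <- X.full[, t.cols, , drop = FALSE]
dim(X.pred)
```

The second dimension of X.pred then has one slice per entry of t.cols, which is the layout the X.0 description above asks for.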
A list object of class predict.stMsPGOcc. When type = 'occupancy', the list consists of:
psi.0.samples |
a four-dimensional object of posterior predictive samples for the latent occupancy probability values with dimensions corresponding to posterior predictive sample, species, site, and primary time period. |
z.0.samples |
a four-dimensional object of posterior predictive samples for the latent occupancy values with dimensions corresponding to posterior predictive sample, species, site, and primary time period. |
w.0.samples |
a three-dimensional array of posterior predictive samples for the latent spatial factors with dimensions corresponding to MCMC sample, latent factor, and site. |
When type = 'detection', the list consists of:
p.0.samples |
a four-dimensional object of posterior predictive samples for the detection probability values with dimensions corresponding to posterior predictive sample, species, site, and primary time period. |
The return object will include additional objects used for standard extractor functions.
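The four-dimensional arrays described above are usually summarized over the sample dimension before mapping or plotting. A minimal sketch, assuming the dimension order given above (sample, species, site, primary time period) and using a simulated array as a stand-in for out.pred$psi.0.samples:

```r
# Simulated stand-in for psi.0.samples:
# 50 samples x 3 species x 5 sites x 2 primary time periods.
set.seed(2)
psi.0.samples <- array(runif(50 * 3 * 5 * 2), dim = c(50, 3, 5, 2))

# Posterior predictive mean for each species, site, and time period
psi.0.mean <- apply(psi.0.samples, c(2, 3, 4), mean)
dim(psi.0.mean)

# 2.5% and 97.5% quantiles; the two quantiles become the first dimension
psi.0.ci <- apply(psi.0.samples, c(2, 3, 4), quantile,
                  probs = c(0.025, 0.975))
dim(psi.0.ci)
```

The same pattern applies to z.0.samples, where the mean over samples gives the proportion of draws in which a species is predicted to occupy a site in a given period.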
When ignore.RE = FALSE, both sampled levels and non-sampled levels of unstructured random effects are supported for prediction. For sampled levels, the posterior distribution for the random intercept corresponding to that level of the random effect will be used in the prediction. For non-sampled levels, random values are drawn from a normal distribution using the posterior samples of the random effect variance, which results in fully propagated uncertainty in predictions with models that incorporate random effects.
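The handling of sampled versus non-sampled random effect levels can be sketched in a few lines of base R. This is an illustration of the idea only, not the package's internal code; sigma.sq.samples, re.samples, and draw_re() are hypothetical names, and the posterior samples are simulated.

```r
set.seed(3)
n.samples <- 1000
# Hypothetical posterior samples of the random-effect variance
sigma.sq.samples <- 1 / rgamma(n.samples, shape = 5, rate = 4)
# Hypothetical posterior samples of random intercepts for 3 sampled levels
re.samples <- matrix(rnorm(n.samples * 3, sd = 0.8), n.samples, 3,
                     dimnames = list(NULL, c("a", "b", "c")))

# Return one random-effect value per posterior sample for a given level
draw_re <- function(level) {
  if (level %in% colnames(re.samples)) {
    # Sampled level: reuse the posterior samples of that random intercept
    re.samples[, level]
  } else {
    # Non-sampled level: draw from N(0, sigma.sq), one draw per variance sample
    rnorm(n.samples, mean = 0, sd = sqrt(sigma.sq.samples))
  }
}

re.known <- draw_re("b")   # level seen when fitting the model
re.new <- draw_re("z")     # level not seen when fitting the model
```

Because each draw for a new level uses a different posterior sample of the variance, uncertainty in the variance carries through to the predictions.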
Occurrence predictions at sites that are only sampled for a subset of the total number of primary time periods are obtained directly when fitting the model. See the psi.samples and z.samples portions of the output list from the model object of class stMsPGOcc.
Jeffrey W. Doser,
Andrew O. Finley
# Simulate Data -----------------------------------------------------------
set.seed(500)
J.x <- 8
J.y <- 8
J <- J.x * J.y
# Years sampled
n.time <- sample(3:10, J, replace = TRUE)
# n.time <- rep(10, J)
n.time.max <- max(n.time)
# Replicates
n.rep <- matrix(NA, J, max(n.time))
for (j in 1:J) {
  n.rep[j, 1:n.time[j]] <- sample(2:4, n.time[j], replace = TRUE)
  # n.rep[j, 1:n.time[j]] <- rep(4, n.time[j])
}
N <- 7
# Community-level covariate effects
# Occurrence
beta.mean <- c(-3, -0.2, 0.5)
trend <- FALSE
sp.only <- 0
p.occ <- length(beta.mean)
tau.sq.beta <- c(0.6, 1.5, 1.4)
# Detection
alpha.mean <- c(0, 1.2, -1.5)
tau.sq.alpha <- c(1, 0.5, 2.3)
p.det <- length(alpha.mean)
# Random effects
psi.RE <- list()
p.RE <- list()
# Draw species-level effects from community means.
beta <- matrix(NA, nrow = N, ncol = p.occ)
alpha <- matrix(NA, nrow = N, ncol = p.det)
for (i in 1:p.occ) {
  beta[, i] <- rnorm(N, beta.mean[i], sqrt(tau.sq.beta[i]))
}
for (i in 1:p.det) {
  alpha[, i] <- rnorm(N, alpha.mean[i], sqrt(tau.sq.alpha[i]))
}
sp <- TRUE
svc.cols <- c(1)
p.svc <- length(svc.cols)
n.factors <- 3
phi <- runif(p.svc * n.factors, 3 / .9, 3 / .3)
factor.model <- TRUE
cov.model <- 'exponential'
ar1 <- TRUE
sigma.sq.t <- runif(N, 0.05, 1)
rho <- runif(N, 0.1, 1)

dat <- simTMsOcc(J.x = J.x, J.y = J.y, n.time = n.time, n.rep = n.rep, N = N,
                 beta = beta, alpha = alpha, sp.only = sp.only, trend = trend,
                 psi.RE = psi.RE, p.RE = p.RE, factor.model = factor.model,
                 svc.cols = svc.cols, n.factors = n.factors, phi = phi, sp = sp,
                 cov.model = cov.model, ar1 = ar1, sigma.sq.t = sigma.sq.t,
                 rho = rho)

# Subset data for prediction
pred.indx <- sample(1:J, round(J * .25), replace = FALSE)
y <- dat$y[, -pred.indx, , , drop = FALSE]
# Occupancy covariates
X <- dat$X[-pred.indx, , , drop = FALSE]
# Prediction covariates
X.0 <- dat$X[pred.indx, , , drop = FALSE]
# Detection covariates
X.p <- dat$X.p[-pred.indx, , , , drop = FALSE]
# Coordinates
coords <- dat$coords[-pred.indx, ]
coords.0 <- dat$coords[pred.indx, ]

occ.covs <- list(occ.cov.1 = X[, , 2],
                 occ.cov.2 = X[, , 3])
det.covs <- list(det.cov.1 = X.p[, , , 2],
                 det.cov.2 = X.p[, , , 3])

data.list <- list(y = y, occ.covs = occ.covs,
                  det.covs = det.covs,
                  coords = coords)
# Priors
prior.list <- list(beta.comm.normal = list(mean = 0, var = 2.72),
                   alpha.comm.normal = list(mean = 0, var = 2.72),
                   tau.sq.beta.ig = list(a = 0.1, b = 0.1),
                   tau.sq.alpha.ig = list(a = 0.1, b = 0.1),
                   rho.unif = list(a = -1, b = 1),
                   sigma.sq.t.ig = list(a = 0.1, b = 0.1),
                   phi.unif = list(a = 3 / .9, b = 3 / .1))
z.init <- apply(y, c(1, 2, 3), function(a) as.numeric(sum(a, na.rm = TRUE) > 0))
inits.list <- list(alpha.comm = 0, beta.comm = 0, beta = 0,
                   alpha = 0, tau.sq.beta = 1, tau.sq.alpha = 1,
                   rho = 0.5, sigma.sq.t = 0.5,
                   phi = 3 / .5, z = z.init)
# Tuning
tuning.list <- list(phi = 1, rho = 0.5)

# Number of batches
n.batch <- 2
# Batch length
batch.length <- 25
n.burn <- 25
n.thin <- 1
n.samples <- n.batch * batch.length

# Note that this is just a test case and more iterations/chains may need to
# be run to ensure convergence.
out <- stMsPGOcc(occ.formula = ~ occ.cov.1 + occ.cov.2,
                 det.formula = ~ det.cov.1 + det.cov.2,
                 data = data.list,
                 inits = inits.list,
                 n.batch = n.batch,
                 batch.length = batch.length,
                 accept.rate = 0.43,
                 ar1 = TRUE,
                 NNGP = TRUE,
                 n.neighbors = 5,
                 n.factors = n.factors,
                 cov.model = 'exponential',
                 priors = prior.list,
                 tuning = tuning.list,
                 n.omp.threads = 1,
                 verbose = TRUE,
                 n.report = 1,
                 n.burn = n.burn,
                 n.thin = n.thin,
                 n.chains = 1)

summary(out)

# Predict at new sites across all n.max.years
# Take a look at array of covariates for prediction
str(X.0)
# Subset to only grab time periods 1, 2, and 5
t.cols <- c(1, 2, 5)
X.pred <- X.0[, t.cols, ]
out.pred <- predict(out, X.pred, coords.0, t.cols = t.cols, type = 'occupancy')
str(out.pred)
The function predict collects posterior predictive samples for a set of new locations given an object of class 'stPGOcc'. Prediction is possible for both the latent occupancy state and detection. Predictions are currently only possible for sampled primary time periods.
## S3 method for class 'stPGOcc'
predict(object, X.0, coords.0, t.cols, n.omp.threads = 1,
        verbose = TRUE, n.report = 100, ignore.RE = FALSE,
        type = 'occupancy', forecast = FALSE, grid.index.0, ...)
object |
an object of class stPGOcc |
X.0 |
the design matrix of covariates at the prediction locations. This should be a three-dimensional array, with dimensions corresponding to site, primary time period, and covariate, respectively. Note that the first covariate should consist of all 1s for the intercept if an intercept is included in the model. If random effects are included in the occupancy (or detection if |
coords.0 |
the spatial coordinates corresponding to |
t.cols |
an indexing vector used to denote which primary time periods are contained in the design matrix of covariates at the prediction locations ( |
n.omp.threads |
a positive integer indicating
the number of threads to use for SMP parallel processing. The package must
be compiled for OpenMP support. For most Intel-based machines, we recommend setting
|
verbose |
if |
ignore.RE |
logical value that specifies whether or not to remove random unstructured occurrence (or detection if |
n.report |
the interval to report sampling progress. |
type |
a quoted keyword indicating what type of prediction to produce. Valid keywords are 'occupancy' to predict latent occupancy probability and latent occupancy values (this is the default), or 'detection' to predict detection probability given new values of detection covariates. |
forecast |
a logical value indicating whether prediction is occurring at non-sampled primary time periods (e.g., forecasting). |
grid.index.0 |
an indexing vector used to specify how each row in |
... |
currently no additional arguments |
A list object of class predict.stPGOcc. When type = 'occupancy', the list consists of:
psi.0.samples |
a three-dimensional object of posterior predictive samples for the latent occupancy probability values with dimensions corresponding to posterior predictive sample, site, and primary time period. |
z.0.samples |
a three-dimensional object of posterior predictive samples for the latent occupancy values with dimensions corresponding to posterior predictive sample, site, and primary time period. |
w.0.samples |
a |
When type = 'detection', the list consists of:
p.0.samples |
a three-dimensional object of posterior predictive samples for the detection probability values with dimensions corresponding to posterior predictive sample, site, and primary time period. |
The return object will include additional objects used for standard extractor functions.
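One way to read the relationship between psi.0.samples and z.0.samples is that each latent occupancy value is a Bernoulli draw given the corresponding occupancy probability. The sketch below uses a simulated array as a stand-in for real predict() output, so the dimensions and values are hypothetical; it is meant to show the array layout (sample, site, primary time period), not to reproduce the package's sampler.

```r
set.seed(4)
# Simulated stand-in for psi.0.samples:
# 200 samples x 5 sites x 3 primary time periods.
psi.0.samples <- array(runif(200 * 5 * 3), dim = c(200, 5, 3))

# Draw a latent occupancy state (0 or 1) for each probability
z.0.samples <- array(rbinom(length(psi.0.samples), 1, psi.0.samples),
                     dim = dim(psi.0.samples))

# Proportion of samples in which each site is occupied in each period
z.0.mean <- apply(z.0.samples, c(2, 3), mean)
dim(z.0.mean)
```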
When ignore.RE = FALSE, both sampled levels and non-sampled levels of unstructured random effects are supported for prediction. For sampled levels, the posterior distribution for the random intercept corresponding to that level of the random effect will be used in the prediction. For non-sampled levels, random values are drawn from a normal distribution using the posterior samples of the random effect variance, which results in fully propagated uncertainty in predictions with models that incorporate random effects.
Occurrence predictions at sites that are only sampled for a subset of the total number of primary time periods are obtained directly when fitting the model. See the psi.samples and z.samples portions of the output list from the model object of class stPGOcc.
Jeffrey W. Doser,
Andrew O. Finley
set.seed(500)
# Sites
J.x <- 10
J.y <- 10
J <- J.x * J.y
# Primary time periods
n.time <- sample(10, J, replace = TRUE)
n.time.max <- max(n.time)
# Replicates
n.rep <- matrix(NA, J, max(n.time))
for (j in 1:J) {
  n.rep[j, 1:n.time[j]] <- sample(1:4, n.time[j], replace = TRUE)
}
# Occurrence --------------------------
beta <- c(0.4, 0.5, -0.9)
trend <- TRUE
sp.only <- 0
psi.RE <- list()
# Detection ---------------------------
alpha <- c(-1, 0.7, -0.5)
p.RE <- list()
# Spatial -----------------------------
sp <- TRUE
cov.model <- "exponential"
sigma.sq <- 2
phi <- 3 / .4

# Get all the data
dat <- simTOcc(J.x = J.x, J.y = J.y, n.time = n.time, n.rep = n.rep,
               beta = beta, alpha = alpha, sp.only = sp.only, trend = trend,
               psi.RE = psi.RE, p.RE = p.RE, sp = TRUE, sigma.sq = sigma.sq,
               phi = phi, cov.model = cov.model, ar1 = FALSE)

# Subset data for prediction
pred.indx <- sample(1:J, round(J * .25), replace = FALSE)
y <- dat$y[-pred.indx, , , drop = FALSE]
# Occupancy covariates
X <- dat$X[-pred.indx, , , drop = FALSE]
# Prediction covariates
X.0 <- dat$X[pred.indx, , , drop = FALSE]
# Detection covariates
X.p <- dat$X.p[-pred.indx, , , , drop = FALSE]
psi.0 <- dat$psi[pred.indx, ]
# Coordinates
coords <- dat$coords[-pred.indx, ]
coords.0 <- dat$coords[pred.indx, ]

# Package all data into a list
# Occurrence
occ.covs <- list(int = X[, , 1],
                 trend = X[, , 2],
                 occ.cov.1 = X[, , 3])
# Detection
det.covs <- list(det.cov.1 = X.p[, , , 2],
                 det.cov.2 = X.p[, , , 3])
# Data list bundle
data.list <- list(y = y,
                  occ.covs = occ.covs,
                  det.covs = det.covs,
                  coords = coords)
# Priors
prior.list <- list(beta.normal = list(mean = 0, var = 2.72),
                   alpha.normal = list(mean = 0, var = 2.72),
                   sigma.sq.ig = c(2, 2),
                   phi.unif = c(3 / 1, 3 / 0.1))

# Initial values
z.init <- apply(y, c(1, 2), function(a) as.numeric(sum(a, na.rm = TRUE) > 0))
inits.list <- list(beta = 0, alpha = 0, z = z.init, phi = 3 / .5, sigma.sq = 2,
                   w = rep(0, J))
# Tuning
tuning.list <- list(phi = 1)
# Number of batches
n.batch <- 10
# Batch length
batch.length <- 25
n.iter <- n.batch * batch.length

# Run the model
# Note that this is just a test case and more iterations/chains may need to
# be run to ensure convergence.
out <- stPGOcc(occ.formula = ~ trend + occ.cov.1,
               det.formula = ~ det.cov.1 + det.cov.2,
               data = data.list,
               inits = inits.list,
               n.batch = n.batch,
               batch.length = batch.length,
               priors = prior.list,
               cov.model = "exponential",
               tuning = tuning.list,
               NNGP = TRUE,
               ar1 = FALSE,
               n.neighbors = 5,
               search.type = 'cb',
               n.report = 10,
               n.burn = 50,
               n.chains = 1)

summary(out)

# Predict at new sites across all n.max.years
# Take a look at array of covariates for prediction
str(X.0)
# Subset to only grab time periods 1, 2, and 5
t.cols <- c(1, 2, 5)
X.pred <- X.0[, t.cols, ]
# Pass the subsetted array so its time periods match t.cols
out.pred <- predict(out, X.pred, coords.0, t.cols = t.cols, type = 'occupancy')
str(out.pred)
The function predict collects posterior predictive samples for a set of new locations given an object of class 'svcMsPGOcc'. Prediction is possible for both the latent occupancy state and detection.
## S3 method for class 'svcMsPGOcc'
predict(object, X.0, coords.0, n.omp.threads = 1,
        verbose = TRUE, n.report = 100, ignore.RE = FALSE,
        type = 'occupancy', ...)
object |
an object of class svcMsPGOcc |
X.0 |
the design matrix of covariates at the prediction locations. This should include a column of 1s for the intercept if an intercept is included in the model. If random effects are included in the occupancy (or detection if |
coords.0 |
the spatial coordinates corresponding to |
n.omp.threads |
a positive integer indicating
the number of threads to use for SMP parallel processing. The package must
be compiled for OpenMP support. For most Intel-based machines, we recommend setting
|
verbose |
if |
n.report |
the interval to report sampling progress. |
ignore.RE |
a logical value indicating whether to include unstructured random effects for prediction. If TRUE, unstructured random effects will be ignored and prediction will only use the fixed effects and the spatial random effects. If FALSE, random effects will be included in the prediction for both observed and unobserved levels of the unstructured random effects. |
type |
a quoted keyword indicating what type of prediction to produce. Valid keywords are 'occupancy' to predict latent occupancy probability and latent occupancy values (this is the default), or 'detection' to predict detection probability given new values of detection covariates. |
... |
currently no additional arguments |
A list object of class predict.svcMsPGOcc. When type = 'occupancy', the list consists of:
psi.0.samples |
a three-dimensional array of posterior predictive samples for the latent occurrence probability values. |
z.0.samples |
a three-dimensional array of posterior predictive samples for the latent occurrence values. |
w.0.samples |
a four-dimensional array of posterior predictive samples for the spatially-varying coefficients, with dimensions corresponding to MCMC sample, spatial factor, site, and spatially varying coefficient. |
run.time |
execution time reported using |
When type = 'detection', the list consists of:
p.0.samples |
a three-dimensional array of posterior predictive samples for the detection probability values. |
run.time |
execution time reported using |
The return object will include additional objects used for standard extractor functions.
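The four-dimensional w.0.samples array holds spatial factors rather than species-specific effects. Under a spatial factor model, a species-specific spatial effect is typically formed as a linear combination of the factors weighted by species-specific loadings. The sketch below illustrates that combination with simulated arrays; the loadings array lambda, its dimension order, and all object names are assumptions made for illustration, and in practice getSVCSamples() is the supported way to obtain the spatially-varying coefficients.

```r
set.seed(5)
n.samples <- 20   # posterior samples
N <- 4            # species
q <- 2            # spatial factors per SVC
J.0 <- 6          # prediction sites
p.svc <- 3        # spatially-varying coefficients

# Simulated stand-in for w.0.samples: sample x factor x site x SVC
w.0 <- array(rnorm(n.samples * q * J.0 * p.svc),
             dim = c(n.samples, q, J.0, p.svc))
# Hypothetical loadings: sample x species x factor x SVC
lambda <- array(rnorm(n.samples * N * q * p.svc),
                dim = c(n.samples, N, q, p.svc))

# Species-specific spatial effects: sample x species x site x SVC
w.star <- array(0, dim = c(n.samples, N, J.0, p.svc))
for (l in 1:n.samples) {
  for (h in 1:p.svc) {
    # (N x q) loadings times (q x J.0) factors gives an N x J.0 matrix
    w.star[l, , , h] <- matrix(lambda[l, , , h], N, q) %*%
      matrix(w.0[l, , , h], q, J.0)
  }
}
dim(w.star)
```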
When ignore.RE = FALSE, both sampled levels and non-sampled levels of random effects are supported for prediction. For sampled levels, the posterior distribution for the random intercept corresponding to that level of the random effect will be used in the prediction. For non-sampled levels, random values are drawn from a normal distribution using the posterior samples of the random effect variance, which results in fully propagated uncertainty in predictions with models that incorporate random effects.
Jeffrey W. Doser,
Andrew O. Finley
set.seed(400)

# Simulate Data -----------------------------------------------------------
J.x <- 10
J.y <- 10
J <- J.x * J.y
n.rep <- sample(5, size = J, replace = TRUE)
N <- 6
# Community-level covariate effects
# Occurrence
beta.mean <- c(0.2, -0.2, 0.3, -0.1, 0.4)
p.occ <- length(beta.mean)
tau.sq.beta <- c(0.6, 1.5, 0.4, 0.5, 0.3)
# Detection
alpha.mean <- c(0, 1.2, -0.5)
tau.sq.alpha <- c(1, 0.5, 1.3)
p.det <- length(alpha.mean)
# No random effects
psi.RE <- list()
p.RE <- list()
# Draw species-level effects from community means.
beta <- matrix(NA, nrow = N, ncol = p.occ)
alpha <- matrix(NA, nrow = N, ncol = p.det)
for (i in 1:p.occ) {
  beta[, i] <- rnorm(N, beta.mean[i], sqrt(tau.sq.beta[i]))
}
for (i in 1:p.det) {
  alpha[, i] <- rnorm(N, alpha.mean[i], sqrt(tau.sq.alpha[i]))
}
# Number of spatial factors for each SVC
n.factors <- 2
# The intercept and first two covariates have spatially-varying effects
svc.cols <- c(1, 2, 3)
p.svc <- length(svc.cols)
q.p.svc <- n.factors * p.svc
# Spatial decay parameters
phi <- runif(q.p.svc, 3 / 0.9, 3 / 0.1)
# A length N vector indicating the proportion of simulated locations
# that are within the range for a given species.
range.probs <- runif(N, 1, 1)
factor.model <- TRUE
cov.model <- 'spherical'
sp <- TRUE

dat <- simMsOcc(J.x = J.x, J.y = J.y, n.rep = n.rep, N = N, beta = beta,
                alpha = alpha, psi.RE = psi.RE, p.RE = p.RE, phi = phi,
                sp = sp, svc.cols = svc.cols, cov.model = cov.model,
                n.factors = n.factors, factor.model = factor.model,
                range.probs = range.probs)

# Split into fitting and prediction data set
pred.indx <- sample(1:J, round(J * .25), replace = FALSE)
y <- dat$y[, -pred.indx, ]
# Occupancy covariates
X <- dat$X[-pred.indx, ]
# Coordinates
coords <- as.matrix(dat$coords[-pred.indx, ])
# Detection covariates
X.p <- dat$X.p[-pred.indx, , ]
# Prediction values
X.0 <- dat$X[pred.indx, ]
coords.0 <- as.matrix(dat$coords[pred.indx, ])

# Prep data for spOccupancy -----------------------------------------------
# Occurrence covariates
occ.covs <- cbind(X)
colnames(occ.covs) <- c('int', 'occ.cov.1', 'occ.cov.2', 'occ.cov.3',
                        'occ.cov.4')
# Detection covariates
det.covs <- list(det.cov.1 = X.p[, , 2],
                 det.cov.2 = X.p[, , 3])
# Data list
data.list <- list(y = y, coords = coords, occ.covs = occ.covs,
                  det.covs = det.covs)
# Priors
prior.list <- list(beta.comm.normal = list(mean = 0, var = 2.72),
                   alpha.comm.normal = list(mean = 0, var = 2.72),
                   tau.sq.beta.ig = list(a = 0.1, b = 0.1),
                   tau.sq.alpha.ig = list(a = 0.1, b = 0.1),
                   phi.unif = list(a = 3 / 1, b = 3 / .1))
inits.list <- list(alpha.comm = 0,
                   beta.comm = 0,
                   beta = 0,
                   alpha = 0,
                   tau.sq.beta = 1,
                   tau.sq.alpha = 1,
                   z = apply(y, c(1, 2), max, na.rm = TRUE))
# Tuning
tuning.list <- list(phi = 1)

# Number of batches
n.batch <- 2
# Batch length
batch.length <- 25
n.burn <- 0
n.thin <- 1
n.samples <- n.batch * batch.length

# Note that this is just a test case and more iterations/chains may need to
# be run to ensure convergence.
out <- svcMsPGOcc(occ.formula = ~ occ.cov.1 + occ.cov.2 + occ.cov.3 +
                                  occ.cov.4,
                  det.formula = ~ det.cov.1 + det.cov.2,
                  data = data.list,
                  inits = inits.list,
                  n.batch = n.batch,
                  n.factors = n.factors,
                  batch.length = batch.length,
                  std.by.sp = TRUE,
                  accept.rate = 0.43,
                  priors = prior.list,
                  svc.cols = svc.cols,
                  cov.model = "spherical",
                  tuning = tuning.list,
                  n.omp.threads = 1,
                  verbose = TRUE,
                  NNGP = TRUE,
                  n.neighbors = 5,
                  search.type = 'cb',
                  n.report = 10,
                  n.burn = n.burn,
                  n.thin = n.thin,
                  n.chains = 1)

summary(out)

# Predict at new locations ------------------------------------------------
out.pred <- predict(out, X.0, coords.0, verbose = FALSE)

# Get SVC samples for each species at prediction locations
svc.samples <- getSVCSamples(out, out.pred)
The function predict collects posterior predictive samples for a set of new locations given an object of class 'svcPGBinom'.
## S3 method for class 'svcPGBinom'
predict(object, X.0, coords.0, weights.0, n.omp.threads = 1,
        verbose = TRUE, n.report = 100, ignore.RE = FALSE, ...)
object |
an object of class |
X.0 |
the design matrix of covariates at the prediction locations. Note that for spatially-varying coefficients models the order of covariates in |
coords.0 |
the spatial coordinates corresponding to |
weights.0 |
a numeric vector containing the binomial weights (i.e., the total number of
Bernoulli trials) at each site. If |
n.omp.threads |
a positive integer indicating
the number of threads to use for SMP parallel processing. The package must
be compiled for OpenMP support. For most Intel-based machines, we recommend setting
|
verbose |
if |
ignore.RE |
a logical value indicating whether to include unstructured random effects for prediction. If TRUE, unstructured random effects will be ignored and prediction will only use the fixed effects and the spatial random effects. If FALSE, random effects will be included in the prediction for both observed and unobserved levels of the unstructured random effects. |
n.report |
the interval to report sampling progress. |
... |
currently no additional arguments |
A list object of class predict.svcPGBinom consisting of:
psi.0.samples |
a |
y.0.samples |
a |
w.0.samples |
a three-dimensional array of posterior predictive samples for the spatial random effects, with dimensions corresponding to MCMC iteration, coefficient, and site. |
run.time |
execution time reported using |
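The role of weights.0 can be illustrated with a short sketch: given posterior predictive samples of the success probability at each new site, a predicted binomial count is one draw with that site's number of trials. The arrays below are simulated stand-ins with hypothetical dimensions (rows are samples, columns are sites), and the code is not the package's internal implementation.

```r
set.seed(6)
n.samples <- 500
J.0 <- 4
# Number of Bernoulli trials at each prediction site
weights.0 <- c(1, 5, 10, 3)

# Simulated stand-in for psi.0.samples: samples x sites
psi.0.samples <- matrix(runif(n.samples * J.0), n.samples, J.0)

# One binomial draw per sample and site, using each site's number of trials
y.0.samples <- sapply(1:J.0, function(j) {
  rbinom(n.samples, size = weights.0[j], prob = psi.0.samples[, j])
})
dim(y.0.samples)

# Posterior predictive mean count at each site
colMeans(y.0.samples)
```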
When ignore.RE = FALSE, both sampled levels and non-sampled levels of random effects are supported for prediction. For sampled levels, the posterior distribution for the random intercept corresponding to that level of the random effect will be used in the prediction. For non-sampled levels, random values are drawn from a normal distribution using the posterior samples of the random effect variance, which results in fully propagated uncertainty in predictions with models that incorporate random effects.
Jeffrey W. Doser,
Andrew O. Finley
set.seed(1000)
# Sites
J.x <- 10
J.y <- 10
J <- J.x * J.y
# Binomial weights
weights <- sample(10, J, replace = TRUE)
beta <- c(0, 0.5, -0.2, 0.75)
p <- length(beta)
# No unstructured random effects
psi.RE <- list()
# Spatial parameters
sp <- TRUE
# Two spatially-varying covariates.
svc.cols <- c(1, 2)
p.svc <- length(svc.cols)
cov.model <- "exponential"
sigma.sq <- runif(p.svc, 0.4, 1.5)
phi <- runif(p.svc, 3/1, 3/0.2)
# Simulate the data
dat <- simBinom(J.x = J.x, J.y = J.y, weights = weights, beta = beta,
                psi.RE = psi.RE, sp = sp, svc.cols = svc.cols,
                cov.model = cov.model, sigma.sq = sigma.sq, phi = phi)
# Binomial data
y <- dat$y
# Covariates
X <- dat$X
# Spatial coordinates
coords <- dat$coords
# Subset data for prediction if desired
pred.indx <- sample(1:J, round(J * .25), replace = FALSE)
y.0 <- y[pred.indx, drop = FALSE]
X.0 <- X[pred.indx, , drop = FALSE]
coords.0 <- coords[pred.indx, ]
y <- y[-pred.indx, drop = FALSE]
X <- X[-pred.indx, , drop = FALSE]
coords <- coords[-pred.indx, ]
weights.0 <- weights[pred.indx]
weights <- weights[-pred.indx]
# Package all data into a list
# Covariates
covs <- cbind(X)
colnames(covs) <- c('int', 'cov.1', 'cov.2', 'cov.3')
# Data list bundle
data.list <- list(y = y, covs = covs, coords = coords, weights = weights)
# Priors
prior.list <- list(beta.normal = list(mean = 0, var = 2.72),
                   sigma.sq.ig = list(a = 2, b = 1),
                   phi.unif = list(a = 3 / 1, b = 3 / 0.1))
# Starting values
inits.list <- list(beta = 0, alpha = 0, sigma.sq = 1, phi = phi)
# Tuning
tuning.list <- list(phi = 1)
n.batch <- 10
batch.length <- 25
n.burn <- 100
n.thin <- 1
# Note that this is just a test case and more iterations/chains may need to
# be run to ensure convergence.
out <- svcPGBinom(formula = ~ cov.1 + cov.2 + cov.3,
                  svc.cols = c(1, 2),
                  data = data.list,
                  n.batch = n.batch,
                  batch.length = batch.length,
                  inits = inits.list,
                  priors = prior.list,
                  accept.rate = 0.43,
                  cov.model = "exponential",
                  tuning = tuning.list,
                  n.omp.threads = 1,
                  verbose = TRUE,
                  NNGP = TRUE,
                  n.neighbors = 5,
                  n.report = 2,
                  n.burn = n.burn,
                  n.thin = n.thin,
                  n.chains = 1)
summary(out)
# Predict at new locations ------------------------------------------------
out.pred <- predict(out, X.0, coords.0, weights.0, verbose = FALSE)
str(out.pred)
The function predict collects posterior predictive samples for a set of new locations given an object of class 'svcPGOcc'. Prediction is possible for both the latent occupancy state and detection.
## S3 method for class 'svcPGOcc'
predict(object, X.0, coords.0, weights.0, n.omp.threads = 1, verbose = TRUE, n.report = 100, ignore.RE = FALSE, type = 'occupancy', grid.index.0, ...)
object |
an object of class |
X.0 |
the design matrix of covariates at the prediction locations. This should include a column of 1s for the intercept if an intercept is included in the model. If random effects are included in the occupancy (or detection if |
coords.0 |
the spatial coordinates corresponding to |
weights.0 |
not used for objects of class |
n.omp.threads |
a positive integer indicating
the number of threads to use for SMP parallel processing. The package must
be compiled for OpenMP support. For most Intel-based machines, we recommend setting
|
verbose |
if |
ignore.RE |
a logical value indicating whether to ignore unstructured random effects during prediction. If TRUE, unstructured random effects are ignored and prediction uses only the fixed effects and the spatial random effects. If FALSE, random effects are included in the prediction for both observed and unobserved levels of the unstructured random effects. |
n.report |
the interval to report sampling progress. |
type |
a quoted keyword indicating what type of prediction to produce. Valid keywords are 'occupancy' to predict latent occupancy probability and latent occupancy values (this is the default), or 'detection' to predict detection probability given new values of detection covariates. |
grid.index.0 |
an indexing vector used to specify how each row in |
... |
currently no additional arguments |
A list object of class predict.svcPGOcc. When type = 'occupancy', the list consists of:
psi.0.samples |
a |
z.0.samples |
a |
w.0.samples |
a three-dimensional array of posterior predictive samples for the spatial random effects, with dimensions corresponding to MCMC iteration, coefficient, and site. |
run.time |
execution time reported using |
When type = 'detection', the list consists of:
p.0.samples |
a |
run.time |
execution time reported using |
The return object will include additional objects used for standard extractor functions.
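To show how the three-dimensional w.0.samples array (MCMC iteration, coefficient, site) can be combined with regression coefficients, here is a base R sketch. It assumes the usual spatially-varying coefficient formulation, in which the coefficient at a site is the non-spatial effect plus the spatial random effect. All arrays are simulated stand-ins; the package's own getSVCSamples function is the supported way to obtain these quantities.

```r
set.seed(3)
n.samples <- 200        # hypothetical number of MCMC samples
J.0 <- 5                # hypothetical number of prediction sites
svc.cols <- c(1, 2)     # design-matrix columns with spatially-varying effects
p.svc <- length(svc.cols)

# Stand-in posterior samples: 3 non-spatial coefficients, samples x coefficients
beta.samples <- matrix(rnorm(n.samples * 3), n.samples, 3)

# Stand-in for w.0.samples: iteration x coefficient x site
w.0.samples <- array(rnorm(n.samples * p.svc * J.0, sd = 0.5),
                     dim = c(n.samples, p.svc, J.0))

# Assumed SVC arithmetic: non-spatial effect plus spatial deviation
svc.0.samples <- w.0.samples
for (k in seq_len(p.svc)) {
  # The length-n.samples vector is added to every site column
  svc.0.samples[, k, ] <- beta.samples[, svc.cols[k]] + w.0.samples[, k, ]
}

# Posterior mean of each spatially-varying coefficient at each site
svc.0.mean <- apply(svc.0.samples, c(2, 3), mean)
```

The result svc.0.mean has one row per spatially-varying coefficient and one column per prediction site.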
When ignore.RE = FALSE, both sampled levels and non-sampled levels of random effects are supported for prediction. For sampled levels, the posterior distribution for the random intercept corresponding to that level of the random effect will be used in the prediction. For non-sampled levels, random values are drawn from a normal distribution using the posterior samples of the random effect variance, which results in fully propagated uncertainty in predictions with models that incorporate random effects.
Jeffrey W. Doser,
Andrew O. Finley
Hooten, M. B., and Hefley, T. J. (2019). Bringing Bayesian models to life. CRC Press.
set.seed(400)
# Simulate Data -----------------------------------------------------------
J.x <- 8
J.y <- 8
J <- J.x * J.y
n.rep <- sample(2:4, J, replace = TRUE)
beta <- c(0.5, 2)
p.occ <- length(beta)
alpha <- c(0, 1)
p.det <- length(alpha)
phi <- c(3 / .6, 3 / .8)
sigma.sq <- c(0.5, 0.9)
svc.cols <- c(1, 2)
dat <- simOcc(J.x = J.x, J.y = J.y, n.rep = n.rep, beta = beta, alpha = alpha,
              sigma.sq = sigma.sq, phi = phi, sp = TRUE, cov.model = 'exponential',
              svc.cols = svc.cols)
# Split into fitting and prediction data set
pred.indx <- sample(1:J, round(J * .5), replace = FALSE)
y <- dat$y[-pred.indx, ]
# Occupancy covariates
X <- dat$X[-pred.indx, ]
# Prediction covariates
X.0 <- dat$X[pred.indx, ]
# Detection covariates
X.p <- dat$X.p[-pred.indx, , ]
coords <- as.matrix(dat$coords[-pred.indx, ])
coords.0 <- as.matrix(dat$coords[pred.indx, ])
psi.0 <- dat$psi[pred.indx]
w.0 <- dat$w[pred.indx, , drop = FALSE]
# Package all data into a list
occ.covs <- X[, -1, drop = FALSE]
colnames(occ.covs) <- c('occ.cov')
det.covs <- list(det.cov.1 = X.p[, , 2])
data.list <- list(y = y,
                  occ.covs = occ.covs,
                  det.covs = det.covs,
                  coords = coords)
# Number of batches
n.batch <- 10
# Batch length
batch.length <- 25
n.iter <- n.batch * batch.length
# Priors
prior.list <- list(beta.normal = list(mean = 0, var = 2.72),
                   alpha.normal = list(mean = 0, var = 2.72),
                   sigma.sq.ig = list(a = 2, b = 0.5),
                   phi.unif = list(a = 3/1, b = 3/.1))
# Initial values
inits.list <- list(alpha = 0, beta = 0,
                   phi = 3 / .5,
                   sigma.sq = 0.5,
                   z = apply(y, 1, max, na.rm = TRUE))
# Tuning
tuning.list <- list(phi = 1)
# Note that this is just a test case and more iterations/chains may need to
# be run to ensure convergence.
out <- svcPGOcc(occ.formula = ~ occ.cov,
                det.formula = ~ det.cov.1,
                data = data.list,
                inits = inits.list,
                n.batch = n.batch,
                batch.length = batch.length,
                accept.rate = 0.43,
                priors = prior.list,
                cov.model = 'exponential',
                tuning = tuning.list,
                n.omp.threads = 1,
                verbose = TRUE,
                NNGP = TRUE,
                svc.cols = c(1, 2),
                n.neighbors = 15,
                search.type = 'cb',
                n.report = 10,
                n.burn = 50,
                n.thin = 1)
summary(out)
# Predict at new locations ------------------------------------------------
out.pred <- predict(out, X.0, coords.0, verbose = FALSE)
The function predict collects posterior predictive samples for a set of new locations given an object of class 'svcTIntPGOcc'. Detection prediction is not currently supported. Predictions are currently only possible for sampled primary time periods.
## S3 method for class 'svcTIntPGOcc'
predict(object, X.0, coords.0, t.cols, n.omp.threads = 1, verbose = TRUE, n.report = 100, ignore.RE = FALSE, type = 'occupancy', forecast = FALSE, ...)
object |
an object of class svcTIntPGOcc |
X.0 |
the design matrix of covariates at the prediction locations. This should be a three-dimensional array, with dimensions corresponding to site, primary time period, and covariate, respectively. Note that the first covariate should consist of all 1s for the intercept if an intercept is included in the model. If random effects are included in the occupancy (or detection if |
coords.0 |
the spatial coordinates corresponding to |
t.cols |
an indexing vector used to denote which primary time periods are contained in the design matrix of covariates at the prediction locations ( |
n.omp.threads |
a positive integer indicating
the number of threads to use for SMP parallel processing. The package must
be compiled for OpenMP support. For most Intel-based machines, we recommend setting
|
verbose |
if |
ignore.RE |
logical value that specifies whether or not to remove random unstructured occurrence (or detection if |
n.report |
the interval to report sampling progress. |
type |
a quoted keyword indicating what type of prediction to produce. Valid keywords are 'occupancy' to predict latent occupancy probability and latent occupancy values (this is the default), or 'detection' to predict detection probability given new values of detection covariates. Detection prediction is not currently supported for integrated models. |
forecast |
a logical value indicating whether prediction is occurring at non-sampled primary time periods (e.g., forecasting). |
... |
currently no additional arguments |
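How t.cols lines up with the second dimension of X.0 can be sketched in base R. The array below is simulated for illustration (the names X.full and n.time.total are stand-ins), and the sketch assumes that t.cols simply lists the primary time periods retained in the prediction array, in the same order as its second dimension.

```r
set.seed(4)
J.0 <- 4            # hypothetical number of prediction sites
n.time.total <- 6   # hypothetical number of primary time periods
p.occ <- 3          # intercept, trend, one covariate

# Full design array: site x primary time period x covariate
X.full <- array(NA_real_, dim = c(J.0, n.time.total, p.occ))
X.full[, , 1] <- 1                                  # intercept
X.full[, , 2] <- matrix(seq_len(n.time.total), J.0,
                        n.time.total, byrow = TRUE) # time index as a trend
X.full[, , 3] <- rnorm(J.0 * n.time.total)          # simulated covariate

# Keep only time periods 1, 2, and 5; drop = FALSE preserves all 3 dimensions
t.cols <- c(1, 2, 5)
X.0 <- X.full[, t.cols, , drop = FALSE]
dim(X.0)
```

Using drop = FALSE guards against the array collapsing to a matrix when only one site or one time period is selected.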
A list object of class predict.svcTIntPGOcc. When type = 'occupancy', the list consists of:
psi.0.samples |
a three-dimensional object of posterior predictive samples for the latent occupancy probability values with dimensions corresponding to posterior predictive sample, site, and primary time period. |
z.0.samples |
a three-dimensional object of posterior predictive samples for the latent occupancy values with dimensions corresponding to posterior predictive sample, site, and primary time period. |
w.0.samples |
a three-dimensional array of posterior predictive samples for the spatial random effects, with dimensions corresponding to MCMC iteration, coefficient, and site. |
When type = 'detection', the list consists of:
p.0.samples |
a three-dimensional object of posterior predictive samples for the detection probability values with dimensions corresponding to posterior predictive sample, site, and primary time period. |
The return object will include additional objects used for standard extractor functions.
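A three-dimensional array of samples such as psi.0.samples (posterior predictive sample, site, primary time period) is usually reduced to per-site, per-period summaries. The following base R sketch uses a simulated stand-in array rather than package output.

```r
set.seed(5)
n.samples <- 300   # hypothetical number of posterior predictive samples
J.0 <- 4           # hypothetical number of prediction sites
n.time <- 3        # hypothetical number of primary time periods

# Stand-in for psi.0.samples: sample x site x primary time period
psi.0.samples <- array(plogis(rnorm(n.samples * J.0 * n.time)),
                       dim = c(n.samples, J.0, n.time))

# Posterior mean occupancy probability for each site and time period
psi.0.mean <- apply(psi.0.samples, c(2, 3), mean)

# 95% credible interval; the first dimension indexes the two quantiles
psi.0.ci <- apply(psi.0.samples, c(2, 3), quantile, probs = c(0.025, 0.975))
```

Here psi.0.mean is a site by time period matrix, and psi.0.ci adds a leading dimension of length two for the lower and upper bounds.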
When ignore.RE = FALSE, both sampled levels and non-sampled levels of unstructured random effects are supported for prediction. For sampled levels, the posterior distribution for the random intercept corresponding to that level of the random effect will be used in the prediction. For non-sampled levels, random values are drawn from a normal distribution using the posterior samples of the random effect variance, which results in fully propagated uncertainty in predictions with models that incorporate random effects.
Occurrence predictions at sites that are only sampled for a subset of the total number of primary time periods are obtained directly when fitting the model. See the psi.samples and z.samples portions of the output list from the model object of class svcTIntPGOcc.
Jeffrey W. Doser,
Andrew O. Finley
set.seed(332)
# Simulate Data -----------------------------------------------------------
# Number of locations in each direction. This is the total region of interest
# where some sites may or may not have a data source.
J.x <- 15
J.y <- 15
J.all <- J.x * J.y
# Number of data sources.
n.data <- 3
# Sites for each data source.
J.obs <- sample(ceiling(0.2 * J.all):ceiling(0.4 * J.all), n.data, replace = TRUE)
# Maximum number of years for each data set
n.time.max <- c(4, 8, 10)
# Number of years each site in each data set is sampled
n.time <- list()
for (i in 1:n.data) {
  n.time[[i]] <- sample(1:n.time.max[i], J.obs[i], replace = TRUE)
}
# Replicates for each data source.
n.rep <- list()
for (i in 1:n.data) {
  n.rep[[i]] <- matrix(NA, J.obs[i], n.time.max[i])
  for (j in 1:J.obs[i]) {
    n.rep[[i]][j, sample(1:n.time.max[i], n.time[[i]][j], replace = FALSE)] <-
      sample(1:4, n.time[[i]][j], replace = TRUE)
  }
}
# Total number of years across all data sets
n.time.total <- 10
# List denoting the specific years each data set was sampled during.
data.seasons <- list()
for (i in 1:n.data) {
  data.seasons[[i]] <- sort(sample(1:n.time.total, n.time.max[i], replace = FALSE))
}
# Occupancy covariates
beta <- c(0, 0.4, 0.3)
trend <- TRUE
# Random occupancy covariates
psi.RE <- list()
p.occ <- length(beta)
# Detection covariates
alpha <- list()
alpha[[1]] <- c(0, 0.2, -0.5)
alpha[[2]] <- c(-1, 0.5, 0.3, -0.8)
alpha[[3]] <- c(-0.5, 1)
p.RE <- list()
p.det.long <- sapply(alpha, length)
p.det <- sum(p.det.long)
# Spatial parameters
svc.cols <- c(1, 2)
sigma.sq <- c(0.9, 0.5)
phi <- c(3 / .5, 3 / .8)
# Simulate occupancy data.
dat <- simTIntOcc(n.data = n.data, J.x = J.x, J.y = J.y, J.obs = J.obs,
                  n.time = n.time, data.seasons = data.seasons, n.rep = n.rep,
                  beta = beta, alpha = alpha, trend = trend,
                  psi.RE = psi.RE, p.RE = p.RE, sp = TRUE,
                  sigma.sq = sigma.sq, phi = phi, cov.model = 'exponential',
                  svc.cols = svc.cols)
y <- dat$y
X <- dat$X.obs
X.p <- dat$X.p
sites <- dat$sites
coords <- dat$coords.obs
# Package all data into a list
occ.covs <- list(trend = X[, , 2],
                 occ.cov.1 = X[, , 3])
det.covs <- list()
# Add covariates one by one
det.covs[[1]] <- list(det.cov.1.1 = X.p[[1]][, , , 2],
                      det.cov.1.2 = X.p[[1]][, , , 3])
det.covs[[2]] <- list(det.cov.2.1 = X.p[[2]][, , , 2],
                      det.cov.2.2 = X.p[[2]][, , , 3],
                      det.cov.2.3 = X.p[[2]][, , , 4])
det.covs[[3]] <- list(det.cov.3.1 = X.p[[3]][, , , 2])
data.list <- list(y = y, occ.covs = occ.covs, det.covs = det.covs,
                  sites = sites, seasons = data.seasons, coords = coords)
# Testing
occ.formula <- ~ trend + occ.cov.1
# Note that the names are not necessary.
det.formula <- list(f.1 = ~ det.cov.1.1 + det.cov.1.2,
                    f.2 = ~ det.cov.2.1 + det.cov.2.2 + det.cov.2.3,
                    f.3 = ~ det.cov.3.1)
# NOTE: this is a short run of the model, in reality we would run the
# model for much longer.
out <- svcTIntPGOcc(occ.formula = occ.formula,
                    det.formula = det.formula,
                    data = data.list,
                    NNGP = TRUE,
                    n.neighbors = 15,
                    cov.model = 'exponential',
                    n.batch = 3,
                    svc.cols = c(1, 2),
                    batch.length = 25,
                    n.report = 1,
                    n.burn = 25,
                    n.thin = 1,
                    n.chains = 1)
summary(out)
t.cols <- 1:n.time.total
out.pred <- predict(out, X.0 = dat$X.pred, coords.0 = dat$coords.pred,
                    t.cols = t.cols, type = 'occupancy')
str(out.pred)
The function predict collects posterior predictive samples for a set of new locations given an object of class 'svcTMsPGOcc'. Prediction is possible for both the latent occupancy state and detection. Predictions are currently only possible for sampled primary time periods.
## S3 method for class 'svcTMsPGOcc'
predict(object, X.0, coords.0, t.cols, n.omp.threads = 1, verbose = TRUE, n.report = 100, ignore.RE = FALSE, type = 'occupancy', grid.index.0, ...)
object |
an object of class svcTMsPGOcc |
X.0 |
the design matrix of covariates at the prediction locations. This should be a three-dimensional array, with dimensions corresponding to site, primary time period, and covariate, respectively. Note that the first covariate should consist of all 1s for the intercept if an intercept is included in the model. If random effects are included in the occupancy (or detection if |
coords.0 |
the spatial coordinates corresponding to |
t.cols |
an indexing vector used to denote which primary time periods are contained in the design matrix of covariates at the prediction locations ( |
n.omp.threads |
a positive integer indicating
the number of threads to use for SMP parallel processing. The package must
be compiled for OpenMP support. For most Intel-based machines, we recommend setting
|
verbose |
if |
ignore.RE |
logical value that specifies whether or not to remove random unstructured occurrence (or detection if |
n.report |
the interval to report sampling progress. |
type |
a quoted keyword indicating what type of prediction to produce. Valid keywords are 'occupancy' to predict latent occupancy probability and latent occupancy values (this is the default), or 'detection' to predict detection probability given new values of detection covariates. |
grid.index.0 |
an indexing vector used to specify how each row in |
... |
currently no additional arguments |
A list object of class predict.svcTMsPGOcc. When type = 'occupancy', the list consists of:
psi.0.samples |
a four-dimensional object of posterior predictive samples for the latent occupancy probability values with dimensions corresponding to posterior predictive sample, species, site, and primary time period. |
z.0.samples |
a four-dimensional object of posterior predictive samples for the latent occupancy values with dimensions corresponding to posterior predictive sample, species, site, and primary time period. |
w.0.samples |
a four-dimensional array of posterior predictive samples for the latent spatial factors with dimensions corresponding to MCMC sample, latent factor, site, and spatially-varying coefficient. |
When type = 'detection', the list consists of:
p.0.samples |
a four-dimensional object of posterior predictive samples for the detection probability values with dimensions corresponding to posterior predictive sample, species, site, and primary time period. |
The return object will include additional objects used for standard extractor functions.
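One common use of a four-dimensional z.0.samples array (posterior predictive sample, species, site, primary time period) is to derive predicted species richness by summing over the species dimension. The base R sketch below uses a simulated stand-in array; the name rich.samples is an illustrative choice, not part of the returned object.

```r
set.seed(6)
n.samples <- 100   # hypothetical number of posterior predictive samples
N <- 7             # hypothetical number of species
J.0 <- 5           # hypothetical number of prediction sites
n.time <- 3        # hypothetical number of primary time periods

# Stand-in for z.0.samples: sample x species x site x primary time period
z.0.samples <- array(rbinom(n.samples * N * J.0 * n.time, 1, 0.4),
                     dim = c(n.samples, N, J.0, n.time))

# Richness per sample, site, and time period: sum over species (dimension 2)
rich.samples <- apply(z.0.samples, c(1, 3, 4), sum)

# Posterior mean richness for each site and time period
rich.mean <- apply(rich.samples, c(2, 3), mean)
```

Summing within each posterior sample before averaging keeps the full posterior distribution of richness available, for example for credible intervals.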
When ignore.RE = FALSE, both sampled levels and non-sampled levels of unstructured random effects are supported for prediction. For sampled levels, the posterior distribution for the random intercept corresponding to that level of the random effect will be used in the prediction. For non-sampled levels, random values are drawn from a normal distribution using the posterior samples of the random effect variance, which results in fully propagated uncertainty in predictions with models that incorporate random effects.
Occurrence predictions at sites that are only sampled for a subset of the total number of primary time periods are obtained directly when fitting the model. See the psi.samples and z.samples portions of the output list from the model object of class svcTMsPGOcc.
Jeffrey W. Doser,
Andrew O. Finley
# Simulate Data -----------------------------------------------------------
set.seed(500)
J.x <- 8
J.y <- 8
J <- J.x * J.y
# Years sampled
n.time <- sample(3:10, J, replace = TRUE)
# n.time <- rep(10, J)
n.time.max <- max(n.time)
# Replicates
n.rep <- matrix(NA, J, max(n.time))
for (j in 1:J) {
  n.rep[j, 1:n.time[j]] <- sample(2:4, n.time[j], replace = TRUE)
  # n.rep[j, 1:n.time[j]] <- rep(4, n.time[j])
}
N <- 7
# Community-level covariate effects
# Occurrence
beta.mean <- c(-3, -0.2, 0.5)
trend <- FALSE
sp.only <- 0
p.occ <- length(beta.mean)
tau.sq.beta <- c(0.6, 1.5, 1.4)
# Detection
alpha.mean <- c(0, 1.2, -1.5)
tau.sq.alpha <- c(1, 0.5, 2.3)
p.det <- length(alpha.mean)
# Random effects
psi.RE <- list()
p.RE <- list()
# Draw species-level effects from community means.
beta <- matrix(NA, nrow = N, ncol = p.occ)
alpha <- matrix(NA, nrow = N, ncol = p.det)
for (i in 1:p.occ) {
  beta[, i] <- rnorm(N, beta.mean[i], sqrt(tau.sq.beta[i]))
}
for (i in 1:p.det) {
  alpha[, i] <- rnorm(N, alpha.mean[i], sqrt(tau.sq.alpha[i]))
}
sp <- TRUE
svc.cols <- c(1, 2)
p.svc <- length(svc.cols)
n.factors <- 2
phi <- runif(p.svc * n.factors, 3 / .9, 3 / .3)
factor.model <- TRUE
cov.model <- 'exponential'
ar1 <- TRUE
sigma.sq.t <- runif(N, 0.05, 1)
rho <- runif(N, 0.1, 1)
dat <- simTMsOcc(J.x = J.x, J.y = J.y, n.time = n.time, n.rep = n.rep, N = N,
                 beta = beta, alpha = alpha, sp.only = sp.only, trend = trend,
                 psi.RE = psi.RE, p.RE = p.RE, factor.model = factor.model,
                 svc.cols = svc.cols, n.factors = n.factors, phi = phi, sp = sp,
                 cov.model = cov.model, ar1 = ar1, sigma.sq.t = sigma.sq.t, rho = rho)
# Subset data for prediction
pred.indx <- sample(1:J, round(J * .25), replace = FALSE)
y <- dat$y[, -pred.indx, , , drop = FALSE]
# Occupancy covariates
X <- dat$X[-pred.indx, , , drop = FALSE]
# Prediction covariates
X.0 <- dat$X[pred.indx, , , drop = FALSE]
# Detection covariates
X.p <- dat$X.p[-pred.indx, , , , drop = FALSE]
# Coordinates
coords <- dat$coords[-pred.indx, ]
coords.0 <- dat$coords[pred.indx, ]
occ.covs <- list(occ.cov.1 = X[, , 2],
                 occ.cov.2 = X[, , 3])
det.covs <- list(det.cov.1 = X.p[, , , 2],
                 det.cov.2 = X.p[, , , 3])
data.list <- list(y = y, occ.covs = occ.covs,
                  det.covs = det.covs,
                  coords = coords)
# Priors
prior.list <- list(beta.comm.normal = list(mean = 0, var = 2.72),
                   alpha.comm.normal = list(mean = 0, var = 2.72),
                   tau.sq.beta.ig = list(a = 0.1, b = 0.1),
                   tau.sq.alpha.ig = list(a = 0.1, b = 0.1),
                   rho.unif = list(a = -1, b = 1),
                   sigma.sq.t.ig = list(a = 0.1, b = 0.1),
                   phi.unif = list(a = 3 / .9, b = 3 / .1))
z.init <- apply(y, c(1, 2, 3), function(a) as.numeric(sum(a, na.rm = TRUE) > 0))
inits.list <- list(alpha.comm = 0, beta.comm = 0, beta = 0,
                   alpha = 0, tau.sq.beta = 1, tau.sq.alpha = 1,
                   rho = 0.5, sigma.sq.t = 0.5,
                   phi = 3 / .5, z = z.init)
# Tuning
tuning.list <- list(phi = 1, rho = 0.5)
# Number of batches
n.batch <- 5
# Batch length
batch.length <- 25
n.burn <- 25
n.thin <- 1
n.samples <- n.batch * batch.length
# Note that this is just a test case and more iterations/chains may need to
# be run to ensure convergence.
out <- svcTMsPGOcc(occ.formula = ~ occ.cov.1 + occ.cov.2,
                   det.formula = ~ det.cov.1 + det.cov.2,
                   data = data.list,
                   inits = inits.list,
                   n.batch = n.batch,
                   batch.length = batch.length,
                   accept.rate = 0.43,
                   ar1 = TRUE,
                   svc.cols = svc.cols,
                   NNGP = TRUE,
                   n.neighbors = 5,
                   n.factors = n.factors,
                   cov.model = 'exponential',
                   priors = prior.list,
                   tuning = tuning.list,
                   n.omp.threads = 1,
                   verbose = TRUE,
                   n.report = 1,
                   n.burn = n.burn,
                   n.thin = n.thin,
                   n.chains = 1)
summary(out)
# Predict at new sites across all n.max.years
# Take a look at array of covariates for prediction
str(X.0)
# Subset to only grab time periods 1, 2, and 5
t.cols <- c(1, 2, 5)
X.pred <- X.0[, t.cols, ]
out.pred <- predict(out, X.pred, coords.0, t.cols = t.cols, type = 'occupancy')
str(out.pred)
# Extract SVC samples for each species at prediction locations
svc.samples <- getSVCSamples(out, out.pred)
str(svc.samples)
The function predict collects posterior predictive samples for a set of new locations given an object of class 'svcTPGBinom'. Prediction is possible for both the latent occupancy state and detection. Predictions are currently only possible for sampled primary time periods.
## S3 method for class 'svcTPGBinom'
predict(object, X.0, coords.0, t.cols, weights.0, n.omp.threads = 1,
        verbose = TRUE, n.report = 100, ignore.RE = FALSE, ...)
object |
an object of class svcTPGBinom |
X.0 |
the design matrix of covariates at the prediction locations. This should be a three-dimensional array, with dimensions corresponding to site, primary time period, and covariate, respectively. Note that the first covariate should consist of all 1s for the intercept if an intercept is included in the model. If random effects are included in the occupancy (or detection if |
coords.0 |
the spatial coordinates corresponding to |
weights.0 |
a numeric site by primary time period matrix containing the binomial weights (i.e., the total number of Bernoulli trials) at each site and primary time period. If |
t.cols |
an indexing vector used to denote which primary time periods are contained in the design matrix of covariates at the prediction locations ( |
n.omp.threads |
a positive integer indicating the number of threads to use for SMP parallel processing. The package must be compiled for OpenMP support. For most Intel-based machines, we recommend setting |
verbose |
if |
ignore.RE |
logical value that specifies whether or not to remove random unstructured occurrence (or detection if |
n.report |
the interval to report sampling progress. |
... |
currently no additional arguments |
A list object of class predict.svcTPGBinom that consists of:
psi.0.samples |
a three-dimensional object of posterior predictive samples for the occurrence probability values with dimensions corresponding to posterior predictive sample, site, and primary time period. |
y.0.samples |
a three-dimensional object of posterior predictive samples for the predicted binomial data with dimensions corresponding to posterior predictive sample, site, and primary time period. |
w.0.samples |
a three-dimensional array of posterior predictive samples for the spatial random effects, with dimensions corresponding to MCMC iteration, coefficient, and site. |
run.time |
execution time reported using |
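The posterior predictive arrays described above are typically reduced to per-site, per-period summaries. The following is a minimal sketch of that step, not part of the package examples. The array `psi.0.samples` here is simulated as a hypothetical stand-in for `out.pred$psi.0.samples`, assuming only the dimension order stated above (posterior predictive sample, site, primary time period).

```r
# Hypothetical stand-in for out.pred$psi.0.samples:
# 100 posterior samples x 4 sites x 3 primary time periods.
set.seed(1)
psi.0.samples <- array(runif(100 * 4 * 3), dim = c(100, 4, 3))

# Posterior mean for each site and primary time period
# (collapse over the first dimension, the posterior samples).
psi.0.mean <- apply(psi.0.samples, c(2, 3), mean)

# 95% credible interval bounds for each site and primary time period.
psi.0.ci <- apply(psi.0.samples, c(2, 3), quantile, probs = c(0.025, 0.975))

dim(psi.0.mean)  # site x primary time period
dim(psi.0.ci)    # (lower, upper) x site x primary time period
```

The same pattern applies to `y.0.samples`, since it shares the sample, site, and primary time period layout.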
When ignore.RE = FALSE, both sampled levels and non-sampled levels of unstructured random effects are supported for prediction. For sampled levels, the posterior distribution for the random intercept corresponding to that level of the random effect will be used in the prediction. For non-sampled levels, random values are drawn from a normal distribution using the posterior samples of the random effect variance, which results in fully propagated uncertainty in predictions with models that incorporate random effects.
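As an illustration of the treatment of non-sampled levels, the sketch below draws one random intercept per posterior sample from a zero-mean normal distribution whose variance is that sample's random effect variance. It shows the idea only and is not the package's internal code: `sigma.sq.psi.samples` and `eta.fixed` are hypothetical, simulated stand-ins, and a logit link is assumed.

```r
set.seed(1)
n.post <- 500

# Hypothetical posterior samples of the random-intercept variance.
sigma.sq.psi.samples <- 1 / rgamma(n.post, shape = 3, rate = 2)

# Hypothetical posterior samples of the fixed-effect linear predictor
# at a single new site and primary time period.
eta.fixed <- rnorm(n.post, mean = -0.5, sd = 0.2)

# Non-sampled level: one normal draw per posterior sample, using that
# sample's variance, so variance uncertainty carries into the prediction.
re.new <- rnorm(n.post, mean = 0, sd = sqrt(sigma.sq.psi.samples))

# Occurrence probability with and without the random effect (logit link).
psi.0.with.re <- plogis(eta.fixed + re.new)
psi.0.no.re <- plogis(eta.fixed)

# The random effect widens the predictive distribution.
quantile(psi.0.with.re, c(0.025, 0.975))
quantile(psi.0.no.re, c(0.025, 0.975))
```

Comparing the two intervals shows how much of the predictive spread comes from the unstructured random effect.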
Occurrence predictions at sites that are only sampled for a subset of the total number of primary time periods are obtained directly when fitting the model. See the psi.samples and y.rep.samples portions of the output list from the model object of class svcTPGBinom.
Jeffrey W. Doser [email protected],
Andrew O. Finley [email protected]
set.seed(1000)
# Sites
J.x <- 15
J.y <- 15
J <- J.x * J.y
# Years sampled
n.time <- sample(10, J, replace = TRUE)
# Binomial weights
weights <- matrix(NA, J, max(n.time))
for (j in 1:J) {
  weights[j, 1:n.time[j]] <- sample(5, n.time[j], replace = TRUE)
}
# Occurrence --------------------------
beta <- c(-2, -0.5, -0.2, 0.75)
p.occ <- length(beta)
trend <- TRUE
sp.only <- 0
psi.RE <- list()
# Spatial parameters ------------------
sp <- TRUE
svc.cols <- c(1, 2, 3)
p.svc <- length(svc.cols)
cov.model <- "exponential"
sigma.sq <- runif(p.svc, 0.1, 1)
phi <- runif(p.svc, 3/1, 3/0.2)
# Temporal parameters -----------------
ar1 <- TRUE
rho <- 0.8
sigma.sq.t <- 1
# Get all the data
dat <- simTBinom(J.x = J.x, J.y = J.y, n.time = n.time, weights = weights,
                 beta = beta, psi.RE = psi.RE, sp.only = sp.only, trend = trend,
                 sp = sp, svc.cols = svc.cols, cov.model = cov.model,
                 sigma.sq = sigma.sq, phi = phi, rho = rho,
                 sigma.sq.t = sigma.sq.t, ar1 = TRUE, x.positive = FALSE)
# Prep the data for spOccupancy -------------------------------------------
# Subset data for prediction
pred.indx <- sample(1:J, round(J * .25), replace = FALSE)
y <- dat$y[-pred.indx, , drop = FALSE]
y.0 <- dat$y[pred.indx, , drop = FALSE]
# Occupancy covariates
X <- dat$X[-pred.indx, , , drop = FALSE]
# Prediction covariates
X.0 <- dat$X[pred.indx, , , drop = FALSE]
# Spatial coordinates
coords <- as.matrix(dat$coords[-pred.indx, ])
coords.0 <- as.matrix(dat$coords[pred.indx, ])
psi.0 <- dat$psi[pred.indx, ]
w.0 <- dat$w[pred.indx, ]
weights.0 <- weights[pred.indx, ]
weights <- weights[-pred.indx, ]
# Package all data into a list
covs <- list(int = X[, , 1],
             trend = X[, , 2],
             cov.1 = X[, , 3],
             cov.2 = X[, , 4])
# Data list bundle
data.list <- list(y = y, covs = covs, weights = weights, coords = coords)
# Priors
prior.list <- list(beta.normal = list(mean = 0, var = 2.72),
                   sigma.sq.ig = list(a = 2, b = 1),
                   phi.unif = list(a = 3/1, b = 3/.1))
# Starting values
inits.list <- list(beta = beta, alpha = 0, sigma.sq = 1, phi = 3 / 0.5, nu = 1)
# Tuning
tuning.list <- list(phi = 0.4, nu = 0.3, rho = 0.2)
# MCMC information
n.batch <- 2
n.burn <- 0
n.thin <- 1
# Note that this is just a test case and more iterations/chains may need to
# be run to ensure convergence.
out <- svcTPGBinom(formula = ~ trend + cov.1 + cov.2,
                   svc.cols = svc.cols,
                   data = data.list,
                   n.batch = n.batch,
                   batch.length = 25,
                   inits = inits.list,
                   priors = prior.list,
                   accept.rate = 0.43,
                   cov.model = "exponential",
                   ar1 = TRUE,
                   tuning = tuning.list,
                   n.omp.threads = 1,
                   verbose = TRUE,
                   NNGP = TRUE,
                   n.neighbors = 5,
                   n.report = 25,
                   n.burn = n.burn,
                   n.thin = n.thin,
                   n.chains = 1)
# Predict at new locations ------------------------------------------------
# The binomial weights are passed by their full argument name, weights.0.
out.pred <- predict(out, X.0, coords.0, t.cols = 1:max(n.time),
                    weights.0 = weights.0, n.report = 10)
str(out.pred)
The function predict collects posterior predictive samples for a set of new locations given an object of class 'svcTPGOcc'. Prediction is possible for both the latent occupancy state and detection. Predictions are currently only possible for sampled primary time periods.
## S3 method for class 'svcTPGOcc'
predict(object, X.0, coords.0, t.cols, weights.0, n.omp.threads = 1,
        verbose = TRUE, n.report = 100, ignore.RE = FALSE,
        type = 'occupancy', forecast = FALSE, grid.index.0, ...)
object |
an object of class svcTPGOcc |
X.0 |
the design matrix of covariates at the prediction locations. This should be a three-dimensional array, with dimensions corresponding to site, primary time period, and covariate, respectively. Note that the first covariate should consist of all 1s for the intercept if an intercept is included in the model. If random effects are included in the occupancy (or detection if |
coords.0 |
the spatial coordinates corresponding to |
t.cols |
an indexing vector used to denote which primary time periods are contained in the design matrix of covariates at the prediction locations ( |
weights.0 |
not used for objects of class |
n.omp.threads |
a positive integer indicating the number of threads to use for SMP parallel processing. The package must be compiled for OpenMP support. For most Intel-based machines, we recommend setting |
verbose |
if |
ignore.RE |
logical value that specifies whether or not to remove random unstructured occurrence (or detection if |
n.report |
the interval to report sampling progress. |
type |
a quoted keyword indicating what type of prediction to produce. Valid keywords are 'occupancy' to predict latent occupancy probability and latent occupancy values (this is the default), or 'detection' to predict detection probability given new values of detection covariates. |
grid.index.0 |
an indexing vector used to specify how each row in |
forecast |
a logical value indicating whether prediction is occurring at non-sampled primary time periods (e.g., forecasting). |
... |
currently no additional arguments |
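The relationship between X.0 and t.cols can be sketched as follows. This is an illustrative example under the assumption that the second dimension of X.0 should contain exactly the primary time periods listed in t.cols. The covariate array is simulated and hypothetical.

```r
set.seed(1)

# Hypothetical prediction covariates:
# 5 sites x 10 primary time periods x 3 covariates.
X.0 <- array(rnorm(5 * 10 * 3), dim = c(5, 10, 3))
X.0[, , 1] <- 1  # first covariate is the intercept (all 1s)

# Primary time periods to predict for.
t.cols <- c(1, 2, 5)

# Keep only those time periods; drop = FALSE preserves the three-dimensional
# structure even if a single site or time period is selected.
X.pred <- X.0[, t.cols, , drop = FALSE]

dim(X.pred)  # site x selected primary time period x covariate
```

The subset `X.pred`, together with the same `t.cols`, would then be supplied to `predict()`.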
A list object of class predict.svcTPGOcc. When type = 'occupancy', the list consists of:
psi.0.samples |
a three-dimensional object of posterior predictive samples for the latent occupancy probability values with dimensions corresponding to posterior predictive sample, site, and primary time period. |
z.0.samples |
a three-dimensional object of posterior predictive samples for the latent occupancy values with dimensions corresponding to posterior predictive sample, site, and primary time period. |
w.0.samples |
a three-dimensional array of posterior predictive samples for the spatial random effects, with dimensions corresponding to MCMC iteration, coefficient, and site. |
When type = 'detection', the list consists of:
p.0.samples |
a three-dimensional object of posterior predictive samples for the detection probability values with dimensions corresponding to posterior predictive sample, site, and primary time period. |
The return object will include additional objects used for standard extractor functions.
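One common use of the latent occupancy samples is to derive the proportion of prediction sites occupied in each primary time period. The sketch below assumes only the dimension order given above (posterior predictive sample, site, primary time period). `z.0.samples` is a simulated, hypothetical stand-in for `out.pred$z.0.samples`.

```r
set.seed(1)

# Hypothetical stand-in for out.pred$z.0.samples:
# 200 posterior samples x 6 sites x 3 primary time periods of 0/1 values.
z.0.samples <- array(rbinom(200 * 6 * 3, size = 1, prob = 0.4),
                     dim = c(200, 6, 3))

# Proportion of sites occupied, for each posterior sample and time period.
prop.occ <- apply(z.0.samples, c(1, 3), mean)

# Posterior mean and 95% credible interval for each time period.
prop.occ.mean <- colMeans(prop.occ)
prop.occ.ci <- apply(prop.occ, 2, quantile, probs = c(0.025, 0.975))

prop.occ.mean
prop.occ.ci
```

Because the summary is computed within each posterior sample before averaging, the interval reflects the joint uncertainty across sites.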
When ignore.RE = FALSE, both sampled levels and non-sampled levels of unstructured random effects are supported for prediction. For sampled levels, the posterior distribution for the random intercept corresponding to that level of the random effect will be used in the prediction. For non-sampled levels, random values are drawn from a normal distribution using the posterior samples of the random effect variance, which results in fully propagated uncertainty in predictions with models that incorporate random effects.
Occurrence predictions at sites that are only sampled for a subset of the total number of primary time periods are obtained directly when fitting the model. See the psi.samples and z.samples portions of the output list from the model object of class svcTPGOcc.
Jeffrey W. Doser [email protected],
Andrew O. Finley [email protected]
set.seed(500)
# Sites
J.x <- 10
J.y <- 10
J <- J.x * J.y
# Primary time periods
n.time <- sample(10, J, replace = TRUE)
n.time.max <- max(n.time)
# Replicates
n.rep <- matrix(NA, J, max(n.time))
for (j in 1:J) {
  n.rep[j, 1:n.time[j]] <- sample(1:4, n.time[j], replace = TRUE)
}
# Occurrence --------------------------
beta <- c(0.4, 0.5, -0.9)
trend <- TRUE
sp.only <- 0
psi.RE <- list()
# Detection ---------------------------
alpha <- c(-1, 0.7, -0.5)
p.RE <- list()
# Spatial -----------------------------
svc.cols <- c(1, 2)
p.svc <- length(svc.cols)
sp <- TRUE
cov.model <- "exponential"
sigma.sq <- runif(p.svc, 0.1, 1)
phi <- runif(p.svc, 3 / .9, 3 / .1)
# Get all the data
dat <- simTOcc(J.x = J.x, J.y = J.y, n.time = n.time, n.rep = n.rep,
               beta = beta, alpha = alpha, sp.only = sp.only, trend = trend,
               psi.RE = psi.RE, p.RE = p.RE, sp = TRUE, sigma.sq = sigma.sq,
               phi = phi, cov.model = cov.model, ar1 = FALSE, svc.cols = svc.cols)
# Subset data for prediction
pred.indx <- sample(1:J, round(J * .25), replace = FALSE)
y <- dat$y[-pred.indx, , , drop = FALSE]
# Occupancy covariates
X <- dat$X[-pred.indx, , , drop = FALSE]
# Prediction covariates
X.0 <- dat$X[pred.indx, , , drop = FALSE]
# Detection covariates
X.p <- dat$X.p[-pred.indx, , , , drop = FALSE]
psi.0 <- dat$psi[pred.indx, ]
# Coordinates
coords <- dat$coords[-pred.indx, ]
coords.0 <- dat$coords[pred.indx, ]
# Package all data into a list
# Occurrence
occ.covs <- list(int = X[, , 1],
                 trend = X[, , 2],
                 occ.cov.1 = X[, , 3])
# Detection
det.covs <- list(det.cov.1 = X.p[, , , 2],
                 det.cov.2 = X.p[, , , 3])
# Data list bundle
data.list <- list(y = y,
                  occ.covs = occ.covs,
                  det.covs = det.covs,
                  coords = coords)
# Priors
prior.list <- list(beta.normal = list(mean = 0, var = 2.72),
                   alpha.normal = list(mean = 0, var = 2.72),
                   sigma.sq.ig = list(a = 2, b = 0.5),
                   phi.unif = list(a = 3 / 1, b = 3 / 0.1))
# Initial values
z.init <- apply(y, c(1, 2), function(a) as.numeric(sum(a, na.rm = TRUE) > 0))
inits.list <- list(beta = 0, alpha = 0, z = z.init, phi = 3 / .5, sigma.sq = 2,
                   w = rep(0, J))
# Tuning
tuning.list <- list(phi = 1)
# Number of batches
n.batch <- 10
# Batch length
batch.length <- 25
n.iter <- n.batch * batch.length
# Run the model
# Note that this is just a test case and more iterations/chains may need to
# be run to ensure convergence.
out <- svcTPGOcc(occ.formula = ~ trend + occ.cov.1,
                 det.formula = ~ det.cov.1 + det.cov.2,
                 data = data.list,
                 inits = inits.list,
                 n.batch = n.batch,
                 batch.length = batch.length,
                 priors = prior.list,
                 cov.model = "exponential",
                 svc.cols = svc.cols,
                 tuning = tuning.list,
                 NNGP = TRUE,
                 ar1 = FALSE,
                 n.neighbors = 5,
                 search.type = 'cb',
                 n.report = 10,
                 n.burn = 50,
                 n.chains = 1)
summary(out)
# Predict at new sites for a subset of the primary time periods
# Take a look at array of covariates for prediction
str(X.0)
# Subset to only grab time periods 1, 2, and 5
t.cols <- c(1, 2, 5)
X.pred <- X.0[, t.cols, ]
# Pass the subsetted covariates so they match t.cols
out.pred <- predict(out, X.pred, coords.0, t.cols = t.cols, type = 'occupancy')
str(out.pred)
The function predict collects posterior predictive samples for a set of new locations given an object of class 'tIntPGOcc'. Prediction is currently only possible for the latent occupancy state. Predictions are currently only possible for sampled primary time periods.
## S3 method for class 'tIntPGOcc'
predict(object, X.0, t.cols, ignore.RE = FALSE, type = 'occupancy', ...)
object |
an object of class tIntPGOcc |
X.0 |
the design matrix of covariates at the prediction locations. This should be a three-dimensional array, with dimensions corresponding to site, primary time period, and covariate, respectively. Note that the first covariate should consist of all 1s for the intercept if an intercept is included in the model. If random effects are included in the occupancy (or detection if |
t.cols |
an indexing vector used to denote which primary time periods are contained in the design matrix of covariates at the prediction locations ( |
ignore.RE |
logical value that specifies whether or not to remove random unstructured occurrence (or detection if |
type |
a quoted keyword indicating what type of prediction to produce. Valid keywords are 'occupancy' to predict latent occupancy probability and latent occupancy values (this is the default), or 'detection' to predict detection probability given new values of detection covariates. Detection prediction is not currently supported for integrated models. |
... |
currently no additional arguments |
A list object of class predict.tIntPGOcc. When type = 'occupancy', the list consists of:
psi.0.samples |
a three-dimensional object of posterior predictive samples for the latent occupancy probability values with dimensions corresponding to posterior predictive sample, site, and primary time period. |
z.0.samples |
a three-dimensional object of posterior predictive samples for the latent occupancy values with dimensions corresponding to posterior predictive sample, site, and primary time period. |
The return object will include additional objects used for standard extractor functions.
When ignore.RE = FALSE, both sampled levels and non-sampled levels of unstructured random effects are supported for prediction. For sampled levels, the posterior distribution for the random intercept corresponding to that level of the random effect will be used in the prediction. For non-sampled levels, random values are drawn from a normal distribution using the posterior samples of the random effect variance, which results in fully propagated uncertainty in predictions with models that incorporate random effects.
Occurrence predictions at sites that are only sampled for a subset of the total number of primary time periods are obtained directly when fitting the model. See the psi.samples and z.samples portions of the output list from the model object of class tIntPGOcc.
Jeffrey W. Doser [email protected]
set.seed(332)
# Simulate Data -----------------------------------------------------------
# Number of locations in each direction. This is the total region of interest
# where some sites may or may not have a data source.
J.x <- 15
J.y <- 15
J.all <- J.x * J.y
# Number of data sources.
n.data <- 3
# Sites for each data source.
J.obs <- sample(ceiling(0.2 * J.all):ceiling(0.4 * J.all), n.data, replace = TRUE)
# Maximum number of years for each data set
n.time.max <- c(4, 8, 10)
# Number of years each site in each data set is sampled
n.time <- list()
for (i in 1:n.data) {
  n.time[[i]] <- sample(1:n.time.max[i], J.obs[i], replace = TRUE)
}
# Replicates for each data source.
n.rep <- list()
for (i in 1:n.data) {
  n.rep[[i]] <- matrix(NA, J.obs[i], n.time.max[i])
  for (j in 1:J.obs[i]) {
    n.rep[[i]][j, sample(1:n.time.max[i], n.time[[i]][j], replace = FALSE)] <-
      sample(1:4, n.time[[i]][j], replace = TRUE)
  }
}
# Total number of years across all data sets
n.time.total <- 10
# List denoting the specific years each data set was sampled during.
data.seasons <- list()
for (i in 1:n.data) {
  data.seasons[[i]] <- sort(sample(1:n.time.total, n.time.max[i], replace = FALSE))
}
# Occupancy covariates
beta <- c(0, 0.4, 0.3)
trend <- TRUE
# Random occupancy covariates
psi.RE <- list()
p.occ <- length(beta)
# Detection covariates
alpha <- list()
alpha[[1]] <- c(0, 0.2, -0.5)
alpha[[2]] <- c(-1, 0.5, 0.3, -0.8)
alpha[[3]] <- c(-0.5, 1)
p.RE <- list()
p.det.long <- sapply(alpha, length)
p.det <- sum(p.det.long)
# Simulate occupancy data.
dat <- simTIntOcc(n.data = n.data, J.x = J.x, J.y = J.y, J.obs = J.obs,
                  n.time = n.time, data.seasons = data.seasons, n.rep = n.rep,
                  beta = beta, alpha = alpha, trend = trend,
                  psi.RE = psi.RE, p.RE = p.RE)
y <- dat$y
X <- dat$X.obs
X.p <- dat$X.p
sites <- dat$sites
# Package all data into a list
occ.covs <- list(trend = X[, , 2],
                 occ.cov.1 = X[, , 3])
det.covs <- list()
# Add covariates one by one
det.covs[[1]] <- list(det.cov.1.1 = X.p[[1]][, , , 2],
                      det.cov.1.2 = X.p[[1]][, , , 3])
det.covs[[2]] <- list(det.cov.2.1 = X.p[[2]][, , , 2],
                      det.cov.2.2 = X.p[[2]][, , , 3],
                      det.cov.2.3 = X.p[[2]][, , , 4])
det.covs[[3]] <- list(det.cov.3.1 = X.p[[3]][, , , 2])
data.list <- list(y = y, occ.covs = occ.covs, det.covs = det.covs,
                  sites = sites, seasons = data.seasons)
# Testing
occ.formula <- ~ trend + occ.cov.1
# Note that the names are not necessary.
det.formula <- list(f.1 = ~ det.cov.1.1 + det.cov.1.2,
                    f.2 = ~ det.cov.2.1 + det.cov.2.2 + det.cov.2.3,
                    f.3 = ~ det.cov.3.1)
# NOTE: this is a short run of the model, in reality we would run the
# model for much longer.
out <- tIntPGOcc(occ.formula = occ.formula,
                 det.formula = det.formula,
                 data = data.list,
                 n.batch = 3,
                 batch.length = 25,
                 n.report = 1,
                 n.burn = 25,
                 n.thin = 1,
                 n.chains = 1)
t.cols <- 1:n.time.total
out.pred <- predict(out, X.0 = dat$X.pred, t.cols = t.cols, type = 'occupancy')
str(out.pred)
The function predict collects posterior predictive samples for a set of new locations given an object of class 'tMsPGOcc'. Prediction is possible for both the latent occupancy state and detection. Predictions are currently only possible for sampled primary time periods.
## S3 method for class 'tMsPGOcc'
predict(object, X.0, t.cols, ignore.RE = FALSE, type = 'occupancy', ...)
object |
an object of class tMsPGOcc |
X.0 |
the design matrix of covariates at the prediction locations. This should be a three-dimensional array, with dimensions corresponding to site, primary time period, and covariate, respectively. Note that the first covariate should consist of all 1s for the intercept if an intercept is included in the model. If random effects are included in the occupancy (or detection if |
t.cols |
an indexing vector used to denote which primary time periods are contained in the design matrix of covariates at the prediction locations ( |
ignore.RE |
logical value that specifies whether or not to remove random unstructured occurrence (or detection if |
type |
a quoted keyword indicating what type of prediction to produce. Valid keywords are 'occupancy' to predict latent occupancy probability and latent occupancy values (this is the default), or 'detection' to predict detection probability given new values of detection covariates. |
... |
currently no additional arguments |
A list object of class predict.tMsPGOcc. When type = 'occupancy', the list consists of:
psi.0.samples |
a four-dimensional object of posterior predictive samples for the latent occupancy probability values with dimensions corresponding to posterior predictive sample, species, site, and primary time period. |
z.0.samples |
a four-dimensional object of posterior predictive samples for the latent occupancy values with dimensions corresponding to posterior predictive sample, species, site, and primary time period. |
When type = 'detection', the list consists of:
p.0.samples |
a four-dimensional object of posterior predictive samples for the detection probability values with dimensions corresponding to posterior predictive sample, species, site, and primary time period. |
The return object will include additional objects used for standard extractor functions.
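The four-dimensional layout described above (sample, species, site, primary time period) can be collapsed over the first dimension to obtain posterior summaries. The following is a minimal sketch in base R; the array psi.0.samples here is a mock stand-in filled with random numbers rather than output from a fitted model, and all sizes are hypothetical.

```r
# Mock stand-in for out.pred$psi.0.samples (hypothetical values):
# 50 posterior samples, 3 species, 4 prediction sites, 2 primary time periods.
set.seed(1)
n.samples <- 50
N <- 3
J.0 <- 4
n.t <- 2
psi.0.samples <- array(runif(n.samples * N * J.0 * n.t),
                       dim = c(n.samples, N, J.0, n.t))

# Average over the first dimension (posterior sample) to get a
# species x site x time array of posterior means.
psi.0.mean <- apply(psi.0.samples, c(2, 3, 4), mean)

# Lower and upper bounds of a 95% credible interval, computed the same way.
psi.0.low <- apply(psi.0.samples, c(2, 3, 4), quantile, probs = 0.025)
psi.0.high <- apply(psi.0.samples, c(2, 3, 4), quantile, probs = 0.975)

dim(psi.0.mean)
```

The same pattern should apply to z.0.samples and p.0.samples, since they are described as having the same dimension ordering.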
When ignore.RE = FALSE, both sampled levels and non-sampled levels of unstructured random effects are supported for prediction. For sampled levels, the posterior distribution for the random intercept corresponding to that level of the random effect will be used in the prediction. For non-sampled levels, random values are drawn from a normal distribution using the posterior samples of the random effect variance, which results in fully propagated uncertainty in predictions with models that incorporate random effects.
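The treatment of non-sampled random effect levels can be illustrated with a small base R sketch. This is not the package's internal code: the posterior samples below (beta.int.samples and sigma.sq.psi.samples) are hypothetical stand-ins, and a logit link with a single random intercept is assumed.

```r
set.seed(2)
n.samples <- 1000

# Hypothetical posterior samples (stand-ins for a fitted model's output).
beta.int.samples <- rnorm(n.samples, 0.5, 0.1)        # occupancy intercept
sigma.sq.psi.samples <- 1 / rgamma(n.samples, 5, 4)   # random effect variance

# Non-sampled level: draw one random intercept per posterior sample from
# a normal distribution whose variance is that sample's variance value.
re.new <- rnorm(n.samples, mean = 0, sd = sqrt(sigma.sq.psi.samples))

# Occupancy probability with and without the random effect.
psi.new <- plogis(beta.int.samples + re.new)
psi.fixed <- plogis(beta.int.samples)

# The predictions that include the new-level draw are more variable.
c(with.RE = sd(psi.new), without.RE = sd(psi.fixed))
```

Because a fresh value is drawn for every posterior sample, uncertainty in the variance parameter carries through to the predicted probabilities.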
Occurrence predictions at sites that are only sampled for a subset of the total number of primary time periods are obtained directly when fitting the model. See the psi.samples and z.samples portions of the output list from the model object of class tMsPGOcc.
Jeffrey W. Doser
# Simulate Data -----------------------------------------------------------
set.seed(500)
J.x <- 8
J.y <- 8
J <- J.x * J.y
# Years sampled
n.time <- sample(3:10, J, replace = TRUE)
# n.time <- rep(10, J)
n.time.max <- max(n.time)
# Replicates
n.rep <- matrix(NA, J, max(n.time))
for (j in 1:J) {
  n.rep[j, 1:n.time[j]] <- sample(2:4, n.time[j], replace = TRUE)
  # n.rep[j, 1:n.time[j]] <- rep(4, n.time[j])
}
N <- 7
# Community-level covariate effects
# Occurrence
beta.mean <- c(-3, -0.2, 0.5)
trend <- FALSE
sp.only <- 0
p.occ <- length(beta.mean)
tau.sq.beta <- c(0.6, 1.5, 1.4)
# Detection
alpha.mean <- c(0, 1.2, -1.5)
tau.sq.alpha <- c(1, 0.5, 2.3)
p.det <- length(alpha.mean)
# Random effects
psi.RE <- list()
p.RE <- list()
# Draw species-level effects from community means.
beta <- matrix(NA, nrow = N, ncol = p.occ)
alpha <- matrix(NA, nrow = N, ncol = p.det)
for (i in 1:p.occ) {
  beta[, i] <- rnorm(N, beta.mean[i], sqrt(tau.sq.beta[i]))
}
for (i in 1:p.det) {
  alpha[, i] <- rnorm(N, alpha.mean[i], sqrt(tau.sq.alpha[i]))
}
sp <- FALSE
dat <- simTMsOcc(J.x = J.x, J.y = J.y, n.time = n.time, n.rep = n.rep, N = N,
                 beta = beta, alpha = alpha, sp.only = sp.only, trend = trend,
                 psi.RE = psi.RE, p.RE = p.RE, sp = sp)
# Subset data for prediction
pred.indx <- sample(1:J, round(J * .25), replace = FALSE)
y <- dat$y[, -pred.indx, , , drop = FALSE]
# Occupancy covariates
X <- dat$X[-pred.indx, , , drop = FALSE]
# Prediction covariates
X.0 <- dat$X[pred.indx, , , drop = FALSE]
# Detection covariates
X.p <- dat$X.p[-pred.indx, , , , drop = FALSE]
occ.covs <- list(occ.cov.1 = X[, , 2],
                 occ.cov.2 = X[, , 3])
det.covs <- list(det.cov.1 = X.p[, , , 2],
                 det.cov.2 = X.p[, , , 3])
data.list <- list(y = y, occ.covs = occ.covs, det.covs = det.covs)
# Priors
prior.list <- list(beta.comm.normal = list(mean = 0, var = 2.72),
                   alpha.comm.normal = list(mean = 0, var = 2.72),
                   tau.sq.beta.ig = list(a = 0.1, b = 0.1),
                   tau.sq.alpha.ig = list(a = 0.1, b = 0.1))
z.init <- apply(y, c(1, 2, 3), function(a) as.numeric(sum(a, na.rm = TRUE) > 0))
inits.list <- list(alpha.comm = 0, beta.comm = 0, beta = 0, alpha = 0,
                   tau.sq.beta = 1, tau.sq.alpha = 1, z = z.init)
# Tuning
tuning.list <- list(phi = 1)
# Number of batches
n.batch <- 5
# Batch length
batch.length <- 25
n.burn <- 25
n.thin <- 1
n.samples <- n.batch * batch.length
# Note that this is just a test case and more iterations/chains may need to
# be run to ensure convergence.
out <- tMsPGOcc(occ.formula = ~ occ.cov.1 + occ.cov.2,
                det.formula = ~ det.cov.1 + det.cov.2,
                data = data.list,
                inits = inits.list,
                n.batch = n.batch,
                batch.length = batch.length,
                accept.rate = 0.43,
                priors = prior.list,
                n.omp.threads = 1,
                verbose = TRUE,
                n.report = 1,
                n.burn = n.burn,
                n.thin = n.thin,
                n.chains = 1)
summary(out)
# Predict at new sites during time periods 1, 2, and 5
# Take a look at array of covariates for prediction
str(X.0)
# Subset to only grab time periods 1, 2, and 5
t.cols <- c(1, 2, 5)
X.pred <- X.0[, t.cols, ]
out.pred <- predict(out, X.pred, t.cols = t.cols, type = 'occupancy')
str(out.pred)
The function predict collects posterior predictive samples for a set of new locations given an object of class 'tPGOcc'. Prediction is possible for both the latent occupancy state as well as detection. Predictions are currently only possible for sampled primary time periods.
## S3 method for class 'tPGOcc' predict(object, X.0, t.cols, ignore.RE = FALSE, type = 'occupancy', ...)
object |
an object of class tPGOcc |
X.0 |
the design matrix of covariates at the prediction locations. This should be a three-dimensional array, with dimensions corresponding to site, primary time period, and covariate, respectively. Note that the first covariate should consist of all 1s for the intercept if an intercept is included in the model. If random effects are included in the occupancy (or detection if |
t.cols |
an indexing vector used to denote which primary time periods are contained in the design matrix of covariates at the prediction locations ( |
ignore.RE |
logical value that specifies whether or not to remove random unstructured occurrence (or detection if |
type |
a quoted keyword indicating what type of prediction to produce. Valid keywords are 'occupancy' to predict latent occupancy probability and latent occupancy values (this is the default), or 'detection' to predict detection probability given new values of detection covariates. |
... |
currently no additional arguments |
A list object of class predict.tPGOcc. When type = 'occupancy', the list consists of:
psi.0.samples |
a three-dimensional object of posterior predictive samples for the latent occupancy probability values with dimensions corresponding to posterior predictive sample, site, and primary time period. |
z.0.samples |
a three-dimensional object of posterior predictive samples for the latent occupancy values with dimensions corresponding to posterior predictive sample, site, and primary time period. |
When type = 'detection', the list consists of:
p.0.samples |
a three-dimensional object of posterior predictive samples for the detection probability values with dimensions corresponding to posterior predictive sample, site, and primary time period. |
The return object will include additional objects used for standard extractor functions.
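Because z.0.samples is indexed by posterior predictive sample, site, and primary time period, derived quantities such as the proportion of prediction sites occupied in each time period can be computed sample by sample. The sketch below uses a mock binary array in place of real model output; the sizes and the occupancy probability of 0.4 are hypothetical.

```r
set.seed(3)
n.samples <- 200
J.0 <- 25
n.t <- 3

# Mock stand-in for out.pred$z.0.samples: sample x site x time period.
z.0.samples <- array(rbinom(n.samples * J.0 * n.t, 1, 0.4),
                     dim = c(n.samples, J.0, n.t))

# Proportion of prediction sites occupied, for each posterior sample and
# time period (a matrix with n.samples rows and n.t columns).
prop.occ <- apply(z.0.samples, c(1, 3), mean)

# Posterior median and 95% credible interval for each time period.
prop.summary <- t(apply(prop.occ, 2, quantile, probs = c(0.025, 0.5, 0.975)))
prop.summary
```

Summarizing after computing the derived quantity for each sample keeps the posterior uncertainty intact, which would be lost by averaging z.0.samples first.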
When ignore.RE = FALSE, both sampled levels and non-sampled levels of unstructured random effects are supported for prediction. For sampled levels, the posterior distribution for the random intercept corresponding to that level of the random effect will be used in the prediction. For non-sampled levels, random values are drawn from a normal distribution using the posterior samples of the random effect variance, which results in fully propagated uncertainty in predictions with models that incorporate random effects.
Occurrence predictions at sites that are only sampled for a subset of the total number of primary time periods are obtained directly when fitting the model. See the psi.samples and z.samples portions of the output list from the model object of class tPGOcc.
Jeffrey W. Doser,
Andrew O. Finley
set.seed(990)
# Sites
J.x <- 10
J.y <- 10
J <- J.x * J.y
# Primary time periods
n.time <- sample(10, J, replace = TRUE)
n.time.max <- max(n.time)
# Replicates
n.rep <- matrix(NA, J, max(n.time))
for (j in 1:J) {
  n.rep[j, 1:n.time[j]] <- sample(1:4, n.time[j], replace = TRUE)
}
# Occurrence --------------------------
beta <- c(0.4, 0.5, -0.9)
trend <- TRUE
sp.only <- 0
psi.RE <- list()
# Detection ---------------------------
alpha <- c(-1, 0.7, -0.5)
p.RE <- list()
# Get all the data
dat <- simTOcc(J.x = J.x, J.y = J.y, n.time = n.time, n.rep = n.rep,
               beta = beta, alpha = alpha, sp.only = sp.only, trend = trend,
               psi.RE = psi.RE, p.RE = p.RE, sp = FALSE, ar1 = FALSE)
# Subset data for prediction
pred.indx <- sample(1:J, round(J * .25), replace = FALSE)
y <- dat$y[-pred.indx, , , drop = FALSE]
# Occupancy covariates
X <- dat$X[-pred.indx, , , drop = FALSE]
# Prediction covariates
X.0 <- dat$X[pred.indx, , , drop = FALSE]
# Detection covariates
X.p <- dat$X.p[-pred.indx, , , , drop = FALSE]
psi.0 <- dat$psi[pred.indx, ]
# Package all data into a list
# Occurrence
occ.covs <- list(int = X[, , 1],
                 trend = X[, , 2],
                 occ.cov.1 = X[, , 3])
# Detection
det.covs <- list(det.cov.1 = X.p[, , , 2],
                 det.cov.2 = X.p[, , , 3])
# Data list bundle
data.list <- list(y = y, occ.covs = occ.covs, det.covs = det.covs)
# Priors
prior.list <- list(beta.normal = list(mean = 0, var = 2.72),
                   alpha.normal = list(mean = 0, var = 2.72))
# Starting values
z.init <- apply(y, c(1, 2), function(a) as.numeric(sum(a, na.rm = TRUE) > 0))
inits.list <- list(beta = 0, alpha = 0, z = z.init)
n.batch <- 100
batch.length <- 25
n.burn <- 2000
n.thin <- 1
# Run the model
# Note that this is just a test case and more iterations/chains may need to
# be run to ensure convergence.
out <- tPGOcc(occ.formula = ~ trend + occ.cov.1,
              det.formula = ~ det.cov.1 + det.cov.2,
              data = data.list,
              inits = inits.list,
              priors = prior.list,
              n.batch = n.batch,
              batch.length = batch.length,
              ar1 = FALSE,
              verbose = TRUE,
              n.report = 500,
              n.burn = n.burn,
              n.thin = n.thin,
              n.chains = 1)
# Predict at new sites during time periods 1, 2, and 5
# Take a look at array of covariates for prediction
str(X.0)
# Subset to only grab time periods 1, 2, and 5
t.cols <- c(1, 2, 5)
X.pred <- X.0[, t.cols, ]
out.pred <- predict(out, X.pred, t.cols = t.cols, type = 'occupancy')
str(out.pred)
PGOcc models
Method for calculating occupancy and detection residuals for single-species occupancy models (PGOcc) following the approach of Wright et al. (2019).
## S3 method for class 'PGOcc' residuals(object, n.post.samples = 100, ...)
object |
object of class |
n.post.samples |
the number of posterior MCMC samples to calculate the residuals for. By default this is set to 100. If set to a value less than the total number of MCMC samples saved for the model, residuals will be calculated for a random subset of the total MCMC samples. Maximum value is the total number of MCMC samples saved. |
... |
currently no additional arguments |
A list comprised of:
occ.resids |
a matrix of occupancy residuals with first dimension equal to |
det.resids |
a three-dimensional array of detection residuals with first dimension equal to |
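The role of n.post.samples can be sketched in base R. The example below is only an illustration under stated assumptions: psi.samples and z.samples are mock posterior samples, and the occupancy residual is taken to be the latent state minus the occupancy probability, which is our reading of Wright et al. (2019) rather than a statement about the package's internal code.

```r
set.seed(4)
n.total <- 500   # hypothetical number of saved MCMC samples
J <- 10          # hypothetical number of sites

# Mock posterior samples of occupancy probability and latent occupancy state.
psi.samples <- matrix(runif(n.total * J), n.total, J)
z.samples <- matrix(rbinom(n.total * J, 1, psi.samples), n.total, J)

# Use a random subset of the saved samples, without replacement.
n.post.samples <- 100
indx <- sample(n.total, n.post.samples, replace = FALSE)

# Occupancy residuals for the selected samples (assumed form: z - psi).
occ.resids <- z.samples[indx, ] - psi.samples[indx, ]
dim(occ.resids)
```

Setting n.post.samples equal to the total number of saved samples would use every sample exactly once.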
Jeffrey W. Doser
Wright, W. J., Irvine, K. M., & Higgs, M. D. (2019). Identifying occupancy model inadequacies: can residuals separately assess detection and presence?. Ecology, 100(6), e02703.
spPGOcc models
Method for calculating occupancy and detection residuals for single-species spatial occupancy models (spPGOcc) following the approach of Wright et al. (2019).
## S3 method for class 'spPGOcc' residuals(object, n.post.samples = 100, ...)
object |
object of class |
n.post.samples |
the number of posterior MCMC samples to calculate the residuals for. By default this is set to 100. If set to a value less than the total number of MCMC samples saved for the model, residuals will be calculated for a random subset of the total MCMC samples. Maximum value is the total number of MCMC samples saved. |
... |
currently no additional arguments |
A list comprised of:
occ.resids |
a matrix of occupancy residuals with first dimension equal to |
det.resids |
a three-dimensional array of detection residuals with first dimension equal to |
Jeffrey W. Doser
Wright, W. J., Irvine, K. M., & Higgs, M. D. (2019). Identifying occupancy model inadequacies: can residuals separately assess detection and presence?. Ecology, 100(6), e02703.
svcPGOcc models
Method for calculating occupancy and detection residuals for single-species spatially varying coefficient occupancy models (svcPGOcc) following the approach of Wright et al. (2019).
## S3 method for class 'svcPGOcc' residuals(object, n.post.samples = 100, ...)
object |
object of class |
n.post.samples |
the number of posterior MCMC samples to calculate the residuals for. By default this is set to 100. If set to a value less than the total number of MCMC samples saved for the model, residuals will be calculated for a random subset of the total MCMC samples. Maximum value is the total number of MCMC samples saved. |
... |
currently no additional arguments |
A list comprised of:
occ.resids |
a matrix of occupancy residuals with first dimension equal to |
det.resids |
a three-dimensional array of detection residuals with first dimension equal to |
Jeffrey W. Doser
Wright, W. J., Irvine, K. M., & Higgs, M. D. (2019). Identifying occupancy model inadequacies: can residuals separately assess detection and presence?. Ecology, 100(6), e02703.
The function sfJSDM fits a spatially-explicit joint species distribution model. This model does not explicitly account for imperfect detection (see sfMsPGOcc()). We use Polya-Gamma latent variables and a spatial factor modeling approach. Currently, models are implemented using a Nearest Neighbor Gaussian Process.
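The idea behind a spatial factor model can be sketched in base R: each species' spatial random effect is a linear combination of a small number of shared spatial factors. The sketch below makes several assumptions that are not taken from the text above: it uses a full Gaussian process with an exponential correlation function instead of an NNGP, and it constrains the loadings matrix to be lower triangular with ones on the diagonal, a common identifiability constraint. All object names are hypothetical.

```r
set.seed(5)
N <- 6    # number of species
q <- 3    # number of spatial factors
J <- 20   # number of sites

# Random site coordinates and an exponential spatial correlation matrix.
coords <- cbind(runif(J), runif(J))
D <- as.matrix(dist(coords))
phi <- 3 / 0.5
Sigma <- exp(-phi * D)

# Draw q independent spatial factors, one per row of w.
R.chol <- chol(Sigma)   # upper triangular, t(R.chol) %*% R.chol = Sigma
w <- matrix(NA, q, J)
for (r in 1:q) {
  w[r, ] <- as.numeric(t(R.chol) %*% rnorm(J))
}

# Species-specific factor loadings (N x q), with the assumed constraint.
lambda <- matrix(rnorm(N * q), N, q)
lambda[upper.tri(lambda)] <- 0
diag(lambda) <- 1

# Species-specific spatial effects: an N x J matrix.
w.star <- lambda %*% w
dim(w.star)
```

Only q spatial processes are simulated regardless of N, which is why a small number of factors can reduce computation for large communities.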
sfJSDM(formula, data, inits, priors, tuning, cov.model = 'exponential', NNGP = TRUE, n.neighbors = 15, search.type = 'cb', std.by.sp = FALSE, n.factors, n.batch, batch.length, accept.rate = 0.43, n.omp.threads = 1, verbose = TRUE, n.report = 100, n.burn = round(.10 * n.batch * batch.length), n.thin = 1, n.chains = 1, k.fold, k.fold.threads = 1, k.fold.seed = 100, k.fold.only = FALSE, monitors, keep.only.mean.95, shared.spatial = FALSE, ...)
formula |
a symbolic description of the model to be fit using R's model syntax. Only right-hand side of formula is specified. See example below. Random intercepts are allowed using lme4 syntax (Bates et al. 2015). |
data |
a list containing data necessary for model fitting.
Valid tags are |
inits |
a list with each tag corresponding to a parameter name.
Valid tags are |
priors |
a list with each tag corresponding to a parameter name.
Valid tags are |
tuning |
a list with each tag corresponding to a parameter
name. Valid tags are |
cov.model |
a quoted keyword that specifies the covariance
function used to model the spatial dependence structure among the
observations. Supported covariance model key words are:
|
NNGP |
if |
n.neighbors |
number of neighbors used in the NNGP. Only used if
|
search.type |
a quoted keyword that specifies the type of nearest
neighbor search algorithm. Supported method key words are: |
std.by.sp |
a logical value indicating whether the covariates are standardized
separately for each species within the corresponding range for each species ( |
n.factors |
the number of factors to use in the spatial factor model approach. Typically, the number of factors is set to be small (e.g., 4-5) relative to the total number of species in the community, which will lead to substantial decreases in computation time. However, the value can be anywhere between 1 and N (the number of species in the community). |
n.batch |
the number of MCMC batches in each chain to run for the Adaptive MCMC sampler. See Roberts and Rosenthal (2009) for details. |
batch.length |
the length of each MCMC batch to run for the Adaptive MCMC sampler. See Roberts and Rosenthal (2009) for details. |
accept.rate |
target acceptance rate for Adaptive MCMC. Default is 0.43. See Roberts and Rosenthal (2009) for details. |
n.omp.threads |
a positive integer indicating
the number of threads to use for SMP parallel processing within chains. The package must
be compiled for OpenMP support. For most Intel-based machines, we
recommend setting |
verbose |
if |
n.report |
the interval to report Metropolis sampler acceptance and MCMC progress. Note this is specified in terms of batches and not overall samples for spatial models. |
n.burn |
the number of samples out of the total |
n.thin |
the thinning interval for collection of MCMC samples. The
thinning occurs after the |
n.chains |
the number of chains to run in sequence. |
k.fold |
specifies the number of k folds for cross-validation.
If not specified as an argument, then cross-validation is not performed
and |
k.fold.threads |
number of threads to use for cross-validation. If
|
k.fold.seed |
seed used to split data set into |
k.fold.only |
a logical value indicating whether to only perform
cross-validation ( |
monitors |
a character vector used to indicate if only a subset of the
model parameters are desired to be monitored. If posterior samples of all parameters
are desired, then don't specify the argument (this is the default). When working
with a large number of species and/or sites, the full model object can be quite
large, and so this argument can be used to only return samples of specific
parameters to help reduce the size of this resulting object. Valid tags include
|
keep.only.mean.95 |
not currently supported. |
shared.spatial |
a logical value used to specify whether a common spatial process
should be estimated for all species instead of the factor modeling approach. If true,
a spatial variance parameter |
... |
currently no additional arguments |
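The interplay of n.batch, batch.length, accept.rate, n.burn, and n.thin can be illustrated with a toy adaptive Metropolis sampler in the spirit of Roberts and Rosenthal (2009). This is a sketch for a one-dimensional standard normal target, not the sampler used by sfJSDM; the update rule for the tuning value and the burn-in-then-thin ordering are assumptions made for the illustration.

```r
set.seed(6)
n.batch <- 200
batch.length <- 25
accept.rate <- 0.43

log.target <- function(x) dnorm(x, log = TRUE)   # toy target density
x <- 0
log.tune <- log(10)   # deliberately poor starting proposal SD
samples <- numeric(n.batch * batch.length)
accept.by.batch <- numeric(n.batch)

for (b in 1:n.batch) {
  n.accept <- 0
  for (i in 1:batch.length) {
    # Random walk Metropolis step with the current tuning value.
    x.prop <- rnorm(1, x, exp(log.tune))
    if (log(runif(1)) < log.target(x.prop) - log.target(x)) {
      x <- x.prop
      n.accept <- n.accept + 1
    }
    samples[(b - 1) * batch.length + i] <- x
  }
  accept.by.batch[b] <- n.accept / batch.length
  # After each batch, nudge the log tuning value toward the target rate.
  delta <- min(0.01, 1 / sqrt(b))
  if (accept.by.batch[b] > accept.rate) {
    log.tune <- log.tune + delta
  } else {
    log.tune <- log.tune - delta
  }
}

# Discard burn-in, then thin what remains.
n.burn <- round(0.10 * n.batch * batch.length)
n.thin <- 1
keep <- seq(n.burn + 1, n.batch * batch.length, by = n.thin)
length(keep)
```

The total number of iterations per chain is n.batch * batch.length, so increasing either argument lengthens the chain, while only n.batch controls how often the tuning value is adjusted.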
An object of class sfJSDM that is a list comprised of:
beta.comm.samples |
a |
tau.sq.beta.samples |
a |
beta.samples |
a |
theta.samples |
a |
lambda.samples |
a |
psi.samples |
a three-dimensional array of posterior samples for the latent occurrence probability values for each species. |
w.samples |
a three-dimensional array of posterior samples for
the latent spatial random effects for each latent factor. Array
dimensions correspond to MCMC sample, latent factor, and site.
If |
sigma.sq.psi.samples |
a |
beta.star.samples |
a |
like.samples |
a three-dimensional array of posterior samples for the likelihood value associated with each site and species. Used for calculating WAIC. |
rhat |
a list of Gelman-Rubin diagnostic values for some of the model parameters. |
ESS |
a list of effective sample sizes for some of the model parameters. |
run.time |
MCMC sampler execution time reported using |
k.fold.deviance |
vector of scoring rules (deviance) from k-fold cross-validation.
A separate value is reported for each species.
Only included if |
The return object will include additional objects used for
subsequent prediction and/or model fit evaluation. Note that detection probability
estimated values are not included in the model object, but can be extracted
using fitted().
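As a rough illustration of how likelihood samples such as like.samples feed into WAIC, the following base R sketch computes the log pointwise predictive density, the effective number of parameters, and WAIC separately for each species. The array here is mock data, and its dimension ordering (MCMC sample, species, site) is an assumption; this is a generic WAIC calculation, not the package's own routine.

```r
set.seed(7)
n.samples <- 300
N <- 4    # species
J <- 15   # sites

# Mock likelihood values in (0, 1): sample x species x site (assumed order).
like.samples <- array(runif(n.samples * N * J, 0.05, 0.95),
                      dim = c(n.samples, N, J))

# Log pointwise predictive density for each species: log of the posterior
# mean likelihood at each site, summed over sites.
lppd <- apply(like.samples, 2, function(a) sum(log(colMeans(a))))

# Effective number of parameters: posterior variance of the log-likelihood
# at each site, summed over sites.
pD <- apply(like.samples, 2, function(a) sum(apply(log(a), 2, var)))

# WAIC on the deviance scale, one value per species.
waic <- -2 * (lppd - pD)
waic
```

Lower WAIC values indicate better expected predictive performance when comparing models fit to the same data.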
Some of the underlying code used for generating random numbers from the Polya-Gamma distribution is taken from the pgdraw package written by Daniel F. Schmidt and Enes Makalic. Their code implements Algorithm 6 in PhD thesis of Jesse Bennett Windle (2013) https://repositories.lib.utexas.edu/handle/2152/21842.
Jeffrey W. Doser,
Andrew O. Finley
Datta, A., S. Banerjee, A.O. Finley, and A.E. Gelfand. (2016) Hierarchical Nearest-Neighbor Gaussian process models for large geostatistical datasets. Journal of the American Statistical Association, doi:10.1080/01621459.2015.1044091.
Finley, A.O., A. Datta, B.D. Cook, D.C. Morton, H.E. Andersen, and S. Banerjee. (2019) Efficient algorithms for Bayesian Nearest Neighbor Gaussian Processes. Journal of Computational and Graphical Statistics, doi:10.1080/10618600.2018.1537924.
Finley, A. O., Datta, A., and Banerjee, S. (2020). spNNGP R package for nearest neighbor Gaussian process models. arXiv preprint arXiv:2001.09111.
Polson, N.G., J.G. Scott, and J. Windle. (2013) Bayesian Inference for Logistic Models Using Polya-Gamma Latent Variables. Journal of the American Statistical Association, 108:1339-1349.
Roberts, G.O. and Rosenthal J.S. (2009) Examples of adaptive MCMC. Journal of Computational and Graphical Statistics, 18(2):349-367.
Bates, Douglas, Martin Maechler, Ben Bolker, Steve Walker (2015). Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software, 67(1), 1-48. doi:10.18637/jss.v067.i01.
Hooten, M. B., and Hobbs, N. T. (2015). A guide to Bayesian model selection for ecologists. Ecological Monographs, 85(1), 3-28.
Christensen, W. F., and Amemiya, Y. (2002). Latent variable analysis of multivariate spatial data. Journal of the American Statistical Association, 97(457), 302-317.
J.x <- 8
J.y <- 8
J <- J.x * J.y
n.rep <- sample(2:4, size = J, replace = TRUE)
N <- 6
# Community-level covariate effects
# Occurrence
beta.mean <- c(0.2)
p.occ <- length(beta.mean)
tau.sq.beta <- c(0.6)
# Detection
alpha.mean <- c(0)
tau.sq.alpha <- c(1)
p.det <- length(alpha.mean)
# Random effects
psi.RE <- list()
p.RE <- list()
# Draw species-level effects from community means.
beta <- matrix(NA, nrow = N, ncol = p.occ)
alpha <- matrix(NA, nrow = N, ncol = p.det)
for (i in 1:p.occ) {
  beta[, i] <- rnorm(N, beta.mean[i], sqrt(tau.sq.beta[i]))
}
for (i in 1:p.det) {
  alpha[, i] <- rnorm(N, alpha.mean[i], sqrt(tau.sq.alpha[i]))
}
alpha.true <- alpha
n.factors <- 3
phi <- rep(3 / .7, n.factors)
sigma.sq <- rep(2, n.factors)
nu <- rep(2, n.factors)
dat <- simMsOcc(J.x = J.x, J.y = J.y, n.rep = n.rep, N = N, beta = beta,
                alpha = alpha, psi.RE = psi.RE, p.RE = p.RE, sp = TRUE,
                sigma.sq = sigma.sq, phi = phi, nu = nu, cov.model = 'matern',
                factor.model = TRUE, n.factors = n.factors)
pred.indx <- sample(1:J, round(J * .25), replace = FALSE)
y <- dat$y[, -pred.indx, , drop = FALSE]
# Occupancy covariates
X <- dat$X[-pred.indx, , drop = FALSE]
coords <- as.matrix(dat$coords[-pred.indx, , drop = FALSE])
# Prediction covariates
X.0 <- dat$X[pred.indx, , drop = FALSE]
coords.0 <- as.matrix(dat$coords[pred.indx, , drop = FALSE])
# Detection covariates
X.p <- dat$X.p[-pred.indx, , , drop = FALSE]
y <- apply(y, c(1, 2), max, na.rm = TRUE)
data.list <- list(y = y, coords = coords)
# Priors
prior.list <- list(beta.comm.normal = list(mean = 0, var = 2.72),
                   tau.sq.beta.ig = list(a = 0.1, b = 0.1),
                   nu.unif = list(0.5, 2.5))
# Starting values
inits.list <- list(beta.comm = 0, beta = 0, fix = TRUE, tau.sq.beta = 1)
# Tuning
tuning.list <- list(phi = 1, nu = 0.25)
batch.length <- 25
n.batch <- 5
n.report <- 100
formula <- ~ 1
# Note that this is just a test case and more iterations/chains may need to
# be run to ensure convergence.
out <- sfJSDM(formula = formula,
              data = data.list,
              inits = inits.list,
              n.batch = n.batch,
              batch.length = batch.length,
              accept.rate = 0.43,
              priors = prior.list,
              cov.model = "matern",
              tuning = tuning.list,
              n.factors = 3,
              n.omp.threads = 1,
              verbose = TRUE,
              NNGP = TRUE,
              n.neighbors = 5,
              search.type = 'cb',
              n.report = 10,
              n.burn = 0,
              n.thin = 1,
              n.chains = 2)
summary(out)
The function sfMsPGOcc fits multi-species spatial occupancy models with species correlations (i.e., a spatially-explicit joint species distribution model with imperfect detection). We use Polya-Gamma latent variables and a spatial factor modeling approach. Currently, models are implemented using a Nearest Neighbor Gaussian Process.
sfMsPGOcc(occ.formula, det.formula, data, inits, priors, tuning, cov.model = 'exponential', NNGP = TRUE, n.neighbors = 15, search.type = 'cb', n.factors, n.batch, batch.length, accept.rate = 0.43, n.omp.threads = 1, verbose = TRUE, n.report = 100, n.burn = round(.10 * n.batch * batch.length), n.thin = 1, n.chains = 1, k.fold, k.fold.threads = 1, k.fold.seed, k.fold.only = FALSE, ...)
occ.formula |
a symbolic description of the model to be fit for the occurrence portion of the model using R's model syntax. Only the right-hand side of the formula is specified. Random intercepts are allowed using lme4 syntax (Bates et al. 2015). See example below. |
det.formula |
a symbolic description of the model to be fit for the detection portion of the model using R's model syntax. Only the right-hand side of the formula is specified. Random intercepts are allowed using lme4 syntax (Bates et al. 2015). See example below. |
data |
a list containing data necessary for model fitting.
Valid tags are |
inits |
a list with each tag corresponding to a parameter name.
Valid tags are |
priors |
a list with each tag corresponding to a parameter name.
Valid tags are |
tuning |
a list with each tag corresponding to a parameter
name. Valid tags are |
cov.model |
a quoted keyword that specifies the covariance
function used to model the spatial dependence structure among the
observations. Supported covariance model key words are:
|
NNGP |
if |
n.neighbors |
number of neighbors used in the NNGP. Only used if
|
search.type |
a quoted keyword that specifies the type of nearest
neighbor search algorithm. Supported method key words are: |
n.factors |
the number of factors to use in the spatial factor model approach. Typically, the number of factors is set to be small (e.g., 4-5) relative to the total number of species in the community, which will lead to substantial decreases in computation time. However, the value can be anywhere between 1 and N (the number of species in the community). |
n.batch |
the number of MCMC batches in each chain to run for the Adaptive MCMC sampler. See Roberts and Rosenthal (2009) for details. |
batch.length |
the length of each MCMC batch to run for the Adaptive MCMC sampler. See Roberts and Rosenthal (2009) for details. |
accept.rate |
target acceptance rate for Adaptive MCMC. Default is 0.43. See Roberts and Rosenthal (2009) for details. |
n.omp.threads |
a positive integer indicating
the number of threads to use for SMP parallel processing within chains. The package must
be compiled for OpenMP support. For most Intel-based machines, we
recommend setting |
verbose |
if |
n.report |
the interval to report Metropolis sampler acceptance and MCMC progress. Note this is specified in terms of batches and not overall samples for spatial models. |
n.burn |
the number of samples out of the total |
n.thin |
the thinning interval for collection of MCMC samples. The
thinning occurs after the |
n.chains |
the number of chains to run in sequence. |
k.fold |
specifies the number of k folds for cross-validation.
If not specified as an argument, then cross-validation is not performed
and |
k.fold.threads |
number of threads to use for cross-validation. If
|
k.fold.seed |
seed used to split data set into |
k.fold.only |
a logical value indicating whether to only perform
cross-validation ( |
... |
currently no additional arguments |
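As a worked example of how n.batch, batch.length, n.burn, n.thin, and n.chains combine, the sketch below counts the posterior samples retained for the settings used in the example for this function. It assumes that burn-in samples are discarded first and that thinning is applied to the remainder; the names kept.per.chain and n.post are illustrative only.

```r
# Settings taken from the sfMsPGOcc example.
n.batch <- 10
batch.length <- 25
n.burn <- 50
n.thin <- 1
n.chains <- 1
# Total MCMC iterations per chain.
n.samples <- n.batch * batch.length
# Iterations kept per chain after burn-in and thinning (assumed order).
kept.per.chain <- length(seq(from = n.burn + 1, to = n.samples, by = n.thin))
# Total posterior samples across chains.
n.post <- kept.per.chain * n.chains
n.post
```

With these values each chain runs 250 iterations and 200 are retained.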
An object of class sfMsPGOcc
that is a list comprised of:
beta.comm.samples |
a |
alpha.comm.samples |
a |
tau.sq.beta.samples |
a |
tau.sq.alpha.samples |
a |
beta.samples |
a |
alpha.samples |
a |
theta.samples |
a |
lambda.samples |
a |
z.samples |
a three-dimensional array of posterior samples for the latent occurrence values for each species. |
psi.samples |
a three-dimensional array of posterior samples for the latent occupancy probability values for each species. |
w.samples |
a three-dimensional array of posterior samples for the latent spatial random effects for each latent factor. |
sigma.sq.psi.samples |
a |
sigma.sq.p.samples |
a |
beta.star.samples |
a |
alpha.star.samples |
a |
like.samples |
a three-dimensional array of posterior samples for the likelihood value associated with each site and species. Used for calculating WAIC. |
rhat |
a list of Gelman-Rubin diagnostic values for some of the model parameters. |
ESS |
a list of effective sample sizes for some of the model parameters. |
run.time |
MCMC sampler execution time reported using |
k.fold.deviance |
vector of scoring rules (deviance) from k-fold cross-validation.
A separate value is reported for each species.
Only included if |
The return object will include additional objects used for
subsequent prediction and/or model fit evaluation. Note that detection
probability estimated values are not included in the model object, but can
be extracted using fitted().
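The three-dimensional sample arrays in the returned list can be summarized with apply(). The sketch below uses simulated stand-in arrays rather than a fitted model, and it assumes the dimensions are ordered as posterior sample, species, and site; check dim() on a real model object before relying on that order.

```r
set.seed(2)
n.post <- 200  # hypothetical number of retained posterior samples
N <- 8         # species
J <- 49        # sites
# Stand-in for out$psi.samples (assumed order: samples x species x sites).
psi.samples <- array(runif(n.post * N * J), dim = c(n.post, N, J))
# Posterior mean occupancy probability for each species and site.
psi.mean <- apply(psi.samples, c(2, 3), mean)
# 95% credible interval bounds for each species and site.
psi.ci <- apply(psi.samples, c(2, 3), quantile, probs = c(0.025, 0.975))
# Stand-in for out$z.samples; summing over species gives richness samples.
z.samples <- array(rbinom(n.post * N * J, 1, 0.4), dim = c(n.post, N, J))
rich.samples <- apply(z.samples, c(1, 3), sum)
rich.mean <- colMeans(rich.samples)
dim(psi.mean)
dim(psi.ci)
```

Summing the latent occurrence samples over species before averaging propagates posterior uncertainty into the derived richness estimate.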
Some of the underlying code used for generating random numbers from the Polya-Gamma distribution is taken from the pgdraw package written by Daniel F. Schmidt and Enes Makalic. Their code implements Algorithm 6 in the PhD thesis of Jesse Bennett Windle (2013) https://repositories.lib.utexas.edu/handle/2152/21842.
Jeffrey W. Doser,
Andrew O. Finley
Datta, A., S. Banerjee, A.O. Finley, and A.E. Gelfand. (2016) Hierarchical Nearest-Neighbor Gaussian process models for large geostatistical datasets. Journal of the American Statistical Association, doi:10.1080/01621459.2015.1044091.
Finley, A.O., A. Datta, B.D. Cook, D.C. Morton, H.E. Andersen, and S. Banerjee. (2019) Efficient algorithms for Bayesian Nearest Neighbor Gaussian Processes. Journal of Computational and Graphical Statistics, doi:10.1080/10618600.2018.1537924.
Finley, A. O., Datta, A., and Banerjee, S. (2020). spNNGP R package for nearest neighbor Gaussian process models. arXiv preprint arXiv:2001.09111.
Polson, N.G., J.G. Scott, and J. Windle. (2013) Bayesian Inference for Logistic Models Using Polya-Gamma Latent Variables. Journal of the American Statistical Association, 108:1339-1349.
Roberts, G.O. and Rosenthal J.S. (2009) Examples of adaptive MCMC. Journal of Computational and Graphical Statistics, 18(2):349-367.
Bates, Douglas, Martin Maechler, Ben Bolker, Steve Walker (2015). Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software, 67(1), 1-48. doi:10.18637/jss.v067.i01.
Hooten, M. B., and Hobbs, N. T. (2015). A guide to Bayesian model selection for ecologists. Ecological Monographs, 85(1), 3-28.
Christensen, W. F., and Amemiya, Y. (2002). Latent variable analysis of multivariate spatial data. Journal of the American Statistical Association, 97(457), 302-317.
set.seed(400)
# Simulate Data -----------------------------------------------------------
J.x <- 7
J.y <- 7
J <- J.x * J.y
n.rep <- sample(2:4, size = J, replace = TRUE)
N <- 8
# Community-level covariate effects
# Occurrence
beta.mean <- c(0.2, -0.15)
p.occ <- length(beta.mean)
tau.sq.beta <- c(0.6, 0.3)
# Detection
alpha.mean <- c(0.5, 0.2, -.2)
tau.sq.alpha <- c(0.2, 0.3, 0.8)
p.det <- length(alpha.mean)
# Random effects
psi.RE <- list()
# Include a non-spatial random effect on occurrence
psi.RE <- list(levels = c(20), sigma.sq.psi = c(0.5))
p.RE <- list()
# Include a random effect on detection
p.RE <- list(levels = c(40), sigma.sq.p = c(2))
# Draw species-level effects from community means.
beta <- matrix(NA, nrow = N, ncol = p.occ)
alpha <- matrix(NA, nrow = N, ncol = p.det)
for (i in 1:p.occ) {
  beta[, i] <- rnorm(N, beta.mean[i], sqrt(tau.sq.beta[i]))
}
for (i in 1:p.det) {
  alpha[, i] <- rnorm(N, alpha.mean[i], sqrt(tau.sq.alpha[i]))
}
n.factors <- 4
phi <- runif(n.factors, 3/1, 3/.4)
dat <- simMsOcc(J.x = J.x, J.y = J.y, n.rep = n.rep, N = N, beta = beta,
                alpha = alpha, phi = phi, sp = TRUE,
                cov.model = 'exponential', factor.model = TRUE,
                n.factors = n.factors, psi.RE = psi.RE, p.RE = p.RE)
# Number of batches
n.batch <- 10
# Batch length
batch.length <- 25
n.samples <- n.batch * batch.length
y <- dat$y
X <- dat$X
X.p <- dat$X.p
X.p.re <- dat$X.p.re
X.re <- dat$X.re
coords <- as.matrix(dat$coords)
# Package all data into a list
occ.covs <- cbind(X, X.re)
colnames(occ.covs) <- c('int', 'occ.cov', 'occ.re')
det.covs <- list(det.cov.1 = X.p[, , 2],
                 det.cov.2 = X.p[, , 3],
                 det.re = X.p.re[, , 1])
data.list <- list(y = y, occ.covs = occ.covs, det.covs = det.covs,
                  coords = coords)
# Priors
prior.list <- list(beta.comm.normal = list(mean = 0, var = 2.72),
                   alpha.comm.normal = list(mean = 0, var = 2.72),
                   tau.sq.beta.ig = list(a = 0.1, b = 0.1),
                   tau.sq.alpha.ig = list(a = 0.1, b = 0.1),
                   phi.unif = list(a = 3/1, b = 3/.1))
# Initial values
lambda.inits <- matrix(0, N, n.factors)
diag(lambda.inits) <- 1
lambda.inits[lower.tri(lambda.inits)] <- rnorm(sum(lower.tri(lambda.inits)))
inits.list <- list(alpha.comm = 0, beta.comm = 0, beta = 0, alpha = 0,
                   tau.sq.beta = 1, tau.sq.alpha = 1, phi = 3 / .5,
                   lambda = lambda.inits,
                   z = apply(y, c(1, 2), max, na.rm = TRUE))
# Tuning
tuning.list <- list(phi = 1)
# Note that this is just a test case and more iterations/chains may need to
# be run to ensure convergence.
out <- sfMsPGOcc(occ.formula = ~ occ.cov + (1 | occ.re),
                 det.formula = ~ det.cov.1 + det.cov.2 + (1 | det.re),
                 data = data.list, inits = inits.list, n.batch = n.batch,
                 batch.length = batch.length, accept.rate = 0.43,
                 priors = prior.list, cov.model = "exponential",
                 tuning = tuning.list, n.omp.threads = 1, verbose = TRUE,
                 NNGP = TRUE, n.neighbors = 5, n.factors = n.factors,
                 search.type = 'cb', n.report = 10, n.burn = 50,
                 n.thin = 1, n.chains = 1)
summary(out)
The function simBinom
simulates single-species binomial data for simulation studies, power assessments, or function testing. Data can be optionally simulated with a spatial Gaussian Process in the model. Non-spatial random intercepts can also be included in the model.
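The data-generating process described above can be sketched in base R for the non-spatial case. This is an illustration and not the package's internal code: it assumes a logit link (consistent with the logistic models fit by the package), and the covariate, the number of random-effect levels, and the object names are chosen for the example.

```r
set.seed(3)
J.x <- 10
J.y <- 10
J <- J.x * J.y
# Number of binomial trials at each site.
weights <- rep(4, J)
# Intercept and one regression coefficient.
beta <- c(0.5, -0.15)
# Design matrix with an intercept and one standard normal covariate.
X <- cbind(1, rnorm(J))
# Non-spatial random intercept with 10 levels and variance 1.2.
n.levels <- 10
sigma.sq.psi <- 1.2
beta.star <- rnorm(n.levels, 0, sqrt(sigma.sq.psi))
X.re <- sample(1:n.levels, J, replace = TRUE)
# Success probability on the logit scale, then binomial counts.
psi <- plogis(drop(X %*% beta) + beta.star[X.re])
y <- rbinom(J, size = weights, prob = psi)
head(y)
```

A spatial version would add a Gaussian process realization to the linear predictor before applying plogis().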
simBinom(J.x, J.y, weights, beta, psi.RE = list(), sp = FALSE, svc.cols = 1, cov.model, sigma.sq, phi, nu, x.positive = FALSE, ...)
J.x |
a single numeric value indicating the number of sites to simulate data along the horizontal axis. Total number of sites with simulated data is |
J.y |
a single numeric value indicating the number of sites to simulate data along the vertical axis. Total number of sites with simulated data is |
weights |
a numeric vector of length |
beta |
a numeric vector containing the intercept and regression coefficient parameters for the model. |
psi.RE |
a list used to specify the non-spatial random intercepts included in the model. The list must have two tags: |
sp |
a logical value indicating whether to simulate a spatially-explicit occupancy model with a Gaussian process. By default set to |
svc.cols |
a vector indicating the variables whose effects will be
estimated as spatially-varying coefficients. |
cov.model |
a quoted keyword that specifies the covariance function used to model the spatial dependence structure among the latent occurrence values. Supported covariance model key words are: |
sigma.sq |
a numeric value indicating the spatial variance parameter. Ignored when |
phi |
a numeric value indicating the spatial decay parameter. Ignored when |
nu |
a numeric value indicating the spatial smoothness parameter. Only used when |
x.positive |
a logical value indicating whether the simulated covariates should be simulated as random standard normal covariates ( |
... |
currently no additional arguments |
A list comprised of:
X |
a |
coords |
a |
w |
a matrix of the spatial random effect values for each site. The number of columns is determined by the |
psi |
a |
y |
a length |
X.w |
a two-dimensional matrix containing the covariates (including an intercept) whose effects are assumed to be spatially varying. Rows correspond to sites and columns correspond to covariates. |
X.re |
a numeric matrix containing the levels of any unstructured random effect included in the model. Only relevant when random effects are specified in |
beta.star |
a numeric vector that contains the simulated random effects for each given level of the random effects included in the model. Only relevant when random effects are included in the model. |
Jeffrey W. Doser,
Andrew O. Finley
set.seed(400)
J.x <- 10
J.y <- 10
weights <- rep(4, J.x * J.y)
beta <- c(0.5, -0.15)
svc.cols <- c(1, 2)
phi <- c(3 / .6, 3 / 0.2)
sigma.sq <- c(1.2, 0.9)
psi.RE <- list(levels = 10, sigma.sq.psi = 1.2)
dat <- simBinom(J.x = J.x, J.y = J.y, weights = weights, beta = beta,
                psi.RE = psi.RE, sp = TRUE, svc.cols = svc.cols,
                cov.model = 'spherical', sigma.sq = sigma.sq, phi = phi)
The function simIntMsOcc
simulates multi-species detection-nondetection data from multiple data sources for simulation studies, power assessments, or function testing of integrated occupancy models. Data can optionally be simulated with a spatial Gaussian Process on the occurrence process.
simIntMsOcc(n.data, J.x, J.y, J.obs, n.rep, n.rep.max, N, beta, alpha, psi.RE = list(), p.RE = list(), sp = FALSE, svc.cols = 1, cov.model, sigma.sq, phi, nu, factor.model = FALSE, n.factors, range.probs, ...)
n.data |
an integer indicating the number of detection-nondetection data sources to simulate. |
J.x |
a single numeric value indicating the number of sites across the region of interest along the horizontal axis. Total number of sites across the simulated region of interest is |
J.y |
a single numeric value indicating the number of sites across the region of interest along the vertical axis. Total number of sites across the simulated region of interest is |
J.obs |
a numeric vector of length |
n.rep |
a list of length |
n.rep.max |
a vector of numeric values indicating the maximum number of replicate surveys for each data set. This is an optional argument, with its default value set to |
N |
a numeric vector of length |
beta |
a numeric matrix with |
alpha |
a list of length |
psi.RE |
a list used to specify the non-spatial random intercepts included in the occurrence portion of the model. The list must have two tags: |
p.RE |
this argument is not currently supported. In a later version, this argument will allow for simulating data with detection random effects in the different data sources. |
sp |
a logical value indicating whether to simulate a spatially-explicit occupancy model with a Gaussian process. By default set to |
svc.cols |
a vector indicating the variables whose effects will be
estimated as spatially-varying coefficients. |
cov.model |
a quoted keyword that specifies the covariance function used to model the spatial dependence structure among the latent occurrence values. Supported covariance model key words are: |
sigma.sq |
a numeric vector of length |
phi |
a numeric vector of length |
nu |
a numeric vector of length |
factor.model |
a logical value indicating whether to simulate data following a factor modeling approach that explicitly incorporates species correlations. If |
n.factors |
a single numeric value specifying the number of latent factors to use to simulate the data if |
range.probs |
a numeric vector of length |
... |
currently no additional arguments |
A list comprised of:
X.obs |
a numeric design matrix for the occurrence portion of the model. This matrix contains the intercept and covariates for only the observed sites. |
X.pred |
a numeric design matrix for the occurrence portion of the model at sites where there are no observed data sources. |
X.p |
a list of design matrices for the detection portions of the integrated multi-species occupancy model. Each element in the list is a design matrix of detection covariates for each data source. |
coords.obs |
a numeric matrix of coordinates of each observed site. Required for spatial models. |
coords.pred |
a numeric matrix of coordinates of each site in the study region without any data sources. Only used for spatial models. |
w |
a species (or factor) x site matrix of the spatial random effects for each species. Only used to simulate data when |
w.pred |
a matrix of the spatial random effects for each species (or factor) at locations without any observation. |
psi.obs |
a species x site matrix of the occurrence probabilities for each species at the observed sites. Note that values are provided for all species, even if some species are only monitored at a subset of these points. |
psi.pred |
a species x site matrix of the occurrence probabilities for sites without any observations. |
z.obs |
a species x site matrix of the latent occurrence states at each observed site. Note that values are provided for all species, even if some species are only monitored at a subset of these points. |
z.pred |
a species x site matrix of the latent occurrence states at each site without any observations. |
p |
a list of detection probability arrays for each of the |
y |
a list of arrays of the raw detection-nondetection data for each site and replicate combination for each species in the data set. Each array has dimensions corresponding to species, site, and replicate, respectively. |
Jeffrey W. Doser
Doser, J. W., Leuenberger, W., Sillett, T. S., Hallworth, M. T. & Zipkin, E. F. (2022). Integrated community occupancy models: A framework to assess occurrence and biodiversity dynamics using multiple data sources. Methods in Ecology and Evolution, 00, 1-14. doi:10.1111/2041-210X.13811
set.seed(91)
J.x <- 10
J.y <- 10
# Total number of sites across the study region
J.all <- J.x * J.y
# Number of data sources.
n.data <- 2
# Sites for each data source.
J.obs <- sample(ceiling(0.2 * J.all):ceiling(0.5 * J.all), n.data, replace = TRUE)
n.rep <- list()
n.rep[[1]] <- rep(3, J.obs[1])
n.rep[[2]] <- rep(4, J.obs[2])
# Number of species observed in each data source
N <- c(8, 3)
# Community-level covariate effects
# Occurrence
beta.mean <- c(0.2, 0.5)
p.occ <- length(beta.mean)
tau.sq.beta <- c(0.4, 0.3)
# Detection
# Detection covariates
alpha.mean <- list()
tau.sq.alpha <- list()
# Number of detection parameters in each data source
p.det.long <- c(4, 3)
for (i in 1:n.data) {
  alpha.mean[[i]] <- runif(p.det.long[i], -1, 1)
  tau.sq.alpha[[i]] <- runif(p.det.long[i], 0.1, 1)
}
# Random effects
psi.RE <- list()
p.RE <- list()
# Draw species-level occurrence effects from community means.
beta <- matrix(NA, nrow = max(N), ncol = p.occ)
for (i in 1:p.occ) {
  beta[, i] <- rnorm(max(N), beta.mean[i], sqrt(tau.sq.beta[i]))
}
# Draw species-level detection effects for each data source.
alpha <- list()
for (i in 1:n.data) {
  alpha[[i]] <- matrix(NA, nrow = N[i], ncol = p.det.long[i])
  for (t in 1:p.det.long[i]) {
    alpha[[i]][, t] <- rnorm(N[i], alpha.mean[[i]][t], sqrt(tau.sq.alpha[[i]])[t])
  }
}
sp <- FALSE
factor.model <- FALSE
# Simulate occupancy data
# (n.factors is omitted because factor.model = FALSE)
dat <- simIntMsOcc(n.data = n.data, J.x = J.x, J.y = J.y, J.obs = J.obs,
                   n.rep = n.rep, N = N, beta = beta, alpha = alpha,
                   psi.RE = psi.RE, p.RE = p.RE, sp = sp,
                   factor.model = factor.model)
str(dat)
The function simIntOcc
simulates single-species detection-nondetection data from multiple data sources for simulation studies, power assessments, or function testing of integrated occupancy models. Data can optionally be simulated with a spatial Gaussian Process on the occurrence process.
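The idea behind an integrated data set can be sketched in base R: every data source observes the same latent occurrence state, but each has its own sites, number of replicates, and detection coefficients. The sketch is non-spatial and is not the package's internal code; the site subsets, coefficient values, and object names are assumptions made for illustration.

```r
set.seed(4)
J <- 100      # sites in the region
n.data <- 2   # data sources
# Shared occurrence process.
beta <- c(0.5, 1)
X <- cbind(1, rnorm(J))
psi <- plogis(drop(X %*% beta))
z <- rbinom(J, 1, psi)
# Each data source surveys its own subset of sites.
sites <- list(sample(1:J, 30), sample(1:J, 40))
n.rep <- c(3, 4)
# Source-specific detection coefficients (intercept and one covariate).
alpha <- list(c(0.2, -0.5), c(-0.3, 0.4))
y <- vector("list", n.data)
for (i in 1:n.data) {
  J.i <- length(sites[[i]])
  y[[i]] <- matrix(NA, nrow = J.i, ncol = n.rep[i])
  for (k in 1:n.rep[i]) {
    X.p <- cbind(1, rnorm(J.i))
    p <- plogis(drop(X.p %*% alpha[[i]]))
    # Detection is only possible where the species is present.
    y[[i]][, k] <- rbinom(J.i, 1, z[sites[[i]]] * p)
  }
}
sapply(y, dim)
```

Because both data sources condition on the same z, fitting them jointly lets each source inform the shared occurrence parameters.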
simIntOcc(n.data, J.x, J.y, J.obs, n.rep, n.rep.max, beta, alpha, psi.RE = list(), p.RE = list(), sp = FALSE, cov.model, sigma.sq, phi, nu, ...)
n.data |
an integer indicating the number of detection-nondetection data sources to simulate. |
J.x |
a single numeric value indicating the number of sites across the region of interest along the horizontal axis. Total number of sites across the simulated region of interest is |
J.y |
a single numeric value indicating the number of sites across the region of interest along the vertical axis. Total number of sites across the simulated region of interest is |
J.obs |
a numeric vector of length |
n.rep |
a list of length |
n.rep.max |
a vector of numeric values indicating the maximum number of replicate surveys for each data set. This is an optional argument, with its default value set to |
beta |
a numeric vector containing the intercept and regression coefficient parameters for the occurrence portion of the single-species occupancy model. |
alpha |
a list of length |
psi.RE |
a list used to specify the non-spatial random intercepts included in the occupancy portion of the model. The list must have two tags: |
p.RE |
a list used to specify the non-spatial random intercepts included in the detection portion of the model. The list must be a list of lists, where the individual lists contain the detection coefficients for each data set in the integrated model. Each of the lists must have two tags: |
sp |
a logical value indicating whether to simulate a spatially-explicit occupancy model with a Gaussian process. By default set to |
cov.model |
a quoted keyword that specifies the covariance function used to model the spatial dependence structure among the latent occurrence values. Supported covariance model key words are: |
sigma.sq |
a numeric value indicating the spatial variance parameter. Ignored when |
phi |
a numeric value indicating the spatial decay parameter. Ignored when |
nu |
a numeric value indicating the spatial smoothness parameter. Only used when |
... |
currently no additional arguments |
A list comprised of:
X.obs |
a numeric design matrix for the occurrence portion of the model. This matrix contains the intercept and covariates for only the observed sites. |
X.pred |
a numeric design matrix for the occurrence portion of the model at sites where there are no observed data sources. |
X.p |
a list of design matrices for the detection portions of the integrated occupancy model. Each element in the list is a design matrix of detection covariates for each data source. |
coords.obs |
a numeric matrix of coordinates of each observed site. Required for spatial models. |
coords.pred |
a numeric matrix of coordinates of each site in the study region without any data sources. Only used for spatial models. |
D.obs |
a distance matrix of observed sites. Only used for spatial models. |
D.pred |
a distance matrix of sites in the study region without any observed data. Only used for spatial models. |
w.obs |
a matrix of the spatial random effects at observed locations. Only used to simulate data when |
.
w.pred |
a matrix of the spatial random effects at locations without any observation. |
psi.obs |
a matrix of the occurrence probabilities for each observed site. |
psi.pred |
a matrix of the occurrence probabilities for sites without any observations. |
z.obs |
a vector of the latent occurrence states at each observed site. |
z.pred |
a vector of the latent occurrence states at each site without any observations. |
p |
a list of detection probability matrices for each of the |
y |
a list of matrices of the raw detection-nondetection data for each site and replicate combination. |
Jeffrey W. Doser,
Andrew O. Finley
set.seed(400)
# Simulate Data -----------------------------------------------------------
J.x <- 15
J.y <- 15
J.all <- J.x * J.y
# Number of data sources.
n.data <- 4
# Sites for each data source.
J.obs <- sample(ceiling(0.2 * J.all):ceiling(0.5 * J.all), n.data, replace = TRUE)
# Replicates for each data source.
n.rep <- list()
for (i in 1:n.data) {
  n.rep[[i]] <- sample(1:4, size = J.obs[i], replace = TRUE)
}
# Occupancy covariates
beta <- c(0.5, 1, -3)
p.occ <- length(beta)
# Detection covariates
alpha <- list()
for (i in 1:n.data) {
  alpha[[i]] <- runif(sample(1:4, 1), -1, 1)
}
p.det.long <- sapply(alpha, length)
p.det <- sum(p.det.long)
sigma.sq <- 2
phi <- 3 / .5
sp <- TRUE
# Simulate occupancy data.
dat <- simIntOcc(n.data = n.data, J.x = J.x, J.y = J.y, J.obs = J.obs,
                 n.rep = n.rep, beta = beta, alpha = alpha, sp = TRUE,
                 cov.model = 'gaussian', sigma.sq = sigma.sq, phi = phi)
The function simMsOcc
simulates multi-species detection-nondetection data for simulation studies, power assessments, or function testing. Data can be optionally simulated with a spatial Gaussian Process in the occurrence portion of the model, as well as an option to allow for species correlations using a factor modeling approach. Non-spatial random intercepts can also be included in the detection or occurrence portions of the occupancy model.
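A minimal base R sketch of the non-spatial, intercept-only case shows how the species, site, and replicate dimensions fit together. It is an illustration under stated assumptions and not the package's internal code: the sizes are hypothetical, and sites with fewer than the maximum number of replicates are assumed to be padded with NA.

```r
set.seed(5)
J <- 20   # sites
N <- 3    # species
n.rep <- sample(2:4, size = J, replace = TRUE)
n.rep.max <- max(n.rep)
# Species-specific intercepts drawn from community-level distributions.
beta <- rnorm(N, 0.2, sqrt(0.6))
alpha <- rnorm(N, 0.5, sqrt(0.2))
psi <- plogis(beta)   # occurrence probability for each species
p <- plogis(alpha)    # detection probability for each species
# Latent occurrence states (species x sites).
z <- matrix(rbinom(N * J, 1, rep(psi, times = J)), nrow = N, ncol = J)
# Detection-nondetection data (species x sites x replicates).
y <- array(NA, dim = c(N, J, n.rep.max))
for (j in 1:J) {
  for (k in 1:n.rep[j]) {
    y[, j, k] <- rbinom(N, 1, z[, j] * p)
  }
}
# Naive occurrence states: 1 if the species was ever detected at the site.
z.naive <- apply(y, c(1, 2), max, na.rm = TRUE)
dim(y)
```

The final step is the same operation used to build initial values for z in the model-fitting examples; it can only understate occurrence because a present species may go undetected on every visit.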
simMsOcc(J.x, J.y, n.rep, n.rep.max, N, beta, alpha, psi.RE = list(), p.RE = list(), sp = FALSE, svc.cols = 1, cov.model, sigma.sq, phi, nu, factor.model = FALSE, n.factors, range.probs, shared.spatial = FALSE, grid, ...)
J.x |
a single numeric value indicating the number of sites to simulate detection-nondetection data along the horizontal axis. Total number of sites with simulated data is |
J.y |
a single numeric value indicating the number of sites to simulate detection-nondetection data along the vertical axis. Total number of sites with simulated data is |
n.rep |
a numeric vector of length |
n.rep.max |
a single numeric value indicating the maximum number of replicate surveys. This is an optional argument, with its default value set to |
N |
a single numeric value indicating the number of species for which to simulate detection-nondetection data. |
beta |
a numeric matrix with |
alpha |
a numeric matrix with |
psi.RE |
a list used to specify the non-spatial random intercepts included in the occurrence portion of the model. The list must have two tags: |
p.RE |
a list used to specify the non-spatial random intercepts included in the detection portion of the model. The list must have two tags: |
sp |
a logical value indicating whether to simulate a spatially-explicit occupancy model with a Gaussian process. By default set to |
svc.cols |
a vector indicating the variables whose effects will be
estimated as spatially-varying coefficients. |
cov.model |
a quoted keyword that specifies the covariance function used to model the spatial dependence structure among the latent occurrence values. Supported covariance model key words are: |
sigma.sq |
a numeric vector of length |
phi |
a numeric vector of length |
nu |
a numeric vector of length |
factor.model |
a logical value indicating whether to simulate data following a factor modeling approach that explicitly incorporates species correlations. If |
n.factors |
a single numeric value specifying the number of latent factors to use to simulate the data if |
range.probs |
a numeric vector of length |
shared.spatial |
a logical value used to specify whether a common spatial process should be estimated for all species instead of the factor modeling approach. |
grid |
an atomic vector used to specify the grid across which to simulate the latent spatial processes. This argument is used to simulate the underlying spatial processes at a different resolution than the coordinates (e.g., if coordinates are distributed across a grid). |
... |
currently no additional arguments |
A list comprised of:
X |
a |
X.p |
a three-dimensional numeric array with dimensions corresponding to sites, repeat visits, and number of detection regression coefficients. This is the design matrix used for the detection portion of the occupancy model. |
coords |
a |
w |
a |
psi |
a |
z |
a |
p |
a |
y |
a |
X.p.re |
a three-dimensional numeric array containing the levels of any detection random effect included in the model. Only relevant when detection random effects are specified in |
X.lambda.re |
a numeric matrix containing the levels of any occurrence random effect included in the model. Only relevant when occurrence random effects are specified in |
alpha.star |
a numeric matrix where each row contains the simulated detection random effects for each given level of the random effects included in the detection model. Only relevant when detection random effects are included in the model. |
beta.star |
a numeric matrix where each row contains the simulated occurrence random effects for each given level of the random effects included in the occurrence model. Only relevant when occurrence random effects are included in the model. |
Jeffrey W. Doser [email protected],
Andrew O. Finley [email protected]
J.x <- 8
J.y <- 8
J <- J.x * J.y
n.rep <- sample(2:4, size = J, replace = TRUE)
N <- 10
# Community-level covariate effects
# Occurrence
beta.mean <- c(0.2, -0.15)
p.occ <- length(beta.mean)
tau.sq.beta <- c(0.6, 0.3)
# Detection
alpha.mean <- c(0.5, 0.2)
tau.sq.alpha <- c(0.2, 0.3)
p.det <- length(alpha.mean)
psi.RE <- list(levels = c(10), sigma.sq.psi = c(1.5))
p.RE <- list(levels = c(15), sigma.sq.p = 0.8)
# Draw species-level effects from community means.
beta <- matrix(NA, nrow = N, ncol = p.occ)
alpha <- matrix(NA, nrow = N, ncol = p.det)
for (i in 1:p.occ) {
  beta[, i] <- rnorm(N, beta.mean[i], sqrt(tau.sq.beta[i]))
}
for (i in 1:p.det) {
  alpha[, i] <- rnorm(N, alpha.mean[i], sqrt(tau.sq.alpha[i]))
}
# Spatial parameters if desired
phi <- runif(N, 3/1, 3/.1)
sigma.sq <- runif(N, 0.3, 3)
sp <- TRUE
dat <- simMsOcc(J.x = J.x, J.y = J.y, n.rep = n.rep, N = N, beta = beta,
                alpha = alpha, psi.RE = psi.RE, p.RE = p.RE, sp = TRUE,
                cov.model = 'exponential', phi = phi, sigma.sq = sigma.sq)
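The returned elements psi, z, p, and y follow the usual hierarchy of an occupancy model: occurrence probability, latent occurrence state, detection probability, and observed data. The base R sketch below illustrates that hierarchy for several species without calling the package. It is a textbook formulation, not the internal code of simMsOcc; the constant number of replicates K and the standard normal covariates are assumptions made for brevity.

```r
set.seed(10)
J <- 64   # sites
N <- 10   # species
K <- 3    # replicate surveys per site (assumed constant here)

# Species-level coefficients drawn from community-level means and variances
beta.mean <- c(0.2, -0.15); tau.sq.beta <- c(0.6, 0.3)
alpha.mean <- c(0.5, 0.2);  tau.sq.alpha <- c(0.2, 0.3)
beta <- sapply(seq_along(beta.mean),
               function(i) rnorm(N, beta.mean[i], sqrt(tau.sq.beta[i])))
alpha <- sapply(seq_along(alpha.mean),
                function(i) rnorm(N, alpha.mean[i], sqrt(tau.sq.alpha[i])))

# Occurrence design matrix (J x 2) and detection design array (J x K x 2)
X <- cbind(1, rnorm(J))
X.p <- array(1, dim = c(J, K, 2))
X.p[, , 2] <- rnorm(J * K)

# Occurrence probability and latent occurrence state, both N x J
psi <- plogis(beta %*% t(X))
z <- matrix(rbinom(N * J, 1, psi), N, J)

# Detection probability and observed data, both N x J x K
p <- array(NA, dim = c(N, J, K))
y <- array(NA, dim = c(N, J, K))
for (k in 1:K) {
  p[, , k] <- plogis(alpha %*% t(X.p[, k, ]))
  # A species can only be detected at sites where it occurs (z = 1)
  y[, , k] <- rbinom(N * J, 1, p[, , k] * z)
}
```

Because detection is conditional on z, every entry of y is zero wherever the corresponding entry of z is zero.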
The function simOcc
simulates single-species occurrence data for simulation studies, power assessments, or function testing. Data can be optionally simulated with a spatial Gaussian Process in the occurrence portion of the model. Non-spatial random intercepts can also be included in the detection or occurrence portions of the occupancy model.
simOcc(J.x, J.y, n.rep, n.rep.max, beta, alpha, psi.RE = list(), p.RE = list(), sp = FALSE, svc.cols = 1, cov.model, sigma.sq, phi, nu, x.positive = FALSE, grid, ...)
J.x |
a single numeric value indicating the number of sites to simulate detection-nondetection data along the horizontal axis. Total number of sites with simulated data is |
J.y |
a single numeric value indicating the number of sites to simulate detection-nondetection data along the vertical axis. Total number of sites with simulated data is |
n.rep |
a numeric vector of length |
n.rep.max |
a single numeric value indicating the maximum number of replicate surveys. This is an optional argument, with its default value set to |
beta |
a numeric vector containing the intercept and regression coefficient parameters for the occupancy portion of the single-species occupancy model. |
alpha |
a numeric vector containing the intercept and regression coefficient parameters for the detection portion of the single-species occupancy model. |
psi.RE |
a list used to specify the non-spatial random intercepts included in the occupancy portion of the model. The list must have two tags: |
p.RE |
a list used to specify the non-spatial random intercepts included in the detection portion of the model. The list must have two tags: |
sp |
a logical value indicating whether to simulate a spatially-explicit occupancy model with a Gaussian process. By default set to |
svc.cols |
a vector indicating the variables whose effects will be
estimated as spatially-varying coefficients. |
cov.model |
a quoted keyword that specifies the covariance function used to model the spatial dependence structure among the latent occurrence values. Supported covariance model key words are: |
sigma.sq |
a numeric value indicating the spatial variance parameter. Ignored when |
phi |
a numeric value indicating the spatial decay parameter. Ignored when |
nu |
a numeric value indicating the spatial smoothness parameter. Only used when |
x.positive |
a logical value indicating whether the simulated covariates should be simulated as random standard normal covariates ( |
grid |
an atomic vector used to specify the grid across which to simulate the latent spatial processes. This argument is used to simulate the underlying spatial processes at a different resolution than the coordinates (e.g., if coordinates are distributed across a grid). |
... |
currently no additional arguments |
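To make the roles of sigma.sq and phi concrete, the following base R sketch draws one realization of a spatial Gaussian process with an exponential covariance function. It is only an illustration under stated assumptions: the sites are placed on a regular grid over the unit square, and the names coords, D, Sigma, and eff.range are hypothetical rather than taken from the package.

```r
set.seed(400)
J.x <- 10
J.y <- 10
sigma.sq <- 2      # spatial variance
phi <- 3 / 0.6     # spatial decay

# Assumption: sites sit on a regular grid over the unit square
coords <- as.matrix(expand.grid(x = seq(0, 1, length.out = J.x),
                                y = seq(0, 1, length.out = J.y)))
D <- as.matrix(dist(coords))

# Exponential covariance: sigma.sq * exp(-phi * distance)
Sigma <- sigma.sq * exp(-phi * D)

# One draw w ~ N(0, Sigma) using the lower Cholesky factor
w <- as.vector(t(chol(Sigma)) %*% rnorm(nrow(coords)))

# Distance at which the exponential correlation has dropped to exp(-3), about 0.05
eff.range <- 3 / phi
```

Writing the decay as 3 divided by a distance, as the examples in this manual do, sets the distance at which the exponential correlation falls to roughly 0.05.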
A list comprised of:
X |
a |
X.p |
a three-dimensional numeric array with dimensions corresponding to sites, repeat visits, and number of detection regression coefficients. This is the design matrix used for the detection portion of the occupancy model. |
coords |
a |
w |
a matrix of the spatial random effect values for each site. The number of columns is determined by the |
psi |
a |
z |
a length |
p |
a |
y |
a |
X.p.re |
a three-dimensional numeric array containing the levels of any detection random effect included in the model. Only relevant when detection random effects are specified in |
X.re |
a numeric matrix containing the levels of any occurrence random effect included in the model. Only relevant when occurrence random effects are specified in |
alpha.star |
a numeric vector that contains the simulated detection random effects for each given level of the random effects included in the detection model. Only relevant when detection random effects are included in the model. |
beta.star |
a numeric vector that contains the simulated occurrence random effects for each given level of the random effects included in the occurrence model. Only relevant when occurrence random effects are included in the model. |
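As a rough illustration of how psi.RE relates to the returned X.re and beta.star, the base R sketch below simulates a single non-spatial random intercept on the occurrence process. It is not the package's internal code: assigning sites to levels by uniform sampling, and the name lin.pred, are assumptions made for illustration.

```r
set.seed(1)
J <- 100
psi.RE <- list(levels = 10, sigma.sq.psi = 1.2)
beta <- c(0.5, -0.15)

# Design matrix: intercept plus one standard normal covariate
X <- cbind(1, rnorm(J))

# Assumption: each site is assigned one of the random effect levels at random
X.re <- sample(seq_len(psi.RE$levels), J, replace = TRUE)

# One random intercept per level, drawn from N(0, sigma.sq.psi)
beta.star <- rnorm(psi.RE$levels, 0, sqrt(psi.RE$sigma.sq.psi))

# Linear predictor on the logit scale, then occurrence probability
lin.pred <- as.vector(X %*% beta) + beta.star[X.re]
psi <- plogis(lin.pred)
```

Sites that share a level in X.re share the same draw from beta.star, which is what induces the non-spatial correlation among them.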
Jeffrey W. Doser [email protected],
Andrew O. Finley [email protected]
set.seed(400)
J.x <- 10
J.y <- 10
n.rep <- rep(4, J.x * J.y)
beta <- c(0.5, -0.15)
alpha <- c(0.7, 0.4)
phi <- 3 / .6
sigma.sq <- 2
psi.RE <- list(levels = 10, sigma.sq.psi = 1.2)
p.RE <- list(levels = 15, sigma.sq.p = 0.8)
dat <- simOcc(J.x = J.x, J.y = J.y, n.rep = n.rep, beta = beta, alpha = alpha,
              psi.RE = psi.RE, p.RE = p.RE, sp = TRUE, cov.model = 'spherical',
              sigma.sq = sigma.sq, phi = phi)
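When n.rep differs across sites, the detection-nondetection matrix has one column per possible replicate and unsampled replicates are missing. The base R sketch below shows one way such a padded matrix can arise and why imperfect detection biases naive occupancy downward. It is a simplified stand-in for simOcc, assuming a single survey-level detection covariate; the names x.det and naive are hypothetical.

```r
set.seed(400)
J <- 100
n.rep <- sample(2:4, J, replace = TRUE)
n.rep.max <- max(n.rep)
beta <- c(0.5, -0.15)
alpha <- c(0.7, 0.4)

# Occurrence process
X <- cbind(1, rnorm(J))
psi <- plogis(as.vector(X %*% beta))
z <- rbinom(J, 1, psi)

# Detection process, padded with NA beyond each site's number of replicates
y <- matrix(NA, J, n.rep.max)
p <- matrix(NA, J, n.rep.max)
for (j in 1:J) {
  k <- seq_len(n.rep[j])
  x.det <- rnorm(n.rep[j])   # survey-level detection covariate
  p[j, k] <- plogis(alpha[1] + alpha[2] * x.det)
  y[j, k] <- rbinom(n.rep[j], 1, p[j, k] * z[j])
}

# Naive occupancy (sites with at least one detection) versus true occupancy
naive <- mean(apply(y, 1, max, na.rm = TRUE))
c(naive = naive, true = mean(z))
```

The naive proportion can never exceed the true proportion of occupied sites, because a detection requires the species to be present.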
The function simTBinom
simulates multi-season single-species binomial data for simulation studies, power assessments, or function testing. Data can be optionally simulated with a spatial Gaussian Process in the model. Non-spatial random intercepts can also be included in the model.
simTBinom(J.x, J.y, n.time, weights, beta, sp.only = 0, trend = TRUE, psi.RE = list(), sp = FALSE, cov.model, sigma.sq, phi, nu, svc.cols = 1, ar1 = FALSE, rho, sigma.sq.t, x.positive = FALSE, ...)
J.x |
a single numeric value indicating the number of sites to simulate data along the horizontal axis. Total number of sites with simulated data is |
J.y |
a single numeric value indicating the number of sites to simulate data along the vertical axis. Total number of sites with simulated data is |
n.time |
a single numeric value indicating the number of primary time periods (denoted T) over which sampling occurs. |
weights |
a numeric matrix with rows corresponding to sites and columns corresponding to primary time periods that indicates the number of Bernoulli trials at each of the site/time period combinations. |
beta |
a numeric vector containing the intercept and regression coefficient parameters for the model. |
sp.only |
a numeric vector specifying which occurrence covariates should only vary over space and not over time. The numbers in the vector correspond to the elements in the vector of regression coefficients ( |
trend |
a logical value. If |
psi.RE |
a list used to specify the non-spatial random intercepts included in the model. The list must have two tags: |
sp |
a logical value indicating whether to simulate a spatially-explicit occupancy model with a Gaussian process. By default set to |
svc.cols |
a vector indicating the variables whose effects will be
estimated as spatially-varying coefficients. |
cov.model |
a quoted keyword that specifies the covariance function used to model the spatial dependence structure among the latent occurrence values. Supported covariance model key words are: |
sigma.sq |
a numeric value indicating the spatial variance parameter. Ignored when |
phi |
a numeric value indicating the spatial decay parameter. Ignored when |
nu |
a numeric value indicating the spatial smoothness parameter. Only used when |
ar1 |
a logical value indicating whether to simulate a temporal random effect with an AR(1) process. By default, set to |
rho |
a numeric value indicating the AR(1) temporal correlation parameter. Ignored when |
sigma.sq.t |
a numeric value indicating the AR(1) temporal variance parameter. Ignored when |
x.positive |
a logical value indicating whether the simulated covariates should be simulated as random standard normal covariates ( |
... |
currently no additional arguments |
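The arguments rho and sigma.sq.t describe an AR(1) temporal random effect. A minimal base R sketch of drawing such an effect is given below. It assumes the AR(1) covariance between time periods t and s is sigma.sq.t * rho^|t - s|; this parameterization, and the names lag, Sigma.eta, and eta, are assumptions for illustration and should be checked against the package source before relying on them.

```r
set.seed(1000)
n.time.max <- 10
rho <- 0.8          # temporal correlation
sigma.sq.t <- 1     # temporal variance

# Assumed AR(1) covariance: sigma.sq.t * rho^|t - s|
lag <- abs(outer(1:n.time.max, 1:n.time.max, "-"))
Sigma.eta <- sigma.sq.t * rho^lag

# One draw eta ~ N(0, Sigma.eta) via the lower Cholesky factor
eta <- as.vector(t(chol(Sigma.eta)) %*% rnorm(n.time.max))
```

Under this form, adjacent time periods have correlation rho, and periods two steps apart have correlation rho^2.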
A list comprised of:
X |
a |
coords |
a |
w |
a matrix of the spatial random effect values for each site. The number of columns is determined by the |
psi |
a |
z |
a |
X.w |
a three-dimensional array containing the covariates (including an intercept) whose effects are assumed to be spatially-varying. Dimensions correspond to sites, primary time periods, and covariate. |
X.re |
a numeric matrix containing the levels of any unstructured random effect included in the model. Only relevant when random effects are specified in |
beta.star |
a numeric vector that contains the simulated random effects for each given level of the random effects included in the model. Only relevant when random effects are included in the model. |
Jeffrey W. Doser [email protected],
Andrew O. Finley [email protected]
set.seed(1000)
# Sites
J.x <- 15
J.y <- 15
J <- J.x * J.y
# Years sampled
n.time <- sample(10, J, replace = TRUE)
# Binomial weights
weights <- matrix(NA, J, max(n.time))
for (j in 1:J) {
  weights[j, 1:n.time[j]] <- sample(5, n.time[j], replace = TRUE)
}
# Occurrence --------------------------
beta <- c(-2, -0.5, -0.2, 0.75)
p.occ <- length(beta)
trend <- TRUE
sp.only <- 0
psi.RE <- list()
# Spatial parameters ------------------
sp <- TRUE
svc.cols <- c(1, 2, 3)
p.svc <- length(svc.cols)
cov.model <- "exponential"
sigma.sq <- runif(p.svc, 0.1, 1)
phi <- runif(p.svc, 3/1, 3/0.2)
# Temporal parameters -----------------
ar1 <- TRUE
rho <- 0.8
sigma.sq.t <- 1
dat <- simTBinom(J.x = J.x, J.y = J.y, n.time = n.time, weights = weights,
                 beta = beta, psi.RE = psi.RE, sp.only = sp.only, trend = trend,
                 sp = sp, svc.cols = svc.cols, cov.model = cov.model,
                 sigma.sq = sigma.sq, phi = phi, rho = rho,
                 sigma.sq.t = sigma.sq.t, ar1 = TRUE, x.positive = FALSE)
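The weights matrix gives the number of Bernoulli trials at each site and primary time period. The base R sketch below shows the corresponding binomial draw in isolation, without spatial or temporal random effects. It is an illustration only: every site is assumed to be sampled in every period, and the name y.sim is hypothetical.

```r
set.seed(1000)
J <- 50
n.time.max <- 6
beta <- c(-2, -0.5)

# Number of Bernoulli trials for each site and primary time period
weights <- matrix(sample(5, J * n.time.max, replace = TRUE), J, n.time.max)

# Design array: intercept plus one covariate varying over space and time
X <- array(1, dim = c(J, n.time.max, 2))
X[, , 2] <- rnorm(J * n.time.max)

# Success probability on the logit scale, then the binomial counts
psi <- plogis(X[, , 1] * beta[1] + X[, , 2] * beta[2])
y.sim <- matrix(rbinom(J * n.time.max, weights, psi), J, n.time.max)
```

Each simulated count lies between zero and the matching entry of weights.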
The function simTIntOcc
simulates single-species detection-nondetection data from multiple data sources over multiple seasons for simulation studies, power assessments, or function testing of integrated multi-season occupancy models. Data can optionally be simulated with a spatial Gaussian Process on the occurrence process. Non-spatial random intercepts can be included in the detection or occurrence portions of the model.
simTIntOcc(n.data, J.x, J.y, J.obs, n.time, data.seasons, n.rep, n.rep.max, beta, alpha, sp.only = 0, trend = TRUE, psi.RE = list(), p.RE = list(), sp = FALSE, svc.cols = 1, cov.model, sigma.sq, phi, nu, ar1 = FALSE, rho, sigma.sq.t, x.positive = FALSE, ...)
n.data |
an integer indicating the number of detection-nondetection data sources to simulate. |
J.x |
a single numeric value indicating the number of sites across the region of interest along the horizontal axis. Total number of sites across the simulated region of interest is |
J.y |
a single numeric value indicating the number of sites across the region of interest along the vertical axis. Total number of sites across the simulated region of interest is |
J.obs |
a numeric vector of length |
n.time |
a numeric vector of length |
data.seasons |
a list of length |
n.rep |
a list of length |
n.rep.max |
a vector of numeric values indicating the maximum number of replicate surveys for each data set. This is an optional argument, with its default value set to |
beta |
a numeric vector containing the intercept and regression coefficient parameters for the occupancy portion of the model. Note that if |
alpha |
a list of length |
sp.only |
a numeric vector specifying which occurrence covariates should only vary over space and not over time. The numbers in the vector correspond to the elements in the vector of regression coefficients ( |
trend |
a logical value. If |
psi.RE |
a list used to specify the non-spatial random intercepts included in the occupancy portion of the model. The list must have two tags: |
p.RE |
a list used to specify the non-spatial random intercepts included in the detection portion of the model. The list must be a list of lists, where each individual list specifies the detection random effects for one data set in the integrated model. Each of the lists must have two tags: |
sp |
a logical value indicating whether to simulate a spatially-explicit occupancy model with a Gaussian process. By default set to |
svc.cols |
a vector indicating the variables whose effects will be
estimated as spatially-varying coefficients. |
cov.model |
a quoted keyword that specifies the covariance function used to model the spatial dependence structure among the latent occurrence values. Supported covariance model key words are: |
sigma.sq |
a numeric value indicating the spatial variance parameter. Ignored when |
phi |
a numeric value indicating the spatial decay parameter. Ignored when |
nu |
a numeric value indicating the spatial smoothness parameter. Only used when |
ar1 |
a logical value indicating whether to simulate a temporal random effect with an AR(1) process. By default, set to |
rho |
a numeric value indicating the AR(1) temporal correlation parameter. Ignored when |
sigma.sq.t |
a numeric value indicating the AR(1) temporal variance parameter. Ignored when |
x.positive |
a logical value indicating whether the simulated covariates should be simulated as random standard normal covariates ( |
... |
currently no additional arguments |
A list comprised of:
X.obs |
a three-dimensional numeric array with dimensions corresponding to sites, primary time periods, and occurrence covariate containing the design matrix for the occurrence portion of the occupancy model. This array contains the intercept and covariates for only the observed sites. |
X.pred |
a three-dimensional numeric array with dimensions corresponding to sites, primary time periods, and occurrence covariate containing the design matrix for the occurrence portion of the occupancy model. This array contains the intercept and covariates for the sites in the study region where there are no observed data sources. |
X.pred |
a numeric design matrix for the occurrence portion of the model at sites where there are no observed data sources. |
X.p |
a list of design matrices for the detection portions of the integrated occupancy model. Each element in the list is a design matrix of detection covariates for each data source. Each design matrix is formatted as a four-dimensional array with dimensions corresponding to sites, primary time period, secondary time period, and covariate. |
coords.obs |
a numeric matrix of coordinates of each observed site. Required for spatial models. |
coords.pred |
a numeric matrix of coordinates of each site in the study region without any data sources. Only used for spatial models. |
w.obs |
a matrix of the spatial random effects at observed locations. Only used to simulate data when |
.
w.pred |
a matrix of the spatial random effects at locations without any observations. |
psi.obs |
a matrix of the occurrence probabilities for each observed site and primary time period. |
psi.pred |
a matrix of the occurrence probabilities for sites without any observations. |
z.obs |
a matrix of the latent occurrence states at each observed site and primary time period. |
z.pred |
a matrix of the latent occurrence states at each site without any observations. |
p |
a list of detection probability arrays for each of the |
y |
a list of arrays of the raw detection-nondetection data for each site, primary time period, and replicate combination. |
X.p.re |
a list of four-dimensional numeric arrays containing the levels of any detection random effect included in the model for each data source. Only relevant when detection random effects are specified in |
X.re.obs |
a numeric array containing the levels of any occurrence random effect included in the model at the sites where there is at least one data source. Dimensions correspond to site, primary time period, and parameter. Only relevant when occurrence random effects are specified in |
X.re.pred |
a numeric array containing the levels of any occurrence random effect included in the model at the sites where there are no data sources sampled. Dimensions correspond to site, primary time period, and parameter. Only relevant when occurrence random effects are specified in |
alpha.star |
a list of numeric vectors that contains the simulated detection random effects for each given level of the random effects included in the detection model for each data set. Only relevant when detection random effects are included in the model. |
beta.star |
a numeric vector that contains the simulated occurrence random effects for each given level of the random effects included in the occurrence model. Only relevant when occurrence random effects are included in the model. |
eta |
a |
Jeffrey W. Doser [email protected]
# Number of locations in each direction. This is the total region of interest
# where some sites may or may not have a data source.
J.x <- 15
J.y <- 15
J.all <- J.x * J.y
# Number of data sources.
n.data <- 3
# Sites for each data source.
J.obs <- sample(ceiling(0.2 * J.all):ceiling(0.4 * J.all), n.data, replace = TRUE)
# Maximum number of years for each data set
n.time.max <- c(4, 8, 10)
# Number of years each site in each data set is sampled
n.time <- list()
for (i in 1:n.data) {
  n.time[[i]] <- sample(1:n.time.max[i], J.obs[i], replace = TRUE)
}
# Replicates for each data source.
n.rep <- list()
for (i in 1:n.data) {
  n.rep[[i]] <- matrix(NA, J.obs[i], n.time.max[i])
  for (j in 1:J.obs[i]) {
    n.rep[[i]][j, sample(1:n.time.max[i], n.time[[i]][j], replace = FALSE)] <-
      sample(1:4, n.time[[i]][j], replace = TRUE)
  }
}
# Total number of years across all data sets
n.time.total <- 10
# List denoting the specific years each data set was sampled during.
data.seasons <- list()
for (i in 1:n.data) {
  data.seasons[[i]] <- sort(sample(1:n.time.total, n.time.max[i], replace = FALSE))
}
# Occupancy covariates
beta <- c(0, 0.4, 0.3)
# Random occupancy effects
psi.RE <- list(levels = c(20), sigma.sq.psi = c(0.6))
# Detection covariates
alpha <- list()
for (i in 1:n.data) {
  alpha[[i]] <- runif(3, 0, 1)
}
# Detection random effects
p.RE <- list()
p.RE[[1]] <- list(levels = c(35), sigma.sq.p = c(0.5))
p.RE[[2]] <- list(levels = c(20, 10), sigma.sq.p = c(0.7, 0.3))
p.RE[[3]] <- list(levels = c(20), sigma.sq.p = c(0.6))
p.det.long <- sapply(alpha, length)
p.det <- sum(p.det.long)
# Spatial components
sigma.sq <- 2
phi <- 3 / .5
nu <- 1
sp <- TRUE
# Temporal parameters
ar1 <- TRUE
rho <- 0.9
sigma.sq.t <- 1.5
svc.cols <- c(1)
n.rep.max <- sapply(n.rep, max, na.rm = TRUE)
# Simulate occupancy data.
dat <- simTIntOcc(n.data = n.data, J.x = J.x, J.y = J.y, J.obs = J.obs,
                  n.time = n.time, data.seasons = data.seasons, n.rep = n.rep,
                  n.rep.max = n.rep.max, beta = beta, alpha = alpha, trend = TRUE,
                  psi.RE = psi.RE, p.RE = p.RE, sp = sp, svc.cols = svc.cols,
                  cov.model = 'exponential', sigma.sq = sigma.sq, phi = phi,
                  nu = nu, ar1 = ar1, rho = rho, sigma.sq.t = sigma.sq.t)
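The key idea behind an integrated model is that all data sources observe the same latent occurrence state, each through its own detection process. The base R sketch below illustrates this, collapsed to a single season for brevity. It is not the internal code of simTIntOcc: the constant detection probabilities in p.det, the fixed replicate counts, and the names sites and pred.sites are assumptions made for illustration.

```r
set.seed(20)
J.all <- 225                 # sites across the whole region
n.data <- 3                  # data sources
J.obs <- c(50, 70, 60)       # sites surveyed by each source
n.rep <- c(2, 3, 4)          # replicates per source (assumed constant)
p.det <- c(0.3, 0.5, 0.7)    # hypothetical detection probability per source
beta <- c(0, 0.4)

# One shared occurrence process over all sites
X <- cbind(1, rnorm(J.all))
psi <- plogis(as.vector(X %*% beta))
z <- rbinom(J.all, 1, psi)

# Each data source surveys its own subset of sites; subsets may overlap
sites <- lapply(J.obs, function(n) sort(sample(J.all, n)))

# Separate detection process per data source, conditional on the shared z
y <- lapply(seq_len(n.data), function(i) {
  z.i <- z[sites[[i]]]
  matrix(rbinom(J.obs[i] * n.rep[i], 1, p.det[i] * z.i), J.obs[i], n.rep[i])
})

# Sites never visited by any source play the role of prediction sites
pred.sites <- setdiff(seq_len(J.all), unlist(sites))
```

The split between surveyed sites and pred.sites mirrors the distinction between the .obs and .pred elements of the returned list.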
The function simTMsOcc
simulates multi-species multi-season detection-nondetection data for simulation studies, power assessments, or function testing. Data can be optionally simulated with a spatial Gaussian Process in the occurrence portion of the model, as well as an option to allow for species correlations using a factor modeling approach. Non-spatial random intercepts can also be included in the detection or occurrence portions of the occupancy model.
simTMsOcc(J.x, J.y, n.time, n.rep, N, beta, alpha, sp.only = 0, trend = TRUE, psi.RE = list(), p.RE = list(), sp = FALSE, svc.cols = 1, cov.model, sigma.sq, phi, nu, ar1 = FALSE, rho, sigma.sq.t, factor.model = FALSE, n.factors, range.probs, grid, ...)
J.x |
a single numeric value indicating the number of sites to simulate detection-nondetection data along the horizontal axis. Total number of sites with simulated data is |
J.y |
a single numeric value indicating the number of sites to simulate detection-nondetection data along the vertical axis. Total number of sites with simulated data is |
n.time |
a single numeric value indicating the number of primary time periods (denoted T) over which sampling occurs. |
n.rep |
a numeric matrix indicating the number of replicates at each site during each primary time period. The matrix must have |
N |
a single numeric value indicating the number of species to simulate detection-nondetection data. |
beta |
a numeric matrix with |
alpha |
a numeric matrix with |
sp.only |
a numeric vector specifying which occurrence covariates should only vary over space and not over time. The numbers in the vector correspond to the elements in the vector of regression coefficients ( |
trend |
a logical value. If |
psi.RE |
a list used to specify the non-spatial random intercepts included in the occurrence portion of the model. The list must have two tags: |
p.RE |
a list used to specify the non-spatial random intercepts included in the detection portion of the model. The list must have two tags: |
sp |
a logical value indicating whether to simulate a spatially-explicit occupancy model with a Gaussian process. By default set to |
svc.cols |
a vector indicating the variables whose effects will be
estimated as spatially-varying coefficients. |
cov.model |
a quoted keyword that specifies the covariance function used to model the spatial dependence structure among the latent occurrence values. Supported covariance model key words are: |
sigma.sq |
a numeric vector of length |
phi |
a numeric vector of length |
nu |
a numeric vector of length |
ar1 |
a logical value indicating whether to simulate a temporal random effect with an AR(1) process. By default, set to |
rho |
a vector of |
sigma.sq.t |
a vector of |
factor.model |
a logical value indicating whether to simulate data following a factor modeling approach that explicitly incorporates species correlations. If |
n.factors |
a single numeric value specifying the number of latent factors to use to simulate the data if |
range.probs |
a numeric vector of length |
grid |
an atomic vector used to specify the grid across which to simulate the latent spatial processes. This argument is used to simulate the underlying spatial processes at a different resolution than the coordinates (e.g., if coordinates are distributed across a grid). |
... |
currently no additional arguments |
A list comprised of:
X |
a |
X.p |
a four-dimensional numeric array with dimensions corresponding to sites, primary time periods, repeat visits, and number of detection regression coefficients. This is the design matrix used for the detection portion of the occupancy model. |
coords |
a |
w |
a |
psi |
a |
z |
a |
p |
a |
y |
a |
X.p.re |
a four-dimensional numeric array containing the levels of any detection random effect included in the model. Only relevant when detection random effects are specified in |
X.re |
a numeric matrix containing the levels of any occurrence random effect included in the model. Only relevant when occurrence random effects are specified in |
alpha.star |
a numeric matrix where each row contains the simulated detection random effects for each given level of the random effects included in the detection model. Only relevant when detection random effects are included in the model. |
beta.star |
a numeric matrix where each row contains the simulated occurrence random effects for each given level of the random effects included in the occurrence model. Only relevant when occurrence random effects are included in the model. |
eta |
a numeric matrix of the AR(1) temporal random effects, with rows corresponding to species and columns corresponding to primary time periods. |
Jeffrey W. Doser [email protected]
# Simulate Data -----------------------------------------------------------
set.seed(500)
J.x <- 8
J.y <- 8
J <- J.x * J.y
# Years sampled
n.time <- sample(3:10, J, replace = TRUE)
# n.time <- rep(10, J)
n.time.max <- max(n.time)
# Replicates
n.rep <- matrix(NA, J, max(n.time))
for (j in 1:J) {
  n.rep[j, 1:n.time[j]] <- sample(2:4, n.time[j], replace = TRUE)
  # n.rep[j, 1:n.time[j]] <- rep(4, n.time[j])
}
N <- 7
# Community-level covariate effects
# Occurrence
beta.mean <- c(-3, -0.2, 0.5)
trend <- FALSE
sp.only <- 0
p.occ <- length(beta.mean)
tau.sq.beta <- c(0.6, 1.5, 1.4)
# Detection
alpha.mean <- c(0, 1.2, -1.5)
tau.sq.alpha <- c(1, 0.5, 2.3)
p.det <- length(alpha.mean)
# Random effects
psi.RE <- list()
p.RE <- list()
# Draw species-level effects from community means.
beta <- matrix(NA, nrow = N, ncol = p.occ)
alpha <- matrix(NA, nrow = N, ncol = p.det)
for (i in 1:p.occ) {
  beta[, i] <- rnorm(N, beta.mean[i], sqrt(tau.sq.beta[i]))
}
for (i in 1:p.det) {
  alpha[, i] <- rnorm(N, alpha.mean[i], sqrt(tau.sq.alpha[i]))
}
sp <- TRUE
svc.cols <- c(1, 2)
p.svc <- length(svc.cols)
n.factors <- 3
phi <- runif(p.svc * n.factors, 3 / .9, 3 / .3)
factor.model <- TRUE
cov.model <- 'exponential'
dat <- simTMsOcc(J.x = J.x, J.y = J.y, n.time = n.time, n.rep = n.rep, N = N,
                 beta = beta, alpha = alpha, sp.only = sp.only, trend = trend,
                 psi.RE = psi.RE, p.RE = p.RE, factor.model = factor.model,
                 svc.cols = svc.cols, n.factors = n.factors, phi = phi, sp = sp,
                 cov.model = cov.model)
str(dat)
# Simulate Data ----------------------------------------------------------- set.seed(500) J.x <- 8 J.y <- 8 J <- J.x * J.y # Years sampled n.time <- sample(3:10, J, replace = TRUE) # n.time <- rep(10, J) n.time.max <- max(n.time) # Replicates n.rep <- matrix(NA, J, max(n.time)) for (j in 1:J) { n.rep[j, 1:n.time[j]] <- sample(2:4, n.time[j], replace = TRUE) # n.rep[j, 1:n.time[j]] <- rep(4, n.time[j]) } N <- 7 # Community-level covariate effects # Occurrence beta.mean <- c(-3, -0.2, 0.5) trend <- FALSE sp.only <- 0 p.occ <- length(beta.mean) tau.sq.beta <- c(0.6, 1.5, 1.4) # Detection alpha.mean <- c(0, 1.2, -1.5) tau.sq.alpha <- c(1, 0.5, 2.3) p.det <- length(alpha.mean) # Random effects psi.RE <- list() p.RE <- list() # Draw species-level effects from community means. beta <- matrix(NA, nrow = N, ncol = p.occ) alpha <- matrix(NA, nrow = N, ncol = p.det) for (i in 1:p.occ) { beta[, i] <- rnorm(N, beta.mean[i], sqrt(tau.sq.beta[i])) } for (i in 1:p.det) { alpha[, i] <- rnorm(N, alpha.mean[i], sqrt(tau.sq.alpha[i])) } sp <- TRUE svc.cols <- c(1, 2) p.svc <- length(svc.cols) n.factors <- 3 phi <- runif(p.svc * n.factors, 3 / .9, 3 / .3) factor.model <- TRUE cov.model <- 'exponential' dat <- simTMsOcc(J.x = J.x, J.y = J.y, n.time = n.time, n.rep = n.rep, N = N, beta = beta, alpha = alpha, sp.only = sp.only, trend = trend, psi.RE = psi.RE, p.RE = p.RE, factor.model = factor.model, svc.cols = svc.cols, n.factors = n.factors, phi = phi, sp = sp, cov.model = cov.model) str(dat)
The function simTOcc simulates multi-season single-species occurrence data for simulation studies, power assessments, or function testing. Data can optionally be simulated with a spatial Gaussian process in the occurrence portion of the model. Non-spatial random intercepts can also be included in the detection or occurrence portions of the occupancy model.
simTOcc(J.x, J.y, n.time, n.rep, n.rep.max, beta, alpha, sp.only = 0, trend = TRUE, psi.RE = list(), p.RE = list(), sp = FALSE, svc.cols = 1, cov.model, sigma.sq, phi, nu, ar1 = FALSE, rho, sigma.sq.t, x.positive = FALSE, mis.spec.type = 'none', scale.param = 1, avail, grid, ...)
J.x |
a single numeric value indicating the number of sites to simulate detection-nondetection data along the horizontal axis. Total number of sites with simulated data is J.x * J.y. |
J.y |
a single numeric value indicating the number of sites to simulate detection-nondetection data along the vertical axis. Total number of sites with simulated data is J.x * J.y. |
n.time |
a single numeric value indicating the number of primary time periods (denoted T) over which sampling occurs. |
n.rep |
a numeric matrix indicating the number of replicates at each site during each primary time period. The matrix must have |
n.rep.max |
a single numeric value indicating the maximum number of replicate surveys. This is an optional argument, with its default value set to |
beta |
a numeric vector containing the intercept and regression coefficient parameters for the occupancy portion of the single-species occupancy model. Note that if |
alpha |
a numeric vector containing the intercept and regression coefficient parameters for the detection portion of the single-species occupancy model. |
sp.only |
a numeric vector specifying which occurrence covariates should only vary over space and not over time. The numbers in the vector correspond to the elements in the vector of regression coefficients ( |
trend |
a logical value. If |
psi.RE |
a list used to specify the unstructured random intercepts included in the occupancy portion of the model. The list must have two tags: |
p.RE |
a list used to specify the unstructured random intercepts included in the detection portion of the model. The list must have two tags: |
sp |
a logical value indicating whether to simulate a spatially-explicit occupancy model with a Gaussian process. By default set to FALSE. |
svc.cols |
a vector indicating the variables whose effects will be
estimated as spatially-varying coefficients. |
cov.model |
a quoted keyword that specifies the covariance function used to model the spatial dependence structure among the latent occurrence values. Supported covariance model key words are: |
sigma.sq |
a numeric value indicating the spatial variance parameter. Ignored when sp = FALSE. |
phi |
a numeric value indicating the spatial decay parameter. Ignored when sp = FALSE. |
nu |
a numeric value indicating the spatial smoothness parameter. Only used when |
ar1 |
a logical value indicating whether to simulate a temporal random effect with an AR(1) process. By default, set to FALSE. |
rho |
a numeric value indicating the AR(1) temporal correlation parameter. Ignored when ar1 = FALSE. |
sigma.sq.t |
a numeric value indicating the AR(1) temporal variance parameter. Ignored when ar1 = FALSE. |
x.positive |
a logical value indicating whether the simulated covariates should be simulated as random standard normal covariates ( |
mis.spec.type |
a quoted keyword indicating the type of model mis-specification to use when simulating the data. These correspond to model mis-specification of the functional relationship between occupancy/detection probability and covariates. Valid keywords are: |
scale.param |
a positive number between 0 and 1 that indicates the scale parameter for the occupancy portion of the model when |
avail |
a site x primary time period x visit array indicating the availability probability of the species during each survey simulated at the given site/primary time period/visit combination. This can be used to assess impacts of non-constant availability across replicate surveys in simulation studies. Values should fall between 0 and 1. When not specified, availability is set to 1 for all surveys. |
grid |
an atomic vector used to specify the grid across which to simulate the latent spatial processes. This argument is used to simulate the underlying spatial processes at a different resolution than the coordinates (e.g., if coordinates are distributed across a grid). |
... |
currently no additional arguments |
A list comprised of:
X |
a |
X.p |
a four-dimensional numeric array with dimensions corresponding to sites, primary time periods, repeat visits, and number of detection regression coefficients. This is the design matrix used for the detection portion of the occupancy model. |
coords |
a |
w |
a |
psi |
a |
z |
a |
p |
a |
y |
a |
X.p.re |
a four-dimensional numeric array containing the levels of any detection random effect included in the model. Only relevant when detection random effects are specified in |
X.re |
a numeric matrix containing the levels of any occurrence random effect included in the model. Only relevant when occurrence random effects are specified in |
alpha.star |
a numeric vector that contains the simulated detection random effects for each given level of the random effects included in the detection model. Only relevant when detection random effects are included in the model. |
beta.star |
a numeric vector that contains the simulated occurrence random effects for each given level of the random effects included in the occurrence model. Only relevant when occurrence random effects are included in the model. |
eta |
a |
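The AR(1) temporal random effect controlled by ar1, rho, and sigma.sq.t can be illustrated with a small base R sketch. It assumes one common parameterization, a stationary process with marginal variance sigma.sq.t and lag-one correlation rho; simTOcc may parameterize the process differently, and the helper name sim_ar1 is hypothetical.

```r
# Hypothetical helper (not part of the package): simulate a stationary AR(1)
# series. Assumes marginal variance sigma.sq.t and lag-one correlation rho.
sim_ar1 <- function(n.time.max, rho, sigma.sq.t) {
  stopifnot(abs(rho) < 1, sigma.sq.t > 0, n.time.max >= 1)
  eta <- numeric(n.time.max)
  # First value is drawn from the stationary distribution
  eta[1] <- rnorm(1, 0, sqrt(sigma.sq.t))
  if (n.time.max > 1) {
    # Innovation sd chosen so the marginal variance stays at sigma.sq.t
    innov.sd <- sqrt(sigma.sq.t * (1 - rho^2))
    for (t in 2:n.time.max) {
      eta[t] <- rho * eta[t - 1] + rnorm(1, 0, innov.sd)
    }
  }
  eta
}

set.seed(1)
# Same rho and sigma.sq.t values as in the simTOcc example
eta <- sim_ar1(n.time.max = 10, rho = 0.5, sigma.sq.t = 0.8)
length(eta)
```

Larger values of rho produce smoother trajectories across primary time periods, while sigma.sq.t controls how far the effect wanders from zero.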
Jeffrey W. Doser [email protected]
Stoudt, S., P. de Valpine, and W. Fithian. Non-parametric identifiability in species distribution and abundance models: why it matters and how to diagnose a lack of fit using simulation. Journal of Statistical Theory and Practice 17, 39 (2023). https://doi.org/10.1007/s42519-023-00336-5.
J.x <- 10
J.y <- 10
J <- J.x * J.y
# Number of time periods sampled
n.time <- sample(10, J, replace = TRUE)
n.time.max <- max(n.time)
# Replicates
n.rep <- matrix(NA, J, max(n.time))
for (j in 1:J) {
  n.rep[j, 1:n.time[j]] <- sample(1:4, n.time[j], replace = TRUE)
}
# Occurrence --------------------------
# Fixed
beta <- c(0.4, 0.5, -0.9)
trend <- TRUE
sp.only <- 0
psi.RE <- list(levels = c(10), sigma.sq.psi = c(1))
# Detection ---------------------------
alpha <- c(-1, 0.7, -0.5)
p.RE <- list(levels = c(10), sigma.sq.p = c(0.5))
# Spatial parameters ------------------
sp <- TRUE
cov.model <- "exponential"
sigma.sq <- 2
phi <- 3 / .4
nu <- 1
# Temporal parameters -----------------
ar1 <- TRUE
rho <- 0.5
sigma.sq.t <- 0.8
# Get all the data
dat <- simTOcc(J.x = J.x, J.y = J.y, n.time = n.time, n.rep = n.rep,
               beta = beta, alpha = alpha, sp.only = sp.only, trend = trend,
               psi.RE = psi.RE, p.RE = p.RE, sp = sp, cov.model = cov.model,
               sigma.sq = sigma.sq, phi = phi, ar1 = ar1, rho = rho,
               sigma.sq.t = sigma.sq.t)
str(dat)
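The example above sets phi <- 3 / .4. The following base R sketch shows why the decay parameter is often written as 3 divided by a distance. It assumes the exponential covariance takes the form sigma.sq * exp(-phi * d), under which the correlation falls to exp(-3), roughly 0.05, at distance 3 / phi. The helper name exp_cov is hypothetical and the code does not call the package.

```r
# Hypothetical helper: exponential covariance, assumed to have the form
# sigma.sq * exp(-phi * d) for distance d.
exp_cov <- function(d, sigma.sq, phi) {
  sigma.sq * exp(-phi * d)
}

sigma.sq <- 2
phi <- 3 / 0.4
eff.range <- 3 / phi   # distance at which the correlation equals exp(-3)
exp_cov(eff.range, sigma.sq, phi) / sigma.sq   # about 0.05

# Covariance matrix for a small grid of sites on the unit square
coords <- as.matrix(expand.grid(x = seq(0, 1, length.out = 3),
                                y = seq(0, 1, length.out = 3)))
Sigma <- exp_cov(as.matrix(dist(coords)), sigma.sq, phi)

# One draw of spatial random effects via the Cholesky factor
set.seed(5)
w <- drop(t(chol(Sigma)) %*% rnorm(nrow(coords)))
```

Under this reading, phi <- 3 / .4 corresponds to an effective spatial range of 0.4 in the units of the coordinates.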
The function spIntPGOcc fits single-species integrated spatial occupancy models using Polya-Gamma latent variables. Models can be fit using either a full Gaussian process or a Nearest Neighbor Gaussian Process for large data sets. Data integration is done using a joint likelihood framework, assuming distinct detection models for each data source that are each conditional on a single latent occupancy process.
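The joint likelihood structure can be pictured with a minimal base R simulation: one latent occupancy state per site, and a separate detection process for each data source conditional on that shared state. The two data sources, their site subsets, and the constant detection probabilities below are illustrative assumptions, not package defaults.

```r
set.seed(10)
J <- 50                      # sites in the region of interest
psi <- plogis(0.5)           # occupancy probability, shared by all sources
z <- rbinom(J, 1, psi)       # single latent occupancy state per site

# Hypothetical data sources: each surveys its own subset of sites with its
# own detection probability and number of replicates.
sites <- list(sample(J, 20), sample(J, 30))
p <- c(0.3, 0.6)
n.rep <- c(3, 2)

y <- lapply(1:2, function(i) {
  z.i <- z[sites[[i]]]
  # Detections are only possible where the shared state z equals 1
  matrix(rbinom(length(z.i) * n.rep[i], 1, rep(z.i, n.rep[i]) * p[i]),
         nrow = length(z.i), ncol = n.rep[i])
})
sapply(y, dim)
```

Because both detection matrices depend on the same z, information from every data source contributes to the estimate of occupancy at sites they share.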
spIntPGOcc(occ.formula, det.formula, data, inits, priors, tuning, cov.model = "exponential", NNGP = TRUE, n.neighbors = 15, search.type = 'cb', n.batch, batch.length, accept.rate = 0.43, n.omp.threads = 1, verbose = TRUE, n.report = 100, n.burn = round(.10 * n.batch * batch.length), n.thin = 1, n.chains = 1, k.fold, k.fold.threads = 1, k.fold.seed, k.fold.data, k.fold.only = FALSE, ...)
occ.formula |
a symbolic description of the model to be fit for the occurrence portion of the model using R's model syntax. Only right-hand side of formula is specified. See example below. Random intercepts are allowed using lme4 syntax (Bates et al. 2015). |
det.formula |
a list of symbolic descriptions of the models to be fit for the detection portion of the model using R's model syntax for each data set. Each element in the list is a formula for the detection model of a given data set. Only right-hand side of formula is specified. See example below. Random intercepts are allowed using lme4 syntax (Bates et al. 2015). |
data |
a list containing data necessary for model fitting.
Valid tags are |
inits |
a list with each tag corresponding to a parameter name.
Valid tags are |
priors |
a list with each tag corresponding to a parameter name.
Valid tags are |
tuning |
a list with each tag corresponding to a parameter
name. Valid tags are |
cov.model |
a quoted keyword that specifies the covariance
function used to model the spatial dependence structure among the
observations. Supported covariance model key words are:
|
NNGP |
if |
n.neighbors |
number of neighbors used in the NNGP. Only used if
|
search.type |
a quoted keyword that specifies the type of nearest
neighbor search algorithm. Supported method key words are: |
n.batch |
the number of MCMC batches to run for each chain for the Adaptive MCMC sampler. See Roberts and Rosenthal (2009) for details. |
batch.length |
the length of each MCMC batch to run for the Adaptive MCMC sampler. See Roberts and Rosenthal (2009) for details. |
accept.rate |
target acceptance rate for Adaptive MCMC. Default is 0.43. See Roberts and Rosenthal (2009) for details. |
n.omp.threads |
a positive integer indicating
the number of threads to use for SMP parallel processing within chains. The package must
be compiled for OpenMP support. For most Intel-based machines, we
recommend setting |
verbose |
if |
n.report |
the interval to report Metropolis sampler acceptance and MCMC progress. Note this is specified in terms of batches and not overall samples for spatial models. |
n.burn |
the number of samples out of the total |
n.thin |
the thinning interval for collection of MCMC samples. The
thinning occurs after the |
n.chains |
the number of chains to run. |
k.fold |
specifies the number of k folds for cross-validation.
If not specified as an argument, then cross-validation is not performed
and |
k.fold.threads |
number of threads to use for cross-validation. If
|
k.fold.seed |
seed used to split data set into |
k.fold.data |
an integer specifying the specific data set to hold out values from. If not specified, data from all data set locations will be incorporated into the k-fold cross-validation. |
k.fold.only |
a logical value indicating whether to only perform
cross-validation ( |
... |
currently no additional arguments |
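The arguments n.batch, batch.length, n.burn, n.thin, and n.chains together determine how many posterior samples are retained. The arithmetic below is a sketch that assumes burn-in is discarded first and every n.thin-th sample is kept afterwards; the values are those used in the spIntPGOcc example.

```r
n.batch <- 2
batch.length <- 25
n.burn <- 10
n.thin <- 1
n.chains <- 1

# Total MCMC iterations per chain
n.samples <- n.batch * batch.length
# Assumed indexing: drop the first n.burn samples, then thin
kept <- seq(from = n.burn + 1, to = n.samples, by = n.thin)
# Retained posterior samples across all chains
n.post <- length(kept) * n.chains
n.post

# Default burn-in from the usage above: 10% of the iterations in a chain
round(0.10 * n.samples)
```

With these settings each chain runs 50 iterations and 40 samples are retained, which is suitable for a quick test but far too few for inference.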
An object of class spIntPGOcc that is a list comprised of:
beta.samples |
a |
alpha.samples |
a |
z.samples |
a |
psi.samples |
a |
theta.samples |
a |
w.samples |
a |
rhat |
a list of Gelman-Rubin diagnostic values for some of the model parameters. |
ESS |
a list of effective sample sizes for some of the model parameters. |
run.time |
execution time reported using |
k.fold.deviance |
scoring rule (deviance) from k-fold cross-validation. A
separate deviance value is returned for each data source. Only included if
|
The return object will include additional objects used for subsequent prediction and/or model fit evaluation. Note that detection probability estimated values are not included in the model object, but can be extracted using fitted().
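The roles of k.fold and k.fold.seed can be sketched generically in base R: a seed fixes a random permutation of the sites, which is then divided into k hold-out sets. How the package partitions sites internally is not described here, so the splitting rule and the site count below are assumptions.

```r
k.fold <- 4
k.fold.seed <- 100
J <- 64                      # number of sites (illustrative)

set.seed(k.fold.seed)        # same seed gives the same split
shuffled <- sample(J)        # random permutation of site indices
fold.id <- rep(seq_len(k.fold), length.out = J)
folds <- split(shuffled, fold.id)   # list of k hold-out site sets

for (k in seq_len(k.fold)) {
  hold.out <- folds[[k]]
  fit.sites <- setdiff(seq_len(J), hold.out)
  # Fit the model on fit.sites, then score predictions at hold.out
}
lengths(folds)
```

Fixing the seed makes the split reproducible, so competing models can be compared on identical hold-out sets.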
Some of the underlying code used for generating random numbers from the Polya-Gamma distribution is taken from the pgdraw package written by Daniel F. Schmidt and Enes Makalic. Their code implements Algorithm 6 in the PhD thesis of Jesse Bennett Windle (2013) https://repositories.lib.utexas.edu/handle/2152/21842.
Jeffrey W. Doser [email protected],
Andrew O. Finley [email protected]
Datta, A., S. Banerjee, A.O. Finley, and A.E. Gelfand. (2016) Hierarchical Nearest-Neighbor Gaussian process models for large geostatistical datasets. Journal of the American Statistical Association, doi:10.1080/01621459.2015.1044091.
Finley, A.O., A. Datta, B.D. Cook, D.C. Morton, H.E. Andersen, and S. Banerjee. (2019) Efficient algorithms for Bayesian Nearest Neighbor Gaussian Processes. Journal of Computational and Graphical Statistics, doi:10.1080/10618600.2018.1537924.
Finley, A. O., Datta, A., and Banerjee, S. (2020). spNNGP R package for nearest neighbor Gaussian process models. arXiv preprint arXiv:2001.09111.
Hooten, M. B., and Hobbs, N. T. (2015). A guide to Bayesian model selection for ecologists. Ecological Monographs, 85(1), 3-28.
Hooten, M. B., and Hefley, T. J. (2019). Bringing Bayesian models to life. CRC Press.
Polson, N.G., J.G. Scott, and J. Windle. (2013) Bayesian Inference for Logistic Models Using Polya-Gamma Latent Variables. Journal of the American Statistical Association, 108:1339-1349.
Roberts, G.O. and Rosenthal J.S. (2009) Examples of adaptive MCMC. Journal of Computational and Graphical Statistics, 18(2):349-367.
set.seed(400)
# Simulate Data -----------------------------------------------------------
# Number of locations in each direction. This is the total region of interest
# where some sites may or may not have a data source.
J.x <- 8
J.y <- 8
J.all <- J.x * J.y
# Number of data sources.
n.data <- 4
# Sites for each data source.
J.obs <- sample(ceiling(0.2 * J.all):ceiling(0.5 * J.all), n.data, replace = TRUE)
# Replicates for each data source.
n.rep <- list()
for (i in 1:n.data) {
  n.rep[[i]] <- sample(1:4, size = J.obs[i], replace = TRUE)
}
# Occupancy covariates
beta <- c(0.5, 0.5)
p.occ <- length(beta)
# Detection covariates
alpha <- list()
alpha[[1]] <- runif(2, 0, 1)
alpha[[2]] <- runif(3, 0, 1)
alpha[[3]] <- runif(2, -1, 1)
alpha[[4]] <- runif(4, -1, 1)
p.det.long <- sapply(alpha, length)
p.det <- sum(p.det.long)
sigma.sq <- 2
phi <- 3 / .5
sp <- TRUE
# Simulate occupancy data from multiple data sources.
dat <- simIntOcc(n.data = n.data, J.x = J.x, J.y = J.y, J.obs = J.obs,
                 n.rep = n.rep, beta = beta, alpha = alpha, sp = sp,
                 sigma.sq = sigma.sq, phi = phi, cov.model = 'exponential')
y <- dat$y
X <- dat$X.obs
X.p <- dat$X.p
sites <- dat$sites
X.0 <- dat$X.pred
psi.0 <- dat$psi.pred
coords <- as.matrix(dat$coords.obs)
coords.0 <- as.matrix(dat$coords.pred)
# Package all data into a list
occ.covs <- X[, 2, drop = FALSE]
colnames(occ.covs) <- c('occ.cov')
det.covs <- list()
# Add covariates one by one
det.covs[[1]] <- list(det.cov.1.1 = X.p[[1]][, , 2])
det.covs[[2]] <- list(det.cov.2.1 = X.p[[2]][, , 2],
                      det.cov.2.2 = X.p[[2]][, , 3])
det.covs[[3]] <- list(det.cov.3.1 = X.p[[3]][, , 2])
det.covs[[4]] <- list(det.cov.4.1 = X.p[[4]][, , 2],
                      det.cov.4.2 = X.p[[4]][, , 3],
                      det.cov.4.3 = X.p[[4]][, , 4])
data.list <- list(y = y, occ.covs = occ.covs, det.covs = det.covs,
                  sites = sites, coords = coords)
J <- length(dat$z.obs)
# Initial values
inits.list <- list(alpha = list(0, 0, 0, 0), beta = 0, phi = 3 / .5,
                   sigma.sq = 2, w = rep(0, J), z = rep(1, J))
# Priors
prior.list <- list(beta.normal = list(mean = 0, var = 2.72),
                   alpha.normal = list(mean = list(0, 0, 0, 0),
                                       var = list(2.72, 2.72, 2.72, 2.72)),
                   phi.unif = c(3/1, 3/.1),
                   sigma.sq.ig = c(2, 2))
# Tuning
tuning.list <- list(phi = 0.3)
# Number of batches
n.batch <- 2
# Batch length
batch.length <- 25
# Note that this is just a test case and more iterations/chains may need to
# be run to ensure convergence.
out <- spIntPGOcc(occ.formula = ~ occ.cov,
                  det.formula = list(f.1 = ~ det.cov.1.1,
                                     f.2 = ~ det.cov.2.1 + det.cov.2.2,
                                     f.3 = ~ det.cov.3.1,
                                     f.4 = ~ det.cov.4.1 + det.cov.4.2 + det.cov.4.3),
                  data = data.list, inits = inits.list, n.batch = n.batch,
                  batch.length = batch.length, accept.rate = 0.43,
                  priors = prior.list, cov.model = "exponential",
                  tuning = tuning.list, n.omp.threads = 1, verbose = TRUE,
                  NNGP = FALSE, n.report = 10, n.burn = 10, n.thin = 1)
summary(out)
The function spMsPGOcc fits multi-species spatial occupancy models using Polya-Gamma latent variables. Models can be fit using either a full Gaussian process or a Nearest Neighbor Gaussian Process for large data sets.
spMsPGOcc(occ.formula, det.formula, data, inits, priors, tuning, cov.model = 'exponential', NNGP = TRUE, n.neighbors = 15, search.type = 'cb', n.batch, batch.length, accept.rate = 0.43, n.omp.threads = 1, verbose = TRUE, n.report = 100, n.burn = round(.10 * n.batch * batch.length), n.thin = 1, n.chains = 1, k.fold, k.fold.threads = 1, k.fold.seed, k.fold.only = FALSE, ...)
occ.formula |
a symbolic description of the model to be fit for the occurrence portion of the model using R's model syntax. Only right-hand side of formula is specified. See example below. Random intercepts are allowed using lme4 syntax (Bates et al. 2015). |
det.formula |
a symbolic description of the model to be fit for the detection portion of the model using R's model syntax. Only right-hand side of formula is specified. See example below. Random intercepts are allowed using lme4 syntax (Bates et al. 2015). |
data |
a list containing data necessary for model fitting.
Valid tags are |
inits |
a list with each tag corresponding to a parameter name.
Valid tags are |
priors |
a list with each tag corresponding to a parameter name.
Valid tags are |
tuning |
a list with each tag corresponding to a parameter
name. Valid tags are |
cov.model |
a quoted keyword that specifies the covariance
function used to model the spatial dependence structure among the
observations. Supported covariance model key words are:
|
NNGP |
if |
n.neighbors |
number of neighbors used in the NNGP. Only used if
|
search.type |
a quoted keyword that specifies the type of nearest
neighbor search algorithm. Supported method key words are: |
n.batch |
the number of MCMC batches in each chain to run for the Adaptive MCMC sampler. See Roberts and Rosenthal (2009) for details. |
batch.length |
the length of each MCMC batch to run for the Adaptive MCMC sampler. See Roberts and Rosenthal (2009) for details. |
accept.rate |
target acceptance rate for Adaptive MCMC. Default is 0.43. See Roberts and Rosenthal (2009) for details. |
n.omp.threads |
a positive integer indicating
the number of threads to use for SMP parallel processing within chains. The package must
be compiled for OpenMP support. For most Intel-based machines, we
recommend setting |
verbose |
if |
n.report |
the interval to report Metropolis sampler acceptance and MCMC progress. Note this is specified in terms of batches and not overall samples for spatial models. |
n.burn |
the number of samples out of the total |
n.thin |
the thinning interval for collection of MCMC samples. The
thinning occurs after the |
n.chains |
the number of chains to run in sequence. |
k.fold |
specifies the number of k folds for cross-validation.
If not specified as an argument, then cross-validation is not performed
and |
k.fold.threads |
number of threads to use for cross-validation. If
|
k.fold.seed |
seed used to split data set into |
k.fold.only |
a logical value indicating whether to only perform
cross-validation ( |
... |
currently no additional arguments |
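The interplay of n.batch, batch.length, accept.rate, and the tuning values can be seen in a toy batch-adaptive Metropolis sampler in the spirit of Roberts and Rosenthal (2009). This is an illustrative base R sketch on a one-dimensional normal target, not the package's internal sampler; the target density and all variable names other than the three arguments are assumptions.

```r
set.seed(2)
# Toy target density: normal with standard deviation 2
log_target <- function(x) dnorm(x, 0, 2, log = TRUE)

n.batch <- 200
batch.length <- 25
accept.rate <- 0.43
log.tune <- 0                       # log of the proposal standard deviation
x <- 0
batch.accept <- numeric(n.batch)

for (b in seq_len(n.batch)) {
  n.acc <- 0
  for (i in seq_len(batch.length)) {
    # Random-walk Metropolis step with the current tuning value
    prop <- rnorm(1, x, exp(log.tune))
    if (log(runif(1)) < log_target(prop) - log_target(x)) {
      x <- prop
      n.acc <- n.acc + 1
    }
  }
  batch.accept[b] <- n.acc / batch.length
  # After each batch, raise the log tuning value if acceptance was above the
  # target and lower it otherwise, using a diminishing step size.
  delta <- min(0.01, 1 / sqrt(b))
  log.tune <- log.tune + ifelse(batch.accept[b] > accept.rate, delta, -delta)
}
exp(log.tune)                 # adapted proposal standard deviation
mean(batch.accept[101:200])   # acceptance rate over the later batches
```

The tuning list plays the role of the starting value of the proposal scale here, which is why a poor initial value mainly costs extra batches rather than correctness.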
An object of class spMsPGOcc that is a list comprised of:
beta.comm.samples |
a |
alpha.comm.samples |
a |
tau.sq.beta.samples |
a |
tau.sq.alpha.samples |
a |
beta.samples |
a |
alpha.samples |
a |
theta.samples |
a |
z.samples |
a three-dimensional array of posterior samples for the latent occurrence values for each species. |
psi.samples |
a three-dimensional array of posterior samples for the latent occupancy probability values for each species. |
w.samples |
a three-dimensional array of posterior samples for the latent spatial random effects for each species. |
sigma.sq.psi.samples |
a |
sigma.sq.p.samples |
a |
alpha.star.samples |
a |
beta.star.samples |
a |
like.samples |
a three-dimensional array of posterior samples for the likelihood value associated with each site and species. Used for calculating WAIC. |
rhat |
a list of Gelman-Rubin diagnostic values for some of the model parameters. |
ESS |
a list of effective sample sizes for some of the model parameters. |
run.time |
MCMC sampler execution time reported using |
k.fold.deviance |
vector of scoring rules (deviance) from k-fold cross-validation.
A separate value is reported for each species.
Only included if |
The return object will include additional objects used for subsequent prediction and/or model fit evaluation. Note that detection probability estimated values are not included in the model object, but can be extracted using fitted().
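The rhat component reports Gelman-Rubin diagnostics, which compare variation between chains with variation within chains and therefore require more than one chain. The base R sketch below computes the basic form of the statistic for a single parameter; the helper name rhat_basic is hypothetical, and the values stored in the model object may come from a different implementation with additional corrections.

```r
# Hypothetical helper: basic Gelman-Rubin statistic for one parameter.
# `draws` is an iterations x chains matrix of post-burn-in samples.
rhat_basic <- function(draws) {
  n <- nrow(draws)
  chain.means <- colMeans(draws)
  W <- mean(apply(draws, 2, var))       # within-chain variance
  B <- n * var(chain.means)             # between-chain variance
  var.plus <- (n - 1) / n * W + B / n   # pooled variance estimate
  sqrt(var.plus / W)
}

set.seed(3)
mixed <- matrix(rnorm(3000), ncol = 3)        # three chains, same target
stuck <- sweep(mixed, 2, c(0, 5, 10), "+")    # chains centered far apart
rhat_basic(mixed)   # close to 1
rhat_basic(stuck)   # well above 1
```

Values near 1 suggest the chains agree; values well above 1 indicate that more iterations or better initial values are needed.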
Some of the underlying code used for generating random numbers from the Polya-Gamma distribution is taken from the pgdraw package written by Daniel F. Schmidt and Enes Makalic. Their code implements Algorithm 6 in the PhD thesis of Jesse Bennett Windle (2013) https://repositories.lib.utexas.edu/handle/2152/21842.
Jeffrey W. Doser [email protected],
Andrew O. Finley [email protected]
Datta, A., S. Banerjee, A.O. Finley, and A.E. Gelfand. (2016) Hierarchical Nearest-Neighbor Gaussian process models for large geostatistical datasets. Journal of the American Statistical Association, doi:10.1080/01621459.2015.1044091.
Finley, A.O., A. Datta, B.D. Cook, D.C. Morton, H.E. Andersen, and S. Banerjee. (2019) Efficient algorithms for Bayesian Nearest Neighbor Gaussian Processes. Journal of Computational and Graphical Statistics, doi:10.1080/10618600.2018.1537924.
Finley, A. O., Datta, A., and Banerjee, S. (2020). spNNGP R package for nearest neighbor Gaussian process models. arXiv preprint arXiv:2001.09111.
Polson, N.G., J.G. Scott, and J. Windle. (2013) Bayesian Inference for Logistic Models Using Polya-Gamma Latent Variables. Journal of the American Statistical Association, 108:1339-1349.
Roberts, G.O. and Rosenthal J.S. (2009) Examples of adaptive MCMC. Journal of Computational and Graphical Statistics, 18(2):349-367.
Bates, Douglas, Martin Maechler, Ben Bolker, Steve Walker (2015). Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software, 67(1), 1-48. doi:10.18637/jss.v067.i01.
Hooten, M. B., and Hobbs, N. T. (2015). A guide to Bayesian model selection for ecologists. Ecological Monographs, 85(1), 3-28.
set.seed(400)
# Simulate Data -----------------------------------------------------------
J.x <- 7
J.y <- 7
J <- J.x * J.y
n.rep <- sample(2:4, size = J, replace = TRUE)
N <- 5
# Community-level covariate effects
# Occurrence
beta.mean <- c(0.2, -0.15)
p.occ <- length(beta.mean)
tau.sq.beta <- c(0.6, 0.3)
# Detection
alpha.mean <- c(0.5, 0.2, -.2)
tau.sq.alpha <- c(0.2, 0.3, 0.8)
p.det <- length(alpha.mean)
# Draw species-level effects from community means.
beta <- matrix(NA, nrow = N, ncol = p.occ)
alpha <- matrix(NA, nrow = N, ncol = p.det)
for (i in 1:p.occ) {
  beta[, i] <- rnorm(N, beta.mean[i], sqrt(tau.sq.beta[i]))
}
for (i in 1:p.det) {
  alpha[, i] <- rnorm(N, alpha.mean[i], sqrt(tau.sq.alpha[i]))
}
phi <- runif(N, 3/1, 3/.4)
sigma.sq <- runif(N, 0.3, 3)
sp <- TRUE
dat <- simMsOcc(J.x = J.x, J.y = J.y, n.rep = n.rep, N = N, beta = beta,
                alpha = alpha, phi = phi, sigma.sq = sigma.sq, sp = TRUE,
                cov.model = 'exponential')
# Number of batches
n.batch <- 30
# Batch length
batch.length <- 25
n.samples <- n.batch * batch.length
y <- dat$y
X <- dat$X
X.p <- dat$X.p
coords <- as.matrix(dat$coords)
# Package all data into a list
occ.covs <- X[, 2, drop = FALSE]
colnames(occ.covs) <- c('occ.cov')
det.covs <- list(det.cov.1 = X.p[, , 2],
                 det.cov.2 = X.p[, , 3])
data.list <- list(y = y, occ.covs = occ.covs, det.covs = det.covs,
                  coords = coords)
# Priors
prior.list <- list(beta.comm.normal = list(mean = 0, var = 2.72),
                   alpha.comm.normal = list(mean = 0, var = 2.72),
                   tau.sq.beta.ig = list(a = 0.1, b = 0.1),
                   tau.sq.alpha.ig = list(a = 0.1, b = 0.1),
                   phi.unif = list(a = 3/1, b = 3/.1),
                   sigma.sq.ig = list(a = 2, b = 2))
# Initial values
inits.list <- list(alpha.comm = 0, beta.comm = 0, beta = 0, alpha = 0,
                   tau.sq.beta = 1, tau.sq.alpha = 1, phi = 3 / .5,
                   sigma.sq = 2, w = matrix(0, nrow = N, ncol = nrow(X)),
                   z = apply(y, c(1, 2), max, na.rm = TRUE))
# Tuning
tuning.list <- list(phi = 1)
# Note that this is just a test case and more iterations/chains may need to
# be run to ensure convergence.
out <- spMsPGOcc(occ.formula = ~ occ.cov,
                 det.formula = ~ det.cov.1 + det.cov.2,
                 data = data.list, inits = inits.list, n.batch = n.batch,
                 batch.length = batch.length, accept.rate = 0.43,
                 priors = prior.list, cov.model = "exponential",
                 tuning = tuning.list, n.omp.threads = 1, verbose = TRUE,
                 NNGP = TRUE, n.neighbors = 5, search.type = 'cb',
                 n.report = 10, n.burn = 500, n.thin = 1, n.chains = 1)
summary(out, level = 'both')
The function spPGOcc fits single-species spatial occupancy models using Polya-Gamma latent variables. Models can be fit using either a full Gaussian process or a Nearest Neighbor Gaussian Process for large data sets.
spPGOcc(occ.formula, det.formula, data, inits, priors, tuning, cov.model = "exponential", NNGP = TRUE, n.neighbors = 15, search.type = "cb", n.batch, batch.length, accept.rate = 0.43, n.omp.threads = 1, verbose = TRUE, n.report = 100, n.burn = round(.10 * n.batch * batch.length), n.thin = 1, n.chains = 1, k.fold, k.fold.threads = 1, k.fold.seed = 100, k.fold.only = FALSE, ...)
occ.formula |
a symbolic description of the model to be fit for the occurrence portion of the model using R's model syntax. Only right-hand side of formula is specified. See example below. Random intercepts are allowed using lme4 syntax (Bates et al. 2015). |
det.formula |
a symbolic description of the model to be fit for the detection portion of the model using R's model syntax. Only right-hand side of formula is specified. See example below. Random intercepts are allowed using lme4 syntax (Bates et al. 2015). |
data |
a list containing data necessary for model fitting.
Valid tags are |
inits |
a list with each tag corresponding to a parameter name.
Valid tags are |
priors |
a list with each tag corresponding to a parameter name.
Valid tags are |
cov.model |
a quoted keyword that specifies the covariance
function used to model the spatial dependence structure among the
observations. Supported covariance model key words are:
|
tuning |
a list with each tag corresponding to a parameter
name. Valid tags are |
NNGP |
if |
n.neighbors |
number of neighbors used in the NNGP. Only used if
|
search.type |
a quoted keyword that specifies the type of nearest
neighbor search algorithm. Supported method key words are: |
n.batch |
the number of MCMC batches in each chain to run for the Adaptive MCMC sampler. See Roberts and Rosenthal (2009) for details. |
batch.length |
the length of each MCMC batch in each chain to run for the Adaptive MCMC sampler. See Roberts and Rosenthal (2009) for details. |
accept.rate |
target acceptance rate for Adaptive MCMC. Default is 0.43. See Roberts and Rosenthal (2009) for details. |
n.omp.threads |
a positive integer indicating
the number of threads to use for SMP parallel processing within-chains. The package must
be compiled for OpenMP support. For most Intel-based machines, we
recommend setting |
verbose |
if |
n.report |
the interval to report Metropolis sampler acceptance and MCMC progress. |
n.burn |
the number of samples out of the total |
n.thin |
the thinning interval for collection of MCMC samples. The
thinning occurs after the |
n.chains |
the number of MCMC chains to run. |
k.fold |
specifies the number of k folds for cross-validation.
If not specified as an argument, then cross-validation is not performed
and |
k.fold.threads |
number of threads to use for cross-validation. If
|
k.fold.seed |
seed used to split data set into |
k.fold.only |
a logical value indicating whether to only perform
cross-validation ( |
... |
currently no additional arguments |
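The MCMC control arguments interact: each chain runs n.batch * batch.length iterations, the first n.burn are discarded, and every n.thin-th remaining sample is kept. The following base-R sketch shows that bookkeeping. It assumes thinning starts at iteration n.burn + 1, and the helper name n_post_samples is hypothetical (it is not a function in the package).

```r
# Hypothetical helper: number of posterior samples retained across chains,
# assuming samples n.burn + 1, n.burn + 1 + n.thin, ... are kept.
n_post_samples <- function(n.batch, batch.length,
                           n.burn = round(0.10 * n.batch * batch.length),
                           n.thin = 1, n.chains = 1) {
  n.samples <- n.batch * batch.length
  stopifnot(n.burn < n.samples, n.thin >= 1)
  kept <- seq(from = n.burn + 1, to = n.samples, by = n.thin)
  length(kept) * n.chains
}

# Default burn-in (10% of the iterations), no thinning, one chain
n.kept.default <- n_post_samples(n.batch = 400, batch.length = 25)

# Longer burn-in, thinning by 4, three chains
n.kept.thinned <- n_post_samples(n.batch = 400, batch.length = 25,
                                 n.burn = 2000, n.thin = 4, n.chains = 3)
```

Working through the counts before a long run helps confirm that enough samples remain after burn-in and thinning to summarize the posterior.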
An object of class spPGOcc that is a list comprised of:
beta.samples |
a |
alpha.samples |
a |
z.samples |
a |
psi.samples |
a |
theta.samples |
a |
w.samples |
a |
sigma.sq.psi.samples |
a |
sigma.sq.p.samples |
a |
beta.star.samples |
a |
alpha.star.samples |
a |
like.samples |
a |
rhat |
a list of Gelman-Rubin diagnostic values for some of the model parameters. |
ESS |
a list of effective sample sizes for some of the model parameters. |
run.time |
execution time reported using |
k.fold.deviance |
scoring rule (deviance) from k-fold cross-validation.
Only included if |
The return object will include additional objects used for
subsequent prediction and/or model fit evaluation. Note that detection
probability values are not included in the model object, but can be
extracted using fitted().
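k-fold cross-validation partitions the sites into k.fold groups, refits the model with each group held out in turn, and scores predictions for the held-out sites. The sketch below covers only the partitioning step in base R. The round-robin assignment of shuffled sites to folds is an assumption made for illustration and may differ from how the package splits the data; J and k.fold are illustrative values.

```r
# Illustrative values (only the seed of 100 matches the documented default)
J <- 64
k.fold <- 4
k.fold.seed <- 100

set.seed(k.fold.seed)
sites.random <- sample(seq_len(J))               # random ordering of site indices
fold.id <- rep(seq_len(k.fold), length.out = J)  # round-robin fold labels
folds <- split(sites.random, fold.id)            # list of k.fold hold-out sets

# Each site should be held out exactly once
held.out.once <- identical(sort(unlist(folds, use.names = FALSE)), seq_len(J))
fold.sizes <- lengths(folds)
```

Fixing the seed makes the split reproducible, so competing models can be compared on identical hold-out sets.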
Jeffrey W. Doser,
Andrew O. Finley
Bates, Douglas, Martin Maechler, Ben Bolker, Steve Walker (2015). Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software, 67(1), 1-48. doi:10.18637/jss.v067.i01.
Datta, A., S. Banerjee, A.O. Finley, and A.E. Gelfand. (2016) Hierarchical Nearest-Neighbor Gaussian process models for large geostatistical datasets. Journal of the American Statistical Association, doi:10.1080/01621459.2015.1044091.
Finley, A.O., A. Datta, B.D. Cook, D.C. Morton, H.E. Andersen, and S. Banerjee. (2019) Efficient algorithms for Bayesian Nearest Neighbor Gaussian Processes. Journal of Computational and Graphical Statistics, doi:10.1080/10618600.2018.1537924.
Finley, A. O., Datta, A., and Banerjee, S. (2020). spNNGP R package for nearest neighbor Gaussian process models. arXiv preprint arXiv:2001.09111.
Hooten, M. B., and Hobbs, N. T. (2015). A guide to Bayesian model selection for ecologists. Ecological Monographs, 85(1), 3-28.
Hooten, M. B., and Hefley, T. J. (2019). Bringing Bayesian models to life. CRC Press.
Polson, N.G., J.G. Scott, and J. Windle. (2013) Bayesian Inference for Logistic Models Using Polya-Gamma Latent Variables. Journal of the American Statistical Association, 108:1339-1349.
Roberts, G.O. and Rosenthal J.S. (2009) Examples of adaptive MCMC. Journal of Computational and Graphical Statistics, 18(2):349-367.
set.seed(350)
# Simulate Data -----------------------------------------------------------
J.x <- 8
J.y <- 8
J <- J.x * J.y
n.rep <- sample(2:4, J, replace = TRUE)
beta <- c(0.5, -0.15)
p.occ <- length(beta)
alpha <- c(0.7, 0.4, -0.2)
p.det <- length(alpha)
phi <- 3 / .6
sigma.sq <- 2
dat <- simOcc(J.x = J.x, J.y = J.y, n.rep = n.rep, beta = beta, alpha = alpha,
              sigma.sq = sigma.sq, phi = phi, sp = TRUE, cov.model = 'exponential')
y <- dat$y
X <- dat$X
X.p <- dat$X.p
coords <- as.matrix(dat$coords)
# Package all data into a list
occ.covs <- X[, -1, drop = FALSE]
colnames(occ.covs) <- c('occ.cov')
det.covs <- list(det.cov.1 = X.p[, , 2],
                 det.cov.2 = X.p[, , 3])
data.list <- list(y = y,
                  occ.covs = occ.covs,
                  det.covs = det.covs,
                  coords = coords)
# Number of batches
n.batch <- 10
# Batch length
batch.length <- 25
n.iter <- n.batch * batch.length
# Priors
prior.list <- list(beta.normal = list(mean = 0, var = 2.72),
                   alpha.normal = list(mean = 0, var = 2.72),
                   sigma.sq.ig = c(2, 2),
                   phi.unif = c(3/1, 3/.1))
# Initial values
inits.list <- list(alpha = 0, beta = 0,
                   phi = 3 / .5,
                   sigma.sq = 2,
                   w = rep(0, nrow(X)),
                   z = apply(y, 1, max, na.rm = TRUE))
# Tuning
tuning.list <- list(phi = 1)
# Note that this is just a test case and more iterations/chains may need to
# be run to ensure convergence.
out <- spPGOcc(occ.formula = ~ occ.cov,
               det.formula = ~ det.cov.1 + det.cov.2,
               data = data.list,
               inits = inits.list,
               n.batch = n.batch,
               batch.length = batch.length,
               priors = prior.list,
               cov.model = "exponential",
               tuning = tuning.list,
               NNGP = FALSE,
               n.neighbors = 5,
               search.type = 'cb',
               n.report = 10,
               n.burn = 50,
               n.chains = 1)
summary(out)
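The example writes phi as 3 divided by a distance. With an exponential covariance function the correlation at distance d is exp(-phi * d), which falls to about 0.05 at d = 3 / phi, so 3 / phi can be read as an effective spatial range. The base-R sketch below illustrates this. It assumes coordinates on roughly the unit square, and the helper name exp_cov is hypothetical, not a package function.

```r
# Hypothetical helper: exponential covariance matrix for a set of coordinates
exp_cov <- function(coords, sigma.sq, phi) {
  D <- as.matrix(dist(coords))   # pairwise Euclidean distances
  sigma.sq * exp(-phi * D)
}

# 8 x 8 grid on the unit square, mirroring J.x = J.y = 8 in the example
coords <- as.matrix(expand.grid(x = seq(0, 1, length.out = 8),
                                y = seq(0, 1, length.out = 8)))
phi <- 3 / 0.6
Sigma <- exp_cov(coords, sigma.sq = 2, phi = phi)

# Correlation at the effective range 3 / phi is exp(-3), roughly 0.05
corr.at.range <- exp(-phi * (3 / phi))

# Effective ranges implied by the prior bounds phi.unif = c(3/1, 3/.1)
range.bounds <- 3 / c(3 / 1, 3 / 0.1)
```

Read this way, the uniform prior on phi in the example restricts the effective range to between 0.1 and 1 distance units.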
Function for fitting single-species multi-season spatial integrated occupancy models using Polya-Gamma latent variables. Data integration is done using a joint likelihood framework, assuming distinct detection models for each data source that are each conditional on a single latent occurrence process. Models are fit using Nearest Neighbor Gaussian Processes.
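The joint likelihood idea can be sketched in base R for the simplest possible case: one season, no covariates, and no spatial effects. A single latent occurrence state per site feeds every data source, each source has its own detection probability, and the site-level likelihood marginalizes over the shared state. All names and values below are illustrative assumptions, and the function joint_loglik is hypothetical; this is not the package's sampler.

```r
set.seed(1)
J <- 200
psi <- 0.6                # shared occupancy probability
p <- c(0.3, 0.7)          # source-specific detection probabilities
K <- c(4, 2)              # replicate visits per data source
z <- rbinom(J, 1, psi)    # one latent occurrence process for all sources

# Each source observes the same z through its own detection model
y <- lapply(seq_along(p), function(s) {
  matrix(rbinom(J * K[s], 1, rep(z, K[s]) * p[s]), nrow = J, ncol = K[s])
})

# Hypothetical helper: marginal joint log-likelihood across data sources
joint_loglik <- function(psi, p, y) {
  # Probability of each site's detection histories given occupancy,
  # multiplied across sources because they are independent given z
  lik.occ <- Reduce(`*`, lapply(seq_along(y), function(s) {
    apply(y[[s]], 1, function(h) prod(dbinom(h, 1, p[s])))
  }))
  # An unoccupied site can only produce all-zero histories in every source
  never.detected <- Reduce(`&`, lapply(y, function(m) rowSums(m) == 0))
  sum(log(psi * lik.occ + (1 - psi) * never.detected))
}

ll.true <- joint_loglik(psi, p, y)
ll.swapped <- joint_loglik(psi, rev(p), y)   # detection models mismatched
```

Evaluating the likelihood with the detection probabilities swapped between sources gives a lower value, which illustrates why each data source needs its own detection model.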
stIntPGOcc(occ.formula, det.formula, data, inits, priors, tuning, cov.model = 'exponential', NNGP = TRUE, n.neighbors = 15, search.type = 'cb', n.batch, batch.length, accept.rate = 0.43, n.omp.threads = 1, verbose = TRUE, ar1 = FALSE, n.report = 100, n.burn = round(.10 * n.batch * batch.length), n.thin = 1, n.chains = 1, ...)
occ.formula |
a symbolic description of the model to be fit for the occurrence portion of the model using R's model syntax. Only right-hand side of formula is specified. See example below. Random intercepts are allowed using lme4 syntax (Bates et al. 2015). |
det.formula |
a list of symbolic descriptions of the models to be fit for the detection portion of the model using R's model syntax for each data set. Each element in the list is a formula for the detection model of a given data set. Only right-hand side of formula is specified. See example below. Random intercepts are allowed using lme4 syntax (Bates et al. 2015). |
data |
a list containing data necessary for model fitting.
Valid tags are |
inits |
a list with each tag corresponding to a parameter name.
Valid tags are |
priors |
a list with each tag corresponding to a parameter name.
Valid tags are |
tuning |
a list with each tag corresponding to a parameter
name. Valid tags are |
cov.model |
a quoted keyword that specifies the covariance
function used to model the spatial dependence structure among the
observations. Supported covariance model key words are:
|
NNGP |
if |
n.neighbors |
number of neighbors used in the NNGP. Only used if
|
search.type |
a quoted keyword that specifies the type of nearest
neighbor search algorithm. Supported method key words are: |
n.batch |
the number of MCMC batches in each chain to run for the Adaptive MCMC sampler. See Roberts and Rosenthal (2009) for details. |
batch.length |
the length of each MCMC batch in each chain to run for the Adaptive MCMC sampler. See Roberts and Rosenthal (2009) for details. |
accept.rate |
target acceptance rate for Adaptive MCMC. Default is 0.43. See Roberts and Rosenthal (2009) for details. |
n.omp.threads |
a positive integer indicating the number of threads
to use for SMP parallel processing within chains. The package must be compiled for
OpenMP support. For most Intel-based machines, we recommend setting
|
verbose |
if |
ar1 |
logical value indicating whether to include an AR(1) zero-mean
temporal random effect in the model. If |
n.report |
the interval to report MCMC progress. Note this is specified in terms of batches, not MCMC samples. |
n.burn |
the number of samples out of the total |
n.thin |
the thinning interval for collection of MCMC samples. The
thinning occurs after the |
n.chains |
the number of chains to run in sequence. |
... |
currently no additional arguments |
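When ar1 = TRUE, the temporal random effect is a zero-mean AR(1) process. A common parameterization, assumed here for illustration, gives covariance sigma.sq.t * rho^|t - s| between primary time periods t and s. The base-R sketch below builds that covariance matrix and draws one realization; the values of sigma.sq.t and rho are arbitrary.

```r
n.time <- 10
sigma.sq.t <- 0.5   # temporal variance (illustrative value)
rho <- 0.7          # temporal correlation (illustrative value)

# AR(1) covariance: sigma.sq.t * rho^|t - s|
lag <- abs(outer(seq_len(n.time), seq_len(n.time), "-"))
Sigma.eta <- sigma.sq.t * rho^lag

# Draw one realization eta ~ N(0, Sigma.eta) using the Cholesky factor
set.seed(42)
eta <- drop(t(chol(Sigma.eta)) %*% rnorm(n.time))
```

Values of rho near 1 produce slowly varying year effects, while values near 0 make the yearly effects nearly independent.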
An object of class stIntPGOcc that is a list comprised of:
beta.samples |
a |
alpha.samples |
a |
z.samples |
a three-dimensional array of posterior samples for the latent occupancy values, with dimensions corresponding to posterior sample, site, and primary time period. Note this object will contain predicted occupancy values for sites/primary time periods that were not sampled. |
psi.samples |
a three-dimensional array of posterior samples for the latent occupancy probability values, with dimensions corresponding to posterior sample, site, and primary time period. Note this object will contain predicted occupancy probabilities for sites/primary time periods that were not sampled. |
sigma.sq.psi.samples |
a |
sigma.sq.p.samples |
a |
beta.star.samples |
a |
alpha.star.samples |
a |
theta.samples |
a |
w.samples |
a |
eta.samples |
a |
p.samples |
a list of four-dimensional arrays consisting of the posterior samples of detection probability for each data source. For each data source, the dimensions of the four-dimensional array correspond to MCMC sample, site, season, and replicate within season. |
like.samples |
a two-dimensional array of posterior samples for the likelihood values associated with each site and primary time period, for each individual data source. Used for calculating WAIC. |
rhat |
a list of Gelman-Rubin diagnostic values for some of the model parameters. |
ESS |
a list of effective sample sizes for some of the model parameters. |
run.time |
execution time reported using |
The return object will include additional objects used for subsequent prediction and/or model fit evaluation.
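The likelihood samples are stored for computing WAIC. As a rough illustration of the usual WAIC calculation, the sketch below uses a mock matrix of pointwise likelihood values with rows as MCMC samples and columns as data points. That layout and the simulated values are assumptions for illustration; a fitted model would supply the real values.

```r
set.seed(10)
n.post <- 500   # posterior samples
n.obs <- 30     # data points (e.g., site by time period combinations)

# Mock likelihood values in (0, 1); stand-ins for values from a fitted model
like.samples <- matrix(runif(n.post * n.obs, 0.2, 0.9), n.post, n.obs)

# Log pointwise predictive density
lppd <- sum(log(colMeans(like.samples)))
# Effective number of parameters: posterior variance of the log-likelihood
p.D <- sum(apply(log(like.samples), 2, var))
waic <- -2 * (lppd - p.D)
```

Lower WAIC indicates better expected predictive performance, so the quantity is useful only when comparing models fit to the same data.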
Some of the underlying code used for generating random numbers from the Polya-Gamma distribution is taken from the pgdraw package written by Daniel F. Schmidt and Enes Makalic. Their code implements Algorithm 6 in the PhD thesis of Jesse Bennett Windle (2013) https://repositories.lib.utexas.edu/handle/2152/21842.
Jeffrey W. Doser
Polson, N.G., J.G. Scott, and J. Windle. (2013) Bayesian Inference for Logistic Models Using Polya-Gamma Latent Variables. Journal of the American Statistical Association, 108:1339-1349.
Bates, Douglas, Martin Maechler, Ben Bolker, Steve Walker (2015). Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software, 67(1), 1-48. doi:10.18637/jss.v067.i01.
set.seed(332)
# Simulate Data -----------------------------------------------------------
# Number of locations in each direction. This is the total region of interest
# where some sites may or may not have a data source.
J.x <- 15
J.y <- 15
J.all <- J.x * J.y
# Number of data sources.
n.data <- 3
# Sites for each data source.
J.obs <- sample(ceiling(0.2 * J.all):ceiling(0.4 * J.all), n.data, replace = TRUE)
# Maximum number of years for each data set
n.time.max <- c(4, 8, 10)
# Number of years each site in each data set is sampled
n.time <- list()
for (i in 1:n.data) {
  n.time[[i]] <- sample(1:n.time.max[i], J.obs[i], replace = TRUE)
}
# Replicates for each data source.
n.rep <- list()
for (i in 1:n.data) {
  n.rep[[i]] <- matrix(NA, J.obs[i], n.time.max[i])
  for (j in 1:J.obs[i]) {
    n.rep[[i]][j, sample(1:n.time.max[i], n.time[[i]][j], replace = FALSE)] <-
      sample(1:4, n.time[[i]][j], replace = TRUE)
  }
}
# Total number of years across all data sets
n.time.total <- 10
# List denoting the specific years each data set was sampled during.
data.seasons <- list()
for (i in 1:n.data) {
  data.seasons[[i]] <- sort(sample(1:n.time.total, n.time.max[i], replace = FALSE))
}
# Occupancy covariates
beta <- c(0, 0.4, 0.3)
trend <- TRUE
# Random occupancy covariates
psi.RE <- list(levels = c(20),
               sigma.sq.psi = c(0.6))
p.occ <- length(beta)
# Detection covariates
alpha <- list()
alpha[[1]] <- c(0, 0.2, -0.5)
alpha[[2]] <- c(-1, 0.5, 0.3, -0.8)
alpha[[3]] <- c(-0.5, 1)
p.RE <- list()
p.det.long <- sapply(alpha, length)
p.det <- sum(p.det.long)
# Spatial parameters
sigma.sq <- 0.9
phi <- 3 / .5
# Simulate occupancy data.
dat <- simTIntOcc(n.data = n.data, J.x = J.x, J.y = J.y, J.obs = J.obs,
                  n.time = n.time, data.seasons = data.seasons, n.rep = n.rep,
                  beta = beta, alpha = alpha, trend = trend,
                  psi.RE = psi.RE, p.RE = p.RE, sp = TRUE,
                  sigma.sq = sigma.sq, phi = phi, cov.model = 'exponential')
y <- dat$y
X <- dat$X.obs
X.re <- dat$X.re.obs
X.p <- dat$X.p
sites <- dat$sites
coords <- dat$coords.obs
# Package all data into a list
occ.covs <- list(trend = X[, , 2],
                 occ.cov.1 = X[, , 3],
                 occ.factor.1 = X.re[, , 1])
det.covs <- list()
# Add covariates one by one
det.covs[[1]] <- list(det.cov.1.1 = X.p[[1]][, , , 2],
                      det.cov.1.2 = X.p[[1]][, , , 3])
det.covs[[2]] <- list(det.cov.2.1 = X.p[[2]][, , , 2],
                      det.cov.2.2 = X.p[[2]][, , , 3],
                      det.cov.2.3 = X.p[[2]][, , , 4])
det.covs[[3]] <- list(det.cov.3.1 = X.p[[3]][, , , 2])
data.list <- list(y = y,
                  occ.covs = occ.covs,
                  det.covs = det.covs,
                  sites = sites,
                  seasons = data.seasons,
                  coords = coords)
# Testing
occ.formula <- ~ trend + occ.cov.1 + (1 | occ.factor.1)
# Note that the names are not necessary.
det.formula <- list(f.1 = ~ det.cov.1.1 + det.cov.1.2,
                    f.2 = ~ det.cov.2.1 + det.cov.2.2 + det.cov.2.3,
                    f.3 = ~ det.cov.3.1)
# NOTE: this is a short run of the model, in reality we would run the
#       model for much longer.
out <- stIntPGOcc(occ.formula = occ.formula,
                  det.formula = det.formula,
                  data = data.list,
                  NNGP = TRUE,
                  n.neighbors = 15,
                  cov.model = 'exponential',
                  n.batch = 3,
                  batch.length = 25,
                  n.report = 1,
                  n.burn = 25,
                  n.thin = 1,
                  n.chains = 1)
summary(out)
The function stMsPGOcc fits multi-species multi-season spatial occupancy models with species correlations (i.e., a spatially-explicit joint species distribution model with imperfect detection). We use Polya-Gamma latent variables and a spatial factor modeling approach. Models are implemented using a Nearest Neighbor Gaussian Process.
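In a spatial factor model, each species' spatial effect is a linear combination of a small number of shared spatial factors, and the factor loadings induce correlation among species. The base-R sketch below shows the algebra only. It assumes a lower-triangular loadings matrix with ones on the diagonal, a common identifiability constraint, and it uses independent placeholder draws for the factors rather than spatial processes.

```r
set.seed(7)
N <- 7    # species
q <- 3    # spatial factors (n.factors)
J <- 64   # sites

# Loadings: lower triangular with ones on the diagonal (assumed constraint)
lambda <- matrix(rnorm(N * q), N, q)
lambda[upper.tri(lambda)] <- 0
diag(lambda) <- 1

# Placeholder factors; in the model each row would be a spatial process
w <- matrix(rnorm(q * J), q, J)

# Species-specific spatial effects, one row per species (N x J)
w.star <- lambda %*% w

# Interspecies covariance and correlation implied by the loadings
species.cov <- lambda %*% t(lambda)
species.cor <- cov2cor(species.cov)
```

The implied covariance matrix has rank at most q, which is the source of the computational savings when q is much smaller than N.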
stMsPGOcc(occ.formula, det.formula, data, inits, priors, tuning, cov.model = 'exponential', NNGP = TRUE, n.neighbors = 15, search.type = 'cb', n.factors, n.batch, batch.length, accept.rate = 0.43, n.omp.threads = 1, verbose = TRUE, ar1 = FALSE, n.report = 100, n.burn = round(.10 * n.batch * batch.length), n.thin = 1, n.chains = 1, ...)
occ.formula |
a symbolic description of the model to be fit for the occurrence portion of the model using R's model syntax. Random intercepts are allowed using lme4 syntax (Bates et al. 2015). Only right-hand side of formula is specified. See example below. |
det.formula |
a symbolic description of the model to be fit for the detection portion of the model using R's model syntax. Only right-hand side of formula is specified. See example below. Random intercepts are allowed using lme4 syntax (Bates et al. 2015). |
data |
a list containing data necessary for model fitting.
Valid tags are |
inits |
a list with each tag corresponding to a parameter name.
Valid tags are |
priors |
a list with each tag corresponding to a parameter name.
Valid tags are |
tuning |
a list with each tag corresponding to a parameter
name. Valid tags are |
cov.model |
a quoted keyword that specifies the covariance
function used to model the spatial dependence structure among the
observations. Supported covariance model key words are:
|
NNGP |
if |
n.neighbors |
number of neighbors used in the NNGP. Only used if
|
search.type |
a quoted keyword that specifies the type of nearest
neighbor search algorithm. Supported method key words are: |
n.factors |
the number of factors to use in the spatial factor model approach. Typically, the number of factors is set to be small (e.g., 4-5) relative to the total number of species in the community, which will lead to substantial decreases in computation time. However, the value can be anywhere between 1 and N (the number of species in the community). |
n.batch |
the number of MCMC batches in each chain to run for the Adaptive MCMC sampler. See Roberts and Rosenthal (2009) for details. |
batch.length |
the length of each MCMC batch to run for the Adaptive MCMC sampler. See Roberts and Rosenthal (2009) for details. |
accept.rate |
target acceptance rate for Adaptive MCMC. Default is 0.43. See Roberts and Rosenthal (2009) for details. |
n.omp.threads |
a positive integer indicating
the number of threads to use for SMP parallel processing within chains. The package must
be compiled for OpenMP support. For most Intel-based machines, we
recommend setting |
verbose |
if |
ar1 |
logical value indicating whether to include an AR(1) zero-mean
temporal random effect in the model. If |
n.report |
the interval to report Metropolis sampler acceptance and MCMC progress. Note this is specified in terms of batches and not overall samples for spatial models. |
n.burn |
the number of samples out of the total |
n.thin |
the thinning interval for collection of MCMC samples. The
thinning occurs after the |
n.chains |
the number of chains to run in sequence. |
... |
currently no additional arguments |
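The batch arguments drive an adaptive Metropolis scheme in the spirit of Roberts and Rosenthal (2009): after each batch, the log of the proposal standard deviation is increased when the batch acceptance rate exceeded accept.rate and decreased otherwise, with a step size that cannot grow as batches accumulate. The toy sketch below applies that rule to a standard normal target in base R. It is a stand-in for illustration, not the package's sampler, and the target and starting values are assumptions.

```r
set.seed(123)
n.batch <- 200
batch.length <- 25
accept.rate <- 0.43

log.target <- function(x) dnorm(x, log = TRUE)   # toy target density
x <- 0
log.tuning <- log(0.05)                          # deliberately poor starting value
batch.accept <- numeric(n.batch)

for (b in seq_len(n.batch)) {
  n.accept <- 0
  for (i in seq_len(batch.length)) {
    x.prop <- rnorm(1, x, exp(log.tuning))       # random-walk proposal
    if (log(runif(1)) < log.target(x.prop) - log.target(x)) {
      x <- x.prop
      n.accept <- n.accept + 1
    }
  }
  batch.accept[b] <- n.accept / batch.length
  # Diminishing adaptation: step size min(0.01, 1 / sqrt(b))
  delta <- min(0.01, 1 / sqrt(b))
  log.tuning <- log.tuning +
    ifelse(batch.accept[b] > accept.rate, delta, -delta)
}
tuning <- exp(log.tuning)
```

Because the proposal starts too narrow, nearly every move is accepted at first and the tuning value rises batch by batch, which is why short runs can finish before the sampler has adapted.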
An object of class stMsPGOcc that is a list comprised of:
beta.comm.samples |
a |
alpha.comm.samples |
a |
tau.sq.beta.samples |
a |
tau.sq.alpha.samples |
a |
beta.samples |
a |
alpha.samples |
a |
theta.samples |
a |
lambda.samples |
a |
z.samples |
a four-dimensional array of posterior samples for the latent occurrence values for each species. Dimensions correspond to MCMC sample, species, site, and primary time period. |
psi.samples |
a four-dimensional array of posterior samples for the latent occupancy probability values for each species. Dimensions correspond to MCMC sample, species, site, and primary time period. |
w.samples |
a three-dimensional array of posterior samples for the latent spatial random effects for each spatial factor. Dimensions correspond to MCMC sample, factor, and site. |
sigma.sq.psi.samples |
a |
sigma.sq.p.samples |
a |
beta.star.samples |
a |
alpha.star.samples |
a |
like.samples |
a four-dimensional array of posterior samples for the likelihood value used for calculating WAIC. Dimensions correspond to MCMC sample, species, site, and time period. |
rhat |
a list of Gelman-Rubin diagnostic values for some of the model parameters. |
ESS |
a list of effective sample sizes for some of the model parameters. |
run.time |
MCMC sampler execution time reported using |
The return object will include additional objects used for
subsequent prediction and/or model fit evaluation. Note that estimated
detection probability values are not included in the model object, but can
be extracted using fitted().
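Because z.samples and psi.samples are four-dimensional arrays (MCMC sample, species, site, primary time period), posterior summaries reduce to apply() calls over the relevant margins. The sketch below uses a mock array in place of output from a fitted model; the dimensions and simulated values are illustrative assumptions.

```r
set.seed(99)
n.post <- 100   # posterior samples
N <- 7          # species
J <- 64         # sites
n.time <- 10    # primary time periods

# Mock latent occurrence samples; a fitted model would supply z.samples
z.samples <- array(rbinom(n.post * N * J * n.time, 1, 0.3),
                   dim = c(n.post, N, J, n.time))

# Posterior mean occurrence for each species, site, and time period
z.mean <- apply(z.samples, c(2, 3, 4), mean)

# Posterior samples of species richness at each site and time period
rich.samples <- apply(z.samples, c(1, 3, 4), sum)
rich.mean <- apply(rich.samples, c(2, 3), mean)
rich.ci <- apply(rich.samples, c(2, 3), quantile, probs = c(0.025, 0.975))
```

Summing over the species dimension within each MCMC sample before summarizing carries the posterior uncertainty through to derived quantities such as richness.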
Some of the underlying code used for generating random numbers from the Polya-Gamma distribution is taken from the pgdraw package written by Daniel F. Schmidt and Enes Makalic. Their code implements Algorithm 6 in the PhD thesis of Jesse Bennett Windle (2013) https://repositories.lib.utexas.edu/handle/2152/21842.
Jeffrey W. Doser,
Andrew O. Finley
Datta, A., S. Banerjee, A.O. Finley, and A.E. Gelfand. (2016) Hierarchical Nearest-Neighbor Gaussian process models for large geostatistical datasets. Journal of the American Statistical Association, doi:10.1080/01621459.2015.1044091.
Finley, A.O., A. Datta, B.D. Cook, D.C. Morton, H.E. Andersen, and S. Banerjee. (2019) Efficient algorithms for Bayesian Nearest Neighbor Gaussian Processes. Journal of Computational and Graphical Statistics, doi:10.1080/10618600.2018.1537924.
Finley, A. O., Datta, A., and Banerjee, S. (2020). spNNGP R package for nearest neighbor Gaussian process models. arXiv preprint arXiv:2001.09111.
Polson, N.G., J.G. Scott, and J. Windle. (2013) Bayesian Inference for Logistic Models Using Polya-Gamma Latent Variables. Journal of the American Statistical Association, 108:1339-1349.
Roberts, G.O. and Rosenthal J.S. (2009) Examples of adaptive MCMC. Journal of Computational and Graphical Statistics, 18(2):349-367.
Bates, Douglas, Martin Maechler, Ben Bolker, Steve Walker (2015). Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software, 67(1), 1-48. doi:10.18637/jss.v067.i01.
Hooten, M. B., and Hobbs, N. T. (2015). A guide to Bayesian model selection for ecologists. Ecological Monographs, 85(1), 3-28.
Christensen, W. F., and Amemiya, Y. (2002). Latent variable analysis of multivariate spatial data. Journal of the American Statistical Association, 97(457), 302-317.
# Simulate Data -----------------------------------------------------------
set.seed(500)
J.x <- 8
J.y <- 8
J <- J.x * J.y
# Years sampled
n.time <- sample(3:10, J, replace = TRUE)
# n.time <- rep(10, J)
n.time.max <- max(n.time)
# Replicates
n.rep <- matrix(NA, J, max(n.time))
for (j in 1:J) {
  n.rep[j, 1:n.time[j]] <- sample(2:4, n.time[j], replace = TRUE)
}
N <- 7
# Community-level covariate effects
# Occurrence
beta.mean <- c(-3, -0.2, 0.5)
trend <- FALSE
sp.only <- 0
p.occ <- length(beta.mean)
tau.sq.beta <- c(0.6, 1.5, 1.4)
# Detection
alpha.mean <- c(0, 1.2, -1.5)
tau.sq.alpha <- c(1, 0.5, 2.3)
p.det <- length(alpha.mean)
# Random effects
psi.RE <- list()
p.RE <- list()
# Draw species-level effects from community means.
beta <- matrix(NA, nrow = N, ncol = p.occ)
alpha <- matrix(NA, nrow = N, ncol = p.det)
for (i in 1:p.occ) {
  beta[, i] <- rnorm(N, beta.mean[i], sqrt(tau.sq.beta[i]))
}
for (i in 1:p.det) {
  alpha[, i] <- rnorm(N, alpha.mean[i], sqrt(tau.sq.alpha[i]))
}
sp <- TRUE
svc.cols <- c(1)
p.svc <- length(svc.cols)
n.factors <- 3
phi <- runif(p.svc * n.factors, 3 / .9, 3 / .3)
factor.model <- TRUE
cov.model <- 'exponential'
ar1 <- TRUE
sigma.sq.t <- runif(N, 0.05, 1)
rho <- runif(N, 0.1, 1)
dat <- simTMsOcc(J.x = J.x, J.y = J.y, n.time = n.time, n.rep = n.rep, N = N,
                 beta = beta, alpha = alpha, sp.only = sp.only, trend = trend,
                 psi.RE = psi.RE, p.RE = p.RE, factor.model = factor.model,
                 svc.cols = svc.cols, n.factors = n.factors, phi = phi, sp = sp,
                 cov.model = cov.model, ar1 = ar1, sigma.sq.t = sigma.sq.t,
                 rho = rho)
y <- dat$y
X <- dat$X
X.p <- dat$X.p
coords <- dat$coords
X.re <- dat$X.re
X.p.re <- dat$X.p.re
occ.covs <- list(occ.cov.1 = X[, , 2],
                 occ.cov.2 = X[, , 3])
det.covs <- list(det.cov.1 = X.p[, , , 2],
                 det.cov.2 = X.p[, , , 3])
data.list <- list(y = y,
                  occ.covs = occ.covs,
                  det.covs = det.covs,
                  coords = coords)
# Priors
prior.list <- list(beta.comm.normal = list(mean = 0, var = 2.72),
                   alpha.comm.normal = list(mean = 0, var = 2.72),
                   tau.sq.beta.ig = list(a = 0.1, b = 0.1),
                   tau.sq.alpha.ig = list(a = 0.1, b = 0.1),
                   rho.unif = list(a = -1, b = 1),
                   sigma.sq.t.ig = list(a = 0.1, b = 0.1),
                   phi.unif = list(a = 3 / .9, b = 3 / .1))
z.init <- apply(y, c(1, 2, 3), function(a) as.numeric(sum(a, na.rm = TRUE) > 0))
inits.list <- list(alpha.comm = 0, beta.comm = 0, beta = 0,
                   alpha = 0, tau.sq.beta = 1, tau.sq.alpha = 1,
                   rho = 0.5, sigma.sq.t = 0.5,
                   phi = 3 / .5, z = z.init)
# Tuning
tuning.list <- list(phi = 1, rho = 0.5)
# Number of batches
n.batch <- 5
# Batch length
batch.length <- 25
n.burn <- 25
n.thin <- 1
n.samples <- n.batch * batch.length
# Note that this is just a test case and more iterations/chains may need to
# be run to ensure convergence.
out <- stMsPGOcc(occ.formula = ~ occ.cov.1 + occ.cov.2,
                 det.formula = ~ det.cov.1 + det.cov.2,
                 data = data.list,
                 inits = inits.list,
                 n.batch = n.batch,
                 batch.length = batch.length,
                 accept.rate = 0.43,
                 ar1 = TRUE,
                 NNGP = TRUE,
                 n.neighbors = 5,
                 n.factors = n.factors,
                 cov.model = 'exponential',
                 priors = prior.list,
                 tuning = tuning.list,
                 n.omp.threads = 1,
                 verbose = TRUE,
                 n.report = 1,
                 n.burn = n.burn,
                 n.thin = n.thin,
                 n.chains = 1)
summary(out)
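The initial values for z in this example are built with sum(a, na.rm = TRUE) > 0 rather than max(a, na.rm = TRUE). The distinction matters in multi-season data, where a site may not be sampled in some periods and all of its replicates are NA. The toy array below, which is illustrative and not simulated by the package, shows the difference in base R.

```r
# Toy detection array: 1 species x 2 sites x 2 periods x 3 replicates
y <- array(NA, dim = c(1, 2, 2, 3))
y[1, 1, 1, ] <- c(0, 1, 0)    # detected once
y[1, 1, 2, ] <- c(0, 0, NA)   # sampled twice, never detected
y[1, 2, 1, ] <- c(1, 1, 1)    # always detected
# y[1, 2, 2, ] stays all NA: site 2 was not sampled in period 2

# Valid 0/1 starting values, including for the unsampled cell
z.init <- apply(y, c(1, 2, 3),
                function(a) as.numeric(sum(a, na.rm = TRUE) > 0))

# max() on an all-NA cell gives -Inf (with a warning), not a valid 0/1 value
z.bad <- suppressWarnings(max(y[1, 2, 2, ], na.rm = TRUE))
```

Starting z at 1 wherever the species was detected keeps the initial state consistent with the data, and the sum-based rule gives 0 for unsampled cells, leaving the sampler to update them.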
Function for fitting multi-season single-species spatial occupancy models using Polya-Gamma latent variables.
stPGOcc(occ.formula, det.formula, data, inits, priors, tuning,
        cov.model = 'exponential', NNGP = TRUE,
        n.neighbors = 15, search.type = 'cb', n.batch,
        batch.length, accept.rate = 0.43, n.omp.threads = 1,
        verbose = TRUE, ar1 = FALSE, n.report = 100,
        n.burn = round(.10 * n.batch * batch.length),
        n.thin = 1, n.chains = 1, k.fold, k.fold.threads = 1,
        k.fold.seed = 100, k.fold.only = FALSE, ...)
occ.formula |
a symbolic description of the model to be fit for the occurrence portion of the model using R's model syntax. Only right-hand side of formula is specified. See example below. Random intercepts are allowed using lme4 syntax (Bates et al. 2015). |
det.formula |
a symbolic description of the model to be fit for the detection portion of the model using R's model syntax. Only right-hand side of formula is specified. See example below. Random intercepts are allowed using lme4 syntax (Bates et al. 2015). |
data |
a list containing data necessary for model fitting.
Valid tags are |
inits |
a list with each tag corresponding to a parameter name.
Valid tags are |
priors |
a list with each tag corresponding to a parameter name.
Valid tags are |
cov.model |
a quoted keyword that specifies the covariance
function used to model the spatial dependence structure among the
observations. Supported covariance model key words are:
|
tuning |
a list with each tag corresponding to a parameter
name. Valid tags are |
NNGP |
if |
n.neighbors |
number of neighbors used in the NNGP. Only used if
|
search.type |
a quoted keyword that specifies the type of nearest
neighbor search algorithm. Supported method key words are: |
n.batch |
the number of MCMC batches in each chain to run for the Adaptive MCMC sampler. See Roberts and Rosenthal (2009) for details. |
batch.length |
the length of each MCMC batch in each chain to run for the Adaptive MCMC sampler. See Roberts and Rosenthal (2009) for details. |
accept.rate |
target acceptance rate for Adaptive MCMC. Default is 0.43. See Roberts and Rosenthal (2009) for details. |
n.omp.threads |
a positive integer indicating the number of threads
to use for SMP parallel processing within chains. The package must be compiled for
OpenMP support. For most Intel-based machines, we recommend setting
|
verbose |
if |
ar1 |
logical value indicating whether to include an AR(1) zero-mean
temporal random effect in the model. If |
n.report |
the interval to report MCMC progress. |
n.burn |
the number of samples out of the total |
n.thin |
the thinning interval for collection of MCMC samples. The
thinning occurs after the |
n.chains |
the number of chains to run. |
k.fold |
specifies the number of k folds for cross-validation.
If not specified as an argument, then cross-validation is not performed
and |
k.fold.threads |
number of threads to use for cross-validation. If
|
k.fold.seed |
seed used to split data set into |
k.fold.only |
a logical value indicating whether to only perform
cross-validation ( |
... |
currently no additional arguments |
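The batching, burn-in, thinning, and chain arguments described above jointly determine how many posterior samples are retained. The sketch below works through that arithmetic in base R. It assumes that the retained iterations in each chain are `n.burn + 1`, `n.burn + 1 + n.thin`, and so on up to `n.batch * batch.length`; the variable names `kept.per.chain` and `n.post.samples` are illustrative, not package objects.

```r
# Settings matching the stPGOcc example in this help page.
n.batch <- 10
batch.length <- 25
n.burn <- 50
n.thin <- 1
n.chains <- 1

# Total MCMC iterations per chain.
n.samples <- n.batch * batch.length

# Iterations kept per chain after discarding burn-in and thinning
# (assumed indexing: n.burn + 1, n.burn + 1 + n.thin, ...).
kept.per.chain <- length(seq(from = n.burn + 1, to = n.samples, by = n.thin))

# Samples pooled across chains.
n.post.samples <- kept.per.chain * n.chains

# The default burn-in from the usage section: 10% of the iterations per chain.
default.n.burn <- round(.10 * n.batch * batch.length)

c(n.samples = n.samples, kept.per.chain = kept.per.chain,
  n.post.samples = n.post.samples, default.n.burn = default.n.burn)
```

Runs this short are only suitable for checking that the code executes; assessing convergence generally calls for many more iterations and more than one chain.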
An object of class stPGOcc that is a list comprised of:
beta.samples |
a |
alpha.samples |
a |
z.samples |
a three-dimensional array of posterior samples for the latent occupancy values, with dimensions corresponding to posterior sample, site, and primary time period. |
psi.samples |
a three-dimensional array of posterior samples for the latent occupancy probability values, with dimensions corresponding to posterior sample, site, and primary time period. |
theta.samples |
a |
w.samples |
a |
sigma.sq.psi.samples |
a |
sigma.sq.p.samples |
a |
beta.star.samples |
a |
alpha.star.samples |
a |
eta.samples |
a |
.
like.samples |
a three-dimensional array of posterior samples for the likelihood values associated with each site and primary time period. Used for calculating WAIC. |
rhat |
a list of Gelman-Rubin diagnostic values for some of the model parameters. |
ESS |
a list of effective sample sizes for some of the model parameters. |
run.time |
execution time reported using |
k.fold.deviance |
scoring rule (deviance) from k-fold cross-validation.
Only included if |
The return object will include additional objects used for subsequent prediction and/or model fit evaluation. Note that estimated detection probabilities are not included in the model object, but can be extracted using fitted(). Note that if k.fold.only = TRUE, the return list object will only contain run.time and k.fold.deviance.
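Because z.samples and psi.samples are three-dimensional arrays indexed by posterior sample, site, and primary time period, site-by-period summaries can be obtained by collapsing the first dimension. The sketch below uses a simulated stand-in array (`psi.samples.mock`, a hypothetical name) rather than real model output, so the numbers themselves carry no meaning.

```r
set.seed(1)

# Stand-in for out$psi.samples: 200 posterior samples, 4 sites, 3 periods.
n.post <- 200
J <- 4
n.time.max <- 3
psi.samples.mock <- array(rbeta(n.post * J * n.time.max, 2, 2),
                          dim = c(n.post, J, n.time.max))

# Posterior mean occupancy probability for every site and period
# (margins 2 and 3 are kept, the sample dimension is averaged over).
psi.mean <- apply(psi.samples.mock, c(2, 3), mean)

# 95% credible interval bounds; the result is 2 x sites x periods.
psi.ci <- apply(psi.samples.mock, c(2, 3), quantile,
                probs = c(0.025, 0.975))

dim(psi.mean)
dim(psi.ci)
```

With a fitted model, the same `apply()` calls would be pointed at the `psi.samples` (or `z.samples`) component of the returned list.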
Some of the underlying code used for generating random numbers from the Polya-Gamma distribution is taken from the pgdraw package written by Daniel F. Schmidt and Enes Makalic. Their code implements Algorithm 6 in the PhD thesis of Jesse Bennett Windle (2013) https://repositories.lib.utexas.edu/handle/2152/21842.
Jeffrey W. Doser,
Andrew O. Finley
Polson, N.G., J.G. Scott, and J. Windle. (2013) Bayesian Inference for Logistic Models Using Polya-Gamma Latent Variables. Journal of the American Statistical Association, 108:1339-1349.
Bates, Douglas, Martin Maechler, Ben Bolker, Steve Walker (2015). Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software, 67(1), 1-48. doi:10.18637/jss.v067.i01.
Kery, M., & Royle, J. A. (2021). Applied hierarchical modeling in ecology: Analysis of distribution, abundance and species richness in R and BUGS: Volume 2: Dynamic and advanced models. Academic Press. Section 4.6.
Hooten, M. B., and Hobbs, N. T. (2015). A guide to Bayesian model selection for ecologists. Ecological Monographs, 85(1), 3-28.
MacKenzie, D. I., J. D. Nichols, G. B. Lachman, S. Droege, J. Andrew Royle, and C. A. Langtimm. 2002. Estimating Site Occupancy Rates When Detection Probabilities Are Less Than One. Ecology 83: 2248-2255.
set.seed(500)
# Sites
J.x <- 10
J.y <- 10
J <- J.x * J.y
# Primary time periods
n.time <- sample(10, J, replace = TRUE)
n.time.max <- max(n.time)
# Replicates
n.rep <- matrix(NA, J, max(n.time))
for (j in 1:J) {
  n.rep[j, 1:n.time[j]] <- sample(1:4, n.time[j], replace = TRUE)
}
# Occurrence --------------------------
beta <- c(0.4, 0.5, -0.9)
trend <- TRUE
sp.only <- 0
psi.RE <- list()
# Detection ---------------------------
alpha <- c(-1, 0.7, -0.5)
p.RE <- list()
# Spatial -----------------------------
sp <- TRUE
cov.model <- "exponential"
sigma.sq <- 2
phi <- 3 / .4
# Temporal ----------------------------
rho <- 0.5
sigma.sq.t <- 1
# Get all the data
dat <- simTOcc(J.x = J.x, J.y = J.y, n.time = n.time, n.rep = n.rep,
               beta = beta, alpha = alpha, sp.only = sp.only, trend = trend,
               psi.RE = psi.RE, p.RE = p.RE, sp = TRUE, sigma.sq = sigma.sq,
               phi = phi, cov.model = cov.model, ar1 = TRUE,
               sigma.sq.t = sigma.sq.t, rho = rho)
# Package all data into a list
# Occurrence
occ.covs <- list(int = dat$X[, , 1],
                 trend = dat$X[, , 2],
                 occ.cov.1 = dat$X[, , 3])
# Detection
det.covs <- list(det.cov.1 = dat$X.p[, , , 2],
                 det.cov.2 = dat$X.p[, , , 3])
# Data list bundle
data.list <- list(y = dat$y,
                  occ.covs = occ.covs,
                  det.covs = det.covs,
                  coords = dat$coords)
# Priors
prior.list <- list(beta.normal = list(mean = 0, var = 2.72),
                   alpha.normal = list(mean = 0, var = 2.72),
                   sigma.sq.ig = c(2, 2),
                   phi.unif = c(3 / 1, 3 / 0.1),
                   rho.unif = c(-1, 1),
                   sigma.sq.t.ig = c(2, 1))
# Initial values
z.init <- apply(dat$y, c(1, 2), function(a) as.numeric(sum(a, na.rm = TRUE) > 0))
inits.list <- list(beta = 0, alpha = 0, z = z.init, phi = 3 / .5, sigma.sq = 2,
                   w = rep(0, J), rho = 0, sigma.sq.t = 0.5)
# Tuning
tuning.list <- list(phi = 1, rho = 1)
# Number of batches
n.batch <- 10
# Batch length
batch.length <- 25
n.iter <- n.batch * batch.length
# Run the model
# Note that this is just a test case and more iterations/chains may need to
# be run to ensure convergence.
out <- stPGOcc(occ.formula = ~ trend + occ.cov.1,
               det.formula = ~ det.cov.1 + det.cov.2,
               data = data.list,
               inits = inits.list,
               n.batch = n.batch,
               batch.length = batch.length,
               priors = prior.list,
               cov.model = "exponential",
               tuning = tuning.list,
               NNGP = TRUE,
               ar1 = TRUE,
               n.neighbors = 5,
               search.type = 'cb',
               n.report = 10,
               n.burn = 50,
               n.chains = 1)
summary(out)
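The example writes the spatial decay parameter and its uniform prior bounds as 3 divided by a distance (for instance `phi <- 3 / .4` and `phi.unif = c(3 / 1, 3 / 0.1)`). A common reading of this convention, assumed here, is that under an exponential correlation function exp(-phi * d) the correlation falls to roughly 0.05 at distance 3 / phi, often called the effective spatial range. The helper `exp.cor` below is a hypothetical name used only for this sketch.

```r
# Exponential correlation as a function of distance d and decay phi.
# (Illustrative helper, not a package function.)
exp.cor <- function(d, phi) exp(-phi * d)

# Decay value used to simulate the data in the example.
phi <- 3 / .4

# Distance at which the correlation drops to exp(-3), about 0.05.
eff.range <- 3 / phi
cor.at.range <- exp.cor(eff.range, phi)

# Prior bounds on phi translated back to effective ranges:
# the bounds 3 / 1 and 3 / 0.1 correspond to ranges of 1 and 0.1.
phi.bounds <- c(3 / 1, 3 / 0.1)
eff.range.bounds <- 3 / phi.bounds

eff.range
cor.at.range
eff.range.bounds
```

Expressing the prior in terms of distances makes it easier to match the bounds to the spacing and extent of the site coordinates.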
Methods for extracting information from fitted integrated multi-species occupancy (intMsPGOcc) models.
## S3 method for class 'intMsPGOcc'
summary(object, level = 'both', quantiles = c(0.025, 0.5, 0.975),
        digits = max(3L, getOption("digits") - 3L), ...)
## S3 method for class 'intMsPGOcc'
print(x, ...)
## S3 method for class 'intMsPGOcc'
plot(x, param, density = TRUE, ...)
object , x
|
object of class |
level |
a quoted keyword that indicates the level to summarize the
model results. Valid key words are: |
quantiles |
for |
digits |
for |
param |
parameter name for which to generate a traceplot. Valid names are
|
density |
logical value indicating whether to also generate a density plot for each parameter in addition to the MCMC traceplot. |
... |
currently no additional arguments |
A set of standard extractor functions for fitted model objects of class intMsPGOcc, including methods to the generic functions print, summary, and plot.
No return value, called to display summary information of an intMsPGOcc object.
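The methods documented here rely on R's S3 dispatch: calling `summary()`, `print()`, or `plot()` on an object runs the method whose suffix matches the object's class. The sketch below illustrates the mechanism with a made-up class `toyFit` and a mock `beta.samples` matrix; it is not the package's implementation, only a minimal picture of how a generic reaches a class-specific method that is called for its printed output.

```r
set.seed(10)

# A mock "fitted model": a list with posterior samples and a class attribute.
# The class name toyFit is hypothetical.
fit <- structure(
  list(beta.samples = matrix(rnorm(300), ncol = 3,
                             dimnames = list(NULL, c("(Intercept)", "x1", "x2")))),
  class = "toyFit"
)

# summary() dispatches here because class(fit) is "toyFit".
summary.toyFit <- function(object, digits = 3, ...) {
  cat("Posterior means:\n")
  print(round(colMeans(object$beta.samples), digits))
  invisible(NULL)  # called for its printed output, no return value
}

# print() dispatches here for the same reason.
print.toyFit <- function(x, ...) {
  cat("toyFit object with", nrow(x$beta.samples), "posterior samples\n")
  invisible(x)
}

summary(fit)
print(fit)
```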
Methods for extracting information from a fitted single-species integrated occupancy (intPGOcc) model.
## S3 method for class 'intPGOcc'
summary(object, quantiles = c(0.025, 0.5, 0.975),
        digits = max(3L, getOption("digits") - 3L), ...)
## S3 method for class 'intPGOcc'
print(x, ...)
## S3 method for class 'intPGOcc'
plot(x, param, density = TRUE, ...)
object , x
|
object of class |
quantiles |
for |
digits |
for |
param |
parameter name for which to generate a traceplot. Valid names are
|
density |
logical value indicating whether to also generate a density plot for each parameter in addition to the MCMC traceplot. |
... |
currently no additional arguments |
A set of standard extractor functions for fitted model objects of class intPGOcc, including methods to the generic functions print, summary, and plot.
No return value, called to display summary information of an intPGOcc object.
Methods for extracting information from a fitted latent factor joint species distribution model (lfJSDM).
## S3 method for class 'lfJSDM'
summary(object, level = 'both', quantiles = c(0.025, 0.5, 0.975),
        digits = max(3L, getOption("digits") - 3L), ...)
## S3 method for class 'lfJSDM'
print(x, ...)
## S3 method for class 'lfJSDM'
plot(x, param, density = TRUE, ...)
object , x
|
object of class |
level |
a quoted keyword that indicates the level to summarize the
model results. Valid key words are: |
quantiles |
for |
digits |
for |
param |
parameter name for which to generate a traceplot. Valid names are
|
density |
logical value indicating whether to also generate a density plot for each parameter in addition to the MCMC traceplot. |
... |
currently no additional arguments |
A set of standard extractor functions for fitted model objects of class lfJSDM, including methods to the generic functions print, summary, and plot.
No return value, called to display summary information of a lfJSDM object.
Methods for extracting information from a fitted latent factor multi-species occupancy model (lfMsPGOcc).
## S3 method for class 'lfMsPGOcc'
summary(object, level = 'both', quantiles = c(0.025, 0.5, 0.975),
        digits = max(3L, getOption("digits") - 3L), ...)
## S3 method for class 'lfMsPGOcc'
print(x, ...)
## S3 method for class 'lfMsPGOcc'
plot(x, param, density = TRUE, ...)
object , x
|
object of class |
level |
a quoted keyword that indicates the level to summarize the
model results. Valid key words are: |
quantiles |
for |
digits |
for |
param |
parameter name for which to generate a traceplot. Valid names are
|
density |
logical value indicating whether to also generate a density plot for each parameter in addition to the MCMC traceplot. |
... |
currently no additional arguments |
A set of standard extractor functions for fitted model objects of class lfMsPGOcc, including methods to the generic functions print, summary, and plot.
No return value, called to display summary information of a lfMsPGOcc object.
Methods for extracting information from a fitted multi-species occupancy (msPGOcc) model.
## S3 method for class 'msPGOcc'
summary(object, level = 'both', quantiles = c(0.025, 0.5, 0.975),
        digits = max(3L, getOption("digits") - 3L), ...)
## S3 method for class 'msPGOcc'
print(x, ...)
## S3 method for class 'msPGOcc'
plot(x, param, density = TRUE, ...)
object , x
|
object of class |
level |
a quoted keyword that indicates the level to summarize the
model results. Valid key words are: |
quantiles |
for |
digits |
for |
param |
parameter name for which to generate a traceplot. Valid names are
|
density |
logical value indicating whether to also generate a density plot for each parameter in addition to the MCMC traceplot. |
... |
currently no additional arguments |
A set of standard extractor functions for fitted model objects of class msPGOcc, including methods to the generic functions print, summary, and plot.
No return value, called to display summary information of a msPGOcc object.
Methods for extracting information from a fitted single-species occupancy (PGOcc) model.
## S3 method for class 'PGOcc'
summary(object, quantiles = c(0.025, 0.5, 0.975),
        digits = max(3L, getOption("digits") - 3L), ...)
## S3 method for class 'PGOcc'
print(x, ...)
## S3 method for class 'PGOcc'
plot(x, param, density = TRUE, ...)
object , x
|
object of class |
quantiles |
for |
digits |
for |
param |
parameter name for which to generate a traceplot. Valid names are
|
density |
logical value indicating whether to also generate a density plot for each parameter in addition to the MCMC traceplot. |
... |
currently no additional arguments |
A set of standard extractor functions for fitted model objects of class PGOcc, including methods to the generic functions print, summary, and plot.
No return value, called to display summary information of a PGOcc object.
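The `quantiles` and `digits` defaults in the usage above can be illustrated in base R. The sketch below applies the default quantile levels to mock posterior draws (`beta.samples.mock`, a hypothetical name) and evaluates the default `digits` expression; `default.digits` is a helper introduced only for this illustration, and the sketch does not reproduce the layout of the package's printed summary.

```r
set.seed(42)

# Mock posterior draws for one regression coefficient.
beta.samples.mock <- rnorm(1000, mean = 0.5, sd = 0.2)

# Default quantile levels from the usage section:
# 2.5%, 50% (median), and 97.5%, i.e. a 95% credible interval plus the median.
quantiles <- c(0.025, 0.5, 0.975)
q <- quantile(beta.samples.mock, probs = quantiles)

# The default digits expression, written as a function of the "digits" option.
default.digits <- function(d = getOption("digits")) max(3L, d - 3L)
default.digits(7)   # R's factory setting of digits = 7 gives 4
default.digits(4)   # never drops below 3

round(q, default.digits(7))
```

Passing a different `quantiles` vector, for example `c(0.05, 0.95)`, would request a 90% interval instead.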
Methods for extracting information from fitted posthoc linear models (postHocLM).
## S3 method for class 'postHocLM'
summary(object, quantiles = c(0.025, 0.5, 0.975),
        digits = max(3L, getOption("digits") - 3L), ...)
## S3 method for class 'postHocLM'
print(x, ...)
object , x
|
object of class |
quantiles |
for |
digits |
for |
... |
currently no additional arguments |
A set of standard extractor functions for fitted model objects of class postHocLM, including methods to the generic functions print and summary.
No return value, called to display summary information of a postHocLM object.
Methods for extracting information from posterior predictive check objects of class ppcOcc.
## S3 method for class 'ppcOcc'
summary(object, level, digits = max(3L, getOption("digits") - 3L), ...)
object |
object of class |
level |
a quoted keyword for multi-species models that indicates
the level to summarize the posterior predictive check. Valid key words
are: |
digits |
number of digits to report. |
... |
currently no additional arguments |
A set of standard extractor functions for fitted posterior predictive check objects of class ppcOcc, including methods to the generic function summary.
No return value, called to display summary information of a ppcOcc object.
Methods for extracting information from fitted spatial factor joint species distribution models (sfJSDM).
## S3 method for class 'sfJSDM'
summary(object, level, quantiles = c(0.025, 0.5, 0.975),
        digits = max(3L, getOption("digits") - 3L), ...)
## S3 method for class 'sfJSDM'
print(x, ...)
## S3 method for class 'sfJSDM'
plot(x, param, density = TRUE, ...)
object , x
|
object of class |
level |
a quoted keyword that indicates the level to summarize the
model results. Valid key words are: |
quantiles |
for |
digits |
for |
param |
parameter name for which to generate a traceplot. Valid names are
|
density |
logical value indicating whether to also generate a density plot for each parameter in addition to the MCMC traceplot. |
... |
currently no additional arguments |
A set of standard extractor functions for fitted model objects of class sfJSDM, including methods to the generic functions print, summary, and plot.
No return value, called to display summary information of a sfJSDM object.
Methods for extracting information from fitted spatial factor multi-species occupancy model.
## S3 method for class 'sfMsPGOcc'
summary(object, level, quantiles = c(0.025, 0.5, 0.975),
        digits = max(3L, getOption("digits") - 3L), ...)
## S3 method for class 'sfMsPGOcc'
print(x, ...)
## S3 method for class 'sfMsPGOcc'
plot(x, param, density = TRUE, ...)
object , x
|
object of class |
level |
a quoted keyword that indicates the level to summarize the
model results. Valid key words are: |
quantiles |
for |
digits |
for |
param |
parameter name for which to generate a traceplot. Valid names are
|
density |
logical value indicating whether to also generate a density plot for each parameter in addition to the MCMC traceplot. |
... |
currently no additional arguments |
A set of standard extractor functions for fitted model objects of class sfMsPGOcc, including methods to the generic functions print, summary, and plot.
No return value, called to display summary information of a sfMsPGOcc object.
Methods for extracting information from a fitted single-species spatial integrated occupancy (spIntPGOcc) model.
## S3 method for class 'spIntPGOcc'
summary(object, quantiles = c(0.025, 0.5, 0.975),
        digits = max(3L, getOption("digits") - 3L), ...)
## S3 method for class 'spIntPGOcc'
print(x, ...)
## S3 method for class 'spIntPGOcc'
plot(x, param, density = TRUE, ...)
object , x
|
object of class |
quantiles |
for |
digits |
for |
param |
parameter name for which to generate a traceplot. Valid names are
|
density |
logical value indicating whether to also generate a density plot for each parameter in addition to the MCMC traceplot. |
... |
currently no additional arguments |
A set of standard extractor functions for fitted model objects of class spIntPGOcc, including methods to the generic functions print, summary, and plot.
No return value, called to display summary information of a spIntPGOcc object.
Methods for extracting information from a fitted multi-species spatial occupancy (spMsPGOcc) model.
## S3 method for class 'spMsPGOcc'
summary(object, level, quantiles = c(0.025, 0.5, 0.975),
        digits = max(3L, getOption("digits") - 3L), ...)
## S3 method for class 'spMsPGOcc'
print(x, ...)
## S3 method for class 'spMsPGOcc'
plot(x, param, density = TRUE, ...)
object , x
|
object of class |
level |
a quoted keyword that indicates the level to summarize the
model results. Valid key words are: |
quantiles |
for |
digits |
for |
param |
parameter name for which to generate a traceplot. Valid names are
|
density |
logical value indicating whether to also generate a density plot for each parameter in addition to the MCMC traceplot. |
... |
currently no additional arguments |
A set of standard extractor functions for fitted model objects of class spMsPGOcc, including methods to the generic functions print, summary, and plot.
No return value, called to display summary information of a spMsPGOcc object.
Methods for extracting information from a fitted single-species spatial occupancy (spPGOcc) model.
## S3 method for class 'spPGOcc'
summary(object, quantiles = c(0.025, 0.5, 0.975),
        digits = max(3L, getOption("digits") - 3L), ...)
## S3 method for class 'spPGOcc'
print(x, ...)
## S3 method for class 'spPGOcc'
plot(x, param, density = TRUE, ...)
object , x
|
object of class |
quantiles |
for |
digits |
for |
param |
parameter name for which to generate a traceplot. Valid names are
|
density |
logical value indicating whether to also generate a density plot for each parameter in addition to the MCMC traceplot. |
... |
currently no additional arguments |
A set of standard extractor functions for fitted model objects of class spPGOcc, including methods to the generic functions print, summary, and plot.
No return value, called to display summary information of a spPGOcc object.
Methods for extracting information from a fitted multi-season single-species spatial integrated occupancy (stIntPGOcc) model.
## S3 method for class 'stIntPGOcc'
summary(object, quantiles = c(0.025, 0.5, 0.975),
        digits = max(3L, getOption("digits") - 3L), ...)
## S3 method for class 'stIntPGOcc'
print(x, ...)
## S3 method for class 'stIntPGOcc'
plot(x, param, density = TRUE, ...)
object , x
|
object of class |
quantiles |
for |
digits |
for |
param |
parameter name for which to generate a traceplot. Valid names are
|
density |
logical value indicating whether to also generate a density plot for each parameter in addition to the MCMC traceplot. |
... |
currently no additional arguments |
A set of standard extractor functions for fitted model objects of class stIntPGOcc, including methods to the generic functions print, summary, and plot.
No return value, called to display summary information of a stIntPGOcc object.
Methods for extracting information from a fitted multi-species, multi-season spatial occupancy (stMsPGOcc) model.
## S3 method for class 'stMsPGOcc'
summary(object, level = 'both', quantiles = c(0.025, 0.5, 0.975),
        digits = max(3L, getOption("digits") - 3L), ...)
## S3 method for class 'stMsPGOcc'
print(x, ...)
## S3 method for class 'stMsPGOcc'
plot(x, param, density = TRUE, ...)
object , x
|
object of class |
level |
a quoted keyword that indicates the level to summarize the
model results. Valid key words are: |
quantiles |
for |
digits |
for |
param |
parameter name for which to generate a traceplot. Valid names are
|
density |
logical value indicating whether to also generate a density plot for each parameter in addition to the MCMC traceplot. |
... |
currently no additional arguments |
A set of standard extractor functions for fitted model objects of class stMsPGOcc, including methods to the generic functions print, summary, and plot.
No return value, called to display summary information of a stMsPGOcc object.
Methods for extracting information from a fitted multi-season single-species spatial occupancy (stPGOcc) model.
## S3 method for class 'stPGOcc'
summary(object, quantiles = c(0.025, 0.5, 0.975),
        digits = max(3L, getOption("digits") - 3L), ...)
## S3 method for class 'stPGOcc'
print(x, ...)
## S3 method for class 'stPGOcc'
plot(x, param, density = TRUE, ...)
object , x
|
object of class |
quantiles |
for |
digits |
for |
param |
parameter name for which to generate a traceplot. Valid names are
|
density |
logical value indicating whether to also generate a density plot for each parameter in addition to the MCMC traceplot. |
... |
currently no additional arguments |
A set of standard extractor functions for fitted model objects of class stPGOcc, including methods to the generic functions print, summary, and plot.
No return value, called to display summary information of a stPGOcc object.
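The `plot` method's `param` and `density` arguments describe an MCMC traceplot with an optional density plot alongside it. The base R sketch below mimics that layout for a vector of mock samples. The function `trace.density.sketch` and its `main` argument are hypothetical, and the panels are an approximation of the idea rather than the package's actual plotting code.

```r
# Draw a traceplot and, optionally, a kernel density estimate side by side.
# Returns (invisibly) the number of panels drawn.
trace.density.sketch <- function(samples, density = TRUE, main = "parameter") {
  n.panels <- if (density) 2L else 1L
  old.par <- par(mfrow = c(1, n.panels))
  on.exit(par(old.par))
  # Trace: sampled value against iteration number.
  plot(seq_along(samples), samples, type = "l",
       xlab = "Iteration", ylab = "Value", main = paste("Trace of", main))
  # Density: smoothed estimate of the marginal posterior.
  if (density) {
    plot(stats::density(samples), main = paste("Density of", main))
  }
  invisible(n.panels)
}

set.seed(7)
mock.samples <- rnorm(500)
n.panels <- trace.density.sketch(mock.samples, density = TRUE, main = "beta")
```

A trace that wanders slowly or shows a trend is a typical sign that more iterations or a longer burn-in are needed.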
Methods for extracting information from fitted multi-species spatially-varying coefficient occupancy model.
## S3 method for class 'svcMsPGOcc'
summary(object, level, quantiles = c(0.025, 0.5, 0.975),
        digits = max(3L, getOption("digits") - 3L), ...)
## S3 method for class 'svcMsPGOcc'
print(x, ...)
## S3 method for class 'svcMsPGOcc'
plot(x, param, density = TRUE, ...)
object , x
|
object of class |
level |
a quoted keyword that indicates the level to summarize the
model results. Valid key words are: |
quantiles |
for |
digits |
for |
param |
parameter name for which to generate a traceplot. Valid names are
|
density |
logical value indicating whether to also generate a density plot for each parameter in addition to the MCMC traceplot. |
... |
currently no additional arguments |
A set of standard extractor functions for fitted model objects of class svcMsPGOcc, including methods to the generic functions print, summary, and plot.
No return value, called to display summary information of a svcMsPGOcc object.
Methods for extracting information from fitted single-species spatially-varying
coefficient binomial model (svcPGBinom
).
## S3 method for class 'svcPGBinom'
summary(object, quantiles = c(0.025, 0.5, 0.975),
        digits = max(3L, getOption("digits") - 3L), ...)
## S3 method for class 'svcPGBinom'
print(x, ...)
## S3 method for class 'svcPGBinom'
plot(x, param, density = TRUE, ...)
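object, x |
object of class svcPGBinom. |
quantiles |
for summary, posterior distribution quantiles to compute. |
digits |
for summary, number of digits to report. |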
param |
parameter name for which to generate a traceplot. Valid names are
|
density |
logical value indicating whether to also generate a density plot for each parameter in addition to the MCMC traceplot. |
... |
currently no additional arguments |
A set of standard extractor functions for fitted model objects of class svcPGBinom, including methods to the generic functions print, summary, and plot.
No return value, called to display summary information of a svcPGBinom object.
Methods for extracting information from fitted single-species spatially-varying coefficient occupancy (svcPGOcc) model.
## S3 method for class 'svcPGOcc'
summary(object, quantiles = c(0.025, 0.5, 0.975),
        digits = max(3L, getOption("digits") - 3L), ...)
## S3 method for class 'svcPGOcc'
print(x, ...)
## S3 method for class 'svcPGOcc'
plot(x, param, density = TRUE, ...)
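object, x |
object of class svcPGOcc. |
quantiles |
for summary, posterior distribution quantiles to compute. |
digits |
for summary, number of digits to report. |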
param |
parameter name for which to generate a traceplot. Valid names are
|
density |
logical value indicating whether to also generate a density plot for each parameter in addition to the MCMC traceplot. |
... |
currently no additional arguments |
A set of standard extractor functions for fitted model objects of class svcPGOcc, including methods to the generic functions print, summary, and plot.
No return value, called to display summary information of a svcPGOcc object.
Methods for extracting information from fitted multi-season single-species spatially-varying coefficient integrated occupancy (svcTIntPGOcc) model.
## S3 method for class 'svcTIntPGOcc'
summary(object, quantiles = c(0.025, 0.5, 0.975),
        digits = max(3L, getOption("digits") - 3L), ...)
## S3 method for class 'svcTIntPGOcc'
print(x, ...)
## S3 method for class 'svcTIntPGOcc'
plot(x, param, density = TRUE, ...)
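object, x |
object of class svcTIntPGOcc. |
quantiles |
for summary, posterior distribution quantiles to compute. |
digits |
for summary, number of digits to report. |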
param |
parameter name for which to generate a traceplot. Valid names are
|
density |
logical value indicating whether to also generate a density plot for each parameter in addition to the MCMC traceplot. |
... |
currently no additional arguments |
A set of standard extractor functions for fitted model objects of class svcTIntPGOcc, including methods to the generic functions print, summary, and plot.
No return value, called to display summary information of a svcTIntPGOcc object.
Methods for extracting information from fitted multi-species, multi-season spatially-varying coefficient occupancy (svcTMsPGOcc) model.
## S3 method for class 'svcTMsPGOcc'
summary(object, level = 'both', quantiles = c(0.025, 0.5, 0.975),
        digits = max(3L, getOption("digits") - 3L), ...)
## S3 method for class 'svcTMsPGOcc'
print(x, ...)
## S3 method for class 'svcTMsPGOcc'
plot(x, param, density = TRUE, ...)
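object, x |
object of class svcTMsPGOcc. |
level |
a quoted keyword that indicates the level at which to summarize the model results. Valid key words are: "community", "species", or "both". |
quantiles |
for summary, posterior distribution quantiles to compute. |
digits |
for summary, number of digits to report. |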
param |
parameter name for which to generate a traceplot. Valid names are
|
density |
logical value indicating whether to also generate a density plot for each parameter in addition to the MCMC traceplot. |
... |
currently no additional arguments |
A set of standard extractor functions for fitted model objects of class svcTMsPGOcc, including methods to the generic functions print, summary, and plot.
No return value, called to display summary information of a svcTMsPGOcc object.
Methods for extracting information from fitted multi-season single-species spatially-varying coefficient binomial model (svcTPGBinom).
## S3 method for class 'svcTPGBinom'
summary(object, quantiles = c(0.025, 0.5, 0.975),
        digits = max(3L, getOption("digits") - 3L), ...)
## S3 method for class 'svcTPGBinom'
print(x, ...)
## S3 method for class 'svcTPGBinom'
plot(x, param, density = TRUE, ...)
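object, x |
object of class svcTPGBinom. |
quantiles |
for summary, posterior distribution quantiles to compute. |
digits |
for summary, number of digits to report. |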
param |
parameter name for which to generate a traceplot. Valid names are
|
density |
logical value indicating whether to also generate a density plot for each parameter in addition to the MCMC traceplot. |
... |
currently no additional arguments |
A set of standard extractor functions for fitted model objects of class svcTPGBinom, including methods to the generic functions print, summary, and plot.
No return value, called to display summary information of a svcTPGBinom object.
Methods for extracting information from fitted multi-season single-species spatially-varying coefficient occupancy (svcTPGOcc) model.
## S3 method for class 'svcTPGOcc'
summary(object, quantiles = c(0.025, 0.5, 0.975),
        digits = max(3L, getOption("digits") - 3L), ...)
## S3 method for class 'svcTPGOcc'
print(x, ...)
## S3 method for class 'svcTPGOcc'
plot(x, param, density = TRUE, ...)
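object, x |
object of class svcTPGOcc. |
quantiles |
for summary, posterior distribution quantiles to compute. |
digits |
for summary, number of digits to report. |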
param |
parameter name for which to generate a traceplot. Valid names are
|
density |
logical value indicating whether to also generate a density plot for each parameter in addition to the MCMC traceplot. |
... |
currently no additional arguments |
A set of standard extractor functions for fitted model objects of class svcTPGOcc, including methods to the generic functions print, summary, and plot.
No return value, called to display summary information of a svcTPGOcc object.
Methods for extracting information from fitted multi-season single-species integrated occupancy (tIntPGOcc) model.
## S3 method for class 'tIntPGOcc'
summary(object, quantiles = c(0.025, 0.5, 0.975),
        digits = max(3L, getOption("digits") - 3L), ...)
## S3 method for class 'tIntPGOcc'
print(x, ...)
## S3 method for class 'tIntPGOcc'
plot(x, param, density = TRUE, ...)
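object, x |
object of class tIntPGOcc. |
quantiles |
for summary, posterior distribution quantiles to compute. |
digits |
for summary, number of digits to report. |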
param |
parameter name for which to generate a traceplot. Valid names are
|
density |
logical value indicating whether to also generate a density plot for each parameter in addition to the MCMC traceplot. |
... |
currently no additional arguments |
A set of standard extractor functions for fitted model objects of class tIntPGOcc, including methods to the generic functions print, summary, and plot.
No return value, called to display summary information of a tIntPGOcc object.
Methods for extracting information from fitted multi-species, multi-season occupancy (tMsPGOcc) model.
## S3 method for class 'tMsPGOcc'
summary(object, level = 'both', quantiles = c(0.025, 0.5, 0.975),
        digits = max(3L, getOption("digits") - 3L), ...)
## S3 method for class 'tMsPGOcc'
print(x, ...)
## S3 method for class 'tMsPGOcc'
plot(x, param, density = TRUE, ...)
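object, x |
object of class tMsPGOcc. |
level |
a quoted keyword that indicates the level at which to summarize the model results. Valid key words are: "community", "species", or "both". |
quantiles |
for summary, posterior distribution quantiles to compute. |
digits |
for summary, number of digits to report. |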
param |
parameter name for which to generate a traceplot. Valid names are
|
density |
logical value indicating whether to also generate a density plot for each parameter in addition to the MCMC traceplot. |
... |
currently no additional arguments |
A set of standard extractor functions for fitted model objects of class tMsPGOcc, including methods to the generic functions print, summary, and plot.
No return value, called to display summary information of a tMsPGOcc object.
Methods for extracting information from fitted multi-season single-species occupancy (tPGOcc) model.
## S3 method for class 'tPGOcc'
summary(object, quantiles = c(0.025, 0.5, 0.975),
        digits = max(3L, getOption("digits") - 3L), ...)
## S3 method for class 'tPGOcc'
print(x, ...)
## S3 method for class 'tPGOcc'
plot(x, param, density = TRUE, ...)
object, x |
object of class tPGOcc. |
quantiles |
for summary, posterior distribution quantiles to compute. |
digits |
for summary, number of digits to report. |
param |
parameter name for which to generate a traceplot. Valid names are
|
density |
logical value indicating whether to also generate a density plot for each parameter in addition to the MCMC traceplot. |
... |
currently no additional arguments |
A set of standard extractor functions for fitted model objects of class tPGOcc, including methods to the generic functions print, summary, and plot.
No return value, called to display summary information of a tPGOcc object.
The function svcMsPGOcc fits multi-species spatially-varying coefficient occupancy models with species correlations (i.e., a spatially-explicit joint species distribution model with imperfect detection). We use Polya-Gamma latent variables and a spatial factor modeling approach. Models are implemented using a Nearest Neighbor Gaussian Process.
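The spatial factor approach mentioned above can be sketched in base R. For a single spatially-varying coefficient, the species-specific spatial effects are built as a linear combination of a small number of shared latent spatial factors. The object names (lambda, w, w.star), the exponential correlation function, the use of a full Gaussian process rather than an NNGP, and the lower-triangular constraint on the loadings are assumptions made for illustration; this is not the package's internal code.

```r
set.seed(10)
N <- 6    # number of species
q <- 2    # number of spatial factors (n.factors)
J <- 25   # number of sites
coords <- cbind(runif(J), runif(J))
D <- as.matrix(dist(coords))
phi <- c(3 / 0.5, 3 / 0.2)   # one spatial decay parameter per factor
# Draw each latent factor from a Gaussian process with exponential correlation.
w <- matrix(NA, nrow = q, ncol = J)
for (r in 1:q) {
  R <- exp(-phi[r] * D)
  w[r, ] <- drop(t(chol(R)) %*% rnorm(J))
}
# Factor loadings (N x q) with ones on the diagonal and zeros above it,
# a common identifiability constraint that is assumed here.
lambda <- matrix(rnorm(N * q), nrow = N, ncol = q)
lambda[upper.tri(lambda)] <- 0
diag(lambda) <- 1
# Species-specific spatial effects at each site (N x J).
w.star <- lambda %*% w
dim(w.star)
```

Because only q factors are sampled instead of N separate spatial processes, the cost of the spatial part of the model grows with the number of factors rather than the number of species.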
svcMsPGOcc(occ.formula, det.formula, data, inits, priors, tuning, svc.cols = 1, cov.model = 'exponential', NNGP = TRUE, n.neighbors = 15, search.type = 'cb', std.by.sp = FALSE, n.factors, n.batch, batch.length, accept.rate = 0.43, n.omp.threads = 1, verbose = TRUE, n.report = 100, n.burn = round(.10 * n.batch * batch.length), n.thin = 1, n.chains = 1, ...)
occ.formula |
a symbolic description of the model to be fit for the occurrence portion of the model using R's model syntax. Random intercepts are allowed using lme4 syntax (Bates et al. 2015). Only right-hand side of formula is specified. See example below. |
det.formula |
a symbolic description of the model to be fit for the detection portion of the model using R's model syntax. Only right-hand side of formula is specified. See example below. Random intercepts are allowed using lme4 syntax (Bates et al. 2015). |
data |
a list containing data necessary for model fitting.
Valid tags are |
inits |
a list with each tag corresponding to a parameter name.
Valid tags are |
priors |
a list with each tag corresponding to a parameter name.
Valid tags are |
tuning |
a list with each tag corresponding to a parameter
name. Valid tags are |
svc.cols |
a vector indicating the variables whose effects will be
estimated as spatially-varying coefficients. |
cov.model |
a quoted keyword that specifies the covariance
function used to model the spatial dependence structure among the
observations. Supported covariance model key words are:
|
NNGP |
if |
n.neighbors |
number of neighbors used in the NNGP. Only used if
|
search.type |
a quoted keyword that specifies the type of nearest
neighbor search algorithm. Supported method key words are: |
std.by.sp |
a logical value indicating whether the covariates are standardized
separately for each species within the corresponding range for each species ( |
n.factors |
the number of factors to use in the spatial factor model approach. Note this corresponds to the number of factors used for each spatially-varying coefficient that is estimated in the model. Typically, the number of factors is set to be small (e.g., 4-5) relative to the total number of species in the community, which will lead to substantial decreases in computation time. However, the value can be anywhere between 1 and N (the number of species in the community). |
n.batch |
the number of MCMC batches in each chain to run for the Adaptive MCMC sampler. See Roberts and Rosenthal (2009) for details. |
batch.length |
the length of each MCMC batch to run for the Adaptive MCMC sampler. See Roberts and Rosenthal (2009) for details. |
accept.rate |
target acceptance rate for Adaptive MCMC. Default is 0.43. See Roberts and Rosenthal (2009) for details. |
n.omp.threads |
a positive integer indicating
the number of threads to use for SMP parallel processing within chains. The package must
be compiled for OpenMP support. For most Intel-based machines, we
recommend setting |
verbose |
if |
n.report |
the interval to report Metropolis sampler acceptance and MCMC progress. Note this is specified in terms of batches and not overall samples for spatial models. |
n.burn |
the number of samples out of the total |
n.thin |
the thinning interval for collection of MCMC samples. The
thinning occurs after the |
n.chains |
the number of chains to run in sequence. |
... |
currently no additional arguments |
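The batch-based adaptive sampler controlled by n.batch, batch.length, and accept.rate can be illustrated with a toy random-walk Metropolis sampler in the spirit of Roberts and Rosenthal (2009). The standard normal target, the starting proposal standard deviation, and all object names are hypothetical; the sketch only shows how a tuning value is nudged after each batch toward the target acceptance rate and is not the package's sampler.

```r
set.seed(4)
log.target <- function(x) dnorm(x, log = TRUE)  # toy target density
n.batch <- 200
batch.length <- 25
accept.rate <- 0.43
log.tuning <- log(0.05)   # deliberately small starting proposal SD (log scale)
x <- 0
samples <- numeric(n.batch * batch.length)
for (b in 1:n.batch) {
  n.accept <- 0
  for (i in 1:batch.length) {
    x.prop <- rnorm(1, x, exp(log.tuning))
    if (log(runif(1)) < log.target(x.prop) - log.target(x)) {
      x <- x.prop
      n.accept <- n.accept + 1
    }
    samples[(b - 1) * batch.length + i] <- x
  }
  # Raise the log proposal SD if the batch accepted too often, lower it otherwise.
  delta <- min(0.01, 1 / sqrt(b))
  if (n.accept / batch.length > accept.rate) {
    log.tuning <- log.tuning + delta
  } else {
    log.tuning <- log.tuning - delta
  }
}
exp(log.tuning)
```

The total number of samples per chain is n.batch * batch.length, which is why n.report for this function is expressed in batches rather than individual samples.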
An object of class svcMsPGOcc that is a list comprised of:
beta.comm.samples |
a |
alpha.comm.samples |
a |
tau.sq.beta.samples |
a |
tau.sq.alpha.samples |
a |
beta.samples |
a |
alpha.samples |
a |
theta.samples |
a |
lambda.samples |
a |
z.samples |
a three-dimensional array of posterior samples for the latent occurrence values for each species. |
psi.samples |
a three-dimensional array of posterior samples for the latent occupancy probability values for each species. |
w.samples |
a four-dimensional array of posterior samples for the latent spatial random effects for each spatial factor within each spatially-varying coefficient. Dimensions correspond to MCMC sample, factor, site, and spatially-varying coefficient. |
sigma.sq.psi.samples |
a |
sigma.sq.p.samples |
a |
beta.star.samples |
a |
alpha.star.samples |
a |
like.samples |
a three-dimensional array of posterior samples for the likelihood value associated with each site and species. Used for calculating WAIC. |
rhat |
a list of Gelman-Rubin diagnostic values for some of the model parameters. |
ESS |
a list of effective sample sizes for some of the model parameters. |
run.time |
MCMC sampler execution time reported using |
The return object will include additional objects used for subsequent prediction and/or model fit evaluation. Note that detection probability estimated values are not included in the model object, but can be extracted using fitted().
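To make the layout of the four-dimensional w.samples array concrete, the sketch below builds a hypothetical array with the dimension order described above (MCMC sample, factor, site, spatially-varying coefficient) and summarizes it. The array is simulated noise standing in for out$w.samples, and the sizes are arbitrary.

```r
set.seed(2)
n.post <- 50   # retained MCMC samples
q <- 2         # spatial factors
J <- 10        # sites
p.svc <- 3     # spatially-varying coefficients
# Hypothetical stand-in for out$w.samples.
w.samples <- array(rnorm(n.post * q * J * p.svc),
                   dim = c(n.post, q, J, p.svc))
# Posterior mean of each factor at each site for each SVC (q x J x p.svc).
w.means <- apply(w.samples, c(2, 3, 4), mean)
dim(w.means)
# 95% credible interval for factor 1 at site 5 for the second SVC.
quantile(w.samples[, 1, 5, 2], probs = c(0.025, 0.975))
```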
Some of the underlying code used for generating random numbers from the Polya-Gamma distribution is taken from the pgdraw package written by Daniel F. Schmidt and Enes Makalic. Their code implements Algorithm 6 in the PhD thesis of Jesse Bennett Windle (2013) https://repositories.lib.utexas.edu/handle/2152/21842.
Jeffrey W. Doser,
Andrew O. Finley
Doser, J. W., Finley, A. O., Saunders, S. P., Kery, M., Weed, A. S., & Zipkin, E. F. (2024A). Modeling complex species-environment relationships through spatially-varying coefficient occupancy models. Journal of Agricultural, Biological and Environmental Statistics. doi:10.1007/s13253-023-00595-6.
Doser, J. W., Kery, M., Saunders, S. P., Finley, A. O., Bateman, B. L., Grand, J., Reault, S., Weed, A. S., & Zipkin, E. F. (2024B). Guidelines for the use of spatially varying coefficients in species distribution models. Global Ecology and Biogeography, 33(4), e13814. doi:10.1111/geb.13814.
Datta, A., S. Banerjee, A.O. Finley, and A.E. Gelfand. (2016) Hierarchical Nearest-Neighbor Gaussian process models for large geostatistical datasets. Journal of the American Statistical Association, doi:10.1080/01621459.2015.1044091.
Finley, A.O., A. Datta, B.D. Cook, D.C. Morton, H.E. Andersen, and S. Banerjee. (2019) Efficient algorithms for Bayesian Nearest Neighbor Gaussian Processes. Journal of Computational and Graphical Statistics, doi:10.1080/10618600.2018.1537924.
Polson, N.G., J.G. Scott, and J. Windle. (2013) Bayesian Inference for Logistic Models Using Polya-Gamma Latent Variables. Journal of the American Statistical Association, 108:1339-1349.
Roberts, G.O. and Rosenthal J.S. (2009) Examples of adaptive MCMC. Journal of Computational and Graphical Statistics, 18(2):349-367.
Bates, Douglas, Martin Maechler, Ben Bolker, Steve Walker (2015). Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software, 67(1), 1-48. doi:10.18637/jss.v067.i01.
set.seed(400)

# Simulate Data -----------------------------------------------------------
J.x <- 10
J.y <- 10
J <- J.x * J.y
n.rep <- sample(5, size = J, replace = TRUE)
N <- 6
# Community-level covariate effects
# Occurrence
beta.mean <- c(0.2, -0.2, 0.3, -0.1, 0.4)
p.occ <- length(beta.mean)
tau.sq.beta <- c(0.6, 1.5, 0.4, 0.5, 0.3)
# Detection
alpha.mean <- c(0, 1.2, -0.5)
tau.sq.alpha <- c(1, 0.5, 1.3)
p.det <- length(alpha.mean)
# Random effects
psi.RE <- list(levels = 15,
               sigma.sq.psi = 0.7)
p.RE <- list(levels = 20,
             sigma.sq.p = 0.5)
# Draw species-level effects from community means.
beta <- matrix(NA, nrow = N, ncol = p.occ)
alpha <- matrix(NA, nrow = N, ncol = p.det)
for (i in 1:p.occ) {
  beta[, i] <- rnorm(N, beta.mean[i], sqrt(tau.sq.beta[i]))
}
for (i in 1:p.det) {
  alpha[, i] <- rnorm(N, alpha.mean[i], sqrt(tau.sq.alpha[i]))
}
# Number of spatial factors for each SVC
n.factors <- 2
# The intercept and first two covariates have spatially-varying effects
svc.cols <- c(1, 2, 3)
p.svc <- length(svc.cols)
q.p.svc <- n.factors * p.svc
# Spatial decay parameters
phi <- runif(q.p.svc, 3 / 0.9, 3 / 0.1)
# A length N vector indicating the proportion of simulated locations
# that are within the range for a given species.
range.probs <- runif(N, 0.4, 1)
factor.model <- TRUE
cov.model <- 'spherical'
sp <- TRUE

dat <- simMsOcc(J.x = J.x, J.y = J.y, n.rep = n.rep, N = N, beta = beta,
                alpha = alpha, psi.RE = psi.RE, p.RE = p.RE, phi = phi,
                sp = sp, svc.cols = svc.cols, cov.model = cov.model,
                n.factors = n.factors, factor.model = factor.model,
                range.probs = range.probs)

y <- dat$y
X <- dat$X
X.re <- dat$X.re
X.p <- dat$X.p
X.p.re <- dat$X.p.re
coords <- dat$coords
range.ind <- dat$range.ind

# Prep data for spOccupancy -----------------------------------------------
# Occurrence covariates
occ.covs <- cbind(X, X.re)
colnames(occ.covs) <- c('int', 'occ.cov.1', 'occ.cov.2', 'occ.cov.3',
                        'occ.cov.4', 'occ.factor.1')
# Detection covariates
det.covs <- list(det.cov.1 = X.p[, , 2],
                 det.cov.2 = X.p[, , 3],
                 det.factor.1 = X.p.re[, , 1])
# Data list
data.list <- list(y = y, coords = coords, occ.covs = occ.covs,
                  det.covs = det.covs, range.ind = range.ind)
# Priors
prior.list <- list(beta.comm.normal = list(mean = 0, var = 2.72),
                   alpha.comm.normal = list(mean = 0, var = 2.72),
                   tau.sq.beta.ig = list(a = 0.1, b = 0.1),
                   tau.sq.alpha.ig = list(a = 0.1, b = 0.1),
                   phi.unif = list(a = 3 / 1, b = 3 / .1))
inits.list <- list(alpha.comm = 0, beta.comm = 0, beta = 0,
                   alpha = 0, tau.sq.beta = 1, tau.sq.alpha = 1,
                   z = apply(y, c(1, 2), max, na.rm = TRUE))
# Tuning
tuning.list <- list(phi = 1)
# Number of batches
n.batch <- 2
# Batch length
batch.length <- 25
n.burn <- 0
n.thin <- 1
n.samples <- n.batch * batch.length

# Note that this is just a test case and more iterations/chains may need to
# be run to ensure convergence.
out <- svcMsPGOcc(occ.formula = ~ occ.cov.1 + occ.cov.2 + occ.cov.3 +
                                  occ.cov.4 + (1 | occ.factor.1),
                  det.formula = ~ det.cov.1 + det.cov.2 + (1 | det.factor.1),
                  data = data.list,
                  inits = inits.list,
                  n.batch = n.batch,
                  n.factors = n.factors,
                  batch.length = batch.length,
                  std.by.sp = TRUE,
                  accept.rate = 0.43,
                  priors = prior.list,
                  svc.cols = svc.cols,
                  cov.model = "spherical",
                  tuning = tuning.list,
                  n.omp.threads = 1,
                  verbose = TRUE,
                  NNGP = TRUE,
                  n.neighbors = 5,
                  search.type = 'cb',
                  n.report = 10,
                  n.burn = n.burn,
                  n.thin = n.thin,
                  n.chains = 1)

summary(out)
The function svcPGBinom fits single-species spatially-varying coefficient binomial models using Polya-Gamma latent variables. Models are fit using Nearest Neighbor Gaussian Processes.
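The structure of a spatially-varying coefficient binomial model can be sketched by simulating from it in base R. Each site has a binomial response with a logit-scale linear predictor in which selected covariate effects receive a site-specific spatial deviation. The parameter values, the object names, and the use of a full Gaussian process with exponential covariance (rather than an NNGP) are assumptions for illustration only.

```r
set.seed(3)
J <- 100
coords <- cbind(runif(J), runif(J))
X <- cbind(1, rnorm(J), rnorm(J))   # intercept plus two covariates
beta <- c(0, 0.5, -0.2)
svc.cols <- c(1, 2)                 # intercept and first covariate vary in space
sigma.sq <- c(1, 0.5)               # spatial variances
phi <- c(3 / 0.5, 3 / 0.3)          # spatial decay parameters
D <- as.matrix(dist(coords))
# One Gaussian process (exponential covariance) per spatially-varying coefficient.
w <- sapply(seq_along(svc.cols), function(k) {
  Sigma <- sigma.sq[k] * exp(-phi[k] * D)
  drop(t(chol(Sigma)) %*% rnorm(J))
})
# Linear predictor: fixed effects plus site-specific deviations for the SVC columns.
eta <- drop(X %*% beta) + rowSums(X[, svc.cols] * w)
psi <- plogis(eta)
weights <- sample(10, J, replace = TRUE)   # binomial trials per site
y <- rbinom(J, size = weights, prob = psi)
head(cbind(y, weights, psi = round(psi, 2)))
```

Setting svc.cols to 1 alone would reduce this to a spatial binomial model with a spatially-varying intercept only.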
svcPGBinom(formula, data, inits, priors, tuning, svc.cols = 1, cov.model = "exponential", NNGP = TRUE, n.neighbors = 15, search.type = "cb", n.batch, batch.length, accept.rate = 0.43, n.omp.threads = 1, verbose = TRUE, n.report = 100, n.burn = round(.10 * n.batch * batch.length), n.thin = 1, n.chains = 1, k.fold, k.fold.threads = 1, k.fold.seed = 100, k.fold.only = FALSE, ...)
formula |
a symbolic description of the model to be fit using R's model syntax. Only right-hand side of formula is specified. See example below. Random intercepts are allowed using lme4 syntax (Bates et al. 2015). |
data |
a list containing data necessary for model fitting.
Valid tags are |
inits |
a list with each tag corresponding to a parameter name.
Valid tags are |
priors |
a list with each tag corresponding to a parameter name.
Valid tags are |
svc.cols |
a vector indicating the variables whose effects will be
estimated as spatially-varying coefficients. |
cov.model |
a quoted keyword that specifies the covariance
function used to model the spatial dependence structure among the
observations. Supported covariance model key words are:
|
tuning |
a list with each tag corresponding to a parameter
name. Valid tags are |
NNGP |
if |
n.neighbors |
number of neighbors used in the NNGP. Only used if
|
search.type |
a quoted keyword that specifies the type of nearest
neighbor search algorithm. Supported method key words are: |
n.batch |
the number of MCMC batches in each chain to run for the Adaptive MCMC sampler. See Roberts and Rosenthal (2009) for details. |
batch.length |
the length of each MCMC batch in each chain to run for the Adaptive MCMC sampler. See Roberts and Rosenthal (2009) for details. |
accept.rate |
target acceptance rate for Adaptive MCMC. Default is 0.43. See Roberts and Rosenthal (2009) for details. |
n.omp.threads |
a positive integer indicating
the number of threads to use for SMP parallel processing within chains. The package must
be compiled for OpenMP support. For most Intel-based machines, we
recommend setting |
verbose |
if |
n.report |
the interval to report Metropolis sampler acceptance and MCMC progress. |
n.burn |
the number of samples out of the total |
n.thin |
the thinning interval for collection of MCMC samples. The
thinning occurs after the |
n.chains |
the number of MCMC chains to run. |
k.fold |
specifies the number of k folds for cross-validation.
If not specified as an argument, then cross-validation is not performed
and |
k.fold.threads |
number of threads to use for cross-validation. If
|
k.fold.seed |
seed used to split data set into |
k.fold.only |
a logical value indicating whether to only perform
cross-validation ( |
... |
currently no additional arguments |
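The arithmetic linking n.batch, batch.length, n.burn, n.thin, and n.chains to the number of retained posterior samples can be worked through as follows. The values are arbitrary, and the indexing rule (keep every n.thin-th sample starting immediately after burn-in) is an assumption about how the retained samples are counted.

```r
n.batch <- 400
batch.length <- 25
n.chains <- 3
n.thin <- 10
n.samples <- n.batch * batch.length   # total samples per chain
n.burn <- round(0.10 * n.samples)     # default burn-in from the usage above
# Indices of the samples kept in each chain after burn-in and thinning.
keep <- seq(from = n.burn + 1, to = n.samples, by = n.thin)
n.post.per.chain <- length(keep)
n.post.total <- n.post.per.chain * n.chains
c(n.samples = n.samples, n.burn = n.burn,
  per.chain = n.post.per.chain, total = n.post.total)
```

With these settings each chain runs 10000 iterations, discards 1000 as burn-in, and keeps 900, for 2700 posterior samples across the three chains.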
An object of class svcPGBinom that is a list comprised of:
beta.samples |
a |
y.rep.samples |
a |
psi.samples |
a |
theta.samples |
a |
w.samples |
a three-dimensional array of posterior samples for the latent spatial random effects for all spatially-varying coefficients. Dimensions correspond to MCMC sample, coefficient, and sites. |
sigma.sq.psi.samples |
a |
beta.star.samples |
a |
like.samples |
a |
rhat |
a list of Gelman-Rubin diagnostic values for some of the model parameters. |
ESS |
a list of effective sample sizes for some of the model parameters. |
run.time |
execution time reported using |
k.fold.deviance |
scoring rule (deviance) from k-fold cross-validation.
Only included if |
The return object will include additional objects used for subsequent prediction and/or model fit evaluation.
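The idea behind k-fold cross-validation with a fixed seed can be sketched in base R: sites are randomly assigned to folds, and each fold is held out in turn while a model is fit to the remaining sites. The fold assignment scheme and object names below are hypothetical and may differ from how the package splits the data internally.

```r
set.seed(100)   # plays the role of k.fold.seed
J <- 100        # number of sites
k.fold <- 4
# Randomly assign each site to one of k.fold equal-sized folds.
fold.id <- sample(rep(1:k.fold, length.out = J))
for (k in 1:k.fold) {
  fit.sites <- which(fold.id != k)
  holdout.sites <- which(fold.id == k)
  # A model would be fit to fit.sites and scored on holdout.sites here.
  cat("fold", k, ":", length(fit.sites), "fit,",
      length(holdout.sites), "held out\n")
}
```

Reusing the same seed reproduces the same fold assignment, which keeps the deviance scores of competing models comparable.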
Jeffrey W. Doser,
Andrew O. Finley
Doser, J. W., Finley, A. O., Saunders, S. P., Kery, M., Weed, A. S., & Zipkin, E. F. (2024A). Modeling complex species-environment relationships through spatially-varying coefficient occupancy models. Journal of Agricultural, Biological and Environmental Statistics. doi:10.1007/s13253-023-00595-6.
Doser, J. W., Kery, M., Saunders, S. P., Finley, A. O., Bateman, B. L., Grand, J., Reault, S., Weed, A. S., & Zipkin, E. F. (2024B). Guidelines for the use of spatially varying coefficients in species distribution models. Global Ecology and Biogeography, 33(4), e13814. doi:10.1111/geb.13814.
Bates, Douglas, Martin Maechler, Ben Bolker, Steve Walker (2015). Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software, 67(1), 1-48. doi:10.18637/jss.v067.i01.
Datta, A., S. Banerjee, A.O. Finley, and A.E. Gelfand. (2016) Hierarchical Nearest-Neighbor Gaussian process models for large geostatistical datasets. Journal of the American Statistical Association, doi:10.1080/01621459.2015.1044091.
Finley, A.O., A. Datta, B.D. Cook, D.C. Morton, H.E. Andersen, and S. Banerjee. (2019) Efficient algorithms for Bayesian Nearest Neighbor Gaussian Processes. Journal of Computational and Graphical Statistics, doi:10.1080/10618600.2018.1537924.
Polson, N.G., J.G. Scott, and J. Windle. (2013) Bayesian Inference for Logistic Models Using Polya-Gamma Latent Variables. Journal of the American Statistical Association, 108:1339-1349.
Roberts, G.O. and Rosenthal J.S. (2009) Examples of adaptive MCMC. Journal of Computational and Graphical Statistics, 18(2):349-367.
set.seed(1000) # Sites J.x <- 10 J.y <- 10 J <- J.x * J.y # Binomial weights weights <- sample(10, J, replace = TRUE) beta <- c(0, 0.5, -0.2, 0.75) p <- length(beta) # No unstructured random effects psi.RE <- list() # Spatial parameters sp <- TRUE # Two spatially-varying covariates. svc.cols <- c(1, 2) p.svc <- length(svc.cols) cov.model <- "exponential" sigma.sq <- runif(p.svc, 0.4, 1.5) phi <- runif(p.svc, 3/1, 3/0.2) # Simulate the data dat <- simBinom(J.x = J.x, J.y = J.y, weights = weights, beta = beta, psi.RE = psi.RE, sp = sp, svc.cols = svc.cols, cov.model = cov.model, sigma.sq = sigma.sq, phi = phi) # Binomial data y <- dat$y # Covariates X <- dat$X # Spatial coordinates coords <- dat$coords # Package all data into a list # Covariates covs <- cbind(X) colnames(covs) <- c('int', 'cov.1', 'cov.2', 'cov.3') # Data list bundle data.list <- list(y = y, covs = covs, coords = coords, weights = weights) # Priors prior.list <- list(beta.normal = list(mean = 0, var = 2.72), sigma.sq.ig = list(a = 2, b = 1), phi.unif = list(a = 3 / 1, b = 3 / 0.1)) # Starting values inits.list <- list(beta = 0, alpha = 0, sigma.sq = 1, phi = phi) # Tuning tuning.list <- list(phi = 1) n.batch <- 10 batch.length <- 25 n.burn <- 100 n.thin <- 1 # Note that this is just a test case and more iterations/chains may need to # be run to ensure convergence. out <- svcPGBinom(formula = ~ cov.1 + cov.2 + cov.3, svc.cols = c(1, 2), data = data.list, n.batch = n.batch, batch.length = batch.length, inits = inits.list, priors = prior.list, accept.rate = 0.43, cov.model = "exponential", tuning = tuning.list, n.omp.threads = 1, verbose = TRUE, NNGP = TRUE, n.neighbors = 5, n.report = 2, n.burn = n.burn, n.thin = n.thin, n.chains = 1) summary(out)
The function svcPGOcc fits single-species spatially-varying coefficient occupancy models using Polya-Gamma latent variables. Models are fit using Nearest Neighbor Gaussian Processes.
svcPGOcc(occ.formula, det.formula, data, inits, priors, tuning,
         svc.cols = 1, cov.model = "exponential", NNGP = TRUE,
         n.neighbors = 15, search.type = "cb", n.batch, batch.length,
         accept.rate = 0.43, n.omp.threads = 1, verbose = TRUE,
         n.report = 100, n.burn = round(.10 * n.batch * batch.length),
         n.thin = 1, n.chains = 1, k.fold, k.fold.threads = 1,
         k.fold.seed = 100, k.fold.only = FALSE, ...)
occ.formula |
a symbolic description of the model to be fit for the occurrence portion of the model using R's model syntax. Only right-hand side of formula is specified. See example below. Random intercepts are allowed using lme4 syntax (Bates et al. 2015). |
det.formula |
a symbolic description of the model to be fit for the detection portion of the model using R's model syntax. Only right-hand side of formula is specified. See example below. Random intercepts are allowed using lme4 syntax (Bates et al. 2015). |
data |
a list containing data necessary for model fitting.
Valid tags are |
inits |
a list with each tag corresponding to a parameter name.
Valid tags are |
priors |
a list with each tag corresponding to a parameter name.
Valid tags are |
svc.cols |
a vector indicating the variables whose effects will be
estimated as spatially-varying coefficients. |
cov.model |
a quoted keyword that specifies the covariance
function used to model the spatial dependence structure among the
observations. Supported covariance model key words are:
|
tuning |
a list with each tag corresponding to a parameter
name. Valid tags are |
NNGP |
if |
n.neighbors |
number of neighbors used in the NNGP. Only used if
|
search.type |
a quoted keyword that specifies the type of nearest
neighbor search algorithm. Supported method key words are: |
n.batch |
the number of MCMC batches in each chain to run for the Adaptive MCMC sampler. See Roberts and Rosenthal (2009) for details. |
batch.length |
the length of each MCMC batch in each chain to run for the Adaptive MCMC sampler. See Roberts and Rosenthal (2009) for details. |
accept.rate |
target acceptance rate for Adaptive MCMC. Default is 0.43. See Roberts and Rosenthal (2009) for details. |
n.omp.threads |
a positive integer indicating
the number of threads to use for SMP parallel processing within chains. The package must
be compiled for OpenMP support. For most Intel-based machines, we
recommend setting |
verbose |
if |
n.report |
the interval to report Metropolis sampler acceptance and MCMC progress. |
n.burn |
the number of samples out of the total |
n.thin |
the thinning interval for collection of MCMC samples. The
thinning occurs after the |
n.chains |
the number of MCMC chains to run. |
k.fold |
specifies the number of k folds for cross-validation.
If not specified as an argument, then cross-validation is not performed
and |
k.fold.threads |
number of threads to use for cross-validation. If
|
k.fold.seed |
seed used to split data set into |
k.fold.only |
a logical value indicating whether to only perform
cross-validation ( |
... |
currently no additional arguments |
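The sampler arguments above jointly determine how many posterior samples are kept per chain. The following base R sketch works through that arithmetic, assuming the burn-in samples are discarded first and every n.thin-th remaining sample is then kept. The helper name n.post.samples is hypothetical and is not a function in the package.

```r
# Hypothetical helper (not part of the package): number of posterior samples
# retained per chain, assuming burn-in is dropped before thinning is applied.
n.post.samples <- function(n.batch, batch.length, n.burn, n.thin) {
  n.samples <- n.batch * batch.length
  length(seq(from = n.burn + 1, to = n.samples, by = n.thin))
}

n.batch <- 400
batch.length <- 25
# Default burn-in from the usage above: 10% of all samples
n.burn <- round(.10 * n.batch * batch.length)
kept <- n.post.samples(n.batch, batch.length, n.burn, n.thin = 5)
kept

# The short test run in the example for this function (10 batches of 25,
# 50 burn-in samples, no thinning) would keep this many samples:
kept.example <- n.post.samples(10, 25, n.burn = 50, n.thin = 1)
kept.example
```

With multiple chains, the retained samples are per chain, so the total would be this count multiplied by n.chains under the same assumption.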
An object of class svcPGOcc that is a list comprised of:
beta.samples |
a |
alpha.samples |
a |
z.samples |
a |
psi.samples |
a |
theta.samples |
a |
w.samples |
a three-dimensional array of posterior samples for the latent spatial random effects for all spatially-varying coefficients. Dimensions correspond to MCMC sample, coefficient, and sites. |
sigma.sq.psi.samples |
a |
sigma.sq.p.samples |
a |
beta.star.samples |
a |
alpha.star.samples |
a |
like.samples |
a |
rhat |
a list of Gelman-Rubin diagnostic values for some of the model parameters. |
ESS |
a list of effective sample sizes for some of the model parameters. |
run.time |
execution time reported using |
k.fold.deviance |
scoring rule (deviance) from k-fold cross-validation.
Only included if |
The return object will include additional objects used for
subsequent prediction and/or model fit evaluation. Note that detection
probability values are not included in the model object, but can be
extracted using fitted().
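As an illustration of working with the three-dimensional w.samples array, the sketch below combines it with the non-spatial coefficients to summarize each spatially-varying coefficient at each site. It uses simulated stand-ins rather than a fitted model, and it assumes the full coefficient at a site is the sum of the non-spatial effect and the site-level spatial effect, with beta.samples columns already subset to the spatially-varying covariates. All object names other than beta.samples and w.samples are illustrative.

```r
set.seed(1)
n.samples <- 200  # posterior samples
p.svc <- 2        # spatially-varying coefficients
J <- 64           # sites

# Simulated stand-ins for the SVC columns of beta.samples and for w.samples
# (dimensions: MCMC sample, coefficient, site)
beta.samples <- matrix(rnorm(n.samples * p.svc, 0.5, 0.1), n.samples, p.svc)
w.samples <- array(rnorm(n.samples * p.svc * J, 0, 0.3),
                   dim = c(n.samples, p.svc, J))

# Add the non-spatial effect to every site's spatial effect, sample by sample.
# array() recycles beta.samples across the site dimension.
svc.samples <- w.samples + array(beta.samples, dim = c(n.samples, p.svc, J))

# Posterior mean and 95% credible interval per coefficient and site
svc.mean <- apply(svc.samples, c(2, 3), mean)
svc.ci <- apply(svc.samples, c(2, 3), quantile, probs = c(0.025, 0.975))
dim(svc.mean)  # coefficient x site
dim(svc.ci)    # quantile x coefficient x site
```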
Jeffrey W. Doser,
Andrew O. Finley
Doser, J. W., Finley, A. O., Saunders, S. P., Kery, M., Weed, A. S., & Zipkin, E. F. (2024A). Modeling complex species-environment relationships through spatially-varying coefficient occupancy models. Journal of Agricultural, Biological and Environmental Statistics. doi:10.1007/s13253-023-00595-6.
Doser, J. W., Kery, M., Saunders, S. P., Finley, A. O., Bateman, B. L., Grand, J., Reault, S., Weed, A. S., & Zipkin, E. F. (2024B). Guidelines for the use of spatially varying coefficients in species distribution models. Global Ecology and Biogeography, 33(4), e13814. doi:10.1111/geb.13814.
Bates, Douglas, Martin Maechler, Ben Bolker, Steve Walker (2015). Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software, 67(1), 1-48. doi:10.18637/jss.v067.i01.
Datta, A., S. Banerjee, A.O. Finley, and A.E. Gelfand. (2016) Hierarchical Nearest-Neighbor Gaussian process models for large geostatistical datasets. Journal of the American Statistical Association, doi:10.1080/01621459.2015.1044091.
Finley, A.O., A. Datta, B.D. Cook, D.C. Morton, H.E. Andersen, and S. Banerjee. (2019) Efficient algorithms for Bayesian Nearest Neighbor Gaussian Processes. Journal of Computational and Graphical Statistics, doi:10.1080/10618600.2018.1537924.
Polson, N.G., J.G. Scott, and J. Windle. (2013) Bayesian Inference for Logistic Models Using Polya-Gamma Latent Variables. Journal of the American Statistical Association, 108:1339-1349.
Roberts, G.O. and Rosenthal J.S. (2009) Examples of adaptive MCMC. Journal of Computational and Graphical Statistics, 18(2):349-367.
set.seed(400)
# Simulate Data -----------------------------------------------------------
J.x <- 8
J.y <- 8
J <- J.x * J.y
n.rep <- sample(2:4, J, replace = TRUE)
beta <- c(0.5, 2)
p.occ <- length(beta)
alpha <- c(0, 1)
p.det <- length(alpha)
phi <- c(3 / .6, 3 / .8)
sigma.sq <- c(1.2, 0.7)
svc.cols <- c(1, 2)
dat <- simOcc(J.x = J.x, J.y = J.y, n.rep = n.rep, beta = beta, alpha = alpha,
              sigma.sq = sigma.sq, phi = phi, sp = TRUE,
              cov.model = 'exponential', svc.cols = svc.cols)
# Detection-nondetection data
y <- dat$y
# Occupancy covariates
X <- dat$X
# Detection covariates
X.p <- dat$X.p
# Spatial coordinates
coords <- dat$coords
# Package all data into a list
occ.covs <- X[, -1, drop = FALSE]
colnames(occ.covs) <- c('occ.cov')
det.covs <- list(det.cov.1 = X.p[, , 2])
data.list <- list(y = y, occ.covs = occ.covs, det.covs = det.covs,
                  coords = coords)
# Number of batches
n.batch <- 10
# Batch length
batch.length <- 25
n.iter <- n.batch * batch.length
# Priors
prior.list <- list(beta.normal = list(mean = 0, var = 2.72),
                   alpha.normal = list(mean = 0, var = 2.72),
                   sigma.sq.ig = list(a = 2, b = 1),
                   phi.unif = list(a = 3/1, b = 3/.1))
# Initial values
inits.list <- list(alpha = 0, beta = 0, phi = 3 / .5, sigma.sq = 2,
                   w = matrix(0, nrow = length(svc.cols), ncol = nrow(X)),
                   z = apply(y, 1, max, na.rm = TRUE))
# Tuning
tuning.list <- list(phi = 1)
# Note that this is just a test case and more iterations/chains may need to
# be run to ensure convergence.
out <- svcPGOcc(occ.formula = ~ occ.cov, det.formula = ~ det.cov.1,
                data = data.list, inits = inits.list, n.batch = n.batch,
                batch.length = batch.length, accept.rate = 0.43,
                priors = prior.list, cov.model = 'exponential',
                svc.cols = c(1, 2), tuning = tuning.list,
                n.omp.threads = 1, verbose = TRUE, NNGP = TRUE,
                n.neighbors = 5, search.type = 'cb', n.report = 10,
                n.burn = 50, n.thin = 1)
summary(out)
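The example writes the spatial decay values and the bounds of the uniform prior on phi in the form 3 / distance. A short base R sketch of why that form is convenient, assuming the exponential correlation function exp(-phi * d): the correlation falls to exp(-3), roughly 0.05, at distance 3 / phi, often called the effective spatial range. The function name exp.cor is illustrative only.

```r
# Exponential correlation as a function of distance d and decay phi
exp.cor <- function(d, phi) exp(-phi * d)

# Bounds of the uniform prior on phi used in the example
phi.bounds <- c(a = 3 / 1, b = 3 / .1)

# Implied effective spatial range at each bound (distance units of coords)
eff.range <- 3 / phi.bounds
eff.range

# Correlation remaining at the effective range is exp(-3) at both bounds
cor.at.range <- exp.cor(eff.range, phi.bounds)
cor.at.range
```

Under this reading, the prior in the example restricts the effective range to between 0.1 and 1 in the units of the coordinates, so the bounds would need rescaling for coordinates stored in meters or kilometers.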
Function for fitting single-species multi-season spatially-varying coefficient integrated occupancy models using Polya-Gamma latent variables. Data integration is done using a joint likelihood framework, assuming distinct detection models for each data source that are each conditional on a single latent occurrence process. Models are fit using Nearest Neighbor Gaussian Processes.
svcTIntPGOcc(occ.formula, det.formula, data, inits, priors, tuning,
             svc.cols = 1, cov.model = 'exponential', NNGP = TRUE,
             n.neighbors = 15, search.type = 'cb', n.batch, batch.length,
             accept.rate = 0.43, n.omp.threads = 1, verbose = TRUE,
             ar1 = FALSE, n.report = 100,
             n.burn = round(.10 * n.batch * batch.length),
             n.thin = 1, n.chains = 1, ...)
occ.formula |
a symbolic description of the model to be fit for the occurrence portion of the model using R's model syntax. Only right-hand side of formula is specified. See example below. Random intercepts are allowed using lme4 syntax (Bates et al. 2015). |
det.formula |
a list of symbolic descriptions of the models to be fit for the detection portion of the model using R's model syntax for each data set. Each element in the list is a formula for the detection model of a given data set. Only right-hand side of formula is specified. See example below. Random intercepts are allowed using lme4 syntax (Bates et al. 2015). |
data |
a list containing data necessary for model fitting.
Valid tags are |
inits |
a list with each tag corresponding to a parameter name.
Valid tags are |
priors |
a list with each tag corresponding to a parameter name.
Valid tags are |
tuning |
a list with each tag corresponding to a parameter
name. Valid tags are |
svc.cols |
a vector indicating the variables whose effects will be
estimated as spatially-varying coefficients. |
cov.model |
a quoted keyword that specifies the covariance
function used to model the spatial dependence structure among the
observations. Supported covariance model key words are:
|
NNGP |
if |
n.neighbors |
number of neighbors used in the NNGP. Only used if
|
search.type |
a quoted keyword that specifies the type of nearest
neighbor search algorithm. Supported method key words are: |
n.batch |
the number of MCMC batches in each chain to run for the Adaptive MCMC sampler. See Roberts and Rosenthal (2009) for details. |
batch.length |
the length of each MCMC batch in each chain to run for the Adaptive MCMC sampler. See Roberts and Rosenthal (2009) for details. |
accept.rate |
target acceptance rate for Adaptive MCMC. Default is 0.43. See Roberts and Rosenthal (2009) for details. |
n.omp.threads |
a positive integer indicating the number of threads
to use for SMP parallel processing within chains. The package must be compiled for
OpenMP support. For most Intel-based machines, we recommend setting
|
verbose |
if |
ar1 |
logical value indicating whether to include an AR(1) zero-mean
temporal random effect in the model. If |
n.report |
the interval to report MCMC progress. Note this is specified in terms of batches, not MCMC samples. |
n.burn |
the number of samples out of the total |
n.thin |
the thinning interval for collection of MCMC samples. The
thinning occurs after the |
n.chains |
the number of chains to run. |
... |
currently no additional arguments |
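To make the ar1 option more concrete, the following base R sketch draws one realization of a zero-mean AR(1) temporal random effect across primary time periods. The parameter names rho (temporal correlation) and sigma.sq.t (temporal variance), their values, and the stationary covariance form are assumptions made for illustration and may not match the package's internal parameterization.

```r
set.seed(10)
n.time <- 10       # primary time periods
rho <- 0.7         # assumed temporal correlation
sigma.sq.t <- 0.5  # assumed temporal variance

# Stationary AR(1) covariance: sigma.sq.t * rho^|t - t'|
Sigma <- sigma.sq.t * rho^abs(outer(1:n.time, 1:n.time, "-"))

# One draw eta ~ N(0, Sigma) using the lower Cholesky factor
eta <- drop(t(chol(Sigma)) %*% rnorm(n.time))
round(eta, 2)
```

Each element of eta would shift occurrence probability on the logit scale for every site in that time period, which is how a shared temporal effect differs from the site-level spatial effects.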
An object of class svcTIntPGOcc that is a list comprised of:
beta.samples |
a |
alpha.samples |
a |
z.samples |
a three-dimensional array of posterior samples for the latent occupancy values, with dimensions corresponding to posterior sample, site, and primary time period. Note this object will contain predicted occupancy values for sites/primary time periods that were not sampled. |
psi.samples |
a three-dimensional array of posterior samples for the latent occupancy probability values, with dimensions corresponding to posterior sample, site, and primary time period. Note this object will contain predicted occupancy probabilities for sites/primary time periods that were not sampled. |
sigma.sq.psi.samples |
a |
sigma.sq.p.samples |
a |
beta.star.samples |
a |
alpha.star.samples |
a |
theta.samples |
a |
w.samples |
a three-dimensional array of posterior samples for the latent spatial random effects for all spatially-varying coefficients. Dimensions correspond to MCMC sample, coefficient, and sites. |
eta.samples |
a |
p.samples |
a list of four-dimensional arrays consisting of the posterior samples of detection probability for each data source. For each data source, the dimensions of the four-dimensional array correspond to MCMC sample, site, season, and replicate within season. |
like.samples |
a two-dimensional array of posterior samples for the likelihood values associated with each site and primary time period, for each individual data source. Used for calculating WAIC. |
rhat |
a list of Gelman-Rubin diagnostic values for some of the model parameters. |
ESS |
a list of effective sample sizes for some of the model parameters. |
run.time |
execution time reported using |
The return object will include additional objects used for subsequent prediction and/or model fit evaluation.
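Because like.samples is described as the input for WAIC, a brief base R sketch of the standard WAIC calculation may help. It uses a simulated matrix of likelihood values (rows are posterior samples, columns are observational units) as a stand-in for one data source, and it is not the package's own implementation.

```r
set.seed(42)
n.samples <- 500  # posterior samples
n.obs <- 30       # site by time period combinations for one data source

# Simulated stand-in for likelihood values, each strictly between 0 and 1
like.samples <- matrix(runif(n.samples * n.obs, 0.2, 0.9), n.samples, n.obs)

# Log pointwise predictive density
lppd <- sum(log(colMeans(like.samples)))
# Effective number of parameters: posterior variance of the log-likelihood
p.waic <- sum(apply(log(like.samples), 2, var))
# WAIC on the deviance scale (lower values indicate better expected fit)
waic <- -2 * (lppd - p.waic)
c(lppd = lppd, pD = p.waic, WAIC = waic)
```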
Some of the underlying code used for generating random numbers from the Polya-Gamma distribution is taken from the pgdraw package written by Daniel F. Schmidt and Enes Makalic. Their code implements Algorithm 6 in the PhD thesis of Jesse Bennett Windle (2013) https://repositories.lib.utexas.edu/handle/2152/21842.
Jeffrey W. Doser
Polson, N.G., J.G. Scott, and J. Windle. (2013) Bayesian Inference for Logistic Models Using Polya-Gamma Latent Variables. Journal of the American Statistical Association, 108:1339-1349.
Bates, Douglas, Martin Maechler, Ben Bolker, Steve Walker (2015). Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software, 67(1), 1-48. doi:10.18637/jss.v067.i01.
Doser, J. W., Finley, A. O., Saunders, S. P., Kery, M., Weed, A. S., & Zipkin, E. F. (2024). Modeling complex species-environment relationships through spatially-varying coefficient occupancy models. Journal of Agricultural, Biological and Environmental Statistics. doi:10.1007/s13253-023-00595-6.
Doser, J. W., Kery, M., Saunders, S. P., Finley, A. O., Bateman, B. L., Grand, J., Reault, S., Weed, A. S., & Zipkin, E. F. (2024). Guidelines for the use of spatially varying coefficients in species distribution models. Global Ecology and Biogeography, 33(4), e13814. doi:10.1111/geb.13814.
set.seed(332)
# Simulate Data -----------------------------------------------------------
# Number of locations in each direction. This is the total region of interest
# where some sites may or may not have a data source.
J.x <- 15
J.y <- 15
J.all <- J.x * J.y
# Number of data sources.
n.data <- 3
# Sites for each data source.
J.obs <- sample(ceiling(0.2 * J.all):ceiling(0.4 * J.all), n.data, replace = TRUE)
# Maximum number of years for each data set
n.time.max <- c(4, 8, 10)
# Number of years each site in each data set is sampled
n.time <- list()
for (i in 1:n.data) {
  n.time[[i]] <- sample(1:n.time.max[i], J.obs[i], replace = TRUE)
}
# Replicates for each data source.
n.rep <- list()
for (i in 1:n.data) {
  n.rep[[i]] <- matrix(NA, J.obs[i], n.time.max[i])
  for (j in 1:J.obs[i]) {
    n.rep[[i]][j, sample(1:n.time.max[i], n.time[[i]][j], replace = FALSE)] <-
      sample(1:4, n.time[[i]][j], replace = TRUE)
  }
}
# Total number of years across all data sets
n.time.total <- 10
# List denoting the specific years each data set was sampled during.
data.seasons <- list()
for (i in 1:n.data) {
  data.seasons[[i]] <- sort(sample(1:n.time.total, n.time.max[i], replace = FALSE))
}
# Occupancy covariates
beta <- c(0, 0.4, 0.3)
trend <- TRUE
# Random occupancy covariates
psi.RE <- list(levels = c(20), sigma.sq.psi = c(0.6))
p.occ <- length(beta)
# Detection covariates
alpha <- list()
alpha[[1]] <- c(0, 0.2, -0.5)
alpha[[2]] <- c(-1, 0.5, 0.3, -0.8)
alpha[[3]] <- c(-0.5, 1)
p.RE <- list()
p.det.long <- sapply(alpha, length)
p.det <- sum(p.det.long)
# Spatial parameters
svc.cols <- c(1, 2)
sigma.sq <- c(0.9, 0.5)
phi <- c(3 / .5, 3 / .8)
# Simulate occupancy data.
dat <- simTIntOcc(n.data = n.data, J.x = J.x, J.y = J.y, J.obs = J.obs,
                  n.time = n.time, data.seasons = data.seasons, n.rep = n.rep,
                  beta = beta, alpha = alpha, trend = trend, psi.RE = psi.RE,
                  p.RE = p.RE, sp = TRUE, sigma.sq = sigma.sq, phi = phi,
                  cov.model = 'exponential', svc.cols = svc.cols)
y <- dat$y
X <- dat$X.obs
X.re <- dat$X.re.obs
X.p <- dat$X.p
sites <- dat$sites
coords <- dat$coords.obs
# Package all data into a list
occ.covs <- list(trend = X[, , 2],
                 occ.cov.1 = X[, , 3],
                 occ.factor.1 = X.re[, , 1])
det.covs <- list()
# Add covariates one by one
det.covs[[1]] <- list(det.cov.1.1 = X.p[[1]][, , , 2],
                      det.cov.1.2 = X.p[[1]][, , , 3])
det.covs[[2]] <- list(det.cov.2.1 = X.p[[2]][, , , 2],
                      det.cov.2.2 = X.p[[2]][, , , 3],
                      det.cov.2.3 = X.p[[2]][, , , 4])
det.covs[[3]] <- list(det.cov.3.1 = X.p[[3]][, , , 2])
data.list <- list(y = y, occ.covs = occ.covs, det.covs = det.covs,
                  sites = sites, seasons = data.seasons, coords = coords)
# Testing
occ.formula <- ~ trend + occ.cov.1 + (1 | occ.factor.1)
# Note that the names are not necessary.
det.formula <- list(f.1 = ~ det.cov.1.1 + det.cov.1.2,
                    f.2 = ~ det.cov.2.1 + det.cov.2.2 + det.cov.2.3,
                    f.3 = ~ det.cov.3.1)
# NOTE: this is a short run of the model, in reality we would run the
# model for much longer.
out <- svcTIntPGOcc(occ.formula = occ.formula, det.formula = det.formula,
                    data = data.list, NNGP = TRUE, n.neighbors = 15,
                    cov.model = 'exponential', n.batch = 3,
                    svc.cols = c(1, 2), batch.length = 25, n.report = 1,
                    n.burn = 25, n.thin = 1, n.chains = 1)
summary(out)
The function svcTMsPGOcc fits multi-species multi-season spatially-varying coefficient occupancy models with species correlations (i.e., a spatially-explicit joint species distribution model with imperfect detection). We use Polya-Gamma latent variables and a spatial factor modeling approach. Models are implemented using a Nearest Neighbor Gaussian Process.
svcTMsPGOcc(occ.formula, det.formula, data, inits, priors, tuning,
            svc.cols = 1, cov.model = 'exponential', NNGP = TRUE,
            n.neighbors = 15, search.type = 'cb', std.by.sp = FALSE,
            n.factors, svc.by.sp, n.batch, batch.length,
            accept.rate = 0.43, n.omp.threads = 1, verbose = TRUE,
            ar1 = FALSE, n.report = 100,
            n.burn = round(.10 * n.batch * batch.length),
            n.thin = 1, n.chains = 1, ...)
occ.formula |
a symbolic description of the model to be fit for the occurrence portion of the model using R's model syntax. Random intercepts are allowed using lme4 syntax (Bates et al. 2015). Only right-hand side of formula is specified. See example below. |
det.formula |
a symbolic description of the model to be fit for the detection portion of the model using R's model syntax. Only right-hand side of formula is specified. See example below. Random intercepts are allowed using lme4 syntax (Bates et al. 2015). |
data |
a list containing data necessary for model fitting.
Valid tags are |
inits |
a list with each tag corresponding to a parameter name.
Valid tags are |
priors |
a list with each tag corresponding to a parameter name.
Valid tags are |
tuning |
a list with each tag corresponding to a parameter
name. Valid tags are |
svc.cols |
a vector indicating the variables whose effects will be
estimated as spatially-varying coefficients. |
cov.model |
a quoted keyword that specifies the covariance
function used to model the spatial dependence structure among the
observations. Supported covariance model key words are:
|
NNGP |
if |
n.neighbors |
number of neighbors used in the NNGP. Only used if
|
search.type |
a quoted keyword that specifies the type of nearest
neighbor search algorithm. Supported method key words are: |
std.by.sp |
a logical value indicating whether the covariates are standardized
separately for each species within the corresponding range for each species ( |
n.factors |
the number of factors to use in the spatial factor model approach. Note this corresponds to the number of factors used for each spatially-varying coefficient that is estimated in the model. Typically, the number of factors is set to be small (e.g., 4-5) relative to the total number of species in the community, which will lead to substantial decreases in computation time. However, the value can be anywhere between 1 and N (the number of species in the community). |
svc.by.sp |
an optional list with length equal to |
n.batch |
the number of MCMC batches in each chain to run for the Adaptive MCMC sampler. See Roberts and Rosenthal (2009) for details. |
batch.length |
the length of each MCMC batch to run for the Adaptive MCMC sampler. See Roberts and Rosenthal (2009) for details. |
accept.rate |
target acceptance rate for Adaptive MCMC. Default is 0.43. See Roberts and Rosenthal (2009) for details. |
n.omp.threads |
a positive integer indicating
the number of threads to use for SMP parallel processing within chains. The package must
be compiled for OpenMP support. For most Intel-based machines, we
recommend setting |
verbose |
if |
ar1 |
logical value indicating whether to include an AR(1) zero-mean
temporal random effect in the model. If |
n.report |
the interval to report Metropolis sampler acceptance and MCMC progress. Note this is specified in terms of batches and not overall samples for spatial models. |
n.burn |
the number of samples out of the total |
n.thin |
the thinning interval for collection of MCMC samples. The
thinning occurs after the |
n.chains |
the number of chains to run in sequence. |
... |
currently no additional arguments |
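The n.factors argument controls a dimension reduction, and the base R sketch below illustrates the idea for a single spatially-varying coefficient: species-specific spatial effects are formed as a linear combination of a small number of shared spatial factors. The object names, the independent normal draws standing in for spatially correlated factors, and the lower-triangular loading constraint with ones on the diagonal are all illustrative assumptions rather than a description of the package's internals.

```r
set.seed(7)
N <- 7   # species in the community
q <- 3   # number of spatial factors (n.factors)
J <- 64  # sites

# Species-specific factor loadings (N x q). A common identifiability
# constraint, assumed here, fixes the upper triangle at 0 and the diagonal at 1.
lambda <- matrix(rnorm(N * q), N, q)
lambda[upper.tri(lambda)] <- 0
diag(lambda) <- 1

# Shared latent spatial factors (q x J); independent draws are used here
# in place of spatially correlated NNGP realizations
w <- matrix(rnorm(q * J), q, J)

# Species-specific spatial effects (N x J) built from only q factors
w.star <- lambda %*% w
dim(w.star)
```

Only q spatial processes are sampled instead of N, which is where the computational savings described for n.factors come from.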
An object of class svcTMsPGOcc that is a list comprised of:
beta.comm.samples |
a |
alpha.comm.samples |
a |
tau.sq.beta.samples |
a |
tau.sq.alpha.samples |
a |
beta.samples |
a |
alpha.samples |
a |
theta.samples |
a |
lambda.samples |
a |
z.samples |
a four-dimensional array of posterior samples for the latent occurrence values for each species. Dimensions correspond to MCMC sample, species, site, and primary time period. |
psi.samples |
a four-dimensional array of posterior samples for the latent occupancy probability values for each species. Dimensions correspond to MCMC sample, species, site, and primary time period. |
w.samples |
a four-dimensional array of posterior samples for the latent spatial random effects for each spatial factor within each spatially-varying coefficient. Dimensions correspond to MCMC sample, factor, site, and spatially-varying coefficient. |
sigma.sq.psi.samples |
a |
sigma.sq.p.samples |
a |
beta.star.samples |
a |
alpha.star.samples |
a |
like.samples |
a four-dimensional array of posterior samples for the likelihood value used for calculating WAIC. Dimensions correspond to MCMC sample, species, site, and time period. |
rhat |
a list of Gelman-Rubin diagnostic values for some of the model parameters. |
ESS |
a list of effective sample sizes for some of the model parameters. |
run.time |
MCMC sampler execution time reported using |
The return object will include additional objects used for
subsequent prediction and/or model fit evaluation. Note that estimated
detection probability values are not included in the model object, but can
be extracted using fitted().
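The four-dimensional sample arrays can be collapsed with apply() to obtain community-level summaries. The sketch below derives posterior samples of species richness by summing latent occurrence over the species dimension, using a simulated array with the dimension order described for z.samples (MCMC sample, species, site, primary time period). The simulated values and the names rich.samples and rich.mean are illustrative only.

```r
set.seed(3)
n.samples <- 100  # posterior samples
N <- 7            # species
J <- 64           # sites
n.time <- 10      # primary time periods

# Simulated stand-in for z.samples: binary latent occurrence states
z.samples <- array(rbinom(n.samples * N * J * n.time, 1, 0.3),
                   dim = c(n.samples, N, J, n.time))

# Richness for each posterior sample, site, and time period: sum over species
rich.samples <- apply(z.samples, c(1, 3, 4), sum)

# Posterior mean richness for each site and time period
rich.mean <- apply(rich.samples, c(2, 3), mean)
dim(rich.samples)  # sample x site x time period
dim(rich.mean)     # site x time period
```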
Some of the underlying code used for generating random numbers from the Polya-Gamma distribution is taken from the pgdraw package written by Daniel F. Schmidt and Enes Makalic. Their code implements Algorithm 6 in the PhD thesis of Jesse Bennett Windle (2013) https://repositories.lib.utexas.edu/handle/2152/21842.
Jeffrey W. Doser,
Andrew O. Finley
Doser, J. W., Finley, A. O., Saunders, S. P., Kery, M., Weed, A. S., & Zipkin, E. F. (2024A). Modeling complex species-environment relationships through spatially-varying coefficient occupancy models. Journal of Agricultural, Biological and Environmental Statistics. doi:10.1007/s13253-023-00595-6.
Doser, J. W., Kery, M., Saunders, S. P., Finley, A. O., Bateman, B. L., Grand, J., Reault, S., Weed, A. S., & Zipkin, E. F. (2024B). Guidelines for the use of spatially varying coefficients in species distribution models. Global Ecology and Biogeography, 33(4), e13814. doi:10.1111/geb.13814.
Datta, A., S. Banerjee, A.O. Finley, and A.E. Gelfand. (2016) Hierarchical Nearest-Neighbor Gaussian process models for large geostatistical datasets. Journal of the American Statistical Association, doi:10.1080/01621459.2015.1044091.
Finley, A.O., A. Datta, B.D. Cook, D.C. Morton, H.E. Andersen, and S. Banerjee. (2019) Efficient algorithms for Bayesian Nearest Neighbor Gaussian Processes. Journal of Computational and Graphical Statistics, doi:10.1080/10618600.2018.1537924.
Polson, N.G., J.G. Scott, and J. Windle. (2013) Bayesian Inference for Logistic Models Using Polya-Gamma Latent Variables. Journal of the American Statistical Association, 108:1339-1349.
Roberts, G.O. and Rosenthal J.S. (2009) Examples of adaptive MCMC. Journal of Computational and Graphical Statistics, 18(2):349-367.
Bates, Douglas, Martin Maechler, Ben Bolker, Steve Walker (2015). Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software, 67(1), 1-48. doi:10.18637/jss.v067.i01.
# Simulate Data -----------------------------------------------------------
set.seed(500)
J.x <- 8
J.y <- 8
J <- J.x * J.y
# Years sampled
n.time <- sample(3:10, J, replace = TRUE)
# n.time <- rep(10, J)
n.time.max <- max(n.time)
# Replicates
n.rep <- matrix(NA, J, max(n.time))
for (j in 1:J) {
  n.rep[j, 1:n.time[j]] <- sample(2:4, n.time[j], replace = TRUE)
}
N <- 7
# Community-level covariate effects
# Occurrence
beta.mean <- c(-3, -0.2, 0.5)
trend <- FALSE
sp.only <- 0
p.occ <- length(beta.mean)
tau.sq.beta <- c(0.6, 1.5, 1.4)
# Detection
alpha.mean <- c(0, 1.2, -1.5)
tau.sq.alpha <- c(1, 0.5, 2.3)
p.det <- length(alpha.mean)
# Random effects
psi.RE <- list()
p.RE <- list()
# Draw species-level effects from community means.
beta <- matrix(NA, nrow = N, ncol = p.occ)
alpha <- matrix(NA, nrow = N, ncol = p.det)
for (i in 1:p.occ) {
  beta[, i] <- rnorm(N, beta.mean[i], sqrt(tau.sq.beta[i]))
}
for (i in 1:p.det) {
  alpha[, i] <- rnorm(N, alpha.mean[i], sqrt(tau.sq.alpha[i]))
}
sp <- TRUE
svc.cols <- c(1, 2)
p.svc <- length(svc.cols)
n.factors <- 3
phi <- runif(p.svc * n.factors, 3 / .9, 3 / .3)
factor.model <- TRUE
cov.model <- 'exponential'
dat <- simTMsOcc(J.x = J.x, J.y = J.y, n.time = n.time, n.rep = n.rep, N = N,
                 beta = beta, alpha = alpha, sp.only = sp.only, trend = trend,
                 psi.RE = psi.RE, p.RE = p.RE, factor.model = factor.model,
                 svc.cols = svc.cols, n.factors = n.factors, phi = phi, sp = sp,
                 cov.model = cov.model)
y <- dat$y
X <- dat$X
X.p <- dat$X.p
coords <- dat$coords
X.re <- dat$X.re
X.p.re <- dat$X.p.re
occ.covs <- list(occ.cov.1 = X[, , 2],
                 occ.cov.2 = X[, , 3])
det.covs <- list(det.cov.1 = X.p[, , , 2],
                 det.cov.2 = X.p[, , , 3])
data.list <- list(y = y, occ.covs = occ.covs,
                  det.covs = det.covs,
                  coords = coords)
# Priors
prior.list <- list(beta.comm.normal = list(mean = 0, var = 2.72),
                   alpha.comm.normal = list(mean = 0, var = 2.72),
                   tau.sq.beta.ig = list(a = 0.1, b = 0.1),
                   tau.sq.alpha.ig = list(a = 0.1, b = 0.1),
                   phi.unif = list(a = 3 / .9, b = 3 / .1))
z.init <- apply(y, c(1, 2, 3), function(a) as.numeric(sum(a, na.rm = TRUE) > 0))
inits.list <- list(alpha.comm = 0, beta.comm = 0, beta = 0,
                   alpha = 0, tau.sq.beta = 1, tau.sq.alpha = 1,
                   phi = 3 / .5, z = z.init)
# Tuning
tuning.list <- list(phi = 1)
# Number of batches
n.batch <- 5
# Batch length
batch.length <- 25
n.burn <- 25
n.thin <- 1
n.samples <- n.batch * batch.length
# Note that this is just a test case and more iterations/chains may need to
# be run to ensure convergence.
out <- svcTMsPGOcc(occ.formula = ~ occ.cov.1 + occ.cov.2,
                   det.formula = ~ det.cov.1 + det.cov.2,
                   data = data.list,
                   inits = inits.list,
                   n.batch = n.batch,
                   batch.length = batch.length,
                   accept.rate = 0.43,
                   NNGP = TRUE,
                   n.neighbors = 5,
                   n.factors = n.factors,
                   svc.cols = svc.cols,
                   cov.model = 'exponential',
                   priors = prior.list,
                   tuning = tuning.list,
                   n.omp.threads = 1,
                   verbose = TRUE,
                   n.report = 1,
                   n.burn = n.burn,
                   n.thin = n.thin,
                   n.chains = 1)
summary(out)
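The z.init line in the example above sets the initial latent occurrence state to 1 wherever a species was detected at least once in a site and season. A toy illustration in base R follows; the array y.toy and its dimension order (species, site, season, replicate) are assumptions made for this sketch only.

```r
# Toy detection array: 2 species x 3 sites x 2 seasons x 2 replicates (made-up values).
y.toy <- array(0, dim = c(2, 3, 2, 2))
y.toy[1, 1, 1, 2] <- 1   # species 1 detected at site 1, season 1, replicate 2
y.toy[2, 3, 2, 1] <- 1   # species 2 detected at site 3, season 2, replicate 1
y.toy[1, 2, 2, ] <- NA   # species 1, site 2 not surveyed in season 2

# Collapse the replicate dimension: 1 if any detection, 0 otherwise.
# na.rm = TRUE makes unsurveyed combinations start at 0 rather than NA.
z.toy <- apply(y.toy, c(1, 2, 3), function(a) as.numeric(sum(a, na.rm = TRUE) > 0))
```

The result has one entry per species, site, and season, which matches the margins c(1, 2, 3) passed to apply().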
The function svcTPGBinom fits multi-season single-species spatially-varying coefficient binomial models using Polya-Gamma latent variables. Models are fit using Nearest Neighbor Gaussian Processes.
svcTPGBinom(formula, data, inits, priors, tuning, svc.cols = 1, cov.model = 'exponential', NNGP = TRUE, n.neighbors = 15, search.type = 'cb', n.batch, batch.length, accept.rate = 0.43, n.omp.threads = 1, verbose = TRUE, ar1 = FALSE, n.report = 100, n.burn = round(.10 * n.batch * batch.length), n.thin = 1, n.chains = 1, k.fold, k.fold.threads = 1, k.fold.seed = 100, k.fold.only = FALSE, ...)
formula |
a symbolic description of the model to be fit using R's model syntax. Only right-hand side of formula is specified. See example below. Random intercepts are allowed using lme4 syntax (Bates et al. 2015). |
data |
a list containing data necessary for model fitting.
Valid tags are |
inits |
a list with each tag corresponding to a parameter name.
Valid tags are |
priors |
a list with each tag corresponding to a parameter name.
Valid tags are |
svc.cols |
a vector indicating the variables whose effects will be
estimated as spatially-varying coefficients. |
cov.model |
a quoted keyword that specifies the covariance
function used to model the spatial dependence structure among the
observations. Supported covariance model key words are:
|
tuning |
a list with each tag corresponding to a parameter
name. Valid tags are |
NNGP |
if |
n.neighbors |
number of neighbors used in the NNGP. Only used if
|
search.type |
a quoted keyword that specifies the type of nearest
neighbor search algorithm. Supported method key words are: |
n.batch |
the number of MCMC batches in each chain to run for the Adaptive MCMC sampler. See Roberts and Rosenthal (2009) for details. |
batch.length |
the length of each MCMC batch in each chain to run for the Adaptive MCMC sampler. See Roberts and Rosenthal (2009) for details. |
accept.rate |
target acceptance rate for Adaptive MCMC. Default is 0.43. See Roberts and Rosenthal (2009) for details. |
n.omp.threads |
a positive integer indicating
the number of threads to use for SMP parallel processing within chains. The package must
be compiled for OpenMP support. For most Intel-based machines, we
recommend setting |
verbose |
if |
ar1 |
logical value indicating whether to include an AR(1) zero-mean
temporal random effect in the model. If |
n.report |
the interval to report Metropolis sampler acceptance and MCMC progress. |
n.burn |
the number of samples out of the total |
n.thin |
the thinning interval for collection of MCMC samples. The
thinning occurs after the |
n.chains |
the number of MCMC chains to run. |
k.fold |
specifies the number of k folds for cross-validation.
If not specified as an argument, then cross-validation is not performed
and |
k.fold.threads |
number of threads to use for cross-validation. If
|
k.fold.seed |
seed used to split data set into |
k.fold.only |
a logical value indicating whether to only perform
cross-validation ( |
... |
currently no additional arguments |
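As a rough illustration of how n.batch, batch.length, n.burn, n.thin, and n.chains combine, the arithmetic below counts the posterior samples that would be retained. It assumes that thinning keeps every n.thin-th iteration after burn-in; the specific values are hypothetical.

```r
n.batch <- 400
batch.length <- 25
n.chains <- 3
n.thin <- 10

# Total iterations per chain.
n.samples <- n.batch * batch.length
# Default burn-in from the usage above: 10% of the iterations per chain.
n.burn <- round(.10 * n.batch * batch.length)
# Iterations kept per chain after burn-in and thinning (assumed indexing).
kept.per.chain <- length(seq(from = n.burn + 1, to = n.samples, by = n.thin))
# Samples available across all chains.
kept.total <- kept.per.chain * n.chains
```

With these settings each chain runs 10000 iterations, discards 1000, and keeps 900, for 2700 retained samples across three chains.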
An object of class svcTPGBinom that is a list comprised of:
beta.samples |
a |
y.rep.samples |
a three-dimensional array of posterior samples for the fitted data values, with dimensions corresponding to posterior sample, site, and primary time period. |
psi.samples |
a three-dimensional array of posterior samples for the occurrence probability values, with dimensions corresponding to posterior sample, site, and primary time period. |
theta.samples |
a |
w.samples |
a three-dimensional array of posterior samples for the latent spatial random effects for all spatially-varying coefficients. Dimensions correspond to MCMC sample, coefficient, and sites. |
sigma.sq.psi.samples |
a |
beta.star.samples |
a |
eta.samples |
a |
like.samples |
a three-dimensional array of posterior samples for the likelihood values associated with each site and primary time period. Used for calculating WAIC. |
rhat |
a list of Gelman-Rubin diagnostic values for some of the model parameters. |
ESS |
a list of effective sample sizes for some of the model parameters. |
run.time |
execution time reported using |
k.fold.deviance |
scoring rule (deviance) from k-fold cross-validation.
Only included if |
The return object will include additional objects used for
subsequent prediction and/or model fit evaluation.
Note that if k.fold.only = TRUE, the return list object will only
contain run.time and k.fold.deviance.
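The like.samples entry is described as the input for WAIC. The sketch below shows one standard way to compute WAIC from such an array, using simulated values in place of real model output. The array like.toy is hypothetical, and unsampled site/time combinations (which would need to be dropped in real output) are ignored here. In practice the package's own WAIC function (waicOcc(), if it supports the fitted model class) is likely the more reliable route.

```r
set.seed(42)
n.post <- 200   # posterior samples
J <- 4          # sites
n.time <- 3     # primary time periods

# Stand-in for like.samples: likelihood values in (0, 1),
# with dimensions sample x site x primary time period.
like.toy <- array(runif(n.post * J * n.time, 0.2, 0.9), dim = c(n.post, J, n.time))

# Flatten to a (samples x observations) matrix; rows remain posterior samples.
like.mat <- matrix(like.toy, nrow = n.post)

# Log pointwise predictive density.
lppd <- sum(log(colMeans(like.mat)))
# Effective number of parameters: posterior variance of the log-likelihood.
p.waic <- sum(apply(log(like.mat), 2, var))
waic <- -2 * (lppd - p.waic)
```

Lower WAIC values indicate better expected predictive performance when comparing models fit to the same data.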
Jeffrey W. Doser,
Andrew O. Finley
Doser, J. W., Finley, A. O., Saunders, S. P., Kery, M., Weed, A. S., & Zipkin, E. F. (2024A). Modeling complex species-environment relationships through spatially-varying coefficient occupancy models. Journal of Agricultural, Biological and Environmental Statistics. doi:10.1007/s13253-023-00595-6.
Doser, J. W., Kery, M., Saunders, S. P., Finley, A. O., Bateman, B. L., Grand, J., Reault, S., Weed, A. S., & Zipkin, E. F. (2024B). Guidelines for the use of spatially varying coefficients in species distribution models. Global Ecology and Biogeography, 33(4), e13814. doi:10.1111/geb.13814.
Bates, Douglas, Martin Maechler, Ben Bolker, Steve Walker (2015). Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software, 67(1), 1-48. doi:10.18637/jss.v067.i01.
Datta, A., S. Banerjee, A.O. Finley, and A.E. Gelfand. (2016) Hierarchical Nearest-Neighbor Gaussian process models for large geostatistical datasets. Journal of the American Statistical Association, doi:10.1080/01621459.2015.1044091.
Finley, A.O., A. Datta, B.D. Cook, D.C. Morton, H.E. Andersen, and S. Banerjee. (2019) Efficient algorithms for Bayesian Nearest Neighbor Gaussian Processes. Journal of Computational and Graphical Statistics, doi:10.1080/10618600.2018.1537924.
Polson, N.G., J.G. Scott, and J. Windle. (2013) Bayesian Inference for Logistic Models Using Polya-Gamma Latent Variables. Journal of the American Statistical Association, 108:1339-1349.
Roberts, G.O. and Rosenthal J.S. (2009) Examples of adaptive MCMC. Journal of Computational and Graphical Statistics, 18(2):349-367.
set.seed(1000)
# Sites
J.x <- 15
J.y <- 15
J <- J.x * J.y
# Years sampled
n.time <- sample(10, J, replace = TRUE)
# Binomial weights
weights <- matrix(NA, J, max(n.time))
for (j in 1:J) {
  weights[j, 1:n.time[j]] <- sample(5, n.time[j], replace = TRUE)
}
# Occurrence --------------------------
beta <- c(-2, -0.5, -0.2, 0.75)
p.occ <- length(beta)
trend <- TRUE
sp.only <- 0
psi.RE <- list()
# Spatial parameters ------------------
sp <- TRUE
svc.cols <- c(1, 2, 3)
p.svc <- length(svc.cols)
cov.model <- "exponential"
sigma.sq <- runif(p.svc, 0.1, 1)
phi <- runif(p.svc, 3/1, 3/0.2)
# Temporal parameters -----------------
ar1 <- TRUE
rho <- 0.8
sigma.sq.t <- 1
# Get all the data
dat <- simTBinom(J.x = J.x, J.y = J.y, n.time = n.time, weights = weights,
                 beta = beta, psi.RE = psi.RE, sp.only = sp.only, trend = trend,
                 sp = sp, svc.cols = svc.cols, cov.model = cov.model,
                 sigma.sq = sigma.sq, phi = phi, rho = rho,
                 sigma.sq.t = sigma.sq.t, ar1 = TRUE, x.positive = FALSE)
# Prep the data for spOccupancy -------------------------------------------
y <- dat$y
X <- dat$X
X.re <- dat$X.re
coords <- dat$coords
# Package all data into a list
covs <- list(int = X[, , 1],
             trend = X[, , 2],
             cov.1 = X[, , 3],
             cov.2 = X[, , 4])
# Data list bundle
data.list <- list(y = y,
                  covs = covs,
                  weights = weights,
                  coords = coords)
# Priors
prior.list <- list(beta.normal = list(mean = 0, var = 2.72),
                   sigma.sq.ig = list(a = 2, b = 1),
                   phi.unif = list(a = 3/1, b = 3/.1),
                   sigma.sq.t.ig = c(2, 0.5),
                   rho.unif = c(-1, 1))
# Starting values
inits.list <- list(beta = beta, alpha = 0,
                   sigma.sq = 1, phi = 3 / 0.5,
                   sigma.sq.t = 0.5, rho = 0)
# Tuning
tuning.list <- list(phi = 0.4, nu = 0.3, rho = 0.2)
# MCMC settings
n.batch <- 2
n.burn <- 0
n.thin <- 1
# Note that this is just a test case and more iterations/chains may need to
# be run to ensure convergence.
out <- svcTPGBinom(formula = ~ trend + cov.1 + cov.2,
                   svc.cols = svc.cols,
                   data = data.list,
                   n.batch = n.batch,
                   batch.length = 25,
                   inits = inits.list,
                   priors = prior.list,
                   accept.rate = 0.43,
                   cov.model = "exponential",
                   ar1 = TRUE,
                   tuning = tuning.list,
                   n.omp.threads = 1,
                   verbose = TRUE,
                   NNGP = TRUE,
                   n.neighbors = 5,
                   n.report = 1,
                   n.burn = n.burn,
                   n.thin = n.thin,
                   n.chains = 1)
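The phi.unif bounds in the example above are written as 3 divided by a distance. One way to read this, assuming the usual form of the exponential correlation function exp(-phi * d), is that 3 / phi is the distance at which spatial correlation drops to about 0.05 (the effective range). The helper exp.cor below is hypothetical and only illustrates that arithmetic.

```r
# Prior bounds on phi copied from the example above.
phi.lower <- 3 / 1
phi.upper <- 3 / 0.1

# Exponential correlation at distance d for decay parameter phi (assumed form).
exp.cor <- function(d, phi) exp(-phi * d)

# Effective ranges implied by the prior bounds.
range.max <- 3 / phi.lower   # largest effective range allowed by the prior
range.min <- 3 / phi.upper   # smallest effective range allowed by the prior

# Correlation remaining at the effective range: exp(-3), roughly 0.05.
cor.at.range <- exp.cor(range.max, phi.lower)
```

Under this reading, the prior restricts the effective range to lie between 0.1 and 1 in the units of the coordinates.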
Function for fitting multi-season single-species spatially-varying coefficient occupancy models using Polya-Gamma latent variables. Models are fit using Nearest Neighbor Gaussian Processes.
svcTPGOcc(occ.formula, det.formula, data, inits, priors, tuning, svc.cols = 1, cov.model = 'exponential', NNGP = TRUE, n.neighbors = 15, search.type = 'cb', n.batch, batch.length, accept.rate = 0.43, n.omp.threads = 1, verbose = TRUE, ar1 = FALSE, n.report = 100, n.burn = round(.10 * n.batch * batch.length), n.thin = 1, n.chains = 1, k.fold, k.fold.threads = 1, k.fold.seed = 100, k.fold.only = FALSE, ...)
occ.formula |
a symbolic description of the model to be fit for the occurrence portion of the model using R's model syntax. Only right-hand side of formula is specified. See example below. Random intercepts are allowed using lme4 syntax (Bates et al. 2015). |
det.formula |
a symbolic description of the model to be fit for the detection portion of the model using R's model syntax. Only right-hand side of formula is specified. See example below. Random intercepts are allowed using lme4 syntax (Bates et al. 2015). |
data |
a list containing data necessary for model fitting.
Valid tags are |
inits |
a list with each tag corresponding to a parameter name.
Valid tags are |
priors |
a list with each tag corresponding to a parameter name.
Valid tags are |
tuning |
a list with each tag corresponding to a parameter
name. Valid tags are |
svc.cols |
a vector indicating the variables whose effects will be
estimated as spatially-varying coefficients. |
cov.model |
a quoted keyword that specifies the covariance
function used to model the spatial dependence structure among the
observations. Supported covariance model key words are:
|
NNGP |
if |
n.neighbors |
number of neighbors used in the NNGP. Only used if
|
search.type |
a quoted keyword that specifies the type of nearest
neighbor search algorithm. Supported method key words are: |
n.batch |
the number of MCMC batches in each chain to run for the Adaptive MCMC sampler. See Roberts and Rosenthal (2009) for details. |
batch.length |
the length of each MCMC batch in each chain to run for the Adaptive MCMC sampler. See Roberts and Rosenthal (2009) for details. |
accept.rate |
target acceptance rate for Adaptive MCMC. Default is 0.43. See Roberts and Rosenthal (2009) for details. |
n.omp.threads |
a positive integer indicating the number of threads
to use for SMP parallel processing within chains. The package must be compiled for
OpenMP support. For most Intel-based machines, we recommend setting
|
verbose |
if |
ar1 |
logical value indicating whether to include an AR(1) zero-mean
temporal random effect in the model. If |
n.report |
the interval to report MCMC progress. |
n.burn |
the number of samples out of the total |
n.thin |
the thinning interval for collection of MCMC samples. The
thinning occurs after the |
n.chains |
the number of chains to run. |
k.fold |
specifies the number of k folds for cross-validation.
If not specified as an argument, then cross-validation is not performed
and |
k.fold.threads |
number of threads to use for cross-validation. If
|
k.fold.seed |
seed used to split data set into |
k.fold.only |
a logical value indicating whether to only perform
cross-validation ( |
... |
currently no additional arguments |
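The n.batch, batch.length, and accept.rate arguments refer to the batch-based adaptive scheme of Roberts and Rosenthal (2009). The following base R sketch shows the general idea on a toy target (a standard normal): after each batch, the log of the proposal standard deviation is nudged up or down depending on whether the batch acceptance rate was above or below the target. The target density, starting values, and variable names are assumptions for illustration, not the package's internal code.

```r
set.seed(10)
n.batch <- 200
batch.length <- 25
accept.rate <- 0.43

log.tuning <- log(5)   # deliberately poor starting proposal sd (log scale)
x <- 0                 # current state of the chain
batch.accept <- numeric(n.batch)

for (b in 1:n.batch) {
  n.accept <- 0
  for (i in 1:batch.length) {
    # Random-walk Metropolis proposal with the current tuning value.
    x.prop <- rnorm(1, x, exp(log.tuning))
    log.ratio <- dnorm(x.prop, log = TRUE) - dnorm(x, log = TRUE)
    if (log(runif(1)) < log.ratio) {
      x <- x.prop
      n.accept <- n.accept + 1
    }
  }
  batch.accept[b] <- n.accept / batch.length
  # Diminishing adaptation step, as in Roberts and Rosenthal (2009).
  delta <- min(0.01, 1 / sqrt(b))
  if (batch.accept[b] > accept.rate) {
    log.tuning <- log.tuning + delta   # accepting too often: take bigger steps
  } else {
    log.tuning <- log.tuning - delta   # accepting too rarely: take smaller steps
  }
}
```

Because each adjustment is small, short runs such as the two-batch test case in the example cannot adapt much, which is one reason more batches are recommended for real analyses.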
An object of class svcTPGOcc that is a list comprised of:
beta.samples |
a |
alpha.samples |
a |
z.samples |
a three-dimensional array of posterior samples for the latent occupancy values, with dimensions corresponding to posterior sample, site, and primary time period. |
psi.samples |
a three-dimensional array of posterior samples for the latent occupancy probability values, with dimensions corresponding to posterior sample, site, and primary time period. |
theta.samples |
a |
w.samples |
a three-dimensional array of posterior samples for the latent spatial random effects for all spatially-varying coefficients. Dimensions correspond to MCMC sample, coefficient, and sites. |
sigma.sq.psi.samples |
a |
sigma.sq.p.samples |
a |
beta.star.samples |
a |
alpha.star.samples |
a |
eta.samples |
a |
like.samples |
a three-dimensional array of posterior samples for the likelihood values associated with each site and primary time period. Used for calculating WAIC. |
rhat |
a list of Gelman-Rubin diagnostic values for some of the model parameters. |
ESS |
a list of effective sample sizes for some of the model parameters. |
run.time |
execution time reported using |
k.fold.deviance |
scoring rule (deviance) from k-fold cross-validation.
Only included if |
The return object will include additional objects used for
subsequent prediction and/or model fit evaluation. Note that estimated
detection probabilities are not included in the model object, but can be
extracted using fitted(). Note that if k.fold.only = TRUE, the
return list object will only contain run.time and k.fold.deviance.
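For readers unfamiliar with k-fold cross-validation, the sketch below shows one simple way that sites could be split into k.fold groups reproducibly from k.fold.seed. This is an assumed splitting scheme for illustration; the package's internal assignment of sites to folds may differ.

```r
k.fold <- 4
k.fold.seed <- 100
J <- 10   # number of sites (toy value)

set.seed(k.fold.seed)
# Random permutation of the site indices.
site.order <- sample(seq_len(J))
# Deal the shuffled sites into k.fold groups of nearly equal size.
folds <- split(site.order, rep(seq_len(k.fold), length.out = J))
fold.sizes <- lengths(folds)
```

Each fold would be held out in turn, the model refit to the remaining sites, and the deviance computed on the held-out sites.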
Some of the underlying code used for generating random numbers from the Polya-Gamma distribution is taken from the pgdraw package written by Daniel F. Schmidt and Enes Makalic. Their code implements Algorithm 6 in the PhD thesis of Jesse Bennett Windle (2013) https://repositories.lib.utexas.edu/handle/2152/21842.
Jeffrey W. Doser,
Andrew O. Finley
Polson, N.G., J.G. Scott, and J. Windle. (2013) Bayesian Inference for Logistic Models Using Polya-Gamma Latent Variables. Journal of the American Statistical Association, 108:1339-1349.
Bates, Douglas, Martin Maechler, Ben Bolker, Steve Walker (2015). Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software, 67(1), 1-48. doi:10.18637/jss.v067.i01.
Doser, J. W., Finley, A. O., Saunders, S. P., Kery, M., Weed, A. S., & Zipkin, E. F. (2024). Modeling complex species-environment relationships through spatially-varying coefficient occupancy models. Journal of Agricultural, Biological and Environmental Statistics. doi:10.1007/s13253-023-00595-6.
Doser, J. W., Kery, M., Saunders, S. P., Finley, A. O., Bateman, B. L., Grand, J., Reault, S., Weed, A. S., & Zipkin, E. F. (2024). Guidelines for the use of spatially varying coefficients in species distribution models. Global Ecology and Biogeography, 33(4), e13814. doi:10.1111/geb.13814.
set.seed(1000)
# Sites
J.x <- 15
J.y <- 15
J <- J.x * J.y
# Years sampled
n.time <- sample(10, J, replace = TRUE)
n.time.max <- max(n.time)
# Replicates
n.rep <- matrix(NA, J, max(n.time))
for (j in 1:J) {
  n.rep[j, 1:n.time[j]] <- sample(4, n.time[j], replace = TRUE)
}
# Occurrence --------------------------
beta <- c(-2, -0.5, -0.2, 0.75)
trend <- TRUE
sp.only <- 0
psi.RE <- list()
# Detection ---------------------------
alpha <- c(1, 0.7, -0.5)
p.RE <- list()
# Spatial parameters ------------------
sp <- TRUE
svc.cols <- c(1, 2, 3)
p.svc <- length(svc.cols)
cov.model <- "exponential"
sigma.sq <- runif(p.svc, 0.1, 1)
phi <- runif(p.svc, 3 / 1, 3 / 0.2)
rho <- 0.8
sigma.sq.t <- 1
ar1 <- TRUE
x.positive <- FALSE
# Get all the data
dat <- simTOcc(J.x = J.x, J.y = J.y, n.time = n.time, n.rep = n.rep,
               beta = beta, alpha = alpha, sp.only = sp.only, trend = trend,
               psi.RE = psi.RE, p.RE = p.RE, sp = sp, cov.model = cov.model,
               sigma.sq = sigma.sq, phi = phi, svc.cols = svc.cols,
               ar1 = ar1, rho = rho, sigma.sq.t = sigma.sq.t,
               x.positive = x.positive)
# Prep the data for svcTPGOcc ---------------------------------------------
# Full data set
y <- dat$y
X <- dat$X
X.re <- dat$X.re
X.p <- dat$X.p
X.p.re <- dat$X.p.re
coords <- dat$coords
# Package all data into a list
occ.covs <- list(int = X[, , 1],
                 trend = X[, , 2],
                 occ.cov.1 = X[, , 3],
                 occ.cov.2 = X[, , 4])
# Detection
det.covs <- list(det.cov.1 = X.p[, , , 2],
                 det.cov.2 = X.p[, , , 3])
# Data list bundle
data.list <- list(y = y,
                  occ.covs = occ.covs,
                  det.covs = det.covs,
                  coords = coords)
# Priors
prior.list <- list(beta.normal = list(mean = 0, var = 2.72),
                   alpha.normal = list(mean = 0, var = 2.72),
                   phi.unif = list(a = 3/1, b = 3/.1))
# Starting values
z.init <- apply(y, c(1, 2), function(a) as.numeric(sum(a, na.rm = TRUE) > 0))
inits.list <- list(beta = 0, alpha = 0, sigma.sq = 1, phi = 3 / 0.5,
                   z = z.init, nu = 1)
# Tuning
tuning.list <- list(phi = 0.4, nu = 0.3, rho = 0.5, sigma.sq = 0.5)
# MCMC settings
n.batch <- 2
n.burn <- 0
n.thin <- 1
# Run the model
# Note that this is just a test case and more iterations/chains may need to
# be run to ensure convergence.
out <- svcTPGOcc(occ.formula = ~ trend + occ.cov.1 + occ.cov.2,
                 det.formula = ~ det.cov.1 + det.cov.2,
                 data = data.list,
                 inits = inits.list,
                 tuning = tuning.list,
                 priors = prior.list,
                 cov.model = "exponential",
                 svc.cols = svc.cols,
                 NNGP = TRUE,
                 ar1 = TRUE,
                 n.neighbors = 5,
                 n.batch = n.batch,
                 batch.length = 25,
                 verbose = TRUE,
                 n.report = 25,
                 n.burn = n.burn,
                 n.thin = n.thin,
                 n.chains = 1)
Function for fitting single-species multi-season integrated occupancy models using Polya-Gamma latent variables. Data integration is done using a joint likelihood framework, assuming distinct detection models for each data source that are each conditional on a single latent occurrence process.
tIntPGOcc(occ.formula, det.formula, data, inits, priors, tuning, n.batch, batch.length, accept.rate = 0.43, n.omp.threads = 1, verbose = TRUE, ar1 = FALSE, n.report = 100, n.burn = round(.10 * n.batch * batch.length), n.thin = 1, n.chains = 1, ...)
occ.formula |
a symbolic description of the model to be fit for the occurrence portion of the model using R's model syntax. Only right-hand side of formula is specified. See example below. Random intercepts are allowed using lme4 syntax (Bates et al. 2015). |
det.formula |
a list of symbolic descriptions of the models to be fit for the detection portion of the model using R's model syntax for each data set. Each element in the list is a formula for the detection model of a given data set. Only right-hand side of formula is specified. See example below. Random intercepts are allowed using lme4 syntax (Bates et al. 2015). |
data |
a list containing data necessary for model fitting.
Valid tags are |
inits |
a list with each tag corresponding to a parameter name.
Valid tags are |
priors |
a list with each tag corresponding to a parameter name.
Valid tags are |
tuning |
a list with each tag corresponding to a parameter
name. Valid tags are |
n.batch |
the number of MCMC batches in each chain to run for the Adaptive MCMC sampler. See Roberts and Rosenthal (2009) for details. |
batch.length |
the length of each MCMC batch in each chain to run for the Adaptive MCMC sampler. See Roberts and Rosenthal (2009) for details. |
accept.rate |
target acceptance rate for Adaptive MCMC. Default is 0.43. See Roberts and Rosenthal (2009) for details. |
n.omp.threads |
a positive integer indicating the number of threads
to use for SMP parallel processing within chains. This will have no impact on
model run times for non-spatial models. The package must be compiled for
OpenMP support. For most Intel-based machines, we recommend setting
|
verbose |
if |
ar1 |
logical value indicating whether to include an AR(1) zero-mean
temporal random effect in the model. If |
n.report |
the interval to report MCMC progress. Note this is specified in terms of batches, not MCMC samples. |
n.burn |
the number of samples out of the total |
n.thin |
the thinning interval for collection of MCMC samples. The
thinning occurs after the |
n.chains |
the number of chains to run. |
... |
currently no additional arguments |
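The arithmetic linking n.batch, batch.length, n.burn, and n.thin can be sketched in base R. This is a minimal illustration only: n.kept is a hypothetical helper, not a package function, and it assumes burn-in samples are dropped first and every n.thin-th remaining sample is then kept, which may differ in detail from the package's internal indexing.

```r
# Hypothetical helper (not part of spOccupancy): number of posterior samples
# kept per chain, assuming burn-in is dropped first and then every n.thin-th
# sample is retained.
n.kept <- function(n.batch, batch.length,
                   n.burn = round(0.10 * n.batch * batch.length),
                   n.thin = 1) {
  n.samples <- n.batch * batch.length
  stopifnot(n.burn < n.samples)
  length(seq(from = n.burn + 1, to = n.samples, by = n.thin))
}

# Settings from the short example run: 3 * 25 = 75 samples, 25 burned.
n.kept(n.batch = 3, batch.length = 25, n.burn = 25, n.thin = 1)   # 50
# A longer run with the default 10% burn-in and thinning by 10.
n.kept(n.batch = 400, batch.length = 25, n.thin = 10)             # 900
```

With n.chains greater than 1, the same count would apply to each chain under these assumptions.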
An object of class tIntPGOcc that is a list comprised of:
beta.samples |
a |
alpha.samples |
a |
z.samples |
a three-dimensional array of posterior samples for the latent occupancy values, with dimensions corresponding to posterior sample, site, and primary time period. Note this object will contain predicted occupancy values for sites/primary time periods that were not sampled. |
psi.samples |
a three-dimensional array of posterior samples for the latent occupancy probability values, with dimensions corresponding to posterior sample, site, and primary time period. Note this object will contain predicted occupancy probabilities for sites/primary time periods that were not sampled. |
sigma.sq.psi.samples |
a |
sigma.sq.p.samples |
a |
beta.star.samples |
a |
alpha.star.samples |
a |
theta.samples |
a |
eta.samples |
a |
p.samples |
a list of four-dimensional arrays consisting of the posterior samples of detection probability for each data source. For each data source, the dimensions of the four-dimensional array correspond to MCMC sample, site, season, and replicate within season. |
like.samples |
a two-dimensional array of posterior samples for the likelihood values associated with each site and primary time period, for each individual data source. Used for calculating WAIC. |
rhat |
a list of Gelman-Rubin diagnostic values for some of the model parameters. |
ESS |
a list of effective sample sizes for some of the model parameters. |
run.time |
execution time reported using |
The return object will include additional objects used for subsequent prediction and/or model fit evaluation.
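The three-dimensional arrays described above can be summarized with apply(). The sketch below uses a simulated stand-in for psi.samples (arbitrary probabilities, not model output); the object names and dimensions are assumptions chosen for illustration.

```r
set.seed(1)
n.post <- 200  # posterior samples
J <- 4         # sites
n.time <- 3    # primary time periods

# Stand-in for out$psi.samples (posterior sample x site x primary time period),
# filled with arbitrary probabilities rather than real model output.
psi.samples <- array(runif(n.post * J * n.time), dim = c(n.post, J, n.time))

# Posterior mean and 95% credible interval for each site and time period.
psi.mean <- apply(psi.samples, c(2, 3), mean)
psi.ci <- apply(psi.samples, c(2, 3), quantile, probs = c(0.025, 0.975))

dim(psi.mean)  # J x n.time
dim(psi.ci)    # 2 x J x n.time (lower and upper bounds first)
```

The same pattern should carry over to z.samples, where the mean over the first dimension is the posterior probability that a site was occupied in a given period.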
Some of the underlying code used for generating random numbers from the Polya-Gamma distribution is taken from the pgdraw package written by Daniel F. Schmidt and Enes Makalic. Their code implements Algorithm 6 in the PhD thesis of Jesse Bennett Windle (2013) https://repositories.lib.utexas.edu/handle/2152/21842.
Jeffrey W. Doser [email protected]
Polson, N.G., J.G. Scott, and J. Windle. (2013) Bayesian Inference for Logistic Models Using Polya-Gamma Latent Variables. Journal of the American Statistical Association, 108:1339-1349.
Roberts, G.O. and Rosenthal J.S. (2009) Examples of adaptive MCMC. Journal of Computational and Graphical Statistics, 18(2):349-367.
Bates, Douglas, Martin Maechler, Ben Bolker, Steve Walker (2015). Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software, 67(1), 1-48. doi:10.18637/jss.v067.i01.
set.seed(332)

# Simulate Data -----------------------------------------------------------
# Number of locations in each direction. This is the total region of interest
# where some sites may or may not have a data source.
J.x <- 15
J.y <- 15
J.all <- J.x * J.y
# Number of data sources.
n.data <- 3
# Sites for each data source.
J.obs <- sample(ceiling(0.2 * J.all):ceiling(0.4 * J.all), n.data, replace = TRUE)
# Maximum number of years for each data set
n.time.max <- c(4, 8, 10)
# Number of years each site in each data set is sampled
n.time <- list()
for (i in 1:n.data) {
  n.time[[i]] <- sample(1:n.time.max[i], J.obs[i], replace = TRUE)
}
# Replicates for each data source.
n.rep <- list()
for (i in 1:n.data) {
  n.rep[[i]] <- matrix(NA, J.obs[i], n.time.max[i])
  for (j in 1:J.obs[i]) {
    n.rep[[i]][j, sample(1:n.time.max[i], n.time[[i]][j], replace = FALSE)] <-
      sample(1:4, n.time[[i]][j], replace = TRUE)
  }
}
# Total number of years across all data sets
n.time.total <- 10
# List denoting the specific years each data set was sampled during.
data.seasons <- list()
for (i in 1:n.data) {
  data.seasons[[i]] <- sort(sample(1:n.time.total, n.time.max[i], replace = FALSE))
}
# Occupancy covariates
beta <- c(0, 0.4, 0.3)
trend <- TRUE
# Random occupancy covariates
psi.RE <- list(levels = c(20),
               sigma.sq.psi = c(0.6))
p.occ <- length(beta)
# Detection covariates
alpha <- list()
alpha[[1]] <- c(0, 0.2, -0.5)
alpha[[2]] <- c(-1, 0.5, 0.3, -0.8)
alpha[[3]] <- c(-0.5, 1)
p.RE <- list()
p.det.long <- sapply(alpha, length)
p.det <- sum(p.det.long)

# Simulate occupancy data.
dat <- simTIntOcc(n.data = n.data, J.x = J.x, J.y = J.y, J.obs = J.obs,
                  n.time = n.time, data.seasons = data.seasons, n.rep = n.rep,
                  beta = beta, alpha = alpha, trend = trend,
                  psi.RE = psi.RE, p.RE = p.RE)

y <- dat$y
X <- dat$X.obs
X.re <- dat$X.re.obs
X.p <- dat$X.p
sites <- dat$sites

# Package all data into a list
occ.covs <- list(trend = X[, , 2],
                 occ.cov.1 = X[, , 3],
                 occ.factor.1 = X.re[, , 1])
det.covs <- list()
# Add covariates one by one
det.covs[[1]] <- list(det.cov.1.1 = X.p[[1]][, , , 2],
                      det.cov.1.2 = X.p[[1]][, , , 3])
det.covs[[2]] <- list(det.cov.2.1 = X.p[[2]][, , , 2],
                      det.cov.2.2 = X.p[[2]][, , , 3],
                      det.cov.2.3 = X.p[[2]][, , , 4])
det.covs[[3]] <- list(det.cov.3.1 = X.p[[3]][, , , 2])
data.list <- list(y = y, occ.covs = occ.covs, det.covs = det.covs,
                  sites = sites, seasons = data.seasons)

# Testing
occ.formula <- ~ trend + occ.cov.1 + (1 | occ.factor.1)
# Note that the names are not necessary.
det.formula <- list(f.1 = ~ det.cov.1.1 + det.cov.1.2,
                    f.2 = ~ det.cov.2.1 + det.cov.2.2 + det.cov.2.3,
                    f.3 = ~ det.cov.3.1)

# NOTE: this is a short run of the model, in reality we would run the
# model for much longer.
out <- tIntPGOcc(occ.formula = occ.formula,
                 det.formula = det.formula,
                 data = data.list,
                 n.batch = 3,
                 batch.length = 25,
                 n.report = 1,
                 n.burn = 25,
                 n.thin = 1,
                 n.chains = 1)
summary(out)
The function tMsPGOcc fits multi-species multi-season occupancy models using Polya-Gamma data augmentation.
tMsPGOcc(occ.formula, det.formula, data, inits, priors, tuning, n.batch, batch.length, accept.rate = 0.43, n.omp.threads = 1, verbose = TRUE, ar1 = FALSE, n.report = 100, n.burn = round(.10 * n.batch * batch.length), n.thin = 1, n.chains = 1, ...)
occ.formula |
a symbolic description of the model to be fit for the occurrence portion of the model using R's model syntax. Random intercepts are allowed using lme4 syntax (Bates et al. 2015). Only right-hand side of formula is specified. See example below. |
det.formula |
a symbolic description of the model to be fit for the detection portion of the model using R's model syntax. Only right-hand side of formula is specified. See example below. Random intercepts are allowed using lme4 syntax (Bates et al. 2015). |
data |
a list containing data necessary for model fitting.
Valid tags are |
inits |
a list with each tag corresponding to a parameter name.
Valid tags are |
priors |
a list with each tag corresponding to a parameter name.
Valid tags are |
tuning |
a list with each tag corresponding to a parameter
name. Valid tags are |
n.batch |
the number of MCMC batches in each chain to run for the Adaptive MCMC sampler. See Roberts and Rosenthal (2009) for details. |
batch.length |
the length of each MCMC batch to run for the Adaptive MCMC sampler. See Roberts and Rosenthal (2009) for details. |
accept.rate |
target acceptance rate for Adaptive MCMC. Default is 0.43. See Roberts and Rosenthal (2009) for details. |
n.omp.threads |
a positive integer indicating
the number of threads to use for SMP parallel processing within chains. This will have no
impact on model run times for non-spatial models. The package must
be compiled for OpenMP support. For most Intel-based machines, we
recommend setting |
verbose |
if |
ar1 |
logical value indicating whether to include an AR(1) zero-mean
temporal random effect in the model for each species. If |
n.report |
the interval to report Metropolis sampler acceptance and MCMC progress. Note this is specified in terms of batches and not overall samples for spatial models. |
n.burn |
the number of samples out of the total |
n.thin |
the thinning interval for collection of MCMC samples. The
thinning occurs after the |
n.chains |
the number of chains to run in sequence. |
... |
currently no additional arguments |
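To illustrate how n.batch, batch.length, and accept.rate interact, here is a base R sketch of batch-wise adaptive Metropolis tuning in the style of Roberts and Rosenthal (2009). It targets a toy standard normal density and is not the package's sampler; the target, starting value, and variable names other than the three argument names are assumptions made for illustration.

```r
set.seed(42)
log.target <- function(x) dnorm(x, log = TRUE)  # toy target: standard normal

n.batch <- 200
batch.length <- 25
accept.rate <- 0.43
log.tuning <- log(0.05)  # deliberately poor starting proposal sd (log scale)
x <- 0
batch.accept <- numeric(n.batch)

for (b in 1:n.batch) {
  n.accept <- 0
  for (s in 1:batch.length) {
    # Random-walk proposal with the current tuning value as its sd.
    x.prop <- rnorm(1, x, exp(log.tuning))
    if (log(runif(1)) < log.target(x.prop) - log.target(x)) {
      x <- x.prop
      n.accept <- n.accept + 1
    }
  }
  batch.accept[b] <- n.accept / batch.length
  # After each batch, nudge the log proposal sd up if the sampler accepted
  # more often than the target rate, and down otherwise.
  delta <- min(0.01, 1 / sqrt(b))
  log.tuning <- log.tuning + ifelse(batch.accept[b] > accept.rate, delta, -delta)
}

exp(log.tuning)     # proposal sd after adaptation
mean(batch.accept)  # average acceptance rate across batches
```

Because each adjustment is small, many batches are needed before the acceptance rate settles near the target, which is one reason real runs use far more batches than the short examples in this manual.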
An object of class tMsPGOcc that is a list comprised of:
beta.comm.samples |
a |
alpha.comm.samples |
a |
tau.sq.beta.samples |
a |
tau.sq.alpha.samples |
a |
beta.samples |
a |
alpha.samples |
a |
theta.samples |
a |
eta.samples |
a three-dimensional array of posterior samples for the species-specific AR(1) random effects for each primary time period. Dimensions correspond to MCMC sample, species, and primary time period. |
z.samples |
a four-dimensional array of posterior samples for the latent occurrence values for each species. Dimensions correspond to MCMC sample, species, site, and primary time period. |
psi.samples |
a four-dimensional array of posterior samples for the latent occupancy probability values for each species. Dimensions correspond to MCMC sample, species, site, and primary time period. |
sigma.sq.psi.samples |
a |
sigma.sq.p.samples |
a |
beta.star.samples |
a |
alpha.star.samples |
a |
like.samples |
a four-dimensional array of posterior samples for the likelihood value used for calculating WAIC. Dimensions correspond to MCMC sample, species, site, and time period. |
rhat |
a list of Gelman-Rubin diagnostic values for some of the model parameters. |
ESS |
a list of effective sample sizes for some of the model parameters. |
run.time |
MCMC sampler execution time reported using |
The return object will include additional objects used for subsequent prediction and/or model fit evaluation. Note that detection probability estimated values are not included in the model object, but can be extracted using fitted().
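One derived quantity that the four-dimensional z.samples array makes available is species richness, obtained by summing latent occurrence over species within each posterior sample. The sketch below uses a simulated stand-in for z.samples; the values and dimensions are illustrative assumptions, not model output.

```r
set.seed(7)
n.post <- 100  # posterior samples
N <- 5         # species
J <- 6         # sites
n.time <- 3    # primary time periods

# Stand-in for out$z.samples (MCMC sample x species x site x time period).
z.samples <- array(rbinom(n.post * N * J * n.time, 1, 0.4),
                   dim = c(n.post, N, J, n.time))

# Sum over the species dimension within each posterior sample to get
# richness at every site and primary time period.
rich.samples <- apply(z.samples, c(1, 3, 4), sum)

# Posterior mean richness by site and time period.
rich.mean <- apply(rich.samples, c(2, 3), mean)

dim(rich.samples)  # n.post x J x n.time
dim(rich.mean)     # J x n.time
```

Summing within each sample before averaging keeps the full posterior distribution of richness, so credible intervals can be computed from rich.samples directly.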
Some of the underlying code used for generating random numbers from the Polya-Gamma distribution is taken from the pgdraw package written by Daniel F. Schmidt and Enes Makalic. Their code implements Algorithm 6 in the PhD thesis of Jesse Bennett Windle (2013) https://repositories.lib.utexas.edu/handle/2152/21842.
Jeffrey W. Doser [email protected],
Andrew O. Finley [email protected]
Polson, N.G., J.G. Scott, and J. Windle. (2013) Bayesian Inference for Logistic Models Using Polya-Gamma Latent Variables. Journal of the American Statistical Association, 108:1339-1349.
Roberts, G.O. and Rosenthal J.S. (2009) Examples of adaptive MCMC. Journal of Computational and Graphical Statistics, 18(2):349-367.
Bates, Douglas, Martin Maechler, Ben Bolker, Steve Walker (2015). Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software, 67(1), 1-48. doi:10.18637/jss.v067.i01.
Kery, M., & Royle, J. A. (2021). Applied hierarchical modeling in ecology: Analysis of distribution, abundance and species richness in R and BUGS: Volume 2: Dynamic and advanced models. Academic Press. Section 4.6.
# Simulate Data -----------------------------------------------------------
set.seed(500)
J.x <- 8
J.y <- 8
J <- J.x * J.y
# Years sampled
n.time <- sample(3:10, J, replace = TRUE)
# n.time <- rep(10, J)
n.time.max <- max(n.time)
# Replicates
n.rep <- matrix(NA, J, max(n.time))
for (j in 1:J) {
  n.rep[j, 1:n.time[j]] <- sample(2:4, n.time[j], replace = TRUE)
}
N <- 7
# Community-level covariate effects
# Occurrence
beta.mean <- c(-3, -0.2, 0.5)
trend <- FALSE
sp.only <- 0
p.occ <- length(beta.mean)
tau.sq.beta <- c(0.6, 1.5, 1.4)
# Detection
alpha.mean <- c(0, 1.2, -1.5)
tau.sq.alpha <- c(1, 0.5, 2.3)
p.det <- length(alpha.mean)
# Random effects
psi.RE <- list()
p.RE <- list()
# Draw species-level effects from community means.
beta <- matrix(NA, nrow = N, ncol = p.occ)
alpha <- matrix(NA, nrow = N, ncol = p.det)
for (i in 1:p.occ) {
  beta[, i] <- rnorm(N, beta.mean[i], sqrt(tau.sq.beta[i]))
}
for (i in 1:p.det) {
  alpha[, i] <- rnorm(N, alpha.mean[i], sqrt(tau.sq.alpha[i]))
}
sp <- FALSE

dat <- simTMsOcc(J.x = J.x, J.y = J.y, n.time = n.time, n.rep = n.rep, N = N,
                 beta = beta, alpha = alpha, sp.only = sp.only, trend = trend,
                 psi.RE = psi.RE, p.RE = p.RE, sp = sp)

y <- dat$y
X <- dat$X
X.p <- dat$X.p
X.re <- dat$X.re
X.p.re <- dat$X.p.re

occ.covs <- list(occ.cov.1 = X[, , 2],
                 occ.cov.2 = X[, , 3])
det.covs <- list(det.cov.1 = X.p[, , , 2],
                 det.cov.2 = X.p[, , , 3])

data.list <- list(y = y, occ.covs = occ.covs,
                  det.covs = det.covs)
# Priors
prior.list <- list(beta.comm.normal = list(mean = 0, var = 2.72),
                   alpha.comm.normal = list(mean = 0, var = 2.72),
                   tau.sq.beta.ig = list(a = 0.1, b = 0.1),
                   tau.sq.alpha.ig = list(a = 0.1, b = 0.1))
z.init <- apply(y, c(1, 2, 3), function(a) as.numeric(sum(a, na.rm = TRUE) > 0))
inits.list <- list(alpha.comm = 0, beta.comm = 0, beta = 0,
                   alpha = 0, tau.sq.beta = 1, tau.sq.alpha = 1,
                   z = z.init)
# Tuning
tuning.list <- list(phi = 1)
# Number of batches
n.batch <- 5
# Batch length
batch.length <- 25
n.burn <- 25
n.thin <- 1
n.samples <- n.batch * batch.length

# Note that this is just a test case and more iterations/chains may need to
# be run to ensure convergence.
out <- tMsPGOcc(occ.formula = ~ occ.cov.1 + occ.cov.2,
                det.formula = ~ det.cov.1 + det.cov.2,
                data = data.list,
                inits = inits.list,
                n.batch = n.batch,
                batch.length = batch.length,
                accept.rate = 0.43,
                priors = prior.list,
                n.omp.threads = 1,
                verbose = TRUE,
                n.report = 1,
                n.burn = n.burn,
                n.thin = n.thin,
                n.chains = 1)
summary(out)
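The z.init line in the example collapses the replicate dimension of the detection array into an initial occupancy state. The toy array below (dimensions and values are made up for illustration) shows what that apply() call does, including how missing visits are handled.

```r
# Toy detection array: 2 species x 3 sites x 2 primary periods x 2 replicates.
y <- array(0, dim = c(2, 3, 2, 2))
y[1, 1, 1, 2] <- 1         # species 1 detected at site 1, period 1
y[2, 3, 2, ] <- c(1, NA)   # species 2 detected once; second visit missing
y[1, 2, 2, ] <- NA         # species 1 at site 2 not surveyed in period 2

# Initial latent state is 1 wherever there was at least one detection.
z.init <- apply(y, c(1, 2, 3), function(a) as.numeric(sum(a, na.rm = TRUE) > 0))

z.init[1, 1, 1]  # 1
z.init[2, 3, 2]  # 1
z.init[1, 2, 2]  # 0: sum over all-NA visits is 0 with na.rm = TRUE
```

Starting the latent state at 1 wherever a detection occurred keeps the initial values consistent with the data, since a detection implies the site was occupied when false positives are assumed absent.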
Function for fitting multi-season single-species occupancy models using Polya-Gamma latent variables.
tPGOcc(occ.formula, det.formula, data, inits, priors, tuning, n.batch, batch.length, accept.rate = 0.43, n.omp.threads = 1, verbose = TRUE, ar1 = FALSE, n.report = 100, n.burn = round(.10 * n.batch * batch.length), n.thin = 1, n.chains = 1, k.fold, k.fold.threads = 1, k.fold.seed = 100, k.fold.only = FALSE, ...)
occ.formula |
a symbolic description of the model to be fit for the occurrence portion of the model using R's model syntax. Only right-hand side of formula is specified. See example below. Random intercepts are allowed using lme4 syntax (Bates et al. 2015). |
det.formula |
a symbolic description of the model to be fit for the detection portion of the model using R's model syntax. Only right-hand side of formula is specified. See example below. Random intercepts are allowed using lme4 syntax (Bates et al. 2015). |
data |
a list containing data necessary for model fitting.
Valid tags are |
inits |
a list with each tag corresponding to a parameter name.
Valid tags are |
priors |
a list with each tag corresponding to a parameter name.
Valid tags are |
tuning |
a list with each tag corresponding to a parameter
name. Valid tags are |
n.batch |
the number of MCMC batches in each chain to run for the Adaptive MCMC sampler. See Roberts and Rosenthal (2009) for details. |
batch.length |
the length of each MCMC batch in each chain to run for the Adaptive MCMC sampler. See Roberts and Rosenthal (2009) for details. |
accept.rate |
target acceptance rate for Adaptive MCMC. Default is 0.43. See Roberts and Rosenthal (2009) for details. |
n.omp.threads |
a positive integer indicating the number of threads
to use for SMP parallel processing within chains. This will have no impact on
model run times for non-spatial models. The package must be compiled for
OpenMP support. For most Intel-based machines, we recommend setting
|
verbose |
if |
ar1 |
logical value indicating whether to include an AR(1) zero-mean
temporal random effect in the model. If |
n.report |
the interval to report MCMC progress. Note this is specified in terms of batches, not MCMC samples. |
n.burn |
the number of samples out of the total |
n.thin |
the thinning interval for collection of MCMC samples. The
thinning occurs after the |
n.chains |
the number of chains to run. |
k.fold |
specifies the number of k folds for cross-validation.
If not specified as an argument, then cross-validation is not performed
and |
k.fold.threads |
number of threads to use for cross-validation. If
|
k.fold.seed |
seed used to split data set into |
k.fold.only |
a logical value indicating whether to only perform
cross-validation ( |
... |
currently no additional arguments |
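As a rough picture of what a seeded k-fold split involves, the base R sketch below assigns sites to folds reproducibly. It shows one common way to form folds and is an assumption for illustration; the package's internal splitting may differ, and fold.id, hold.out, and fit.sites are hypothetical names.

```r
k.fold <- 4
k.fold.seed <- 100
J <- 10  # number of sites

set.seed(k.fold.seed)
# Shuffle site indices, then deal them into k.fold roughly equal groups.
fold.id <- split(sample(1:J), rep(1:k.fold, length.out = J))

for (k in 1:k.fold) {
  hold.out <- fold.id[[k]]             # sites withheld in this fold
  fit.sites <- setdiff(1:J, hold.out)  # sites used for fitting
  # A model would be fit to fit.sites and scored on hold.out here.
}

lengths(fold.id)  # fold sizes: 3, 3, 2, 2
```

Fixing the seed means the same sites land in the same folds across candidate models, so their cross-validation scores are comparable.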
An object of class tPGOcc that is a list comprised of:
beta.samples |
a |
alpha.samples |
a |
z.samples |
a three-dimensional array of posterior samples for the latent occupancy values, with dimensions corresponding to posterior sample, site, and primary time period. Note this object will contain predicted occupancy values for sites/primary time periods that were not sampled. |
psi.samples |
a three-dimensional array of posterior samples for the latent occupancy probability values, with dimensions corresponding to posterior sample, site, and primary time period. Note this object will contain predicted occupancy probabilities for sites/primary time periods that were not sampled. |
sigma.sq.psi.samples |
a |
sigma.sq.p.samples |
a |
beta.star.samples |
a |
alpha.star.samples |
a |
theta.samples |
a |
eta.samples |
a |
like.samples |
a three-dimensional array of posterior samples for the likelihood values associated with each site and primary time period. Used for calculating WAIC. |
rhat |
a list of Gelman-Rubin diagnostic values for some of the model parameters. |
ESS |
a list of effective sample sizes for some of the model parameters. |
run.time |
execution time reported using |
k.fold.deviance |
scoring rule (deviance) from k-fold cross-validation.
Only included if |
The return object will include additional objects used for subsequent prediction and/or model fit evaluation. Note that detection probability estimated values are not included in the model object, but can be extracted using fitted(). Note that if k.fold.only = TRUE, the return list object will only contain run.time and k.fold.deviance.
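To show how pointwise likelihood samples such as like.samples feed into WAIC, the following base R sketch applies a standard form of the criterion (see Hooten and Hobbs 2015) to simulated likelihood values. The matrix like is a made-up stand-in with site and time period flattened into columns; this is not the package's own WAIC code.

```r
set.seed(11)
n.post <- 500  # posterior samples
n.obs <- 30    # site-by-period combinations, flattened

# Stand-in for like.samples reshaped to (posterior sample x site-period).
like <- matrix(runif(n.post * n.obs, 0.2, 0.9), n.post, n.obs)

# Log pointwise predictive density: log of the posterior mean likelihood,
# summed over data points.
lppd <- sum(log(colMeans(like)))
# Effective number of parameters: posterior variance of the log likelihood,
# summed over data points.
p.d <- sum(apply(log(like), 2, var))

waic <- -2 * (lppd - p.d)
c(elpd = lppd, pD = p.d, WAIC = waic)
```

Lower WAIC values indicate better expected predictive performance when comparing models fit to the same data.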
Some of the underlying code used for generating random numbers from the Polya-Gamma distribution is taken from the pgdraw package written by Daniel F. Schmidt and Enes Makalic. Their code implements Algorithm 6 in the PhD thesis of Jesse Bennett Windle (2013) https://repositories.lib.utexas.edu/handle/2152/21842.
Jeffrey W. Doser [email protected],
Andrew O. Finley [email protected]
Polson, N.G., J.G. Scott, and J. Windle. (2013) Bayesian Inference for Logistic Models Using Polya-Gamma Latent Variables. Journal of the American Statistical Association, 108:1339-1349.
Roberts, G.O. and Rosenthal J.S. (2009) Examples of adaptive MCMC. Journal of Computational and Graphical Statistics, 18(2):349-367.
Bates, Douglas, Martin Maechler, Ben Bolker, Steve Walker (2015). Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software, 67(1), 1-48. doi:10.18637/jss.v067.i01.
Kery, M., & Royle, J. A. (2021). Applied hierarchical modeling in ecology: Analysis of distribution, abundance and species richness in R and BUGS: Volume 2: Dynamic and advanced models. Academic Press. Section 4.6.
Hooten, M. B., and Hobbs, N. T. (2015). A guide to Bayesian model selection for ecologists. Ecological Monographs, 85(1), 3-28.
MacKenzie, D. I., J. D. Nichols, G. B. Lachman, S. Droege, J. Andrew Royle, and C. A. Langtimm. 2002. Estimating Site Occupancy Rates When Detection Probabilities Are Less Than One. Ecology 83: 2248-2255.
set.seed(500)
# Sites
J.x <- 10
J.y <- 10
J <- J.x * J.y
# Primary time periods
n.time <- sample(5:10, J, replace = TRUE)
n.time.max <- max(n.time)
# Replicates
n.rep <- matrix(NA, J, max(n.time))
for (j in 1:J) {
  n.rep[j, 1:n.time[j]] <- sample(1:4, n.time[j], replace = TRUE)
}
# Occurrence --------------------------
beta <- c(0.4, 0.5, -0.9)
trend <- TRUE
sp.only <- 0
psi.RE <- list()
# Detection ---------------------------
alpha <- c(-1, 0.7, -0.5)
p.RE <- list()
# Temporal parameters -----------------
rho <- 0.7
sigma.sq.t <- 0.6

# Get all the data
dat <- simTOcc(J.x = J.x, J.y = J.y, n.time = n.time, n.rep = n.rep,
               beta = beta, alpha = alpha, sp.only = sp.only, trend = trend,
               psi.RE = psi.RE, p.RE = p.RE, sp = FALSE, ar1 = TRUE,
               sigma.sq.t = sigma.sq.t, rho = rho)

# Package all data into a list
# Occurrence
occ.covs <- list(int = dat$X[, , 1],
                 trend = dat$X[, , 2],
                 occ.cov.1 = dat$X[, , 3])
# Detection
det.covs <- list(det.cov.1 = dat$X.p[, , , 2],
                 det.cov.2 = dat$X.p[, , , 3])
# Data list bundle
data.list <- list(y = dat$y,
                  occ.covs = occ.covs,
                  det.covs = det.covs)
# Priors
prior.list <- list(beta.normal = list(mean = 0, var = 2.72),
                   alpha.normal = list(mean = 0, var = 2.72),
                   rho.unif = c(-1, 1),
                   sigma.sq.t.ig = c(2, 0.5))

# Starting values
z.init <- apply(dat$y, c(1, 2), function(a) as.numeric(sum(a, na.rm = TRUE) > 0))
inits.list <- list(beta = 0, alpha = 0, z = z.init)

# Tuning
tuning.list <- list(rho = 0.5)

n.batch <- 20
batch.length <- 25
n.samples <- n.batch * batch.length
n.burn <- 100
n.thin <- 1

# Run the model
# Note that this is just a test case and more iterations/chains may need to
# be run to ensure convergence.
out <- tPGOcc(occ.formula = ~ trend + occ.cov.1,
              det.formula = ~ det.cov.1 + det.cov.2,
              data = data.list,
              inits = inits.list,
              priors = prior.list,
              tuning = tuning.list,
              n.batch = n.batch,
              batch.length = batch.length,
              verbose = TRUE,
              ar1 = TRUE,
              n.report = 25,
              n.burn = n.burn,
              n.thin = n.thin,
              n.chains = 1)
summary(out)
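The rho and sigma.sq.t values in the example govern the zero-mean AR(1) temporal random effect. The base R sketch below draws one realization of such an effect, assuming the covariance between periods t and s is sigma.sq.t * rho^|t - s|; that parameterization is an assumption made here for illustration, and Sigma.eta and eta are illustrative names rather than package objects.

```r
set.seed(3)
n.time <- 10
rho <- 0.7
sigma.sq.t <- 0.6

# Assumed AR(1) covariance: Cov(eta_t, eta_s) = sigma.sq.t * rho^|t - s|.
Sigma.eta <- sigma.sq.t * rho^abs(outer(1:n.time, 1:n.time, "-"))

# Draw one zero-mean realization using the lower Cholesky factor.
eta <- as.vector(t(chol(Sigma.eta)) %*% rnorm(n.time))

round(Sigma.eta[1, 1:3], 3)  # 0.600 0.420 0.294
```

Under this form, rho near 1 yields slowly varying period effects, rho near 0 yields nearly independent ones, and sigma.sq.t sets their overall variance.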
Function for updating a previously run spOccupancy or spAbundance model with additional MCMC iterations. This is useful when a model has been run for a long time but the MCMC chains have not converged or mixed adequately. Instead of re-running the entire model, this function lets you pick up where you left off. The function is in development and currently works only with the following spOccupancy and spAbundance model objects: msAbund, sfJSDM, lfJSDM. Note that cross-validation is not possible when updating the model.
updateMCMC(object, n.batch, n.samples, n.burn = 0, n.thin, keep.orig = TRUE, verbose = TRUE, n.report = 100, save.fitted = TRUE, ...)
object |
a |
n.batch |
the number of additional MCMC batches in each chain to run for the adaptive MCMC sampler. Only valid for model types fit with an adaptive MCMC sampler |
n.samples |
the number of posterior samples to collect in each chain. Only
valid for model types that are run with a full Gibbs sampler and have
|
n.burn |
the number of samples out of the total |
n.thin |
the thinning interval for collection of MCMC samples in
the updated model run. The thinning occurs after the |
keep.orig |
A logical value indicating whether or not the samples from the original run of the model should be kept or discarded. |
verbose |
if |
n.report |
the interval to report Metropolis sampler acceptance and MCMC progress. |
save.fitted |
logical value indicating whether or not fitted values
and likelihood values should be saved in the resulting model object. This is only
relevant for models of class |
... |
currently no additional arguments |
An object of the same class as the original model fit provided in the argument
object. See the manual page for the original model type for complete details.
Jeffrey W. Doser [email protected]
J.x <- 8
J.y <- 8
J <- J.x * J.y
n.rep <- sample(2:4, size = J, replace = TRUE)
N <- 6
# Community-level covariate effects
# Occurrence
beta.mean <- c(0.2)
p.occ <- length(beta.mean)
tau.sq.beta <- c(0.6)
# Detection
alpha.mean <- c(0)
tau.sq.alpha <- c(1)
p.det <- length(alpha.mean)
# Random effects
psi.RE <- list()
p.RE <- list()
# Draw species-level effects from community means.
beta <- matrix(NA, nrow = N, ncol = p.occ)
alpha <- matrix(NA, nrow = N, ncol = p.det)
for (i in 1:p.occ) {
  beta[, i] <- rnorm(N, beta.mean[i], sqrt(tau.sq.beta[i]))
}
for (i in 1:p.det) {
  alpha[, i] <- rnorm(N, alpha.mean[i], sqrt(tau.sq.alpha[i]))
}
alpha.true <- alpha
n.factors <- 3
phi <- rep(3 / .7, n.factors)
sigma.sq <- rep(2, n.factors)
nu <- rep(2, n.factors)

dat <- simMsOcc(J.x = J.x, J.y = J.y, n.rep = n.rep, N = N, beta = beta,
                alpha = alpha, psi.RE = psi.RE, p.RE = p.RE, sp = TRUE,
                sigma.sq = sigma.sq, phi = phi, nu = nu, cov.model = 'matern',
                factor.model = TRUE, n.factors = n.factors)

pred.indx <- sample(1:J, round(J * .25), replace = FALSE)
y <- dat$y[, -pred.indx, , drop = FALSE]
# Occupancy covariates
X <- dat$X[-pred.indx, , drop = FALSE]
coords <- as.matrix(dat$coords[-pred.indx, , drop = FALSE])
# Prediction covariates
X.0 <- dat$X[pred.indx, , drop = FALSE]
coords.0 <- as.matrix(dat$coords[pred.indx, , drop = FALSE])
# Detection covariates
X.p <- dat$X.p[-pred.indx, , , drop = FALSE]

y <- apply(y, c(1, 2), max, na.rm = TRUE)

data.list <- list(y = y, coords = coords)
# Priors
prior.list <- list(beta.comm.normal = list(mean = 0, var = 2.72),
                   tau.sq.beta.ig = list(a = 0.1, b = 0.1),
                   nu.unif = list(0.5, 2.5))
# Starting values
inits.list <- list(beta.comm = 0,
                   beta = 0,
                   fix = TRUE,
                   tau.sq.beta = 1)
# Tuning
tuning.list <- list(phi = 1, nu = 0.25)

batch.length <- 25
n.batch <- 2
n.report <- 100
formula <- ~ 1

out <- sfJSDM(formula = formula,
              data = data.list,
              inits = inits.list,
              n.batch = n.batch,
              batch.length = batch.length,
              accept.rate = 0.43,
              priors = prior.list,
              cov.model = "matern",
              tuning = tuning.list,
              n.factors = 3,
              n.omp.threads = 1,
              verbose = TRUE,
              NNGP = TRUE,
              n.neighbors = 5,
              search.type = 'cb',
              n.report = 10,
              n.burn = 0,
              n.thin = 1,
              n.chains = 2)
summary(out)

# Update the initial model fit
out.new <- updateMCMC(out, n.batch = 1, keep.orig = TRUE,
                      verbose = TRUE, n.report = 1)
summary(out.new)
Function for computing the Widely Applicable Information Criterion (WAIC; Watanabe 2010) for spOccupancy model objects.
waicOcc(object, by.sp = FALSE, ...)
object | a fitted spOccupancy model object of one of the classes listed in the description of the return value. |
by.sp | a logical value indicating whether to return a separate WAIC value for each species in a multi-species occupancy model or a single value for all species. |
... | currently no additional arguments |
The effective number of parameters is calculated following the recommendations of Gelman et al. (2014). Note that when fitting multi-species occupancy models with the range.ind tag, it is not valid to use WAIC to compare a model that uses range.ind (i.e., restricts certain species to a subset of the locations) with a model that does not use range.ind (i.e., assumes all species can occur at all locations in the data set) or that uses different range.ind values.
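For orientation, the quantities involved can be written out as follows. This is a sketch that assumes the variance-based estimator of the effective number of parameters recommended by Gelman et al. (2014); the symbols are illustrative notation, not taken from the package: y_i is a data point, \theta^{(s)} is posterior draw s of S, and n is the number of data points.

```latex
\begin{aligned}
\widehat{\mathrm{elpd}} &= \sum_{i=1}^{n} \log\left( \frac{1}{S} \sum_{s=1}^{S} p\left(y_i \mid \theta^{(s)}\right) \right), \\
p_D &= \sum_{i=1}^{n} \mathrm{Var}_{s=1}^{S}\left( \log p\left(y_i \mid \theta^{(s)}\right) \right), \\
\mathrm{WAIC} &= -2\left( \widehat{\mathrm{elpd}} - p_D \right).
\end{aligned}
```

Because each term is a sum over data points, changing which site-species combinations enter the sum (as range.ind does) changes the data the criterion is computed on, which is why such models cannot be compared by WAIC.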
When object is of class PGOcc, spPGOcc, msPGOcc, spMsPGOcc, lfJSDM, sfJSDM, lfMsPGOcc, sfMsPGOcc, tPGOcc, stPGOcc, svcPGBinom, svcPGOcc, svcTPGOcc, svcTPGBinom, svcMsPGOcc, tMsPGOcc, stMsPGOcc, or svcTMsPGOcc, returns a vector with three elements corresponding to estimates of the expected log pointwise predictive density (elpd), the effective number of parameters (pD), and the WAIC. When by.sp = TRUE for multi-species models, the returned object is instead a data frame with each row corresponding to a different species. When object is of class intPGOcc or spIntPGOcc, returns a data frame with columns elpd, pD, and WAIC, with each row corresponding to the estimated values for each data source in the integrated model.
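The two return shapes (a three-element vector versus one row per species) can be illustrated with a minimal base R sketch. Everything here is an assumption made for illustration: waic_sketch is a hypothetical helper, not a spOccupancy function, the "posterior draws" are simulated random numbers, and a plain Bernoulli likelihood stands in for the model likelihood. It is not the package's internal implementation.

```r
# Hypothetical helper (not part of spOccupancy): WAIC from a matrix of
# pointwise likelihood values with one row per posterior draw and one
# column per data point.
waic_sketch <- function(like) {
  elpd <- sum(log(colMeans(like)))     # log pointwise predictive density
  pD <- sum(apply(log(like), 2, var))  # variance-based effective parameters
  c(elpd = elpd, pD = pD, WAIC = -2 * (elpd - pD))
}

set.seed(1)
n.draws <- 500
n.sites <- 20
n.sp <- 3
# Simulated detection-nondetection data (species x site)
y <- matrix(rbinom(n.sp * n.sites, 1, 0.5), nrow = n.sp)
# Stand-in "posterior draws" of occurrence probability (draw x species x site)
psi <- array(runif(n.draws * n.sp * n.sites, 0.3, 0.7),
             dim = c(n.draws, n.sp, n.sites))

# Pointwise Bernoulli likelihood for each draw, species, and site
like <- array(NA, dim = dim(psi))
for (s in 1:n.draws) {
  like[s, , ] <- dbinom(y, 1, psi[s, , ])
}

# Single value for all species: pool every species-site combination
all.sp <- waic_sketch(matrix(like, nrow = n.draws))

# Separate values per species: a data frame with one row per species
by.sp <- as.data.frame(t(sapply(1:n.sp, function(i) waic_sketch(like[, i, ]))))

all.sp
by.sp
```

Since each quantity is a sum over data points, the per-species rows add up to the pooled values in this sketch.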
Jeffrey W. Doser,
Andrew O. Finley
Watanabe, S. (2010). Asymptotic equivalence of Bayes cross validation and widely applicable information criterion in singular learning theory. Journal of Machine Learning Research, 11:3571-3594.
Gelman, A., J. B. Carlin, H. S. Stern, D. B. Dunson, A. Vehtari, and D. B. Rubin (2013). Bayesian Data Analysis. 3rd edition. CRC Press, Taylor and Francis Group.
Gelman, A., J. Hwang, and A. Vehtari (2014). Understanding predictive information criteria for Bayesian models. Statistics and Computing, 24:997-1016.
set.seed(400)
# Simulate Data -----------------------------------------------------------
J.x <- 8
J.y <- 8
J <- J.x * J.y
n.rep <- sample(2:4, J, replace = TRUE)
beta <- c(0.5, -0.15)
p.occ <- length(beta)
alpha <- c(0.7, 0.4)
p.det <- length(alpha)
dat <- simOcc(J.x = J.x, J.y = J.y, n.rep = n.rep, beta = beta, alpha = alpha,
              sp = FALSE)
occ.covs <- dat$X[, 2, drop = FALSE]
colnames(occ.covs) <- c('occ.cov')
det.covs <- list(det.cov = dat$X.p[, , 2])
# Data bundle
data.list <- list(y = dat$y,
                  occ.covs = occ.covs,
                  det.covs = det.covs)

# Priors
prior.list <- list(beta.normal = list(mean = rep(0, p.occ),
                                      var = rep(2.72, p.occ)),
                   alpha.normal = list(mean = rep(0, p.det),
                                       var = rep(2.72, p.det)))
# Initial values
inits.list <- list(alpha = rep(0, p.det),
                   beta = rep(0, p.occ),
                   z = apply(data.list$y, 1, max, na.rm = TRUE))

n.samples <- 5000
n.report <- 1000

out <- PGOcc(occ.formula = ~ occ.cov,
             det.formula = ~ det.cov,
             data = data.list,
             inits = inits.list,
             n.samples = n.samples,
             priors = prior.list,
             n.omp.threads = 1,
             verbose = TRUE,
             n.report = n.report,
             n.burn = 4000,
             n.thin = 1)

# Calculate WAIC
waicOcc(out)