Title: | Fitting the Centered Autologistic and Sparse Spatial Generalized Linear Mixed Models for Areal Data |
---|---|
Description: | Provides tools for analyzing spatial data, especially non- Gaussian areal data. The current version supports the sparse restricted spatial regression model of Hughes and Haran (2013) <DOI:10.1111/j.1467-9868.2012.01041.x>, the centered autologistic model of Caragea and Kaiser (2009) <DOI:10.1198/jabes.2009.07032>, and the Bayesian spatial filtering model of Hughes (2017) <arXiv:1706.04651>. |
Authors: | John Hughes <[email protected]> and Xiaohui Cui |
Maintainer: | John Hughes <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.2-2 |
Built: | 2024-12-26 06:35:44 UTC |
Source: | CRAN |
A contains the adjacency matrix for the infant mortality data analyzed in (Hughes and Haran, 2013).
data(A)
data(A)
The data were obtained from the 2008 Area Resource File (ARF), a county-level database maintained by the Bureau of Health Professions, Health Resources and Services Administration, US Department of Health and Human Services.
data(A)
data(A)
Return an adjacency matrix for a square lattice.
adjacency.matrix(m, n = NULL)
adjacency.matrix(m, n = NULL)
m |
the number of rows in the lattice. |
n |
the number of columns in the lattice. Defaults to |
This function builds the adjacency matrix for the m
by n
square lattice.
A matrix of 0s and 1s, where
is equal to 1 iff vertices
and
are adjacent.
Fit a centered autologistic model using maximum pseudolikelihood estimation or MCMC for Bayesian inference.
autologistic(formula, data, A, method = c("PL", "Bayes"), model = TRUE, x = FALSE, y = FALSE, verbose = FALSE, control = list())
autologistic(formula, data, A, method = c("PL", "Bayes"), model = TRUE, x = FALSE, y = FALSE, verbose = FALSE, control = list())
formula |
an object of class |
data |
an optional data frame, list, or environment (or object coercible by |
A |
the adjacency matrix for the underlying graph. The matrix need not be binary, but it must be numeric and symmetric. |
method |
the method to use for inference. “ |
model |
a logical value indicating whether the model frame should be included as a component of the returned value. |
x |
a logical value indicating whether the model matrix used in the fitting process should be returned as a component of the returned value. |
y |
a logical value indicating whether the response vector used in the fitting process should be returned as a component of the returned value. |
verbose |
a logical value indicating whether to print various messages to the screen, including progress updates when |
control |
a list of the following control parameters.
|
This function fits the centered autologistic model of Caragea and Kaiser (2009) using maximum pseudolikelihood estimation or MCMC for Bayesian inference. The joint distribution for the centered autologistic model is
where is the parameter vector,
is an intractable normalizing function,
is the response vector,
is the design matrix,
is a
-vector of regression coefficients,
is the adjacency matrix for the underlying graph,
is the vector of independence expectations,
and
is the spatial dependence parameter.
Maximum pseudolikelihood estimation sidesteps the intractability of by maximizing the product of the conditional likelihoods.
Confidence intervals are then obtained using a parametric bootstrap or a normal approximation, i.e., sandwich estimation. The bootstrap datasets are generated by perfect sampling (
rautologistic
).
The bootstrap samples can be generated in parallel using the parallel package.
Bayesian inference is obtained using the auxiliary variable algorithm of Moller et al. (2006).
The auxiliary variables are generated by perfect sampling.
The prior distributions are (1) zero-mean normal with independent coordinates for , and (2) uniform for
.
The common standard deviation for the normal prior can be supplied by the user as control parameter
sigma
. The default is 1,000. The uniform prior has support [0, 2] by default, but the right endpoint can be supplied (as control parameter eta.max
) by the user.
The posterior covariance matrix of is estimated using samples obtained during a training run. The default number of iterations for the training run is 100,000, but this can be controlled by the user (via parameter
trainit
). The estimated covariance matrix is then used as the proposal variance for a Metropolis-Hastings random walk. The proposal distribution is normal. The posterior samples obtained during the second run are used for inference. The length of the run can be controlled by the user via parameters minit
, maxit
, and tol
. The first determines the minimum number of iterations. If minit
has been reached, the sampler will terminate when maxit
is reached or all Monte Carlo standard errors are smaller than tol
, whichever happens first. The default values for minit
, maxit
, and tol
are 100,000; 1,000,000; and 0.01, respectively.
autologistic
returns an object of class “autologistic
”, which is a list containing the following components.
coefficients |
the point estimate of |
fitted.values |
the fitted mean values, obtained by transforming the linear predictors by the inverse of the link function. |
linear.predictors |
the linear fit on link scale. |
residuals |
the response residuals. |
iter |
the size of the bootstrap/posterior sample. |
sample |
(where relevant) an |
mcse |
(where relevant) a |
S |
(where relevant) the estimated sandwich matrix. |
accept |
(Bayes) the acceptance rate for the MCMC sampler. |
y |
if requested (the default), the |
X |
if requested, the model matrix. |
model |
if requested (the default), the model frame. |
call |
the matched call. |
formula |
the formula supplied. |
method |
the method used for inference. |
convergence |
the integer code returned by |
message |
a character string to go along with |
terms |
the |
data |
the |
xlevels |
(where relevant) a record of the levels of the factors used in fitting. |
control |
a list containing the names and values of the control parameters. |
value |
the value of the negative log pseudolikelihood at its minimum. |
Caragea, P. and Kaiser, M. (2009) Autologistic models with interpretable parameters. Journal of Agricultural, Biological, and Environmental Statistics, 14(3), 281–300.
Hughes, J., Haran, M. and Caragea, P. C. (2011) Autologistic models for binary data on a lattice. Environmetrics, 22(7), 857–871.
Moller, J., Pettitt, A., Berthelsen, K., and Reeves, R. (2006) An efficient Markov chain Monte Carlo method for distributions with intractable normalising constants. Biometrika, 93(2), 451–458.
rautologistic
, residuals.autologistic
, summary.autologistic
, vcov.autologistic
# Use the 20 x 20 square lattice as the underlying graph. n = 20 A = adjacency.matrix(n) # Assign coordinates to each vertex such that the coordinates are restricted to the unit square # centered at the origin. x = rep(0:(n - 1) / (n - 1), times = n) - 0.5 y = rep(0:(n - 1) / (n - 1), each = n) - 0.5 X = cbind(x, y) # Use the vertex locations as spatial covariates. beta = c(2, 2) # These are the regression coefficients. col1 = "white" col2 = "black" colfunc = colorRampPalette(c(col1, col2)) # Simulate a dataset with the above mentioned regression component and eta equal to 0.6. This # value of eta corresponds to dependence that is moderate in strength. theta = c(beta, eta = 0.6) set.seed(123456) Z = rautologistic(X, A, theta) # Find the MPLE, and do not compute confidence intervals. fit = autologistic(Z ~ X - 1, A = A, control = list(confint = "none")) summary(fit) # The following examples are not executed by default since the computation is time consuming. # Compute confidence intervals based on the normal approximation. Estimate the "filling" in the # sandwich matrix using a parallel parametric bootstrap, where the computation is distributed # across six cores on the host workstation. # set.seed(123456) # fit = autologistic(Z ~ X - 1, A = A, verbose = TRUE, # control = list(confint = "sandwich", nodes = 6)) # summary(fit) # Compute confidence intervals based on a parallel parametric bootstrap. Use a bootstrap sample # of size 500, and distribute the computation across six cores on the host workstation. # set.seed(123456) # fit = autologistic(Z ~ X - 1, A = A, verbose = TRUE, # control = list(confint = "bootstrap", bootit = 500, nodes = 6)) # summary(fit) # Do MCMC for Bayesian inference. The length of the training run will be 10,000, and # the length of the subsequent inferential run will be at least 10,000. # set.seed(123456) # fit = autologistic(Z ~ X - 1, A = A, verbose = TRUE, method = "Bayes", # control = list(trainit = 10000, minit = 10000, sigma = 1000)) # summary(fit)
# Use the 20 x 20 square lattice as the underlying graph. n = 20 A = adjacency.matrix(n) # Assign coordinates to each vertex such that the coordinates are restricted to the unit square # centered at the origin. x = rep(0:(n - 1) / (n - 1), times = n) - 0.5 y = rep(0:(n - 1) / (n - 1), each = n) - 0.5 X = cbind(x, y) # Use the vertex locations as spatial covariates. beta = c(2, 2) # These are the regression coefficients. col1 = "white" col2 = "black" colfunc = colorRampPalette(c(col1, col2)) # Simulate a dataset with the above mentioned regression component and eta equal to 0.6. This # value of eta corresponds to dependence that is moderate in strength. theta = c(beta, eta = 0.6) set.seed(123456) Z = rautologistic(X, A, theta) # Find the MPLE, and do not compute confidence intervals. fit = autologistic(Z ~ X - 1, A = A, control = list(confint = "none")) summary(fit) # The following examples are not executed by default since the computation is time consuming. # Compute confidence intervals based on the normal approximation. Estimate the "filling" in the # sandwich matrix using a parallel parametric bootstrap, where the computation is distributed # across six cores on the host workstation. # set.seed(123456) # fit = autologistic(Z ~ X - 1, A = A, verbose = TRUE, # control = list(confint = "sandwich", nodes = 6)) # summary(fit) # Compute confidence intervals based on a parallel parametric bootstrap. Use a bootstrap sample # of size 500, and distribute the computation across six cores on the host workstation. # set.seed(123456) # fit = autologistic(Z ~ X - 1, A = A, verbose = TRUE, # control = list(confint = "bootstrap", bootit = 500, nodes = 6)) # summary(fit) # Do MCMC for Bayesian inference. The length of the training run will be 10,000, and # the length of the subsequent inferential run will be at least 10,000. # set.seed(123456) # fit = autologistic(Z ~ X - 1, A = A, verbose = TRUE, method = "Bayes", # control = list(trainit = 10000, minit = 10000, sigma = 1000)) # summary(fit)
infant contains the infant mortality data analyzed in (Hughes and Haran, 2013).
data(infant)
data(infant)
The data were obtained from the 2008 Area Resource File (ARF), a county-level database maintained by the Bureau of Health Professions, Health Resources and Services Administration, US Department of Health and Human Services.
data(infant)
data(infant)
Provides the information required to apply a sparse SGLMM to conditionally negative binomial outcomes.
negbinomial(theta = stop("'theta' must be specified."), link = "log")
negbinomial(theta = stop("'theta' must be specified."), link = "log")
theta |
the dispersion parameter (must be positive). |
link |
the link function, as a character string, name, or one-element character vector, specifying one of |
An object of class “family
”, a list of functions and expressions needed to fit a negative binomial GLM.
Return a perfect sample from a centered autologistic model.
rautologistic(X, A, theta)
rautologistic(X, A, theta)
X |
the design matrix. |
A |
the adjacency matrix for the underlying graph. The matrix need not be binary, but it must be numeric and symmetric. |
theta |
the vector of parameter values: |
This function implements a perfect sampler for the centered autologistic model. The sampler employs coupling from the past.
A vector that is distributed exactly according to the centered autologistic model with the given design matrix and parameter values.
Moller, J. (1999) Perfect simulation of conditionally specified models. Journal of the Royal Statistical Society, Series B, Methodological, 61, 251–264.
Propp, J. G. and Wilson, D. B. (1996) Exact sampling with coupled Markov chains and applications to statistical mechanics. Random Structures and Algorithms, 9(1-2), 223–252.
# Use the 20 x 20 square lattice as the underlying graph. n = 20 A = adjacency.matrix(n) # Assign coordinates to each vertex such that the coordinates are restricted to the unit square # centered at the origin. x = rep(0:(n - 1) / (n - 1), times = n) - 0.5 y = rep(0:(n - 1) / (n - 1), each = n) - 0.5 X = cbind(x, y) # Use the vertex locations as spatial covariates. beta = c(2, 2) # These are the regression coefficients. # Simulate a dataset with the above mentioned regression component and eta equal to 0.6. This # value of eta corresponds to dependence that is moderate in strength. theta = c(beta, eta = 0.6) set.seed(123456) Z = rautologistic(X, A, theta)
# Use the 20 x 20 square lattice as the underlying graph. n = 20 A = adjacency.matrix(n) # Assign coordinates to each vertex such that the coordinates are restricted to the unit square # centered at the origin. x = rep(0:(n - 1) / (n - 1), times = n) - 0.5 y = rep(0:(n - 1) / (n - 1), each = n) - 0.5 X = cbind(x, y) # Use the vertex locations as spatial covariates. beta = c(2, 2) # These are the regression coefficients. # Simulate a dataset with the above mentioned regression component and eta equal to 0.6. This # value of eta corresponds to dependence that is moderate in strength. theta = c(beta, eta = 0.6) set.seed(123456) Z = rautologistic(X, A, theta)
Extract model residuals.
## S3 method for class 'autologistic' residuals(object, type = c("deviance", "pearson", "response"), ...)
## S3 method for class 'autologistic' residuals(object, type = c("deviance", "pearson", "response"), ...)
object |
an object of class |
type |
the type of residuals that should be returned. The alternatives are “ |
... |
additional arguments. |
A vector of residuals.
Extract model residuals.
## S3 method for class 'sparse.sglmm' residuals(object, type = c("deviance", "pearson", "response"), ...)
## S3 method for class 'sparse.sglmm' residuals(object, type = c("deviance", "pearson", "response"), ...)
object |
an object of class |
type |
the type of residuals that should be returned. The alternatives are “ |
... |
additional arguments. |
A vector of residuals.
Fit a sparse SGLMM.
sparse.sglmm(formula, family = gaussian, data, offset, A, method = c("BSF", "RSR"), attractive = 50, repulsive = 0, tol = 0.01, minit = 10000, maxit = 1e+06, tune = list(), hyper = list(), model = TRUE, x = FALSE, y = FALSE, verbose = FALSE, parallel = FALSE)
sparse.sglmm(formula, family = gaussian, data, offset, A, method = c("BSF", "RSR"), attractive = 50, repulsive = 0, tol = 0.01, minit = 10000, maxit = 1e+06, tune = list(), hyper = list(), model = TRUE, x = FALSE, y = FALSE, verbose = FALSE, parallel = FALSE)
formula |
an object of class |
family |
a description of the error distribution and link function to be used in the model. This can be a character string naming a family function, a family function, or the result of a call to a family function. (See |
data |
an optional data frame, list, or environment (or object coercible by |
offset |
this can be used to specify an a priori known component to be included in the linear predictor during fitting. This should be |
A |
the adjacency matrix for the underlying graph. |
method |
the basis to use. The options are Bayesian spatial filtering (“ |
attractive |
the number of attractive Moran eigenvectors to use. The default is 50. See ‘Details’ for more information. |
repulsive |
the number of repulsive Moran eigenvectors to use. The default is 0. See ‘Details’ for more information. |
tol |
a tolerance. If all Monte Carlo standard errors are smaller than |
minit |
the minimum sample size. This should be large enough to permit accurate estimation of Monte Carlo standard errors. The default is 10,000. |
maxit |
the maximum sample size. Sampling from the posterior terminates when all Monte Carlo standard errors are smaller than |
tune |
(where relevant) a list containing |
hyper |
a list containing |
model |
a logical value indicating whether the model frame should be included as a component of the returned value. |
x |
a logical value indicating whether the model matrix used in the fitting process should be returned as a component of the returned value. |
y |
a logical value indicating whether the response vector used in the fitting process should be returned as a component of the returned value. |
verbose |
a logical value indicating whether to print MCMC progress to the screen. Defaults to |
parallel |
(for parallelized computation of the Moran operator) a list containing |
This function fits the sparse restricted spatial regression model of Hughes and Haran (2013), or the Bayesian spatial filtering model of Hughes (2017). The first stage of the model is
or, in vectorized form,
where is the design matrix,
is a
-vector of regression coefficients, the columns of
are
eigenvectors of the Moran operator, and
are spatial random effects. Arguments
attractive
and repulsive
can be used to control the number of eigenvectors used. The default values are 50 and 0, respectively, which corresponds to pure spatial smoothing. Inclusion of some repulsive eigenvectors can be advantageous in certain applications.
The second stage, i.e., the prior for , is
where is a smoothing parameter and
is the graph Laplacian.
The prior for is spherical
-variate normal with mean zero and common standard deviation
sigma.b
, which defaults to 1,000. The prior for is gamma with parameters 0.5 and 2,000. The same prior is used for
(when family is
negbinomial
).
When the response is normally distributed, the identity link is assumed, in which case the first stage is
where are heterogeneity random effects. When the response is Poisson distributed, heterogeneity random effects are optional. In any case, the prior on
is spherical
-variate normal with mean zero and common variance
. The prior for
is gamma with parameters
and
, the values of which are controlled by the user through argument
hyper
.
If the response is Bernoulli, negative binomial, or Poisson, and
are updated using Metropolis-Hastings random walks with normal proposals. The proposal covariance matrix for
is the estimated asymptotic covariance matrix from a
glm
fit to the data (see vcov
). The proposal for is spherical normal with common standard deviation
sigma.s
.
The updates for and
are Gibbs updates irrespective of the response distribution.
If the response is Poisson distributed and heterogeneity random effects are included, those random effects are updated using a Metropolis-Hastings random walk with a spherical normal proposal. The common standard deviation is sigma.h
.
If the response is normally distributed, all updates are Gibbs updates.
If the response is negative binomial, the dispersion parameter is updated using a Metropolis-Hastings random walk with a normal proposal. Said proposal has standard deviation
sigma.theta
, which can be provided by the user as an element of argument tune
.
sparse.sglmm
returns an object of class “sparse.sglmm
”, which is a list containing the following components.
coefficients |
the estimated regression coefficients. |
fitted.values |
the fitted mean values, obtained by transforming the linear predictors by the inverse of the link function. |
linear.predictors |
the linear fit on link scale. |
residuals |
the response residuals. |
iter |
the size of the posterior sample. |
beta.sample |
an |
gamma.sample |
an |
delta.sample |
(where relevant) an |
theta.sample |
(where relevant) a vector containing the posterior samples for |
tau.s.sample |
a vector containing the posterior samples for |
tau.h.sample |
(where relevant) a vector containing the posterior samples for |
gamma.est |
the estimate of |
delta.est |
(where relevant) the estimate of |
tau.s.est |
the estimate of |
tau.h.est |
(where relevant) the estimate of |
theta.est |
(where relevant) the estimate of |
beta.mcse |
the Monte Carlo standard errors for |
gamma.mcse |
the Monte Carlo standard errors for |
delta.mcse |
(where relevant) the Monte Carlo standard errors for |
tau.s.mcse |
the Monte Carlo standard error for |
tau.h.mcse |
(where relevant) the Monte Carlo standard error for |
theta.mcse |
(where relevant) the Monte Carlo standard error for |
D.bar |
the goodness of fit component of the DIC. |
pD |
the penalty component of the DIC. |
dic |
the deviance information criterion. |
beta.accept |
the acceptance rate for |
gamma.accept |
the acceptance rate for |
delta.accept |
(where relevant) the acceptance rate for |
theta.accept |
(where relevant) the acceptance rate for |
y |
if requested (the default), the |
X |
if requested, the model matrix. |
M |
if requested, the matrix of Moran eigenvectors. |
eigen.values |
if requested, the spectrum of the Moran operator. |
hyper |
a list containing the names and values of the hyperparameters. |
tune |
a list containing the names and values of the tuning parameters. |
model |
if requested (the default), the model frame. |
call |
the matched call. |
formula |
the formula supplied. |
terms |
the |
data |
the |
offset |
the offset vector used. |
xlevels |
(where relevant) a record of the levels of the factors used in fitting. |
Hughes, J. and Haran, M. (2013) Dimension reduction and alleviation of confounding for spatial generalized linear mixed models. Journal of the Royal Statistical Society, Series B, 75(1), 139–159.
residuals.sparse.sglmm
, summary.sparse.sglmm
, vcov.sparse.sglmm
## Not run: The following code duplicates the analysis described in (Hughes and Haran, 2013). The data are infant mortality data for 3,071 US counties. We do a spatial Poisson regression with offset. data(infant) infant$low_weight = infant$low_weight / infant$births attach(infant) Z = deaths X = cbind(1, low_weight, black, hispanic, gini, affluence, stability) data(A) set.seed(123456) fit = sparse.sglmm(Z ~ X - 1 + offset(log(births)), family = poisson, A = A, method = "RSR", tune = list(sigma.s = 0.02), verbose = TRUE) summary(fit) ## End(Not run)
## Not run: The following code duplicates the analysis described in (Hughes and Haran, 2013). The data are infant mortality data for 3,071 US counties. We do a spatial Poisson regression with offset. data(infant) infant$low_weight = infant$low_weight / infant$births attach(infant) Z = deaths X = cbind(1, low_weight, black, hispanic, gini, affluence, stability) data(A) set.seed(123456) fit = sparse.sglmm(Z ~ X - 1 + offset(log(births)), family = poisson, A = A, method = "RSR", tune = list(sigma.s = 0.02), verbose = TRUE) summary(fit) ## End(Not run)
Print a summary of a centered autologistic model fit.
## S3 method for class 'autologistic' summary(object, alpha = 0.05, digits = 4, ...)
## S3 method for class 'autologistic' summary(object, alpha = 0.05, digits = 4, ...)
object |
an object of class |
alpha |
the significance level for the quantile/HPD intervals. The default is 0.05. |
digits |
the number of significant digits to display. The default is 4. |
... |
additional arguments. |
This function displays (1) the call to autologistic
, (2) the values of the control parameters, (3) a table of estimates, and (4) the size of the bootstrap/posterior sample. Each row of the table of estimates shows a parameter estimate, the confidence or HPD interval for the parameter, and, where applicable, the Monte Carlo standard error.
Print a summary of a sparse SGLMM fit.
## S3 method for class 'sparse.sglmm' summary(object, alpha = 0.05, digits = 4, ...)
## S3 method for class 'sparse.sglmm' summary(object, alpha = 0.05, digits = 4, ...)
object |
an object of class |
alpha |
the significance level used to compute the HPD intervals. The default is 0.05. |
digits |
the number of significant digits to display. The default is 4. |
... |
additional arguments. |
This function displays (1) the call to sparse.sglmm
, (2) the values of the hyperparameters and tuning parameters, (3) a table of estimates, (4) the DIC value for the fit, and (5) the number of posterior samples. Each row of the table of estimates shows an estimated regression coefficient, the HPD interval for the coefficient, and the Monte Carlo standard error.
autologistic
model object.Return the estimated covariance matrix for an autologistic
model object.
## S3 method for class 'autologistic' vcov(object, ...)
## S3 method for class 'autologistic' vcov(object, ...)
object |
a fitted |
... |
additional arguments. |
An estimate of the covariance matrix of the parameters (in a Bayesian setting), or an estimate of the covariance matrix of the maximum pseudolikelihood estimator of the parameters.
sparse.sglmm
model object.Return the covariance matrix of the regression parameters of a sparse.sglmm
model object.
## S3 method for class 'sparse.sglmm' vcov(object, ...)
## S3 method for class 'sparse.sglmm' vcov(object, ...)
object |
a fitted |
... |
additional arguments. |
An estimate of the posterior covariance matrix of the regression coefficients.