Title: | Spatial Bayesian Factor Analysis |
---|---|
Description: | Implements a spatial Bayesian non-parametric factor analysis model with inference in a Bayesian setting using Markov chain Monte Carlo (MCMC). Spatial correlation is introduced in the columns of the factor loadings matrix using a Bayesian non-parametric prior, the probit stick-breaking process. Areal spatial data is modeled using a conditional autoregressive (CAR) prior and point-referenced spatial data is treated using a Gaussian process. The response variable can be modeled as Gaussian, probit, Tobit, or Binomial (using Polya-Gamma augmentation). Temporal correlation is introduced for the latent factors through a hierarchical structure and can be specified as exponential or first-order autoregressive. Full details of the package can be found in the accompanying vignette. Furthermore, the details of the package can be found in "Bayesian Non-Parametric Factor Analysis for Longitudinal Spatial Surfaces", by Berchuck et al (2019), <arXiv:1911.04337>. The paper is in press at the journal Bayesian Analysis. |
Authors: | Samuel I. Berchuck [aut, cre] |
Maintainer: | Samuel I. Berchuck <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.3 |
Built: | 2024-12-30 09:07:23 UTC |
Source: | CRAN |
bfa_sp
is a Markov chain Monte Carlo (MCMC) sampler for a Bayesian spatial factor analysis model. The spatial component is
introduced using a Probit stick-breaking process prior on the factor loadings. The model is implemented using a Bayesian hierarchical framework.
bfa_sp( formula, data, dist, time, K, L = Inf, trials = NULL, family = "normal", temporal.structure = "exponential", spatial.structure = "discrete", starting = NULL, hypers = NULL, tuning = NULL, mcmc = NULL, seed = 54, gamma.shrinkage = TRUE, include.space = TRUE, clustering = TRUE )
bfa_sp( formula, data, dist, time, K, L = Inf, trials = NULL, family = "normal", temporal.structure = "exponential", spatial.structure = "discrete", starting = NULL, hypers = NULL, tuning = NULL, mcmc = NULL, seed = 54, gamma.shrinkage = TRUE, include.space = TRUE, clustering = TRUE )
formula |
A |
data |
A required |
dist |
A |
time |
A |
K |
A scalar that indicates the dimension (i.e., quantity) of latent factors. |
L |
The number of latent clusters. If finite, a scalar indicating the number of clusters for each column of the factor loadings matrix. By default |
trials |
A variable in |
family |
Character string indicating the distribution of the observed data. Options
include: |
temporal.structure |
Character string indicating the temporal kernel. Options include:
|
spatial.structure |
Character string indicating the type of spatial process. Options include:
|
starting |
Either When |
hypers |
Either When
|
tuning |
Either When |
mcmc |
Either
|
seed |
An integer value used to set the seed for the random number generator (default = 54). |
gamma.shrinkage |
A logical indicating whether a gamma shrinkage process prior is used for the variances of the factor loadings columns. If FALSE, the hyperparameters (A1 and A2) indicate the shape and rate for a gamma prior on the precisions. Default is TRUE. |
include.space |
A logical indicating whether a spatial process should be included. Default is TRUE, however if FALSE the spatial correlation matrix
is fixed as an identity matrix. This specification overrides the |
clustering |
A logical indicating whether the Bayesian non-parametric process should be used, default is TRUE. If FALSE is specified each column is instead modeled with an independent spatial process. |
Details of the underlying statistical model proposed by Berchuck et al. 2019. are forthcoming.
bfa_sp
returns a list containing the following objects
lambda
NKeep x (M x O x K)
matrix
of posterior samples for factor loadings matrix lambda
.
The labels for each column are Lambda_O_M_K.
eta
NKeep x (Nu x K)
matrix
of posterior samples for the latent factors eta
.
The labels for each column are Eta_Nu_K.
beta
NKeep x P
matrix
of posterior samples for beta
.
sigma2
NKeep x (M * (O - C))
matrix
of posterior samples for the variances sigma2
.
The labels for each column are Sigma2_O_M.
kappa
NKeep x ((O * (O + 1)) / 2)
matrix
of posterior samples for kappa
. The
columns have names that describe the samples within them. The row is listed first, e.g.,
Kappa3_2
refers to the entry in row 3
, column 2
.
delta
NKeep x K
matrix
of posterior samples for delta
.
tau
NKeep x K
matrix
of posterior samples for tau
.
upsilon
NKeep x ((K * (K + 1)) / 2)
matrix
of posterior samples for Upsilon
. The
columns have names that describe the samples within them. The row is listed first, e.g.,
Upsilon3_2
refers to the entry in row 3
, column 2
.
psi
NKeep x 1
matrix
of posterior samples for psi
.
xi
NKeep x (M x O x K)
matrix
of posterior samples for factor loadings cluster labels xi
.
The labels for each column are Xi_O_M_K.
rho
NKeep x 1
matrix
of posterior samples for rho
.
metropolis
2 (or 1) x 3
matrix
of metropolis
acceptance rates, updated tuners, and original tuners that result from the pilot
adaptation.
runtime
A character
string giving the runtime of the MCMC sampler.
datobj
A list
of data objects that are used in future bfa_sp
functions
and should be ignored by the user.
dataug
A list
of data augmentation objects that are used in future
bfa_sp
functions and should be ignored by the user.
Reference for Berchuck et al. 2019 is forthcoming.
###Load womblR for example visual field data library(womblR) ###Format data for MCMC sampler blind_spot <- c(26, 35) # define blind spot VFSeries <- VFSeries[order(VFSeries$Location), ] # sort by location VFSeries <- VFSeries[order(VFSeries$Visit), ] # sort by visit VFSeries <- VFSeries[!VFSeries$Location %in% blind_spot, ] # remove blind spot locations dat <- data.frame(Y = VFSeries$DLS / 10) # create data frame with scaled data Time <- unique(VFSeries$Time) / 365 # years since baseline visit W <- HFAII_Queen[-blind_spot, -blind_spot] # visual field adjacency matrix (data object from womblR) M <- dim(W)[1] # number of locations ###Prior bounds for psi TimeDist <- as.matrix(dist(Time)) BPsi <- log(0.025) / -min(TimeDist[TimeDist > 0]) APsi <- log(0.975) / -max(TimeDist) ###MCMC options K <- 10 # number of latent factors O <- 1 # number of spatial observation types Hypers <- list(Sigma2 = list(A = 0.001, B = 0.001), Kappa = list(SmallUpsilon = O + 1, BigTheta = diag(O)), Delta = list(A1 = 1, A2 = 20), Psi = list(APsi = APsi, BPsi = BPsi), Upsilon = list(Zeta = K + 1, Omega = diag(K))) Starting <- list(Sigma2 = 1, Kappa = diag(O), Delta = 2 * (1:K), Psi = (APsi + BPsi) / 2, Upsilon = diag(K)) Tuning <- list(Psi = 1) MCMC <- list(NBurn = 1000, NSims = 1000, NThin = 2, NPilot = 5) ###Fit MCMC Sampler reg.bfa_sp <- bfa_sp(Y ~ 0, data = dat, dist = W, time = Time, K = 10, starting = Starting, hypers = Hypers, tuning = Tuning, mcmc = MCMC, L = Inf, family = "tobit", trials = NULL, temporal.structure = "exponential", spatial.structure = "discrete", seed = 54, gamma.shrinkage = TRUE, include.space = TRUE, clustering = TRUE) ###Note that this code produces the pre-computed data object "reg.bfa_sp" ###More details can be found in the vignette.
###Load womblR for example visual field data library(womblR) ###Format data for MCMC sampler blind_spot <- c(26, 35) # define blind spot VFSeries <- VFSeries[order(VFSeries$Location), ] # sort by location VFSeries <- VFSeries[order(VFSeries$Visit), ] # sort by visit VFSeries <- VFSeries[!VFSeries$Location %in% blind_spot, ] # remove blind spot locations dat <- data.frame(Y = VFSeries$DLS / 10) # create data frame with scaled data Time <- unique(VFSeries$Time) / 365 # years since baseline visit W <- HFAII_Queen[-blind_spot, -blind_spot] # visual field adjacency matrix (data object from womblR) M <- dim(W)[1] # number of locations ###Prior bounds for psi TimeDist <- as.matrix(dist(Time)) BPsi <- log(0.025) / -min(TimeDist[TimeDist > 0]) APsi <- log(0.975) / -max(TimeDist) ###MCMC options K <- 10 # number of latent factors O <- 1 # number of spatial observation types Hypers <- list(Sigma2 = list(A = 0.001, B = 0.001), Kappa = list(SmallUpsilon = O + 1, BigTheta = diag(O)), Delta = list(A1 = 1, A2 = 20), Psi = list(APsi = APsi, BPsi = BPsi), Upsilon = list(Zeta = K + 1, Omega = diag(K))) Starting <- list(Sigma2 = 1, Kappa = diag(O), Delta = 2 * (1:K), Psi = (APsi + BPsi) / 2, Upsilon = diag(K)) Tuning <- list(Psi = 1) MCMC <- list(NBurn = 1000, NSims = 1000, NThin = 2, NPilot = 5) ###Fit MCMC Sampler reg.bfa_sp <- bfa_sp(Y ~ 0, data = dat, dist = W, time = Time, K = 10, starting = Starting, hypers = Hypers, tuning = Tuning, mcmc = MCMC, L = Inf, family = "tobit", trials = NULL, temporal.structure = "exponential", spatial.structure = "discrete", seed = 54, gamma.shrinkage = TRUE, include.space = TRUE, clustering = TRUE) ###Note that this code produces the pre-computed data object "reg.bfa_sp" ###More details can be found in the vignette.
Calculates diagnostic metrics using output from the spBFA
model.
diagnostics( object, diags = c("dic", "dinf", "waic"), keepDeviance = FALSE, keepPPD = FALSE, Verbose = TRUE, seed = 54 )
diagnostics( object, diags = c("dic", "dinf", "waic"), keepDeviance = FALSE, keepPPD = FALSE, Verbose = TRUE, seed = 54 )
object |
A |
diags |
A vector of character strings indicating the diagnostics to compute. Options include: Deviance Information Criterion ("dic"), d-infinity ("dinf") and Watanabe-Akaike information criterion ("waic"). At least one option must be included. Note: The probit model cannot compute the DIC or WAIC diagnostics due to computational issues with computing the multivariate normal CDF. |
keepDeviance |
A logical indicating whether the posterior deviance distribution is returned (default = FALSE). |
keepPPD |
A logical indicating whether the posterior predictive distribution at each observed location is returned (default = FALSE). |
Verbose |
A boolean logical indicating whether progress should be output (default = TRUE). |
seed |
An integer value used to set the seed for the random number generator (default = 54). |
To assess model fit, DIC, d-infinity and WAIC are used. DIC is based on the deviance statistic and penalizes for the complexity of a model with an effective number of parameters estimate pD (Spiegelhalter et al 2002). The d-infinity posterior predictive measure is an alternative diagnostic tool to DIC, where d-infinity=P+G. The G term decreases as goodness of fit increases, and P, the penalty term, inflates as the model becomes over-fit, so small values of both of these terms and, thus, small values of d-infinity are desirable (Gelfand and Ghosh 1998). WAIC is invariant to parametrization and is asymptotically equal to Bayesian cross-validation (Watanabe 2010). WAIC = -2 * (lppd - p_waic_2). Where lppd is the log pointwise predictive density and p_waic_2 is the estimated effective number of parameters based on the variance estimator from Vehtari et al. 2016. (p_waic_1 is the mean estimator).
diagnostics
returns a list containing the diagnostics requested and
possibly the deviance and/or posterior predictive distribution objects.
Samuel I. Berchuck
Gelfand, A. E., & Ghosh, S. K. (1998). Model choice: a minimum posterior predictive loss approach. Biometrika, 1-11.
Spiegelhalter, D. J., Best, N. G., Carlin, B. P., & Van Der Linde, A. (2002). Bayesian measures of model complexity and fit. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 64(4), 583-639.
Vehtari, A., Gelman, A., & Gabry, J. (2016). Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Statistics and Computing, 1-20.
Watanabe, S. (2010). Asymptotic equivalence of Bayes cross validation and widely applicable information criterion in singular learning theory. Journal of Machine Learning Research, 11(Dec), 3571-3594.
###Load pre-computed regression results data(reg.bfa_sp) ###Compute and print diagnostics diags <- diagnostics(reg.bfa_sp) print(unlist(diags))
###Load pre-computed regression results data(reg.bfa_sp) ###Compute and print diagnostics diags <- diagnostics(reg.bfa_sp) print(unlist(diags))
is.spBFA
is a general test of an object being interpretable as a
spBFA
object.
is.spBFA(x)
is.spBFA(x)
x |
object to be tested. |
The spBFA
class is defined as the regression object that
results from the spBFA
regression function.
is.spBFA
returns a logical, depending on whether the object is of class spBFA
.
###Load pre-computed results data(reg.bfa_sp) ###Test function is.spBFA(reg.bfa_sp)
###Load pre-computed results data(reg.bfa_sp) ###Test function is.spBFA(reg.bfa_sp)
Predicts future observations from the spBFA
model.
## S3 method for class 'spBFA' predict( object, NewTimes, NewX = NULL, NewTrials = NULL, type = "temporal", Verbose = TRUE, seed = 54, ... )
## S3 method for class 'spBFA' predict( object, NewTimes, NewX = NULL, NewTrials = NULL, type = "temporal", Verbose = TRUE, seed = 54, ... )
object |
A |
NewTimes |
A numeric vector including desired time(s) points for prediction. |
NewX |
A matrix including covariates at times |
NewTrials |
An array indicating the trials for categorical predictions. The array must have dimension |
type |
A character string indicating the type of prediction, choices include "temporal" and "spatial". Spatial prediction has not been implemented yet. |
Verbose |
A boolean logical indicating whether progress should be output. |
seed |
An integer value used to set the seed for the random number generator (default = 54). |
... |
other arguments. |
predict.spBFA
uses Bayesian krigging to predict vectors at future
time points. The function returns the krigged factors (Eta
) and also the observed outcomes (Y
).
predict.spBFA
returns a list containing the following objects.
Eta
A list
containing NNewVistis
matrices, one for each new time prediction. Each matrix is dimension NKeep x K
, where
K
is the number of latent factors Each matrix contains posterior samples obtained by Bayesian krigging.
Y
A list
containing NNewVistis
posterior predictive distribution
matrices. Each matrix is dimension NKeep x (M * O)
, where M
is the number of spatial locations and O
the number of observation types.
Each matrix is obtained through Bayesian krigging.
Samuel I. Berchuck
###Load pre-computed regression results data(reg.bfa_sp) ###Compute predictions pred <- predict(reg.bfa_sp, NewTimes = 3) pred.observations <- pred$Y$Y10 # observed data predictions pred.krig <- pred$Eta$Eta10 # krigged parameters
###Load pre-computed regression results data(reg.bfa_sp) ###Compute predictions pred <- predict(reg.bfa_sp, NewTimes = 3) pred.observations <- pred$Y$Y10 # observed data predictions pred.krig <- pred$Eta$Eta10 # krigged parameters
bfa_sp
The data object reg.bfa_sp
is a pre-computed spBFA
data object for use in the package vignette and function examples.
data(reg.bfa_sp)
data(reg.bfa_sp)
The data object reg.bfa_sp
is a spBFA
data object that is the result of implementing the MCMC code in the vignette.
It is presented here because the run-time would be unecessarily long when compiling the R package.
spBFA
is a package for Bayesian spatial factor analysis. A corresponding manuscript is forthcoming.
Samuel I. Berchuck [email protected]