Title: | Multivariate Bayesian Model with Shrinkage Priors |
---|---|
Description: | Gibbs sampler for fitting multivariate Bayesian linear regression with shrinkage priors (MBSP), using the three parameter beta normal family. The method is described in Bai and Ghosh (2018) <doi:10.1016/j.jmva.2018.04.010>. |
Authors: | Ray Bai, Malay Ghosh |
Maintainer: | Ray Bai <[email protected]> |
License: | GPL-3 |
Version: | 4.0 |
Built: | 2024-12-04 07:09:42 UTC |
Source: | CRAN |
This function provides a way to draw a sample from the matrix-normal distribution, given the mean matrix, the covariance structure of the rows, and the covariance structure of the columns.
matrix_normal(M, U, V)
matrix_normal(M, U, V)
M |
mean |
U |
|
V |
|
This function provides a way to draw a random matrix from the matrix-normal distribution,
where is the
mean matrix,
is an
covariance matrix, and
is a
covariance matrix.
A randomly drawn matrix from
.
Ray Bai and Malay Ghosh
# Draw a random 50x20 matrix from MN(O,U,V), # where: # O = zero matrix of dimension 50x20 # U has AR(1) structure, # V has sigma^2*I structure # Specify Mean.mat p <- 50 q <- 20 Mean_mat <- matrix(0, nrow=p, ncol=q) # Construct U rho <- 0.5 times <- 1:p H <- abs(outer(times, times, "-")) U <- rho^H # Construct V sigma_sq <- 2 V <- sigma_sq*diag(q) # Draw from MN(Mean_mat, U, V) mn_draw <- matrix_normal(Mean_mat, U, V)
# Draw a random 50x20 matrix from MN(O,U,V), # where: # O = zero matrix of dimension 50x20 # U has AR(1) structure, # V has sigma^2*I structure # Specify Mean.mat p <- 50 q <- 20 Mean_mat <- matrix(0, nrow=p, ncol=q) # Construct U rho <- 0.5 times <- 1:p H <- abs(outer(times, times, "-")) U <- rho^H # Construct V sigma_sq <- 2 V <- sigma_sq*diag(q) # Draw from MN(Mean_mat, U, V) mn_draw <- matrix_normal(Mean_mat, U, V)
This function provides a fully Bayesian approach for obtaining a (nearly) sparse estimate of the regression coefficients matrix
in the multivariate linear regression model,
using the three parameter beta normal (TPBN) family. Here is the
matrix with
samples of
response variables,
is the
design matrix with
samples of
covariates, and
is the
noise matrix with independent rows. The complete model is described in Bai and Ghosh (2018).
If there are confounding variables which must remain in the model and should not be regularized, then these can be included in the model by putting them in a
separate
confounding matrix
. Then the model that is fit is
where is the
regression coefficients matrix corresponding to the confounders. In this case, we put a flat prior on
. By default, confounders are not included.
If the user desires, two information criteria can be computed: the Deviance Information Criterion (DIC) of Spiegelhalter et al. (2002) and the widely applicable information criterion (WAIC) of Watanabe (2010).
MBSP(Y, X, confounders=NULL, u=0.5, a=0.5, tau=NA, max_steps=6000, burnin=1000, save_samples=TRUE, model_criteria=FALSE)
MBSP(Y, X, confounders=NULL, u=0.5, a=0.5, tau=NA, max_steps=6000, burnin=1000, save_samples=TRUE, model_criteria=FALSE)
Y |
Response matrix of |
X |
Design matrix of |
confounders |
Optional design matrix |
u |
The first parameter in the TPBN family. Defaults to |
a |
The second parameter in the TPBN family. Defaults to |
tau |
The global parameter. If the user does not specify this (tau=NA), the Gibbs sampler will use |
max_steps |
The total number of iterations to run in the Gibbs sampler. Defaults to |
burnin |
The number of burn-in iterations for the Gibbs sampler. Defaults to |
save_samples |
A Boolean variable for whether to save all of the posterior samples of the regression coefficients matrix B and the covariance matrix Sigma. Defaults to |
model_criteria |
A Boolean variable for whether to compute the following information criteria: DIC (Deviance Information Criterion) and WAIC (widely applicable information criterion). Can be used to compare models with (for example) different choices of |
The function performs (nearly) sparse estimation of the regression coefficients matrix and variable selection from the
covariates. The lower and upper endpoints of the 95 percent posterior credible intervals for each of the
elements of
are also returned so that the user may assess uncertainty quantification.
In the three parameter beta normal (TPBN) family, corresponds to the horseshoe prior,
corresponds to the Strawderman-Berger prior, and
corresponds to the normal-exponential-gamma (NEG) prior. This function uses the horseshoe prior as the default shrinkage prior.
The user also has the option of including an matrix with
confounding variables. These confounders are variables which are included in the model but should not be regularized.
Finally, if the user specifies model_criteria=TRUE
, then the MBSP
function will compute two model selection criteria: the Deviance Information Criterion (DIC) of Spiegelhalter et al. (2002) and the widely applicable information criterion (WAIC) of Watanabe (2010). This permits model comparisons between (for example) different choices of u
, a
, and tau
. The default horseshoe prior and choice of tau
performs well, but the user may wish to experiment with u
, a
, and tau
. In general, models with lower DIC or WAIC are preferred.
The function returns a list containing the following components:
B_est |
The point estimate of the |
B_CI_lower |
The 2.5th percentile of the posterior density (or the lower endpoint of the 95 percent credible interval) for all |
B_CI_upper |
The 97.5th percentile of the posterior density (or the upper endpoint of the 95 percent credible interval) for all |
active_predictors |
The row indices of the active (nonzero) covariates chosen by our model from the |
B_samples |
All |
C_est |
The point estimate of the |
C_CI_lower |
The 2.5th percentile of the posterior density (or the lower endpoint of the 95 percent credible interval) for all |
C_CI_upper |
The 97.5th percentile of the posterior density (or the upper endpoint of the 95 percent credible interval) for all |
C_samples |
All |
Sigma_est |
The point estimate of the |
Sigma_CI_lower |
The 2.5th percentile of the posterior density (or the lower endpoint of the 95 percent credible interval) for all |
Sigma_CI_upper |
The 97.5th percentile of the posterior density (or the upper endpoint of the 95 percent credible interval) for all |
Sigma_samples |
All |
DIC |
The Deviance Information Criterion (DIC), which can be used for model comparison. Models with smaller DIC are preferred. This only returns a numeric value if |
WAIC |
The widely applicable information criterion (WAIC), which can be used for model comparison. Models with smaller WAIC are preferred. This only returns a numeric value if |
Ray Bai and Malay Ghosh
Armagan, A., Clyde, M., and Dunson, D.B. (2011) Generalized beta mixtures of Gaussians. In J. Shawe-Taylor, R. Zemel, P. Bartlett, F. Pereira, and K. Weinberger (Eds.) Advances in Neural Information Processing Systems 24, 523-531.
Bai, R. and Ghosh, M. (2018). High-dimensional multivariate posterior consistency under global-local shrinkage priors. Journal of Multivariate Analysis, 167: 157-170.
Berger, J. (1980). A robust generalized Bayes estimator and confidence region for a multivariate normal mean. Annals of Statistics, 8(4): 716-761.
Carvalho, C.M., Polson, N.G., and Scott., J.G. (2010). The horseshoe estimator for sparse signals. Biometrika, 97(2): 465-480.
Spiegelhalter, D. J., Best, N. G., Carlin, B. P., and Van Der Linde, A. (2002). Bayesian measures of model complexity and fit. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 64(4): 583-639.
Strawderman, W.E. (1971). Proper Bayes Minimax Estimators of the Multivariate Normal Mean. Annals of Mathematical Statistics, 42(1): 385-388.
Watanabe, S. (2010). Asymptotic equivalence of Bayes cross validation and widely applicable information criterion in singular learning theory. Journal of Machine Learning Research, 11: 3571-3594.
################################### # Set n, p, q, and sparsity level # ################################### n <- 100 p <- 40 q <- 3 # number of response variables is 3 p_act <- 5 # number of active (nonzero) predictors is 5 ############################# # Generate design matrix X. # ############################# set.seed(1234) times <- 1:p rho <- 0.5 H <- abs(outer(times, times, "-")) V <- rho^H mu <- rep(0, p) # Rows of X are simulated from MVN(0,V) X <- mvtnorm::rmvnorm(n, mu, V) # Center X X <- scale(X, center=TRUE, scale=FALSE) ############################################ # Generate true coefficient matrix B_true. # ############################################ # Entries in nonzero rows are drawn from Unif[(-5,-0.5)U(0.5,5)] B_act <- runif(p_act*q,-5,4) disjoint <- function(x){ if(x <= -0.5) return(x) else return(x+1) } B_act <- matrix(sapply(B_act, disjoint),p_act,q) # Set rest of the rows equal to 0 B_true <- rbind(B_act,matrix(0,p-p_act,q)) B_true <- B_true[sample(1:p),] # permute the rows ######################################### # Generate true error covariance Sigma. # ######################################### sigma_sq=2 times <- 1:q H <- abs(outer(times, times, "-")) Sigma <- sigma_sq * rho^H ############################ # Generate noise matrix E. # ############################ mu <- rep(0,q) E <- mvtnorm::rmvnorm(n, mu, Sigma) ############################## # Generate response matrix Y # ############################## Y <- crossprod(t(X),B_true) + E # Note that there are no confounding variables in this synthetic example ######################################### # Fit the MBSP model on synthetic data. # ######################################### # Should use default of max_steps=6000, burnin=1000 in practice. mbsp_model = MBSP(Y=Y, X=X, max_steps=1000, burnin=500, model_criteria=FALSE) # Recommended to use the default, i.e. can simply use: mbsp_model = MBSP(Y, X) # If you want to return the DIC and WAIC, have to set model_criteria=TRUE. # indices of the true nonzero rows true_active_predictors <- which(rowSums(B_true)!=0) true_active_predictors # variables selected by the MBSP model mbsp_model$active_predictors # true regression coeficients in the true nonzero rows B_true[true_active_predictors, ] # the MBSP model's estimates of the nonzero rows mbsp_model$B_est[true_active_predictors, ]
################################### # Set n, p, q, and sparsity level # ################################### n <- 100 p <- 40 q <- 3 # number of response variables is 3 p_act <- 5 # number of active (nonzero) predictors is 5 ############################# # Generate design matrix X. # ############################# set.seed(1234) times <- 1:p rho <- 0.5 H <- abs(outer(times, times, "-")) V <- rho^H mu <- rep(0, p) # Rows of X are simulated from MVN(0,V) X <- mvtnorm::rmvnorm(n, mu, V) # Center X X <- scale(X, center=TRUE, scale=FALSE) ############################################ # Generate true coefficient matrix B_true. # ############################################ # Entries in nonzero rows are drawn from Unif[(-5,-0.5)U(0.5,5)] B_act <- runif(p_act*q,-5,4) disjoint <- function(x){ if(x <= -0.5) return(x) else return(x+1) } B_act <- matrix(sapply(B_act, disjoint),p_act,q) # Set rest of the rows equal to 0 B_true <- rbind(B_act,matrix(0,p-p_act,q)) B_true <- B_true[sample(1:p),] # permute the rows ######################################### # Generate true error covariance Sigma. # ######################################### sigma_sq=2 times <- 1:q H <- abs(outer(times, times, "-")) Sigma <- sigma_sq * rho^H ############################ # Generate noise matrix E. # ############################ mu <- rep(0,q) E <- mvtnorm::rmvnorm(n, mu, Sigma) ############################## # Generate response matrix Y # ############################## Y <- crossprod(t(X),B_true) + E # Note that there are no confounding variables in this synthetic example ######################################### # Fit the MBSP model on synthetic data. # ######################################### # Should use default of max_steps=6000, burnin=1000 in practice. mbsp_model = MBSP(Y=Y, X=X, max_steps=1000, burnin=500, model_criteria=FALSE) # Recommended to use the default, i.e. can simply use: mbsp_model = MBSP(Y, X) # If you want to return the DIC and WAIC, have to set model_criteria=TRUE. # indices of the true nonzero rows true_active_predictors <- which(rowSums(B_true)!=0) true_active_predictors # variables selected by the MBSP model mbsp_model$active_predictors # true regression coeficients in the true nonzero rows B_true[true_active_predictors, ] # the MBSP model's estimates of the nonzero rows mbsp_model$B_est[true_active_predictors, ]