Title: | Nonparametric Varying Coefficient Spike-and-Slab Lasso |
---|---|
Description: | Fits Bayesian regularized varying coefficient models with the Nonparametric Varying Coefficient Spike-and-Slab Lasso (NVC-SSL) introduced by Bai et al. (2023) <arXiv:1907.06477>. Functions to fit frequentist penalized varying coefficients are also provided, with the option of employing the group lasso penalty of Yuan and Lin (2006) <doi:10.1111/j.1467-9868.2005.00532.x>, the group minimax concave penalty (MCP) of Breheny and Huang <doi:10.1007/s11222-013-9424-2>, or the group smoothly clipped absolute deviation (SCAD) penalty of Breheny and Huang (2015) <doi:10.1007/s11222-013-9424-2>. |
Authors: | Ray Bai |
Maintainer: | Ray Bai <[email protected]> |
License: | GPL-3 |
Version: | 2.0 |
Built: | 2024-12-12 07:03:19 UTC |
Source: | CRAN |
This function implements frequentist penalized nonparametric varying coefficient (NVC) models. It supports the following penalty functions: the group lasso penalty of Yuan and Lin (2006), the group minimax concave penalty (MCP) of Breheny and Huang (2015), and the group smoothly clipped absolute deviation (SCAD) penalty of Breheny and Huang (2015). This function solves a penalized regression problem of the form,
where is the total number of observations,
is the loss function, and
is a penalty function with regularization parameter
. Since the objective function is rescaled by
, the penalty
is typically smaller than the spike hyperparameter
used by the
NVC_SSL
function. The BIC criterion is used to select the optimal tuning parameter .
NVC_frequentist(y, t, X, n_basis=8, penalty=c("gLASSO","gSCAD","gMCP"), lambda=NULL, include_intercept=TRUE)
NVC_frequentist(y, t, X, n_basis=8, penalty=c("gLASSO","gSCAD","gMCP"), lambda=NULL, include_intercept=TRUE)
y |
|
t |
|
X |
|
n_basis |
number of basis functions to use. Default is |
penalty |
string specifying which penalty function to use. Specify |
lambda |
grid of tuning parameters. If |
include_intercept |
Boolean variable for whether or not to include an intercept function |
The function returns a list containing the following components:
t_ordered |
all |
classifications |
|
beta_hat |
|
beta0_hat |
estmate of the intercept function |
gamma_hat |
estimated basis coefficients (needed for prediction) for the optimal |
lambda_min |
the individual |
lambda0_all |
grid of all |
BIC_all |
|
beta_est_all_lambda |
list of length |
beta0_est_all_lambda |
|
gamma_est_all_lambda |
|
classifications_all_lambda |
|
iters_to_converge |
number of iterations it took for the group ascent algorithm to converge for each entry in |
Bai, R., Boland, M. R., and Chen, Y. (2023). "Scalable high-dimensional Bayesian varying coefficient models with unknown within-subject covariance." arXiv pre-print arXiv:arXiv:1907.06477.
Breheny, P. and Huang, J. (2015). "Group descent algorithms for nonconvex penalized linear and logistic regression models with grouped predictors." Statistics and Computing, 25:173-187.
Wei, F., Huang, J., and Li, H. (2011). "Variable selection and estimation in high-dimensional varying coefficient models." Statistica Sinica, 21:1515-1540.
Yuan, M. and Lin, Y. (2006). "Model selection and estimation in regression with grouped variables." Journal of the Royal Statistical Society: Series B (Statistical Methodology), 68:49-67.
## Load data data(SimulatedData) attach(SimulatedData) y = SimulatedData$y t = SimulatedData$t id = SimulatedData$id X = SimulatedData[,4:103] ## Fit frequentist penalized NVC model with the SCAD penalty. ## Can set penalty as "gLASSO", "gSCAD", or "gMCP". ## No need to specify an 'id' argument when using NVC_frequentist() function NVC_gSCAD_mod = NVC_frequentist(y, t, X, penalty="gSCAD") ## Classifications. First varying coefficients are selected as nonzero NVC_gSCAD_mod$classifications ## Optimal lambda chosen from BIC NVC_gSCAD_mod$lambda_min ## Plot first estimated varying coefficient function t_ordered = NVC_gSCAD_mod$t_ordered beta_hat= NVC_gSCAD_mod$beta_hat plot(t_ordered, beta_hat[,1], lwd=3, type='l', col='blue', xlab="Time", ylim = c(-12,12), ylab=expression(beta[1])) ## Plot third estimated varying coefficient function plot(t_ordered, beta_hat[,3], lwd=3, type='l', col='blue', xlab="Time", ylim = c(-4,2), ylab=expression(beta[3])) ## Plot fifth estimated varying coefficient function plot(t_ordered, beta_hat[,5], lwd=3, type='l', col='blue', xlab="Time", ylim = c(0,15), ylab=expression(beta[5]))
## Load data data(SimulatedData) attach(SimulatedData) y = SimulatedData$y t = SimulatedData$t id = SimulatedData$id X = SimulatedData[,4:103] ## Fit frequentist penalized NVC model with the SCAD penalty. ## Can set penalty as "gLASSO", "gSCAD", or "gMCP". ## No need to specify an 'id' argument when using NVC_frequentist() function NVC_gSCAD_mod = NVC_frequentist(y, t, X, penalty="gSCAD") ## Classifications. First varying coefficients are selected as nonzero NVC_gSCAD_mod$classifications ## Optimal lambda chosen from BIC NVC_gSCAD_mod$lambda_min ## Plot first estimated varying coefficient function t_ordered = NVC_gSCAD_mod$t_ordered beta_hat= NVC_gSCAD_mod$beta_hat plot(t_ordered, beta_hat[,1], lwd=3, type='l', col='blue', xlab="Time", ylim = c(-12,12), ylab=expression(beta[1])) ## Plot third estimated varying coefficient function plot(t_ordered, beta_hat[,3], lwd=3, type='l', col='blue', xlab="Time", ylim = c(-4,2), ylab=expression(beta[3])) ## Plot fifth estimated varying coefficient function plot(t_ordered, beta_hat[,5], lwd=3, type='l', col='blue', xlab="Time", ylim = c(0,15), ylab=expression(beta[5]))
This is a function to predict the responses for new subjects
at new time points
with new covariates
. The function accepts an estimated NVC model that was fit using either the
NVC_SSL
or NVC_frequentist
functions and returns the predicted 's. This function can be used for either out-of-sample predictions or for in-sample predictions if the "new" subjects are the same as the ones used to obtain the fitted NVC model.
NVC_predict(NVC_mod, t_new, id_new, X_new)
NVC_predict(NVC_mod, t_new, id_new, X_new)
NVC_mod |
an object with a fitted NVC model returned by the |
t_new |
vector of new observation times |
id_new |
vector of new labels, where a label corresponds to one of the new subjects |
X_new |
new design matrix with columns |
The function returns a list containing the following components:
id |
vector of each |
time |
vector of each |
y_pred |
vector of predicted responses corresponding to each |
Bai, R., Boland, M. R., and Chen, Y. (2023). "Scalable high-dimensional Bayesian varying coefficient models with unknown within-subject covariance." arXiv pre-print arXiv:arXiv:1907.06477.
## Load simulated data data(SimulatedData) attach(SimulatedData) y = SimulatedData$y t = SimulatedData$t id = SimulatedData$id X = SimulatedData[,4:103] ## Fit frequentist penalized NVC model with the group lasso penalty. ## No need to specify an 'id' argument when using NVC_frequentist() function. NVC_gLASSO_mod = NVC_frequentist(y=y, t=t, X=X, penalty="gLASSO") ## Make in-sample predictions. Here, we DO need to specify 'id' argument NVC_gLASSO_predictions = NVC_predict(NVC_gLASSO_mod, t_new=t, id_new=id, X_new=X) ## Subjects NVC_gLASSO_predictions$id ## Observation times NVC_gLASSO_predictions$time ## Predicted responses NVC_gLASSO_predictions$y_pred ## Fit NVC-SSL model to the data instead. Here, we do need to specify id NVC_SSL_mod = NVC_SSL(y=y, t=t, id=id, X=X) NVC_SSL_predictions = NVC_predict(NVC_SSL_mod, t_new = t, id_new=id, X_new=X) ## Subjects NVC_SSL_predictions$id ## Observation times NVC_SSL_predictions$time ## Predicted responses NVC_SSL_predictions$y_pred
## Load simulated data data(SimulatedData) attach(SimulatedData) y = SimulatedData$y t = SimulatedData$t id = SimulatedData$id X = SimulatedData[,4:103] ## Fit frequentist penalized NVC model with the group lasso penalty. ## No need to specify an 'id' argument when using NVC_frequentist() function. NVC_gLASSO_mod = NVC_frequentist(y=y, t=t, X=X, penalty="gLASSO") ## Make in-sample predictions. Here, we DO need to specify 'id' argument NVC_gLASSO_predictions = NVC_predict(NVC_gLASSO_mod, t_new=t, id_new=id, X_new=X) ## Subjects NVC_gLASSO_predictions$id ## Observation times NVC_gLASSO_predictions$time ## Predicted responses NVC_gLASSO_predictions$y_pred ## Fit NVC-SSL model to the data instead. Here, we do need to specify id NVC_SSL_mod = NVC_SSL(y=y, t=t, id=id, X=X) NVC_SSL_predictions = NVC_predict(NVC_SSL_mod, t_new = t, id_new=id, X_new=X) ## Subjects NVC_SSL_predictions$id ## Observation times NVC_SSL_predictions$time ## Predicted responses NVC_SSL_predictions$y_pred
This function implements the Nonparametric Varying Coefficient Spike-and-Slab Lasso (NVC-SSL) model of Bai et al. (2023) for high-dimensional NVC models. The function returns the MAP estimator for the varying coefficients obtained from the ECM algorithm described in Bai et al. (2023). The BIC criterion is used to select the optimal spike hyperparameter
.
If the user specifies return_CI=TRUE
, then this function will also return the 95 percent pointwise posterior credible intervals for the varying coefficients obtained from Gibbs sampling. If the number of covariates
is large, then the user can additionally use the approximate MCMC algorithm introduced in Bai et al. (2023) (
approx_MCMC=TRUE
) which is much faster than the exact Gibbs sampler and gives higher simultaneous coverage.
Finally, this function returns the number of iterations and the runtime for the ECM algorithms and MCMC algorithms which can be used for benchmarking and timing comparisons.
NVC_SSL(y, t, id, X, n_basis=8, lambda0=seq(from=300,to=10,by=-10), lambda1=1, a=1, b=ncol(X), c0=1, d0=1, nu=n_basis+2, Phi=diag(n_basis), include_intercept=TRUE, tol=1e-6, max_iter=100, return_CI=FALSE, approx_MCMC=FALSE, n_samples=1500, burn=500, print_iter=TRUE)
NVC_SSL(y, t, id, X, n_basis=8, lambda0=seq(from=300,to=10,by=-10), lambda1=1, a=1, b=ncol(X), c0=1, d0=1, nu=n_basis+2, Phi=diag(n_basis), include_intercept=TRUE, tol=1e-6, max_iter=100, return_CI=FALSE, approx_MCMC=FALSE, n_samples=1500, burn=500, print_iter=TRUE)
y |
|
t |
|
id |
|
X |
|
n_basis |
number of basis functions to use. Default is |
lambda0 |
grid of spike hyperparameters. Default is to tune |
lambda1 |
slab hyperparameter. Default is |
a |
hyperparameter in |
b |
hyperparameter in |
c0 |
hyperparameter in Inverse-Gamma |
d0 |
hyperparameter in Inverse-Gamma |
nu |
degrees of freedom for Inverse-Wishart prior on |
Phi |
scale matrix in the Inverse-Wishart prior on |
include_intercept |
Boolean variable for whether or not to include an intercept function |
tol |
convergence criteria for the ECM algorithm. Default is |
max_iter |
maximum number of iterations to run ECM algorithm. Default is |
return_CI |
Boolean variable for whether or not to return the 95 percent pointwise credible bands. Set |
approx_MCMC |
Boolean variable for whether or not to run the approximate MCMC algorithm instead of the exact MCMC algorithm. If |
n_samples |
number of MCMC samples to save for posterior inference. The default is to save |
burn |
number of initial MCMC samples to discard during the warm-up period. Default is |
print_iter |
Boolean variable for whether or not to print the progress of the algorithms. Default is |
The function returns a list containing the following components:
t_ordered |
all |
classifications |
|
beta_hat |
|
beta0_hat |
MAP estimate of the intercept function |
gamma_hat |
MAP estimates of the basis coefficients (needed for prediction) for the optimal |
beta_post_mean |
|
beta_CI_lower |
|
beta_CI_upper |
|
beta0_post_mean |
Posterior mean estimate of the intercept function |
beta0_CI_lower |
Lower endpoints of the 95 percent pointwise posterior credible intervals (CIs) for the intercept function |
beta0_CI_upper |
Upper endpoints of the 95 percent pointwise posterior credible intervals (CIs) for the intercept function |
gamma_post_mean |
Posterior mean estimates of all the basis coefficients. This is not returned if |
gamma_CI_lower |
Lower endpoints of the 95 percent posterior credible intervals for the basis coefficients. This is not returned if |
gamma_CI_upper |
Upper endpoints of the 95 percent posterior credible intervals for the basis coefficients. This is not returned if |
post_incl |
|
lambda0_min |
the individual |
lambda0_all |
grid of all |
BIC_all |
|
beta_est_all_lambda0 |
list of length |
beta0_est_all_lambda0 |
|
gamma_est_all_lambda0 |
|
classifications_all_lambda0 |
|
ECM_iters_to_converge |
number of iterations it took for the ECM algorithm to converge for each entry in |
ECM_runtimes |
|
gibbs_runtime |
number of minutes it took for the Gibbs sampling algorithm to run for the total number of MCMC iterations given in |
gibbs_iters |
total number of MCMC iterations run for posterior inference |
Bai, R., Boland, M. R., and Chen, Y. (2023). "Scalable high-dimensional Bayesian varying coefficient models with unknown within-subject covariance." arXiv pre-print arXiv:arXiv:1907.06477.
Bai, R., Moran, G. E., Antonelli, J. L., Chen, Y., and Boland, M.R. (2022). "Spike-and-slab group lassos for grouped regression and sparse generalized additive models." Journal of the American Statistical Association, 117:184-197.
## Load data data(SimulatedData) attach(SimulatedData) y = SimulatedData$y t = SimulatedData$t id = SimulatedData$id X = SimulatedData[,4:103] ## Fit NVC-SSL model. Default implementation uses a grid of 30 lambdas. ## Below illustration uses just two well-chosen lambdas NVC_SSL_mod = NVC_SSL(y, t, id, X, lambda0=c(60,50)) ## NOTE: Should use default, which will search for lambda0 from a bigger grid # NVC_SSL_mod = NVC_SSL(y, t, id, X) ## Classifications. First 6 varying coefficients are selected as nonzero NVC_SSL_mod$classifications ## Optimal lambda chosen from BIC NVC_SSL_mod$lambda0_min ## Plot first estimated varying coefficient function t_ordered = NVC_SSL_mod$t_ordered beta_hat= NVC_SSL_mod$beta_hat plot(t_ordered, beta_hat[,1], lwd=3, type='l', col='blue', xlab="Time", ylim = c(-12,12), ylab=expression(beta[1])) ## Plot third estimated varying coefficient function plot(t_ordered, beta_hat[,3], lwd=3, type='l', col='blue', xlab="Time", ylim = c(-4,2), ylab=expression(beta[3])) ## Plot fifth estimated varying coefficient function plot(t_ordered, beta_hat[,5], lwd=3, type='l', col='blue', xlab="Time", ylim = c(0,15), ylab=expression(beta[5])) ## If you want credible intervals, then set return_CI=TRUE to also run Gibbs sampler. ## Below, we run a total of 1000 MCMC iterations, discarding the first 500 as burnin ## and keeping the final 500 samples for inference. NVC_SSL_mod_2 = NVC_SSL(y, t, id, X, return_CI=TRUE, approx_MCMC=FALSE, n_samples=500, burn=500) ## Note that NVC_SSL() always computes a MAP estimator first and then ## initializes the Gibbs sampler with the MAP estimator. ## Plot third varying coefficient function and its credible bands t_ordered = NVC_SSL_mod_2$t_ordered beta_MAP = NVC_SSL_mod_2$beta_hat beta_mean = NVC_SSL_mod_2$beta_post_mean beta_CI_lower = NVC_SSL_mod_2$beta_CI_lower beta_CI_upper = NVC_SSL_mod_2$beta_CI_upper plot(t_ordered, beta_MAP[,3], lwd=3, type='l', col='blue', xlab="Time", ylim=c(-5,3), lty=1, ylab=expression(beta[3]), cex.lab=1.5) lines(t_ordered, beta_mean[,3], lwd=3, type='l', col='red', lty=4) lines(t_ordered, beta_CI_lower[,3], lwd=4, type='l', col='purple', lty=3) lines(t_ordered, beta_CI_upper[,3], lwd=4, type='l', col='purple', lty=3) legend("bottomleft", c("MAP", "Mean", "95 percent CI"), lty=c(1,4,3), lwd=c(2,2,3), col=c("blue","red","purple"), inset=c(0,1), xpd=TRUE, horiz=TRUE, bty="n") ## Plot fifth varying coefficient function and its credible bands plot(t_ordered, beta_MAP[,5], lwd=3, type='l', col='blue', xlab="Time", ylim=c(-1,14), lty=1, ylab=expression(beta[5]), cex.lab=1.5) lines(t_ordered, beta_mean[,5], lwd=3, type='l', col='red', lty=4) lines(t_ordered, beta_CI_lower[,5], lwd=4, type='l', col='purple', lty=3) lines(t_ordered, beta_CI_upper[,5], lwd=4, type='l', col='purple', lty=3) legend("bottomleft", c("MAP", "Mean", "95 percent CI"), lty=c(1,4,3), lwd=c(2,2,3), col=c("blue","red","purple"), inset=c(0,1), xpd=TRUE, horiz=TRUE, bty="n")
## Load data data(SimulatedData) attach(SimulatedData) y = SimulatedData$y t = SimulatedData$t id = SimulatedData$id X = SimulatedData[,4:103] ## Fit NVC-SSL model. Default implementation uses a grid of 30 lambdas. ## Below illustration uses just two well-chosen lambdas NVC_SSL_mod = NVC_SSL(y, t, id, X, lambda0=c(60,50)) ## NOTE: Should use default, which will search for lambda0 from a bigger grid # NVC_SSL_mod = NVC_SSL(y, t, id, X) ## Classifications. First 6 varying coefficients are selected as nonzero NVC_SSL_mod$classifications ## Optimal lambda chosen from BIC NVC_SSL_mod$lambda0_min ## Plot first estimated varying coefficient function t_ordered = NVC_SSL_mod$t_ordered beta_hat= NVC_SSL_mod$beta_hat plot(t_ordered, beta_hat[,1], lwd=3, type='l', col='blue', xlab="Time", ylim = c(-12,12), ylab=expression(beta[1])) ## Plot third estimated varying coefficient function plot(t_ordered, beta_hat[,3], lwd=3, type='l', col='blue', xlab="Time", ylim = c(-4,2), ylab=expression(beta[3])) ## Plot fifth estimated varying coefficient function plot(t_ordered, beta_hat[,5], lwd=3, type='l', col='blue', xlab="Time", ylim = c(0,15), ylab=expression(beta[5])) ## If you want credible intervals, then set return_CI=TRUE to also run Gibbs sampler. ## Below, we run a total of 1000 MCMC iterations, discarding the first 500 as burnin ## and keeping the final 500 samples for inference. NVC_SSL_mod_2 = NVC_SSL(y, t, id, X, return_CI=TRUE, approx_MCMC=FALSE, n_samples=500, burn=500) ## Note that NVC_SSL() always computes a MAP estimator first and then ## initializes the Gibbs sampler with the MAP estimator. ## Plot third varying coefficient function and its credible bands t_ordered = NVC_SSL_mod_2$t_ordered beta_MAP = NVC_SSL_mod_2$beta_hat beta_mean = NVC_SSL_mod_2$beta_post_mean beta_CI_lower = NVC_SSL_mod_2$beta_CI_lower beta_CI_upper = NVC_SSL_mod_2$beta_CI_upper plot(t_ordered, beta_MAP[,3], lwd=3, type='l', col='blue', xlab="Time", ylim=c(-5,3), lty=1, ylab=expression(beta[3]), cex.lab=1.5) lines(t_ordered, beta_mean[,3], lwd=3, type='l', col='red', lty=4) lines(t_ordered, beta_CI_lower[,3], lwd=4, type='l', col='purple', lty=3) lines(t_ordered, beta_CI_upper[,3], lwd=4, type='l', col='purple', lty=3) legend("bottomleft", c("MAP", "Mean", "95 percent CI"), lty=c(1,4,3), lwd=c(2,2,3), col=c("blue","red","purple"), inset=c(0,1), xpd=TRUE, horiz=TRUE, bty="n") ## Plot fifth varying coefficient function and its credible bands plot(t_ordered, beta_MAP[,5], lwd=3, type='l', col='blue', xlab="Time", ylim=c(-1,14), lty=1, ylab=expression(beta[5]), cex.lab=1.5) lines(t_ordered, beta_mean[,5], lwd=3, type='l', col='red', lty=4) lines(t_ordered, beta_CI_lower[,5], lwd=4, type='l', col='purple', lty=3) lines(t_ordered, beta_CI_upper[,5], lwd=4, type='l', col='purple', lty=3) legend("bottomleft", c("MAP", "Mean", "95 percent CI"), lty=c(1,4,3), lwd=c(2,2,3), col=c("blue","red","purple"), inset=c(0,1), xpd=TRUE, horiz=TRUE, bty="n")
This is a simulated dataset for illustration. It contains a total of observations at irregularly spaced time points for
subjects. There are
covariates.
data(SimulatedData)
data(SimulatedData)
This simulated dataset contains observations for
subjects, with
covariates. The first column
y
gives the response variables, the second column t
gives the observation times, the third column id
gives the unique IDs for each of the 50 subjects, and columns 4-103 (x1
, ..., x100
) give the covariate values.
This synthetic dataset is a slight modification from Experiment 2 in Section 5.1 of Bai et al. (2023). We use for illustration, instead of
as in the paper.
Bai, R., Boland, M. R., and Chen, Y. (2023). "Scalable high-dimensional Bayesian varying coefficient models with unknown within-subject covariance." arXiv pre-print arXiv:1907.06477.