Title: | Bayesian Penalized Quantile Regression |
---|---|
Description: | Bayesian regularized quantile regression utilizing sparse priors to impose exact sparsity leads to efficient Bayesian shrinkage estimation, variable selection and statistical inference. In this package, we have implemented robust Bayesian variable selection with spike-and-slab priors under high-dimensional linear regression models (Fan et al. (2024) <doi:10.3390/e26090794> and Ren et al. (2023) <doi:10.1111/biom.13670>), and regularized quantile varying coefficient models (Zhou et al. (2023) <doi:10.1016/j.csda.2023.107808>). In particular, valid robust Bayesian inferences under both models in the presence of heavy-tailed errors can be validated on finite samples. The Markov Chain Monte Carlo (MCMC) algorithms of the proposed and alternative models are implemented in C++. |
Authors: | Kun Fan [aut], Cen Wu [aut, cre], Jie Ren [aut], Fei Zhou [aut] |
Maintainer: | Cen Wu <[email protected]> |
License: | GPL-2 |
Version: | 1.1.1 |
Built: | 2025-02-24 02:59:16 UTC |
Source: | CRAN |
In this package, we implement Bayesian penalized quantile regression for the linear regression model and the quantile varying coefficient (VC) model. Point-mass spike-and-slab priors are incorporated in the Bayesian hierarchical models to facilitate Bayesian shrinkage estimation with exact sparsity in both models. The two default methods are Bayesian regularized quantile regression with spike-and-slab priors under the linear model and the VC model, respectively. In addition to the default methods, users can also choose methods without robustness and/or spike-and-slab priors.
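To illustrate what "exact sparsity" means for a point-mass spike-and-slab prior, the following base R sketch draws coefficients that are exactly zero with some probability and otherwise come from a continuous slab. The inclusion probability `pi0`, the standard normal slab, and all variable names are assumptions made for illustration; they are not taken from the package's hierarchical model.

```r
# Minimal sketch of a point-mass spike-and-slab draw (illustrative only).
set.seed(2024)
p   <- 10      # number of coefficients (hypothetical)
pi0 <- 0.3     # assumed prior inclusion probability

gamma <- rbinom(p, size = 1, prob = pi0)  # 1 = slab (included), 0 = spike (excluded)
slab  <- rnorm(p, mean = 0, sd = 1)       # assumed slab distribution
beta  <- gamma * slab                     # excluded coefficients are exactly zero

print(cbind(gamma, beta))
```

A continuous shrinkage prior would pull small coefficients toward zero without setting them exactly to zero, which is why the point-mass version allows selection directly from the inclusion indicators.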
The user-friendly, integrated interface pqrBayes() allows users to flexibly choose the model to fit by specifying the following parameters:
robust: | whether to fit a robust sparse quantile regression model or a quantile varying coefficient model, or their non-robust counterparts. |
sparse: | whether to use the spike-and-slab priors to impose exact sparsity. |
model: | whether to fit a linear model or a varying coefficient model. |
The function pqrBayes() returns a pqrBayes object that stores the posterior estimates of regression coefficients.
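The three switches described above combine into eight possible method configurations. The base R sketch below simply enumerates them and flags the two defaults (robust and sparse, one per model); the data frame and its column names are only an illustration, not an object provided by the package.

```r
# Enumerate the combinations of the robust, sparse and model switches.
methods <- expand.grid(
  robust = c(TRUE, FALSE),
  sparse = c(TRUE, FALSE),
  model  = c("linear", "VC"),
  stringsAsFactors = FALSE
)

# The default methods use robust = TRUE and sparse = TRUE under each model.
methods$default <- methods$robust & methods$sparse
print(methods)
```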
Fan, K., Subedi, S., Yang, G., Lu, X., Ren, J. and Wu, C. (2024). Is Seeing Believing? A Practitioner's Perspective on High-dimensional Statistical Inference in Cancer Genomics Studies. Entropy, 26(9), 794 doi:10.3390/e26090794
Zhou, F., Ren, J., Ma, S. and Wu, C. (2023). The Bayesian regularized quantile varying coefficient model. Computational Statistics & Data Analysis, 107808 doi:10.1016/j.csda.2023.107808
Ren, J., Zhou, F., Li, X., Ma, S., Jiang, Y. and Wu, C. (2023). Robust Bayesian variable selection for gene-environment interactions. Biometrics, 79(2), 684-694 doi:10.1111/biom.13670
Wu, C., and Ma, S. (2015). A selective review of robust variable selection with applications in bioinformatics. Briefings in Bioinformatics, 16(5), 873–883 doi:10.1093/bib/bbu046
Zhou, F., Ren, J., Lu, X., Ma, S. and Wu, C. (2021). Gene–Environment Interaction: a Variable Selection Perspective. Epistasis. Methods in Molecular Biology. 2212:191–223 https://link.springer.com/protocol/10.1007/978-1-0716-0947-7_13
Ren, J., Zhou, F., Li, X., Chen, Q., Zhang, H., Ma, S., Jiang, Y. and Wu, C. (2020). Semi-parametric Bayesian variable selection for gene-environment interactions. Statistics in Medicine, 39: 617–638 doi:10.1002/sim.8434
Ren, J., Zhou, F., Li, X., Wu, C. and Jiang, Y. (2019) spinBayes: Semi-Parametric Gene-Environment Interaction via Bayesian Variable Selection. R package version 0.1.0. https://CRAN.R-project.org/package=spinBayes
Wu, C., Jiang, Y., Ren, J., Cui, Y. and Ma, S. (2018). Dissecting gene-environment interactions: A penalized robust approach accounting for hierarchical structures. Statistics in Medicine, 37:437–456 doi:10.1002/sim.7518
Wu, C., Shi, X., Cui, Y. and Ma, S. (2015). A penalized robust semiparametric approach for gene-environment interactions. Statistics in Medicine, 34 (30): 4016–4030 doi:10.1002/sim.6609
Wu, C., Cui, Y., and Ma, S. (2014). Integrative analysis of gene–environment interactions under a multi–response partially linear varying coefficient model. Statistics in Medicine, 33(28), 4988–4998 doi:10.1002/sim.6287
Wu, C., Zhong, P.S. and Cui, Y. (2018). Additive varying–coefficient model for nonlinear gene–environment interactions. Statistical Applications in Genetics and Molecular Biology, 17(2) doi:10.1515/sagmb-2017-0008
Wu, C., Zhong, P.S. and Cui, Y. (2013). High dimensional variable selection for gene-environment interactions. Technical Report. Michigan State University.
Calculate 95% empirical coverage probabilities for regression coefficients under linear and VC models, respectively.
coverage(object,coefficient,u.grid=NULL,model="linear")
object |
the pqrBayes object. |
coefficient |
the vector of true regression coefficients under a linear model or the matrix of true varying coefficients evaluated on the grid points under a varying coefficient model. |
u.grid |
the vector of grid points under a varying coefficient model. When assessing empirical coverage probabilities under a linear model, u.grid = NULL. |
model |
the model to be fitted. Users can choose "linear" for a linear model or "VC" for a varying coefficient model. |
## The quantile regression model
data(data)
data = data$data_linear
g = data$g
y = data$y
e = data$e
coeff = data$coeff
fit1 = pqrBayes(g, y, u = NULL, e, quant = 0.5, spline = NULL, model = "linear")
coverage = coverage(fit1, coeff, model = "linear")
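The idea behind a 95% coverage check can be sketched in base R without the package: build an equal-tailed 95% credible interval from the posterior draws of each coefficient and record whether it contains the true value. The draws below are deterministic stand-ins rather than MCMC output, and the equal-tailed interval is an assumption; the internals of coverage() may differ.

```r
# Hypothetical posterior draws: rows are iterations, columns are coefficients.
draws <- cbind(
  seq(-1, 1, length.out = 1001),  # draws centred at 0
  seq(1, 3, length.out = 1001)    # draws centred at 2
)
truth <- c(0, 0)                  # assumed true coefficient values

# Equal-tailed 95% credible interval for each coefficient.
ci <- apply(draws, 2, quantile, probs = c(0.025, 0.975))

# Does each interval cover the corresponding true value?
covered <- truth >= ci[1, ] & truth <= ci[2, ]
coverage_rate <- mean(covered)
print(covered)
print(coverage_rate)
```

Averaging such indicators over repeated simulated data sets gives an empirical coverage probability, which can be compared with the nominal 95% level.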
Simulated data under sparse linear and quantile varying coefficient models
The data_linear object consists of four components: g, y, e and coeff. coeff contains the true values of the parameters used for generating the response variable Y.
The data_varying object consists of five components: g, y, u, e and coeff. coeff contains the true values of the parameters used for generating the response variable Y.
Generating Y using a sparse linear (quantile) regression model
The response is generated from a sparse linear regression model. The true coefficient values used are stored in coeff.
Generating Y using a (quantile) varying coefficient model
The response is generated from a sparse (quantile) varying coefficient model. The true parameter values used are stored in coeff.
data(data)
data = data$data_linear
g = data$g
dim(g)
y = data$y
coeff = data$coeff
print(coeff)
## reload the data set, since 'data' was overwritten above
data(data)
data = data$data_varying
g = data$g
dim(g)
coeff = data$coeff
print(coeff)
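For readers who want to build a comparable toy data set, the sketch below simulates a response from a sparse linear model with heavy-tailed errors in base R. Every number here (sample size, dimensions, coefficient values, the t distribution with 2 degrees of freedom) is a hypothetical choice and does not reproduce the settings used for the packaged data.

```r
# Simulate a sparse linear model with heavy-tailed errors (all values hypothetical).
set.seed(1)
n <- 100   # sample size
p <- 20    # predictors subject to selection
q <- 2     # clinical covariates not subject to selection

g <- matrix(rnorm(n * p), n, p)
e <- matrix(rnorm(n * q), n, q)

coeff_e <- c(1, -1)                       # effects of the clinical covariates
coeff_g <- c(1.5, -2, 1, rep(0, p - 3))   # only three predictors are nonzero

# t errors with 2 degrees of freedom give a heavy-tailed noise distribution.
y <- drop(e %*% coeff_e + g %*% coeff_g + rt(n, df = 2))

dim(g)
length(y)
```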
Calculate estimated regression coefficients with estimation accuracy from linear and quantile VC models, respectively.
estimation.pqrBayes(object,coefficient,u.grid=NULL,model="linear")
object |
an object of class ‘pqrBayes’. |
coefficient |
the vector of true quantile regression coefficients under a linear model or the matrix of true varying coefficients evaluated on the grid points under a varying coefficient model. |
u.grid |
the vector of grid points under a varying coefficient model. When fitting a linear (quantile) regression model, u.grid = NULL. |
model |
the model to be fitted. Users can choose "linear" for a linear model or "VC" for a varying coefficient model. |
an object of class ‘pqrBayes.est’ is returned, which is a list with components:
error |
mean square error or integrated mean square errors and total integrated mean square error. |
coeff.est |
estimated values of the regression coefficients or the varying coefficients. |
## The quantile regression model
data(data)
data = data$data_linear
g = data$g
y = data$y
e = data$e
coeff = data$coeff
fit1 = pqrBayes(g, y, u = NULL, e, quant = 0.5, spline = NULL, model = "linear")
estimation = estimation.pqrBayes(fit1, coeff, model = "linear")
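The two kinds of error mentioned above can be illustrated with a small base R sketch: a mean squared error between estimated and true coefficients for the linear model, and a grid-averaged squared error for one varying coefficient. The numbers and the exact averaging convention are assumptions for illustration; estimation.pqrBayes() may scale or aggregate the errors differently.

```r
# Linear model: mean squared error between estimated and true coefficients.
truth <- c(1, 0.5, 1)   # hypothetical true coefficients
est   <- c(1, 0,   2)   # hypothetical posterior estimates
mse   <- mean((est - truth)^2)

# Varying coefficient model: squared error averaged over grid points of u,
# used here as a simple approximation to an integrated squared error.
u.grid     <- seq(0, 1, length.out = 5)
true_curve <- 2 * u.grid          # hypothetical true varying coefficient
est_curve  <- 2 * u.grid + 0.1    # hypothetical estimate with a constant bias
imse       <- mean((est_curve - true_curve)^2)

print(c(mse = mse, imse = imse))
```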
Fit Bayesian penalized quantile regression for linear and varying coefficient models
pqrBayes(
  g, y, u = NULL, e,
  quant = 0.5,
  iterations = 10000,
  burn.in = NULL,
  spline,
  robust = TRUE,
  sparse = TRUE,
  model = "linear",
  hyper = NULL,
  debugging = FALSE
)
g |
the matrix of predictors (subject to selection). Users do not need to specify an intercept which will be automatically included. |
y |
the response variable. The current version only supports the continuous response. |
u |
a vector of the effect-modifying variable in the quantile varying coefficient model. When fitting a linear model, u = NULL. |
e |
a matrix of clinical covariates not subject to selection. |
quant |
the quantile level specified by users. The default value is 0.5. |
iterations |
the number of MCMC iterations. The default value is 10,000. |
burn.in |
the number of burn-in iterations. If NULL, the first half of MCMC iterations will be used as burn-ins. |
spline |
a list of the number of interior knots (kn) for B-spline and the degree of B-spline basis (degree). When fitting a linear model, spline = NULL. |
robust |
logical flag. If TRUE, robust methods will be used. Otherwise, non-robust methods will be used. The default value is TRUE. |
sparse |
logical flag. If TRUE, spike-and-slab priors will be adopted to impose exact sparsity on regression coefficients. Otherwise, Laplacian shrinkage will be adopted. The default value is TRUE. |
model |
the model to be fitted. Users can specify "linear" for a linear model and "VC" for a varying coefficient model. |
hyper |
a named list of hyper-parameters. The default value is NULL. |
debugging |
logical flag. If TRUE, progress will be output to the console and extra information will be returned. The default value is FALSE. |
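The burn-in rule stated for burn.in (use the first half of the iterations when it is NULL) can be written out as a short base R sketch. The helper `resolve_burn_in()` is a hypothetical name introduced here for illustration and is not a function exported by the package.

```r
# Hypothetical helper mirroring the documented default for burn.in.
resolve_burn_in <- function(iterations, burn.in = NULL) {
  if (is.null(burn.in)) floor(iterations / 2) else burn.in
}

iterations <- 10000
chain <- seq_len(iterations)   # stand-in for the MCMC draws of one parameter

# Discard the burn-in draws and keep the rest for posterior summaries.
kept <- chain[-seq_len(resolve_burn_in(iterations))]
length(kept)
```

With the default of 10,000 iterations, this leaves 5,000 draws for posterior summaries.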
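The quantile regression model described in "data" is:
$$Y_i = \sum_{k=1}^{q} E_{ik}\beta_k + \sum_{j=1}^{p} X_{ij}\gamma_j + \epsilon_i,$$
where $\beta_k$'s are the regression coefficients for clinical covariates and $\gamma_j$'s are the regression coefficients of $X_{ij}$.
The quantile varying coefficient model described in "data" is:
$$Y_i = \sum_{k=1}^{q} E_{ik}\beta_k + \gamma_0(U_i) + \sum_{j=1}^{p} \gamma_j(U_i) X_{ij} + \epsilon_i,$$
where $\beta_k$'s are the regression coefficients for the clinical covariates, and $\gamma_0$ and $\gamma_j$'s are the varying intercept and varying coefficients for predictors (e.g. genetic factors), respectively.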
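As a rough illustration of how a spline specification such as list(kn = 2, degree = 2) translates into a basis for a varying coefficient, the sketch below builds a B-spline basis with the base R splines package. Placing the interior knots at equally spaced quantiles of u, including the intercept column, and fixing the boundary knots at 0 and 1 are assumptions made here; the package's own basis construction may differ.

```r
library(splines)

spline <- list(kn = 2, degree = 2)       # same form as the 'spline' argument
u <- seq(0.05, 0.95, length.out = 50)    # hypothetical effect-modifying variable

# Assumed knot placement: equally spaced quantiles of u.
interior <- quantile(u, probs = seq_len(spline$kn) / (spline$kn + 1))

# B-spline basis with kn + degree + 1 columns when the intercept is included.
B <- bs(u, knots = interior, degree = spline$degree,
        intercept = TRUE, Boundary.knots = c(0, 1))
dim(B)
```

Under this construction, a varying coefficient evaluated at u is approximated by the product of B and a vector of spline coefficients, so selecting a varying effect amounts to selecting a group of spline coefficients.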
Users can modify the hyper-parameters by providing a named list of hyper-parameters via the argument ‘hyper’. The list can have the following named components:
the shape parameters of the Beta priors.
the shape parameter and the rate parameter of the Gamma prior.
Please check the references for more details about the prior distributions.
an object of class "pqrBayes" is returned, which is a list with components:
obj |
a list of posterior samples from the MCMC and other parameters |
coefficients |
a list of posterior estimates of coefficients |
Fan, K., Subedi, S., Yang, G., Lu, X., Ren, J. and Wu, C. (2024). Is Seeing Believing? A Practitioner's Perspective on High-dimensional Statistical Inference in Cancer Genomics Studies. Entropy, 26(9), 794 doi:10.3390/e26090794
Zhou, F., Ren, J., Ma, S. and Wu, C. (2023). The Bayesian regularized quantile varying coefficient model. Computational Statistics & Data Analysis, 107808 doi:10.1016/j.csda.2023.107808
Ren, J., Zhou, F., Li, X., Ma, S., Jiang, Y. and Wu, C. (2023). Robust Bayesian variable selection for gene-environment interactions. Biometrics, 79(2), 684-694 doi:10.1111/biom.13670
Ren, J., Zhou, F., Li, X., Chen, Q., Zhang, H., Ma, S., Jiang, Y. and Wu, C. (2020). Semi-parametric Bayesian variable selection for gene-environment interactions. Statistics in Medicine, 39: 617–638 doi:10.1002/sim.8434
## The quantile regression model
data(data)
data = data$data_linear
g = data$g
y = data$y
e = data$e
fit1 = pqrBayes(g, y, u = NULL, e, quant = 0.5, spline = NULL, model = "linear")

## non-sparse
sparse = FALSE
fit2 = pqrBayes(g, y, u = NULL, e, quant = 0.5, spline = NULL, sparse = sparse, model = "linear")

## non-robust
robust = FALSE
fit3 = pqrBayes(g, y, u = NULL, e, quant = 0.5, spline = NULL, robust = robust, model = "linear")

## The quantile varying coefficient model
data(data)
data = data$data_varying
g = data$g
y = data$y
u = data$u
e = data$e
spline = list(kn = 2, degree = 2)
fit1 = pqrBayes(g, y, u, e, quant = 0.5, spline = spline, model = "VC")

## non-sparse
sparse = FALSE
fit2 = pqrBayes(g, y, u, e, quant = 0.5, spline = spline, sparse = sparse, model = "VC")

## non-robust
robust = FALSE
fit3 = pqrBayes(g, y, u, e, quant = 0.5, spline = spline, robust = robust, model = "VC")
Variable selection for a pqrBayes object
pqrBayes.select(object,sparse=T,model="linear")
object |
a pqrBayes object. |
sparse |
logical flag. If TRUE, the sparse model is used for variable selection. The default value is TRUE. |
model |
the model to be fitted. Users can choose "linear" for a linear model or "VC" for a varying coefficient model. |
For class ‘Sparse’, the median probability model (MPM) (Barbieri and Berger, 2004) is used to identify predictors that are significantly associated with the response variable. For class ‘NonSparse’, variable selection is based on the 95% credible interval. Please check the references for more details about the variable selection.
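Both selection rules can be sketched in a few lines of base R. The median probability model keeps predictors whose posterior inclusion probability is at least 0.5, and the interval rule keeps predictors whose 95% credible interval excludes zero. The indicator matrix and the draws below are hypothetical stand-ins for MCMC output, and the equal-tailed interval is an assumption; pqrBayes.select() may organise these quantities differently.

```r
# Median probability model: rows are MCMC iterations, columns are predictors,
# entries are inclusion indicators (hypothetical values).
inclusion <- rbind(
  c(1, 0, 1),
  c(1, 0, 0),
  c(1, 1, 1),
  c(1, 0, 1)
)
pip <- colMeans(inclusion)        # posterior inclusion probabilities
mpm_selected <- which(pip >= 0.5)

# Credible-interval rule for a non-sparse fit: keep predictors whose
# equal-tailed 95% interval does not contain zero.
draws <- cbind(
  seq(0.5, 1.5, length.out = 1001),  # interval lies above zero
  seq(-1, 1, length.out = 1001)      # interval contains zero
)
ci <- apply(draws, 2, quantile, probs = c(0.025, 0.975))
ci_selected <- which(ci[1, ] > 0 | ci[2, ] < 0)

print(mpm_selected)
print(ci_selected)
```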
an object of class ‘select’ is returned, which includes the indices of the selected predictors (e.g. genetic factors).
Ren, J., Zhou, F., Li, X., Ma, S., Jiang, Y. and Wu, C. (2023). Robust Bayesian variable selection for gene-environment interactions. Biometrics, 79(2), 684-694 doi:10.1111/biom.13670
Barbieri, M.M. and Berger, J.O. (2004). Optimal predictive model selection. Ann. Statist, 32(3):870–897
## The quantile regression model
data(data)
data = data$data_linear
g = data$g
y = data$y
e = data$e
fit1 = pqrBayes(g, y, u = NULL, e, quant = 0.5, spline = NULL, model = "linear")
sparse = TRUE
select = pqrBayes.select(object = fit1, sparse = sparse, model = "linear")

## The quantile varying coefficient model
data(data)
data = data$data_varying
g = data$g
y = data$y
u = data$u
e = data$e
spline = list(kn = 2, degree = 2)
fit1 = pqrBayes(g, y, u, e, quant = 0.5, spline = spline, model = "VC")
sparse = TRUE
select = pqrBayes.select(object = fit1, sparse = sparse, model = "VC")
select

## non-sparse
sparse = FALSE
spline = list(kn = 2, degree = 2)
fit2 = pqrBayes(g, y, u, e, quant = 0.5, spline = spline, sparse = sparse, model = "VC")
select = pqrBayes.select(object = fit2, sparse = FALSE, model = "VC")
select
Make predictions from a pqrBayes object
predict_pqrBayes(object, g.new, u.new, e.new, y.new, quant, model, ...)
object |
a pqrBayes object. |
g.new |
a matrix of new predictors (e.g. genetic factors) at which predictions are to be made. When being applied to the linear model, g.new = g. |
u.new |
a vector of the new environmental factor at which predictions are to be made. When being applied to the linear model, u.new = NULL. |
e.new |
a vector or matrix of new clinical covariates at which predictions are to be made. When being applied to the linear model, e.new = e. |
y.new |
a vector of the response of new observations. When being applied to the linear model, y.new = y. |
quant |
the quantile level. The default is 0.5. |
model |
the model to be fitted. The default is "VC" for a quantile varying coefficient model. Users can also specify "linear" for a linear model. |
... |
other predict arguments |
g.new (u.new) must have the same number of columns as g (u) used for fitting the model. By default, the clinical covariates are NULL unless provided. The predictions are made based on the posterior estimates of coefficients in the pqrBayes object.
an object of class ‘pqrBayes.pred’ is returned, which is a list with components:
error |
prediction error. |
y.pred |
predicted values of the new observations. |
## The quantile regression model
data(data)
data = data$data_linear
g = data$g
y = data$y
e = data$e
fit1 = pqrBayes(g, y, u = NULL, e, quant = 0.5, spline = NULL, model = "linear")
prediction = predict_pqrBayes(fit1, g, u.new = NULL, e.new = e, y.new = y, model = "linear")
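The documentation above does not state how the prediction error is computed. One common choice for quantile models is the average check loss at the chosen quantile level, sketched below in base R; the function name `check_loss()` and the toy residuals are assumptions, and predict_pqrBayes() may use a different measure.

```r
# Check (quantile) loss: rho_tau(r) = r * (tau - I(r < 0)).
check_loss <- function(residual, quant = 0.5) {
  residual * (quant - as.numeric(residual < 0))
}

y.new  <- c(2, 0)   # hypothetical observed responses
y.pred <- c(1, 1)   # hypothetical predicted values
residual <- y.new - y.pred

# Average check loss at the median and at the 0.9 quantile.
err_median <- mean(check_loss(residual, quant = 0.5))
err_upper  <- mean(check_loss(residual, quant = 0.9))
print(c(err_median, err_upper))
```

At quant = 0.5 the check loss is half the absolute error, so positive and negative residuals are penalised equally; at other quantile levels the penalty becomes asymmetric.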
Print a pqrBayes result
## S3 method for class 'pqrBayes'
print(x, digits = max(3, getOption("digits") - 3), ...)
x |
pqrBayes result. |
digits |
significant digits in printout. |
... |
other print arguments. |
No return value, called for side effects.
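The digits default used by the print methods can be seen in a toy S3 method. The class `toy_fit` and its contents are hypothetical and only mimic the calling convention; unlike the method documented here, the sketch returns its argument invisibly, which is a common convention for print methods.

```r
# Toy S3 print method using the same 'digits' default (hypothetical class).
print.toy_fit <- function(x, digits = max(3, getOption("digits") - 3), ...) {
  cat("Coefficients:\n")
  print(signif(x$coefficients, digits), ...)
  invisible(x)
}

fit <- structure(
  list(coefficients = c(1.23456789, -0.000987654)),
  class = "toy_fit"
)
print(fit)
```

With R's standard setting of options(digits = 7), the default evaluates to 4 significant digits.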
Print a summary of a pqrBayes.pred object
## S3 method for class 'pqrBayes.pred'
print(x, digits = max(3, getOption("digits") - 3), ...)
x |
pqrBayes.pred object. |
digits |
significant digits in printout. |
... |
other print arguments |
No return value, called for side effects.
Print a summary of a pqrBayes.select object
## S3 method for class 'pqrBayes.select'
print(x, digits = max(3, getOption("digits") - 3), ...)
x |
pqrBayes.select object. |
digits |
significant digits in printout. |
... |
other print arguments |
No return value, called for side effects.