Title: | Augmented Bayesian Model Selection for Regression Models |
---|---|
Description: | Tools to perform model selection alongside estimation under Linear, Logistic, Negative binomial, Quantile, and Skew-Normal regression. Under the spike-and-slab method, a probability for each possible model is estimated with the posterior mean, credibility interval, and standard deviation of coefficients and parameters under the most probable model. |
Authors: | Francisco Segovia [aut, cre], Luis Gutierres [aut], Ramses Mena [aut] |
Maintainer: | Francisco Segovia <[email protected]> |
License: | GPL (>= 3) |
Version: | 0.1 |
Built: | 2025-03-13 15:22:38 UTC |
Source: | CRAN |
ens
is a data-base that reo collected by the Chilean National Health Survey in order to study the health status of the population and policy making. This survey was performed between 2016 and 2017
ens
ens
A Data frame with 3238 rows and 15 variables (columns):
pas
Blood systolic pressure
pad
Blood diastolic pressure
age
In years
waist
diameter in centimeters
bmi
body mass index
weight
In centimeters
height
In centimeters
hypertension
Presence of hypertension (1: Yes, 0: No)
sedentary
sedentary person (1: Yes, 0: No)
smoker
1: Yes, 0: No
diabetes
1: Yes, 0: No
depression
1: Yes, 0: No
male
1: Yes, 0: female
scholar
Years of formal education
Chilean Health Ministry (https://epi.minsal.cl/encuesta-ens-anteriores/)
data(ens)
data(ens)
It generates N
observations of the Binomial distribution with parameters ni
(the i-th's individual sizes) and p
(the success probability), where the coefficients are indexed on p
via the logistic function.
gen_base_binomial_reg(N, beta, Covariates, ni = rep(1, N))
gen_base_binomial_reg(N, beta, Covariates, ni = rep(1, N))
N |
The number of observations that will be generated. It must be a positive integer. |
beta |
A vector of coefficients including the intercept. It can be a matrix. |
Covariates |
A data.frame object with the predictors (without intercept) for which we want to test if they are relevant to the response variable. It can also be a ( |
ni |
A vector of size |
The function return a table with the sample of size N from the Binomial distribution indexed with the predictors indicated in the Covariates
argument, the ni
, the number of failures (ni
- y
), and the predictors for each individual .
N<-200 #Number of extractions beta<-c(1, 0, 2, 0, 3, 2) #Coefficient vector p<-length(beta) aux_cov<-rnorm((p-1)*N, 0,1) Covariates<-data.frame(matrix(aux_cov, ncol=p-1, nrow=N)) #Generating the Covariates data.frame colnames(Covariates)<-c("X1", "X2", "X3", "X4", "X5") base<-gen_base_binomial_reg(N, beta, Covariates, ni=rep(1, N)) #Generating the data base
N<-200 #Number of extractions beta<-c(1, 0, 2, 0, 3, 2) #Coefficient vector p<-length(beta) aux_cov<-rnorm((p-1)*N, 0,1) Covariates<-data.frame(matrix(aux_cov, ncol=p-1, nrow=N)) #Generating the Covariates data.frame colnames(Covariates)<-c("X1", "X2", "X3", "X4", "X5") base<-gen_base_binomial_reg(N, beta, Covariates, ni=rep(1, N)) #Generating the data base
It generates N
observations of the Negative binomial distribution with parameters r
(number of success) and (success probability), where the coefficients are indexed on
via the logistic function.
gen_base_NegBinomial_reg(N, beta, r, Covariates)
gen_base_NegBinomial_reg(N, beta, r, Covariates)
N |
The number of observations that will be generated. It must be a positive integer. |
beta |
A vector of coefficients including the intercept. It can be a matrix. |
r |
The number of success parameter. It must be a positive integer. |
Covariates |
A data.frame object with the predictors (without intercept) for which we want to test if they are relevant to the response variable. It can also be a ( |
The function return a sample of size N from the Negative binomial distribution indexed with the predictors indicated in the Covariates
argument, and the predictors for each individual.
N<-10 #Number of extractions beta<-c(0.5, -0.8, 1.0, 0, 0.4, -0.7) #Coefficient vector p<-length(beta) r<-2 #Number of success parameter aux_cov<-rnorm((p-1)*N, 0,1) Covariates<-data.frame(matrix(aux_cov, ncol=p-1, nrow=N)) #Generating the Covariates data.frame colnames(Covariates)<-c("X1", "X2", "X3", "X4", "X5") base<-gen_base_NegBinomial_reg(N, beta, r, Covariates) #Generating the data base
N<-10 #Number of extractions beta<-c(0.5, -0.8, 1.0, 0, 0.4, -0.7) #Coefficient vector p<-length(beta) r<-2 #Number of success parameter aux_cov<-rnorm((p-1)*N, 0,1) Covariates<-data.frame(matrix(aux_cov, ncol=p-1, nrow=N)) #Generating the Covariates data.frame colnames(Covariates)<-c("X1", "X2", "X3", "X4", "X5") base<-gen_base_NegBinomial_reg(N, beta, r, Covariates) #Generating the data base
A Bayesian model selection methodology based on the spike-and-slab strategy and an augmentation technique for Linear, Logistic, Negative Binomial, Quantile, and Skew Normal Regression.
The model considers a response vector of size
and
predictors to perform coefficient estimation and asses which ones are relevant to explain the response distribution. Other parameters related to the family selected are also estimated.
Summary results can be provided using the
summary_gibbs()
R function.
gibbs_abms( y, Covariates, family = "LiR", first_excluded = 0, nchain = 10000, burnin = 2000, tau2 = 1000, rho = 1, ni = rep(1, length(y)), alpha = 0.5, a0 = 1, b0 = 1, d = 2, b2 = 1/2, count.iteration = TRUE )
gibbs_abms( y, Covariates, family = "LiR", first_excluded = 0, nchain = 10000, burnin = 2000, tau2 = 1000, rho = 1, ni = rep(1, length(y)), alpha = 0.5, a0 = 1, b0 = 1, d = 2, b2 = 1/2, count.iteration = TRUE )
y |
A vector of size |
Covariates |
A data.frame object with the predictors (without the intercept) for which we want to test if they are relevant to the response variable. It can also be a ( |
family |
A character object that describes the hierarchical regression model that will be used.
If |
first_excluded |
A non-negative integer that indicates which first columns will not be tested. For example, if |
nchain |
The Gibbs sampler's chain size, it must be a non-negative integer. The default value is 10,000 |
burnin |
The burn-in period of the Gibbs sampler, it must be a non-negative integer and greater than |
tau2 |
The variance prior of each coefficient, it must be a positive real number. Fixed at 1 by deafault |
rho |
The parameter of the Womack prior, it must be a positive real number. Fixed at 1 by deafault |
ni |
For Logistic regression only. A vector of size |
alpha |
For Quantile regression only. The desired quantile for which we want to perform Quantile regression. |
a0 |
This argument depends on the family choosen.
For |
b0 |
This argument depends on the family choosen.
For |
d |
For the Skew-Normal regression only. It is the location hyper-parameter of the t-student prior to the parameter |
b2 |
For the Skew-Normal regression only. It is the scale hyper-parameter of the t-student prior to the parameter lambda (asymmetric parameter of the Skew-Normal distribution). By default is fixed at 1/2, which is recommended. |
count.iteration |
A logical argument. If |
A abms object with the following variables:
family |
This character object prints the name of the fitted hierarchical regression model. It needs to be extracted from the list 'Default'. |
prednames |
A character object that prints the predictors names, using the columns names of the |
Seconds |
How many seconds the method took. It needs to be extracted from the list 'Default'. |
tau2 |
The |
y |
The |
Covariates |
The |
beta_chain |
The coefficients sample for each Gibbs sampler iteration. A ( |
sigma2_chain |
For the Linear, Quantile and Skew-Normal regression only. The variance parameter ( |
r_chain |
For the Negative-Binomial regression only. The number of failure parameter ( |
lambda_chain |
For the Skew-Normal regression only. The asymmetric parameter ( |
model_chain |
The model selected at each Gibbs sampler iteration. A ( |
Z_chain |
For internal use. |
t_chain |
For internal use. |
Azzalini (1985). A class of distributions which includes the normal ones, Scandinavian Journal of Statistics 12(2): 171:178.
Bayes, C. and Branco, M. (2007). Bayesian inference for the skewness parameter of the scalar skew-normal distribution. Brazilian Journal of Probability and Statistics. 21: 141:163.
Kotz, S., Kozubowski, T. and Podgorski, K. (2001). The Laplace Distribution and Generalization, first edn, Birkhauser Basel.
Polson, N., Scott, J., and Windle, J. (2013). Bayesian Inference for Logistic Models Using Polya Gamma Latent Variables. Journal of the American Statistical Association, 108: 1339:1349.
Zhou, W. and Carin, L. (2013). Negative Binomial Process Count and Mixture Modeling. arXiv:1405.0506v1.
################################################## ## Gibbs for Linear Regression ## ################################################## ## Simulating data set.seed(31415) N<-200 r_beta<-as.matrix(c(1, 0, 2, 0)) r_p<-length(r_beta) r_sigma2<-1.5 X<-matrix( c(rep(1, N), rnorm((r_p -1)*N)), ncol=r_p ) Xbeta<-X%*%r_beta y<-rnorm(N, mean=Xbeta , sd=sqrt(r_sigma2)) Covariates<-X[,2:(length(r_beta))]; colnames(Covariates)<-c("X1", "X2", "X3") ## Fitting the model fit<- gibbs_abms(y, Covariates, family="LiR", first_excluded=0, nchain=1000, burnin=20, a0=1, b0=1) summary_gibbs(fit, BF=FALSE) #Summary results ## For more examples, see "Model Ilustrations.R" file in ## https://github.com/SirCornflake/BMS
################################################## ## Gibbs for Linear Regression ## ################################################## ## Simulating data set.seed(31415) N<-200 r_beta<-as.matrix(c(1, 0, 2, 0)) r_p<-length(r_beta) r_sigma2<-1.5 X<-matrix( c(rep(1, N), rnorm((r_p -1)*N)), ncol=r_p ) Xbeta<-X%*%r_beta y<-rnorm(N, mean=Xbeta , sd=sqrt(r_sigma2)) Covariates<-X[,2:(length(r_beta))]; colnames(Covariates)<-c("X1", "X2", "X3") ## Fitting the model fit<- gibbs_abms(y, Covariates, family="LiR", first_excluded=0, nchain=1000, burnin=20, a0=1, b0=1) summary_gibbs(fit, BF=FALSE) #Summary results ## For more examples, see "Model Ilustrations.R" file in ## https://github.com/SirCornflake/BMS
Sampling from the Chines Restaurant distribution.
rCRT(n, b, c)
rCRT(n, b, c)
n |
Number of observations. It must be a positive integer. |
b |
Parameter distribution, a non-negative integer. The number of Bernoulli independent variables that are added. It can be a vector |
c |
Parameter distribution, a positive real number. Used calculate the success probability the j-th Bernoulli independent variable, that is, c/(c +j -1). It can be a vector |
This function generates n
random variables from the (
) distribution
Pitman, Jim (1995). "Exchangeable and Partially Exchangeable Random Partitions". Probability Theory and Related Fields. 102 (2): 145:158
#Generating 4 random variables with parameters b=2 and c=1 rCRT(4,2,1)
#Generating 4 random variables with parameters b=2 and c=1 rCRT(4,2,1)
For abms objects, it returns the posterior mean, standard deviation, and 95% centered credible interval for each parameter. Additionally, it provides all explored models alongside the conditional Bayes factors and marginal Bayes factors estimator between the most probable model and the others that have arisen.
summary_gibbs(fit, BF = FALSE)
summary_gibbs(fit, BF = FALSE)
fit |
An abms object. Such object is obtained by fitting a regression model with the |
BF |
A logical object. if |
A summary of the inference performed by the Bayesian model obtained by the gibbs_abms()
function. The variables are:
Mean_IC |
A table with the posterior mean, standard deviation, and 95% centered credible interval for each parameter |
Explored_Models |
A table with all explored models. If |
## See \code{gibbs_abms()} help page function
## See \code{gibbs_abms()} help page function
Womack probability mass function with 'K' predictors and parameter 'rho'.
womack(K, rho)
womack(K, rho)
K |
Number of predictors. It must be a positive integer. |
rho |
Value for the "rho" parameter. It must be positive real number |
Given that all models of the same hierarchy has the same prior probability, this function returns one value for each hierarchy. including the null model (size=0)
Womack, A., Fuentes, C., and Rodriguez-Taylor, D. (2015). "Model Space Priors for Objective Sparse Bayesian Regression." arXiv:1511.04745. 8:24
#Fixing rho=1 and 3 predictors womack(K=3, rho=1) #it returns Womack prior for all models of size 0 (the null model), 1,2 and 3 (the full model)
#Fixing rho=1 and 3 predictors womack(K=3, rho=1) #it returns Womack prior for all models of size 0 (the null model), 1,2 and 3 (the full model)