| Field | Value |
|---|---|
| Title | Spike-and-Slab Variational Bayes for Linear and Logistic Regression |
| Description | Implements variational Bayesian algorithms to perform scalable variable selection for sparse, high-dimensional linear and logistic regression models. Features include a novel prioritized updating scheme, which uses a preliminary estimator of the variational means during initialization to generate an updating order prioritizing large, more relevant coefficients. Sparsity is induced via spike-and-slab priors with either Laplace or Gaussian slabs. By default, the heavier-tailed Laplace density is used. Formal derivations of the algorithms and asymptotic consistency results may be found in Kolyan Ray and Botond Szabo (2020) <doi:10.1080/01621459.2020.1847121> and Kolyan Ray, Botond Szabo, and Gabriel Clara (2020) <arXiv:2010.11665>. |
| Authors | Gabriel Clara [aut, cre], Botond Szabo [aut], Kolyan Ray [aut] |
| Maintainer | Gabriel Clara <[email protected]> |
| License | GPL (>= 3) |
| Version | 0.1.0 |
| Built | 2024-12-01 08:02:05 UTC |
| Source | CRAN |
For details as they pertain to using the package, consult the `svb.fit` function help page. Detailed descriptions and derivations of the variational algorithms with Laplace slabs may be found in the references.
Maintainer: Gabriel Clara <[email protected]>

Authors:

- Botond Szabo
- Kolyan Ray
References:

- Ray, K. and Szabo, B. (2020). "Variational Bayes for high-dimensional linear regression with sparse priors." *Journal of the American Statistical Association*. <doi:10.1080/01621459.2020.1847121>
- Ray, K., Szabo, B., and Clara, G. (2020). "Spike and slab variational Bayes for high dimensional logistic regression." *Advances in Neural Information Processing Systems* 33. <arXiv:2010.11665>
Useful links:
Report bugs at https://gitlab.com/gclara/varpack/-/issues
`svb.fit` is the main function of the sparsevb package. It computes mean-field posterior approximations for both linear and logistic regression models, including variable selection via sparsity-inducing spike-and-slab priors.
```r
svb.fit(
  X,
  Y,
  family = c("linear", "logistic"),
  slab = c("laplace", "gaussian"),
  mu,
  sigma = rep(1, ncol(X)),
  gamma,
  alpha,
  beta,
  prior_scale = 1,
  update_order,
  intercept = FALSE,
  noise_sd,
  max_iter = 1000,
  tol = 1e-05
)
```
| Argument | Description |
|---|---|
| `X` | A numeric design matrix, each row of which represents a vector of covariates/independent variables/features. Though not required, it is recommended to center and scale the columns to have norm `sqrt(nrow(X))` (see the preprocessing sketch after this table). |
| `Y` | An `nrow(X)`-dimensional response vector, numeric if `family = "linear"` and with binary entries if `family = "logistic"`. |
| `family` | A character string selecting the regression model, either `"linear"` or `"logistic"`. |
| `slab` | A character string specifying the prior slab density, either `"laplace"` or `"gaussian"`. |
| `mu` | An `ncol(X)`-dimensional numeric vector, serving as initial guess for the variational means. If omitted, it is estimated from the data via ridge regression to initialize the coordinate-ascent algorithm. |
| `sigma` | A positive `ncol(X)`-dimensional numeric vector, serving as initial guess for the variational standard deviations. |
| `gamma` | An `ncol(X)`-dimensional vector of probabilities, serving as initial guess for the variational inclusion probabilities. If omitted, it is estimated from the data via LASSO regression to initialize the coordinate-ascent algorithm. |
| `alpha` | A positive numeric value, parametrizing the beta hyper-prior on the inclusion probabilities. If omitted, it is chosen empirically via LASSO regression. |
| `beta` | A positive numeric value, parametrizing the beta hyper-prior on the inclusion probabilities. If omitted, it is chosen empirically via LASSO regression. |
| `prior_scale` | A numeric value, controlling the scale parameter of the prior slab density. Used as the scale parameter $\lambda$ when `slab = "laplace"`, or as the standard deviation $\sigma$ when `slab = "gaussian"`. |
| `update_order` | A permutation of `1:ncol(X)`, giving the update order of the coordinate-ascent algorithm. If omitted, a data-driven updating order is used; see Ray and Szabo (2020) in the references for details. |
| `intercept` | A Boolean variable, controlling if an intercept should be included. NB: this feature is still experimental in logistic regression. |
| `noise_sd` | A positive numerical value, serving as estimate for the residual noise standard deviation in linear regression. If missing, it will be estimated from the data. Has no effect when `family = "logistic"`. |
| `max_iter` | A positive integer, controlling the maximum number of iterations for the variational update loop. |
| `tol` | A small, positive numerical value, controlling the termination criterion for maximum absolute differences between binary entropies of successive iterates. |
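The centering and scaling recommended for `X` above can be carried out before calling `svb.fit`. A minimal sketch in base R; the toy matrix and variable names are illustrative, not part of the package:

```r
## Center each column of a toy design matrix and rescale it to have
## Euclidean norm sqrt(nrow(X)), as recommended for the X argument.
X <- matrix(rnorm(100 * 20), 100, 20)             #toy design matrix
X <- scale(X, center = TRUE, scale = FALSE)       #subtract column means
col_norms <- sqrt(colSums(X^2))                   #current column norms
X <- sweep(X, 2, col_norms / sqrt(nrow(X)), "/")  #each column now has norm sqrt(nrow(X))
```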
Suppose $\theta$ is the $p$-dimensional true parameter. The spike-and-slab prior for $\theta$ may be represented by the hierarchical scheme

$$
\begin{aligned}
w &\sim \mathrm{Beta}(\alpha, \beta),\\
z_j \mid w &\overset{\text{iid}}{\sim} \mathrm{Bernoulli}(w),\\
\theta_j \mid z_j &\overset{\text{ind}}{\sim} (1 - z_j)\,\delta_0 + z_j\, g.
\end{aligned}
$$

Here, $\delta_0$ represents the Dirac measure at $0$. The slab $g$ may be taken either as a $\mathrm{Laplace}(0, \lambda)$ or $N(0, \sigma^2)$ density. The former has centered density

$$
f_\lambda(x) = \frac{\lambda}{2}\, e^{-\lambda |x|}.
$$

Given $\alpha$ and $\beta$, the beta hyper-prior has density

$$
b(x; \alpha, \beta) = \frac{x^{\alpha - 1}\,(1 - x)^{\beta - 1}}{B(\alpha, \beta)}, \qquad x \in (0, 1).
$$

A straightforward integration shows that the prior inclusion probability of a coefficient is $\alpha/(\alpha + \beta)$.
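To make the hierarchy concrete, here is a minimal sketch simulating one draw of $\theta$ from this prior with a Laplace slab; the hyper-parameter values and variable names are illustrative, not part of the package API:

```r
## One p-dimensional draw of theta from the spike-and-slab prior above.
p <- 10; alpha <- 1; beta <- 10; lambda <- 1

w <- rbeta(1, alpha, beta)                 #w ~ Beta(alpha, beta)
z <- rbinom(p, 1, w)                       #z_j | w ~ Bernoulli(w), iid
slab <- rexp(p, lambda) - rexp(p, lambda)  #Laplace(0, lambda) as a difference of exponentials
theta <- z * slab                          #Dirac spike at 0 where z_j = 0, slab draw where z_j = 1
```

Averaging the indicators `z` over many replications recovers the prior inclusion probability $\alpha/(\alpha + \beta)$ derived above.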
The approximate mean-field posterior is returned as a named list containing numeric vectors `"mu"`, `"sigma"`, `"gamma"`, and a value `"intercept"`. The latter is set to `NA` in case `intercept = FALSE`. In mathematical terms, the conditional distribution of each $\theta_j$ is given by

$$
\theta_j \mid \mu_j, \sigma_j, \gamma_j \overset{\text{ind}}{\sim} \gamma_j\, N(\mu_j, \sigma_j^2) + (1 - \gamma_j)\,\delta_0.
$$
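The returned parameters fully determine the variational posterior, so quantities beyond the posterior mean can be obtained by Monte Carlo. A minimal sketch, assuming `fit` is the list returned by `svb.fit`; the helper name is illustrative:

```r
## Draw B samples from the mean-field posterior
## theta_j ~ gamma_j * N(mu_j, sigma_j^2) + (1 - gamma_j) * delta_0.
posterior_sample <- function(fit, B = 1000) {
  p <- length(fit$mu)
  z <- matrix(rbinom(B * p, 1, rep(fit$gamma, each = B)), B, p)  #inclusion indicators
  g <- matrix(rnorm(B * p, rep(fit$mu, each = B),
                    rep(fit$sigma, each = B)), B, p)             #Gaussian slab draws
  z * g                                                          #rows are posterior draws of theta
}

## e.g. coordinate-wise 95% credible intervals:
## apply(posterior_sample(fit), 2, quantile, probs = c(0.025, 0.975))
```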
```r
### Simulate a linear regression problem of size n times p, with sparsity level s ###
n <- 250
p <- 500
s <- 5

### Generate toy data ###
X <- matrix(rnorm(n * p), n, p)             #standard Gaussian design matrix
theta <- numeric(p)
theta[sample.int(p, s)] <- runif(s, -3, 3)  #sample non-zero coefficients in random locations
pos_TR <- as.numeric(theta != 0)            #true positives
Y <- X %*% theta + rnorm(n)                 #add standard Gaussian noise

### Run the algorithm in linear mode with Laplace prior and prioritized initialization ###
test <- svb.fit(X, Y, family = "linear")
posterior_mean <- test$mu * test$gamma      #approximate posterior mean
pos <- as.numeric(test$gamma > 0.5)         #significant coefficients

### Assess the quality of the posterior estimates ###
TPR <- sum(pos[which(pos_TR == 1)]) / sum(pos_TR)       #true positive rate
FDR <- sum(pos[which(pos_TR != 1)]) / max(sum(pos), 1)  #false discovery rate
L2 <- sqrt(sum((posterior_mean - theta)^2))             #L_2-error
RMSPE <- sqrt(sum((X %*% posterior_mean - Y)^2) / n)    #root mean squared prediction error
```
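The example above covers the linear model only. An analogous minimal sketch for the logistic mode; the data-generating choices and the 0.5 selection threshold are illustrative:

```r
### Simulate a logistic regression problem of size n times p, with sparsity level s ###
n <- 250; p <- 500; s <- 5
X <- matrix(rnorm(n * p), n, p)             #standard Gaussian design matrix
theta <- numeric(p)
theta[sample.int(p, s)] <- runif(s, -3, 3)  #non-zero coefficients in random locations
eta <- drop(X %*% theta)                    #linear predictor
Y <- rbinom(n, 1, 1 / (1 + exp(-eta)))      #binary responses via the logistic link

### Run the algorithm in logistic mode with the default Laplace prior ###
fit <- svb.fit(X, Y, family = "logistic")
selected <- which(fit$gamma > 0.5)          #coordinates flagged as included
```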