Title: | Optimization via Subsampling (OPTS) |
---|---|
Description: | Subsampling based variable selection for low dimensional generalized linear models. The methods repeatedly subsample the data minimizing an information criterion (AIC/BIC) over a sequence of nested models for each subsample. Marinela Capanu, Mihai Giurcanu, Colin B Begg, Mithat Gonen, Subsampling based variable selection for generalized linear models. |
Authors: | Mihai Giurcanu [aut, cre], Marinela Capanu [aut, ctb], Colin Begg [aut], Mithat Gonen [aut] |
Maintainer: | Mihai Giurcanu <[email protected]> |
License: | GPL-2 |
Version: | 0.1 |
Built: | 2024-11-27 06:31:12 UTC |
Source: | CRAN |
opts computes the OPTS MLE in the low-dimensional case.
opts(X, Y, m, crit = "aic", prop_split = 0.5, cutoff = 0.75, ...)
Argument | Description |
---|---|
X | n x p covariate matrix (without intercept) |
Y | n x 1 binary response vector |
m | number of subsamples |
crit | information criterion used to select the variables: "aic" (minimum AIC) or "bic" (minimum BIC); default value = "aic" |
prop_split | ratio of the subsample size to the sample size; default value = 0.5 |
cutoff | cutoff on the relative selection frequency used to select the variables under the stability selection criterion; default value = 0.75 |
... | other arguments passed to the glm function, e.g., family = "binomial" |
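The two values of crit refer to the standard AIC and BIC. As a reminder of what is being minimized, the following base R sketch recomputes both criteria from the maximized log-likelihood of a logistic regression. The simulated data and the variable names are illustrative assumptions and are not taken from the package.

```r
# Illustrative data only (not from the package).
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- rbinom(n, 1, plogis(0.5 + 0.5 * x1))

fit <- glm(y ~ x1 + x2, family = binomial)

ll <- as.numeric(logLik(fit))  # maximized log-likelihood
k  <- attr(logLik(fit), "df")  # number of estimated parameters

aic_manual <- -2 * ll + 2 * k       # AIC penalty: 2 per parameter
bic_manual <- -2 * ll + log(n) * k  # BIC penalty: log(n) per parameter

# Both comparisons should agree with the built-in functions.
all.equal(aic_manual, AIC(fit))
all.equal(bic_manual, BIC(fit))
```

Because log(n) exceeds 2 once n is at least 8, BIC penalizes each extra parameter more heavily than AIC and tends to select smaller models.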
opts returns a list:
Component | Description |
---|---|
betahat | OPTS MLE of regression parameter vector |
Jhat | estimated set of active predictors (TRUE/FALSE) corresponding to the OPTS MLE |
SE | standard error of OPTS MLE |
freqs | relative frequency of selection for all variables |

    library(MASS)

    P <- 15
    N <- 100
    M <- 20
    # Assumption: intercept first, then P slopes (4 active, P - 4 zero).
    BETA_vector <- c(0.5, rep(0.5, 2), rep(0.5, 2), rep(0, P - 4))
    MU_vector <- numeric(P)
    SIGMA_mat <- diag(P)

    X <- mvrnorm(N, MU_vector, Sigma = SIGMA_mat)
    linearPred <- cbind(rep(1, N), X) %*% BETA_vector
    Y <- rbinom(N, 1, plogis(linearPred))

    # OPTS-AIC MLE
    opts(X, Y, 10, family = "binomial")
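To make the roles of m, prop_split, cutoff, and freqs more concrete, here is a minimal base R sketch of subsampling-based selection. It is not the package's implementation. In particular, ordering the predictors by marginal p-value to build the nested sequence is an assumption, as are the helper name select_once and the simulated data. The final refit that would produce betahat and SE is omitted.

```r
set.seed(123)
n <- 200
p <- 6
X <- matrix(rnorm(n * p), n, p)
colnames(X) <- paste0("x", seq_len(p))
Y <- rbinom(n, 1, plogis(0.5 + X[, 1] + X[, 2]))  # only x1 and x2 are active

m <- 50            # number of subsamples
prop_split <- 0.5  # subsample size / sample size
cutoff <- 0.75     # selection-frequency cutoff

# Hypothetical helper: variable selection on one subsample.
select_once <- function(idx) {
  Xs <- X[idx, , drop = FALSE]
  Ys <- Y[idx]
  # Marginal p-values from univariate logistic fits (assumed ordering rule).
  pvals <- apply(Xs, 2, function(x) {
    coef(summary(glm(Ys ~ x, family = binomial)))[2, 4]
  })
  ord <- order(pvals)
  # AIC of the nested models: intercept only, then the top 1, 2, ..., p predictors.
  aics <- numeric(p + 1)
  aics[1] <- AIC(glm(Ys ~ 1, family = binomial))
  for (k in seq_len(p)) {
    aics[k + 1] <- AIC(glm(Ys ~ Xs[, ord[seq_len(k)], drop = FALSE],
                           family = binomial))
  }
  k_best <- which.min(aics) - 1
  selected <- logical(p)
  selected[ord[seq_len(k_best)]] <- TRUE
  selected
}

# p x m logical matrix: one column of selections per subsample.
sel <- replicate(m, select_once(sample(n, floor(prop_split * n))))

freqs <- rowMeans(sel)  # relative selection frequency per variable
names(freqs) <- colnames(X)
Jhat <- freqs >= cutoff  # variables kept under the cutoff

freqs
Jhat
```

Replacing AIC with BIC inside select_once gives the analogous sketch for crit = "bic".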
opts_th computes the threshold OPTS MLE in the low-dimensional case.
opts_th(X, Y, m, crit = "aic", type = "binseg", prop_split = 0.5, prop_trim = 0.2, q_tail = 0.5, ...)
Argument | Description |
---|---|
X | n x p covariate matrix (without intercept) |
Y | n x 1 binary response vector |
m | number of subsamples |
crit | information criterion used to select the variables: "aic" (minimum AIC) or "bic" (minimum BIC); default value = "aic" |
type | method used to minimize the trimmed and averaged information criterion: (a) min = observed minimum of the subsampling trimmed average information criterion, (b) sd = observed minimum using the 0.25sd rule (corresponding to OPTS-min in the paper), (c) pelt = PELT changepoint algorithm (corresponding to OPTS-PELT in the paper), (d) binseg = binary segmentation changepoint algorithm (corresponding to OPTS-BinSeg in the paper), (e) amoc = AMOC method; default value = "binseg" |
prop_split | ratio of the subsample size to the sample size; default value = 0.5 |
prop_trim | proportion that defines the trimmed mean; default value = 0.2 |
q_tail | quantiles for the minimum and maximum p-values across the subsample cutpoints used to define the range of cutpoints; default value = 0.5 |
... | other arguments passed to the glm function, e.g., family = "binomial" |
opts_th returns a list:
Component | Description |
---|---|
betahat | threshold OPTS MLE of regression parameters |
SE | standard error of the threshold OPTS MLE |
Jhat | set of active predictors (TRUE/FALSE) corresponding to the threshold OPTS MLE |
cuthat | estimated cutpoint for variable selection |
pval | marginal p-values from univariate fit |
cutpoits | subsample cutpoints |
aic_mean | mean subsample AIC |
bic_mean | mean subsample BIC |

    library(MASS)

    P <- 15
    N <- 100
    M <- 20
    # Assumption: intercept first, then P slopes (4 active, P - 4 zero).
    BETA_vector <- c(0.5, rep(0.5, 2), rep(0.5, 2), rep(0, P - 4))
    MU_vector <- numeric(P)
    SIGMA_mat <- diag(P)

    X <- mvrnorm(N, MU_vector, Sigma = SIGMA_mat)
    linearPred <- cbind(rep(1, N), X) %*% BETA_vector
    Y <- rbinom(N, 1, plogis(linearPred))

    # Threshold OPTS-BinSeg MLE
    opts_th(X, Y, M, family = "binomial")
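The idea behind type = "min" can be sketched in base R as follows: evaluate the information criterion on each subsample over a grid of p-value cutpoints, average across subsamples with a trimmed mean, and take the cutpoint with the smallest average. This is only a sketch under stated assumptions and not the package's algorithm. The fixed cutpoint grid, the helper names marginal_pvals and aic_at_cutpoints, and the reading of prop_trim as the fraction trimmed from each end (the convention of the trim argument of mean) are all assumptions.

```r
set.seed(42)
n <- 200
p <- 6
X <- matrix(rnorm(n * p), n, p)
Y <- rbinom(n, 1, plogis(0.5 + X[, 1] + X[, 2]))  # only columns 1 and 2 are active

m <- 30
prop_split <- 0.5
prop_trim <- 0.2
cutpoints <- c(0.001, 0.01, 0.05, 0.1, 0.2, 0.5, 1)  # hypothetical grid

# Hypothetical helper: marginal p-values from univariate logistic fits.
marginal_pvals <- function(Xs, Ys) {
  apply(Xs, 2, function(x) {
    coef(summary(glm(Ys ~ x, family = binomial)))[2, 4]
  })
}

# Hypothetical helper: AIC of the model that keeps the predictors whose
# marginal p-value is at most each cutpoint, on one subsample.
aic_at_cutpoints <- function(idx) {
  Xs <- X[idx, , drop = FALSE]
  Ys <- Y[idx]
  pv <- marginal_pvals(Xs, Ys)
  sapply(cutpoints, function(cp) {
    keep <- pv <= cp
    if (!any(keep)) {
      return(AIC(glm(Ys ~ 1, family = binomial)))  # intercept-only model
    }
    AIC(glm(Ys ~ Xs[, keep, drop = FALSE], family = binomial))
  })
}

# Rows index cutpoints, columns index subsamples.
aic_mat <- replicate(m, aic_at_cutpoints(sample(n, floor(prop_split * n))))

# Trimmed mean across subsamples for each cutpoint.
aic_trimmed <- apply(aic_mat, 1, mean, trim = prop_trim)

# Cutpoint with the smallest trimmed average AIC.
cuthat <- cutpoints[which.min(aic_trimmed)]
cuthat
```

The changepoint-based types (pelt, binseg, amoc) would replace the last step with a changepoint search along the sequence of averaged criterion values. That step is not sketched here.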