| Title: | Structured Screen-and-Select Variable Selection in Linear, Generalized Linear, and Survival Models |
|---|---|
| Description: | Performs variable selection using the structured screen-and-select (S3VS) framework in linear models, generalized linear models with binary data, and survival models such as the Cox model and accelerated failure time (AFT) model. |
| Authors: | Nilotpal Sanyal [aut, cre], Padmore N. Prempeh [aut] |
| Maintainer: | Nilotpal Sanyal <[email protected]> |
| License: | GPL (>= 2) |
| Version: | 1.0 |
| Built: | 2026-05-29 19:26:33 UTC |
| Source: | https://github.com/cran/S3VS |
The S3VS package implements the Structured Screen-and-Select Variable Selection (S3VS) framework for linear models, generalized linear models with binary responses, and survival models (Cox proportional hazards and accelerated failure time models).
The central entry point is S3VS, which dispatches to a family-specific routine via the argument family:
S3VS_LM for linear models,
S3VS_GLM for generalized linear models with binary outcomes,
S3VS_SURV for survival models.
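The dispatch on family can be pictured with a small base-R sketch. The function name dispatch_family is hypothetical and this is not the package's source code; it only illustrates how a family string could be mapped to the routine names listed above.

```r
# Hypothetical dispatcher (illustration only, not part of S3VS):
# maps a family string to the name of the routine that would handle it.
dispatch_family <- function(family = c("normal", "binomial", "survival")) {
  family <- match.arg(family)          # validates the value; defaults to "normal"
  switch(family,
         normal   = "S3VS_LM",
         binomial = "S3VS_GLM",
         survival = "S3VS_SURV")
}

dispatch_family("binomial")
```

Using match.arg means an unsupported family fails early with an informative error rather than silently falling through.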
The S3VS workflow proceeds through the following steps, each handled by helper functions:
looprun determines whether the iterative screen-and-select process should continue.
get_leadvars identifies leading variables; family-specific versions are get_leadvars_LM, get_leadvars_GLM, and get_leadvars_SURV.
get_leadsets identifies the leading set for each leading variable.
VS_method performs selection within leading sets; the family-specific versions are VS_method_LM, VS_method_GLM, and VS_method_SURV, while bridge_aft implements the BRIDGE method specifically for AFT models.
select_vars retains promising variables as selected from an iteration.
remove_vars removes variables deemed uninformative from future iterations (if no variable is selected in the current iteration by select_vars).
update_y enables iterative response updates; family-specific variants include update_y_LM and update_y_GLM.
Together, these functions form a structured, iterative pipeline for efficient variable screening and selection in high-dimensional regression and survival analysis.
pred_S3VS produces predictions using variables selected by S3VS, calling pred_S3VS_LM, pred_S3VS_GLM, or pred_S3VS_SURV as appropriate.
Nilotpal Sanyal <[email protected]>, Padmore N. Prempeh <[email protected]>
Maintainer: Nilotpal Sanyal <[email protected]>
bridge_aft fits an accelerated failure time (AFT) model using an iteratively reweighted LASSO scheme to approximate a bridge penalty on the regression coefficients.
    bridge_aft(y, X, gamma = 0.5, alpha = 1, max_iter = 100, tol = 1e-05)
y |
Response; a list of two elements |
X |
Predictor matrix. Can be a base matrix or something |
gamma |
Bridge penalty exponent |
alpha |
Elastic-net mixing parameter passed to |
max_iter |
Maximum number of outer reweighting iterations (default |
tol |
Convergence tolerance on the |
A list with components:
beta |
Numeric vector of estimated coefficients of length |
gamma |
The bridge exponent used in the fit. |
iterations |
Number of outer reweighting iterations performed. |
Padmore Prempeh <[email protected]>, Nilotpal Sanyal <[email protected]>
Jian Huang and Shuangge Ma. Variable selection in the accelerated failure time model via the bridge method. Lifetime Data Analysis, 16(2):176-195, 2010.
glmnet, cv.glmnet, Surv, survfit
    set.seed(1)
    n <- 50
    p <- 10
    X <- matrix(rnorm(n * p), n, p)
    beta_true <- c(runif(10, -1.5, 1.5), rep(0, p - 10))
    linpred <- as.vector(X %*% beta_true)

    ## Generate log-normal AFT survival times (no censoring in this simple example)
    sigma <- 0.6
    logT <- linpred + rnorm(n, sd = sigma)
    time <- exp(logT)
    delta <- rep(1, n)  # all events (censoring ignored by current implementation)
    y_surv <- list(time = time, status = delta)

    fit <- bridge_aft(y_surv, X, gamma = 0.5, alpha = 1, max_iter = 50, tol = 1e-5)
    str(fit)
    fit$beta[1:10]
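The reweighting idea behind an iteratively reweighted LASSO can be illustrated in base R. This is a minimal sketch under the assumption that the bridge penalty, the sum of |beta_j|^gamma, is handled by a local linear approximation, which turns it into a weighted L1 penalty with weights gamma * |beta_j|^(gamma - 1). The helper name bridge_weights and the stabilizing constant eps are hypothetical and may differ from what bridge_aft does internally.

```r
# Hypothetical helper (not part of S3VS): per-coefficient L1 weights obtained
# from a local linear approximation of the bridge penalty sum(|beta|^gamma).
# eps is an assumed small constant that keeps weights finite at beta = 0.
bridge_weights <- function(beta, gamma = 0.5, eps = 1e-6) {
  gamma * (abs(beta) + eps)^(gamma - 1)
}

beta_current <- c(2, 0.5, 0.01, 0)
w <- bridge_weights(beta_current, gamma = 0.5)

# Smaller coefficients receive larger weights, so they are penalized more
# heavily in the next weighted LASSO fit.
round(w, 3)
```

In an outer loop, weights of this kind could be supplied to a weighted LASSO solver (for example through the penalty.factor argument of glmnet), the coefficients re-estimated, and the weights recomputed until the change in the coefficients falls below tol or max_iter is reached.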
get_leadsets identifies, for a specified leading variable, a set of associated predictors, the leading set, based on inter-predictor associations (absolute value of the correlation coefficient).
    get_leadsets(x_lead, X, method = c("topk", "fixedthresh", "percthresh"), param)
x_lead |
Vector with values of the leading variable |
X |
Predictor matrix. Must contain the leading variable. Can be a base matrix or something |
method |
Rule for constructing, for each leading variable, the set of associated predictors (the "leading set") using inter-predictor association (absolute value of the correlation coefficient); one of |
param |
Tuning parameter for |
A character vector containing the names of the predictors in the leading set.
Nilotpal Sanyal <[email protected]>, Padmore N. Prempeh <[email protected]>
    # Simulate continuous data
    set.seed(123)
    n <- 100
    p <- 150
    X <- matrix(rnorm(n * p), n, p)
    colnames(X) <- paste0("V", 1:p)
    y <- X[,1] + 0.5 * X[,2] + rnorm(n)

    leadvars <- get_leadvars_LM(y = y, X = X, method = "topk", param = list(k=2))
    get_leadsets(X[,leadvars[1]], X, method = "percthresh", param = list(thresh = 0.2))
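The construction of a leading set from absolute correlations can be sketched without the package. The helper topk_leadset below is hypothetical; it assumes a "topk"-style rule that keeps the k predictors with the largest absolute correlation with the leading variable, and that the leading variable itself (which has correlation 1) belongs to its own leading set. The exact behavior of get_leadsets may differ.

```r
set.seed(1)
n <- 60
p <- 8
X <- matrix(rnorm(n * p), n, p)
colnames(X) <- paste0("V", 1:p)
X[, "V2"] <- X[, "V1"] + rnorm(n, sd = 0.1)  # make V2 strongly correlated with V1

# Hypothetical helper (not part of S3VS): names of the k predictors most
# correlated, in absolute value, with the leading variable x_lead.
topk_leadset <- function(x_lead, X, k) {
  a <- abs(as.vector(cor(x_lead, X)))   # one |correlation| per column of X
  names(a) <- colnames(X)
  names(sort(a, decreasing = TRUE))[seq_len(min(k, length(a)))]
}

topk_leadset(X[, "V1"], X, k = 2)
```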
get_leadvars screens some predictors as "leading variables" based on predictor-response associations in linear, generalized linear, and survival models.
    get_leadvars(y, X, family = c("normal","binomial","survival"),
                 surv_model = c("AFT", "COX"),
                 method = c("topk", "fixedthresh", "percthresh"), param,
                 varsselected = NULL, varsleft = colnames(X), parallel = FALSE)
y |
Response. If |
X |
Predictor matrix. Can be a base matrix or something |
family |
Model family; one of |
surv_model |
Character string specifying the survival model ( |
method |
Screening rule, one of |
param |
Tuning parameter for |
varsselected |
Used only when |
varsleft |
Used only when |
parallel |
Logical. If |
A character vector containing the names of the leading variables.
Nilotpal Sanyal <[email protected]>, Padmore N. Prempeh <[email protected]>
get_leadvars_LM, get_leadvars_GLM, get_leadvars_SURV
    # Simulate continuous data
    set.seed(123)
    n <- 100
    p <- 150
    X <- matrix(rnorm(n * p), n, p)
    colnames(X) <- paste0("V", 1:p)
    y <- X[,1] + 0.5 * X[,2] + rnorm(n)

    # Select leading variables
    leadvars <- get_leadvars(y = y, X = X, family = "normal",
                             method = "topk", param = list(k=2))
    leadvars
get_leadvars_GLM screens some predictors as "leading variables" based on predictor-response associations in generalized linear models.
    get_leadvars_GLM(y, X, method = c("topk", "fixedetasqthresh", "percetasqthresh"), param)
y |
Response. A numeric/integer/logical vector with values in {0,1}. |
X |
Predictor matrix. Can be a base matrix or something |
method |
Screening rule, one of |
param |
Tuning parameter for |
A character vector containing the names of the leading variables.
Nilotpal Sanyal <[email protected]>, Padmore N. Prempeh <[email protected]>
    # Simulate binary data
    set.seed(123)
    n <- 100
    p <- 150
    X <- matrix(rnorm(n * p), n, p)
    colnames(X) <- paste0("V", 1:p)
    eta <- X[,1] + 0.5 * X[,2]
    prob <- 1 / (1 + exp(-eta))
    y <- rbinom(n, size = 1, prob = prob)

    # Select leading variables
    leadvars <- get_leadvars_GLM(y = y, X = X, method = "topk", param = list(k=2))
    leadvars
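The method names "fixedetasqthresh" and "percetasqthresh" suggest that the predictor-response association for a binary response is measured by eta-squared. Assuming that reading, the following base-R sketch computes eta-squared for one numeric predictor as the between-group sum of squares divided by the total sum of squares. The helper eta_squared is hypothetical and is not taken from the package.

```r
set.seed(2)
n <- 200
x <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(1.5 * x))

# Hypothetical helper (not part of S3VS): eta-squared of a numeric predictor x
# across the groups defined by y.
eta_squared <- function(x, y) {
  grand_mean <- mean(x)
  group_mean <- tapply(x, y, mean)      # mean of x within each level of y
  group_n    <- tapply(x, y, length)    # group sizes
  ss_between <- sum(group_n * (group_mean - grand_mean)^2)
  ss_total   <- sum((x - grand_mean)^2)
  ss_between / ss_total
}

eta_squared(x, y)
```

With only two groups, eta-squared coincides with the squared Pearson correlation between x and the 0/1 response, so `cor(x, y)^2` gives the same value.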
get_leadvars_LM screens some predictors as "leading variables" based on predictor-response associations in linear models.
    get_leadvars_LM(y, X, method = c("topk", "fixedcorthresh", "perccorthresh"), param)
y |
Response. A numeric vector. |
X |
Predictor matrix. Can be a base matrix or something |
method |
Screening rule, one of |
param |
Tuning parameter for |
A character vector containing the names of the leading variables.
Nilotpal Sanyal <[email protected]>, Padmore N. Prempeh <[email protected]>
    # Simulate continuous data
    set.seed(123)
    n <- 100
    p <- 150
    X <- matrix(rnorm(n * p), n, p)
    colnames(X) <- paste0("V", 1:p)
    y <- X[,1] + 0.5 * X[,2] + rnorm(n)

    # Select leading variables
    leadvars <- get_leadvars_LM(y = y, X = X, method = "topk", param = list(k=2))
    leadvars
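Correlation-based screening in the linear model can be sketched in base R. The snippet assumes the association measure is the absolute marginal correlation between each predictor and the response, as the method names "fixedcorthresh" and "perccorthresh" suggest, and shows a top-k rule and a fixed-threshold rule. The cut-off 0.5 and the variable names are illustrative assumptions, and the percentage-based rule is not shown because its exact definition is not given here.

```r
set.seed(123)
n <- 100
p <- 20
X <- matrix(rnorm(n * p), n, p)
colnames(X) <- paste0("V", 1:p)
y <- 2 * X[, 1] + rnorm(n)

# One absolute marginal correlation per predictor, named by column
abs_cor <- abs(cor(X, y))[, 1]

# Top-k rule: keep the k predictors with the largest |correlation|
k <- 2
top_k <- names(sort(abs_cor, decreasing = TRUE))[seq_len(k)]
top_k

# Fixed-threshold rule: keep predictors whose |correlation| is at least 0.5
above_thresh <- names(abs_cor)[abs_cor >= 0.5]
above_thresh
```

A top-k rule always returns k names, whereas a threshold rule adapts the number of leading variables to the strength of the signal.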
get_leadvars_SURV screens some predictors as "leading variables" based on predictor-response associations in survival models.
    get_leadvars_SURV(y, X, surv_model = c("AFT", "COX"),
                      method = c("topk", "fixedmuthresh", "percmuthresh"), param,
                      varsselected = NULL, varsleft = colnames(X), parallel = FALSE)
y |
Response. A list with components |
X |
Predictor matrix. Can be a base matrix or something |
surv_model |
Character string specifying the survival model. Must be explicitly provided; there is no default. Values are |
method |
Screening rule, one of |
param |
Tuning parameter for |
varsselected |
A character vector containing the predictors that were already selected in previous iterations. The association measure, conditional utility, is computed controlling for these predictors. |
varsleft |
A character vector containing the predictors that are neither selected, nor removed from consideration in previous iterations. Leading predictors are chosen from these predictors. |
parallel |
Logical. If |
A character vector containing the names of the leading variables.
Nilotpal Sanyal <[email protected]>, Padmore N. Prempeh <[email protected]>
    # Simulate survival data (Cox)
    set.seed(123)
    n <- 100
    p <- 150
    X <- matrix(rnorm(n * p), n, p)
    colnames(X) <- paste0("V", 1:p)
    eta <- X[,1] + 0.5 * X[,2]
    base_rate <- 0.05
    T_event <- rexp(n, rate = base_rate * exp(eta))
    C <- rexp(n, rate = 0.03)
    time <- pmin(T_event, C)
    status <- as.integer(T_event <= C)
    y_surv <- list(time = time, status = status)

    # Select leading variables
    leadvars <- get_leadvars_SURV(y = y_surv, X = X, surv_model = "COX",
                                  method = "topk", param = list(k=2),
                                  varsselected = NULL, varsleft = colnames(X))
    leadvars
looprun evaluates simple stopping criteria for the S3VS procedure and returns an indicator of whether one more iteration should be executed.
    looprun(varsselected, varsleft, max_nocollect, m, nskip)
varsselected |
Character vector with names of predictors selected so far. Only its length is used; |
varsleft |
Character vector with names of candidate predictors that remain available for selection in future iterations. Only its length is used; |
max_nocollect |
Integer count of iterations up to now in which no new predictors were selected. |
m |
Maximum allowed number of selected predictors (target cap for |
nskip |
Maximum allowed number of "no-collection" iterations before stopping. |
An additional S3VS iteration is recommended iff all three conditions hold:
1 if another iteration should run, 0 otherwise.
Nilotpal Sanyal <[email protected]>, Padmore N. Prempeh <[email protected]>
    looprun(varsselected = c("x1","x2","x3"), varsleft = paste0("x", 4:23),
            max_nocollect = 0, m = 10, nskip = 2)
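A stopping check of this kind can be sketched in a few lines of base R. The three conditions used below (fewer than m predictors selected, at least one candidate left, and fewer than nskip iterations without a selection) are assumptions inferred from the argument descriptions; the exact conditions and inequalities used by looprun are not stated here and may differ. The name looprun_sketch is hypothetical.

```r
# Hypothetical re-implementation of a stopping check (not the package code).
# The conditions and their strict/non-strict inequalities are assumptions.
looprun_sketch <- function(varsselected, varsleft, max_nocollect, m, nskip) {
  as.integer(
    length(varsselected) < m &&   # cap on selected predictors not yet reached
      length(varsleft) > 0 &&     # candidate predictors remain
      max_nocollect < nskip       # not too many iterations without a selection
  )
}

looprun_sketch(varsselected = c("x1", "x2", "x3"), varsleft = paste0("x", 4:23),
               max_nocollect = 0, m = 10, nskip = 2)
```

Returning an integer indicator rather than a logical mirrors the documented return value of 1 or 0.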
pred_S3VS performs prediction using predictors selected by S3VS in linear, generalized linear, and survival models.
    pred_S3VS(y, X, family, surv_model = NULL, method)
y |
Response. If |
X |
Predictor matrix. This should include predictors selected by S3VS. Can be a base matrix or something |
family |
Model family; one of |
surv_model |
Character string specifying the survival model ( |
method |
Character string indicating the prediction method used. Allowed values depend on |
A list containing:
y.pred |
Predicted response |
coef |
Coefficient estimates of the predictors used for prediction |
Nilotpal Sanyal <[email protected]>, Padmore N. Prempeh <[email protected]>
pred_S3VS_LM, pred_S3VS_GLM, pred_S3VS_SURV
    # Simulate continuous data
    set.seed(123)
    n <- 100
    p <- 150
    X <- matrix(rnorm(n * p), n, p)
    colnames(X) <- paste0("V", 1:p)
    y <- X[,1] + 0.5 * X[,2] + rnorm(n)

    # Run S3VS for LM
    res_lm <- S3VS(y = y, X = X, family = "normal",
                   method_xy = "topk", param_xy = list(k=1),
                   method_xx = "topk", param_xx = list(k=3),
                   vsel_method = "LASSO",
                   method_sel = "conservative", method_rem = "conservative_begin",
                   rem_regout = FALSE, m = 100, nskip = 3,
                   verbose = TRUE, seed = 123)

    # Predict using the selected predictors
    pred_lm <- pred_S3VS(y = y, X = X[,res_lm$selected], family = "normal", method = "LASSO")
pred_S3VS_GLM performs prediction using predictors selected by S3VS in generalized linear models.
    pred_S3VS_GLM(y, X, method = c("NLP", "LASSO"))
y |
Response. A numeric/integer/logical vector with values in {0,1}. |
X |
Predictor matrix. This should include predictors selected by S3VS. Can be a base matrix or something |
method |
Character string indicating the prediction method used. Available options are |
A list containing:
y.pred |
Predicted response |
coef |
Coefficient estimates of the predictors used for prediction |
Nilotpal Sanyal <[email protected]>, Padmore N. Prempeh <[email protected]>
    # Simulate binary data
    set.seed(123)
    n <- 100
    p <- 150
    X <- matrix(rnorm(n * p), n, p)
    colnames(X) <- paste0("V", 1:p)
    eta <- X[,1] + 0.5 * X[,2]
    prob <- 1 / (1 + exp(-eta))
    y <- rbinom(n, size = 1, prob = prob)

    # Predict
    pred_glm <- pred_S3VS_GLM(y = y, X = X[,1:3], method = "LASSO")
    pred_glm
pred_S3VS_LM performs prediction using predictors selected by S3VS in linear models.
    pred_S3VS_LM(y, X, method)
y |
Response. A numeric vector. |
X |
Predictor matrix. This should include predictors selected by S3VS. Can be a base matrix or something |
method |
Character string indicating the prediction method used. Available options are |
A list containing:
y.pred |
Predicted response |
coef |
Coefficient estimates of the predictors used for prediction |
Nilotpal Sanyal <[email protected]>, Padmore N. Prempeh <[email protected]>
    # Simulate continuous data
    set.seed(123)
    n <- 100
    p <- 150
    X <- matrix(rnorm(n * p), n, p)
    colnames(X) <- paste0("V", 1:p)
    y <- X[,1] + 0.5 * X[,2] + rnorm(n)

    # Run S3VS for LM
    res_lm <- S3VS(y = y, X = X, family = "normal",
                   method_xy = "topk", param_xy = list(k=1),
                   method_xx = "topk", param_xx = list(k=3),
                   vsel_method = "LASSO",
                   method_sel = "conservative", method_rem = "conservative_begin",
                   rem_regout = FALSE, m = 100, nskip = 3,
                   verbose = TRUE, seed = 123)

    # Predict using the selected predictors
    pred_lm <- pred_S3VS_LM(y = y, X = X[,res_lm$selected], method = "LASSO")
    pred_lm
pred_S3VS_SURV returns predicted survival probabilities using predictors selected by S3VS in survival models.
    pred_S3VS_SURV(y, X, surv_model = c("AFT", "COX"), method = c("AFTREG", "AFTGEE"), times)
y |
Response. A list with components |
X |
Predictor matrix. This should include predictors selected by S3VS. Can be a base matrix or something |
surv_model |
Character string specifying the survival model. Must be explicitly provided; there is no default. Values are |
method |
Character string indicating the prediction method used. Available options are |
times |
Vector of time points where predicted survival probabilities will be computed. |
A list containing:
y.pred |
Predicted response |
coef |
Coefficient estimates of the predictors used for prediction |
Nilotpal Sanyal <[email protected]>, Padmore N. Prempeh <[email protected]>
cv.glmnet, coxph, aftreg, aftgee
    # Simulate survival data (Cox)
    set.seed(123)
    n <- 100
    p <- 150
    X <- matrix(rnorm(n * p), n, p)
    colnames(X) <- paste0("V", 1:p)
    eta <- X[,1] + 0.5 * X[,2]
    base_rate <- 0.05
    T_event <- rexp(n, rate = base_rate * exp(eta))
    C <- rexp(n, rate = 0.03)
    time <- pmin(T_event, C)
    status <- as.integer(T_event <= C)
    y_surv <- list(time = time, status = status)

    # Run S3VS for the survival (Cox) model
    res_surv <- S3VS(y = y_surv, X = X, family = "survival", surv_model = "COX",
                     vsel_method = "COXGLMNET",
                     method_xy = "topk", param_xy = list(k = 1),
                     method_xx = "topk", param_xx = list(k = 3),
                     method_sel = "conservative", method_rem = "conservative_begin",
                     sel_regout = FALSE, rem_regout = FALSE,
                     m = 100, nskip = 3, verbose = TRUE, seed = 123)

    # Predict using the selected predictors
    pred_surv <- pred_S3VS_SURV(y = y_surv, X = X[,res_surv$selected],
                                surv_model = "COX", method = "COXGLMNET")
    pred_surv
remove_vars combines lists of predictors that were not selected from multiple leading sets into a single set to remove, using either a liberal (union) rule or a conservative (progressive intersection) rule.
    remove_vars(listnotselect, method = c("conservative_begin", "conservative_end", "liberal"))
listnotselect |
A |
method |
Aggregation rule; one of
|
The liberal rule favors inclusiveness (drop all predictors that were not selected in an iteration), whereas the conservative rules favor stability across earlier/later leading sets (drop only predictors consistently absent in the earlier/later leading sets).
A vector with the names of the predictors that have not been selected up to the current S3VS iteration and are to be removed from all future iterations.
Nilotpal Sanyal <[email protected]>, Padmore N. Prempeh <[email protected]>
    # Predictors not selected in each of three leading sets
    listnotselect <- list(
      c("V1","V2","V23"),
      c("V4","V2","V23"),
      c("V4","V5","V23")
    )
    remove_vars(listnotselect, method="liberal")
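The two aggregation ideas, union and progressive intersection, can be illustrated with base-R set operations. This is a sketch only: how remove_vars maps "conservative_begin" and "conservative_end" onto intersections taken from the first or from the last leading set is an assumption here, and the object names are hypothetical.

```r
listnotselect <- list(
  c("V1", "V2", "V23"),
  c("V4", "V2", "V23"),
  c("V4", "V5", "V23")
)

# Liberal idea: union, i.e. drop every predictor that went unselected
# in at least one leading set.
liberal <- Reduce(union, listnotselect)

# Conservative idea: progressive intersection. accumulate = TRUE keeps the
# running intersection after each leading set, starting from the first ...
from_begin <- Reduce(intersect, listnotselect, accumulate = TRUE)
# ... or, assumed here for the "end" variant, starting from the last.
from_end <- Reduce(intersect, rev(listnotselect), accumulate = TRUE)

liberal
from_begin
from_end
```

The running intersections shrink monotonically, which is what makes the conservative idea remove fewer predictors than the union.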
S3VS is the main function that performs variable selection based on the structured screen-and-select framework in linear, generalized linear, and survival models.
    S3VS(y, X, family = c("normal", "binomial", "survival"), cor_xy = NULL,
         surv_model = c("COX", "AFT"),
         method_xy = c("topk", "fixedthresh", "percthresh"), param_xy,
         method_xx = c("topk", "fixedthresh", "percthresh"), param_xx,
         vsel_method = NULL, alpha = 0.5,
         method_sel = c("conservative", "liberal"),
         method_rem = c("conservative_begin", "conservative_end", "liberal"),
         sel_regout = FALSE, rem_regout = FALSE, update_y_thresh = 0.5,
         m = 100, nskip = 3, verbose = FALSE, seed = NULL, parallel = FALSE)
y |
Response. If |
X |
Design matrix of predictors. Can be a base matrix or something |
family |
Model family; one of |
cor_xy |
Optional numeric vector of precomputed marginal correlations between |
surv_model |
Character string specifying the survival model (for |
method_xy |
Rule for screening some predictors as "leading variables" based on their association with the response; one of
|
param_xy |
Tuning parameter for |
method_xx |
Rule for constructing, for each leading variable, the set of associated predictors (the "leading set") using inter-predictor association (absolute value of the correlation coefficient); one of |
param_xx |
Tuning parameter for |
vsel_method |
Character string specifying the variable selection method to be used within each leading set. Available options depend on the model type:
|
alpha |
Only used when |
method_sel |
Policy for aggregating predictors selected across leading sets in an iteration; one of |
method_rem |
Policy for excluding predictors when no selections are made in an iteration; one of |
sel_regout |
Logical (GLM only). If |
rem_regout |
Logical (for LM and GLM only). If |
update_y_thresh |
Numeric scalar threshold controlling how the working response |
m |
Integer. Maximum number of S3VS iterations to perform. Defaults to |
nskip |
Integer. Maximum number of iterations in which no new predictors are selected before the algorithm stops. Defaults to |
verbose |
Logical. If |
seed |
If supplied, sets the random seed via |
parallel |
Logical. If |
For a continuous response, S3VS considers the linear model (LM)
For a binary response, S3VS considers the generalized linear model (GLM)
For a survival-type response, S3VS considers two models: the Cox proportional hazards model and the accelerated failure time (AFT) model.
The general form of the S3VS algorithm consists of the following steps, repeated iteratively until convergence:
Determination of leading variables: 'Leading variables' are determined based on the association of the predictors with the response, following one of three rules. The rule is fixed by the arguments method_xy and param_xy.
Determination of leading sets: For each leading variable, a group of related predictors, called the 'leading set', is determined based on the association of all candidate predictors with the leading variable, following one of three rules. The rule is fixed by the arguments method_xx and param_xx.
Variable selection: Within each leading set, small to moderate-dimensional variable selection is performed using a method fixed by vsel_method.
Aggregation of selected/not-selected variables: Variables selected/not-selected in different leading sets are aggregated using several possible rules, fixed by method_sel and method_rem.
Update of the response and/or the set of covariates: At the end of each iteration, the response and the predictors may or may not be updated, as controlled by the arguments sel_regout, rem_regout, and update_y_thresh.
The convergence criterion is determined jointly by the arguments m and nskip. For more details on the individual steps, see the documentation of the functions listed below.
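The iterative structure described in these steps can be mimicked by a toy loop in base R. This is a simplified sketch, not the S3VS implementation: it assumes one leading variable per iteration, a top-3 leading set, and ordinary least squares t-tests at the 1% level as a stand-in for the selection method; it does not update the response; and the stopping conditions are assumed. All object names are hypothetical.

```r
set.seed(123)
n <- 100
p <- 30
X <- matrix(rnorm(n * p), n, p)
colnames(X) <- paste0("V", 1:p)
y <- 2 * X[, 1] + 1.5 * X[, 2] + rnorm(n)

varsselected <- character(0)   # predictors selected so far
varsleft <- colnames(X)        # candidates still under consideration
nocollect <- 0                 # iterations without any selection
m <- 5                         # assumed cap on selected predictors
nskip <- 2                     # assumed cap on empty iterations

while (length(varsselected) < m && length(varsleft) > 0 && nocollect < nskip) {
  # Step 1: leading variable = candidate most correlated with the response
  a_xy <- abs(cor(X[, varsleft, drop = FALSE], y))[, 1]
  lead <- names(which.max(a_xy))

  # Step 2: leading set = up to 3 candidates most correlated with the leader
  a_xx <- abs(cor(X[, varsleft, drop = FALSE], X[, lead]))[, 1]
  leadset <- names(sort(a_xx, decreasing = TRUE))[seq_len(min(3, length(a_xx)))]

  # Step 3: selection within the leading set (stand-in: OLS t-tests at 1%)
  fit <- lm(y ~ ., data = data.frame(y = y, X[, leadset, drop = FALSE]))
  pvals <- summary(fit)$coefficients[-1, 4]
  newsel <- leadset[pvals < 0.01]

  # Step 4: aggregate; keep new selections, or discard the whole leading set
  if (length(newsel) > 0) {
    varsselected <- c(varsselected, newsel)
    varsleft <- setdiff(varsleft, newsel)
  } else {
    nocollect <- nocollect + 1
    varsleft <- setdiff(varsleft, leadset)
  }
}

varsselected
```

Each pass either adds to the selected set or removes candidates and increments the no-selection counter, so the loop is guaranteed to terminate.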
A list with the following components:
selected |
A character vector of predictor names that were selected across all iterations. |
selected_iterwise |
A list recording the predictors selected at each iteration, in the order they were considered. |
runtime |
Runtime in seconds. |
Nilotpal Sanyal <[email protected]>, Padmore N. Prempeh <[email protected]>
get_leadvars, get_leadsets, VS_method, select_vars, remove_vars, update_y
    ### [1] For linear model
    # Simulate continuous data
    set.seed(123)
    n <- 100
    p <- 150
    X <- matrix(rnorm(n * p), n, p)
    colnames(X) <- paste0("V", 1:p)
    y <- X[,1] + 0.5 * X[,2] + rnorm(n)

    # Run S3VS for LM
    res_lm <- S3VS(y = y, X = X, family = "normal",
                   method_xy = "topk", param_xy = list(k=1),
                   method_xx = "topk", param_xx = list(k=3),
                   vsel_method = "LASSO",
                   method_sel = "conservative", method_rem = "conservative_begin",
                   rem_regout = FALSE, m = 100, nskip = 3,
                   verbose = TRUE, seed = 123)
    # View selected predictors
    res_lm$selected

    ### [2] For generalized linear model
    # Simulate binary data
    set.seed(123)
    n <- 100
    p <- 150
    X <- matrix(rnorm(n * p), n, p)
    colnames(X) <- paste0("V", 1:p)
    eta <- X[,1] + 0.5 * X[,2]
    prob <- 1 / (1 + exp(-eta))
    y <- rbinom(n, size = 1, prob = prob)

    # Run S3VS for GLM (logistic)
    res_glm <- S3VS(y = y, X = X, family = "binomial",
                    method_xy = "topk", param_xy = list(k = 1),
                    method_xx = "topk", param_xx = list(k = 3),
                    vsel_method = "LASSO",
                    method_sel = "conservative", method_rem = "conservative_begin",
                    sel_regout = FALSE, rem_regout = FALSE,
                    m = 100, nskip = 3, verbose = TRUE, seed = 123)
    # View selected predictors
    res_glm$selected

    ### [3] For survival model
    # Simulate survival data (Cox)
    set.seed(123)
    n <- 100
    p <- 150
    X <- matrix(rnorm(n * p), n, p)
    colnames(X) <- paste0("V", 1:p)
    eta <- X[,1] + 0.5 * X[,2]
    base_rate <- 0.05
    T_event <- rexp(n, rate = base_rate * exp(eta))
    C <- rexp(n, rate = 0.03)
    time <- pmin(T_event, C)
    status <- as.integer(T_event <= C)
    y_surv <- list(time = time, status = status)

    # Run S3VS for the survival (Cox) model
    res_surv <- S3VS(y = y_surv, X = X, family = "survival", surv_model = "COX",
                     method_xy = "topk", param_xy = list(k = 1),
                     method_xx = "topk", param_xx = list(k = 3),
                     vsel_method = "COXGLMNET",
                     method_sel = "conservative", method_rem = "conservative_begin",
                     sel_regout = FALSE, rem_regout = FALSE,
                     m = 100, nskip = 3, verbose = TRUE, seed = 123)
    # View selected predictors
    res_surv$selected
S3VS_GLM performs variable selection based on the structured screen-and-select framework in generalized linear models.
    S3VS_GLM(y, X,
             method_xy = c("topk", "fixedetasqthresh", "percetasqthresh"), param_xy,
             method_xx = c("topk", "fixedcorthresh", "perccorthresh"), param_xx,
             vsel_method = c("NLP", "LASSO", "ENET", "SCAD", "MCP"), alpha = 0.5,
             method_sel = c("conservative", "liberal"),
             method_rem = c("conservative_begin", "conservative_end", "liberal"),
             sel_regout = FALSE, rem_regout = FALSE, update_y_thresh = NULL,
             m = 100, nskip = 3, verbose = FALSE, seed = NULL, parallel = FALSE)
y |
Response. A numeric/integer/logical vector with values in {0,1}. |
X |
Design matrix of predictors. Can be a base matrix or something |
method_xy |
Rule for screening some predictors as "leading variables" based on their association with the response; one of
|
param_xy |
Tuning parameter for |
method_xx |
Rule for constructing, for each leading variable, the set of associated predictors (the "leading set") using inter-predictor association (absolute value of the correlation coefficient); one of |
param_xx |
Tuning parameter for |
vsel_method |
Character string specifying the variable selection method to be used within each leading set. Available options are |
alpha |
Only used when |
method_sel |
Policy for aggregating predictors selected across leading sets in an iteration; one of |
method_rem |
Policy for excluding predictors when no selections are made in an iteration; one of |
sel_regout |
Logical. If |
rem_regout |
Logical. If |
update_y_thresh |
Numeric scalar threshold controlling how the working response |
m |
Integer. Maximum number of S3VS iterations to perform. Defaults to |
nskip |
Integer. Maximum number of iterations in which no new predictors are selected before the algorithm stops. Defaults to |
verbose |
Logical. If |
seed |
If supplied, sets the random seed via |
parallel |
Logical. If |
For a binary response, S3VS considers the generalized linear model (GLM)
For the S3VS algorithm, see the manual of the top-level function S3VS.
A list with the following components:
selected |
A character vector of predictor names that were selected across all iterations. |
selected_iterwise |
A list recording the predictors selected at each iteration, in the order they were considered. |
runtime |
Runtime in seconds. |
Nilotpal Sanyal <[email protected]>, Padmore N. Prempeh <[email protected]>
get_leadvars_GLM, get_leadsets, VS_method_GLM, select_vars, remove_vars, update_y_GLM
# Simulate binary data
set.seed(123)
n <- 100
p <- 150
X <- matrix(rnorm(n * p), n, p)
colnames(X) <- paste0("V", 1:p)
eta <- X[,1] + 0.5 * X[,2]
prob <- 1 / (1 + exp(-eta))
y <- rbinom(n, size = 1, prob = prob)

# Run S3VS for GLM (logistic)
res_glm <- S3VS_GLM(y = y, X = X,
                    method_xy = "topk", param_xy = list(k = 1),
                    method_xx = "topk", param_xx = list(k = 3),
                    vsel_method = "LASSO",
                    method_sel = "conservative", method_rem = "conservative_begin",
                    sel_regout = FALSE, rem_regout = FALSE,
                    m = 100, nskip = 3, verbose = TRUE, seed = 123)

# View selected predictors
res_glm$selected
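The roles of m and nskip in stopping the iteration can be illustrated with a small control-flow sketch. This is not the package's looprun: the helper name should_continue is hypothetical, and counting non-selecting iterations cumulatively (rather than consecutively) is an assumption made only for illustration.

```r
# Hypothetical helper (not part of S3VS): decide whether to run another iteration.
# iter    : number of iterations completed so far
# n_empty : number of completed iterations that selected no new predictor
should_continue <- function(iter, n_empty, m = 100, nskip = 3) {
  iter < m && n_empty < nskip
}

# Toy run: pretend that only the first two iterations select a predictor.
iter <- 0
n_empty <- 0
while (should_continue(iter, n_empty, m = 100, nskip = 3)) {
  iter <- iter + 1
  selected_now <- if (iter <= 2) paste0("V", iter) else character(0)
  if (length(selected_now) == 0) n_empty <- n_empty + 1
}
iter  # number of iterations actually run in this toy example
```

Under these assumptions the loop ends as soon as either the iteration cap m or the allowance of nskip non-selecting iterations is reached, whichever comes first.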
S3VS_LM performs variable selection based on the structured screen-and-select framework in linear models.
S3VS_LM(y, X, cor_xy = NULL,
        method_xy = c("topk", "fixedcorthresh", "perccorthresh"), param_xy,
        method_xx = c("topk", "fixedcorthresh", "perccorthresh"), param_xx,
        vsel_method = c("NLP", "LASSO", "ENET", "SCAD", "MCP"), alpha = 0.5,
        method_sel = c("conservative", "liberal"),
        method_rem = c("conservative_begin", "conservative_end", "liberal"),
        rem_regout = FALSE, m = 100, nskip = 3, verbose = FALSE, seed = NULL)
y |
Response. A numeric vector. |
X |
Design matrix of predictors. Can be a base matrix or something |
cor_xy |
Optional numeric vector of precomputed marginal correlations between |
method_xy |
Rule for screening some predictors as 'leading variables' based on their association with the response; one of
|
param_xy |
Tuning parameter for |
method_xx |
Rule for constructing, for each leading variable, the set of associated predictors (the "leading set") using inter-predictor association (absolute value of the correlation coefficient); one of |
param_xx |
Tuning parameter for |
vsel_method |
Character string specifying the variable selection method to be used within each leading set. Available options are |
alpha |
Only used when |
method_sel |
Policy for aggregating predictors selected across leading sets in an iteration; one of |
method_rem |
Policy for excluding predictors when no selections are made in an iteration; one of |
rem_regout |
Logical. If |
m |
Integer. Maximum number of S3VS iterations to perform. Defaults to |
nskip |
Integer. Maximum number of iterations in which no new predictors are selected before the algorithm stops. Defaults to |
verbose |
Logical. If |
seed |
If supplied, sets the random seed via |
For a continuous response, S3VS considers the linear model (LM)
For the S3VS algorithm, see the manual of the top-level function S3VS.
A list with the following components:
selected |
A character vector of predictor names that were selected across all iterations. |
selected_iterwise |
A list recording the predictors selected at each iteration, in the order they were considered. |
runtime |
Runtime in seconds. |
Nilotpal Sanyal <[email protected]>, Padmore N. Prempeh <[email protected]>
get_leadvars_LM, get_leadsets, VS_method_LM, select_vars, remove_vars, update_y_LM
# Simulate continuous data
set.seed(123)
n <- 100
p <- 150
X <- matrix(rnorm(n * p), n, p)
colnames(X) <- paste0("V", 1:p)
y <- X[,1] + 0.5 * X[,2] + rnorm(n)

# Run S3VS for LM
res_lm <- S3VS_LM(y = y, X = X,
                  method_xy = "topk", param_xy = list(k=1),
                  method_xx = "topk", param_xx = list(k=3),
                  vsel_method = "LASSO",
                  method_sel = "conservative", method_rem = "conservative_begin",
                  rem_regout = FALSE, m = 100, nskip = 3,
                  verbose = TRUE, seed = 123)

# View selected predictors
res_lm$selected
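To make the two "topk" screening rules concrete, the following base-R sketch ranks predictors by absolute marginal correlation with the response and then builds a leading set from absolute inter-predictor correlations. It is an illustration under stated assumptions, not the package's get_leadvars_LM or get_leadsets: the variable names k_xy, k_xx, lead_vars and lead_sets are hypothetical, and keeping the leading variable inside its own leading set is an assumption.

```r
set.seed(123)
n <- 100
p <- 150
X <- matrix(rnorm(n * p), n, p)
colnames(X) <- paste0("V", 1:p)
y <- X[, 1] + 0.5 * X[, 2] + rnorm(n)

# Step 1 (assumed form of method_xy = "topk"): rank predictors by the
# absolute marginal correlation with y and keep the top k as leading variables.
k_xy <- 1
cor_xy <- drop(cor(X, y))  # named vector of length p
lead_vars <- names(sort(abs(cor_xy), decreasing = TRUE))[seq_len(k_xy)]

# Step 2 (assumed form of method_xx = "topk"): for each leading variable,
# take the k predictors with the largest absolute correlation with it.
# The leading variable itself (correlation 1) is kept in its own set here.
k_xx <- 3
lead_sets <- lapply(lead_vars, function(v) {
  cor_xx <- drop(cor(X, X[, v]))
  names(sort(abs(cor_xx), decreasing = TRUE))[seq_len(k_xx)]
})
names(lead_sets) <- lead_vars
lead_sets
```

A selection method such as LASSO would then be run within each element of lead_sets rather than on all p predictors at once.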
S3VS_SURV performs variable selection based on the structured screen-and-select framework in survival models.
S3VS_SURV(y, X, surv_model = c("COX", "AFT"),
          method_xy = c("topk", "fixedmuthresh", "percmuthresh"), param_xy,
          method_xx = c("topk", "fixedcorthresh", "perccorthresh"), param_xx,
          vsel_method = c("LASSO", "ENET", "AFTGEE", "BRIDGE", "PVAFT"), alpha = 0.5,
          method_sel = c("conservative", "liberal"),
          method_rem = c("conservative_begin", "conservative_end", "liberal"),
          m = 100, nskip = 3, verbose = FALSE, seed = NULL, parallel = FALSE)
y |
Response. A list with components |
X |
Design matrix of predictors. Can be a base matrix or something |
surv_model |
Character string specifying the survival model. Must be explicitly provided; there is no default. Values are |
method_xy |
Rule for screening some predictors as "leading variables" based on their association with the response; one of
|
param_xy |
Tuning parameter for |
method_xx |
Rule for constructing, for each leading variable, the set of associated predictors (the "leading set") using inter-predictor association (absolute value of the correlation coefficient); one of |
param_xx |
Tuning parameter for |
vsel_method |
Character string specifying the variable selection method to be used within each leading set. Available options are |
alpha |
Only used when |
method_sel |
Policy for aggregating predictors selected across leading sets in an iteration; one of |
method_rem |
Policy for excluding predictors when no selections are made in an iteration; one of |
m |
Integer. Maximum number of S3VS iterations to perform. Defaults to |
nskip |
Integer. Maximum number of iterations in which no new predictors are selected before the algorithm stops. Defaults to |
verbose |
Logical. If |
seed |
If supplied, sets the random seed via |
parallel |
Logical. If |
For a survival-type response, S3VS considers two choices of model: the Cox model
and the AFT model
For the S3VS algorithm, see the manual of the top-level function S3VS.
A list with the following components:
selected |
A character vector of predictor names that were selected across all iterations. |
selected_iterwise |
A list recording the predictors selected at each iteration, in the order they were considered. |
runtime |
Runtime in seconds. |
Nilotpal Sanyal <[email protected]>, Padmore N. Prempeh <[email protected]>
get_leadvars_SURV, get_leadsets, VS_method_SURV, select_vars, remove_vars
# Simulate survival data (Cox)
set.seed(123)
n <- 100
p <- 150
X <- matrix(rnorm(n * p), n, p)
colnames(X) <- paste0("V", 1:p)
eta <- X[,1] + 0.5 * X[,2]
base_rate <- 0.05
T_event <- rexp(n, rate = base_rate * exp(eta))
C <- rexp(n, rate = 0.03)
time <- pmin(T_event, C)
status <- as.integer(T_event <= C)
y_surv <- list(time = time, status = status)

# Run S3VS for survival model (Cox)
res_surv <- S3VS(y = y_surv, X = X, family = "survival", surv_model = "COX",
                 method_xy = "topk", param_xy = list(k = 1),
                 method_xx = "topk", param_xx = list(k = 3),
                 vsel_method = "COXGLMNET",
                 method_sel = "conservative", method_rem = "conservative_begin",
                 sel_regout = FALSE, rem_regout = FALSE,
                 m = 100, nskip = 3, verbose = TRUE, seed = 123)

# View selected predictors
res_surv$selected
select_vars combines variable selections obtained from multiple leading sets into a single set, using either a liberal (union) or conservative (progressive intersection) rule.
select_vars(listselect, method = c("conservative", "liberal"))
listselect |
A |
method |
Aggregation rule. One of
|
The liberal rule favors inclusiveness, while the conservative rule favors stability.
A character vector with the names of the retained predictors (considered selected in the current iteration of S3VS); if no predictors are retained, character(0).
Nilotpal Sanyal <[email protected]>, Padmore N. Prempeh <[email protected]>
listselect <- list(
  c("V1","V2","V23"),
  c("V4","V2","V23"),
  c("V4","V5","V23")
)
select_vars(listselect, method="conservative")
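The difference between the two aggregation rules can be sketched in base R without calling the package. The liberal rule is a plain union. For the conservative rule, the sketch below uses one plausible reading of "progressive intersection" (intersect the sets one at a time and stop before the result would become empty); this reading and the helper name progressive_intersect are assumptions, not the documented behaviour of select_vars.

```r
listselect <- list(
  c("V1", "V2", "V23"),
  c("V4", "V2", "V23"),
  c("V4", "V5", "V23")
)

# Liberal rule: union of all selections.
liberal <- Reduce(union, listselect)

# Conservative rule (assumed reading of "progressive intersection"):
# intersect the sets one at a time and stop before the result becomes empty.
progressive_intersect <- function(sets) {
  keep <- sets[[1]]
  for (s in sets[-1]) {
    nxt <- intersect(keep, s)
    if (length(nxt) == 0) break
    keep <- nxt
  }
  keep
}
conservative <- progressive_intersect(listselect)

liberal       # "V1" "V2" "V23" "V4" "V5"
conservative  # "V23"
```

On this input the union keeps every predictor that any leading set selected, while the progressive intersection keeps only the predictor common to all three sets.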
update_y updates the response accounting for the selected predictors in linear models, and selected or removed predictors in generalized linear models.
update_y(y, X, family, vars, update_y_thresh = NULL)
y |
Response. If |
family |
Model family; one of |
X |
Predictor matrix. Can be a base matrix or something |
vars |
Character vector containing the names of predictors that need to be accounted for. They must appear in |
update_y_thresh |
Numeric scalar threshold used only |
Returns the updated response vector.
Nilotpal Sanyal <[email protected]>, Padmore N. Prempeh <[email protected]>
# Simulate continuous data
set.seed(123)
n <- 100
p <- 150
X <- matrix(rnorm(n * p), n, p)
colnames(X) <- paste0("V", 1:p)
y <- X[,1] + 0.5 * X[,2] + rnorm(n)

update_y(y = y, X = X, family = "normal", vars = c("V1","V4"))
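One common way to account for already-selected predictors in a linear model is to replace the response by least-squares residuals. The sketch below illustrates that idea in base R; treating the update as OLS residualization is an assumption made for illustration, not a statement of how update_y is implemented, and the names fit and y_updated are hypothetical.

```r
set.seed(123)
n <- 100
p <- 150
X <- matrix(rnorm(n * p), n, p)
colnames(X) <- paste0("V", 1:p)
y <- X[, 1] + 0.5 * X[, 2] + rnorm(n)

# Assumed update rule (illustration only): replace y by the residuals of an
# ordinary least-squares fit on the predictors named in vars.
vars <- c("V1", "V4")
fit <- lm(y ~ X[, vars, drop = FALSE])
y_updated <- unname(resid(fit))

# The residuals are uncorrelated with the regressed-out columns
# (up to floating-point error).
round(cor(y_updated, X[, vars]), 10)
```

After such an update, a later screening step based on marginal correlation with y_updated would no longer favour V1 or V4.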
update_y_GLM updates the response accounting for the selected predictors in generalized linear models.
update_y_GLM(y, X, vars, update_y_thresh)
y |
Response. A numeric/integer/logical vector with values in {0,1}. |
X |
Predictor matrix. Can be a base matrix or something |
vars |
Character vector containing the names of predictors that need to be accounted for. They must appear in |
update_y_thresh |
Numeric scalar threshold. When |
Returns the updated (binary) response vector.
Nilotpal Sanyal <[email protected]>, Padmore N. Prempeh <[email protected]>
# Simulate binary data
set.seed(123)
n <- 100
p <- 150
X <- matrix(rnorm(n * p), n, p)
colnames(X) <- paste0("V", 1:p)
eta <- X[,1] + 0.5 * X[,2]
prob <- 1 / (1 + exp(-eta))
y <- rbinom(n, size = 1, prob = prob)

update_y(family = "binomial", y = y, X = X, vars = c("V1","V4"),
         update_y_thresh = 0.8)
update_y_LM updates the response accounting for the selected predictors in linear models.
update_y_LM(y, X, vars)
y |
Response. A numeric vector of length |
X |
Predictor matrix. Can be a base matrix or something |
vars |
Character vector containing the names of predictors that need to be accounted for. They must appear in |
Returns the updated response vector.
Nilotpal Sanyal <[email protected]>, Padmore N. Prempeh <[email protected]>
# Simulate continuous data
set.seed(123)
n <- 100
p <- 150
X <- matrix(rnorm(n * p), n, p)
colnames(X) <- paste0("V", 1:p)
y <- X[,1] + 0.5 * X[,2] + rnorm(n)

update_y(family = "normal", y = y, X = X, vars = c("V1","V4"))
VS_method applies the chosen variable-selection algorithm to each leading set produced by S3VS at every iteration.
VS_method(y, X, family, surv_model = NULL, vsel_method, alpha = 0.5,
          p_thresh = 0.1, gamma = 0.9, verbose = FALSE)
y |
Response. If |
X |
Predictor matrix. Can be a base matrix or something |
family |
Model family; one of |
surv_model |
Character string specifying the survival model ( |
vsel_method |
Character string indicating the variable-selection engine used inside |
alpha |
Only used when |
p_thresh |
Only used for |
gamma |
Only used for |
verbose |
If |
Details to come...
A list containing:
sel |
Character vector with names of the selected predictors. |
nosel |
Character vector with names of the predictors not selected. |
Nilotpal Sanyal <[email protected]>, Padmore N. Prempeh <[email protected]>
VS_method_LM, VS_method_GLM, VS_method_SURV
# Simulate continuous data
set.seed(123)
n <- 100
p <- 150
X <- matrix(rnorm(n * p), n, p)
colnames(X) <- paste0("V", 1:p)
y <- X[,1] + 0.5 * X[,2] + rnorm(n)

# Run VS_method
VS_method(y, X, family = "normal", vsel_method = "NLP", verbose = FALSE)
VS_method_GLM applies the chosen variable-selection algorithm for generalized linear models to each leading set produced by S3VS at every iteration.
VS_method_GLM(y, X, vsel_method, alpha = 0.5, verbose = FALSE,
              parallel = FALSE, ncores = NULL)
y |
Response. A numeric/integer/logical vector with values in {0,1}. |
X |
Predictor matrix. Can be a base matrix or something |
vsel_method |
Character string indicating the variable-selection engine used at each iteration. Available options are |
alpha |
Only used when |
verbose |
If |
parallel |
Logical. If |
ncores |
Integer; number of CPU cores to use when |
A list containing:
sel |
Character vector with names of the selected predictors. |
nosel |
Character vector with names of the predictors not selected. |
Nilotpal Sanyal <[email protected]>, Padmore N. Prempeh <[email protected]>
modelSelection, cv.glmnet, cv.ncvreg
# Simulate binary data
set.seed(123)
n <- 100
p <- 150
X <- matrix(rnorm(n * p), n, p)
colnames(X) <- paste0("V", 1:p)
eta <- X[,1] + 0.5 * X[,2]
prob <- 1 / (1 + exp(-eta))
y <- rbinom(n, size = 1, prob = prob)

# Run VS_method_GLM
VS_method_GLM(y, X, vsel_method = "LASSO", verbose = FALSE)
VS_method_LM applies the chosen variable-selection algorithm for linear models to each leading set produced by S3VS at every iteration.
VS_method_LM(y, X, vsel_method, alpha = 0.5, verbose = FALSE)
y |
Response. A numeric vector. |
X |
Predictor matrix. Can be a base matrix or something |
vsel_method |
Character string indicating the variable-selection engine used at each iteration. Available options are |
alpha |
Only used when |
verbose |
If |
A list containing:
sel |
Character vector with names of the selected predictors. |
nosel |
Character vector with names of the predictors not selected. |
Nilotpal Sanyal <[email protected]>, Padmore N. Prempeh <[email protected]>
modelSelection, cv.glmnet, cv.ncvreg
# Simulate continuous data
set.seed(123)
n <- 100
p <- 150
X <- matrix(rnorm(n * p), n, p)
colnames(X) <- paste0("V", 1:p)
y <- X[,1] + 0.5 * X[,2] + rnorm(n)

# Run VS_method_LM
VS_method_LM(y, X, vsel_method = "NLP", verbose = FALSE)
VS_method_SURV applies the chosen variable-selection algorithm for survival models to each leading set produced by S3VS at every iteration.
VS_method_SURV(y, X, surv_model, vsel_method, alpha = 0.5, p_thresh = 0.1,
               gamma = 0.9, verbose = FALSE, ...)
y |
Response. A list with components |
X |
Predictor matrix. Can be a base matrix or something |
surv_model |
Character string specifying the survival model. Must be explicitly provided; there is no default. Values are |
vsel_method |
Character string indicating the variable-selection engine used at each iteration. Available options are |
alpha |
Only used when |
p_thresh |
Only used with |
gamma |
Only used with |
verbose |
If |
... |
Other arguments to be passed inside |
A list containing:
sel |
Character vector with names of the selected predictors. |
nosel |
Character vector with names of the predictors not selected. |
Nilotpal Sanyal <[email protected]>, Padmore N. Prempeh <[email protected]>
cv.glmnet, aftreg, aftgee, bridge_aft, pvaft
# Simulate survival data (Cox)
set.seed(123)
n <- 100
p <- 150
X <- matrix(rnorm(n * p), n, p)
colnames(X) <- paste0("V", 1:p)
eta <- X[,1] + 0.5 * X[,2]
base_rate <- 0.05
T_event <- rexp(n, rate = base_rate * exp(eta))
C <- rexp(n, rate = 0.03)
time <- pmin(T_event, C)
status <- as.integer(T_event <= C)
y_surv <- list(time = time, status = status)

# Run VS_method_SURV
VS_method_SURV(y_surv, X, surv_model = "COX", vsel_method = "COXGLMNET",
               verbose = FALSE)