Title: | Consistent Significance Controlled Variable Selection in Generalized Linear Regression |
---|---|
Description: | Provides significance controlled variable selection algorithms with different directions (forward, backward, stepwise) based on diverse criteria (AIC, BIC, adjusted r-square, PRESS, or p-value). The algorithm selects a final model with only significant variables defined as those with significant p-values after multiple testing correction such as Bonferroni, False Discovery Rate, etc. See Zambom and Kim (2018) <doi:10.1002/sta4.210>. |
Authors: | Jongwook Kim, Adriano Zanin Zambom |
Maintainer: | Adriano Zanin Zambom <[email protected]> |
License: | GPL (>= 2) |
Version: | 4.3 |
Built: | 2024-11-20 06:46:45 UTC |
Source: | CRAN |
The DESCRIPTION file:
Package: | SignifReg |
Type: | Package |
Title: | Consistent Significance Controlled Variable Selection in Generalized Linear Regression |
Version: | 4.3 |
Date: | 2022-03-21 |
Imports: | car |
Author: | Jongwook Kim, Adriano Zanin Zambom |
Maintainer: | Adriano Zanin Zambom <[email protected]> |
Description: | Provides significance controlled variable selection algorithms with different directions (forward, backward, stepwise) based on diverse criteria (AIC, BIC, adjusted r-square, PRESS, or p-value). The algorithm selects a final model with only significant variables defined as those with significant p-values after multiple testing correction such as Bonferroni, False Discovery Rate, etc. See Zambom and Kim (2018) <doi:10.1002/sta4.210>. |
License: | GPL (>= 2) |
NeedsCompilation: | no |
Packaged: | 2022-03-21 23:46:25 UTC; adrianozambom |
Repository: | CRAN |
Date/Publication: | 2022-03-22 08:20:02 UTC |
Config/pak/sysreqs: | cmake make libicu-dev |
Jongwook Kim, Adriano Zanin Zambom
Maintainer: Adriano Zanin Zambom <[email protected]>
Zambom A Z, Kim J. Consistent significance controlled variable selection in high-dimensional regression. Stat.2018;7:e210. https://doi.org/10.1002/sta4.210
add1SignifReg adds to the model the predictor, out of the available predictors, that minimizes the criterion (AIC, BIC, r-adj, PRESS, max p-value), provided that all p-values of the predictors in the prospective model (including the prospective predictor) pass the chosen multiple testing correction (Bonferroni, FDR, none, etc.). The function returns the fitted model with the additional predictor, if any. A summary table of the prospective models can be printed with print.step = TRUE.
max_pvalue indicates the maximum p-value from the multiple t-tests for each predictor. More specifically, the algorithm fits one prospective model per available predictor and computes all p-values of each prospective model. The predictor selected to be added is the one whose prospective model attains the minimum, over all prospective models, of the maximum p-value.
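The min-max rule described above can be sketched in base R. This is only an illustration, not the package's code: the candidate list is hypothetical, and dropping the intercept row from the t-tests is an assumption.

```r
# Illustrative sketch (not the SignifReg implementation).
current <- lm(mpg ~ wt, data = mtcars)
candidates <- c("hp", "qsec", "cyl")  # hypothetical set of available predictors

# One prospective model per candidate; record its largest t-test p-value.
max_p <- sapply(candidates, function(v) {
  prospective <- update(current, as.formula(paste(". ~ . +", v)))
  # Assumption: the intercept row is excluded from the comparison.
  max(summary(prospective)$coefficients[-1, "Pr(>|t|)"])
})

# Candidate whose prospective model has the smallest maximum p-value.
best <- names(which.min(max_p))
print(max_p)
print(best)
```

Whether the chosen candidate is actually added still depends on the prospective model passing the multiple testing correction.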
add1SignifReg(fit, scope, alpha = 0.05, criterion = "p-value", adjust.method = "fdr", override = FALSE, print.step = FALSE)
fit | an lm or glm object representing a linear regression model. |
scope | defines the range of models examined in the stepwise search. This should be either a single formula, or a list containing components upper and lower, both formulae. See the details for how to specify the formulae and how they are used. |
alpha | Significance level. Default value is 0.05. |
criterion | Criterion to select predictor variables. |
adjust.method | Correction for multiple testing accumulation of error. See p.adjust. |
override | If |
print.step | If true, information is printed for each step of variable selection. Default is FALSE. |
add1SignifReg returns an object of the class lm, or glm for a generalized regression model, with the additional component steps.info, which shows the steps taken during the variable selection and model metrics: Deviance, Resid.Df, Resid.Dev, AIC, BIC, adj.rsq, PRESS, max_pvalue, max.VIF, and whether it passed the chosen p-value correction.
Jongwook Kim <[email protected]>
Adriano Zanin Zambom <[email protected]>
Zambom A Z, Kim J. Consistent significance controlled variable selection in high-dimensional regression. Stat.2018;7:e210. https://doi.org/10.1002/sta4.210
SignifReg, add1summary, drop1summary, drop1SignifReg
## mtcars data is used as an example.
data(mtcars)
nullmodel <- lm(mpg ~ 1, mtcars)
fullmodel <- lm(mpg ~ ., mtcars)
scope <- list(lower = formula(nullmodel), upper = formula(fullmodel))

fit1 <- lm(mpg ~ 1, data = mtcars)
add1SignifReg(fit1, scope = scope, print.step = TRUE)

fit2 <- lm(mpg ~ disp + cyl + wt + qsec, mtcars)
add1SignifReg(fit2, scope = scope, criterion = "AIC", override = TRUE)
Offers summaries of prospective models as every available predictor in the scope is added to the model.
add1summary(fit, scope, alpha = 0.05, adjust.method = "fdr", sort.by = "p-value")
fit | an lm or glm object representing a model. |
scope | defines the range of models examined in the stepwise search. This should be either a single formula, or a list containing components upper and lower, both formulae. See the details for how to specify the formulae and how they are used. |
alpha | Significance level. Default value is 0.05. |
adjust.method | Correction for multiple testing accumulation of error. See p.adjust. |
sort.by | The criterion to use to sort the table of prospective models. Must be one of |
max_pvalue indicates the maximum p-value from the multiple t-tests for each predictor.
A table with the possible inclusions and the metrics of the prospective models: AIC, BIC, adj.rsq, PRESS, max_pvalue, max.VIF, and whether each model passed the chosen p-value correction.
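To make the columns of such a table concrete, the sketch below builds a comparable summary with base R only. It is a rough approximation under stated assumptions, not the output of add1summary: the candidate list, the `press` helper, and the column name `pass` are all hypothetical, and max.VIF is omitted.

```r
# Illustrative sketch (not the add1summary implementation).
# PRESS of an lm fit: sum of squared leave-one-out prediction errors.
press <- function(fit) sum((residuals(fit) / (1 - hatvalues(fit)))^2)

current <- lm(mpg ~ wt, data = mtcars)
candidates <- c("hp", "qsec", "cyl")  # hypothetical scope
alpha <- 0.05

rows <- lapply(candidates, function(v) {
  fit <- update(current, as.formula(paste(". ~ . +", v)))
  pvals <- summary(fit)$coefficients[-1, "Pr(>|t|)"]  # intercept dropped (assumption)
  data.frame(
    added = v,
    AIC = AIC(fit),
    BIC = BIC(fit),
    adj.rsq = summary(fit)$adj.r.squared,
    PRESS = press(fit),
    max_pvalue = max(pvals),
    pass = all(p.adjust(pvals, method = "fdr") <= alpha)
  )
})

tab <- do.call(rbind, rows)
tab <- tab[order(tab$max_pvalue), ]  # sorted by the p-value criterion
print(tab)
```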
Jongwook Kim <[email protected]>
Adriano Zanin Zambom <[email protected]>
Zambom A Z, Kim J. Consistent significance controlled variable selection in high-dimensional regression. Stat.2018;7:e210. https://doi.org/10.1002/sta4.210
SignifReg, add1SignifReg, drop1summary, drop1SignifReg
## mtcars data is used as an example.
data(mtcars)
nullmodel <- lm(mpg ~ 1, mtcars)
fullmodel <- lm(mpg ~ ., mtcars)
scope <- list(lower = formula(nullmodel), upper = formula(fullmodel))

fit1 <- lm(mpg ~ 1, mtcars)
add1summary(fit1, scope = scope)

fit2 <- lm(mpg ~ disp + cyl + wt + qsec, data = mtcars)
add1summary(fit2, scope = scope)
drop1SignifReg removes from the model the predictor, out of the current predictors, that minimizes the criterion (AIC, BIC, r-adj, PRESS, max p-value) when a) the p-values of the predictors in the current model do not pass the multiple testing correction (Bonferroni, FDR, none, etc.), or b) the p-values of both the current and the prospective models pass the correction but the criterion of the prospective model is smaller.
max_pvalue indicates the maximum p-value from the multiple t-tests for each predictor. More specifically, the algorithm fits one prospective model per current predictor, with that predictor removed, and computes all p-values of each prospective model. The predictor selected to be removed is the one whose prospective model attains the minimum, over all prospective models, of the maximum p-value.
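A backward step under the p-value criterion can be sketched in base R as follows. This is an illustration under assumptions, not the package's code: the starting model is arbitrary, the correction is applied with p.adjust to the non-intercept t-tests, and the names `current_passes` and `to_drop` are hypothetical.

```r
# Illustrative sketch (not the drop1SignifReg implementation).
current <- lm(mpg ~ wt + hp + qsec + drat, data = mtcars)
alpha <- 0.05

# Condition (a): does the current model fail the chosen correction?
p_current <- summary(current)$coefficients[-1, "Pr(>|t|)"]
current_passes <- all(p.adjust(p_current, method = "fdr") <= alpha)

# One prospective model per removable predictor; record its max p-value.
in_model <- attr(terms(current), "term.labels")
max_p_after_drop <- sapply(in_model, function(v) {
  reduced <- update(current, as.formula(paste(". ~ . -", v)))
  max(summary(reduced)$coefficients[-1, "Pr(>|t|)"])
})

# Removal candidate: the predictor whose removal gives the smallest max p-value.
to_drop <- names(which.min(max_p_after_drop))
print(current_passes)
print(max_p_after_drop)
print(to_drop)
```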
drop1SignifReg(fit, scope, alpha = 0.05, criterion = "p-value", adjust.method = "fdr", override = FALSE, print.step = FALSE)
fit | an lm or glm object representing a model. |
scope | defines the range of models examined in the stepwise search. This should be either a single formula, or a list containing components upper and lower, both formulae. See the details for how to specify the formulae and how they are used. |
alpha | Significance level. Default value is 0.05. |
criterion | Criterion to select predictor variables. |
adjust.method | Correction for multiple testing accumulation of error. See p.adjust. |
override | If |
print.step | If true, information is printed for each step of variable selection. Default is FALSE. |
drop1SignifReg returns an object of the class lm, or glm for a generalized regression model, with the additional component steps.info, which shows the steps taken during the variable selection and model metrics: Deviance, Resid.Df, Resid.Dev, AIC, BIC, adj.rsq, PRESS, max_pvalue, max.VIF, and whether it passed the chosen p-value correction.
Jongwook Kim <[email protected]>
Adriano Zanin Zambom <[email protected]>
Zambom A Z, Kim J. Consistent significance controlled variable selection in high-dimensional regression. Stat.2018;7:e210. https://doi.org/10.1002/sta4.210
SignifReg, add1summary, add1SignifReg, drop1summary
## mtcars data is used as an example.
data(mtcars)
fit <- lm(mpg ~ ., mtcars)
drop1SignifReg(fit, print.step = TRUE)
Offers summaries of prospective models as every predictor in the model is removed from the model.
drop1summary(fit, scope, alpha = 0.05, adjust.method = "fdr", sort.by = "p-value")
fit | an lm or glm object representing a model. |
scope | defines the range of models examined in the stepwise search. This should be either a single formula, or a list containing components upper and lower, both formulae. See the details for how to specify the formulae and how they are used. |
alpha | Significance level. Default value is 0.05. |
adjust.method | Correction for multiple testing accumulation of error. See p.adjust. |
sort.by | The criterion to use to sort the table of prospective models. Must be one of |
max_pvalue indicates the maximum p-value from the multiple t-tests for each predictor.
A table with the possible exclusions and the metrics of the prospective models: AIC, BIC, adj.rsq, PRESS, max_pvalue, max.VIF, and whether each model passed the chosen p-value correction.
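The max.VIF column can be approximated without extra packages. The sketch below assumes numeric predictors and uses the identity that the variance inflation factors are the diagonal of the inverse correlation matrix of the predictors. How the package itself computes max.VIF is not stated here, so the `max_vif` helper and the table layout are hypothetical.

```r
# Illustrative sketch (not the drop1summary implementation).
# Largest variance inflation factor of an lm with numeric predictors.
max_vif <- function(fit) {
  X <- model.matrix(fit)[, -1, drop = FALSE]  # drop the intercept column
  max(diag(solve(cor(X))))
}

current <- lm(mpg ~ wt + hp + qsec + drat, data = mtcars)
in_model <- attr(terms(current), "term.labels")

# One row per possible exclusion.
rows <- lapply(in_model, function(v) {
  reduced <- update(current, as.formula(paste(". ~ . -", v)))
  data.frame(dropped = v, AIC = AIC(reduced), max.VIF = max_vif(reduced))
})
tab <- do.call(rbind, rows)
print(tab)
```

A VIF of 1 means a predictor is uncorrelated with the others; larger values indicate stronger collinearity in the prospective model.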
Jongwook Kim <[email protected]>
Adriano Zanin Zambom <[email protected]>
Zambom A Z, Kim J. Consistent significance controlled variable selection in high-dimensional regression. Stat.2018;7:e210. https://doi.org/10.1002/sta4.210
SignifReg, add1summary, add1SignifReg, drop1SignifReg
## mtcars data is used as an example.
data(mtcars)
fit <- lm(mpg ~ ., mtcars)
drop1summary(fit)
Significance controlled variable selection selects variables in a generalized linear regression model with different directions of the algorithm (forward, backward, stepwise) based on a chosen criterion (AIC, BIC, adjusted r-square, PRESS, or p-value). The algorithm selects a final model with only significant variables, based on a correction chosen among those available in p.adjust() (False Discovery Rate, Bonferroni, etc.).
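As a brief illustration of how the correction choice changes which predictors count as significant, the sketch below applies p.adjust() from base R to the t-test p-values of an arbitrary model. The model and the 0.05 threshold are assumptions made for the example only.

```r
# Illustrative sketch: compare multiple testing corrections on one model.
fit <- lm(mpg ~ wt + hp + qsec + drat, data = mtcars)
raw <- summary(fit)$coefficients[-1, "Pr(>|t|)"]  # intercept dropped (assumption)

adjusted <- cbind(
  raw = raw,
  bonferroni = p.adjust(raw, method = "bonferroni"),
  fdr = p.adjust(raw, method = "fdr"),
  none = p.adjust(raw, method = "none")
)

# TRUE where a predictor is significant at the 0.05 level under each correction.
significant <- adjusted <= 0.05
print(round(adjusted, 4))
print(significant)
```

Bonferroni is the most conservative of the three, so a predictor that passes it also passes the FDR correction.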
SignifReg(fit, scope, alpha = 0.05, direction = "forward", criterion = "p-value", adjust.method = "fdr", trace=FALSE)
fit | an lm or glm object representing a model. It is the initial model for the variable selection. |
scope | defines the range of models examined in the stepwise search. This should be either a single formula, or a list containing components upper and lower, both formulae. See the details for how to specify the formulae and how they are used. |
alpha | Significance level. Default value is 0.05. |
direction | Direction in variable selection: "forward", "backward", or "both". |
criterion | Criterion to select predictor variables. |
adjust.method | Correction for multiple testing accumulation of error. See p.adjust. |
trace | If true, information is printed for each step of variable selection. Default is FALSE. |
SignifReg selects only significant predictors according to a designated criterion. A model with the best criterion, for example the smallest AIC, will not be considered if it includes insignificant predictors based on the chosen correction. When the criterion is "p-value", a predictor can be dropped only if the current model has an insignificant predictor, and a predictor can be added as long as all predictors in the prospective model are significant (including the one to be added). The predictor to be added or removed is the one that generates a model having the smallest maximum p-value of the t-tests in the prospective models. This step is repeated as long as every predictor is significant according to the correction criterion. When the criterion is "AIC" or "BIC", SignifReg selects, at each step, the model having the smallest value of the criterion among models having only significant predictors according to the chosen correction.
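The AIC case described above can be sketched as a small forward loop in base R. This is a simplified illustration, not the package's algorithm: `forward_signif` and its arguments are hypothetical, the candidates are given as a plain character vector rather than a scope, and stopping when AIC no longer improves is an assumption.

```r
# Illustrative sketch (not the SignifReg implementation).
forward_signif <- function(fit, candidates, alpha = 0.05, method = "fdr") {
  repeat {
    remaining <- setdiff(candidates, attr(terms(fit), "term.labels"))
    if (length(remaining) == 0) break
    trials <- lapply(remaining, function(v) {
      update(fit, as.formula(paste(". ~ . +", v)))
    })
    # Keep only prospective models whose predictors all pass the correction.
    ok <- vapply(trials, function(m) {
      p <- summary(m)$coefficients[-1, "Pr(>|t|)"]
      all(p.adjust(p, method = method) <= alpha)
    }, logical(1))
    if (!any(ok)) break
    # Among those, take the smallest AIC; stop if it does not improve (assumption).
    aics <- vapply(trials[ok], AIC, numeric(1))
    best <- trials[ok][[which.min(aics)]]
    if (AIC(best) >= AIC(fit)) break
    fit <- best
  }
  fit
}

start <- lm(mpg ~ 1, data = mtcars)
selected <- forward_signif(start, c("wt", "hp", "qsec", "cyl"))
print(formula(selected))
```

The significance filter runs before the criterion is compared, which is why a model with a smaller AIC can be skipped when one of its predictors fails the correction.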
SignifReg returns an object of the class lm, or glm for a generalized regression model, with the additional component steps.info, which shows the steps taken during the variable selection and model metrics: Deviance, Resid.Df, Resid.Dev, AIC, BIC, adj.rsq, PRESS, max_pvalue, max.VIF, and whether it passed the chosen p-value correction.
Jongwook Kim <[email protected]>
Adriano Zanin Zambom <[email protected]>
Zambom A Z, Kim J. Consistent significance controlled variable selection in high-dimensional regression. Stat.2018;7:e210. https://doi.org/10.1002/sta4.210
add1SignifReg, drop1SignifReg, add1summary, drop1summary
## mtcars data is used as an example.
data(mtcars)
nullmodel <- lm(mpg ~ 1, mtcars)
fullmodel <- lm(mpg ~ ., mtcars)
scope <- list(lower = formula(nullmodel), upper = formula(fullmodel))

fit1 <- lm(mpg ~ 1, mtcars)
select.fit <- SignifReg(fit1, scope = scope, direction = "forward", trace = TRUE)
select.fit$steps.info

fit <- lm(mpg ~ cyl + hp + am + gear, data = mtcars)
select.fit <- SignifReg(fit, scope = scope, alpha = 0.05, direction = "backward",
                        criterion = "p-value", adjust.method = "fdr", trace = TRUE)
select.fit$steps.info

fit <- lm(mpg ~ cyl + hp + am + gear + disp, data = mtcars)
select.fit <- SignifReg(fit, scope = scope, alpha = 0.5, direction = "both",
                        criterion = "AIC", adjust.method = "fdr", trace = TRUE)
select.fit$steps.info