Title: | L2 Penalized Logistic Regression with Stepwise Variable Selection |
---|---|
Description: | L2 penalized logistic regression for both continuous and discrete predictors, with forward stagewise/forward stepwise variable selection procedure. |
Authors: | Mee Young Park, Trevor Hastie |
Maintainer: | Mee Young Park <[email protected]> |
License: | GPL (>= 2) |
Version: | 0.93 |
Built: | 2024-10-31 20:30:38 UTC |
Source: | CRAN |
This function computes cross-validated deviance or prediction errors for step.plr. The parameters that can be cross-validated are lambda and cp.
    cv.step.plr(x, y, weights = rep(1, length(y)), nfold = 5, folds = NULL,
                lambda = c(1e-4, 1e-2, 1), cp = c("aic", "bic"),
                cv.type = c("deviance", "class"), trace = TRUE, ...)
x | matrix of features |
y | binary response |
weights | optional vector of weights for observations |
nfold | number of folds to be used in cross-validation. Default is 5. |
folds | list of cross-validation folds. Its length must be |
lambda | vector of the candidate values for |
cp | vector of the candidate values for |
cv.type | If |
trace | If |
... | other options for |
If both lambda and cp are input as vectors (of length greater than 1), then a two-dimensional cross-validation is done. If either one is input as a single value, then the cross-validation is done only on the parameter with multiple inputs.
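As a rough illustration of the two-dimensional case, the base-R sketch below enumerates the (lambda, cp) candidate grid and draws a list of random folds. It does not call cv.step.plr. The names n, grid, fold_id, and folds are illustrative, and the random fold assignment is only an assumption about how a list of folds might be built by hand.

```r
set.seed(1)
n <- 100     # hypothetical number of observations
nfold <- 5

# Candidate values, matching the defaults in the usage above.
lambda <- c(1e-4, 1e-2, 1)
cp <- c("aic", "bic")

# Both vectors have length > 1, so every (lambda, cp) pair is a candidate.
grid <- expand.grid(lambda = lambda, cp = cp, stringsAsFactors = FALSE)

# One way to build a list of folds: shuffle fold labels over the rows and
# split the row indices into nfold groups of equal size.
fold_id <- sample(rep(seq_len(nfold), length.out = n))
folds <- split(seq_len(n), fold_id)

nrow(grid)      # number of candidate pairs
length(folds)   # number of folds
```

Supplying a single value for either lambda or cp would shrink the grid to one dimension, which corresponds to cross-validating only the parameter with multiple inputs.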
Mee Young Park and Trevor Hastie
Mee Young Park and Trevor Hastie (2008) Penalized Logistic Regression for Detecting Gene Interactions
step.plr
    n <- 100
    p <- 5
    x <- matrix(sample(seq(3), n * p, replace=TRUE), nrow=n)
    y <- sample(c(0, 1), n, replace=TRUE)
    level <- vector("list", length=p)
    for (i in 1:p) level[[i]] <- seq(3)
    cvfit <- cv.step.plr(x, y, level=level, lambda=c(1e-4, 1e-2, 1), cp="bic")
This function fits a logistic regression model penalizing the size of the L2 norm of the coefficients.
    plr(x, y, weights = rep(1, length(y)),
        offset.subset = NULL, offset.coefficients = NULL,
        lambda = 1e-4, cp = "bic")
x | matrix of features |
y | binary response |
weights | optional vector of weights for observations |
offset.subset | optional vector of indices for the predictors for which the coefficients are preset to |
offset.coefficients | optional vector of preset coefficient values for the predictors in |
lambda | regularization parameter for the L2 norm of the coefficients. The minimizing criterion in |
cp | complexity parameter to be used when computing the score. |
We proposed using logistic regression with a quadratic penalization on the coefficients for detecting gene interactions, as described in "Penalized Logistic Regression for Detecting Gene Interactions" (2008) by Park and Hastie. However, the function plr may also be used for general purposes.
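To make the idea of a quadratic penalty concrete, here is a minimal base-R sketch of ridge-penalized logistic regression fitted by Newton-Raphson. It is not the implementation used by plr. The function name ridge_logistic, the criterion (negative log-likelihood plus lambda times the squared L2 norm), and the choice to leave the intercept unpenalized are all assumptions made for illustration.

```r
# Minimal Newton-Raphson sketch that minimizes
#   -loglik(beta) + lambda * sum(beta[-1]^2)
# The intercept beta[1] is left unpenalized here (an assumption).
ridge_logistic <- function(x, y, lambda = 1e-4, max_iter = 50, tol = 1e-8) {
  X <- cbind(1, x)                          # add an intercept column
  P <- diag(c(0, rep(lambda, ncol(x))))     # penalty matrix, zero for intercept
  beta <- rep(0, ncol(X))
  for (iter in seq_len(max_iter)) {
    mu <- plogis(drop(X %*% beta))          # fitted probabilities
    w <- mu * (1 - mu)                      # binomial variance weights
    grad <- crossprod(X, y - mu) - 2 * P %*% beta   # negative gradient
    hess <- crossprod(X, X * w) + 2 * P             # penalized Hessian
    step <- drop(solve(hess, grad))
    beta <- beta + step
    if (max(abs(step)) < tol) break
  }
  beta
}

set.seed(1)
n <- 100
x <- matrix(rnorm(n * 3), nrow = n)
y <- sample(c(0, 1), n, replace = TRUE)

b_small <- ridge_logistic(x, y, lambda = 1e-4)  # close to the unpenalized fit
b_large <- ridge_logistic(x, y, lambda = 10)    # coefficients shrunk toward 0
```

With a very small lambda the estimates are nearly those of an ordinary logistic regression, while a larger lambda shrinks the penalized coefficients toward zero.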
A plr object is returned. The predict, print, and summary functions can be applied to it.
coefficients | vector of the coefficient estimates |
covariance | sandwich estimate of the covariance matrix for the coefficients |
deviance | deviance of the fitted model |
null.deviance | deviance of the null model |
df | degrees of freedom of the fitted model |
score | deviance + cp*df |
nobs | number of observations |
cp | complexity parameter used when computing the score |
fitted.values | fitted probabilities |
linear.predictors | linear predictors computed with the estimated coefficients |
level | If any categorical factors are input, level - the list of level sets - is automatically generated and returned. See |
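The score entry above is a simple penalized deviance, and the following base-R sketch shows the arithmetic. The deviance and df values are hypothetical. The mapping of "aic" to cp = 2 and "bic" to cp = log(n) follows the usual AIC/BIC conventions and is an assumption here, since the text above does not spell it out. The helper score_for is illustrative and not part of the package.

```r
n <- 100           # number of observations (nobs)
deviance <- 120.5  # hypothetical deviance of a fitted model
df <- 4            # hypothetical degrees of freedom

# score = deviance + cp * df, as listed in the table above
score_for <- function(deviance, df, cp) deviance + cp * df

score_aic <- score_for(deviance, df, cp = 2)       # AIC-style penalty (assumed)
score_bic <- score_for(deviance, df, cp = log(n))  # BIC-style penalty (assumed)
score_num <- score_for(deviance, df, cp = 4)       # a numeric cp, used as is
```

Because log(n) exceeds 2 once n is larger than about 7, a BIC-style cp penalizes each degree of freedom more heavily than an AIC-style cp for samples of this size.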
Mee Young Park and Trevor Hastie
Mee Young Park and Trevor Hastie (2008) Penalized Logistic Regression for Detecting Gene Interactions
predict.plr, step.plr
    n <- 100
    p <- 10
    x <- matrix(rnorm(n * p), nrow=n)
    y <- sample(c(0, 1), n, replace=TRUE)
    fit <- plr(x, y, lambda=1)

    p <- 3
    z <- matrix(sample(seq(3), n * p, replace=TRUE), nrow=n)
    x <- data.frame(x1=factor(z[, 1]), x2=factor(z[, 2]), x3=factor(z[, 3]))
    y <- sample(c(0, 1), n, replace=TRUE)
    fit <- plr(x, y, lambda=1)
    # 'level' is automatically generated. Check 'fit$level'.
This function computes the linear predictors, probability estimates, or the class labels for new data, using a plr object.
    ## S3 method for class 'plr'
    predict(object, newx = NULL, type = c("link", "response", "class"), ...)
object | |
newx | matrix of features at which the predictions are made. If |
type | If |
... | other options for prediction |
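The three kinds of output named in the description (linear predictors, probability estimates, class labels) are related by the logistic link. The base-R sketch below illustrates that relationship on hypothetical linear predictors. It does not call predict on a plr object, and the 0.5 cutoff used for the class labels is an assumption.

```r
eta <- c(-2, 0.1, 3)              # hypothetical linear predictors ("link")
prob <- plogis(eta)               # probability estimates ("response")
label <- as.integer(prob > 0.5)   # class labels ("class"), assuming a 0.5 cutoff
```

A probability above 0.5 is equivalent to a positive linear predictor, so under this assumption the class label depends only on the sign of the link value.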
Mee Young Park and Trevor Hastie
Mee Young Park and Trevor Hastie (2008) Penalized Logistic Regression for Detecting Gene Interactions
plr
    n <- 100
    p <- 10
    x0 <- matrix(rnorm(n * p), nrow=n)
    y <- sample(c(0, 1), n, replace=TRUE)
    fit <- plr(x0, y, lambda=1)
    x1 <- matrix(rnorm(n * p), nrow=n)
    pred1 <- predict(fit, x1, type="link")
    pred2 <- predict(fit, x1, type="response")
    pred3 <- predict(fit, x1, type="class")

    p <- 3
    z <- matrix(sample(seq(3), n * p, replace=TRUE), nrow=n)
    x0 <- data.frame(x1=factor(z[, 1]), x2=factor(z[, 2]), x3=factor(z[, 3]))
    y <- sample(c(0, 1), n, replace=TRUE)
    fit <- plr(x0, y, lambda=1)
    z <- matrix(sample(seq(3), n * p, replace=TRUE), nrow=n)
    x1 <- data.frame(x1=factor(z[, 1]), x2=factor(z[, 2]), x3=factor(z[, 3]))
    pred1 <- predict(fit, x1, type="link")
    pred2 <- predict(fit, x1, type="response")
    pred3 <- predict(fit, x1, type="class")
This function computes the linear predictors, probability estimates, or the class labels for new data, using a stepplr object.
    ## S3 method for class 'stepplr'
    predict(object, x = NULL, newx = NULL, type = c("link", "response", "class"), ...)
object | |
x | matrix of features used for fitting |
newx | matrix of features at which the predictions are made. If |
type | If |
... | other options for prediction |
Mee Young Park and Trevor Hastie
Mee Young Park and Trevor Hastie (2008) Penalized Logistic Regression for Detecting Gene Interactions
stepplr
    n <- 100
    p <- 5
    x0 <- matrix(sample(seq(3), n * p, replace=TRUE), nrow=n)
    x0 <- cbind(rnorm(n), x0)
    y <- sample(c(0, 1), n, replace=TRUE)
    level <- vector("list", length=6)
    for (i in 2:6) level[[i]] <- seq(3)
    fit <- step.plr(x0, y, level=level)
    x1 <- matrix(sample(seq(3), n * p, replace=TRUE), nrow=n)
    x1 <- cbind(rnorm(n), x1)
    pred1 <- predict(fit, x0, x1, type="link")
    pred2 <- predict(fit, x0, x1, type="response")
    pred3 <- predict(fit, x0, x1, type="class")
This function fits a series of L2 penalized logistic regression models selecting variables through the forward stepwise selection procedure.
    step.plr(x, y, weights = rep(1, length(y)), fix.subset = NULL,
             level = NULL, lambda = 1e-4, cp = "bic", max.terms = 5,
             type = c("both", "forward", "forward.stagewise"),
             trace = FALSE)
x | matrix of features |
y | binary response |
weights | optional vector of weights for observations |
fix.subset | vector of indices for the variables that are forced to be in the model |
level | list of length |
lambda | regularization parameter for the L2 norm of the coefficients. The minimizing criterion in |
cp | complexity parameter to be used when computing the score. |
max.terms | maximum number of terms to be added in the forward selection procedure. Default is 5. |
type | If |
trace | If |
This function implements an L2 penalized logistic regression along with the stepwise variable selection procedure, as described in "Penalized Logistic Regression for Detecting Gene Interactions (2008)" by Park and Hastie.
If type="forward", max.terms terms are sequentially added to the model, and the model that minimizes score is selected as the optimal fit. If type="both", a backward deletion is done in addition, which provides a series of models with different combinations of the selected terms. The optimal model minimizing score is then chosen from this second series.
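The forward pass described above can be sketched in base R as follows. This is a simplified illustration, not the package's algorithm: it uses unpenalized glm fits in place of plr, considers main effects only, omits the backward deletion, and assumes that the term added at each step is the one giving the lowest score. The function forward_select and its arguments are hypothetical names.

```r
# Forward selection sketch: add up to max_terms columns one at a time,
# then return the model on the path with the smallest score,
# where score = deviance + cp * (number of coefficients).
forward_select <- function(x, y, cp = log(length(y)), max_terms = 5) {
  score_of <- function(cols) {
    d <- as.data.frame(x[, cols, drop = FALSE])
    d$y <- y
    fit <- glm(y ~ ., data = d, family = binomial)
    deviance(fit) + cp * length(coef(fit))
  }
  selected <- integer(0)
  remaining <- seq_len(ncol(x))
  path <- list()
  for (k in seq_len(min(max_terms, ncol(x)))) {
    # Score every candidate model that adds one remaining column.
    scores <- vapply(remaining, function(j) score_of(c(selected, j)), numeric(1))
    best <- which.min(scores)
    selected <- c(selected, remaining[best])
    remaining <- remaining[-best]
    path[[k]] <- list(terms = selected, score = scores[best])
  }
  path_scores <- vapply(path, function(s) s$score, numeric(1))
  path[[which.min(path_scores)]]
}

set.seed(1)
n <- 200
x <- matrix(rnorm(n * 5), nrow = n)
y <- rbinom(n, 1, plogis(2 * x[, 1]))   # only column 1 carries signal
chosen <- forward_select(x, y)
chosen$terms   # selected column indices, in order of entry
chosen$score   # score of the selected model
```

Keeping the whole path and picking its minimum, rather than stopping at the first increase in score, mirrors the description of selecting the score-minimizing model after all max.terms additions.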
A stepplr object is returned. The anova, predict, print, and summary functions can be applied to it.
fit | |
action | list that stores the selection order of the terms in the optimal model |
action.name | list of the names of the sequentially added terms - in the same order as in |
deviance | deviance of the fitted model |
df | residual degrees of freedom of the fitted model |
score | deviance + cp*df, where df is the model degrees of freedom |
group | vector of the counts for the dummy variables, to be used in |
y | response variable used |
weight | weights used |
fix.subset | fix.subset used |
level | level used |
lambda | lambda used |
cp | complexity parameter used when computing the score |
type | type used |
xnames | column names of |
Mee Young Park and Trevor Hastie
Mee Young Park and Trevor Hastie (2008) Penalized Logistic Regression for Detecting Gene Interactions
cv.step.plr, plr, predict.stepplr
    n <- 100
    p <- 3
    z <- matrix(sample(seq(3), n * p, replace=TRUE), nrow=n)
    x <- data.frame(x1=factor(z[, 1]), x2=factor(z[, 2]), x3=factor(z[, 3]))
    y <- sample(c(0, 1), n, replace=TRUE)
    fit <- step.plr(x, y)
    # 'level' is automatically generated. Check 'fit$level'.

    p <- 5
    x <- matrix(sample(seq(3), n * p, replace=TRUE), nrow=n)
    x <- cbind(rnorm(n), x)
    y <- sample(c(0, 1), n, replace=TRUE)
    level <- vector("list", length=6)
    for (i in 2:6) level[[i]] <- seq(3)
    fit1 <- step.plr(x, y, level=level, cp="aic")
    fit2 <- step.plr(x, y, level=level, cp=4)
    fit3 <- step.plr(x, y, level=level, type="forward")
    fit4 <- step.plr(x, y, level=level, max.terms=10)
    # This is an example in which 'level' was input manually.
    # level[[1]] should be either 'NULL' or 'NA' since the first factor is continuous.