Title: | Fitting Exact Conditional Logistic Regression with Lasso and Elastic Net Penalties |
---|---|
Description: | Tools for the fitting and cross validation of exact conditional logistic regression models with lasso and elastic net penalties. Uses cyclic coordinate descent and warm starts to compute the entire path efficiently. |
Authors: | Stephen Reid and Robert Tibshirani |
Maintainer: | Stephen Reid <[email protected]> |
License: | GPL-2 |
Version: | 1.5 |
Built: | 2024-10-29 06:23:10 UTC |
Source: | CRAN |
Package: | clogitL1 |
Type: | Package |
Version: | 1.4 |
Date: | 2013-05-06 |
License: | GPL-2 |
Very simple to use. The main fitting function clogitL1 accepts x, y data and a strata vector indicating stratum membership. It fits the exact conditional logistic regression model at a grid of regularisation parameters.
Only 7 functions:
clogitL1
cv.clogitL1
plot.clogitL1
plot.cv.clogitL1
print.clogitL1
summary.clogitL1
summary.cv.clogitL1
http://www.jstatsoft.org/v58/i12/
Fit a sequence of conditional logistic regression models with lasso or elastic net penalties
clogitL1 (x, y, strata, numLambda=100, minLambdaRatio=0.000001, switch=0, alpha = 1)
x |
matrix with one row per observation. Each row contains the p-vector of regressor values for that observation. |
y |
vector of binary responses with 1 for cases and 0 for controls. |
strata |
vector with stratum membership of each observation. |
numLambda |
number of different values of the regularisation parameter |
minLambdaRatio |
ratio of smallest to largest value of regularisation parameter |
switch |
index (between 0 and |
alpha |
parameter controlling the trade-off between lasso and ridge penalties. At value 1, we have a pure lasso penalty; at 0, pure ridge. Intermediate values provide a mixture of the two. |
The sequence of models implied by numLambda and minLambdaRatio is fit by coordinate descent with warm starts and sequential strong rules. If alpha=1, we fit using a lasso penalty. Otherwise we fit with an elastic net penalty. Note that a pure ridge penalty is never obtained, because the function sets a floor for alpha at 0.000001. This improves the stability of the algorithm. A similar lower bound is set for minLambdaRatio. The sequence of models can be truncated at fewer than numLambda models if it is found that a very large proportion of training set deviance is explained by the model in question.
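The grid and the floor on alpha described above can be pictured with a small base R sketch. It is illustrative only and not the package's internal code: lambda_max is a made-up placeholder (the package derives the largest value from the data), the spacing is assumed to be purely logarithmic (ignoring switch), and the elastic net penalty is written in the glmnet-style parameterisation, which is an assumption here. The helper names floor_alpha and enet_penalty are hypothetical.

```r
# Illustrative only: lambda_max is a hypothetical value, not computed from data.
lambda_max     <- 2.5
numLambda      <- 100
minLambdaRatio <- 1e-6

# Log-spaced grid from lambda_max down to lambda_max * minLambdaRatio
# (assumes purely logarithmic spacing).
lambda_grid <- exp(seq(log(lambda_max),
                       log(lambda_max * minLambdaRatio),
                       length.out = numLambda))

# Floor on alpha as described in the text: a pure ridge penalty is never used.
floor_alpha <- function(alpha, lower = 1e-6) max(alpha, lower)

# Elastic net penalty, glmnet-style parameterisation (an assumption here).
enet_penalty <- function(beta, lambda, alpha) {
  alpha <- floor_alpha(alpha)
  lambda * (alpha * sum(abs(beta)) + 0.5 * (1 - alpha) * sum(beta^2))
}

# With alpha = 1 only the L1 term contributes (pure lasso).
enet_penalty(c(1, -2, 0), lambda = lambda_grid[1], alpha = 1)
```

With alpha = 0 the floor turns the request into alpha = 0.000001, so a tiny lasso component always remains.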
An object of type clogitL1 with the following fields:
beta |
( |
lambda |
vector of length |
nz_beta |
vector of length |
ss_beta |
vector of length |
dev_perc |
vector of length |
y_c |
reordered vector of responses. Grouped by stratum with cases coming first. |
X_c |
reordered matrix of predictors. See above. |
strata_c |
reordered stratum vector. See above. |
nVec |
vector of length the number of unique strata in |
mVec |
vector containing the number of cases in each stratum. |
alpha |
penalty trade off parameter. |
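The reordered fields can be mimicked in base R. The sketch below is not taken from the package source: it sorts toy data by stratum with cases first and counts observations and cases per stratum. Reading nVec as the number of observations in each stratum is an assumption, since its description is truncated above.

```r
# Toy data: 3 strata of unequal size, observations in arbitrary order.
y      <- c(0, 1, 0, 0, 1, 1, 0, 0)
strata <- c(2, 1, 1, 2, 2, 3, 3, 1)
X      <- matrix(seq_along(y), ncol = 1)  # one predictor, values 1..8

# Sort by stratum, then by descending y so cases (1) precede controls (0).
ord      <- order(strata, -y)
y_c      <- y[ord]
X_c      <- X[ord, , drop = FALSE]
strata_c <- strata[ord]

# Per-stratum counts: observations (assumed meaning of nVec) and cases (mVec).
nVec <- as.vector(table(strata_c))
mVec <- as.vector(tapply(y_c, strata_c, sum))
```

Because the same permutation ord is applied to y, X and strata, the rows of X_c stay aligned with y_c and strata_c.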
http://www.jstatsoft.org/v58/i12/
    set.seed(145)
    # data parameters
    K = 10 # number of strata
    n = 5 # number in strata
    m = 2 # cases per stratum
    p = 20 # predictors
    # generate data
    y = rep(c(rep(1, m), rep(0, n-m)), K)
    X = matrix (rnorm(K*n*p, 0, 1), ncol = p) # pure noise
    strata = sort(rep(1:K, n))
    par(mfrow = c(1,2))
    # fit the conditional logistic model
    clObj = clogitL1(y=y, x=X, strata)
    plot(clObj, logX=TRUE)
    # cross validation
    clcvObj = cv.clogitL1(clObj)
    plot(clcvObj)
Find the best of a sequence of conditional logistic regression models with lasso or elastic net penalties using cross validation
cv.clogitL1 (clObj, numFolds=10)
clObj |
an object of type clogitL1 |
numFolds |
the number of folds used in cross validation. Defaults to the minimum of 10 or the number of observations |
Performs numFolds-fold cross validation on an object of type clogitL1. Using the sequence of regularisation parameters generated by clObj, the function chooses strata to leave out randomly. For each fold in turn, the penalised conditional logistic regression model is fit to the non-left-out strata and its deviance compared to an out-of-sample deviance computed on the left-out strata. Fitting models to the non-left-out strata uses the same cyclic coordinate descent algorithm with warm starts and strong rules as clogitL1, only with a prespecified sequence of regularisation parameters.
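Leaving out whole strata, rather than individual observations, can be sketched in base R as follows. This is a hedged illustration rather than the package's own fold logic: the balanced random assignment and the variable names ustrata and strata_fold are assumptions made for the example.

```r
set.seed(1)
strata   <- sort(rep(1:10, 5))   # 10 strata, 5 observations each
numFolds <- 5

# Assign each stratum (not each observation) to a fold at random.
ustrata     <- unique(strata)
strata_fold <- sample(rep(seq_len(numFolds), length.out = length(ustrata)))

# Map the stratum-level assignment back to observations.
folds <- strata_fold[match(strata, ustrata)]

# Every stratum sits entirely within one fold.
table(folds)
```

Keeping each stratum intact matters because the conditional likelihood is defined within strata, so splitting a stratum across folds would change the quantity being evaluated.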
An object of type cv.clogitL1 with the following fields:
cv_dev |
matrix of size |
lambda |
vector of regularisation parameters. |
folds |
vector showing the folds membership of each observation. |
mean_cv |
vector containing mean CV deviances for each value of the regularisation parameter. |
se_cv |
vector containing an estimate of the standard error of the CV deviance at each value of the regularisation parameter. |
minCV_lambda |
value of the regularisation parameter at which we have minimum |
minCV1se_lambda |
value of the regularisation parameter corresponding to the 1-SE rule. Selects the simplest model whose estimated CV deviance is within 1 standard error of the minimum CV deviance. |
nz_beta |
number of nonzero parameter estimates at each value of the regularisation parameter. |
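How the minimum and 1-SE choices relate to the CV deviances can be seen in a small base R sketch. The numbers are invented for illustration, and the layout of cv_dev (rows for regularisation values, columns for folds) is an assumption because its description is truncated above.

```r
# Hypothetical CV deviances: rows = lambda values, columns = folds (assumed layout).
lambda <- c(1, 0.5, 0.25, 0.125, 0.0625)   # decreasing, as on a path
cv_dev <- rbind(c(10.0, 10.4, 9.6),
                c( 8.0,  8.6, 7.4),
                c( 7.0,  7.9, 6.1),
                c( 6.8,  7.4, 6.2),
                c( 7.5,  8.3, 6.7))

# Mean and standard error of the CV deviance at each lambda.
mean_cv <- rowMeans(cv_dev)
se_cv   <- apply(cv_dev, 1, sd) / sqrt(ncol(cv_dev))

# Minimum-deviance choice.
i_min        <- which.min(mean_cv)
minCV_lambda <- lambda[i_min]

# 1-SE rule: largest lambda (simplest model) whose mean CV deviance is
# within one standard error of the minimum.
within_1se      <- mean_cv <= mean_cv[i_min] + se_cv[i_min]
minCV1se_lambda <- max(lambda[within_1se])
```

In this toy example the minimum falls at lambda = 0.125, while the 1-SE rule moves one step up the path to lambda = 0.25.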
http://www.jstatsoft.org/v58/i12/
    set.seed(145)
    # data parameters
    K = 10 # number of strata
    n = 5 # number in strata
    m = 2 # cases per stratum
    p = 20 # predictors
    # generate data
    y = rep(c(rep(1, m), rep(0, n-m)), K)
    X = matrix (rnorm(K*n*p, 0, 1), ncol = p) # pure noise
    strata = sort(rep(1:K, n))
    par(mfrow = c(1,2))
    # fit the conditional logistic model
    clObj = clogitL1(y=y, x=X, strata)
    plot(clObj, logX=TRUE)
    # cross validation
    clcvObj = cv.clogitL1(clObj)
    plot(clcvObj)
Takes a clogitL1 object and plots the parameter profile associated with it.
    ## S3 method for class 'clogitL1'
    plot(x, logX=T, add.legend=F, add.labels=T, lty=1:ncol(x$beta), col=1:ncol(x$beta), ...)
x |
an object of type clogitL1 |
logX |
should the horizontal axis be on log scale? |
add.legend |
set to TRUE if legend should be printed in top right hand corner. Legend will contain names of variables in data.frame, if specified, otherwise will be numbered from 1 to p in order encountered in original input matrix x |
add.labels |
set to TRUE if labels are to be added to curves at leftmost side. If variable names are available, these are plotted, otherwise, curves are numbered from 1 to p in order encountered in original input matrix x |
lty |
usual 'lty' plotting parameter. |
col |
usual 'col' plotting parameter. |
... |
additional arguments to |
http://www.jstatsoft.org/v58/i12/
    set.seed(145)
    # data parameters
    K = 10 # number of strata
    n = 5 # number in strata
    m = 2 # cases per stratum
    p = 20 # predictors
    # generate data
    y = rep(c(rep(1, m), rep(0, n-m)), K)
    X = matrix (rnorm(K*n*p, 0, 1), ncol = p) # pure noise
    strata = sort(rep(1:K, n))
    par(mfrow = c(1,2))
    # fit the conditional logistic model
    clObj = clogitL1(y=y, x=X, strata)
    plot(clObj, logX=TRUE)
    # cross validation
    clcvObj = cv.clogitL1(clObj)
    plot(clcvObj)
Takes a cv.clogitL1 object and plots the CV deviance curve with standard error bands and minima.
    ## S3 method for class 'cv.clogitL1'
    plot(x, ...)
x |
an object of type cv.clogitL1 |
... |
additional arguments to |
http://www.jstatsoft.org/v58/i12/
    set.seed(145)
    # data parameters
    K = 10 # number of strata
    n = 5 # number in strata
    m = 2 # cases per stratum
    p = 20 # predictors
    # generate data
    y = rep(c(rep(1, m), rep(0, n-m)), K)
    X = matrix (rnorm(K*n*p, 0, 1), ncol = p) # pure noise
    strata = sort(rep(1:K, n))
    par(mfrow = c(1,2))
    # fit the conditional logistic model
    clObj = clogitL1(y=y, x=X, strata)
    plot(clObj, logX=TRUE)
    # cross validation
    clcvObj = cv.clogitL1(clObj)
    plot(clcvObj)
Takes a clogitL1 object and prints a summary of the sequence of models fitted.
    ## S3 method for class 'clogitL1'
    print(x, digits = 6, ...)
x |
an object of type clogitL1 |
digits |
the number of significant digits after the decimal to be printed |
... |
additional arguments to |
Prints a 3-column data frame with columns:
Df: number of non-zero parameters in model
DevPerc: percentage of null deviance explained by current model
Lambda: associated value of the regularisation parameter
http://www.jstatsoft.org/v58/i12/
    set.seed(145)
    # data parameters
    K = 10 # number of strata
    n = 5 # number in strata
    m = 2 # cases per stratum
    p = 20 # predictors
    # generate data
    y = rep(c(rep(1, m), rep(0, n-m)), K)
    X = matrix (rnorm(K*n*p, 0, 1), ncol = p) # pure noise
    strata = sort(rep(1:K, n))
    par(mfrow = c(1,2))
    # fit the conditional logistic model
    clObj = clogitL1(y=y, x=X, strata)
    clObj
Takes a clogitL1 object and produces a summary of the sequence of models fitted.
    ## S3 method for class 'clogitL1'
    summary(object, ...)
object |
an object of type clogitL1 |
... |
any additional arguments passed to |
Returns a list with elements Coefficients, which holds the matrix of estimated coefficients (each row holding the estimates for a given value of the regularisation parameter), and Lambda, which holds the vector of regularisation parameters at which fits were produced.
http://www.jstatsoft.org/v58/i12/
    set.seed(145)
    # data parameters
    K = 10 # number of strata
    n = 5 # number in strata
    m = 2 # cases per stratum
    p = 20 # predictors
    # generate data
    y = rep(c(rep(1, m), rep(0, n-m)), K)
    X = matrix (rnorm(K*n*p, 0, 1), ncol = p) # pure noise
    strata = sort(rep(1:K, n))
    par(mfrow = c(1,2))
    # fit the conditional logistic model
    clObj = clogitL1(y=y, x=X, strata)
    summary(clObj)
Provides summary of conditional logistic regression models after cross validation
    ## S3 method for class 'cv.clogitL1'
    summary(object, ...)
object |
an object of type cv.clogitL1 |
... |
additional arguments to |
Extracts pertinent information from the supplied cv.clogitL1 object. See below for details on output value.
A list with the following fields:
lambda_minCV |
value of regularisation parameter minimising CV deviance |
beta_minCV |
coefficient profile at the minimising value of the regularisation parameter. Whole dataset used to compute estimates. |
nz_beta_minCV |
number of non-zero coefficients in the CV deviance minimising coefficient profile. |
lambda_minCV1se |
value of regularisation parameter selected by the 1 standard error rule |
beta_minCV1se |
coefficient profile at the 1-standard-error-rule value of the regularisation parameter. Whole dataset used to compute estimates. |
nz_beta_minCV1se |
number of non-zero coefficients in the 1-standard-error-rule coefficient profile. |
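One common use of these fields is to read off which predictors were selected. The sketch below works on a hand-made list that only imitates the field names listed above. The values are invented, and cv_summary is a stand-in, not output produced by the package.

```r
# Hand-made stand-in for the list returned by summary() on a cv.clogitL1 object.
# Field names follow the table above; all values are invented.
cv_summary <- list(
  lambda_minCV     = 0.125,
  beta_minCV       = c(0, 0.8, 0, -0.3),
  nz_beta_minCV    = 2,
  lambda_minCV1se  = 0.25,
  beta_minCV1se    = c(0, 0.5, 0, 0),
  nz_beta_minCV1se = 1
)

# Indices of predictors with non-zero coefficients under each rule.
selected_min <- which(cv_summary$beta_minCV != 0)
selected_1se <- which(cv_summary$beta_minCV1se != 0)

# The counts should agree with the nz_beta_* fields.
length(selected_min) == cv_summary$nz_beta_minCV
length(selected_1se) == cv_summary$nz_beta_minCV1se
```

The 1-SE profile is typically sparser than the minimising profile, as in this toy example where it keeps a single predictor.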
http://www.jstatsoft.org/v58/i12/
    set.seed(145)
    # data parameters
    K = 10 # number of strata
    n = 5 # number in strata
    m = 2 # cases per stratum
    p = 20 # predictors
    # generate data
    y = rep(c(rep(1, m), rep(0, n-m)), K)
    X = matrix (rnorm(K*n*p, 0, 1), ncol = p) # pure noise
    strata = sort(rep(1:K, n))
    par(mfrow = c(1,2))
    # fit the conditional logistic model
    clObj = clogitL1(y=y, x=X, strata)
    plot(clObj, logX=TRUE)
    # cross validation
    clcvObj = cv.clogitL1(clObj)
    summary(clcvObj)