Title: | Group Lasso Penalized Learning Using a Unified BMD Algorithm |
---|---|
Description: | A unified algorithm, blockwise-majorization-descent (BMD), for efficiently computing the solution paths of the group-lasso penalized least squares, logistic regression, Huberized SVM and squared SVM. The package is an implementation of Yang, Y. and Zou, H. (2015) DOI: <doi:10.1007/s11222-014-9498-5>. |
Authors: | Yi Yang [aut, cre] (http://www.math.mcgill.ca/yyang/), Hui Zou [aut] (http://users.stat.umn.edu/~zouxx019/), Sahir Bhatnagar [aut] (http://sahirbhatnagar.com/) |
Maintainer: | Yi Yang <[email protected]> |
License: | GPL-2 |
Version: | 1.5.1 |
Built: | 2024-12-18 06:41:59 UTC |
Source: | CRAN |
Gene expression data (20 genes for 120 samples) from the microarray experiments of mammalian eye tissue samples of Scheetz et al. (2006).
bardet
An object of class list of length 2.
This data set contains 120 samples with 100 predictors (expanded from 20 genes using 5 basis B-splines, as described in Yang, Y. and Zou, H. (2015)).
A list with the following elements:
x | a [120 x 100] matrix (expanded from a [120 x 20] matrix) giving the expression levels of 20 filtered genes for the 120 samples. Each row corresponds to a subject, each 5 consecutive columns to a grouped gene. |
y | a numeric vector of length 120 giving expression level of gene TRIM32, which causes Bardet-Biedl syndrome. |
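The column layout described above (5 consecutive columns per gene) corresponds to a group index like the one used in the package examples. The following is a minimal base-R sketch of that mapping; the interpretation of `bs` as the vector of group sizes is an assumption based on the `pf = sqrt(bs)` default shown in the `gglasso` usage.

```r
# Group index for 20 genes, each expanded into 5 consecutive columns
group <- rep(1:20, each = 5)

# Columns that belong to the third gene
cols_gene3 <- which(group == 3)
print(cols_gene3)

# Group sizes (assumed to be what `bs` refers to in the gglasso() usage)
bs <- as.integer(table(group))

# Penalty factors under that assumption: sqrt of each group size
pf <- sqrt(bs)
```

With equal group sizes, as here, every group receives the same penalty factor.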
Scheetz, T., Kim, K., Swiderski, R., Philp, A., Braun, T.,
Knudtson, K., Dorrance, A., DiBona, G., Huang, J., Casavant, T. et al.
(2006), “Regulation of gene expression in the mammalian eye and its
relevance to eye disease”, Proceedings of the National Academy of
Sciences 103(39), 14429-14434.
Huang, J., S. Ma, and C.-H. Zhang (2008). “Adaptive Lasso for sparse
high-dimensional regression models”. Statistica Sinica 18,
1603-1618.
Yang, Y. and Zou, H. (2015), “A Fast Unified Algorithm for Computing
Group-Lasso Penalized Learning Problems,” Statistics and Computing.
25(6), 1129-1141.
BugReport: https://github.com/emeryyi/gglasso
# load gglasso library
library(gglasso)

# load data set
data(bardet)

# how many samples and how many predictors?
dim(bardet$x)

# response y
bardet$y
This function gets coefficients or makes coefficient predictions from a cross-validated gglasso model, using the stored "gglasso.fit" object and the optimal value chosen for lambda.
## S3 method for class 'cv.gglasso'
coef(object, s = c("lambda.1se", "lambda.min"), ...)
object |
fitted cv.gglasso object. |
s |
value(s) of the penalty parameter |
... |
not used. Other arguments to predict. |
This function makes it easier to use the results of cross-validation to get coefficients or make coefficient predictions.
The coefficients at the requested values for lambda.
Yi Yang and Hui Zou
Maintainer: Yi Yang <[email protected]>
Yang, Y. and Zou, H. (2015), “A Fast Unified Algorithm for
Computing Group-Lasso Penalized Learning Problems,” Statistics and
Computing. 25(6), 1129-1141.
BugReport:
https://github.com/emeryyi/gglasso
Friedman, J., Hastie, T., and Tibshirani, R. (2010), "Regularization paths
for generalized linear models via coordinate descent," Journal of
Statistical Software, 33, 1.
http://www.jstatsoft.org/v33/i01/
cv.gglasso and predict.cv.gglasso methods.
# load gglasso library
library(gglasso)

# load data set
data(colon)

# define group index
group <- rep(1:20, each = 5)

# 5-fold cross validation using group lasso
# penalized logistic regression
cv <- cv.gglasso(x = colon$x, y = colon$y, group = group, loss = "logit",
                 pred.loss = "misclass", lambda.factor = 0.05, nfolds = 5)

# the coefficients at lambda = lambda.1se
pre <- coef(cv$gglasso.fit, s = cv$lambda.1se)
Computes the coefficients at the requested values for lambda from a fitted gglasso object.
## S3 method for class 'gglasso'
coef(object, s = NULL, ...)
object |
fitted gglasso object. |
s |
value(s) of the penalty parameter |
... |
not used. Other arguments to predict. |
s is the vector of lambda values at which coefficients are requested. If s is not in the lambda sequence used for fitting the model, the coef function uses linear interpolation. The new values are interpolated using a fraction of the coefficients from both the left and right lambda indices.
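The interpolation described above can be sketched in base R as follows. This is an illustration of the idea only, not the package's internal routine; the function name `interp_coef` and the toy path are hypothetical.

```r
# Hypothetical fitted path: 3 coefficients at 4 decreasing lambda values
lambda <- c(1.0, 0.5, 0.25, 0.125)
beta <- cbind(c(0, 0, 0), c(0.2, 0, 0), c(0.4, 0.1, 0), c(0.5, 0.3, 0.2))

# Linear interpolation of the coefficient vector at a new penalty value s
interp_coef <- function(beta, lambda, s) {
  stopifnot(s <= max(lambda), s >= min(lambda))
  left <- max(which(lambda >= s))   # nearest fitted lambda at or above s
  right <- min(which(lambda <= s))  # nearest fitted lambda at or below s
  if (left == right) {
    return(beta[, left])            # s is on the fitted grid
  }
  # Fraction of the weight given to the left (larger) lambda
  frac <- (s - lambda[right]) / (lambda[left] - lambda[right])
  frac * beta[, left] + (1 - frac) * beta[, right]
}

b_mid <- interp_coef(beta, lambda, s = 0.375)
b_grid <- interp_coef(beta, lambda, s = 0.5)
```

Here `s = 0.375` lies halfway between the fitted values 0.5 and 0.25, so the result is the average of the two neighbouring coefficient columns.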
The coefficients at the requested values for lambda.
Yi Yang and Hui Zou
Maintainer: Yi Yang <[email protected]>
Yang, Y. and Zou, H. (2015), “A Fast Unified Algorithm for
Computing Group-Lasso Penalized Learning Problems,” Statistics and
Computing. 25(6), 1129-1141.
BugReport:
https://github.com/emeryyi/gglasso
predict.gglasso method
# load gglasso library
library(gglasso)

# load data set
data(colon)

# define group index
group <- rep(1:20, each = 5)

# fit group lasso
m1 <- gglasso(x = colon$x, y = colon$y, group = group, loss = "logit")

# the coefficients at lambda = 0.01 and 0.02
coef(m1, s = c(0.01, 0.02))
Gene expression data (20 genes for 62 samples) from the microarray experiments of colon tissue samples of Alon et al. (1999).
colon
An object of class list of length 2.
This data set contains 62 samples with 100 predictors (expanded from 20 genes using 5 basis B-splines, as described in Yang, Y. and Zou, H. (2015)): 40 tumor tissues, coded 1, and 22 normal tissues, coded -1.
A list with the following elements:
x | a [62 x 100] matrix (expanded from a [62 x 20] matrix) giving the expression levels of 20 genes for the 62 colon tissue samples. Each row corresponds to a patient, each 5 consecutive columns to a grouped gene. |
y | a numeric vector of length 62 giving the type of tissue sample (tumor or normal). |
The data are described in Alon et al. (1999) and can be freely downloaded from http://microarray.princeton.edu/oncology/affydata/index.html.
Alon, U. and Barkai, N. and Notterman, D.A. and Gish, K. and
Ybarra, S. and Mack, D. and Levine, A.J. (1999). “Broad patterns of gene
expression revealed by clustering analysis of tumor and normal colon tissues
probed by oligonucleotide arrays”, Proc. Natl. Acad. Sci. USA,
96(12), 6745–6750.
Yang, Y. and Zou, H. (2015), “A Fast Unified Algorithm for Computing
Group-Lasso Penalized Learning Problems,” Statistics and Computing.
25(6), 1129-1141.
BugReport: https://github.com/emeryyi/gglasso
# load gglasso library
library(gglasso)

# load data set
data(colon)

# how many samples and how many predictors?
dim(colon$x)

# how many samples of class -1 and 1 respectively?
sum(colon$y == -1)
sum(colon$y == 1)
Does k-fold cross-validation for gglasso, produces a plot, and returns a value for lambda. This function is adapted from the cv function in the glmnet package.
cv.gglasso(
  x,
  y,
  group,
  lambda = NULL,
  pred.loss = c("misclass", "loss", "L1", "L2"),
  nfolds = 5,
  foldid,
  delta,
  ...
)
x |
matrix of predictors, of dimension |
y |
response variable. This argument should be quantitative for regression (least squares), and a two-level factor for classification (logistic model, huberized SVM, squared SVM). |
group |
a vector of consecutive integers describing the grouping of the coefficients (see example below). |
lambda |
optional user-supplied lambda sequence; default is
|
pred.loss |
loss to use for cross-validation error. Valid options are:
Default is |
nfolds |
number of folds - default is 5. Although |
foldid |
an optional vector of values between 1 and |
delta |
parameter |
... |
other arguments that can be passed to gglasso. |
The function runs gglasso nfolds+1 times: the first to get the lambda sequence, and the remainder to compute the fit with each of the folds omitted. The average error and standard deviation over the folds are computed.
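The fold loop described above can be sketched in base R. This is a generic k-fold illustration under stated assumptions, not the package's code: `lm()` stands in for the penalized fit, squared error stands in for the chosen `pred.loss`, and the standard error formula (standard deviation across folds divided by the square root of the number of folds) is an assumption.

```r
set.seed(1)
n <- 30
nfolds <- 5
x <- rnorm(n)
y <- 2 * x + rnorm(n)

# Random fold assignment with values between 1 and nfolds
foldid <- sample(rep(seq_len(nfolds), length.out = n))

# Held-out squared-error loss for each fold, with lm() as a stand-in model
fold_err <- vapply(seq_len(nfolds), function(k) {
  train <- foldid != k
  fit <- lm(y ~ x, data = data.frame(x = x[train], y = y[train]))
  pred <- predict(fit, newdata = data.frame(x = x[!train]))
  mean((y[!train] - pred)^2)
}, numeric(1))

cvm <- mean(fold_err)                # mean cross-validated error
cvsd <- sd(fold_err) / sqrt(nfolds)  # assumed standard error of that mean
```

In the actual function this computation would be repeated for every value in the lambda sequence, giving vectors rather than single numbers.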
An object of class cv.gglasso is returned, which is a list with the ingredients of the cross-validation fit.
lambda |
the
values of |
cvm |
the mean
cross-validated error - a vector of length |
cvsd |
estimate of standard error of |
cvupper |
upper
curve = |
cvlower |
lower curve = |
name |
a text string indicating type of measure (for plotting purposes). |
gglasso.fit |
a fitted |
lambda.min |
The optimal value of |
lambda.1se |
The largest
value of |
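As an illustration of how `lambda.min` and `lambda.1se` relate to `cvm` and `cvsd`, here is a minimal base-R sketch. It assumes the glmnet convention, in which `lambda.min` minimizes the mean cross-validated error and `lambda.1se` is the largest lambda whose error is within one standard error of that minimum; the numbers are made up.

```r
# Hypothetical cross-validation results along a decreasing lambda sequence
lambda <- c(1.0, 0.5, 0.25, 0.125, 0.0625)
cvm <- c(0.40, 0.30, 0.22, 0.20, 0.26)
cvsd <- rep(0.03, 5)

# Upper and lower curves, as they would be drawn around cvm
cvupper <- cvm + cvsd
cvlower <- cvm - cvsd

# Lambda with the smallest mean cross-validated error
i_min <- which.min(cvm)
lambda_min <- lambda[i_min]

# Largest lambda whose error is within one standard error of the minimum
threshold <- cvm[i_min] + cvsd[i_min]
lambda_1se <- max(lambda[cvm <= threshold])
```

Choosing `lambda_1se` trades a small increase in estimated error for a sparser model.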
Yi Yang and Hui Zou
Maintainer: Yi Yang <[email protected]>
Yang, Y. and Zou, H. (2015), “A Fast Unified Algorithm for
Computing Group-Lasso Penalized Learning Problems,” Statistics and
Computing. 25(6), 1129-1141.
BugReport:
https://github.com/emeryyi/gglasso
gglasso, plot.cv.gglasso, predict.cv.gglasso, and coef.cv.gglasso methods.
# load gglasso library
library(gglasso)

# load data set
data(bardet)

# define group index
group <- rep(1:20, each = 5)

# 5-fold cross validation using group lasso
# penalized least squares
cv <- cv.gglasso(x = bardet$x, y = bardet$y, group = group, loss = "ls",
                 pred.loss = "L2", lambda.factor = 0.05, nfolds = 5)
Fits regularization paths for group-lasso penalized learning problems at a sequence of regularization parameters lambda.
gglasso(
  x,
  y,
  group = NULL,
  loss = c("ls", "logit", "sqsvm", "hsvm", "wls"),
  nlambda = 100,
  lambda.factor = ifelse(nobs < nvars, 0.05, 0.001),
  lambda = NULL,
  pf = sqrt(bs),
  weight = NULL,
  dfmax = as.integer(max(group)) + 1,
  pmax = min(dfmax * 1.2, as.integer(max(group))),
  eps = 1e-08,
  maxit = 3e+08,
  delta,
  intercept = TRUE
)
x |
matrix of predictors, of dimension |
y |
response variable. This argument should be quantitative for regression (least squares), and a two-level factor for classification (logistic model, huberized SVM, squared SVM). |
group |
a vector of consecutive integers describing the grouping of the coefficients (see example below). |
loss |
a character string specifying the loss function to use, valid options are:
Default is |
nlambda |
the number of |
lambda.factor |
the factor for getting the minimal lambda in
|
lambda |
a user supplied |
pf |
penalty factor, a vector of length bn (bn is the total number of
groups). Separate penalty weights can be applied to each group of
|
weight |
a |
dfmax |
limit the maximum number of groups in the model. Useful for very
large |
pmax |
limit the maximum number of groups ever to be nonzero. For
example once a group enters the model, no matter how many times it exits or
re-enters model through the path, it will be counted only once. Default is
|
eps |
convergence termination tolerance. Default value is 1e-08. |
maxit |
maximum number of outer-loop iterations allowed at fixed lambda
value. Default is 3e8. If models do not converge, consider increasing
|
delta |
the parameter |
intercept |
Whether to include intercept in the model. Default is TRUE. |
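To make the roles of `nlambda` and `lambda.factor` concrete, the sketch below builds a decreasing sequence from a largest value down to `lambda.factor` times that value. The log-scale spacing follows the glmnet convention and is an assumption here, and `lambda_max` is a hypothetical number (in practice it would be derived from the data).

```r
nlambda <- 100
lambda_factor <- 0.05  # the documented default when nobs < nvars
lambda_max <- 2.0      # hypothetical largest lambda

# Decreasing sequence, equally spaced on the log scale (assumed spacing)
lambda <- exp(seq(log(lambda_max),
                  log(lambda_factor * lambda_max),
                  length.out = nlambda))

range(lambda)
```

Increasing `lambda.factor` or decreasing `nlambda` shortens the path toward small penalties, which is where fits tend to be slowest.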
Note that the objective function for "ls" least squares is
RSS/(2*n) + lambda * penalty;
for "hsvm" Huberized squared hinge loss, "sqsvm" squared hinge loss and "logit" logistic regression, the objective function is
-loglik/n + lambda * penalty.
Users can also tweak the penalty by choosing a different penalty factor.
For computing speed, if models are not converging or are running slowly, consider increasing eps, decreasing nlambda, or increasing lambda.factor before increasing maxit.
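A minimal base-R sketch of a group-lasso penalized least squares objective may help fix ideas. It assumes the objective has the form RSS/(2n) + lambda * penalty, where the penalty is the sum over groups of the penalty factor times the Euclidean norm of that group's coefficients, and that the default penalty factor is the square root of the group size. The function names are hypothetical and this is not the package's implementation.

```r
# Group-lasso penalty: sum over groups of pf_k * ||beta_k||_2
# (pf defaults to sqrt(group size), an assumption based on pf = sqrt(bs))
group_lasso_penalty <- function(beta, group,
                                pf = sqrt(as.numeric(table(group)))) {
  norms <- tapply(beta, group, function(b) sqrt(sum(b^2)))
  sum(pf * as.numeric(norms))
}

# Assumed least squares objective: RSS / (2 * n) + lambda * penalty
ls_objective <- function(x, y, b0, beta, group, lambda) {
  rss <- sum((y - b0 - x %*% beta)^2)
  rss / (2 * nrow(x)) + lambda * group_lasso_penalty(beta, group)
}

# Toy check: two groups of size 2, perfect fit, so only the penalty remains
x <- diag(4)
beta <- c(3, 4, 0, 0)
y <- drop(x %*% beta)
group <- c(1, 1, 2, 2)
obj <- ls_objective(x, y, b0 = 0, beta = beta, group = group, lambda = 0.1)
```

Because the second group is entirely zero, it contributes nothing to the penalty, which is what allows whole groups to drop out of the model.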
An object with S3 class gglasso.
call |
the call that produced this object |
b0 |
intercept sequence of length
|
beta |
a |
df |
the number of nonzero groups for each value of
|
dim |
dimension of coefficient matrix (ices) |
lambda |
the actual sequence of |
npasses |
total number of iterations (the most inner loop) summed over all lambda values |
jerr |
error flag, for warnings and errors, 0 if no error. |
group |
a vector of consecutive integers describing the grouping of the coefficients. |
Yi Yang and Hui Zou
Maintainer: Yi Yang <[email protected]>
Yang, Y. and Zou, H. (2015), “A Fast Unified Algorithm for
Computing Group-Lasso Penalized Learning Problems,” Statistics and
Computing. 25(6), 1129-1141.
BugReport:
https://github.com/emeryyi/gglasso
plot.gglasso
# load gglasso library
library(gglasso)

# load bardet data set
data(bardet)

# define group index
group1 <- rep(1:20, each = 5)

# fit group lasso penalized least squares
m1 <- gglasso(x = bardet$x, y = bardet$y, group = group1, loss = "ls")

# load colon data set
data(colon)

# define group index
group2 <- rep(1:20, each = 5)

# fit group lasso penalized logistic regression
m2 <- gglasso(x = colon$x, y = colon$y, group = group2, loss = "logit")
Plots the cross-validation curve, and upper and lower standard deviation curves, as a function of the lambda values used. This function is adapted from the plot.cv function in the glmnet package.
## S3 method for class 'cv.gglasso'
plot(x, sign.lambda = 1, ...)
x |
fitted cv.gglasso object. |
sign.lambda |
either plot against |
... |
other graphical parameters to plot |
A plot is produced.
Yi Yang and Hui Zou
Maintainer: Yi Yang <[email protected]>
Yang, Y. and Zou, H. (2015), “A Fast Unified Algorithm for
Computing Group-Lasso Penalized Learning Problems,” Statistics and
Computing. 25(6), 1129-1141.
BugReport:
https://github.com/emeryyi/gglasso
Friedman, J., Hastie, T., and Tibshirani, R. (2010), “Regularization paths
for generalized linear models via coordinate descent,” Journal of
Statistical Software, 33, 1.
http://www.jstatsoft.org/v33/i01/
# load gglasso library
library(gglasso)

# load data set
data(colon)

# define group index
group <- rep(1:20, each = 5)

# 5-fold cross validation using group lasso
# penalized logistic regression
cv <- cv.gglasso(x = colon$x, y = colon$y, group = group, loss = "logit",
                 pred.loss = "misclass", lambda.factor = 0.05, nfolds = 5)

# make a CV plot
plot(cv)
Produces a coefficient profile plot of the coefficient paths for a fitted gglasso object.
## S3 method for class 'gglasso'
plot(x, group = FALSE, log.l = TRUE, ...)
x |
fitted gglasso object. |
group |
what is on the Y-axis. Plot the norm of each group if
|
log.l |
what is on the X-axis. Plot against the log-lambda sequence if
|
... |
other graphical parameters to plot |
A coefficient profile plot is produced.
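When the norm of each group is plotted instead of individual coefficients, the quantity on the Y-axis can be computed as in the following base-R sketch. The coefficient matrix is a made-up example with one column per lambda value, and the use of the Euclidean norm is an assumption consistent with the group-lasso penalty.

```r
# Hypothetical coefficient path: 4 coefficients (rows) at 3 lambda values (columns)
beta <- matrix(c(0, 0, 0, 0,
                 3, 4, 0, 0,
                 6, 8, 1, 0), nrow = 4)
group <- c(1, 1, 2, 2)

# Euclidean norm of each group's coefficients, for every lambda
group_norms <- apply(beta, 2, function(b) {
  tapply(b, group, function(v) sqrt(sum(v^2)))
})

# Rows are groups, columns are lambda values
print(group_norms)
```

A group whose norm is zero at a given lambda is excluded from the model at that point on the path.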
Yi Yang and Hui Zou
Maintainer: Yi Yang <[email protected]>
Yang, Y. and Zou, H. (2015), “A Fast Unified Algorithm for
Computing Group-Lasso Penalized Learning Problems,” Statistics and
Computing. 25(6), 1129-1141.
BugReport:
https://github.com/emeryyi/gglasso
# load gglasso library
library(gglasso)

# load data set
data(bardet)

# define group index
group <- rep(1:20, each = 5)

# fit group lasso
m1 <- gglasso(x = bardet$x, y = bardet$y, group = group, loss = "ls")

# make plots
par(mfrow = c(1, 3))
plot(m1)                 # plots the coefficients against the log-lambda sequence
plot(m1, group = TRUE)   # plots group norm against the log-lambda sequence
plot(m1, log.l = FALSE)  # plots against the lambda sequence
This function makes predictions from a cross-validated gglasso model, using the stored "gglasso.fit" object and the optimal value chosen for lambda.
## S3 method for class 'cv.gglasso'
predict(object, newx, s = c("lambda.1se", "lambda.min"), ...)
object |
fitted cv.gglasso object. |
newx |
matrix of new values for |
s |
value(s) of the penalty parameter |
... |
not used. Other arguments to predict. |
This function makes it easier to use the results of cross-validation to make a prediction.
The returned object depends on the ... argument, which is passed on to the predict method for gglasso objects.
Yi Yang and Hui Zou
Maintainer: Yi Yang <[email protected]>
Yang, Y. and Zou, H. (2015), “A Fast Unified Algorithm for
Computing Group-Lasso Penalized Learning Problems,” Statistics and
Computing. 25(6), 1129-1141.
BugReport:
https://github.com/emeryyi/gglasso
cv.gglasso and coef.cv.gglasso methods.
# load gglasso library
library(gglasso)

# load data set
data(colon)

# define group index
group <- rep(1:20, each = 5)

# 5-fold cross validation using group lasso
# penalized logistic regression
cv <- cv.gglasso(x = colon$x, y = colon$y, group = group, loss = "logit",
                 pred.loss = "misclass", lambda.factor = 0.05, nfolds = 5)

# predicted class labels at lambda = lambda.min for the first 10 rows of x
pre <- predict(cv$gglasso.fit, newx = colon$x[1:10, ],
               s = cv$lambda.min, type = "class")
Similar to other predict methods, this function predicts fitted values and class labels from a fitted gglasso object.
## S3 method for class 'gglasso'
predict(object, newx, s = NULL, type = c("class", "link"), ...)
object |
fitted gglasso object. |
newx |
matrix of new values for |
s |
value(s) of the penalty parameter |
type |
type of prediction required:
|
... |
Not used. Other arguments to predict. |
s is the vector of lambda values at which predictions are requested. If s is not in the lambda sequence used for fitting the model, the predict function uses linear interpolation to make predictions. The new values are interpolated using a fraction of the predicted values from both the left and right lambda indices.
The object returned depends on type.
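The difference between the two prediction types can be illustrated with a small base-R sketch. It assumes that `"link"` corresponds to the linear predictor (intercept plus `newx` times the coefficients) and that `"class"` thresholds the linear predictor at zero to give labels coded -1 and 1, as in the colon data. The coefficients are made up and this is not the package's code.

```r
# Hypothetical intercept and coefficients at a single lambda value
b0 <- -0.5
beta <- c(1, -2, 0)

# New observations, one per row (kept as a matrix even for a single row)
newx <- rbind(c(1, 0, 3),
              c(0, 1, 3))

# "link": the linear predictor for each new observation
link <- drop(b0 + newx %*% beta)

# "class": assumed thresholding of the linear predictor at zero
class_label <- ifelse(link > 0, 1, -1)
```

When selecting a single row from a matrix for `newx`, `x[i, , drop = FALSE]` keeps the matrix shape.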
Yi Yang and Hui Zou
Maintainer: Yi Yang <[email protected]>
Yang, Y. and Zou, H. (2015), “A Fast Unified Algorithm for
Computing Group-Lasso Penalized Learning Problems,” Statistics and
Computing. 25(6), 1129-1141.
BugReport:
https://github.com/emeryyi/gglasso
coef method
# load gglasso library
library(gglasso)

# load data set
data(colon)

# define group index
group <- rep(1:20, each = 5)

# fit group lasso
m1 <- gglasso(x = colon$x, y = colon$y, group = group, loss = "logit")

# predicted class label at x[10,]
print(predict(m1, type = "class", newx = colon$x[10, ]))

# predicted linear predictors at x[1:5,]
print(predict(m1, type = "link", newx = colon$x[1:5, ]))
Print the nonzero group counts at each lambda along the gglasso path.
## S3 method for class 'gglasso'
print(x, digits = max(3, getOption("digits") - 3), ...)
x |
fitted gglasso object. |
digits |
significant digits in printout |
... |
additional print arguments |
Prints the nonzero group counts at each lambda step in the gglasso object. The result is a two-column matrix with columns Df and Lambda. The Df column is the number of groups that have nonzero within-group coefficients; the Lambda column is the corresponding lambda.
A two-column matrix: the first column is the number of nonzero groups and the second column is Lambda.
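The two-column summary described above can be reproduced from a coefficient matrix with a few lines of base R. The sketch below uses a made-up path and counts a group as nonzero when any of its coefficients is nonzero; it illustrates the definition and is not the package's print method.

```r
# Hypothetical coefficient path: 4 coefficients (rows) at 3 lambda values (columns)
beta <- matrix(c(0, 0, 0, 0,
                 3, 4, 0, 0,
                 6, 8, 1, 0), nrow = 4)
group <- c(1, 1, 2, 2)
lambda <- c(1.0, 0.5, 0.25)

# Number of groups with at least one nonzero coefficient, per lambda
df <- apply(beta, 2, function(b) {
  sum(tapply(b, group, function(v) any(v != 0)))
})

# Two-column matrix with columns Df and Lambda
out <- cbind(Df = df, Lambda = signif(lambda, 3))
print(out)
```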
Yi Yang and Hui Zou
Maintainer: Yi Yang <[email protected]>
Yang, Y. and Zou, H. (2015), “A Fast Unified Algorithm for
Computing Group-Lasso Penalized Learning Problems,” Statistics and
Computing. 25(6), 1129-1141.
BugReport:
https://github.com/emeryyi/gglasso
# load gglasso library
library(gglasso)

# load data set
data(colon)

# define group index
group <- rep(1:20, each = 5)

# fit group lasso
m1 <- gglasso(x = colon$x, y = colon$y, group = group, loss = "logit")

# print out results
print(m1)