Title: | Integrative Lasso with Penalty Factors |
---|---|
Description: | The core of the package is cvr2.ipflasso(), an extension of glmnet to be used when the (large) set of available predictors is partitioned into several modalities which potentially differ with respect to their information content in terms of prediction. For example, in biomedical applications patient outcome such as survival time or response to therapy may have to be predicted based on, say, mRNA data, miRNA data, methylation data, CNV data, clinical data, etc. The clinical predictors are on average often much more important for outcome prediction than the mRNA data. The ipflasso method takes this problem into account by using different penalty parameters for predictors from different modalities. The ratio between the different penalty parameters can be chosen from a set of optional candidates by cross-validation or alternatively generated from the input data. |
Authors: | Anne-Laure Boulesteix, Mathias Fuchs, Gerhard Schulze |
Maintainer: | Anne-Laure Boulesteix <[email protected]> |
License: | GPL |
Version: | 1.1 |
Built: | 2024-11-08 06:19:47 UTC |
Source: | CRAN |
Runs cvr.ipflasso, applying different data-based penalty factors to predictors from different blocks.
cvr.adaptive.ipflasso(X, Y, family, type.measure, standardize = TRUE, alpha, type.step1, blocks, nfolds, ncv)
X |
a (nxp) matrix of predictors with observations in rows and predictors in columns. |
Y |
n-vector giving the value of the response (either continuous, numeric-binary 0/1, or |
family |
should be "gaussian" for continuous |
type.measure |
the accuracy/error measure computed in cross-validation. If not specified, type.measure is "class" (classification error) if |
standardize |
whether the predictors should be standardized or not. Default is TRUE. |
alpha |
the elastic net mixing parameter for step 1: |
type.step1 |
whether the models of step 1 should be run on the whole data set |
blocks |
a list of length M of the format |
nfolds |
the number of folds of the CV procedure. |
ncv |
the number of repetitions of the CV. Not to be confused with |
The penalty factors are the inverse arithmetic means of the absolute model coefficients per block, generated in a first step of the function. The user can choose to determine these coefficients by running a Lasso model (alpha=1) or a Ridge model (alpha=0), either on the whole data set (type.step1="comb") or separately for each block (type.step1="sep"). If type.step1 is omitted, it will be set to "sep" for Lasso and to "comb" for Ridge.
If a Lasso model in step 1 returns any zero coefficient mean, the corresponding block will be excluded from the input data set X and step 2 will be run with the remaining blocks. If all model coefficient means are zero, step 2 will not be performed.
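The derivation of the penalty factors described above can be sketched in base R. This is only an illustration under stated assumptions: the coefficient vector `coef.step1` is made up, and the variable names mirror the returned elements `means.step1` and `exc` without claiming to reproduce the package's internal code.

```r
# Hypothetical step 1 coefficients (intercept excluded), one per predictor.
coef.step1 <- c(0.8, -0.4, 0, 0.05, 0, -0.05, 0, 0, 0)
blocks <- list(block1 = 1:3, block2 = 4:6, block3 = 7:9)

# Arithmetic mean of the absolute coefficients within each block.
means.step1 <- sapply(blocks, function(idx) mean(abs(coef.step1[idx])))

# Blocks whose coefficient mean is zero are excluded before step 2.
exc <- which(means.step1 == 0)
keep <- setdiff(seq_along(blocks), exc)

# Penalty factors for the remaining blocks: inverse of the block means.
pf <- 1 / means.step1[keep]
print(pf)
```

A block with large step 1 coefficients therefore receives a small penalty factor, while a block with coefficients close to zero is penalized heavily in step 2.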
A list with the following arguments:
coeff |
the matrix of coefficients with predictors corresponding to rows and lambda values corresponding to columns. The first row contains the intercept of the models (for all families other than In the special case of separate step 1 Lasso models and all coefficient means equal to zero, the intercept is the average of the separate model intercepts per block. |
ind.bestlambda |
the index of the best lambda according to CV. |
lambda |
the lambda sequence. In the special case of separate step 1 Lasso models and all coefficient means equal to zero, it is the lambda sequence with the highest lambda value among the lambda sequences of all blocks. |
cvm |
the CV estimate of the measure specified by In the special case of separate step 1 Lasso models and all coefficient means equal to zero, cvm is the average of the separate model cvms per block. |
nzero |
the number of non-zero coefficients in the selected model. In the special case of separate step 1 Lasso models and all coefficient means equal to zero, nzero is the sum of the non-zero coefficients of the separate models per block. |
family |
see arguments. |
means.step1 |
the arithmetic means of the absolute model coefficients per block, returned by the first step of the function. |
exc |
the exclusion vector containing the indices of the block(s) to be excluded from |
Gerhard Schulze ([email protected])
Schulze, Gerhard (2017): Clinical Outcome Prediction Based on Multi-Omics Data: Extension of IPF-LASSO. Masterarbeit, Ludwig-Maximilians-Universitaet Muenchen (Department of Statistics: Technical Reports) https://doi.org/10.5282/ubm/epub.59092
# load ipflasso library
library(ipflasso)

# generate dummy data
X <- matrix(rnorm(50*200), 50, 200)
Y <- rbinom(50, 1, 0.5)

cvr.adaptive.ipflasso(X = X, Y = Y, family = "binomial", type.measure = "class",
                      standardize = FALSE, alpha = 1,
                      blocks = list(block1 = 1:50, block2 = 51:200),
                      nfolds = 5, ncv = 10)
Same as cv.glmnet, but with cross-validation repeated ncv times (i.e. for ncv different random partitions).
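The idea of repeating CV over several random partitions can be illustrated in base R. The sketch below is not the package's implementation: it uses an ordinary linear model from `lm` as a stand-in for the glmnet fit, and the helper `one.cv` is a hypothetical name introduced only for this example.

```r
set.seed(1)
n <- 50
nfolds <- 5
ncv <- 10
x <- rnorm(n)
y <- 2 * x + rnorm(n)

# One CV run: draw a random partition into nfolds folds and return the
# mean squared prediction error of a simple linear model.
one.cv <- function(x, y, nfolds) {
  foldid <- sample(rep(seq_len(nfolds), length.out = length(y)))
  sq.err <- numeric(length(y))
  for (k in seq_len(nfolds)) {
    test <- foldid == k
    fit <- lm(y ~ x, data = data.frame(x = x[!test], y = y[!test]))
    pred <- predict(fit, newdata = data.frame(x = x[test]))
    sq.err[test] <- (y[test] - pred)^2
  }
  mean(sq.err)
}

# Repeat the CV for ncv different random partitions and average.
cv.runs <- replicate(ncv, one.cv(x, y, nfolds))
cvm <- mean(cv.runs)
print(cvm)
```

Averaging over several partitions reduces the dependence of the CV estimate on one particular random split, which matters most for small samples.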
cvr.glmnet(X, Y, family, standardize=TRUE,alpha=1, nfolds, ncv, type.measure,...)
X |
a (nxp) matrix of predictors with observations in rows and predictors in columns |
Y |
n-vector giving the value of the response (either continuous, numeric-binary 0/1, or |
family |
should be "gaussian" for continuous |
standardize |
whether the predictors should be standardized or not. Default is TRUE. |
alpha |
the elastic net mixing parameter: |
nfolds |
the number of folds of CV procedure. |
ncv |
the number of repetitions of CV. Not to be confused with |
type.measure |
The accuracy/error measure computed in cross-validation. If not specified, type.measure is "class" (classification error) if |
... |
Other arguments to be passed to the function |
A list with the following arguments:
coeff |
the matrix of coefficients with predictors corresponding to rows and lambda values corresponding to columns. The first row contains the intercept of the model (for all families other than |
lambda |
the lambda sequence |
cvm |
the CV estimate of the measure specified by |
Anne-Laure Boulesteix (https://www.en.ibe.med.uni-muenchen.de/mitarbeiter/professoren/boulesteix/index.html)
Boulesteix AL, De Bin R, Jiang X, Fuchs M, 2017. IPF-lasso: integrative L1-penalized regression with penalty factors for prediction based on multi-omics data. Comput Math Methods Med 2017:7691937.
# load ipflasso library
library(ipflasso)

# generate dummy data
X <- matrix(rnorm(50*200), 50, 200)
Y <- rbinom(50, 1, 0.5)

cvr.glmnet(X = X, Y = Y, family = "binomial", standardize = FALSE,
           nfolds = 5, ncv = 10, type.measure = "class")
Runs cvr.glmnet giving different penalty factors to predictors from different blocks.
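To make the block-wise penalization concrete, the following base R sketch expands one penalty factor per block into one penalty factor per predictor, which is the form accepted by the `penalty.factor` argument of glmnet. That the package performs exactly this expansion internally is an assumption; the values of `blocks` and `pf` are taken from the example for this function.

```r
p <- 200
blocks <- list(block1 = 1:50, block2 = 51:200)
pf <- c(1, 2)

# One penalty factor per predictor: every predictor inherits the
# penalty factor of the block it belongs to.
penalty.factor <- numeric(p)
for (m in seq_along(blocks)) {
  penalty.factor[blocks[[m]]] <- pf[m]
}

print(table(penalty.factor))
```

With these values, the 150 predictors of the second block are penalized twice as strongly as the 50 predictors of the first block.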
cvr.ipflasso(X, Y, family, type.measure, standardize=TRUE, alpha=1, blocks, pf, nfolds, ncv)
X |
a (nxp) matrix of predictors with observations in rows and predictors in columns |
Y |
n-vector giving the value of the response (either continuous, numeric-binary 0/1, or |
family |
should be "gaussian" for continuous |
type.measure |
The accuracy/error measure computed in cross-validation. If not specified, type.measure is "class" (classification error) if |
standardize |
whether the predictors should be standardized or not. Default is TRUE. |
alpha |
the elastic net mixing parameter: |
blocks |
a list of length M of the format |
pf |
a vector of length equal to the number of blocks M. Each entry contains the penalty factor to be applied to the predictors of the corresponding block. Example: if |
nfolds |
the number of folds of CV procedure. |
ncv |
the number of repetitions of CV. Not to be confused with |
A list with the following arguments:
coeff |
the matrix of coefficients with predictors corresponding to rows and lambda values corresponding to columns. The first row contains the intercept of the model (for all families other than |
ind.bestlambda |
the index of the best lambda according to CV. |
lambda |
the lambda sequence. |
cvm |
the CV estimate of the measure specified by |
nzero |
the number of non-zero coefficients in the selected model. |
family |
See arguments. |
Anne-Laure Boulesteix (https://www.en.ibe.med.uni-muenchen.de/mitarbeiter/professoren/boulesteix/index.html)
Boulesteix AL, De Bin R, Jiang X, Fuchs M, 2017. IPF-lasso: integrative L1-penalized regression with penalty factors for prediction based on multi-omics data. Comput Math Methods Med 2017:7691937.
# load ipflasso library
library(ipflasso)

# generate dummy data
X <- matrix(rnorm(50*200), 50, 200)
Y <- rbinom(50, 1, 0.5)

cvr.ipflasso(X = X, Y = Y, family = "binomial", standardize = FALSE,
             blocks = list(block1 = 1:50, block2 = 51:200),
             pf = c(1, 2), nfolds = 5, ncv = 10, type.measure = "class")
Runs cvr.glmnet giving different penalty factors to predictors from different blocks and chooses the penalty factors by cross-validation from the list pflist of candidates.
cvr2.ipflasso(X, Y, family, type.measure, standardize=TRUE, alpha=1, blocks, pflist, nfolds, ncv, nzeromax = +Inf, plot=FALSE)
X |
a (nxp) matrix of predictors with observations in rows and predictors in columns |
Y |
n-vector giving the value of the response (either continuous, numeric-binary 0/1, or |
family |
should be "gaussian" for continuous |
type.measure |
The accuracy/error measure computed in cross-validation. If not specified, type.measure is "class" (classification error) if |
standardize |
whether the predictors should be standardized or not. Default is TRUE. |
alpha |
the elastic net mixing parameter: |
blocks |
a list of length M of the format |
pflist |
a list of candidate penalty factors (see the argument |
nfolds |
the number of folds of CV procedure. |
ncv |
the number of repetitions of CV. Not to be confused with |
nzeromax |
the maximal number of predictors allowed in the final model. Default is +Inf, i.e. the best model is selected based on CV without restriction. |
plot |
If |
A list with the following arguments:
coeff |
the matrix of coefficients obtained with the best combination of penalty factors, with covariates corresponding to rows and lambda values corresponding to columns. The first row contains the intercept of the model. |
ind.bestlambda |
the index of the best lambda as selected by CV for the best combination of penalty factors. |
bestlambda |
the best lambda as selected by CV for the best combination of penalty factors. |
ind.bestpf |
the index of the best penalty factor selected by CV from the list of candidates |
cvm |
the CV error for each candidate lambda value, averaged over the ncv runs of |
a |
a list of length |
family |
See arguments. |
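The selection of the best penalty factors and the best lambda, including the restriction imposed by nzeromax, might look roughly as follows. This is a sketch on made-up numbers, assuming an error measure for which lower values are better; `cv.results` and `best.per.pf` are hypothetical names, and the package's own selection code may differ in detail.

```r
# Hypothetical CV results for three candidate penalty-factor vectors:
# CV error and number of non-zero coefficients along a lambda sequence.
cv.results <- list(
  list(cvm = c(0.50, 0.42, 0.40, 0.45), nzero = c(0, 3, 8, 20)),
  list(cvm = c(0.50, 0.38, 0.36, 0.41), nzero = c(0, 4, 12, 25)),
  list(cvm = c(0.50, 0.44, 0.43, 0.47), nzero = c(0, 2, 6, 15))
)
nzeromax <- 10

# For each candidate, ignore models with more than nzeromax predictors
# and keep the lambda index with the smallest CV error.
best.per.pf <- lapply(cv.results, function(res) {
  cvm <- res$cvm
  cvm[res$nzero > nzeromax] <- Inf
  list(ind = which.min(cvm), cvm = min(cvm))
})

# The best candidate is the one whose best admissible model has the
# smallest CV error.
ind.bestpf <- which.min(sapply(best.per.pf, function(b) b$cvm))
ind.bestlambda <- best.per.pf[[ind.bestpf]]$ind
print(c(ind.bestpf = ind.bestpf, ind.bestlambda = ind.bestlambda))
```

Note that the second candidate wins here only through its second lambda value: its overall minimum (0.36) uses 12 predictors and is ruled out by nzeromax = 10.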
Anne-Laure Boulesteix (https://www.en.ibe.med.uni-muenchen.de/mitarbeiter/professoren/boulesteix/index.html)
Boulesteix AL, De Bin R, Jiang X, Fuchs M, 2017. IPF-lasso: integrative L1-penalized regression with penalty factors for prediction based on multi-omics data. Comput Math Methods Med 2017:7691937.
# load ipflasso library
library(ipflasso)

# generate dummy data
X <- matrix(rnorm(50*200), 50, 200)
Y <- rbinom(50, 1, 0.5)

cvr2.ipflasso(X = X, Y = Y, family = "binomial", type.measure = "class",
              standardize = FALSE,
              blocks = list(block1 = 1:50, block2 = 51:200),
              pflist = list(c(1, 1), c(1, 2), c(2, 1)),
              nfolds = 5, ncv = 10)
Derives predictions for new observations from a model fitted by the functions cvr.ipflasso or cvr2.ipflasso.
ipflasso.predict(object, Xtest)
object |
the output of either |
Xtest |
a ntest x p matrix containing the values of the predictors for the test data. It should have the same number of columns as the matrix |
A list with the following arguments:
linpredtest |
an ntest-vector giving the value of the linear predictor for the test observations |
classtest |
an ntest-vector with values 0 or 1 giving the predicted class for the test observations (for binary Y). |
probabilitiestest |
an ntest-vector giving the predicted probability of Y=1 for the test observations (for binary Y). |
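The three returned quantities are related in a simple way for a logistic model, which the following base R sketch illustrates. The coefficient vector `coeff.best` is made up, and the 0.5 probability cut-off for the predicted class is an assumption rather than documented behaviour.

```r
set.seed(1)
ntest <- 5
p <- 4
Xtest <- matrix(rnorm(ntest * p), ntest, p)

# Hypothetical coefficients at the best lambda: intercept first,
# followed by one coefficient per predictor.
coeff.best <- c(0.2, 1.0, 0, -0.5, 0)

# Linear predictor, predicted probability of Y = 1 (inverse logit),
# and predicted class (assumed cut-off: probability 0.5).
linpredtest <- as.vector(coeff.best[1] + Xtest %*% coeff.best[-1])
probabilitiestest <- plogis(linpredtest)
classtest <- as.numeric(probabilitiestest > 0.5)

print(cbind(linpredtest, probabilitiestest, classtest))
```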
Anne-Laure Boulesteix (https://www.en.ibe.med.uni-muenchen.de/mitarbeiter/professoren/boulesteix/index.html)
Boulesteix AL, De Bin R, Jiang X, Fuchs M, 2017. IPF-lasso: integrative L1-penalized regression with penalty factors for prediction based on multi-omics data. Comput Math Methods Med 2017:7691937.
# load ipflasso library
library(ipflasso)

# generate dummy data
X <- matrix(rnorm(50*200), 50, 200)
Xtest <- matrix(rnorm(20*200), 20, 200)
Y <- rbinom(50, 1, 0.5)

# fitting the IPF-lasso model
model1 <- cvr.ipflasso(X = X, Y = Y, family = "binomial", standardize = FALSE,
                       blocks = list(block1 = 1:50, block2 = 51:200),
                       pf = c(1, 2), nfolds = 5, ncv = 10, type.measure = "class")

# making predictions from Xtest
ipflasso.predict(object = model1, Xtest = Xtest)
Computes the area under the ROC curve (AUC) for the marker 'linpred' and the binary status 'Y'.
my.auc(linpred, Y)
linpred |
n-vector giving the value of the marker. |
Y |
n-vector giving the binary status, coded as 0/1. |
the area under the curve
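For reference, the AUC of a marker can be computed from ranks via the Mann-Whitney statistic. The function `auc.rank` below is a hypothetical helper written for this illustration; it is not claimed to be how my.auc is implemented.

```r
# Rank-based AUC: probability that a randomly chosen observation with
# Y = 1 has a higher marker value than one with Y = 0 (ties count 1/2).
auc.rank <- function(linpred, Y) {
  n1 <- sum(Y == 1)
  n0 <- sum(Y == 0)
  r <- rank(linpred)  # midranks are used for tied marker values
  (sum(r[Y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

linpred <- c(0.1, 0.4, 0.35, 0.8)
Y <- c(0, 0, 1, 1)
print(auc.rank(linpred, Y))  # 0.75
```

In this small example three of the four (Y=1, Y=0) pairs are ordered correctly, which gives an AUC of 3/4.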