Title: | Fast Cross-Validation for Multi-Penalty Ridge Regression |
---|---|
Description: | Multi-penalty linear, logistic and cox ridge regression, including estimation of the penalty parameters by efficient (repeated) cross-validation and marginal likelihood maximization. Multiple high-dimensional data types that require penalization are allowed, as well as unpenalized variables. Paired and preferential data types can be specified. See Van de Wiel et al. (2021), <arXiv:2005.09301>. |
Authors: | Mark A. van de Wiel |
Maintainer: | Mark A. van de Wiel <[email protected]> |
License: | GPL (>= 3) |
Version: | 1.11 |
Built: | 2024-11-18 06:35:16 UTC |
Source: | CRAN |
The package implements multi-penalty linear, logistic and cox ridge regression, including estimation of the penalty parameters by efficient (repeated) cross-validation or marginal likelihood maximization. It allows for multiple high-dimensional data types that require penalization, as well as unpenalized variables. Moreover, it allows a paired penalty for paired data types, and preferential data types can be specified.
The DESCRIPTION file:
Package: | multiridge |
Type: | Package |
Title: | Fast Cross-Validation for Multi-Penalty Ridge Regression |
Version: | 1.11 |
Date: | 2022-06-13 |
Author: | Mark A. van de Wiel |
Maintainer: | Mark A. van de Wiel <[email protected]> |
Depends: | R (>= 3.5.0), survival, pROC, methods, mgcv, snowfall |
Description: | Multi-penalty linear, logistic and cox ridge regression, including estimation of the penalty parameters by efficient (repeated) cross-validation and marginal likelihood maximization. Multiple high-dimensional data types that require penalization are allowed, as well as unpenalized variables. Paired and preferential data types can be specified. See Van de Wiel et al. (2021), <arXiv:2005.09301>. |
License: | GPL (>= 3) |
NeedsCompilation: | no |
Packaged: | 2022-06-13 13:11:49 UTC; VNOB-0735 |
Repository: | CRAN |
Date/Publication: | 2022-06-13 15:10:05 UTC |
Index of help topics:
CVfolds Creates (repeated) cross-validation folds CVscore Cross-validated score IWLSCoxridge Iterative weighted least squares algorithm for Cox ridge regression. IWLSridge Iterative weighted least squares algorithm for linear and logistic ridge regression. Scoring Evaluate predictions SigmaFromBlocks Create penalized sample cross-product matrix augment Augment data with zeros. betasout Coefficient estimates from (converged) IWLS fit createXXblocks Creates list of (unscaled) sample covariance matrices createXblocks Create list of paired data blocks dataXXmirmeth Contains R-object 'dataXXmirmeth' doubleCV Double cross-validation for estimating performance of 'multiridge' fastCV2 Fast cross-validation per data block mgcv_lambda Maximum marginal likelihood score mlikCV Outer-loop cross-validation for estimating performance of marginal likelihood based 'multiridge' multiridge-package Fast cross-validation for multi-penalty ridge regression optLambdas Find optimal ridge penalties. optLambdasWrap Find optimal ridge penalties with sequential optimization. optLambdas_mgcv Find optimal ridge penalties with maximimum marginal likelihood optLambdas_mgcvWrap Find optimal ridge penalties with sequential optimization. predictIWLS Predictions from ridge fits setupParallel Setting up parallel computing
betasout
: Coefficient estimates from (converged) IWLS fit createXXblocks
: Creates list of (unscaled) sample covariance matrices CVscore
: Cross-validated score for given penalty parameters dataXXmirmeth
: Example data doubleCV
: Double cross-validation for estimating performance fastCV2
: Fast cross-validation per data block; no dependencyIWLSCoxridge
: Iterative weighted least squares algorithm for Cox ridge regression IWLSridge
: Iterative weighted least squares algorithm for linear and logistic ridge regression mlikCV
: Cross-validation for estimating performance of marginal likelihood estimationoptLambdasWrap
: Find optimal ridge penalties by cross-validation optLambdas_mgcvWrap
: Find optimal ridge penalties in terms of marginal likelihood predictIWLS
: Predictions from ridge fits setupParallel
: Setting up parallel computingSigmaFromBlocks
: Create penalized sample cross-product matrix
Mark A. van de Wiel ([email protected])
Mark A. van de Wiel, Mirrelijn van Nee, Armin Rauschenberger (2021). Fast cross-validation for high-dimensional ridge regression. J Comp Graph Stat
A full demo and data are available from:
https://drive.google.com/open?id=1NUfeOtN8-KZ8A2HZzveG506nBwgW64e4
data(dataXXmirmeth) resp <- dataXXmirmeth[[1]] XXmirmeth <- dataXXmirmeth[[2]] # Find initial lambdas: fast CV per data block separately. cvperblock2 <- fastCV2(XXblocks=XXmirmeth,Y=resp,kfold=10,fixedfolds = TRUE) lambdas <- cvperblock2$lambdas # Create (repeated) CV-splits of the data. leftout <- CVfolds(Y=resp,kfold=10,nrepeat=3,fixedfolds = TRUE) # Compute cross-validated score for initial lambdas CVscore(penalties=lambdas, XXblocks=XXmirmeth,Y=resp,folds=leftout, score="loglik") # Optimizes cross-validate criterion (default: log-lik) # Increase the number of iterations for optimal results jointlambdas <- optLambdasWrap(penaltiesinit=lambdas, XXblocks=XXmirmeth,Y=resp, folds=leftout,score="loglik",save=T, maxItropt1=5, maxItropt2=5) # Alternatively: optimize by using marginal likelihood criterion ## Not run: jointlambdas2 <- optLambdas_mgcvWrap(penaltiesinit=lambdas, XXblocks=XXmirmeth, Y=resp) ## End(Not run) # Optimal lambdas optlambdas <- jointlambdas$optpen # Prepare fitting for the optimal lambdas. XXT <- SigmaFromBlocks(XXmirmeth,penalties=optlambdas) # Fit. fit$etas contains the n linear predictors fit <- IWLSridge(XXT,Y=resp)
data(dataXXmirmeth) resp <- dataXXmirmeth[[1]] XXmirmeth <- dataXXmirmeth[[2]] # Find initial lambdas: fast CV per data block separately. cvperblock2 <- fastCV2(XXblocks=XXmirmeth,Y=resp,kfold=10,fixedfolds = TRUE) lambdas <- cvperblock2$lambdas # Create (repeated) CV-splits of the data. leftout <- CVfolds(Y=resp,kfold=10,nrepeat=3,fixedfolds = TRUE) # Compute cross-validated score for initial lambdas CVscore(penalties=lambdas, XXblocks=XXmirmeth,Y=resp,folds=leftout, score="loglik") # Optimizes cross-validate criterion (default: log-lik) # Increase the number of iterations for optimal results jointlambdas <- optLambdasWrap(penaltiesinit=lambdas, XXblocks=XXmirmeth,Y=resp, folds=leftout,score="loglik",save=T, maxItropt1=5, maxItropt2=5) # Alternatively: optimize by using marginal likelihood criterion ## Not run: jointlambdas2 <- optLambdas_mgcvWrap(penaltiesinit=lambdas, XXblocks=XXmirmeth, Y=resp) ## End(Not run) # Optimal lambdas optlambdas <- jointlambdas$optpen # Prepare fitting for the optimal lambdas. XXT <- SigmaFromBlocks(XXmirmeth,penalties=optlambdas) # Fit. fit$etas contains the n linear predictors fit <- IWLSridge(XXT,Y=resp)
This function augments data with zeros to allow pairing of data on the same variables, but from DIFFERENT samples
augment(Xdata1, Xdata2)
augment(Xdata1, Xdata2)
Xdata1 |
Data frame or data matrix of dimension |
Xdata2 |
Data frame or data matrix of dimension |
Xdata1 and Xdata2 should have the same number of columns. These columns represent variables. Augments both data matrices with zeros,
such that the matrices can be paired using createXXblocks
on the output of this function.
List
Xaug1 |
Augmented data matrix 1 |
Xaug2 |
Augmented data matrix 2 |
#Example #Simulate n1 <- 10 n2 <- 20 p <- 100 X1 <- matrix(rnorm(p*n1),nrow=n1) X2 <- matrix(rnorm(p*n2),nrow=n2) #check whether column dimension is correct ncol(X1)==ncol(X2) #create cross-product Xaugm <- augment(X1,X2) #check dimensions (should be (n1+n2) x p) dim(Xaugm[[1]]) dim(Xaugm[[2]])
#Example #Simulate n1 <- 10 n2 <- 20 p <- 100 X1 <- matrix(rnorm(p*n1),nrow=n1) X2 <- matrix(rnorm(p*n2),nrow=n2) #check whether column dimension is correct ncol(X1)==ncol(X2) #create cross-product Xaugm <- augment(X1,X2) #check dimensions (should be (n1+n2) x p) dim(Xaugm[[1]]) dim(Xaugm[[2]])
Extracts estimated regression coefficients from the final Iterative Weighted Least Squares fit, as obtained from linear, logistic, or Cox ridge regression.
betasout(IWLSfit, Xblocks, X1=NULL, penalties, pairing = NULL)
betasout(IWLSfit, Xblocks, X1=NULL, penalties, pairing = NULL)
IWLSfit |
List object, see details |
Xblocks |
List of data frames or matrices, representing |
X1 |
Matrix. Dimension |
penalties |
Numerical vector. |
pairing |
Numerical vector of length 3 or |
IWLSfit
should be the output of either IWLSridge
or IWLSCoxridge
. Xblocks
may be created by createXblocks
.
List. Number of components equals number of components of Xblocks
plus one, as the output is augmented with an intercept estimate (first component, NULL
if absent).
Each component is a numerical vector representing regression parameter estimates. Lengths of vectors match column dimensions of Xblocks
(nr of variables for given data type)
createXblocks
. A full demo and data are available from:
https://drive.google.com/open?id=1NUfeOtN8-KZ8A2HZzveG506nBwgW64e4
data(dataXXmirmeth) resp <- dataXXmirmeth[[1]] XXmirmeth <- dataXXmirmeth[[2]] lambdas <- c(100,1000) # Prepare fitting for the specified penalties. XXT <- SigmaFromBlocks(XXmirmeth,penalties=lambdas) # Fit. fit$etas contains the n linear predictors fit <- IWLSridge(XXT,Y=resp) # Computation of the regression coefficients requires the original # (large!) nxp data sets, available from link above ## Not run: Xbl <- createXblocks(list(datamir,datameth)) betas <- betasout(fit, Xblocks=Xbl, penalties=lambdas) ## End(Not run)
data(dataXXmirmeth) resp <- dataXXmirmeth[[1]] XXmirmeth <- dataXXmirmeth[[2]] lambdas <- c(100,1000) # Prepare fitting for the specified penalties. XXT <- SigmaFromBlocks(XXmirmeth,penalties=lambdas) # Fit. fit$etas contains the n linear predictors fit <- IWLSridge(XXT,Y=resp) # Computation of the regression coefficients requires the original # (large!) nxp data sets, available from link above ## Not run: Xbl <- createXblocks(list(datamir,datameth)) betas <- betasout(fit, Xblocks=Xbl, penalties=lambdas) ## End(Not run)
Create list of paired data blocks
createXblocks(datablocks, which2pair = NULL)
createXblocks(datablocks, which2pair = NULL)
datablocks |
List of data frames or matrices representing |
which2pair |
Integer vector of size 2 (or |
Only use this function when you wish to pair two data blocks. If which2pair = NULL
the output
matches the input. If not, the function adds a paired data block, pairing the two data blocks corresponding to the elements of
which2pair
.
List. Same length as datablocks
when which2pair = NULL
, or augmented with one paired data block.
createXXblocks
. A full demo and data are available from:
https://drive.google.com/open?id=1NUfeOtN8-KZ8A2HZzveG506nBwgW64e4
n <- 43 p <- 100 fakeXbl <- createXblocks(list(X1 = matrix(rnorm(n*p),nrow=n),X2 = matrix(rnorm(n*p),nrow=n)))
n <- 43 p <- 100 fakeXbl <- createXblocks(list(X1 = matrix(rnorm(n*p),nrow=n),X2 = matrix(rnorm(n*p),nrow=n)))
Creates list of (unscaled) sample covariance matrices X_b %*% t(X_b)
for data blocks b = 1,..., B.
createXXblocks(datablocks, datablocksnew = NULL, which2pair = NULL)
createXXblocks(datablocks, datablocksnew = NULL, which2pair = NULL)
datablocks |
List of data frames or matrices |
datablocksnew |
List of data frames or matrices |
which2pair |
Integer vector of size 2 (or |
The efficiency of multiridge
for high-dimendional data relies largely on this function:
all iterative calculation are performed on the out put of this function, which contains B
blocks of
nxn
matrices. If which2pair != NULL
, the function adds a paired covariance block, pairing the two data blocks corresponding to the elements of which2pair
. If predictions for new samples are desired, one also needs to specify
datablocksnew
, which should have he exact same format as datablocks
with matching column dimension (number of variables).
List. Same number of component as datablocks
when which2pair = NULL
, or augmented with one paired data block.
Dimension is nxn
for all components.
createXblocks
, which is required when parameter estimates are desired (not needed for prediction). A full demo and data are available from:
https://drive.google.com/open?id=1NUfeOtN8-KZ8A2HZzveG506nBwgW64e4
#Example #Simulate Xbl1 <- matrix(rnorm(1000),nrow=10) Xbl2 <- matrix(rnorm(2000),nrow=10) #check whether dimensions are correct ncol(Xbl1)==nrow(Xbl2) #create cross-product XXbl <- createXXblocks(list(Xbl1,Xbl2)) #suppose penalties for two data types equal 5,10, respectively Sigma <- SigmaFromBlocks(XXbl,c(5,10)) #check dimensions (should be n x n) dim(Sigma)
#Example #Simulate Xbl1 <- matrix(rnorm(1000),nrow=10) Xbl2 <- matrix(rnorm(2000),nrow=10) #check whether dimensions are correct ncol(Xbl1)==nrow(Xbl2) #create cross-product XXbl <- createXXblocks(list(Xbl1,Xbl2)) #suppose penalties for two data types equal 5,10, respectively Sigma <- SigmaFromBlocks(XXbl,c(5,10)) #check dimensions (should be n x n) dim(Sigma)
Creates (repeated) cross-validation folds for samples
CVfolds(Y, model = NULL, balance = TRUE, kfold = 10, fixedfolds = TRUE, nrepeat = 1)
CVfolds(Y, model = NULL, balance = TRUE, kfold = 10, fixedfolds = TRUE, nrepeat = 1)
Y |
Response vector: numeric, binary, factor or |
model |
Character. Any of |
balance |
Boolean. Should the splits be balanced in terms of response labels? |
kfold |
Integer. Desired fold. |
fixedfolds |
Boolean. Should fixed splits be used for reproducibility? |
nrepeat |
Numeric. Number of repeats. |
Creates (repeated), possibly balanced, splits of the samples. Computing time will often largely depend on
on kfold*nrepeat
, the number of training-test splits evaluated.
List object with kfold*nrepeat
elements containing the sample indices of the left-out samples per split.
A full demo and data are available from:
https://drive.google.com/open?id=1NUfeOtN8-KZ8A2HZzveG506nBwgW64e4
data(dataXXmirmeth) resp <- dataXXmirmeth[[1]] leftout <- CVfolds(Y=resp,kfold=10,nrepeat=3,fixedfolds = TRUE)
data(dataXXmirmeth) resp <- dataXXmirmeth[[1]] leftout <- CVfolds(Y=resp,kfold=10,nrepeat=3,fixedfolds = TRUE)
Cross-validated score for given penalty parameters.
CVscore(penalties, XXblocks, Y, X1 = NULL, pairing = NULL, folds, intercept = ifelse(is(Y, "Surv"),FALSE, TRUE), frac1 = NULL, score = "loglik", model = NULL, eps = 1e-07, maxItr = 100, trace = FALSE, printCV = TRUE, save = FALSE, parallel = FALSE)
CVscore(penalties, XXblocks, Y, X1 = NULL, pairing = NULL, folds, intercept = ifelse(is(Y, "Surv"),FALSE, TRUE), frac1 = NULL, score = "loglik", model = NULL, eps = 1e-07, maxItr = 100, trace = FALSE, printCV = TRUE, save = FALSE, parallel = FALSE)
penalties |
Numeric vector. |
XXblocks |
List of |
Y |
Response vector: numeric, binary, factor or |
X1 |
Matrix. Dimension |
pairing |
Numerical vector of length 3 or |
folds |
List of integer vector. Usually output of |
intercept |
Boolean. Should an intercept be included? |
frac1 |
Scalar. Prior fraction of cases. Only relevant for |
score |
Character. See Details. |
model |
Character. Any of |
eps |
Scalar. Numerical bound for IWLS convergence. |
maxItr |
Integer. Maximum number of iterations used in IWLS. |
trace |
Boolean. Should the output of the IWLS algorithm be traced? |
printCV |
Boolean. Should the CV-score be printed on screen? |
save |
Boolean. If TRUE appends the penalties and resulting CVscore to global variable |
parallel |
Boolean. Should computation be done in parallel? If |
See Scoring
for details on score
.
Numeric, cross-validated prediction score for given penalties
doubleCV
for double cross-validation, used for performance evaluation
data(dataXXmirmeth) resp <- dataXXmirmeth[[1]] XXmirmeth <- dataXXmirmeth[[2]] # Find initial lambdas: fast CV per data block separately. cvperblock2 <- fastCV2(XXblocks=XXmirmeth,Y=resp,kfold=10,fixedfolds = TRUE) lambdas <- cvperblock2$lambdas # Create training-test splits leftout <- CVfolds(Y=resp,kfold=10,nrepeat=3,fixedfolds = TRUE) CVscore(penalties=lambdas, XXblocks=XXmirmeth,Y=resp,folds=leftout,score="loglik")
data(dataXXmirmeth) resp <- dataXXmirmeth[[1]] XXmirmeth <- dataXXmirmeth[[2]] # Find initial lambdas: fast CV per data block separately. cvperblock2 <- fastCV2(XXblocks=XXmirmeth,Y=resp,kfold=10,fixedfolds = TRUE) lambdas <- cvperblock2$lambdas # Create training-test splits leftout <- CVfolds(Y=resp,kfold=10,nrepeat=3,fixedfolds = TRUE) CVscore(penalties=lambdas, XXblocks=XXmirmeth,Y=resp,folds=leftout,score="loglik")
dataXXmirmeth
This list object contains the binary response (control/case) and two data blocks corresponding to miRNA and methylation data
data(dataXXmirmeth)
data(dataXXmirmeth)
The format is a list with two components: resp: numeric (0/1) [1:43]\ XXmirmeth: list with 2 components, each a matrix [1:43,1:43]\
The object XXmirmeth
is created by applying createXXblocks(list(datamir,datameth))
, where
objects datamir
and datameth
are large data matrices stored in the
mirmethdata.Rdata
file, which is available from the link below.
Snoek, B. C. et al. (2019), Genome-wide microRNA analysis of HPV-positive self-samples yields novel triage markers for early detection of cervical cancer, International Journal of Cancer 144(2), 372-379.
Verlaat, W. et al. (2018), Identification and validation of a 3-gene methylation classifier for hpv-based cervical screening on self-samples, Clinical Cancer Research 24(14), 3456-3464.
Mark A. van de Wiel, Mirrelijn van Nee, Armin Rauschenberger (2021). Fast cross-validation for multi-penalty high-dimensional ridge regression. J Comp Graph Stat
createXXblocks
. Source data file is available from:
https://drive.google.com/open?id=1NUfeOtN8-KZ8A2HZzveG506nBwgW64e4
data(dataXXmirmeth) resp <- dataXXmirmeth[[1]] XXmirmeth <- dataXXmirmeth[[2]]
data(dataXXmirmeth) resp <- dataXXmirmeth[[1]] XXmirmeth <- dataXXmirmeth[[2]]
multiridge
Double cross-validation for estimating performance of multiridge
. Outer fold is for testing, inner fold for penalty parameter tuning
doubleCV(penaltiesinit, XXblocks, Y, X1 = NULL, pairing = NULL, outfold = 5, infold = 10, nrepeatout = 1, nrepeatin = 1, balance = TRUE, fixedfolds = TRUE, intercept = ifelse(is(Y, "Surv"), FALSE, TRUE), frac1 = NULL, score = "loglik",model = NULL, eps = 1e-07, maxItr = 10, trace = FALSE, printCV = TRUE, reltol = 1e-04, optmethod1 = "SANN", optmethod2 = ifelse(length(penaltiesinit) == 1, "Brent", "Nelder-Mead"), maxItropt1 = 10, maxItropt2 = 25, save = FALSE, parallel = FALSE, pref = NULL, fixedpen = NULL)
doubleCV(penaltiesinit, XXblocks, Y, X1 = NULL, pairing = NULL, outfold = 5, infold = 10, nrepeatout = 1, nrepeatin = 1, balance = TRUE, fixedfolds = TRUE, intercept = ifelse(is(Y, "Surv"), FALSE, TRUE), frac1 = NULL, score = "loglik",model = NULL, eps = 1e-07, maxItr = 10, trace = FALSE, printCV = TRUE, reltol = 1e-04, optmethod1 = "SANN", optmethod2 = ifelse(length(penaltiesinit) == 1, "Brent", "Nelder-Mead"), maxItropt1 = 10, maxItropt2 = 25, save = FALSE, parallel = FALSE, pref = NULL, fixedpen = NULL)
penaltiesinit |
Numeric vector. Initial values for penaltyparameters. May be obtained from |
XXblocks |
List of |
Y |
Response vector: numeric, binary, factor or |
X1 |
Matrix. Dimension |
pairing |
Numerical vector of length 3 or |
outfold |
Integer. Outer fold for test samples. |
infold |
Integer. Inner fold for tuning penalty parameters. |
nrepeatout |
Integer. Number of repeated splits for outer fold. |
nrepeatin |
Integer. Number of repeated splits for inner fold. |
balance |
Boolean. Should the splits be balanced in terms of response labels? |
fixedfolds |
Boolean. Should fixed splits be used for reproducibility? |
intercept |
Boolean. Should an intercept be included? |
frac1 |
Scalar. Prior fraction of cases. Only relevant for |
score |
Character. See Details. |
model |
Character. Any of |
eps |
Scalar. Numerical bound for IWLS convergence. |
maxItr |
Integer. Maximum number of iterations used in IWLS. |
trace |
Boolean. Should the output of the IWLS algorithm be traced? |
printCV |
Boolean. Should the CV-score be printed on screen? |
reltol |
Scalar. Relative tolerance for optimization methods. |
optmethod1 |
Character. First, global search method. Any of the methods |
optmethod2 |
Character. Second, local search method. Any of the methods |
maxItropt1 |
Integer. Maximum number of iterations for |
maxItropt2 |
Integer. Maximum number of iterations for |
save |
Boolean. If TRUE appends the penalties and resulting CVscore to global variable |
parallel |
Boolean. Should computation be done in parallel? If |
pref |
Integer vector or |
fixedpen |
Integer vector or |
WARNING: this function may be very time-consuming. The number of evaluations may equal nrepeatout*outerfold*nrepeatin*innerfold*maxItr*(maxItropt1+maxItropt2)
. Computing time may be estimated by multiplying computing time of optLambdasWrap
by
nrepeatout*outerfold
. See Scoring
for details on score
.
List with the following components:
sampleindex |
Numerical vector: sample indices |
true |
True responses |
linpred |
Cross-validated linear predictors |
optLambdas
, optLambdasWrap
which optimize the penalties.
Scoring
which may applied to output of this function to obtain overall cross-validated performance score. A full demo and data are available from:
https://drive.google.com/open?id=1NUfeOtN8-KZ8A2HZzveG506nBwgW64e4
data(dataXXmirmeth) resp <- dataXXmirmeth[[1]] XXmirmeth <- dataXXmirmeth[[2]] # Find initial lambdas: fast CV per data block separately. cvperblock2 <- fastCV2(XXblocks=XXmirmeth,Y=resp,kfold=10,fixedfolds = TRUE) lambdas <- cvperblock2$lambdas # Double cross-validation ## Not run: perf <- doubleCV(penaltiesinit=lambdas,XXblocks=XXmirmeth,Y=resp, score="loglik",outfold=10, infold=10, nrepeatout=1, nrepeatin=3, parallel=TRUE) # Performance metrics Scoring(perf$linpred,perf$true,score="auc",print=TRUE) Scoring(perf$linpred,perf$true,score="brier",print=TRUE) Scoring(perf$linpred,perf$true,score="loglik",print=TRUE) ## End(Not run)
data(dataXXmirmeth) resp <- dataXXmirmeth[[1]] XXmirmeth <- dataXXmirmeth[[2]] # Find initial lambdas: fast CV per data block separately. cvperblock2 <- fastCV2(XXblocks=XXmirmeth,Y=resp,kfold=10,fixedfolds = TRUE) lambdas <- cvperblock2$lambdas # Double cross-validation ## Not run: perf <- doubleCV(penaltiesinit=lambdas,XXblocks=XXmirmeth,Y=resp, score="loglik",outfold=10, infold=10, nrepeatout=1, nrepeatin=3, parallel=TRUE) # Performance metrics Scoring(perf$linpred,perf$true,score="auc",print=TRUE) Scoring(perf$linpred,perf$true,score="brier",print=TRUE) Scoring(perf$linpred,perf$true,score="loglik",print=TRUE) ## End(Not run)
Fast cross-validation for high-dimensional data. Finds optimal penalties separately per data block. Useful for initialization.
fastCV2(XXblocks, Y, X1 = NULL, kfold = 10, intercept = ifelse(is(Y, "Surv"), FALSE, TRUE), parallel = FALSE, fixedfolds = TRUE, model = NULL, eps = 1e-10, reltol = 0.5, lambdamax= 10^6, traceCV=TRUE)
fastCV2(XXblocks, Y, X1 = NULL, kfold = 10, intercept = ifelse(is(Y, "Surv"), FALSE, TRUE), parallel = FALSE, fixedfolds = TRUE, model = NULL, eps = 1e-10, reltol = 0.5, lambdamax= 10^6, traceCV=TRUE)
XXblocks |
List of data frames or matrices, representing |
Y |
Response vector: numeric, binary, factor or |
X1 |
Matrix. Dimension |
kfold |
Integer. Desired fold. |
intercept |
Boolean. Should an intercept be included? |
parallel |
Boolean. Should computation be done in parallel? If |
fixedfolds |
Boolean. Should fixed splits be used for reproducibility? |
model |
Character. Any of |
eps |
Scalar. Numerical bound for IWLS convergence. |
reltol |
Scalar. Relative tolerance for optimization method. |
lambdamax |
Numeric. Upperbound for lambda. |
traceCV |
Boolean. Should the CV results be traced and printed? |
This function is basically a wrapper for applying optLambdas
per data block separately using Brent optimization.
Numerical vector containing penalties optimized separately per data block. Useful for initialization.
optLambdas
, optLambdasWrap
which optimize the penalties jointly.
A full demo and data are available from:
https://drive.google.com/open?id=1NUfeOtN8-KZ8A2HZzveG506nBwgW64e4
data(dataXXmirmeth) resp <- dataXXmirmeth[[1]] XXmirmeth <- dataXXmirmeth[[2]] cvperblock2 <- fastCV2(XXblocks=XXmirmeth,Y=resp,kfold=10,fixedfolds = TRUE) lambdas <- cvperblock2$lambdas
data(dataXXmirmeth) resp <- dataXXmirmeth[[1]] XXmirmeth <- dataXXmirmeth[[2]] cvperblock2 <- fastCV2(XXblocks=XXmirmeth,Y=resp,kfold=10,fixedfolds = TRUE) lambdas <- cvperblock2$lambdas
Iterative weighted least squares algorithm for Cox ridge regression. Updates the weights and linear predictors until convergence.
IWLSCoxridge(XXT, Y, X1 = NULL, intercept = FALSE, eps = 1e-07, maxItr = 25, trace = FALSE, E0 = NULL)
IWLSCoxridge(XXT, Y, X1 = NULL, intercept = FALSE, eps = 1e-07, maxItr = 25, trace = FALSE, E0 = NULL)
XXT |
Matrix. Dimensions |
Y |
Response vector: class |
X1 |
Matrix. Dimension |
intercept |
Boolean. Should an intercept be included? |
eps |
Scalar. Numerical bound for IWLS convergence. |
maxItr |
Integer. Maximum number of iterations used in IWLS. |
trace |
Boolean. Should the output of the IWLS algorithm be traced? |
E0 |
Numerical vector or |
Usually, Cox ridge regression does not use an intercept, as this is part of the baseline hazard. The latter is estimated using the Breslow estimator. To keep the function computationally efficient it returns the linear predictors (which suffice for predictions), instead of parameter estimates. These may be obtained by applying the betasout
function to the output of this function.
List, containing:
etas |
Numerical vector: Final linear predictors |
Ypred |
Predicted survival |
convergence |
Boolean: has IWLS converged? |
nIt |
Number of iterations |
Hres |
Auxiliary list object. Passed on to other functions |
linearized |
Linearized predictions |
unpen |
Boolean: are there any unpenalized covariates involved? Passed on to other functions |
intercept |
Boolean: Is an intercept included? |
eta0 |
Numerical vector: Initial linear predictors |
X1 |
Matrix: design matrix unpenalized variables |
Mark A. van de Wiel, Mirrelijn van Nee, Armin Rauschenberger (2021). Fast cross-validation for high-dimensional ridge regression. J Comp Graph Stat
IWLSridge
for linear and logistic ridge. betasout
for obtaining parameter estimates.
predictIWLS
for predictions on new samples. A full demo and data are available from:
https://drive.google.com/open?id=1NUfeOtN8-KZ8A2HZzveG506nBwgW64e4
data(dataXXmirmeth) resp <- dataXXmirmeth[[1]] XXmirmeth <- dataXXmirmeth[[2]] lambdas <- c(100,1000) # Create fake survival data respsurv <- Surv(rexp(length(resp)),resp) # Prepare fitting for the specified penalties. XXT <- SigmaFromBlocks(XXmirmeth,penalties=lambdas) # Fit. fit$etas contains the n linear predictors fit <- IWLSCoxridge(XXT,Y=respsurv)
data(dataXXmirmeth) resp <- dataXXmirmeth[[1]] XXmirmeth <- dataXXmirmeth[[2]] lambdas <- c(100,1000) # Create fake survival data respsurv <- Surv(rexp(length(resp)),resp) # Prepare fitting for the specified penalties. XXT <- SigmaFromBlocks(XXmirmeth,penalties=lambdas) # Fit. fit$etas contains the n linear predictors fit <- IWLSCoxridge(XXT,Y=respsurv)
Iterative weighted least squares algorithm for linear and logistic ridge regression. Updates the weights and linear predictors until convergence.
IWLSridge(XXT, Y, X1 = NULL, intercept = TRUE, frac1 = NULL, eps = 1e-07, maxItr = 25, trace = FALSE, model = NULL, E0 = NULL)
IWLSridge(XXT, Y, X1 = NULL, intercept = TRUE, frac1 = NULL, eps = 1e-07, maxItr = 25, trace = FALSE, model = NULL, E0 = NULL)
XXT |
Matrix. Dimensions |
Y |
Response vector: numeric, binary, or two-class factor |
X1 |
Matrix. Dimension |
intercept |
Boolean. Should an intercept be included? |
frac1 |
Scalar. Prior fraction of cases. Only relevant for |
eps |
Scalar. Numerical bound for IWLS convergence. |
maxItr |
Integer. Maximum number of iterations used in IWLS. |
trace |
Boolean. Should the output of the IWLS algorithm be traced? |
model |
Character. Any of |
E0 |
Numerical vector or |
An (unpenalized) intercept is included by default. To keep the function computationally efficient it returns the linear predictors (which suffice for predictions), instead of parameter estimates. These may be obtained by applying the betasout
function to the output of this function.
List, containing:
etas |
Numerical vector: Final linear predictors |
Ypred |
Predicted survival |
convergence |
Boolean: has IWLS converged? |
nIt |
Number of iterations |
Hres |
Auxiliary list object. Passed on to other functions |
linearized |
Linearized predictions |
unpen |
Boolean: are there any unpenalized covariates involved? Passed on to other functions |
intercept |
Boolean: Is an intercept included? |
eta0 |
Numerical vector: Initial linear predictors |
X1 |
Matrix: design matrix unpenalized variables |
Mark A. van de Wiel, Mirrelijn van Nee, Armin Rauschenberger (2021). Fast cross-validation for high-dimensional ridge regression. J Comp Graph Stat
IWLSCoxridge
for Cox ridge. betasout
for obtaining parameter estimates.
predictIWLS
for predictions on new samples. A full demo and data are available from:
https://drive.google.com/open?id=1NUfeOtN8-KZ8A2HZzveG506nBwgW64e4
data(dataXXmirmeth) resp <- dataXXmirmeth[[1]] XXmirmeth <- dataXXmirmeth[[2]] lambdas <- c(100,1000) # Prepare fitting for the specified penalties. XXT <- SigmaFromBlocks(XXmirmeth,penalties=lambdas) # Fit. fit$etas contains the n linear predictors fit <- IWLSridge(XXT,Y=resp)
data(dataXXmirmeth) resp <- dataXXmirmeth[[1]] XXmirmeth <- dataXXmirmeth[[2]] lambdas <- c(100,1000) # Prepare fitting for the specified penalties. XXT <- SigmaFromBlocks(XXmirmeth,penalties=lambdas) # Fit. fit$etas contains the n linear predictors fit <- IWLSridge(XXT,Y=resp)
Computed maximum marginal likelihood score for given penalty parameters using mgcv
.
mgcv_lambda(penalties, XXblocks,Y, model=NULL, printscore=TRUE, pairing=NULL, sigmasq = 1, opt.sigma=ifelse(model=="linear",TRUE, FALSE))
mgcv_lambda(penalties, XXblocks,Y, model=NULL, printscore=TRUE, pairing=NULL, sigmasq = 1, opt.sigma=ifelse(model=="linear",TRUE, FALSE))
penalties |
Numeric vector. |
XXblocks |
List of |
Y |
Response vector: numeric, binary, factor or |
model |
Character. Any of |
printscore |
Boolean. Should the score be printed? |
pairing |
Numerical vector of length 3 or |
sigmasq |
Default error variance. |
opt.sigma |
Boolean. Should the error variance be optimized as well? Only relevant for |
See gam
for details on how the marginal likelihood is computed.
Numeric, marginal likelihood score for given penalties
Wood, S. N. (2011), Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models, J. Roy. Statist. Soc., B 73(1), 3-36.
CVscore
for cross-validation alternative. A full demo and data are available from:
https://drive.google.com/open?id=1NUfeOtN8-KZ8A2HZzveG506nBwgW64e4
multiridge
Outer-loop cross-validation for estimating performance of marginal likelihood based multiridge
.
Outer fold is for testing; penalty parameter tuning is performed by marginal likelihood estimation
mlikCV(penaltiesinit, XXblocks, Y, pairing = NULL, outfold = 5, nrepeatout = 1, balance = TRUE,fixedfolds = TRUE, model = NULL, intercept = ifelse(is(Y, "Surv"), FALSE, TRUE), reltol = 1e-04, trace = FALSE, optmethod1 = "SANN", optmethod2 = ifelse(length(penaltiesinit) == 1, "Brent", "Nelder-Mead"), maxItropt1 = 10, maxItropt2 = 25, parallel = FALSE, pref = NULL, fixedpen = NULL, sigmasq = 1, opt.sigma=ifelse(model=="linear",TRUE, FALSE))
mlikCV(penaltiesinit, XXblocks, Y, pairing = NULL, outfold = 5, nrepeatout = 1, balance = TRUE,fixedfolds = TRUE, model = NULL, intercept = ifelse(is(Y, "Surv"), FALSE, TRUE), reltol = 1e-04, trace = FALSE, optmethod1 = "SANN", optmethod2 = ifelse(length(penaltiesinit) == 1, "Brent", "Nelder-Mead"), maxItropt1 = 10, maxItropt2 = 25, parallel = FALSE, pref = NULL, fixedpen = NULL, sigmasq = 1, opt.sigma=ifelse(model=="linear",TRUE, FALSE))
penaltiesinit |
Numeric vector. Initial values for penaltyparameters. May be obtained from |
XXblocks |
List of |
Y |
Response vector: numeric, binary, factor or |
pairing |
Numerical vector of length 3 or |
outfold |
Integer. Outer fold for test samples. |
nrepeatout |
Integer. Number of repeated splits for outer fold. |
balance |
Boolean. Should the splits be balanced in terms of response labels? |
fixedfolds |
Boolean. Should fixed splits be used for reproducibility? |
intercept |
Boolean. Should an intercept be included? |
model |
Character. Any of |
trace |
Boolean. Should the output of the IWLS algorithm be traced? |
reltol |
Scalar. Relative tolerance for optimization methods. |
optmethod1 |
Character. First, global search method. Any of the methods |
optmethod2 |
Character. Second, local search method. Any of the methods |
maxItropt1 |
Integer. Maximum number of iterations for |
maxItropt2 |
Integer. Maximum number of iterations for |
parallel |
Boolean. Should computation be done in parallel? If |
pref |
Integer vector or |
fixedpen |
Integer vector or |
sigmasq |
Default error variance. |
opt.sigma |
Boolean. Should the error variance be optimized as well? Only relevant for |
WARNING: this function may be very time-consuming. The number of evaluations may equal nrepeatout*outerfold*(maxItropt1+maxItropt2)
. Computing time may be estimated by multiplying computing time of optLambdas_mgcvWrap
by
nrepeatout*outerfold
.
List with the following components:
sampleindex |
Numerical vector: sample indices |
true |
True responses |
linpred |
Cross-validated linear predictors |
optLambdas_mgcv
, optLambdas_mgcvWrap
which optimize the penalties.
Scoring
which may applied to output of this function to obtain overall cross-validated performance score.
doubleCV
for double cross-validation counterpart. A full demo and data are available from:
https://drive.google.com/open?id=1NUfeOtN8-KZ8A2HZzveG506nBwgW64e4
data(dataXXmirmeth) resp <- dataXXmirmeth[[1]] XXmirmeth <- dataXXmirmeth[[2]] # Find initial lambdas: fast CV per data block separately. cvperblock2 <- fastCV2(XXblocks=XXmirmeth,Y=resp,kfold=10,fixedfolds = TRUE) lambdas <- cvperblock2$lambdas # Outer cross-validation, inner marginal likelihood optimization ## Not run: perfmlik <- mlikCV(penaltiesinit=lambdas,XXblocks=XXmirmeth,Y=resp,outfold=10, nrepeatout=1) # Performance metrics Scoring(perfmlik$linpred,perfmlik$true,score="auc",print=TRUE) Scoring(perfmlik$linpred,perfmlik$true,score="brier",print=TRUE) Scoring(perfmlik$linpred,perfmlik$true,score="loglik",print=TRUE) ## End(Not run)
data(dataXXmirmeth) resp <- dataXXmirmeth[[1]] XXmirmeth <- dataXXmirmeth[[2]] # Find initial lambdas: fast CV per data block separately. cvperblock2 <- fastCV2(XXblocks=XXmirmeth,Y=resp,kfold=10,fixedfolds = TRUE) lambdas <- cvperblock2$lambdas # Outer cross-validation, inner marginal likelihood optimization ## Not run: perfmlik <- mlikCV(penaltiesinit=lambdas,XXblocks=XXmirmeth,Y=resp,outfold=10, nrepeatout=1) # Performance metrics Scoring(perfmlik$linpred,perfmlik$true,score="auc",print=TRUE) Scoring(perfmlik$linpred,perfmlik$true,score="brier",print=TRUE) Scoring(perfmlik$linpred,perfmlik$true,score="loglik",print=TRUE) ## End(Not run)
Optimizes a cross-validated score w.r.t. ridge penalties for multiple data blocks.
optLambdas(penaltiesinit = NULL, XXblocks, Y, X1 = NULL, pairing = NULL, folds, intercept = ifelse(is(Y, "Surv"), FALSE, TRUE), frac1 = NULL, score = "loglik", model = NULL, epsIWLS = 0.001, maxItrIWLS = 25, traceCV = TRUE, reltol = 1e-04, optmethod = ifelse(length(penaltiesinit) == 1, "Brent", "Nelder-Mead"), maxItropt = 500, save = FALSE, parallel = FALSE, fixedpen = NULL, fixedseed = TRUE)
optLambdas(penaltiesinit = NULL, XXblocks, Y, X1 = NULL, pairing = NULL, folds, intercept = ifelse(is(Y, "Surv"), FALSE, TRUE), frac1 = NULL, score = "loglik", model = NULL, epsIWLS = 0.001, maxItrIWLS = 25, traceCV = TRUE, reltol = 1e-04, optmethod = ifelse(length(penaltiesinit) == 1, "Brent", "Nelder-Mead"), maxItropt = 500, save = FALSE, parallel = FALSE, fixedpen = NULL, fixedseed = TRUE)
penaltiesinit |
Numeric vector. Initial values for penaltyparameters. May be obtained from |
XXblocks |
List of |
Y |
Response vector: numeric, binary, factor or |
X1 |
Matrix. Dimension |
pairing |
Numerical vector of length 3 or |
folds |
List, containing the splits of the samples. Usually obtained by |
intercept |
Boolean. Should an intercept be included? |
frac1 |
Scalar. Prior fraction of cases. Only relevant for |
score |
Character. See Details. |
model |
Character. Any of |
epsIWLS |
Scalar. Numerical bound for IWLS convergence. |
maxItrIWLS |
Integer. Maximum number of iterations used in IWLS. |
traceCV |
Boolean. Should the output of the IWLS algorithm be traced? |
reltol |
Scalar. Relative tolerance for optimization methods. |
optmethod |
Character. Optimization method. Any of the methods |
maxItropt |
Integer. Maximum number of iterations for |
save |
Boolean. If TRUE appends the penalties and resulting CVscore to global variable |
parallel |
Boolean. Should computation be done in parallel? If |
fixedpen |
Integer vector or |
fixedseed |
Boolean. Should the initialization be fixed? For reproducibility. |
See Scoring
for details on score
.
We highly recommend to use smooth scoring functions, in particular "loglik"
.
For ranking-based criteria like auc
and cindex
we advise to use repeated CV (see CVfolds
) to avoid ending up in any of the many local optima.
List, with components:
optres |
Output of the optimizer |
optpen |
Vector with determined optimal penalties |
allsc |
Matrix with CV scores for all penalty parameter configurations used by the optimizer |
optLambdasWrap
for i) (recommended) optimization in two steps: first global, then local; and ii) sequential optimization
when some data types are preferred over others. fastCV2
for initialization of penalties. A full demo and data are available from:
https://drive.google.com/open?id=1NUfeOtN8-KZ8A2HZzveG506nBwgW64e4
data(dataXXmirmeth) resp <- dataXXmirmeth[[1]] XXmirmeth <- dataXXmirmeth[[2]] # Find initial lambdas: fast CV per data block separately. cvperblock2 <- fastCV2(XXblocks=XXmirmeth,Y=resp,kfold=10,fixedfolds = TRUE) lambdas <- cvperblock2$lambdas # Create (repeated) CV-splits of the data. leftout <- CVfolds(Y=resp,kfold=10,nrepeat=3,fixedfolds = TRUE) # One-pass optimization # Increase the number of iterations for optimal results jointlambdas <- optLambdas(penaltiesinit=lambdas, XXblocks=XXmirmeth,Y=resp, folds=leftout,score="loglik",save=T,maxItropt=5)
data(dataXXmirmeth) resp <- dataXXmirmeth[[1]] XXmirmeth <- dataXXmirmeth[[2]] # Find initial lambdas: fast CV per data block separately. cvperblock2 <- fastCV2(XXblocks=XXmirmeth,Y=resp,kfold=10,fixedfolds = TRUE) lambdas <- cvperblock2$lambdas # Create (repeated) CV-splits of the data. leftout <- CVfolds(Y=resp,kfold=10,nrepeat=3,fixedfolds = TRUE) # One-pass optimization # Increase the number of iterations for optimal results jointlambdas <- optLambdas(penaltiesinit=lambdas, XXblocks=XXmirmeth,Y=resp, folds=leftout,score="loglik",save=T,maxItropt=5)
Optimizes a marginal likelihood score w.r.t. ridge penalties for multiple data blocks.
optLambdas_mgcv(penaltiesinit=NULL, XXblocks,Y, pairing=NULL, model=NULL, reltol=1e-4, optmethod=ifelse(length(penaltiesinit)==1,"Brent", "Nelder-Mead"),maxItropt=500, tracescore=TRUE, fixedpen=NULL, fixedseed =TRUE, sigmasq = 1, opt.sigma=ifelse(model=="linear",TRUE, FALSE))
optLambdas_mgcv(penaltiesinit=NULL, XXblocks,Y, pairing=NULL, model=NULL, reltol=1e-4, optmethod=ifelse(length(penaltiesinit)==1,"Brent", "Nelder-Mead"),maxItropt=500, tracescore=TRUE, fixedpen=NULL, fixedseed =TRUE, sigmasq = 1, opt.sigma=ifelse(model=="linear",TRUE, FALSE))
penaltiesinit |
Numeric vector. Initial values for penaltyparameters. May be obtained from |
XXblocks |
List of |
Y |
Response vector: numeric, binary, factor or |
pairing |
Numerical vector of length 3 or |
model |
Character. Any of |
reltol |
Scalar. Relative tolerance for optimization methods. |
optmethod |
Character. Optimization method. Any of the methods |
maxItropt |
Integer. Maximum number of iterations for |
tracescore |
Boolean. Should the output of the scores be traced? |
fixedpen |
Integer vector or |
fixedseed |
Boolean. Should the initialization be fixed? For reproducibility. |
sigmasq |
Default error variance. |
opt.sigma |
Boolean. Should the error variance be optimized as well? Only relevant for |
See gam
for details on how the marginal likelihood is computed.
List, with components:
optres |
Output of the optimizer |
optpen |
Vector with determined optimal penalties |
allsc |
Matrix with marginal likelihood scores for all penalty parameter configurations used by the optimizer |
optLambdas_mgcvWrap
for i) (recommended) optimization in two steps: first global, then local; and ii) sequential optimization
when some data types are preferred over others. A full demo and data are available from:
https://drive.google.com/open?id=1NUfeOtN8-KZ8A2HZzveG506nBwgW64e4
data(dataXXmirmeth) resp <- dataXXmirmeth[[1]] XXmirmeth <- dataXXmirmeth[[2]] # Find initial lambdas: fast CV per data block separately. cvperblock2 <- fastCV2(XXblocks=XXmirmeth,Y=resp,kfold=10,fixedfolds = TRUE) lambdas <- cvperblock2$lambdas # Create (repeated) CV-splits of the data. leftout <- CVfolds(Y=resp,kfold=10,nrepeat=3,fixedfolds = TRUE) # Compute cross-validated score for initial lambdas CVscore(penalties=lambdas, XXblocks=XXmirmeth,Y=resp,folds=leftout, score="loglik") # Optimize by using marginal likelihood criterion jointlambdas2 <- optLambdas_mgcvWrap(penaltiesinit=lambdas, XXblocks=XXmirmeth, Y=resp) # Optimal lambdas optlambdas <- jointlambdas2$optpen
data(dataXXmirmeth) resp <- dataXXmirmeth[[1]] XXmirmeth <- dataXXmirmeth[[2]] # Find initial lambdas: fast CV per data block separately. cvperblock2 <- fastCV2(XXblocks=XXmirmeth,Y=resp,kfold=10,fixedfolds = TRUE) lambdas <- cvperblock2$lambdas # Create (repeated) CV-splits of the data. leftout <- CVfolds(Y=resp,kfold=10,nrepeat=3,fixedfolds = TRUE) # Compute cross-validated score for initial lambdas CVscore(penalties=lambdas, XXblocks=XXmirmeth,Y=resp,folds=leftout, score="loglik") # Optimize by using marginal likelihood criterion jointlambdas2 <- optLambdas_mgcvWrap(penaltiesinit=lambdas, XXblocks=XXmirmeth, Y=resp) # Optimal lambdas optlambdas <- jointlambdas2$optpen
Sequentially optimizes a marginal likelihood score w.r.t. ridge penalties for multiple data blocks.
optLambdas_mgcvWrap(penaltiesinit=NULL, XXblocks,Y, pairing=NULL, model=NULL, reltol=1e-4, optmethod1= "SANN", optmethod2 =ifelse(length(penaltiesinit)==1,"Brent", "Nelder-Mead"), maxItropt1=10,maxItropt2=25,tracescore=TRUE,fixedseed =TRUE, pref=NULL, fixedpen=NULL, sigmasq = 1, opt.sigma=ifelse(model=="linear",TRUE, FALSE))
optLambdas_mgcvWrap(penaltiesinit=NULL, XXblocks,Y, pairing=NULL, model=NULL, reltol=1e-4, optmethod1= "SANN", optmethod2 =ifelse(length(penaltiesinit)==1,"Brent", "Nelder-Mead"), maxItropt1=10,maxItropt2=25,tracescore=TRUE,fixedseed =TRUE, pref=NULL, fixedpen=NULL, sigmasq = 1, opt.sigma=ifelse(model=="linear",TRUE, FALSE))
penaltiesinit |
Numeric vector. Initial values for penaltyparameters. May be obtained from |
XXblocks |
List of |
Y |
Response vector: numeric, binary, factor or |
pairing |
Numerical vector of length 3 or |
model |
Character. Any of |
reltol |
Scalar. Relative tolerance for optimization methods. |
optmethod1 |
Character. First, global search method. Any of the methods |
optmethod2 |
Character. Second, local search method. Any of the methods |
maxItropt1 |
Integer. Maximum number of iterations for |
maxItropt2 |
Integer. Maximum number of iterations for |
tracescore |
Boolean. Should the output of the scores be traced? |
fixedseed |
Boolean. Should the initialization be fixed? For reproducibility. |
pref |
Integer vector or |
fixedpen |
Integer vector or |
sigmasq |
Default error variance. |
opt.sigma |
Boolean. Should the error variance be optimized as well? Only relevant for |
As opposed to optLambdas_mgcv
this function first searches globally, then locally.
Hence, more time-consuming, but better guarded against multiple local optima.
See gam
for details on how the marginal likelihood is computed.
List, with components:
res |
Outputs of all optimizers used |
lambdas |
List of penalties found by the optimizers |
optpen |
Numerical vector with final, optimal penalties |
Wood, S. N. (2011), Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models, J. Roy. Statist. Soc., B 73(1), 3-36.
optLambdas_mgcv
for one-pass optimization. A full demo and data are available from:
https://drive.google.com/open?id=1NUfeOtN8-KZ8A2HZzveG506nBwgW64e4
Sequentially optimizes a cross-validated score w.r.t. ridge penalties for multiple data blocks. Also implements preferential ridge, which allows to first optimize for the preferential data types.
optLambdasWrap(penaltiesinit = NULL, XXblocks, Y, X1 = NULL, pairing = NULL, folds, intercept = ifelse(is(Y, "Surv"), FALSE, TRUE), frac1 = NULL, score = "loglik", model = NULL, epsIWLS = 0.001, maxItrIWLS = 25, traceCV = TRUE, reltol = 1e-04, optmethod1 = "SANN", optmethod2 = ifelse(length(penaltiesinit) == 1, "Brent", "Nelder-Mead"), maxItropt1 = 10, maxItropt2 = 25, save = FALSE, parallel = FALSE, pref = NULL, fixedpen = NULL)
optLambdasWrap(penaltiesinit = NULL, XXblocks, Y, X1 = NULL, pairing = NULL, folds, intercept = ifelse(is(Y, "Surv"), FALSE, TRUE), frac1 = NULL, score = "loglik", model = NULL, epsIWLS = 0.001, maxItrIWLS = 25, traceCV = TRUE, reltol = 1e-04, optmethod1 = "SANN", optmethod2 = ifelse(length(penaltiesinit) == 1, "Brent", "Nelder-Mead"), maxItropt1 = 10, maxItropt2 = 25, save = FALSE, parallel = FALSE, pref = NULL, fixedpen = NULL)
penaltiesinit |
Numeric vector. Initial values for penaltyparameters. May be obtained from |
XXblocks |
List of |
Y |
Response vector: numeric, binary, factor or |
X1 |
Matrix. Dimension |
pairing |
Numerical vector of length 3 or |
folds |
List, containing the splits of the samples. Usually obtained by
|
intercept |
Boolean. Should an intercept be included? |
frac1 |
Scalar. Prior fraction of cases. Only relevant for |
score |
Character. See Details. |
model |
Character. Any of |
epsIWLS |
Scalar. Numerical bound for IWLS convergence. |
maxItrIWLS |
Integer. Maximum number of iterations used in IWLS. |
traceCV |
Boolean. Should the output of the IWLS algorithm be traced? |
reltol |
Scalar. Relative tolerance for optimization methods. |
optmethod1 |
Character. First, global search method. Any of the methods |
optmethod2 |
Character. Second, local search method. Any of the methods |
maxItropt1 |
Integer. Maximum number of iterations for |
maxItropt2 |
Integer. Maximum number of iterations for |
save |
Boolean. If TRUE appends the penalties and resulting CVscore to global variable |
parallel |
Boolean. Should computation be done in parallel? If |
pref |
Integer vector or |
fixedpen |
Integer vector or |
As opposed to optLambdas
this function first searches globally,
then locally.
Hence, more time-consuming, but better guarded against multiple local optima.
See Scoring
for details on score
. We highly recommend to
use smooth scoring functions, in particular "loglik"
.
For ranking-based criteria like "auc"
and "cindex"
we advise to
use repeated CV (see CVfolds
) to avoid ending up in any of the
many local optima.
List, with components:
res |
Outputs of all optimizers used |
lambdas |
List of penalties found by the optimizers |
optpen |
Numerical vector with final, optimal penalties |
optLambdas
for one-pass optimization. fastCV2
for initialization of penalties.A full demo and data are available from:
https://drive.google.com/open?id=1NUfeOtN8-KZ8A2HZzveG506nBwgW64e4
data(dataXXmirmeth) resp <- dataXXmirmeth[[1]] XXmirmeth <- dataXXmirmeth[[2]] # Find initial lambdas: fast CV per data block separately. cvperblock2 <- fastCV2(XXblocks=XXmirmeth,Y=resp,kfold=10,fixedfolds = TRUE) lambdas <- cvperblock2$lambdas # Create (repeated) CV-splits of the data. leftout <- CVfolds(Y=resp,kfold=10,nrepeat=3,fixedfolds = TRUE) # Optimizes cross-validate criterion (default: log-lik) # Increase the number of iterations for optimal results jointlambdas <- optLambdasWrap(penaltiesinit=lambdas, XXblocks=XXmirmeth,Y=resp, folds=leftout,score="loglik",save=T,maxItropt1=5, maxItropt2=5)
data(dataXXmirmeth) resp <- dataXXmirmeth[[1]] XXmirmeth <- dataXXmirmeth[[2]] # Find initial lambdas: fast CV per data block separately. cvperblock2 <- fastCV2(XXblocks=XXmirmeth,Y=resp,kfold=10,fixedfolds = TRUE) lambdas <- cvperblock2$lambdas # Create (repeated) CV-splits of the data. leftout <- CVfolds(Y=resp,kfold=10,nrepeat=3,fixedfolds = TRUE) # Optimizes cross-validate criterion (default: log-lik) # Increase the number of iterations for optimal results jointlambdas <- optLambdasWrap(penaltiesinit=lambdas, XXblocks=XXmirmeth,Y=resp, folds=leftout,score="loglik",save=T,maxItropt1=5, maxItropt2=5)
Produces predictions from ridge fits for new data.
predictIWLS(IWLSfit, X1new = NULL, Sigmanew)
predictIWLS(IWLSfit, X1new = NULL, Sigmanew)
IWLSfit |
List, containing fits from either |
X1new |
Matrix. Dimension |
Sigmanew |
Matrix. Dimensions |
Predictions rely purely on the linear predictors, and do not require producing the parameter vector.
Numerical vector of linear predictor for the test samples.
IWLSridge
(IWLSCoxridge
) for fitting linear and
logistic ridge (Cox ridge). betasout
for obtaining parameter
estimates.
Scoring
to evaluate the predictions. A full demo and data are available from:
https://drive.google.com/open?id=1NUfeOtN8-KZ8A2HZzveG506nBwgW64e4
#Example below shows how to create the input argument Sigmanew (for simulated data) #Simulate Xbl1 <- matrix(rnorm(1000),nrow=10) Xbl2 <- matrix(rnorm(2000),nrow=10) Xbl1new <- matrix(rnorm(200),nrow=2) Xbl2new <- matrix(rnorm(400),nrow=2) #check whether dimensions are correct nrow(Xbl1)==nrow(Xbl1new) nrow(Xbl2)==nrow(Xbl2new) ncol(Xbl1)==nrow(Xbl2) ncol(Xbl1new)==ncol(Xbl2new) #create cross-product XXbl <- createXXblocks(list(Xbl1,Xbl2),list(Xbl1new,Xbl2new)) #suppose penalties for two data types equal 5,10, respectively Sigmanew <- SigmaFromBlocks(XXbl,c(5,10)) #check dimensions (should be nnew x n) dim(Sigmanew)
#Example below shows how to create the input argument Sigmanew (for simulated data) #Simulate Xbl1 <- matrix(rnorm(1000),nrow=10) Xbl2 <- matrix(rnorm(2000),nrow=10) Xbl1new <- matrix(rnorm(200),nrow=2) Xbl2new <- matrix(rnorm(400),nrow=2) #check whether dimensions are correct nrow(Xbl1)==nrow(Xbl1new) nrow(Xbl2)==nrow(Xbl2new) ncol(Xbl1)==nrow(Xbl2) ncol(Xbl1new)==ncol(Xbl2new) #create cross-product XXbl <- createXXblocks(list(Xbl1,Xbl2),list(Xbl1new,Xbl2new)) #suppose penalties for two data types equal 5,10, respectively Sigmanew <- SigmaFromBlocks(XXbl,c(5,10)) #check dimensions (should be nnew x n) dim(Sigmanew)
Evaluates predictions by a score suitable for the corresponding response
Scoring(lp, Y, model = NULL, score = ifelse(model == "linear", "mse", "loglik"), print = TRUE)
Scoring(lp, Y, model = NULL, score = ifelse(model == "linear", "mse", "loglik"), print = TRUE)
lp |
Numerical vector. Linear predictor. |
Y |
Response vector: numeric, binary, factor or |
score |
Character. See Details. |
model |
Character. Any of |
print |
Boolean. Should the score be printed on screen. |
Several scores are allowed, depending on the type of output. For model = "linear"
,
score
equals any of c("loglik","mse","abserror","cor","kendall","spearman")
, denoting
CV-ed log-likelihood, mean-squared error, mean absolute error, Pearson (Kendall, Spearman) correlation with response.
For model = "logistic"
, score
equals any of c("loglik","auc", "brier")
, denoting
CV-ed log-likelihood, area-under-the-ROC-curve, and brier score a.k.a. MSE.
For model = "cox"
, score
equals any of c("loglik","cindex")
, denoting
CV-ed log-likelihood, and c-index.
Numerical value.
CVscore
for obtaining the cross-validated score (for given penalties), and doubleCV
to obtain doubly cross-validated linear predictors to which Scoring
can be applied to estimated predictive performance by double cross-validation. A full demo and data are available from:
https://drive.google.com/open?id=1NUfeOtN8-KZ8A2HZzveG506nBwgW64e4
This function sets up parallel computing by the package snowfall
.
setupParallel(ncpus = 2, sourcefile = NULL, sourcelibraries = c("multiridge","survival","pROC","risksetROC"))
setupParallel(ncpus = 2, sourcefile = NULL, sourcelibraries = c("multiridge","survival","pROC","risksetROC"))
ncpus |
Integer. Number of cpus to use. Should be >= 2. |
sourcefile |
Character. Additional source files to be loaded in parallel. Only required when parallel computing is also desired for functions
not available in |
sourcelibraries |
Character vector. Libraries to be loaded in parallel. Defaults to the libraries multiridge depends on. |
Parallel computing is available for several functions that rely on cross-validation. If double CV is used, parallel computing is applied to the outer loop, to optimize efficiency.
No return value, called for side effects
Snowfall package for further documentation on parallel computing. A full demo and data are available from:
https://drive.google.com/open?id=1NUfeOtN8-KZ8A2HZzveG506nBwgW64e4
## Not run: setupParallel(ncpus=4) ## End(Not run)
## Not run: setupParallel(ncpus=4) ## End(Not run)
Creates penalized sample cross-product matrix, dimension nxn
.
SigmaFromBlocks(XXblocks, penalties, pairing = NULL)
SigmaFromBlocks(XXblocks, penalties, pairing = NULL)
XXblocks |
List of |
penalties |
Numeric vector, representing penaltyparameters. |
pairing |
Numerical vector of length 3 or |
Matrix of size nxn
.
A full demo and data are available from:
https://drive.google.com/open?id=1NUfeOtN8-KZ8A2HZzveG506nBwgW64e4
#Example #Simulate Xbl1 <- matrix(rnorm(1000),nrow=10) Xbl2 <- matrix(rnorm(2000),nrow=10) #check whether dimensions are correct ncol(Xbl1)==nrow(Xbl2) #create cross-product XXbl <- createXXblocks(list(Xbl1,Xbl2)) #suppose penalties for two data types equal 5,10, respectively Sigma <- SigmaFromBlocks(XXbl,c(5,10)) #check dimensions (should be n x n) dim(Sigma)
#Example #Simulate Xbl1 <- matrix(rnorm(1000),nrow=10) Xbl2 <- matrix(rnorm(2000),nrow=10) #check whether dimensions are correct ncol(Xbl1)==nrow(Xbl2) #create cross-product XXbl <- createXXblocks(list(Xbl1,Xbl2)) #suppose penalties for two data types equal 5,10, respectively Sigma <- SigmaFromBlocks(XXbl,c(5,10)) #check dimensions (should be n x n) dim(Sigma)