Package 'multiridge' reference manual

Title:	Fast Cross-Validation for Multi-Penalty Ridge Regression
Description:	Multi-penalty linear, logistic and cox ridge regression, including estimation of the penalty parameters by efficient (repeated) cross-validation and marginal likelihood maximization. Multiple high-dimensional data types that require penalization are allowed, as well as unpenalized variables. Paired and preferential data types can be specified. See Van de Wiel et al. (2021), <arXiv:2005.09301>.
Authors:	Mark A. van de Wiel
Maintainer:	Mark A. van de Wiel <[email protected]>
License:	GPL (>= 3)
Version:	1.11
Built:	2025-02-16 06:42:22 UTC
Source:	CRAN

Fast cross-validation for multi-penalty ridge regression

Description

The package implements multi-penalty linear, logistic and cox ridge regression, including estimation of the penalty parameters by efficient (repeated) cross-validation or marginal likelihood maximization. It allows for multiple high-dimensional data types that require penalization, as well as unpenalized variables. Moreover, it allows a paired penalty for paired data types, and preferential data types can be specified.

Details

The DESCRIPTION file:

Package:	multiridge
Type:	Package
Title:	Fast Cross-Validation for Multi-Penalty Ridge Regression
Version:	1.11
Date:	2022-06-13
Author:	Mark A. van de Wiel
Maintainer:	Mark A. van de Wiel <[email protected]>
Depends:	R (>= 3.5.0), survival, pROC, methods, mgcv, snowfall
Description:	Multi-penalty linear, logistic and cox ridge regression, including estimation of the penalty parameters by efficient (repeated) cross-validation and marginal likelihood maximization. Multiple high-dimensional data types that require penalization are allowed, as well as unpenalized variables. Paired and preferential data types can be specified. See Van de Wiel et al. (2021), <arXiv:2005.09301>.
License:	GPL (>= 3)
NeedsCompilation:	no
Packaged:	2022-06-13 13:11:49 UTC; VNOB-0735
Repository:	CRAN
Date/Publication:	2022-06-13 15:10:05 UTC

Index of help topics:

CVfolds                 Creates (repeated) cross-validation folds
CVscore                 Cross-validated score
IWLSCoxridge            Iterative weighted least squares algorithm for
                        Cox ridge regression.
IWLSridge               Iterative weighted least squares algorithm for
                        linear and logistic ridge regression.
Scoring                 Evaluate predictions
SigmaFromBlocks         Create penalized sample cross-product matrix
augment                 Augment data with zeros.
betasout                Coefficient estimates from (converged) IWLS fit
createXXblocks          Creates list of (unscaled) sample covariance
                        matrices
createXblocks           Create list of paired data blocks
dataXXmirmeth           Contains R-object 'dataXXmirmeth'
doubleCV                Double cross-validation for estimating
                        performance of 'multiridge'
fastCV2                 Fast cross-validation per data block
mgcv_lambda             Maximum marginal likelihood score
mlikCV                  Outer-loop cross-validation for estimating
                        performance of marginal likelihood based
                        'multiridge'
multiridge-package      Fast cross-validation for multi-penalty ridge
                        regression
optLambdas              Find optimal ridge penalties.
optLambdasWrap          Find optimal ridge penalties with sequential
                        optimization.
optLambdas_mgcv         Find optimal ridge penalties with maximimum
                        marginal likelihood
optLambdas_mgcvWrap     Find optimal ridge penalties with sequential
                        optimization.
predictIWLS             Predictions from ridge fits
setupParallel           Setting up parallel computing

betasout: Coefficient estimates from (converged) IWLS fit
createXXblocks: Creates list of (unscaled) sample covariance matrices
CVscore: Cross-validated score for given penalty parameters
dataXXmirmeth: Example data
doubleCV: Double cross-validation for estimating performance
fastCV2: Fast cross-validation per data block; no dependency
IWLSCoxridge: Iterative weighted least squares algorithm for Cox ridge regression
IWLSridge: Iterative weighted least squares algorithm for linear and logistic ridge regression
mlikCV: Cross-validation for estimating performance of marginal likelihood estimation
optLambdasWrap: Find optimal ridge penalties by cross-validation
optLambdas_mgcvWrap: Find optimal ridge penalties in terms of marginal likelihood
predictIWLS: Predictions from ridge fits
setupParallel: Setting up parallel computing
SigmaFromBlocks: Create penalized sample cross-product matrix

Author(s)

Mark A. van de Wiel ([email protected])

References

Mark A. van de Wiel, Mirrelijn van Nee, Armin Rauschenberger (2021). Fast cross-validation for high-dimensional ridge regression. J Comp Graph Stat

Examples

data(dataXXmirmeth)
resp <- dataXXmirmeth[[1]]
XXmirmeth <- dataXXmirmeth[[2]]

# Find initial lambdas: fast CV per data block separately.
cvperblock2 <- fastCV2(XXblocks=XXmirmeth,Y=resp,kfold=10,fixedfolds = TRUE)
lambdas <- cvperblock2$lambdas

# Create (repeated) CV-splits of the data.
leftout <- CVfolds(Y=resp,kfold=10,nrepeat=3,fixedfolds = TRUE)

# Compute cross-validated score for initial lambdas
CVscore(penalties=lambdas, XXblocks=XXmirmeth,Y=resp,folds=leftout,
score="loglik")

# Optimizes cross-validate criterion (default: log-lik)
# Increase the number of iterations for optimal results
jointlambdas <- optLambdasWrap(penaltiesinit=lambdas, XXblocks=XXmirmeth,Y=resp,
folds=leftout,score="loglik",save=T, maxItropt1=5, maxItropt2=5)


# Alternatively: optimize by using marginal likelihood criterion
## Not run: 
jointlambdas2 <- optLambdas_mgcvWrap(penaltiesinit=lambdas, XXblocks=XXmirmeth,
Y=resp)

## End(Not run)

# Optimal lambdas
optlambdas <- jointlambdas$optpen

# Prepare fitting for the optimal lambdas.
XXT <- SigmaFromBlocks(XXmirmeth,penalties=optlambdas)

# Fit. fit$etas contains the n linear predictors
fit <- IWLSridge(XXT,Y=resp)
data(dataXXmirmeth)
resp <- dataXXmirmeth[[1]]
XXmirmeth <- dataXXmirmeth[[2]]

# Find initial lambdas: fast CV per data block separately.
cvperblock2 <- fastCV2(XXblocks=XXmirmeth,Y=resp,kfold=10,fixedfolds = TRUE)
lambdas <- cvperblock2$lambdas

# Create (repeated) CV-splits of the data.
leftout <- CVfolds(Y=resp,kfold=10,nrepeat=3,fixedfolds = TRUE)

# Compute cross-validated score for initial lambdas
CVscore(penalties=lambdas, XXblocks=XXmirmeth,Y=resp,folds=leftout,
score="loglik")

# Optimizes cross-validate criterion (default: log-lik)
# Increase the number of iterations for optimal results
jointlambdas <- optLambdasWrap(penaltiesinit=lambdas, XXblocks=XXmirmeth,Y=resp,
folds=leftout,score="loglik",save=T, maxItropt1=5, maxItropt2=5)


# Alternatively: optimize by using marginal likelihood criterion
## Not run: 
jointlambdas2 <- optLambdas_mgcvWrap(penaltiesinit=lambdas, XXblocks=XXmirmeth,
Y=resp)

## End(Not run)

# Optimal lambdas
optlambdas <- jointlambdas$optpen

# Prepare fitting for the optimal lambdas.
XXT <- SigmaFromBlocks(XXmirmeth,penalties=optlambdas)

# Fit. fit$etas contains the n linear predictors
fit <- IWLSridge(XXT,Y=resp)

Augment data with zeros.

Description

This function augments data with zeros to allow pairing of data on the same variables, but from DIFFERENT samples

Usage

augment(Xdata1, Xdata2)
augment(Xdata1, Xdata2)

Arguments

`Xdata1`	Data frame or data matrix of dimension `n_1 x p`.
`Xdata2`	Data frame or data matrix of dimension `n_2 x p`

Details

Xdata1 and Xdata2 should have the same number of columns. These columns represent variables. Augments both data matrices with zeros, such that the matrices can be paired using createXXblocks on the output of this function.

Value

List

`Xaug1`	Augmented data matrix 1
`Xaug2`	Augmented data matrix 2

Examples

#Example
#Simulate
n1 <- 10
n2 <- 20
p <- 100
X1 <- matrix(rnorm(p*n1),nrow=n1)
X2 <- matrix(rnorm(p*n2),nrow=n2)

#check whether column dimension is correct
ncol(X1)==ncol(X2)

#create cross-product
Xaugm <- augment(X1,X2)

#check dimensions (should be (n1+n2) x p)
dim(Xaugm[[1]])
dim(Xaugm[[2]])
#Example
#Simulate
n1 <- 10
n2 <- 20
p <- 100
X1 <- matrix(rnorm(p*n1),nrow=n1)
X2 <- matrix(rnorm(p*n2),nrow=n2)

#check whether column dimension is correct
ncol(X1)==ncol(X2)

#create cross-product
Xaugm <- augment(X1,X2)

#check dimensions (should be (n1+n2) x p)
dim(Xaugm[[1]])
dim(Xaugm[[2]])

Coefficient estimates from (converged) IWLS fit

Description

Extracts estimated regression coefficients from the final Iterative Weighted Least Squares fit, as obtained from linear, logistic, or Cox ridge regression.

Usage

betasout(IWLSfit, Xblocks, X1=NULL, penalties, pairing = NULL)
betasout(IWLSfit, Xblocks, X1=NULL, penalties, pairing = NULL)

Arguments

`IWLSfit`	List object, see details
`Xblocks`	List of data frames or matrices, representing `b=1,...,B` data blocks of dimensions `n x p_b`.
`X1`	Matrix. Dimension `n x p_0, p_0 < n`, representing unpenalized covariates.
`penalties`	Numerical vector.
`pairing`	Numerical vector of length 3 or `NULL`.

Details

IWLSfit should be the output of either IWLSridge or IWLSCoxridge. Xblocks may be created by createXblocks.

Value

List. Number of components equals number of components of Xblocks plus one, as the output is augmented with an intercept estimate (first component, NULL if absent). Each component is a numerical vector representing regression parameter estimates. Lengths of vectors match column dimensions of Xblocks (nr of variables for given data type)

Examples

data(dataXXmirmeth)
resp <- dataXXmirmeth[[1]]
XXmirmeth <- dataXXmirmeth[[2]]
lambdas <- c(100,1000)

# Prepare fitting for the specified penalties.
XXT <- SigmaFromBlocks(XXmirmeth,penalties=lambdas)

# Fit. fit$etas contains the n linear predictors
fit <- IWLSridge(XXT,Y=resp)

# Computation of the regression coefficients requires the original
# (large!) nxp data sets, available from link above
## Not run: 
Xbl <- createXblocks(list(datamir,datameth))
betas <- betasout(fit, Xblocks=Xbl, penalties=lambdas)

## End(Not run)
data(dataXXmirmeth)
resp <- dataXXmirmeth[[1]]
XXmirmeth <- dataXXmirmeth[[2]]
lambdas <- c(100,1000)

# Prepare fitting for the specified penalties.
XXT <- SigmaFromBlocks(XXmirmeth,penalties=lambdas)

# Fit. fit$etas contains the n linear predictors
fit <- IWLSridge(XXT,Y=resp)

# Computation of the regression coefficients requires the original
# (large!) nxp data sets, available from link above
## Not run: 
Xbl <- createXblocks(list(datamir,datameth))
betas <- betasout(fit, Xblocks=Xbl, penalties=lambdas)

## End(Not run)

Create list of paired data blocks

Description

Create list of paired data blocks

Usage

createXblocks(datablocks, which2pair = NULL)
createXblocks(datablocks, which2pair = NULL)

Arguments

`datablocks`	List of data frames or matrices representing `b=1,...,B` data blocks of dimensions `n x p_b`.
`which2pair`	Integer vector of size 2 (or `NULL`)

Details

Only use this function when you wish to pair two data blocks. If which2pair = NULL the output matches the input. If not, the function adds a paired data block, pairing the two data blocks corresponding to the elements of which2pair.

Value

List. Same length as datablocks when which2pair = NULL, or augmented with one paired data block.

Examples

n <- 43
p <- 100
fakeXbl <- createXblocks(list(X1 = matrix(rnorm(n*p),nrow=n),X2 = matrix(rnorm(n*p),nrow=n)))
n <- 43
p <- 100
fakeXbl <- createXblocks(list(X1 = matrix(rnorm(n*p),nrow=n),X2 = matrix(rnorm(n*p),nrow=n)))

Creates list of (unscaled) sample covariance matrices

Description

Creates list of (unscaled) sample covariance matrices X_b %*% t(X_b) for data blocks b = 1,..., B.

Usage

createXXblocks(datablocks, datablocksnew = NULL, which2pair = NULL)
createXXblocks(datablocks, datablocksnew = NULL, which2pair = NULL)

Arguments

`datablocks`	List of data frames or matrices
`datablocksnew`	List of data frames or matrices
`which2pair`	Integer vector of size 2 (or `NULL`)

Details

The efficiency of multiridge for high-dimendional data relies largely on this function: all iterative calculation are performed on the out put of this function, which contains B blocks of nxn matrices. If which2pair != NULL, the function adds a paired covariance block, pairing the two data blocks corresponding to the elements of which2pair. If predictions for new samples are desired, one also needs to specify datablocksnew, which should have he exact same format as datablocks with matching column dimension (number of variables).

Value

List. Same number of component as datablocks when which2pair = NULL, or augmented with one paired data block. Dimension is nxn for all components.

Examples

#Example
#Simulate
Xbl1 <- matrix(rnorm(1000),nrow=10)
Xbl2 <- matrix(rnorm(2000),nrow=10)

#check whether dimensions are correct
ncol(Xbl1)==nrow(Xbl2)

#create cross-product
XXbl <- createXXblocks(list(Xbl1,Xbl2))

#suppose penalties for two data types equal 5,10, respectively
Sigma <- SigmaFromBlocks(XXbl,c(5,10))

#check dimensions (should be n x n)
dim(Sigma)
#Example
#Simulate
Xbl1 <- matrix(rnorm(1000),nrow=10)
Xbl2 <- matrix(rnorm(2000),nrow=10)

#check whether dimensions are correct
ncol(Xbl1)==nrow(Xbl2)

#create cross-product
XXbl <- createXXblocks(list(Xbl1,Xbl2))

#suppose penalties for two data types equal 5,10, respectively
Sigma <- SigmaFromBlocks(XXbl,c(5,10))

#check dimensions (should be n x n)
dim(Sigma)

Creates (repeated) cross-validation folds

Description

Creates (repeated) cross-validation folds for samples

Usage

CVfolds(Y, model = NULL, balance = TRUE, kfold = 10, fixedfolds = TRUE, nrepeat = 1)
CVfolds(Y, model = NULL, balance = TRUE, kfold = 10, fixedfolds = TRUE, nrepeat = 1)

Arguments

`Y`	Response vector: numeric, binary, factor or `survival`.
`model`	Character. Any of `c("linear", "logistic", "cox")`. Is inferred from `Y` when `NULL`.
`balance`	Boolean. Should the splits be balanced in terms of response labels?
`kfold`	Integer. Desired fold.
`fixedfolds`	Boolean. Should fixed splits be used for reproducibility?
`nrepeat`	Numeric. Number of repeats.

Details

Creates (repeated), possibly balanced, splits of the samples. Computing time will often largely depend on on kfold*nrepeat, the number of training-test splits evaluated.

Value

List object with kfold*nrepeat elements containing the sample indices of the left-out samples per split.

Examples

data(dataXXmirmeth)
resp <- dataXXmirmeth[[1]]
leftout <- CVfolds(Y=resp,kfold=10,nrepeat=3,fixedfolds = TRUE)
data(dataXXmirmeth)
resp <- dataXXmirmeth[[1]]
leftout <- CVfolds(Y=resp,kfold=10,nrepeat=3,fixedfolds = TRUE)

Cross-validated score

Description

Cross-validated score for given penalty parameters.

Usage

CVscore(penalties, XXblocks, Y, X1 = NULL, pairing = NULL, folds, intercept =
ifelse(is(Y, "Surv"),FALSE, TRUE), frac1 = NULL, score = "loglik", model =
NULL, eps = 1e-07, maxItr = 100, trace = FALSE,   printCV = TRUE, save = FALSE,
parallel = FALSE)
CVscore(penalties, XXblocks, Y, X1 = NULL, pairing = NULL, folds, intercept =
ifelse(is(Y, "Surv"),FALSE, TRUE), frac1 = NULL, score = "loglik", model =
NULL, eps = 1e-07, maxItr = 100, trace = FALSE,   printCV = TRUE, save = FALSE,
parallel = FALSE)

Arguments

`penalties`	Numeric vector.
`XXblocks`	List of `nxn` matrices. Usually output of `createXXblocks`.
`Y`	Response vector: numeric, binary, factor or `survival`.
`X1`	Matrix. Dimension `n x p_0, p_0 < n`, representing unpenalized covariates
`pairing`	Numerical vector of length 3 or `NULL` when pairs are absent. Represents the indices (in `XXblocks`) of the two data blocks involved in pairing, plus the index of the paired block.
`folds`	List of integer vector. Usually output of `CVfolds`.
`intercept`	Boolean. Should an intercept be included?
`frac1`	Scalar. Prior fraction of cases. Only relevant for `model=" logistic"`.
`score`	Character. See Details.
`model`	Character. Any of `c("linear", "logistic", "cox")`. Is inferred from `Y` when `NULL`.
`eps`	Scalar. Numerical bound for IWLS convergence.
`maxItr`	Integer. Maximum number of iterations used in IWLS.
`trace`	Boolean. Should the output of the IWLS algorithm be traced?
`printCV`	Boolean. Should the CV-score be printed on screen?
`save`	Boolean. If TRUE appends the penalties and resulting CVscore to global variable `allscores`
`parallel`	Boolean. Should computation be done in parallel? If `TRUE`, requires to run `setupParallel` first.

Details

See Scoring for details on score.

Value

Numeric, cross-validated prediction score for given penalties

Examples

data(dataXXmirmeth)
resp <- dataXXmirmeth[[1]]
XXmirmeth <- dataXXmirmeth[[2]]

# Find initial lambdas: fast CV per data block separately.
cvperblock2 <- fastCV2(XXblocks=XXmirmeth,Y=resp,kfold=10,fixedfolds = TRUE)
lambdas <- cvperblock2$lambdas

# Create training-test splits
leftout <- CVfolds(Y=resp,kfold=10,nrepeat=3,fixedfolds = TRUE)
CVscore(penalties=lambdas, XXblocks=XXmirmeth,Y=resp,folds=leftout,score="loglik")
data(dataXXmirmeth)
resp <- dataXXmirmeth[[1]]
XXmirmeth <- dataXXmirmeth[[2]]

# Find initial lambdas: fast CV per data block separately.
cvperblock2 <- fastCV2(XXblocks=XXmirmeth,Y=resp,kfold=10,fixedfolds = TRUE)
lambdas <- cvperblock2$lambdas

# Create training-test splits
leftout <- CVfolds(Y=resp,kfold=10,nrepeat=3,fixedfolds = TRUE)
CVscore(penalties=lambdas, XXblocks=XXmirmeth,Y=resp,folds=leftout,score="loglik")

Contains R-object `dataXXmirmeth`

Description

This list object contains the binary response (control/case) and two data blocks corresponding to miRNA and methylation data

Usage

data(dataXXmirmeth)data(dataXXmirmeth)

Format

The format is a list with two components: resp: numeric (0/1) [1:43]\ XXmirmeth: list with 2 components, each a matrix [1:43,1:43]\

Details

The object XXmirmeth is created by applying createXXblocks(list(datamir,datameth)), where objects datamir and datameth are large data matrices stored in the mirmethdata.Rdata file, which is available from the link below.

Source

Snoek, B. C. et al. (2019), Genome-wide microRNA analysis of HPV-positive self-samples yields novel triage markers for early detection of cervical cancer, International Journal of Cancer 144(2), 372-379.

Verlaat, W. et al. (2018), Identification and validation of a 3-gene methylation classifier for hpv-based cervical screening on self-samples, Clinical Cancer Research 24(14), 3456-3464.

References

Mark A. van de Wiel, Mirrelijn van Nee, Armin Rauschenberger (2021). Fast cross-validation for multi-penalty high-dimensional ridge regression. J Comp Graph Stat

Examples

data(dataXXmirmeth)
resp <- dataXXmirmeth[[1]]
XXmirmeth <- dataXXmirmeth[[2]]
data(dataXXmirmeth)
resp <- dataXXmirmeth[[1]]
XXmirmeth <- dataXXmirmeth[[2]]

Double cross-validation for estimating performance of `multiridge`

Description

Double cross-validation for estimating performance of multiridge. Outer fold is for testing, inner fold for penalty parameter tuning

Usage

doubleCV(penaltiesinit, XXblocks, Y, X1 = NULL, pairing = NULL, outfold = 5,
  infold = 10, nrepeatout =   1, nrepeatin = 1, balance = TRUE, fixedfolds =
  TRUE, intercept = ifelse(is(Y, "Surv"), FALSE,     TRUE), frac1 = NULL,
  score = "loglik",model = NULL, eps = 1e-07, maxItr = 10, trace = FALSE,
  printCV   = TRUE, reltol = 1e-04, optmethod1 = "SANN", optmethod2 =
  ifelse(length(penaltiesinit) == 1, "Brent", "Nelder-Mead"), maxItropt1 = 10,
  maxItropt2 = 25, save = FALSE, parallel = FALSE, pref = NULL, fixedpen = NULL)
doubleCV(penaltiesinit, XXblocks, Y, X1 = NULL, pairing = NULL, outfold = 5,
  infold = 10, nrepeatout =   1, nrepeatin = 1, balance = TRUE, fixedfolds =
  TRUE, intercept = ifelse(is(Y, "Surv"), FALSE,     TRUE), frac1 = NULL,
  score = "loglik",model = NULL, eps = 1e-07, maxItr = 10, trace = FALSE,
  printCV   = TRUE, reltol = 1e-04, optmethod1 = "SANN", optmethod2 =
  ifelse(length(penaltiesinit) == 1, "Brent", "Nelder-Mead"), maxItropt1 = 10,
  maxItropt2 = 25, save = FALSE, parallel = FALSE, pref = NULL, fixedpen = NULL)

Arguments

`penaltiesinit`	Numeric vector. Initial values for penaltyparameters. May be obtained from `fastCV2`.
`XXblocks`	List of `nxn` matrices. Usually output of `createXXblocks`.
`Y`	Response vector: numeric, binary, factor or `survival`.
`X1`	Matrix. Dimension `n x p_0, p_0 < n`, representing unpenalized covariates
`pairing`	Numerical vector of length 3 or `NULL` when pairs are absent. Represents the indices (in `XXblocks`) of the two data blocks involved in pairing, plus the index of the paired block.
`outfold`	Integer. Outer fold for test samples.
`infold`	Integer. Inner fold for tuning penalty parameters.
`nrepeatout`	Integer. Number of repeated splits for outer fold.
`nrepeatin`	Integer. Number of repeated splits for inner fold.
`balance`	Boolean. Should the splits be balanced in terms of response labels?
`fixedfolds`	Boolean. Should fixed splits be used for reproducibility?
`intercept`	Boolean. Should an intercept be included?
`frac1`	Scalar. Prior fraction of cases. Only relevant for `model=" logistic"`.
`score`	Character. See Details.
`model`	Character. Any of `c("linear", "logistic", "cox")`. Is inferred from `Y` when `NULL`.
`eps`	Scalar. Numerical bound for IWLS convergence.
`maxItr`	Integer. Maximum number of iterations used in IWLS.
`trace`	Boolean. Should the output of the IWLS algorithm be traced?
`printCV`	Boolean. Should the CV-score be printed on screen?
`reltol`	Scalar. Relative tolerance for optimization methods.
`optmethod1`	Character. First, global search method. Any of the methods `c("Brent", "Nelder-Mead", "Sann")` may be used, but simulated annealing by `"Sann"` is recommended to search a wide landscape. Other unconstrained methods offered by `optim` may also be used, but have not been tested.
`optmethod2`	Character. Second, local search method. Any of the methods `c("Brent", "Nelder-Mead", "Sann")` may be used, but `"Nelder-Mead"` is generally recommended. Other unconstrained methods offered by `optim` may also be used, but have not been tested.
`maxItropt1`	Integer. Maximum number of iterations for `optmethod1`.
`maxItropt2`	Integer. Maximum number of iterations for `optmethod2`.
`save`	Boolean. If TRUE appends the penalties and resulting CVscore to global variable `allscores`
`parallel`	Boolean. Should computation be done in parallel? If `TRUE`, requires to run `setupParallel` first.
`pref`	Integer vector or `NULL`. Contains indices of data types in `XXblocks` that are preferential.
`fixedpen`	Integer vector or `NULL`. Contains indices of data types of which penalty is fixed to the corresponding value in `penaltiesinit`.

Details

WARNING: this function may be very time-consuming. The number of evaluations may equal nrepeatout*outerfold*nrepeatin*innerfold*maxItr*(maxItropt1+maxItropt2). Computing time may be estimated by multiplying computing time of optLambdasWrap by nrepeatout*outerfold. See Scoring for details on score.

Value

List with the following components:

`sampleindex`	Numerical vector: sample indices
`true`	True responses
`linpred`	Cross-validated linear predictors

Examples

data(dataXXmirmeth)
resp <- dataXXmirmeth[[1]]
XXmirmeth <- dataXXmirmeth[[2]]

# Find initial lambdas: fast CV per data block separately.
cvperblock2 <- fastCV2(XXblocks=XXmirmeth,Y=resp,kfold=10,fixedfolds = TRUE)
lambdas <- cvperblock2$lambdas

# Double cross-validation
## Not run: 
perf <- doubleCV(penaltiesinit=lambdas,XXblocks=XXmirmeth,Y=resp,
score="loglik",outfold=10, infold=10, nrepeatout=1, nrepeatin=3, parallel=TRUE)

# Performance metrics
Scoring(perf$linpred,perf$true,score="auc",print=TRUE)
Scoring(perf$linpred,perf$true,score="brier",print=TRUE)
Scoring(perf$linpred,perf$true,score="loglik",print=TRUE)

## End(Not run)
data(dataXXmirmeth)
resp <- dataXXmirmeth[[1]]
XXmirmeth <- dataXXmirmeth[[2]]

# Find initial lambdas: fast CV per data block separately.
cvperblock2 <- fastCV2(XXblocks=XXmirmeth,Y=resp,kfold=10,fixedfolds = TRUE)
lambdas <- cvperblock2$lambdas

# Double cross-validation
## Not run: 
perf <- doubleCV(penaltiesinit=lambdas,XXblocks=XXmirmeth,Y=resp,
score="loglik",outfold=10, infold=10, nrepeatout=1, nrepeatin=3, parallel=TRUE)

# Performance metrics
Scoring(perf$linpred,perf$true,score="auc",print=TRUE)
Scoring(perf$linpred,perf$true,score="brier",print=TRUE)
Scoring(perf$linpred,perf$true,score="loglik",print=TRUE)

## End(Not run)

Fast cross-validation per data block

Description

Fast cross-validation for high-dimensional data. Finds optimal penalties separately per data block. Useful for initialization.

Usage

fastCV2(XXblocks, Y, X1 = NULL, kfold = 10, intercept =
ifelse(is(Y, "Surv"), FALSE, TRUE), parallel = FALSE, fixedfolds = TRUE,
model = NULL, eps = 1e-10, reltol = 0.5, lambdamax= 10^6, traceCV=TRUE)
fastCV2(XXblocks, Y, X1 = NULL, kfold = 10, intercept =
ifelse(is(Y, "Surv"), FALSE, TRUE), parallel = FALSE, fixedfolds = TRUE,
model = NULL, eps = 1e-10, reltol = 0.5, lambdamax= 10^6, traceCV=TRUE)

Arguments

`XXblocks`	List of data frames or matrices, representing `b=1,...,B` data blocks of dimensions `n x p_b`.
`Y`	Response vector: numeric, binary, factor or `survival`.
`X1`	Matrix. Dimension `n x p_0, p_0 < n`, representing unpenalized covariates.
`kfold`	Integer. Desired fold.
`intercept`	Boolean. Should an intercept be included?
`parallel`	Boolean. Should computation be done in parallel? If `TRUE`, requires to run `setupParallel` first.
`fixedfolds`	Boolean. Should fixed splits be used for reproducibility?
`model`	Character. Any of `c("linear", "logistic", "cox")`. Is inferred from `Y` when `NULL`.
`eps`	Scalar. Numerical bound for IWLS convergence.
`reltol`	Scalar. Relative tolerance for optimization method.
`lambdamax`	Numeric. Upperbound for lambda.
`traceCV`	Boolean. Should the CV results be traced and printed?

Details

This function is basically a wrapper for applying optLambdas per data block separately using Brent optimization.

Value

Numerical vector containing penalties optimized separately per data block. Useful for initialization.

Examples

data(dataXXmirmeth)
resp <- dataXXmirmeth[[1]]
XXmirmeth <- dataXXmirmeth[[2]]

cvperblock2 <- fastCV2(XXblocks=XXmirmeth,Y=resp,kfold=10,fixedfolds = TRUE)
lambdas <- cvperblock2$lambdas
data(dataXXmirmeth)
resp <- dataXXmirmeth[[1]]
XXmirmeth <- dataXXmirmeth[[2]]

cvperblock2 <- fastCV2(XXblocks=XXmirmeth,Y=resp,kfold=10,fixedfolds = TRUE)
lambdas <- cvperblock2$lambdas

Iterative weighted least squares algorithm for Cox ridge regression.

Description

Iterative weighted least squares algorithm for Cox ridge regression. Updates the weights and linear predictors until convergence.

Usage

IWLSCoxridge(XXT, Y, X1 = NULL, intercept = FALSE, eps = 1e-07, maxItr = 25,
trace = FALSE, E0 = NULL)
IWLSCoxridge(XXT, Y, X1 = NULL, intercept = FALSE, eps = 1e-07, maxItr = 25,
trace = FALSE, E0 = NULL)

Arguments

`XXT`	Matrix. Dimensions `nxn`. Sample cross-product from penalized variables, usually computed by `SigmaFromBlocks`.
`Y`	Response vector: class `survival`.
`X1`	Matrix. Dimension `n x p_0, p_0 < n`, representing unpenalized covariates.
`intercept`	Boolean. Should an intercept be included?
`eps`	Scalar. Numerical bound for IWLS convergence.
`maxItr`	Integer. Maximum number of iterations used in IWLS.
`trace`	Boolean. Should the output of the IWLS algorithm be traced?
`E0`	Numerical vector or `NULL`. Optional initial values for linear predictor. Same length as `Y`. Usually `NULL`, which initializes linear predictor with 0.

Details

Usually, Cox ridge regression does not use an intercept, as this is part of the baseline hazard. The latter is estimated using the Breslow estimator. To keep the function computationally efficient it returns the linear predictors (which suffice for predictions), instead of parameter estimates. These may be obtained by applying the betasout function to the output of this function.

Value

List, containing:

`etas`	Numerical vector: Final linear predictors
`Ypred`	Predicted survival
`convergence`	Boolean: has IWLS converged?
`nIt`	Number of iterations
`Hres`	Auxiliary list object. Passed on to other functions
`linearized`	Linearized predictions
`unpen`	Boolean: are there any unpenalized covariates involved? Passed on to other functions
`intercept`	Boolean: Is an intercept included?
`eta0`	Numerical vector: Initial linear predictors
`X1`	Matrix: design matrix unpenalized variables

References

Mark A. van de Wiel, Mirrelijn van Nee, Armin Rauschenberger (2021). Fast cross-validation for high-dimensional ridge regression. J Comp Graph Stat

Examples

data(dataXXmirmeth)
resp <- dataXXmirmeth[[1]]
XXmirmeth <- dataXXmirmeth[[2]]
lambdas <- c(100,1000)

# Create fake survival data
respsurv <- Surv(rexp(length(resp)),resp)

# Prepare fitting for the specified penalties.
XXT <- SigmaFromBlocks(XXmirmeth,penalties=lambdas)

# Fit. fit$etas contains the n linear predictors
fit <- IWLSCoxridge(XXT,Y=respsurv)
data(dataXXmirmeth)
resp <- dataXXmirmeth[[1]]
XXmirmeth <- dataXXmirmeth[[2]]
lambdas <- c(100,1000)

# Create fake survival data
respsurv <- Surv(rexp(length(resp)),resp)

# Prepare fitting for the specified penalties.
XXT <- SigmaFromBlocks(XXmirmeth,penalties=lambdas)

# Fit. fit$etas contains the n linear predictors
fit <- IWLSCoxridge(XXT,Y=respsurv)

Iterative weighted least squares algorithm for linear and logistic ridge regression.

Description

Iterative weighted least squares algorithm for linear and logistic ridge regression. Updates the weights and linear predictors until convergence.

Usage

IWLSridge(XXT, Y, X1 = NULL, intercept = TRUE, frac1 = NULL, eps = 1e-07,
maxItr = 25, trace = FALSE, model = NULL, E0 = NULL)
IWLSridge(XXT, Y, X1 = NULL, intercept = TRUE, frac1 = NULL, eps = 1e-07,
maxItr = 25, trace = FALSE, model = NULL, E0 = NULL)

Arguments

`XXT`	Matrix. Dimensions `nxn`. Sample cross-product from penalized variables, usually computed by `SigmaFromBlocks`.
`Y`	Response vector: numeric, binary, or two-class factor
`X1`	Matrix. Dimension `n x p_0, p_0 < n`, representing unpenalized covariates.
`intercept`	Boolean. Should an intercept be included?
`frac1`	Scalar. Prior fraction of cases. Only relevant for `model="logistic"`.
`eps`	Scalar. Numerical bound for IWLS convergence.
`maxItr`	Integer. Maximum number of iterations used in IWLS.
`trace`	Boolean. Should the output of the IWLS algorithm be traced?
`model`	Character. Any of `c("linear", "logistic")`. Is inferred from `Y` when `NULL`. Note that the cox model for survival is covered by the function `IWLSCoxridge`.
`E0`	Numerical vector or `NULL`. Optional initial values for linear predictor. Same length as `Y`. Usually `NULL`, which initializes linear predictor with 0.

Details

An (unpenalized) intercept is included by default. To keep the function computationally efficient it returns the linear predictors (which suffice for predictions), instead of parameter estimates. These may be obtained by applying the betasout function to the output of this function.

Value

List, containing:

`etas`	Numerical vector: Final linear predictors
`Ypred`	Predicted survival
`convergence`	Boolean: has IWLS converged?
`nIt`	Number of iterations
`Hres`	Auxiliary list object. Passed on to other functions
`linearized`	Linearized predictions
`unpen`	Boolean: are there any unpenalized covariates involved? Passed on to other functions
`intercept`	Boolean: Is an intercept included?
`eta0`	Numerical vector: Initial linear predictors
`X1`	Matrix: design matrix unpenalized variables

References

Mark A. van de Wiel, Mirrelijn van Nee, Armin Rauschenberger (2021). Fast cross-validation for high-dimensional ridge regression. J Comp Graph Stat

Examples

data(dataXXmirmeth)
resp <- dataXXmirmeth[[1]]
XXmirmeth <- dataXXmirmeth[[2]]
lambdas <- c(100,1000)

# Prepare fitting for the specified penalties.
XXT <- SigmaFromBlocks(XXmirmeth,penalties=lambdas)

# Fit. fit$etas contains the n linear predictors
fit <- IWLSridge(XXT,Y=resp)
data(dataXXmirmeth)
resp <- dataXXmirmeth[[1]]
XXmirmeth <- dataXXmirmeth[[2]]
lambdas <- c(100,1000)

# Prepare fitting for the specified penalties.
XXT <- SigmaFromBlocks(XXmirmeth,penalties=lambdas)

# Fit. fit$etas contains the n linear predictors
fit <- IWLSridge(XXT,Y=resp)

Maximum marginal likelihood score

Description

Computed maximum marginal likelihood score for given penalty parameters using mgcv.

Usage

mgcv_lambda(penalties, XXblocks,Y, model=NULL, printscore=TRUE, pairing=NULL, sigmasq = 1,
  opt.sigma=ifelse(model=="linear",TRUE, FALSE))
mgcv_lambda(penalties, XXblocks,Y, model=NULL, printscore=TRUE, pairing=NULL, sigmasq = 1,
  opt.sigma=ifelse(model=="linear",TRUE, FALSE))

Arguments

`penalties`	Numeric vector.
`XXblocks`	List of `nxn` matrices. Usually output of `createXXblocks`.
`Y`	Response vector: numeric, binary, factor or `survival`.
`model`	Character. Any of `c("linear", "logistic", "cox")`. Is inferred from `Y` when `NULL`.
`printscore`	Boolean. Should the score be printed?
`pairing`	Numerical vector of length 3 or `NULL` when pairs are absent. Represents the indices (in `XXblocks`) of the two data blocks involved in pairing, plus the index of the paired block.
`sigmasq`	Default error variance.
`opt.sigma`	Boolean. Should the error variance be optimized as well? Only relevant for `model="linear"`.

Details

See gam for details on how the marginal likelihood is computed.

Value

Numeric, marginal likelihood score for given penalties

References

Wood, S. N. (2011), Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models, J. Roy. Statist. Soc., B 73(1), 3-36.

Outer-loop cross-validation for estimating performance of marginal likelihood based `multiridge`

Description

Outer-loop cross-validation for estimating performance of marginal likelihood based multiridge. Outer fold is for testing; penalty parameter tuning is performed by marginal likelihood estimation

Usage

mlikCV(penaltiesinit, XXblocks, Y, pairing = NULL, outfold = 5, nrepeatout = 1,
balance = TRUE,fixedfolds = TRUE,  model = NULL, intercept =
ifelse(is(Y, "Surv"), FALSE, TRUE), reltol = 1e-04, trace = FALSE, optmethod1 = "SANN",
optmethod2 = ifelse(length(penaltiesinit) == 1, "Brent", "Nelder-Mead"),
maxItropt1 = 10, maxItropt2 = 25, parallel = FALSE, pref = NULL,
fixedpen = NULL, sigmasq = 1, opt.sigma=ifelse(model=="linear",TRUE, FALSE))
mlikCV(penaltiesinit, XXblocks, Y, pairing = NULL, outfold = 5, nrepeatout = 1,
balance = TRUE,fixedfolds = TRUE,  model = NULL, intercept =
ifelse(is(Y, "Surv"), FALSE, TRUE), reltol = 1e-04, trace = FALSE, optmethod1 = "SANN",
optmethod2 = ifelse(length(penaltiesinit) == 1, "Brent", "Nelder-Mead"),
maxItropt1 = 10, maxItropt2 = 25, parallel = FALSE, pref = NULL,
fixedpen = NULL, sigmasq = 1, opt.sigma=ifelse(model=="linear",TRUE, FALSE))

Arguments

`penaltiesinit`	Numeric vector. Initial values for penaltyparameters. May be obtained from `fastCV2`.
`XXblocks`	List of `nxn` matrices. Usually output of `createXXblocks`.
`Y`	Response vector: numeric, binary, factor or `survival`.
`pairing`	Numerical vector of length 3 or `NULL` when pairs are absent. Represents the indices (in `XXblocks`) of the two data blocks involved in pairing, plus the index of the paired block.
`outfold`	Integer. Outer fold for test samples.
`nrepeatout`	Integer. Number of repeated splits for outer fold.
`balance`	Boolean. Should the splits be balanced in terms of response labels?
`fixedfolds`	Boolean. Should fixed splits be used for reproducibility?
`intercept`	Boolean. Should an intercept be included?
`model`	Character. Any of `c("linear", "logistic", "cox")`. Is inferred from `Y` when `NULL`.
`trace`	Boolean. Should the output of the IWLS algorithm be traced?
`reltol`	Scalar. Relative tolerance for optimization methods.
`optmethod1`	Character. First, global search method. Any of the methods `c("Brent", "Nelder-Mead", "Sann")` may be used, but simulated annealing by `"Sann"` is recommended to search a wide landscape. Other unconstrained methods offered by `optim` may also be used, but have not been tested.
`optmethod2`	Character. Second, local search method. Any of the methods `c("Brent", "Nelder-Mead", "Sann")` may be used, but `"Nelder-Mead"` is generally recommended. Other unconstrained methods offered by `optim` may also be used, but have not been tested.
`maxItropt1`	Integer. Maximum number of iterations for `optmethod1`.
`maxItropt2`	Integer. Maximum number of iterations for `optmethod2`.
`parallel`	Boolean. Should computation be done in parallel? If `TRUE`, requires to run `setupParallel` first.
`pref`	Integer vector or `NULL`. Contains indices of data types in `XXblocks` that are preferential.
`fixedpen`	Integer vector or `NULL`. Contains indices of data types of which penalty is fixed to the corresponding value in `penaltiesinit`.
`sigmasq`	Default error variance.
`opt.sigma`	Boolean. Should the error variance be optimized as well? Only relevant for `model="linear"`.

Details

WARNING: this function may be very time-consuming. The number of evaluations may equal nrepeatout*outerfold*(maxItropt1+maxItropt2). Computing time may be estimated by multiplying computing time of optLambdas_mgcvWrap by nrepeatout*outerfold.

Value

List with the following components:

`sampleindex`	Numerical vector: sample indices
`true`	True responses
`linpred`	Cross-validated linear predictors

Examples

data(dataXXmirmeth)
resp <- dataXXmirmeth[[1]]
XXmirmeth <- dataXXmirmeth[[2]]

# Find initial lambdas: fast CV per data block separately.
cvperblock2 <- fastCV2(XXblocks=XXmirmeth,Y=resp,kfold=10,fixedfolds = TRUE)
lambdas <- cvperblock2$lambdas

# Outer cross-validation, inner marginal likelihood optimization
## Not run: 
perfmlik <- mlikCV(penaltiesinit=lambdas,XXblocks=XXmirmeth,Y=resp,outfold=10,
nrepeatout=1)


# Performance metrics
Scoring(perfmlik$linpred,perfmlik$true,score="auc",print=TRUE)
Scoring(perfmlik$linpred,perfmlik$true,score="brier",print=TRUE)
Scoring(perfmlik$linpred,perfmlik$true,score="loglik",print=TRUE)

## End(Not run)
data(dataXXmirmeth)
resp <- dataXXmirmeth[[1]]
XXmirmeth <- dataXXmirmeth[[2]]

# Find initial lambdas: fast CV per data block separately.
cvperblock2 <- fastCV2(XXblocks=XXmirmeth,Y=resp,kfold=10,fixedfolds = TRUE)
lambdas <- cvperblock2$lambdas

# Outer cross-validation, inner marginal likelihood optimization
## Not run: 
perfmlik <- mlikCV(penaltiesinit=lambdas,XXblocks=XXmirmeth,Y=resp,outfold=10,
nrepeatout=1)


# Performance metrics
Scoring(perfmlik$linpred,perfmlik$true,score="auc",print=TRUE)
Scoring(perfmlik$linpred,perfmlik$true,score="brier",print=TRUE)
Scoring(perfmlik$linpred,perfmlik$true,score="loglik",print=TRUE)

## End(Not run)

Find optimal ridge penalties.

Description

Optimizes a cross-validated score w.r.t. ridge penalties for multiple data blocks.

Usage

optLambdas(penaltiesinit = NULL, XXblocks, Y, X1 = NULL, pairing = NULL, folds,
  intercept = ifelse(is(Y, "Surv"), FALSE, TRUE), frac1 = NULL, score = "loglik",
  model = NULL, epsIWLS = 0.001, maxItrIWLS = 25, traceCV = TRUE, reltol = 1e-04,
  optmethod = ifelse(length(penaltiesinit) == 1, "Brent", "Nelder-Mead"), maxItropt = 500,
  save = FALSE, parallel = FALSE, fixedpen = NULL, fixedseed = TRUE)
optLambdas(penaltiesinit = NULL, XXblocks, Y, X1 = NULL, pairing = NULL, folds,
  intercept = ifelse(is(Y, "Surv"), FALSE, TRUE), frac1 = NULL, score = "loglik",
  model = NULL, epsIWLS = 0.001, maxItrIWLS = 25, traceCV = TRUE, reltol = 1e-04,
  optmethod = ifelse(length(penaltiesinit) == 1, "Brent", "Nelder-Mead"), maxItropt = 500,
  save = FALSE, parallel = FALSE, fixedpen = NULL, fixedseed = TRUE)

Arguments

`penaltiesinit`	Numeric vector. Initial values for penaltyparameters. May be obtained from `fastCV2`.
`XXblocks`	List of `nxn` matrices. Usually output of `createXXblocks`.
`Y`	Response vector: numeric, binary, factor or `survival`.
`X1`	Matrix. Dimension `n x p_0, p_0 < n`, representing unpenalized covariates
`pairing`	Numerical vector of length 3 or `NULL` when pairs are absent. Represents the indices (in `XXblocks`) of the two data blocks involved in pairing, plus the index of the paired block.
`folds`	List, containing the splits of the samples. Usually obtained by `CVfolds`
`intercept`	Boolean. Should an intercept be included?
`frac1`	Scalar. Prior fraction of cases. Only relevant for `model=" logistic"`.
`score`	Character. See Details.
`model`	Character. Any of `c("linear", "logistic", "cox")`. Is inferred from `Y` when `NULL`.
`epsIWLS`	Scalar. Numerical bound for IWLS convergence.
`maxItrIWLS`	Integer. Maximum number of iterations used in IWLS.
`traceCV`	Boolean. Should the output of the IWLS algorithm be traced?
`reltol`	Scalar. Relative tolerance for optimization methods.
`optmethod`	Character. Optimization method. Any of the methods `c("Brent", "Nelder-Mead", "Sann")` may be used, but `"Nelder-Mead"` is generally recommended. Other unconstrained methods offered by `optim` may also be used, but have not been tested.
`maxItropt`	Integer. Maximum number of iterations for `optmethod`.
`save`	Boolean. If TRUE appends the penalties and resulting CVscore to global variable `allscores`
`parallel`	Boolean. Should computation be done in parallel? If `TRUE`, requires to run `setupParallel` first.
`fixedpen`	Integer vector or `NULL`. Contains indices of data types of which penalty is fixed to the corresponding value in `penaltiesinit`.
`fixedseed`	Boolean. Should the initialization be fixed? For reproducibility.

Details

See Scoring for details on score. We highly recommend to use smooth scoring functions, in particular "loglik". For ranking-based criteria like auc and cindex we advise to use repeated CV (see CVfolds) to avoid ending up in any of the many local optima.

Value

List, with components:

`optres`	Output of the optimizer
`optpen`	Vector with determined optimal penalties
`allsc`	Matrix with CV scores for all penalty parameter configurations used by the optimizer

Examples

data(dataXXmirmeth)
resp <- dataXXmirmeth[[1]]
XXmirmeth <- dataXXmirmeth[[2]]

# Find initial lambdas: fast CV per data block separately.
cvperblock2 <- fastCV2(XXblocks=XXmirmeth,Y=resp,kfold=10,fixedfolds = TRUE)
lambdas <- cvperblock2$lambdas

# Create (repeated) CV-splits of the data.
leftout <- CVfolds(Y=resp,kfold=10,nrepeat=3,fixedfolds = TRUE)

# One-pass optimization
# Increase the number of iterations for optimal results
jointlambdas <- optLambdas(penaltiesinit=lambdas, XXblocks=XXmirmeth,Y=resp,
folds=leftout,score="loglik",save=T,maxItropt=5)
data(dataXXmirmeth)
resp <- dataXXmirmeth[[1]]
XXmirmeth <- dataXXmirmeth[[2]]

# Find initial lambdas: fast CV per data block separately.
cvperblock2 <- fastCV2(XXblocks=XXmirmeth,Y=resp,kfold=10,fixedfolds = TRUE)
lambdas <- cvperblock2$lambdas

# Create (repeated) CV-splits of the data.
leftout <- CVfolds(Y=resp,kfold=10,nrepeat=3,fixedfolds = TRUE)

# One-pass optimization
# Increase the number of iterations for optimal results
jointlambdas <- optLambdas(penaltiesinit=lambdas, XXblocks=XXmirmeth,Y=resp,
folds=leftout,score="loglik",save=T,maxItropt=5)

Find optimal ridge penalties with maximimum marginal likelihood

Description

Optimizes a marginal likelihood score w.r.t. ridge penalties for multiple data blocks.

Usage

optLambdas_mgcv(penaltiesinit=NULL, XXblocks,Y, pairing=NULL, model=NULL, reltol=1e-4,
  optmethod=ifelse(length(penaltiesinit)==1,"Brent", "Nelder-Mead"),maxItropt=500,
  tracescore=TRUE, fixedpen=NULL, fixedseed =TRUE, sigmasq = 1,
  opt.sigma=ifelse(model=="linear",TRUE, FALSE))
optLambdas_mgcv(penaltiesinit=NULL, XXblocks,Y, pairing=NULL, model=NULL, reltol=1e-4,
  optmethod=ifelse(length(penaltiesinit)==1,"Brent", "Nelder-Mead"),maxItropt=500,
  tracescore=TRUE, fixedpen=NULL, fixedseed =TRUE, sigmasq = 1,
  opt.sigma=ifelse(model=="linear",TRUE, FALSE))

Arguments

`penaltiesinit`	Numeric vector. Initial values for penaltyparameters. May be obtained from `fastCV2`.
`XXblocks`	List of `nxn` matrices. Usually output of `createXXblocks`.
`Y`	Response vector: numeric, binary, factor or `survival`.
`pairing`	Numerical vector of length 3 or `NULL` when pairs are absent. Represents the indices (in `XXblocks`) of the two data blocks involved in pairing, plus the index of the paired block.
`model`	Character. Any of `c("linear", "logistic", "cox")`. Is inferred from `Y` when `NULL`.
`reltol`	Scalar. Relative tolerance for optimization methods.
`optmethod`	Character. Optimization method. Any of the methods `c("Brent", "Nelder-Mead", "Sann")` may be used, but `"Nelder-Mead"` is generally recommended. Other unconstrained methods offered by `optim` may also be used, but have not been tested.
`maxItropt`	Integer. Maximum number of iterations for `optmethod`.
`tracescore`	Boolean. Should the output of the scores be traced?
`fixedpen`	Integer vector or `NULL`. Contains indices of data types of which penalty is fixed to the corresponding value in `penaltiesinit`.
`fixedseed`	Boolean. Should the initialization be fixed? For reproducibility.
`sigmasq`	Default error variance.
`opt.sigma`	Boolean. Should the error variance be optimized as well? Only relevant for `model="linear"`.

Details

See gam for details on how the marginal likelihood is computed.

Value

List, with components:

`optres`	Output of the optimizer
`optpen`	Vector with determined optimal penalties
`allsc`	Matrix with marginal likelihood scores for all penalty parameter configurations used by the optimizer

Examples

data(dataXXmirmeth)
resp <- dataXXmirmeth[[1]]
XXmirmeth <- dataXXmirmeth[[2]]

# Find initial lambdas: fast CV per data block separately.
cvperblock2 <- fastCV2(XXblocks=XXmirmeth,Y=resp,kfold=10,fixedfolds = TRUE)
lambdas <- cvperblock2$lambdas

# Create (repeated) CV-splits of the data.
leftout <- CVfolds(Y=resp,kfold=10,nrepeat=3,fixedfolds = TRUE)

# Compute cross-validated score for initial lambdas
CVscore(penalties=lambdas, XXblocks=XXmirmeth,Y=resp,folds=leftout,
score="loglik")

# Optimize by using marginal likelihood criterion
jointlambdas2 <- optLambdas_mgcvWrap(penaltiesinit=lambdas, XXblocks=XXmirmeth,
Y=resp)

# Optimal lambdas
optlambdas <- jointlambdas2$optpen

data(dataXXmirmeth)
resp <- dataXXmirmeth[[1]]
XXmirmeth <- dataXXmirmeth[[2]]

# Find initial lambdas: fast CV per data block separately.
cvperblock2 <- fastCV2(XXblocks=XXmirmeth,Y=resp,kfold=10,fixedfolds = TRUE)
lambdas <- cvperblock2$lambdas

# Create (repeated) CV-splits of the data.
leftout <- CVfolds(Y=resp,kfold=10,nrepeat=3,fixedfolds = TRUE)

# Compute cross-validated score for initial lambdas
CVscore(penalties=lambdas, XXblocks=XXmirmeth,Y=resp,folds=leftout,
score="loglik")

# Optimize by using marginal likelihood criterion
jointlambdas2 <- optLambdas_mgcvWrap(penaltiesinit=lambdas, XXblocks=XXmirmeth,
Y=resp)

# Optimal lambdas
optlambdas <- jointlambdas2$optpen

Find optimal ridge penalties with sequential optimization.

Description

Sequentially optimizes a marginal likelihood score w.r.t. ridge penalties for multiple data blocks.

Usage

optLambdas_mgcvWrap(penaltiesinit=NULL, XXblocks,Y, pairing=NULL, model=NULL, reltol=1e-4,
  optmethod1= "SANN", optmethod2 =ifelse(length(penaltiesinit)==1,"Brent", "Nelder-Mead"),
  maxItropt1=10,maxItropt2=25,tracescore=TRUE,fixedseed =TRUE, pref=NULL, fixedpen=NULL,
  sigmasq = 1, opt.sigma=ifelse(model=="linear",TRUE, FALSE))
optLambdas_mgcvWrap(penaltiesinit=NULL, XXblocks,Y, pairing=NULL, model=NULL, reltol=1e-4,
  optmethod1= "SANN", optmethod2 =ifelse(length(penaltiesinit)==1,"Brent", "Nelder-Mead"),
  maxItropt1=10,maxItropt2=25,tracescore=TRUE,fixedseed =TRUE, pref=NULL, fixedpen=NULL,
  sigmasq = 1, opt.sigma=ifelse(model=="linear",TRUE, FALSE))

Arguments

`penaltiesinit`	Numeric vector. Initial values for penaltyparameters. May be obtained from `fastCV2`.
`XXblocks`	List of `nxn` matrices. Usually output of `createXXblocks`.
`Y`	Response vector: numeric, binary, factor or `survival`.
`pairing`	Numerical vector of length 3 or `NULL` when pairs are absent. Represents the indices (in `XXblocks`) of the two data blocks involved in pairing, plus the index of the paired block.
`model`	Character. Any of `c("linear", "logistic", "cox")`. Is inferred from `Y` when `NULL`.
`reltol`	Scalar. Relative tolerance for optimization methods.
`optmethod1`	Character. First, global search method. Any of the methods `c("Brent", "Nelder-Mead", "Sann")` may be used, but simulated annealing by `"Sann"` is recommended to search a wide landscape. Other unconstrained methods offered by `optim` may also be used, but have not been tested.
`optmethod2`	Character. Second, local search method. Any of the methods `c("Brent", "Nelder-Mead", "Sann")` may be used, but `"Nelder-Mead"` is generally recommended. Other unconstrained methods offered by `optim` may also be used, but have not been tested.
`maxItropt1`	Integer. Maximum number of iterations for `optmethod1`.
`maxItropt2`	Integer. Maximum number of iterations for `optmethod2`.
`tracescore`	Boolean. Should the output of the scores be traced?
`fixedseed`	Boolean. Should the initialization be fixed? For reproducibility.
`pref`	Integer vector or `NULL`. Contains indices of data types in `XXblocks` that are preferential.
`fixedpen`	Integer vector or `NULL`. Contains indices of data types of which penalty is fixed to the corresponding value in `penaltiesinit`.
`sigmasq`	Default error variance.
`opt.sigma`	Boolean. Should the error variance be optimized as well? Only relevant for `model="linear"`.

Details

As opposed to optLambdas_mgcv this function first searches globally, then locally. Hence, more time-consuming, but better guarded against multiple local optima. See gam for details on how the marginal likelihood is computed.

Value

List, with components:

`res`	Outputs of all optimizers used
`lambdas`	List of penalties found by the optimizers
`optpen`	Numerical vector with final, optimal penalties

References

Wood, S. N. (2011), Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models, J. Roy. Statist. Soc., B 73(1), 3-36.

Find optimal ridge penalties with sequential optimization.

Description

Sequentially optimizes a cross-validated score w.r.t. ridge penalties for multiple data blocks. Also implements preferential ridge, which allows to first optimize for the preferential data types.

Usage

optLambdasWrap(penaltiesinit = NULL, XXblocks, Y, X1 = NULL, pairing = NULL,
  folds, intercept = ifelse(is(Y, "Surv"), FALSE, TRUE), frac1 = NULL,
  score = "loglik", model = NULL, epsIWLS = 0.001, maxItrIWLS = 25,
  traceCV = TRUE, reltol = 1e-04, optmethod1 = "SANN", optmethod2 =
  ifelse(length(penaltiesinit) == 1, "Brent", "Nelder-Mead"), maxItropt1 = 10,
  maxItropt2 = 25, save = FALSE, parallel = FALSE, pref = NULL, fixedpen = NULL)
optLambdasWrap(penaltiesinit = NULL, XXblocks, Y, X1 = NULL, pairing = NULL,
  folds, intercept = ifelse(is(Y, "Surv"), FALSE, TRUE), frac1 = NULL,
  score = "loglik", model = NULL, epsIWLS = 0.001, maxItrIWLS = 25,
  traceCV = TRUE, reltol = 1e-04, optmethod1 = "SANN", optmethod2 =
  ifelse(length(penaltiesinit) == 1, "Brent", "Nelder-Mead"), maxItropt1 = 10,
  maxItropt2 = 25, save = FALSE, parallel = FALSE, pref = NULL, fixedpen = NULL)

Arguments

`penaltiesinit`	Numeric vector. Initial values for penaltyparameters. May be obtained from `fastCV2`.
`XXblocks`	List of `nxn` matrices. Usually output of `createXXblocks`.
`Y`	Response vector: numeric, binary, factor or `survival`.
`X1`	Matrix. Dimension `n x p_0, p_0 < n`, representing unpenalized covariates
`pairing`	Numerical vector of length 3 or `NULL` when pairs are absent. Represents the indices (in `XXblocks`) of the two data blocks involved in pairing, plus the index of the paired block.
`folds`	List, containing the splits of the samples. Usually obtained by `CVfolds`
`intercept`	Boolean. Should an intercept be included?
`frac1`	Scalar. Prior fraction of cases. Only relevant for `model=" logistic"`.
`score`	Character. See Details.
`model`	Character. Any of `c("linear", "logistic", "cox")`. Is inferred from `Y` when `NULL`.
`epsIWLS`	Scalar. Numerical bound for IWLS convergence.
`maxItrIWLS`	Integer. Maximum number of iterations used in IWLS.
`traceCV`	Boolean. Should the output of the IWLS algorithm be traced?
`reltol`	Scalar. Relative tolerance for optimization methods.
`optmethod1`	Character. First, global search method. Any of the methods `c("Brent", "Nelder-Mead", "Sann")` may be used, but simulated annealing by `"Sann"` is recommended to search a wide landscape. Other unconstrained methods offered by `optim` may also be used, but have not been tested.
`optmethod2`	Character. Second, local search method. Any of the methods `c("Brent", "Nelder-Mead", "Sann")` may be used, but `"Nelder-Mead"` is generally recommended. Other unconstrained methods offered by `optim` may also be used, but have not been tested.
`maxItropt1`	Integer. Maximum number of iterations for `optmethod1`.
`maxItropt2`	Integer. Maximum number of iterations for `optmethod2`.
`save`	Boolean. If TRUE appends the penalties and resulting CVscore to global variable `allscores`
`parallel`	Boolean. Should computation be done in parallel? If `TRUE`, requires to run `setupParallel` first.
`pref`	Integer vector or `NULL`. Contains indices of data types in `XXblocks` that are preferential.
`fixedpen`	Integer vector or `NULL`. Contains indices of data types of which penalty is fixed to the corresponding value in `penaltiesinit`.

Details

As opposed to optLambdas this function first searches globally, then locally. Hence, more time-consuming, but better guarded against multiple local optima.

See Scoring for details on score. We highly recommend to use smooth scoring functions, in particular "loglik". For ranking-based criteria like "auc" and "cindex" we advise to use repeated CV (see CVfolds) to avoid ending up in any of the many local optima.

Value

List, with components:

`res`	Outputs of all optimizers used
`lambdas`	List of penalties found by the optimizers
`optpen`	Numerical vector with final, optimal penalties

Examples

data(dataXXmirmeth)
resp <- dataXXmirmeth[[1]]
XXmirmeth <- dataXXmirmeth[[2]]

# Find initial lambdas: fast CV per data block separately.
cvperblock2 <- fastCV2(XXblocks=XXmirmeth,Y=resp,kfold=10,fixedfolds = TRUE)
lambdas <- cvperblock2$lambdas

# Create (repeated) CV-splits of the data.
leftout <- CVfolds(Y=resp,kfold=10,nrepeat=3,fixedfolds = TRUE)

# Optimizes cross-validate criterion (default: log-lik)
# Increase the number of iterations for optimal results
jointlambdas <- optLambdasWrap(penaltiesinit=lambdas, XXblocks=XXmirmeth,Y=resp,
folds=leftout,score="loglik",save=T,maxItropt1=5, maxItropt2=5)
data(dataXXmirmeth)
resp <- dataXXmirmeth[[1]]
XXmirmeth <- dataXXmirmeth[[2]]

# Find initial lambdas: fast CV per data block separately.
cvperblock2 <- fastCV2(XXblocks=XXmirmeth,Y=resp,kfold=10,fixedfolds = TRUE)
lambdas <- cvperblock2$lambdas

# Create (repeated) CV-splits of the data.
leftout <- CVfolds(Y=resp,kfold=10,nrepeat=3,fixedfolds = TRUE)

# Optimizes cross-validate criterion (default: log-lik)
# Increase the number of iterations for optimal results
jointlambdas <- optLambdasWrap(penaltiesinit=lambdas, XXblocks=XXmirmeth,Y=resp,
folds=leftout,score="loglik",save=T,maxItropt1=5, maxItropt2=5)

Predictions from ridge fits

Description

Produces predictions from ridge fits for new data.

Usage

predictIWLS(IWLSfit, X1new = NULL, Sigmanew)
predictIWLS(IWLSfit, X1new = NULL, Sigmanew)

Arguments

`IWLSfit`	List, containing fits from either `IWLSridge` (linear, logistic ridge) or `IWLSCoxridge`
`X1new`	Matrix. Dimension `nnew x p_0`, representing unpenalized covariates for new data.
`Sigmanew`	Matrix. Dimensions `nnew x n`. Sample cross-product from penalized variables, usually computed by first applying `createXXblocks` and then `SigmaFromBlocks`.

Details

Predictions rely purely on the linear predictors, and do not require producing the parameter vector.

Value

Numerical vector of linear predictor for the test samples.

Examples

#Example below shows how to create the input argument Sigmanew (for simulated data)
#Simulate
Xbl1 <- matrix(rnorm(1000),nrow=10)
Xbl2 <- matrix(rnorm(2000),nrow=10)
Xbl1new <- matrix(rnorm(200),nrow=2)
Xbl2new <- matrix(rnorm(400),nrow=2)

#check whether dimensions are correct
nrow(Xbl1)==nrow(Xbl1new)
nrow(Xbl2)==nrow(Xbl2new)
ncol(Xbl1)==nrow(Xbl2)
ncol(Xbl1new)==ncol(Xbl2new)

#create cross-product
XXbl <- createXXblocks(list(Xbl1,Xbl2),list(Xbl1new,Xbl2new))

#suppose penalties for two data types equal 5,10, respectively
Sigmanew <- SigmaFromBlocks(XXbl,c(5,10))

#check dimensions (should be nnew x n)
dim(Sigmanew)
#Example below shows how to create the input argument Sigmanew (for simulated data)
#Simulate
Xbl1 <- matrix(rnorm(1000),nrow=10)
Xbl2 <- matrix(rnorm(2000),nrow=10)
Xbl1new <- matrix(rnorm(200),nrow=2)
Xbl2new <- matrix(rnorm(400),nrow=2)

#check whether dimensions are correct
nrow(Xbl1)==nrow(Xbl1new)
nrow(Xbl2)==nrow(Xbl2new)
ncol(Xbl1)==nrow(Xbl2)
ncol(Xbl1new)==ncol(Xbl2new)

#create cross-product
XXbl <- createXXblocks(list(Xbl1,Xbl2),list(Xbl1new,Xbl2new))

#suppose penalties for two data types equal 5,10, respectively
Sigmanew <- SigmaFromBlocks(XXbl,c(5,10))

#check dimensions (should be nnew x n)
dim(Sigmanew)

Evaluate predictions

Description

Evaluates predictions by a score suitable for the corresponding response

Usage

Scoring(lp, Y, model = NULL, score = ifelse(model == "linear", "mse", "loglik"),
  print = TRUE)
Scoring(lp, Y, model = NULL, score = ifelse(model == "linear", "mse", "loglik"),
  print = TRUE)

Arguments

`lp`	Numerical vector. Linear predictor.
`Y`	Response vector: numeric, binary, factor or `survival`.
`score`	Character. See Details.
`model`	Character. Any of `c("linear", "logistic", "cox")`. Is inferred from `Y` when `NULL`.
`print`	Boolean. Should the score be printed on screen.

Details

Several scores are allowed, depending on the type of output. For model = "linear", score equals any of c("loglik","mse","abserror","cor","kendall","spearman"), denoting CV-ed log-likelihood, mean-squared error, mean absolute error, Pearson (Kendall, Spearman) correlation with response. For model = "logistic", score equals any of c("loglik","auc", "brier"), denoting CV-ed log-likelihood, area-under-the-ROC-curve, and brier score a.k.a. MSE. For model = "cox", score equals any of c("loglik","cindex"), denoting CV-ed log-likelihood, and c-index.

Value

Numerical value.

Setting up parallel computing

Description

This function sets up parallel computing by the package snowfall.

Usage

setupParallel(ncpus = 2, sourcefile = NULL, sourcelibraries =
c("multiridge","survival","pROC","risksetROC"))
setupParallel(ncpus = 2, sourcefile = NULL, sourcelibraries =
c("multiridge","survival","pROC","risksetROC"))

Arguments

`ncpus`	Integer. Number of cpus to use. Should be >= 2.
`sourcefile`	Character. Additional source files to be loaded in parallel. Only required when parallel computing is also desired for functions not available in `multiridge`.
`sourcelibraries`	Character vector. Libraries to be loaded in parallel. Defaults to the libraries multiridge depends on.

Details

Parallel computing is available for several functions that rely on cross-validation. If double CV is used, parallel computing is applied to the outer loop, to optimize efficiency.

Value

No return value, called for side effects

Examples

## Not run: 
setupParallel(ncpus=4)

## End(Not run)
## Not run: 
setupParallel(ncpus=4)

## End(Not run)

Create penalized sample cross-product matrix

Description

Creates penalized sample cross-product matrix, dimension nxn.

Usage

SigmaFromBlocks(XXblocks, penalties, pairing = NULL)
SigmaFromBlocks(XXblocks, penalties, pairing = NULL)

Arguments

`XXblocks`	List of `nxn` matrices. Usually output of `createXXblocks`.
`penalties`	Numeric vector, representing penaltyparameters.
`pairing`	Numerical vector of length 3 or `NULL` when pairs are absent. Represents the indices (in `XXblocks`) of the two data blocks involved in pairing, plus the index of the paired block.

Value

Matrix of size nxn.

Examples

#Example
#Simulate
Xbl1 <- matrix(rnorm(1000),nrow=10)
Xbl2 <- matrix(rnorm(2000),nrow=10)

#check whether dimensions are correct
ncol(Xbl1)==nrow(Xbl2)

#create cross-product
XXbl <- createXXblocks(list(Xbl1,Xbl2))

#suppose penalties for two data types equal 5,10, respectively
Sigma <- SigmaFromBlocks(XXbl,c(5,10))

#check dimensions (should be n x n)
dim(Sigma)
#Example
#Simulate
Xbl1 <- matrix(rnorm(1000),nrow=10)
Xbl2 <- matrix(rnorm(2000),nrow=10)

#check whether dimensions are correct
ncol(Xbl1)==nrow(Xbl2)

#create cross-product
XXbl <- createXXblocks(list(Xbl1,Xbl2))

#suppose penalties for two data types equal 5,10, respectively
Sigma <- SigmaFromBlocks(XXbl,c(5,10))

#check dimensions (should be n x n)
dim(Sigma)

Package 'multiridge'

Help Index

Fast cross-validation for multi-penalty ridge regression

Description

Details

Author(s)

References

See Also

Examples

Augment data with zeros.

Description

Usage

Arguments

Details

Value

Examples

Coefficient estimates from (converged) IWLS fit

Description

Usage

Arguments

Details

Value

See Also

Examples

Create list of paired data blocks

Description

Usage

Arguments

Details

Value

See Also

Examples

Creates list of (unscaled) sample covariance matrices

Description

Usage

Arguments

Details

Value

See Also

Examples

Creates (repeated) cross-validation folds

Description

Usage

Arguments

Details

Value

See Also

Examples

Cross-validated score

Description

Usage

Arguments

Details

Value

See Also

Examples

Contains R-object dataXXmirmeth

Description

Usage

Format

Details

Source

References

See Also

Examples

Double cross-validation for estimating performance of multiridge

Description

Usage

Arguments

Details

Value

See Also

Examples

Fast cross-validation per data block

Description

Usage

Arguments

Details

Value

See Also

Contains R-object `dataXXmirmeth`

Double cross-validation for estimating performance of `multiridge`

Outer-loop cross-validation for estimating performance of marginal likelihood based `multiridge`