Title: | Correlation-Adjusted Regression Survival (CARS) Scores |
---|---|
Description: | Contains functions to estimate the Correlation-Adjusted Regression Survival (CARS) Scores. The method is described in Welchowski, T. and Zuber, V. and Schmid, M., (2018), Correlation-Adjusted Regression Survival Scores for High-Dimensional Variable Selection, <arXiv:1802.08178>. |
Authors: | Thomas Welchowski |
Maintainer: | Thomas Welchowski <[email protected]> |
License: | GPL-3 |
Version: | 1.0.0 |
Built: | 2024-10-21 06:39:55 UTC |
Source: | CRAN |
Contains functions to estimate Correlation-Adjusted Regression Survival (CARS) scores. The main function is carSurvScore, which estimates the CARS score of each variable. The higher the absolute value of a CARS score, the higher the variable importance. Additionally, the function carVarSelect selects cut-off thresholds to separate variables associated with survival from noise variables. Two cut-off threshold options are available: false non-discovery rate q-values and empirical quantiles of the raw scores.
Package: carSurv
Type: Package
Version: 1.0.0
Date: 2018-02-24
License: GPL-3
Thomas Welchowski (Maintainer) [email protected]
Welchowski, T. and Zuber, V. and Schmid, M., (2018), Correlation-Adjusted Regression Survival Scores for High-Dimensional Variable Selection, <arXiv:1802.08178>
Zuber, V. and Strimmer, K., (2011), High-Dimensional Regression and Variable Selection Using CAR Scores, Statistical Applications in Genetics and Molecular Biology
Schaefer, J. and Strimmer, K., (2005), A Shrinkage Approach to Large-Scale Covariance Matrix Estimation and Implications for Functional Genomics, Statistical Applications in Genetics and Molecular Biology
Van der Laan, M. J. and Robins, J. M., (2003), Unified Methods for Censored Longitudinal Data and Causality, Springer Series in Statistics
Strimmer, K., (2008), A unified approach to false discovery rate estimation, BMC Bioinformatics
Estimates CARS scores. CARS scores measure the relative importance of each variable with respect to the survival times, adjusted for censoring by inverse probability of censoring (IPC) weighting.
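As background, the IPC weights used for this adjustment can be sketched in base R. This is a hypothetical illustration (the helper ipcWeights and its tie handling are not part of carSurv): each event observation gets weight 1/G(t-), where G is the Kaplan-Meier estimate of the censoring survival function, and weights are capped for numerical stability.

```r
# Hypothetical sketch of inverse probability of censoring (IPC) weights;
# ipcWeights is an illustrative helper, not a carSurv function
ipcWeights <- function(obsTime, obsEvent, maxWeight = 10) {
  ord <- order(obsTime)
  t <- obsTime[ord]
  d <- obsEvent[ord]
  n <- length(t)
  # Kaplan-Meier estimate of the censoring survival function G:
  # censoring "events" are the complements of the observed events
  atRisk <- n:1
  censInd <- 1 - d
  G <- cumprod(1 - censInd / atRisk)
  # Weight an event by 1 / G just before its time (left-continuous G)
  Gminus <- c(1, G[-n])
  w <- ifelse(d == 1, 1 / Gminus, 0)
  w <- pmin(w, maxWeight)   # cap large weights for numerical stability
  w[order(ord)]             # return weights in the original order
}

set.seed(1)
survT <- rexp(50)
censT <- rexp(50, rate = 0.5)
obsTime <- pmin(survT, censT)
obsEvent <- as.numeric(survT <= censT)
w <- ipcWeights(obsTime, obsEvent)
```

Censored observations receive weight zero; events are up-weighted the more censoring has occurred before them, which is the adjustment the maxIPCweight argument bounds.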
carSurvScore(obsTime, obsEvent, X, maxIPCweight = 10, denom = "1/n")
obsTime |
Observed time points of a right censored survival process (numeric vector). |
obsEvent |
Observed event indicator of a right censored survival process (numeric vector): 0 = no event, 1 = event. |
X |
Data of design variables (numeric matrix). Must be already encoded. |
maxIPCweight |
Specifies the maximum possible weight, to ensure numerical stability. |
denom |
Specifies the denominator of the weighted sums. Two options are available: The default value "1/n" uses the sample size as denominator. Option "sum_w" uses the sum of all IPC weights in the denominator. |
CARS scores are defined as theta=P_X^(-1/2) P_(X, log(T)). The term P_X^(-1/2) is the inverse square root of the correlation matrix between covariates X. P_(X, log(T)) is the correlation vector between covariates and the logarithmic survival time adjusted for censoring by IPC weighting.
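Setting censoring aside for a moment, the definition above can be sketched directly in base R. This is a hypothetical illustration of the matrix algebra only (the package additionally applies IPC weighting, and the simulated data here are made up):

```r
# Sketch of theta = P_X^(-1/2) %*% P_(X, log(T)) in the uncensored case
set.seed(42)
n <- 200; p <- 4
X <- matrix(rnorm(n * p), n, p)
# Log survival times driven mainly by the first covariate
logT <- log(rgamma(n, shape = 2, scale = exp(1 + 2 * X[, 1])))

R_X  <- cor(X)        # correlation matrix of the covariates
r_XY <- cor(X, logT)  # correlation vector with the log survival time

# Inverse square root of R_X via its eigendecomposition
eig <- eigen(R_X, symmetric = TRUE)
R_X_invSqrt <- eig$vectors %*% diag(1 / sqrt(eig$values)) %*% t(eig$vectors)

theta <- R_X_invSqrt %*% r_XY  # CARS-type scores (uncensored sketch)
```

Decorrelating the marginal correlations through P_X^(-1/2) is what distinguishes CARS scores from simple univariate correlation screening.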
Estimated CAR survival score of each variable (numeric vector).
It is recommended to use the default setting denom="1/n", because in this case the CARS scores are consistent. Furthermore, the simulation results for "1/n" show a lower root mean squared error of the CARS scores with respect to the true parameter.
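The difference between the two denominator options can be illustrated with a weighted mean. This is a hypothetical base-R sketch (hypoWeightedMean is not a package function); the two variants coincide exactly when the weights average to one:

```r
# Hypothetical illustration of the two denominator choices
hypoWeightedMean <- function(y, w, denom = c("1/n", "sum_w")) {
  denom <- match.arg(denom)
  if (denom == "1/n") sum(w * y) / length(y)  # sample size as denominator
  else                sum(w * y) / sum(w)     # total weight as denominator
}

set.seed(5)
y <- rnorm(20)
w <- runif(20, 0.5, 2)   # e.g. capped IPC weights

m1 <- hypoWeightedMean(y, w, "1/n")
m2 <- hypoWeightedMean(y, w, "sum_w")
```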
Thomas Welchowski
Welchowski, T. and Zuber, V. and Schmid, M., (2018), Correlation-Adjusted Regression Survival Scores for High-Dimensional Variable Selection, <arXiv:1802.08178>
Zuber, V. and Strimmer, K., (2011), High-Dimensional Regression and Variable Selection Using CAR Scores, Statistical Applications in Genetics and Molecular Biology
Schaefer, J. and Strimmer, K., (2005), A Shrinkage Approach to Large-Scale Covariance Matrix Estimation and Implications for Functional Genomics, Statistical Applications in Genetics and Molecular Biology
Van der Laan, M. J. and Robins, J. M., (2003), Unified Methods for Censored Longitudinal Data and Causality, Springer Series in Statistics
##########################################
# Simulate accelerated failure time model

# Generate multivariate normally distributed covariates
noObs <- 100
noCovar <- 10
library(mvtnorm)
set.seed(190)
X <- rmvnorm(noObs, mean=rep(0, noCovar), sigma=diag(noCovar))

# Generate gamma distributed survival times
# Only the first 5 variables have an influence
eta <- 1 - 2 * X[,1] - X[,2] + X[,3] + 0.5 * X[,4] + 1.5 * X[,5]

# Function to generate survival times
genSurv <- function(x) {
  set.seed(x)
  rgamma(1, shape=2, scale=exp(eta[x]))
}

# Generate survival times
survT <- sapply(1:length(eta), genSurv)

# Generate exponentially distributed censoring times
censT <- rexp(noObs, rate=1)

# Calculate event indicator
eventInd <- ifelse(survT <= censT, 1, 0)

# Calculate observed times
obsTime <- survT
obsTime[survT > censT] <- censT[survT > censT]

# Estimate CAR scores
carScores <- carSurvScore(obsTime=obsTime, obsEvent=eventInd, X=X)
carScores
Computes the CARS scores and selects significant variables. If the false non-discovery rate (fndr) approach is used, significant and null variables are distinguished by an a priori defined q-value.
carVarSelect(carSurvScores, method = "fndr", plotDiag = FALSE, threshold = 0.05)
carSurvScores |
Estimated CAR survival scores of each variable (numeric vector). See function carSurvScore. |
method |
Gives the variable selection procedure. Default is "fndr", which is based on the false non-discovery-rate. The other option is "threshold", which selects only variables above a given empirical quantile. |
plotDiag |
Should diagnostic plots of the null distribution be plotted? Default is FALSE (logical scalar). |
threshold |
If method="threshold", then this specifies the quantile threshold of the CAR survival scores. Every score above this threshold is then a significant variable. |
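The empirical-quantile rule behind method="threshold" can be sketched in base R. This is a hypothetical illustration with made-up scores, not the package's internal code:

```r
# Hypothetical sketch of quantile-based selection on absolute scores
set.seed(11)
scores <- c(rnorm(95, sd = 0.05),  # 95 noise variables
            rep(2, 5))             # 5 strong signals at the end

threshold <- 0.95  # quantile level, analogous to the threshold argument
cutOff <- quantile(abs(scores), probs = threshold)

# Every score above the empirical quantile is flagged as significant
selected <- which(abs(scores) > cutOff)
```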
Index giving the significant variables of the original data (integer vector).
The quality of the selected variables depends on the sample size and on the number of variables.
Thomas Welchowski
Strimmer, K., (2008), A unified approach to false discovery rate estimation, BMC Bioinformatics
##########################################
# Simulate accelerated failure time model

# Generate multivariate normally distributed covariates
noObs <- 100
noCovar <- 250
library(mvtnorm)
set.seed(7903)
X <- rmvnorm(noObs, mean=rep(0, noCovar), sigma=diag(noCovar))

# Generate gamma distributed survival times
# Only the first 5 variables have an influence
eta <- 1 - 2 * X[,1] - X[,2] + X[,3] + 0.5 * X[,4] + 1.5 * X[,5]

# Function to generate survival times
genSurv <- function(x) {
  set.seed(x)
  rgamma(1, shape=2, scale=exp(eta[x]))
}

# Generate survival times
survT <- sapply(1:length(eta), genSurv)

# Generate exponentially distributed censoring times
censT <- rexp(noObs, rate=1)

# Calculate event indicator
eventInd <- ifelse(survT <= censT, 1, 0)

# Calculate observed times
obsTime <- survT
obsTime[survT > censT] <- censT[survT > censT]

# Conduct variable selection using fndr
carScores <- carSurvScore(obsTime=obsTime, obsEvent=eventInd, X=X)
selectedVar <- carVarSelect(carSurvScores=carScores)
selectedVar

# Check true positive and true negative rate
TPR <- mean(c(1:5) %in% selectedVar)
TNR <- mean(c(6:250) %in% setdiff(6:250, selectedVar))
perf <- TPR + TNR - 1
perf
Efficient C implementation of the sample covariance estimator. The denominator is defined as the sum of all weights.
weightedCovarRcpp(x, y, w)
x |
Covariate without weighting (numeric vector). |
y |
Response (numeric vector). The weighted mean of the response is used in the estimator. |
w |
Weights for averaging (numeric vector). |
Weighted covariance (numeric scalar).
If all weights are set to 1, the denominator is identical to n. There are no safety checks of input arguments.
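A plain-R reference version of this estimator can be written down as follows. This is a hypothetical sketch based on the argument descriptions (only the response mean is weighted, and the sum of the weights forms the denominator); the package's compiled implementation is what should be used in practice:

```r
# Hypothetical base-R reference for the weighted covariance estimator.
# Per the argument docs, x is centered by its unweighted mean and
# y by its weighted mean; the denominator is the sum of the weights.
weightedCovarRef <- function(x, y, w) {
  my <- sum(w * y) / sum(w)
  sum(w * (x - mean(x)) * (y - my)) / sum(w)
}

set.seed(3975)
x <- rnorm(100)
y <- rnorm(100)

# With all weights equal to 1 the denominator is n, so the result
# matches cov() rescaled from denominator n-1 to n
w <- rep(1, 100)
ref <- weightedCovarRef(x, y, w)
cmp <- cov(x, y) * (100 - 1) / 100
```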
Thomas Welchowski
# Simulate two random vectors
set.seed(3975)
x <- rnorm(100)
set.seed(-3975)
y <- rnorm(100)

# Calculate covariance with standard R function
# Rescaling ensures that both calculations use the same denominator "n"
covarEst <- cov(x, y) * (100-1) / 100

# Calculate weighted covariance with equal weights
equalW <- rep(1, 100)
weightCovarEst <- weightedCovarRcpp(x=x, y=y, w=equalW)

# Output comparison
all.equal(covarEst, weightCovarEst)

# Runtime comparison
library(microbenchmark)
microbenchmark(Default=cov(x, y),
               New=weightedCovarRcpp(x=x, y=y, w=equalW),
               times=25)
# -> New method is multiple times faster
Efficient C implementation of the sample covariance estimator. The denominator is defined as the sample size.
weightedCovarRcppN(x, y, w)
x |
Covariate without weighting (numeric vector). |
y |
Response (numeric vector). The weighted mean of the response is used in the estimator. |
w |
Weights for averaging (numeric vector). |
Weighted covariance (numeric scalar).
There are no safety checks of input arguments.
Thomas Welchowski
# Simulate two random vectors
set.seed(3975)
x <- rnorm(100)
set.seed(-3975)
y <- rnorm(100)

# Calculate covariance with standard R function
# Rescaling ensures that both calculations use the same denominator "n"
covarEst <- cov(x, y) * (100-1) / 100

# Calculate weighted covariance with equal weights
equalW <- rep(1, 100)
weightCovarEst <- weightedCovarRcppN(x=x, y=y, w=equalW)

# Output comparison
all.equal(covarEst, weightCovarEst)

# Runtime comparison
library(microbenchmark)
microbenchmark(Default=cov(x, y),
               New=weightedCovarRcppN(x=x, y=y, w=equalW),
               times=25)
# -> New method is multiple times faster
Efficient C implementation of the sample variance estimator. The denominator is defined as sum of all weights.
weightedVarRcpp(y, w)
y |
Response (numeric vector). The weighted mean of the response is used in the estimator. |
w |
Weights for averaging (numeric vector). |
Weighted variance (numeric scalar).
If all weights are set to 1, the denominator is identical to n. There are no safety checks of input arguments.
Thomas Welchowski
# Simulate a random vector
set.seed(3975)
x <- rnorm(100)

# Calculate variance with standard implementation
# Rescaling ensures that both calculations use the same denominator "n"
varEst <- var(x) * (100-1) / 100

# Calculate weighted variance with equal weights
# (the weights sum to 1, which is the denominator of the weighted sum)
equalW <- rep(1/100, 100)
weightVarEst <- weightedVarRcpp(y=x, w=equalW)

# Output comparison
all.equal(varEst, weightVarEst)

# Runtime comparison
library(microbenchmark)
microbenchmark(Default=var(x),
               New=weightedVarRcpp(y=x, w=equalW),
               times=25)
# -> New method is multiple times faster
Efficient C implementation of the sample variance estimator. The denominator is defined as the sample size.
weightedVarRcppN(y, w)
y |
Response (numeric vector). The weighted mean of the response is used in the estimator. |
w |
Weights for averaging (numeric vector). |
Weighted variance (numeric scalar).
There are no safety checks of input arguments.
Thomas Welchowski
# Simulate a random vector
set.seed(3975)
x <- rnorm(100)

# Calculate variance with standard implementation
# Rescaling ensures that both calculations use the same denominator "n"
varEst <- var(x) * (100-1) / 100

# Calculate weighted variance with equal weights
equalW <- rep(1, 100)
weightVarEst <- weightedVarRcppN(y=x, w=equalW)

# Output comparison
all.equal(varEst, weightVarEst)

# Runtime comparison
library(microbenchmark)
microbenchmark(Default=var(x),
               New=weightedVarRcppN(y=x, w=equalW),
               times=25)
# -> New method is multiple times faster