Package 'compound.Cox' reference manual

Title:	Univariate Feature Selection and Compound Covariate for Predicting Survival
Description:	Univariate feature selection and compound covariate methods under the Cox model with high-dimensional features (e.g., gene expressions). Available are survival data for non-small-cell lung cancer patients with gene expressions (Chen et al 2007 New Engl J Med) <DOI:10.1056/NEJMoa060096>, statistical methods in Emura et al (2012 PLoS ONE) <DOI:10.1371/journal.pone.0047627>, Emura & Chen (2016 Stat Methods Med Res) <DOI:10.1177/0962280214533378>, and Emura et al (2019)<DOI:10.1016/j.cmpb.2018.10.020>. Algorithms for generating correlated gene expressions are also available. Estimation of survival functions via copula-graphic (CG) estimators is also implemented, which is useful for sensitivity analyses under dependent censoring (Yeh et al 2023) <DOI:10.3390/biomedicines11030797>.
Authors:	Takeshi Emura, Hsuan-Yu Chen, Shigeyuki Matsui, Yi-Hau Chen
Maintainer:	Takeshi Emura <[email protected]>
License:	GPL-2
Version:	3.30
Built:	2024-08-28 06:30:13 UTC
Source:	CRAN

Univariate Feature Selection and Compound Covariate for Predicting Survival

Description

Univariate feature selection and compound covariate methods under the Cox model with high-dimensional features (e.g., gene expressions). Available are survival data for non-small-cell lung cancer patients with gene expressions (Chen et al 2007 New Engl J Med), statistical methods in Emura et al (2012 PLoS ONE), Emura & Chen (2016 Stat Methods Med Res), and Emura et al. (2019 Comput Methods Programs Biomed). Algorithms for generating correlated gene expressions are also available. Estimation of survival functions via copula-graphic (CG) estimators is also implemented, which is useful for sensitivity analyses under dependent censoring (Yeh et al 2023 Biomedicines).

Details

Package:	compound.Cox
Type:	Package
Version:	3.30
Date:	2023-7-9
License:	GPL-2

Author(s)

Takeshi Emura, Hsuan-Yu Chen, Shigeyuki Matsui, Yi-Hau Chen; Maintainer: Takeshi Emura <[email protected]>

References

Chen HY, Yu SL, Chen CH, et al (2007). A Five-gene Signature and Clinical Outcome in Non-small-cell Lung Cancer, N Engl J Med 356: 11-20.

Emura T, Chen YH, Chen HY (2012). Survival Prediction Based on Compound Covariate under Cox Proportional Hazard Models. PLoS ONE 7(10): e47627. doi:10.1371/journal.pone.0047627

Emura T, Chen YH (2016). Gene Selection for Survival Data Under Dependent Censoring: a Copula-based Approach, Stat Methods Med Res 25(No.6): 2840-57

Emura T, Matsui S, Chen HY (2019). compound.Cox: Univariate Feature Selection and Compound Covariate for Predicting Survival, Computer Methods and Programs in Biomedicine 168: 21-37

Matsui S (2006). Predicting Survival Outcomes Using Subsets of Significant Genes in Prognostic Marker Studies with Microarrays. BMC Bioinformatics: 7:156.

Yeh CT, Liao GY, Emura T (2023). Sensitivity analysis for survival prognostic prediction with gene selection: a copula method for dependent censoring, Biomedicines 11(3):797.

Copula-graphic estimator under the Clayton copula.

Description

This function computes the copula-graphic (CG) estimator (Rivest & Wells 2001) of a survival function under the Clayton copula.

Usage

CG.Clayton(t.vec, d.vec, alpha, S.plot = TRUE, S.col = "black")

Arguments

`t.vec`	Vector of survival times (time to either death or censoring)
`d.vec`	Vector of censoring indicators, 1=death, 0=censoring
`alpha`	Association parameter that is related to Kendall's tau through "tau= alpha/(alpha+2)"
`S.plot`	If TRUE, the survival curve is displayed
`S.col`	Color of the survival curve in the plot

Details

The CG estimator is a variant of the Kaplan-Meier (KM) estimator of a survival function. It relaxes the independent censoring assumption of the KM estimator through a copula-based dependent censoring model. The computational formula of the CG estimator is given in Appendix D of Emura et al. (2019) or Section 3.2 of Yeh et al. (2023). The output gives the survival probabilities at the time points in "t.vec". The user must specify the association parameter "alpha" of the Clayton copula (alpha>0); alpha=0 corresponds to the independence copula, under which the CG estimator reduces to the KM estimator. Emura and Chen (2016, 2018) and Yeh et al. (2023) applied the CG estimator to assess survival prognosis for lung cancer patients.
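As a rough illustration of the estimator described above, the following base-R sketch applies the Archimedean-copula form of the CG estimator (Rivest & Wells 2001) with the Clayton generator. The function name `cg_clayton_sketch` is hypothetical and is not part of the package; the sketch assumes no tied observation times and is not the package's own source code.

```r
# Hypothetical re-implementation for illustration only (not the package code).
# Assumes no ties among the observed times and alpha > 0.
cg_clayton_sketch <- function(t.vec, d.vec, alpha) {
  stopifnot(length(t.vec) == length(d.vec), alpha > 0)
  n <- length(t.vec)
  ord <- order(t.vec)
  time <- t.vec[ord]
  d <- d.vec[ord]
  n.risk <- n:1  # number at risk just before each ordered time
  # Jump of the Clayton generator at each death time; zero at censoring times
  inc <- ifelse(d == 1,
                ((n.risk - 1) / n)^(-alpha) - (n.risk / n)^(-alpha),
                0)
  surv <- (1 + cumsum(inc))^(-1 / alpha)
  list(tau = alpha / (alpha + 2), time = time, n.risk = n.risk, surv = surv)
}

t.vec <- c(1, 3, 5, 4, 7, 8, 10, 13)
d.vec <- c(1, 0, 0, 1, 1, 0, 1, 0)

# With alpha close to 0 the sketch should nearly coincide with Kaplan-Meier
fit0 <- cg_clayton_sketch(t.vec, d.vec, alpha = 1e-6)
km <- cumprod(1 - d.vec[order(t.vec)] / (length(t.vec):1))
max(abs(fit0$surv - km))

# Strong dependence (Kendall's tau = 0.9)
fit18 <- cg_clayton_sketch(t.vec, d.vec, alpha = 18)
```

The near-zero value of alpha stands in for the independence copula because the Clayton generator is only defined for alpha>0 in this sketch.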

Value

`tau`	Kendall's tau (=alpha/(alpha+2))
`time`	sort(t.vec)
`n.risk`	the number of patients at-risk
`surv`	survival probability at "time"

Author(s)

Takeshi Emura

References

Emura T, Matsui S, Chen HY (2019). compound.Cox: Univariate Feature Selection and Compound Covariate for Predicting Survival, Computer Methods and Programs in Biomedicine 168: 21-37.

Emura T, Chen YH (2016). Gene Selection for Survival Data Under Dependent Censoring: a Copula-based Approach, Stat Methods Med Res 25(No.6): 2840-57.

Emura T, Chen YH (2018). Analysis of Survival Data with Dependent Censoring, Copula-Based Approaches, JSS Research Series in Statistics, Springer, Singapore.

Rivest LP, Wells MT (2001). A Martingale Approach to the Copula-graphic Estimator for the Survival Function under Dependent Censoring, J Multivar Anal; 79: 138-55.

Yeh CT, Liao GY, Emura T (2023). Sensitivity analysis for survival prognostic prediction with gene selection: a copula method for dependent censoring, Biomedicines 11(3):797.

Examples

## Example 1 (a toy example of n=8) ##
t.vec=c(1,3,5,4,7,8,10,13)
d.vec=c(1,0,0,1,1,0,1,0)
CG.Clayton(t.vec,d.vec,alpha=18,S.col="blue")
### CG.Clayton gives identical results with the Kaplan-Meier estimator with alpha=0 ###
CG.Clayton(t.vec,d.vec,alpha=0,S.plot=FALSE)$surv
survfit(Surv(t.vec,d.vec)~1)$surv

## Example 2 (Analysis of the lung cancer data) ##
data(Lung) # read the data
t.vec=Lung[,"t.vec"]
d.vec=Lung[,"d.vec"]
x.vec=Lung[,"MMP16"] # the gene associated with survival (Emura and Chen 2016, 2018) #
Poor=x.vec>median(x.vec) ## Indicator of poor survival
Good=x.vec<=median(x.vec) ## Indicator of good survival

par(mfrow=c(1,2))
###### Predicted survival curves via the CG estimator #####
t.good=t.vec[Good]
d.good=d.vec[Good]
CG.Clayton(t.good,d.good,alpha=18,S.plot=TRUE,S.col="blue")

t.poor=t.vec[Poor]
d.poor=d.vec[Poor]
CG.Clayton(t.poor,d.poor,alpha=18,S.plot=TRUE,S.col="red")

Copula-graphic estimator under the Frank copula.

Description

This function computes the copula-graphic (CG) estimator (Rivest & Wells 2001) under the Frank copula.

Usage

CG.Frank(t.vec, d.vec, alpha, S.plot = TRUE, S.col = "black")

Arguments

`t.vec`	Vector of survival times (time to either death or censoring)
`d.vec`	Vector of censoring indicators, 1=death, 0=censoring
`alpha`	Association parameter that is related to Kendall's tau (see P.32, Table 3.1 of Emura and Chen (2018))
`S.plot`	If TRUE, the survival curve is displayed
`S.col`	Color of the survival curve in the plot

Details

The CG estimator is a variant of the Kaplan-Meier (KM) estimator of a survival function. It relaxes the independent censoring assumption of the KM estimator through a copula-based dependent censoring model. The output gives the survival probabilities at the time points in "t.vec". The user must specify the association parameter "alpha" of the Frank copula, where alpha=0 corresponds to the independence copula. Emura and Chen (2016, 2018) and Yeh et al. (2023) applied the CG estimator to assess survival prognosis for lung cancer patients.
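For choosing "alpha" in a sensitivity analysis, it can help to translate it into Kendall's tau. The sketch below assumes the standard parametrization of the Frank copula, for which tau = 1 - (4/alpha)*(1 - D1(alpha)) with D1 the first-order Debye function; whether the package uses exactly this parametrization should be checked against Table 3.1 of Emura and Chen (2018). The function name `frank_tau_sketch` is hypothetical.

```r
# Hypothetical helper (not part of the package): Kendall's tau for the
# Frank copula under its standard parametrization, for alpha >= 0.
frank_tau_sketch <- function(alpha) {
  stopifnot(length(alpha) == 1, alpha >= 0)
  if (alpha == 0) return(0)  # independence copula
  # Integrand of the Debye function; its limit at t = 0 is 1
  integrand <- function(t) ifelse(t == 0, 1, t / expm1(t))
  D1 <- integrate(integrand, lower = 0, upper = alpha)$value / alpha
  1 - 4 / alpha * (1 - D1)
}

# Kendall's tau for the alpha values used in the examples
sapply(c(6, 9), frank_tau_sketch)
```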

Value

`tau`	Kendall's tau corresponding to alpha (see Table 3.1 of Emura and Chen (2018))
`time`	sort(t.vec)
`n.risk`	the number of patients at-risk
`surv`	survival probability at "time"

Author(s)

Takeshi Emura

References

Emura T, Matsui S, Chen HY (2019). compound.Cox: Univariate Feature Selection and Compound Covariate for Predicting Survival, Computer Methods and Programs in Biomedicine 168: 21-37.

Emura T, Chen YH (2016). Gene Selection for Survival Data Under Dependent Censoring: a Copula-based Approach, Stat Methods Med Res 25(No.6): 2840-57.

Emura T, Chen YH (2018). Analysis of Survival Data with Dependent Censoring, Copula-Based Approaches, JSS Research Series in Statistics, Springer, Singapore.

Rivest LP, Wells MT (2001). A Martingale Approach to the Copula-graphic Estimator for the Survival Function under Dependent Censoring, J Multivar Anal; 79: 138-55.

Yeh CT, Liao GY, Emura T (2023). Sensitivity analysis for survival prognostic prediction with gene selection: a copula method for dependent censoring, Biomedicines 11(3):797.

Examples

## Example 1 (a toy example of n=8) ##
t.vec=c(1,3,5,4,7,8,10,13)
d.vec=c(1,0,0,1,1,0,1,0)
CG.Frank(t.vec,d.vec,alpha=9,S.col="blue")
### CG.Frank gives identical results with the Kaplan-Meier estimator with alpha=0 ###
CG.Frank(t.vec,d.vec,alpha=0,S.plot=FALSE)$surv
survfit(Surv(t.vec,d.vec)~1)$surv

## Example 2 (Analysis of the lung cancer data) ##
data(Lung) # read the data
t.vec=Lung[,"t.vec"]
d.vec=Lung[,"d.vec"]
x.vec=Lung[,"MMP16"] # the gene associated with survival (Emura and Chen 2016, 2018) #
Poor=x.vec>median(x.vec) ## Indicator of poor survival
Good=x.vec<=median(x.vec) ## Indicator of good survival

par(mfrow=c(1,2))
###### Predicted survival curves via the CG estimator #####
t.good=t.vec[Good]
d.good=d.vec[Good]
CG.Frank(t.good,d.good,alpha=6,S.plot=TRUE,S.col="blue")

t.poor=t.vec[Poor]
d.poor=d.vec[Poor]
CG.Frank(t.poor,d.poor,alpha=6,S.plot=TRUE,S.col="red")

Copula-graphic estimator under the Gumbel copula.

Description

This function computes the copula-graphic (CG) estimator (Rivest & Wells 2001) under the Gumbel copula.

Usage

CG.Gumbel(t.vec, d.vec, alpha, S.plot = TRUE, S.col = "black")

Arguments

`t.vec`	Vector of survival times (time to either death or censoring)
`d.vec`	Vector of censoring indicators, 1=death, 0=censoring
`alpha`	Association parameter that is related to Kendall's tau through "tau= alpha/(alpha+1)"
`S.plot`	If TRUE, the survival curve is displayed
`S.col`	Color of the survival curve in the plot

Details

The CG estimator is a variant of the Kaplan-Meier (KM) estimator of a survival function. It relaxes the independent censoring assumption of the KM estimator through a copula-based dependent censoring model. The computational formula of the CG estimator is given in Appendix D of Emura et al. (2019) or Section 3.2 of Yeh et al. (2023). The output gives the survival probabilities at the time points in "t.vec". The user must specify the association parameter "alpha" of the Gumbel copula (alpha>=0), where alpha=0 corresponds to the independence copula. Emura and Chen (2016, 2018) and Yeh et al. (2023) applied the CG estimator to assess survival prognosis for lung cancer patients.
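Because the Clayton and Gumbel copulas use different scales for "alpha", comparing them is easier on the Kendall's tau scale. The helpers below simply encode the two relations stated in this manual (tau = alpha/(alpha+2) for Clayton, tau = alpha/(alpha+1) for Gumbel) and their inverses; the function names are hypothetical and not part of the package.

```r
# Hypothetical helpers (not part of the package)
clayton_tau   <- function(alpha) alpha / (alpha + 2)
gumbel_tau    <- function(alpha) alpha / (alpha + 1)
clayton_alpha <- function(tau) 2 * tau / (1 - tau)  # inverse of clayton_tau
gumbel_alpha  <- function(tau) tau / (1 - tau)      # inverse of gumbel_tau

# The example values alpha = 18 (Clayton) and alpha = 9 (Gumbel)
# correspond to the same Kendall's tau
clayton_tau(18)
gumbel_tau(9)

# Copula parameters that give Kendall's tau = 0.5
clayton_alpha(0.5)
gumbel_alpha(0.5)
```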

Value

`tau`	Kendall's tau (=alpha/(alpha+1))
`time`	sort(t.vec)
`n.risk`	the number of patients at-risk
`surv`	survival probability at "time"

Author(s)

Takeshi Emura

References

Emura T, Matsui S, Chen HY (2019). compound.Cox: Univariate Feature Selection and Compound Covariate for Predicting Survival, Computer Methods and Programs in Biomedicine 168: 21-37.

Emura T, Chen YH (2016). Gene Selection for Survival Data Under Dependent Censoring: a Copula-based Approach, Stat Methods Med Res 25(No.6): 2840-57.

Emura T, Chen YH (2018). Analysis of Survival Data with Dependent Censoring, Copula-Based Approaches, JSS Research Series in Statistics, Springer, Singapore.

Rivest LP, Wells MT (2001). A Martingale Approach to the Copula-graphic Estimator for the Survival Function under Dependent Censoring, J Multivar Anal; 79: 138-55.

Yeh CT, Liao GY, Emura T (2023). Sensitivity analysis for survival prognostic prediction with gene selection: a copula method for dependent censoring, Biomedicines 11(3):797.

Examples

## Example 1 (a toy example of n=8) ##
t.vec=c(1,3,5,4,7,8,10,13)
d.vec=c(1,0,0,1,1,0,1,0)
CG.Gumbel(t.vec,d.vec,alpha=9,S.col="blue")
### CG.Gumbel gives identical results with the Kaplan-Meier estimator with alpha=0 ### 
CG.Gumbel(t.vec,d.vec,alpha=0,S.plot=FALSE)$surv
survfit(Surv(t.vec,d.vec)~1)$surv

## Example 2 (Analysis of the lung cancer data) ##
data(Lung) # read the data
t.vec=Lung[,"t.vec"]
d.vec=Lung[,"d.vec"]
x.vec=Lung[,"MMP16"] # the gene associated with survival (Emura and Chen 2016, 2018) #
Poor=x.vec>median(x.vec) ## Indicator of poor survival
Good=x.vec<=median(x.vec) ## Indicator of good survival

par(mfrow=c(1,2))
###### Predicted survival curves via the CG estimator #####
t.good=t.vec[Good]
d.good=d.vec[Good]
CG.Gumbel(t.good,d.good,alpha=9,S.plot=TRUE,S.col="blue")

t.poor=t.vec[Poor]
d.poor=d.vec[Poor]
CG.Gumbel(t.poor,d.poor,alpha=9,S.plot=TRUE,S.col="red")

Testing survival difference of two groups via the CG estimators

Description

Tests the survival difference between two prognostic groups separated by a prognostic index (PI). Survival probabilities are computed by the CG estimators (Yeh et al. 2023).

Usage

CG.test(t.vec,d.vec,PI,cutoff=median(PI),alpha=2,
copula=CG.Clayton,S.plot=TRUE,N=10000,mark.time=TRUE)

Arguments

`t.vec`	Vector of survival times (time to either death or censoring)
`d.vec`	Vector of censoring indicators, 1=death, 0=censoring
`PI`	Vector of real numbers (the values of a prognostic index)
`cutoff`	A number determining the cut-off value of a prognostic index
`alpha`	Copula parameter
`copula`	Copula function: CG.Clayton, CG.Gumbel, or CG.Frank
`S.plot`	If TRUE, the survival curve is displayed
`N`	The number of permutations
`mark.time`	If TRUE, then curves are marked at each censoring time

Details

Two-sample comparison based on estimated survival functions via copula-graphic estimators under dependent censoring. The D statistic (the mean vertical difference between two estimated survival functions) is used for testing the null hypothesis of no difference in survival. See Yeh et al. (2023) for details.
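The general shape of such a permutation test can be sketched in base R. This is only an illustration under several assumptions that are not taken from the package: the Kaplan-Meier estimator stands in for a CG estimator, the mean is taken over the pooled observed time points, the P-value is two-sided, and there are no tied times. All function names are hypothetical; see Yeh et al. (2023) for the actual definition.

```r
# Kaplan-Meier survival evaluated on a grid (stand-in for a CG estimator;
# assumes no tied observation times)
km_at <- function(t.vec, d.vec, grid) {
  ord <- order(t.vec)
  time <- t.vec[ord]
  surv <- cumprod(1 - d.vec[ord] / (length(time):1))
  stepfun(time, c(1, surv))(grid)  # right-continuous step function
}

# Mean vertical difference between the two estimated survival curves
d_stat <- function(t.vec, d.vec, good, grid) {
  mean(km_at(t.vec[good], d.vec[good], grid) -
       km_at(t.vec[!good], d.vec[!good], grid))
}

# Permutation test: shuffle the group labels N times and recompute D
perm_test_sketch <- function(t.vec, d.vec, PI, cutoff = median(PI), N = 1000) {
  good <- PI <= cutoff
  grid <- sort(unique(t.vec))
  D.obs <- d_stat(t.vec, d.vec, good, grid)
  D.perm <- replicate(N, d_stat(t.vec, d.vec, sample(good), grid))
  list(D = D.obs, P = mean(abs(D.perm) >= abs(D.obs)))
}

t.vec <- c(1, 3, 5, 4, 7, 8, 10, 13)
d.vec <- c(1, 0, 0, 1, 1, 0, 1, 0)
PI <- c(8, 7, 6, 5, 4, 3, 2, 1)
set.seed(1)
res <- perm_test_sketch(t.vec, d.vec, PI, N = 200)
res$D
res$P
```

Permuting the labels keeps the two group sizes fixed, so each permuted D is computed on groups of the same sizes as the observed ones.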

Value

`test`	Testing the difference of two survival functions
`Good`	Good prognostic group defined by PI<=cutoff
`Poor`	Poor prognostic group defined by PI>cutoff

Author(s)

Takeshi Emura, Pauline Baur

References

Emura T, Chen YH (2018). Analysis of Survival Data with Dependent Censoring, Copula-Based Approaches, JSS Research Series in Statistics, Springer, Singapore.

Rivest LP, Wells MT (2001). A Martingale Approach to the Copula-graphic Estimator for the Survival Function under Dependent Censoring, J Multivar Anal; 79: 138-55.

Yeh CT, Liao GY, Emura T (2023). Sensitivity analysis for survival prognostic prediction with gene selection: a copula method for dependent censoring, Biomedicines 11(3):797.

Examples

t.vec=c(1,3,5,4,7,8,10,13)
d.vec=c(1,0,0,1,1,0,1,0)
PI=c(8,7,6,5,4,3,2,1)

CG.test(t.vec,d.vec,PI,copula=CG.Clayton,alpha=18,N=100)
CG.test(t.vec,d.vec,PI,copula=CG.Gumbel,alpha=2,N=100)

Cross-validated c-index for measuring the predictive accuracy of a prognostic index under a copula-based dependent censoring model.

Description

This function calculates the cross-validated c-index (concordance index) for measuring the predictive accuracy of a prognostic index under a copula-based dependent censoring model. Here the prognostic index is calculated as a compound covariate predictor based on the univariate Cox regression estimates. The expression and details are given in Section 3.2 of Emura and Chen (2016). The association between survival time and censoring time is modeled via the Clayton copula.

Usage

cindex.CV(t.vec, d.vec, X.mat, alpha, K = 5)

Arguments

`t.vec`	Vector of survival times (time to death or time to censoring, whichever comes first)
`d.vec`	Vector of censoring indicators, 1=death, 0=censoring
`X.mat`	n by p matrix of covariates, where n is the sample size and p is the number of covariates
`alpha`	Association parameter of the Clayton copula; Kendall's tau = alpha/(alpha+2)
`K`	The number of cross-validation folds (K=5 is the default)

Details

Currently, only the Clayton copula is implemented for modeling association between survival time and censoring time. The Clayton model yields positive association between survival time and censoring time with the Kendall's tau being equal to alpha/(alpha+2), where alpha > 0. The independent copula corresponds to alpha = 0.

If the number of covariates p is large (e.g., p>=100), the computational time becomes very long. Pre-filtering for covariates is recommended to reduce p.
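One possible way to pre-filter covariates and form a compound covariate from univariate Cox estimates is sketched below using the survival package. The simulated data, the Wald P-value threshold of 0.20, and all variable names are illustrative assumptions; this uses ordinary univariate Cox regression (independent censoring) and is not the package's internal procedure.

```r
library(survival)

# Simulated data: only the first covariate affects survival (illustrative)
set.seed(1)
n <- 60
p <- 10
X.mat <- matrix(rnorm(n * p), n, p)
surv.time <- rexp(n, rate = exp(X.mat[, 1]))
cens.time <- rexp(n, rate = 0.5)
t.vec <- pmin(surv.time, cens.time)
d.vec <- as.numeric(surv.time <= cens.time)

# Univariate Cox regression for each covariate: estimate and Wald P-value
uni <- t(sapply(seq_len(p), function(j) {
  fit <- coxph(Surv(t.vec, d.vec) ~ X.mat[, j])
  b <- unname(coef(fit))
  se <- sqrt(vcov(fit)[1, 1])
  c(beta = b, P = 2 * pnorm(-abs(b / se)))
}))

# Keep covariates passing the (illustrative) threshold
keep <- which(uni[, "P"] < 0.20)

# Compound covariate: selected covariates weighted by their univariate estimates
PI <- drop(X.mat[, keep, drop = FALSE] %*% uni[keep, "beta"])
```

Reducing p in this way before calling a cross-validated routine keeps the number of model fits per fold small.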

Value

`concordant`	Cross-validated c-index

Author(s)

Takeshi Emura

References

Emura T, Chen YH (2016). Gene Selection for Survival Data Under Dependent Censoring: a Copula-based Approach, Stat Methods Med Res 25(No.6): 2840-57.

Examples


n=25 ### sample size ###
p=3  ### the number of covariates ###
set.seed(1)
T=rexp(n) ### survival time
U=rexp(n) ### censoring time
t.vec=pmin(T,U) ### minimum of survival time and censoring time
d.vec=as.numeric( c(T<=U) ) ### censoring indicator
X.mat=matrix(runif(n*p),n,p) ### covariates matrix

cindex.CV(t.vec,d.vec,X.mat,alpha=2) ### alpha=2 corresponds to Kendall's tau=0.5


Compound shrinkage estimation under the Cox model

Description

This function implements the "compound shrinkage estimator" to calculate the regression coefficients of the Cox model, which was proposed by Emura, Chen & Chen (2012). The method is a variant of the Cox partial likelihood estimator such that the regression coefficients are mixed with the univariate Cox regression estimators. The resultant estimator is applicable even when the number of covariates is greater than the number of samples (the p>n setting). The standard errors (SEs) are calculated based on the asymptotic theory (see Emura et al., 2012).

Usage

compound.reg(t.vec, d.vec, X.mat, K = 5, delta_a = 0.025, a_0 = 0, var = FALSE,
plot=TRUE, randomize = FALSE, var.detail = FALSE)

Arguments

`t.vec`	Vector of survival times (time to either death or censoring)
`d.vec`	Vector of censoring indicators, 1=death, 0=censoring
`X.mat`	n by p matrix of covariates, where n is the sample size and p is the number of covariates
`K`	The number of cross validation folds, K=n corresponds to a leave-one-out cross validation (default=5)
`delta_a`	The step size for a grid search for the maximum of the cross-validated likelihood (default=0.025)
`a_0`	The starting value of a grid search for the maximum of the cross-validated likelihood (default=0)
`var`	If TRUE, the standard deviations and confidence intervals are given (default=FALSE, to reduce the computational cost)
`plot`	If TRUE, the cross validated likelihood curve and its maximized point are drawn
`randomize`	If TRUE, randomize the subject IDs so that the subjects in the cross validation folds are randomly chosen. Otherwise, the cross validation folds are constructed in the ascending sequence
`var.detail`	Detailed information about the covariance matrix, which is mainly used for theoretical purposes. Please consult Takeshi Emura for more details (default=FALSE)

Details

K=5 cross validation is recommended for computational efficiency, though the results appear to be robust against the choice of the number K. If the number of covariates is greater than 200, the computational time becomes very long. In such a case, the univariate pre-selection is recommended to reduce the number of covariates.
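To make the roles of "K", "randomize", "a_0" and "delta_a" concrete, the sketch below builds cross-validation folds and a search grid in base R. It assumes that "ascending sequence" means contiguous blocks of subject IDs and that the grid runs from a_0 up to 1 (the stated range of the shrinkage parameter); both are guesses about the implementation, and `make_folds_sketch` is a hypothetical name.

```r
# Hypothetical helper (not part of the package): split subject IDs into K folds
make_folds_sketch <- function(n, K = 5, randomize = FALSE) {
  stopifnot(K >= 2, K <= n)
  id <- if (randomize) sample(n) else seq_len(n)
  # Fold labels in contiguous blocks whose sizes differ by at most one
  fold <- sort(rep(seq_len(K), length.out = n))
  unname(split(id, fold))
}

folds <- make_folds_sketch(n = 50, K = 5)      # IDs 1-10, 11-20, ...
set.seed(1)
folds.random <- make_folds_sketch(n = 50, K = 5, randomize = TRUE)
folds.loo <- make_folds_sketch(n = 50, K = 50) # leave-one-out

# Candidate values of the shrinkage parameter for the grid search
a.grid <- seq(0, 1, by = 0.025)  # a_0 = 0, delta_a = 0.025
```

A coarser step such as delta_a=0.1 shortens the grid and hence the number of cross-validated likelihood evaluations.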

Value

`a`	An optimized value of the shrinkage parameter (0<=a<=1)
`beta`	Estimated regression coefficients
`SE`	Standard errors for estimated regression coefficients
`Lower95CI`	Lower ends of 95 percent confidence intervals (beta_hat-1.96*SE)
`Upper95CI`	Upper ends of 95 percent confidence intervals (beta_hat+1.96*SE)
`Sigma`	Covariance matrix for estimated regression coefficients
`V`	Estimates of the information matrix (-[Hessian of the loglikelihood]/n)
`Hessian_CV`	Second derivative of the cross-validated likelihood. Normally negative since the cross-validated curve is concave
`h_dot`	Derivative of Equation (8) of Emura et al. (2012) with respect to a shrinkage parameter "a"

Author(s)

Takeshi Emura & Yi-Hau Chen

References

Emura T, Chen Y-H, Chen H-Y (2012) Survival Prediction Based on Compound Covariate under Cox Proportional Hazard Models. PLoS ONE 7(10): e47627. doi:10.1371/journal.pone.0047627

Examples

### A simulation study ###
n=50 ### sample size
beta_true=c(1,1,0,0,0)
p=length(beta_true) 
t.vec=d.vec=numeric(n)
X.mat=matrix(0,n,p)

set.seed(1)
for(i in 1:n){
  X.mat[i,]=rnorm(p,mean=0,sd=1)
  eta=sum( as.vector(X.mat[i,])*beta_true )
  T=rexp(1,rate=exp(eta))
  C=runif(1,min=0,max=5)
  t.vec[i]=min(T,C)
  d.vec[i]=(T<=C)
}
compound.reg(t.vec,d.vec,X.mat,delta_a=0.1) 
### compare the estimates (beta) with the true value ###
beta_true

### Lung cancer data analysis (Emura et al. 2012 PLoS ONE) ###
data(Lung)
temp=Lung[,"train"]==TRUE
t.vec=Lung[temp,"t.vec"]
d.vec=Lung[temp,"d.vec"]
X.mat=as.matrix( Lung[temp,-c(1,2,3)] )
#compound.reg(t.vec=t.vec,d.vec=d.vec,X.mat=X.mat,delta_a=0.025) # time-consuming process

Univariate Cox regression under dependent censoring.

Description

This function performs univariate Cox regression under dependent censoring, where dependence between survival time and censoring time is modeled via the Clayton copula (Emura and Chen 2016).

Usage

dependCox.reg(t.vec, d.vec, X.vec, alpha, var = TRUE, censor.reg=FALSE, baseline=FALSE)

Arguments

`t.vec`	A vector of survival times (time-to-death or censoring)
`d.vec`	A vector of censoring indicators, 1=death, 0=censoring
`X.vec`	A vector of covariates (multiple covariates are not allowed)
`alpha`	A copula parameter (Kendall's tau = alpha/(alpha+2))
`var`	If TRUE, the standard deviations are given (use FALSE to reduce the computational cost)
`censor.reg`	If TRUE, show the fitted results for both survival and censoring models
`baseline`	If TRUE, show the cumulative baseline hazards at the values of "t.vec"

Details

The Clayton model yields positive association between survival time and censoring time with Kendall's tau being equal to alpha/(alpha+2), where alpha > 0 is a copula parameter. The independence copula corresponds to alpha = 0.
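The relation between alpha and Kendall's tau can be checked empirically by simulating from the Clayton copula. The sketch below uses the standard conditional-inversion sampler for the Clayton copula in base R; the function name `rclayton_sketch` is hypothetical and not part of the package.

```r
# Hypothetical helper: draw n pairs (u, v) from the Clayton copula with
# parameter alpha > 0 by inverting the conditional distribution of v given u
rclayton_sketch <- function(n, alpha) {
  stopifnot(alpha > 0)
  u <- runif(n)
  w <- runif(n)
  v <- (1 + u^(-alpha) * (w^(-alpha / (alpha + 1)) - 1))^(-1 / alpha)
  cbind(u = u, v = v)
}

set.seed(1)
uv <- rclayton_sketch(2000, alpha = 2)

# Sample Kendall's tau, to be compared with alpha/(alpha+2) = 0.5
tau.hat <- cor(uv[, "u"], uv[, "v"], method = "kendall")
tau.hat
```

Transforming each margin by a quantile function (for example an exponential one) yields dependent survival and censoring times with the same Kendall's tau, since tau is invariant to monotone increasing transformations.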

Value

`beta`	The estimated regression coefficient
`SE`	The standard error for the estimated regression coefficient
`Z`	The Z-value for testing the null hypothesis of "beta=0" (the Wald test)
`P`	The P-value for testing the null hypothesis of "beta=0" (the Wald test)

Author(s)

Takeshi Emura

References

Emura T, Chen YH (2016). Gene Selection for Survival Data Under Dependent Censoring: a Copula-based Approach, Stat Methods Med Res 25(No.6): 2840-57.

Examples

### Joint Cox regression of survival and censoring ### 
data(Lung)
t.vec=Lung[,"t.vec"]# death or censoring times #
d.vec=Lung[,"d.vec"]# censoring indicators #
# 16-gene prognostic index (Emura and Chen 2016; 2018) #
X.vec=0.51*Lung[,"ZNF264"]+0.50*Lung[,"MMP16"]+
  0.50*Lung[,"HGF"]-0.49*Lung[,"HCK"]+0.47*Lung[,"NF1"]+
  0.46*Lung[,"ERBB3"]+0.57*Lung[,"NR2F6"]+0.77*Lung[,"AXL"]+
  0.51*Lung[,"CDC23"]+0.92*Lung[,"DLG2"]-0.34*Lung[,"IGF2"]+
  0.54*Lung[,"RBBP6"]+0.51*Lung[,"COX11"]+
  0.40*Lung[,"DUSP6"]-0.37*Lung[,"ENG"]-0.41*Lung[,"IHPK1"]
dependCox.reg(t.vec,d.vec,X.vec,alpha=18,censor.reg=TRUE)

temp=c(Lung[,"train"]==TRUE)
t.vec=Lung[temp,"t.vec"]
d.vec=Lung[temp,"d.vec"]
dependCox.reg(t.vec,d.vec,Lung[temp,"ZNF264"],alpha=18) 
# this reproduces Table 3 of Emura and Chen (2016) #

#### A simulation study under dependent censoring ####
beta_true=1.5 # true regression coefficient
alpha_true=2 # true copula parameter corresponding to Kendall's tau=0.5
n=150
t.vec=d.vec=X.vec=numeric(n)
set.seed(1)
for(i in 1:n){
  X.vec[i]=runif(1)
  eta=X.vec[i]*beta_true
  U=runif(1)
  V=runif(1)
  T=-1/exp(eta)*log(1-U) # Exp(eta) distribution
  W=(1-U)^(-alpha_true) # dependence produced by the Clayton copula
  C=1/alpha_true/exp(eta)*log(  1-W+W*(1-V)^(-alpha_true/(alpha_true+1))  ) # Exp(eta) distribution
  t.vec[i]=min(T,C)
  d.vec[i]=(T<=C) 
}

dependCox.reg(t.vec,d.vec,X.vec,alpha=alpha_true,var=FALSE) # faster computation by "var=FALSE"
beta_true# the above estimate is close to the true value
coxph(Surv(t.vec,d.vec)~X.vec)$coef
# this estimate is biased for the true value due to dependent censoring 

Cox regression under dependent censoring.

Description

This function performs estimation and significance testing for survival data under a copula-based dependent censoring model proposed in Emura and Chen (2016). The dependency between the failure and censoring times is modeled via the Clayton copula. The method is based on the semiparametric maximum likelihood estimation, where the association parameter is estimated by maximizing the cross-validated c-index (see Emura and Chen 2016 for details).

Usage

dependCox.reg.CV(t.vec, d.vec, X.mat, K = 5, G = 20)

Arguments

`t.vec`	A vector of survival times (time-to-death or censoring)
`d.vec`	A vector of censoring indicators, 1=death, 0=censoring
`X.mat`	An (n*p) matrix of covariates, where n is the sample size and p is the number of covariates
`K`	The number of cross-validation folds
`G`	The number of grids to optimize c-index (c-index is computed for G different values of copula parameters)

Details

If the number of covariates p is large (p>=100), the computational time becomes very long. We suggest using "uni.selection" to reduce the number such that p<100.

If the number of grids G is large, the computational time becomes very long. Please take 5<=G<=20.

Value

`beta`	The estimated regression coefficients
`SE`	The standard errors for the estimated regression coefficients
`Z`	The Z-values for testing the null hypothesis of "beta=0" (the Wald test)
`P`	The P-values for testing the null hypothesis of "beta=0" (the Wald test)
`alpha`	The estimated copula parameter by optimizing c-index
`c_index`	The optimized value of c_index

Author(s)

Takeshi Emura

References

Emura T, Chen YH (2016). Gene Selection for Survival Data Under Dependent Censoring: a Copula-based Approach, Stat Methods Med Res 25(No.6): 2840-57

Examples

### Reproduce Section 5 of Emura and Chen (2016) ###
data(Lung)
temp=Lung[,"train"]==TRUE
t.vec=Lung[temp,"t.vec"]
d.vec=Lung[temp,"d.vec"]
X.mat=as.matrix(Lung[temp,-c(1,2,3)])
#dependCox.reg.CV(t.vec,d.vec,X.mat,G=20) # time-consuming process #

Survival data for patients with non-small-cell lung cancer.

Description

A subset of the lung cancer data (Chen et al. 2007) made available by Emura et al. (2019). The subset consists of 97 gene expressions from 125 patients with non-small-cell lung cancer. The 97 genes were selected with P-value<0.20 under univariate Cox regression analyses as previously done in Emura et al. (2012) and Emura and Chen (2016). The intensity of gene expression was transformed to an ordinal level using the quantile, i.e. if the intensity of gene expression was <=25th, >25th, >50th, or >75th percentile, it was coded as 1, 2, 3, or 4, respectively (Chen et al. 2007).
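The quartile coding described above can be sketched in base R as follows. The intensities are simulated for illustration (they are not the original microarray values), and the variable names are hypothetical.

```r
set.seed(42)
intensity <- rexp(100)  # hypothetical continuous intensities for one gene

# Cut points at the minimum, 25th, 50th, 75th percentiles and the maximum
breaks <- quantile(intensity, probs = c(0, 0.25, 0.5, 0.75, 1))

# Code 1: <=25th percentile, 2: >25th, 3: >50th, 4: >75th percentile
code <- cut(intensity, breaks = breaks, include.lowest = TRUE, labels = FALSE)
table(code)
```

With heavily tied intensities the quantile cut points can coincide, in which case `cut` stops with an error and the cut points would need to be handled separately.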

Usage

data("Lung")

Format

A data frame with 125 observations on the following 100 variables.

t.vec: survival times (time to either death or censoring) in months
d.vec: censoring indicators, 1=death, 0=censoring
train: TRUE=training set, FALSE=testing set, as defined in Chen et al. (2007)
VHL: gene expression, coded as 1, 2, 3, or 4
IHPK1: gene expression, coded as 1, 2, 3, or 4
HMMR: gene expression, coded as 1, 2, 3, or 4
CMKOR1: gene expression, coded as 1, 2, 3, or 4
PLAU: gene expression, coded as 1, 2, 3, or 4
IGF2: gene expression, coded as 1, 2, 3, or 4
FGB: gene expression, coded as 1, 2, 3, or 4
MYBL2: gene expression, coded as 1, 2, 3, or 4
ODC1: gene expression, coded as 1, 2, 3, or 4
MTHFD2: gene expression, coded as 1, 2, 3, or 4
GLIPR1: gene expression, coded as 1, 2, 3, or 4
EZH2: gene expression, coded as 1, 2, 3, or 4
HCK: gene expression, coded as 1, 2, 3, or 4
CCNC: gene expression, coded as 1, 2, 3, or 4
XRCC1: gene expression, coded as 1, 2, 3, or 4
CYP1B1: gene expression, coded as 1, 2, 3, or 4
CDC25A: gene expression, coded as 1, 2, 3, or 4
CD44: gene expression, coded as 1, 2, 3, or 4
LCK: gene expression, coded as 1, 2, 3, or 4
MTHFS: gene expression, coded as 1, 2, 3, or 4
PON3: gene expression, coded as 1, 2, 3, or 4
PTPN6: gene expression, coded as 1, 2, 3, or 4
KIDINS220: gene expression, coded as 1, 2, 3, or 4
KLHL22: gene expression, coded as 1, 2, 3, or 4
RBBP6: gene expression, coded as 1, 2, 3, or 4
GABARAPL2: gene expression, coded as 1, 2, 3, or 4
SEH1L: gene expression, coded as 1, 2, 3, or 4
CITED2: gene expression, coded as 1, 2, 3, or 4
BARD1: gene expression, coded as 1, 2, 3, or 4
TLX1: gene expression, coded as 1, 2, 3, or 4
CRMP1: gene expression, coded as 1, 2, 3, or 4
CTNNA1: gene expression, coded as 1, 2, 3, or 4
ANXA5: gene expression, coded as 1, 2, 3, or 4
PTGS2: gene expression, coded as 1, 2, 3, or 4
SMC4L1: gene expression, coded as 1, 2, 3, or 4
LOC285086: gene expression, coded as 1, 2, 3, or 4
ATP11B: gene expression, coded as 1, 2, 3, or 4
CDK10: gene expression, coded as 1, 2, 3, or 4
IRF4: gene expression, coded as 1, 2, 3, or 4
MYH11: gene expression, coded as 1, 2, 3, or 4
ME3: gene expression, coded as 1, 2, 3, or 4
CCT6A: gene expression, coded as 1, 2, 3, or 4
SNCG: gene expression, coded as 1, 2, 3, or 4
MAK3: gene expression, coded as 1, 2, 3, or 4
VCPIP1: gene expression, coded as 1, 2, 3, or 4
JMJD1A: gene expression, coded as 1, 2, 3, or 4
STAT2: gene expression, coded as 1, 2, 3, or 4
DDX6: gene expression, coded as 1, 2, 3, or 4
ERBB3: gene expression, coded as 1, 2, 3, or 4
PAX2: gene expression, coded as 1, 2, 3, or 4
PCTK2: gene expression, coded as 1, 2, 3, or 4
NF1: gene expression, coded as 1, 2, 3, or 4
DLG2: gene expression, coded as 1, 2, 3, or 4
JMJD1A.1: gene expression, coded as 1, 2, 3, or 4
SUCLA2: gene expression, coded as 1, 2, 3, or 4
MMP16: gene expression, coded as 1, 2, 3, or 4
AP3B2: gene expression, coded as 1, 2, 3, or 4
HGF: gene expression, coded as 1, 2, 3, or 4
MAP2K3: gene expression, coded as 1, 2, 3, or 4
CPEB4: gene expression, coded as 1, 2, 3, or 4
ZNF264: gene expression, coded as 1, 2, 3, or 4
AXL: gene expression, coded as 1, 2, 3, or 4
CDC23: gene expression, coded as 1, 2, 3, or 4
MAST3: gene expression, coded as 1, 2, 3, or 4
COX11: gene expression, coded as 1, 2, 3, or 4
PRKAG2: gene expression, coded as 1, 2, 3, or 4
MAN1B1: gene expression, coded as 1, 2, 3, or 4
F8: gene expression, coded as 1, 2, 3, or 4
RSU1: gene expression, coded as 1, 2, 3, or 4
MMD: gene expression, coded as 1, 2, 3, or 4
AK5: gene expression, coded as 1, 2, 3, or 4
IDS: gene expression, coded as 1, 2, 3, or 4
BNIP1: gene expression, coded as 1, 2, 3, or 4
ENG: gene expression, coded as 1, 2, 3, or 4
PCDHGC3: gene expression, coded as 1, 2, 3, or 4
RALY: gene expression, coded as 1, 2, 3, or 4
WDR33: gene expression, coded as 1, 2, 3, or 4
RNF4: gene expression, coded as 1, 2, 3, or 4
PRDX1: gene expression, coded as 1, 2, 3, or 4
FXN: gene expression, coded as 1, 2, 3, or 4
PTPRU: gene expression, coded as 1, 2, 3, or 4
FRAP1: gene expression, coded as 1, 2, 3, or 4
MMP7: gene expression, coded as 1, 2, 3, or 4
CST3: gene expression, coded as 1, 2, 3, or 4
TIMP2: gene expression, coded as 1, 2, 3, or 4
TAL1: gene expression, coded as 1, 2, 3, or 4
STAT1: gene expression, coded as 1, 2, 3, or 4
CCND1: gene expression, coded as 1, 2, 3, or 4
DUSP6: gene expression, coded as 1, 2, 3, or 4
SNRPF: gene expression, coded as 1, 2, 3, or 4
MMP13: gene expression, coded as 1, 2, 3, or 4
NR2F6: gene expression, coded as 1, 2, 3, or 4
HOXA1: gene expression, coded as 1, 2, 3, or 4
RIPK1: gene expression, coded as 1, 2, 3, or 4
IL7R: gene expression, coded as 1, 2, 3, or 4
SEC13L1: gene expression, coded as 1, 2, 3, or 4
RPL5: gene expression, coded as 1, 2, 3, or 4

Details

Survival data consisting of 125 patients.

Source

Chen HY, Yu SL, Chen CH, et al (2007). A Five-gene Signature and Clinical Outcome in Non-small-cell Lung Cancer, N Engl J Med 356: 11-20.

Emura T, Matsui S, Chen HY (2019). compound.Cox: Univariate Feature Selection and Compound Covariate for Predicting Survival, Computer Methods and Programs in Biomedicine 168: 21-37.

References

Chen HY, Yu SL, Chen CH, et al (2007). A Five-gene Signature and Clinical Outcome in Non-small-cell Lung Cancer, N Engl J Med 356: 11-20.

Emura T, Matsui S, Chen HY (2019). compound.Cox: Univariate Feature Selection and Compound Covariate for Predicting Survival, Computer Methods and Programs in Biomedicine 168: 21-37.

Emura T, Chen YH (2016). Gene Selection for Survival Data Under Dependent Censoring: a Copula-based Approach, Stat Methods Med Res 25(No.6): 2840-57

Examples

data(Lung)
Lung[1:3,] ## show the first 3 samples ## 

## The five-gene signature in Chen et al. (2007) ##
temp=Lung[,"train"]==TRUE
t.vec=Lung[temp,"t.vec"]
d.vec=Lung[temp,"d.vec"]
coxph(Surv(t.vec,d.vec)~Lung[temp,"ERBB3"])
coxph(Surv(t.vec,d.vec)~Lung[temp,"LCK"])
coxph(Surv(t.vec,d.vec)~Lung[temp,"DUSP6"])
coxph(Surv(t.vec,d.vec)~Lung[temp,"STAT1"])
coxph(Surv(t.vec,d.vec)~Lung[temp,"MMD"])

Primary biliary cirrhosis (PBC) of the liver data

Description

A subset of primary biliary cirrhosis (PBC) of the liver data in the book "Counting Process & Survival Analysis" by Fleming & Harrington (1991). This subset is used in Tibshirani (1997).

Usage

data(PBC)

Format

A data frame with 276 observations on the following 19 variables.

T: Survival times (either time to death or censoring) in days
d: Censoring indicator, 1=death, 0=censoring
trt: Treatment indicator, 1=treatment by D-penicillamine, 0=placebo
age: Age in years (days divided by 365.25)
sex: Sex, 0=male, 1=female
asc: Presence of ascites, 0=no, 1=yes
hep: Presence of hepatomegaly, 0=no, 1=yes
spi: Presence of spiders, 0=no, 1=yes
ede: Presence of edema, 0=no edema, 0.5=edema resolved by therapy, 1=edema not resolved by therapy
bil: log(bilirubin, mg/dl)
cho: log(cholesterol, mg/dl)
alb: log(albumin, gm/dl)
cop: log(urine copper, mg/day)
alk: log(alkaline phosphatase, U/liter)
SGO: log(SGOT, in U/ml)
tri: log(triglycerides, in mg/dl)
pla: log(platelet count, [the number of platelets per-cubic-milliliter of blood]/1000)
pro: log(prothrombin time, in seconds)
gra: Histologic stage of disease, graded 1, 2, 3, or 4

Details

Survival data consisting of 276 patients with 17 covariates. Among them, 111 patients died (d=1) while the others were censored (d=0). The covariates consist of a treatment indicator (trt), age, sex, 5 categorical variables (ascites, hepatomegaly, spiders, edema, and stage of disease) and 9 log-transformed continuous variables (bilirubin, cholesterol, albumin, urine copper, alkaline phosphatase, SGOT, triglycerides, platelet count, and prothrombin time).

Source

Fleming & Harrington (1991); Tibshirani (1997)

References

Tibshirani R (1997), The Lasso method for variable selection in the Cox model, Statistics in Medicine, 385-395.

Examples

data(PBC)
PBC[1:5,] ### profiles for the first 5 patients ###
# See also Appendix D.1 of Fleming & Harrington, Counting Process & Survival Analysis (1991) # 

Factorial survival analysis under dependent censoring

Description

Perform factorial survival analysis under dependent censoring under an assumed copula (Emura et al. 2023-).

Usage

surv.factorial(t.vec,d.vec,group,copula,alpha,R=1000,t.upper=min(tapply(t.vec,group,max)),
C=NULL,S.plot=TRUE,mark.time=FALSE)

Arguments

`t.vec`	Vector of survival times (time to either death or censoring)
`d.vec`	Vector of censoring indicators, 1=death, 0=censoring
`group`	Vector of group indicators, 1, 2, ..., d
`copula`	Copula function: "CG.Clayton","CG.Gumbel" or "CG.Frank"
`alpha`	Copula parameter
`R`	The number of Monte Carlo simulations to find the critical value of the F-test
`t.upper`	Follow-up end (default is the smallest of the group-wise maximum observed times, min(tapply(t.vec,group,max)))
`C`	Contrast matrix
`S.plot`	If TRUE, the survival curve is displayed
`mark.time`	If TRUE, then curves are marked at each censoring time
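The default for `t.upper` shown in the Usage line can be traced with a small base R sketch. The times and group labels below are hypothetical toy values.

```r
# Hypothetical follow-up times and group labels (three groups)
t.vec <- c(2, 5, 9, 3, 7, 4, 12, 6)
group <- c(1, 1, 1, 2, 2, 3, 3, 3)

# Largest observed time within each group
group.max <- tapply(t.vec, group, max)

# Default follow-up end: the smallest of the group-wise maxima, so that
# every group still has observations up to this time
t.upper <- min(group.max)
t.upper
```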

Details

Estimates of treatment effects and the test results are shown.

Value

`copula.parameter`	Copula parameter
`p`	Estimates of treatment effects
`Var`	Variance estimates
`F`	F-statistic
`c.simu`	Critical value via the simulation method
`c.anal`	Critical value via the analytical method
`P.value`	P-value of the F-test

Author(s)

Takeshi Emura

References

Emura T, Ditzhaus M, Dobler D (2023-), Factorial survival analysis for treatment effects under dependent censoring, in preparation.

Emura T, Matsui S, Chen HY (2019). compound.Cox: Univariate Feature Selection and Compound Covariate for Predicting Survival, Computer Methods and Programs in Biomedicine 168: 21-37.

Emura T, Chen YH (2018). Analysis of Survival Data with Dependent Censoring, Copula-Based Approaches, JSS Research Series in Statistics, Springer, Singapore.

Rivest LP, Wells MT (2001). A Martingale Approach to the Copula-graphic Estimator for the Survival Function under Dependent Censoring, J Multivar Anal; 79: 138-55.

Examples

## to be written ##

Univariate Cox score test

Description

Univariate significance analyses via the score tests (Witten & Tibshirani 2010; Emura et al. 2019) based on association between individual features and survival.

Usage

uni.score(t.vec, d.vec, X.mat, d0=0)

Arguments

`t.vec`	Vector of survival times (time to either death or censoring)
`d.vec`	Vector of censoring indicators, 1=death, 0=censoring
`X.mat`	n by p matrix of covariates, where n is the sample size and p is the number of covariates
`d0`	A non-negative constant to stabilize the variance (Witten & Tibshirani 2010); the default is 0

Details

score test

Value

`beta`	Estimated regression coefficients (one-step estimator)
`Z`	Z-value for testing H_0: beta=0 (score test)
`P`	P-value for testing H_0: beta=0 (score test)
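For a single feature, the score test and the one-step estimator can be sketched as below. This is a minimal illustration on simulated data without tied event times; it omits the stabilizing constant `d0`, and it is not claimed to be the package's internal implementation. The `survival` package is used only to simulate-and-check style workflows; the computation itself is base R.

```r
library(survival)

set.seed(123)
n <- 60
x <- rnorm(n)                               # one hypothetical feature
t.true <- rexp(n, rate = exp(0.7 * x))      # latent death times
c.vec  <- rexp(n, rate = 0.3)               # latent censoring times
t.vec  <- pmin(t.true, c.vec)
d.vec  <- as.numeric(t.true <= c.vec)

U <- 0     # score function of the partial likelihood at beta = 0
info <- 0  # observed information at beta = 0
for (i in which(d.vec == 1)) {
  risk  <- t.vec >= t.vec[i]                # risk set at the i-th death time
  x.bar <- mean(x[risk])
  U     <- U + (x[i] - x.bar)
  info  <- info + mean((x[risk] - x.bar)^2)
}

beta.onestep <- U / info          # one Newton-Raphson step from beta = 0
Z <- U / sqrt(info)               # score Z-value
P <- 2 * pnorm(-abs(Z))           # two-sided P-value
c(beta = beta.onestep, Z = Z, P = P)
```

With no tied event times, Z^2 agrees with the score test statistic reported by `coxph(Surv(t.vec, d.vec) ~ x)`.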

Author(s)

Takeshi Emura and Shigeyuki Matsui

References

Emura T, Matsui S, Chen HY (2019). compound.Cox: Univariate Feature Selection and Compound Covariate for Predicting Survival, Computer Methods and Programs in Biomedicine 168: 21-37.

Witten DM, Tibshirani R (2010) Survival analysis with high-dimensional covariates. Stat Method Med Res 19:29-51

Examples

data(Lung)
t.vec=Lung$t.vec[Lung$train==TRUE]
d.vec=Lung$d.vec[Lung$train==TRUE]
X.mat=Lung[Lung$train==TRUE,-c(1,2,3)]
uni.score(t.vec, d.vec, X.mat)

Univariate feature selection based on univariate significance tests

Description

This function performs univariate feature selection using significance tests (Wald tests or score tests) based on association between individual features and survival. Features are selected if their P-values are less than a given threshold (P.value).

Usage

uni.selection(t.vec, d.vec, X.mat, P.value=0.001,K=10,score=TRUE,d0=0,
                       randomize=FALSE,CC.plot=FALSE,permutation=FALSE,M=200)

Arguments

`t.vec`	Vector of survival times (time to either death or censoring)
`d.vec`	Vector of censoring indicators (1=death, 0=censoring)
`X.mat`	n by p matrix of covariates, where n is the sample size and p is the number of covariates
`P.value`	A threshold for selecting features
`K`	The number of cross-validation folds
`score`	If TRUE, the score tests are used; if not, the Wald tests are used
`d0`	A non-negative constant to stabilize the variance of score statistics (Witten & Tibshirani 2010); the default is 0
`randomize`	If TRUE, randomize patient ID's before cross-validation
`CC.plot`	If TRUE, the compound covariate (CC) predictors are plotted
`permutation`	If TRUE, the FDR is computed by a permutation method (Witten & Tibshirani 2010; Emura et al. 2019).
`M`	The number of permutations to calculate the FDR
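How `K` and `randomize` might interact can be pictured with a generic K-fold allocation in base R. This is only a sketch of one common scheme; the package's internal allocation of patients to folds may differ.

```r
set.seed(1)
n <- 23   # hypothetical sample size
K <- 5    # number of cross-validation folds

# Sequential allocation: patients are assigned to folds in their given order
fold <- rep(1:K, length.out = n)

# Randomized allocation: same fold sizes, but patient order is shuffled first
fold.random <- sample(fold)

table(fold)
```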

Details

The cross-validated likelihood (CVL) value is computed for the selected features (Matsui 2006; Emura et al. 2019). A higher CVL value corresponds to better predictive ability of the selected features, so the CVL value can be used to find the optimal set of features. The CVL value is computed by K-fold cross-validation, where the number K can be chosen by the user. The false discovery rate (FDR) is also computed by a formula and, if "permutation=TRUE", by a permutation test. RCVL1 and RCVL2 are "re-substitution" CVL values and provide upper control limits for the CVL value. If the CVL value is less than both RCVL1 and RCVL2, the CVL value is considered in-control. On the other hand, if the CVL value exceeds either RCVL1 or RCVL2, the CVL may be computed again after changing the sample allocation.

Value

`gene`	Gene symbols
`beta`	Estimated regression coefficients
`Z`	Z-values for significance tests
`P`	P-values for significance tests
`CVL`	The value of CVL, RCVL1, and RCVL2 (Emura et al. 2019)
`Genes`	The number of genes, the number of selected genes, and the number of falsely selected genes
`FDR`	False discovery rate (by a formula or a permutation method)
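A formula-based FDR estimate can be sketched as follows. The rule used here is an assumption for illustration: the expected number of false selections under the global null (the number of features times the threshold) divided by the number of selected features. It may not match the formula used inside `uni.selection`, and the P-values are hypothetical.

```r
# Hypothetical univariate P-values for p = 10 features
P <- c(0.0004, 0.03, 0.2, 0.0009, 0.6, 0.01, 0.8, 0.0001, 0.45, 0.07)
P.value <- 0.001                    # selection threshold

selected   <- P < P.value           # features passing the threshold
n.selected <- sum(selected)

# Assumed formula: expected false selections / number of selections
n.false <- length(P) * P.value
FDR     <- n.false / n.selected

c(genes = length(P), selected = n.selected, false = n.false, FDR = FDR)
```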

Author(s)

Takeshi Emura

References

Emura T, Matsui S, Chen HY (2019). compound.Cox: Univariate Feature Selection and Compound Covariate for Predicting Survival, Computer Methods and Programs in Biomedicine 168: 21-37.

Matsui S (2006). Predicting Survival Outcomes Using Subsets of Significant Genes in Prognostic Marker Studies with Microarrays. BMC Bioinformatics: 7:156.

Witten DM, Tibshirani R (2010) Survival analysis with high-dimensional covariates. Stat Method Med Res 19:29-51

Examples

data(Lung)
t.vec=Lung$t.vec[Lung$train==TRUE]
d.vec=Lung$d.vec[Lung$train==TRUE]
X.mat=Lung[Lung$train==TRUE,-c(1,2,3)]
uni.selection(t.vec, d.vec, X.mat, P.value=0.05,K=5,score=FALSE)
## the outputs reproduce Table 3 of Emura and Chen (2016) ## 

Univariate Cox Wald test

Description

Univariate significance analyses via the Wald tests (Witten & Tibshirani 2010; Emura et al. 2019) based on association between individual features and survival.

Usage

uni.Wald(t.vec, d.vec, X.mat)

Arguments

`t.vec`	Vector of survival times (time to either death or censoring)
`d.vec`	Vector of censoring indicators, 1=death, 0=censoring
`X.mat`	n by p matrix of covariates, where n is the sample size and p is the number of covariates

Details

Wald test

Value

`beta`	Estimated regression coefficients
`Z`	Z-value for testing H_0: beta=0 (Wald test)
`P`	P-value for testing H_0: beta=0 (Wald test)
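The feature-by-feature Wald analysis can be sketched with `coxph` from the `survival` package, fitting one Cox model per column. The data are simulated and the object names are hypothetical; this is an illustration of the idea rather than the code behind `uni.Wald`.

```r
library(survival)

set.seed(2024)
n <- 80; p <- 5
X.mat <- matrix(rnorm(n * p), n, p,
                dimnames = list(NULL, paste0("gene", 1:p)))
t.true <- rexp(n, rate = exp(0.8 * X.mat[, 1]))  # only gene1 affects survival
c.vec  <- rexp(n, rate = 0.3)
t.vec  <- pmin(t.true, c.vec)
d.vec  <- as.numeric(t.true <= c.vec)

# One univariate Cox fit per feature; collect beta, Wald Z and P
res <- t(apply(X.mat, 2, function(x) {
  fit <- coxph(Surv(t.vec, d.vec) ~ x)
  b   <- unname(coef(fit))
  se  <- sqrt(fit$var[1, 1])
  z   <- b / se
  c(beta = b, Z = z, P = 2 * pnorm(-abs(z)))
}))
round(res, 4)
```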

Author(s)

Takeshi Emura

References

Emura T, Matsui S, Chen HY (2019). compound.Cox: Univariate Feature Selection and Compound Covariate for Predicting Survival, Computer Methods and Programs in Biomedicine 168: 21-37.

Witten DM, Tibshirani R (2010) Survival analysis with high-dimensional covariates. Stat Method Med Res 19:29-51

Examples

data(Lung)
t.vec=Lung$t.vec[Lung$train==TRUE]
d.vec=Lung$d.vec[Lung$train==TRUE]
X.mat=Lung[Lung$train==TRUE,-c(1,2,3)]
uni.Wald(t.vec, d.vec, X.mat)

Generate a matrix of gene expressions in the presence of gene pathways

Description

Generate a matrix of gene expressions in the presence of gene pathways (Scenario 2 of Emura et al. (2012)).

Usage

X.pathway(n,p,q1,q2,rho1=0.5,rho2=0.5)

Arguments

`n`	the number of individuals (sample size)
`p`	the number of genes
`q1`	the number of genes in the first pathway
`q2`	the number of genes in the second pathway
`rho1`	the correlation coefficient for the first pathway
`rho2`	the correlation coefficient for the second pathway

Details

An n by p matrix of gene expressions is generated. Correlation between columns (genes) is introduced to reflect the presence of gene pathways. The distribution of each column (gene) is standardized to have mean=0 and SD=1. Two blocks of correlated genes (i.e., two gene pathways) are generated, where the first pathway includes "q1" genes and the second pathway includes "q2" genes. The last p-q1-q2 genes are independent genes. By default, two genes in the same pathway have correlation 0.5 (this can be changed by the options "rho1" and "rho2"). See p.4 of Emura et al. (2012) for details. This data generation scheme was used in the simulations of Emura et al. (2012), Emura and Chen (2016) and Emura et al. (2017).
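One way to obtain a block of equally correlated, standardized columns is a shared-factor construction, sketched below in base R. This construction is an assumption chosen for illustration; it is not taken from the package and `X.pathway` may generate its blocks differently.

```r
set.seed(10)
n   <- 20000   # large n so that empirical correlations are stable
q   <- 3       # hypothetical number of genes in one pathway
rho <- 0.5     # target within-pathway correlation

Z0 <- rnorm(n)                       # factor shared by all genes in the block
E  <- matrix(rnorm(n * q), n, q)     # gene-specific noise

# Each column has mean 0, variance rho + (1 - rho) = 1,
# and pairwise correlation rho with the other columns in the block
X.block <- sqrt(rho) * Z0 + sqrt(1 - rho) * E

# Append two independent genes outside the pathway
X <- cbind(X.block, matrix(rnorm(n * 2), n, 2))
round(cor(X), 1)
```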

Value

`X`	n by p matrix of gene expressions

Author(s)

Takeshi Emura & Yi-Hau Chen

References

Emura T, Chen YH, Chen HY (2012). Survival Prediction Based on Compound Covariate under Cox Proportional Hazard Models. PLoS ONE 7(10): e47627. doi:10.1371/journal.pone.0047627

Emura T, Chen YH (2016). Gene Selection for Survival Data Under Dependent Censoring: a Copula-based Approach, Stat Methods Med Res 25(No.6): 2840-57

Emura T, Nakatochi M, Matsui S, Michimae H, Rondeau V (2017) Personalized dynamic prediction of death according to tumour progression and high-dimensional genetic factors: meta-analysis with a joint model, Stat Methods Med Res, doi:10.1177/0962280216688032

Examples

## generate 6 gene expressions from 10 individuals
X.pathway(n=10,p=6,q1=2,q2=2)

## generate 100 gene expressions from 200 individuals and check the mean and SD
X.mat=X.pathway(n=200,p=100,q1=10,q2=10)
round( colMeans(X.mat),3 ) ## mean ~ 0 ##
round( apply(X.mat, MARGIN=2, FUN=sd),3) ## SD ~ 1 ##

## Change correlation coefficients by option "rho1" and "rho2" ##
X.mat=X.pathway(n=10000,p=6,q1=2,q2=2,rho1=0.2,rho2=0.8)
round( colMeans(X.mat),2 ) ## mean ~ 0 ##
round( apply(X.mat, MARGIN=2, FUN=sd),2) ## SD ~ 1 ##
round( cor(X.mat),1) ## Correlation matrix ##

Generate a matrix of gene expressions in the presence of tag genes

Description

Generate a matrix of gene expressions in the presence of tag genes (Scenario 1 of Emura et al. (2012)).

Usage

X.tag(n, p, q, s = 1)

Arguments

`n`	the number of individuals (sample size)
`p`	the number of genes
`q`	the number of non-null genes
`s`	the number of null genes correlated with a non-null gene (tag)

Details

An n by p matrix of gene expressions is generated. Correlation between columns is introduced to reflect the presence of tag genes. The distribution of each column is standardized to have mean=0 and SD=1. If two genes are correlated, the correlation is 0.5. Otherwise, the correlation is 0. See p.4 of Emura et al. (2012) for details. This data generation scheme was used in the simulations of Emura et al. (2012) and Emura and Chen (2016).

Value

`X`	n by p matrix of gene expressions

Author(s)

Takeshi Emura & Yi-Hau Chen

References

Emura T, Chen YH, Chen HY (2012). Survival Prediction Based on Compound Covariate under Cox Proportional Hazard Models. PLoS ONE 7(10): e47627. doi:10.1371/journal.pone.0047627

Emura T, Chen YH (2016). Gene Selection for Survival Data Under Dependent Censoring: a Copula-based Approach, Stat Methods Med Res 25(No.6): 2840-57.

Examples

X.mat=X.tag(n=200,p=100,q=10,s=4)
round( colMeans(X.mat),3 ) ## mean ~ 0 ##
round( apply(X.mat, MARGIN=2, FUN=sd),3) ## SD ~ 1 ##

Package 'compound.Cox'

Help Index

Univariate Feature Selection and Compound Covariate for Predicting Survival

Copula-graphic estimator under the Clayton copula.

Copula-graphic estimator under the Frank copula.

Copula-graphic estimator under the Gumbel copula.

Testing survival difference of two groups via the CG estimators

Cross-validated c-index for measuring the predictive accuracy of a prognostic index under a copula-based dependent censoring model.

Compound shrinkage estimation under the Cox model

Univariate Cox regression under dependent censoring.

Cox regression under dependent censoring.

Survival data for patients with non-small-cell lung cancer.