Title: | Survival Analysis of Time Varying Coefficients Using a Tree-Based Approach |
---|---|
Description: | Estimates time varying regression effects under Cox type models in survival data using classification and regression tree. The codes in this package were originally written in S-Plus for the paper "Survival Analysis with Time-Varying Regression Effects Using a Tree-Based Approach," by Xu, R. and Adak, S. (2002) <doi:10.1111/j.0006-341X.2002.00305.x>, Biometrics, 58: 305-315. Development of this package was supported by NIH grants AG053983 and AG057707, and by the UCSD Altman Translational Research Institute, NIH grant UL1TR001442. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH. The example data are from the Honolulu Heart Program/Honolulu Asia Aging Study (HHP/HAAS). |
Authors: | Sudeshna Adak [aut], Ronghui Xu [aut], Euyhyun Lee [trl, cre], Steven Edland [ctb], Lon White [ctb] |
Maintainer: | Euyhyun Lee <[email protected]> |
License: | GPL-2 |
Version: | 0.3.1 |
Built: | 2024-11-03 06:24:11 UTC |
Source: | CRAN |
This data set contains subjects' age at the time of death, and alcohol drinking habits. The data set includes 7990 subjects and 7610 events.
data('alcohol')
data('alcohol')
time: Subject's age at death (possibly right censored)
event:
Outcome indicator.
1 = death
0 = censored
alc
0 = no alcohol consumption
1 = moderate alcohol consumption
4 = excessive alcohol consumption
Data is from the Honolulu Heart Program/Honolulu Asia Aging Study (HHP/HAAS). The HHP/HAAS was reviewed and approved by the Kuakini Hospital IRB, Kuakini Hospital, Honolulu, HI.
This function is used to obtain the bias-corrected cost. One may select the final subtree with the lowest bootstrap estimated cost, with or without the additional AIC/BIC as in Xu and Adak (2002).
bootstrap(B = 20, nodetree, subtrees, survtime, survstatus, x, D = 4, minfail = 30, alphac = 2)
bootstrap(B = 20, nodetree, subtrees, survtime, survstatus, x, D = 4, minfail = 30, alphac = 2)
B |
Number of bootstrap samples. Default is 20. |
nodetree |
Full grown tree with original data. Output from |
subtrees |
Pruned subtrees with original data. Output from |
survtime |
survival time/follow up time of subjects |
survstatus |
survival status of subjects. |
x |
a data frame of covariates. In case of single covariate, use [,,drop =FALSE] to keep the data frame structure |
D |
maximum depth the tree will grow. Default depth is 4. |
minfail |
minimum number of unique event required in each block. Default is 10 |
alphac |
Predetermined penalty parameter |
The implemented cost here is the negative log partial likelihood. Each bootstrap sample is used to grow a full tree and then pruned to obtain the set of subtrees. The bias is estimated by the average of the differences between the cost of a bootstrapped subtree itself and the cost of sending the original data down the bootstrapped subtree. The bias-corrected cost is then obtained by subtracting this bias from the original cost. Predetermined penalty parameter can be used to account for the dimension of covariates, via Akaike information criteria (AIC), Schwarz Bayesian information criteria (BIC), or the 0.95 quantile of the chi-square distribution.
bcoef |
coefficient values from each bootstrap sample |
btree |
Tree related information from each bootstrapped sample. Types of information are the same as the ones from |
bomega |
Bias at each subtree for each bootstrapped data, the average of which gives the overall bootstrap estimated bias |
bootcost |
cost based on the bootstrapped data |
ori.boot |
negative log partial likelihood of the original data fitted to the model given by bootstrapped data |
Xu, R. and Adak, S. (2002), Survival Analysis with Time-Varying Regression Effects Using a Tree-Based Approach. Biometrics, 58: 305-315.
## Not run: data('alcohol') require(survival) coxtree <- coxph.tree(alcohol[,'time'], alcohol[,'event'], x = alcohol[,'alc', drop = FALSE], D = 4) nodetree <- output.coxphout(coxtree) subtrees <- prune(nodetree) #This function requires output from output.coxphout, prune, and the original data set. store.mult.cont <- bootstrap(B=20, nodetree, subtrees, alcohol[,'time'], alcohol[,'event'], x = alcohol[,'alc', drop = FALSE], D=4,minfail=20, alphac=2) ## End(Not run)
## Not run: data('alcohol') require(survival) coxtree <- coxph.tree(alcohol[,'time'], alcohol[,'event'], x = alcohol[,'alc', drop = FALSE], D = 4) nodetree <- output.coxphout(coxtree) subtrees <- prune(nodetree) #This function requires output from output.coxphout, prune, and the original data set. store.mult.cont <- bootstrap(B=20, nodetree, subtrees, alcohol[,'time'], alcohol[,'event'], x = alcohol[,'alc', drop = FALSE], D=4,minfail=20, alphac=2) ## End(Not run)
This funtion finds the optimal cutpoints for the time-varying regression effects and grows the 'full tree' using the score statistic.
coxph.tree(survtime, survstatus, x, D = 3, method = "breslow", minfail = 10, iter.max = 20, eps = 1e-06, type = 'mod')
coxph.tree(survtime, survstatus, x, D = 3, method = "breslow", minfail = 10, iter.max = 20, eps = 1e-06, type = 'mod')
survtime |
survival time/ follow up time of subjects |
survstatus |
survival status of subjects. 0 for censored and 1 for event |
x |
a data frame of covariates. In case of single covariate, use |
D |
maximum depth the tree will grow. Default depth is 3. |
method |
argument for coxph function. Default is 'breslow'. See |
minfail |
minimum number of unique events required in each block. Default is 10 |
iter.max |
the maximum number of iteration in coxph; default is 20. See |
eps |
argument for coxph function; default is 0.000001. See |
type |
method to calculate the score statistic. Two options are available: 'mod' for the modified score statistic and 'ori' for the original score statistic. Default value is 'mod.' Modified score statistic is used in the bootstrap part |
coxph.tree
takes in survival time, survival status, and covariates to grow the full tree.
It follows one of the stopping rules: 1) when the pre-specified depth is reached, or 2) the number of events in a node is less than a prespecified number, or 3) the maximized score statistic is less than a default value (0.0001).
Currently, data need to be arranged in descending order of time and with no missing.
coxph.tree
returns an object of class 'coxphtree.'
The function output.coxphout
is used to obtain and print a summary of the result.
An object of class 'coxphtree' is a list containing the following components:
D |
Depth value specified in the argument |
coef |
coefficient values of predictors. First number represents depth and second number represents block number |
lkl |
Likelihood ratio value of each node |
breakpt |
Starting point of each node. Starting point of node at Depth= 0 to maximum Depth = D+1 is shown. |
ntree |
Number of cases in each node |
nevent |
Number of events in each node |
nblocks |
Number of blocks in each depth |
nodes |
Indicator that indicates whether the block was eligible for further split |
nodetree |
A table with depth, block, node, left right, maximum score, start time, end time, # of cases, and # of events |
scoretest |
Maximum score at each block |
xnames |
Name of predictors |
failtime |
The time when events occurred without duplicates |
summary |
|
pvalue |
p-value to test validity of a change point against none |
Xu, R. and Adak, S. (2002), Survival Analysis with Time-Varying Regression Effects Using a Tree-Based Approach. Biometrics, 58: 305-315.
##Call in alcohol data set data('alcohol') require(survival) coxtree <- coxph.tree(alcohol[,'time'], alcohol[,'event'], x = alcohol[,'alc', drop = FALSE], D = 4) nodetree <- output.coxphout(coxtree) subtrees <- prune(nodetree)
##Call in alcohol data set data('alcohol') require(survival) coxtree <- coxph.tree(alcohol[,'time'], alcohol[,'event'], x = alcohol[,'alc', drop = FALSE], D = 4) nodetree <- output.coxphout(coxtree) subtrees <- prune(nodetree)
elbow.tree
is like final.tree
, but instead of using the minimum cost it uses the 'elbow' of the costs. It is similar to the elbow AIC or BIC approaches in the literature.
elbow.tree(nodetree=nodetree, subtrees=subtrees, omega, alphac=2)
elbow.tree(nodetree=nodetree, subtrees=subtrees, omega, alphac=2)
nodetree |
Fully grown tree from the original data. Output from |
subtrees |
Pruned subtrees from the original data. Output from |
omega |
Bias (i.e. third index of the output) from |
alphac |
Predetermined penalty parameter |
One can take the output (table) generated by this function and plot the (penalized) bias-corrected cost of each subtrees, then (visually) identify the 'elbow' as the selected subtree.
subtree |
output from |
cost.p |
This column contains the (penalized) bias-corrected cost of each subtree |
## Not run: data('alcohol') require(survival) coxtree <- coxph.tree(alcohol[,'time'], alcohol[,'event'], x = alcohol[,'alc', drop = FALSE], D = 4) nodetree <- output.coxphout(coxtree) subtrees <- prune(nodetree) store.mult.cont <- bootstrap(B=20, nodetree, subtrees, alcohol[,'time'], alcohol[,'event'], x = alcohol[,'alc', drop = FALSE], D=4,minfail=20, alphac=2) Balph <- 0.5 * 2 * log(nrow(alcohol)) elbow.tree <- elbow.tree(nodetree, subtrees, store.mult.cont[[3]], alphac= Balph) ## End(Not run)
## Not run: data('alcohol') require(survival) coxtree <- coxph.tree(alcohol[,'time'], alcohol[,'event'], x = alcohol[,'alc', drop = FALSE], D = 4) nodetree <- output.coxphout(coxtree) subtrees <- prune(nodetree) store.mult.cont <- bootstrap(B=20, nodetree, subtrees, alcohol[,'time'], alcohol[,'event'], x = alcohol[,'alc', drop = FALSE], D=4,minfail=20, alphac=2) Balph <- 0.5 * 2 * log(nrow(alcohol)) elbow.tree <- elbow.tree(nodetree, subtrees, store.mult.cont[[3]], alphac= Balph) ## End(Not run)
final.tree
uses bias-corrected costs obtained from bootstrap
function and the predetermined penalty parameter to find the optimal tree from the set of subtrees.
final.tree(nodetree=nodetree, subtrees=subtrees, omega, alphac=2)
final.tree(nodetree=nodetree, subtrees=subtrees, omega, alphac=2)
nodetree |
Fully grown tree from the original data. Output from |
subtrees |
Pruned subtrees from the original data. Output from |
omega |
Bias (i.e. third index of the output) from |
alphac |
Predetermined penalty parameter |
final.tree
is part of the bootstrap
function but can be used to try different penalty parameters without re-running bootstrap
.
subtree |
output from |
final |
A tree with lowest cost value after applying predetermined penalty |
Xu, R. and Adak, S. (2002), Survival Analysis with Time-Varying Regression Effects Using a Tree-Based Approach. Biometrics, 58: 305-315.
## Not run: data('alcohol') require(survival) coxtree <- coxph.tree(alcohol[,'time'], alcohol[,'event'], x = alcohol[,'alc', drop = FALSE], D = 4) nodetree <- output.coxphout(coxtree) subtrees <- prune(nodetree) store.mult.cont <- bootstrap(B=20, nodetree, subtrees, alcohol[,'time'], alcohol[,'event'], x = alcohol[,'alc', drop = FALSE], D=4,minfail=20, alphac=2) Balph <- 0.5 * 2 * log(nrow(alcohol)) final.tree <- final.tree(nodetree, subtrees, store.mult.cont[[3]], alphac= Balph) ## End(Not run)
## Not run: data('alcohol') require(survival) coxtree <- coxph.tree(alcohol[,'time'], alcohol[,'event'], x = alcohol[,'alc', drop = FALSE], D = 4) nodetree <- output.coxphout(coxtree) subtrees <- prune(nodetree) store.mult.cont <- bootstrap(B=20, nodetree, subtrees, alcohol[,'time'], alcohol[,'event'], x = alcohol[,'alc', drop = FALSE], D=4,minfail=20, alphac=2) Balph <- 0.5 * 2 * log(nrow(alcohol)) final.tree <- final.tree(nodetree, subtrees, store.mult.cont[[3]], alphac= Balph) ## End(Not run)
Function that ouputs beta coefficient estimate of each covariate at each observation time point for a given tree, which can be used to plot the time-varying coefficients.
mat.tvbeta(indx, fulltree, subtrees = NULL, survtime, survstatus, x)
mat.tvbeta(indx, fulltree, subtrees = NULL, survtime, survstatus, x)
indx |
Index number of a subtree that needs to be analyzed |
fulltree |
output of |
subtrees |
(Optional) output of |
survtime |
survival time/ follow up time of subjects |
survstatus |
survival status of subjects. 0 for alive and 1 for dead |
x |
a data frame of covariates. In case of single covariate, use [,,drop =F] to keep the data frame structure |
For each predictor, mat.tvbeta
gives the coefficient values at each observation time for a given subtree.
The function outputs a matrix that can be used to plot the time-varying coefficient estimates over time.
The number of rows in the matrix is the # of observations and the number of columns is the product of the # of covariates and the # of specified subtrees.
Xu, R. and Adak, S. (2002), Survival Analysis with Time-Varying Regression Effects Using a Tree-Based Approach. Biometrics, 58: 305-315.
#This function requires output from output.coxphout, prune, and the original data set. data('alcohol') require(survival) coxtree <- coxph.tree(alcohol[,'time'], alcohol[,'event'], x = alcohol[,'alc', drop = FALSE], D = 4) nodetree <- output.coxphout(coxtree) subtrees <- prune(nodetree) #creating matrix of beta coefficients at each event time point for all subtrees k <- nrow(subtrees) for (l in 1:k) { print(paste("Tree #",l)) coeftmp <- mat.tvbeta(l,nodetree,subtrees,alcohol[,'time'], alcohol[,'event'], x = data.frame(model.matrix(~alc, data=alcohol)[,-c(1), drop = FALSE])) if (l == 1) coef <- coeftmp if (l > 1) coef <- cbind(coef,coeftmp) } ##Creating plot of all subtrees for each predictor: p <- ncol(coef)/k #Number of variables x = data.frame(model.matrix(~alc, data=alcohol)[,-c(1), drop = FALSE]) xnames <- xname(x) xnames <- c('Alcohol 1', 'Alcohol 4') #Subsetting data coefnew <- data.frame(coef) survtime <- alcohol[,'time'] #Setting desired depth (All the subtrees) kk <- nrow(subtrees) for (j in 1:p) { matplot(survtime,coefnew[,seq(from=j,to=kk*p,by=p)],type="l",lty=1:kk,col= (1:kk)+1 ,xlab="Survival Time",ylab=" ") title(main=paste('all:', xnames[j])) legend('bottomleft', legend = paste('tree number', 1:kk), lty=1:kk,col= (1:kk)+1) } ##Creating a plot showing changes in coefficient of two predictors in full tree #creating matrix of beta coefficients at each event time point for full tree coeftmp <- mat.tvbeta(1,nodetree,subtrees,alcohol[,'time'], alcohol[,'event'], x = data.frame(model.matrix(~alc, data=alcohol)[,-c(1), drop = FALSE])) coefnew <- coeftmp matplot(survtime,coefnew,type="l",lty=1:2,col= (1:2)+1,xlab="Survival Time",ylab=" ") legend('bottomleft', legend = c("Alcohol 1", "Alcohol 4"), lty=1:2,col= (1:2)+1)
#This function requires output from output.coxphout, prune, and the original data set. data('alcohol') require(survival) coxtree <- coxph.tree(alcohol[,'time'], alcohol[,'event'], x = alcohol[,'alc', drop = FALSE], D = 4) nodetree <- output.coxphout(coxtree) subtrees <- prune(nodetree) #creating matrix of beta coefficients at each event time point for all subtrees k <- nrow(subtrees) for (l in 1:k) { print(paste("Tree #",l)) coeftmp <- mat.tvbeta(l,nodetree,subtrees,alcohol[,'time'], alcohol[,'event'], x = data.frame(model.matrix(~alc, data=alcohol)[,-c(1), drop = FALSE])) if (l == 1) coef <- coeftmp if (l > 1) coef <- cbind(coef,coeftmp) } ##Creating plot of all subtrees for each predictor: p <- ncol(coef)/k #Number of variables x = data.frame(model.matrix(~alc, data=alcohol)[,-c(1), drop = FALSE]) xnames <- xname(x) xnames <- c('Alcohol 1', 'Alcohol 4') #Subsetting data coefnew <- data.frame(coef) survtime <- alcohol[,'time'] #Setting desired depth (All the subtrees) kk <- nrow(subtrees) for (j in 1:p) { matplot(survtime,coefnew[,seq(from=j,to=kk*p,by=p)],type="l",lty=1:kk,col= (1:kk)+1 ,xlab="Survival Time",ylab=" ") title(main=paste('all:', xnames[j])) legend('bottomleft', legend = paste('tree number', 1:kk), lty=1:kk,col= (1:kk)+1) } ##Creating a plot showing changes in coefficient of two predictors in full tree #creating matrix of beta coefficients at each event time point for full tree coeftmp <- mat.tvbeta(1,nodetree,subtrees,alcohol[,'time'], alcohol[,'event'], x = data.frame(model.matrix(~alc, data=alcohol)[,-c(1), drop = FALSE])) coefnew <- coeftmp matplot(survtime,coefnew,type="l",lty=1:2,col= (1:2)+1,xlab="Survival Time",ylab=" ") legend('bottomleft', legend = c("Alcohol 1", "Alcohol 4"), lty=1:2,col= (1:2)+1)
This function finds the first optimal cutpoint for the time-varying regression effects based on the maximized score statistics and calculates p-value based on a formula from Davies (1987) and O'Quigley and Pessione (1991). This is for depth 1 only.
optimal.cutpoint(survtime, survstatus, x, method = "breslow", acpf = 10, iter.max = 20, eps = 1e-06)
optimal.cutpoint(survtime, survstatus, x, method = "breslow", acpf = 10, iter.max = 20, eps = 1e-06)
survtime |
survival time/ follow up time of subjects |
survstatus |
survival status of subjects. 0 for censored and 1 for an event |
x |
a data frame of covariates. In case of a single covariate, use |
method |
argument for coxph function. Default is 'breslow'. See |
acpf |
The search for the optimal cutpoint starts from the ((acpf/2)+1)th event until the (k - (acpf/2))th event, where k is the total number of events. Default is 10. |
iter.max |
the maximum number of iteration in coxph; default is 20. See |
eps |
argument for coxph function; default is 0.000001. See |
optimal.cutpoint
takes in survival time, survival status, and covariates to find the first optimal cutpoint.
Currently, data need to be arranged in descending order of time and with no missing.
optimal.cutpoint
returns the following information:
breakpt |
optimal cutpoint |
scoretest |
Maximum score associated with the optimal cut point |
summary |
3 output from |
pvalue |
p-value to test the existance of a change point against none |
Davies, R. (1987). Hypothesis Testing when a Nuisance Parameter is Present Only Under the Alternatives. Biometrika, 74(1), 33-43.
O'Quigley, J., and Pessione, F. (1991). The Problem of a Covariate-Time Qualitative Interaction in a Survival Study. Biometrics, 47(1), 101-115.
##Call in alcohol data set data('alcohol') require(survival) coxtree <- optimal.cutpoint(alcohol[,'time'], alcohol[,'event'], x = alcohol[,'alc', drop = FALSE])
##Call in alcohol data set data('alcohol') require(survival) coxtree <- optimal.cutpoint(alcohol[,'time'], alcohol[,'event'], x = alcohol[,'alc', drop = FALSE])
This funtion organizes coxph.tree
output into a format that can be used as an input for prune
, plot_coxphtree
, and mat.tvbeta
.
output.coxphout(coxout)
output.coxphout(coxout)
coxout |
output from |
output.coxphout
returns a table with following columns.
Depth |
Depth value specified in the argument |
Block |
Time intervals present at each depth |
Node |
Unique number assigned to each block |
Left |
Node of a block that was divided into the left side in the next depth |
Right |
Node of a block that was divided into the right side in the next depth |
Score |
Modified score statistic of each node |
lkl |
Likelihood ratio value of each node |
Start |
Starting time of the node |
End |
Ending time of the node |
# of Cases |
Number of observations in each node |
# of Events |
Number of events in each node |
Xu, R. and Adak, S. (2002), Survival Analysis with Time-Varying Regression Effects Using a Tree-Based Approach. Biometrics, 58: 305-315.
This functin uses the full tree and subtrees (optional) to create visual outputs of the tree(s) and segments.
plot_coxphtree(fulltree, subtrees = NULL, mm = 3, start = 0, pdf = FALSE, file.name)
plot_coxphtree(fulltree, subtrees = NULL, mm = 3, start = 0, pdf = FALSE, file.name)
fulltree |
output of |
subtrees |
(Optional) output of |
mm |
Number of subtrees plot to be placed in one page. Default is 3 |
start |
Sets starting point for segments. Useful if the minimum event time is far away from 0. |
pdf |
Do you want to export the plots in pdf format? Default is FALSE. When set as FALSE, all plots need to be cleared before running this function to avoid 'Plot rendering error.' |
file.name |
Name for the pdf file output. |
plot_coxphtree
takes an output from output.coxphout
and creates treeplot and barplot showing blocks at each depth.
If an output from prune
is also included in the argument, the function creates treeplot and barplot for each subtree.
In the barplot, end nodes are in dark blue color.
Xu, R. and Adak, S. (2002), Survival Analysis with Time-Varying Regression Effects Using a Tree-Based Approach. Biometrics, 58: 305-315.
#This function requires output from output.coxphout and prune(optional) data('alcohol') require(survival) coxtree <- coxph.tree(alcohol[,'time'], alcohol[,'event'], x = alcohol[,'alc', drop = FALSE], D = 4) nodetree <- output.coxphout(coxtree) subtrees <- prune(nodetree) plot_coxphtree(nodetree, subtrees, start = 70, pdf = FALSE)
#This function requires output from output.coxphout and prune(optional) data('alcohol') require(survival) coxtree <- coxph.tree(alcohol[,'time'], alcohol[,'event'], x = alcohol[,'alc', drop = FALSE], D = 4) nodetree <- output.coxphout(coxtree) subtrees <- prune(nodetree) plot_coxphtree(nodetree, subtrees, start = 70, pdf = FALSE)
This function merges over-segmented intervals to create optimally pruned subtrees.
prune(fulltree)
prune(fulltree)
fulltree |
output from |
prune
uses the CART algorithm and -log (partial likelihood) as cost to find the optimally pruned subtrees.
prune
returns a matrix with the following columns, where each row is an optimally pruned subtree:
K |
subtrees number 1, 2, etc. Tree #1 is the full tree |
N[1] |
Number of terminal nodes |
alpha |
penalty parameter corresponding to the subtree |
S[1] |
-log(partial likelihood) of the subtree |
pruneoff |
Node that was removed from the previous larger subtree to obtain the current subtree |
Xu, R. and Adak, S. (2002), Survival Analysis with Time-Varying Regression Effects Using a Tree-Based Approach. Biometrics, 58: 305-315.
##Call in alcohol data set data('alcohol') require(survival) coxtree <- coxph.tree(alcohol[,'time'], alcohol[,'event'], x = alcohol[,'alc', drop = FALSE], D = 4) nodetree <- output.coxphout(coxtree) subtrees <- prune(nodetree)
##Call in alcohol data set data('alcohol') require(survival) coxtree <- coxph.tree(alcohol[,'time'], alcohol[,'event'], x = alcohol[,'alc', drop = FALSE], D = 4) nodetree <- output.coxphout(coxtree) subtrees <- prune(nodetree)