Title: | Landmark Prediction with Multiple Short-Term Events |
---|---|
Description: | Contains functions for a flexible varying-coefficient landmark model that incorporates multiple short-term events into the prediction of long-term survival probability. For more information about landmark prediction, please see Li, W., Ning, J., Zhang, J., Li, Z., Savitz, S.I., Tahanan, A., Rahbar, M.H. (2023+). "Enhancing Long-term Survival Prediction with Multiple Short-term Events: Landmarking with A Flexible Varying Coefficient Model". |
Authors: | Wen Li [aut, cre], Qian Wang [aut] |
Maintainer: | Wen Li <[email protected]> |
License: | GPL-3 |
Version: | 0.5.0 |
Built: | 2024-11-04 06:34:14 UTC |
Source: | CRAN |
Landmark prediction with multiple short-term events
    multipredict(
      data, formula, t0, L,
      SE = FALSE, SE.gs = FALSE,
      s1_beta1 = NULL, s2_beta2 = NULL, s1s2_beta3 = NULL,
      grid1 = seq(0.01, 5, length.out = 20),
      grid2 = seq(0.01, 5, length.out = 20),
      grid3 = list(seq(0.01, 5, length.out = 20), seq(0.01, 5, length.out = 20)),
      folds.grid = 8, reps.grid = 3,
      c01 = 0.1, c02 = 0.1, c03 = 0.05,
      B = 500,
      gs.method = "loop", gs.cl = NULL, gs.seed = NULL
    )
Argument | Description |
---|---|
data | Input dataset |
formula | a |
t0 | Landmark time |
L | Length of time into the future (starting from the landmark time) for which a risk prediction is made. This is called the 'prediction horizon' in the dynamic prediction literature |
SE | Logical. TRUE to estimate the standard errors of the coefficients using the perturbation-resampling method |
SE.gs | Logical. TRUE to repeat the bandwidth grid search in each perturbation, which is expected to give more accurate results but takes longer. FALSE to reuse the bandwidth found in the point estimation for all perturbations |
s1_beta1 | A scalar or a vector. Time to the occurrence of short-term event 1 at which the regression coefficient beta1 in group 2 is estimated. If a |
s2_beta2 | A scalar or a vector. Time to the occurrence of short-term event 2 at which the regression coefficient beta2 in group 3 is estimated. If a |
s1s2_beta3 | A matrix or a data frame with two columns: s1 in the first column and s2 in the second. Times to the occurrence of short-term events 1 and 2 at which the regression coefficient beta3 in group 4 is estimated. If a |
grid1 | A prespecified grid for the bandwidth search in group 2 |
grid2 | A prespecified grid for the bandwidth search in group 3 |
grid3 | A list of prespecified grids for the bandwidth search in group 4 |
folds.grid | The number of folds in the cross-validation |
reps.grid | The number of repetitions of the cross-validation |
c01 | A constant to shrink the bandwidth for group 2 |
c02 | A constant to shrink the bandwidth for group 3 |
c03 | A constant to shrink the bandwidth for group 4 |
B | Number of perturbations for estimating SE |
gs.method | Method used by the grid search. Default is 'loop'. Setting it to 'snow' uses parallel computing, which speeds up the calculation |
gs.cl | Default is |
gs.seed | An integer seed for parallel computing, to make the results reproducible, or NULL to leave the seed unset |
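The bandwidths for groups 2 to 4 are chosen from grid1, grid2 and grid3 by cross-validation with folds.grid folds, repeated reps.grid times. The sketch below illustrates repeated K-fold cross-validation over a bandwidth grid. It is only an analogy under stated assumptions: the Nadaraya-Watson smoother, the squared-error loss, and the helper names `nw_predict` and `cv_bandwidth` are hypothetical stand-ins, not the criterion or code used inside multipredict().

```r
# Hypothetical helpers (not part of the package): a Gaussian-kernel
# Nadaraya-Watson smoother and repeated K-fold CV over a bandwidth grid.
nw_predict <- function(s_train, y_train, s_new, h) {
  vapply(s_new, function(s0) {
    w <- dnorm((s_train - s0) / h)          # kernel weights around s0
    if (sum(w) == 0) return(mean(y_train))  # guard against underflow for tiny h
    sum(w * y_train) / sum(w)
  }, numeric(1))
}

cv_bandwidth <- function(s, y, grid, folds = 8, reps = 3) {
  n <- length(s)
  cv_err <- numeric(length(grid))
  for (r in seq_len(reps)) {
    fold_id <- sample(rep(seq_len(folds), length.out = n))  # new split per repetition
    for (k in seq_len(folds)) {
      test <- fold_id == k
      for (j in seq_along(grid)) {
        pred <- nw_predict(s[!test], y[!test], s[test], grid[j])
        cv_err[j] <- cv_err[j] + sum((y[test] - pred)^2)
      }
    }
  }
  cv_err <- cv_err / (reps * n)  # mean squared prediction error per grid value
  list(h = grid[which.min(cv_err)], cv_err = cv_err)
}

set.seed(1)
s <- runif(300, 0, 5)
y <- sin(s) + rnorm(300, sd = 0.3)
grid <- seq(0.01, 5, length.out = 20)  # same shape as the default grid1
fit <- cv_bandwidth(s, y, grid, folds = 8, reps = 3)
fit$h
```

Averaging the error over several random fold splits makes the selected bandwidth less sensitive to any single split, at the cost of reps.grid times more fitting.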
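The multipredict() function fits a time-fixed model and univariate/bivariate varying-coefficient models using data from subgroups formed from the short-term outcomes (such as HF hospitalization and CHD hospitalization) observed before the landmark time t0, among subjects who have not experienced the long-term outcome (such as death) by t0. Group 1 has experienced neither short-term outcome before t0 (time-fixed model); group 2 has experienced only the first and group 3 only the second (univariate varying-coefficient models with coefficients beta1 and beta2); group 4 has experienced both (bivariate varying-coefficient model with coefficient beta3). In this way, the short-term outcome information is incorporated into the prediction of the long-term survival outcome, and the predicted risk can vary with the event times of the short-term outcomes.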
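The landmark subgrouping described above can be sketched on a toy data frame. This is a minimal illustration under assumptions: the column names follow the example formula (time, outcome, st1, st2), Inf is a hypothetical coding for a short-term event that never occurred, and `make_landmark` is a hypothetical helper. Subjects censored inside the prediction window are only flagged here; how multipredict() accounts for them is not described in this section.

```r
# Hypothetical coding: Inf in st1/st2 means the short-term event never occurred.
make_landmark <- function(data, t0, L) {
  lmk  <- data[data$time > t0, ]  # still free of the long-term event at t0
  had1 <- lmk$st1 <= t0           # short-term event 1 by t0
  had2 <- lmk$st2 <= t0           # short-term event 2 by t0
  lmk$group <- ifelse(!had1 & !had2, 1L,
               ifelse(had1 & !had2, 2L,
               ifelse(!had1 & had2, 3L, 4L)))
  # long-term event inside the prediction window (t0, t0 + L]
  lmk$event_in_window <- lmk$outcome == 1 & lmk$time <= t0 + L
  # FALSE for subjects censored inside the window (status unknown)
  lmk$status_known <- lmk$event_in_window | lmk$time > t0 + L
  lmk
}

toy <- data.frame(
  time    = c(3, 8, 12, 30, 9, 40),
  outcome = c(1, 1, 0, 0, 1, 1),
  st1     = c(1, 2, Inf, 4, 1, Inf),
  st2     = c(Inf, Inf, 3, 2, 6, Inf)
)
res <- make_landmark(toy, t0 = 5, L = 20)
res
```

The first subject dies at time 3, before t0 = 5, so it is dropped; the remaining five fall into groups 2, 3, 4, 2 and 1.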
The s1() term in the formula specifies the column that contains the occurrence time of the first short-term outcome, and the s2() term specifies the column that contains the occurrence time of the second short-term outcome.
User may set the statement gs.method = 'True'.
By default, the regression coefficients for group 1 are calculated in each run of this function.
Currently, parameter estimates obtained with parallel computing differ slightly from run to run because the random numbers used in the estimation are not controlled. This will be addressed in a future version.
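A varying coefficient such as beta1(s), evaluated at the event time given in s1_beta1, is typically estimated by weighting each subject by a kernel centered at that time, with the bandwidth controlling how many nearby subjects contribute. The sketch below shows this idea with a generic local-constant, kernel-weighted logistic regression on simulated data; the logistic link, Gaussian kernel, and the helper `local_coef` are assumptions for illustration and not the package's estimating equation.

```r
# Hypothetical sketch: kernel-weighted (local-constant) logistic fit that
# estimates a varying coefficient beta(s) at the point s0.
local_coef <- function(y, x, s, s0, h) {
  w <- dnorm((s - s0) / h)  # Gaussian kernel weights around s0
  # quasibinomial gives the weighted logistic estimates without the
  # non-integer-weights warning that binomial() would emit
  fit <- glm(y ~ x, family = quasibinomial(), weights = w)
  coef(fit)
}

set.seed(42)
n <- 2000
s <- runif(n, 0, 5)                     # time of the short-term event
x <- rnorm(n)                           # a baseline covariate (e.g. standardized age)
beta_true <- function(s) 0.5 + 0.4 * s  # coefficient that drifts with s
y <- rbinom(n, 1, plogis(-1 + beta_true(s) * x))

est <- local_coef(y, x, s, s0 = 1.5, h = 0.5)
est  # the "x" entry targets beta_true(1.5) = 1.1
```

A smaller h tracks beta(s) more closely but uses fewer effective observations, which is the trade-off the bandwidth search and the shrinkage constants c01 to c03 act on.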
Returns the estimated coefficients for each short-term outcome and the long-term outcome:

Component | Description |
---|---|
coefficients | A named vector of the estimated regression coefficients |
SE | The standard errors of the coefficients, estimated by perturbation resampling |
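The SE component comes from perturbation resampling with B perturbations. The sketch below shows the general idea on ordinary least squares: each observation's contribution is multiplied by an i.i.d. positive weight with mean 1 and variance 1 (Exp(1) here), the model is refitted B times, and the standard deviation of the refitted coefficient is taken as its SE. The choice of Exp(1) weights, the use of lm(), and the helper `perturb_se` are assumptions for illustration only.

```r
# Hypothetical sketch of perturbation resampling for a standard error.
perturb_se <- function(y, x, B = 500) {
  n <- length(y)
  est <- replicate(B, {
    v <- rexp(n, rate = 1)  # perturbation weights: mean 1, variance 1
    coef(lm(y ~ x, weights = v))[["x"]]
  })
  sd(est)  # spread of the perturbed estimates
}

set.seed(2024)
n <- 400
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)
se_pert  <- perturb_se(y, x, B = 500)
se_model <- summary(lm(y ~ x))$coefficients["x", "Std. Error"]
c(perturbation = se_pert, model_based = se_model)
```

In this well-specified linear model the two SEs should be close; the appeal of perturbation resampling is that it still applies when no closed-form variance is available, as with kernel-weighted estimators.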
Wen Li, Qian Wang
Li, Wen (2023), "Landmarking Using A Flexible Varying Coefficient Model to Improve Prediction Accuracy of Long-term Survival Following Multiple Short-term Events: An Application to the Atherosclerosis Risk in Communities (ARIC) Study", Statistics in Medicine 90(7) 1-29. doi:10.18637/jss.v090.i07
Parast, Layla, Su-Chun Cheng, and Tianxi Cai. (2012), "Landmark Prediction of Long Term Survival Incorporating Short Term Event Time Information", J Am Stat Assoc 107(500) 1492-1501. doi: 10.1080/01621459.2012.721281
"Incorporating short-term outcome information to predict long-term survival with discrete markers". Biometrical Journal 53.2 (2011): 294-307. doi: 10.1080/01621459.2012.721281
    library(survival)
    library(emdbook)
    library(NMOF)
    library(landpred)
    library(snow)
    set.seed(1234)
    res <- multipredict(
      data = simulation,  # simulated example dataset
      # st1 and st2 hold the occurrence times of the two short-term outcomes
      formula = Surv(time, outcome) ~ age + s1(st1) + s2(st2),
      t0 = 5, L = 20,     # landmark time and prediction horizon
      SE = FALSE, SE.gs = FALSE, B = 200,
      gs.method = "loop", gs.cl = 2, gs.seed = 100,
      s1_beta1 = 1.5, grid1 = seq(0.01, 5, length.out = 20),
      s2_beta2 = 1.5, grid2 = seq(0.01, 5, length.out = 20),
      s1s2_beta3 = NULL,
      grid3 = list(seq(0.01, 5, length.out = 20), seq(0.01, 5, length.out = 20))
    )
    print(res)
Identifier of the first short-term outcome
s1(x)
Argument | Description |
---|---|
x | the first short-term outcome |
This is a helper function used by multipredict() to identify the first short-term outcome. If called directly, it simply returns its input, so it should not be called directly; use it only in the 'formula' argument of multipredict() to mark the variable that holds the first short-term outcome. See the 'Examples' section of multipredict() for more details.
Returns the input unchanged when called directly. See 'Details' for more explanation.
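Identity "marker" functions like this are commonly combined with base R's terms(..., specials = ) mechanism, which the survival package uses to locate strata() terms in a coxph() formula. The sketch below shows how s1() and s2() terms could be located that way; whether multipredict() parses its formula like this is an assumption, and `s1`/`s2` here are local stand-ins, not the package's exported functions.

```r
# Local stand-ins: evaluating a marker simply returns its input.
s1 <- function(x) x
s2 <- function(x) x

f  <- Surv(time, outcome) ~ age + s1(st1) + s2(st2)  # Surv() is not evaluated here
tt <- terms(f, specials = c("s1", "s2"))

# attr(tt, "specials") gives each special's position among the variables
# (response included); attr(tt, "variables") is a call list(...), so the
# matching element sits one position further along.
sp   <- attr(tt, "specials")
vars <- attr(tt, "variables")
deparse(vars[[sp$s1 + 1]])   # the s1() term
all.vars(vars[[sp$s1 + 1]])  # column holding the first short-term event time
all.vars(vars[[sp$s2 + 1]])  # column holding the second short-term event time
s1(c(2.5, 4))                # called directly: returns the input
```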
Identifier of the second short-term outcome
s2(x)
Argument | Description |
---|---|
x | the second short-term outcome |
This is a helper function used by multipredict() to identify the second short-term outcome. If called directly, it simply returns its input, so it should not be called directly; use it only in the 'formula' argument of multipredict() to mark the variable that holds the second short-term outcome. See the 'Examples' section of multipredict() for more details.
Returns the input unchanged when called directly. See 'Details' for more explanation.
A simulated dataset used as an example and test case for the multipredict() function. The data are randomly generated and have no practical meaning.
data(simulation)
An object of class data.frame
with 3000 rows and 6 columns.