Title: | Joint Latent Class Trees for Joint Modeling of Time-to-Event and Longitudinal Data |
---|---|
Description: | Implements the tree-based approach to joint modeling of time-to-event and longitudinal data. This approach looks for a tree-based partitioning such that within each estimated latent class defined by a terminal node, the time-to-event and longitudinal responses display a lack of association. See Zhang and Simonoff (2018) <arXiv:1812.01774>. |
Authors: | Ningshan Zhang and Jeffrey S. Simonoff |
Maintainer: | Ningshan Zhang <[email protected]> |
License: | GPL |
Version: | 0.0.2 |
Built: | 2025-01-20 06:43:50 UTC |
Source: | CRAN |
Fits the Joint Latent Class Tree (JLCT) model.
The main function of this package is jlctree.
The dataset contains three types of variables for each subject: the time-to-event, the longitudinal outcome, and additional covariates. The goal is to jointly model the time-to-event with a survival model and the longitudinal outcome with a linear mixed-effects model, using the additional covariates. The longitudinal outcome consists of repeated measurements and is therefore expected to be time-varying for a given subject. The additional covariates can be either time-invariant or time-varying. However, jlctree also accepts data in which the longitudinal outcome and the covariates are time-invariant.
This package implements the Joint Latent Class Tree (JLCT) modeling approach. JLCT assumes that the population consists of homogeneous latent classes: within a latent class, subjects follow the same survival and linear mixed-effects models, but these models differ from class to class. In addition, JLCT assumes that, conditional on latent class membership, the time-to-event and longitudinal outcomes are independent. JLCT therefore looks for a tree-based partitioning such that, within each estimated latent class defined by a terminal node, the time-to-event and longitudinal responses display a lack of association. Once the tree is constructed, JLCT assigns each observation to a latent class (i.e. a terminal node) and fits the survival and linear mixed-effects models separately, using the class membership information.
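The conditional-independence idea can be illustrated with a small simulation. This is a sketch under stated assumptions, not the package's algorithm: the two classes `g` are known here and stand in for terminal nodes, `y` is a single time-invariant outcome, and the association is measured with a Cox-model likelihood ratio statistic from the survival package. Because both `y` and the hazard depend on the class, `y` looks strongly associated with the time-to-event in the pooled data, while within each class the statistic should be small.

```r
library(survival)

set.seed(1)
n <- 1000
g <- rbinom(n, 1, 0.5)                              # latent class, known here for illustration
y <- rnorm(n, mean = 3 * g)                         # outcome depends on the class only
t_event <- rexp(n, rate = ifelse(g == 1, 2, 0.25))  # hazard depends on the class only
t_cens <- rexp(n, rate = 0.1)
d <- data.frame(g = g, y = y,
                time = pmin(t_event, t_cens),
                delta = as.numeric(t_event <= t_cens))  # 1 = event observed, 0 = censored

# LRT statistic for adding y to a Cox model with no other covariates
lrt_y <- function(dat) {
  fit <- coxph(Surv(time, delta) ~ y, data = dat)
  2 * diff(fit$loglik)  # fit$loglik = c(log-likelihood at beta = 0, at the estimate)
}

lrt_marginal <- lrt_y(d)                    # pooled over classes
lrt_within <- sapply(split(d, d$g), lrt_y)  # separately within each class
print(round(c(marginal = lrt_marginal, lrt_within), 2))
```

The pooled statistic should be far above the within-class ones; the exact values depend on the random seed.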
The time-to-event data format required by jlctree depends on whether the variables used are time-varying. If the longitudinal outcome, or any of the covariates specified in survival, classmb, fixed, or random, is time-varying, then the time-to-event data must be in left-truncated right-censored (LTRC) format. Otherwise, when the longitudinal outcome and all of the covariates are time-invariant, there should be only one observation per subject, and the time-to-event data can be either in LTRC format (when there is a subject-specific entry time) or in standard right-censored format.
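The rule above can be checked before calling jlctree. The helper `needs_ltrc` below is hypothetical and not part of the package: it reports whether any of the named variables takes more than one value within some subject, in which case the LTRC format is needed. In practice `vars` would list the longitudinal outcome and every covariate used in survival, classmb, fixed, and random.

```r
# Hypothetical helper, not part of jlctree:
# TRUE if any variable in `vars` changes value within some subject.
needs_ltrc <- function(data, id, vars) {
  varies <- vapply(vars, function(v) {
    per_subject <- tapply(data[[v]], data[[id]], function(x) length(unique(x)) > 1)
    any(per_subject)
  }, logical(1))
  any(varies)
}

# Toy data: X1 is constant within each subject, y changes for subject 1
toy <- data.frame(ID = c(1, 1, 2),
                  X1 = c(0.2, 0.2, 0.5),
                  y  = c(1.0, 1.4, 0.3))
needs_ltrc(toy, "ID", "X1")          # FALSE: X1 alone is time-invariant
needs_ltrc(toy, "ID", c("X1", "y"))  # TRUE: LTRC format required
```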
To construct time-to-event data in left-truncated right-censored format, consider using the function tmerge in the R package survival.
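A minimal sketch of that conversion, assuming a hypothetical baseline table (one row per subject with follow-up time `futime` and event status `status`) and a hypothetical table of longitudinal measurements of `y`. The first tmerge call sets up one follow-up interval per subject with the event recorded at its end; the second splits the intervals at the measurement times via tdc(). The resulting tstart, tstop, and delta columns play the roles of time_L, time_Y, and delta in data_timevar.

```r
library(survival)

# Hypothetical input: one row per subject (follow-up time and event status, 1 = event)
base <- data.frame(id = c(1, 2), futime = c(5, 3), status = c(1, 0))
# Hypothetical longitudinal measurements of y, taken at the given times
long <- data.frame(id = c(1, 1, 2, 2),
                   time = c(0, 2, 0, 1),
                   y = c(0.1, 0.5, 1.0, 1.2))

# Step 1: one interval (0, futime] per subject, with the event recorded at futime
ltrc <- tmerge(base, base, id = id, delta = event(futime, status))
# Step 2: split the intervals wherever y is re-measured (time-dependent covariate)
ltrc <- tmerge(ltrc, long, id = id, y = tdc(time, y))

ltrc[, c("id", "tstart", "tstop", "delta", "y")]
```

Such data would then enter a survival formula as Surv(tstart, tstop, delta).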
See the simulated datasets data_timevar and data_timeinv for examples of the LTRC format and the right-censored format, respectively.
Ningshan Zhang and Jeffrey S. Simonoff: Joint Latent Class Trees: A Tree-Based Approach to Joint Modeling of Time-to-event and Longitudinal Data. arXiv:1812.01774 (2018).
jlctree, data_timeinv, data_timevar
A simulated dataset with a time-invariant longitudinal outcome, time-to-event, and time-invariant covariates. Since the longitudinal outcome and all of the covariates are time-invariant, there is only one observation per subject. The time-to-event data are right-censored.
data(data_timeinv)
A data frame with 500 rows and 10 variables.
ID |
subject identifier (1 - 500) |
X1 |
continuous covariate between 0 and 1; time-invariant |
X2 |
continuous covariate between 0 and 1; time-invariant |
X3 |
binary covariate; time-invariant |
X4 |
continuous covariate between 0 and 1; time-invariant |
X5 |
categorical covariate taking values from 1, 2, 3, 4, 5; time-invariant |
time_Y |
right-censored event time |
delta |
event indicator (the status argument of Surv(time_Y, delta)), 1 if the event is observed and 0 if censored |
y |
longitudinal outcome; time-invariant |
g |
true latent class identifier 1, 2, 3, 4, which is determined by the outcomes of and , with some noise |
    # The data for the first five subjects (ID = 1 - 5):
    #
    # ID   X1   X2 X3  X4 X5    time_Y delta         y g
    #  1 0.27 0.53  1 0.8  1 10.703940     0 0.8923776 2
    #  2 0.37 0.68  1 0.5  3  9.153915     1 0.6871529 2
    #  3 0.57 0.38  1 0.2  1  4.489658     1 0.8410745 3
    #  4 0.91 0.95  0 0.4  3  1.009941     1 2.1058681 4
    #  5 0.20 0.12  0 0.8  5 11.125094     0 0.1383508 1
A simulated dataset with a time-varying longitudinal outcome, time-to-event, and time-varying covariates. The dataset has already been converted into left-truncated right-censored (LTRC) format, so that a Cox model with the time-varying longitudinal outcome as a covariate can be fit. See, for example, Fu and Simonoff (2017).
data(data_timevar)
A data frame with 866 rows and 11 variables. The variables are as follows:
ID |
subject identifier (1 - 500) |
X1 |
continuous covariate between 0 and 1; time-varying |
X2 |
continuous covariate between 0 and 1; time-varying |
X3 |
binary covariate; time-varying |
X4 |
continuous covariate between 0 and 1; time-varying |
X5 |
categorical covariate taking values from 1, 2, 3, 4, 5; time-varying |
time_L |
left-truncation time (start of the interval) |
time_Y |
right-censored time (end of the interval) |
delta |
event indicator (the status argument of Surv(time_L, time_Y, delta)), 1 if the event is observed at time_Y and 0 otherwise |
y |
longitudinal outcome; time-varying |
g |
true latent class identifier 1, 2, 3, 4, which is determined by the outcomes of and , with some noise |
Fu, W. and Simonoff, J. S. (2017). Survival trees for left-truncated and right-censored data, with application to time-varying covariate data. Biostatistics, 18(2), 352-369.
    # The data for the first five subjects (ID = 1 - 5):
    #
    # ID   X1   X2 X3  X4 X5     time_L   time_Y delta          y g
    #  1 0.27 0.53  0 0.0  4 0.09251632 1.536030     0 -0.2191137 1
    #  1 0.49 0.71  1 0.0  5 1.53603028 4.366769     1  0.6429496 2
    #  2 0.37 0.68  1 0.4  4 0.44674406 1.203560     0  0.5473454 2
    #  2 0.65 0.67  0 0.2  5 1.20355968 1.330767     1  1.5515773 4
    #  3 0.57 0.38  0 0.2  4 0.82944637 1.267248     0  1.1410397 3
    #  3 0.79 0.19  1 0.4  4 1.26724819 5.749602     1  1.0888787 3
    #  4 0.91 0.95  0 0.9  1 0.81237396 1.807741     1  2.2105303 4
    #  5 0.20 0.12  1 0.3  5 0.80510669 1.029981     0 -0.1167814 1
    #  5 0.02 0.31  0 0.4  5 1.02998145 6.404183     1 -0.1747389 1
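The printed rows illustrate the LTRC (counting-process) layout: each row is an interval (time_L, time_Y], consecutive intervals of a subject are contiguous, and only a subject's last interval can end in an event. The check below re-enters those rows and tests these properties. `check_ltrc` is a hypothetical helper, and it treats `delta` as the event indicator, as the package examples do with `Surv(time_L, time_Y, delta)`. Times are compared with a small tolerance because the printout shows time_Y with fewer digits than time_L.

```r
# The nine rows printed above (subjects 1 - 5)
d <- data.frame(
  ID     = c(1, 1, 2, 2, 3, 3, 4, 5, 5),
  time_L = c(0.09251632, 1.53603028, 0.44674406, 1.20355968, 0.82944637,
             1.26724819, 0.81237396, 0.80510669, 1.02998145),
  time_Y = c(1.536030, 4.366769, 1.203560, 1.330767, 1.267248,
             5.749602, 1.807741, 1.029981, 6.404183),
  delta  = c(0, 1, 0, 1, 0, 1, 1, 0, 1)
)

# Hypothetical helper: checks the basic LTRC structure, rows assumed sorted by time within subject
check_ltrc <- function(d, tol = 1e-5) {
  ok <- all(d$time_L < d$time_Y)                                # each interval is non-empty
  for (s in split(d, d$ID)) {
    n <- nrow(s)
    if (n > 1) {
      ok <- ok && all(abs(s$time_L[-1] - s$time_Y[-n]) < tol)   # intervals are contiguous
      ok <- ok && all(s$delta[-n] == 0)                         # no event before the last interval
    }
  }
  ok
}

check_ltrc(d)           # TRUE
d_bad <- d
d_bad$delta[1] <- 1     # an event in subject 1's first interval breaks the layout
check_ltrc(d_bad)       # FALSE
```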
Computes the likelihood ratio test statistic. Not to be called directly by the user.
get_lrt(f1, f2, data, stable = TRUE, cov.max = 1e+05)
f1 |
a two-sided formula of the fitted survival model, without the longitudinal outcome on the right side of the formula. |
f2 |
a two-sided formula of the fitted survival model, same as |
data |
a data.frame containing the covariates in both |
stable |
a parameter, see also |
cov.max |
a parameter, see also |
The likelihood ratio test statistic.
    library(jlctree)
    data(data_timevar)
    f1 <- Surv(time_L, time_Y, delta) ~ X3 + X4 + X5
    f2 <- Surv(time_L, time_Y, delta) ~ y + X3 + X4 + X5
    get_lrt(f1, f2, data_timevar)
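A minimal sketch of a likelihood ratio statistic for the longitudinal outcome, assuming f2 adds only `y` to f1: twice the gain in maximized Cox partial log-likelihood, compared with a chi-square distribution with one degree of freedom. `lrt_sketch` is hypothetical and is not the package's get_lrt; in particular it omits the `stable`/`cov.max` check, and the data are simulated.

```r
library(survival)

# Hypothetical sketch, not jlctree's get_lrt: LRT for the terms f2 adds to f1
lrt_sketch <- function(f1, f2, data) {
  fit1 <- coxph(f1, data = data)
  fit2 <- coxph(f2, data = data)
  # loglik[2] is the maximized partial log-likelihood of each fit
  2 * (fit2$loglik[2] - fit1$loglik[2])
}

set.seed(2)
n <- 300
dat <- data.frame(x = rnorm(n), y = rnorm(n))
t_event <- rexp(n, rate = exp(0.5 * dat$x + 0.8 * dat$y))
t_cens <- rexp(n, rate = 0.2)
dat$time <- pmin(t_event, t_cens)
dat$delta <- as.numeric(t_event <= t_cens)

stat <- lrt_sketch(Surv(time, delta) ~ x, Surv(time, delta) ~ y + x, dat)
pval <- pchisq(stat, df = 1, lower.tail = FALSE)  # f2 adds one coefficient
```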
Computes the test statistic at the current node. Not to be called directly by the user.
get_node_val(f1, f2, data, lrt = TRUE, ...)
f1 |
a two-sided formula of the fitted survival model, without the longitudinal outcome on the right side of the formula. Only needed when |
f2 |
a two-sided formula of the fitted survival model, same as |
data |
a data.frame containing covariates in |
lrt |
if TRUE, use likelihood ratio test, otherwise use Wald test. Default is TRUE. |
... |
further arguments to pass to or from other methods. |
The test statistic at the current node.
    library(jlctree)
    data(data_timevar)
    f1 <- Surv(time_L, time_Y, delta) ~ X3 + X4 + X5
    f2 <- Surv(time_L, time_Y, delta) ~ y + X3 + X4 + X5
    get_node_val(f1, f2, data_timevar, lrt = TRUE)
Computes the Wald test statistic. Not to be called directly by the user.
get_wald(f, data)
f |
a two-sided formula of the fitted survival model, with the longitudinal outcome being the first covariate on the right side of the formula. |
data |
a data.frame containing covariates in |
The Wald test statistic.
    library(jlctree)
    data(data_timevar)
    f <- Surv(time_L, time_Y, delta) ~ y + X3 + X4 + X5
    get_wald(f, data_timevar)
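For a single coefficient, the Wald statistic is the squared estimate divided by its estimated variance, (beta_hat / se)^2, which is approximately chi-square with one degree of freedom when the true coefficient is zero. `wald_sketch` below is a hypothetical illustration on simulated data, not the package's get_wald code; it relies on the same convention that the longitudinal outcome is the first covariate in the formula.

```r
library(survival)

# Hypothetical sketch: Wald statistic for the first coefficient of a Cox model
wald_sketch <- function(f, data) {
  fit <- coxph(f, data = data)
  unname(coef(fit)[1]^2 / vcov(fit)[1, 1])
}

set.seed(3)
n <- 300
dat <- data.frame(y = rnorm(n), x = rnorm(n))
t_event <- rexp(n, rate = exp(0.6 * dat$y))
t_cens <- rexp(n, rate = 0.2)
dat$time <- pmin(t_event, t_cens)
dat$delta <- as.numeric(t_event <= t_cens)

f <- Surv(time, delta) ~ y + x
w <- wald_sketch(f, dat)
# The same quantity as the squared z statistic reported by summary()
z <- summary(coxph(f, data = dat))$coefficients["y", "z"]
```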
Fits Joint Latent Class Tree model.
This is the main function that is normally called by the user.
See jlctree-package for more details.
jlctree(survival, classmb, fixed, random, subject, data, parms = list(), control = list())
survival |
a two-sided formula object; required. The left side of the formula corresponds
to a |
classmb |
one-sided formula describing the covariates in the class-membership tree construction; required.
Covariates used for tree construction are separated by |
fixed |
two-sided linear formula object for the fixed-effects in the linear mixed-effects model for
longitudinal outcomes; required.
The longitudinal outcome is on the left of |
random |
one-sided formula for the node-specific random effects in the linear mixed-effects model for
longitudinal outcomes; optional.
If missing, there are no node-specific random effects in the fitted linear mixed-effects model.
Covariates with a random effect are separated by |
subject |
name of the covariate representing the subject identifier; optional. If missing, there are no subject-specific random intercepts in the fitted linear mixed-effects model for longitudinal outcomes. |
data |
the dataset; required. |
parms |
parameter list of Joint Latent Class Tree model parameters.
See also |
control |
|
A list with components:
tree |
an |
control |
the |
parms |
the |
lmmmodel |
an |
coxphmodel_diffh_diffs |
a |
coxphmodel_diffh |
a |
coxphmodel_diffs |
a |
jlctree-package, jlctree.control, rpart.control
    library(jlctree)

    # Time-to-event in LTRC format:
    data(data_timevar)
    tree <- jlctree(survival = Surv(time_L, time_Y, delta) ~ X3 + X4 + X5,
                    classmb = ~ X1 + X2,
                    fixed = y ~ X1 + X2 + X3 + X4 + X5,
                    random = ~ 1, subject = 'ID',
                    data = subset(data_timevar, ID <= 30),
                    parms = list(maxng = 4, fity = FALSE, fits = FALSE))

    # Time-to-event in right-censored format:
    data(data_timeinv)
    tree <- jlctree(survival = Surv(time_Y, delta) ~ X3 + X4 + X5,
                    classmb = ~ X1 + X2,
                    fixed = y ~ X1 + X2 + X3 + X4 + X5,
                    random = ~ 1, subject = 'ID',
                    data = subset(data_timeinv, ID <= 30),
                    parms = list(maxng = 4, fity = FALSE, fits = FALSE))
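When fity and fits are TRUE, jlctree fits the longitudinal and survival models using the terminal nodes as group indicators. The sketch below shows one plausible way to fit such models by hand with nlme::lme and survival::coxph; it is an assumption-laden illustration, not the package's code. The `node` variable is a hypothetical stand-in for tree membership (a single split on X1), the data are simulated, and `strata(node)` is just one way to give each node its own baseline hazard.

```r
library(survival)
library(nlme)

set.seed(4)
n_sub <- 100
subj <- data.frame(ID = 1:n_sub, X1 = runif(n_sub))
# Hypothetical terminal-node membership: a single split on X1
subj$node <- factor(ifelse(subj$X1 > 0.5, "right", "left"))

# Longitudinal data: 3 measurements per subject, node-level mean shift, subject random intercept
long <- subj[rep(seq_len(n_sub), each = 3), ]
b <- rnorm(n_sub, sd = 0.5)
long$y <- ifelse(long$node == "right", 2, 0) + b[long$ID] + rnorm(nrow(long), sd = 0.3)

# Survival data: one row per subject, hazard depends on the node only
t_event <- rexp(n_sub, rate = ifelse(subj$node == "right", 1, 0.3))
t_cens <- rexp(n_sub, rate = 0.1)
subj$time <- pmin(t_event, t_cens)
subj$delta <- as.numeric(t_event <= t_cens)

# Linear mixed-effects model: node as a fixed-effect group indicator, random intercept per subject
lmm <- lme(y ~ node, random = ~ 1 | ID, data = long)
# Cox model with a separate baseline hazard in each node (stratification)
cox_node <- coxph(Surv(time, delta) ~ X1 + strata(node), data = subj)

fixef(lmm)
```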
Sets the control parameters for jlctree.
jlctree.control(test.stat = "lrt", stop.thre = 3.84, stable = TRUE, maxng = 6, min.nevents = 5, split.add = 20, cov.max = 1e+05, fity = TRUE, fits = TRUE, ...)
test.stat |
test statistic to use, “lrt” for likelihood ratio test, and “wald” for Wald test. Default is “lrt”. |
stop.thre |
stops splitting if current node has test statistic less than |
stable |
if TRUE, check the variance of the estimated coefficients in survival models fit at tree nodes.
If a node has variance larger than |
maxng |
maximum number of terminal nodes. Default is 6. |
min.nevents |
minimum number of events in any terminal node. By default, this parameter is set to the number of covariates used in the survival model. |
split.add |
when computing the difference between parent node's test statistic
and sum of child nodes' test statistics, add |
cov.max |
upper bound on the variance of the estimated coefficients in survival models at tree nodes. Default is 1e5. |
fity |
if TRUE, once a tree is constructed, fit a linear mixed-effects model using tree nodes as group indicators. Default is TRUE. |
fits |
if TRUE, once a tree is constructed, fit survival models using tree nodes as group indicators. Default is TRUE. |
... |
further arguments to pass to or from other methods. |
A list of all these parameters.
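The default stop.thre = 3.84 matches, to two decimals, the 95% quantile of a chi-square distribution with one degree of freedom, which is the approximate null distribution of both the likelihood ratio and the Wald statistic for a single coefficient. Read this way, a node keeps splitting only while the association between the longitudinal outcome and the time-to-event is significant at roughly the 5% level. This reading is an interpretation, and `keep_splitting` below is a hypothetical helper that assumes the node statistic is compared directly with stop.thre.

```r
# 95% quantile of the chi-square distribution with 1 degree of freedom
crit <- qchisq(0.95, df = 1)
print(round(crit, 4))   # 3.8415

# Hypothetical rule: keep splitting a node only while its statistic reaches the threshold
keep_splitting <- function(node_stat, stop.thre = 3.84) node_stat >= stop.thre
keep_splitting(c(1.2, 3.9, 12.5))   # FALSE TRUE TRUE
```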
Prunes an rpart tree to have the desired number of nodes.
prune_tree(tree, maxn)
tree |
the tree to prune, an |
maxn |
desired number of terminal nodes. |
The pruned tree, an rpart object.
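For an ordinary rpart tree, pruning to a target size can go through the complexity-parameter table, since each row lists the number of splits (terminal nodes minus one) of a nested subtree. `prune_to_size` below is a hypothetical helper built on rpart::prune; it is not jlctree's prune_tree, whose internals are not described here. It keeps the largest subtree in the table with at most `maxn` terminal nodes, because the table may skip some sizes.

```r
library(rpart)

# Hypothetical helper, not jlctree's prune_tree
prune_to_size <- function(tree, maxn) {
  cpt <- tree$cptable
  ok <- which(cpt[, "nsplit"] + 1 <= maxn)   # subtrees with at most maxn leaves
  best <- ok[which.max(cpt[ok, "nsplit"])]
  # Any cp in [CP[best], CP[best - 1]) yields subtree `best`; the midpoint
  # avoids floating-point ties with the stored complexity values.
  cp <- if (best > 1) mean(cpt[c(best - 1, best), "CP"]) else cpt[1, "CP"]
  prune(tree, cp = cp)
}

n_leaves <- function(tree) sum(tree$frame$var == "<leaf>")

fit <- rpart(mpg ~ ., data = mtcars,
             control = rpart.control(minsplit = 4, cp = 0, xval = 0))
small <- prune_to_size(fit, maxn = 3)
c(full = n_leaves(fit), pruned = n_leaves(small))
```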
Defines the evaluation function for a new splitting method of rpart.
Not to be called directly by the user.
surve(y, wt, parms)
y |
the response value as found in the formula that is passed in by |
wt |
the weight vector from the call, if any. |
parms |
the vector or list (if any) supplied by the user as a
|
See reference.
https://cran.r-project.org/package=rpart/vignettes/usercode.pdf
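The rpart vignette cited above describes the interface that surve fills in: an evaluation function receives the responses `y`, the weights `wt`, and `parms` for one node, and returns a list with a `label` for the node and its `deviance`. The sketch below uses a simple weighted-mean (anova-style) criterion for illustration; it is not jlctree's survival-based evaluation.

```r
# Illustrative evaluation function (anova-style), not jlctree's surve
eval_anova <- function(y, wt, parms) {
  wmean <- sum(y * wt) / sum(wt)                           # node label: weighted mean
  list(label = wmean, deviance = sum(wt * (y - wmean)^2))  # node deviance: weighted RSS
}

res <- eval_anova(c(1, 2, 3), wt = c(1, 1, 1))
res$label      # 2
res$deviance   # 2
```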
Defines the initialization function for a new splitting method of rpart.
Not to be called directly by the user.
survi(y, offset, parms, wt)
y |
the response value as found in the formula that is passed in by |
offset |
the offset term, if any, found on the right hand side of the
formula that is passed in by |
parms |
the vector or list (if any) supplied by the user as a
|
wt |
the weight vector from the call, if any. |
See reference.
https://cran.r-project.org/package=rpart/vignettes/usercode.pdf
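Following the same vignette, an initialization function receives `y`, `offset`, `parms`, and `wt` once for the whole tree and returns the (possibly transformed) response together with `numresp`, `numy`, and a `summary` function used when describing nodes. The sketch below follows the anova-style pattern for illustration only; it is not jlctree's survi.

```r
# Illustrative initialization function (anova-style), not jlctree's survi
init_anova <- function(y, offset, parms, wt) {
  if (is.matrix(y) && ncol(y) > 1) stop("matrix response not allowed")
  if (length(offset)) y <- y - offset      # apply the offset, if any
  # One-line node summary used by summary() on the fitted tree
  sfun <- function(yval, dev, wt, ylevel, digits) {
    paste0(" mean=", format(signif(yval, digits)),
           ", MSE=", format(signif(dev / wt, digits)))
  }
  list(y = c(y), parms = NULL, numresp = 1, numy = 1, summary = sfun)
}

res <- init_anova(c(1, 2), offset = c(0.5, 0.5), wt = c(1, 1))
res$y   # 0.5 1.5
```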
Defines the splitting function for a new splitting method of rpart.
Not to be called directly by the user.
survs(y, wt, x, parms, continuous)
y |
the response value as found in the formula that is passed in by |
wt |
the weight vector from the call, if any. |
x |
vector of |
parms |
the vector or list (if any) supplied by the user as a
|
continuous |
if TRUE the |
See reference.
https://cran.r-project.org/package=rpart/vignettes/usercode.pdf
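The split function completes the triple that rpart accepts through its `method` argument, as a list with elements `init`, `eval`, and `split`. Per the vignette, for a continuous predictor rpart passes `y` and `wt` sorted by `x`, and the function returns a `goodness` value for each of the n - 1 candidate split points plus a `direction`; for a categorical predictor it returns goodness values for candidate groupings and an ordering of the categories. The anova-style sketch below follows that pattern for illustration; jlctree's survs presumably scores splits with the test statistics described under jlctree.control instead.

```r
# Illustrative split function (anova-style), not jlctree's survs
split_anova <- function(y, wt, x, parms, continuous) {
  n <- length(y)
  y <- y - sum(y * wt) / sum(wt)            # center the response
  if (continuous) {
    # candidate split after each of the first n - 1 observations (y sorted by x)
    temp <- cumsum(y * wt)[-n]
    left.wt <- cumsum(wt)[-n]
    right.wt <- sum(wt) - left.wt
    lmean <- temp / left.wt
    rmean <- -temp / right.wt
    goodness <- (left.wt * lmean^2 + right.wt * rmean^2) / sum(wt * y^2)
    list(goodness = goodness, direction = sign(lmean))
  } else {
    # categorical x: order categories by mean response, then scan as if continuous
    ux <- sort(unique(x))
    wtsum <- tapply(wt, x, sum)
    ysum <- tapply(y * wt, x, sum)
    ord <- order(ysum / wtsum)
    m <- length(ord)
    temp <- cumsum(ysum[ord])[-m]
    left.wt <- cumsum(wtsum[ord])[-m]
    right.wt <- sum(wt) - left.wt
    lmean <- temp / left.wt
    rmean <- -temp / right.wt
    list(goodness = (left.wt * lmean^2 + right.wt * rmean^2) / sum(wt * y^2),
         direction = ux[ord])
  }
}

# Step-shaped toy response: the best split falls after the third observation
out <- split_anova(c(0, 0, 0, 10, 10, 10), rep(1, 6), x = 1:6, continuous = TRUE)
which.max(out$goodness)   # 3
# Categorical toy predictor: categories ordered by mean response
out2 <- split_anova(c(5, 5, 0, 0, 10, 10), rep(1, 6), x = c(1, 1, 2, 2, 3, 3), continuous = FALSE)
out2$direction            # 2 1 3
```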