Title: | Joint Models of Survival and Multivariate Longitudinal Data |
---|---|
Description: | Fit joint models of survival and multivariate longitudinal data. The longitudinal data is specified by generalised linear mixed models. The joint models are fit via maximum likelihood using an approximate expectation maximisation algorithm. Bernhardt (2015) <doi:10.1016/j.csda.2014.11.011>. |
Authors: | James Murray [aut, cre] |
Maintainer: | James Murray <[email protected]> |
License: | GPL-3 |
Version: | 0.4.5 |
Built: | 2024-12-05 07:10:51 UTC |
Source: | CRAN |
Perform a likelihood ratio test between two (nested) joint
models. The user must decide whether the models are truly nested.
## S3 method for class 'joint' anova(object, object2, ...)
## S3 method for class 'joint' anova(object, object2, ...)
object |
a joint model fit by the |
object2 |
a joint model fit by the |
... |
additional arguments (none used). |
A list of class anova.joint
with elements
mod0
the name of object
.
l0
the log-likelihood of the nested model, i.e. fit under the null.
AIC0
AIC for object
.
BIC0
BIC for object
.
mod1
the name of object2
.
l1
the log-likelihood under the alternative hypothesis.
AIC1
AIC for object2
.
BIC1
BIC for object2
.
LRT
likelihood ratio test statistic.
p
the p-value of LRT
.
warnSurv
internal - logical value for printing difference in survival models.
warnRanefs
internal - logical value for printing difference in random effects specifications.
James Murray ([email protected])
joint
and logLik.joint
.
rm(list=ls()) data(PBC) # Compare quadratic vs linear time specification for log(serum bilirubin) ----- PBC$serBilir <- log(PBC$serBilir) long.formulas1 <- list(serBilir ~ drug * time + (1 + time|id)) long.formulas2 <- list(serBilir ~ drug * (time + I(time^2)) + (1 + time + I(time^2)|id)) surv.formula <- Surv(survtime, status) ~ drug family <- list('gaussian') # Fit the two competing models (fit is nested in fit2) ------------------------ fit <- joint(long.formulas1, surv.formula, PBC, family, control = list(verbose = FALSE)) fit2 <- joint(long.formulas2, surv.formula, PBC, family, control = list(verbose = FALSE)) anova(fit, fit2) # Quadratic terms improve fit significantly.
rm(list=ls()) data(PBC) # Compare quadratic vs linear time specification for log(serum bilirubin) ----- PBC$serBilir <- log(PBC$serBilir) long.formulas1 <- list(serBilir ~ drug * time + (1 + time|id)) long.formulas2 <- list(serBilir ~ drug * (time + I(time^2)) + (1 + time + I(time^2)|id)) surv.formula <- Surv(survtime, status) ~ drug family <- list('gaussian') # Fit the two competing models (fit is nested in fit2) ------------------------ fit <- joint(long.formulas1, surv.formula, PBC, family, control = list(verbose = FALSE)) fit2 <- joint(long.formulas2, surv.formula, PBC, family, control = list(verbose = FALSE)) anova(fit, fit2) # Quadratic terms improve fit significantly.
joint
objectUse an existing model fit by joint
along with the data object originally
used and obtain a mean estimate, standard errors and 95% confidence interval using the
bootstrap. The original data is resampled by subject, not by observation.
boot.joint( fit, data, boot.size = NULL, nboot = 100L, replace = TRUE, progress = TRUE, use.MLEs = TRUE, control = list() )
boot.joint( fit, data, boot.size = NULL, nboot = 100L, replace = TRUE, progress = TRUE, use.MLEs = TRUE, control = list() )
fit |
a joint model fit by the |
data |
the original data used to fit the above joint model. |
boot.size |
integer, specifies the number of subjects to resample in the bootstrapping
approach. The default value is |
nboot |
integer, specifies the number of bootstrap samples, default value is
|
replace |
logical, should sampling be done with replacement? Defaults to
|
progress |
logical, should a text progress bar showing overall progress be shown
and updated after each successful bootstrapped model fit? Defaults to |
use.MLEs |
logical, should the MLEs of the |
control |
a list of control arguments, with same possible arguments as shown in
|
A list of class boot.joint
which contains the MLEs from supplied joint
object, as well as the bootstrapped summaries and some model/computation information.
James Murray ([email protected]).
# Bivariate fit on PBC data ----------------------------------------- data(PBC) # Subset data and remove NAs PBC <- subset(PBC, select = c('id', 'survtime', 'status', 'drug', 'time', 'albumin', 'platelets')) PBC <- na.omit(PBC) # Specify bivariate fit long.formulas <- list( albumin ~ time*drug + (1 + time|id), platelets ~ time * drug + (1 + time|id) ) surv.formula <- Surv(survtime, status) ~ drug fit <- joint(long.formulas, surv.formula, PBC, family = list('gaussian', 'poisson')) # Set 50 bootstraps, with lower absolute tolerance and convergence of 'either'. BOOT <- boot.joint(fit, PBC, nboot = 50L, control = list(tol.abs = 5e-3, conv = 'either'), use.MLEs = TRUE) BOOT # Print to console via S3 method
# Bivariate fit on PBC data ----------------------------------------- data(PBC) # Subset data and remove NAs PBC <- subset(PBC, select = c('id', 'survtime', 'status', 'drug', 'time', 'albumin', 'platelets')) PBC <- na.omit(PBC) # Specify bivariate fit long.formulas <- list( albumin ~ time*drug + (1 + time|id), platelets ~ time * drug + (1 + time|id) ) surv.formula <- Surv(survtime, status) ~ drug fit <- joint(long.formulas, surv.formula, PBC, family = list('gaussian', 'poisson')) # Set 50 bootstraps, with lower absolute tolerance and convergence of 'either'. BOOT <- boot.joint(fit, PBC, nboot = 50L, control = list(tol.abs = 5e-3, conv = 'either'), use.MLEs = TRUE) BOOT # Print to console via S3 method
Obtain the conditional distribution of the random effects of a joint
model
fit. This is achieved by a Metropolis scheme. Approximate normality across random effects is
expected, and could be useful in diagnosing potential issues surrounding model fits.
cond.ranefs(fit, burnin = 500L, N = 3500L, tune = 2)
cond.ranefs(fit, burnin = 500L, N = 3500L, tune = 2)
fit |
a joint model fit by the |
burnin |
Number of burn-in iterations to discard, defaults to 500. |
N |
Number of MC iterations to carry out post burn-in, defaults to 3500. |
tune |
Tuning parameter, problem-specific, defaults to 2. |
A list of class cond.b.joint
containing:
A list of length n
containing the history of post burn-in.
A numeric vector containing the acceptance rate for each sampled subject.
The ModelInfo list from joint
. Used by S3 methods for class
cond.b.joint
.
Posterior estimates at MLEs for the random effects. Same as ranef(joint)
.
The covariances of bhats
.
The MLE estimate for the variance-covariance matrix of random effects from
fit
.
Dimension of random effects.
Number of responses.
The names of the random effects as determined by call to joint
.
The amount of burn-in used.
Number of MC iterations.
tuning parameter used
The number of observations for each subject for each response.
Time taken for cond.ranefs
to complete.
dat <- simData()$data long.formulas <- list(Y.1 ~ time + cont + bin + (1 + time|id), Y.2 ~ time + cont + bin + (1 + time|id)) surv.formula <- Surv(survtime, status) ~ bin fit <- joint(long.formulas, surv.formula, dat, list("gaussian","gaussian")) cond.b <- cond.ranefs(fit, burnin = 50L, N = 1000, tune = 2) cond.b plot(cond.b) # Overall plot(cond.b, id = 1) # Plot the first subject (see plot.cond.b.joint).
dat <- simData()$data long.formulas <- list(Y.1 ~ time + cont + bin + (1 + time|id), Y.2 ~ time + cont + bin + (1 + time|id)) surv.formula <- Surv(survtime, status) ~ bin fit <- joint(long.formulas, surv.formula, dat, list("gaussian","gaussian")) cond.b <- cond.ranefs(fit, burnin = 50L, N = 1000, tune = 2) cond.b plot(cond.b) # Overall plot(cond.b, id = 1) # Plot the first subject (see plot.cond.b.joint).
Calculates individualised conditional survival probabilities for subjects
during a period of follow-up using a joint
model fit along with requisite longitudinal
process history.
Note that this function is largely designed for use within the ROC function which assesses discriminatory power of the joint model, however it does function by itself with proper use of its arguments.
dynPred( data, id, fit, u = NULL, nsim = 200, progress = TRUE, scale = NULL, df = NULL )
dynPred( data, id, fit, u = NULL, nsim = 200, progress = TRUE, scale = NULL, df = NULL )
data |
the data to which the original |
id |
subject identifier, i.e. for which subject is the conditional survival probabilities desired? |
fit |
a joint model fit by the |
u |
a numeric |
nsim |
how many Monte Carlo simulations should be carried out? Defaults to
|
progress |
a logical, if |
scale |
numeric scales the variance-covariance parameter in the proposal distribution for
the Metropolis-Hastings algorithm. Defaults to |
df |
numeric denotes the degrees of freedom of the proposed |
Dynamic predictions for the time-to-event process based on information available
on the subject's longitudinal process up to given time are calculated by Monte Carlo
simulation outlined in Rizopoulos (2011). For a subject last observed at time
, the
probability that they survive until future time
is
where is the true failure time for subject
,
their
longitudinal measurements up to time
, and
the survival function.
is drawn from the multivariate normal distribution with mean
and its variance taken from a fitted
joint
object.
is drawn from the
distribution by means of a
Metropolis-Hastings algorithm with
nsim
iterations.
A list of class dynPred
which consists of three items:
pi
A data.frame
which contains each candidate failure time (supplied by
u
), with the mean, median and 2.5% and 97.5% quantiles of probability of survival
until this failure time.
pi.raw
A matrix
of with nsim
rows and length(u)
columns,
each row represents the th conditional survival probability of survival each
u
survival time. This is largely for debugging purposes.
The acceptance rate of the Metropolis-Hastings algorithm on the random effects.
James Murray ([email protected]).
Bernhardt PW, Zhang D and Wang HJ. A fast EM Algorithm for Fitting Joint Models of a Binary Response to Multiple Longitudinal Covariates Subject to Detection Limits. Computational Statistics and Data Analysis 2015; 85; 37–53
Rizopoulos D. Dynamic predictions and prospective accuracy in joint models for longitudinal and time-to-event data. Biometrics 2011; 67: 819–829.
ROC
and plot.dynPred
.
data(PBC) PBC$serBilir <- log(PBC$serBilir) # Focus in on id 81, who fails at around 7 years of follow-up. \code{dynPred} allows us to # infer how the model believes their survival probability would've progressed (ignoring the # true outcome at start time). # Univariate ----------------------------------------------------------- long.formulas <- list(serBilir ~ drug * time + (1 + time|id)) surv.formula <- Surv(survtime, status) ~ drug family <- list('gaussian') fit <- joint(long.formulas, surv.formula, PBC, family) preds <- dynPred(PBC, id = 81, fit = fit, u = NULL, nsim = 200, scale = 2) preds plot(preds) # Bivariate ------------------------------------------------------------ # Does introduction of albumin affect conditional survival probability? long.formulas <- list( serBilir ~ drug * time + I(time^2) + (1 + time + I(time^2)|id), albumin ~ drug * time + (1 + time|id) ) fit <- joint(long.formulas, surv.formula, data = PBC, family = list("gaussian", "gaussian")) bi.preds <- dynPred(PBC, id = 81, fit = fit, u = NULL, nsim = 200, scale = fit$coeffs$D/sqrt(fit$ModelInfo$n)) bi.preds plot(bi.preds) # Appears to level-off dramatically; perhaps indicative of this id's albumin # levels, or acceleration in serBilir trajectory around 8.5 years.
data(PBC) PBC$serBilir <- log(PBC$serBilir) # Focus in on id 81, who fails at around 7 years of follow-up. \code{dynPred} allows us to # infer how the model believes their survival probability would've progressed (ignoring the # true outcome at start time). # Univariate ----------------------------------------------------------- long.formulas <- list(serBilir ~ drug * time + (1 + time|id)) surv.formula <- Surv(survtime, status) ~ drug family <- list('gaussian') fit <- joint(long.formulas, surv.formula, PBC, family) preds <- dynPred(PBC, id = 81, fit = fit, u = NULL, nsim = 200, scale = 2) preds plot(preds) # Bivariate ------------------------------------------------------------ # Does introduction of albumin affect conditional survival probability? long.formulas <- list( serBilir ~ drug * time + I(time^2) + (1 + time + I(time^2)|id), albumin ~ drug * time + (1 + time|id) ) fit <- joint(long.formulas, surv.formula, data = PBC, family = list("gaussian", "gaussian")) bi.preds <- dynPred(PBC, id = 81, fit = fit, u = NULL, nsim = 200, scale = fit$coeffs$D/sqrt(fit$ModelInfo$n)) bi.preds plot(bi.preds) # Appears to level-off dramatically; perhaps indicative of this id's albumin # levels, or acceleration in serBilir trajectory around 8.5 years.
Extract AIC from a joint model fit.
## S3 method for class 'joint' extractAIC(fit, scale, k = 2, conditional = FALSE, ...)
## S3 method for class 'joint' extractAIC(fit, scale, k = 2, conditional = FALSE, ...)
fit |
A fitted |
scale |
See |
k |
Numeric specifying the "weight" of degrees of freedom (default |
conditional |
Should AIC of conditional or observed log-likelihood be used? Defaults
to |
... |
additional arguments (none used). |
A numeric vector of length 2, with first and second element giving
df
The degrees of freedom for the fitted model.
AIC
The Akaike Information Criterion for the fitted model.
returns the fitted values from a joint
object. Note that the
linear predictor for each response is returned.
## S3 method for class 'joint' fitted(object, as = "matrix", ...)
## S3 method for class 'joint' fitted(object, as = "matrix", ...)
object |
a joint model fit by the |
as |
should the fitted values be returned as a |
... |
Additional arguments (none used). |
A matrix (or list) with a column (or list entry) for each of the fitted linear
predictors with class fitted.joint
.
James Murray ([email protected]).
# Bivariate fit on PBC data ----------------------------------------- data(PBC) # Subset data and remove NAs PBC <- subset(PBC, select = c('id', 'survtime', 'status', 'drug', 'time', 'albumin', 'platelets')) PBC <- na.omit(PBC) # Specify bivariate fit long.formulas <- list( albumin ~ time*drug + (1 + time|id), platelets ~ time * drug + (1 + time|id) ) surv.formula <- Surv(survtime, status) ~ drug fit <- joint(long.formulas, surv.formula, PBC, family = list('gaussian', 'poisson')) fitted(fit)
# Bivariate fit on PBC data ----------------------------------------- data(PBC) # Subset data and remove NAs PBC <- subset(PBC, select = c('id', 'survtime', 'status', 'drug', 'time', 'albumin', 'platelets')) PBC <- na.omit(PBC) # Specify bivariate fit long.formulas <- list( albumin ~ time*drug + (1 + time|id), platelets ~ time * drug + (1 + time|id) ) surv.formula <- Surv(survtime, status) ~ drug fit <- joint(long.formulas, surv.formula, PBC, family = list('gaussian', 'poisson')) fitted(fit)
joint
object.Extract fixed effects from a joint
object.
## S3 method for class 'joint' fixef(object, what = c("long", "surv"), ...)
## S3 method for class 'joint' fixef(object, what = c("long", "surv"), ...)
object |
a joint model fit by the |
what |
character string. Should the |
... |
additional arguments (none used). |
A vector containing requested fixed effects.
James Murray ([email protected]).
# Univariate fit on PBC data ------------------------------------------- data(PBC) # Subset data and remove NAs PBC <- subset(PBC, select = c('id', 'survtime', 'status', 'drug', 'time', 'albumin')) PBC <- na.omit(PBC) # Specify simple univariate fit long.formulas <- list( albumin ~ time + (1 + time|id) ) surv.formula <- Surv(survtime, status) ~ drug fit <- joint(long.formulas, surv.formula, PBC, family = list('gaussian')) fixef(fit, 'long') fixef(fit, 'surv')
# Univariate fit on PBC data ------------------------------------------- data(PBC) # Subset data and remove NAs PBC <- subset(PBC, select = c('id', 'survtime', 'status', 'drug', 'time', 'albumin')) PBC <- na.omit(PBC) # Specify simple univariate fit long.formulas <- list( albumin ~ time + (1 + time|id) ) surv.formula <- Surv(survtime, status) ~ drug fit <- joint(long.formulas, surv.formula, PBC, family = list('gaussian')) fixef(fit, 'long') fixef(fit, 'surv')
gmvjoint allows the user to fit joint models of survival and multivariate longitudinal data. The
longitudinal data is specified by generalised linear mixed models (GLMMs). The joint models
are fit via maximum likelihood using an approximate EM algorithm first proposed by Bernhardt et
al. (2015). The GLMMs are specified using the same syntax as for package glmmTMB
Brooks et
al. (2017). The joint models themselves are then the flexible extensions to those in e.g.
Wulfsohn and Tsiatis (1997). The user is able to simulate data under many different response
types.
James Murray <[email protected]>
Bernhardt PW, Zhang D and Wang HJ. A fast EM Algorithm for Fitting Joint Models of a Binary Response to Multiple Longitudinal Covariates Subject to Detection Limits. Computational Statistics and Data Analysis 2015; 85; 37–53
Mollie E. Brooks, Kasper Kristensen, Koen J. van Benthem, Arni Magnusson, Casper W. Berg, Anders Nielsen, Hans J. Skaug, Martin Maechler and Benjamin M. Bolker (2017). glmmTMB Balances Speed and Flexibility Among Packages for Zero-inflated Generalized Linear Mixed Modeling. The R Journal, 9(2), 378-400.
Murray, J and Philipson P. A fast approximate EM algorithm for joint models of survival and multivariate longitudinal data.Computational Statistics and Data Analysis 2022
Wulfsohn MS, Tsiatis AA. A joint model for survival and longitudinal data measured with error. Biometrics. 1997; 53(1), 330-339.
Fit a joint model to time-to-event and multivariate longitudinal data
joint( long.formulas, surv.formula, data, family, disp.formulas = NULL, control = list() )
joint( long.formulas, surv.formula, data, family, disp.formulas = NULL, control = list() )
long.formulas |
A list of formula objects specifying the |
surv.formula |
A formula specifying the time-to-event sub-model. Must be usable by
|
data |
A |
family |
A list of length |
disp.formulas |
An optional list of length |
control |
A list of control values:
|
Function joint
fits a joint model to time-to-event data and multivariate
longitudinal data. The longitudinal data can be specified by numerous models encompassing
a fairly wide range of data. This joint model fit is achieved by the use of an approximate
EM algorithm first proposed in Bernhardt et al. (2015), and later used in the 'classic'
multivariate joint model in Murray and Philipson (2022). Each longitudinal response is
modelled by
where is a known, monotonic link function. An association is induced between the
th response and the hazard
by:
where is the association parameter and
is the vector function of
time imposed on the
th random effects structure (i.e. intercept-and-slope; spline).
An object with class joint
. See joint.object
for information.
Currently, five families are available for implementation, spanning continuous, binary and count data types:
'gaussian'
Normally distributed. The identity link is used. A term
will be estimated, denoting the variance of this response
'binomial'
For binary data types, a logit link is used.
'poisson'
For count data types where dispersion is either non-consequential or ignored. A log link is used.
'genpois'
For count data types where dispersion is at least of some
secondary interest. A log link is used. A term is estimated, denoting
the dispersion,
of the response. This follows interpretation of Zamani &
Ismail (2012):
: Over-dispersion;
: Under-dispersion.
.
'Gamma'
For continuous data where a Gamma distribution might be sensible.
The log link is used. A term is be estimated, denoting the (log) shape of
the distribution, which is then reported as
.
"negbin"
For count data types where overdispersion is modelled. A log link
is used. A term is estimated, which is then reported as
which is the overdispersion. The variance of the response
is
.
For families "negbin"
, "Gamma"
, "genpois"
, the user can define the
dispersion model desired in disp.formulas
. For the "negbin"
and "Gamma"
cases, we define (i.e. the exponent of the linear
predictor of the dispersion model; and for
"genpois"
the identity of the linear
is used.
The disp.formulas
in the function call allows the user to model the dispersion for
a given sub-model if wanted. The default value disp.formulas = NULL
simply imposes
an 'intercept only' model. If the th item in
disp.formulas
corresponds to
a longitudinal sub-model with no dispersion term, then it is simply ignored. With this in
mind then, if a dispersion model is only required for, say, one sub-model, then the
corresponding item in this list of models should be specified as such, with the others set to
~1
.
We follow the approximation of the observed empirical information matrix detailed by
Mclachlan and Krishnan (2008), and later used in joineRML
(Hickey et al., 2018).
These are only calculated if post.process=TRUE
. Generally, these SEs are well-behaved,
but their reliability will depend on multiple factors: Sample size; number of events;
collinearity of REs of responses; number of observed times, and so on. Some more discussion/
references are given in vcov.joint
.
A few convergence criteria (specified by control$conv
) are available:
abs
Convergence reached when maximum absolute change in parameter estimates
is <tol.abs
.
rel
Convergence reached when maximum absolute relative change in parameter
estimates is <tol.rel
. A small amount (tol.den
) is added to the denominator
to eschew numerical issues if parameters are nearly zero.
either
Convergence is reached when either abs
or rel
are met.
sas
Assess convergence for parameters <tol.thr
by the
abs
criterion, else rel
. This is the default.
Note that the baseline hazard is updated at each EM iteration, but is not monitored for convergence.
James Murray ([email protected]).
Bernhardt PW, Zhang D and Wang HJ. A fast EM Algorithm for Fitting Joint Models of a Binary Response to Multiple Longitudinal Covariates Subject to Detection Limits. Computational Statistics and Data Analysis 2015; 85; 37–53
Hickey GL, Philipson P, Jorgensen A, Kolamunnage-Dona R. joineRML
: a joint model and
software package for time-to-event and multivariate longitudinal outcomes.
BMC Med. Res. Methodol. 2018; 50
McLachlan GJ, Krishnan T. The EM Algorithm and Extensions. Second Edition. Wiley-Interscience; 2008.
Murray, J and Philipson P. A fast approximate EM algorithm for joint models of survival and multivariate longitudinal data.Computational Statistics and Data Analysis 2022; 170; 107438
Zamani H and Ismail N. Functional Form for the Generalized Poisson Regression Model, Communications in Statistics - Theory and Methods 2012; 41(20); 3666-3675.
summary.joint
, logLik.joint
, boot.joint
,
extractAIC.joint
, fixef.joint
, ranef.joint
,
vcov.joint
, joint.object
and xtable.joint
. For
data simulation see simData
.
# 1) Fit on simulated bivariate data, (1x gaussian, 1x poisson) -------- beta <- do.call(rbind, replicate(2, c(2, -0.1, 0.1, -0.2), simplify = FALSE)) gamma <- c(0.3, -0.3) D <- diag(c(0.25, 0.09, 0.25, 0.05)) family <- list('gaussian', 'poisson') data <- simData(ntms = 10, beta = beta, D = D, n = 100, family = family, zeta = c(0, -0.2), sigma = list(0.16, 0), gamma = gamma)$data # Specify formulae and target families long.formulas <- list( Y.1 ~ time + cont + bin + (1 + time|id), # Gaussian Y.2 ~ time + cont + bin + (1 + time|id) # Poisson ) surv.formula <- Surv(survtime, status) ~ bin fit <- joint(long.formulas, surv.formula, data, family) # 2) Fit on PBC data ----------------------------------------------------- data(PBC) # Subset data and remove NAs PBC <- subset(PBC, select = c('id', 'survtime', 'status', 'drug', 'time', 'serBilir', 'albumin', 'spiders', 'platelets')) PBC <- na.omit(PBC) # Specify GLMM sub-models, including interaction and quadratic time terms long.formulas <- list( log(serBilir) ~ drug * (time + I(time^2)) + (1 + time + I(time^2)|id), albumin ~ drug * time + (1 + time|id), platelets ~ drug * time + (1 + time|id), spiders ~ drug * time + (1|id) ) surv.formula <- Surv(survtime, status) ~ drug fit <- joint(long.formulas, surv.formula, PBC, family = list("gaussian", "gaussian", "poisson", "binomial"), control = list(verbose = TRUE)) fit # 3) Fit with dispersion models ---------------------------------------- beta <- do.call(rbind, replicate(2, c(2, -0.1, 0.1, -0.2), simplify = FALSE)) gamma <- c(0.3, -0.3) D <- diag(c(0.25, 0.09, 0.25, 0.05)) family <- list('negbin', 'poisson') # As an example; only requires one dispersion model. sigma <- list(c(1, 0.2), 0) # Need to specify the model in simData call too. disp.formulas = list(~time, ~1) # Even though poisson doesn't model dispersion, need to # populate this element in disp.formulas! # Simulate some data data <- simData(ntms = 10, beta = beta, D = D, n = 500, family = family, zeta = c(0, -0.2), sigma = sigma, disp.formulas = disp.formulas, gamma = gamma)$data # Now fit using joint long.formulas <- list( Y.1 ~ time + cont + bin + (1+time|id), Y.2 ~ time + cont + bin + (1+time|id) ) fit <- joint( long.formulas, Surv(survtime, status) ~ bin, data, family, disp.formulas = disp.formulas ) fit summary(fit)
# 1) Fit on simulated bivariate data, (1x gaussian, 1x poisson) -------- beta <- do.call(rbind, replicate(2, c(2, -0.1, 0.1, -0.2), simplify = FALSE)) gamma <- c(0.3, -0.3) D <- diag(c(0.25, 0.09, 0.25, 0.05)) family <- list('gaussian', 'poisson') data <- simData(ntms = 10, beta = beta, D = D, n = 100, family = family, zeta = c(0, -0.2), sigma = list(0.16, 0), gamma = gamma)$data # Specify formulae and target families long.formulas <- list( Y.1 ~ time + cont + bin + (1 + time|id), # Gaussian Y.2 ~ time + cont + bin + (1 + time|id) # Poisson ) surv.formula <- Surv(survtime, status) ~ bin fit <- joint(long.formulas, surv.formula, data, family) # 2) Fit on PBC data ----------------------------------------------------- data(PBC) # Subset data and remove NAs PBC <- subset(PBC, select = c('id', 'survtime', 'status', 'drug', 'time', 'serBilir', 'albumin', 'spiders', 'platelets')) PBC <- na.omit(PBC) # Specify GLMM sub-models, including interaction and quadratic time terms long.formulas <- list( log(serBilir) ~ drug * (time + I(time^2)) + (1 + time + I(time^2)|id), albumin ~ drug * time + (1 + time|id), platelets ~ drug * time + (1 + time|id), spiders ~ drug * time + (1|id) ) surv.formula <- Surv(survtime, status) ~ drug fit <- joint(long.formulas, surv.formula, PBC, family = list("gaussian", "gaussian", "poisson", "binomial"), control = list(verbose = TRUE)) fit # 3) Fit with dispersion models ---------------------------------------- beta <- do.call(rbind, replicate(2, c(2, -0.1, 0.1, -0.2), simplify = FALSE)) gamma <- c(0.3, -0.3) D <- diag(c(0.25, 0.09, 0.25, 0.05)) family <- list('negbin', 'poisson') # As an example; only requires one dispersion model. sigma <- list(c(1, 0.2), 0) # Need to specify the model in simData call too. disp.formulas = list(~time, ~1) # Even though poisson doesn't model dispersion, need to # populate this element in disp.formulas! # Simulate some data data <- simData(ntms = 10, beta = beta, D = D, n = 500, family = family, zeta = c(0, -0.2), sigma = sigma, disp.formulas = disp.formulas, gamma = gamma)$data # Now fit using joint long.formulas <- list( Y.1 ~ time + cont + bin + (1+time|id), Y.2 ~ time + cont + bin + (1+time|id) ) fit <- joint( long.formulas, Surv(survtime, status) ~ bin, data, family, disp.formulas = disp.formulas ) fit summary(fit)
joint
objectAn object returned by the joint
function, with class joint
a fitted joint model. Objects of this class currently have methods for: logLik
,
print
, ranef
, fixef
, summary
, AIC
, and vcov
.
joint.object
joint.object
An object of class NULL
of length 0.
A list with the following components.
coeffs
A list containing parameter estimates:
D
The variance-covariance matrix of the random effects.
beta
Vector of fixed effects for longitudinal processes.
sigma
List of dispersion parameters, families with no dispersion parameter are returned as an unnamed zero value.
gamma
Vector of association parameters.
zeta
Vector of time-invariant survival coefficients.
hazard
A matrix containing unique failure times ft
, their hazard
contribution haz
and the number of events at that failure time nev
.
ModelInfo
A list containing information on the model fit:
ResponseInfo
A vector containing response names with (family) reported.
Resps
A vector containing response names only.
family
A list of families fit.
K
An integer specifying the number of longitudinal sub-models.
Pcounts
A list containing informations about the number of parameters/random effects:
P
A vector of length K containing the number of fixed effects for each response (in order).
Pd
A vector of length K containing the number of dispersion parameters for each response (in order) 0 denotes no parameter for that response.
q
An integer denoting the number of random effects.
vD
An integer denoting the number of unique variance-covariance parameters estimated.
long.formulas
A list of long.formulas
(i.e. from joint
call).
disp.formulas
A list of disp.formulas
(i.e. from joint
call).
If no disp.formulas
are supplied to joint
, then this is populated by a list of
"
~1
". The environment is set to parent.frame
in this case to avoid
memory overheads in returned objects.
surv.formula
Formula object from joint
call.
survtime
The name of the event time used in surv.formula
.
status
The name of the event indicator used in surv.formula
.
control
List of control parameters used, see joint
.
convergence.criteria
List of parameters relating to the stopping rule.
inds
A list of length two, named R
and Cpp
, each of which contains
the indices for fixed effects for each response, or the random effects
for the named platform.
n
Number of subjects.
nobs
A vector containing total number of observations for each response.
mi
A x
matrix containing the number of observations for
subject
for the
th response.
nev
Number of events.
id.assign
A list containing the original ids of subjects in the data
supplied to joint
, and the id assigned to them for use in subsequent functions.
Hessian
The (approximated) Hessian found at MLEs. Only returned if
control argument post.process=TRUE
.
vcov
The full variance-covariance matrix between parameters. Only returned if
control argument post.process=TRUE
.
SE
A named vector of approximated standard error for each estimated parameter.
Only returned if control argument post.process=TRUE
.
logLik
log-likelihood evaluated at parameter estimates. Only returned if control
argument post.process=TRUE
.
REs
The random effects, with subject-specific variance matrices attributed. If
control argumnet post.process=TRUE
then these are found at MLEs (i.e. are posterior
estimates), otherwise they are taken from the final EM iteration.
elapsed.time
Named numeric containing breakdown of elapsed time for joint
fit.
dmats
A list of data matrices on each of the longitudinal and survival processes for each subject.
James Murray ([email protected]).
Calculate joint log-likelihood, degrees of freedom, AIC and BIC of joint model fit.
## S3 method for class 'joint' logLik(object, conditional = FALSE, ...)
## S3 method for class 'joint' logLik(object, conditional = FALSE, ...)
object |
a |
conditional |
Logical. Should the conditional or observed data log-likelihood be returned? See details. |
... |
additional arguments (none used). |
Calculate the log-likelihood of a joint model of survival and multivariate longitudinal
data (i.e. a joint
object). The argument conditional
manages whether
or not the log-likelihood conditional on the random effects, or simply
the observed data log-likelihood is returned (the default, conditional = FALSE
).
If conditional = TRUE
, then the log-likelihood conditional on the random
effects is returned. That is
If conditional = FALSE
, then the observed data log-likelihood is returned i.e.
Additionally, the degrees of freedom, is given by
where denotes the number of coefficients estimated for the
th response,
and
the number of dispersion parameters estimated.
denotes
the number of survival coefficients, i.e. the length of
c(zeta, gamma)
. Finally,
all covariance parameters are captured in length(vech(D))
.
With the degrees of freedom, we can additionally compute AIC and BIC, which are defined in no special way; and are calculated using the observed data log-likelihood.
Returns an object of class logLik
, a number which is the log-likelihood
of the fitted model object
. This has multiple attributes: df
which is the
degrees of freedom, df.residual
; the number of residual degrees of freedom;
AIC
and BIC
which are the Akaike or Bayes information criterion evaluated at
either the conditional or observed log-likelihood (as requested by argument
conditional
).
James Murray ([email protected])
Henderson R, Diggle P, Dobson A. Joint modelling of longitudinal measurements and event time data. Biostatistics 2000; 1(4); 465-480.
Wulfsohn MS, Tsiatis AA. A joint model for survival and longitudinal data measured with error. Biometrics 1997; 53(1); 330-339.
extractAIC.joint
and anova.joint
# Bivariate simulated data (2x Gaussian) data <- simData(n = 100, D = diag(c(.25, .04, .2, .02)), gamma = c(0.4, -0.2), theta = c(-2, .2))$data fit <- joint(list( Y.1 ~ time + cont + bin + (1 + time|id), Y.2 ~ time + cont + bin + (1 + time|id) ), Surv(survtime, status) ~ cont + bin, data = data, family = list('gaussian', 'gaussian')) logLik(fit)
# Bivariate simulated data (2x Gaussian) data <- simData(n = 100, D = diag(c(.25, .04, .2, .02)), gamma = c(0.4, -0.2), theta = c(-2, .2))$data fit <- joint(list( Y.1 ~ time + cont + bin + (1 + time|id), Y.2 ~ time + cont + bin + (1 + time|id) ), Surv(survtime, status) ~ cont + bin, data = data, family = list('gaussian', 'gaussian')) logLik(fit)
Creates a set of survival data and fits a coxph
model
using a survival formula and a data set.
parseCoxph(surv.formula, data, center = TRUE)
parseCoxph(surv.formula, data, center = TRUE)
surv.formula |
A formula readable by 'coxph'. |
data |
a set of data containing covariate information for variables named by ‘surv.formula'. Can be of any ’completeness', as the function returns a reduced set. |
center |
Should the covariate matrices be mean-centered before being returned?
defaults to |
A list with class parseCoxph
containing:
survdata
reduced version of data
, with only one row per subject, with
covariates specified by surv.formula
along with survival time and failure status.
Smat
matrix containing all requisite survival covariates (one row per subject).
ph
the model fit from coxph
.
Delta
list of failure indicators for each of the unique subjects.
n
number of unique subjects.
ft
vector of unique failure times.
nev
vector containing number of failures at each failure time ft
.
survtime
the name of the time
variable in surv.formula
.
status
the name of the event
variable in surv.formula
.
data = simData()$data parseCoxph(Surv(survtime, status) ~ bin, data = data)
data = simData()$data parseCoxph(Surv(survtime, status) ~ bin, data = data)
Primary biliary cirrhosis (PBC) data. PBC is a chronic liver disease which affects the bile
ducts of the liver, complications of which can ultimately lead to death. The longitudinal
profile of numerous biomarkers were observed for 312 patients at the Mayo Clinic between 1974
and 1984 with patients assigned to either the active (D-penicillamine, n=154 (50.6
placebo treatment arm (Murtaugh 1994). The data is publicly available in numerous places,
including joineRML
and survival
. The presence of many longitudinal biomarkers
of clinical interest as well as an event-time has lead to the PBC data becoming
widely used in literature.
data('PBC')
data('PBC')
data.frame
with 312 patients and 19 variables:
id
Subject identifier
survtime
Survival time in years
drug
Binary indicator covariate: was the patient assigned active
(drug=1
) or placebo?
sex
Binary indicator covariate: Takes value 1 if the subject is female, and zero if male.
time
Time of visit (0=baseline).
ascites
Binary response variable. Takes value 1 if accumulation of fluid in abdomen ("ascites") present.
hepatomegaly
Binary response variable. Takes value 1 if enlarged liver ("hepatomegaly") present.
spiders
Binary response variable. Takes value 1 if malformed blood vessels in skin ("hepatomegaly") present.
edema
Factor variable describing edema therapy, see
pbcseq
.
serBilir
Serum bilirubin (measured in mg/dl).
serChol
Serum cholesterol (measured in mg/dl).
album
Serum albumin (measured in mg/dl).
alkaline
Alkaline phosphotase (measured in U/liter).
SGOT
Aspartate aminotransferase (measured in U/liter).
platelets
Platelet count per cubic ml/1000.
histologic
Histologic stage of disease, see pbcseq
.
status
Survival status, status=1
if the subject experienced mortality
and =0
if censored.
age
Standardised age at baseline visit.
Nine longitudinal biomarkers exist with varying degrees of completeness in the data.
Murtaugh PA, Dickson ER, Van Dam GM, Malinchoc M, Grambsch PM, Langworthy AL, Gips CH. Primary biliary cirrhosis: Prediction of short-term survival based on repeated patient visits. Hepatology 1994; 20(1); 126-134.
Plot residuals obtained by a joint model (obtained by joint
).
If the residuals.joint
object represents the longitudinal process, a simple (paneled)
plot is produced (one for each response). If the residual object contains the Cox-Snell
residuals then several plots are produced (interactively): The KM estimate of survival
function of said residuals and then repeated for each survival covariate in the model call
to joint
(if requested).
## S3 method for class 'residuals.joint' plot(x, strata = FALSE, ...)
## S3 method for class 'residuals.joint' plot(x, strata = FALSE, ...)
x |
an object with class |
strata |
logical, should strata (for the survival sub-model only). Defaults to
|
... |
additional arguments (none used). |
James Murray ([email protected]).
joint
object.Return the random effects which maximises the complete
data log-likelihood at the MLEs
.
## S3 method for class 'joint' ranef(object, Var = FALSE, ...)
## S3 method for class 'joint' ranef(object, Var = FALSE, ...)
object |
a joint model fit by the |
Var |
logical, should the estimated variance of the random effects at |
... |
additional arguments (none used). |
A matrix
containing required random effects effects. If Var=TRUE
,
instead a list is returned with first element the matrix
of random effects and second a
matrix
of the variances . Note that these are posterior modes
of the random effects. Conditional distribution can be found by
cond.ranefs
.
James Murray ([email protected]).
# Univariate fit on PBC data ----------------------------------------- data(PBC) # Subset data and remove NAs PBC <- subset(PBC, select = c('id', 'survtime', 'status', 'drug', 'time', 'albumin')) PBC <- na.omit(PBC) # Specify univariate fit long.formulas <- list( albumin ~ time*drug + (1 + time|id) ) surv.formula <- Surv(survtime, status) ~ drug fit <- joint(long.formulas, surv.formula, PBC, family = list('gaussian')) b <- ranef(fit, FALSE)
# Univariate fit on PBC data ----------------------------------------- data(PBC) # Subset data and remove NAs PBC <- subset(PBC, select = c('id', 'survtime', 'status', 'drug', 'time', 'albumin')) PBC <- na.omit(PBC) # Specify univariate fit long.formulas <- list( albumin ~ time*drug + (1 + time|id) ) surv.formula <- Surv(survtime, status) ~ drug fit <- joint(long.formulas, surv.formula, PBC, family = list('gaussian')) b <- ranef(fit, FALSE)
returns the Pearson residuals values from a joint
object.
## S3 method for class 'joint' residuals( object, what = c("longit", "surv"), type = c("response", "pearson"), ... )
## S3 method for class 'joint' residuals( object, what = c("longit", "surv"), type = c("response", "pearson"), ... )
object |
a joint model fit by |
what |
character string. Should the |
type |
character. The residual type for |
... |
Additional arguments (none used). |
a named list of length of class
residuals.joint
containing
residuals produced by the joint model for each of the responses,
along with the fitted values as an attribute.
James Murray ([email protected]).
fitted.joint
plot.residuals.joint
# Trivariate fit on PBC data ----------------------------------------- data(PBC) # Subset data and remove NAs PBC <- subset(PBC, select = c('id', 'survtime', 'status', 'drug', 'time', 'albumin', 'hepatomegaly', 'platelets')) PBC <- na.omit(PBC) # Specify trivariate fit long.formulas <- list( albumin ~ time*drug + (1 + time|id), platelets ~ time * drug + (1 + time|id), hepatomegaly ~ time * drug + (1|id) ) surv.formula <- Surv(survtime, status) ~ drug fit <- joint(long.formulas, surv.formula, PBC, family = list('gaussian', 'poisson', 'binomial')) R <- residuals(fit, type = 'pearson') plot(R) plot(residuals(fit, what = "surv"))
# Trivariate fit on PBC data ----------------------------------------- data(PBC) # Subset data and remove NAs PBC <- subset(PBC, select = c('id', 'survtime', 'status', 'drug', 'time', 'albumin', 'hepatomegaly', 'platelets')) PBC <- na.omit(PBC) # Specify trivariate fit long.formulas <- list( albumin ~ time*drug + (1 + time|id), platelets ~ time * drug + (1 + time|id), hepatomegaly ~ time * drug + (1|id) ) surv.formula <- Surv(survtime, status) ~ drug fit <- joint(long.formulas, surv.formula, PBC, family = list('gaussian', 'poisson', 'binomial')) R <- residuals(fit, type = 'pearson') plot(R) plot(residuals(fit, what = "surv"))
Simulate realisations from a generalised poisson distribution
rgenpois(mu, phi)
rgenpois(mu, phi)
mu |
A numeric vector of rates |
phi |
A numeric specifying the dispersion |
Follows the "GP-1" implementation of the generalised Poisson distribution outlined
in Zamani & Ismail (2012). The variance of produced is
. As such
the dispersion parameter is bounded (i.e. not in positive reals as with CMP distribution).
An appropriately-dimensioned vector of count data.
Zamani H and Ismail N. Functional Form for the Generalized Poisson Regression Model, Communications in Statistics - Theory and Methods 2012; 41(20); 3666-3675.
joint
model.Using longitudinal information available up to a time, establish diagnostic capabilities (ROC, AUC and Brier score) of a fitted joint model.
ROC(fit, data, Tstart, delta, control = list(), progress = TRUE, boot = FALSE)
ROC(fit, data, Tstart, delta, control = list(), progress = TRUE, boot = FALSE)
fit |
a joint model fit by the |
data |
the data to which the original |
Tstart |
The start of the time window of interest, |
delta |
scalar denoting the length of time interval to check for failure times. |
control |
list of control arguments to be passed to |
progress |
should a progress bar be shown, showing the current progress of the ROC
function (
to |
boot |
logical. Not currently used, legacy argument. |
A list of class ROC.joint
consisting of:
Tstart
numeric denoting the start of the time window of interest; all dynamic
predictions generated used longitudinal information up-to time .
delta
scalar which denotes length of interval to check, such that the window
is defined by .
candidate.u
candidate vector of failure times to calculate dynamic probability
of surviving for each subject alive in data
at time .
window.failures
numeric denoting the number of observed failures in
.
Tstart.alive
numeric denoting the risk set at Tstart
.
metrics
a data.frame
containing probabilistic thresholds
with:
TP
true positives; FN
false negatives; FP
false positives;
TN
true negatives; TPR
true positive rate (sensitivity); FPR
false
positive rate (1-specificity); Acc
accuracy; PPV
positive predictive value
(precision); NPV
negative predictive value; F1s
F1 score and J
Youden's
J statistic.
the area under the curve.
The Brier score.
The predicted error (taking into account censoring), loss function: square.
Raw acceptance percentages for each subject sampled.
mean acceptance of M-H scheme across all subjects.
list containing information about call to dynPred
.
James Murray ([email protected]).
dynPred
, and plot.ROC.joint
.
data(PBC) PBC$serBilir <- log(PBC$serBilir) long.formulas <- list(serBilir ~ drug * time + (1 + time|id)) surv.formula <- Surv(survtime, status) ~ drug family <- list('gaussian') fit <- joint(long.formulas, surv.formula, PBC, family) (roc <- ROC(fit, PBC, Tstart = 8, delta = 2, control = list(nsim = 25))) plot(roc)
data(PBC) PBC$serBilir <- log(PBC$serBilir) long.formulas <- list(serBilir ~ drug * time + (1 + time|id)) surv.formula <- Surv(survtime, status) ~ drug family <- list('gaussian') fit <- joint(long.formulas, surv.formula, PBC, family) (roc <- ROC(fit, PBC, Tstart = 8, delta = 2, control = list(nsim = 25))) plot(roc)
Simulate multivariate longitudinal and survival data from a joint model
specification, with potential mixture of response families. Implementation is similar
to existing packages (e.g. joineR
, joineRML
).
simData( n = 250, ntms = 10, fup = 5, family = list("gaussian", "gaussian"), sigma = list(0.16, 0.16), beta = rbind(c(1, 0.1, 0.33, -0.5), c(1, 0.1, 0.33, -0.5)), D = NULL, gamma = c(0.5, -0.5), zeta = c(0.05, -0.3), theta = c(-4, 0.2), cens.rate = exp(-3.5), regular.times = TRUE, dof = Inf, random.formulas = NULL, disp.formulas = NULL, return.ranefs = FALSE )
simData( n = 250, ntms = 10, fup = 5, family = list("gaussian", "gaussian"), sigma = list(0.16, 0.16), beta = rbind(c(1, 0.1, 0.33, -0.5), c(1, 0.1, 0.33, -0.5)), D = NULL, gamma = c(0.5, -0.5), zeta = c(0.05, -0.3), theta = c(-4, 0.2), cens.rate = exp(-3.5), regular.times = TRUE, dof = Inf, random.formulas = NULL, disp.formulas = NULL, return.ranefs = FALSE )
n |
the number of subjects |
ntms |
the number of time points |
fup |
the maximum follow-up time, such that t = [0, ..., fup] with length |
family |
a |
sigma |
a |
beta |
a |
D |
a positive-definite matrix specifying the variance-covariance matrix for the random effects. If not supplied an identity matrix is assumed. |
gamma |
a |
zeta |
a vector of length 2 specifying the coefficients for the baseline covariates in the survival sub-model, in the order of continuous and binary. |
theta |
parameters to control the failure rate, see baseline hazard. |
cens.rate |
parameter for |
regular.times |
logical, if |
dof |
integer, specifies the degrees of freedom of the multivariate t-distribution
used to generate the random effects. If specified, this t-distribution is used. If left
at the default |
random.formulas |
allows user to specify if an intercept-and-slope ( |
disp.formulas |
allows user to specify the dispersion model simulated. Intended use is
to allow swapping between intercept only (the default) and a time-varying one ( |
return.ranefs |
a logical determining whether the true random effects should be
returned. This is largely for internal/simulation use. Default |
simData
simulates data from a multivariate joint model with a mixture of
families for each response. The specification of
family
changes
requisite dispersion parameter sigma
, if applicable. The family
list can
(currently) contain:
"gaussian"
Simulated with identity link, corresponding item in sigma
will be the variance.
"poisson"
Simulated with log link, corresponding dispersion in sigma
can be anything, as it doesn't impact simulation.
"binomial"
Simulated with logit link, corresponding dispersion in sigma
can be anything, as it doesn't impact simulation.
"negbin"
Simulated with a log link, corresponding item in sigma
will be
the overdispersion defined on the log scale. Simulated variance is
.
"genpois"
Simulated with a log link, corresponding item in sigma
will be
the dispersion. Values < 0 correspond to under-dispersion, and values > 0 over-
dispersion. See rgenpois
for more information. Simulated variance is
.
"Gamma"
Simulated with a log link, corresponding item in sigma
will be
the shape parameter, defined on the log-scale.
Therefore, for families "negbin"
, "Gamma"
, "genpois"
, the user can
define the dispersion model desired in disp.formulas
, which creates a data matrix
. For the
"negbin"
and "Gamma"
cases, we define
(i.e. the exponent of the linear predictor of the
dispersion model); and for
"genpois"
the identity of the linear is used.
A list of two data.frame
s: One with the full longitudinal data, and another
with only survival data. If return.ranefs=TRUE
, a matrix of the true values is
also returned. By default (i.e. no arguments provided), a bivariate Gaussian set of joint
data is returned.
When simulating the survival time, the baseline hazard is a Gompertz distribution controlled
by theta=c(x,y)
:
where is the shape parameter, and the scale parameter is
.
James Murray ([email protected]).
Austin PC. Generating survival times to simulate Cox proportional hazards models with time-varying covariates. Stat Med. 2012; 31(29): 3946-3958.
# 1) A set of univariate data ------------------------------------------ beta <- c(2.0, 0.33, -0.25, 0.15) # Note that by default arguments are bivariate, so need to specify the univariate case univ.data <- simData(beta = beta, gamma = 0.15, sigma = list(0.2), family = list("gaussian"), D = diag(c(0.25, 0.05))) # 2) Univariate data, with failure rate controlled --------------------- # In reality, many facets contribute to the simulated failure rate, in # this example, we'll just atler the baseline hazard via 'theta'. univ.data.highfail <- simData(beta = beta, gamma = 0.15, sigma = list(0.0), family = list("poisson"), D = diag(c(0.40, 0.08)), theta = c(-2, 0.1)) # 3) Trivariate (K = 3) mixture of families with dispersion parameters - beta <- do.call(rbind, replicate(3, c(2, -0.1, 0.1, -0.2), simplify = FALSE)) gamma <- c(0.3, -0.3, 0.3) D <- diag(c(0.25, 0.09, 0.25, 0.05, 0.25, 0.09)) family <- list('gaussian', 'genpois', 'negbin') sigma <- list(.16, 1.5, log(1.5)) triv.data <- simData(ntms=15, family = family, sigma = sigma, beta = beta, D = D, gamma = gamma, theta = c(-3, 0.2), zeta = c(0,-.2)) # 4) K = 4 mixture of families with/out dispersion --------------------- beta <- do.call(rbind, replicate(4, c(2, -0.1, 0.1, -0.2), simplify = FALSE)) gamma <- c(-0.75, 0.3, -0.6, 0.5) D <- diag(c(0.25, 0.09, 0.25, 0.05, 0.25, 0.09, 0.16, 0.02)) family <- list('gaussian', 'poisson', 'binomial', 'gaussian') sigma <- list(.16, 0, 0, .05) # 0 can be anything here, as it is ignored internally. mix.data <- simData(ntms=15, family = family, sigma = sigma, beta = beta, D = D, gamma = gamma, theta = c(-3, 0.2), zeta = c(0,-.2)) # 5) Bivariate joint model with two dispersion models. ----------------- disp.formulas <- list(~time, ~time) # Two time-varying dispersion models sigma <- list(c(0.00, -0.10), c(0.10, 0.15)) # specified in form of intercept, slope D <- diag(c(.25, 0.04, 0.50, 0.10)) disp.data <- simData(family = list("genpois", "negbin"), sigma = sigma, D = D, beta = rbind(c(0, 0.05, -0.15, 0.00), 1 + c(0, 0.25, 0.15, -0.20)), gamma = c(1.5, 1.5), disp.formulas = disp.formulas, fup = 5) # 6) Trivariate joint model with mixture of random effects models ------ # It can be hard to e.g. fit a binomial model on an intercept and slope, since e.g. # glmmTMB might struggle to accurately fit it (singular fits, etc.). To that end, could # swap the corresponding random effects specification to be an intercept-only. family <- list("gaussian", "binomial", "gaussian") # A list of formulae, even though we want to change the second sub-model's specification # we need to specify the rest of the items, too (same as disp.formulas, sigma). random.formulas <- list(~time, ~1, ~time) beta <- rbind(c(2, -0.2, 0.5, -0.25), c(0, 0.5, 1, -1), c(-2, 0.2, -0.5, 0.25)) # NOTE that the specification of RE matrix will need to match. D <- diag(c(0.25, 0.09, 1, 0.33, 0.05)) # Simulate data, and return REs as a sanity check... mix.REspec.data <- simData(beta = beta, D = D, family = family, gamma = c(-0.5, 1, 0.5), sigma = list(0.15, 0, 0.15), random.formulas = random.formulas, return.ranefs = TRUE)
# 1) A set of univariate data ------------------------------------------ beta <- c(2.0, 0.33, -0.25, 0.15) # Note that by default arguments are bivariate, so need to specify the univariate case univ.data <- simData(beta = beta, gamma = 0.15, sigma = list(0.2), family = list("gaussian"), D = diag(c(0.25, 0.05))) # 2) Univariate data, with failure rate controlled --------------------- # In reality, many facets contribute to the simulated failure rate, in # this example, we'll just atler the baseline hazard via 'theta'. univ.data.highfail <- simData(beta = beta, gamma = 0.15, sigma = list(0.0), family = list("poisson"), D = diag(c(0.40, 0.08)), theta = c(-2, 0.1)) # 3) Trivariate (K = 3) mixture of families with dispersion parameters - beta <- do.call(rbind, replicate(3, c(2, -0.1, 0.1, -0.2), simplify = FALSE)) gamma <- c(0.3, -0.3, 0.3) D <- diag(c(0.25, 0.09, 0.25, 0.05, 0.25, 0.09)) family <- list('gaussian', 'genpois', 'negbin') sigma <- list(.16, 1.5, log(1.5)) triv.data <- simData(ntms=15, family = family, sigma = sigma, beta = beta, D = D, gamma = gamma, theta = c(-3, 0.2), zeta = c(0,-.2)) # 4) K = 4 mixture of families with/out dispersion --------------------- beta <- do.call(rbind, replicate(4, c(2, -0.1, 0.1, -0.2), simplify = FALSE)) gamma <- c(-0.75, 0.3, -0.6, 0.5) D <- diag(c(0.25, 0.09, 0.25, 0.05, 0.25, 0.09, 0.16, 0.02)) family <- list('gaussian', 'poisson', 'binomial', 'gaussian') sigma <- list(.16, 0, 0, .05) # 0 can be anything here, as it is ignored internally. mix.data <- simData(ntms=15, family = family, sigma = sigma, beta = beta, D = D, gamma = gamma, theta = c(-3, 0.2), zeta = c(0,-.2)) # 5) Bivariate joint model with two dispersion models. ----------------- disp.formulas <- list(~time, ~time) # Two time-varying dispersion models sigma <- list(c(0.00, -0.10), c(0.10, 0.15)) # specified in form of intercept, slope D <- diag(c(.25, 0.04, 0.50, 0.10)) disp.data <- simData(family = list("genpois", "negbin"), sigma = sigma, D = D, beta = rbind(c(0, 0.05, -0.15, 0.00), 1 + c(0, 0.25, 0.15, -0.20)), gamma = c(1.5, 1.5), disp.formulas = disp.formulas, fup = 5) # 6) Trivariate joint model with mixture of random effects models ------ # It can be hard to e.g. fit a binomial model on an intercept and slope, since e.g. # glmmTMB might struggle to accurately fit it (singular fits, etc.). To that end, could # swap the corresponding random effects specification to be an intercept-only. family <- list("gaussian", "binomial", "gaussian") # A list of formulae, even though we want to change the second sub-model's specification # we need to specify the rest of the items, too (same as disp.formulas, sigma). random.formulas <- list(~time, ~1, ~time) beta <- rbind(c(2, -0.2, 0.5, -0.25), c(0, 0.5, 1, -1), c(-2, 0.2, -0.5, 0.25)) # NOTE that the specification of RE matrix will need to match. D <- diag(c(0.25, 0.09, 1, 0.33, 0.05)) # Simulate data, and return REs as a sanity check... mix.REspec.data <- simData(beta = beta, D = D, family = family, gamma = c(-0.5, 1, 0.5), sigma = list(0.15, 0, 0.15), random.formulas = random.formulas, return.ranefs = TRUE)
joint
object.Generate summary of a fitted multivariate joint model.
## S3 method for class 'joint' summary(object, ...)
## S3 method for class 'joint' summary(object, ...)
object |
a joint model fit by the |
... |
additional arguments (none used). |
Object of class summary.joint
.
James Murray [email protected]
joint
and joint.object
# Simple univariate on log(serum bilirubin) ---------------------------- data(PBC) long.formulas <- list( log(serBilir) ~ drug * (time + I(time^2)) + (1 + time + I(time^2)|id) ) surv.formula <- Surv(survtime, status) ~ sex + drug fit <- joint(long.formulas = long.formulas, surv.formula = surv.formula, data = PBC, family = list("gaussian")) summary(fit) # Bivariate with a dispersion model ------------------------------------ PBC <- na.omit(PBC[,c('id', 'survtime', 'status', 'sex', 'drug', 'platelets', 'albumin', 'time')]) long.formula <- list( platelets ~ time * drug + (1 + time|id), albumin ~ time * drug + (1 + time|id) ) surv.formula <- Surv(survtime, status) ~ sex + drug fit <- joint(long.formula, surv.formula, PBC, family = list("negbin", "gaussian"), disp.formula = list(~time, ~1)) summary(fit)
# Simple univariate on log(serum bilirubin) ---------------------------- data(PBC) long.formulas <- list( log(serBilir) ~ drug * (time + I(time^2)) + (1 + time + I(time^2)|id) ) surv.formula <- Surv(survtime, status) ~ sex + drug fit <- joint(long.formulas = long.formulas, surv.formula = surv.formula, data = PBC, family = list("gaussian")) summary(fit) # Bivariate with a dispersion model ------------------------------------ PBC <- na.omit(PBC[,c('id', 'survtime', 'status', 'sex', 'drug', 'platelets', 'albumin', 'time')]) long.formula <- list( platelets ~ time * drug + (1 + time|id), albumin ~ time * drug + (1 + time|id) ) surv.formula <- Surv(survtime, status) ~ sex + drug fit <- joint(long.formula, surv.formula, PBC, family = list("negbin", "gaussian"), disp.formula = list(~time, ~1)) summary(fit)
joint
fit.Extract the variance-covariance matrix from a joint
fit.
## S3 method for class 'joint' vcov(object, corr = FALSE, ...)
## S3 method for class 'joint' vcov(object, corr = FALSE, ...)
object |
a joint model fit by the |
corr |
should the correlation matrix be returned instead of the variance-covariance? |
... |
extra arguments (none used). |
Uses the observed-empirical approximation of information matrix (Mclachlan & Krishnan, 2008). The standard errors for the baseline hazard are not estimated.
A variance-covariance matrix for the joint model object.
Many competing ways exist for obtaining the observed information matrix in an EM algorithm.
In the context of joint modelling, the observed empirical approximation of the information
matrix has been used previously (joineRML
, Hickey et al. 2018). Elsewhere,
estimation of the observed information in a semi-parametric setting is outlined neatly in
Xu et al. (2014). Here, they advocate for approximation of this information matrix by
numerical differentiation of the profile Fisher Score vector. We do not consider this
methodology owing to its computational expense. That is, for each element of
which is perturbed by some small amount
, we must re-calculate
and
.
James Murray [email protected]
Hickey GL, Philipson P, Jorgensen A, Kolamunnage-Dona R. joineRML
: a joint model and
software package for time-to-event and multivariate longitudinal outcomes.
BMC Med. Res. Methodol. 2018; 50
McLachlan GJ, Krishnan T. The EM Algorithm and Extensions. Second Edition. Wiley-Interscience; 2008.
Xu C, Baines PD, Wang J. Standard error estimation using the EM algorithm for the joint modeling of survival and longitudinal data. Biostatistics 2014; 15(4).
# Univariate fit on PBC data ------------------------------------------- data(PBC) # Subset data and remove NAs PBC <- subset(PBC, select = c('id', 'survtime', 'status', 'drug', 'time', 'albumin')) PBC <- na.omit(PBC) # Specify univariate fit long.formulas <- list( albumin ~ time + (1 + time|id) ) surv.formula <- Surv(survtime, status) ~ drug fit <- joint(long.formulas, surv.formula, PBC, family = list('gaussian')) vcov(fit)
# Univariate fit on PBC data ------------------------------------------- data(PBC) # Subset data and remove NAs PBC <- subset(PBC, select = c('id', 'survtime', 'status', 'drug', 'time', 'albumin')) PBC <- na.omit(PBC) # Specify univariate fit long.formulas <- list( albumin ~ time + (1 + time|id) ) surv.formula <- Surv(survtime, status) ~ drug fit <- joint(long.formulas, surv.formula, PBC, family = list('gaussian')) vcov(fit)
xtable
for a joint
object.Prints an xtable
output for a fitted joint
object to the console,
or to a specified save location
## S3 method for class 'joint' xtable( x, caption = NULL, label = NULL, align = NULL, digits = NULL, display = NULL, auto = FALSE, p.val = FALSE, max.row = NULL, dp = 3, vcov = FALSE, capture = FALSE, capture.location = "", hlines = "middle-bottom", booktabs = TRUE, size = "footnotesize", ... )
## S3 method for class 'joint' xtable( x, caption = NULL, label = NULL, align = NULL, digits = NULL, display = NULL, auto = FALSE, p.val = FALSE, max.row = NULL, dp = 3, vcov = FALSE, capture = FALSE, capture.location = "", hlines = "middle-bottom", booktabs = TRUE, size = "footnotesize", ... )
x |
a joint model fit by the |
caption |
character, specifies the |
label |
character, specifies the |
align |
character, specifies the |
digits |
integer, specifies the |
display |
character, specifies the |
auto |
logical, specifies the |
p.val |
logical, should p-values be returned? Defaults to |
max.row |
integer, the number of rows after which the table is ‘broken’ vertically
and merged horizontally; useful for long tables. Defaults to |
dp |
integer, the number of decimal places to round the estimate, standard error and
confidence intervals to; defaults to |
vcov |
logical, should the half-vectorisation of the block diagonal of covariance
matrix be reported? Default is |
capture |
logical, should the printed |
capture.location |
character, if |
hlines |
character, specifies which horizontal lines are used in the outputted
LaTeX table. Supply a character string which contains |
booktabs |
logical, if |
size |
character, LaTeX size to be placed before the tabular environment, defaults
to |
... |
additional arguments, none used. |
A LaTeX-ready xtable
print-out of the joint model. A list containing
constituent tables is also returned invisibly, along with the final xtable
output.
James Murray ([email protected]).
# Bivariate joint model ------------------------------------------------ require(xtable) data <- simData(n = 100)$data long.formula <- list( Y.1 ~ time + cont + bin + (1 + time|id), Y.2 ~ time + cont + bin + (1 + time|id) ) surv.formula <- Surv(survtime, status) ~ cont + bin family <- list("gaussian", "gaussian") fit <- joint(long.formula, surv.formula, data, family) xtable(fit) # Example of arguments: add dummy caption, add p-values. xtable(fit, p.val = TRUE, dp = 4, caption = "This is a caption") # Change size, place horizontal lines everywhere xtable(fit, size = "normalsize", hlines = c("top-middle-bottom")) # Make a wider table without booktabs xtable(fit, booktabs = FALSE, max.row = 6)
# Bivariate joint model ------------------------------------------------ require(xtable) data <- simData(n = 100)$data long.formula <- list( Y.1 ~ time + cont + bin + (1 + time|id), Y.2 ~ time + cont + bin + (1 + time|id) ) surv.formula <- Surv(survtime, status) ~ cont + bin family <- list("gaussian", "gaussian") fit <- joint(long.formula, surv.formula, data, family) xtable(fit) # Example of arguments: add dummy caption, add p-values. xtable(fit, p.val = TRUE, dp = 4, caption = "This is a caption") # Change size, place horizontal lines everywhere xtable(fit, size = "normalsize", hlines = c("top-middle-bottom")) # Make a wider table without booktabs xtable(fit, booktabs = FALSE, max.row = 6)