Title: | Efficient Bayesian Inference for Dynamic Survival Models with Shrinkage |
---|---|
Description: | Efficient Markov chain Monte Carlo (MCMC) algorithms for fully Bayesian estimation of dynamic survival models with shrinkage priors. Details on the algorithms used are provided in Wagner (2011) <doi:10.1007/s11222-009-9164-5>, Bitto and Frühwirth-Schnatter (2019) <doi:10.1016/j.jeconom.2018.11.006> and Cadonna et al. (2020) <doi:10.3390/econometrics8020020>. |
Authors: | Daniel Winkler [aut, cre], Peter Knaus [aut] |
Maintainer: | Daniel Winkler <[email protected]> |
License: | GPL (>= 2) |
Version: | 0.2.0 |
Built: | 2024-12-22 06:45:33 UTC |
Source: | CRAN |
Create a vector of division points for the model.
These points mark the times at which the parameters are allowed to evolve,
with the parameters being fixed between division points. The points
are generated in a data driven fashion, with a new point being created
when events
number of interesting events have been observed since
the last division point.
divisionpoints(times, delta, events = 1)
divisionpoints(times, delta, events = 1)
times |
A vector of real, positive numbers indicating the survival times. For right censored data, this is the follow up time. |
delta |
A vector of status indicators, with 0 = alive and 1 = dead.
Other choices are |
events |
A positive integer indicating the number of interesting events per interval until a new division is created. |
Returns an integer vector of time points to be used as division points S
in shrinkDSM
.
Daniel Winkler [email protected]
data("gastric") # Create intervals for piecewise exponential model intervals <- divisionpoints(gastric$time, gastric$status, 2)
data("gastric") # Create intervals for piecewise exponential model intervals <- divisionpoints(gastric$time, gastric$status, 2)
A data set of survival times of patients with locally advanced, nonresectable gastric carcinoma. The patients were either treated with chemotherapy plus radiation or chemotherapy alone.
gastric
gastric
A data frame with 90 rows and 4 variables:
patient id
dummy variable indicating which treatment was employed, 0 = chemotherapy, 1 = combined chemotherapy/radiation
time survived by patient in days
dummy variable indicating whether death of the patient was observed, 0 = death not observed (i.e. censored), 1 = death observed.
Moreau, T., O'Quigley, J., and Mesbah M. (1985) A global goodness-of-fit statistic for the proportional hazards model Appl. Statist., 34, 212:218 (p 213)
https://www.mayo.edu/research/documents/gastrichtml/DOC-10027680
plot.mcmc.dsm.tvp
plots empirical posterior quantiles for a piecewise constant, time-varying parameter.
## S3 method for class 'mcmc.dsm.tvp' plot( x, probs = c(0.025, 0.25, 0.75, 0.975), shaded = TRUE, quantlines = FALSE, shadecol = "skyblue", shadealpha = 0.5, quantlty = 2, quantcol = "black", quantlwd = 0.5, drawzero = TRUE, zerolty = 2, zerolwd = 1, zerocol = "grey", ... )
## S3 method for class 'mcmc.dsm.tvp' plot( x, probs = c(0.025, 0.25, 0.75, 0.975), shaded = TRUE, quantlines = FALSE, shadecol = "skyblue", shadealpha = 0.5, quantlty = 2, quantcol = "black", quantlwd = 0.5, drawzero = TRUE, zerolty = 2, zerolwd = 1, zerocol = "grey", ... )
x |
|
probs |
vector of boundaries for credible intervals to plot for each point in time, with values in [0,1].
The largest and smallest value form the outermost credible interval, the second smallest and second largest the second outermost and so forth.
The default value is |
shaded |
single logical value or a vector of logical values, indicating whether or not to shade the area between the pointwise
credible intervals. If a vector is given, the first value given is used to determine if the area between the outermost credible interval
is shaded, the second for the second outermost and so forth. Recycled in the usual fashion if the vector is shorter than the
number of quantile pairs. The default value is |
quantlines |
single logical value or a vector of logical values, indicating whether or not to draw borders along the pointwise
credible intervals. If a vector is given, the first value given is used to determine whether the outermost credible interval
is marked by lines, the second for the second outermost and so forth. Recycled in the usual fashion if the vector is shorter than the
number of credible intervals. The default value is |
shadecol |
single character string or a vector of character strings. Determines the color of the shaded areas that represent
the credible intervals. If a vector is given, the first color given is used for the outermost area,
the second for the second outermost and so forth. Recycled in the usual fashion if the vector is shorter than the number
of shaded areas. Has no effect if |
shadealpha |
real number between 0 and 1 or a vector of real numbers between 0 and 1.
Determines the level of transparency of the shaded areas that represent
the credible intervals. If a vector is given, the first value
given is used for the outermost area, the second for the second outermost and so forth. Recycled in the usual fashion
if the vector is shorter than the number of shaded areas. Has no effect if |
quantlty |
either a single integer in [0,6] or one of the character strings |
quantcol |
single character string or a vector of character strings. Determines the color of the borders drawn around the shaded
areas that represent the credible intervals. If a vector is given, the first color given is used for borders of the outermost area,
the second for the second outermost and so forth. Recycled in the usual fashion if the vector is shorter than the number
of shaded areas. Has no effect if |
quantlwd |
single real, positive number or a vector of real, positive numbers. Determines the line width of the borders
drawn around the shaded areas that represent the credible intervals. If a vector is given, the first number given is used for
the borders of the outermost area, the second for the second outermost and so forth. Recycled in the usual fashion if the vector
is shorter than the number of shaded areas. Has no effect if |
drawzero |
single logical value determining whether to draw a horizontal line at zero or not. The default value is |
zerolty |
single integer in [0,6] or one of the character strings |
zerolwd |
single real, positive number. Determines the line width of the horizontal line at zero. Has no effect
if |
zerocol |
single character string. Determines the color of the horizontal line at zero. Has no effect if |
... |
further arguments to be passed to |
Called for its side effects and returns invisibly.
Peter Knaus [email protected]
Other plotting functions:
plot.shrinkDSM_pred()
,
plot.shrinkDSM()
set.seed(123) data("gastric") # Create intervals for piecewise exponential model intervals <- divisionpoints(gastric$time, gastric$status, 2) # Estimate model mod <- shrinkDSM(time ~ radiation, gastric, delta = gastric$status, S = intervals) # Plot piecewise constant, time-varying parameter plot(mod$beta$beta_radiation)
set.seed(123) data("gastric") # Create intervals for piecewise exponential model intervals <- divisionpoints(gastric$time, gastric$status, 2) # Estimate model mod <- shrinkDSM(time ~ radiation, gastric, delta = gastric$status, S = intervals) # Plot piecewise constant, time-varying parameter plot(mod$beta$beta_radiation)
plot.shrinkDSM
generates plots visualizing the posterior distribution estimated
as a result from a call to shrinkDSM
.
## S3 method for class 'shrinkDSM' plot( x, pars = c("beta"), nplot = 3, h_borders = c(0.05, 0.05), w_borders = c(0.02, 0.02), ... )
## S3 method for class 'shrinkDSM' plot( x, pars = c("beta"), nplot = 3, h_borders = c(0.05, 0.05), w_borders = c(0.02, 0.02), ... )
x |
a |
pars |
a character vector containing the names of the parameters to be visualized.
The names have to coincide with the names of the list elements of the |
nplot |
positive integer that indicates the number of tvp plots to display on a single page before a new page is generated. The default value is 3. |
h_borders |
single real, positive number smaller than 0.5 or a vector containing two such numbers. Determines
the relative amount of space (the total amount summing up to 1) left blank on the left and right of the plot, in that order.
The default is |
w_borders |
single real, positive number smaller than 0.5 or a vector containing two such numbers. Determines
the relative amount of space (the total amount summing up to 1) left blank at the top and bottom of the plot, in that order.
The default is |
... |
further arguments to be passed to the respective plotting functions. |
Called for its side effects and returns invisibly.
Peter Knaus [email protected]
Other plotting functions:
plot.mcmc.dsm.tvp()
,
plot.shrinkDSM_pred()
set.seed(123) data("gastric") # Create intervals for piecewise exponential model intervals <- divisionpoints(gastric$time, gastric$status, 2) # Estimate model mod <- shrinkDSM(time ~ radiation, gastric, delta = gastric$status, S = intervals) plot(mod) # Will produce an error because 'hello' is not a parameter in the model ## Not run: plot(mod, pars = c("beta", "hello")) ## End(Not run)
set.seed(123) data("gastric") # Create intervals for piecewise exponential model intervals <- divisionpoints(gastric$time, gastric$status, 2) # Estimate model mod <- shrinkDSM(time ~ radiation, gastric, delta = gastric$status, S = intervals) plot(mod) # Will produce an error because 'hello' is not a parameter in the model ## Not run: plot(mod, pars = c("beta", "hello")) ## End(Not run)
plot.shrinkDSM_pred
generates plots visualizing the posterior predictive density generated by predict.shrinkDSM
.
## S3 method for class 'shrinkDSM_pred' plot(x, dens_args, legend = TRUE, ...)
## S3 method for class 'shrinkDSM_pred' plot(x, dens_args, legend = TRUE, ...)
x |
a |
dens_args |
optional named list containing arguments passed to |
legend |
logical value inidicating whether a legend should be added. Defaults to |
... |
further arguments to be passed to |
Called for its side effects and returns invisibly.
Other plotting functions:
plot.mcmc.dsm.tvp()
,
plot.shrinkDSM()
set.seed(123) data("gastric") # Create intervals for piecewise exponential model intervals <- divisionpoints(gastric$time, gastric$status, 2) # Estimate model mod <- shrinkDSM(time ~ radiation, gastric, delta = gastric$status, S = intervals) # Draw from posterior predictive distribution newdata <- data.frame(radiation = c(0, 1)) pred <- predict(mod, newdata = newdata) # Plot predictions plot(pred)
set.seed(123) data("gastric") # Create intervals for piecewise exponential model intervals <- divisionpoints(gastric$time, gastric$status, 2) # Estimate model mod <- shrinkDSM(time ~ radiation, gastric, delta = gastric$status, S = intervals) # Draw from posterior predictive distribution newdata <- data.frame(radiation = c(0, 1)) pred <- predict(mod, newdata = newdata) # Plot predictions plot(pred)
Draws from the posterior predictive distribution of survival times based on a fitted
time-varying parameter survival model resulting from a call to
shrinkDSM
.
## S3 method for class 'shrinkDSM' predict(object, newdata, cens = TRUE, ...)
## S3 method for class 'shrinkDSM' predict(object, newdata, cens = TRUE, ...)
object |
an object of class |
newdata |
a data frame containing the covariates used for the prediction.
The names of the covariates
have to match the names used during model estimation in the call to |
cens |
logical value indicating whether the predictions should be censored at the
largest survival time in the data used for estimation.
The default value is |
... |
included for S3 method consistency and currently ignored. |
The value returned is a list object of class shrinkTVP_pred
containing the
samples from the
posterior predictive density.
Peter Knaus [email protected]
Daniel Winkler [email protected]
set.seed(123) data("gastric") # Create intervals for piecewise exponential model intervals <- divisionpoints(gastric$time, gastric$status, 2) # Estimate model mod <- shrinkDSM(time ~ radiation, gastric, delta = gastric$status, S = intervals) # Draw from posterior predictive distribution newdata <- data.frame(radiation = c(0, 1)) pred <- predict(mod, newdata = newdata)
set.seed(123) data("gastric") # Create intervals for piecewise exponential model intervals <- divisionpoints(gastric$time, gastric$status, 2) # Estimate model mod <- shrinkDSM(time ~ radiation, gastric, delta = gastric$status, S = intervals) # Draw from posterior predictive distribution newdata <- data.frame(radiation = c(0, 1)) pred <- predict(mod, newdata = newdata)
This function pre-processes time-varying inputs in such a way that shrinkDSM
can work with time-varying inputs. Its main inputs are two data frames, namely
surv_data
and covariate_data
. surv_data
contains meta data about
each observation (i.e. survival time and censoring indicator), while covariate_data
contains the time-varying covariates (one per observation and time interval) and an
index for which time interval each covariate is observed in. The two are merged
together via an ID that needs to be unique for each observation and present in both
surv_data
and covariate_data
.
prep_tvinput( surv_data, covariate_data, id_var, surv_var, delta_var, interval_var, covariate_id_var = id_var )
prep_tvinput( surv_data, covariate_data, id_var, surv_var, delta_var, interval_var, covariate_id_var = id_var )
surv_data |
data frame containing meta-data for each observation (survival time and censoring indicator) as well as an ID for each observation. |
covariate_data |
data frame containing the time-varying covariates (one per observation and time interval), an index for which time interval each covariate is observed in as well as an ID for each observation. |
id_var |
character string specifying the column name of the ID variable. If the name
is different in |
surv_var |
character string specifying the column name of the survival times in
|
delta_var |
character string specifying the column name of the status indicators in
|
interval_var |
character string specifying the column name of the time interval ID in
|
covariate_id_var |
character string specifying the column name of the ID variable in
|
Returns an object of class data.frame
and tvsurv
to be used as an input in
shrinkDSM
.
Daniel Winkler [email protected]
Peter Knaus [email protected]
# A toy example with 5 observations and 2 covariates, observed over 3 time periods set.seed(123) n_obs <- 5 surv_var <- round(rgamma(n_obs, 1, .1)) + 1 delta_var <- sample(size = n_obs, c(0, 1), prob = c(0.2, 0.8), replace = TRUE) surv_data <- data.frame(id_var = 1:n_obs, surv_var, delta_var) # Determine intervals S <- c(3, 11) # Create synthetic observations for each individual covariate_list <- list() for (i in 1:n_obs) { nr_periods_survived <- sum(surv_var[i] > S) + 1 covariate_list[[i]] <- data.frame(id_var = i, interval_var = 1:nr_periods_survived, x1 = rnorm(nr_periods_survived), x2 = rnorm(nr_periods_survived)) } # Bind all individual covariate data frames together # Each observation now has a covariate in each period they # were observed in. covariate_data <- do.call(rbind, covariate_list) # Call prep_tvinput to pre-process for shrinkDSM merged_data <- prep_tvinput(surv_data, covariate_data, id_var = "id_var", surv_var = "surv_var", delta_var = "delta_var", interval_var = "interval_var") # Can now be used in shrinkDSM # Note that delta is now automatically extracted from merged_data, # providing it will throw a warning mod <- shrinkDSM(surv_var ~ x1 + x2, merged_data, S = S)
# A toy example with 5 observations and 2 covariates, observed over 3 time periods set.seed(123) n_obs <- 5 surv_var <- round(rgamma(n_obs, 1, .1)) + 1 delta_var <- sample(size = n_obs, c(0, 1), prob = c(0.2, 0.8), replace = TRUE) surv_data <- data.frame(id_var = 1:n_obs, surv_var, delta_var) # Determine intervals S <- c(3, 11) # Create synthetic observations for each individual covariate_list <- list() for (i in 1:n_obs) { nr_periods_survived <- sum(surv_var[i] > S) + 1 covariate_list[[i]] <- data.frame(id_var = i, interval_var = 1:nr_periods_survived, x1 = rnorm(nr_periods_survived), x2 = rnorm(nr_periods_survived)) } # Bind all individual covariate data frames together # Each observation now has a covariate in each period they # were observed in. covariate_data <- do.call(rbind, covariate_list) # Call prep_tvinput to pre-process for shrinkDSM merged_data <- prep_tvinput(surv_data, covariate_data, id_var = "id_var", surv_var = "surv_var", delta_var = "delta_var", interval_var = "interval_var") # Can now be used in shrinkDSM # Note that delta is now automatically extracted from merged_data, # providing it will throw a warning mod <- shrinkDSM(surv_var ~ x1 + x2, merged_data, S = S)
Nicer printing of shrinkDSM objects
## S3 method for class 'shrinkDSM' print(x, ...)
## S3 method for class 'shrinkDSM' print(x, ...)
x |
a |
... |
Currently ignored. |
Called for its side effects and returns invisibly.
Peter Knaus [email protected]
set.seed(123) data("gastric") # Create intervals for piecewise exponential model intervals <- divisionpoints(gastric$time, gastric$status, 2) # Estimate model mod <- shrinkDSM(time ~ radiation, gastric, delta = gastric$status, S = intervals) # Print print(mod) # Alternatively mod
set.seed(123) data("gastric") # Create intervals for piecewise exponential model intervals <- divisionpoints(gastric$time, gastric$status, 2) # Estimate model mod <- shrinkDSM(time ~ radiation, gastric, delta = gastric$status, S = intervals) # Print print(mod) # Alternatively mod
shrinkDSM
samples from the joint posterior distribution of the parameters of a time-varying
parameter survival model with shrinkage and returns the MCMC draws.
See also shrinkTVP
to see more examples of how to modify the prior setup of the time-varying
component of the model.
shrinkDSM( formula, data, mod_type = "double", delta, S, group, subset, niter = 10000, nburn = round(niter/2), nthin = 1, learn_a_xi = TRUE, learn_a_tau = TRUE, a_xi = 0.1, a_tau = 0.1, learn_c_xi = TRUE, learn_c_tau = TRUE, c_xi = 0.1, c_tau = 0.1, a_eq_c_xi = FALSE, a_eq_c_tau = FALSE, learn_kappa2_B = TRUE, learn_lambda2_B = TRUE, kappa2_B = 20, lambda2_B = 20, hyperprior_param, sv_param, MH_tuning, phi_param, display_progress = TRUE )
shrinkDSM( formula, data, mod_type = "double", delta, S, group, subset, niter = 10000, nburn = round(niter/2), nthin = 1, learn_a_xi = TRUE, learn_a_tau = TRUE, a_xi = 0.1, a_tau = 0.1, learn_c_xi = TRUE, learn_c_tau = TRUE, c_xi = 0.1, c_tau = 0.1, a_eq_c_xi = FALSE, a_eq_c_tau = FALSE, learn_kappa2_B = TRUE, learn_lambda2_B = TRUE, kappa2_B = 20, lambda2_B = 20, hyperprior_param, sv_param, MH_tuning, phi_param, display_progress = TRUE )
formula |
an object of class "formula": a symbolic representation of the model, as in the
function |
data |
optional data frame containing the response variable and the covariates. If not found in |
mod_type |
character string that reads either |
delta |
The status indicator of the last period, 0 = censored or 1 = event observed. |
S |
integer vector of time points that start a new interval. Parameters are fixed within an interval and vary across intervals. |
group |
optional grouping indicator for latent factor. |
subset |
optional vector specifying a subset of observations to be used in the fitting process. |
niter |
positive integer, indicating the number of MCMC iterations
to perform, including the burn-in. Has to be larger than or equal to |
nburn |
non-negative integer, indicating the number of iterations discarded
as burn-in. Has to be smaller than or equal to |
nthin |
positive integer, indicating the degree of thinning to be performed. Every |
learn_a_xi |
logical value indicating whether to learn a_xi, the spike parameter of the state variances.
The default value is |
learn_a_tau |
logical value indicating whether to learn a_tau, the spike parameter of the mean of the
initial values of the states. The default value is |
a_xi |
positive, real number, indicating the (fixed) value for a_xi. Ignored if
|
a_tau |
positive, real number, indicating the (fixed) value for a_tau. Ignored if
|
learn_c_xi |
logical value indicating whether to learn c_xi, the tail parameter of the state variances.
Ignored if |
learn_c_tau |
logical value indicating whether to learn c_tau, the tail parameter of the mean of the
initial values of the states. Ignored if |
c_xi |
positive, real number, indicating the (fixed) value for c_xi. Ignored if
|
c_tau |
positive, real number, indicating the (fixed) value for c_tau. Ignored if
|
a_eq_c_xi |
logical value indicating whether to force |
a_eq_c_tau |
logical value indicating whether to force |
learn_kappa2_B |
logical value indicating whether to learn kappa2_B, the global level of shrinkage for
the state variances. The default value is |
learn_lambda2_B |
logical value indicating whether to learn the lambda2_B parameter,
the global level of shrinkage for the mean of the initial values of the states. The default value is |
kappa2_B |
positive, real number. Initial value of kappa2_B. The default value is 20. |
lambda2_B |
positive, real number. Initial value of lambda2_B. The default value is |
hyperprior_param |
optional named list containing hyperparameter values. Not all have to be supplied, with those missing being replaced by the default values. Any list elements that are misnamed will be ignored and a warning will be thrown. All hyperparameter values have to be positive, real numbers. The following hyperparameters can be supplied:
|
sv_param |
optional named list containing hyperparameter values for the stochastic volatility
parameters. Not all have to be supplied, with those missing being replaced by the default values.
Any list elements that are misnamed will be ignored and a warning will be thrown. Ignored if
|
MH_tuning |
optional named list containing values used to tune the MH steps for
|
phi_param |
optional named list containing hyperparameter values for the grouped factor
and values to tune the MH steps for
|
display_progress |
logical value indicating whether the progress bar and other informative output should be
displayed. The default value is |
The value returned is a list object of class shrinkDSM
containing
beta |
|
beta_mean |
|
theta_sr |
|
tau2 |
|
xi2 |
|
lambda2 |
(optional) |
kappa2 |
(optional) |
a_xi |
(optional) |
a_tau |
(optional) |
c_xi |
(optional) |
c_tau |
(optional) |
lambda2_B |
(optional) |
kappa2_B |
(optional) |
MH_diag |
(optional) named list containing statistics for assessing MH performance. Not returned if no MH steps are required or none of them are specified to be adaptive. |
priorvals |
|
model |
|
summaries |
|
To display the output, use plot
and summary
. The summary
method displays the specified prior values stored in
priorvals
and the posterior summaries stored in summaries
, while the plot
method calls coda
's plot.mcmc
or the plot.mcmc.dsm.tvp
method. Furthermore, all functions that can be applied to coda::mcmc
objects
(e.g. coda::acfplot
) can be applied to all output elements that are coda
compatible.
Daniel Winkler [email protected]
Peter Knaus [email protected]
set.seed(123) data("gastric") # Create intervals for piecewise exponential model intervals <- divisionpoints(gastric$time, gastric$status, 2) # Estimate baseline model mod <- shrinkDSM(time ~ radiation, gastric, delta = gastric$status, S = intervals) # Estimate model with different prior setup mod2 <- shrinkDSM(time ~ radiation, gastric, delta = gastric$status, S = intervals, mod_type = "triple") # Change some of the hyperparameters mod3 <- shrinkDSM(time ~ radiation, gastric, delta = gastric$status, S = intervals, mod_type = "triple", hyperprior_param = list(beta_a_xi = 5, alpha_a_xi = 10))
set.seed(123) data("gastric") # Create intervals for piecewise exponential model intervals <- divisionpoints(gastric$time, gastric$status, 2) # Estimate baseline model mod <- shrinkDSM(time ~ radiation, gastric, delta = gastric$status, S = intervals) # Estimate model with different prior setup mod2 <- shrinkDSM(time ~ radiation, gastric, delta = gastric$status, S = intervals, mod_type = "triple") # Change some of the hyperparameters mod3 <- shrinkDSM(time ~ radiation, gastric, delta = gastric$status, S = intervals, mod_type = "triple", hyperprior_param = list(beta_a_xi = 5, alpha_a_xi = 10))