Title: | Bayesian Cure Rate Modeling for Time-to-Event Data |
---|---|
Description: | A fully Bayesian approach in order to estimate a general family of cure rate models under the presence of covariates, see Papastamoulis and Milienos (2024) <doi:10.1007/s11749-024-00942-w>. The promotion time can be modelled (a) parametrically using typical distributional assumptions for time to event data (including the Weibull, Exponential, Gompertz, log-Logistic distributions), or (b) semiparametrically using finite mixtures of distributions. In both cases, user-defined families of distributions are allowed under some specific requirements. Posterior inference is carried out by constructing a Metropolis-coupled Markov chain Monte Carlo (MCMC) sampler, which combines Gibbs sampling for the latent cure indicators and Metropolis-Hastings steps with Langevin diffusion dynamics for parameter updates. The main MCMC algorithm is embedded within a parallel tempering scheme by considering heated versions of the target posterior distribution. |
Authors: | Panagiotis Papastamoulis [aut, cre] , Fotios Milienos [aut] |
Maintainer: | Panagiotis Papastamoulis <[email protected]> |
License: | GPL-2 |
Version: | 1.3 |
Built: | 2024-11-03 07:25:44 UTC |
Source: | CRAN |
A fully Bayesian approach in order to estimate a general family of cure rate models under the presence of covariates, see Papastamoulis and Milienos (2024) <doi:10.1007/s11749-024-00942-w>. The promotion time can be modelled (a) parametrically using typical distributional assumptions for time to event data (including the Weibull, Exponential, Gompertz, log-Logistic distributions), or (b) semiparametrically using finite mixtures of distributions. In both cases, user-defined families of distributions are allowed under some specific requirements. Posterior inference is carried out by constructing a Metropolis-coupled Markov chain Monte Carlo (MCMC) sampler, which combines Gibbs sampling for the latent cure indicators and Metropolis-Hastings steps with Langevin diffusion dynamics for parameter updates. The main MCMC algorithm is embedded within a parallel tempering scheme by considering heated versions of the target posterior distribution.
The main function of the package is cure_rate_MC3
. See details for a brief description of the model.
Let denote the observed data, which correspond to time-to-event data or censoring times. Let also
denote the covariates for subject
,
.
Assuming that the observations are independent, the observed likelihood is defined as
where if the
-th observation corresponds to time-to-event while
indicates censoring time. The parameter vector
is decomposed as
where
are the parameters of the promotion time distribution whose cumulative distribution and density functions are denoted as
and
, respectively.
are the regression coefficients with
denoting the number of columns in the design matrix (it may include a constant term or not).
.
The population survival and density functions are defined as
whereas,
Finally, the cure rate is affected through covariates and parameters as follows
where .
The promotion time distribution can be a member of standard families (currently available are the following: Exponential, Weibull, Gamma, Lomax, Gompertz, log-Logistic) and in this case . Also considered is the Dagum distribution, which has three parameters
. In case that the previous parametric assumptions are not justified, the promotion time can belong to the more flexible family of finite mixtures of Gamma distributions. For example, assume a mixture of two Gamma distributions of the form
where
denotes the density of the Gamma distribution with parameters (shape) and
(rate).
For the previous model, the parameter vector is
where .
More generally, one can fit a mixture of Gamma distributions. The appropriate model can be selected according to information criteria such as the BIC.
The binary vector contains the (latent) cure indicators, that is,
if the
-th subject is susceptible and
if the
-th subject is cured.
denotes the subset of
containing the censored subjects, whereas
is the (complementary) subset of uncensored subjects. The complete likelihood of the model is
and
denote the probability density and survival function of the susceptibles, respectively, that is
Index of help topics:
bayesCureRateModel-package Bayesian Cure Rate Modeling for Time-to-Event Data complete_log_likelihood_general Logarithm of the complete log-likelihood for the general cure rate model. compute_fdr_tpr Compute the achieved FDR and TPR cure_rate_MC3 Main function of the package cure_rate_mcmc The basic MCMC scheme. logLik.bayesCureModel Extract the log-likelihood. log_dagum PDF and CDF of the Dagum distribution log_gamma PDF and CDF of the Gamma distribution log_gamma_mixture PDF and CDF of a Gamma mixture distribution log_gompertz PDF and CDF of the Gompertz distribution log_logLogistic PDF and CDF of the log-Logistic distribution. log_lomax PDF and CDF of the Lomax distribution log_user_mixture Define a finite mixture of a given family of distributions. log_weibull PDF and CDF of the Weibull distribution marriage_dataset Marriage data plot.bayesCureModel Plot method plot.predict_bayesCureModel Plot method predict.bayesCureModel Predict method. print.bayesCureModel Print method print.predict_bayesCureModel Print method for the 'predict' object print.summary_bayesCureModel Print method for the summary residuals.bayesCureModel Computation of residuals. sim_mix_data Simulated dataset summary.bayesCureModel Summary method. summary.predict_bayesCureModel Summary method for predictions.
Panagiotis Papastamoulis and Fotios S. Milienos
Maintainer: Panagiotis Papastamoulis <[email protected]>
Papastamoulis and Milienos (2024). Bayesian inference and cure rate modeling for event history data. TEST doi: 10.1007/s11749-024-00942-w.
# TOY EXAMPLE (very small numbers... only for CRAN check purposes) # simulate toy data set.seed(10) n = 4 # censoring indicators stat = rbinom(n, size = 1, prob = 0.5) # covariates x <- matrix(rnorm(2*n), n, 2) # observed response variable y <- rexp(n) # define a data frame with the response and the covariates my_data_frame <- data.frame(y, stat, x1 = x[,1], x2 = x[,2]) # run a weibull model with default prior setup # considering 2 heated chains fit1 <- cure_rate_MC3(survival::Surv(y, stat) ~ x1 + x2, data = my_data_frame, promotion_time = list(distribution = 'weibull'), nChains = 2, nCores = 1, mcmc_cycles = 3, sweep=2) # print method fit1 # summary method summary1 <- summary(fit1) # WARNING: the following parameters # mcmc_cycles, nChains # should take _larger_ values. E.g. a typical implementation consists of: # mcmc_cycles = 15000, nChains = 12
# TOY EXAMPLE (very small numbers... only for CRAN check purposes) # simulate toy data set.seed(10) n = 4 # censoring indicators stat = rbinom(n, size = 1, prob = 0.5) # covariates x <- matrix(rnorm(2*n), n, 2) # observed response variable y <- rexp(n) # define a data frame with the response and the covariates my_data_frame <- data.frame(y, stat, x1 = x[,1], x2 = x[,2]) # run a weibull model with default prior setup # considering 2 heated chains fit1 <- cure_rate_MC3(survival::Surv(y, stat) ~ x1 + x2, data = my_data_frame, promotion_time = list(distribution = 'weibull'), nChains = 2, nCores = 1, mcmc_cycles = 3, sweep=2) # print method fit1 # summary method summary1 <- summary(fit1) # WARNING: the following parameters # mcmc_cycles, nChains # should take _larger_ values. E.g. a typical implementation consists of: # mcmc_cycles = 15000, nChains = 12
Compute the logarithm of the complete likelihood, given a realization of the latent binary vector of cure indicators I_sim
and current values of the model parameters g
, lambda
, b
and promotion time parameters () which yield log-density values (one per observation) stored to the vector
log_f
and log-cdf values stored to the vector log_F
.
complete_log_likelihood_general(y, X, Censoring_status, g, lambda, log_f, log_F, b, I_sim, alpha)
complete_log_likelihood_general(y, X, Censoring_status, g, lambda, log_f, log_F, b, I_sim, alpha)
y |
observed data (time-to-event or censored time) |
X |
design matrix. Should contain a column of 1's if the model has a constant term. |
Censoring_status |
binary variables corresponding to time-to-event and censoring. |
g |
The |
lambda |
The |
log_f |
A vector containing the natural logarithm of the density function of the promotion time distribution per observation, for the current set of parameters. Its length should be equal to the sample size. |
log_F |
A vector containing the natural logarithm of the cumulative density function of the promotion time distribution per observation, for the current set of parameters. Its length should be equal to the sample size. |
b |
Vector of regression coefficients. Its length should be equal to the number of columns of the design matrix. |
I_sim |
Binary vector of the current value of the latent cured status per observation. Its length should be equal to the sample size. A time-to-event cannot be cured. |
alpha |
A parameter between 0 and 1, corresponding to the temperature of the complete posterior distribution. |
The complete likelihood of the model is
and
denote the probability density and survival function of the susceptibles, respectively, that is
A list with the following entries
cll |
the complete log-likelihood for the current parameter values. |
logS |
Vector of logS values (one for each observation). |
logP0 |
Vector of logP0 values (one for each observation). |
Panagiotis Papastamoulis
Papastamoulis and Milienos (2023). Bayesian inference and cure rate modeling for event history data. arXiv:2310.06926.
# simulate toy data set.seed(1) n = 4 stat = rbinom(n, size = 1, prob = 0.5) x <- cbind(1, matrix(rnorm(n), n, 1)) y <- rexp(n) lw <- log_weibull(y, a1 = 1, a2 = 1, c_under = 1e-9) # compute complete log-likelihood complete_log_likelihood_general(y = y, X = x, Censoring_status = stat, g = 1, lambda = 1, log_f = lw$log_f, log_F = lw$log_F, b = c(-0.5,0.5), I_sim = stat, alpha = 1)
# simulate toy data set.seed(1) n = 4 stat = rbinom(n, size = 1, prob = 0.5) x <- cbind(1, matrix(rnorm(n), n, 1)) y <- rexp(n) lw <- log_weibull(y, a1 = 1, a2 = 1, c_under = 1e-9) # compute complete log-likelihood complete_log_likelihood_general(y = y, X = x, Censoring_status = stat, g = 1, lambda = 1, log_f = lw$log_f, log_F = lw$log_F, b = c(-0.5,0.5), I_sim = stat, alpha = 1)
This help function computes the achieved False Discovery Rate (FDR) and True Positive Rate (TPR). Useful for simulation studies where the ground truth classification of subjects in susceptibles and cured items is known.
compute_fdr_tpr(true_latent_status, posterior_probs, myCut = c(0.01, 0.05, 0.1, 0.15))
compute_fdr_tpr(true_latent_status, posterior_probs, myCut = c(0.01, 0.05, 0.1, 0.15))
true_latent_status |
a binary vector containing the true latent status: 1 should correspond to the positive instances ("cured") and 0 to the negative ("susceptibles"). |
posterior_probs |
a numeric vector with entries between 0 and 1 containing the scores (posterior probabilities) of being positive ("cured") for each item. |
myCut |
Vector containing the desired nominal FDR levels. |
This function will return for every nominal FDR level the following quantities:
achieved_fdr |
the achieved false discovery rate. |
tpr |
the true positive rate. |
nominal_fdr |
the nominal FDR level. |
Panagiotis Papastamoulis
set.seed(1) v1 <- sample(0:1, size = 100, replace=TRUE, prob=c(0.8,0.2) ) v2 <- runif(100) compute_fdr_tpr(true_latent_status = v1, posterior_probs = v2)
set.seed(1) v1 <- sample(0:1, size = 100, replace=TRUE, prob=c(0.8,0.2) ) v2 <- runif(100) compute_fdr_tpr(true_latent_status = v1, posterior_probs = v2)
Runs a Metropolis Coupled MCMC (MC) sampler in order to estimate the joint posterior distribution of the model.
cure_rate_MC3(formula, data, nChains = 12, mcmc_cycles = 15000, alpha = NULL,nCores = 1, sweep = 5, mu_g = 1, s2_g = 1, a_l = 2.1, b_l = 1.1, mu_b = NULL, Sigma = NULL, g_prop_sd = 0.045, lambda_prop_scale = 0.03, b_prop_sd = NULL, initialValues = NULL, plot = TRUE, adjust_scales = FALSE, verbose = FALSE, tau_mala = 1.5e-05, mala = 0.15, promotion_time = list(distribution = "weibull", prior_parameters = matrix(rep(c(2.1, 1.1), 2), byrow = TRUE, 2, 2), prop_scale = c(0.1, 0.2)), single_MH_in_f = 0.2, c_under = 1e-9)
cure_rate_MC3(formula, data, nChains = 12, mcmc_cycles = 15000, alpha = NULL,nCores = 1, sweep = 5, mu_g = 1, s2_g = 1, a_l = 2.1, b_l = 1.1, mu_b = NULL, Sigma = NULL, g_prop_sd = 0.045, lambda_prop_scale = 0.03, b_prop_sd = NULL, initialValues = NULL, plot = TRUE, adjust_scales = FALSE, verbose = FALSE, tau_mala = 1.5e-05, mala = 0.15, promotion_time = list(distribution = "weibull", prior_parameters = matrix(rep(c(2.1, 1.1), 2), byrow = TRUE, 2, 2), prop_scale = c(0.1, 0.2)), single_MH_in_f = 0.2, c_under = 1e-9)
formula |
an object of class |
data |
a data frame containing all variable names included in |
nChains |
Positive integer corresponding to the number of heated chains in the MC |
mcmc_cycles |
Length of the generated MCMC sample. Default value: 15000. Note that each MCMC cycle consists of |
alpha |
A decreasing sequence in |
nCores |
The number of cores used for parallel processing. In case where |
sweep |
The number of usual MCMC iterations per MCMC cycle. Default value: 10. |
mu_g |
Parameter |
s2_g |
Parameter |
a_l |
Shape parameter |
b_l |
Scale parameter |
mu_b |
Mean ( |
Sigma |
Covariance matrix of the multivariate normal prior distribution of regression coefficients. |
g_prop_sd |
The scale of the proposal distribution for single-site updates of the |
lambda_prop_scale |
The scale of the proposal distribution for single-site updates of the |
b_prop_sd |
The scale of the proposal distribution for the update of the |
initialValues |
A list of initial values for each parameter (optional). |
plot |
Plot MCMC sample on the run. Default: TRUE. |
adjust_scales |
Boolean. If TRUE the MCMC sampler runs an initial phase of a small number of iterations in order to tune the scale of the proposal distributions in the Metropolis-Hastings steps. |
verbose |
Print progress on the terminal if TRUE. |
tau_mala |
Scale of the Metropolis adjusted Langevin diffussion proposal distribution. |
mala |
A number between |
promotion_time |
A list with details indicating the parametric family of distribution describing the promotion time and corresponding prior distributions. See 'details'. |
single_MH_in_f |
The probability for attempting a series of single site updates in the typical Metropolis-Hastings move. Otherwise, with probability 1 - |
c_under |
A small positive number (much smaller than 1) which is used as a threshold in the CDF of the promotion time for avoiding underflows in the computation of the log-likelihood function. Default value: 1e-9. |
It is advised to scale all continuous explanatory variables in the design matrix, so their sample mean and standard deviations are equal to 0 and 1, respectively.
The promotion_time
argument should be a list containing the following entries
distribution
Character string specifying the family of distributions describing the promotion time.
prior_parameters
Values of hyper-parameters in the prior distribution of the parameters .
prop_scale
The scale of the proposal distributions for each parameter in .
K
Optional. The number of mixture components in case where a mixture model is fitted, that is, when setting distribution
to either 'gamma_mixture'
or 'user_mixture'
.
dirichlet_concentration_parameter
Optional. Relevant only in the case of the 'gamma_mixture'
or 'user_mixture'
. Positive scalar (typically, set to 1) determining the (common) concentration parameter of the Dirichlet prior distribution of mixing proportions.
The distribution
specifies the distributional family of promotion time and corresponds to a character string with available choices: 'exponential'
, 'weibull'
, 'gamma'
, 'logLogistic'
, 'gompertz'
, 'lomax'
, 'dagum'
, 'gamma_mixture'
. User defined promotion time distributions and finite mixtures of them can be also fitted using the options 'user'
and 'user_mixture'
, respectively. In this case, the user should specify the distributional family in a separate argument named define
which is passed as an additional entry in the promotion_time
. This function should accept two input arguments y
and a
corresponding to the observed data (vector of positive numbers) and the parameters of the distribution (vector of positives). Pay attention to the positivity requirement of the parameters: if this is not the case, the user should suitably reparameterize the distribution in terms of positive parameters.
The joint prior distribution of factorizes into products of inverse Gamma distributions for all (positive) parameters of
. Moreover, in the case of
'gamma_mixture'
, the joint prior also consists of another term to the Dirichlet prior distribution on the mixing proportions.
The prop_scale
argument should be a vector with length equal to the length of vector (number of elements in
), containing (positive) numbers which correspond to the scale of the proposal distribution. Note that these scale parameters are used only as initial values in case where
adjust_scales = TRUE
.
An object of class bayesCureModel
, i.e. a list with the following entries
mcmc_sample |
Object of class
|
latent_status_censored |
A data frame with the simulated binary latent status for each censored item. |
complete_log_likelihood |
The complete log-likelihood for the target chain. |
swap_accept_rate |
the acceptance rate of proposed swappings between adjacent MCMC chains. |
all_cll_values |
The complete log-likelihood for all chains. |
input_data_and_model_prior |
the input data, model specification and selected prior parameters values. |
log_posterior |
the logarithm of the (non-augmented) posterior distribution (after integrating the latent cured-status parameters out), up to a normalizing constant. |
map_estimate |
The Maximum A Posterior estimate of parameters. |
BIC |
Bayesian Information Criterion of the fitted model. |
AIC |
Akaike Information Criterion of the fitted model. |
residuals |
The Cox-Snell residuals of the fitted model. |
The core function is cure_rate_mcmc
.
Panagiotis Papastamoulis
Papastamoulis and Milienos (2024). Bayesian inference and cure rate modeling for event history data. TEST doi: 10.1007/s11749-024-00942-w.
Plummer M, Best N, Cowles K, Vines K (2006). "CODA: Convergence Diagnosis and Output Analysis for MCMC." R News, 6(1), 7-11.
Therneau T (2024). A Package for Survival Analysis in R. R package version 3.7-0, https://CRAN.R-project.org/package=survival.
# simulate toy data just for cran-check purposes set.seed(10) n = 4 # censoring indicators stat = rbinom(n, size = 1, prob = 0.5) # covariates x <- matrix(rnorm(2*n), n, 2) # observed response variable y <- rexp(n) # define a data frame with the response and the covariates my_data_frame <- data.frame(y, stat, x1 = x[,1], x2 = x[,2]) fit1 <- cure_rate_MC3(survival::Surv(y, stat) ~ x1 + x2, data = my_data_frame, promotion_time = list(distribution = 'weibull'), nChains = 2, nCores = 1, mcmc_cycles = 3, sweep = 2)
# simulate toy data just for cran-check purposes set.seed(10) n = 4 # censoring indicators stat = rbinom(n, size = 1, prob = 0.5) # covariates x <- matrix(rnorm(2*n), n, 2) # observed response variable y <- rexp(n) # define a data frame with the response and the covariates my_data_frame <- data.frame(y, stat, x1 = x[,1], x2 = x[,2]) fit1 <- cure_rate_MC3(survival::Surv(y, stat) ~ x1 + x2, data = my_data_frame, promotion_time = list(distribution = 'weibull'), nChains = 2, nCores = 1, mcmc_cycles = 3, sweep = 2)
This is core MCMC function. The continuous parameters of the model are updated using (a) single-site Metropolis-Hastings steps and (b) a Metropolis adjusted Langevin diffusion step. The binary latent variables of the model (cured status per censored observation) are updated according to a Gibbs step. This function is embedded to the main function of the package cure_rate_MC3
which runs parallel tempered MCMC chains.
cure_rate_mcmc(y, X, Censoring_status, m, alpha = 1, mu_g = 1, s2_g = 1, a_l = 2.1, b_l = 1.1, promotion_time = list(distribution = "weibull", prior_parameters = matrix(rep(c(2.1, 1.1), 2), byrow = TRUE, 2, 2), prop_scale = c(0.2, 0.03)), mu_b = NULL, Sigma = NULL, g_prop_sd = 0.045, lambda_prop_scale = 0.03, b_prop_sd = NULL, initialValues = NULL, plot = FALSE, verbose = FALSE, tau_mala = 1.5e-05, mala = 0.15, single_MH_in_f = 0.5, c_under = 1e-9)
cure_rate_mcmc(y, X, Censoring_status, m, alpha = 1, mu_g = 1, s2_g = 1, a_l = 2.1, b_l = 1.1, promotion_time = list(distribution = "weibull", prior_parameters = matrix(rep(c(2.1, 1.1), 2), byrow = TRUE, 2, 2), prop_scale = c(0.2, 0.03)), mu_b = NULL, Sigma = NULL, g_prop_sd = 0.045, lambda_prop_scale = 0.03, b_prop_sd = NULL, initialValues = NULL, plot = FALSE, verbose = FALSE, tau_mala = 1.5e-05, mala = 0.15, single_MH_in_f = 0.5, c_under = 1e-9)
y |
observed data (time-to-event or censored time) |
X |
design matrix. Should contain a column of 1's if the model has a constant term. |
Censoring_status |
binary variables corresponding to time-to-event and censoring. |
m |
number of MCMC iterations. |
alpha |
A value between 0 and 1, corresponding to the temperature of the complete posterior distribution. The target posterior distribution corresponds to |
mu_g |
Parameter |
s2_g |
Parameter |
a_l |
Shape parameter |
b_l |
Scale parameter |
promotion_time |
A list containing the specification of the promotion time distribution. See 'details'. |
mu_b |
Mean |
Sigma |
Covariance matrix of the multivariate normal prior distribution of regression coefficients. |
g_prop_sd |
The scale of the proposal distribution for single-site updates of the |
lambda_prop_scale |
The scale of the proposal distribution for single-site updates of the |
b_prop_sd |
The scale of the proposal distribution for the update of the |
initialValues |
A list of initial values for each parameter (optional). |
plot |
Boolean for plotting on the run. |
verbose |
Boolean for printing progress on the run. |
tau_mala |
scale of the MALA proposal. |
mala |
Propability of attempting a MALA step. Otherwise, a simple MH move is attempted. |
single_MH_in_f |
Probability of attempting a single-site MH move in the basic Metropolis-Hastings step. Otherwise, a joint update is attempted. |
c_under |
A small positive number (much smaller than 1) which is used as a threshold in the CDF of the promotion time for avoiding underflows in the computation of the log-likelihood function. Default value: 1e-9. |
A list containing the following entries
mcmc_sample |
The sampled MCMC values per parameter. See 'note'. |
complete_log_likelihood |
Logarithm of the complete likelihood per MCMC iteration. |
acceptance_rates |
The acceptance rate per move. |
latent_status_censored |
The MCMC sample of the latent status per censored observation. |
log_prior_density |
Logarithm of the prior density per MCMC iteration. |
In the case where the promotion time distribution is a mixture model, the mixing proportions are reparameterized according to the following transformation
where for
and
. The sampler returns the parameters
, not the mixing proportions.
Panagiotis Papastamoulis
Papastamoulis and Milienos (2024). Bayesian inference and cure rate modeling for event history data. TEST doi: 10.1007/s11749-024-00942-w.
# simulate toy data just for cran-check purposes set.seed(1) n = 10 stat = rbinom(n, size = 1, prob = 0.5) x <- cbind(1, matrix(rnorm(2*n), n, 2)) y <- rexp(n) # run a weibull model (including const. term) # for m = 10 mcmc iterations fit1 <- cure_rate_mcmc(y = y, X = x, Censoring_status = stat, plot = FALSE, promotion_time = list(distribution = 'weibull', prior_parameters = matrix(rep(c(2.1, 1.1), 2), byrow = TRUE, 2, 2), prop_scale = c(0.1, 0.1) ), m = 10) # the generated mcmc sampled values fit1$mcmc_sample
# simulate toy data just for cran-check purposes set.seed(1) n = 10 stat = rbinom(n, size = 1, prob = 0.5) x <- cbind(1, matrix(rnorm(2*n), n, 2)) y <- rexp(n) # run a weibull model (including const. term) # for m = 10 mcmc iterations fit1 <- cure_rate_mcmc(y = y, X = x, Censoring_status = stat, plot = FALSE, promotion_time = list(distribution = 'weibull', prior_parameters = matrix(rep(c(2.1, 1.1), 2), byrow = TRUE, 2, 2), prop_scale = c(0.1, 0.1) ), m = 10) # the generated mcmc sampled values fit1$mcmc_sample
The Dagum distribution as evaluated at the VGAM package.
log_dagum(y, a1, a2, a3, c_under = 1e-09)
log_dagum(y, a1, a2, a3, c_under = 1e-09)
y |
observed data |
a1 |
scale parameter |
a2 |
shape1.a parameter |
a3 |
shape2.p parameter |
c_under |
A small positive value corresponding to the underflow threshold, e.g. c_under = 1e-9. |
The Dagum distribution is a special case of the 4-parameter generalized beta II distribution.
A list containing the following entries
log_f |
natural logarithm of the pdf, evaluated at each datapoint. |
log_F |
natural logarithm of the CDF, evaluated at each datapoint. |
Panagiotis Papastamoulis
Thomas W. Yee (2015). Vector Generalized Linear and Additive Models: With an Implementation in R. New York, USA: Springer.
log_dagum(y = 1:10, a1 = 1, a2 = 1, a3 = 1, c_under = 1e-9)
log_dagum(y = 1:10, a1 = 1, a2 = 1, a3 = 1, c_under = 1e-9)
Computes the pdf and cdf of the Gamma distribution.
log_gamma(y, a1, a2, c_under = 1e-09)
log_gamma(y, a1, a2, c_under = 1e-09)
y |
observed data |
a1 |
shape parameter |
a2 |
rate parameter |
c_under |
A small positive value corresponding to the underflow threshold, e.g. c_under = 1e-9. |
A list containing the following entries
log_f |
natural logarithm of the pdf, evaluated at each datapoint. |
log_F |
natural logarithm of the CDF, evaluated at each datapoint. |
Panagiotis Papastamoulis
log_gamma(y = 1:10, a1 = 1, a2 = 1, c_under = 1e-9)
log_gamma(y = 1:10, a1 = 1, a2 = 1, c_under = 1e-9)
Computes the logarithm of the probability density function and cumulative density function per observation for each observation under a Gamma mixture model.
log_gamma_mixture(y, a1, a2, p, c_under = 1e-09)
log_gamma_mixture(y, a1, a2, p, c_under = 1e-09)
y |
observed data |
a1 |
vector containing the shape parameters of each Gamma mixture component |
a2 |
vector containing the rate parameters of each Gamma mixture component |
p |
vector of mixing proportions |
c_under |
threshold for underflows. |
A list containing the following entries
log_f |
natural logarithm of the pdf, evaluated at each datapoint. |
log_F |
natural logarithm of the CDF, evaluated at each datapoint. |
Panagiotis Papastamoulis
y <- runif(10) a1 <- c(1,2) a2 <- c(1,1) p <- c(0.9,0.1) log_gamma_mixture(y, a1, a2, p)
y <- runif(10) a1 <- c(1,2) a2 <- c(1,1) p <- c(0.9,0.1) log_gamma_mixture(y, a1, a2, p)
The Gompertz distribution as evaluated at the flexsurv package.
log_gompertz(y, a1, a2, c_under = 1e-09)
log_gompertz(y, a1, a2, c_under = 1e-09)
y |
observed data |
a1 |
shape parameter |
a2 |
rate parameter |
c_under |
A small positive value corresponding to the underflow threshold, e.g. c_under = 1e-9. |
A list containing the following entries
log_f |
natural logarithm of the pdf, evaluated at each datapoint. |
log_F |
natural logarithm of the CDF, evaluated at each datapoint. |
Panagiotis Papastamoulis
Christopher Jackson (2016). flexsurv: A Platform for Parametric Survival Modeling in R. Journal of Statistical Software, 70(8), 1-33. doi:10.18637/jss.v070.i08
log_gompertz(y = 1:10, a1 = 1, a2 = 1, c_under = 1e-9)
log_gompertz(y = 1:10, a1 = 1, a2 = 1, c_under = 1e-9)
The log-Logistic distribution as evaluated at the flexsurv package.
log_logLogistic(y, a1, a2, c_under = 1e-09)
log_logLogistic(y, a1, a2, c_under = 1e-09)
y |
observed data |
a1 |
shape parameter |
a2 |
scale parameter |
c_under |
A small positive value corresponding to the underflow threshold, e.g. c_under = 1e-9. |
The log-logistic distribution is the probability distribution of a random variable whose logarithm has a logistic distribution.
A list containing the following entries
log_f |
natural logarithm of the pdf, evaluated at each datapoint. |
log_F |
natural logarithm of the CDF, evaluated at each datapoint. |
Panagiotis Papastamoulis
Christopher Jackson (2016). flexsurv: A Platform for Parametric Survival Modeling in R. Journal of Statistical Software, 70(8), 1-33. doi:10.18637/jss.v070.i08
log_logLogistic(y = 1:10, a1 = 1, a2 = 1, c_under = 1e-9)
log_logLogistic(y = 1:10, a1 = 1, a2 = 1, c_under = 1e-9)
The Lomax distribution as evaluated at the VGAM package.
log_lomax(y, a1, a2, c_under = 1e-09)
log_lomax(y, a1, a2, c_under = 1e-09)
y |
observed data |
a1 |
scale parameter |
a2 |
shape parameter |
c_under |
A small positive value corresponding to the underflow threshold, e.g. c_under = 1e-9. |
The Lomax distribution is a special case of the 4-parameter generalized beta II distribution.
A list containing the following entries
log_f |
natural logarithm of the pdf, evaluated at each datapoint. |
log_F |
natural logarithm of the CDF, evaluated at each datapoint. |
Panagiotis Papastamoulis
Thomas W. Yee (2015). Vector Generalized Linear and Additive Models: With an Implementation in R. New York, USA: Springer.
log_lomax(y = 1:10, a1 = 1, a2 = 1, c_under = 1e-9)
log_lomax(y = 1:10, a1 = 1, a2 = 1, c_under = 1e-9)
This function computes the logarithm of the probability density function and cumulative density function per observation for each observation under a user-defined mixture of a given family of distributions. The parameters of the given family of distributions should belong to (0, inf).
log_user_mixture(user_f, y, a, p, c_under = 1e-09)
log_user_mixture(user_f, y, a, p, c_under = 1e-09)
user_f |
a user defined function that returns the logarithm of a given probability density and the corresponding logarithm of the cumulative distribution function. These arguments should be returned in the form of a list with two entries: |
y |
observed data |
a |
a matrix where each column corresponds to component specific parameters and the columns to different components. All parameters should be positive. The number of columns should be the same with the number of mixture components. |
p |
vector of mixing proportions |
c_under |
threshold for underflows. |
A list containing the following entries
log_f |
natural logarithm of the pdf, evaluated at each datapoint. |
log_F |
natural logarithm of the CDF, evaluated at each datapoint. |
Panagiotis Papastamoulis
# We will define a mixture of 2 exponentials distributions. # First we pass the exponential distribution at user_f user_f <- function(y, a){ log_f <- dexp(y, rate = a, log = TRUE) log_F <- pexp(y, rate = a, log.p = TRUE) result <- vector('list', length = 2) names(result) <- c('log_f', 'log_F') result[["log_f"]] = log_f result[["log_F"]] = log_F return(result) } # simulate some date y <- runif(10) # Now compute the log of pdf and cdf for a mixture of K=2 exponentials p <- c(0.9,0.1) a <- matrix(c(0.1, 1.5), nrow = 1, ncol = 2) log_user_mixture(user_f = user_f, y = y, a = a, p = p)
# We will define a mixture of 2 exponentials distributions. # First we pass the exponential distribution at user_f user_f <- function(y, a){ log_f <- dexp(y, rate = a, log = TRUE) log_F <- pexp(y, rate = a, log.p = TRUE) result <- vector('list', length = 2) names(result) <- c('log_f', 'log_F') result[["log_f"]] = log_f result[["log_F"]] = log_F return(result) } # simulate some date y <- runif(10) # Now compute the log of pdf and cdf for a mixture of K=2 exponentials p <- c(0.9,0.1) a <- matrix(c(0.1, 1.5), nrow = 1, ncol = 2) log_user_mixture(user_f = user_f, y = y, a = a, p = p)
Computes the log pdf and cdf of the weibull distribution.
log_weibull(y, a1, a2, c_under)
log_weibull(y, a1, a2, c_under)
y |
observed data |
a1 |
shape parameter |
a2 |
rate parameter |
c_under |
A small positive value corresponding to the underflow threshold, e.g. c_under = 1e-9. |
A list containing the following entries
log_f |
natural logarithm of the pdf, evaluated at each datapoint. |
log_F |
natural logarithm of the CDF, evaluated at each datapoint. |
Panagiotis Papastamoulis
log_weibull(y = 1:10, a1 = 1, a2 = 1, c_under = 1e-9)
log_weibull(y = 1:10, a1 = 1, a2 = 1, c_under = 1e-9)
Method to extract the log-likelihood of a bayesCureModel
object.
## S3 method for class 'bayesCureModel' logLik(object, ...)
## S3 method for class 'bayesCureModel' logLik(object, ...)
object |
An object of class |
... |
ignored. |
The maximum (observed) log-likelihood value obtained across the MCMC run.
Panagiotis Papastamoulis
Papastamoulis and Milienos (2024). Bayesian inference and cure rate modeling for event history data. TEST doi: 10.1007/s11749-024-00942-w.
# simulate toy data just for cran-check purposes set.seed(10) n = 4 # censoring indicators stat = rbinom(n, size = 1, prob = 0.5) # covariates x <- matrix(rnorm(2*n), n, 2) # observed response variable y <- rexp(n) # define a data frame with the response and the covariates my_data_frame <- data.frame(y, stat, x1 = x[,1], x2 = x[,2]) # run a weibull model with default prior setup # considering 2 heated chains fit1 <- cure_rate_MC3(survival::Surv(y, stat) ~ x1 + x2, data = my_data_frame, promotion_time = list(distribution = 'exponential'), nChains = 2, nCores = 1, mcmc_cycles = 3, sweep=2) ll <- logLik(fit1)
# simulate toy data just for cran-check purposes set.seed(10) n = 4 # censoring indicators stat = rbinom(n, size = 1, prob = 0.5) # covariates x <- matrix(rnorm(2*n), n, 2) # observed response variable y <- rexp(n) # define a data frame with the response and the covariates my_data_frame <- data.frame(y, stat, x1 = x[,1], x2 = x[,2]) # run a weibull model with default prior setup # considering 2 heated chains fit1 <- cure_rate_MC3(survival::Surv(y, stat) ~ x1 + x2, data = my_data_frame, promotion_time = list(distribution = 'exponential'), nChains = 2, nCores = 1, mcmc_cycles = 3, sweep=2) ll <- logLik(fit1)
The variable of interest (time
) corresponds to the duration (in years) of first marriage for 1500 individuals. The available covariates are:
age
age of respondent (in years) at the time of first marriage. The values are standardized (sample mean and variance equal to 0 and 1, respectively).
kids
factor: whether there were kids during the first marriage ("yes"
) or not ("no"
).
race
race of respondent with levels: "black"
, "hispanic"
and "other"
.
Among the 1500 observations, there are 1018 censoring times (censoring = 0
) and 482 divorces (censoring = 1
). Source: National Longitudinal Survey of Youth 1997 (NLSY97).
data(marriage_dataset)
data(marriage_dataset)
Time-to-event data.
Bureau of Labor Statistics, U.S. Department of Labor. National Longitudinal Survey of Youth 1997 cohort, 1997-2022 (rounds 1-20). Produced and distributed by the Center for Human Resource Research (CHRR), The Ohio State University. Columbus, OH: 2023.
Plots and computes HDIs.
## S3 method for class 'bayesCureModel' plot(x, burn = NULL, alpha = 0.05, gamma_mix = TRUE, K_gamma = 5, plot_graphs = TRUE, bw = "nrd0", what = NULL, predict_output = NULL, index_of_main_mode = NULL, draw_legend = TRUE,...)
## S3 method for class 'bayesCureModel' plot(x, burn = NULL, alpha = 0.05, gamma_mix = TRUE, K_gamma = 5, plot_graphs = TRUE, bw = "nrd0", what = NULL, predict_output = NULL, index_of_main_mode = NULL, draw_legend = TRUE,...)
x |
An object of class |
burn |
Number of iterations to discard as burn-in period. |
alpha |
A value between 0 and 1 in order to compute the 1- |
gamma_mix |
Boolean. If TRUE, the density of the marginal posterior distribution of the |
K_gamma |
Used only when |
plot_graphs |
Boolean, if FALSE only HDIs are computed. |
bw |
bandwidth |
what |
Integer or a character string with possible values equal to |
predict_output |
Optional argument which is required only when |
index_of_main_mode |
If NULL (default), the whole MCMC output is used for plotting. Otherwise, it is a subset of the retained MCMC iterations in order to identify the main mode of the posterior distribution, as returned by the |
draw_legend |
Boolean. If TRUE (default), a legend is plotted in the case where |
... |
arguments passed by other methods. |
The function plots graphic output on the plot device if plot_graphs = TRUE
. Furthermore, a list of Highest Density Intervals per parameter is returned.
Panagiotis Papastamoulis
# simulate toy data just for cran-check purposes set.seed(10) n = 4 # censoring indicators stat = rbinom(n, size = 1, prob = 0.5) # covariates x <- matrix(rnorm(2*n), n, 2) # observed response variable y <- rexp(n) # define a data frame with the response and the covariates my_data_frame <- data.frame(y, stat, x1 = x[,1], x2 = x[,2]) # run a weibull model with default prior setup # considering 2 heated chains fit1 <- cure_rate_MC3(survival::Surv(y, stat) ~ x1 + x2, data = my_data_frame, promotion_time = list(distribution = 'exponential'), nChains = 2, nCores = 1, mcmc_cycles = 3, sweep=2) mySummary <- summary(fit1, burn = 0) # plot the marginal posterior distribution of the first parameter in returned mcmc output plot(fit1, what = 1, burn = 0)
# simulate toy data just for cran-check purposes set.seed(10) n = 4 # censoring indicators stat = rbinom(n, size = 1, prob = 0.5) # covariates x <- matrix(rnorm(2*n), n, 2) # observed response variable y <- rexp(n) # define a data frame with the response and the covariates my_data_frame <- data.frame(y, stat, x1 = x[,1], x2 = x[,2]) # run a weibull model with default prior setup # considering 2 heated chains fit1 <- cure_rate_MC3(survival::Surv(y, stat) ~ x1 + x2, data = my_data_frame, promotion_time = list(distribution = 'exponential'), nChains = 2, nCores = 1, mcmc_cycles = 3, sweep=2) mySummary <- summary(fit1, burn = 0) # plot the marginal posterior distribution of the first parameter in returned mcmc output plot(fit1, what = 1, burn = 0)
Plot the output of the predict method.
## S3 method for class 'predict_bayesCureModel' plot(x, what = 'survival', draw_legend = TRUE,...)
## S3 method for class 'predict_bayesCureModel' plot(x, what = 'survival', draw_legend = TRUE,...)
x |
An object of class |
what |
Character with possible values: |
draw_legend |
Boolean. If TRUE (default), a legend is plotted in the case where |
... |
arguments passed by other methods. |
No value, just a plot.
Panagiotis Papastamoulis
# simulate toy data just for cran-check purposes set.seed(10) n = 4 # censoring indicators stat = rbinom(n, size = 1, prob = 0.5) # covariates x <- matrix(rnorm(2*n), n, 2) # observed response variable y <- rexp(n) # define a data frame with the response and the covariates my_data_frame <- data.frame(y, stat, x1 = x[,1], x2 = x[,2]) # run a weibull model with default prior setup # considering 2 heated chains fit1 <- cure_rate_MC3(survival::Surv(y, stat) ~ x1 + x2, data = my_data_frame, promotion_time = list(distribution = 'exponential'), nChains = 2, nCores = 1, mcmc_cycles = 3, sweep=2) #compute predictions for two individuals with # x1 = 0.2 and x2 = -1 # and # x1 = -1 and x2 = 0 covariate_levels1 <- data.frame(x1 = c(0.2,-1), x2 = c(-1,0)) predictions <- predict(fit1, newdata = covariate_levels1, burn = 0) # plot cured probabilities based on the previous output plot(predictions, what='cured_prob')
# simulate toy data just for cran-check purposes set.seed(10) n = 4 # censoring indicators stat = rbinom(n, size = 1, prob = 0.5) # covariates x <- matrix(rnorm(2*n), n, 2) # observed response variable y <- rexp(n) # define a data frame with the response and the covariates my_data_frame <- data.frame(y, stat, x1 = x[,1], x2 = x[,2]) # run a weibull model with default prior setup # considering 2 heated chains fit1 <- cure_rate_MC3(survival::Surv(y, stat) ~ x1 + x2, data = my_data_frame, promotion_time = list(distribution = 'exponential'), nChains = 2, nCores = 1, mcmc_cycles = 3, sweep=2) #compute predictions for two individuals with # x1 = 0.2 and x2 = -1 # and # x1 = -1 and x2 = 0 covariate_levels1 <- data.frame(x1 = c(0.2,-1), x2 = c(-1,0)) predictions <- predict(fit1, newdata = covariate_levels1, burn = 0) # plot cured probabilities based on the previous output plot(predictions, what='cured_prob')
Returns MAP estimates of the survival function and the conditional cured probability for a given set of covariates.
## S3 method for class 'bayesCureModel' predict(object, newdata = NULL, tau_values = NULL, burn = NULL, K_max = 1, alpha = 0.1, nDigits = 3, verbose = FALSE, ...)
## S3 method for class 'bayesCureModel' predict(object, newdata = NULL, tau_values = NULL, burn = NULL, K_max = 1, alpha = 0.1, nDigits = 3, verbose = FALSE, ...)
object |
An object of class |
newdata |
A |
tau_values |
A vector of values for the response variable (time) for returning predictions for each row in the |
burn |
Positive integer corresponding to the number of mcmc iterations to discard as burn-in period |
K_max |
Maximum number of components in order to cluster the (univariate) values of the joint posterior distribution across the MCMC run. Used to identify the main mode of the posterior distribution. |
alpha |
Scalar between 0 and 1 corresponding to 1 - confidencel level for computing Highest Density Intervals. If set to NULL, the confidence intervals are not computed. |
nDigits |
A positive integer for printing the output, after rounding to the corresponding number of digits. Default: |
verbose |
Boolean. If set to TRUE (default) the function prints a summary of the predictions. |
... |
ignored. |
The values of the posterior draws are clustered according to a (univariate) normal mixture model, and the main mode corresponds to the cluster with the largest mean. The maximum number of mixture components corresponds to the K_max
argument. The mclust library is used for this purpose. The inference for the latent cure status of each (censored) observation is based on the MCMC draws corresponding to the main mode of the posterior distribution. The FDR is controlled according to the technique proposed in Papastamoulis and Rattray (2018).
In case where covariate_levels
is set to TRUE
, the summary
function also returns a list named p_cured_output
with the following entries
It is returned only in the case where the argument covariate_values
is not NULL
. A vector of posterior cured probabilities for the given values in covariate_values
, per retained MCMC draw.
It is returned only in the case where the argument covariate_values
is not NULL
. The cured probabilities computed at the MAP estimate of the parameters, for the given values covariate_values
.
tau values
covariate levels
the subset of MCMC draws allocated to the main mode of the posterior distribution.
A list with the following entries
map_estimate |
Maximum A Posteriori (MAP) estimate of the parameters of the model. |
highest_density_intervals |
Highest Density Interval per parameter |
latent_cured_status |
Estimated posterior probabilities of the latent cure status per censored subject. |
cured_at_given_FDR |
Classification as cured or not, at given FDR level. |
p_cured_output |
It is returned only in the case where the argument |
main_mode_index |
The retained MCMC iterations which correspond to the main mode of the posterior distribution. |
Panagiotis Papastamoulis
Papastamoulis and Milienos (2024). Bayesian inference and cure rate modeling for event history data. TEST doi: 10.1007/s11749-024-00942-w.
Papastamoulis and Rattray (2018). A Bayesian Model Selection Approach for Identifying Differentially Expressed Transcripts from RNA Sequencing Data, Journal of the Royal Statistical Society Series C: Applied Statistics, Volume 67, Issue 1.
Scrucca L, Fraley C, Murphy TB, Raftery AE (2023). Model-Based Clustering, Classification, and Density Estimation Using mclust in R. Chapman and Hall/CRC. ISBN 978-1032234953
# simulate toy data just for cran-check purposes set.seed(10) n = 4 # censoring indicators stat = rbinom(n, size = 1, prob = 0.5) # covariates x <- matrix(rnorm(2*n), n, 2) # observed response variable y <- rexp(n) # define a data frame with the response and the covariates my_data_frame <- data.frame(y, stat, x1 = x[,1], x2 = x[,2]) # run a weibull model with default prior setup # considering 2 heated chains fit1 <- cure_rate_MC3(survival::Surv(y, stat) ~ x1 + x2, data = my_data_frame, promotion_time = list(distribution = 'exponential'), nChains = 2, nCores = 1, mcmc_cycles = 3, sweep=2) newdata <- data.frame(x1 = c(0.2,-1), x2 = c(-1,0)) # return predicted values at tau = c(0.5, 1) my_prediction <- predict(fit1, newdata = newdata, burn = 0, tau_values = c(0.5, 1))
# simulate toy data just for cran-check purposes set.seed(10) n = 4 # censoring indicators stat = rbinom(n, size = 1, prob = 0.5) # covariates x <- matrix(rnorm(2*n), n, 2) # observed response variable y <- rexp(n) # define a data frame with the response and the covariates my_data_frame <- data.frame(y, stat, x1 = x[,1], x2 = x[,2]) # run a weibull model with default prior setup # considering 2 heated chains fit1 <- cure_rate_MC3(survival::Surv(y, stat) ~ x1 + x2, data = my_data_frame, promotion_time = list(distribution = 'exponential'), nChains = 2, nCores = 1, mcmc_cycles = 3, sweep=2) newdata <- data.frame(x1 = c(0.2,-1), x2 = c(-1,0)) # return predicted values at tau = c(0.5, 1) my_prediction <- predict(fit1, newdata = newdata, burn = 0, tau_values = c(0.5, 1))
This function prints a summary of objects returned by the cure_rate_MC3
function.
## S3 method for class 'bayesCureModel' print(x, ...)
## S3 method for class 'bayesCureModel' print(x, ...)
x |
An object of class |
... |
ignored. |
The function prints some basic information for a cure_rate_MC3
, such as the MAP estimate of model parameters and the value of Bayesian information criterion.
No return value, called for side effects.
Panagiotis Papastamoulis
predict
object
This function prints a summary of objects returned by the predict.cure_rate_MC3
method.
## S3 method for class 'predict_bayesCureModel' print(x, ...)
## S3 method for class 'predict_bayesCureModel' print(x, ...)
x |
An object of class |
... |
ignored. |
The function prints some basic information for the predict
method of a bayesCureModel
object.
No return value, called for side effects.
Panagiotis Papastamoulis
This function prints a summary of objects returned by the summary.cure_rate_MC3
method.
## S3 method for class 'summary_bayesCureModel' print(x, digits = 2, ...)
## S3 method for class 'summary_bayesCureModel' print(x, digits = 2, ...)
x |
An object of class |
digits |
Number of digits to print. |
... |
ignored. |
The function prints some basic information for the summary of a bayesCureModel
object.
No return value, called for side effects.
Panagiotis Papastamoulis
Methods for computing residuals for an object of class bayesCureModel
. The Cox-Snell residuals are available for now.
## S3 method for class 'bayesCureModel' residuals(object, type = "cox-snell",...)
## S3 method for class 'bayesCureModel' residuals(object, type = "cox-snell",...)
object |
An object of class |
type |
The type of residuals to be computed. |
... |
ignored. |
A vector of residuals.
Panagiotis Papastamoulis
Papastamoulis and Milienos (2024). Bayesian inference and cure rate modeling for event history data. TEST doi: 10.1007/s11749-024-00942-w.
# simulate toy data just for cran-check purposes set.seed(10) n = 4 # censoring indicators stat = rbinom(n, size = 1, prob = 0.5) # covariates x <- matrix(rnorm(2*n), n, 2) # observed response variable y <- rexp(n) # define a data frame with the response and the covariates my_data_frame <- data.frame(y, stat, x1 = x[,1], x2 = x[,2]) # run a weibull model with default prior setup # considering 2 heated chains fit1 <- cure_rate_MC3(survival::Surv(y, stat) ~ x1 + x2, data = my_data_frame, promotion_time = list(distribution = 'exponential'), nChains = 2, nCores = 1, mcmc_cycles = 3, sweep=2) my_residuals <- residuals(fit1)
# simulate toy data just for cran-check purposes set.seed(10) n = 4 # censoring indicators stat = rbinom(n, size = 1, prob = 0.5) # covariates x <- matrix(rnorm(2*n), n, 2) # observed response variable y <- rexp(n) # define a data frame with the response and the covariates my_data_frame <- data.frame(y, stat, x1 = x[,1], x2 = x[,2]) # run a weibull model with default prior setup # considering 2 heated chains fit1 <- cure_rate_MC3(survival::Surv(y, stat) ~ x1 + x2, data = my_data_frame, promotion_time = list(distribution = 'exponential'), nChains = 2, nCores = 1, mcmc_cycles = 3, sweep=2) my_residuals <- residuals(fit1)
A synthetic dataset generated from a bimodal promotion time distribution. The available covariates are:
x1
continuous.
x2
factor with three levels.
Among the 500 observations, there are 123 censoring times (censoring = 0
) and 377 "events" (censoring = 1
). The true status (cured or susceptible) is contained in the column true_status
and contains 59 cured and 441 susceptible subjects.
data(sim_mix_data)
data(sim_mix_data)
Time-to-event data.
This function produces all summaries after fitting a cure rate model.
## S3 method for class 'bayesCureModel' summary(object, burn = NULL, gamma_mix = TRUE, K_gamma = 3, K_max = 3, fdr = 0.1, covariate_levels = NULL, yRange = NULL, alpha = 0.1, quantiles = c(0.05, 0.5, 0.95), verbose = FALSE, ...)
## S3 method for class 'bayesCureModel' summary(object, burn = NULL, gamma_mix = TRUE, K_gamma = 3, K_max = 3, fdr = 0.1, covariate_levels = NULL, yRange = NULL, alpha = 0.1, quantiles = c(0.05, 0.5, 0.95), verbose = FALSE, ...)
object |
An object of class |
burn |
Positive integer corresponding to the number of mcmc iterations to discard as burn-in period |
gamma_mix |
Boolean. If TRUE, the density of the marginal posterior distribution of the |
K_gamma |
Used only when |
K_max |
Maximum number of components in order to cluster the (univariate) values of the joint posterior distribution across the MCMC run. Used to identify the main mode of the posterior distribution. See details. |
fdr |
The target value for controlling the False Discovery Rate when classifying subjects as cured or not. |
covariate_levels |
Optional |
yRange |
Optional range (a vector of two non-negative values) for computing the sequence of posterior probabilities for the given values in |
alpha |
Scalar between 0 and 1 corresponding to 1 - confidencel level for computing Highest Density Intervals. If set to NULL, the confidence intervals are not computed. |
quantiles |
A vector of quantiles to evaluate for each variable. |
verbose |
Boolean: if TRUE the function prints the summary. |
... |
ignored. |
The values of the posterior draws are clustered according to a (univariate) normal mixture model, and the main mode corresponds to the cluster with the largest mean. The maximum number of mixture components corresponds to the K_max
argument. The mclust library is used for this purpose. The inference for the latent cure status of each (censored) observation is based on the MCMC draws corresponding to the main mode of the posterior distribution. The FDR is controlled according to the technique proposed in Papastamoulis and Rattray (2018).
In case where covariate_levels
is set to TRUE
, the summary
function also returns a list named p_cured_output
with the following entries
It is returned only in the case where the argument covariate_values
is not NULL
. A vector of posterior cured probabilities for the given values in covariate_values
, per retained MCMC draw.
It is returned only in the case where the argument covariate_values
is not NULL
. The cured probabilities computed at the MAP estimate of the parameters, for the given values covariate_values
.
tau values
covariate levels
the subset of MCMC draws allocated to the main mode of the posterior distribution.
A list with the following entries
map_estimate |
Maximum A Posteriori (MAP) estimate of the parameters of the model. |
highest_density_intervals |
Highest Density Interval per parameter |
latent_cured_status |
Estimated posterior probabilities of the latent cure status per censored subject. |
cured_at_given_FDR |
Classification as cured or not, at given FDR level. |
p_cured_output |
It is returned only in the case where the argument |
main_mode_index |
The retained MCMC iterations which correspond to the main mode of the posterior distribution. |
Panagiotis Papastamoulis
Papastamoulis and Milienos (2024). Bayesian inference and cure rate modeling for event history data. TEST doi: 10.1007/s11749-024-00942-w.
Papastamoulis and Rattray (2018). A Bayesian Model Selection Approach for Identifying Differentially Expressed Transcripts from RNA Sequencing Data, Journal of the Royal Statistical Society Series C: Applied Statistics, Volume 67, Issue 1.
Scrucca L, Fraley C, Murphy TB, Raftery AE (2023). Model-Based Clustering, Classification, and Density Estimation Using mclust in R. Chapman and Hall/CRC. ISBN 978-1032234953
# simulate toy data just for cran-check purposes set.seed(10) n = 4 # censoring indicators stat = rbinom(n, size = 1, prob = 0.5) # covariates x <- matrix(rnorm(2*n), n, 2) # observed response variable y <- rexp(n) # define a data frame with the response and the covariates my_data_frame <- data.frame(y, stat, x1 = x[,1], x2 = x[,2]) # run a weibull model with default prior setup # considering 2 heated chains fit1 <- cure_rate_MC3(survival::Surv(y, stat) ~ x1 + x2, data = my_data_frame, promotion_time = list(distribution = 'exponential'), nChains = 2, nCores = 1, mcmc_cycles = 3, sweep=2) mySummary <- summary(fit1, burn = 0)
# simulate toy data just for cran-check purposes set.seed(10) n = 4 # censoring indicators stat = rbinom(n, size = 1, prob = 0.5) # covariates x <- matrix(rnorm(2*n), n, 2) # observed response variable y <- rexp(n) # define a data frame with the response and the covariates my_data_frame <- data.frame(y, stat, x1 = x[,1], x2 = x[,2]) # run a weibull model with default prior setup # considering 2 heated chains fit1 <- cure_rate_MC3(survival::Surv(y, stat) ~ x1 + x2, data = my_data_frame, promotion_time = list(distribution = 'exponential'), nChains = 2, nCores = 1, mcmc_cycles = 3, sweep=2) mySummary <- summary(fit1, burn = 0)
This function produces MCMC summaries for an object of class predict_bayesCureModel
.
## S3 method for class 'predict_bayesCureModel' summary(object, ...)
## S3 method for class 'predict_bayesCureModel' summary(object, ...)
object |
An object of class |
... |
Other options passed to the |
A list with the following entries
survival |
MCMC summaries (quantiles) for the survival function. |
cured_probability |
MCMC summaries (quantiles) for the conditional cured probability. |
cumulative_hazard |
MCMC summaries (quantiles) for the cumulative hazard function. |
hazard_rate |
MCMC summaries (quantiles) for the hazard rate function. |
Panagiotis Papastamoulis
Papastamoulis and Milienos (2024). Bayesian inference and cure rate modeling for event history data. TEST doi: 10.1007/s11749-024-00942-w.
# simulate toy data just for cran-check purposes set.seed(10) n = 4 # censoring indicators stat = rbinom(n, size = 1, prob = 0.5) # covariates x <- matrix(rnorm(2*n), n, 2) # observed response variable y <- rexp(n) # define a data frame with the response and the covariates my_data_frame <- data.frame(y, stat, x1 = x[,1], x2 = x[,2]) # run a weibull model with default prior setup # considering 2 heated chains fit1 <- cure_rate_MC3(survival::Surv(y, stat) ~ x1 + x2, data = my_data_frame, promotion_time = list(distribution = 'exponential'), nChains = 2, nCores = 1, mcmc_cycles = 3, sweep=2) newdata <- data.frame(x1 = c(0.2,-1), x2 = c(-1,0)) # return predicted values at tau = c(0.5, 1) my_prediction <- predict(fit1, newdata = newdata, burn = 0, tau_values = c(0.5, 1)) my_summary <- summary(my_prediction, quantiles = c(0.1,0.9))
# simulate toy data just for cran-check purposes set.seed(10) n = 4 # censoring indicators stat = rbinom(n, size = 1, prob = 0.5) # covariates x <- matrix(rnorm(2*n), n, 2) # observed response variable y <- rexp(n) # define a data frame with the response and the covariates my_data_frame <- data.frame(y, stat, x1 = x[,1], x2 = x[,2]) # run a weibull model with default prior setup # considering 2 heated chains fit1 <- cure_rate_MC3(survival::Surv(y, stat) ~ x1 + x2, data = my_data_frame, promotion_time = list(distribution = 'exponential'), nChains = 2, nCores = 1, mcmc_cycles = 3, sweep=2) newdata <- data.frame(x1 = c(0.2,-1), x2 = c(-1,0)) # return predicted values at tau = c(0.5, 1) my_prediction <- predict(fit1, newdata = newdata, burn = 0, tau_values = c(0.5, 1)) my_summary <- summary(my_prediction, quantiles = c(0.1,0.9))