Package 'bayesCureRateModel'

Title: Bayesian Cure Rate Modeling for Time-to-Event Data
Description: A fully Bayesian approach in order to estimate a general family of cure rate models under the presence of covariates, see Papastamoulis and Milienos (2024) <doi:10.1007/s11749-024-00942-w>. The promotion time can be modelled (a) parametrically using typical distributional assumptions for time to event data (including the Weibull, Exponential, Gompertz, log-Logistic distributions), or (b) semiparametrically using finite mixtures of distributions. In both cases, user-defined families of distributions are allowed under some specific requirements. Posterior inference is carried out by constructing a Metropolis-coupled Markov chain Monte Carlo (MCMC) sampler, which combines Gibbs sampling for the latent cure indicators and Metropolis-Hastings steps with Langevin diffusion dynamics for parameter updates. The main MCMC algorithm is embedded within a parallel tempering scheme by considering heated versions of the target posterior distribution.
Authors: Panagiotis Papastamoulis [aut, cre] , Fotios Milienos [aut]
Maintainer: Panagiotis Papastamoulis <[email protected]>
License: GPL-2
Version: 1.3
Built: 2024-10-04 08:31:06 UTC
Source: CRAN

Help Index


Bayesian Cure Rate Modeling for Time-to-Event Data

Description

A fully Bayesian approach in order to estimate a general family of cure rate models under the presence of covariates, see Papastamoulis and Milienos (2024) <doi:10.1007/s11749-024-00942-w>. The promotion time can be modelled (a) parametrically using typical distributional assumptions for time to event data (including the Weibull, Exponential, Gompertz, log-Logistic distributions), or (b) semiparametrically using finite mixtures of distributions. In both cases, user-defined families of distributions are allowed under some specific requirements. Posterior inference is carried out by constructing a Metropolis-coupled Markov chain Monte Carlo (MCMC) sampler, which combines Gibbs sampling for the latent cure indicators and Metropolis-Hastings steps with Langevin diffusion dynamics for parameter updates. The main MCMC algorithm is embedded within a parallel tempering scheme by considering heated versions of the target posterior distribution.

The main function of the package is cure_rate_MC3. See details for a brief description of the model.

Details

Let y=(y1,,yn)\boldsymbol{y} = (y_1,\ldots,y_n) denote the observed data, which correspond to time-to-event data or censoring times. Let also xi=(xi1,,xxip)\boldsymbol{x}_i = (x_{i1},\ldots,x_{x_{ip}})' denote the covariates for subject ii, i=1,,ni=1,\ldots,n.

Assuming that the nn observations are independent, the observed likelihood is defined as

L=L(θ;y,x)=i=1nfP(yi;θ,xi)δiSP(yi;θ,xi)1δi,L=L({\boldsymbol \theta}; {\boldsymbol y}, {\boldsymbol x})=\prod_{i=1}^{n}f_P(y_i;{\boldsymbol\theta},{\boldsymbol x}_i)^{\delta_i}S_P(y_i;{\boldsymbol \theta},{\boldsymbol x}_i)^{1-\delta_i},

where δi=1\delta_i=1 if the ii-th observation corresponds to time-to-event while δi=0\delta_i=0 indicates censoring time. The parameter vector θ\boldsymbol\theta is decomposed as

θ=(α,β,γ,λ)\boldsymbol\theta = (\boldsymbol\alpha', \boldsymbol\beta', \gamma,\lambda)

where

  • α=(α1,,αd)A\boldsymbol\alpha = (\alpha_1,\ldots,\alpha_d)'\in\mathcal A are the parameters of the promotion time distribution whose cumulative distribution and density functions are denoted as F(,α)F(\cdot,\boldsymbol\alpha) and f(,α)f(\cdot,\boldsymbol\alpha), respectively.

  • βRk\boldsymbol\beta\in\mathbf R^{k} are the regression coefficients with kk denoting the number of columns in the design matrix (it may include a constant term or not).

  • γR\gamma\in\mathbf R

  • λ>0\lambda > 0.

The population survival and density functions are defined as

SP(y;θ)=(1+γexp{xiβ}cγexp{xiβ}F(y;α)λ)1/γS_P(y;\boldsymbol\theta) = \left(1 + \gamma\exp\{\boldsymbol{x}_i\boldsymbol{\beta}'\}c^{\gamma\exp\{\boldsymbol{x}_i\boldsymbol{\beta}'\}}F(y;\boldsymbol\alpha)^\lambda\right)^{-1/\gamma}

whereas,

fP(y;θ)=SP(y;θ)y.f_P(y;\boldsymbol\theta)=-\frac{\partial S_P(y;\boldsymbol\theta)}{\partial y}.

Finally, the cure rate is affected through covariates and parameters as follows

p0(xi;θ)=(1+γexp{xiβ}cγexp{xiβ})1/γp_0(\boldsymbol{x}_i;\boldsymbol{\theta}) = \left(1 + \gamma\exp\{\boldsymbol{x}_i\boldsymbol{\beta}'\}c^{\gamma\exp\{\boldsymbol{x}_i\boldsymbol{\beta}'\}}\right)^{-1/\gamma}

where c=ee1c = e^{e^{-1}}.

The promotion time distribution can be a member of standard families (currently available are the following: Exponential, Weibull, Gamma, Lomax, Gompertz, log-Logistic) and in this case α=(α1,α2)(0,)2\alpha = (\alpha_1,\alpha_2)\in (0,\infty)^2. Also considered is the Dagum distribution, which has three parameters (α1,α2,α3)(0,)3(\alpha_1,\alpha_2,\alpha_3)\in(0,\infty)^3. In case that the previous parametric assumptions are not justified, the promotion time can belong to the more flexible family of finite mixtures of Gamma distributions. For example, assume a mixture of two Gamma distributions of the form

f(y;α)=α5fG(y;α1,α3)+(1α5)fG(y;α2,α4),f(y;\boldsymbol \alpha) = \alpha_5 f_{\mathcal G}(y;\alpha_1,\alpha_3) + (1-\alpha_5) f_{\mathcal G}(y;\alpha_2,\alpha_4),

where

fG(y;α,β)=βαΓ(α)yα1exp{βy},y>0f_\mathcal{G}(y;\alpha,\beta)=\frac{\beta^{\alpha}}{\Gamma(\alpha)}y^{\alpha-1}\exp\{-\beta y\}, y>0

denotes the density of the Gamma distribution with parameters α>0\alpha > 0 (shape) and β>0\beta > 0 (rate). For the previous model, the parameter vector is

α=(α1,α2,α3,α4,α5)A\boldsymbol\alpha = (\alpha_1,\alpha_2,\alpha_3,\alpha_4,\alpha_5)'\in\mathcal A

where A=(0,)4×(0,1)\mathcal A = (0,\infty)^4\times (0,1).

More generally, one can fit a mixture of K>2K>2 Gamma distributions. The appropriate model can be selected according to information criteria such as the BIC.

The binary vector I=(I1,,In)\boldsymbol{I} = (I_1,\ldots,I_n) contains the (latent) cure indicators, that is, Ii=1I_i = 1 if the ii-th subject is susceptible and Ii=0I_i = 0 if the ii-th subject is cured. Δ0\Delta_0 denotes the subset of {1,,n}\{1,\ldots,n\} containing the censored subjects, whereas Δ1=Δ0c\Delta_1 = \Delta_0^c is the (complementary) subset of uncensored subjects. The complete likelihood of the model is

Lc(θ;y,I)=iΔ1(1p0(xi,θ))fU(yi;θ,xi)iΔ0p0(xi,θ)1Ii{(1p0(xi,θ))SU(yi;θ,xi)}Ii.L_c(\boldsymbol{\theta};\boldsymbol{y}, \boldsymbol{I}) = \prod_{i\in\Delta_1}(1-p_0(\boldsymbol{x}_i,\boldsymbol\theta))f_U(y_i;\boldsymbol\theta,\boldsymbol{x}_i)\\ \prod_{i\in\Delta_0}p_0(\boldsymbol{x}_i,\boldsymbol\theta)^{1-I_i}\{(1-p_0(\boldsymbol{x}_i,\boldsymbol\theta))S_U(y_i;\boldsymbol\theta,\boldsymbol{x}_i)\}^{I_i}.

fUf_U and SUS_U denote the probability density and survival function of the susceptibles, respectively, that is

SU(yi;θ,xi)=SP(yi;θ,xi)p0(xi;θ)1p0(xi;θ),fU(yi;θ,xi)=fP(yi;θ,xi)1p0(xi;θ).S_U(y_i;\boldsymbol\theta,{\boldsymbol x}_i)=\frac{S_P(y_i;\boldsymbol{\theta},{\boldsymbol x}_i)-p_0({\boldsymbol x}_i;\boldsymbol\theta)}{1-p_0({\boldsymbol x}_i;\boldsymbol\theta)}, f_U(y_i;\boldsymbol\theta,{\boldsymbol x}_i)=\frac{f_P(y_i;\boldsymbol\theta,{\boldsymbol x}_i)}{1-p_0({\boldsymbol x}_i;\boldsymbol\theta)}.

Index of help topics:

bayesCureRateModel-package
                        Bayesian Cure Rate Modeling for Time-to-Event
                        Data
complete_log_likelihood_general
                        Logarithm of the complete log-likelihood for
                        the general cure rate model.
compute_fdr_tpr         Compute the achieved FDR and TPR
cure_rate_MC3           Main function of the package
cure_rate_mcmc          The basic MCMC scheme.
logLik.bayesCureModel   Extract the log-likelihood.
log_dagum               PDF and CDF of the Dagum distribution
log_gamma               PDF and CDF of the Gamma distribution
log_gamma_mixture       PDF and CDF of a Gamma mixture distribution
log_gompertz            PDF and CDF of the Gompertz distribution
log_logLogistic         PDF and CDF of the log-Logistic distribution.
log_lomax               PDF and CDF of the Lomax distribution
log_user_mixture        Define a finite mixture of a given family of
                        distributions.
log_weibull             PDF and CDF of the Weibull distribution
marriage_dataset        Marriage data
plot.bayesCureModel     Plot method
plot.predict_bayesCureModel
                        Plot method
predict.bayesCureModel
                        Predict method.
print.bayesCureModel    Print method
print.predict_bayesCureModel
                        Print method for the 'predict' object
print.summary_bayesCureModel
                        Print method for the summary
residuals.bayesCureModel
                        Computation of residuals.
sim_mix_data            Simulated dataset
summary.bayesCureModel
                        Summary method.
summary.predict_bayesCureModel
                        Summary method for predictions.

Author(s)

Panagiotis Papastamoulis and Fotios S. Milienos

Maintainer: Panagiotis Papastamoulis <[email protected]>

References

Papastamoulis and Milienos (2024). Bayesian inference and cure rate modeling for event history data. TEST doi: 10.1007/s11749-024-00942-w.

See Also

cure_rate_MC3

Examples

# TOY EXAMPLE (very small numbers... only for CRAN check purposes)
# simulate toy data 
	set.seed(10)
        n = 4
        # censoring indicators
        stat = rbinom(n, size = 1, prob = 0.5)
        # covariates
        x <- matrix(rnorm(2*n), n, 2)
        # observed response variable 
        y <- rexp(n)
#	define a data frame with the response and the covariates        
        my_data_frame <- data.frame(y, stat, x1 = x[,1], x2 = x[,2])
# run a weibull model with default prior setup
# considering 2 heated chains 
	fit1 <- cure_rate_MC3(survival::Surv(y, stat) ~ x1 + x2, 
		data = my_data_frame, 
		promotion_time = list(distribution = 'weibull'),
		nChains = 2, 
		nCores = 1, 
		mcmc_cycles = 3, sweep=2)
#	print method
	fit1	
# 	summary method	
	summary1 <- summary(fit1)
	
# WARNING: the following parameters
#  mcmc_cycles, nChains
#        should take _larger_ values. E.g. a typical implementation consists of:
#        mcmc_cycles = 15000, nChains = 12

Logarithm of the complete log-likelihood for the general cure rate model.

Description

Compute the logarithm of the complete likelihood, given a realization of the latent binary vector of cure indicators I_sim and current values of the model parameters g, lambda, b and promotion time parameters (α\boldsymbol\alpha) which yield log-density values (one per observation) stored to the vector log_f and log-cdf values stored to the vector log_F.

Usage

complete_log_likelihood_general(y, X, Censoring_status, 
	g, lambda, log_f, log_F, b, I_sim, alpha)

Arguments

y

observed data (time-to-event or censored time)

X

design matrix. Should contain a column of 1's if the model has a constant term.

Censoring_status

binary variables corresponding to time-to-event and censoring.

g

The γ\gamma parameter of the model (real).

lambda

The λ\lambda parameter of the model (positive).

log_f

A vector containing the natural logarithm of the density function of the promotion time distribution per observation, for the current set of parameters. Its length should be equal to the sample size.

log_F

A vector containing the natural logarithm of the cumulative density function of the promotion time distribution per observation, for the current set of parameters. Its length should be equal to the sample size.

b

Vector of regression coefficients. Its length should be equal to the number of columns of the design matrix.

I_sim

Binary vector of the current value of the latent cured status per observation. Its length should be equal to the sample size. A time-to-event cannot be cured.

alpha

A parameter between 0 and 1, corresponding to the temperature of the complete posterior distribution.

Details

The complete likelihood of the model is

Lc(θ;y,I)=iΔ1(1p0(xi,θ))fU(yi;θ,xi)iΔ0p0(xi,θ)1Ii{(1p0(xi,θ))SU(yi;θ,xi)}Ii.L_c(\boldsymbol{\theta};\boldsymbol{y}, \boldsymbol{I}) = \prod_{i\in\Delta_1}(1-p_0(\boldsymbol{x}_i,\boldsymbol\theta))f_U(y_i;\boldsymbol\theta,\boldsymbol{x}_i)\\ \prod_{i\in\Delta_0}p_0(\boldsymbol{x}_i,\boldsymbol\theta)^{1-I_i}\{(1-p_0(\boldsymbol{x}_i,\boldsymbol\theta))S_U(y_i;\boldsymbol\theta,\boldsymbol{x}_i)\}^{I_i}.

fUf_U and SUS_U denote the probability density and survival function of the susceptibles, respectively, that is

SU(yi;θ,xi)=SP(yi;θ,xi)p0(xi;θ)1p0(xi;θ),fU(yi;θ,xi)=fP(yi;θ,xi)1p0(xi;θ).S_U(y_i;\boldsymbol\theta,{\boldsymbol x}_i)=\frac{S_P(y_i;\boldsymbol{\theta},{\boldsymbol x}_i)-p_0({\boldsymbol x}_i;\boldsymbol\theta)}{1-p_0({\boldsymbol x}_i;\boldsymbol\theta)}, f_U(y_i;\boldsymbol\theta,{\boldsymbol x}_i)=\frac{f_P(y_i;\boldsymbol\theta,{\boldsymbol x}_i)}{1-p_0({\boldsymbol x}_i;\boldsymbol\theta)}.

Value

A list with the following entries

cll

the complete log-likelihood for the current parameter values.

logS

Vector of logS values (one for each observation).

logP0

Vector of logP0 values (one for each observation).

Author(s)

Panagiotis Papastamoulis

References

Papastamoulis and Milienos (2023). Bayesian inference and cure rate modeling for event history data. arXiv:2310.06926.

Examples

# simulate toy data 
	set.seed(1)
	n = 4
	stat = rbinom(n, size = 1, prob = 0.5)
	x <- cbind(1, matrix(rnorm(n), n, 1))
	y <- rexp(n)
	lw <- log_weibull(y, a1 = 1, a2 = 1, c_under = 1e-9)
# compute complete log-likelihood
complete_log_likelihood_general(y = y, X = x, 
	Censoring_status = stat, 
	g = 1, lambda = 1, 
	log_f = lw$log_f, log_F = lw$log_F, 
	b = c(-0.5,0.5), 
	I_sim = stat, alpha = 1)

Compute the achieved FDR and TPR

Description

This help function computes the achieved False Discovery Rate (FDR) and True Positive Rate (TPR). Useful for simulation studies where the ground truth classification of subjects in susceptibles and cured items is known.

Usage

compute_fdr_tpr(true_latent_status, posterior_probs, 
	myCut = c(0.01, 0.05, 0.1, 0.15))

Arguments

true_latent_status

a binary vector containing the true latent status: 1 should correspond to the positive instances ("cured") and 0 to the negative ("susceptibles").

posterior_probs

a numeric vector with entries between 0 and 1 containing the scores (posterior probabilities) of being positive ("cured") for each item.

myCut

Vector containing the desired nominal FDR levels.

Value

This function will return for every nominal FDR level the following quantities:

achieved_fdr

the achieved false discovery rate.

tpr

the true positive rate.

nominal_fdr

the nominal FDR level.

Author(s)

Panagiotis Papastamoulis

Examples

set.seed(1)
v1 <- sample(0:1, size = 100, replace=TRUE, prob=c(0.8,0.2) )
v2 <- runif(100)
compute_fdr_tpr(true_latent_status = v1, posterior_probs = v2)

Main function of the package

Description

Runs a Metropolis Coupled MCMC (MC3^3) sampler in order to estimate the joint posterior distribution of the model.

Usage

cure_rate_MC3(formula, data, nChains = 12, mcmc_cycles = 15000, 
	alpha = NULL,nCores = 1, sweep = 5, mu_g = 1, s2_g = 1, 
	a_l = 2.1, b_l = 1.1, mu_b = NULL, 
	Sigma = NULL, g_prop_sd = 0.045, 
	lambda_prop_scale = 0.03, b_prop_sd = NULL, 
	initialValues = NULL, plot = TRUE, adjust_scales = FALSE, 
	verbose = FALSE, tau_mala = 1.5e-05, mala = 0.15, 
	promotion_time = list(distribution = "weibull", 
	prior_parameters = matrix(rep(c(2.1, 1.1), 2), byrow = TRUE, 2, 2), 
	prop_scale = c(0.1, 0.2)), single_MH_in_f = 0.2, c_under = 1e-9)

Arguments

formula

an object of class formula: a symbolic description of the model to be fitted. The left-hand side should be a Surv object, a class inherited from the survival package. The binary censoring indicators are interpreted as a time-to-event (1) or as a censoring time (0).

data

a data frame containing all variable names included in formula.

nChains

Positive integer corresponding to the number of heated chains in the MC3^3 scheme.

mcmc_cycles

Length of the generated MCMC sample. Default value: 15000. Note that each MCMC cycle consists of sweep (see below) usual MCMC iterations.

alpha

A decreasing sequence in [1,0)[1,0) of nChains temperatures (or heat values). The first value should always be equal to 1, which corresponds to the target posterior distribution (that is, the first chain).

nCores

The number of cores used for parallel processing. In case where nCores > 1, the nChains will be processed in parallel using nCores cores. This may speed up significantly the run-time of the algorithm in Linux or macOS, but it is not suggested in Windows.

sweep

The number of usual MCMC iterations per MCMC cycle. Default value: 10.

mu_g

Parameter aγa_{\gamma} of the prior distribution of γ\gamma.

s2_g

Parameter bγb_{\gamma} of the prior distribution of γ\gamma.

a_l

Shape parameter aλa_{\lambda} of the Inverse Gamma prior distribution of λ\lambda.

b_l

Scale parameter bλb_{\lambda} of the Inverse Gamma prior distribution of λ\lambda.

mu_b

Mean (μ\boldsymbol\mu) of the multivariate normal prior distribution of regression coefficients. Should be a vector whose length is equal to kk, i.e. the number of columns of the design matrix X. Default value: rep(0, k).

Sigma

Covariance matrix of the multivariate normal prior distribution of regression coefficients.

g_prop_sd

The scale of the proposal distribution for single-site updates of the γ\gamma parameter.

lambda_prop_scale

The scale of the proposal distribution for single-site updates of the λ\lambda parameter.

b_prop_sd

The scale of the proposal distribution for the update of the β\beta parameter (regression coefficients).

initialValues

A list of initial values for each parameter (optional).

plot

Plot MCMC sample on the run. Default: TRUE.

adjust_scales

Boolean. If TRUE the MCMC sampler runs an initial phase of a small number of iterations in order to tune the scale of the proposal distributions in the Metropolis-Hastings steps.

verbose

Print progress on the terminal if TRUE.

tau_mala

Scale of the Metropolis adjusted Langevin diffussion proposal distribution.

mala

A number between [0,1][0,1] indicating the proportion of times the sampler attempts a MALA proposal. Thus, the probability of attempting a typical Metropolis-Hastings move is equal to 1 - mala.

promotion_time

A list with details indicating the parametric family of distribution describing the promotion time and corresponding prior distributions. See 'details'.

single_MH_in_f

The probability for attempting a series of single site updates in the typical Metropolis-Hastings move. Otherwise, with probability 1 - single_MH_in_f a Metropolis-Hastings move will attempt to update all continuous parameters simultaneously. It only makes sense when mala < 1.

c_under

A small positive number (much smaller than 1) which is used as a threshold in the CDF of the promotion time for avoiding underflows in the computation of the log-likelihood function. Default value: 1e-9.

Details

It is advised to scale all continuous explanatory variables in the design matrix, so their sample mean and standard deviations are equal to 0 and 1, respectively. The promotion_time argument should be a list containing the following entries

distribution

Character string specifying the family of distributions {F(;α);αA}\{F(\cdot;\boldsymbol\alpha);\boldsymbol\alpha\in\mathcal A\} describing the promotion time.

prior_parameters

Values of hyper-parameters in the prior distribution of the parameters α\boldsymbol\alpha.

prop_scale

The scale of the proposal distributions for each parameter in α\boldsymbol\alpha.

K

Optional. The number of mixture components in case where a mixture model is fitted, that is, when setting distribution to either 'gamma_mixture' or 'user_mixture'.

dirichlet_concentration_parameter

Optional. Relevant only in the case of the 'gamma_mixture' or 'user_mixture'. Positive scalar (typically, set to 1) determining the (common) concentration parameter of the Dirichlet prior distribution of mixing proportions.

The distribution specifies the distributional family of promotion time and corresponds to a character string with available choices: 'exponential', 'weibull', 'gamma', 'logLogistic', 'gompertz', 'lomax', 'dagum', 'gamma_mixture'. User defined promotion time distributions and finite mixtures of them can be also fitted using the options 'user' and 'user_mixture', respectively. In this case, the user should specify the distributional family in a separate argument named define which is passed as an additional entry in the promotion_time. This function should accept two input arguments y and a corresponding to the observed data (vector of positive numbers) and the parameters of the distribution (vector of positives). Pay attention to the positivity requirement of the parameters: if this is not the case, the user should suitably reparameterize the distribution in terms of positive parameters.

The joint prior distribution of α=(α1,,αd)\boldsymbol\alpha = (\alpha_1,\ldots,\alpha_d) factorizes into products of inverse Gamma distributions for all (positive) parameters of FF. Moreover, in the case of 'gamma_mixture', the joint prior also consists of another term to the Dirichlet prior distribution on the mixing proportions.

The prop_scale argument should be a vector with length equal to the length of vector dd (number of elements in α\boldsymbol\alpha), containing (positive) numbers which correspond to the scale of the proposal distribution. Note that these scale parameters are used only as initial values in case where adjust_scales = TRUE.

Value

An object of class bayesCureModel, i.e. a list with the following entries

mcmc_sample

Object of class mcmc (see the coda package), containing the generated MCMC sample for the target chain. The column names correspond to

g_mcmc

Sampled values for parameter γ\gamma

lambda_mcmc

Sampled values for parameter λ\lambda

alpha1_mcmc...

Sampled values for parameter α1\alpha_1... of the promotion time distribution F(;α1,,αd)F(\cdot;\alpha_1,\ldots,\alpha_d). The subsequent d1d-1 columns contain the sampled values for all remaining parameters, α2,...,,αd\alpha_2,...,\ldots,\alpha_d, where dd depens on the family used in promotion_time.

b0_mcmc

Sampled values for the constant term of the regression (present only in the case where the design matrix X contains a column of 1s).

b1_mcmc...

Sampled values for the regression coefficient for the first explanatory variable, and similar for all subsequent columns.

latent_status_censored

A data frame with the simulated binary latent status for each censored item.

complete_log_likelihood

The complete log-likelihood for the target chain.

swap_accept_rate

the acceptance rate of proposed swappings between adjacent MCMC chains.

all_cll_values

The complete log-likelihood for all chains.

input_data_and_model_prior

the input data, model specification and selected prior parameters values.

log_posterior

the logarithm of the (non-augmented) posterior distribution (after integrating the latent cured-status parameters out), up to a normalizing constant.

map_estimate

The Maximum A Posterior estimate of parameters.

BIC

Bayesian Information Criterion of the fitted model.

AIC

Akaike Information Criterion of the fitted model.

residuals

The Cox-Snell residuals of the fitted model.

Note

The core function is cure_rate_mcmc.

Author(s)

Panagiotis Papastamoulis

References

Papastamoulis and Milienos (2024). Bayesian inference and cure rate modeling for event history data. TEST doi: 10.1007/s11749-024-00942-w.

Plummer M, Best N, Cowles K, Vines K (2006). "CODA: Convergence Diagnosis and Output Analysis for MCMC." R News, 6(1), 7-11.

Therneau T (2024). A Package for Survival Analysis in R. R package version 3.7-0, https://CRAN.R-project.org/package=survival.

See Also

cure_rate_mcmc

Examples

# simulate toy data just for cran-check purposes        
	set.seed(10)
        n = 4
        # censoring indicators
        stat = rbinom(n, size = 1, prob = 0.5)
        # covariates
        x <- matrix(rnorm(2*n), n, 2)
        # observed response variable 
        y <- rexp(n)
#	define a data frame with the response and the covariates        
        my_data_frame <- data.frame(y, stat, x1 = x[,1], x2 = x[,2])
	fit1 <- cure_rate_MC3(survival::Surv(y, stat) ~ x1 + x2, 
		data = my_data_frame,
		promotion_time = list(distribution = 'weibull'),
		nChains = 2, nCores = 1, 
		mcmc_cycles = 3, sweep = 2)

The basic MCMC scheme.

Description

This is core MCMC function. The continuous parameters of the model are updated using (a) single-site Metropolis-Hastings steps and (b) a Metropolis adjusted Langevin diffusion step. The binary latent variables of the model (cured status per censored observation) are updated according to a Gibbs step. This function is embedded to the main function of the package cure_rate_MC3 which runs parallel tempered MCMC chains.

Usage

cure_rate_mcmc(y, X, Censoring_status, m, alpha = 1, mu_g = 1, s2_g = 1, 
	a_l = 2.1, b_l = 1.1, promotion_time = list(distribution = "weibull", 
	prior_parameters = matrix(rep(c(2.1, 1.1), 2), byrow = TRUE, 2, 2), 
	prop_scale = c(0.2, 0.03)), mu_b = NULL, Sigma = NULL, g_prop_sd = 0.045, 
	lambda_prop_scale = 0.03, b_prop_sd = NULL, initialValues = NULL, 
	plot = FALSE, verbose = FALSE, tau_mala = 1.5e-05, mala = 0.15, 
	single_MH_in_f = 0.5, c_under = 1e-9)

Arguments

y

observed data (time-to-event or censored time)

X

design matrix. Should contain a column of 1's if the model has a constant term.

Censoring_status

binary variables corresponding to time-to-event and censoring.

m

number of MCMC iterations.

alpha

A value between 0 and 1, corresponding to the temperature of the complete posterior distribution. The target posterior distribution corresponds to alpha = 1.

mu_g

Parameter aγa_{\gamma} of the prior distribution of γ\gamma.

s2_g

Parameter bγb_{\gamma} of the prior distribution of γ\gamma.

a_l

Shape parameter aλa_{\lambda} of the Inverse Gamma prior distribution of λ\lambda.

b_l

Scale parameter bλb_{\lambda} of the Inverse Gamma prior distribution of λ\lambda.

promotion_time

A list containing the specification of the promotion time distribution. See 'details'.

mu_b

Mean μ\mu of the multivariate normal prior distribution of regression coefficients. Should be a vector whose length is equal to the number of columns of the design matrix X.

Sigma

Covariance matrix of the multivariate normal prior distribution of regression coefficients.

g_prop_sd

The scale of the proposal distribution for single-site updates of the γ\gamma parameter.

lambda_prop_scale

The scale of the proposal distribution for single-site updates of the λ\lambda parameter.

b_prop_sd

The scale of the proposal distribution for the update of the β\beta parameter (regression coefficients).

initialValues

A list of initial values for each parameter (optional).

plot

Boolean for plotting on the run.

verbose

Boolean for printing progress on the run.

tau_mala

scale of the MALA proposal.

mala

Propability of attempting a MALA step. Otherwise, a simple MH move is attempted.

single_MH_in_f

Probability of attempting a single-site MH move in the basic Metropolis-Hastings step. Otherwise, a joint update is attempted.

c_under

A small positive number (much smaller than 1) which is used as a threshold in the CDF of the promotion time for avoiding underflows in the computation of the log-likelihood function. Default value: 1e-9.

Value

A list containing the following entries

mcmc_sample

The sampled MCMC values per parameter. See 'note'.

complete_log_likelihood

Logarithm of the complete likelihood per MCMC iteration.

acceptance_rates

The acceptance rate per move.

latent_status_censored

The MCMC sample of the latent status per censored observation.

log_prior_density

Logarithm of the prior density per MCMC iteration.

Note

In the case where the promotion time distribution is a mixture model, the mixing proportions w1,,wKw_1,\ldots,w_K are reparameterized according to the following transformation

wj=ρji=1Kρi,j=1,,Kw_j = \frac{\rho_j}{\sum_{i=1}^{K}\rho_i}, j = 1,\ldots,K

where ρi>0\rho_i > 0 for i=1,,K1i=1,\ldots,K-1 and ρK=1\rho_{K}=1. The sampler returns the parameters ρ1,,ρK1\rho_1,\ldots,\rho_{K-1}, not the mixing proportions.

Author(s)

Panagiotis Papastamoulis

References

Papastamoulis and Milienos (2024). Bayesian inference and cure rate modeling for event history data. TEST doi: 10.1007/s11749-024-00942-w.

See Also

cure_rate_MC3

Examples

# simulate toy data just for cran-check purposes        
        set.seed(1)
        n = 10
        stat = rbinom(n, size = 1, prob = 0.5)
        x <- cbind(1, matrix(rnorm(2*n), n, 2))
        y <- rexp(n)
# run a weibull model (including const. term) 
#	for m = 10 mcmc iterations 
        fit1 <- cure_rate_mcmc(y = y, X = x, Censoring_status = stat, 
              	plot = FALSE,
                promotion_time = list(distribution = 'weibull', 
                        prior_parameters = matrix(rep(c(2.1, 1.1), 2), 
                                                byrow = TRUE, 2, 2),
                        prop_scale = c(0.1, 0.1)
                ),
                m = 10)
#	the generated mcmc sampled values         
	fit1$mcmc_sample

PDF and CDF of the Dagum distribution

Description

The Dagum distribution as evaluated at the VGAM package.

Usage

log_dagum(y, a1, a2, a3, c_under = 1e-09)

Arguments

y

observed data

a1

scale parameter

a2

shape1.a parameter

a3

shape2.p parameter

c_under

A small positive value corresponding to the underflow threshold, e.g. c_under = 1e-9.

Details

The Dagum distribution is a special case of the 4-parameter generalized beta II distribution.

Value

A list containing the following entries

log_f

natural logarithm of the pdf, evaluated at each datapoint.

log_F

natural logarithm of the CDF, evaluated at each datapoint.

Author(s)

Panagiotis Papastamoulis

References

Thomas W. Yee (2015). Vector Generalized Linear and Additive Models: With an Implementation in R. New York, USA: Springer.

See Also

ddagum

Examples

log_dagum(y = 1:10, a1 = 1, a2 = 1, a3 = 1, c_under = 1e-9)

PDF and CDF of the Gamma distribution

Description

Computes the pdf and cdf of the Gamma distribution.

Usage

log_gamma(y, a1, a2, c_under = 1e-09)

Arguments

y

observed data

a1

shape parameter

a2

rate parameter

c_under

A small positive value corresponding to the underflow threshold, e.g. c_under = 1e-9.

Value

A list containing the following entries

log_f

natural logarithm of the pdf, evaluated at each datapoint.

log_F

natural logarithm of the CDF, evaluated at each datapoint.

Author(s)

Panagiotis Papastamoulis

See Also

dgamma

Examples

log_gamma(y = 1:10, a1 = 1, a2 = 1, c_under = 1e-9)

PDF and CDF of a Gamma mixture distribution

Description

Computes the logarithm of the probability density function and cumulative density function per observation for each observation under a Gamma mixture model.

Usage

log_gamma_mixture(y, a1, a2, p, c_under = 1e-09)

Arguments

y

observed data

a1

vector containing the shape parameters of each Gamma mixture component

a2

vector containing the rate parameters of each Gamma mixture component

p

vector of mixing proportions

c_under

threshold for underflows.

Value

A list containing the following entries

log_f

natural logarithm of the pdf, evaluated at each datapoint.

log_F

natural logarithm of the CDF, evaluated at each datapoint.

Author(s)

Panagiotis Papastamoulis

Examples

y <- runif(10)
a1 <- c(1,2)
a2 <- c(1,1)
p <- c(0.9,0.1)
log_gamma_mixture(y, a1, a2, p)

PDF and CDF of the Gompertz distribution

Description

The Gompertz distribution as evaluated at the flexsurv package.

Usage

log_gompertz(y, a1, a2, c_under = 1e-09)

Arguments

y

observed data

a1

shape parameter

a2

rate parameter

c_under

A small positive value corresponding to the underflow threshold, e.g. c_under = 1e-9.

Value

A list containing the following entries

log_f

natural logarithm of the pdf, evaluated at each datapoint.

log_F

natural logarithm of the CDF, evaluated at each datapoint.

Author(s)

Panagiotis Papastamoulis

References

Christopher Jackson (2016). flexsurv: A Platform for Parametric Survival Modeling in R. Journal of Statistical Software, 70(8), 1-33. doi:10.18637/jss.v070.i08

See Also

dgompertz

Examples

log_gompertz(y = 1:10, a1 = 1, a2 = 1, c_under = 1e-9)

PDF and CDF of the log-Logistic distribution.

Description

The log-Logistic distribution as evaluated at the flexsurv package.

Usage

log_logLogistic(y, a1, a2, c_under = 1e-09)

Arguments

y

observed data

a1

shape parameter

a2

scale parameter

c_under

A small positive value corresponding to the underflow threshold, e.g. c_under = 1e-9.

Details

The log-logistic distribution is the probability distribution of a random variable whose logarithm has a logistic distribution.

Value

A list containing the following entries

log_f

natural logarithm of the pdf, evaluated at each datapoint.

log_F

natural logarithm of the CDF, evaluated at each datapoint.

Author(s)

Panagiotis Papastamoulis

References

Christopher Jackson (2016). flexsurv: A Platform for Parametric Survival Modeling in R. Journal of Statistical Software, 70(8), 1-33. doi:10.18637/jss.v070.i08

See Also

dllogis

Examples

log_logLogistic(y = 1:10, a1 = 1, a2 = 1, c_under = 1e-9)

PDF and CDF of the Lomax distribution

Description

The Lomax distribution as evaluated at the VGAM package.

Usage

log_lomax(y, a1, a2, c_under = 1e-09)

Arguments

y

observed data

a1

scale parameter

a2

shape parameter

c_under

A small positive value corresponding to the underflow threshold, e.g. c_under = 1e-9.

Details

The Lomax distribution is a special case of the 4-parameter generalized beta II distribution.

Value

A list containing the following entries

log_f

natural logarithm of the pdf, evaluated at each datapoint.

log_F

natural logarithm of the CDF, evaluated at each datapoint.

Author(s)

Panagiotis Papastamoulis

References

Thomas W. Yee (2015). Vector Generalized Linear and Additive Models: With an Implementation in R. New York, USA: Springer.

See Also

dlomax

Examples

log_lomax(y = 1:10, a1 = 1, a2 = 1, c_under = 1e-9)

Define a finite mixture of a given family of distributions.

Description

This function computes the logarithm of the probability density function and cumulative density function per observation for each observation under a user-defined mixture of a given family of distributions. The parameters of the given family of distributions should belong to (0, inf).

Usage

log_user_mixture(user_f, y, a, p, c_under = 1e-09)

Arguments

user_f

a user defined function that returns the logarithm of a given probability density and the corresponding logarithm of the cumulative distribution function. These arguments should be returned in the form of a list with two entries: log_f and log_F, containing the logarithm of the pdf and cdf values of y, respectively, for a given set of parameter values.

y

observed data

a

a matrix where each column corresponds to component specific parameters and the columns to different components. All parameters should be positive. The number of columns should be the same with the number of mixture components.

p

vector of mixing proportions

c_under

threshold for underflows.

Value

A list containing the following entries

log_f

natural logarithm of the pdf, evaluated at each datapoint.

log_F

natural logarithm of the CDF, evaluated at each datapoint.

Author(s)

Panagiotis Papastamoulis

Examples

# We will define a mixture of 2 exponentials distributions.
# First we pass the exponential distribution at user_f
user_f <- function(y, a){
	log_f <- dexp(y, rate = a, log = TRUE)
	log_F <- pexp(y, rate = a, log.p = TRUE)
	result <- vector('list', length = 2)
	names(result) <- c('log_f', 'log_F')
	result[["log_f"]] = log_f
	result[["log_F"]] = log_F
	return(result)
}
#	simulate some date
y <- runif(10)
# Now compute the log of pdf and cdf for a mixture of K=2 exponentials
p <- c(0.9,0.1)
a <- matrix(c(0.1, 1.5), nrow = 1, ncol = 2)
log_user_mixture(user_f = user_f, y = y, a = a, p = p)

PDF and CDF of the Weibull distribution

Description

Computes the log pdf and cdf of the weibull distribution.

Usage

log_weibull(y, a1, a2, c_under)

Arguments

y

observed data

a1

shape parameter

a2

rate parameter

c_under

A small positive value corresponding to the underflow threshold, e.g. c_under = 1e-9.

Value

A list containing the following entries

log_f

natural logarithm of the pdf, evaluated at each datapoint.

log_F

natural logarithm of the CDF, evaluated at each datapoint.

Author(s)

Panagiotis Papastamoulis

Examples

log_weibull(y = 1:10, a1 = 1, a2 = 1, c_under = 1e-9)

Extract the log-likelihood.

Description

Method to extract the log-likelihood of a bayesCureModel object.

Usage

## S3 method for class 'bayesCureModel'
logLik(object, ...)

Arguments

object

An object of class bayesCureModel

...

ignored.

Value

The maximum (observed) log-likelihood value obtained across the MCMC run.

Author(s)

Panagiotis Papastamoulis

References

Papastamoulis and Milienos (2024). Bayesian inference and cure rate modeling for event history data. TEST doi: 10.1007/s11749-024-00942-w.

See Also

cure_rate_MC3

Examples

# simulate toy data just for cran-check purposes        
	set.seed(10)
        n = 4
        # censoring indicators
        stat = rbinom(n, size = 1, prob = 0.5)
        # covariates
        x <- matrix(rnorm(2*n), n, 2)
        # observed response variable 
        y <- rexp(n)
#	define a data frame with the response and the covariates        
        my_data_frame <- data.frame(y, stat, x1 = x[,1], x2 = x[,2])
# run a weibull model with default prior setup
# considering 2 heated chains 
	fit1 <- cure_rate_MC3(survival::Surv(y, stat) ~ x1 + x2, 
		data = my_data_frame, 
		promotion_time = list(distribution = 'exponential'),
		nChains = 2, 
		nCores = 1, 
		mcmc_cycles = 3, sweep=2)
	ll <- logLik(fit1)

Marriage data

Description

The variable of interest (time) corresponds to the duration (in years) of first marriage for 1500 individuals. The available covariates are:

age

age of respondent (in years) at the time of first marriage. The values are standardized (sample mean and variance equal to 0 and 1, respectively).

kids

factor: whether there were kids during the first marriage ("yes") or not ("no").

race

race of respondent with levels: "black", "hispanic" and "other".

Among the 1500 observations, there are 1018 censoring times (censoring = 0) and 482 divorces (censoring = 1). Source: National Longitudinal Survey of Youth 1997 (NLSY97).

Usage

data(marriage_dataset)

Format

Time-to-event data.

References

Bureau of Labor Statistics, U.S. Department of Labor. National Longitudinal Survey of Youth 1997 cohort, 1997-2022 (rounds 1-20). Produced and distributed by the Center for Human Resource Research (CHRR), The Ohio State University. Columbus, OH: 2023.


Plot method

Description

Plots and computes HDIs.

Usage

## S3 method for class 'bayesCureModel'
plot(x, burn = NULL, alpha = 0.05, gamma_mix = TRUE, 
	K_gamma = 5, plot_graphs = TRUE, bw = "nrd0", what = NULL, predict_output = NULL,  
	index_of_main_mode = NULL, draw_legend = TRUE,...)

Arguments

x

An object of class bayesCureModel

burn

Number of iterations to discard as burn-in period.

alpha

A value between 0 and 1 in order to compute the 1-α\alpha Highest Posterior Density regions.

gamma_mix

Boolean. If TRUE, the density of the marginal posterior distribution of the γ\gamma parameter is estimated from the sampled MCMC values by fitting a normal mixture model.

K_gamma

Used only when gamma_mix = TRUE and corresponds to the number of normal mixture components used to estimate the marginal posterior density of the γ\gamma parameter.

plot_graphs

Boolean, if FALSE only HDIs are computed.

bw

bandwidth

what

Integer or a character string with possible values equal to 'cured_prob', 'survival' or 'residuals'. An integer entry indicates which parameter should be plotted. If set to NULL (default), all parameters are plotted one by one. If set to 'cured_prob' or 'survival' the estimated cured probability or survival function is plotted, conditional on a set of covariates defined in the p_cured_output argument. In case where what = 'residuals' the residuals of the fitted model are plotted versus the quantity log(S)-log(S) where SS denotes the estimated survival function arising from the Kaplan-Meier estimate based on the residuals and the censoring times.

predict_output

Optional argument which is required only when what = 'cured_prob' or what = 'survival'. It is returned by a call to the predict.bayesCureModel function.

index_of_main_mode

If NULL (default), the whole MCMC output is used for plotting. Otherwise, it is a subset of the retained MCMC iterations in order to identify the main mode of the posterior distribution, as returned by the index_of_main_mode value of the summary.bayesCureRateModel function.

draw_legend

Boolean. If TRUE (default), a legend is plotted in the case where what = 'survival' or what = 'cured_prob'.

...

arguments passed by other methods.

Value

The function plots graphic output on the plot device if plot_graphs = TRUE. Furthermore, a list of 100(1α)%100(1-\alpha)\% Highest Density Intervals per parameter is returned.

Author(s)

Panagiotis Papastamoulis

Examples

# simulate toy data just for cran-check purposes        
	set.seed(10)
        n = 4
        # censoring indicators
        stat = rbinom(n, size = 1, prob = 0.5)
        # covariates
        x <- matrix(rnorm(2*n), n, 2)
        # observed response variable 
        y <- rexp(n)
#	define a data frame with the response and the covariates        
        my_data_frame <- data.frame(y, stat, x1 = x[,1], x2 = x[,2])
# run a weibull model with default prior setup
# considering 2 heated chains 
	fit1 <- cure_rate_MC3(survival::Surv(y, stat) ~ x1 + x2, data = my_data_frame, 
		promotion_time = list(distribution = 'exponential'),
		nChains = 2, 
		nCores = 1, 
		mcmc_cycles = 3, sweep=2)
	mySummary <- summary(fit1, burn = 0)
	# plot the marginal posterior distribution of the first parameter in returned mcmc output
	plot(fit1, what = 1, burn = 0)

Plot method

Description

Plot the output of the predict method.

Usage

## S3 method for class 'predict_bayesCureModel'
plot(x, what = 'survival', draw_legend = TRUE,...)

Arguments

x

An object of class predict_bayesCureModel

what

Character with possible values: 'cured_prob' or 'survival', corresponding to the estimated cured probability or survival function.

draw_legend

Boolean. If TRUE (default), a legend is plotted in the case where what = 'survival' or what = 'cured_prob'.

...

arguments passed by other methods.

Value

No value, just a plot.

Author(s)

Panagiotis Papastamoulis

Examples

# simulate toy data just for cran-check purposes        
	set.seed(10)
        n = 4
        # censoring indicators
        stat = rbinom(n, size = 1, prob = 0.5)
        # covariates
        x <- matrix(rnorm(2*n), n, 2)
        # observed response variable 
        y <- rexp(n)
#	define a data frame with the response and the covariates        
        my_data_frame <- data.frame(y, stat, x1 = x[,1], x2 = x[,2])
# run a weibull model with default prior setup
# considering 2 heated chains 
	fit1 <- cure_rate_MC3(survival::Surv(y, stat) ~ x1 + x2, data = my_data_frame, 
		promotion_time = list(distribution = 'exponential'),
		nChains = 2, 
		nCores = 1, 
		mcmc_cycles = 3, sweep=2)
	#compute predictions for two individuals with 
	#	x1 = 0.2 and x2 = -1
	#	and 
	#	x1 = -1 and x2 = 0
	covariate_levels1 <- data.frame(x1 = c(0.2,-1), x2 = c(-1,0))
	predictions <- predict(fit1, newdata = covariate_levels1, burn = 0)
	# plot cured probabilities based on the previous output
	plot(predictions, what='cured_prob')

Predict method.

Description

Returns MAP estimates of the survival function and the conditional cured probability for a given set of covariates.

Usage

## S3 method for class 'bayesCureModel'
predict(object, newdata = NULL, tau_values = NULL, 
	burn = NULL, K_max = 1, alpha = 0.1, nDigits = 3, verbose = FALSE, ...)

Arguments

object

An object of class bayesCureModel

newdata

A data.frame with new data for the covariates. The column names as well as the class of each column (variable) should match with the input data.

tau_values

A vector of values for the response variable (time) for returning predictions for each row in the newdata.

burn

Positive integer corresponding to the number of mcmc iterations to discard as burn-in period

K_max

Maximum number of components in order to cluster the (univariate) values of the joint posterior distribution across the MCMC run. Used to identify the main mode of the posterior distribution.

alpha

Scalar between 0 and 1 corresponding to 1 - confidencel level for computing Highest Density Intervals. If set to NULL, the confidence intervals are not computed.

nDigits

A positive integer for printing the output, after rounding to the corresponding number of digits. Default: nDigits = 3.

verbose

Boolean. If set to TRUE (default) the function prints a summary of the predictions.

...

ignored.

Details

The values of the posterior draws are clustered according to a (univariate) normal mixture model, and the main mode corresponds to the cluster with the largest mean. The maximum number of mixture components corresponds to the K_max argument. The mclust library is used for this purpose. The inference for the latent cure status of each (censored) observation is based on the MCMC draws corresponding to the main mode of the posterior distribution. The FDR is controlled according to the technique proposed in Papastamoulis and Rattray (2018).

In case where covariate_levels is set to TRUE, the summary function also returns a list named p_cured_output with the following entries

mcmc

It is returned only in the case where the argument covariate_values is not NULL. A vector of posterior cured probabilities for the given values in covariate_values, per retained MCMC draw.

map

It is returned only in the case where the argument covariate_values is not NULL. The cured probabilities computed at the MAP estimate of the parameters, for the given values covariate_values.

tau_values

tau values

covariate_levels

covariate levels

index_of_main_mode

the subset of MCMC draws allocated to the main mode of the posterior distribution.

Value

A list with the following entries

map_estimate

Maximum A Posteriori (MAP) estimate of the parameters of the model.

highest_density_intervals

Highest Density Interval per parameter

latent_cured_status

Estimated posterior probabilities of the latent cure status per censored subject.

cured_at_given_FDR

Classification as cured or not, at given FDR level.

p_cured_output

It is returned only in the case where the argument covariate_values is not NULL. See details.

main_mode_index

The retained MCMC iterations which correspond to the main mode of the posterior distribution.

Author(s)

Panagiotis Papastamoulis

References

Papastamoulis and Milienos (2024). Bayesian inference and cure rate modeling for event history data. TEST doi: 10.1007/s11749-024-00942-w.

Papastamoulis and Rattray (2018). A Bayesian Model Selection Approach for Identifying Differentially Expressed Transcripts from RNA Sequencing Data, Journal of the Royal Statistical Society Series C: Applied Statistics, Volume 67, Issue 1.

Scrucca L, Fraley C, Murphy TB, Raftery AE (2023). Model-Based Clustering, Classification, and Density Estimation Using mclust in R. Chapman and Hall/CRC. ISBN 978-1032234953

See Also

cure_rate_MC3

Examples

# simulate toy data just for cran-check purposes        
	set.seed(10)
        n = 4
        # censoring indicators
        stat = rbinom(n, size = 1, prob = 0.5)
        # covariates
        x <- matrix(rnorm(2*n), n, 2)
        # observed response variable 
        y <- rexp(n)
#	define a data frame with the response and the covariates        
        my_data_frame <- data.frame(y, stat, x1 = x[,1], x2 = x[,2])
# run a weibull model with default prior setup
# considering 2 heated chains 
	fit1 <- cure_rate_MC3(survival::Surv(y, stat) ~ x1 + x2, data = my_data_frame, 
		promotion_time = list(distribution = 'exponential'),
		nChains = 2, 
		nCores = 1, 
		mcmc_cycles = 3, sweep=2)
	newdata <- data.frame(x1 = c(0.2,-1), x2 = c(-1,0))
	# return predicted values at tau = c(0.5, 1)
	my_prediction <- predict(fit1, newdata = newdata, 
		burn = 0, tau_values = c(0.5, 1))

Print method

Description

This function prints a summary of objects returned by the cure_rate_MC3 function.

Usage

## S3 method for class 'bayesCureModel'
print(x, ...)

Arguments

x

An object of class bayesCureModel, which is returned by the cure_rate_MC3 function.

...

ignored.

Details

The function prints some basic information for a cure_rate_MC3, such as the MAP estimate of model parameters and the value of Bayesian information criterion.

Value

No return value, called for side effects.

Author(s)

Panagiotis Papastamoulis


Print method for the predict object

Description

This function prints a summary of objects returned by the predict.cure_rate_MC3 method.

Usage

## S3 method for class 'predict_bayesCureModel'
print(x, ...)

Arguments

x

An object of class predict_bayesCureModel, which is returned by the predict.cure_rate_MC3 method.

...

ignored.

Details

The function prints some basic information for the predict method of a bayesCureModel object.

Value

No return value, called for side effects.

Author(s)

Panagiotis Papastamoulis


Print method for the summary

Description

This function prints a summary of objects returned by the summary.cure_rate_MC3 method.

Usage

## S3 method for class 'summary_bayesCureModel'
print(x, digits = 2, ...)

Arguments

x

An object of class summary_bayesCureModel, which is returned by the summary.cure_rate_MC3 method.

digits

Number of digits to print.

...

ignored.

Details

The function prints some basic information for the summary of a bayesCureModel object.

Value

No return value, called for side effects.

Author(s)

Panagiotis Papastamoulis


Computation of residuals.

Description

Methods for computing residuals for an object of class bayesCureModel. The Cox-Snell residuals are available for now.

Usage

## S3 method for class 'bayesCureModel'
residuals(object, type = "cox-snell",...)

Arguments

object

An object of class bayesCureModel

type

The type of residuals to be computed.

...

ignored.

Value

A vector of residuals.

Author(s)

Panagiotis Papastamoulis

References

Papastamoulis and Milienos (2024). Bayesian inference and cure rate modeling for event history data. TEST doi: 10.1007/s11749-024-00942-w.

See Also

cure_rate_MC3

Examples

# simulate toy data just for cran-check purposes        
	set.seed(10)
        n = 4
        # censoring indicators
        stat = rbinom(n, size = 1, prob = 0.5)
        # covariates
        x <- matrix(rnorm(2*n), n, 2)
        # observed response variable 
        y <- rexp(n)
#	define a data frame with the response and the covariates        
        my_data_frame <- data.frame(y, stat, x1 = x[,1], x2 = x[,2])
# run a weibull model with default prior setup
# considering 2 heated chains 
	fit1 <- cure_rate_MC3(survival::Surv(y, stat) ~ x1 + x2, 
		data = my_data_frame, 
		promotion_time = list(distribution = 'exponential'),
		nChains = 2, 
		nCores = 1, 
		mcmc_cycles = 3, sweep=2)
	my_residuals <- residuals(fit1)

Simulated dataset

Description

A synthetic dataset generated from a bimodal promotion time distribution. The available covariates are:

x1

continuous.

x2

factor with three levels.

Among the 500 observations, there are 123 censoring times (censoring = 0) and 377 "events" (censoring = 1). The true status (cured or susceptible) is contained in the column true_status and contains 59 cured and 441 susceptible subjects.

Usage

data(sim_mix_data)

Format

Time-to-event data.


Summary method.

Description

This function produces all summaries after fitting a cure rate model.

Usage

## S3 method for class 'bayesCureModel'
summary(object, burn = NULL, gamma_mix = TRUE, 
	K_gamma = 3,  K_max = 3, fdr = 0.1, covariate_levels = NULL, 
	yRange = NULL, alpha = 0.1, quantiles = c(0.05, 0.5, 0.95), 
	verbose = FALSE, ...)

Arguments

object

An object of class bayesCureModel

burn

Positive integer corresponding to the number of mcmc iterations to discard as burn-in period

gamma_mix

Boolean. If TRUE, the density of the marginal posterior distribution of the γ\gamma parameter is estimated from the sampled MCMC values by fitting a normal mixture model.

K_gamma

Used only when gamma_mix = TRUE and corresponds to the number of normal mixture components used to estimate the marginal posterior density of the γ\gamma parameter.

K_max

Maximum number of components in order to cluster the (univariate) values of the joint posterior distribution across the MCMC run. Used to identify the main mode of the posterior distribution. See details.

fdr

The target value for controlling the False Discovery Rate when classifying subjects as cured or not.

covariate_levels

Optional data.frame with new data for the covariates. It is only required when the user wishes to obtain a vector with the estimated posterior cured probabilities for a given combination of covariates. The column names should be exactly the same with the ones used in the input data.

yRange

Optional range (a vector of two non-negative values) for computing the sequence of posterior probabilities for the given values in covariate_levels.

alpha

Scalar between 0 and 1 corresponding to 1 - confidencel level for computing Highest Density Intervals. If set to NULL, the confidence intervals are not computed.

quantiles

A vector of quantiles to evaluate for each variable.

verbose

Boolean: if TRUE the function prints the summary.

...

ignored.

Details

The values of the posterior draws are clustered according to a (univariate) normal mixture model, and the main mode corresponds to the cluster with the largest mean. The maximum number of mixture components corresponds to the K_max argument. The mclust library is used for this purpose. The inference for the latent cure status of each (censored) observation is based on the MCMC draws corresponding to the main mode of the posterior distribution. The FDR is controlled according to the technique proposed in Papastamoulis and Rattray (2018).

In case where covariate_levels is set to TRUE, the summary function also returns a list named p_cured_output with the following entries

mcmc

It is returned only in the case where the argument covariate_values is not NULL. A vector of posterior cured probabilities for the given values in covariate_values, per retained MCMC draw.

map

It is returned only in the case where the argument covariate_values is not NULL. The cured probabilities computed at the MAP estimate of the parameters, for the given values covariate_values.

tau_values

tau values

covariate_levels

covariate levels

index_of_main_mode

the subset of MCMC draws allocated to the main mode of the posterior distribution.

Value

A list with the following entries

map_estimate

Maximum A Posteriori (MAP) estimate of the parameters of the model.

highest_density_intervals

Highest Density Interval per parameter

latent_cured_status

Estimated posterior probabilities of the latent cure status per censored subject.

cured_at_given_FDR

Classification as cured or not, at given FDR level.

p_cured_output

It is returned only in the case where the argument covariate_values is not NULL. See details.

main_mode_index

The retained MCMC iterations which correspond to the main mode of the posterior distribution.

Author(s)

Panagiotis Papastamoulis

References

Papastamoulis and Milienos (2024). Bayesian inference and cure rate modeling for event history data. TEST doi: 10.1007/s11749-024-00942-w.

Papastamoulis and Rattray (2018). A Bayesian Model Selection Approach for Identifying Differentially Expressed Transcripts from RNA Sequencing Data, Journal of the Royal Statistical Society Series C: Applied Statistics, Volume 67, Issue 1.

Scrucca L, Fraley C, Murphy TB, Raftery AE (2023). Model-Based Clustering, Classification, and Density Estimation Using mclust in R. Chapman and Hall/CRC. ISBN 978-1032234953

See Also

cure_rate_MC3

Examples

# simulate toy data just for cran-check purposes        
	set.seed(10)
        n = 4
        # censoring indicators
        stat = rbinom(n, size = 1, prob = 0.5)
        # covariates
        x <- matrix(rnorm(2*n), n, 2)
        # observed response variable 
        y <- rexp(n)
#	define a data frame with the response and the covariates        
        my_data_frame <- data.frame(y, stat, x1 = x[,1], x2 = x[,2])
# run a weibull model with default prior setup
# considering 2 heated chains 
	fit1 <- cure_rate_MC3(survival::Surv(y, stat) ~ x1 + x2, 
		data = my_data_frame, 
		promotion_time = list(distribution = 'exponential'),
		nChains = 2, 
		nCores = 1, 
		mcmc_cycles = 3, sweep=2)
	mySummary <- summary(fit1, burn = 0)

Summary method for predictions.

Description

This function produces MCMC summaries for an object of class predict_bayesCureModel.

Usage

## S3 method for class 'predict_bayesCureModel'
summary(object, ...)

Arguments

object

An object of class predict_bayesCureModel.

...

Other options passed to the summary.mcmc method of the coda package.

Value

A list with the following entries

survival

MCMC summaries (quantiles) for the survival function.

cured_probability

MCMC summaries (quantiles) for the conditional cured probability.

cumulative_hazard

MCMC summaries (quantiles) for the cumulative hazard function.

hazard_rate

MCMC summaries (quantiles) for the hazard rate function.

Author(s)

Panagiotis Papastamoulis

References

Papastamoulis and Milienos (2024). Bayesian inference and cure rate modeling for event history data. TEST doi: 10.1007/s11749-024-00942-w.

See Also

cure_rate_MC3

Examples

# simulate toy data just for cran-check purposes        
	set.seed(10)
	n = 4
	# censoring indicators
	stat = rbinom(n, size = 1, prob = 0.5)
	# covariates
	x <- matrix(rnorm(2*n), n, 2)
	# observed response variable 
	y <- rexp(n)
#       define a data frame with the response and the covariates        
	my_data_frame <- data.frame(y, stat, x1 = x[,1], x2 = x[,2])
# run a weibull model with default prior setup
# considering 2 heated chains 
	fit1 <- cure_rate_MC3(survival::Surv(y, stat) ~ x1 + x2, data = my_data_frame, 
	     promotion_time = list(distribution = 'exponential'),
	     nChains = 2, 
	     nCores = 1, 
	     mcmc_cycles = 3, sweep=2)
	newdata <- data.frame(x1 = c(0.2,-1), x2 = c(-1,0))
	# return predicted values at tau = c(0.5, 1)
	my_prediction <- predict(fit1, newdata = newdata, 
	     burn = 0, tau_values = c(0.5, 1))
	my_summary <- summary(my_prediction, quantiles = c(0.1,0.9))