Title: | Bayesian Mixture Log-Normal Survival Model |
---|---|
Description: | Bayesian Survival models via the mixture of Log-Normal distribution extends the well-known survival models and accommodates different behaviour over time and considers higher censored survival times. The proposal combines mixture distributions Fruhwirth-Schnatter(2006) <doi:10.1007/s11336-009-9121-4>, and data augmentation techniques Tanner and Wong (1987) <doi:10.1080/01621459.1987.10478458>. |
Authors: | Viviana das Graças Ribeiro Lobo [aut], Thaís Cristina Oliveira da Fonseca [aut], Mariane Branco Alves [aut], Vitor Capdeville [aut], Victor Hugo Soares Ney [cre] |
Maintainer: | Victor Hugo Soares Ney <[email protected]> |
License: | MIT + file LICENSE |
Version: | 3.1.6 |
Built: | 2025-01-02 06:38:33 UTC |
Source: | CRAN |
Include information about hazard and survival distribution for each individual in a dataset.
## S3 method for class 'survival_ln_mixture' augment(x, newdata, eval_time, ...)
## S3 method for class 'survival_ln_mixture' augment(x, newdata, eval_time, ...)
x |
A |
newdata |
A |
eval_time |
a vector with the times where the hazard and survival distribuition will be evaluated. |
... |
Not used. |
A tibble::tibble()
with the original covariates and ther survvial and hazard
distributions.
Include information about hazard and survival distribution for each individual in a dataset.
## S3 method for class 'survival_ln_mixture_em' augment(x, newdata, eval_time, ...)
## S3 method for class 'survival_ln_mixture_em' augment(x, newdata, eval_time, ...)
x |
A |
newdata |
A |
eval_time |
a vector with the times where the hazard and survival distribuition will be evaluated. |
... |
Not used. |
A tibble::tibble()
with the original covariates and ther survvial and hazard
distributions.
join_empirical_hazard()
takes a Kaplan Meier empirical estimate which includes the survival estimates and joins the hazard estimates to it.
join_empirical_hazard(km)
join_empirical_hazard(km)
km |
Kaplan-Meier estimates, i.e., object generated after running broom::tidy(survfit_obj), in which survfit_obj is a survfit object. Can also be a survfit object. |
The same object as inputed, but with the hazard estimates column (hazard_estimate) joined to it.
survival_ln_mixture
fit.Extract the number of observations used in a survival_ln_mixture
fit.
## S3 method for class 'survival_ln_mixture' nobs(object, ...)
## S3 method for class 'survival_ln_mixture' nobs(object, ...)
object |
A fitted |
... |
Not used. |
A single integer.
plot_fit_on_data()
estimates survival/hazard for the data the model was fitted on and plots the results.
plot_fit_on_data( model, data, type = "survival", interval = "none", level = 0.95 )
plot_fit_on_data( model, data, type = "survival", interval = "none", level = 0.95 )
model |
A |
data |
A |
type |
A character string specifying the type of plot. The default is "survival", but can be "hazard". |
interval |
A character string specifying the type of interval to be plotted. The default is "none", but can be "credible". The EM algorithm does not provide confidence intervals and this parameter is only support for the Bayesian version ( |
level |
A numeric value between 0 and 1 specifying the level of the confidence interval. The default is 0.95. |
A list with two objects, one ggplot ($ggplot
) with the predictions plotted against the empirical data and a tibble with the predictions ($preds
).
Visualizes the path of the EM algorithm
## S3 method for class 'survival_ln_mixture_em' plot(x, ...)
## S3 method for class 'survival_ln_mixture_em' plot(x, ...)
x |
A fitted |
... |
Not used. |
A ggplot
object (or plotly
, if the package is avaiable) of the EM algorithm iterations.
Predict from a Lognormal Mixture Model
## S3 method for class 'survival_ln_mixture' predict( object, new_data, type, eval_time, interval = "none", level = 0.95, ... )
## S3 method for class 'survival_ln_mixture' predict( object, new_data, type, eval_time, interval = "none", level = 0.95, ... )
object |
A |
new_data |
A data frame or matrix of new predictors. |
type |
A single character. The type of predictions to generate. Valid options are:
|
eval_time |
For type = "hazard" or type = "survival", the times for the distribution. |
interval |
should interval estimates be added? Options are "none" and "credible". |
level |
the tail area of the intervals. Default value is 0.95. |
... |
Not used, but required for extensibility. |
A tibble of predictions. The number of rows in the tibble is guaranteed
to be the same as the number of rows in new_data
.
Categorical predictors must be converted to factors before the fit, otherwise the predictions will fail.
# Categorical variables must be converted to factor before the fit. require(survival) # Wrong way of doing set.seed(1) mod <- survival_ln_mixture(Surv(time, status == 2) ~ factor(sex), lung, intercept = TRUE) ## Not run: # this piece of code will throw error predict(mod, data.frame(sex = 1), type = "survival", eval_time = 100) ## End(Not run) # Correct way lung$sex <- factor(lung$sex) # converting to factor before set.seed(1) mod2 <- survival_ln_mixture(Surv(time, status == 2) ~ sex, lung, intercept = TRUE) # Note: the categorical predictors must be a character. predict(mod2, data.frame(sex = "1"), type = "survival", eval_time = 100)
# Categorical variables must be converted to factor before the fit. require(survival) # Wrong way of doing set.seed(1) mod <- survival_ln_mixture(Surv(time, status == 2) ~ factor(sex), lung, intercept = TRUE) ## Not run: # this piece of code will throw error predict(mod, data.frame(sex = 1), type = "survival", eval_time = 100) ## End(Not run) # Correct way lung$sex <- factor(lung$sex) # converting to factor before set.seed(1) mod2 <- survival_ln_mixture(Surv(time, status == 2) ~ sex, lung, intercept = TRUE) # Note: the categorical predictors must be a character. predict(mod2, data.frame(sex = "1"), type = "survival", eval_time = 100)
Predict from a lognormal_em Mixture Model fitted using EM algorithm.
## S3 method for class 'survival_ln_mixture_em' predict(object, new_data, type, eval_time, ...)
## S3 method for class 'survival_ln_mixture_em' predict(object, new_data, type, eval_time, ...)
object |
A |
new_data |
A data frame or matrix of new predictors. |
type |
A single character. The type of predictions to generate. Valid options are:
|
eval_time |
For type = "hazard" or type = "survival", the times for the distribution. |
... |
Not used, but required for extensibility. |
A tibble of predictions. The number of rows in the tibble is guaranteed
to be the same as the number of rows in new_data
.
Categorical predictors must be converted to factors before the fit, otherwise the predictions will fail.
A simulated dataset with 10000 observations from a lognormal mixutre model with 2 componentes.
sim_data
sim_data
sim_data
A list with two componentes:
$data: A data frame with 10,000 rows and 3 columns:
observed survival time
event indicator. 1 == event, 0 == censored.
binary covariate
$true_vals: A named vector with the true values used to generate the data.
simulate_data()
simulates data from a mixture model.
simulate_data( n = 4000, mixture_components = 2, k = 2, percentage_censored = 0.4, starting_seed = sample(1:2^28, 1) )
simulate_data( n = 4000, mixture_components = 2, k = 2, percentage_censored = 0.4, starting_seed = sample(1:2^28, 1) )
n |
Number of observations desired. |
mixture_components |
Number of mixtures to include in the generation of the data. |
k |
number of covariates generated (the total of covariates will be intercept + (k - 1) covariates). |
percentage_censored |
Percentage of censored observations (defined as decimal value between 0 and 1). This will generate a delta vector in which 1 is an event that ocurred and 0 is a censored observation. |
starting_seed |
Seed to start the random number generation. |
A list with two elements: data
and real_values
. The data
element is a tibble with the simulated data. The real_values
is a tibble with the real values of the parameters used to generate the data.
survival_ln_mixture()
fits a Bayesian lognormal mixture model with Gibbs sampling (optional EM algorithm to find local maximum at the likelihood function), as described in LOBO, Viviana GR; FONSECA, Thaís CO; ALVES, Mariane B. Lapse risk modeling in insurance: a Bayesian mixture approach. Annals of Actuarial Science, v. 18, n. 1, p. 126-151, 2024.
survival_ln_mixture( formula, data, intercept = TRUE, iter = 1000, warmup = floor(iter/10), thin = 1, chains = 1, cores = 1, mixture_components = 2, show_progress = FALSE, em_iter = 0, starting_seed = sample(1:2^28, 1), use_W = FALSE, number_em_search = 200, iteration_em_search = 1, fast_groups = TRUE, ... ) ## Default S3 method: survival_ln_mixture(formula, ...) ## S3 method for class 'formula' survival_ln_mixture(formula, data, intercept = TRUE, ...)
survival_ln_mixture( formula, data, intercept = TRUE, iter = 1000, warmup = floor(iter/10), thin = 1, chains = 1, cores = 1, mixture_components = 2, show_progress = FALSE, em_iter = 0, starting_seed = sample(1:2^28, 1), use_W = FALSE, number_em_search = 200, iteration_em_search = 1, fast_groups = TRUE, ... ) ## Default S3 method: survival_ln_mixture(formula, ...) ## S3 method for class 'formula' survival_ln_mixture(formula, data, intercept = TRUE, ...)
formula |
A formula specifying the outcome terms on the left-hand side, and the predictor terms on the right-hand side. The outcome must be a survival::Surv object. |
data |
A data frame containing both the predictors and the outcome. |
intercept |
A logical. Should an intercept be included in the processed data? |
iter |
A positive integer specifying the number of iterations for each chain (including warmup). |
warmup |
A positive integer specifying the number of warmup (aka burnin) iterations per chain. The number of warmup iterations should be smaller than iter. |
thin |
A positive integer specifying the period for saving samples. |
chains |
A positive integer specifying the number of Markov chains. |
cores |
A positive integer specifying the maximum number of cores to run the chains. Setting this to a value bigger than 1 will automatically trigger the parallel mode |
mixture_components |
number of mixture componentes >= 2. |
show_progress |
Indicates if the code shows the progress of the EM algorithm and the Gibbs Sampler. |
em_iter |
A positive integer specifying the number of iterations for the EM algorithm. The EM algorithm is performed before the Gibbs sampler to find better initial values for the chains. On simulations, values lower than 200 seems to work nice. |
starting_seed |
Starting seed for the sampler. If not specified by the user, uses a random integer between 1 and 2^28 This way we ensure, when the user sets a seed in R, that this is passed into the C++ code. |
use_W |
Specifies is the W (groups weight's matrix for each observation) should be used from EM. It holds W constant through the code, resulting in a faster Bayesian Inference (close to what Empirical Bayes would do). It may helps generating credible intervals for the survival and hazard curves, using the information from the previous EM iteration. Make sure the EM have converged before setting this parameter to true. In doubt, leave this as FALSE, the default. |
number_em_search |
Number of different EM's to search for maximum likelihoods. Recommended to leave, at least, at 100. This value can be set to 0 to disable the search for maximum likelihood initial values. |
iteration_em_search |
Number of iterations for each of the EM's used to find the maximum likelihoods. Recommended to leave at small values, such as from 1 to 5. |
fast_groups |
Use fast computation of groups allocations probabilities, defaults to TRUE. Setting it to FALSE can increase the computation time (a lot) but it's worth trying if the chains are not converging. |
... |
Not currently used, but required for extensibility. |
A survival_ln_mixture
object, which is a list with the following componentes:
posterior |
A posterior::draws_matrix with the posterior of the parameters of the model. |
nobs |
A integer holding the number of observations used to generate the fit. |
blueprint |
The blueprint component of the output of hardhat::mold |
Categorical predictors must be converted to factors before the fit, otherwise the predictions will fail.
# Formula interface library(survival) set.seed(1) mod <- survival_ln_mixture(Surv(time, status == 2) ~ NULL, lung, intercept = TRUE)
# Formula interface library(survival) set.seed(1) mod <- survival_ln_mixture(Surv(time, status == 2) ~ NULL, lung, intercept = TRUE)
survival_ln_mixture_em()
fits an EM algorithm, as described in LOBO, Viviana GR; FONSECA, Thaís CO; ALVES, Mariane B. Lapse risk modeling in insurance: a Bayesian mixture approach. Annals of Actuarial Science, v. 18, n. 1, p. 126-151, 2024, for modelling mixtures of lognormal distributions applied to survival data.
survival_ln_mixture_em( formula, data, intercept = TRUE, iter = 50, mixture_components = 2, starting_seed = sample(1:2^28, 1), number_em_search = 200, iteration_em_search = 1, show_progress = FALSE, ... ) ## Default S3 method: survival_ln_mixture_em(formula, ...) ## S3 method for class 'formula' survival_ln_mixture_em(formula, data, intercept = TRUE, ...)
survival_ln_mixture_em( formula, data, intercept = TRUE, iter = 50, mixture_components = 2, starting_seed = sample(1:2^28, 1), number_em_search = 200, iteration_em_search = 1, show_progress = FALSE, ... ) ## Default S3 method: survival_ln_mixture_em(formula, ...) ## S3 method for class 'formula' survival_ln_mixture_em(formula, data, intercept = TRUE, ...)
formula |
A formula specifying the outcome terms on the left-hand side, and the predictor terms on the right-hand side. The outcome must be a survival::Surv object. |
data |
A data frame containing both the predictors and the outcome. |
intercept |
A logical. Should an intercept be included in the processed data? |
iter |
A positive integer specifying the number of iterations for the EM algorithm. |
mixture_components |
number of mixture componentes >= 2. |
starting_seed |
Starting seed for the algorithm. If not specified by the user, uses a random integer between 1 and 2^28 This way we ensure, when the user sets a seed in R, that this is passed into the C++ code. |
number_em_search |
Number of different EM's to search for maximum likelihoods. Recommended to leave, at least, at 100. |
iteration_em_search |
Number of iterations for each of the EM's used to find the maximum likelihoods. Recommended to leave at small values, such as from 1 to 5. |
show_progress |
A logical. Should the progress of the EM algorithm be shown? |
... |
Not currently used, but required for extensibility. |
An object of class survival_ln_mixture_em
containing the following elements:
em_iterations
: A data frame containing the EM iterations.
nobs
: The number of observations.
predictors_name
: The names of the predictors.
logLik
: The log-likelihood of the model.
mixture_groups
: The number of mixture groups.
blueprint
: The blueprint used to process the formula
These method tidy the estimates from survival_ln_mixture
fits into a summary.
## S3 method for class 'survival_ln_mixture' tidy( x, effects = "fixed", conf.int = FALSE, conf.level = 0.9, digits = NULL, ... )
## S3 method for class 'survival_ln_mixture' tidy( x, effects = "fixed", conf.int = FALSE, conf.level = 0.9, digits = NULL, ... )
x |
Fitted model object (survival_ln_mixture). |
effects |
A character vector including one or more of |
conf.int |
If |
conf.level |
A number between 0 and 1 indicating the desired probability mass to include in the
intervals. Only used if |
digits |
How many significant digits should be displayed? |
... |
Not used. |
A data.frame
without rownames. When effects="fixed"
(the default), tidy.survival_ln_mixutre
returns one row for each coefficient for each component of the mixture with three columns:
term |
The name of the corresponding term in the model. |
estimate |
A point estimate of the coefficient (posterior median). |
std.error |
A standard error for the point estimate based on
|
Setting effects="auxiliary"
will select the precision and proportion of mixture components parameters.
require(survival) lung$sex <- factor(lung$sex) set.seed(1) mod2 <- survival_ln_mixture(Surv(time, status == 2) ~ sex, lung) tidy(mod2) tidy(mod2, conf.int = TRUE) tidy(mod2, effects = c("fixed", "auxiliary"), conf.int = TRUE)
require(survival) lung$sex <- factor(lung$sex) set.seed(1) mod2 <- survival_ln_mixture(Surv(time, status == 2) ~ sex, lung) tidy(mod2) tidy(mod2, conf.int = TRUE) tidy(mod2, effects = c("fixed", "auxiliary"), conf.int = TRUE)
These method tidy the estimates from survival_ln_mixture
fits into a short summary. It doesn't contain uncertainty estimates since it's a likelihood maximization algorithm.
## S3 method for class 'survival_ln_mixture_em' tidy(x, effects = "fixed", digits = NULL, ...)
## S3 method for class 'survival_ln_mixture_em' tidy(x, effects = "fixed", digits = NULL, ...)
x |
Fitted model object (survival_ln_mixture_em). |
effects |
A character vector including one or more of |
digits |
How many significant digits should be displayed? |
... |
Not used. |
A data.frame
without rownames. When effects="fixed"
(the default), tidy.survival_ln_mixutre
returns one row for each coefficient for each component of the mixture with two columns:
term |
The name of the corresponding term in the model. |
estimate |
A point estimate of the coefficient (last iteration value). |
Setting effects="auxiliary"
will select the precision and proportion of mixture components parameters.
require(survival) lung$sex <- factor(lung$sex) set.seed(1) mod2 <- survival_ln_mixture_em(Surv(time, status == 2) ~ sex, lung) tidy(mod2) tidy(mod2, effects = c("fixed", "auxiliary"))
require(survival) lung$sex <- factor(lung$sex) set.seed(1) mod2 <- survival_ln_mixture_em(Surv(time, status == 2) ~ sex, lung) tidy(mod2) tidy(mod2, effects = c("fixed", "auxiliary"))