Package 'rsurv'

Title: Random Generation of Survival Data
Description: Random generation of survival data from a wide range of regression models, including accelerated failure time (AFT), proportional hazards (PH), proportional odds (PO), accelerated hazard (AH), Yang and Prentice (YP), and extended hazard (EH) models. The package 'rsurv' also stands out by its ability to generate survival data from an unlimited number of baseline distributions provided that an implementation of the quantile function of the chosen baseline distribution is available in R. Another nice feature of the package 'rsurv' lies in the fact that linear predictors are specified via a formula-based approach, facilitating the inclusion of categorical variables and interaction terms. The functions implemented in the package 'rsurv' can also be employed to simulate survival data with more complex structures, such as survival data with different types of censoring mechanisms, survival data with cure fraction, survival data with random effects (frailties), multivariate survival data, and competing risks survival data. Details about the R package 'rsurv' can be found in Demarqui (2024) <doi:10.48550/arXiv.2406.01750>.
Authors: Fabio Demarqui [aut, cre, cph]
Maintainer: Fabio Demarqui <[email protected]>
License: GPL (>= 3)
Version: 0.0.2
Built: 2024-10-26 03:40:03 UTC
Source: CRAN

Help Index


The 'rsurv' package

Description

Random generation of survival data from survival regression models available in the literature, including the accelerated failure time (AFT), proportional hazards (PH), proportional odds (PO), accelerated hazard (AH), Yang and Prentice (YP), and extended hazard (EH) models.


References

Demarqui FN, Mayrink VD (2021). “Yang and Prentice model with piecewise exponential baseline distribution for modeling lifetime data with crossing survival curves.” Brazilian Journal of Probability and Statistics, 35(1), 172-186. doi:10.1214/20-BJPS471.

Yang S, Prentice RL (2005). “Semiparametric analysis of short-term and long-term hazard ratios with two-sample survival data.” Biometrika, 92(1), 1-17.


Implemented link functions for the mixture cure rate model

Description

This function is used to specify different link functions for the count component of the mixture cure rate model.

Usage

bernoulli(link = "logit")

Arguments

link

desired link function; currently implemented links are: logit, probit, cloglog and cauchy.

Value

A list containing the codes associated with the count distribution assumed for the latent variable N and the chosen link.
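
As a rough illustration of what the choice of link changes, the sketch below maps the same linear predictor to a probability under each of the four links, using base R's make.link(). It does not call bernoulli(). The mapping of "cauchy" to base R's "cauchit" link is an assumption, and the object names eta, links and probs are illustrative only.

```r
# Linear predictor values on a small grid (illustrative only).
eta <- c(-2, 0, 2)

# Base R link names; "cauchit" is assumed to match the "cauchy" link above.
links <- c(logit = "logit", probit = "probit",
           cloglog = "cloglog", cauchy = "cauchit")

# Apply each inverse link to eta to obtain probabilities in (0, 1).
probs <- sapply(links, function(l) make.link(l)$linkinv(eta))
rownames(probs) <- paste0("eta=", eta)
round(probs, 3)
```

The symmetric links (logit, probit, cauchy) all give 0.5 at eta = 0 and differ mainly in the tails, while cloglog is asymmetric.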


Inverse of the probability generating function

Description

This function computes the inverse of the probability generating function of the count distribution assumed for the incidence sub-model of a cure rate model.

Usage

inv_pgf(formula, incidence = "bernoulli", kappa = NULL, zeta = NULL, data, ...)

Arguments

formula

formula specifying the linear predictor for the incidence sub-model.

incidence

the desired incidence model.

kappa

vector of regression coefficients associated with the incidence sub-model.

zeta

extra negative-binomial parameter.

data

a data.frame containing the explanatory covariates passed to the formula.

...

further arguments passed to other methods.

Value

A vector with the values of the inverse of the desired probability generating function.
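
To make the idea concrete: if N latent causes compete and each has survival function S(t), the population survival function is A(S(t)), where A is the probability generating function (pgf) of N. Simulation therefore needs the inverse of A. The sketch below writes out A and its inverse for the Bernoulli and Poisson cases in base R. It is not the code behind inv_pgf(); all function names are hypothetical, and the negative binomial case is left out because its parameterization is not stated here.

```r
# pgf of N ~ Bernoulli(theta):  A(s) = 1 - theta + theta * s
pgf_bern     <- function(s, theta) 1 - theta + theta * s
inv_pgf_bern <- function(u, theta) (u - (1 - theta)) / theta

# pgf of N ~ Poisson(theta):    A(s) = exp(-theta * (1 - s))
pgf_pois     <- function(s, theta) exp(-theta * (1 - s))
inv_pgf_pois <- function(u, theta) 1 + log(u) / theta

# Round trip: applying the inverse to A(s) recovers s.
s <- c(0.1, 0.5, 0.9)
back_bern <- inv_pgf_bern(pgf_bern(s, theta = 0.7), theta = 0.7)
back_pois <- inv_pgf_pois(pgf_pois(s, theta = 1.3), theta = 1.3)

# A(0) = P(N = 0) is the cure fraction: a uniform draw u below A(0)
# maps to a negative value, i.e. to no finite survival time.
cure_bern <- pgf_bern(0, theta = 0.7)   # 1 - theta
cure_pois <- pgf_pois(0, theta = 1.3)   # exp(-theta)
```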


Linear predictors

Description

Function to construct linear predictors.

Usage

lp(formula, coefs, data, ...)

Arguments

formula

formula specifying the linear predictors.

coefs

vector of regression coefficients.

data

data frame containing the covariates used to construct the linear predictors.

...

further arguments passed to other methods.

Value

a vector containing the linear predictors.

Examples

library(rsurv)
library(dplyr)

n <- 100
coefs <- c(1, 0.7, 2.3)

simdata <- data.frame(
  age = rnorm(n),
  sex = sample(c("male", "female"), size = n, replace = TRUE)
) |>
  mutate(
    lp = lp(~age+sex, coefs)
  )
glimpse(simdata)
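
For readers who want to see what a formula-based linear predictor amounts to, the base R sketch below builds one by hand with model.matrix(). It assumes the intercept column is kept, which is what the three coefficients for two covariates in the example above suggest; lp() itself may handle this differently. The objects dat, X, eta and eta_manual are illustrative names.

```r
set.seed(1)
n <- 5
coefs <- c(1, 0.7, 2.3)
dat <- data.frame(
  age = rnorm(n),
  sex = factor(c("male", "female", "female", "male", "female"))
)

# Design matrix with columns (Intercept), age and sexmale.
X <- model.matrix(~ age + sex, data = dat)

# Linear predictor: one value per row of the data.
eta <- as.numeric(X %*% coefs)

# The same quantity written out by hand.
eta_manual <- 1 + 0.7 * dat$age + 2.3 * (dat$sex == "male")
```

The formula interface does the dummy coding of sex automatically, which is why categorical covariates and interactions need no manual preparation.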

Implemented link functions for the promotion time cure rate model with negative binomial distribution

Description

This function is used to specify different link functions for the count component of the promotion time cure rate model.

Usage

negbin(zeta = stop("'theta' must be specified"), link = "log")

Arguments

zeta

The known value of the additional parameter.

link

desired link function; currently implemented links are: log, identity and sqrt.

Value

A list containing the codes associated with the count distribution assumed for the latent variable N and the chosen link.


Generic quantile function

Description

Generic quantile function used internally to simulate from an arbitrary baseline survival distribution.

Usage

qsurv(p, baseline, package = NULL, ...)

Arguments

p

vector of probabilities, interpreted as right-tail areas (survival probabilities) of the baseline survival distribution.

baseline

the name of the baseline distribution.

package

the name of the package where the baseline distribution is implemented. It ensures that the right quantile function from the right package is found, regardless of the current R search path.

...

further arguments passed to other methods.

Value

a vector of quantiles.

Examples

library(rsurv)
set.seed(1234567890)


u <- sort(runif(5))
x1 <- qexp(u, rate = 1, lower.tail = FALSE)
x2 <- qsurv(u, baseline = "exp", rate = 1)
x3 <- qsurv(u, baseline = "exp", rate = 1, package = "stats")
x4 <- qsurv(u, baseline = "gengamma.orig", shape=1, scale=1, k=1, package = "flexsurv")

cbind(x1, x2, x3, x4)
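
The lookup by name can be pictured with the base R sketch below. It is a guess at the general mechanism, not the package's source: qsurv_sketch() is a hypothetical helper that prefixes the baseline name with "q", finds the function either on the search path or in a named package, and assumes that the quantile function accepts a lower.tail argument.

```r
qsurv_sketch <- function(p, baseline, package = NULL, ...) {
  qname <- paste0("q", baseline)            # "exp" -> "qexp"
  qfun <- if (is.null(package)) {
    get(qname, mode = "function")           # first match on the search path
  } else {
    getExportedValue(package, qname)        # explicit namespace lookup
  }
  # p is treated as a right-tail area, i.e. a survival probability.
  qfun(p, lower.tail = FALSE, ...)
}

u <- c(0.2, 0.5, 0.9)
y1 <- qsurv_sketch(u, baseline = "exp", rate = 2)
y2 <- qsurv_sketch(u, baseline = "weibull", shape = 1.5, scale = 1,
                   package = "stats")
cbind(y1, y2)
```

Naming the package avoids picking up a different function with the same name that happens to come earlier on the search path.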

Random generation from accelerated failure time models

Description

Function to generate a random sample of survival data from accelerated failure time models.

Usage

raftreg(u, formula, baseline, beta, dist = NULL, package = NULL, data, ...)

Arguments

u

a numeric vector of probabilities in (0, 1), typically drawn with runif(), from which the survival times are generated.

formula

formula specifying the linear predictors.

baseline

the name of the baseline survival distribution.

beta

vector of regression coefficients.

dist

an alternative way to specify the baseline survival distribution.

package

the name of the package where the assumed quantile function is implemented.

data

data frame containing the covariates used to generate the survival times.

...

further arguments passed to other methods.

Value

a numeric vector containing the generated random sample.

Examples

library(rsurv)
library(dplyr)
n <-  1000
simdata <- data.frame(
  age = rnorm(n),
  sex = sample(c("f", "m"), size = n, replace = TRUE)
) %>%
  mutate(
    t = raftreg(runif(n), ~ age+sex, beta = c(1, 2),
                dist = "weibull", shape = 1.5, scale = 1),
    c = runif(n, 0, 10)
  ) %>%
  rowwise() %>%
  mutate(
    time = min(t, c),
    status = as.numeric(time == t)
  )
glimpse(simdata)
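
The inverse-transform idea behind AFT generation can be sketched in base R as follows. The sketch assumes the common convention S(t | x) = S0(t * exp(-eta)), with eta the linear predictor, so that covariates simply rescale a baseline draw; raftreg() may use a different sign convention or intercept handling. All object names are illustrative.

```r
set.seed(42)
n <- 5
u <- runif(n)            # uniform draws, read as survival probabilities
x <- rnorm(n)
eta <- 0.5 * x           # linear predictor with a single covariate

# Baseline draw: right-tail Weibull quantile of u.
t0 <- qweibull(u, shape = 1.5, scale = 1, lower.tail = FALSE)

# AFT: covariates rescale the baseline time.
t_aft <- t0 * exp(eta)

# Check: the conditional survival function evaluated at t_aft returns u.
s_check <- pweibull(t_aft * exp(-eta), shape = 1.5, scale = 1,
                    lower.tail = FALSE)
```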

Random generation from accelerated hazard models

Description

Function to generate a random sample of survival data from accelerated hazard models.

Usage

rahreg(u, formula, baseline, beta, dist = NULL, package = NULL, data, ...)

Arguments

u

a numeric vector of probabilities in (0, 1), typically drawn with runif(), from which the survival times are generated.

formula

formula specifying the linear predictors.

baseline

the name of the baseline survival distribution.

beta

vector of regression coefficients.

dist

an alternative way to specify the baseline survival distribution.

package

the name of the package where the assumed quantile function is implemented.

data

data frame containing the covariates used to generate the survival times.

...

further arguments passed to other methods.

Value

a numeric vector containing the generated random sample.

Examples

library(rsurv)
library(dplyr)
n <-  1000
simdata <- data.frame(
  age = rnorm(n),
  sex = sample(c("f", "m"), size = n, replace = TRUE)
) %>%
  mutate(
    t = rahreg(runif(n), ~ age+sex, beta = c(1, 2),
                dist = "weibull", shape = 1.5, scale = 1),
    c = runif(n, 0, 10)
  ) %>%
  rowwise() %>%
  mutate(
    time = min(t, c),
    status = as.numeric(time == t)
  )
glimpse(simdata)
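
A base R sketch of inverse-transform sampling under an accelerated hazard model is given below. It assumes the convention h(t | x) = h0(t * exp(eta)), which gives S(t | x) = S0(t * exp(eta))^exp(-eta); some authors divide by exp(eta) instead, and rahreg() may follow either convention. Object names are illustrative.

```r
set.seed(42)
n <- 5
u <- runif(n)
x <- rnorm(n)
eta <- 0.5 * x

# Solve S0(t * exp(eta))^exp(-eta) = u for t:
#   S0(t * exp(eta)) = u^exp(eta)  =>  t = Q0(u^exp(eta)) * exp(-eta)
t_ah <- qweibull(u^exp(eta), shape = 1.5, scale = 1,
                 lower.tail = FALSE) * exp(-eta)

# Check: plugging t_ah back into S(t | x) recovers u.
s_check <- pweibull(t_ah * exp(eta), shape = 1.5, scale = 1,
                    lower.tail = FALSE)^exp(-eta)
```

Unlike the AFT case, the covariates enter both inside the baseline quantile and as a time rescaling.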

Random generation from extended hazard models

Description

Function to generate a random sample of survival data from extended hazard models.

Usage

rehreg(u, formula, baseline, beta, phi, dist = NULL, package = NULL, data, ...)

Arguments

u

a numeric vector of probabilities in (0, 1), typically drawn with runif(), from which the survival times are generated.

formula

formula specifying the linear predictors.

baseline

the name of the baseline survival distribution.

beta

vector of regression coefficients.

phi

vector of regression coefficients.

dist

an alternative way to specify the baseline survival distribution.

package

the name of the package where the assumed quantile function is implemented.

data

data frame containing the covariates used to generate the survival times.

...

further arguments passed to other methods.

Value

a numeric vector containing the generated random sample.

Examples

library(rsurv)
library(dplyr)
n <-  1000
simdata <- data.frame(
  age = rnorm(n),
  sex = sample(c("f", "m"), size = n, replace = TRUE)
) %>%
  mutate(
    t = rehreg(runif(n), ~ age+sex, beta = c(1, 2), phi = c(-1, 2),
                dist = "weibull", shape = 1.5, scale = 1),
    c = runif(n, 0, 10)
  ) %>%
  rowwise() %>%
  mutate(
    time = min(t, c),
    status = as.numeric(time == t)
  )
glimpse(simdata)
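
The extended hazard model has two linear predictors, and a short base R sketch shows how it nests simpler models. The sketch assumes h(t | x) = h0(t * exp(eta1)) * exp(eta2); which of beta and phi plays which role in rehreg(), and the sign conventions used there, are not stated here, so eh_time() and q0() are hypothetical helpers.

```r
# Right-tail quantile function of a Weibull baseline.
q0 <- function(p) qweibull(p, shape = 1.5, scale = 1, lower.tail = FALSE)

# S(t | x) = S0(t * exp(eta1))^exp(eta2 - eta1), solved for t at S = u.
eh_time <- function(u, eta1, eta2) {
  q0(u^exp(eta1 - eta2)) * exp(-eta1)
}

u   <- c(0.1, 0.4, 0.8)
eta <- c(-0.5, 0.2, 1)

t_ph  <- eh_time(u, eta1 = 0,   eta2 = eta)  # proportional hazards
t_ah  <- eh_time(u, eta1 = eta, eta2 = 0)    # accelerated hazard
t_aft <- eh_time(u, eta1 = eta, eta2 = eta)  # accelerated failure time
```

Setting one predictor to zero, or the two equal, recovers the PH, AH and AFT forms respectively under this parameterization.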

Frailties random generation

Description

Function to generate frailties, which add a simple random-effects term to the linear predictor of a given survival regression model.

Usage

rfrailty(
  cluster,
  frailty = c("gamma", "gaussian", "ps"),
  sigma = 1,
  alpha = NULL,
  ...
)

Arguments

cluster

a vector determining the grouping of subjects (always converted to a factor object internally).

frailty

the frailty distribution; the current implementation includes the gamma (default), lognormal (selected with "gaussian") and positive stable ("ps") distributions.

sigma

standard deviation assumed for the frailty distribution; sigma = 1 by default; this value is ignored for the positive stable (ps) distribution.

alpha

stability parameter of the positive stable distribution; alpha must lie in the interval (0, 1), and NA is returned otherwise.

...

further arguments passed to other methods.

Value

a vector with the generated frailties.
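
The cluster-level structure can be sketched in base R: one random effect is drawn per cluster and then repeated for every subject in that cluster. The sketch uses a gamma frailty with mean 1 and variance sigma^2, which is an assumed parameterization; whether rfrailty() returns the frailty itself or its logarithm is not stated here. Object names are illustrative.

```r
set.seed(123)
cluster <- factor(rep(c("a", "b", "c"), times = c(2, 3, 1)))
sigma <- 0.8

# One gamma frailty per cluster level, with mean 1 and variance sigma^2.
w <- rgamma(nlevels(cluster), shape = 1 / sigma^2, rate = 1 / sigma^2)

# Expand to one value per subject by indexing with the integer codes.
frailty <- w[as.integer(cluster)]

# On the log scale the frailty acts as a random intercept that can be
# added to a linear predictor.
log_frailty <- log(frailty)

data.frame(cluster, frailty, log_frailty)
```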


Random generation of type I and type II interval censored survival data

Description

Function to generate a random sample of type I and type II interval censored survival data.

Usage

rinterval(time, tau, type = c("I", "II"), prob)

Arguments

time

a numeric vector of survival times.

tau

either a vector of censoring times (for type I interval-censored survival data) or the time grid of scheduled visits (for type II interval-censored survival data).

type

type of interval-censored survival data (I or II).

prob

attendance probability of each scheduled visit (for example, 0.5); ignored when type = "I".

Value

a data.frame containing the generated random sample.
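
The two censoring schemes can be sketched in base R as follows. Under type I (current status) each subject is inspected once at tau; under type II each subject attends every scheduled visit independently with probability prob, and the event is bracketed by the nearest attended visits. The helper functions and the output column names left and right are hypothetical and need not match what rinterval() returns.

```r
# Type I: a single inspection time per subject.
interval_type1 <- function(time, tau) {
  data.frame(left  = ifelse(time <= tau, 0, tau),
             right = ifelse(time <= tau, tau, Inf))
}

# Type II: a common grid of visits, each attended with probability prob.
interval_type2 <- function(time, tau, prob) {
  n <- length(time)
  left <- numeric(n)
  right <- rep(Inf, n)
  for (i in seq_len(n)) {
    attended <- tau[runif(length(tau)) < prob]
    before <- attended[attended < time[i]]
    after  <- attended[attended >= time[i]]
    if (length(before) > 0) left[i]  <- max(before)
    if (length(after) > 0)  right[i] <- min(after)
  }
  data.frame(left = left, right = right)
}

d1 <- interval_type1(time = c(1, 5), tau = c(2, 3))
d2 <- interval_type2(time = c(0.5, 2.3, 7), tau = 1:5, prob = 1)
```

With prob = 1 every visit is attended, so the intervals are as narrow as the grid allows; lowering prob widens them at random.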


Random generation from proportional hazards models

Description

Function to generate a random sample of survival data from proportional hazards models.

Usage

rphreg(u, formula, baseline, beta, dist = NULL, package = NULL, data, ...)

Arguments

u

a numeric vector of probabilities in (0, 1), typically drawn with runif(), from which the survival times are generated.

formula

formula specifying the linear predictors.

baseline

the name of the baseline survival distribution.

beta

vector of regression coefficients.

dist

an alternative way to specify the baseline survival distribution.

package

the name of the package where the assumed quantile function is implemented.

data

data frame containing the covariates used to generate the survival times.

...

further arguments passed to other methods.

Value

a numeric vector containing the generated random sample.

Examples

library(rsurv)
library(dplyr)
n <-  1000
simdata <- data.frame(
  age = rnorm(n),
  sex = sample(c("f", "m"), size = n, replace = TRUE)
) %>%
  mutate(
    t = rphreg(runif(n), ~ age+sex, beta = c(1, 2),
                dist = "weibull", shape = 1.5, scale = 1),
    c = runif(n, 0, 10)
  ) %>%
  rowwise() %>%
  mutate(
    time = min(t, c),
    status = as.numeric(time == t)
  )
glimpse(simdata)
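
Under proportional hazards, S(t | x) = S0(t)^exp(eta), so a uniform draw u is turned into a survival time by evaluating the baseline right-tail quantile function at u^exp(-eta). The base R sketch below illustrates this with a Weibull baseline; it is a generic inverse-transform example, not the internals of rphreg(), and the object names are illustrative.

```r
set.seed(7)
n <- 5
u <- runif(n)
x <- rnorm(n)
eta <- 0.5 * x

# S0(t)^exp(eta) = u  =>  S0(t) = u^exp(-eta)
t_ph <- qweibull(u^exp(-eta), shape = 1.5, scale = 1, lower.tail = FALSE)

# Closed form for this baseline, where S0(t) = exp(-t^1.5).
t_closed <- (-log(u) * exp(-eta))^(1 / 1.5)

# Check: plugging t_ph back into S(t | x) recovers u.
s_check <- pweibull(t_ph, shape = 1.5, scale = 1,
                    lower.tail = FALSE)^exp(eta)
```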

Random generation from proportional odds models

Description

Function to generate a random sample of survival data from proportional odds models.

Usage

rporeg(u, formula, baseline, beta, dist = NULL, package = NULL, data, ...)

Arguments

u

a numeric vector of probabilities in (0, 1), typically drawn with runif(), from which the survival times are generated.

formula

formula specifying the linear predictors.

baseline

the name of the baseline survival distribution.

beta

vector of regression coefficients.

dist

an alternative way to specify the baseline survival distribution.

package

the name of the package where the assumed quantile function is implemented.

data

data frame containing the covariates used to generate the survival times.

...

further arguments passed to other methods.

Value

a numeric vector containing the generated random sample.

Examples

library(rsurv)
library(dplyr)
n <-  1000
simdata <- data.frame(
  age = rnorm(n),
  sex = sample(c("f", "m"), size = n, replace = TRUE)
) %>%
  mutate(
    t = rporeg(runif(n), ~ age+sex, beta = c(1, 2),
                dist = "weibull", shape = 1.5, scale = 1),
    c = runif(n, 0, 10)
  ) %>%
  rowwise() %>%
  mutate(
    time = min(t, c),
    status = as.numeric(time == t)
  )
glimpse(simdata)
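
For the proportional odds model, a base R sketch of the inverse transform is shown below. It assumes that the odds of failure, (1 - S) / S, are multiplied by exp(eta) relative to the baseline; rporeg() may define the odds or the sign of eta differently. Object names are illustrative.

```r
set.seed(7)
n <- 5
u <- runif(n)
x <- rnorm(n)
eta <- 0.5 * x

# Set S(t | x) = u and solve for the baseline odds of failure:
#   (1 - u) / u = exp(eta) * (1 - S0) / S0
odds0 <- exp(-eta) * (1 - u) / u
s0 <- 1 / (1 + odds0)

# Survival time: baseline right-tail quantile at s0.
t_po <- qweibull(s0, shape = 1.5, scale = 1, lower.tail = FALSE)

# Check: rebuild S(t | x) from the baseline and compare with u.
S0_at_t <- pweibull(t_po, shape = 1.5, scale = 1, lower.tail = FALSE)
s_check <- 1 / (1 + exp(eta) * (1 - S0_at_t) / S0_at_t)
```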

Random generation from Yang and Prentice models

Description

Function to generate a random sample of survival data from Yang and Prentice models.

Usage

rypreg(u, formula, baseline, beta, phi, dist = NULL, package = NULL, data, ...)

Arguments

u

a numeric vector of probabilities in (0, 1), typically drawn with runif(), from which the survival times are generated.

formula

formula specifying the linear predictors.

baseline

the name of the baseline survival distribution.

beta

vector of short-term regression coefficients.

phi

vector of long-term regression coefficients.

dist

an alternative way to specify the baseline survival distribution.

package

the name of the package where the assumed quantile function is implemented.

data

data frame containing the covariates used to generate the survival times.

...

further arguments passed to other methods.

Value

a numeric vector containing the generated random sample.

Examples

library(rsurv)
library(dplyr)
n <-  1000
simdata <- data.frame(
  age = rnorm(n),
  sex = sample(c("f", "m"), size = n, replace = TRUE)
) %>%
  mutate(
    t = rypreg(runif(n), ~ age+sex, beta = c(1, 2), phi = c(-1, 2),
                dist = "weibull", shape = 1.5, scale = 1),
    c = runif(n, 0, 10)
  ) %>%
  rowwise() %>%
  mutate(
    time = min(t, c),
    status = as.numeric(time == t)
  )
glimpse(simdata)
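
The role of the short-term and long-term coefficients can be illustrated with a base R sketch. It assumes the usual Yang and Prentice form S(t | x) = [1 + (lambda / theta) * R0(t)]^(-theta), where lambda = exp(short-term predictor), theta = exp(long-term predictor) and R0(t) = (1 - S0(t)) / S0(t) is the baseline odds of failure. The helpers yp_time() and q0() are hypothetical and are not the code used by rypreg().

```r
# Right-tail quantile function of a Weibull baseline.
q0 <- function(p) qweibull(p, shape = 1.5, scale = 1, lower.tail = FALSE)

yp_time <- function(u, eta_short, eta_long) {
  lambda <- exp(eta_short)
  theta  <- exp(eta_long)
  # Solve [1 + (lambda / theta) * R0]^(-theta) = u for the baseline odds R0.
  r0 <- (theta / lambda) * (u^(-1 / theta) - 1)
  q0(1 / (1 + r0))
}

u   <- c(0.1, 0.4, 0.8)
eta <- c(-0.5, 0.2, 1)

t_yp <- yp_time(u, eta_short = eta, eta_long = -eta)  # crossing curves
t_po <- yp_time(u, eta_short = eta, eta_long = 0)     # reduces to PO
t_ph <- yp_time(u, eta_short = eta, eta_long = eta)   # reduces to PH
```

The hazard ratio moves from lambda near time zero to theta in the long run, which is what allows survival curves to cross when the two predictors have opposite signs.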