Title: | A Pseudo-Observations Approach for Analyzing Survival Data with a Cure Fraction |
---|---|
Description: | A collection of easy-to-use tools for regression analysis of survival data with a cure fraction proposed in Su et al. (2022) <doi:10.1177/09622802221108579>. The modeling framework is based on the Cox proportional hazards mixture cure model and the bounded cumulative hazard (promotion time cure) model. The pseudo-observations approach is utilized to assess covariate effects and embedded in the variable selection procedure. |
Authors: | Sy Han (Steven) Chiou [aut, cre], Chien-Lin Su [aut], Feng-Chang Lin [aut] |
Maintainer: | Sy Han (Steven) Chiou <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.0.0 |
Built: | 2025-02-06 15:25:36 UTC |
Source: | CRAN |
A collection of easy-to-use tools for regression analysis of survival data with a cure fraction. The modeling framework is based on the Cox proportional hazards mixture cure model and the bounded cumulative hazard model. The pseudo-observations approach is utilized to assess covariate effects and embedded in the variable selection procedure.
Maintainer: Sy Han (Steven) Chiou [email protected]
Authors:
Chien-Lin Su [email protected]
Feng-Chang Lin [email protected]
Fits a generalized estimating equation (GEE) model with
Gaussian family with different link functions.
The geelm
function also supports LASSO or SCAD
regularization.
geelm( formula, data, subset, id, link = c("identity", "log", "cloglog", "logit"), corstr = c("independence", "exchangeable", "ar1"), lambda, exclude, penalty = c("lasso", "scad"), nfolds = 5, nlambda = 200, binit, tol = 1e-07, maxit = 100 )
geelm( formula, data, subset, id, link = c("identity", "log", "cloglog", "logit"), corstr = c("independence", "exchangeable", "ar1"), lambda, exclude, penalty = c("lasso", "scad"), nfolds = 5, nlambda = 200, binit, tol = 1e-07, maxit = 100 )
formula |
A formula object starting with |
data |
An optional data frame that contains the covariates and response variables. |
subset |
An optional logical vector specifying a subset of observations to be used in the fitting process. |
id |
A vector which identifies the clusters. If not specified, each observation is treated as its own cluster. |
link |
A character string specifying the model link function. Available options are
|
corstr |
A character string specifying the correlation structure.
Available options are |
lambda |
An option for specifying the tuning parameter used in penalization.
When this is unspecified or has a |
exclude |
A binary numerical vector specifying which variables to exclude in variable selection.
The length of |
penalty |
A character string specifying the penalty function.
The available options are |
nfolds |
An optional integer value specifying the number of folds. The default value is 5. |
nlambda |
An optional integer value specifying the number of tuning parameters to try
if |
binit |
A optional numerical vector for the initial value. A zero vector is used when not specified. |
tol |
A positive numerical value specifying the absolute
error tolerance in root search. Default at |
maxit |
A positive integer specifying the maximum number of iteration. Default at 100. |
An object of class "geelm"
representing a linear model fit with GEE.
gendat <- function() { id <- gl(50, 4, 200) visit <- rep(1:4, 50) x1 <- rbinom(200, 1, 0.6) x2 <- runif(200, 0, 1) phi <- 1 + 2 * x1 rhomat <- 0.667^outer(1:4, 1:4, function(x, y) abs(x - y)) chol.u <- chol(rhomat) noise <- as.vector(sapply(1:50, function(x) chol.u %*% rnorm(4))) e <- sqrt(phi) * noise y <- 1 + 3 * x1 - 2 * x2 + e dat <- data.frame(y, id, visit, x1, x2) dat } set.seed(1); str(dat <- gendat()) geelm(y ~ x1 + x2, id = id, data = dat, corstr = "ar1")
gendat <- function() { id <- gl(50, 4, 200) visit <- rep(1:4, 50) x1 <- rbinom(200, 1, 0.6) x2 <- runif(200, 0, 1) phi <- 1 + 2 * x1 rhomat <- 0.667^outer(1:4, 1:4, function(x, y) abs(x - y)) chol.u <- chol(rhomat) noise <- as.vector(sapply(1:50, function(x) chol.u %*% rnorm(4))) e <- sqrt(phi) * noise y <- 1 + 3 * x1 - 2 * x2 + e dat <- data.frame(y, id, visit, x1, x2) dat } set.seed(1); str(dat <- gendat()) geelm(y ~ x1 + x2, id = id, data = dat, corstr = "ar1")
This function exclusively returns the Kaplan-Meier survival estimate and the corresponding time points.
It does not provide standard errors or any additional outputs
that are typically included with the survfit()
function.
km(time, status)
km(time, status)
time |
A numeric vector for the observed survival times. |
status |
A numeric vector for the event indicator; 0 indicates right-censoring and 1 indicates events. |
A data frame with the Kaplan-Meier survival estimates, containing:
time |
Time points at which the survival probability is estimated. |
surv |
Estimated survival probability at each time point. |
data(Teeth500) km(Teeth500$time, Teeth500$event)
data(Teeth500) km(Teeth500$time, Teeth500$event)
Performs the Maller-Zhou test.
mzTest(time, status)
mzTest(time, status)
time |
A numeric vector for the observed survival times. |
status |
A numeric vector for the event indicator; 0 indicates right-censoring and 1 indicates events. |
A list containing the Maller-Zhou test results, including the test statistic, p-value, and the number of observed events.
data(Teeth500) mzTest(Teeth500$time, Teeth500$event)
data(Teeth500) mzTest(Teeth500$time, Teeth500$event)
Fits either a mixture cure model or a bounded cumulative hazard (promotion time) model with pseudo-observation approach.
pCure( formula1, formula2, time, status, data, subset, t0, model = c("mixture", "promotion"), nfolds = 5, lambda1 = NULL, exclude1 = NULL, penalty1 = c("lasso", "scad"), lambda2 = NULL, exclude2 = NULL, penalty2 = c("lasso", "scad"), control = list() )
pCure( formula1, formula2, time, status, data, subset, t0, model = c("mixture", "promotion"), nfolds = 5, lambda1 = NULL, exclude1 = NULL, penalty1 = c("lasso", "scad"), lambda2 = NULL, exclude2 = NULL, penalty2 = c("lasso", "scad"), control = list() )
formula1 |
A formula object starting with |
formula2 |
A formula object starting with |
time |
A numeric vector for the observed survival times. |
status |
A numeric vector for the event indicator; 0 indicates right-censoring and 1 indicates events. |
data |
An optional data frame that contains the covariates and response variables
( |
subset |
An optional logical vector specifying a subset of observations to be used in the fitting process. |
t0 |
A vector of times, where the pseudo-observations are constructed. When not specified, the default values are the 10, 20, ..., 90th percentiles of uncensored event times. |
model |
A character string specifying the underlying model.
The available functional form are |
nfolds |
An optional integer value specifying the number of folds. The default value is 5. |
lambda1 , lambda2
|
An option for specifying the tuning parameter used in penalization.
When this is unspecified or has a |
exclude1 , exclude2
|
A character string specifying which variables to exclude from variable selection. Variables matching elements in this string will not be penalized during the variable selection process. in variable selection. |
penalty1 , penalty2
|
A character string specifying the penalty function.
The available options are |
control |
A list of control parameters. See detail. |
An object of class "pCure"
representing a cure model fit.
Su, C.-L., Chiou, S., Lin, F.-C., and Platt, R. W. (2022) Analysis of survival data with cure fraction and variable selection: A pseudo-observations approach Statistical Methods in Medical Research, 31(11): 2037–2053.
## Function to generate simulated data under the PHMC model simMC <- function(n) { p <- 10 a <- c(1, 0, -1, 0, 0, 0, 0, 0, 0, 0) # incidence coefs. b <- c(-1, 0, 1, 0, 0, 0, 0, 0, 0, 0) # latency coefs. X <- data.frame(x = matrix(runif(n * p), n)) X$x.3 <- 1 * (X$x.3 > .5) X$x.4 <- 1 * (X$x.4 > .5) X[,5:10] <- apply(X[,5:10], 2, qnorm) time <- -3 * exp(-colSums(b * t(X))) * log(runif(n)) cure.prob <- 1 / (1 + exp(-2 - colSums(a * t(X)))) Y <- rbinom(n, 1, cure.prob) cen <- rexp(n, .02) dat <- NULL dat$Time <- pmin(time / Y, cen) dat$Status <- 1 * (dat$Time == time) data.frame(dat, X) } ## Fix seed and generate data set.seed(1); datMC <- simMC(200) ## Oracle model with an unpenalized PHMC model summary(fit1 <- pCure(~ x.1 + x.3, ~ x.1 + x.3, Time, Status, datMC)) ## Penalized PHMC model with tuning parameters selected by 10-fold cross validation ## User specifies the range of tuning parameters summary(fit2 <- pCure(~ ., ~ ., Time, Status, datMC, lambda1 = 1:10 / 10, lambda2 = 1:10 / 10)) ## Penalized PHMC model given tuning parameters summary(update(fit2, lambda1 = 0.7, lambda2 = 0.4))
## Function to generate simulated data under the PHMC model simMC <- function(n) { p <- 10 a <- c(1, 0, -1, 0, 0, 0, 0, 0, 0, 0) # incidence coefs. b <- c(-1, 0, 1, 0, 0, 0, 0, 0, 0, 0) # latency coefs. X <- data.frame(x = matrix(runif(n * p), n)) X$x.3 <- 1 * (X$x.3 > .5) X$x.4 <- 1 * (X$x.4 > .5) X[,5:10] <- apply(X[,5:10], 2, qnorm) time <- -3 * exp(-colSums(b * t(X))) * log(runif(n)) cure.prob <- 1 / (1 + exp(-2 - colSums(a * t(X)))) Y <- rbinom(n, 1, cure.prob) cen <- rexp(n, .02) dat <- NULL dat$Time <- pmin(time / Y, cen) dat$Status <- 1 * (dat$Time == time) data.frame(dat, X) } ## Fix seed and generate data set.seed(1); datMC <- simMC(200) ## Oracle model with an unpenalized PHMC model summary(fit1 <- pCure(~ x.1 + x.3, ~ x.1 + x.3, Time, Status, datMC)) ## Penalized PHMC model with tuning parameters selected by 10-fold cross validation ## User specifies the range of tuning parameters summary(fit2 <- pCure(~ ., ~ ., Time, Status, datMC, lambda1 = 1:10 / 10, lambda2 = 1:10 / 10)) ## Penalized PHMC model given tuning parameters summary(update(fit2, lambda1 = 0.7, lambda2 = 0.4))
This function provides the fitting options for the pCure()
function.
pCure.control( binit1 = NULL, binit2 = NULL, corstr = c("independence", "exchangeable", "ar1"), nlambda1 = 100, nlambda2 = 100, tol = 1e-07, maxit = 100 )
pCure.control( binit1 = NULL, binit2 = NULL, corstr = c("independence", "exchangeable", "ar1"), nlambda1 = 100, nlambda2 = 100, tol = 1e-07, maxit = 100 )
binit1 |
Initial value for the first component. A zero vector will be used if not specified. |
binit2 |
Initial value for the second component A zero vector will be used if not specified. |
corstr |
A character string specifying the correlation structure.
The following are permitted: |
nlambda1 , nlambda2
|
An integer value specifying the number of lambda.
This is only evoked when |
tol |
A positive numerical value specifying the absolute error tolerance in GEE algorithms. |
maxit |
An integer value specifying the maximum number of iteration. |
A list with control parameters.
Plot method for 'geelm' objects
## S3 method for class 'geelm' plot(x, type = c("residuals", "cv", "trace"), ...)
## S3 method for class 'geelm' plot(x, type = c("residuals", "cv", "trace"), ...)
x |
An object of class 'pCure', usually returned by the 'pCure()' function. |
type |
A character string specifying the type of plot to generate. Available options are "residuals," "cv," and "trace," which correspond to the pseudo-residual plot, cross-validation plot, and trace plot for different values of the tuning parameter, respectively. |
... |
Other arguments for future extension. |
A ggplot object representing the residual plot, cross-validation plot,
or the trace plot for an object of class "geelm"
.
This can be further modified using "ggplot2"
functions.
Plot method for 'pCure' objects
## S3 method for class 'pCure' plot(x, part = "both", type = c("residuals", "cv", "trace"), ...)
## S3 method for class 'pCure' plot(x, part = "both", type = c("residuals", "cv", "trace"), ...)
x |
An object of class 'pCure', usually returned by the 'pCure()' function. |
part |
A character string specifies which component of the cure model to plot. The default is "both", which plots both the incidence and latency components if a mixture cure model was fitted, or both the long- and short-term effects if a promotion time model was fitted. |
type |
A character string specifying the type of plot to generate. Available options are "residuals," "cv," and "trace," which correspond to the pseudo-residual plot, cross-validation plot, and trace plot for different values of the tuning parameter, respectively. |
... |
Other arguments for future extension. |
A ggplot object representing the residual plot, cross-validation plot,
or the trace plot for an object of class "pCure"
.
This can be further modified using "ggplot2"
functions.
Data on the survival of teeth with many predictors
data(Teeth500)
data(Teeth500)
A data frame containing the following variables:
tooth survival time subject to right censoring.
Tooth loss status: 1 = lost, 0 = not lost.
Molar indicator; 1 = molar tooth, 0 = non-molar tooth.
Mobility score, on a scale from 0 to 5.
Bleeding on probing, expressed as a percentage.
Plaque score, expressed as a percentage.
Periodontal probing depth.
Clinical Attachment Level.
Free Gingival Margin.
Number of filled surfaces.
New decayed surfaces.
Recurrent decayed surfaces.
Crown indicator; 1 = tooth has a crown, 0 = no crown.
Endodontic therapy indicator; 1 = endo therapy performed, 0 = no endo therapy.
Filled tooth indicator; 1 = filled, 0 = not filled.
Decayed tooth indicator; 1 = decayed, 0 = not decayed.
Total number of teeth.
Gender; 1 = male, 0 = female
Diabetes indicator; 1 = diabetes, 0 = no diabetes.
Tobacco use indicator; 1 = had tobacco use, 0 = never had tobacco use.
A data frame with 500 observations and 20 variables.
The data is a subset of the original dataset included in the MST
package
under the name Teeth
.
This subset contains the time to the first tooth loss due to periodontal reasons.
Calhoun, Peter and Su, Xiaogang and Nunn, Martha and Fan, Juanjuan (2018) Constructing Multivariate Survival Trees: The MST Package for R. Journal of Statistical Software, 83(12).