Title: | Accelerated Failure Time Model with Generalized Estimating Equations |
---|---|
Description: | A collection of methods for both the rank-based estimates and least-squares estimates of the Accelerated Failure Time (AFT) model. For rank-based estimation, it provides approaches that include the computationally efficient Gehan's weight and general weights such as the logrank weight. Details of the rank-based estimation can be found in Chiou et al. (2014) <doi:10.1007/s11222-013-9388-2> and Chiou et al. (2015) <doi:10.1002/sim.6415>. For the least-squares estimation, the estimating equation is solved with generalized estimating equations (GEE). Moreover, in multivariate cases, the working correlation structure can be specified in the GEE setting. Details on the least-squares estimation can be found in Chiou et al. (2014) <doi:10.1007/s10985-014-9292-x>. |
Authors: | Sy Han Chiou [aut, cre], Sangwook Kang [aut], Jun Yan [aut] |
Maintainer: | Sy Han Chiou <[email protected]> |
License: | GPL (>= 3) |
Version: | 1.2.1 |
Built: | 2024-12-07 06:28:02 UTC |
Source: | CRAN |
A package that uses generalized estimating equations (GEE) to estimate the multivariate accelerated failure time (AFT) model. This package implements recently developed inference procedures for AFT models with both the rank-based approach and the least-squares approach. For the rank-based approach, the package allows various weight choices and uses an induced smoothing procedure that leads to much more efficient computation than the linear programming method. With the rank-based estimator as an initial value, the generalized estimating equation approach is used as an extension of the least-squares approach to the multivariate case. Additional sampling weights are incorporated to handle data that are missing by design, as in case-cohort studies or general sampling schemes.
Maintainer: Sy Han Chiou [email protected]
Authors:
Sangwook Kang
Jun Yan
Chiou, S., Kim, J. and Yan, J. (2014) Marginal Semiparametric Multivariate Accelerated Failure Time Model with Generalized Estimating Equation. Lifetime Data Analysis, 20(4): 599–618.
Chiou, S., Kang, S. and Yan, J. (2014) Fast Accelerated Failure Time Modeling for Case-Cohort Data. Statistics and Computing, 24(4): 559–568.
Chiou, S., Kang, S. and Yan, J. (2014) Fitting Accelerated Failure Time Model in Routine Survival Analysis with R Package aftgee. Journal of Statistical Software, 61(11): 1–23.
Huang, Y. (2002) Calibration Regression of Censored Lifetime Medical Cost. Journal of the American Statistical Association, 97, 318–327.
Jin, Z., Lin, D. Y. and Ying, Z. (2006) On Least-squares Regression with Censored Data. Biometrika, 90, 341–353.
Johnson, L. M. and Strawderman, R. L. (2009) Induced Smoothing for the Semiparametric Accelerated Failure Time Model: Asymptotics and Extensions to Clustered Data. Biometrika, 96, 577–590.
Zeng, D. and Lin, D. Y. (2008) Efficient Resampling Methods for Nonsmooth Estimating Functions. Biostatistics, 9, 355–363.
Fits a semiparametric accelerated failure time (AFT) model with the least-squares approach. The generalized estimating equation (GEE) approach is extended to multivariate AFT modeling, accounting for multivariate dependence through working correlation structures to improve efficiency.
aftgee(
  formula, data, subset,
  id = NULL, contrasts = NULL, weights = NULL, margin = NULL,
  corstr = c("independence", "exchangeable", "ar1", "unstructured", "userdefined", "fixed"),
  binit = "srrgehan",
  B = 100,
  control = aftgee.control()
)
formula | a formula expression, of the form |
data | an optional data.frame in which to interpret the variables occurring in the |
subset | an optional vector specifying a subset of observations to be used in the fitting process. |
id | an optional vector used to identify the clusters. If missing, then each individual row of |
contrasts | an optional list. |
weights | an optional vector of observation weights. |
margin | a |
corstr | a character string specifying the correlation structure. The following are permitted: "independence", "exchangeable", "ar1", "unstructured", "userdefined", "fixed". |
binit | an optional argument, either a numeric vector or a character string, specifying the initial slope estimator. The default value is "srrgehan". |
B | a numeric value specifying the number of resamples. When B = 0, only the beta estimate will be displayed. |
control | controls maxiter and tolerance. |
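To make the `corstr` options more concrete, the following is a minimal sketch of what three of the named working correlation structures look like for a single cluster. The helper `working_corr` and the association parameter `alpha` are hypothetical names introduced for illustration; they are not part of aftgee, and the package's internal construction may differ.

```r
## Hypothetical helper (not part of aftgee): working correlation matrix
## for one cluster of size m with association parameter alpha.
working_corr <- function(m, alpha = 0,
                         corstr = c("independence", "exchangeable", "ar1")) {
  corstr <- match.arg(corstr)
  switch(corstr,
    ## independence: no within-cluster correlation
    independence = diag(m),
    ## exchangeable: the same correlation alpha for every pair
    exchangeable = {
      R <- matrix(alpha, m, m)
      diag(R) <- 1
      R
    },
    ## ar1: correlation decays as alpha^|i - j|
    ar1 = alpha^abs(outer(seq_len(m), seq_len(m), "-"))
  )
}

R_ex  <- working_corr(3, alpha = 0.3, corstr = "exchangeable")
R_ar1 <- working_corr(3, alpha = 0.5, corstr = "ar1")
print(R_ex)
print(R_ar1)
```

Under the standard GEE definitions assumed here, "exchangeable" uses one parameter for all pairs within a cluster, while "ar1" suits responses with a natural ordering.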
An object of class "aftgee" representing the fit.
The aftgee object is a list containing at least the following components:
a vector of initial value and a vector of point estimates
a vector of point estimates
estimated covariance matrix
a vector of initial value
estimated initial covariance matrix
a character string specifying the initial estimator.
An integer code indicating type of convergence after GEE iteration. 0 indicates successful convergence; 1 indicates that the iteration limit maxit has been reached.
An integer code indicating type of convergence for initial value. 0 indicates successful convergence; 1 indicates that the iteration limit maxit has been reached.
An integer code indicating the step until convergence
Chiou, S., Kim, J. and Yan, J. (2014) Marginal Semiparametric Multivariate Accelerated Failure Time Model with Generalized Estimating Equation. Lifetime Data Analysis, 20(4): 599–618.
Jin, Z., Lin, D. Y. and Ying, Z. (2006) On Least-squares Regression with Censored Data. Biometrika, 90, 341–353.
## Simulate data from an AFT model with possibly dependent responses
datgen <- function(n = 100, tau = 0.3, dim = 2) {
  x1 <- rbinom(dim * n, 1, 0.5)
  x2 <- rnorm(dim * n)
  e <- c(t(exp(MASS::mvrnorm(n = n, mu = rep(0, dim),
                             Sigma = tau + (1 - tau) * diag(dim)))))
  tt <- exp(2 + x1 + x2 + e)
  cen <- runif(n, 0, 100)
  data.frame(Time = pmin(tt, cen), status = 1 * (tt < cen),
             x1 = x1, x2 = x2, id = rep(1:n, each = dim))
}

set.seed(1)
dat <- datgen(n = 50, dim = 2)
fm <- Surv(Time, status) ~ x1 + x2
fit1 <- aftgee(fm, data = dat, id = id, corstr = "ind")
fit2 <- aftgee(fm, data = dat, id = id, corstr = "ex")
summary(fit1)
summary(fit2)
confint(fit1)
confint(fit2)
Auxiliary function as user interface for aftgee and aftsrr fitting.
aftgee.control(
  maxiter = 50,
  reltol = 0.001,
  trace = FALSE,
  seIni = FALSE,
  parallel = FALSE,
  parCl = parallel::detectCores()/2,
  gp.pwr = -999
)
maxiter | maximum number of iterations. |
reltol | relative error tolerance. |
trace | a logical value indicating whether to display output for each iteration. |
seIni | a logical value indicating whether a new rank-based initial value is computed for each resampling sample in variance estimation. |
parallel | a logical value indicating whether parallel computing is used for resampling and bootstrap. |
parCl | an integer value indicating the number of CPU cores used when |
gp.pwr | a numerical value indicating the GP parameter when |
When trace is TRUE, output for each iteration is printed to the screen.
A list with the arguments as components.
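As an illustration of how the control list might be built, here is a minimal sketch that tightens the tolerance and raises the iteration limit. It assumes the aftgee package is installed and, following the statement above, that the returned list uses the argument names as component names; the specific values are arbitrary.

```r
## Sketch only: runs the aftgee call when the package is available.
if (requireNamespace("aftgee", quietly = TRUE)) {
  ## Allow more iterations and use a stricter relative tolerance
  ctrl <- aftgee::aftgee.control(maxiter = 100, reltol = 1e-4)
  ## Inspect two of the components returned in the list
  str(ctrl[c("maxiter", "reltol")])
}
```

The resulting list can then be supplied through the `control` argument shown in the aftgee usage, for example `control = ctrl`.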
Fits a semiparametric accelerated failure time (AFT) model with the rank-based approach.
General weights, additional sampling weights, and fast sandwich variance estimation are also incorporated.
Estimating equations are solved with the Barzilai-Borwein spectral method implemented as BBsolve in package BB.
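To show what a rank-based estimating function looks like, the following is a minimal sketch of the Gehan-weighted estimating function and an induced-smoothed counterpart in base R. It is not the package's implementation: the helper name `gehan_ee`, the O(n^2) loop, and the use of an identity matrix in the smoothing scale r_ij are assumptions made for illustration.

```r
## Hypothetical helper (not the aftgee implementation).
## U(beta) = sum_i sum_j delta_i (X_i - X_j) w_ij, where
##   nonsmooth: w_ij = I(e_j >= e_i)
##   smoothed:  w_ij = pnorm((e_j - e_i) / r_ij),
##              r_ij^2 = (X_i - X_j)'(X_i - X_j) / n  (identity scale assumed)
gehan_ee <- function(beta, logT, delta, X, smooth = TRUE) {
  n <- nrow(X)
  e <- as.vector(logT - X %*% beta)          # residuals on the log scale
  U <- numeric(ncol(X))
  for (i in seq_len(n)) {
    if (delta[i] == 0) next                  # censored rows do not contribute
    dX <- matrix(X[i, ], n, ncol(X), byrow = TRUE) - X  # row j holds X_i - X_j
    if (smooth) {
      r <- sqrt(rowSums(dX^2) / n)
      ## when r is 0 the row of dX is all zeros, so the weight is irrelevant
      w <- ifelse(r > 0, pnorm((e - e[i]) / r), as.numeric(e >= e[i]))
    } else {
      w <- as.numeric(e >= e[i])
    }
    U <- U + colSums(dX * w)                 # row j of dX is scaled by w[j]
  }
  U
}

## Toy data in the style of the package examples
set.seed(1)
n <- 100
X <- cbind(x1 = rbinom(n, 1, 0.5), x2 = rnorm(n))
tt <- exp(2 + X[, "x1"] + X[, "x2"] + rnorm(n))
cen <- runif(n, 0, 100)
logT <- log(pmin(tt, cen))
delta <- 1 * (tt < cen)

u_ns <- gehan_ee(c(1, 1), logT, delta, X, smooth = FALSE)
u_is <- gehan_ee(c(1, 1), logT, delta, X, smooth = TRUE)
print(rbind(nonsmooth = u_ns, smoothed = u_is))
```

The nonsmooth version is a step function of beta, which is why root finding on it is awkward; the smoothed version replaces each indicator with a normal distribution function and is differentiable in beta.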
aftsrr(
  formula, data, subset,
  id = NULL, contrasts = NULL, weights = NULL,
  B = 100,
  rankWeights = c("gehan", "logrank", "PW", "GP", "userdefined"),
  eqType = c("is", "ns", "mis", "mns"),
  se = c("NULL", "bootstrap", "MB", "ZLCF", "ZLMB", "sHCF", "sHMB", "ISCF", "ISMB"),
  control = list()
)
formula | a formula expression, of the form |
data | an optional data frame in which to interpret the variables occurring in the |
subset | an optional vector specifying a subset of observations to be used in the fitting process. |
id | an optional vector used to identify the clusters. If missing, then each individual row of |
contrasts | an optional list. |
weights | an optional vector of observation weights. |
B | a numeric value specifying the number of resamples. When |
rankWeights | a character string specifying the type of general weights. The following are permitted: "gehan", "logrank", "PW", "GP", "userdefined". |
eqType | a character string specifying the type of the estimating equation used to obtain the regression parameters. The following are permitted: "is", "ns", "mis", "mns". |
se | a character string specifying the estimating method for the variance-covariance matrix. The following are permitted: "NULL", "bootstrap", "MB", "ZLCF", "ZLMB", "sHCF", "sHMB", "ISCF", "ISMB". |
control | controls the equation solver, maxiter, tolerance, and resampling variance estimation. The available equation solvers are . Readers are referred to the BB package for details. Instead of searching for the zero crossing, options including |
When se = "bootstrap" or se = "MB", the variance-covariance matrix is estimated by bootstrap.
Bootstrap samples that fail to converge are removed when computing the empirical variance matrix.
When bootstrap is not called, we assume the variance-covariance matrix has a sandwich form $\Sigma = A^{-1} V (A^{-1})^{T}$, where $V$ is the asymptotic variance of the estimating function and $A$ is the slope matrix.
In this package, we provide several methods to estimate the variance-covariance matrix via this sandwich form, depending on how $V$ and $A$ are estimated.
Specifically, the asymptotic variance, $V$, can be estimated by either a closed-form formulation (CF) or through bootstrapping the estimating equations (MB).
On the other hand, the methods to estimate the slope matrix $A$ are the induced smoothing approach (IS), Zeng and Lin's approach (ZL), and the smoothed Huang's approach (sH).
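The sandwich form described above can be worked through numerically. The sketch below uses made-up 2 x 2 inputs, with `A` standing for the slope matrix and `V` for the asymptotic variance of the estimating function; the symbols and values are assumptions for illustration and do not come from a fitted model.

```r
## Toy inputs chosen only for illustration
A <- matrix(c(2.0, 0.5,
              0.5, 1.0), 2, 2)   # slope matrix of the estimating function
V <- matrix(c(1.0, 0.2,
              0.2, 0.5), 2, 2)   # variance of the estimating function

## Sandwich form: A^{-1} V (A^{-1})^T
A_inv <- solve(A)
Sigma <- A_inv %*% V %*% t(A_inv)

## Standard errors are the square roots of the diagonal
se <- sqrt(diag(Sigma))
print(Sigma)
print(se)
```

The suffixes in the `se` options combine one estimator for each piece, so that, for example, "ISMB" pairs the induced smoothing slope estimate with the bootstrap estimate of the variance of the estimating function.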
aftsrr returns an object of class "aftsrr" representing the fit.
An object of class "aftsrr" is a list containing at least the following components:
A vector of beta estimates
A list of covariance estimates
An integer code indicating type of convergence.
0 indicates successful convergence.
1 indicates that the iteration limit maxit has been reached.
2 indicates failure due to stagnation.
3 indicates error in function evaluation.
4 is failure due to exceeding 100 step length reductions in line-search.
5 indicates lack of improvement in objective function.
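When many fits are run in a loop, it can help to translate the convergence code into a message. The sketch below uses a hypothetical helper, `describe_convergence`, that is not part of aftgee or BB. The numbering 0 to 5 is an assumption that follows the order in which the outcomes are listed above and the convention documented for the BB solvers; check the code against the installed package before relying on it.

```r
## Hypothetical helper (not part of aftgee or BB).
## Assumes codes 0-5 in the order the outcomes are listed in the text.
describe_convergence <- function(code) {
  msgs <- c(
    "0" = "successful convergence",
    "1" = "iteration limit reached",
    "2" = "failure due to stagnation",
    "3" = "error in function evaluation",
    "4" = "more than 100 step length reductions in line search",
    "5" = "lack of improvement in objective function"
  )
  key <- as.character(code)
  if (key %in% names(msgs)) unname(msgs[key]) else "unknown code"
}

print(describe_convergence(0))
print(describe_convergence(2))
```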
When se = "MB", bhist gives the bootstrap samples.
Chiou, S., Kang, S. and Yan, J. (2014) Fast Accelerated Failure Time Modeling for Case-Cohort Data. Statistics and Computing, 24(4): 559–568.
Chiou, S., Kang, S. and Yan, J. (2014) Fitting Accelerated Failure Time Model in Routine Survival Analysis with R Package aftgee. Journal of Statistical Software, 61(11): 1–23.
Huang, Y. (2002) Calibration Regression of Censored Lifetime Medical Cost. Journal of the American Statistical Association, 97, 318–327.
Johnson, L. M. and Strawderman, R. L. (2009) Induced Smoothing for the Semiparametric Accelerated Failure Time Model: Asymptotics and Extensions to Clustered Data. Biometrika, 96, 577–590.
Varadhan, R. and Gilbert, P. (2009) BB: An R Package for Solving a Large System of Nonlinear Equations and for Optimizing a High-Dimensional Nonlinear Objective Function. Journal of Statistical Software, 32(4): 1–26.
Zeng, D. and Lin, D. Y. (2008) Efficient Resampling Methods for Nonsmooth Estimating Functions. Biostatistics, 9, 355–363.
## Simulate data from an AFT model
datgen <- function(n = 100) {
  x1 <- rbinom(n, 1, 0.5)
  x2 <- rnorm(n)
  e <- rnorm(n)
  tt <- exp(2 + x1 + x2 + e)
  cen <- runif(n, 0, 100)
  data.frame(Time = pmin(tt, cen), status = 1 * (tt < cen),
             x1 = x1, x2 = x2, id = 1:n)
}

set.seed(1)
dat <- datgen(n = 50)
summary(aftsrr(Surv(Time, status) ~ x1 + x2, data = dat,
               se = c("ISMB", "ZLMB"), B = 10))

## Data set with sampling weights
data(nwtco, package = "survival")
subinx <- sample(1:nrow(nwtco), 668, replace = FALSE)
nwtco$subcohort <- 0
nwtco$subcohort[subinx] <- 1
pn <- mean(nwtco$subcohort)
nwtco$hi <- nwtco$rel + (1 - nwtco$rel) * nwtco$subcohort / pn
nwtco$age12 <- nwtco$age / 12
nwtco$study <- factor(nwtco$study)
nwtco$histol <- factor(nwtco$histol)
sub <- nwtco[subinx, ]
fit <- aftsrr(Surv(edrel, rel) ~ histol + age12 + study, id = seqno,
              weights = hi, data = sub, B = 10, se = c("ISMB", "ZLMB"),
              subset = stage == 4)
summary(fit)
confint(fit)
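The sampling weights in the example above follow the pattern `rel + (1 - rel) * subcohort / pn`. The sketch below reproduces that pattern on made-up data in base R so the resulting weight values are easy to see; the variable names `event`, `subcohort`, and `w`, and the toy sample sizes, are assumptions for illustration.

```r
set.seed(1)
n <- 1000
event <- rbinom(n, 1, 0.1)           # 1 = case, 0 = non-case

## Draw a fixed-size random subcohort, as in the example above
subcohort <- integer(n)
subcohort[sample(n, 200)] <- 1
pn <- mean(subcohort)                # subcohort sampling fraction

## Same weight pattern as the example:
## cases get weight 1, non-cases get subcohort / pn
w <- event + (1 - event) * subcohort / pn

## Tabulate the weights by case status and subcohort membership
print(table(weight = round(w, 2), event = event, subcohort = subcohort))
```

With this pattern every case keeps weight 1, non-cases in the subcohort are weighted up by the inverse of the sampling fraction, and non-cases outside the subcohort get weight 0.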
is.Surv function imported from survival
This function is imported from the survival package. See is.Surv.
Surv function imported from survival
This function is imported from the survival package. See Surv.
Implementation based on MESS::QIC.geeglm
QIC(object)
object |
is a |
## Simulate data from an AFT model with possibly dependent responses
datgen <- function(n = 100, tau = 0.3, dim = 2) {
  x1 <- rbinom(dim * n, 1, 0.5)
  x2 <- rnorm(dim * n)
  e <- c(t(exp(MASS::mvrnorm(n = n, mu = rep(0, dim),
                             Sigma = tau + (1 - tau) * diag(dim)))))
  tt <- exp(2 + x1 + x2 + e)
  cen <- runif(n, 0, 100)
  data.frame(Time = pmin(tt, cen), status = 1 * (tt < cen),
             x1 = x1, x2 = x2, id = rep(1:n, each = dim))
}

set.seed(1)
dat <- datgen(n = 50, dim = 2)
fm <- Surv(Time, status) ~ x1 + x2
fit1 <- aftgee(fm, data = dat, id = id, corstr = "ind")
fit2 <- aftgee(fm, data = dat, id = id, corstr = "ex")
QIC(fit1)
QIC(fit2)