Title: | Data-Driven Estimation for Multi-Threshold Accelerated Failure Time Model |
---|---|
Description: | Develops a data-driven estimation framework for the multi-threshold accelerated failure time (MTAFT) model. The MTAFT model takes a different linear form in each subdomain of the threshold variable, and a major challenge is determining the number of threshold effects. The package implements a data-driven approach based on a Schwarz information criterion, which is consistent under mild conditions. In addition, a cross-validation (CV) criterion with an order-preserved sample-splitting scheme achieves consistent estimation without requiring additional parameters. The asymptotic properties of the parameter estimates are established, and an efficient score-type test is provided to examine the existence of threshold effects. Numerical experiments and theoretical results support the methodology and show reliable performance in finite samples. |
Authors: | Chuang WAN [aut, cre], Hao ZENG [aut], Wei ZHONG [aut], Changliang ZOU [aut] |
Maintainer: | Chuang WAN |
License: | GPL-3 |
Version: | 0.1.0 |
Built: | 2024-12-08 06:49:55 UTC |
Source: | CRAN |
This function implements a cross-validation method for the multi-threshold accelerated failure time (AFT) model. It detects thresholds with either the "WBS" (Wild Binary Segmentation) or the "DP" (Dynamic Programming) algorithm, and selects the number of thresholds by comparing the cross-validation (CV) values across the candidate numbers of thresholds, up to ncps_max.
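To illustrate the order-preserved sample-splitting idea mentioned in the description, the sketch below assumes the observations are sorted by the threshold variable and alternately assigned to odd- and even-indexed halves, so that each half covers the whole range of Tq. The helper `order_preserved_split()` is hypothetical and not part of the package; the package's actual splitting scheme may differ.

```r
# Hypothetical helper (not part of the package): order-preserved split.
# Observations are sorted by the threshold variable Tq and then assigned
# alternately to two halves, so each half spans the whole range of Tq.
order_preserved_split <- function(Tq) {
  ord <- order(Tq)                  # indices of observations sorted by Tq
  list(odd  = ord[c(TRUE, FALSE)],  # 1st, 3rd, 5th, ... in Tq order
       even = ord[c(FALSE, TRUE)])  # 2nd, 4th, 6th, ... in Tq order
}

set.seed(1)
Tq <- runif(10)
halves <- order_preserved_split(Tq)
# In a CV step, a K-threshold model would be fitted on one half and its
# prediction error evaluated on the other, for each candidate K.
halves$odd
halves$even
```

Because both halves interleave along Tq, a threshold located anywhere in the range is represented in each half, which is what makes a fit on one half comparable with the other.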
MTAFT_CV( Y, X, delta, Tq, algorithm, dist_min = 50, ncps_max = 4, wbs_nintervals = 200 )
Y |
the censored logarithm of the failure time. |
X |
the design matrix without the intercept. |
delta |
the censoring indicator. |
Tq |
the observed values of the threshold variable. |
algorithm |
the threshold detection algorithm, either "WBS" or "DP". |
dist_min |
the pre-specified minimum number of observations within each subgroup. Default is 50. |
ncps_max |
the pre-specified maximum number of thresholds. Default is 4. |
wbs_nintervals |
the number of random intervals in the WBS algorithm. Default is 200. |
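In Wild Binary Segmentation, a number of random sub-intervals of the ordered sample (here `wbs_nintervals`) are drawn, and a candidate split is searched for within each of them. The sketch below only draws the intervals. It assumes each interval must hold at least `2 * dist_min` observations so that a split can leave `dist_min` points on both sides. `draw_wbs_intervals()` is a hypothetical helper, not the package's code.

```r
# Hypothetical helper (not the package's code): draw random intervals
# [s, e] of the ordered sample for Wild Binary Segmentation. Intervals
# shorter than 2 * dist_min are rejected, since a split inside them could
# not leave dist_min observations on both sides.
draw_wbs_intervals <- function(n, nintervals = 200, dist_min = 50) {
  stopifnot(n >= 2 * dist_min)
  s <- integer(0)
  e <- integer(0)
  while (length(s) < nintervals) {
    ends <- sort(sample.int(n, 2))       # two distinct random endpoints
    if (ends[2] - ends[1] + 1 >= 2 * dist_min) {
      s <- c(s, ends[1])
      e <- c(e, ends[2])
    }
  }
  cbind(s = s, e = e)
}

set.seed(2024)
iv <- draw_wbs_intervals(n = 500, nintervals = 200, dist_min = 50)
head(iv)
```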
A list with the following components:
params |
the subgroup-specific slope estimates and variance estimates. |
thres |
the threshold estimates. |
CV_vals |
the CV values for each candidate number of thresholds. |
# Generate simulated data with 500 samples and normal error distribution
dataset <- MTAFT_simdata(n = 500, err = "normal")
Y <- dataset[, 1]
delta <- dataset[, 2]
Tq <- dataset[, 3]
X <- dataset[, -c(1:3)]
# Run MTAFT_CV with the WBS algorithm
maft_cv_result <- MTAFT_CV(Y, X, delta, Tq, algorithm = "WBS")
maft_cv_result$params
maft_cv_result$thres
maft_cv_result$CV_vals
This function fits the multi-threshold accelerated failure time (AFT) model and selects the number of thresholds with an information criterion (IC). It estimates the subgroup-specific slope coefficients and variances, as well as the thresholds, using either the "WBS" (Wild Binary Segmentation) or the "DP" (Dynamic Programming) algorithm.
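As a rough illustration of selecting the number of thresholds with an information criterion, the sketch below uses a generic Schwarz-type criterion of the same form as the BIC computed in the TSMCP example later in this manual. It is not the package's IC: that criterion involves the penalty factors c0 and delta0 in a form not given here. The function `select_num_thresholds()` and the toy variances are hypothetical.

```r
# Generic Schwarz-type criterion (an assumption for illustration):
#   BIC(K) = n * log(sigma2_K) + log(n) * (K + 1) * (p + 1)
# K: number of thresholds, sigma2_K: estimated error variance of the
# K-threshold fit, p: number of covariates.
select_num_thresholds <- function(sigma2, n, p) {
  K <- seq_along(sigma2) - 1            # candidates K = 0, 1, 2, ...
  bic <- n * log(sigma2) + log(n) * (K + 1) * (p + 1)
  list(K_hat = K[which.min(bic)], bic = bic)
}

# Toy variances for K = 0..4: the fit improves sharply up to K = 2 only
sel <- select_num_thresholds(sigma2 = c(2.0, 1.4, 1.0, 0.99, 0.98),
                             n = 500, p = 3)
sel$K_hat
```

Beyond K = 2 the small drop in variance no longer pays for the extra (p + 1) parameters per threshold, so the criterion stops there.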
MTAFT_IC( Y, X, delta, Tq, c0 = 0.299, delta0 = 2.01, algorithm = c("WBS", "DP"), dist_min = 50, ncps_max = 4, wbs_nintervals = 200 )
Y |
the censored logarithm of the failure time. |
X |
the design matrix without the intercept. |
delta |
the censoring indicator. |
Tq |
the observed values of the threshold variable. |
c0 |
the penalty factor c0 in the information criterion (IC). Default is 0.299. |
delta0 |
the penalty factor delta0 in the information criterion (IC). Default is 2.01. |
algorithm |
the threshold detection algorithm, either "WBS" or "DP". Default is "WBS". |
dist_min |
the pre-specified minimum number of observations within each subgroup. Default is 50. |
ncps_max |
the pre-specified maximum number of thresholds. Default is 4. |
wbs_nintervals |
the number of random intervals in the WBS algorithm. Default is 200. |
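The "DP" option refers to a dynamic-programming search. The sketch below shows the general technique (optimal segmentation into K + 1 segments, each with at least `dist_min` observations). It assumes the data are already sorted by the threshold variable and uses the least-squares residual sum of squares within each segment as the cost, ignoring censoring. `dp_thresholds()` is hypothetical and not the package's implementation.

```r
# Hypothetical sketch: dynamic-programming search for K thresholds.
# Assumes observations are sorted by the threshold variable and uses the
# least-squares RSS of y on [1, X] within each segment as the segment
# cost (censoring is ignored in this illustration).
dp_thresholds <- function(y, X, K, dist_min) {
  n <- length(y)
  stopifnot(K >= 1, (K + 1) * dist_min <= n)
  X1 <- cbind(1, X)
  seg_cost <- function(i, j) {
    fit <- lm.fit(X1[i:j, , drop = FALSE], y[i:j])
    sum(fit$residuals^2)
  }
  # cost[k, j]: minimal total cost of cutting observations 1..j into k segments
  cost <- matrix(Inf, K + 1, n)
  back <- matrix(NA_integer_, K + 1, n)
  for (j in dist_min:n) cost[1, j] <- seg_cost(1, j)
  for (k in 2:(K + 1)) {
    for (j in (k * dist_min):n) {
      # s is the last index of the previous segment
      for (s in ((k - 1) * dist_min):(j - dist_min)) {
        val <- cost[k - 1, s] + seg_cost(s + 1, j)
        if (val < cost[k, j]) {
          cost[k, j] <- val
          back[k, j] <- s
        }
      }
    }
  }
  # Backtrack: thresholds are the last indices of the first K segments
  cps <- integer(0)
  j <- n
  for (k in (K + 1):2) {
    j <- back[k, j]
    cps <- c(j, cps)
  }
  cps
}

# Toy data: the regression line changes after observation 30
x <- seq_len(60) / 60
y <- ifelse(seq_len(60) <= 30, 1 + 2 * x, 5 - x)
dp_thresholds(y, x, K = 1, dist_min = 10)
```

Unlike WBS, the search is exhaustive over admissible split points, which makes it exact for the chosen cost but quadratic in n for each threshold.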
A list with the following components:
params |
the subgroup-specific slope estimates and variance estimates. |
thres |
the threshold estimates. |
IC_val |
the IC values for each candidate number of thresholds. |
# Generate simulated data with 500 samples and normal error distribution
dataset <- MTAFT_simdata(n = 500, err = "normal")
Y <- dataset[, 1]
delta <- dataset[, 2]
Tq <- dataset[, 3]
X <- dataset[, -c(1:3)]
# Run MTAFT_IC with the WBS algorithm
mtaft_ic_result <- MTAFT_IC(Y, X, delta, Tq, algorithm = "WBS")
mtaft_ic_result$params
mtaft_ic_result$thres
mtaft_ic_result$IC_val
This function generates simulated data for MTAFT (multi-threshold accelerated failure time) analysis, following a simple simulation procedure described in the article.
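The article's exact simulation design is not reproduced here. The hypothetical `sim_threshold_aft()` below only illustrates the general recipe for censored threshold AFT data and the column layout the examples rely on (Y, delta, Tq, then X). The single threshold at 0, the coefficients, and the censoring distribution are arbitrary assumptions.

```r
# Hypothetical generator (not the article's design): one threshold at
# Tq = 0, two covariates, standard normal errors on the log time scale,
# and independent normal censoring. Columns follow the layout used in
# the examples: Y, delta, Tq, then the covariates X.
sim_threshold_aft <- function(n) {
  Tq <- rnorm(n)                          # threshold variable
  X <- matrix(rnorm(2 * n), n, 2)         # covariates (no intercept)
  X1 <- cbind(1, X)
  beta_low  <- c(1, 1, -1)                # intercept and slopes, Tq <= 0
  beta_high <- c(2, -1, 1)                # intercept and slopes, Tq > 0
  lin <- ifelse(Tq <= 0, drop(X1 %*% beta_low), drop(X1 %*% beta_high))
  logT <- lin + rnorm(n)                  # log failure time
  logC <- rnorm(n, mean = 3)              # log censoring time
  Y <- pmin(logT, logC)                   # observed (censored) log time
  delta <- as.numeric(logT <= logC)       # 1 = failure observed
  cbind(Y = Y, delta = delta, Tq = Tq, X)
}

set.seed(7)
dat <- sim_threshold_aft(200)
mean(dat[, "delta"])                      # proportion of uncensored cases
```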
MTAFT_simdata(n, err = c("normal", "t3"))
n |
The sample size. |
err |
The error distribution type: either "normal" or "t3" (Student's t distribution with 3 degrees of freedom). |
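A dataset containing the simulated data for MTAFT analysis. As in the examples, column 1 holds the censored log failure time Y, column 2 the censoring indicator delta, column 3 the threshold variable Tq, and the remaining columns the covariates X.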
# Generate simulated data with 500 samples and normal error distribution
dataset <- MTAFT_simdata(n = 500, err = "normal")
Y <- dataset[, 1]
delta <- dataset[, 2]
Tq <- dataset[, 3]
X <- dataset[, -c(1:3)]
# Generate simulated data with 200 samples and t3 error distribution
dataset <- MTAFT_simdata(n = 200, err = "t3")
Y <- dataset[, 1]
delta <- dataset[, 2]
Tq <- dataset[, 3]
X <- dataset[, -c(1:3)]
This function performs a score-type test for the presence of threshold effects in multi-threshold settings, using nboots bootstrap replications to compute the p-value.
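The score statistic itself is not spelled out here, but a bootstrap p-value is typically the proportion of bootstrap statistics at least as large as the observed one. The sketch below uses the common (1 + count) / (B + 1) form, which may differ from the convention used inside MTAFT_test. `bootstrap_pvalue()` and the chi-squared stand-in for the bootstrap statistics are hypothetical.

```r
# Generic bootstrap p-value for a statistic that is large under the
# alternative: (1 + #{bootstrap stats >= observed}) / (B + 1).
bootstrap_pvalue <- function(stat_obs, stat_boot) {
  (1 + sum(stat_boot >= stat_obs)) / (length(stat_boot) + 1)
}

# Toy null distribution standing in for 999 bootstrap statistics
set.seed(42)
stat_boot <- rchisq(999, df = 2)
bootstrap_pvalue(stat_obs = 15, stat_boot = stat_boot)   # small p-value
bootstrap_pvalue(stat_obs = 0.5, stat_boot = stat_boot)  # large p-value
```

Adding one to numerator and denominator keeps the p-value strictly positive, which avoids reporting an exact zero from a finite number of replications.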
MTAFT_test(Y, X, Tq, delta, nboots)
Y |
Response variable: the censored logarithm of the failure time. |
X |
Covariate matrix. |
Tq |
Threshold variable. |
delta |
Censoring indicator vector. |
nboots |
Number of bootstrap iterations. |
The p-value of the test; a small p-value indicates evidence of a threshold effect.
# Generate simulated data with 500 samples and normal error distribution
dataset <- MTAFT_simdata(n = 500, err = "normal")
Y <- dataset[, 1]
delta <- dataset[, 2]
Tq <- dataset[, 3]
X <- dataset[, -c(1:3)]
# Note the argument order: in MTAFT_test, Tq comes before delta
# Perform score-type test with 500 bootstraps
pval <- MTAFT_test(Y, X, Tq, delta, nboots = 500)
# Perform score-type test with 1000 bootstraps
pval <- MTAFT_test(Y, X, Tq, delta, nboots = 1000)
This function first recasts threshold detection as a group model selection problem, so that a concave 2-norm group selection method from the R package 'grpreg' can be applied, and then refines the resulting estimates.
TSMCP(Y, X, delta, c, penalty = "scad")
Y |
the censored logarithm of the failure time. |
X |
the design matrix without the intercept. |
delta |
the censoring indicator. |
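c |
a constant that sets the length of each segment in the splitting stage; the segment length is defined as $m = \lceil c\sqrt{n_1}\,\rceil$, where $n_1$ = sum(delta) is the number of uncensored observations. |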
penalty |
Penalty type (default is "scad"). |
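The segment length $m = \lceil c\sqrt{n_1}\,\rceil$ matches the quantity computed in the example below. The sketch shows one way such a splitting stage could cut n ordered observations into consecutive blocks of length m. How TSMCP handles a remainder is not stated here, so merging leftover observations into the final block is an assumption, and `split_segments()` is a hypothetical helper.

```r
# Hypothetical sketch of the splitting stage: m = ceiling(c * sqrt(n1)),
# where n1 is the number of uncensored observations; the n (ordered)
# observations are cut into consecutive blocks of length m, and any
# leftover observations are merged into the final block (an assumption).
split_segments <- function(n, n1, c) {
  m <- ceiling(c * sqrt(n1))
  nblocks <- max(1, floor(n / m))
  block_id <- pmin(ceiling(seq_len(n) / m), nblocks)
  split(seq_len(n), block_id)
}

seg <- split_segments(n = 500, n1 = 400, c = 1)   # m = 20
length(seg)
lengths(seg)
```

Larger values of c give fewer, longer segments, which is why the example scans a grid of c values and compares the resulting fits.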
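An object with the following components, in this order:
the estimated change points, given as observation indices (so that Tq indexed by them gives the threshold estimates).
the estimated coefficients.
the estimated error variance.
the residuals.
the response Y weighted by the Kaplan-Meier weights.
the design matrix Xn weighted by the Kaplan-Meier weights.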
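A standard choice of Kaplan-Meier weights in censored AFT regression is Stute's weights: the jumps of the Kaplan-Meier estimator at the ordered responses. The sketch below computes them. We are assuming, not quoting, that TSMCP uses this weighting; in particular, whether it multiplies Y and Xn by the weights or by their square roots is not stated here. `km_weights()` is a hypothetical helper.

```r
# Kaplan-Meier (Stute) weights: the jump of the Kaplan-Meier estimator at
# each ordered response. Ties are ordered as returned by order(); special
# tie-handling conventions are not implemented in this sketch.
km_weights <- function(Y, delta) {
  n <- length(Y)
  ord <- order(Y)
  d <- delta[ord]                       # censoring indicators in Y order
  i <- seq_len(n)
  fac <- ((n - i) / (n - i + 1))^d      # factor contributed by the i-th ordered obs
  prev <- c(1, cumprod(fac)[-n])        # product over j < i (1 when i = 1)
  w_sorted <- d / (n - i + 1) * prev
  w <- numeric(n)
  w[ord] <- w_sorted                    # back to the original order
  w
}

Y <- c(4, 1, 3, 2)
delta <- c(1, 1, 1, 0)
km_weights(Y, delta)                    # 0.375 0.250 0.375 0.000
```

Censored observations receive zero weight, and their mass is redistributed to later uncensored observations. Multiplying Y and the design matrix by sqrt(w) is one common way to turn the weighted least-squares problem into an ordinary one.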
Li, Jialiang, and Baisuo Jin. 2018. “Multi-Threshold Accelerated Failure Time Model.” The Annals of Statistics 46 (6A): 2657–82.
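library(grpreg)
# Generate simulated data with 500 samples and normal error distribution
dataset <- MTAFT_simdata(n = 500, err = "normal")
Y <- dataset[, 1]
delta <- dataset[, 2]
Tq <- dataset[, 3]
X <- dataset[, -c(1:3)]
n <- length(Y)                        # sample size
p <- ncol(X)                          # number of covariates
n1 <- sum(delta)                      # number of uncensored observations
c_grid <- seq(0.5, 1.5, 0.1)          # candidate values of c
m <- ceiling(c_grid * sqrt(n1))       # corresponding segment lengths
bicy <- rep(NA_real_, length(c_grid))
tsmc <- vector("list", length(c_grid))
for (i in seq_along(c_grid)) {
  tsm <- try(TSMCP(Y, X, delta, c_grid[i], penalty = "scad"), silent = TRUE)
  if (inherits(tsm, "try-error")) next
  # BIC: log(n) times the number of parameters plus n * log(error variance)
  bicy[i] <- log(n) * ((length(tsm[[1]]) + 1) * (p + 1)) + n * log(tsm[[3]])
  tsmc[[i]] <- tsm
}
if (any(!is.na(bicy))) {
  tsmcp <- tsmc[[which.min(bicy)]]    # fit with the smallest BIC
  thre.LJ <- Tq[tsmcp[[1]]]           # threshold estimates
  thre.num.Lj <- length(thre.LJ)      # estimated number of thresholds
  print(thre.LJ)
  print(thre.num.Lj)
}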