Title: | Semiparametric Transformation Models |
---|---|
Description: | To make the semiparametric transformation models easier to apply in real studies, we introduce this R package, in which the MLE in transformation models via an EM algorithm proposed by Zeng D, Lin DY(2007) <doi:10.1111/j.1369-7412.2007.00606.x> and adaptive lasso method in transformation models proposed by Liu XX, Zeng D(2013) <doi:10.1093/biomet/ast029> are implemented. C++ functions are used to compute complex loops. The coefficient vector and cumulative baseline hazard function can be estimated, along with the corresponding standard errors and P values. |
Authors: | Fengyu Wen [aut, cre], Baosheng Liang [aut] |
Maintainer: | Fengyu Wen <[email protected]> |
License: | GPL (>= 2) |
Version: | 0.1.0 |
Built: | 2024-10-31 22:28:25 UTC |
Source: | CRAN |
Estimate the vector of parameters for baseline covariates
and baseline cumulative hazard function
using the expectation-maximization algorithm.
is estimated
as a step function with jumps only at the observed failure times. Typically,
it would only be used in a call to
trans.m
or Simu
.
EM_est(Y, X, delta, alpha, Q = 60, EM_itmax = 250)
EM_est(Y, X, delta, alpha, Q = 60, EM_itmax = 250)
Y |
observed event times |
X |
design matrix |
delta |
censoring indicator. If |
alpha |
parameter in transformation function |
Q |
number of nodes and weights in Gaussian quadrature. Defaults to 60. |
EM_itmax |
maximum iteration of EM algorithm. Defaults to 250. |
a list containing
beta_new |
estimator of |
||
Lamb_Y |
estimator of
|
||
lamb_Y |
estimator of
|
||
lamb_Ydot |
estimator of
|
||
Y_eq_Yhat |
a matrix used in
trans.m and Simu |
||
Y_geq_Yhat |
a matrix
used in trans.m and Simu |
||
Abramowitz, M., and Stegun, I.A. (1972). Handbook of Mathematical Functions (9th ed.). Dover Publications, New York.
Evans, M. and Swartz, T. (2000). Approximating Integrals via Monte Carlo and Deterministic Methods. Oxford University Press.
Liu, Q. and Pierce, D.A. (1994). A note on Gauss-Hermite quadrature. Biometrika 81: 624-629.
gen_data = generate_data(200, 1, 0.5, c(-0.5, 1)) delta = gen_data$delta Y = gen_data$Y X = gen_data$X EM_est(Y, X, delta, alpha = 1)$beta_new - c(-0.5, 1)
gen_data = generate_data(200, 1, 0.5, c(-0.5, 1)) delta = gen_data$delta Y = gen_data$Y X = gen_data$X EM_est(Y, X, delta, alpha = 1)$beta_new - c(-0.5, 1)
Generate observed event times, covariates and other data used for simulations in the paper.
generate_data(n, alpha, rho, beta_true, now_repeat = 1)
generate_data(n, alpha, rho, beta_true, now_repeat = 1)
n |
number of subjects |
alpha |
parameter in transformation function |
rho |
parameter in baseline cumulative hazard function |
beta_true |
parameter |
now_repeat |
number of duplication of simulation |
The survival function for of the
th observation takes
the form
The failure time can be generated by
a list containing
X |
design matrix | ||
beta_X |
|
||
Y |
observed event time | ||
Abramowitz, M., and Stegun, I.A. (1972). Handbook of Mathematical Functions (9th ed.). Dover Publications, New York. +- Evans, M. and Swartz, T. (2000). Approximating Integrals via Monte Carlo and Deterministic Methods. Oxford University Press.
Liu, Q. and Pierce, D.A. (1994). A note on Gauss-Hermite quadrature. Biometrika 81: 624-629.
generate_data(200,0.5,1,c(0.5,-1))
generate_data(200,0.5,1,c(0.5,-1))
Select the important variables in semiparametric transformation models for right censored data using adaptive lasso.
trans_alasso(Z, Y, delta_i, r, lamb_vec, solu = TRUE)
trans_alasso(Z, Y, delta_i, r, lamb_vec, solu = TRUE)
Z |
the baseline covariates |
Y |
observed event times |
delta_i |
censoring indicator. If |
r |
parameter in transformation function |
lamb_vec |
the grad of the tuning parameter |
solu |
determines whether the solution path will be plotted. The default is TRUE. |
The initial value of the coefficient used as the adapting
weights is EM estimator, which is computed by the function
EM_est
.
The tuning parameter is data-dependent and we select it using
generalized crossvalidation. There may be some errors for small
, in which case the
and the number of adaptive
lasso iteration are recorded in the
skip_para
.
a list containing
beta_res |
the estimated with the selected tuning parameter |
||
GCV_res |
the value of GCV with the selected tuning parameter |
||
lamb_res |
the selected tuning parameter |
||
beta_all |
estimated with all tuning parameters |
||
CSV_all |
value of GCV with all tuning parameters | ||
skip_para |
a list containing the and the number of adaptive lasso iteration when adaptive lasso doesn't work. |
||
Xiaoxi, L. , & Donglin, Z. . (2013). Variable selection in semiparametric transformation models for right-censored data. Biometrika(4), 859-876.
if(!requireNamespace("MASS", quietly = TRUE)) {stop("package MASS needed for this example. Please install it.")} gen_lasdat = function(n,r,rho,beta_true,a,b,seed=66,std = FALSE) { set.seed(seed) beta_len = length(beta_true) beta_len = beta_len sigm = matrix(0, nrow = beta_len, ncol = beta_len) for(i in 1:(beta_len-1)) { diag(sigm[1:(beta_len+1-i),i:beta_len]) = rho^(i-1) } sigm[1,beta_len] = rho^(beta_len-1) sigm[lower.tri(sigm)] = t(sigm)[lower.tri(sigm)] Z = MASS::mvrnorm(n, mu = rep(0, beta_len), Sigma = sigm) beta_Z.true = c(Z %*% beta_true) U = runif(n) if(r>0) { t = ((U^(-r)-1)/(a*r*exp(beta_Z.true)))^(1/b) }else if(r == 0) { t = (-log(U)/(a*exp(beta_Z.true)))^(1/b) #t = (exp(-log(U)/(0.5 * exp(beta_Z.true))) - 1) } C = runif(n,0,8) Y = pmin(C,t) delta_i = ifelse( C >= t, 1, 0) if(std) { Z = apply(Z,2,normalize) } return(list(Z = Z, Y = Y, delta_i = delta_i,censor = mean(1-delta_i))) } now_rep=1 dat = gen_lasdat(100,1,0.5,c(0.3,0.5,0.7,0,0,0,0,0,0,0),2,5,seed= 6+60*now_rep,std = FALSE) Z = dat$Z Y = dat$Y delta_i = dat$delta_i tra_ala = trans_alasso(Z,Y,delta_i,lamb_vec = c(5,7),r=1) tra_ala$GCV_res tra_ala$beta_res tra_ala$lamb_res
if(!requireNamespace("MASS", quietly = TRUE)) {stop("package MASS needed for this example. Please install it.")} gen_lasdat = function(n,r,rho,beta_true,a,b,seed=66,std = FALSE) { set.seed(seed) beta_len = length(beta_true) beta_len = beta_len sigm = matrix(0, nrow = beta_len, ncol = beta_len) for(i in 1:(beta_len-1)) { diag(sigm[1:(beta_len+1-i),i:beta_len]) = rho^(i-1) } sigm[1,beta_len] = rho^(beta_len-1) sigm[lower.tri(sigm)] = t(sigm)[lower.tri(sigm)] Z = MASS::mvrnorm(n, mu = rep(0, beta_len), Sigma = sigm) beta_Z.true = c(Z %*% beta_true) U = runif(n) if(r>0) { t = ((U^(-r)-1)/(a*r*exp(beta_Z.true)))^(1/b) }else if(r == 0) { t = (-log(U)/(a*exp(beta_Z.true)))^(1/b) #t = (exp(-log(U)/(0.5 * exp(beta_Z.true))) - 1) } C = runif(n,0,8) Y = pmin(C,t) delta_i = ifelse( C >= t, 1, 0) if(std) { Z = apply(Z,2,normalize) } return(list(Z = Z, Y = Y, delta_i = delta_i,censor = mean(1-delta_i))) } now_rep=1 dat = gen_lasdat(100,1,0.5,c(0.3,0.5,0.7,0,0,0,0,0,0,0),2,5,seed= 6+60*now_rep,std = FALSE) Z = dat$Z Y = dat$Y delta_i = dat$delta_i tra_ala = trans_alasso(Z,Y,delta_i,lamb_vec = c(5,7),r=1) tra_ala$GCV_res tra_ala$beta_res tra_ala$lamb_res
This function is used to conduct the regression analysis of right-censored data using semiparametric transformation models. It calculates the estimators, standard errors and p values. A plot of estimated baseline cumulative hazard function and confidence intervals can be produced.
trans_m( X, delta, Y, plot.Lamb = TRUE, alpha = seq(0.1, 1.1, by = 0.1), trsmodel = TRUE, EM_itmax = 250, show_res = TRUE )
trans_m( X, delta, Y, plot.Lamb = TRUE, alpha = seq(0.1, 1.1, by = 0.1), trsmodel = TRUE, EM_itmax = 250, show_res = TRUE )
X |
design matrix |
delta |
censoring indicator. If |
Y |
observed event times |
plot.Lamb |
If TRUE, plot the estimated baseline cumulative hazard function and confidence intervals. The default is TRUE. |
alpha |
parameter in transformation function. Generally, |
trsmodel |
logical value indicating whether to implement transformation models. The default is TRUE. |
EM_itmax |
maximum iteration of EM algorithm. Defaults to 250. |
show_res |
show results after |
If is unknown, we firse set
alpha
.
Then, for each , we estimate the parameters and record the value
of observed log-likelihood function. The
that maximizes the
observed log-likelihood function and the corresponding
and
are chosen as the best estimators. Nonparametric
maximum likelihood estimators are developed for the regression parameters
and cumulative intensity functions of these models based on censored data.
a list containing
beta.est |
estimators of |
||
SE.beta |
standard errors
of the estimated |
||
SE.Ydot |
standard errors
of the estimated |
||
Ydot |
vector of sorted event times with duplicate values removed | ||
Lamb.est
|
estimated baseline cumulative hazard | ||
lamb.est
|
estimated jump sizes of baseline cumulative hazard function | ||
choose.alpha |
the chosen |
||
Lamb.upper |
upper confidence limits for the estimated baseline cumulative hazard function | ||
Lamb.lower |
lower confidence limits for the estimated baseline cumulative hazard function | ||
p.beta |
P values of estimated |
||
p.Lamb |
P values of estimated baseline cumulative hazard | ||
p.beta |
Cheng, S.C., Wei, L.J., and Ying, Z. (1995). Analysis of transformation models with censored data. Biometrika 82, 835-845.
Zeng, D. and Lin, D.Y. (2007). Maximum likelihood estimation in semiparametric regression models with censored data. J. R. Statist. Soc. B 69, 507-564.
Abramowitz, M., and Stegun, I.A. (1972). Handbook of Mathematical Functions (9th ed.). Dover Publications, New York.
Evans, M. and Swartz, T. (2000). Approximating Integrals via Monte Carlo and Deterministic Methods. Oxford University Press.
Liu, Q. and Pierce, D.A. (1994). A note on Gauss-Hermite quadrature. Biometrika 81, 624-629.
Louis, T. (1982). Finding the Observed Information Matrix when Using the EM Algorithm. Journal of the Royal Statistical Society. Series B (Methodological), 44(2), 226-233.
gen_data = generate_data(200, 1, 0.5, c(-0.5,1)) delta = gen_data$delta Y = gen_data$Y X = gen_data$X res.trans = trans_m(X, delta, Y, plot.Lamb = TRUE, show_res = FALSE)
gen_data = generate_data(200, 1, 0.5, c(-0.5,1)) delta = gen_data$delta Y = gen_data$Y X = gen_data$X res.trans = trans_m(X, delta, Y, plot.Lamb = TRUE, show_res = FALSE)