| Title: | Superefficient Estimation of Future Conditional Hazards Based on Marker Information |
|---|---|
| Description: | Provides univariate and indexed (multivariate) nonparametric smoothed kernel estimators for the future conditional hazard rate function when time-dependent covariates are present, a bandwidth selector for the estimator's implementation and pointwise and uniform confidence bands. Methods used in the package refer to Bagkavos, Isakson, Mammen, Nielsen and Proust-Lima (2025) <doi:10.1093/biomet/asaf008>. |
| Authors: | Dimitrios Bagkavos [aut, cre], Alex Isakson [ctb], Enno Mammen [ctb], Jens Nielsen [ctb], Cecile Proust-Lima [ctb] |
| Maintainer: | Dimitrios Bagkavos <[email protected]> |
| License: | GPL (>= 2) |
| Version: | 2.1 |
| Built: | 2026-05-25 07:02:44 UTC |
| Source: | https://github.com/cran/HQM |
Calculates the AUC for the HQM estimator.
auc.hqm(xin, est, landm, th, event_time_name, status_name)auc.hqm(xin, est, landm, th, event_time_name, status_name)
xin |
A data frame containing event times and the patient status. |
est |
The HQM estimator values, typically the output of |
landm |
Landmark time. |
th |
Time horizon. |
event_time_name |
The column name of the event times in the |
status_name |
The column name of the status variable in the |
The function auc.hqm implements the AUC calculation for the HQM estimator estimator.
A vector of two values: the landmark time of the calculation and the AUC value.
library(survival) Landmark <- 2 pbcT1 <- pbc2[which(pbc2$year< Landmark & pbc2$years> Landmark),] ls<-50 # 50 grid points to evaluate the estimates s.out<- pbcT1[, 'year'] s.out.use <- seq(0, max(s.out), max(s.out)/( ls-1)) timesS2 <- seq(Landmark,14,by=0.5) b=0.9 arg1<- get_h_x(pbcT1, 'albumin', br_s = s.out.use, event_time_name = 'years', time_name = 'year', event_name = 'status2', 2, 0.9) br_s2 = seq(Landmark, 14, length=ls-1) sfalb2<- make_sf( (br_s2[2]-br_s2[1])/4 , arg1) tHor <- 1.5 auc.hq.use<-auc.hqm(pbcT1, sfalb2, Landmark,tHor, event_time_name = 'years', status_name = 'status2') auc.hq.uselibrary(survival) Landmark <- 2 pbcT1 <- pbc2[which(pbc2$year< Landmark & pbc2$years> Landmark),] ls<-50 # 50 grid points to evaluate the estimates s.out<- pbcT1[, 'year'] s.out.use <- seq(0, max(s.out), max(s.out)/( ls-1)) timesS2 <- seq(Landmark,14,by=0.5) b=0.9 arg1<- get_h_x(pbcT1, 'albumin', br_s = s.out.use, event_time_name = 'years', time_name = 'year', event_name = 'status2', 2, 0.9) br_s2 = seq(Landmark, 14, length=ls-1) sfalb2<- make_sf( (br_s2[2]-br_s2[1])/4 , arg1) tHor <- 1.5 auc.hq.use<-auc.hqm(pbcT1, sfalb2, Landmark,tHor, event_time_name = 'years', status_name = 'status2') auc.hq.use
Implements the bandwidth selection for the future conditional hazard rate based on K-fold cross validation.
b_selection(data, marker_name, event_time_name = 'years', time_name = 'year', event_name = 'status2', I, b_list)b_selection(data, marker_name, event_time_name = 'years', time_name = 'year', event_name = 'status2', I, b_list)
data |
A data frame of time dependent data points. Missing values are allowed. |
marker_name |
The column name of the marker values in the data frame |
event_time_name |
The column name of the event times in the data frame |
time_name |
The column name of the times the marker values were observed in the data frame |
event_name |
The column name of the events in the data frame |
I |
Number of observations leave out for a K cross validation. |
b_list |
Vector of bandwidths that need to be tested. |
The function b_selection implements the cross validation bandwidth selection for the future conditional hazard rate given by
where is a smoothed kernel density estimator of and the exposure process of individual . Note that is dependent on .
A list with the tested bandwidths and its cross validation scores.
Bagkavos, I., Isakson, R., Mammen, E., Nielsen, J., and Proust–Lima, C. (2025). Biometrika, 112(2), asaf008. doi:10.1093/biomet/asaf008
b_selection_prep_g, Q1, R_K, prep_cv, dataset_split
I = 26 # For Albumin marker: b_list = seq(0.9, 1.7, 0.1) b_scores_alb = b_selection(pbc2, 'albumin', 'years', 'year', 'status2', I, b_list) b_scores_alb[[2]][which.min(b_scores_alb[[1]])] # For Bilirubin marker: b_list = seq(3, 4, 0.1) b_scores_bil = b_selection(pbc2, 'serBilir', 'years', 'year', 'status2', I, b_list) b_scores_bil[[2]][which.min(b_scores_bil[[1]])] b_scores_bilI = 26 # For Albumin marker: b_list = seq(0.9, 1.7, 0.1) b_scores_alb = b_selection(pbc2, 'albumin', 'years', 'year', 'status2', I, b_list) b_scores_alb[[2]][which.min(b_scores_alb[[1]])] # For Bilirubin marker: b_list = seq(3, 4, 0.1) b_scores_bil = b_selection(pbc2, 'serBilir', 'years', 'year', 'status2', I, b_list) b_scores_bil[[2]][which.min(b_scores_bil[[1]])] b_scores_bil
Implements the index parameter selection for two markers based on K-fold cross validation.
b_selection_index_optim(in.par, data, marker_name1, marker_name2, event_time_name = 'years', time_name = 'year', event_name = 'status2', I, b)b_selection_index_optim(in.par, data, marker_name1, marker_name2, event_time_name = 'years', time_name = 'year', event_name = 'status2', I, b)
in.par |
Vector of candidate values for the index parameters. |
data |
A data frame of time dependent data points. Missing values are allowed. |
marker_name1 |
The column name of the first marker values in the data frame |
marker_name2 |
The column name of the second marker values in the data frame |
event_time_name |
The column name of the event times in the data frame |
time_name |
The column name of the times the marker values were observed in the data frame |
event_name |
The column name of the events in the data frame |
I |
Number of observations leave out for a K cross validation. |
b |
scalar bandwidth for the HQM estimator. |
The function b_selection_index_optim implements the cross validation index parameter selection for the indexing of two markers and given by
where is the HQM estimator of , see Bagkavos et al. (2025), doi:10.1093/biomet/asaf008, and the exposure process of individual . Note that is dependent on .
A list with the tested parameters and its cross validation scores.
Bagkavos, I., Isakson, R., Mammen, E., Nielsen, J., and Proust–Lima, C. (2025). Biometrika, 112(2), asaf008. doi:10.1093/biomet/asaf008
b_selection_prep_g, Q1, R_K, prep_cv, dataset_split
# Obtain the indexing parameters for the combination of albumin and bilirubin markers # These are the values used in the example of function h_xt_vec # and yielded the values: (par.alb, par.bil) = ( 0.0702, 0.0856 ) that were used there. I = 26 b_list = seq(1.1, 1.2, by=0.1) for(i in 1:length(b_list)) { res<- optim(c(0.5, 0.5), b_selection_index_optim, data=pbc2, marker_name1='albumin', marker_name2= 'serBilir', event_time_name = 'years', time_name = 'year', event_name = 'status2', I=26, b=b_list[i], method="Nelder-Mead") cat("i= ", i, " ", res$par, " ", res$value, "count calls to fn = ", res$counts, " converge? ", res$convergence, "\n") res }# Obtain the indexing parameters for the combination of albumin and bilirubin markers # These are the values used in the example of function h_xt_vec # and yielded the values: (par.alb, par.bil) = ( 0.0702, 0.0856 ) that were used there. I = 26 b_list = seq(1.1, 1.2, by=0.1) for(i in 1:length(b_list)) { res<- optim(c(0.5, 0.5), b_selection_index_optim, data=pbc2, marker_name1='albumin', marker_name2= 'serBilir', event_time_name = 'years', time_name = 'year', event_name = 'status2', I=26, b=b_list[i], method="Nelder-Mead") cat("i= ", i, " ", res$par, " ", res$value, "count calls to fn = ", res$counts, " converge? ", res$convergence, "\n") res }
Calculates an intermediate part for the K-fold cross validation.
b_selection_prep_g(h_mat, int_X, size_X_grid, n, Yi)b_selection_prep_g(h_mat, int_X, size_X_grid, n, Yi)
h_mat |
A matrix of the estimator for the future conditional hazard rate for all values |
int_X |
Vector of the position of the observed marker values in the grid for marker values. |
size_X_grid |
Numeric value indicating the number of grid points for marker values. |
n |
Number of individuals. |
Yi |
A matrix made by |
The function b_selection_prep_g calculates a key component for the bandwidth selection
where is estimated without information from all counting processes with and is the exposure.
A matrix with for all individuals i and time grid points t.
pbc2_id = to_id(pbc2) size_s_grid <- size_X_grid <- 100 n = max(as.numeric(pbc2$id)) s = pbc2$year X = pbc2$serBilir XX = pbc2_id$serBilir ss <- pbc2_id$years delta <- pbc2_id$status2 br_s = seq(0, max(s), max(s)/( size_s_grid-1)) br_X = seq(min(X), max(X), (max(X)-min(X))/( size_X_grid-1)) X_lin = lin_interpolate(br_s, pbc2_id$id, pbc2$id, X, s) int_X <- findInterval(X_lin, br_X) int_s = rep(1:length(br_s), n) N <- make_N(pbc2, pbc2_id, breaks_X=br_X, breaks_s=br_s, ss, XX, delta) Y <- make_Y(pbc2, pbc2_id, X_lin, br_X, br_s, size_s_grid, size_X_grid, int_s, int_X, 'years', n) b = 1.7 alpha<-get_alpha(N, Y, b, br_X, K=Epan ) Yi <- make_Yi(pbc2, pbc2_id, X_lin, br_X, br_s, size_s_grid, size_X_grid, int_s, int_X,'years', n) Ni <- make_Ni(breaks_s=br_s, size_s_grid, ss, delta, n) t = 2 h_xt_mat = t(sapply(br_s[1:99], function(si){ h_xt_vec(br_X, br_s, size_s_grid, alpha, t, b, Yi, int_X, n)})) b_selection_prep_g(h_xt_mat, int_X, size_X_grid, n, Yi)pbc2_id = to_id(pbc2) size_s_grid <- size_X_grid <- 100 n = max(as.numeric(pbc2$id)) s = pbc2$year X = pbc2$serBilir XX = pbc2_id$serBilir ss <- pbc2_id$years delta <- pbc2_id$status2 br_s = seq(0, max(s), max(s)/( size_s_grid-1)) br_X = seq(min(X), max(X), (max(X)-min(X))/( size_X_grid-1)) X_lin = lin_interpolate(br_s, pbc2_id$id, pbc2$id, X, s) int_X <- findInterval(X_lin, br_X) int_s = rep(1:length(br_s), n) N <- make_N(pbc2, pbc2_id, breaks_X=br_X, breaks_s=br_s, ss, XX, delta) Y <- make_Y(pbc2, pbc2_id, X_lin, br_X, br_s, size_s_grid, size_X_grid, int_s, int_X, 'years', n) b = 1.7 alpha<-get_alpha(N, Y, b, br_X, K=Epan ) Yi <- make_Yi(pbc2, pbc2_id, X_lin, br_X, br_s, size_s_grid, size_X_grid, int_s, int_X,'years', n) Ni <- make_Ni(breaks_s=br_s, size_s_grid, ss, delta, n) t = 2 h_xt_mat = t(sapply(br_s[1:99], function(si){ h_xt_vec(br_X, br_s, size_s_grid, alpha, t, b, Yi, int_X, n)})) b_selection_prep_g(h_xt_mat, int_X, size_X_grid, n, Yi)
Compute the bootstrap estimate of the indexed HQM hazard estimate for a single (bootstrap) sample. The function is meant to be used internally for the calculation of confidence intervals
Boot.hqm(in.par, data, data.id, ls, X1, XX1, event_time_name = 'years', time_name = 'year', event_name = 'status2', b, t)Boot.hqm(in.par, data, data.id, ls, X1, XX1, event_time_name = 'years', time_name = 'year', event_name = 'status2', b, t)
in.par |
Numeric vector, the values of the indexing parameters. |
data |
Bootstrap data.frame. |
data.id |
Id-level data.frame (result of to_id). |
ls |
user supplied grid length on the time_name argument. |
X1 |
List of vectors for indexing: each vector corresponds to a biomarker and contains one summary measurement per individual. |
XX1 |
List of vectors for indexing: each vector corresponds to a single biomarker and contains its longitudinal measurements across all individuals/time points. |
event_time_name |
Name of event time column. |
time_name |
Name of observation time column. |
event_name |
Name of event indicator column. |
b |
Bandwidth parameter. |
t |
Conditioning level. |
Numeric vector with estimated hazard on the grid.
#Single instance of the bootstrap version of the bootstrap version #of the indexed HQM estimator b.alb = 0.9 b.bil = 4 t.alb = 1 # refers to zero mean variables - slightly high t.bil = 1.9 # refers to zero mean variable - high par.alb <- 0.0702 #0.149 par.bil <- 0.0856 #0.10 b = 0.42 # The result, on the indexed marker 'indmar' of #\code{b_selection(pbc2, 'indmar', 'years', 'year', 'status2', I=26, seq(0.2,0.4,by=0.01))} t = t.alb * par.alb + t.bil *par.bil marker_name1 <- 'albumin' marker_name2 <- 'serBilir' event_time_name <- 'years' time_name <- 'year' event_name <- 'status2' id<-'id' ls<-50 data.use<-pbc2 data.use.id<-to_id(data.use) data.use.id<-data.use.id[complete.cases(data.use.id), ] # mean adjust the data: X1t=data.use[,marker_name1] -mean(data.use[, marker_name1]) XX1t=data.use.id[,marker_name1] -mean(data.use.id[, marker_name1]) X2t=data.use[,marker_name2] -mean(data.use[, marker_name2]) XX2t=data.use.id[,marker_name2] -mean(data.use.id[, marker_name2]) X1=list(X1t, X2t) XX1=list(XX1t, XX2t) boot.haz<-Boot.hqm (c(par.alb,par.bil), data.use, data.use.id, ls=ls, X1, XX1, event_time_name = 'years', time_name = 'year', event_name = 'status2', b, t)#Single instance of the bootstrap version of the bootstrap version #of the indexed HQM estimator b.alb = 0.9 b.bil = 4 t.alb = 1 # refers to zero mean variables - slightly high t.bil = 1.9 # refers to zero mean variable - high par.alb <- 0.0702 #0.149 par.bil <- 0.0856 #0.10 b = 0.42 # The result, on the indexed marker 'indmar' of #\code{b_selection(pbc2, 'indmar', 'years', 'year', 'status2', I=26, seq(0.2,0.4,by=0.01))} t = t.alb * par.alb + t.bil *par.bil marker_name1 <- 'albumin' marker_name2 <- 'serBilir' event_time_name <- 'years' time_name <- 'year' event_name <- 'status2' id<-'id' ls<-50 data.use<-pbc2 data.use.id<-to_id(data.use) data.use.id<-data.use.id[complete.cases(data.use.id), ] # mean adjust the data: X1t=data.use[,marker_name1] -mean(data.use[, marker_name1]) XX1t=data.use.id[,marker_name1] -mean(data.use.id[, marker_name1]) X2t=data.use[,marker_name2] -mean(data.use[, marker_name2]) XX2t=data.use.id[,marker_name2] -mean(data.use.id[, marker_name2]) X1=list(X1t, X2t) XX1=list(XX1t, XX2t) boot.haz<-Boot.hqm (c(par.alb,par.bil), data.use, data.use.id, ls=ls, X1, XX1, event_time_name = 'years', time_name = 'year', event_name = 'status2', b, t)
Performs bootstrap estimation of hazard rates and corresponding index parameters
for a given set of bootstrap samples. For each bootstrap replicate, the function
re-estimates the index parameters via optimisation and computes hazard estimates
using Boot.hqm. The output is used as input in functions Quantile.Index.CIs and Pivot.Index.CIs
Boot.hrandindex.param(B, Boot.samples, marker_name1, marker_name2,event_time_name, time_name, event_name, b, t, true.haz, v.param, n.est.points)Boot.hrandindex.param(B, Boot.samples, marker_name1, marker_name2,event_time_name, time_name, event_name, b, t, true.haz, v.param, n.est.points)
B |
Integer. Number of bootstrap iterations. |
Boot.samples |
A list of bootstrap datasets. Each element corresponds to one replicate. |
marker_name1 |
Character string. Name of the first longitudinal marker. |
marker_name2 |
Character string. Name of the second longitudinal marker. |
event_time_name |
Name of the event time variable in the data. |
time_name |
Name of the time variable for the longitudinal marker measurements. |
event_name |
Name of the event indicator variable. |
b |
Numeric. Bandwidth parameter used in hazard estimation. |
t |
Numeric. Evaluation point for the conditional hazard. |
true.haz |
Numeric vector. The true or reference hazard used in the optimisation criterion. |
v.param |
Numeric vector. Starting values of the indexing parameters for the optimisation of the index coefficients. |
n.est.points |
Integer. Number of estimation grid points for the hazard curve. |
For each bootstrap iteration , the function:
Extracts the bootstrap sample data.use.
Computes centred marker values at the subject and observation level.
Estimates index parameters by minimising index_optim using optim.
Computes the bootstrap hazard estimate via Boot.hqm.
The outputs are matrices collecting the hazard estimates and estimated index parameter vectors across bootstrap replicates.
A matrix of dimension n.est.points × B containing the bootstrap hazard estimates.
marker_name1 <- 'albumin' marker_name2 <- 'serBilir' event_time_name <- 'years' time_name <- 'year' event_name <- 'status2' id<-'id' par.x1 <- 0.0702 par.x2 <- 0.0856 t.x1 = 0 # refers to zero mean variables - slightly high t.x2 = 1.9 # refers to zero mean variable - high b = 0.42 t = par.x1 * t.x1 + par.x2 *t.x2 # first simulate true HR function: xin <- pbc2[,c(id, marker_name1, marker_name2, event_time_name, time_name, event_name)] n <- length(xin$id) nn<-max( as.double(xin[,'id']) ) xin.id <- to_id(xin) # Create bootstrap samples by group: set.seed(1) B<- 10 # 400 #50 Boot.samples<-list() for(j in 1:B) { i.use<-c() id.use<-c() index.nn <- sample (nn, replace = TRUE) for(l in 1:nn) { i.use2<-which(xin[,id]==index.nn[l]) i.use<-c(i.use, i.use2) id.use2<-rep(index.nn[l], times=length(i.use2)) id.use<-c(id.use, id.use2) } xin.i<-xin[i.use,] xin.i<-xin[i.use,] Boot.samples[[j]]<- xin.i[order(xin.i$id),] #xin[i.use,] } true.hazard<- Sim.True.Hazard(Boot.samples, id='id', 100, marker_name1=marker_name1, marker_name2= marker_name2, event_time_name = event_time_name, time_name = time_name, event_name = event_name, in.par = c(par.x1, par.x2), b) res <- Boot.hrandindex.param( B, Boot.samples, marker_name1, marker_name2, event_time_name, time_name, event_name , b = 0.4, t = 1.0, true.haz = true.hazard, v.param = c(0.07, 0.08), n.est.points = 100) #return bootstrap hazard rate estimators in marix format: resmarker_name1 <- 'albumin' marker_name2 <- 'serBilir' event_time_name <- 'years' time_name <- 'year' event_name <- 'status2' id<-'id' par.x1 <- 0.0702 par.x2 <- 0.0856 t.x1 = 0 # refers to zero mean variables - slightly high t.x2 = 1.9 # refers to zero mean variable - high b = 0.42 t = par.x1 * t.x1 + par.x2 *t.x2 # first simulate true HR function: xin <- pbc2[,c(id, marker_name1, marker_name2, event_time_name, time_name, event_name)] n <- length(xin$id) nn<-max( as.double(xin[,'id']) ) xin.id <- to_id(xin) # Create bootstrap samples by group: set.seed(1) B<- 10 # 400 #50 Boot.samples<-list() for(j in 1:B) { i.use<-c() id.use<-c() index.nn <- sample (nn, replace = TRUE) for(l in 1:nn) { i.use2<-which(xin[,id]==index.nn[l]) i.use<-c(i.use, i.use2) id.use2<-rep(index.nn[l], times=length(i.use2)) id.use<-c(id.use, id.use2) } xin.i<-xin[i.use,] xin.i<-xin[i.use,] Boot.samples[[j]]<- xin.i[order(xin.i$id),] #xin[i.use,] } true.hazard<- Sim.True.Hazard(Boot.samples, id='id', 100, marker_name1=marker_name1, marker_name2= marker_name2, event_time_name = event_time_name, time_name = time_name, event_name = event_name, in.par = c(par.x1, par.x2), b) res <- Boot.hrandindex.param( B, Boot.samples, marker_name1, marker_name2, event_time_name, time_name, event_name , b = 0.4, t = 1.0, true.haz = true.hazard, v.param = c(0.07, 0.08), n.est.points = 100) #return bootstrap hazard rate estimators in marix format: res
Calculates the Brier score for the HQM estimator.
bs.hqm(xin, est, landm, th, event_time_name, status_name)bs.hqm(xin, est, landm, th, event_time_name, status_name)
xin |
A data frame containing event times and the patient status. |
est |
The HQM estimator values, typically the output of |
landm |
Landmark time. |
th |
Time horizon. |
event_time_name |
The column name of the event times in the data frame |
status_name |
The column name of the status variable in the data frame |
The function bs.hqm implements the Brier score calculation for the HQM estimator estimator.
Scalar: the Brier score of the HQM estimator.
library(pec) library(survival) Landmark <- 2 pbcT1 <- pbc2[which(pbc2$year< Landmark & pbc2$years> Landmark),] ls<-50 # 50 grid points to evaluate the estimates s.out<- pbcT1[, 'year'] s.out.use <- seq(0, max(s.out), max(s.out)/( ls-1)) timesS2 <- seq(Landmark,14,by=0.5) b=0.9 arg1<- get_h_x(pbcT1, 'albumin', br_s = s.out.use, event_time_name = 'years', time_name = 'year', event_name = 'status2', 2, 0.9) br_s2 = seq(Landmark, 14, length=ls-1) sfalb2<- make_sf( (br_s2[2]-br_s2[1])/4 , arg1) tHor <- 1.5 bs.use<-bs.hqm(pbcT1, sfalb2, Landmark,tHor, event_time_name = 'years', status_name = 'status2') bs.uselibrary(pec) library(survival) Landmark <- 2 pbcT1 <- pbc2[which(pbc2$year< Landmark & pbc2$years> Landmark),] ls<-50 # 50 grid points to evaluate the estimates s.out<- pbcT1[, 'year'] s.out.use <- seq(0, max(s.out), max(s.out)/( ls-1)) timesS2 <- seq(Landmark,14,by=0.5) b=0.9 arg1<- get_h_x(pbcT1, 'albumin', br_s = s.out.use, event_time_name = 'years', time_name = 'year', event_name = 'status2', 2, 0.9) br_s2 = seq(Landmark, 14, length=ls-1) sfalb2<- make_sf( (br_s2[2]-br_s2[1])/4 , arg1) tHor <- 1.5 bs.use<-bs.hqm(pbcT1, sfalb2, Landmark,tHor, event_time_name = 'years', status_name = 'status2') bs.use
Performs bootstrap estimation of hazard rates and their standard deviation at each grid point (double boostrap) together with the corresponding index parameters
for a given set of bootstrap samples. The output of the function is used as input in StudentizedBwB.Index.CIs.
BwB.HRandIndex.param(B, B1, Boot.samples, marker_name1, marker_name2, event_time_name, time_name, event_name, b, t, true.haz, v.param, hqm.est, id, xin)BwB.HRandIndex.param(B, B1, Boot.samples, marker_name1, marker_name2, event_time_name, time_name, event_name, b, t, true.haz, v.param, hqm.est, id, xin)
B |
Integer. Number of bootstrap samples. |
B1 |
Integer. Number of bootstrap re-samples. |
Boot.samples |
A list of bootstrap datasets. Each element corresponds to one replicate. |
marker_name1 |
Character string. Name of the first longitudinal marker. |
marker_name2 |
Character string. Name of the second longitudinal marker. |
event_time_name |
Name of the event time variable in the data. |
time_name |
Name of the time variable for the longitudinal marker measurements. |
event_name |
Name of the event indicator variable. |
b |
Numeric. Bandwidth parameter used in hazard estimation. |
t |
Numeric. Evaluation point for the conditional hazard. |
true.haz |
Numeric vector. The true or reference hazard used in the optimisation criterion. |
v.param |
Numeric vector. Starting values of the indexing parameters for the optimisation of the index coefficients. |
hqm.est |
HQM estimator on the original sample. |
id |
label of id variable of dataset. |
xin |
original sample. |
For each bootstrap iteration , the function:
Extracts the bootstrap sample data.use.
Computes centred marker values at the subject and observation level.
Estimates index parameters by minimising index_optim using optim.
Computes the bootstrap hazard estimate via Boot.hqm.
The outputs are matrices collecting the hazard estimates and estimated index parameter vectors across bootstrap replicates.
A list of matrices of dimension n.est.points × B containing the bootstrap hazard estimates,
the logarithm of the hazard rate estimates and and two vectors of the estimate's standard deviations
at each grid point.
# See the example for function: StudentizedBwB.Index# See the example for function: StudentizedBwB.Index
Internal function implementing the influence function (iid) decomposition of the time-dependent Area Under the Curve (AUC) in the presence of right-censoring, using inverse probability of censoring weighting (IPCW).
This function reproduces the internal behavior of the corresponding routines
from the timeROC package. It supports both standard survival settings
and competing risks settings through automatic dispatch.
compute_iid_decomposition(t, n, cause, F01t, St, weights, T, delta, marker, MatInt0TcidhatMCksurEff)compute_iid_decomposition(t, n, cause, F01t, St, weights, T, delta, marker, MatInt0TcidhatMCksurEff)
t |
Numeric scalar. Time point at which the AUC is evaluated. |
n |
Integer. Sample size. |
cause |
Integer or numeric. Event type of interest. |
F01t |
Numeric. Estimated cumulative incidence function at time |
St |
Numeric. Estimated survival probability at time |
weights |
List. IPCW weights object returned by |
T |
Numeric vector. Observed event or censoring times. |
delta |
Integer or numeric vector. Event indicator:
|
marker |
Numeric vector. Prognostic marker or risk score. |
MatInt0TcidhatMCksurEff |
Numeric matrix. Influence function representation of the censoring
distribution, typically obtained from |
The function computes the iid (influence function) representation of the time-dependent AUC estimator based on IPCW methodology.
Two cases are handled internally:
Survival setting: only one event type is present (no competing risks).
Competing risks setting: multiple event types are present, and the AUC is defined using different control definitions.
The implementation relies on pairwise comparisons between cases and controls, weighted by inverse probability of censoring weights. The iid representation is derived from a U-statistic formulation and incorporates the influence of the censoring distribution via martingale-based corrections.
The computational complexity is quadratic in the sample size.
A list with the following components:
iid_representation_AUC |
Numeric vector of length |
iid_representation_AUCstar |
Numeric vector of length |
seAUC |
Numeric scalar. Estimated standard error of the AUC. |
seAUCstar |
Numeric scalar. Estimated standard error corresponding to |
computation_times |
Placeholder (numeric or |
This is an internal utility function and is not intended for direct use by end users. It is provided for reproducibility and to avoid dependency on archived packages.
Adapted from the timeROC package.
Blanche, P., Dartigues, J. F., and Jacqmin-Gadda, H. (2013). Estimating and comparing time-dependent areas under receiver operating characteristic curves for censored event times with competing risks. Statistics in Medicine, 32(30), 5381–5397.
Uno, H., Cai, T., Tian, L., and Wei, L. J. (2007). Evaluating prediction rules for t-year survivors with censored regression models. Journal of the American Statistical Association, 102(478), 527–537.
timeROC (in the original timeROC package),
pec::ipcw,
Compute.iid.KM
Calculates the AUC for the HQM estimator.
Compute.iid.KM(times,status)Compute.iid.KM(times,status)
times |
The vector of (censored) event-times. |
status |
The vector of event indicators at the corresponding value of the vector T. Censored observations must be denoted by the value 0. |
test
A vector of two values: the landmark time of the calculation and the AUC value.
Implements the uniform and pointwise confidence bands for the future conditional hazard rate based on the last observed marker measure.
Conf_bands(data, marker_name, event_time_name = 'years', time_name = 'year', event_name = 'status2', x, b)Conf_bands(data, marker_name, event_time_name = 'years', time_name = 'year', event_name = 'status2', x, b)
data |
A data frame of time dependent data points. Missing values are allowed. |
marker_name |
The column name of the marker values in the data frame |
event_time_name |
The column name of the event times in the data frame |
time_name |
The column name of the times the marker values were observed in the data frame |
event_name |
The column name of the events in the data frame |
x |
Numeric value of the last observed marker value. |
b |
Bandwidth. |
The function Conf_bands implements the pointwise and uniform confidence bands for the estimator of the future conditional hazard rate . The confidence bands are based on a wild bootstrap approach .
Pointwise:
For a given generate for and order it . Then
is a pointwise confidence band for , where is a bootrap estimate of the variance. For more details on the wild bootstrap approach, please see prep_boot and g_xt.
Uniform:
Generate for for all and define for . Order . Then
is a uniform confidence band for .
A list with pointwise, uniform confidence bands and the estimator for all possible time points .
b = 10 x = 3 size_s_grid<-100 s = pbc2$year br_s = seq(0, max(s), max(s)/( size_s_grid-1)) c_bands = Conf_bands(pbc2, 'serBilir', event_time_name = 'years', time_name = 'year', event_name = 'status2', x, b) J = 80 plot(br_s[1:J], c_bands$h_hat[1:J], type = "l", ylim = c(0,1), ylab = 'Hazard', xlab = 'Years') lines(br_s[1:J], c_bands$I_p_up[1:J], col = "red") lines(br_s[1:J], c_bands$I_p_do[1:J], col = "red") lines(br_s[1:J], c_bands$I_nu[1:J], col = "blue") lines(br_s[1:J], c_bands$I_nd[1:J], col = "blue")b = 10 x = 3 size_s_grid<-100 s = pbc2$year br_s = seq(0, max(s), max(s)/( size_s_grid-1)) c_bands = Conf_bands(pbc2, 'serBilir', event_time_name = 'years', time_name = 'year', event_name = 'status2', x, b) J = 80 plot(br_s[1:J], c_bands$h_hat[1:J], type = "l", ylim = c(0,1), ylab = 'Hazard', xlab = 'Years') lines(br_s[1:J], c_bands$I_p_up[1:J], col = "red") lines(br_s[1:J], c_bands$I_p_do[1:J], col = "red") lines(br_s[1:J], c_bands$I_nu[1:J], col = "blue") lines(br_s[1:J], c_bands$I_nd[1:J], col = "blue")
Creates multiple splits of a dataset which is then used in the bandwidth selection with K-fold cross validation.
dataset_split(I, data)dataset_split(I, data)
data |
A data frame of time dependent data points. Missing values are allowed. |
I |
The number of individuals that should be left out. Optimally, |
The function dataset_split takes a data frame and transforms it into data frames with individuals missing from each data frame. Let be sets of indices with , and for all . Then data frames with individuals are created.
A list of data frames with I individuals missing in the above way.
splitted_dataset = dataset_split(26, pbc2)splitted_dataset = dataset_split(26, pbc2)
Calculates the entries of the matrix in the definition of the local linear kernel
dij(b,x,y, K)dij(b,x,y, K)
x |
A vector of design points where the kernel will be evaluated. |
y |
A vector of sample data points. |
b |
The bandwidth to use (a scalar). |
K |
The kernel function to use. |
Implements the caclulation of all entries of matrix , which is part of the definition of the local linear kernel. The actual calculation is performed by
scalar value, the result of .
Implements the Epanechnikov kernel function
Epan(x)Epan(x)
x |
A vector of design points where the kernel will be evaluated. |
Implements the Epanechnikov kernel function
Scalar, the value of the Epanechnikov kernel at .
Implements a key part for the wild bootstrap of the hqm estimator.
g_xt(br_X, br_s, size_s_grid, int_X, x, t, b, Yi, Y, n)g_xt(br_X, br_s, size_s_grid, int_X, x, t, b, Yi, Y, n)
br_X |
Marker value grid points that will be used in the evaluatiuon. |
br_s |
Time value grid points that will be used in the evaluatiuon. |
size_s_grid |
Size of the time grid. |
int_X |
Position of the linear interpolated marker values on the marker grid. |
x |
Numeric value of the last observed marker value. |
t |
Numeric value of the time the function should be evaluated. |
b |
Bandwidth. |
Yi |
A matrix made by |
Y |
A matrix made by |
n |
Number of individuals. |
The function implements
for every value on the marker grid, where , the exposure and the marker.
A vector of for all values on the marker grid.
pbc2_id = to_id(pbc2) size_s_grid <- size_X_grid <- 100 n = max(as.numeric(pbc2$id)) X = pbc2$serBilir s = pbc2$year br_s = seq(0, max(s), max(s)/( size_s_grid-1)) br_X = seq(min(X), max(X), (max(X)-min(X))/( size_X_grid-1)) X_lin = lin_interpolate(br_s, pbc2_id$id, pbc2$id, X, s) int_X <- findInterval(X_lin, br_X) int_s = rep(1:length(br_s), n) Yi<-make_Yi(pbc2, pbc2_id, X_lin, br_X, br_s, size_s_grid, size_X_grid, int_s, int_X, 'years', n) Y<-make_Y(pbc2, pbc2_id, X_lin, br_X, br_s,size_s_grid,size_X_grid, int_s,int_X, 'years', n) t = 2 x = 2 b = 10 g_xt(br_X, br_s, size_s_grid, int_X, x, t, b, Yi, Y, n)pbc2_id = to_id(pbc2) size_s_grid <- size_X_grid <- 100 n = max(as.numeric(pbc2$id)) X = pbc2$serBilir s = pbc2$year br_s = seq(0, max(s), max(s)/( size_s_grid-1)) br_X = seq(min(X), max(X), (max(X)-min(X))/( size_X_grid-1)) X_lin = lin_interpolate(br_s, pbc2_id$id, pbc2$id, X, s) int_X <- findInterval(X_lin, br_X) int_s = rep(1:length(br_s), n) Yi<-make_Yi(pbc2, pbc2_id, X_lin, br_X, br_s, size_s_grid, size_X_grid, int_s, int_X, 'years', n) Y<-make_Y(pbc2, pbc2_id, X_lin, br_X, br_s,size_s_grid,size_X_grid, int_s,int_X, 'years', n) t = 2 x = 2 b = 10 g_xt(br_X, br_s, size_s_grid, int_X, x, t, b, Yi, Y, n)
Calculates the marker-only hazard rate for time dependent data.
get_alpha(N, Y, b, br_X, K=Epan )get_alpha(N, Y, b, br_X, K=Epan )
N |
A matrix made by |
Y |
A matrix made by |
b |
Bandwidth. |
br_X |
Vector of grid points for the marker values |
K |
Used kernel function. |
The function get_alpha implements the marker-only hazard estimator
where is the marker and is the exposure. The marker-only hazard is defined as the underlying hazard which is not dependent on time
A vector of marker-only values for br_X.
pbc2_id = to_id(pbc2) size_s_grid <- size_X_grid <- 100 n = max(as.numeric(pbc2$id)) s = pbc2$year X = pbc2$serBilir XX = pbc2_id$serBilir ss <- pbc2_id$years delta <- pbc2_id$status2 br_s = seq(0, max(s), max(s)/( size_s_grid-1)) br_X = seq(min(X), max(X), (max(X)-min(X))/( size_X_grid-1)) X_lin = lin_interpolate(br_s, pbc2_id$id, pbc2$id, X, s) int_X <- findInterval(X_lin, br_X) int_s = rep(1:length(br_s), n) N <- make_N(pbc2, pbc2_id, breaks_X=br_X, breaks_s=br_s, ss, XX, delta) Y <- make_Y(pbc2, pbc2_id, X_lin, br_X, br_s, size_s_grid, size_X_grid, int_s, int_X, event_time = 'years', n) b = 1.7 alpha<-get_alpha(N, Y, b, br_X, K=Epan )pbc2_id = to_id(pbc2) size_s_grid <- size_X_grid <- 100 n = max(as.numeric(pbc2$id)) s = pbc2$year X = pbc2$serBilir XX = pbc2_id$serBilir ss <- pbc2_id$years delta <- pbc2_id$status2 br_s = seq(0, max(s), max(s)/( size_s_grid-1)) br_X = seq(min(X), max(X), (max(X)-min(X))/( size_X_grid-1)) X_lin = lin_interpolate(br_s, pbc2_id$id, pbc2$id, X, s) int_X <- findInterval(X_lin, br_X) int_s = rep(1:length(br_s), n) N <- make_N(pbc2, pbc2_id, breaks_X=br_X, breaks_s=br_s, ss, XX, delta) Y <- make_Y(pbc2, pbc2_id, X_lin, br_X, br_s, size_s_grid, size_X_grid, int_s, int_X, event_time = 'years', n) b = 1.7 alpha<-get_alpha(N, Y, b, br_X, K=Epan )
Calculates the (indexed) local constant future hazard rate function, conditional on a marker value x, across across a set of time values t.
get_h_x(data, marker_name, br_s, event_time_name, time_name, event_name, x, b)get_h_x(data, marker_name, br_s, event_time_name, time_name, event_name, x, b)
data |
A data frame of time dependent data points. Missing values are allowed. |
marker_name |
The column name of the marker values in the data frame |
br_s |
User defined grid mesh on time_name variable |
event_time_name |
The column name of the event times in the data frame |
time_name |
The column name of the times the marker values were observed in the data frame |
event_name |
The column name of the events in the data frame |
x |
Numeric value of the last observed marker value. |
b |
Bandwidth parameter. |
The function get_h_x implements the indexed local linear future conditional hazard estimator
across a grid of possible time values , where, for a positive integer , is the vector of the estimated indexing parameters,
is a vector of markers for indexing, is the exposure and is the marker-only hazard, see get_alpha for more details
and is an ordinary kernel function, e.g. the Epanechnikov kernel.
For and the above estimator becomes the HQM hazard rate estimator conditional on one covariate,
defined in equation (2) of Bagkavos et al. (2025), doi:10.1093/biomet/asaf008.
A vector of for a grid of possible time values .
Bagkavos, I., Isakson, R., Mammen, E., Nielsen, J., and Proust–Lima, C. (2025). Biometrika, 112(2), asaf008. doi:10.1093/biomet/asaf008
library(survival) b = 10 x = 3 Landmark <- 2 pbcT1 <- pbc2[which(pbc2$year< Landmark & pbc2$years> Landmark),] ls<-50 # 50 grid points to evaluate the estimates s.out<- pbcT1[, 'year'] s.out.use <- seq(0, max(s.out), max(s.out)/( ls-1)) b=0.9 arg1ll<-get_h_xll(pbcT1,'albumin', br_s = s.out.use, event_time_name='years', time_name='year',event_name='status2',2,0.9) arg1lc<-get_h_x(pbcT1,'albumin', br_s = s.out.use, event_time_name='years', time_name='year',event_name='status2',2,0.9) #Caclulate the local contant and local linear survival functions br_s = seq(Landmark, 14, length= ls-1 ) sfalb2ll<- make_sf( (br_s[2]-br_s[1])/4 , arg1ll) sfalb2lc<- make_sf( (br_s[2]-br_s[1])/4 , arg1lc) #For comparison, also calculate the Kaplan-Meier kma2<- survfit(Surv(years , status2) ~ 1, data = pbcT1) #Plot the survival functions: plot(br_s, sfalb2ll, type="l", col=1, lwd=2, ylab="Survival probability", xlab="Marker level") lines(br_s, sfalb2lc, lty=2, lwd=2, col=2) lines(kma2$time, kma2$surv, type="s", lty=2, lwd=2, col=3) legend("topright", c( "Local linear HQM", "Local constant HQM", "Kaplan-Meier"), lty=c(1, 2, 2), col=1:3, lwd=2, cex=1.7) ## Not run: #Example of get_h_x with two indexed covariates: #First, estimate the joint model for Albumin and Bilirubin combined: library(JM) lmeFit <- lme(albumin + serBilir~ year, random = ~ year | id, data = pbc2) coxFit <- coxph(Surv(years, status2) ~ albumin + serBilir, data = pbc2.id, x = TRUE) jointFit <- jointModel(lmeFit, coxFit, timeVar = "year", method = "piecewise-PH-aGH") Landmark <- 1 pbcT1 <- pbc2[which(pbc2$year< Landmark & pbc2$years> Landmark),] ls<-50 # 50 grid points to evaluate the estimates s.out<- pbcT1[, 'year'] s.out.use <- seq(0, max(s.out), max(s.out)/( ls-1)) # Index Albumin and Bilirubin: t.alb = 3 # slightly low albumin value t.bil = 1.9 # slightly high bilirubin value par.alb <- 0.0702 par.bil <- 0.0856 X = par.alb * pbcT1$albumin + par.bil *pbcT1$serBilir # X is now the indexed marker t = par.alb * t.alb + par.bil *t.bil #conditioning value pbcT1$drug<- X ## store the combine variable in place of 'drug' column which is redundant ## i.e. 'drug' corresponds to indexed bilirubin and albumin timesS2 <- seq(Landmark,14,by=0.5) predT1 <- survfitJM(jointFit, newdata = pbcT1, survTimes = timesS2) nm<-length(predT1$summaries) mat.out1<-matrix(nrow=length(timesS2), ncol=nm) for(r in 1:nm) { SurvLand <- predT1$summaries[[r]][,"Mean"][1] #obtain mean predictions mat.out1[,r] <- predT1$summaries[[r]][,"Mean"]/SurvLand } JM.surv.est<-rowMeans(mat.out1, na.rm=TRUE) #average the resulting JM estimates # calculate indexed local linear HQM estimator for bilirubin and albumin b.alb = 1.5 b.bil = 4 b.hqm = par.alb * b.alb + par.bil *b.bil # bandwidth for HQM estimator arg1<- get_h_x(pbcT1, 'drug', br_s =s.out.use, event_time_name = 'years', time_name = 'year', event_name = 'status2', t, b.hqm) br_s2 = seq(Landmark, 14, length=ls-1) #grid points for HMQ estimator hqm.surv.est<- make_sf((br_s2[2]-br_s2[1])/5, arg1) # transform HR to Survival func. km.land<- survfit(Surv(years , status2) ~ 1, data = pbcT1) #KM estimate #Plot the survival functions: plot(br_s2, hqm.surv.est, type="l", ylim=c(0,1), xlim=c(Landmark,14), ylab="Survival probability", xlab="years",lwd=2) lines(timesS2, JM.surv.est, col=2, lwd=2, lty=2) lines(km.land$time, km.land$surv, type="s",lty=2, lwd=2, col=3) legend("bottomleft", c("HQM est.", "Joint Model est.", "Kaplan-Meier"), lty=c(1,2,2), col=1:3, lwd=2, cex=1.7) ## End(Not run)library(survival) b = 10 x = 3 Landmark <- 2 pbcT1 <- pbc2[which(pbc2$year< Landmark & pbc2$years> Landmark),] ls<-50 # 50 grid points to evaluate the estimates s.out<- pbcT1[, 'year'] s.out.use <- seq(0, max(s.out), max(s.out)/( ls-1)) b=0.9 arg1ll<-get_h_xll(pbcT1,'albumin', br_s = s.out.use, event_time_name='years', time_name='year',event_name='status2',2,0.9) arg1lc<-get_h_x(pbcT1,'albumin', br_s = s.out.use, event_time_name='years', time_name='year',event_name='status2',2,0.9) #Caclulate the local contant and local linear survival functions br_s = seq(Landmark, 14, length= ls-1 ) sfalb2ll<- make_sf( (br_s[2]-br_s[1])/4 , arg1ll) sfalb2lc<- make_sf( (br_s[2]-br_s[1])/4 , arg1lc) #For comparison, also calculate the Kaplan-Meier kma2<- survfit(Surv(years , status2) ~ 1, data = pbcT1) #Plot the survival functions: plot(br_s, sfalb2ll, type="l", col=1, lwd=2, ylab="Survival probability", xlab="Marker level") lines(br_s, sfalb2lc, lty=2, lwd=2, col=2) lines(kma2$time, kma2$surv, type="s", lty=2, lwd=2, col=3) legend("topright", c( "Local linear HQM", "Local constant HQM", "Kaplan-Meier"), lty=c(1, 2, 2), col=1:3, lwd=2, cex=1.7) ## Not run: #Example of get_h_x with two indexed covariates: #First, estimate the joint model for Albumin and Bilirubin combined: library(JM) lmeFit <- lme(albumin + serBilir~ year, random = ~ year | id, data = pbc2) coxFit <- coxph(Surv(years, status2) ~ albumin + serBilir, data = pbc2.id, x = TRUE) jointFit <- jointModel(lmeFit, coxFit, timeVar = "year", method = "piecewise-PH-aGH") Landmark <- 1 pbcT1 <- pbc2[which(pbc2$year< Landmark & pbc2$years> Landmark),] ls<-50 # 50 grid points to evaluate the estimates s.out<- pbcT1[, 'year'] s.out.use <- seq(0, max(s.out), max(s.out)/( ls-1)) # Index Albumin and Bilirubin: t.alb = 3 # slightly low albumin value t.bil = 1.9 # slightly high bilirubin value par.alb <- 0.0702 par.bil <- 0.0856 X = par.alb * pbcT1$albumin + par.bil *pbcT1$serBilir # X is now the indexed marker t = par.alb * t.alb + par.bil *t.bil #conditioning value pbcT1$drug<- X ## store the combine variable in place of 'drug' column which is redundant ## i.e. 'drug' corresponds to indexed bilirubin and albumin timesS2 <- seq(Landmark,14,by=0.5) predT1 <- survfitJM(jointFit, newdata = pbcT1, survTimes = timesS2) nm<-length(predT1$summaries) mat.out1<-matrix(nrow=length(timesS2), ncol=nm) for(r in 1:nm) { SurvLand <- predT1$summaries[[r]][,"Mean"][1] #obtain mean predictions mat.out1[,r] <- predT1$summaries[[r]][,"Mean"]/SurvLand } JM.surv.est<-rowMeans(mat.out1, na.rm=TRUE) #average the resulting JM estimates # calculate indexed local linear HQM estimator for bilirubin and albumin b.alb = 1.5 b.bil = 4 b.hqm = par.alb * b.alb + par.bil *b.bil # bandwidth for HQM estimator arg1<- get_h_x(pbcT1, 'drug', br_s =s.out.use, event_time_name = 'years', time_name = 'year', event_name = 'status2', t, b.hqm) br_s2 = seq(Landmark, 14, length=ls-1) #grid points for HMQ estimator hqm.surv.est<- make_sf((br_s2[2]-br_s2[1])/5, arg1) # transform HR to Survival func. km.land<- survfit(Surv(years , status2) ~ 1, data = pbcT1) #KM estimate #Plot the survival functions: plot(br_s2, hqm.surv.est, type="l", ylim=c(0,1), xlim=c(Landmark,14), ylab="Survival probability", xlab="years",lwd=2) lines(timesS2, JM.surv.est, col=2, lwd=2, lty=2) lines(km.land$time, km.land$surv, type="s",lty=2, lwd=2, col=3) legend("bottomleft", c("HQM est.", "Joint Model est.", "Kaplan-Meier"), lty=c(1,2,2), col=1:3, lwd=2, cex=1.7) ## End(Not run)
Calculates the (indexed) local linear future hazard rate function, conditional on a marker value x, across a set of time values t.
get_h_xll(data, marker_name, br_s, event_time_name, time_name, event_name, x, b)get_h_xll(data, marker_name, br_s, event_time_name, time_name, event_name, x, b)
data |
A data frame of time dependent data points. Missing values are allowed. |
marker_name |
The column name of the marker values in the data frame |
br_s |
User defined grid mesh on time_name variable |
event_time_name |
The column name of the event times in the data frame |
time_name |
The column name of the times the marker values were observed in the data frame |
event_name |
The column name of the events in the data frame |
x |
Numeric value of the last observed marker value. |
b |
Bandwidth parameter. |
The function get_h_xll uses a local linear kernel to implement the indexed local linear future conditional hazard estimator
across a grid of possible time values , where, for a positive integer , is the vector of the estimated indexing parameters,
is a vector of markers for indexing, is the exposure and is the marker-only hazard, see get_alpha for more details.
For and the above estimator becomes the HQM hazard rate estimator conditional on one covariate,
defined in equation (2) of Bagkavos et al. (2025) doi:10.1093/biomet/asaf008. In the place of , get_h_xll uses the kernel
where with being an ordinary kernel, e.g. the Epanechnikov kernel, with
see also Nielsen (1998), doi:10.1080/03461238.1998.10413997.
A vector of for a grid of possible time values .
Bagkavos, I., Isakson, R., Mammen, E., Nielsen, J., and Proust–Lima, C. (2025). Biometrika, 112(2), asaf008. doi:10.1093/biomet/asaf008
Nielsen (1998), Marker dependent kernel hazard estimation from local linear estimation, Scandinavian Actuarial Journal, pp. 113-124. doi:10.1080/03461238.1998.10413997
library(survival) library(JM) # Compare Local constant and local linear estimator for a single covariate, # use KM for reference. # Albumin marker, use landmarking Landmark <- 2 pbcT1 <- pbc2[which(pbc2$year< Landmark & pbc2$years> Landmark),] b=0.9 ls<-50 # 50 grid points to evaluate the estimates s.out<- pbcT1[, 'year'] s.out.use <- seq(0, max(s.out), max(s.out)/( ls-1)) arg1ll<-get_h_xll(pbcT1, 'albumin', br_s = s.out.use, event_time_name = 'years', time_name = 'year', event_name = 'status2', 2, 0.9) arg1lc<-get_h_x(pbcT1, 'albumin', br_s = s.out.use, event_time_name = 'years', time_name = 'year', event_name = 'status2', 2, 0.9) #Calculate the local contant and local linear survival functions br_s = seq(Landmark, 14, length=ls-1) sfalb2ll<- make_sf((br_s[2]-br_s[1])/4 , arg1ll) sfalb2lc<- make_sf((br_s[2]-br_s[1])/4 , arg1lc) #For comparison, also calculate the Kaplan-Meier kma2<- survfit(Surv(years , status2) ~ 1, data = pbcT1) #Plot the survival functions: plot(br_s, sfalb2ll, type="l", col=1, lwd=2, ylab="Survival probability", xlab="Marker level") lines(br_s, sfalb2lc, lty=2, lwd=2, col=2) lines(kma2$time, kma2$surv, type="s", lty=2, lwd=2, col=3) legend("topright", c( "Local linear HQM", "Local constant HQM", "Kaplan-Meier"), lty=c(1, 2, 2), col=1:3, lwd=2, cex=1.7) ## Not run: #Example of get_h_xll with a single covariate (no indexing): #Compare JM, HQM and KM for Bilirubin b = 10 Landmark <- 1 lmeFit <- lme(serBilir ~ year, random = ~ year | id, data = pbc2) coxFit <- coxph(Surv(years, status2) ~ serBilir, data = pbc2.id, x = TRUE) jointFit0 <- jointModel(lmeFit, coxFit, timeVar = "year", method = "piecewise-PH-aGH") pbcT1 <- pbc2[which(pbc2$year< Landmark & pbc2$years> Landmark),] ls<-50 # 50 grid points to evaluate the estimates s.out<- pbcT1[, 'year'] s.out.use <- seq(0, max(s.out), max(s.out)/( ls-1)) timesS1 <- seq(1,14,by=0.5) predT1 <- survfitJM(jointFit0, newdata = pbcT1,survTimes = timesS1) nm<-length(predT1$summaries) mat.out1<-matrix(nrow=length(timesS1), ncol=nm) for(r in 1:nm) { SurvLand <- predT1$summaries[[r]][,"Mean"][1] mat.out1[,r] <- predT1$summaries[[r]][,"Mean"]/SurvLand } sfit1y<-rowMeans(mat.out1, na.rm=TRUE) arg1<- get_h_xll(pbcT1, 'serBilir', br_s= s.out.use, event_time_name = 'years', time_name = 'year', event_name = 'status2', 1, 10) br_s1 = seq(Landmark, 14, length=ls-1) sfbil1<- make_sf((br_s1[2]-br_s1[1])/5.4 , arg1) kma1<- survfit(Surv(years , status2) ~ 1, data = pbcT1) plot(br_s1, sfbil1, type="l", ylim=c(0,1), xlim=c(Landmark,14), ylab="Survival probability", xlab="years",lwd=2) lines(timesS1, sfit1y, col=2, lwd=2, lty=2) lines(kma1$time, kma1$surv, type="s", lty=2, lwd=2, col=3 ) legend("bottomleft", c("HQM est.", "Joint Model est.", "Kaplan-Meier"), lty=c(1,2,2), col=1:3, lwd=2, cex=1.7) #Example of get_h_xll with two indexed covariates: #First, estimate the joint model for Albumin and Bilirubin combined: lmeFit <- lme(albumin + serBilir~ year, random = ~ year | id, data = pbc2) coxFit <- coxph(Surv(years, status2) ~ albumin + serBilir, data = pbc2.id, x = TRUE) jointFit <- jointModel(lmeFit, coxFit, timeVar = "year", method = "piecewise-PH-aGH") Landmark <- 1 pbcT1 <- pbc2[which(pbc2$year< Landmark & pbc2$years> Landmark),] ls<-50 # 50 grid points to evaluate the estimates s.out<- pbcT1[, 'year'] s.out.use <- seq(0, max(s.out), max(s.out)/( ls-1)) # Index Albumin and Bilirubin: t.alb = 3 # slightly low albumin value t.bil = 1.9 # slightly high bilirubin value par.alb <- 0.0702 par.bil <- 0.0856 X = par.alb * pbcT1$albumin + par.bil *pbcT1$serBilir # X is now the indexed marker t = par.alb * t.alb + par.bil *t.bil #conditioning value pbcT1$drug<- X ## store X in place of 'drug' column which is redundant here ## i.e. 'drug' corresponds to indexed bilirubin and albumin timesS2 <- seq(Landmark,14,by=0.5) predT1 <- survfitJM(jointFit, newdata = pbcT1,survTimes = timesS2) nm<-length(predT1$summaries) mat.out1<-matrix(nrow=length(timesS2), ncol=nm) for(r in 1:nm) { SurvLand <- predT1$summaries[[r]][,"Mean"][1] #obtain mean predictions mat.out1[,r] <- predT1$summaries[[r]][,"Mean"]/SurvLand } JM.surv.est<-rowMeans(mat.out1, na.rm=TRUE) #average the resulting JM estimates # calculate indexed local linear HQM estimator for bilirubin and albumin b.alb = 1.5 b.bil = 4 b.hqm = par.alb * b.alb + par.bil *b.bil # bandwidth for HQM estimator arg1<- get_h_xll(pbcT1, 'drug', br_s = s.out.use, event_time_name = 'years', time_name = 'year', event_name = 'status2', t, b.hqm) br_s2 = seq(Landmark, 14, length=ls-1) #grid points for HMQ estimator hqm.surv.est<- make_sf((br_s2[2]-br_s2[1])/5 ,arg1) # transform HR to Survival func. km.land<- survfit(Surv(years , status2) ~ 1, data = pbcT1) #KM estimate #Plot the survival functions: plot(br_s2, hqm.surv.est, type="l", ylim=c(0,1), xlim=c(Landmark,14), ylab="Survival probability", xlab="years",lwd=2) lines(timesS2, JM.surv.est, col=2, lwd=2, lty=2) lines(km.land$time, km.land$surv, type="s",lty=2, lwd=2, col=3) legend("bottomleft", c("HQM est.", "Joint Model est.", "Kaplan-Meier"), lty=c(1,2,2), col=1:3, lwd=2, cex=1.7) ## End(Not run)library(survival) library(JM) # Compare Local constant and local linear estimator for a single covariate, # use KM for reference. # Albumin marker, use landmarking Landmark <- 2 pbcT1 <- pbc2[which(pbc2$year< Landmark & pbc2$years> Landmark),] b=0.9 ls<-50 # 50 grid points to evaluate the estimates s.out<- pbcT1[, 'year'] s.out.use <- seq(0, max(s.out), max(s.out)/( ls-1)) arg1ll<-get_h_xll(pbcT1, 'albumin', br_s = s.out.use, event_time_name = 'years', time_name = 'year', event_name = 'status2', 2, 0.9) arg1lc<-get_h_x(pbcT1, 'albumin', br_s = s.out.use, event_time_name = 'years', time_name = 'year', event_name = 'status2', 2, 0.9) #Calculate the local contant and local linear survival functions br_s = seq(Landmark, 14, length=ls-1) sfalb2ll<- make_sf((br_s[2]-br_s[1])/4 , arg1ll) sfalb2lc<- make_sf((br_s[2]-br_s[1])/4 , arg1lc) #For comparison, also calculate the Kaplan-Meier kma2<- survfit(Surv(years , status2) ~ 1, data = pbcT1) #Plot the survival functions: plot(br_s, sfalb2ll, type="l", col=1, lwd=2, ylab="Survival probability", xlab="Marker level") lines(br_s, sfalb2lc, lty=2, lwd=2, col=2) lines(kma2$time, kma2$surv, type="s", lty=2, lwd=2, col=3) legend("topright", c( "Local linear HQM", "Local constant HQM", "Kaplan-Meier"), lty=c(1, 2, 2), col=1:3, lwd=2, cex=1.7) ## Not run: #Example of get_h_xll with a single covariate (no indexing): #Compare JM, HQM and KM for Bilirubin b = 10 Landmark <- 1 lmeFit <- lme(serBilir ~ year, random = ~ year | id, data = pbc2) coxFit <- coxph(Surv(years, status2) ~ serBilir, data = pbc2.id, x = TRUE) jointFit0 <- jointModel(lmeFit, coxFit, timeVar = "year", method = "piecewise-PH-aGH") pbcT1 <- pbc2[which(pbc2$year< Landmark & pbc2$years> Landmark),] ls<-50 # 50 grid points to evaluate the estimates s.out<- pbcT1[, 'year'] s.out.use <- seq(0, max(s.out), max(s.out)/( ls-1)) timesS1 <- seq(1,14,by=0.5) predT1 <- survfitJM(jointFit0, newdata = pbcT1,survTimes = timesS1) nm<-length(predT1$summaries) mat.out1<-matrix(nrow=length(timesS1), ncol=nm) for(r in 1:nm) { SurvLand <- predT1$summaries[[r]][,"Mean"][1] mat.out1[,r] <- predT1$summaries[[r]][,"Mean"]/SurvLand } sfit1y<-rowMeans(mat.out1, na.rm=TRUE) arg1<- get_h_xll(pbcT1, 'serBilir', br_s= s.out.use, event_time_name = 'years', time_name = 'year', event_name = 'status2', 1, 10) br_s1 = seq(Landmark, 14, length=ls-1) sfbil1<- make_sf((br_s1[2]-br_s1[1])/5.4 , arg1) kma1<- survfit(Surv(years , status2) ~ 1, data = pbcT1) plot(br_s1, sfbil1, type="l", ylim=c(0,1), xlim=c(Landmark,14), ylab="Survival probability", xlab="years",lwd=2) lines(timesS1, sfit1y, col=2, lwd=2, lty=2) lines(kma1$time, kma1$surv, type="s", lty=2, lwd=2, col=3 ) legend("bottomleft", c("HQM est.", "Joint Model est.", "Kaplan-Meier"), lty=c(1,2,2), col=1:3, lwd=2, cex=1.7) #Example of get_h_xll with two indexed covariates: #First, estimate the joint model for Albumin and Bilirubin combined: lmeFit <- lme(albumin + serBilir~ year, random = ~ year | id, data = pbc2) coxFit <- coxph(Surv(years, status2) ~ albumin + serBilir, data = pbc2.id, x = TRUE) jointFit <- jointModel(lmeFit, coxFit, timeVar = "year", method = "piecewise-PH-aGH") Landmark <- 1 pbcT1 <- pbc2[which(pbc2$year< Landmark & pbc2$years> Landmark),] ls<-50 # 50 grid points to evaluate the estimates s.out<- pbcT1[, 'year'] s.out.use <- seq(0, max(s.out), max(s.out)/( ls-1)) # Index Albumin and Bilirubin: t.alb = 3 # slightly low albumin value t.bil = 1.9 # slightly high bilirubin value par.alb <- 0.0702 par.bil <- 0.0856 X = par.alb * pbcT1$albumin + par.bil *pbcT1$serBilir # X is now the indexed marker t = par.alb * t.alb + par.bil *t.bil #conditioning value pbcT1$drug<- X ## store X in place of 'drug' column which is redundant here ## i.e. 'drug' corresponds to indexed bilirubin and albumin timesS2 <- seq(Landmark,14,by=0.5) predT1 <- survfitJM(jointFit, newdata = pbcT1,survTimes = timesS2) nm<-length(predT1$summaries) mat.out1<-matrix(nrow=length(timesS2), ncol=nm) for(r in 1:nm) { SurvLand <- predT1$summaries[[r]][,"Mean"][1] #obtain mean predictions mat.out1[,r] <- predT1$summaries[[r]][,"Mean"]/SurvLand } JM.surv.est<-rowMeans(mat.out1, na.rm=TRUE) #average the resulting JM estimates # calculate indexed local linear HQM estimator for bilirubin and albumin b.alb = 1.5 b.bil = 4 b.hqm = par.alb * b.alb + par.bil *b.bil # bandwidth for HQM estimator arg1<- get_h_xll(pbcT1, 'drug', br_s = s.out.use, event_time_name = 'years', time_name = 'year', event_name = 'status2', t, b.hqm) br_s2 = seq(Landmark, 14, length=ls-1) #grid points for HMQ estimator hqm.surv.est<- make_sf((br_s2[2]-br_s2[1])/5 ,arg1) # transform HR to Survival func. km.land<- survfit(Surv(years , status2) ~ 1, data = pbcT1) #KM estimate #Plot the survival functions: plot(br_s2, hqm.surv.est, type="l", ylim=c(0,1), xlim=c(Landmark,14), ylab="Survival probability", xlab="years",lwd=2) lines(timesS2, JM.surv.est, col=2, lwd=2, lty=2) lines(km.land$time, km.land$surv, type="s",lty=2, lwd=2, col=3) legend("bottomleft", c("HQM est.", "Joint Model est.", "Kaplan-Meier"), lty=c(1,2,2), col=1:3, lwd=2, cex=1.7) ## End(Not run)
Calculates the (indexed) future conditional hazard rate for a marker value x and a time value t.
h_xt(br_X, br_s, int_X, size_s_grid, alpha, x,t, b, Yi,n)h_xt(br_X, br_s, int_X, size_s_grid, alpha, x,t, b, Yi,n)
br_X |
Vector of grid points for the marker values |
br_s |
Vector of grid points for the time values |
int_X |
Position of the linear interpolated marker values on the marker grid. |
size_s_grid |
Size of the time grid. |
alpha |
Marker-hazard obtained from |
x |
Numeric value of the last observed marker value. |
t |
Numeric time value. |
b |
Bandwidth. |
Yi |
A matrix made by |
n |
Number of individuals. |
Function h_xt implements the future conditional hazard estimator
where, for a positive integer , is the vector of the estimated indexing parameters,
is a vector of markers for indexing, is the exposure and is the marker-only hazard, see get_alpha for more details
and is an ordinary kernel function, e.g. the Epanechnikov kernel. For and the above estimator becomes the HQM hazard rate estimator conditional on one covariate,
defined in equation (2) of Bagkavos et al. (2025) doi:10.1093/biomet/asaf008.
The future conditional hazard is defined as
where is the survival time and the marker of individual observed in the time frame .
A single numeric value of .
Bagkavos, I., Isakson, R., Mammen, E., Nielsen, J., and Proust–Lima, C. (2025). Biometrika, 112(2), asaf008. doi:10.1093/biomet/asaf008
pbc2_id = to_id(pbc2) size_s_grid <- size_X_grid <- 100 n = max(as.numeric(pbc2$id)) s = pbc2$year X = pbc2$serBilir XX = pbc2_id$serBilir ss <- pbc2_id$years delta <- pbc2_id$status2 br_s = seq(0, max(s), max(s)/( size_s_grid-1)) br_X = seq(min(X), max(X), (max(X)-min(X))/( size_X_grid-1)) X_lin = lin_interpolate(br_s, pbc2_id$id, pbc2$id, X, s) int_X <- findInterval(X_lin, br_X) int_s = rep(1:length(br_s), n) N <- make_N(pbc2, pbc2_id, breaks_X=br_X, breaks_s=br_s, ss, XX, delta) Y <- make_Y(pbc2, pbc2_id, X_lin, br_X, br_s, size_s_grid, size_X_grid, int_s, int_X, 'years', n) b = 1.7 alpha<-get_alpha(N, Y, b, br_X, K=Epan ) Yi <- make_Yi(pbc2, pbc2_id, X_lin, br_X, br_s, size_s_grid, size_X_grid, int_s, int_X, 'years', n) x = 2 t = 2 h_hat = h_xt(br_X, br_s, int_X, size_s_grid, alpha, x, t, b, Yi, n)pbc2_id = to_id(pbc2) size_s_grid <- size_X_grid <- 100 n = max(as.numeric(pbc2$id)) s = pbc2$year X = pbc2$serBilir XX = pbc2_id$serBilir ss <- pbc2_id$years delta <- pbc2_id$status2 br_s = seq(0, max(s), max(s)/( size_s_grid-1)) br_X = seq(min(X), max(X), (max(X)-min(X))/( size_X_grid-1)) X_lin = lin_interpolate(br_s, pbc2_id$id, pbc2$id, X, s) int_X <- findInterval(X_lin, br_X) int_s = rep(1:length(br_s), n) N <- make_N(pbc2, pbc2_id, breaks_X=br_X, breaks_s=br_s, ss, XX, delta) Y <- make_Y(pbc2, pbc2_id, X_lin, br_X, br_s, size_s_grid, size_X_grid, int_s, int_X, 'years', n) b = 1.7 alpha<-get_alpha(N, Y, b, br_X, K=Epan ) Yi <- make_Yi(pbc2, pbc2_id, X_lin, br_X, br_s, size_s_grid, size_X_grid, int_s, int_X, 'years', n) x = 2 t = 2 h_hat = h_xt(br_X, br_s, int_X, size_s_grid, alpha, x, t, b, Yi, n)
Computes the (indexed) hqm estimator on the marker grid.
h_xt_vec(br_X, br_s, size_s_grid, alpha, t, b, Yi, int_X, n)h_xt_vec(br_X, br_s, size_s_grid, alpha, t, b, Yi, int_X, n)
br_X |
Marker value grid points that will be used in the evaluatiuon. |
br_s |
Time value grid points that will be used in the evaluatiuon. |
size_s_grid |
Size of the time grid. |
alpha |
Marker-hazard obtained from |
t |
Numeric value of the time the function should be evaluated. |
b |
Bandwidth. |
Yi |
A matrix made by |
int_X |
Position of the linear interpolated marker values on the marker grid. |
n |
Number of individuals. |
The function implements the future conditional hazard estimator
for every on the marker grid, where, for a positive integer , is the vector of the estimated indexing parameters,
is a vector of markers for indexing, is the exposure and is the marker-only hazard, see get_alpha for more details
and is an ordinary kernel function, e.g. the Epanechnikov kernel.
For and the above estimator becomes the HQM hazard rate estimator conditional on one covariate,
defined in equation (2) of Bagkavos et al. (2025), doi:10.1093/biomet/asaf008.
A vector of for all values on the marker grid.
Bagkavos, I., Isakson, R., Mammen, E., Nielsen, J., and Proust–Lima, C. (2025). Biometrika, 112(2), asaf008. doi:10.1093/biomet/asaf008
# Longitudinal data example pbc2_id = to_id(pbc2) size_s_grid <- size_X_grid <- 100 n = max(as.numeric(pbc2$id)) s = pbc2$year X = pbc2$serBilir XX = pbc2_id$serBilir ss <- pbc2_id$years delta <- pbc2_id$status2 br_s = seq(0, max(s), max(s)/( size_s_grid-1)) br_X = seq(min(X), max(X), (max(X)-min(X))/( size_X_grid-1)) X_lin = lin_interpolate(br_s, pbc2_id$id, pbc2$id, X, s) int_X <- findInterval(X_lin, br_X) int_s = rep(1:length(br_s), n) N <- make_N(pbc2, pbc2_id, br_X, br_s, ss, XX, delta) Y <- make_Y(pbc2, pbc2_id, X_lin, br_X, br_s, size_s_grid, size_X_grid, int_s, int_X, event_time = 'years', n) b = 1.7 alpha<-get_alpha(N, Y, b, br_X, K=Epan ) Yi <- make_Yi(pbc2, pbc2_id, X_lin, br_X, br_s, size_s_grid, size_X_grid, int_s, int_X, event_time = 'years', n) t = 2 h_xt_vec(br_X, br_s, size_s_grid, alpha, t, b, Yi, int_X, n) # Time-invariant data example: # pbc2 dataset, single event per individual version: # 312 observations, most recent event per individual. # Use landmarking to produce comparable curve with KM. library(survival) Landmark <- 3 #set the landmark to 3 years pbc2.use<- to_id(pbc2) # keep only the most recent row per patient pbcT1 <- pbc2.use[which(pbc2.use$year< Landmark & pbc2.use$years> Landmark),] timesS2 <- seq(Landmark,14,by=0.5) b=0.9 arg1<- get_h_x(pbcT1, 'albumin', br_s=br_s, event_time_name = 'years', time_name = 'year', event_name = 'status2', 2, b) br_s2 = seq(Landmark, 14, length=99) sfalb2<- make_sf( (br_s2[2]-br_s2[1])/1.35 , arg1) kma2<- survfit(Surv(years , status2) ~ 1, data = pbcT1) #Plot the survival functions: plot(br_s2, sfalb2, type="l", ylim=c(0,1), xlim=c(Landmark,14), ylab="Survival probability", xlab="years",lwd=2, main="HQM and KM survival functions, conditional on albumin=2, for the time-invariant pbc dataset") lines(kma2$time, kma2$surv, type="s",lty=2, lwd=2, col=2) legend("bottomleft", c("HQM est.", "Kaplan-Meier"), lty=c(1,2), col=1:2, lwd=2, cex=1.7) ############# Indexed HR estimator example 1: ############## ### The example is based on indexing two covariates: albumin and bilirubin index.use<-84 x.grid<-br_s[1:index.use] #non need to plot the last year b.alb = 1.5 # results from CV rule of Bagkavos et al. (2025), i.e.: # b_list = seq(1.5, 3, 0.05) # b_scores_alb = b_selection(pbc2, 'albumin', 'years', 'year', 'status2', I, b_list) # b.alb = b_scores_alb[[2]][which.min(b_scores_alb[[1]])] b.bil = 4 # results from CV rule of Bagkavos et al. (2025) : # b_list.bil = seq(3, 4, 0.05) # b_scores_bil = b_selection(pbc2, 'serBilir', 'years', 'year', 'status2', I, b_list.bil) # b.bil = b_scores_alb[[2]][which.min(b_scores_alb[[1]])] - result = 4 t.alb = 1 # refers to zero mean variables #corresponds to approximately normal albumin level t.bil = 1.9 # refers to zero mean variable - corresponds to #elevated bilirubin levelm approximately 7mg/dl par.alb <- 0.0702 # results from the indexing process par.bil <- 0.0856 # results from the indexing process # First, mean adjust the albumin covariate Xalb = pbc2$albumin - mean(pbc2$albumin) XXalb = pbc2_id$albumin - mean(pbc2_id$albumin) br_Xalb = seq(min(Xalb), max(Xalb), (max(Xalb)-min(Xalb))/( size_X_grid-1)) X_linalb = lin_interpolate(br_s, pbc2_id$id, pbc2$id, Xalb, s) int_Xalb <- findInterval(X_linalb, br_Xalb) N.alb <- make_N(pbc2, pbc2_id, br_Xalb, br_s, ss, XXalb, delta) Y.alb <- make_Y(pbc2, pbc2_id, X_linalb, br_Xalb, br_s, size_s_grid, size_X_grid, int_s, int_Xalb, event_time = 'years', n) alpha.alb<-get_alpha(N.alb, Y.alb, b.alb, br_Xalb, K=Epan ) Yi.alb <- make_Yi(pbc2, pbc2_id, X_linalb, br_Xalb, br_s, size_s_grid, size_X_grid, int_s, int_Xalb, event_time = 'years', n) #calculate HQM estimator for original albumin covariate: arg.alb<- h_xt_vec(br_Xalb, br_s, size_s_grid, alpha.alb, t.alb, b.alb, Yi.alb, int_Xalb, n) y.grid.alb<-arg.alb[1:index.use] # Now, mean adjust the bilirubon covariate: Xbil = pbc2$serBilir - mean(pbc2$serBilir) XXbil = pbc2_id$serBilir - mean(pbc2_id$serBilir) br_Xbil = seq(min(Xbil), max(Xbil), (max(Xbil)-min(Xbil))/( size_X_grid-1)) X_linbil = lin_interpolate(br_s, pbc2_id$id, pbc2$id, Xbil, s) int_Xbil <- findInterval(X_linbil, br_Xbil) N.bil <- make_N(pbc2, pbc2_id, br_Xbil, br_s, ss, XXbil, delta) Y.bil <- make_Y(pbc2, pbc2_id, X_linbil, br_Xbil, br_s, size_s_grid, size_X_grid, int_s, int_Xbil, event_time = 'years', n) alpha.bil<-get_alpha(N.bil, Y.bil, b.bil, br_Xbil, K=Epan ) Yibil <- make_Yi(pbc2, pbc2_id, X_linbil, br_Xbil, br_s, size_s_grid, size_X_grid, int_s, int_Xbil, event_time = 'years', n) #calculate HQM estimator for original bilirubin covariate: arg1.bil<- h_xt_vec(br_Xbil, br_s, size_s_grid, alpha.bil, t.bil, b.bil, Yibil, int_Xbil, n) y.grid.bil<-arg1.bil[1:index.use] # calculate the indexed variables: X = par.alb * Xalb + par.bil *Xbil XX = par.alb * XXalb + par.bil *XXbil b = 0.4 # results from CV rule in Bagkavos et al (2025) t = par.alb * t.alb + par.bil *t.bil br_X = seq(min(X), max(X), (max(X)-min(X))/( size_X_grid-1)) X_lin = lin_interpolate(br_s, pbc2_id$id, pbc2$id, X, s) int_X <- findInterval(X_lin, br_X) N <- make_N(pbc2, pbc2_id, breaks_X=br_X, breaks_s=br_s, ss, XX, delta) Y <- make_Y(pbc2, pbc2_id, X_lin, br_X, br_s, size_s_grid, size_X_grid, int_s, int_X, 'years', n) alpha<-get_alpha(N, Y, b, br_X, K=Epan ) Yi <- make_Yi(pbc2, pbc2_id, X_lin, br_X, br_s, size_s_grid, size_X_grid, int_s, int_X,'years', n) # Now calculate indexed HR estimator: arg2<- h_xt_vec(br_X, br_s, size_s_grid, alpha, t, b, Yi, int_X, n) y.grid2<-arg2[1:index.use] ############## Plot the results: plot(x.grid, y.grid2, type="l", ylim=c(0,2), xlab="Years", ylab="Hazard rate", lwd=2 ) lines(x.grid, y.grid.bil , lty=2, col=4, lwd=2) lines(x.grid, y.grid.alb, lty=2, col=2, lwd=2) legend("topleft", lty=c(1, 2, 2), lwd=c(2,2,2), c("Indexed (Bilirubin and Albumin) HQM hazard rate est.", "HQM hazard rate est. conditional on Albumin=2mg/dl", "HQM hazard rate est. conditional on Bilirubin=7mg/dl" ), col=c(1,2,4), cex=2, bty="n") ############# Indexed HR estimator example 2: ############## ### The example is based on indexing two covariates: bilirubin and prothrombin time index.use<-84 x.grid<-br_s[1:index.use] b.pro = 3 b.bil = 4 t.pro = 1.5 # refers to zero mean variables - slightly high t.bil = 1.9 # refers to zero mean variable - high par.pro <- 0.349 #0.5 #0.149 par.bil <- 0.0776 #0.10 ############ prothrombin time ############# ind1 <- which( is.na( pbc2$prothrombin ) ) Xpro1 = pbc2$prothrombin Xpro1[ind1]<-mean(pbc2$prothrombin, na.rm=TRUE) Xpro1 <- Xpro1 - mean(Xpro1) ind2 <- which( is.na( pbc2_id$prothrombin ) ) XXpro1 = pbc2_id$prothrombin XXpro1[ind2]<-mean(pbc2_id$prothrombin, na.rm=TRUE) XXpro1 <- XXpro1 - mean(XXpro1) br_Xbil = seq(min(Xpro1), max(Xpro1), (max(Xpro1)-min(Xpro1))/( size_X_grid-1)) X_linbil = lin_interpolate(br_s, pbc2_id$id, pbc2$id, Xpro1, s) int_Xbil <- findInterval(X_linbil, br_Xbil) N.bil <- make_N(pbc2, pbc2_id, br_Xbil, br_s, ss, XXpro1, delta) Y.bil <- make_Y(pbc2, pbc2_id, X_linbil, br_Xbil, br_s, size_s_grid, size_X_grid, int_s, int_Xbil, event_time = 'years', n) alpha.bil<-get_alpha(N.bil, Y.bil, b.pro, br_Xbil, K=Epan ) Yibil <- make_Yi(pbc2, pbc2_id, X_linbil, br_Xbil, br_s, size_s_grid, size_X_grid, int_s, int_Xbil, event_time = 'years', n) arg1.pro<- h_xt_vec(br_Xbil, br_s, size_s_grid, alpha.bil, t.pro, b.pro, Yibil, int_Xbil, n) y.grid.pro<-arg1.pro[1:index.use] ############ Bilirubin ############# Xbil = pbc2$serBilir - mean(pbc2$serBilir) XXbil = pbc2_id$serBilir - mean(pbc2_id$serBilir) br_Xbil = seq(min(Xbil), max(Xbil), (max(Xbil)-min(Xbil))/( size_X_grid-1)) X_linbil = lin_interpolate(br_s, pbc2_id$id, pbc2$id, Xbil, s) int_Xbil <- findInterval(X_linbil, br_Xbil) N.bil <- make_N(pbc2, pbc2_id, br_Xbil, br_s, ss, XXbil, delta) Y.bil <- make_Y(pbc2, pbc2_id, X_linbil, br_Xbil, br_s, size_s_grid, size_X_grid, int_s, int_Xbil, event_time = 'years', n) alpha.bil<-get_alpha(N.bil, Y.bil, b.bil, br_Xbil, K=Epan ) Yibil <- make_Yi(pbc2, pbc2_id, X_linbil, br_Xbil, br_s, size_s_grid, size_X_grid, int_s, int_Xbil, event_time = 'years', n) arg1.bil<- h_xt_vec(br_Xbil, br_s, size_s_grid, alpha.bil, t.bil, b.bil, Yibil, int_Xbil, n) y.grid.bil<-arg1.bil[1:index.use] ######### Index prothrombin time and bilirubin X = par.pro * Xpro1 + par.bil *Xbil XX = par.pro * XXpro1 + par.bil *XXbil b = 1.5 t = par.pro * t.pro + par.bil *t.bil br_X = seq(min(X), max(X), (max(X)-min(X))/( size_X_grid-1)) X_lin = lin_interpolate(br_s, pbc2_id$id, pbc2$id, X, s) int_X <- findInterval(X_lin, br_X) N <- make_N(pbc2, pbc2_id, breaks_X=br_X, breaks_s=br_s, ss, XX, delta) Y <- make_Y(pbc2, pbc2_id, X_lin, br_X, br_s, size_s_grid, size_X_grid, int_s, int_X, 'years', n) alpha<-get_alpha(N, Y, b, br_X, K=Epan ) Yi <- make_Yi(pbc2, pbc2_id, X_lin, br_X, br_s, size_s_grid, size_X_grid, int_s, int_X, 'years', n) arg2<- h_xt_vec(br_X, br_s, size_s_grid, alpha, t, b, Yi, int_X, n) y.grid2<-arg2[1:index.use] ##############Plot the results plot(x.grid, y.grid2, type="l", ylim=c(0,2), xlab="Years", ylab="Hazard rate", lwd=2 ) lines(x.grid, y.grid.bil , lty=2, col=4, lwd=2) lines(x.grid, y.grid.pro, lty=2, col=2, lwd=2) legend("topleft", lty=c(1, 2, 2), lwd=c(2,2,2), c("Indexed (Bilirubin and PT) HQM hazard rate est.", "HQM hazard rate est. conditional on PT=15.5s", "HQM hazard rate est. conditional on Bilirubin=7mg/dl" ), col=c(1,2,4), cex=2, bty="n")# Longitudinal data example pbc2_id = to_id(pbc2) size_s_grid <- size_X_grid <- 100 n = max(as.numeric(pbc2$id)) s = pbc2$year X = pbc2$serBilir XX = pbc2_id$serBilir ss <- pbc2_id$years delta <- pbc2_id$status2 br_s = seq(0, max(s), max(s)/( size_s_grid-1)) br_X = seq(min(X), max(X), (max(X)-min(X))/( size_X_grid-1)) X_lin = lin_interpolate(br_s, pbc2_id$id, pbc2$id, X, s) int_X <- findInterval(X_lin, br_X) int_s = rep(1:length(br_s), n) N <- make_N(pbc2, pbc2_id, br_X, br_s, ss, XX, delta) Y <- make_Y(pbc2, pbc2_id, X_lin, br_X, br_s, size_s_grid, size_X_grid, int_s, int_X, event_time = 'years', n) b = 1.7 alpha<-get_alpha(N, Y, b, br_X, K=Epan ) Yi <- make_Yi(pbc2, pbc2_id, X_lin, br_X, br_s, size_s_grid, size_X_grid, int_s, int_X, event_time = 'years', n) t = 2 h_xt_vec(br_X, br_s, size_s_grid, alpha, t, b, Yi, int_X, n) # Time-invariant data example: # pbc2 dataset, single event per individual version: # 312 observations, most recent event per individual. # Use landmarking to produce comparable curve with KM. library(survival) Landmark <- 3 #set the landmark to 3 years pbc2.use<- to_id(pbc2) # keep only the most recent row per patient pbcT1 <- pbc2.use[which(pbc2.use$year< Landmark & pbc2.use$years> Landmark),] timesS2 <- seq(Landmark,14,by=0.5) b=0.9 arg1<- get_h_x(pbcT1, 'albumin', br_s=br_s, event_time_name = 'years', time_name = 'year', event_name = 'status2', 2, b) br_s2 = seq(Landmark, 14, length=99) sfalb2<- make_sf( (br_s2[2]-br_s2[1])/1.35 , arg1) kma2<- survfit(Surv(years , status2) ~ 1, data = pbcT1) #Plot the survival functions: plot(br_s2, sfalb2, type="l", ylim=c(0,1), xlim=c(Landmark,14), ylab="Survival probability", xlab="years",lwd=2, main="HQM and KM survival functions, conditional on albumin=2, for the time-invariant pbc dataset") lines(kma2$time, kma2$surv, type="s",lty=2, lwd=2, col=2) legend("bottomleft", c("HQM est.", "Kaplan-Meier"), lty=c(1,2), col=1:2, lwd=2, cex=1.7) ############# Indexed HR estimator example 1: ############## ### The example is based on indexing two covariates: albumin and bilirubin index.use<-84 x.grid<-br_s[1:index.use] #non need to plot the last year b.alb = 1.5 # results from CV rule of Bagkavos et al. (2025), i.e.: # b_list = seq(1.5, 3, 0.05) # b_scores_alb = b_selection(pbc2, 'albumin', 'years', 'year', 'status2', I, b_list) # b.alb = b_scores_alb[[2]][which.min(b_scores_alb[[1]])] b.bil = 4 # results from CV rule of Bagkavos et al. (2025) : # b_list.bil = seq(3, 4, 0.05) # b_scores_bil = b_selection(pbc2, 'serBilir', 'years', 'year', 'status2', I, b_list.bil) # b.bil = b_scores_alb[[2]][which.min(b_scores_alb[[1]])] - result = 4 t.alb = 1 # refers to zero mean variables #corresponds to approximately normal albumin level t.bil = 1.9 # refers to zero mean variable - corresponds to #elevated bilirubin levelm approximately 7mg/dl par.alb <- 0.0702 # results from the indexing process par.bil <- 0.0856 # results from the indexing process # First, mean adjust the albumin covariate Xalb = pbc2$albumin - mean(pbc2$albumin) XXalb = pbc2_id$albumin - mean(pbc2_id$albumin) br_Xalb = seq(min(Xalb), max(Xalb), (max(Xalb)-min(Xalb))/( size_X_grid-1)) X_linalb = lin_interpolate(br_s, pbc2_id$id, pbc2$id, Xalb, s) int_Xalb <- findInterval(X_linalb, br_Xalb) N.alb <- make_N(pbc2, pbc2_id, br_Xalb, br_s, ss, XXalb, delta) Y.alb <- make_Y(pbc2, pbc2_id, X_linalb, br_Xalb, br_s, size_s_grid, size_X_grid, int_s, int_Xalb, event_time = 'years', n) alpha.alb<-get_alpha(N.alb, Y.alb, b.alb, br_Xalb, K=Epan ) Yi.alb <- make_Yi(pbc2, pbc2_id, X_linalb, br_Xalb, br_s, size_s_grid, size_X_grid, int_s, int_Xalb, event_time = 'years', n) #calculate HQM estimator for original albumin covariate: arg.alb<- h_xt_vec(br_Xalb, br_s, size_s_grid, alpha.alb, t.alb, b.alb, Yi.alb, int_Xalb, n) y.grid.alb<-arg.alb[1:index.use] # Now, mean adjust the bilirubon covariate: Xbil = pbc2$serBilir - mean(pbc2$serBilir) XXbil = pbc2_id$serBilir - mean(pbc2_id$serBilir) br_Xbil = seq(min(Xbil), max(Xbil), (max(Xbil)-min(Xbil))/( size_X_grid-1)) X_linbil = lin_interpolate(br_s, pbc2_id$id, pbc2$id, Xbil, s) int_Xbil <- findInterval(X_linbil, br_Xbil) N.bil <- make_N(pbc2, pbc2_id, br_Xbil, br_s, ss, XXbil, delta) Y.bil <- make_Y(pbc2, pbc2_id, X_linbil, br_Xbil, br_s, size_s_grid, size_X_grid, int_s, int_Xbil, event_time = 'years', n) alpha.bil<-get_alpha(N.bil, Y.bil, b.bil, br_Xbil, K=Epan ) Yibil <- make_Yi(pbc2, pbc2_id, X_linbil, br_Xbil, br_s, size_s_grid, size_X_grid, int_s, int_Xbil, event_time = 'years', n) #calculate HQM estimator for original bilirubin covariate: arg1.bil<- h_xt_vec(br_Xbil, br_s, size_s_grid, alpha.bil, t.bil, b.bil, Yibil, int_Xbil, n) y.grid.bil<-arg1.bil[1:index.use] # calculate the indexed variables: X = par.alb * Xalb + par.bil *Xbil XX = par.alb * XXalb + par.bil *XXbil b = 0.4 # results from CV rule in Bagkavos et al (2025) t = par.alb * t.alb + par.bil *t.bil br_X = seq(min(X), max(X), (max(X)-min(X))/( size_X_grid-1)) X_lin = lin_interpolate(br_s, pbc2_id$id, pbc2$id, X, s) int_X <- findInterval(X_lin, br_X) N <- make_N(pbc2, pbc2_id, breaks_X=br_X, breaks_s=br_s, ss, XX, delta) Y <- make_Y(pbc2, pbc2_id, X_lin, br_X, br_s, size_s_grid, size_X_grid, int_s, int_X, 'years', n) alpha<-get_alpha(N, Y, b, br_X, K=Epan ) Yi <- make_Yi(pbc2, pbc2_id, X_lin, br_X, br_s, size_s_grid, size_X_grid, int_s, int_X,'years', n) # Now calculate indexed HR estimator: arg2<- h_xt_vec(br_X, br_s, size_s_grid, alpha, t, b, Yi, int_X, n) y.grid2<-arg2[1:index.use] ############## Plot the results: plot(x.grid, y.grid2, type="l", ylim=c(0,2), xlab="Years", ylab="Hazard rate", lwd=2 ) lines(x.grid, y.grid.bil , lty=2, col=4, lwd=2) lines(x.grid, y.grid.alb, lty=2, col=2, lwd=2) legend("topleft", lty=c(1, 2, 2), lwd=c(2,2,2), c("Indexed (Bilirubin and Albumin) HQM hazard rate est.", "HQM hazard rate est. conditional on Albumin=2mg/dl", "HQM hazard rate est. conditional on Bilirubin=7mg/dl" ), col=c(1,2,4), cex=2, bty="n") ############# Indexed HR estimator example 2: ############## ### The example is based on indexing two covariates: bilirubin and prothrombin time index.use<-84 x.grid<-br_s[1:index.use] b.pro = 3 b.bil = 4 t.pro = 1.5 # refers to zero mean variables - slightly high t.bil = 1.9 # refers to zero mean variable - high par.pro <- 0.349 #0.5 #0.149 par.bil <- 0.0776 #0.10 ############ prothrombin time ############# ind1 <- which( is.na( pbc2$prothrombin ) ) Xpro1 = pbc2$prothrombin Xpro1[ind1]<-mean(pbc2$prothrombin, na.rm=TRUE) Xpro1 <- Xpro1 - mean(Xpro1) ind2 <- which( is.na( pbc2_id$prothrombin ) ) XXpro1 = pbc2_id$prothrombin XXpro1[ind2]<-mean(pbc2_id$prothrombin, na.rm=TRUE) XXpro1 <- XXpro1 - mean(XXpro1) br_Xbil = seq(min(Xpro1), max(Xpro1), (max(Xpro1)-min(Xpro1))/( size_X_grid-1)) X_linbil = lin_interpolate(br_s, pbc2_id$id, pbc2$id, Xpro1, s) int_Xbil <- findInterval(X_linbil, br_Xbil) N.bil <- make_N(pbc2, pbc2_id, br_Xbil, br_s, ss, XXpro1, delta) Y.bil <- make_Y(pbc2, pbc2_id, X_linbil, br_Xbil, br_s, size_s_grid, size_X_grid, int_s, int_Xbil, event_time = 'years', n) alpha.bil<-get_alpha(N.bil, Y.bil, b.pro, br_Xbil, K=Epan ) Yibil <- make_Yi(pbc2, pbc2_id, X_linbil, br_Xbil, br_s, size_s_grid, size_X_grid, int_s, int_Xbil, event_time = 'years', n) arg1.pro<- h_xt_vec(br_Xbil, br_s, size_s_grid, alpha.bil, t.pro, b.pro, Yibil, int_Xbil, n) y.grid.pro<-arg1.pro[1:index.use] ############ Bilirubin ############# Xbil = pbc2$serBilir - mean(pbc2$serBilir) XXbil = pbc2_id$serBilir - mean(pbc2_id$serBilir) br_Xbil = seq(min(Xbil), max(Xbil), (max(Xbil)-min(Xbil))/( size_X_grid-1)) X_linbil = lin_interpolate(br_s, pbc2_id$id, pbc2$id, Xbil, s) int_Xbil <- findInterval(X_linbil, br_Xbil) N.bil <- make_N(pbc2, pbc2_id, br_Xbil, br_s, ss, XXbil, delta) Y.bil <- make_Y(pbc2, pbc2_id, X_linbil, br_Xbil, br_s, size_s_grid, size_X_grid, int_s, int_Xbil, event_time = 'years', n) alpha.bil<-get_alpha(N.bil, Y.bil, b.bil, br_Xbil, K=Epan ) Yibil <- make_Yi(pbc2, pbc2_id, X_linbil, br_Xbil, br_s, size_s_grid, size_X_grid, int_s, int_Xbil, event_time = 'years', n) arg1.bil<- h_xt_vec(br_Xbil, br_s, size_s_grid, alpha.bil, t.bil, b.bil, Yibil, int_Xbil, n) y.grid.bil<-arg1.bil[1:index.use] ######### Index prothrombin time and bilirubin X = par.pro * Xpro1 + par.bil *Xbil XX = par.pro * XXpro1 + par.bil *XXbil b = 1.5 t = par.pro * t.pro + par.bil *t.bil br_X = seq(min(X), max(X), (max(X)-min(X))/( size_X_grid-1)) X_lin = lin_interpolate(br_s, pbc2_id$id, pbc2$id, X, s) int_X <- findInterval(X_lin, br_X) N <- make_N(pbc2, pbc2_id, breaks_X=br_X, breaks_s=br_s, ss, XX, delta) Y <- make_Y(pbc2, pbc2_id, X_lin, br_X, br_s, size_s_grid, size_X_grid, int_s, int_X, 'years', n) alpha<-get_alpha(N, Y, b, br_X, K=Epan ) Yi <- make_Yi(pbc2, pbc2_id, X_lin, br_X, br_s, size_s_grid, size_X_grid, int_s, int_X, 'years', n) arg2<- h_xt_vec(br_X, br_s, size_s_grid, alpha, t, b, Yi, int_X, n) y.grid2<-arg2[1:index.use] ##############Plot the results plot(x.grid, y.grid2, type="l", ylim=c(0,2), xlab="Years", ylab="Hazard rate", lwd=2 ) lines(x.grid, y.grid.bil , lty=2, col=4, lwd=2) lines(x.grid, y.grid.pro, lty=2, col=2, lwd=2) legend("topleft", lty=c(1, 2, 2), lwd=c(2,2,2), c("Indexed (Bilirubin and PT) HQM hazard rate est.", "HQM hazard rate est. conditional on PT=15.5s", "HQM hazard rate est. conditional on Bilirubin=7mg/dl" ), col=c(1,2,4), cex=2, bty="n")
Calculates the (indexed) local linear future conditional hazard rate for a marker value x and a time value t.
h_xtll(br_X, br_s, int_X, size_s_grid, alpha, x,t, b, Yi,n, Y)h_xtll(br_X, br_s, int_X, size_s_grid, alpha, x,t, b, Yi,n, Y)
br_X |
Vector of grid points for the marker values |
br_s |
Vector of grid points for the time values |
int_X |
Position of the linear interpolated marker values on the marker grid. |
size_s_grid |
Size of the time grid. |
alpha |
Marker-hazard obtained from |
x |
Numeric value of the last observed marker value. |
t |
Numeric time value. |
b |
Bandwidth. |
Yi |
A matrix made by |
n |
Number of individuals. |
Y |
A matrix made by |
Function h_xtll implements the future local linear conditional hazard estimator
for every on the marker grid, where, for a positive integer , is the vector of the estimated indexing parameters,
is a vector of markers for indexing, is the exposure and is the marker-only hazard, see get_alpha for more details.
For and the above estimator becomes the HQM hazard rate estimator conditional on a single covariate,
defined in equation (2) of Bagkavos et al. (2025) doi:10.1093/biomet/asaf008. In the place of , h_xtll uses the kernel
where with being an ordinary kernel, e.g. the Epanechnikov kernel, with
see also Nielsen (1998) doi:10.1080/03461238.1998.10413997.
The future conditional hazard is defined as
where is the survival time and the marker of individual observed in the time frame .
A single numeric value of .
Bagkavos, I., Isakson, R., Mammen, E., Nielsen, J., and Proust–Lima, C. (2025). Biometrika, 112(2), asaf008. doi:10.1093/biomet/asaf008
Nielsen (1998), Marker dependent kernel hazard estimation from local linear estimation, Scandinavian Actuarial Journal, pp. 113-124. doi:10.1080/03461238.1998.10413997
pbc2_id = to_id(pbc2) size_s_grid <- size_X_grid <- 100 n = max(as.numeric(pbc2$id)) s = pbc2$year X = pbc2$serBilir XX = pbc2_id$serBilir ss <- pbc2_id$years delta <- pbc2_id$status2 br_s = seq(0, max(s), max(s)/( size_s_grid-1)) br_X = seq(min(X), max(X), (max(X)-min(X))/( size_X_grid-1)) X_lin = lin_interpolate(br_s, pbc2_id$id, pbc2$id, X, s) int_X <- findInterval(X_lin, br_X) int_s = rep(1:length(br_s), n) N <- make_N(pbc2, pbc2_id, breaks_X=br_X, breaks_s=br_s, ss, XX, delta) Y <- make_Y(pbc2, pbc2_id, X_lin, br_X, br_s, size_s_grid, size_X_grid, int_s, int_X, 'years', n) b = 1.7 alpha<-get_alpha(N, Y, b, br_X, K=Epan ) Yi <- make_Yi(pbc2, pbc2_id, X_lin, br_X, br_s, size_s_grid, size_X_grid, int_s, int_X, 'years', n) x = 2 t = 2 h_hat = h_xtll(br_X, br_s, int_X, size_s_grid, alpha, x, t, b, Yi, n, Y)pbc2_id = to_id(pbc2) size_s_grid <- size_X_grid <- 100 n = max(as.numeric(pbc2$id)) s = pbc2$year X = pbc2$serBilir XX = pbc2_id$serBilir ss <- pbc2_id$years delta <- pbc2_id$status2 br_s = seq(0, max(s), max(s)/( size_s_grid-1)) br_X = seq(min(X), max(X), (max(X)-min(X))/( size_X_grid-1)) X_lin = lin_interpolate(br_s, pbc2_id$id, pbc2$id, X, s) int_X <- findInterval(X_lin, br_X) int_s = rep(1:length(br_s), n) N <- make_N(pbc2, pbc2_id, breaks_X=br_X, breaks_s=br_s, ss, XX, delta) Y <- make_Y(pbc2, pbc2_id, X_lin, br_X, br_s, size_s_grid, size_X_grid, int_s, int_X, 'years', n) b = 1.7 alpha<-get_alpha(N, Y, b, br_X, K=Epan ) Yi <- make_Yi(pbc2, pbc2_id, X_lin, br_X, br_s, size_s_grid, size_X_grid, int_s, int_X, 'years', n) x = 2 t = 2 h_hat = h_xtll(br_X, br_s, int_X, size_s_grid, alpha, x, t, b, Yi, n, Y)
Objective used for optimizing indexing parameters via optim.
index_optim(in.par, data, data.id, br_s, X1, XX1, event_time_name = 'years', time_name = 'year', event_name = 'status2', b, t, true.haz)index_optim(in.par, data, data.id, br_s, X1, XX1, event_time_name = 'years', time_name = 'year', event_name = 'status2', b, t, true.haz)
in.par |
Numeric vector length 2. |
data |
Data frame. |
data.id |
Id-level data frame. |
br_s |
user-supplied grid points ranging from the minimum to the maximum of the time_name argument |
X1 |
List of vectors for indexing: each vector corresponds to a biomarker and contains one summary measurement per individual. |
XX1 |
List of vectors for indexing: each vector corresponds to a single biomarker and contains its longitudinal measurements across all individuals/time points. |
event_time_name |
Name of event time column. |
time_name |
Name of observation time column. |
event_name |
Name of event indicator column. |
b |
Bandwidth. |
t |
Evaluation point or grid. |
true.haz |
Numeric vector of true hazard values for CV. |
Scalar cross-validation score.
marker_name1 <- 'albumin' marker_name2 <- 'serBilir' event_time_name <- 'years' time_name <- 'year' event_name <- 'status2' id<-'id' par.x1 <- 0.0702 #0.149 par.x2 <- 0.0856 #0.10 t.x1 = 0 # refers to zero mean variables - slightly high t.x2 = 1.9 # refers to zero mean variable - high b = 0.42 t = par.x1 * t.x1 + par.x2 *t.x2 # first simulate true HR function: xin <- pbc2[,c(id, marker_name1, marker_name2, event_time_name, time_name, event_name)] ls<-50 # 50 grid points to evaluate the estimates s.out<- xin[, 'year'] s.out.use <- seq(0, max(s.out), max(s.out)/( ls-1)) n <- length(xin$id) nn<-max( as.double(xin[,'id']) ) ################### Create bootstrap samples by group #################### set.seed(1) B<- 10 # 400 #50 Boot.samples<-list() for(j in 1:B) { i.use<-c() id.use<-c() index.nn <- sample (nn, replace = TRUE) for(l in 1:nn) { i.use2<-which(xin[,id]==index.nn[l]) i.use<-c(i.use, i.use2) id.use2<-rep(index.nn[l], times=length(i.use2)) id.use<-c(id.use, id.use2) } xin.i<-xin[i.use,] xin.i<-xin[i.use,] Boot.samples[[j]]<- xin.i[order(xin.i$id),] #xin[i.use,] } true.hazard<- Sim.True.Hazard(Boot.samples, id='id', size_s_grid=ls, marker_name1=marker_name1, marker_name2= marker_name2, event_time_name = event_time_name, time_name = time_name, event_name = event_name, in.par = c(par.x1, par.x2), b) ########################################################################## # Then run the optimization for the indexing parameters: data.use<-Boot.samples[[1]] data.use.id<-to_id(data.use) data.use.id<-data.use.id[complete.cases(data.use.id), ] X1t=data.use[,marker_name1] -mean(data.use[, marker_name1]) XX1t=data.use.id[,marker_name1] -mean(data.use.id[, marker_name1]) X2t=data.use[,marker_name2] -mean(data.use[, marker_name2]) XX2t=data.use.id[,marker_name2] -mean(data.use.id[, marker_name2]) X1=list(X1t, X2t) XX1=list(XX1t, XX2t) s.out<- data.use[, time_name] s.out.use <- seq(0, max(s.out), max(s.out)/( ls-1)) res<- optim(par=c(par.x1, par.x2), fn=index_optim, data=data.use, data.id=data.use.id, br_s=s.out.use, X1=X1, XX1=XX1, event_time_name = event_time_name, time_name = time_name, event_name = event_name, b=b, t=t, true.haz=true.hazard, method="Nelder-Mead")marker_name1 <- 'albumin' marker_name2 <- 'serBilir' event_time_name <- 'years' time_name <- 'year' event_name <- 'status2' id<-'id' par.x1 <- 0.0702 #0.149 par.x2 <- 0.0856 #0.10 t.x1 = 0 # refers to zero mean variables - slightly high t.x2 = 1.9 # refers to zero mean variable - high b = 0.42 t = par.x1 * t.x1 + par.x2 *t.x2 # first simulate true HR function: xin <- pbc2[,c(id, marker_name1, marker_name2, event_time_name, time_name, event_name)] ls<-50 # 50 grid points to evaluate the estimates s.out<- xin[, 'year'] s.out.use <- seq(0, max(s.out), max(s.out)/( ls-1)) n <- length(xin$id) nn<-max( as.double(xin[,'id']) ) ################### Create bootstrap samples by group #################### set.seed(1) B<- 10 # 400 #50 Boot.samples<-list() for(j in 1:B) { i.use<-c() id.use<-c() index.nn <- sample (nn, replace = TRUE) for(l in 1:nn) { i.use2<-which(xin[,id]==index.nn[l]) i.use<-c(i.use, i.use2) id.use2<-rep(index.nn[l], times=length(i.use2)) id.use<-c(id.use, id.use2) } xin.i<-xin[i.use,] xin.i<-xin[i.use,] Boot.samples[[j]]<- xin.i[order(xin.i$id),] #xin[i.use,] } true.hazard<- Sim.True.Hazard(Boot.samples, id='id', size_s_grid=ls, marker_name1=marker_name1, marker_name2= marker_name2, event_time_name = event_time_name, time_name = time_name, event_name = event_name, in.par = c(par.x1, par.x2), b) ########################################################################## # Then run the optimization for the indexing parameters: data.use<-Boot.samples[[1]] data.use.id<-to_id(data.use) data.use.id<-data.use.id[complete.cases(data.use.id), ] X1t=data.use[,marker_name1] -mean(data.use[, marker_name1]) XX1t=data.use.id[,marker_name1] -mean(data.use.id[, marker_name1]) X2t=data.use[,marker_name2] -mean(data.use[, marker_name2]) XX2t=data.use.id[,marker_name2] -mean(data.use.id[, marker_name2]) X1=list(X1t, X2t) XX1=list(XX1t, XX2t) s.out<- data.use[, time_name] s.out.use <- seq(0, max(s.out), max(s.out)/( ls-1)) res<- optim(par=c(par.x1, par.x2), fn=index_optim, data=data.use, data.id=data.use.id, br_s=s.out.use, X1=X1, XX1=XX1, event_time_name = event_time_name, time_name = time_name, event_name = event_name, b=b, t=t, true.haz=true.hazard, method="Nelder-Mead")
Implements the classical kernel function and related functionals
K_b(b,x,y, K) xK_b(b,x,y, K) K_b_mat(b,x,y, K)K_b(b,x,y, K) xK_b(b,x,y, K) K_b_mat(b,x,y, K)
x |
A vector of design points where the kernel will be evaluated. |
y |
A vector of sample data points. |
b |
The bandwidth to use (a scalar). |
K |
The kernel function to use. |
The function K_b implements the classical kernel function calculation
for scalars and while xK_b implements the functional
again for for scalars and . The function K_b_mat is the vectorized version of K_b. It uses as inputs the vectors and and returns a matrix with entries
Scalar values for K_b and xK_b and matrix outputs for K_b_mat.
Implements a linear interpolation between observered marker values.
lin_interpolate(t, i, data_id, data_marker, data_time)lin_interpolate(t, i, data_id, data_marker, data_time)
t |
A vector of time values where the function should be evaluated. |
i |
A vector of ids of individuals for whom the marker values should be interpolated. |
data_id |
The vector of ids from a data frame of time dependent variables. |
data_marker |
The vector of marker values from a data frame of time dependent variables. |
data_time |
The vector of time values from a data frame of time dependent variables. |
Given time points and marker values at different time points , the function calculates a linear interpolation with at the time points for all indicated individuals. Returned are then . Note that the first value is always observed at time point and the function is extrapolated constantly after the last observed marker value.
A matrix with columns as described above for every individual in the vector i.
size_s_grid <- 100 X = pbc2$serBilir s = pbc2$year br_s = seq(0, max(s), max(s)/( size_s_grid-1)) pbc2_id = to_id(pbc2) X_lin = lin_interpolate(br_s, pbc2_id$id, pbc2$id, X, s)size_s_grid <- 100 X = pbc2$serBilir s = pbc2$year br_s = seq(0, max(s), max(s)/( size_s_grid-1)) pbc2_id = to_id(pbc2) X_lin = lin_interpolate(br_s, pbc2_id$id, pbc2$id, X, s)
Implements the local linear kernel function.
llK_b(b,x,y, K)llK_b(b,x,y, K)
x |
A vector of design points where the kernel will be evaluated. |
y |
A vector of sample data points. |
b |
The bandwidth to use (a scalar). |
K |
The kernel function to use. |
Implements the local linear kernel
where with
see also Nielsen (1998), doi:10.1080/03461238.1998.10413997.
Matrix output with entries the values of the kernel function at each point.
Nielsen (1998), Marker dependent kernel hazard estimation from local linear estimation, Scandinavian Actuarial Journal, pp. 113-124. doi:10.1080/03461238.1998.10413997
Implements the weights to be used in the local linear HQM estimator.
sn.0(xin, xout, h, kfun) sn.1(xin, xout, h, kfun) sn.2(xin, xout, h, kfun)sn.0(xin, xout, h, kfun) sn.1(xin, xout, h, kfun) sn.2(xin, xout, h, kfun)
xin |
Sample values. |
xout |
Grid points where the estimator will be evaluated. |
h |
Bandwidth parameter. |
kfun |
Kernel function. |
The function implements the local linear weights in the definition of the estimator , see also h_xt
A vector of for all values on the marker grid.
Auxiliary functions that help automate the process of calculating integrals with occurances or exposure processes.
make_N(data, data.id, breaks_X, breaks_s, ss, XX, delta) make_Ni(breaks_s, size_s_grid, ss, delta, n) make_Y(data, data.id, X_lin, breaks_X, breaks_s, size_s_grid, size_X_grid, int_s, int_X, event_time = 'years', n) make_Yi(data, data.id, X_lin, breaks_X, breaks_s, size_s_grid, size_X_grid, int_s,int_X, event_time = 'years', n)make_N(data, data.id, breaks_X, breaks_s, ss, XX, delta) make_Ni(breaks_s, size_s_grid, ss, delta, n) make_Y(data, data.id, X_lin, breaks_X, breaks_s, size_s_grid, size_X_grid, int_s, int_X, event_time = 'years', n) make_Yi(data, data.id, X_lin, breaks_X, breaks_s, size_s_grid, size_X_grid, int_s,int_X, event_time = 'years', n)
data |
A data frame of time dependent data points. Missing values are allowed. |
data.id |
An id data frame obtained from |
breaks_X |
Marker value grid points where the function will be evaluated. |
breaks_s |
Time value grid points where the function will be evaluated. |
ss |
Vector with event times. |
XX |
Vector of last observed marker values. |
delta |
0-1 vector of whether events happened. |
size_s_grid |
Size of the time grid. |
size_X_grid |
Size of the marker grid. |
n |
Number of individuals. |
X_lin |
Linear interpolation of observed marker values evaluated on the marker grid. |
int_s |
Position of the observed time values on the time grid. |
int_X |
Position of the linear interpolated marker values on the marker grid. |
event_time |
String of the column name with the event times. |
Implements matrices for the computation of integrals with occurences and exposures of the form
where is a 0-1 counting process, the exposure and an arbitrary function.
The functions make_N and make_Y return a matrix on the time grid and marker grid for occurence and exposure, respectively, while make_Ni and make_Yi return a matrix on the time grid for evey individual again for occurence and exposure, respectively.
pbc2_id = to_id(pbc2) size_s_grid <- size_X_grid <- 100 n = max(as.numeric(pbc2$id)) s = pbc2$year X = pbc2$serBilir XX = pbc2_id$serBilir ss <- pbc2_id$years delta <- pbc2_id$status2 br_s = seq(0, max(s), max(s)/( size_s_grid-1)) br_X = seq(min(X), max(X), (max(X)-min(X))/( size_X_grid-1)) X_lin = lin_interpolate(br_s, pbc2_id$id, pbc2$id, X, s) int_X <- findInterval(X_lin, br_X) int_s = rep(1:length(br_s), n) N <- make_N(pbc2, pbc2_id, br_X, br_s, ss, XX, delta) Y <- make_Y(pbc2, pbc2_id, X_lin, br_X, br_s, size_s_grid, size_X_grid, int_s, int_X, event_time = 'years', n) Yi <- make_Yi(pbc2, pbc2_id, X_lin, br_X, br_s, size_s_grid, size_X_grid, int_s, int_X, event_time = 'years', n) Ni <- make_Ni(br_s, size_s_grid, ss, delta, n)pbc2_id = to_id(pbc2) size_s_grid <- size_X_grid <- 100 n = max(as.numeric(pbc2$id)) s = pbc2$year X = pbc2$serBilir XX = pbc2_id$serBilir ss <- pbc2_id$years delta <- pbc2_id$status2 br_s = seq(0, max(s), max(s)/( size_s_grid-1)) br_X = seq(min(X), max(X), (max(X)-min(X))/( size_X_grid-1)) X_lin = lin_interpolate(br_s, pbc2_id$id, pbc2$id, X, s) int_X <- findInterval(X_lin, br_X) int_s = rep(1:length(br_s), n) N <- make_N(pbc2, pbc2_id, br_X, br_s, ss, XX, delta) Y <- make_Y(pbc2, pbc2_id, X_lin, br_X, br_s, size_s_grid, size_X_grid, int_s, int_X, event_time = 'years', n) Yi <- make_Yi(pbc2, pbc2_id, X_lin, br_X, br_s, size_s_grid, size_X_grid, int_s, int_X, event_time = 'years', n) Ni <- make_Ni(br_s, size_s_grid, ss, delta, n)
Creates a survival function from a hazard rate which was calculated on a grid.
make_sf(step_size_s_grid, haz)make_sf(step_size_s_grid, haz)
step_size_s_grid |
Numeric value indicating the distance between two grid continuous grid points. |
haz |
Vector of hazard values. Hazard rate must have been calculated on a time grid. |
The function make_sf calculates the survival function
where is the hazard rate. Here, a discritisation via an equidistant grid on is used to calculate the integral and it is assumed that has been calculated for exactly these time points .
A vector of values .
make_sf(0.1, rep(0.1,10))make_sf(0.1, rep(0.1,10))
Followup of 312 randomised patients with primary biliary cirrhosis, a rare autoimmune liver disease, at Mayo Clinic.
pbc2pbc2
A data frame with 1945 observations on the following 20 variables.
idpatients identifier; in total there are 312 patients.
yearsnumber of years between registration and the earlier of death, transplantion, or study analysis time.
statusa factor with levels alive, transplanted and dead.
druga factor with levels placebo and D-penicil.
ageat registration in years.
sexa factor with levels male and female.
yearnumber of years between enrollment and this visit date, remaining values on the line of data refer to this visit.
ascitesa factor with levels No and Yes.
hepatomegalya factor with levels No and Yes.
spidersa factor with levels No and Yes.
edemaa factor with levels No edema (i.e., no edema and no diuretic therapy for edema),
edema no diuretics (i.e., edema present without diuretics, or edema resolved by diuretics), and
edema despite diuretics (i.e., edema despite diuretic therapy).
serBilirserum bilirubin in mg/dl.
serCholserum cholesterol in mg/dl.
albuminalbumin in gm/dl.
alkalinealkaline phosphatase in U/liter.
SGOTSGOT in U/ml.
plateletsplatelets per cubic ml / 1000.
prothrombinprothrombin time in seconds.
histologichistologic stage of disease.
status2a numeric vector with the value 1 denoting if the patient was dead, and 0 if the patient was alive or transplanted.
Fleming, T. and Harrington, D. (1991) Counting Processes and Survival Analysis. Wiley, New York.
Therneau, T. and Grambsch, P. (2000) Modeling Survival Data: Extending the Cox Model. Springer-Verlag, New York.
summary(pbc2)summary(pbc2)
Computes bootstrap pivot and symmetric bootstrap-pivot confidence intervals for the hazard, log-hazard, and back-transformed (log-scale) hazard rate functions, using the indexed hazard estimator.
Pivot.Index.CIs(B, n.est.points, Mat.boot.haz.rate, time.grid, hqm.est, a.sig)Pivot.Index.CIs(B, n.est.points, Mat.boot.haz.rate, time.grid, hqm.est, a.sig)
B |
Integer. Number of bootsrap replications |
n.est.points |
Integer. Number of estimation points at which the indexed hazard estimates and confidence intervals are evaluated. |
Mat.boot.haz.rate |
A matrix of bootstrap estimated hazard rates with
dimensions |
time.grid |
Numeric vector of length |
hqm.est |
Indexed hazard estimator, calculated at the grid points |
a.sig |
The significance level (e.g., 0.05) which will be used in computing the confidence intervals. |
This function computes several forms of pivot confidence intervals for indexed hazard rate estimates. Let
be the sample standard deviation of the bootstrap estimators
and let and be the and
quantiles respectively of
Then, the pivot bootstrap confidence interval (CI) for is given by
The symmetric pivot bootstrap CI for is
For the confidence intervals for the logarithm of the hazard rate function, first let
and and be respectively the and quantile of
For the log hazard function a pivotal bootstrap confidence interval is
A symmetric pivotal bootstrap CI for the log hazard is
These last two confidence intervals can be transformed back to yield confidence intervals for the hazard rate function .
These are:
and
respectively.
Note: The bootstrap matrix Mat.boot.haz.rate is assumed to contain estimates produced using the same time grid as time.grid and the same estimator used to generate hqm.est.
A data frame with the following columns:
time |
The evaluation grid points. |
est |
Indexed hazard rate estimator |
downci, upci
|
Lower and upper endpoints of basic pivot CIs. |
docisym, upcisym
|
Lower and upper endpoints of symmetric pivot CIs. |
logdoci, logupci
|
Lower and upper endpoints of pivot CIs on the log-scale. |
logdocisym, logupcisym
|
Symmetric pivot CIs on the log-scale. |
log.est |
The logarithm of the indexed hazard rate estimate, |
tLogDoCI, tLogUpCI
|
Transformed-log CIs based on |
tSymLogDoCI, tSymLogUpCI
|
Symmetric transformed-log CIs. |
Boot.hrandindex.param,
Boot.hqm
marker_name1 <- 'albumin' marker_name2 <- 'serBilir' event_time_name <- 'years' time_name <- 'year' event_name <- 'status2' id<-'id' par.x1 <- 0.0702 #0.149 par.x2 <- 0.0856 #0.10 t.x1 = 0 # refers to zero mean variables - slightly high t.x2 = 1.9 # refers to zero mean variable - high b = 0.42#par.alb * b.alb + par.bil *b.bil # 7 t = par.x1 * t.x1 + par.x2 *t.x2 ls<-50 #Store original sample values: xin <- pbc2[,c(id, marker_name1, marker_name2, event_time_name, time_name, event_name)] n <- length(xin$id) nn<-max( as.double(xin[,'id']) ) xin.id <- to_id(xin) X1t=xin[,marker_name1] -mean(xin[, marker_name1]) XX1t=xin.id[,marker_name1] -mean(xin.id[, marker_name1]) X2t=xin[,marker_name2] -mean(xin[, marker_name2]) XX2t=xin.id[,marker_name2] -mean(xin.id[, marker_name2]) X1=list(X1t, X2t) XX1=list(XX1t, XX2t) # Calculate the indexed HQM estimator on the original sample: arg2<- SingleIndCondFutHaz(pbc2, id, ls, X1, XX1, event_time_name = 'years', time_name = 'year', event_name = 'status2', in.par= c(par.x1, par.x2), b, t) hqm.est<-arg2[,2] # Indexed HQM estimator on original sample time.grid<-arg2[,1] # evaluation grid points n.est.points<- ls # length(hqm.est) # Create bootstrap samples by group set.seed(1) B<- 50 #for display purposes only; for sensible results use B=1000 (slower) Boot.samples<-list() for(j in 1:B) { i.use<-c() id.use<-c() index.nn <- sample (nn, replace = TRUE) for(l in 1:nn) { i.use2<-which(xin[,id]==index.nn[l]) i.use<-c(i.use, i.use2) id.use2<-rep(index.nn[l], times=length(i.use2)) id.use<-c(id.use, id.use2) } xin.i<-xin[i.use,] xin.i<-xin[i.use,] Boot.samples[[j]]<- xin.i[order(xin.i$id),] } # Simulate true hazard rate function: true.hazard<- Sim.True.Hazard(Boot.samples, id='id', n.est.points, marker_name1=marker_name1, marker_name2= marker_name2, event_time_name = event_time_name, time_name = time_name, event_name = event_name, in.par = c(par.x1, par.x2), b) # Bootstrap the original indexed HQM estimator: res <- Boot.hrandindex.param(B, Boot.samples, marker_name1, marker_name2, event_time_name, time_name, event_name, b = 0.4, t = t, true.haz = true.hazard, v.param = c(0.07, 0.08), n.est.points) # Construct Ci's: a.sig<-0.05 all.cis.pivot<- Pivot.Index.CIs(B, n.est.points, res, time.grid, hqm.est, a.sig) # extract Plain + symmetric CIs UpCI<-all.cis.pivot[,"UpCI"] DoCI<-all.cis.pivot[,"DoCI"] SymUpCI<-all.cis.pivot[,"SymUpCI"] SymDoCI<-all.cis.pivot[,"SymDoCI"] J <- 80 #select the first 80 grid points (for display purposes only) #Plot the selected CIs plot(time.grid[1:J], hqm.est[1:J], type="l", ylim=c(0,2), ylab="Hazard rate", xlab="time", lwd=2) polygon(x = c(time.grid[1:J], rev(time.grid[1:J])), y = c(UpCI[1:J], rev(DoCI[1:J])), col = adjustcolor("red", alpha.f = 0.50), border = NA ) lines(time.grid[1:J], SymUpCI[1:J], lty=2, lwd=2 ) lines(time.grid[1:J], SymDoCI[1:J], lty=2, lwd=2) # extract transformed from Log HR + symmetric CIs LogUpCI<-all.cis.pivot[,"LogUpCI"] LogDoCI<-all.cis.pivot[,"LogDoCI"] SymLogUpCI<-all.cis.pivot[,"LogSymUpCI"] SymLogDoCI<-all.cis.pivot[,"LogSymDoCI"] #Plot the selected CIs plot(time.grid[1:J], hqm.est[1:J] , type="l", ylim=c(0,2), ylab="Log Hazard rate", xlab="time", lwd=2) polygon(x = c(time.grid[1:J], rev(time.grid[1:J])), y = c(LogUpCI[1:J], rev(LogDoCI[1:J])), col = adjustcolor("red", alpha.f = 0.50), border = NA ) lines(time.grid[1:J], SymLogUpCI[1:J], lty=2, lwd=2 ) lines(time.grid[1:J], SymLogDoCI[1:J], lty=2, lwd=2) # extract Log HR + symmetric CIs tLogUpCI<-all.cis.pivot[,"LogtUpCI"] tLogDoCI<-all.cis.pivot[,"LogTDoCI"] tSymLogUpCI<-all.cis.pivot[,"SymLogtUpCI"] tSymLogDoCI<-all.cis.pivot[,"SymLogTDoCI"] #Plot the selected CIs plot(time.grid[1:J], log(hqm.est[1:J] ), type="l", ylim=c(-5,4), ylab="Hazard rate", xlab="time", lwd=2) polygon(x = c(time.grid[1:J], rev(time.grid[1:J])), y = c(tLogUpCI[1:J], rev(tLogDoCI[1:J])), col = adjustcolor("red", alpha.f = 0.50), border = NA ) lines(time.grid[1:J], tSymLogUpCI[1:J], lty=2, lwd=2 ) lines(time.grid[1:J], tSymLogDoCI[1:J], lty=2, lwd=2)marker_name1 <- 'albumin' marker_name2 <- 'serBilir' event_time_name <- 'years' time_name <- 'year' event_name <- 'status2' id<-'id' par.x1 <- 0.0702 #0.149 par.x2 <- 0.0856 #0.10 t.x1 = 0 # refers to zero mean variables - slightly high t.x2 = 1.9 # refers to zero mean variable - high b = 0.42#par.alb * b.alb + par.bil *b.bil # 7 t = par.x1 * t.x1 + par.x2 *t.x2 ls<-50 #Store original sample values: xin <- pbc2[,c(id, marker_name1, marker_name2, event_time_name, time_name, event_name)] n <- length(xin$id) nn<-max( as.double(xin[,'id']) ) xin.id <- to_id(xin) X1t=xin[,marker_name1] -mean(xin[, marker_name1]) XX1t=xin.id[,marker_name1] -mean(xin.id[, marker_name1]) X2t=xin[,marker_name2] -mean(xin[, marker_name2]) XX2t=xin.id[,marker_name2] -mean(xin.id[, marker_name2]) X1=list(X1t, X2t) XX1=list(XX1t, XX2t) # Calculate the indexed HQM estimator on the original sample: arg2<- SingleIndCondFutHaz(pbc2, id, ls, X1, XX1, event_time_name = 'years', time_name = 'year', event_name = 'status2', in.par= c(par.x1, par.x2), b, t) hqm.est<-arg2[,2] # Indexed HQM estimator on original sample time.grid<-arg2[,1] # evaluation grid points n.est.points<- ls # length(hqm.est) # Create bootstrap samples by group set.seed(1) B<- 50 #for display purposes only; for sensible results use B=1000 (slower) Boot.samples<-list() for(j in 1:B) { i.use<-c() id.use<-c() index.nn <- sample (nn, replace = TRUE) for(l in 1:nn) { i.use2<-which(xin[,id]==index.nn[l]) i.use<-c(i.use, i.use2) id.use2<-rep(index.nn[l], times=length(i.use2)) id.use<-c(id.use, id.use2) } xin.i<-xin[i.use,] xin.i<-xin[i.use,] Boot.samples[[j]]<- xin.i[order(xin.i$id),] } # Simulate true hazard rate function: true.hazard<- Sim.True.Hazard(Boot.samples, id='id', n.est.points, marker_name1=marker_name1, marker_name2= marker_name2, event_time_name = event_time_name, time_name = time_name, event_name = event_name, in.par = c(par.x1, par.x2), b) # Bootstrap the original indexed HQM estimator: res <- Boot.hrandindex.param(B, Boot.samples, marker_name1, marker_name2, event_time_name, time_name, event_name, b = 0.4, t = t, true.haz = true.hazard, v.param = c(0.07, 0.08), n.est.points) # Construct Ci's: a.sig<-0.05 all.cis.pivot<- Pivot.Index.CIs(B, n.est.points, res, time.grid, hqm.est, a.sig) # extract Plain + symmetric CIs UpCI<-all.cis.pivot[,"UpCI"] DoCI<-all.cis.pivot[,"DoCI"] SymUpCI<-all.cis.pivot[,"SymUpCI"] SymDoCI<-all.cis.pivot[,"SymDoCI"] J <- 80 #select the first 80 grid points (for display purposes only) #Plot the selected CIs plot(time.grid[1:J], hqm.est[1:J], type="l", ylim=c(0,2), ylab="Hazard rate", xlab="time", lwd=2) polygon(x = c(time.grid[1:J], rev(time.grid[1:J])), y = c(UpCI[1:J], rev(DoCI[1:J])), col = adjustcolor("red", alpha.f = 0.50), border = NA ) lines(time.grid[1:J], SymUpCI[1:J], lty=2, lwd=2 ) lines(time.grid[1:J], SymDoCI[1:J], lty=2, lwd=2) # extract transformed from Log HR + symmetric CIs LogUpCI<-all.cis.pivot[,"LogUpCI"] LogDoCI<-all.cis.pivot[,"LogDoCI"] SymLogUpCI<-all.cis.pivot[,"LogSymUpCI"] SymLogDoCI<-all.cis.pivot[,"LogSymDoCI"] #Plot the selected CIs plot(time.grid[1:J], hqm.est[1:J] , type="l", ylim=c(0,2), ylab="Log Hazard rate", xlab="time", lwd=2) polygon(x = c(time.grid[1:J], rev(time.grid[1:J])), y = c(LogUpCI[1:J], rev(LogDoCI[1:J])), col = adjustcolor("red", alpha.f = 0.50), border = NA ) lines(time.grid[1:J], SymLogUpCI[1:J], lty=2, lwd=2 ) lines(time.grid[1:J], SymLogDoCI[1:J], lty=2, lwd=2) # extract Log HR + symmetric CIs tLogUpCI<-all.cis.pivot[,"LogtUpCI"] tLogDoCI<-all.cis.pivot[,"LogTDoCI"] tSymLogUpCI<-all.cis.pivot[,"SymLogtUpCI"] tSymLogDoCI<-all.cis.pivot[,"SymLogTDoCI"] #Plot the selected CIs plot(time.grid[1:J], log(hqm.est[1:J] ), type="l", ylim=c(-5,4), ylab="Hazard rate", xlab="time", lwd=2) polygon(x = c(time.grid[1:J], rev(time.grid[1:J])), y = c(tLogUpCI[1:J], rev(tLogDoCI[1:J])), col = adjustcolor("red", alpha.f = 0.50), border = NA ) lines(time.grid[1:J], tSymLogUpCI[1:J], lty=2, lwd=2 ) lines(time.grid[1:J], tSymLogDoCI[1:J], lty=2, lwd=2)
Implements key components for the wild bootstrap of the hqm estimator in preparation for obtaining confidence bands.
prep_boot(g_xt, alpha, Ni, Yi, size_s_grid, br_X, br_s, t, b, int_X, x, n)prep_boot(g_xt, alpha, Ni, Yi, size_s_grid, br_X, br_s, t, b, int_X, x, n)
g_xt |
A vector obtained by |
alpha |
A vector of the marker only hazard on the marker grid obtained by |
Ni |
A matrix made by |
Yi |
A matrix made by |
size_s_grid |
Size of the time grid. |
br_X |
Vector of grid points for the marker values. |
br_s |
Time value grid points that will be used in the evaluatiuon. |
t |
Numeric value of the time the function should be evaluated. |
b |
Bandwidth. |
int_X |
Position of the linear interpolated marker values on the marker grid. |
x |
Numeric value of the last observed marker value. |
n |
Number of individuals. |
The function implements
and
where ,
and
with being the exposure and the marker.
A list of 5 items. The first two are vectors for calculating and the third one a vector for . The 4th one is the value of the hqm estimator that can also be obtained by h_xt and the last one is the value of .
pbc2_id = to_id(pbc2) size_s_grid <- size_X_grid <- 100 n = max(as.numeric(pbc2$id)) s = pbc2$year X = pbc2$serBilir XX = pbc2_id$serBilir ss <- pbc2_id$years delta <- pbc2_id$status2 br_s = seq(0, max(s), max(s)/( size_s_grid-1)) br_X = seq(min(X), max(X), (max(X)-min(X))/( size_X_grid-1)) X_lin = lin_interpolate(br_s, pbc2_id$id, pbc2$id, X, s) int_X <- findInterval(X_lin, br_X) int_s = rep(1:length(br_s), n) N <- make_N(pbc2, pbc2_id, br_X, br_s, ss, XX, delta) Y <- make_Y(pbc2, pbc2_id, X_lin, br_X, br_s, size_s_grid, size_X_grid, int_s, int_X, event_time = 'years', n) b = 1.7 alpha<-get_alpha(N, Y, b, br_X, K=Epan ) Yi <- make_Yi(pbc2, pbc2_id, X_lin, br_X, br_s, size_s_grid, size_X_grid, int_s, int_X, event_time = 'years', n) Ni <- make_Ni(br_s, size_s_grid, ss, delta, n) t = 2 x = 2 g = g_xt(br_X, br_s, size_s_grid, int_X, x, t, b, Yi, Y, n) Boot_all = prep_boot(g, alpha, Ni, Yi, size_s_grid, br_X, br_s, t, b, int_X, x, n) Boot_allpbc2_id = to_id(pbc2) size_s_grid <- size_X_grid <- 100 n = max(as.numeric(pbc2$id)) s = pbc2$year X = pbc2$serBilir XX = pbc2_id$serBilir ss <- pbc2_id$years delta <- pbc2_id$status2 br_s = seq(0, max(s), max(s)/( size_s_grid-1)) br_X = seq(min(X), max(X), (max(X)-min(X))/( size_X_grid-1)) X_lin = lin_interpolate(br_s, pbc2_id$id, pbc2$id, X, s) int_X <- findInterval(X_lin, br_X) int_s = rep(1:length(br_s), n) N <- make_N(pbc2, pbc2_id, br_X, br_s, ss, XX, delta) Y <- make_Y(pbc2, pbc2_id, X_lin, br_X, br_s, size_s_grid, size_X_grid, int_s, int_X, event_time = 'years', n) b = 1.7 alpha<-get_alpha(N, Y, b, br_X, K=Epan ) Yi <- make_Yi(pbc2, pbc2_id, X_lin, br_X, br_s, size_s_grid, size_X_grid, int_s, int_X, event_time = 'years', n) Ni <- make_Ni(br_s, size_s_grid, ss, delta, n) t = 2 x = 2 g = g_xt(br_X, br_s, size_s_grid, int_X, x, t, b, Yi, Y, n) Boot_all = prep_boot(g, alpha, Ni, Yi, size_s_grid, br_X, br_s, t, b, int_X, x, n) Boot_all
Implements the calculation of the hqm estimator on cross validation data sets. This is a preparation for the cross validation bandwidth selection technique for future conditional hazard rate estimation based on marker information data.
prep_cv(data, data.id, marker_name, event_time_name = 'years', time_name = 'year',event_name = 'status2', n, I, b)prep_cv(data, data.id, marker_name, event_time_name = 'years', time_name = 'year',event_name = 'status2', n, I, b)
data |
A data frame of time dependent data points. Missing values are allowed. |
data.id |
An id data frame obtained from |
marker_name |
The column name of the marker values in the data frame |
event_time_name |
The column name of the event times in the data frame |
time_name |
The column name of the times the marker values were observed in the data frame |
event_name |
The column name of the events in the data frame |
n |
Number of individuals. |
I |
Number of observations leave out for a K cross validation. |
b |
Bandwidth. |
The function splits the data set via dataset_split and calculates for every splitted data set the hqm estimator
for all on the marker grid and on the time grid, where is the marker, is the exposure and is the marker-only hazard, see get_alpha for more details.
A list of matrices for every cross validation data set with for all on the marker grid and on the time grid.
pbc2_id = to_id(pbc2) n = max(as.numeric(pbc2$id)) b = 1.5 I = 26 h_xt_mat_list = prep_cv(pbc2, pbc2_id, 'serBilir', 'years', 'year', 'status2', n, I, b)pbc2_id = to_id(pbc2) n = max(as.numeric(pbc2$id)) b = 1.5 I = 26 h_xt_mat_list = prep_cv(pbc2, pbc2_id, 'serBilir', 'years', 'year', 'status2', n, I, b)
Implements the calculation of the hqm estimator on cross validation data sets. This is a preparation for the cross validation index selection technique for future conditional hazard rate estimation based on marker information data.
prep_cv2(in.par, data, data.id, marker_name1, marker_name2, event_time_name = 'years', time_name = 'year',event_name = 'status2', n, I, b)prep_cv2(in.par, data, data.id, marker_name1, marker_name2, event_time_name = 'years', time_name = 'year',event_name = 'status2', n, I, b)
in.par |
Vector of candidate indexing values. |
data |
A data frame of time dependent data points. Missing values are allowed. |
data.id |
An id data frame obtained from |
marker_name1 |
The column name of the marker values in the data frame |
marker_name2 |
The column name of the marker values in the data frame |
event_time_name |
The column name of the event times in the data frame |
time_name |
The column name of the times the marker values were observed in the data frame |
event_name |
The column name of the events in the data frame |
n |
Number of individuals. |
I |
Number of observations leave out for a K cross validation. |
b |
Bandwidth. |
The function splits the data set via dataset_split and calculates for every splitted data set the hqm estimator
for all on the marker grid and on the time grid, where is the marker, is the exposure and is the marker-only hazard, see get_alpha for more details.
A list of matrices for every cross validation data set with for all on the marker grid and on the time grid.
Calculates a part for the K-fold cross validation score.
Q1(h_xt_mat, int_X, size_X_grid, n, Yi)Q1(h_xt_mat, int_X, size_X_grid, n, Yi)
h_xt_mat |
A matrix of the estimator for the future conditional hazard rate for all values |
int_X |
Vector of the position of the observed marker values in the grid for marker values. |
size_X_grid |
Numeric value indicating the number of grid points for marker values. |
n |
Number of individuals. |
Yi |
A matrix made by |
The function implements
where is the hqm estimator, the exposure and the marker.
A value of the score Q1.
pbc2_id = to_id(pbc2) size_s_grid <- size_X_grid <- 100 n = max(as.numeric(pbc2$id)) s = pbc2$year X = pbc2$serBilir XX = pbc2_id$serBilir ss <- pbc2_id$years delta <- pbc2_id$status2 br_s = seq(0, max(s), max(s)/( size_s_grid-1)) br_X = seq(min(X), max(X), (max(X)-min(X))/( size_X_grid-1)) X_lin = lin_interpolate(br_s, pbc2_id$id, pbc2$id, X, s) int_X <- findInterval(X_lin, br_X) int_s = rep(1:length(br_s), n) N <- make_N(pbc2, pbc2_id, br_X, br_s, ss, XX, delta) Y <- make_Y(pbc2, pbc2_id, X_lin, br_X, br_s, size_s_grid, size_X_grid, int_s, int_X, event_time = 'years', n) b = 1.7 alpha<-get_alpha(N, Y, b, br_X, K=Epan ) Yi <- make_Yi(pbc2, pbc2_id, X_lin, br_X, br_s, size_s_grid, size_X_grid, int_s, int_X, event_time = 'years', n) Ni <- make_Ni(br_s, size_s_grid, ss, delta, n) t = 2 h_xt_mat = t(sapply(br_s[1:99], function(si){h_xt_vec(br_X, br_s, size_s_grid, alpha, t, b, Yi, int_X, n)})) Q = Q1(h_xt_mat, int_X, size_X_grid, n, Yi)pbc2_id = to_id(pbc2) size_s_grid <- size_X_grid <- 100 n = max(as.numeric(pbc2$id)) s = pbc2$year X = pbc2$serBilir XX = pbc2_id$serBilir ss <- pbc2_id$years delta <- pbc2_id$status2 br_s = seq(0, max(s), max(s)/( size_s_grid-1)) br_X = seq(min(X), max(X), (max(X)-min(X))/( size_X_grid-1)) X_lin = lin_interpolate(br_s, pbc2_id$id, pbc2$id, X, s) int_X <- findInterval(X_lin, br_X) int_s = rep(1:length(br_s), n) N <- make_N(pbc2, pbc2_id, br_X, br_s, ss, XX, delta) Y <- make_Y(pbc2, pbc2_id, X_lin, br_X, br_s, size_s_grid, size_X_grid, int_s, int_X, event_time = 'years', n) b = 1.7 alpha<-get_alpha(N, Y, b, br_X, K=Epan ) Yi <- make_Yi(pbc2, pbc2_id, X_lin, br_X, br_s, size_s_grid, size_X_grid, int_s, int_X, event_time = 'years', n) Ni <- make_Ni(br_s, size_s_grid, ss, delta, n) t = 2 h_xt_mat = t(sapply(br_s[1:99], function(si){h_xt_vec(br_X, br_s, size_s_grid, alpha, t, b, Yi, int_X, n)})) Q = Q1(h_xt_mat, int_X, size_X_grid, n, Yi)
Computes quantile-bootstrap confidence intervals and its symmetric counterparts for the hazard rate, log-hazard rate, and back-transformed (from log scale) hazard rate functions, based on the indexed hazard estimator.
Quantile.Index.CIs(B, n.est.points, Mat.boot.haz.rate, time.grid, hqm.est, a.sig)Quantile.Index.CIs(B, n.est.points, Mat.boot.haz.rate, time.grid, hqm.est, a.sig)
B |
Integer. Number of bootsrap replications. |
n.est.points |
Integer. Number of estimation points at which the indexed hazard estimates and confidence intervals are evaluated. |
Mat.boot.haz.rate |
A matrix of bootstrap estimated hazard rates with dimensions |
time.grid |
Numeric vector of length |
hqm.est |
Indexed hazard estimator, calculated at the grid points |
a.sig |
The significance level (e.g., 0.05) which will be used in computing the confidence intervals. |
This function computes several forms of pivot confidence intervals for indexed hazard rate estimates. Set
and where
is the quantile of , obtained by ordering the bootstrap estimators
in ascending order and selecting the -th ordered value. For example, for bootstrap iterations and ,
will be the 25th smallest out of the 1000 values . Also denote with the quantile of
Then, the quantile bootstrap CI for is given by
The symmetric quantile CI (basic CI) is
The confidence intervals for the logarithm of the hazard rate function are defined as follows. First set
and .
Also denote with the quantile of
For the log hazard function we have the quantile confidence interval is
The corresponding symmetric quantile CI for the log hazard is
These confidence intervals are transformed back to confidence intervals for the hazard rate function :
The corresponding symmetric confidence interval is
Note: The bootstrap matrix Mat.boot.haz.rate is assumed to contain estimates produced using the same time grid as time.grid and the same estimator used to generate hqm.est.
A data frame with the following columns:
time |
The evaluation grid points. |
est |
Indexed hazard rate estimator |
downci, upci
|
Lower and upper endpoints of basic quantile CIs. |
docisym, upcisym
|
Lower and upper endpoints of symmetric quantile CIs. |
logdoci, logupci
|
Lower and upper endpoints of quantile CIs on the log-scale. |
logdocisym, logupcisym
|
Symmetric log-scale CIs. |
log.est |
The logarithm of the indexed hazard rate estimate, |
tLogDoCI, tLogUpCI
|
Transformed-log CIs based on |
tSymLogDoCI, tSymLogUpCI
|
Symmetric transformed-log CIs. |
Boot.hrandindex.param, Boot.hqm
marker_name1 <- 'albumin' marker_name2 <- 'serBilir' event_time_name <- 'years' time_name <- 'year' event_name <- 'status2' id<-'id' par.x1 <- 0.0702 #0.149 par.x2 <- 0.0856 #0.10 t.x1 = 0 # refers to zero mean variables - slightly high t.x2 = 1.9 # refers to zero mean variable - high b = 0.42# The result, on the indexed marker 'indmar' of #\code{b_selection(pbc2, 'indmar', 'years', 'year', 'status2', I=26, seq(0.2,0.4,by=0.01))} t = par.x1 * t.x1 + par.x2 *t.x2 ls<-50 #Store original sample values: xin <- pbc2[,c(id, marker_name1, marker_name2, event_time_name, time_name, event_name)] n <- length(xin$id) nn<-max( as.double(xin[,'id']) ) xin.id <- to_id(xin) X1t=xin[,marker_name1] -mean(xin[, marker_name1]) XX1t=xin.id[,marker_name1] -mean(xin.id[, marker_name1]) X2t=xin[,marker_name2] -mean(xin[, marker_name2]) XX2t=xin.id[,marker_name2] -mean(xin.id[, marker_name2]) X1=list(X1t, X2t) XX1=list(XX1t, XX2t) # Calculate the indexed HQM estimator on the original sample: arg2<- SingleIndCondFutHaz(pbc2, id, ls, X1, XX1, event_time_name = 'years', time_name = 'year', event_name = 'status2', in.par= c(par.x1, par.x2), b, t) hqm.est<-arg2[,2] # Indexed HQM estimator on original sample time.grid<-arg2[,1] # evaluation grid points n.est.points<- ls # length(hqm.est) # Create bootstrap samples by group: set.seed(1) B<- 50 #for display purposes only; for sensible results use B=1000 (slower) Boot.samples<-list() for(j in 1:B) { i.use<-c() id.use<-c() index.nn <- sample (nn, replace = TRUE) for(l in 1:nn) { i.use2<-which(xin[,id]==index.nn[l]) i.use<-c(i.use, i.use2) id.use2<-rep(index.nn[l], times=length(i.use2)) id.use<-c(id.use, id.use2) } xin.i<-xin[i.use,] xin.i<-xin[i.use,] Boot.samples[[j]]<- xin.i[order(xin.i$id),] } # Simulate true hazard rate function: true.hazard<- Sim.True.Hazard(Boot.samples, id='id', n.est.points, marker_name1=marker_name1, marker_name2= marker_name2, event_time_name = event_time_name, time_name = time_name, event_name = event_name, in.par = c(par.x1, par.x2), b) # Bootstrap the original indexed HQM estimator: res <- Boot.hrandindex.param(B, Boot.samples, marker_name1, marker_name2, event_time_name, time_name, event_name, b = 0.4, t = t, true.haz = true.hazard, v.param = c(0.07, 0.08), n.est.points ) J <- 80 a.sig<-0.05 # Construct Ci's: all.cis.quant<- Quantile.Index.CIs(B, n.est.points, res, time.grid, hqm.est, a.sig) # extract Plain + symmetric CIs and plot them: UpCI<-all.cis.quant[,"upci"] DoCI<-all.cis.quant[,"downci"] SymUpCI<-all.cis.quant[,"upcisym"] SymDoCI<-all.cis.quant[,"docisym"] #Plot the selected CIs plot(time.grid[1:J], hqm.est[1:J], type="l", ylim=c(0,2), ylab="Hazard rate", xlab="time", lwd=2) polygon(x = c(time.grid[1:J], rev(time.grid[1:J])), y = c(UpCI[1:J], rev(DoCI[1:J])), col = adjustcolor("red", alpha.f = 0.50), border = NA ) lines(time.grid[1:J], SymUpCI[1:J], lty=2, lwd=2 ) lines(time.grid[1:J], SymDoCI[1:J], lty=2, lwd=2) # extract transformed from Log HR + symmetric CIs LogUpCI<-all.cis.quant[,"logupci"] LogDoCI<-all.cis.quant[,"logdoci"] SymLogUpCI<-all.cis.quant[,"logupcisym"] SymLogDoCI<-all.cis.quant[,"logdocisym"] #Plot the selected CIs plot(time.grid[1:J], hqm.est[1:J], type="l", ylim=c(0,2), ylab="Hazard rate", xlab="time", lwd=2) polygon(x = c(time.grid[1:J], rev(time.grid[1:J])), y = c(LogUpCI[1:J], rev(LogDoCI[1:J])), col = adjustcolor("red", alpha.f = 0.50), border = NA ) lines(time.grid[1:J], SymLogUpCI[1:J], lty=2, lwd=2 )#, lwd=3 lines(time.grid[1:J], SymLogDoCI[1:J], lty=2, lwd=2) # extract Log HR + symmetric CIs tLogUpCI<-all.cis.quant[,"tLogUpCI"] tLogDoCI<-all.cis.quant[,"tLogDoCI"] tSymLogUpCI<-all.cis.quant[,"tSymLogUpCI"] tSymLogDoCI<-all.cis.quant[,"tSymLogDoCI"] #Plot the selected CIs plot(time.grid[1:J], log(hqm.est[1:J]), type="l", ylim=c(-5,4), ylab="Log Hazard rate", xlab="time", lwd=2) polygon(x = c(time.grid[1:J], rev(time.grid[1:J])), y = c(tLogUpCI[1:J], rev(tLogDoCI[1:J])), col = adjustcolor("red", alpha.f = 0.50), border = NA ) lines(time.grid[1:J], tSymLogUpCI[1:J], lty=2, lwd=2 ) lines(time.grid[1:J], tSymLogDoCI[1:J], lty=2, lwd=2)marker_name1 <- 'albumin' marker_name2 <- 'serBilir' event_time_name <- 'years' time_name <- 'year' event_name <- 'status2' id<-'id' par.x1 <- 0.0702 #0.149 par.x2 <- 0.0856 #0.10 t.x1 = 0 # refers to zero mean variables - slightly high t.x2 = 1.9 # refers to zero mean variable - high b = 0.42# The result, on the indexed marker 'indmar' of #\code{b_selection(pbc2, 'indmar', 'years', 'year', 'status2', I=26, seq(0.2,0.4,by=0.01))} t = par.x1 * t.x1 + par.x2 *t.x2 ls<-50 #Store original sample values: xin <- pbc2[,c(id, marker_name1, marker_name2, event_time_name, time_name, event_name)] n <- length(xin$id) nn<-max( as.double(xin[,'id']) ) xin.id <- to_id(xin) X1t=xin[,marker_name1] -mean(xin[, marker_name1]) XX1t=xin.id[,marker_name1] -mean(xin.id[, marker_name1]) X2t=xin[,marker_name2] -mean(xin[, marker_name2]) XX2t=xin.id[,marker_name2] -mean(xin.id[, marker_name2]) X1=list(X1t, X2t) XX1=list(XX1t, XX2t) # Calculate the indexed HQM estimator on the original sample: arg2<- SingleIndCondFutHaz(pbc2, id, ls, X1, XX1, event_time_name = 'years', time_name = 'year', event_name = 'status2', in.par= c(par.x1, par.x2), b, t) hqm.est<-arg2[,2] # Indexed HQM estimator on original sample time.grid<-arg2[,1] # evaluation grid points n.est.points<- ls # length(hqm.est) # Create bootstrap samples by group: set.seed(1) B<- 50 #for display purposes only; for sensible results use B=1000 (slower) Boot.samples<-list() for(j in 1:B) { i.use<-c() id.use<-c() index.nn <- sample (nn, replace = TRUE) for(l in 1:nn) { i.use2<-which(xin[,id]==index.nn[l]) i.use<-c(i.use, i.use2) id.use2<-rep(index.nn[l], times=length(i.use2)) id.use<-c(id.use, id.use2) } xin.i<-xin[i.use,] xin.i<-xin[i.use,] Boot.samples[[j]]<- xin.i[order(xin.i$id),] } # Simulate true hazard rate function: true.hazard<- Sim.True.Hazard(Boot.samples, id='id', n.est.points, marker_name1=marker_name1, marker_name2= marker_name2, event_time_name = event_time_name, time_name = time_name, event_name = event_name, in.par = c(par.x1, par.x2), b) # Bootstrap the original indexed HQM estimator: res <- Boot.hrandindex.param(B, Boot.samples, marker_name1, marker_name2, event_time_name, time_name, event_name, b = 0.4, t = t, true.haz = true.hazard, v.param = c(0.07, 0.08), n.est.points ) J <- 80 a.sig<-0.05 # Construct Ci's: all.cis.quant<- Quantile.Index.CIs(B, n.est.points, res, time.grid, hqm.est, a.sig) # extract Plain + symmetric CIs and plot them: UpCI<-all.cis.quant[,"upci"] DoCI<-all.cis.quant[,"downci"] SymUpCI<-all.cis.quant[,"upcisym"] SymDoCI<-all.cis.quant[,"docisym"] #Plot the selected CIs plot(time.grid[1:J], hqm.est[1:J], type="l", ylim=c(0,2), ylab="Hazard rate", xlab="time", lwd=2) polygon(x = c(time.grid[1:J], rev(time.grid[1:J])), y = c(UpCI[1:J], rev(DoCI[1:J])), col = adjustcolor("red", alpha.f = 0.50), border = NA ) lines(time.grid[1:J], SymUpCI[1:J], lty=2, lwd=2 ) lines(time.grid[1:J], SymDoCI[1:J], lty=2, lwd=2) # extract transformed from Log HR + symmetric CIs LogUpCI<-all.cis.quant[,"logupci"] LogDoCI<-all.cis.quant[,"logdoci"] SymLogUpCI<-all.cis.quant[,"logupcisym"] SymLogDoCI<-all.cis.quant[,"logdocisym"] #Plot the selected CIs plot(time.grid[1:J], hqm.est[1:J], type="l", ylim=c(0,2), ylab="Hazard rate", xlab="time", lwd=2) polygon(x = c(time.grid[1:J], rev(time.grid[1:J])), y = c(LogUpCI[1:J], rev(LogDoCI[1:J])), col = adjustcolor("red", alpha.f = 0.50), border = NA ) lines(time.grid[1:J], SymLogUpCI[1:J], lty=2, lwd=2 )#, lwd=3 lines(time.grid[1:J], SymLogDoCI[1:J], lty=2, lwd=2) # extract Log HR + symmetric CIs tLogUpCI<-all.cis.quant[,"tLogUpCI"] tLogDoCI<-all.cis.quant[,"tLogDoCI"] tSymLogUpCI<-all.cis.quant[,"tSymLogUpCI"] tSymLogDoCI<-all.cis.quant[,"tSymLogDoCI"] #Plot the selected CIs plot(time.grid[1:J], log(hqm.est[1:J]), type="l", ylim=c(-5,4), ylab="Log Hazard rate", xlab="time", lwd=2) polygon(x = c(time.grid[1:J], rev(time.grid[1:J])), y = c(tLogUpCI[1:J], rev(tLogDoCI[1:J])), col = adjustcolor("red", alpha.f = 0.50), border = NA ) lines(time.grid[1:J], tSymLogUpCI[1:J], lty=2, lwd=2 ) lines(time.grid[1:J], tSymLogDoCI[1:J], lty=2, lwd=2)
Calculates a part for the K-fold cross validation score.
R_K(h_xt_mat_list, int_X, size_X_grid, Yi, Ni, n)R_K(h_xt_mat_list, int_X, size_X_grid, Yi, Ni, n)
h_xt_mat_list |
A list of matrices for all cross validation data sets. Each matrix contains the estimator with the future conditional hazard rate for all values |
int_X |
Vector of the position of the observed marker values in the grid for marker values. |
size_X_grid |
Numeric value indicating the number of grid points for marker values. |
Yi |
A matrix made by |
Ni |
A matrix made by |
n |
Number of individuals. |
The function implements the estimator
where and is estimated without information from all counting processes with .
This function estimates
where is the hqm estimator, the exposure and the marker.
A matrix with for all individuals i and time grid points t.
pbc2_id = to_id(pbc2) n = max(as.numeric(pbc2$id)) b = 1.5 I = 104 h_xt_mat_list = prep_cv(pbc2, pbc2_id, 'serBilir', 'years', 'year', 'status2', n, I, b) size_s_grid <- size_X_grid <- 100 s = pbc2$year X = pbc2$serBilir br_s = seq(0, max(s), max(s)/( size_s_grid-1)) br_X = seq(min(X), max(X), (max(X)-min(X))/( size_X_grid-1)) ss <- pbc2_id$years delta <- pbc2_id$status2 X_lin = lin_interpolate(br_s, pbc2_id$id, pbc2$id, X, s) int_X <- findInterval(X_lin, br_X) int_s = rep(1:length(br_s), n) Yi <- make_Yi(pbc2, pbc2_id, X_lin, br_X, br_s, size_s_grid, size_X_grid, int_s, int_X, 'years', n) Ni <- make_Ni(br_s, size_s_grid, ss, delta, n) R = R_K(h_xt_mat_list, int_X, size_X_grid, Yi, Ni, n) Rpbc2_id = to_id(pbc2) n = max(as.numeric(pbc2$id)) b = 1.5 I = 104 h_xt_mat_list = prep_cv(pbc2, pbc2_id, 'serBilir', 'years', 'year', 'status2', n, I, b) size_s_grid <- size_X_grid <- 100 s = pbc2$year X = pbc2$serBilir br_s = seq(0, max(s), max(s)/( size_s_grid-1)) br_X = seq(min(X), max(X), (max(X)-min(X))/( size_X_grid-1)) ss <- pbc2_id$years delta <- pbc2_id$status2 X_lin = lin_interpolate(br_s, pbc2_id$id, pbc2$id, X, s) int_X <- findInterval(X_lin, br_X) int_s = rep(1:length(br_s), n) Yi <- make_Yi(pbc2, pbc2_id, X_lin, br_X, br_s, size_s_grid, size_X_grid, int_s, int_X, 'years', n) Ni <- make_Ni(br_s, size_s_grid, ss, delta, n) R = R_K(h_xt_mat_list, int_X, size_X_grid, Yi, Ni, n) R
Compute the Monte Carlo / bootstrap averaged alpha (true hazard) across a list of datasets.
Sim.True.Hazard(data.use, id, size_s_grid, marker_name1 = marker_name1, marker_name2 = marker_name2, event_time_name = event_time_name, time_name = time_name, event_name = event_name, in.par, b)Sim.True.Hazard(data.use, id, size_s_grid, marker_name1 = marker_name1, marker_name2 = marker_name2, event_time_name = event_time_name, time_name = time_name, event_name = event_name, in.par, b)
data.use |
List of data.frames (bootstrap samples). |
id |
Id column name. |
size_s_grid |
user supplied grid length on the time_name argument |
marker_name1 |
First marker name. |
marker_name2 |
Second marker name. |
event_time_name |
Name of event time column. |
time_name |
Name of observation time column. |
event_name |
Name of event indicator column. |
in.par |
Numeric indexing parameters. |
b |
Bandwidth. |
Numeric vector (row means of alpha estimates).
marker_name1 <- 'albumin' marker_name2 <- 'serBilir' event_time_name <- 'years' time_name <- 'year' event_name <- 'status2' id<-'id' ls<-50 #user supplied grid length on the time_name argument par.x1 <- 0.0702 #indexing parameter for marker 1 par.x2 <- 0.0856 #indexing parameter for marker 2 t.x1 = 0 # conditioning level for marker 1, assumes zero mean variable t.x2 = 1.9 # conditioning level for marker 2, assumes zero mean variable b = 0.42 # The result, on the indexed marker 'indmar' of #\code{b_selection(pbc2,'indmar','years','year','status2',I=26,seq(0.2,0.4,by=0.01))} t = par.x1 * t.x1 + par.x2 *t.x2 xin <- pbc2[,c(id, marker_name1, marker_name2, event_time_name, time_name, event_name)] n <- length(xin$id) nn<-max( as.double(xin[,'id']) ) # Create bootstrap samples by group set.seed(1) B<-100 #set bootstrap iterations here Boot.samples<-list() for(j in 1:B) { i.use<-c() id.use<-c() index.nn <- sample (nn, replace = TRUE) for(l in 1:nn) { i.use2<-which(xin[,id]==index.nn[l]) i.use<-c(i.use, i.use2) id.use2<-rep(index.nn[l], times=length(i.use2)) id.use<-c(id.use, id.use2) } xin.i<-xin[i.use,] xin.i<-xin[i.use,] Boot.samples[[j]]<- xin.i[order(xin.i$id),] } #compute the bootstrap simulated true HR function: true.hazard<- Sim.True.Hazard(Boot.samples, id='id', size_s_grid=ls, marker_name1=marker_name1, marker_name2= marker_name2, event_time_name = event_time_name, time_name = time_name, event_name = event_name, in.par = c(par.x1, par.x2), b)marker_name1 <- 'albumin' marker_name2 <- 'serBilir' event_time_name <- 'years' time_name <- 'year' event_name <- 'status2' id<-'id' ls<-50 #user supplied grid length on the time_name argument par.x1 <- 0.0702 #indexing parameter for marker 1 par.x2 <- 0.0856 #indexing parameter for marker 2 t.x1 = 0 # conditioning level for marker 1, assumes zero mean variable t.x2 = 1.9 # conditioning level for marker 2, assumes zero mean variable b = 0.42 # The result, on the indexed marker 'indmar' of #\code{b_selection(pbc2,'indmar','years','year','status2',I=26,seq(0.2,0.4,by=0.01))} t = par.x1 * t.x1 + par.x2 *t.x2 xin <- pbc2[,c(id, marker_name1, marker_name2, event_time_name, time_name, event_name)] n <- length(xin$id) nn<-max( as.double(xin[,'id']) ) # Create bootstrap samples by group set.seed(1) B<-100 #set bootstrap iterations here Boot.samples<-list() for(j in 1:B) { i.use<-c() id.use<-c() index.nn <- sample (nn, replace = TRUE) for(l in 1:nn) { i.use2<-which(xin[,id]==index.nn[l]) i.use<-c(i.use, i.use2) id.use2<-rep(index.nn[l], times=length(i.use2)) id.use<-c(id.use, id.use2) } xin.i<-xin[i.use,] xin.i<-xin[i.use,] Boot.samples[[j]]<- xin.i[order(xin.i$id),] } #compute the bootstrap simulated true HR function: true.hazard<- Sim.True.Hazard(Boot.samples, id='id', size_s_grid=ls, marker_name1=marker_name1, marker_name2= marker_name2, event_time_name = event_time_name, time_name = time_name, event_name = event_name, in.par = c(par.x1, par.x2), b)
Compute the indexed local linear future conditional hazard rate estimator on a grid.
SingleIndCondFutHaz(datain, id, ls, X1, XX1, event_time_name = 'years', time_name = 'year', event_name = 'status2', in.par, b, t)SingleIndCondFutHaz(datain, id, ls, X1, XX1, event_time_name = 'years', time_name = 'year', event_name = 'status2', in.par, b, t)
datain |
Data frame with longitudinal observations. |
id |
Name of id column (string). |
ls |
user supplied grid length on the time_name argument. |
X1 |
List of vectors for indexing: each vector corresponds to a biomarker and contains one summary measurement per individual. |
XX1 |
List of vectors for indexing: each vector corresponds to a single biomarker and contains its longitudinal measurements across all individuals/time points. |
event_time_name |
Name of event time column. |
time_name |
Name of observation time column. |
event_name |
Name of event indicator column. |
in.par |
Numeric vector length 2 with indexing parameters. |
b |
Bandwidth parameter. |
t |
conditioning parameter. |
A data frame with columns time and est.
b.alb = 0.9 b.bil = 4 t.alb = 1 # refers to zero mean variables - slightly high t.bil = 1.9 # refers to zero mean variable - high par.alb <- 0.0702 #0.149 par.bil <- 0.0856 #0.10 b = 0.42 t = t.alb * par.alb + t.bil *par.bil marker_name1 <- 'albumin' marker_name2 <- 'serBilir' event_time_name <- 'years' time_name <- 'year' event_name <- 'status2' id<-'id' ls<-50 data.use<-pbc2 data.use.id<-to_id(data.use) #data.use.id<-data.use.id[complete.cases(data.use.id), ] # mean adjust the data: X1t=data.use[,marker_name1] -mean(data.use[, marker_name1]) XX1t=data.use.id[,marker_name1] -mean(data.use.id[, marker_name1]) X2t=data.use[,marker_name2] -mean(data.use[, marker_name2]) XX2t=data.use.id[,marker_name2] -mean(data.use.id[, marker_name2]) X1=list(X1t, X2t) XX1=list(XX1t, XX2t) arg2<-SingleIndCondFutHaz(pbc2, id, ls, X1, XX1, event_time_name = 'years', time_name = 'year', event_name = 'status2', in.par= c(par.alb, par.bil), b, t) hqm.est<-arg2[,2] time.grid<-arg2[,1] n.est.points<-length(hqm.est)b.alb = 0.9 b.bil = 4 t.alb = 1 # refers to zero mean variables - slightly high t.bil = 1.9 # refers to zero mean variable - high par.alb <- 0.0702 #0.149 par.bil <- 0.0856 #0.10 b = 0.42 t = t.alb * par.alb + t.bil *par.bil marker_name1 <- 'albumin' marker_name2 <- 'serBilir' event_time_name <- 'years' time_name <- 'year' event_name <- 'status2' id<-'id' ls<-50 data.use<-pbc2 data.use.id<-to_id(data.use) #data.use.id<-data.use.id[complete.cases(data.use.id), ] # mean adjust the data: X1t=data.use[,marker_name1] -mean(data.use[, marker_name1]) XX1t=data.use.id[,marker_name1] -mean(data.use.id[, marker_name1]) X2t=data.use[,marker_name2] -mean(data.use[, marker_name2]) XX2t=data.use.id[,marker_name2] -mean(data.use.id[, marker_name2]) X1=list(X1t, X2t) XX1=list(XX1t, XX2t) arg2<-SingleIndCondFutHaz(pbc2, id, ls, X1, XX1, event_time_name = 'years', time_name = 'year', event_name = 'status2', in.par= c(par.alb, par.bil), b, t) hqm.est<-arg2[,2] time.grid<-arg2[,1] n.est.points<-length(hqm.est)
Computes bootstrap confidence intervals—studentized (double bootstrap) and its symmetric versions for the hazard rate, log-hazard rate, and back-transformed (from the log scale) hazard rate functions, based on the indexed hazard estimator.
StudentizedBwB.Index.CIs(n.est.points, all.mat, time.grid, hqm.est, a.sig)StudentizedBwB.Index.CIs(n.est.points, all.mat, time.grid, hqm.est, a.sig)
n.est.points |
Integer. Number of estimation points at which the indexed hazard estimates and confidence intervals are evaluated. |
all.mat |
A list of matrices of bootstrap estimated hazard and log hazard rates with dimensions |
time.grid |
Numeric vector of length |
hqm.est |
Indexed hazard estimator, calculated at the grid points |
a.sig |
The significance level (e.g., 0.05) which will be used in computing the confidence intervals. |
This function computes several forms of studentized confidence intervals for the indexed hazard rate function. First, for each bootstrap iteration
we construct estimators of the standard deviation, denoted by as follows: each bootstrap sample is itself bootstrapped, say times,
we estimate the corresponding index parameters , and use them to
calculate the corresponding hazard estimators . Finally we calculate as the sample variance of
. Let and be respectively the
and quantiles of
Given the bootstrap estimators , the estimator of the original data set and the quantiles
and , the studentized (double bootstrap) confidence interval (CI) for is given by
The symmetric studentized confidence interval (CI) for defined by
For the confidence intervals for the logarithm of the hazard rate function first set and
be the and quantile of .
The studentized (double bootstrap) CI confidence interval for the logarithm of the hazard rate function is
Its symmetric studentized (double bootstrap) CI for the log hazard is
These confidence intervals are transformed back to yield the following confidence intervals for the hazard rate function :
and the symmetric version
Note: The bootstrap matrix Mat.boot.haz.rate is assumed to contain estimates produced using the same time grid as time.grid and the same estimator used to generate hqm.est.
A data frame with the following columns:
time |
The evaluation grid points. |
est |
Indexed hazard rate estimator |
downci, upci
|
Lower and upper endpoints of basic studentized CIs. |
docisym, upcisym
|
Lower and upper endpoints of symmetric CIs. |
logdoci, logupci
|
Lower and upper endpoints of studentized CIs on the log-scale. |
logdocisym, logupcisym
|
Symmetric log-scale CIs. |
log.est |
The logarithm of the indexed hazard rate estimate, |
tLogDoCI, tLogUpCI
|
Transformed-log CIs based on |
tSymLogDoCI, tSymLogUpCI
|
Symmetric transformed-log CIs. |
Boot.hrandindex.param,
Boot.hqm
marker_name1 <- 'albumin' marker_name2 <- 'serBilir' event_time_name <- 'years' time_name <- 'year' event_name <- 'status2' id<-'id' xin <- pbc2[,c(id, marker_name1, marker_name2, event_time_name, time_name, event_name)] n <- length(xin$id) nn<-max( as.double(xin[,'id']) ) xin.id <- to_id(xin) par.x1 <- 0.0702 #0.149 par.x2 <- 0.0856 #0.10 t.x1 = 0 # refers to zero mean variables - slightly high t.x2 = 1.9 # refers to zero mean variable - high b = 0.42#par.alb * b.alb + par.bil *b.bil # 7 t = par.x1 * t.x1 + par.x2 *t.x2 ls<-50 X1t=xin[,marker_name1] -mean(xin[, marker_name1]) XX1t=xin.id[,marker_name1] -mean(xin.id[, marker_name1]) X2t=xin[,marker_name2] -mean(xin[, marker_name2]) XX2t=xin.id[,marker_name2] -mean(xin.id[, marker_name2]) X1=list(X1t, X2t) XX1=list(XX1t, XX2t) # Calculate the indexed HQM estimator on the original sample: arg2<- SingleIndCondFutHaz(pbc2, id, ls, X1, XX1, event_time_name = 'years', time_name = 'year', event_name = 'status2', in.par= c(par.x1, par.x2), b, t) hqm.est<-arg2[,2] # Indexed HQM estimator on original sample time.grid<-arg2[,1] # evaluation grid points n.est.points<- ls # length(hqm.est) # Create bootstrap samples by group set.seed(1) B<-10 # 20 # 5 for display purposes only; for sensible results use B=200 (slower) B1<- 10 # 20 # 5 for display purposes only; use B1=20 (slower) Boot.samples<-list() for(j in 1:B) { i.use<-c() id.use<-c() index.nn <- sample (nn, replace = TRUE) for(l in 1:nn) { i.use2<-which(xin[,id]==index.nn[l]) i.use<-c(i.use, i.use2) id.use2<-rep(index.nn[l], times=length(i.use2)) id.use<-c(id.use, id.use2) } xin.i<-xin[i.use,] xin.i<-xin[i.use,] Boot.samples[[j]]<- xin.i[order(xin.i$id),] } # Simulate true hazard rate function: true.hazard<- Sim.True.Hazard(Boot.samples, id='id', n.est.points, marker_name1=marker_name1, marker_name2= marker_name2, event_time_name = event_time_name, time_name = time_name, event_name = event_name, in.par = c(par.x1, par.x2), b) # Bootstrap the original indexed HQM estimator: all.mat.use<-BwB.HRandIndex.param(B, B1, Boot.samples, marker_name1, marker_name2, event_time_name, time_name, event_name, b, t, true.haz=true.hazard, v.param=c(par.x1, par.x2), hqm.est=hqm.est, id= 'id', xin=xin) # Construct Ci's: a.sig<-0.05 st.ci.data<-StudentizedBwB.Index.CIs(n.est.points, all.mat.use, time.grid, hqm.est, a.sig) # extract Plain + symmetric CIs UpCI<-st.ci.data[,"UpCI"] DoCI<-st.ci.data[,"DoCI"] SymUpCI<-st.ci.data[,"SymUpCI"] SymDoCI<-st.ci.data[,"SymDoCI"] #Plot the selected CIs J<-80 #select the first 80 grid points (for display purposes only) plot(time.grid[1:J], hqm.est[1:J], type="l", ylim=c(0,2), ylab="Hazard rate", xlab="time", lwd=2) polygon(x = c(time.grid[1:J], rev(time.grid[1:J])), y = c(UpCI[1:J], rev(DoCI[1:J])), col = adjustcolor("red", alpha.f = 0.50), border = NA ) lines(time.grid[1:J], SymUpCI[1:J], lty=2, lwd=2 ) lines(time.grid[1:J], SymDoCI[1:J], lty=2, lwd=2) # extract transformed from Log HR + symmetric CIs LogUpCI<-st.ci.data[,"LogUpCI"] LogDoCI<-st.ci.data[,"LogDoCI"] SymLogUpCI<-st.ci.data[,"LogSymUpCI"] SymLogDoCI<-st.ci.data[,"LogSymDoCI"] #Plot the selected CI's plot(time.grid[1:J], log(hqm.est[1:J]), type="l", ylim=c(-5,4), ylab="Log Hazard rate", xlab="time", lwd=2) polygon(x = c(time.grid[1:J], rev(time.grid[1:J])), y = c(LogUpCI[1:J], rev(LogDoCI[1:J])), col = adjustcolor("red", alpha.f = 0.50), border = NA ) lines(time.grid[1:J], SymLogUpCI[1:J], lty=2, lwd=2 ) lines(time.grid[1:J], SymLogDoCI[1:J], lty=2, lwd=2) # extract Log HR + symmetric CIs tLogUpCI<-st.ci.data[,"LogtUpCI"] tLogDoCI<-st.ci.data[,"LogTDoCI"] tSymLogUpCI<-st.ci.data[,"SymLogtUpCI"] tSymLogDoCI<-st.ci.data[,"SymLogTDoCI"] #Plot the selected CIs plot(time.grid[1:J], hqm.est[1:J] , type="l", ylim=c(0,2), ylab="Hazard rate", xlab="time", lwd=2) polygon(x = c(time.grid[1:J], rev(time.grid[1:J])), y = c(tLogUpCI[1:J], rev(tLogDoCI[1:J])), col = adjustcolor("red", alpha.f = 0.50), border = NA ) lines(time.grid[1:J], tSymLogUpCI[1:J], lty=2, lwd=2 ) lines(time.grid[1:J], tSymLogDoCI[1:J], lty=2, lwd=2)marker_name1 <- 'albumin' marker_name2 <- 'serBilir' event_time_name <- 'years' time_name <- 'year' event_name <- 'status2' id<-'id' xin <- pbc2[,c(id, marker_name1, marker_name2, event_time_name, time_name, event_name)] n <- length(xin$id) nn<-max( as.double(xin[,'id']) ) xin.id <- to_id(xin) par.x1 <- 0.0702 #0.149 par.x2 <- 0.0856 #0.10 t.x1 = 0 # refers to zero mean variables - slightly high t.x2 = 1.9 # refers to zero mean variable - high b = 0.42#par.alb * b.alb + par.bil *b.bil # 7 t = par.x1 * t.x1 + par.x2 *t.x2 ls<-50 X1t=xin[,marker_name1] -mean(xin[, marker_name1]) XX1t=xin.id[,marker_name1] -mean(xin.id[, marker_name1]) X2t=xin[,marker_name2] -mean(xin[, marker_name2]) XX2t=xin.id[,marker_name2] -mean(xin.id[, marker_name2]) X1=list(X1t, X2t) XX1=list(XX1t, XX2t) # Calculate the indexed HQM estimator on the original sample: arg2<- SingleIndCondFutHaz(pbc2, id, ls, X1, XX1, event_time_name = 'years', time_name = 'year', event_name = 'status2', in.par= c(par.x1, par.x2), b, t) hqm.est<-arg2[,2] # Indexed HQM estimator on original sample time.grid<-arg2[,1] # evaluation grid points n.est.points<- ls # length(hqm.est) # Create bootstrap samples by group set.seed(1) B<-10 # 20 # 5 for display purposes only; for sensible results use B=200 (slower) B1<- 10 # 20 # 5 for display purposes only; use B1=20 (slower) Boot.samples<-list() for(j in 1:B) { i.use<-c() id.use<-c() index.nn <- sample (nn, replace = TRUE) for(l in 1:nn) { i.use2<-which(xin[,id]==index.nn[l]) i.use<-c(i.use, i.use2) id.use2<-rep(index.nn[l], times=length(i.use2)) id.use<-c(id.use, id.use2) } xin.i<-xin[i.use,] xin.i<-xin[i.use,] Boot.samples[[j]]<- xin.i[order(xin.i$id),] } # Simulate true hazard rate function: true.hazard<- Sim.True.Hazard(Boot.samples, id='id', n.est.points, marker_name1=marker_name1, marker_name2= marker_name2, event_time_name = event_time_name, time_name = time_name, event_name = event_name, in.par = c(par.x1, par.x2), b) # Bootstrap the original indexed HQM estimator: all.mat.use<-BwB.HRandIndex.param(B, B1, Boot.samples, marker_name1, marker_name2, event_time_name, time_name, event_name, b, t, true.haz=true.hazard, v.param=c(par.x1, par.x2), hqm.est=hqm.est, id= 'id', xin=xin) # Construct Ci's: a.sig<-0.05 st.ci.data<-StudentizedBwB.Index.CIs(n.est.points, all.mat.use, time.grid, hqm.est, a.sig) # extract Plain + symmetric CIs UpCI<-st.ci.data[,"UpCI"] DoCI<-st.ci.data[,"DoCI"] SymUpCI<-st.ci.data[,"SymUpCI"] SymDoCI<-st.ci.data[,"SymDoCI"] #Plot the selected CIs J<-80 #select the first 80 grid points (for display purposes only) plot(time.grid[1:J], hqm.est[1:J], type="l", ylim=c(0,2), ylab="Hazard rate", xlab="time", lwd=2) polygon(x = c(time.grid[1:J], rev(time.grid[1:J])), y = c(UpCI[1:J], rev(DoCI[1:J])), col = adjustcolor("red", alpha.f = 0.50), border = NA ) lines(time.grid[1:J], SymUpCI[1:J], lty=2, lwd=2 ) lines(time.grid[1:J], SymDoCI[1:J], lty=2, lwd=2) # extract transformed from Log HR + symmetric CIs LogUpCI<-st.ci.data[,"LogUpCI"] LogDoCI<-st.ci.data[,"LogDoCI"] SymLogUpCI<-st.ci.data[,"LogSymUpCI"] SymLogDoCI<-st.ci.data[,"LogSymDoCI"] #Plot the selected CI's plot(time.grid[1:J], log(hqm.est[1:J]), type="l", ylim=c(-5,4), ylab="Log Hazard rate", xlab="time", lwd=2) polygon(x = c(time.grid[1:J], rev(time.grid[1:J])), y = c(LogUpCI[1:J], rev(LogDoCI[1:J])), col = adjustcolor("red", alpha.f = 0.50), border = NA ) lines(time.grid[1:J], SymLogUpCI[1:J], lty=2, lwd=2 ) lines(time.grid[1:J], SymLogDoCI[1:J], lty=2, lwd=2) # extract Log HR + symmetric CIs tLogUpCI<-st.ci.data[,"LogtUpCI"] tLogDoCI<-st.ci.data[,"LogTDoCI"] tSymLogUpCI<-st.ci.data[,"SymLogtUpCI"] tSymLogDoCI<-st.ci.data[,"SymLogTDoCI"] #Plot the selected CIs plot(time.grid[1:J], hqm.est[1:J] , type="l", ylim=c(0,2), ylab="Hazard rate", xlab="time", lwd=2) polygon(x = c(time.grid[1:J], rev(time.grid[1:J])), y = c(tLogUpCI[1:J], rev(tLogDoCI[1:J])), col = adjustcolor("red", alpha.f = 0.50), border = NA ) lines(time.grid[1:J], tSymLogUpCI[1:J], lty=2, lwd=2 ) lines(time.grid[1:J], tSymLogDoCI[1:J], lty=2, lwd=2)
Inverse Probability of Censoring Weighting (IPCW) estimation of Cumulative/Dynamic time-dependent ROC curve. The function works in the usual survival setting as well as in the competing risks setting. Computation of the iid-representation of areas under time-dependent ROC curves is implemented. This enables computation of inference procedures: Confidence intervals and tests for comparing two AUCs of two different markers measured on the same subjects.
timeROC(T, delta, marker, other_markers = NULL, cause, weighting = "marginal", times, ROC = TRUE, iid = FALSE)timeROC(T, delta, marker, other_markers = NULL, cause, weighting = "marginal", times, ROC = TRUE, iid = FALSE)
T |
The vector of (censored) event-times. |
delta |
The vector of event indicators at the corresponding value of the vector |
marker |
The vector of the marker values for which we want to compute the time-dependent ROC curves. Without loss of generality, the function assumes that larger values of the marker are associated with higher risks of events. If lower values of the marker are associated with higher risks of events, then reverse the association adding a minus to the marker values. |
other_markers |
A matrix that contains values of other markers that we want to take into account for computing the inverse probability of censoring weights. The different columns represent the different markers. This argument is optional, and ignored if |
cause |
The value of the event indicator that represents the event of interest for which we aim to compute the time-dependent ROC curve. Without competing risks, it must be the value that indicates a non-censored obsevation (usually |
weighting |
The method used to compute the weights. |
times |
The vector of times points "t" at which we want to compute the time-dependent ROC curve. If vector |
ROC |
A logical value that indicates if we want to save the estimates of
sensitivities and specificties. Default value is |
iid |
A logical value that indicates if we want to compute the iid-representation of the area under time-dependent ROC curve estimator. |
This function computes Inverse Probability of Censoring Weighting (IPCW) estimates of Cumulative/Dynamic time-dependent ROC curve.
By definition, time-dependent ROC curve intrinsically depends on the definitions of time-dependent cases and controls.
Let denote the event time of the subject .
Without competing risks : A case is defined as a subject with . A control is defined as a subject with .
With competing risks : In this setting, subjects may undergo different type of events, denoted by in the following. Let suppose that we are interested in the event . Then, a case is defined as a subject with and .
With competing risks, two definitions of controls were suggested: (i) a control is defined as a subject that is free of any event, i.e with , and (ii) a control is defined as a subject that is not a case, i.e with or with and .
For all outputs of this package, objects named with _1 refer to definition (i). For instance AUC_1 or se_1 refer to time-dependent area under the ROC curve and its estimated standard error according to the definition (i). Objects named with _2 refer to definition (ii) .
Object of class "ipcwsurvivalROC" or "ipcwcompetingrisksROC", depending on if there is competing risk or not, that is a list. For these classes, there are print, plot and confint methods. Most objects that they contain are similar, but some are specific to each class.
Specific objects of class "ipcwsurvivalROC" :
AUC : vector of time-dependent AUC estimates at each time points.
TP : matrix of time-dependent True Positive fraction (sensitivity) estimates.
FP : matrix of time-dependent False Positive fraction (1-specificity) estimates.
Specific objects of class "ipcwcompetingrisksROC" :
AUC_1 : vector of time-dependent AUC estimates at each time points with definition (i) of controls (see Details).
AUC_2 : vector of time-dependent AUC estimates at each time points with definition (ii) of controls (see Details).
TP : matrix of time-dependent True Positive fraction (sensitivity) estimates.
FP_1 : matrix of time-dependent False Positive fraction (1-specificity) estimates with definition (i) of controls (see Details).
FP_2 : matrix of time-dependent False Positive fraction (1-specificity) estimates with definition (ii) of controls (see Details).
Objects common to both classes :
times : the time points for which the time-dependent ROC curves were computed.
weights : a object of class "IPCW", containing all informations about the weights. See ipcw function of pec package.
computation_time : the total computation time.
CumulativeIncidence : the vector of estimated probabilities of being a case at each time points.
survProb : the vector of estimated probabilities of being event-free at each time points.
Stats : a matrix containing descriptive statistics at each time points (like numbers of observed cases or censored observations before each time points).
iid : the logical value of parameter iid used in argument.
n : the sample size, after having omitted missing vaues.
inference : a list that contains, among other things, iid-representations and estimated standard errors of the estimators, and that is used for computation of comparison tests and confidence intervals.
computation_time : the computation time, in seconds.
Paul Blanche [email protected]
Hung, H. and Chiang, C. (2010). Estimation methods for time-dependent AUC with survival data. Canadian Journal of Statistics, 38(1):8-26
Uno, H., Cai, T., Tian, L. and Wei, L. (2007). Evaluating prediction rules for t-years survivors with censored regression models. Journal of the American Statistical Association, 102(478):527-537.
Blanche, P., Dartigues, J. F., & Jacqmin-Gadda, H. (2013). Estimating and comparing time-dependent areas under receiver operating characteristic curves for censored event times with competing risks. Statistics in medicine, 32(30), 5381-5397.
P. Blanche, A. Latouche, V. Viallon (2013). Time-dependent AUC with right-censored data: A Survey. Risk Assessment and Evaluation of Predictions, 239-251, Springer, https://arxiv.org/abs/1210.6805.
library(survival) Landmark <- 2 pbcT1 <- pbc2[which(pbc2$year< Landmark & pbc2$years> Landmark),] ls<-50 # 50 grid points to evaluate the estimates s.out<- pbcT1[, 'year'] s.out.use <- seq(0, max(s.out), max(s.out)/( ls-1)) timesS2 <- seq(Landmark,14,by=0.5) b=0.9 arg1<- get_h_x(pbcT1, 'albumin', br_s = s.out.use, event_time_name = 'years', time_name = 'year', event_name = 'status2', 2, 0.9) br_s2 = seq(Landmark, 14, length=ls-1) sfalb2<- make_sf( (br_s2[2]-br_s2[1])/4 , arg1) tHor <- 1.5 auc.hq.use<-auc.hqm(pbcT1, sfalb2, Landmark,tHor, event_time_name = 'years', status_name = 'status2') auc.hq.uselibrary(survival) Landmark <- 2 pbcT1 <- pbc2[which(pbc2$year< Landmark & pbc2$years> Landmark),] ls<-50 # 50 grid points to evaluate the estimates s.out<- pbcT1[, 'year'] s.out.use <- seq(0, max(s.out), max(s.out)/( ls-1)) timesS2 <- seq(Landmark,14,by=0.5) b=0.9 arg1<- get_h_x(pbcT1, 'albumin', br_s = s.out.use, event_time_name = 'years', time_name = 'year', event_name = 'status2', 2, 0.9) br_s2 = seq(Landmark, 14, length=ls-1) sfalb2<- make_sf( (br_s2[2]-br_s2[1])/4 , arg1) tHor <- 1.5 auc.hq.use<-auc.hqm(pbcT1, sfalb2, Landmark,tHor, event_time_name = 'years', status_name = 'status2') auc.hq.use
Creates a data frame with only one entry per individual from a data frame with time dependent data. The resulting data frame focusses on the event time and the last observed marker value.
to_id(data_set)to_id(data_set)
data_set |
A data frame of time dependent data points. Missing values are allowed. |
The function to_id uses a data frame of time dependent marker data to create a smaller data frame with only one entry per individual, the last observed marker value and the event time. Note that the column indicating the individuals must have the name id. Note also that this data frame is similar to pbc2.id from the JM package with the difference that the last observed marker value instead of the first one is captured.
A data frame with only one entry per individual.
data_set.id = to_id(pbc2)data_set.id = to_id(pbc2)