Title: | Superefficient Estimation of Future Conditional Hazards Based on Marker Information |
---|---|
Description: | Provides a nonparametric smoothed kernel density estimator for the future conditional hazard when time-dependent covariates are present. It also provides pointwise and uniform confidence bands and a bandwidth selection procedure. |
Authors: | Dimitrios Bagkavos [aut, cre], Alex Isakson [ctb], Enno Mammen [ctb], Jens Nielsen [ctb], Cecile Proust-Lima [ctb] |
Maintainer: | Dimitrios Bagkavos <[email protected]> |
License: | GPL (>= 2) |
Version: | 0.1.1 |
Built: | 2024-11-24 06:52:18 UTC |
Source: | CRAN |
Implements the bandwidth selection for the future conditional hazard rate based on K-fold cross validation.
b_selection(data, marker_name, event_time_name = 'years', time_name = 'year', event_name = 'status2', I, b_list)
data |
A data frame of time dependent data points. Missing values are allowed. |
marker_name |
The column name of the marker values in the data frame |
event_time_name |
The column name of the event times in the data frame |
time_name |
The column name of the times the marker values were observed in the data frame |
event_name |
The column name of the events in the data frame |
I |
Number of observations to leave out in each fold of the K-fold cross validation. |
b_list |
Vector of bandwidths that need to be tested. |
The function b_selection implements cross-validation bandwidth selection for the estimator of the future conditional hazard rate. The estimator is a smoothed kernel density estimator and involves the exposure process of each individual i. Note that the estimator depends on the bandwidth b.
A list with the tested bandwidths and their cross-validation scores.
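As a minimal sketch of how the returned list might be used, the snippet below picks the bandwidth with the smallest score. It assumes that the first list element holds the scores and the second the bandwidths, as the indexing in the package example suggests; the scores are made-up numbers, not output of b_selection.

```r
# Toy stand-in for the list returned by b_selection (hypothetical values).
b_list <- seq(0.9, 1.3, 0.1)
cv_scores <- c(0.52, 0.47, 0.45, 0.49, 0.55)
b_scores <- list(cv_scores, b_list)

# The selected bandwidth is the one with the smallest cross-validation score.
b_opt <- b_scores[[2]][which.min(b_scores[[1]])]
print(b_opt)
```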
b_selection_prep_g, Q1, R_K, prep_cv, dataset_split
I = 26
b_list = seq(0.9, 1.3, 0.1)
b_scores_alb = b_selection(pbc2, 'albumin', 'years', 'year', 'status2', I, b_list)
b_scores_alb[[2]][which.min(b_scores_alb[[1]])]
Calculates an intermediate part for the K-fold cross validation.
b_selection_prep_g(h_mat, int_X, size_X_grid, n, Yi)
h_mat |
A matrix of the estimator for the future conditional hazard rate for all values on the marker and time grids. |
int_X |
Vector of the position of the observed marker values in the grid for marker values. |
size_X_grid |
Numeric value indicating the number of grid points for marker values. |
n |
Number of individuals. |
Yi |
A matrix made by make_Yi. |
The function b_selection_prep_g calculates a key component of the bandwidth selection. The hazard estimator in this component is estimated without information from the counting processes of the left-out individuals, and the component also involves the exposure.
A matrix with the values of this component for all individuals i and time grid points t.
pbc2_id = to_id(pbc2)
size_s_grid <- size_X_grid <- 100
n = max(as.numeric(pbc2$id))
s = pbc2$year
X = pbc2$serBilir
XX = pbc2_id$serBilir
ss <- pbc2_id$years
delta <- pbc2_id$status2
br_s = seq(0, max(s), max(s)/(size_s_grid-1))
br_X = seq(min(X), max(X), (max(X)-min(X))/(size_X_grid-1))
X_lin = lin_interpolate(br_s, pbc2_id$id, pbc2$id, X, s)
int_X <- findInterval(X_lin, br_X)
int_s = rep(1:length(br_s), n)
N <- make_N(pbc2, pbc2_id, breaks_X=br_X, breaks_s=br_s, ss, XX, delta)
Y <- make_Y(pbc2, pbc2_id, X_lin, br_X, br_s, size_s_grid, size_X_grid,
            int_s, int_X, 'years', n)
b = 1.7
alpha <- get_alpha(N, Y, b, br_X, K=Epan)
Yi <- make_Yi(pbc2, pbc2_id, X_lin, br_X, br_s, size_s_grid, size_X_grid,
              int_s, int_X, 'years', n)
Ni <- make_Ni(breaks_s=br_s, size_s_grid, ss, delta, n)
t = 2
h_xt_mat = t(sapply(br_s[1:99], function(si){
  h_xt_vec(br_X, br_s, size_s_grid, alpha, t, b, Yi, int_X, n)}))
b_selection_prep_g(h_xt_mat, int_X, size_X_grid, n, Yi)
Implements the uniform and pointwise confidence bands for the future conditional hazard rate based on the last observed marker measure.
Conf_bands(data, marker_name, event_time_name = 'years', time_name = 'year', event_name = 'status2', x, b)
data |
A data frame of time dependent data points. Missing values are allowed. |
marker_name |
The column name of the marker values in the data frame |
event_time_name |
The column name of the event times in the data frame |
time_name |
The column name of the times the marker values were observed in the data frame |
event_name |
The column name of the events in the data frame |
x |
Numeric value of the last observed marker value. |
b |
Bandwidth. |
The function Conf_bands implements pointwise and uniform confidence bands for the estimator of the future conditional hazard rate. The confidence bands are based on a wild bootstrap approach.
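Pointwise:
For a given time point, generate the wild bootstrap statistics for every bootstrap replicate and order them. The ordered statistics, together with a bootstrap estimate of the variance, then yield a pointwise confidence band for the future conditional hazard rate. For more details on the wild bootstrap approach, please see prep_boot and g_xt.
Uniform:
Generate the wild bootstrap statistics for every bootstrap replicate and for all time points, and combine them into a single statistic per replicate. Order these statistics. They then yield a uniform confidence band for the future conditional hazard rate.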
A list with the pointwise and uniform confidence bands and the estimator for all possible time points.
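To illustrate the difference between pointwise and uniform bands, here is a generic base R sketch. It is not the wild bootstrap construction used by Conf_bands: the estimate, the standard errors and the bootstrap deviations are all made-up stand-ins, and the symmetric band shape is an assumption chosen for simplicity.

```r
set.seed(1)
n_grid <- 20                                  # number of time grid points
B <- 500                                      # number of bootstrap replicates
h_hat <- seq(0.1, 0.3, length.out = n_grid)   # made-up estimate on the grid
sd_hat <- rep(0.02, n_grid)                   # made-up standard errors

# Made-up standardized bootstrap deviations: one row per replicate.
dev <- matrix(rnorm(B * n_grid), nrow = B, ncol = n_grid)

# Pointwise: a separate 95% quantile of |deviation| at every grid point.
q_point <- apply(abs(dev), 2, quantile, probs = 0.95)
# Uniform: one 95% quantile of the maximum |deviation| over the whole grid.
q_unif <- quantile(apply(abs(dev), 1, max), probs = 0.95)

band_point <- cbind(lower = h_hat - q_point * sd_hat,
                    upper = h_hat + q_point * sd_hat)
band_unif <- cbind(lower = h_hat - q_unif * sd_hat,
                   upper = h_hat + q_unif * sd_hat)
```

Because the uniform quantile is taken over the maximum across the grid, the uniform band is never narrower than the pointwise band in this sketch.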
b = 10
x = 3
size_s_grid <- 100
s = pbc2$year
br_s = seq(0, max(s), max(s)/(size_s_grid-1))
c_bands = Conf_bands(pbc2, 'serBilir', event_time_name = 'years',
                     time_name = 'year', event_name = 'status2', x, b)
J = 60
plot(br_s[1:J], c_bands$h_hat[1:J], type = "l", ylim = c(0,1),
     ylab = 'Hazard', xlab = 'Years')
lines(br_s[1:J], c_bands$I_p_up[1:J], col = "red")
lines(br_s[1:J], c_bands$I_p_do[1:J], col = "red")
lines(br_s[1:J], c_bands$I_nu[1:J], col = "blue")
lines(br_s[1:J], c_bands$I_nd[1:J], col = "blue")
Creates multiple splits of a dataset which is then used in the bandwidth selection with K-fold cross validation.
dataset_split(I, data)
data |
A data frame of time dependent data points. Missing values are allowed. |
I |
The number of individuals that should be left out. Optimally, |
The function dataset_split takes a data frame and transforms it into several data frames, each with I individuals missing. The individuals are divided into sets of indices, each containing I individuals. Then, for each set, a data frame without the individuals in that set is created.
A list of data frames, each with I individuals missing as described above.
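The splitting idea can be sketched in base R as follows. The function name split_by_id_sketch, the toy data and the use of consecutive blocks of ids are assumptions made for illustration; how dataset_split actually assigns individuals to the left-out sets is not stated here.

```r
# Toy long-format data: 6 individuals with 2 visits each (hypothetical).
toy <- data.frame(id = rep(1:6, each = 2), year = rep(c(0, 1), times = 6))

split_by_id_sketch <- function(I, data) {
  ids <- unique(data$id)
  # Consecutive blocks of I ids; the last block is smaller if I does not divide n.
  folds <- split(ids, ceiling(seq_along(ids) / I))
  # One data frame per block, with that block's individuals removed.
  lapply(folds, function(left_out) data[!(data$id %in% left_out), ])
}

splits <- split_by_id_sketch(2, toy)
length(splits)                                      # number of data frames
sapply(splits, function(d) length(unique(d$id)))    # individuals kept in each
```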
splitted_dataset = dataset_split(26, pbc2)
Calculates the entries of the matrix in the definition of the local linear kernel
dij(b,x,y, K)
x |
A vector of design points where the kernel will be evaluated. |
y |
A vector of sample data points. |
b |
The bandwidth to use (a scalar). |
K |
The kernel function to use. |
Implements the calculation of all entries of the matrix that is part of the definition of the local linear kernel.
A scalar value, the resulting matrix entry.
Implements the Epanechnikov kernel function
Epan(x)
x |
A vector of design points where the kernel will be evaluated. |
Implements the Epanechnikov kernel function $K(x) = \frac{3}{4}(1 - x^2)\,\mathbf{1}\{|x| \le 1\}$.
Scalar, the value of the Epanechnikov kernel at x.
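For reference, a standalone base R version of the Epanechnikov kernel might look as follows. The name epan_sketch is hypothetical and chosen so that it does not mask the package's Epan; the snippet only illustrates the standard kernel formula and checks that it integrates to one.

```r
# Epanechnikov kernel: 3/4 * (1 - x^2) on [-1, 1] and zero elsewhere.
epan_sketch <- function(x) {
  0.75 * (1 - x^2) * (abs(x) <= 1)
}

epan_sketch(c(-1.5, -1, 0, 0.5, 1))
integrate(epan_sketch, -1, 1)$value   # a kernel should integrate to one
```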
Implements a key part for the wild bootstrap of the HQM estimator.
g_xt(br_X, br_s, size_s_grid, int_X, x, t, b, Yi, Y, n)
br_X |
Marker value grid points that will be used in the evaluation. |
br_s |
Time value grid points that will be used in the evaluation. |
size_s_grid |
Size of the time grid. |
int_X |
Position of the linear interpolated marker values on the marker grid. |
x |
Numeric value of the last observed marker value. |
t |
Numeric value of the time the function should be evaluated. |
b |
Bandwidth. |
Yi |
A matrix made by make_Yi. |
Y |
A matrix made by make_Y. |
n |
Number of individuals. |
The function g_xt evaluates this part of the wild bootstrap for every value x on the marker grid, based on the exposure and the marker.
A vector with one value for every x on the marker grid.
pbc2_id = to_id(pbc2)
size_s_grid <- size_X_grid <- 100
n = max(as.numeric(pbc2$id))
X = pbc2$serBilir
s = pbc2$year
br_s = seq(0, max(s), max(s)/(size_s_grid-1))
br_X = seq(min(X), max(X), (max(X)-min(X))/(size_X_grid-1))
X_lin = lin_interpolate(br_s, pbc2_id$id, pbc2$id, X, s)
int_X <- findInterval(X_lin, br_X)
int_s = rep(1:length(br_s), n)
Yi <- make_Yi(pbc2, pbc2_id, X_lin, br_X, br_s, size_s_grid, size_X_grid,
              int_s, int_X, 'years', n)
Y <- make_Y(pbc2, pbc2_id, X_lin, br_X, br_s, size_s_grid, size_X_grid,
            int_s, int_X, 'years', n)
t = 2
x = 2
b = 10
g_xt(br_X, br_s, size_s_grid, int_X, x, t, b, Yi, Y, n)
Calculates the marker-only hazard rate for time dependent data.
get_alpha(N, Y, b, br_X, K=Epan )
N |
A matrix made by make_N. |
Y |
A matrix made by make_Y. |
b |
Bandwidth. |
br_X |
Vector of grid points for the marker values |
K |
Used kernel function. |
The function get_alpha implements the marker-only hazard estimator, which is built from the marker and the exposure. The marker-only hazard is defined as the underlying hazard that does not depend on time.
A vector of marker-only hazard values at the grid points br_X.
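The general idea of an occurrence-over-exposure hazard estimate, smoothed in the marker direction, can be sketched as below. This is a simplified illustration, not the implementation of get_alpha: occurrences and exposures are made-up vectors already aggregated on a marker grid, the time dimension is ignored, and alpha_sketch and epan_sketch are hypothetical names.

```r
epan_sketch <- function(u) 0.75 * (1 - u^2) * (abs(u) <= 1)

x_grid <- seq(0, 4, by = 0.5)                  # marker grid (made up)
occ <- c(1, 0, 2, 1, 3, 2, 4, 3, 5)            # events observed near each grid point
expo <- c(20, 18, 22, 15, 17, 12, 14, 9, 10)   # time at risk near each grid point

# Kernel-smoothed occurrences divided by kernel-smoothed exposure.
alpha_sketch <- function(x, b) {
  w <- epan_sketch((x - x_grid) / b) / b
  sum(w * occ) / sum(w * expo)
}

alpha_hat <- sapply(x_grid, alpha_sketch, b = 1.2)
print(round(alpha_hat, 3))
```

With a bandwidth smaller than the grid spacing the sketch reduces to the raw ratio of occurrences to exposure at each grid point, which makes the role of the bandwidth easy to see.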
pbc2_id = to_id(pbc2)
size_s_grid <- size_X_grid <- 100
n = max(as.numeric(pbc2$id))
s = pbc2$year
X = pbc2$serBilir
XX = pbc2_id$serBilir
ss <- pbc2_id$years
delta <- pbc2_id$status2
br_s = seq(0, max(s), max(s)/(size_s_grid-1))
br_X = seq(min(X), max(X), (max(X)-min(X))/(size_X_grid-1))
X_lin = lin_interpolate(br_s, pbc2_id$id, pbc2$id, X, s)
int_X <- findInterval(X_lin, br_X)
int_s = rep(1:length(br_s), n)
N <- make_N(pbc2, pbc2_id, breaks_X=br_X, breaks_s=br_s, ss, XX, delta)
Y <- make_Y(pbc2, pbc2_id, X_lin, br_X, br_s, size_s_grid, size_X_grid,
            int_s, int_X, event_time = 'years', n)
b = 1.7
alpha <- get_alpha(N, Y, b, br_X, K=Epan)
Calculates the local constant future hazard rate function, conditional on a marker value x, across a set of time values t.
get_h_x(data, marker_name, event_time_name, time_name, event_name, x, b)
data |
A data frame of time dependent data points. Missing values are allowed. |
marker_name |
The column name of the marker values in the data frame |
event_time_name |
The column name of the event times in the data frame |
time_name |
The column name of the times the marker values were observed in the data frame |
event_name |
The column name of the events in the data frame |
x |
Numeric value of the last observed marker value. |
b |
Bandwidth parameter. |
The function get_h_x implements the local constant future conditional hazard estimator across a grid of possible time values t. The estimator is built from the marker, the exposure and the marker-only hazard; see get_alpha for more details.
A vector of estimated hazard values for a grid of possible time values t.
library(survival)
b = 10
x = 3
Landmark <- 2
pbcT1 <- pbc2[which(pbc2$year < Landmark & pbc2$years > Landmark),]
b = 0.9
arg1ll <- get_h_xll(pbcT1, 'albumin', event_time_name = 'years',
                    time_name = 'year', event_name = 'status2', 2, 0.9)
arg1lc <- get_h_x(pbcT1, 'albumin', event_time_name = 'years',
                  time_name = 'year', event_name = 'status2', 2, 0.9)
# Calculate the local constant and local linear survival functions
br_s = seq(Landmark, 14, length = 99)
sfalb2ll <- make_sf((br_s[2]-br_s[1])/4, arg1ll)
sfalb2lc <- make_sf((br_s[2]-br_s[1])/4, arg1lc)
# For comparison, also calculate the Kaplan-Meier estimate
kma2 <- survfit(Surv(years, status2) ~ 1, data = pbcT1)
# Plot the survival functions
plot(br_s, sfalb2ll, type = "l", col = 1, lwd = 2,
     ylab = "Survival probability", xlab = "Years")
lines(br_s, sfalb2lc, lty = 2, lwd = 2, col = 2)
lines(kma2$time, kma2$surv, type = "s", lty = 2, lwd = 2, col = 3)
legend("topright", c("Local linear HQM", "Local constant HQM", "Kaplan-Meier"),
       lty = c(1, 2, 2), col = 1:3, lwd = 2, cex = 1.7)
Calculates the local linear future hazard rate function, conditional on a marker value x, across a set of time values t.
get_h_xll(data, marker_name, event_time_name, time_name, event_name, x, b)
data |
A data frame of time dependent data points. Missing values are allowed. |
marker_name |
The column name of the marker values in the data frame |
event_time_name |
The column name of the event times in the data frame |
time_name |
The column name of the times the marker values were observed in the data frame |
event_name |
The column name of the events in the data frame |
x |
Numeric value of the last observed marker value. |
b |
Bandwidth parameter. |
The function get_h_xll implements the local linear future conditional hazard estimator across a grid of possible time values t. The estimator is built from the marker, the exposure and the marker-only hazard; see get_alpha for more details.
A vector of estimated hazard values for a grid of possible time values t.
library(survival)
b = 10
x = 3
Landmark <- 2
pbcT1 <- pbc2[which(pbc2$year < Landmark & pbc2$years > Landmark),]
b = 0.9
arg1ll <- get_h_xll(pbcT1, 'albumin', event_time_name = 'years',
                    time_name = 'year', event_name = 'status2', 2, 0.9)
arg1lc <- get_h_x(pbcT1, 'albumin', event_time_name = 'years',
                  time_name = 'year', event_name = 'status2', 2, 0.9)
# Calculate the local constant and local linear survival functions
br_s = seq(Landmark, 14, length = 99)
sfalb2ll <- make_sf((br_s[2]-br_s[1])/4, arg1ll)
sfalb2lc <- make_sf((br_s[2]-br_s[1])/4, arg1lc)
# For comparison, also calculate the Kaplan-Meier estimate
kma2 <- survfit(Surv(years, status2) ~ 1, data = pbcT1)
# Plot the survival functions
plot(br_s, sfalb2ll, type = "l", col = 1, lwd = 2,
     ylab = "Survival probability", xlab = "Years")
lines(br_s, sfalb2lc, lty = 2, lwd = 2, col = 2)
lines(kma2$time, kma2$surv, type = "s", lty = 2, lwd = 2, col = 3)
legend("topright", c("Local linear HQM", "Local constant HQM", "Kaplan-Meier"),
       lty = c(1, 2, 2), col = 1:3, lwd = 2, cex = 1.7)
Calculates the future conditional hazard rate for a marker value x and a time value t.
h_xt(br_X, br_s, int_X, size_s_grid, alpha, x,t, b, Yi,n)
br_X |
Vector of grid points for the marker values |
br_s |
Vector of grid points for the time values |
int_X |
Position of the linear interpolated marker values on the marker grid. |
size_s_grid |
Size of the time grid. |
alpha |
Marker-only hazard obtained from get_alpha. |
x |
Numeric value of the last observed marker value. |
t |
Numeric time value. |
b |
Bandwidth. |
Yi |
A matrix made by make_Yi. |
n |
Number of individuals. |
The function h_xt implements the future conditional hazard estimator, which is built from the marker, the exposure and the marker-only hazard; see get_alpha for more details. The future conditional hazard is defined in terms of the survival time and the marker of each individual i observed in a given time frame. The function h_xt uses a classic (unmodified) kernel function, e.g. the Epanechnikov kernel.
A single numeric value, the estimate at the marker value x and the time value t.
doi:10.1080/03461238.1998.10413997
pbc2_id = to_id(pbc2)
size_s_grid <- size_X_grid <- 100
n = max(as.numeric(pbc2$id))
s = pbc2$year
X = pbc2$serBilir
XX = pbc2_id$serBilir
ss <- pbc2_id$years
delta <- pbc2_id$status2
br_s = seq(0, max(s), max(s)/(size_s_grid-1))
br_X = seq(min(X), max(X), (max(X)-min(X))/(size_X_grid-1))
X_lin = lin_interpolate(br_s, pbc2_id$id, pbc2$id, X, s)
int_X <- findInterval(X_lin, br_X)
int_s = rep(1:length(br_s), n)
N <- make_N(pbc2, pbc2_id, breaks_X=br_X, breaks_s=br_s, ss, XX, delta)
Y <- make_Y(pbc2, pbc2_id, X_lin, br_X, br_s, size_s_grid, size_X_grid,
            int_s, int_X, 'years', n)
b = 1.7
alpha <- get_alpha(N, Y, b, br_X, K=Epan)
Yi <- make_Yi(pbc2, pbc2_id, X_lin, br_X, br_s, size_s_grid, size_X_grid,
              int_s, int_X, 'years', n)
x = 2
t = 2
h_hat = h_xt(br_X, br_s, int_X, size_s_grid, alpha, x, t, b, Yi, n)
Computes the HQM estimator on the marker grid.
h_xt_vec(br_X, br_s, size_s_grid, alpha, t, b, Yi, int_X, n)
br_X |
Marker value grid points that will be used in the evaluation. |
br_s |
Time value grid points that will be used in the evaluation. |
size_s_grid |
Size of the time grid. |
alpha |
Marker-only hazard obtained from get_alpha. |
t |
Numeric value of the time the function should be evaluated. |
b |
Bandwidth. |
Yi |
A matrix made by make_Yi. |
int_X |
Position of the linear interpolated marker values on the marker grid. |
n |
Number of individuals. |
The function h_xt_vec implements the future conditional hazard estimator for every value x on the marker grid. The estimator is built from the marker, the exposure and the marker-only hazard; see get_alpha for more details.
A vector of estimated hazard values for all values x on the marker grid.
pbc2_id = to_id(pbc2)
size_s_grid <- size_X_grid <- 100
n = max(as.numeric(pbc2$id))
s = pbc2$year
X = pbc2$serBilir
XX = pbc2_id$serBilir
ss <- pbc2_id$years
delta <- pbc2_id$status2
br_s = seq(0, max(s), max(s)/(size_s_grid-1))
br_X = seq(min(X), max(X), (max(X)-min(X))/(size_X_grid-1))
X_lin = lin_interpolate(br_s, pbc2_id$id, pbc2$id, X, s)
int_X <- findInterval(X_lin, br_X)
int_s = rep(1:length(br_s), n)
N <- make_N(pbc2, pbc2_id, br_X, br_s, ss, XX, delta)
Y <- make_Y(pbc2, pbc2_id, X_lin, br_X, br_s, size_s_grid, size_X_grid,
            int_s, int_X, event_time = 'years', n)
b = 1.7
alpha <- get_alpha(N, Y, b, br_X, K=Epan)
Yi <- make_Yi(pbc2, pbc2_id, X_lin, br_X, br_s, size_s_grid, size_X_grid,
              int_s, int_X, event_time = 'years', n)
t = 2
h_xt_vec(br_X, br_s, size_s_grid, alpha, t, b, Yi, int_X, n)
Calculates the local linear future conditional hazard rate for a marker value x and a time value t.
h_xtll(br_X, br_s, int_X, size_s_grid, alpha, x,t, b, Yi,n, Y)
br_X |
Vector of grid points for the marker values |
br_s |
Vector of grid points for the time values |
int_X |
Position of the linear interpolated marker values on the marker grid. |
size_s_grid |
Size of the time grid. |
alpha |
Marker-only hazard obtained from get_alpha. |
x |
Numeric value of the last observed marker value. |
t |
Numeric time value. |
b |
Bandwidth. |
Yi |
A matrix made by make_Yi. |
n |
Number of individuals. |
Y |
A matrix made by make_Y. |
The function h_xtll implements the local linear future conditional hazard estimator, which is built from the marker, the exposure and the marker-only hazard; see get_alpha for more details. The future conditional hazard is defined in terms of the survival time and the marker of each individual i observed in a given time frame.
In place of the classic kernel, the function h_xtll uses the local linear kernel; see also Nielsen (1998).
A single numeric value, the estimate at the marker value x and the time value t.
doi:10.1080/03461238.1998.10413997
pbc2_id = to_id(pbc2)
size_s_grid <- size_X_grid <- 100
n = max(as.numeric(pbc2$id))
s = pbc2$year
X = pbc2$serBilir
XX = pbc2_id$serBilir
ss <- pbc2_id$years
delta <- pbc2_id$status2
br_s = seq(0, max(s), max(s)/(size_s_grid-1))
br_X = seq(min(X), max(X), (max(X)-min(X))/(size_X_grid-1))
X_lin = lin_interpolate(br_s, pbc2_id$id, pbc2$id, X, s)
int_X <- findInterval(X_lin, br_X)
int_s = rep(1:length(br_s), n)
N <- make_N(pbc2, pbc2_id, breaks_X=br_X, breaks_s=br_s, ss, XX, delta)
Y <- make_Y(pbc2, pbc2_id, X_lin, br_X, br_s, size_s_grid, size_X_grid,
            int_s, int_X, 'years', n)
b = 1.7
alpha <- get_alpha(N, Y, b, br_X, K=Epan)
Yi <- make_Yi(pbc2, pbc2_id, X_lin, br_X, br_s, size_s_grid, size_X_grid,
              int_s, int_X, 'years', n)
x = 2
t = 2
h_hat = h_xtll(br_X, br_s, int_X, size_s_grid, alpha, x, t, b, Yi, n, Y)
Implements the classical kernel function and related functionals
K_b(b,x,y, K)
xK_b(b,x,y, K)
K_b_mat(b,x,y, K)
x |
A vector of design points where the kernel will be evaluated. |
y |
A vector of sample data points. |
b |
The bandwidth to use (a scalar). |
K |
The kernel function to use. |
The function K_b implements the classical kernel function calculation for scalars x and y, while xK_b implements the related functional, again for scalars x and y. The function K_b_mat is the vectorized version of K_b. It takes the vectors x and y as inputs and returns a matrix with one entry for each pair of elements of x and y.
Scalar values for K_b and xK_b and matrix outputs for K_b_mat.
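A scaled kernel and its matrix version can be sketched in base R as follows. The scaling K((x - y) / b) / b is the usual textbook convention and is assumed here rather than taken from the package source; likewise the row and column layout of the matrix and the names k_b_sketch, k_b_mat_sketch and epan_sketch are hypothetical.

```r
epan_sketch <- function(u) 0.75 * (1 - u^2) * (abs(u) <= 1)

# Scaled kernel: assumed form K((x - y) / b) / b.
k_b_sketch <- function(b, x, y, K) K((x - y) / b) / b

# Matrix version: rows index x, columns index y (an assumed layout).
k_b_mat_sketch <- function(b, x, y, K) {
  outer(x, y, function(u, v) k_b_sketch(b, u, v, K))
}

x <- c(0, 1, 2)
y <- c(0.5, 1.5)
k_b_sketch(2, 1, 0.5, epan_sketch)
m <- k_b_mat_sketch(2, x, y, epan_sketch)
dim(m)
```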
Implements a linear interpolation between observed marker values.
lin_interpolate(t, i, data_id, data_marker, data_time)
t |
A vector of time values where the function should be evaluated. |
i |
A vector of ids of individuals for whom the marker values should be interpolated. |
data_id |
The vector of ids from a data frame of time dependent variables. |
data_marker |
The vector of marker values from a data frame of time dependent variables. |
data_time |
The vector of time values from a data frame of time dependent variables. |
Given the time points t and the marker values observed at different time points, the function calculates, for all indicated individuals, a linear interpolation that passes through the observed marker values and evaluates it at the time points t. The interpolated values at these time points are then returned. Note that the first marker value is always observed at the initial time point and that the interpolation is extrapolated as a constant after the last observed marker value.
A matrix with one column of interpolated values, as described above, for every individual in the vector i.
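For a single individual, the interpolation and constant extrapolation described above can be mimicked with base R's approx. This is a sketch with made-up visit times and marker values, not the package's lin_interpolate; setting rule = 2 makes approx carry the last observed value forward.

```r
obs_time <- c(0, 1, 3)            # visit times of one individual (made up)
obs_marker <- c(2.0, 4.0, 3.0)    # marker values at those visits (made up)
t_grid <- c(0, 0.5, 1, 2, 3, 5)   # time points where values are needed

# Linear interpolation between visits; rule = 2 keeps the last observed
# value for time points beyond the last visit.
x_lin <- approx(obs_time, obs_marker, xout = t_grid, rule = 2)$y
print(x_lin)
```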
size_s_grid <- 100
X = pbc2$serBilir
s = pbc2$year
br_s = seq(0, max(s), max(s)/(size_s_grid-1))
pbc2_id = to_id(pbc2)
X_lin = lin_interpolate(br_s, pbc2_id$id, pbc2$id, X, s)
Implements the local linear kernel function.
llK_b(b,x,y, K)
x |
A vector of design points where the kernel will be evaluated. |
y |
A vector of sample data points. |
b |
The bandwidth to use (a scalar). |
K |
The kernel function to use. |
Implements the local linear kernel; see also Nielsen (1998).
A matrix whose entries are the values of the kernel function at each point.
doi:10.1080/03461238.1998.10413997
Implements the weights to be used in the local linear HQM estimator.
sn.0(xin, xout, h, kfun)
sn.1(xin, xout, h, kfun)
sn.2(xin, xout, h, kfun)
xin |
Sample values. |
xout |
Grid points where the estimator will be evaluated. |
h |
Bandwidth parameter. |
kfun |
Kernel function. |
The functions implement the local linear weights in the definition of the estimator; see also h_xt.
A vector of weights for all values on the marker grid.
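The textbook construction of local linear weights from three kernel moments can be sketched as follows. The moment definitions, sign conventions and names (sn_sketch, ll_weights_sketch, epan_sketch) are assumptions for illustration and may differ from what sn.0, sn.1 and sn.2 compute internally.

```r
epan_sketch <- function(u) 0.75 * (1 - u^2) * (abs(u) <= 1)

# Kernel moments s_j(x) = sum_i K_h(x - X_i) * (x - X_i)^j for j = 0, 1, 2.
sn_sketch <- function(xin, xout, h, kfun, j) {
  sapply(xout, function(x) sum(kfun((x - xin) / h) / h * (x - xin)^j))
}

# Local linear weights at a single point x.
ll_weights_sketch <- function(xin, x, h, kfun) {
  k <- kfun((x - xin) / h) / h
  s0 <- sn_sketch(xin, x, h, kfun, 0)
  s1 <- sn_sketch(xin, x, h, kfun, 1)
  s2 <- sn_sketch(xin, x, h, kfun, 2)
  k * (s2 - (x - xin) * s1) / (s0 * s2 - s1^2)
}

xin <- c(0.1, 0.4, 0.5, 0.9, 1.3)   # made-up sample values
w <- ll_weights_sketch(xin, x = 0.2, h = 1, kfun = epan_sketch)
sum(w)                   # the weights sum to one
sum(w * (0.2 - xin))     # and the first-order term cancels
```

These two properties are what distinguish local linear weights from plain kernel weights: a weighted average with them reproduces any linear function of the sample values exactly.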
Auxiliary functions that help automate the process of calculating integrals with occurrences or exposure processes.
make_N(data, data.id, breaks_X, breaks_s, ss, XX, delta)
make_Ni(breaks_s, size_s_grid, ss, delta, n)
make_Y(data, data.id, X_lin, breaks_X, breaks_s, size_s_grid, size_X_grid, int_s, int_X, event_time = 'years', n)
make_Yi(data, data.id, X_lin, breaks_X, breaks_s, size_s_grid, size_X_grid, int_s, int_X, event_time = 'years', n)
data |
A data frame of time dependent data points. Missing values are allowed. |
data.id |
An id data frame obtained from to_id. |
breaks_X |
Marker value grid points where the function will be evaluated. |
breaks_s |
Time value grid points where the function will be evaluated. |
ss |
Vector with event times. |
XX |
Vector of last observed marker values. |
delta |
0-1 vector of whether events happened. |
size_s_grid |
Size of the time grid. |
size_X_grid |
Size of the marker grid. |
n |
Number of individuals. |
X_lin |
Linear interpolation of observed marker values evaluated on the time grid. |
int_s |
Position of the observed time values on the time grid. |
int_X |
Position of the linear interpolated marker values on the marker grid. |
event_time |
String of the column name with the event times. |
Implements matrices for the computation of integrals that involve occurrences and exposures, where the occurrence is a 0-1 counting process and the integrand is an arbitrary function.
The functions make_N and make_Y return a matrix on the time grid and marker grid for occurrence and exposure, respectively, while make_Ni and make_Yi return a matrix on the time grid for every individual, again for occurrence and exposure, respectively.
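The package examples locate values on a grid with findInterval before building these matrices. The following base R sketch shows that step on made-up data and then counts events per marker grid cell; the counting step is only an illustration of the idea of an occurrence vector on a grid, not the logic of make_N.

```r
size_X_grid <- 5
X <- c(0.5, 1.2, 3.3, 2.8, 4.0)   # made-up last observed marker values
delta <- c(1, 0, 1, 1, 0)         # made-up event indicators

# Equidistant marker grid built the same way as in the package examples.
br_X <- seq(min(X), max(X), (max(X) - min(X)) / (size_X_grid - 1))

# Position of every marker value on the grid.
int_X <- findInterval(X, br_X)

# Number of events falling into each grid cell.
occ_per_cell <- tabulate(int_X[delta == 1], nbins = size_X_grid)
print(int_X)
print(occ_per_cell)
```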
pbc2_id = to_id(pbc2) size_s_grid <- size_X_grid <- 100 n = max(as.numeric(pbc2$id)) s = pbc2$year X = pbc2$serBilir XX = pbc2_id$serBilir ss <- pbc2_id$years delta <- pbc2_id$status2 br_s = seq(0, max(s), max(s)/( size_s_grid-1)) br_X = seq(min(X), max(X), (max(X)-min(X))/( size_X_grid-1)) X_lin = lin_interpolate(br_s, pbc2_id$id, pbc2$id, X, s) int_X <- findInterval(X_lin, br_X) int_s = rep(1:length(br_s), n) N <- make_N(pbc2, pbc2_id, br_X, br_s, ss, XX, delta) Y <- make_Y(pbc2, pbc2_id, X_lin, br_X, br_s, size_s_grid, size_X_grid, int_s, int_X, event_time = 'years', n) Yi <- make_Yi(pbc2, pbc2_id, X_lin, br_X, br_s, size_s_grid, size_X_grid, int_s, int_X, event_time = 'years', n) Ni <- make_Ni(br_s, size_s_grid, ss, delta, n)
Creates a survival function from a hazard rate which was calculated on a grid.
make_sf(step_size_s_grid, haz)
step_size_s_grid |
Numeric value indicating the distance between two consecutive grid points. |
haz |
Vector of hazard values. Hazard rate must have been calculated on a time grid. |
The function make_sf calculates the survival function $S(t) = \exp\left(-\int_0^t h(s)\,ds\right)$, where $h$ is the hazard rate. Here, a discretisation via an equidistant grid $\{t_i\}$ on $[0, t]$ is used to calculate the integral, and it is assumed that $h$ has been calculated for exactly these time points $t_i$.
A vector of values $S(t_i)$.
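The relation between a hazard on an equidistant grid and the survival function can be sketched in a few lines of base R. The helper `sf_sketch` below is hypothetical and is not the package's make_sf; it assumes a left Riemann sum for the cumulative hazard, which may differ from the quadrature rule the package uses.

```r
# Hypothetical helper (not the package's make_sf):
# survival function from a hazard evaluated on an equidistant grid.
sf_sketch <- function(step_size_s_grid, haz) {
  # Left Riemann sum: cumulative hazard accumulated before each grid point.
  cum_haz <- c(0, cumsum(haz)[-length(haz)]) * step_size_s_grid
  # Survival function S(t_i) = exp(-cumulative hazard at t_i).
  exp(-cum_haz)
}

# Constant hazard of 0.1 on ten grid points spaced 0.1 apart.
sf_toy <- sf_sketch(0.1, rep(0.1, 10))
```

The result starts at 1 and is non-increasing, as any survival function must be.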
make_sf(0.1, rep(0.1,10))
Follow-up of 312 randomised patients with primary biliary cirrhosis, a rare autoimmune liver disease, at the Mayo Clinic.
pbc2
A data frame with 1945 observations on the following 20 variables.
id
patient identifier; in total there are 312 patients.
years
number of years between registration and the earlier of death, transplantation, or study analysis time.
status
a factor with levels alive, transplanted and dead.
drug
a factor with levels placebo and D-penicil.
age
at registration in years.
sex
a factor with levels male and female.
year
number of years between enrollment and this visit date; the remaining values on the line of data refer to this visit.
ascites
a factor with levels No and Yes.
hepatomegaly
a factor with levels No and Yes.
spiders
a factor with levels No and Yes.
edema
a factor with levels No edema (i.e., no edema and no diuretic therapy for edema), edema no diuretics (i.e., edema present without diuretics, or edema resolved by diuretics), and edema despite diuretics (i.e., edema despite diuretic therapy).
serBilir
serum bilirubin in mg/dl.
serChol
serum cholesterol in mg/dl.
albumin
albumin in gm/dl.
alkaline
alkaline phosphatase in U/liter.
SGOT
SGOT in U/ml.
platelets
platelets per cubic ml / 1000.
prothrombin
prothrombin time in seconds.
histologic
histologic stage of disease.
status2
a numeric vector with value 1 if the patient died and 0 if the patient was alive or transplanted.
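The coding of status2 described above can be reproduced from a status factor in one line. The sketch below uses a small made-up factor rather than the pbc2 data, so the values are illustrative only.

```r
# Toy status factor with the three levels listed for pbc2$status.
status <- factor(c("alive", "dead", "transplanted", "dead"),
                 levels = c("alive", "transplanted", "dead"))

# 1 if the patient died, 0 if alive or transplanted.
status2 <- as.numeric(status == "dead")
```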
Fleming, T. and Harrington, D. (1991) Counting Processes and Survival Analysis. Wiley, New York.
Therneau, T. and Grambsch, P. (2000) Modeling Survival Data: Extending the Cox Model. Springer-Verlag, New York.
summary(pbc2)
Implements key components for the wild bootstrap of the hqm estimator in preparation for obtaining confidence bands.
prep_boot(g_xt, alpha, Ni, Yi, size_s_grid, br_X, br_s, t, b, int_X, x, n)
g_xt |
A vector obtained by |
alpha |
A vector of the marker only hazard on the marker grid obtained by |
Ni |
A matrix made by |
Yi |
A matrix made by |
size_s_grid |
Size of the time grid. |
br_X |
Vector of grid points for the marker values. |
br_s |
Time value grid points that will be used in the evaluation. |
t |
Numeric value of the time at which the function should be evaluated. |
b |
Bandwidth. |
int_X |
Position of the linear interpolated marker values on the marker grid. |
x |
Numeric value of the last observed marker value. |
n |
Number of individuals. |
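The function implements the two components of the wild bootstrap that are needed for the confidence bands. Both are built from the exposure and the marker of each individual.
A list of 5 items. The first two are vectors for calculating the first bootstrap component and the third is a vector for the second component. The fourth is the value of the hqm estimator, which can also be obtained by h_xt, and the last is a further value used in the bootstrap.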
pbc2_id = to_id(pbc2)
size_s_grid <- size_X_grid <- 100
n = max(as.numeric(pbc2$id))
s = pbc2$year
X = pbc2$serBilir
XX = pbc2_id$serBilir
ss <- pbc2_id$years
delta <- pbc2_id$status2
br_s = seq(0, max(s), max(s)/( size_s_grid-1))
br_X = seq(min(X), max(X), (max(X)-min(X))/( size_X_grid-1))
X_lin = lin_interpolate(br_s, pbc2_id$id, pbc2$id, X, s)
int_X <- findInterval(X_lin, br_X)
int_s = rep(1:length(br_s), n)
N <- make_N(pbc2, pbc2_id, br_X, br_s, ss, XX, delta)
Y <- make_Y(pbc2, pbc2_id, X_lin, br_X, br_s, size_s_grid, size_X_grid, int_s, int_X, event_time = 'years', n)
b = 1.7
alpha <- get_alpha(N, Y, b, br_X, K = Epan)
Yi <- make_Yi(pbc2, pbc2_id, X_lin, br_X, br_s, size_s_grid, size_X_grid, int_s, int_X, event_time = 'years', n)
Ni <- make_Ni(br_s, size_s_grid, ss, delta, n)
t = 2
x = 2
g = g_xt(br_X, br_s, size_s_grid, int_X, x, t, b, Yi, Y, n)
Boot_all = prep_boot(g, alpha, Ni, Yi, size_s_grid, br_X, br_s, t, b, int_X, x, n)
Boot_all
Implements the calculation of the hqm estimator on cross validation data sets. This is a preparation for the cross validation bandwidth selection technique for future conditional hazard rate estimation based on marker information data.
prep_cv(data, data.id, marker_name, event_time_name = 'years', time_name = 'year',event_name = 'status2', n, I, b)
data |
A data frame of time dependent data points. Missing values are allowed. |
data.id |
An id data frame obtained from |
marker_name |
The column name of the marker values in the data frame |
event_time_name |
The column name of the event times in the data frame |
time_name |
The column name of the times the marker values were observed in the data frame |
event_name |
The column name of the events in the data frame |
n |
Number of individuals. |
I |
Number of observations left out in each fold of the K-fold cross validation. |
b |
Bandwidth. |
The function splits the data set via dataset_split and calculates, for every split data set, the hqm estimator for all marker values on the marker grid and all times on the time grid. The estimator is built from the marker, the exposure and the marker-only hazard; see get_alpha for more details.
A list with one matrix per cross validation data set, each holding the hqm estimator for all points on the marker grid and all points on the time grid.
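To make the role of I concrete, the following base R sketch splits 312 individual ids into folds of I = 26, which gives K = 12 cross validation data sets. It does not call dataset_split; the use of consecutive blocks of ids is an assumption made for this example, and the package may assign individuals to folds differently.

```r
n <- 312   # number of individuals, as in pbc2
I <- 26    # number of individuals left out per fold

ids <- seq_len(n)

# Assumed rule: consecutive blocks of I ids form one fold.
fold_index <- ceiling(ids / I)
folds <- split(ids, fold_index)

# Ids kept for estimation in each cross validation data set.
train_ids <- lapply(folds, function(left_out) setdiff(ids, left_out))

K <- length(folds)
```

Each element of `train_ids` plays the role of one reduced data set on which the estimator would be recomputed.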
pbc2_id = to_id(pbc2)
n = max(as.numeric(pbc2$id))
b = 1.5
I = 26
h_xt_mat_list = prep_cv(pbc2, pbc2_id, 'serBilir', 'years', 'year', 'status2', n, I, b)
Calculates one part of the K-fold cross validation score.
Q1(h_xt_mat, int_X, size_X_grid, n, Yi)
h_xt_mat |
A matrix of the estimator for the future conditional hazard rate for all values |
int_X |
Vector of the position of the observed marker values in the grid for marker values. |
size_X_grid |
Numeric value indicating the number of grid points for marker values. |
n |
Number of individuals. |
Yi |
A matrix made by |
The function implements a term of the cross validation score that is built from the hqm estimator, the exposure and the marker.
A value of the score Q1.
pbc2_id = to_id(pbc2)
size_s_grid <- size_X_grid <- 100
n = max(as.numeric(pbc2$id))
s = pbc2$year
X = pbc2$serBilir
XX = pbc2_id$serBilir
ss <- pbc2_id$years
delta <- pbc2_id$status2
br_s = seq(0, max(s), max(s)/( size_s_grid-1))
br_X = seq(min(X), max(X), (max(X)-min(X))/( size_X_grid-1))
X_lin = lin_interpolate(br_s, pbc2_id$id, pbc2$id, X, s)
int_X <- findInterval(X_lin, br_X)
int_s = rep(1:length(br_s), n)
N <- make_N(pbc2, pbc2_id, br_X, br_s, ss, XX, delta)
Y <- make_Y(pbc2, pbc2_id, X_lin, br_X, br_s, size_s_grid, size_X_grid, int_s, int_X, event_time = 'years', n)
b = 1.7
alpha <- get_alpha(N, Y, b, br_X, K = Epan)
Yi <- make_Yi(pbc2, pbc2_id, X_lin, br_X, br_s, size_s_grid, size_X_grid, int_s, int_X, event_time = 'years', n)
Ni <- make_Ni(br_s, size_s_grid, ss, delta, n)
# Evaluate the estimator at every time grid point si (one row per time point)
h_xt_mat = t(sapply(br_s[1:99], function(si){h_xt_vec(br_X, br_s, size_s_grid, alpha, si, b, Yi, int_X, n)}))
Q = Q1(h_xt_mat, int_X, size_X_grid, n, Yi)
Calculates one part of the K-fold cross validation score.
R_K(h_xt_mat_list, int_X, size_X_grid, Yi, Ni, n)
h_xt_mat_list |
A list of matrices for all cross validation data sets. Each matrix contains the estimator with the future conditional hazard rate for all values |
int_X |
Vector of the position of the observed marker values in the grid for marker values. |
size_X_grid |
Numeric value indicating the number of grid points for marker values. |
Yi |
A matrix made by |
Ni |
A matrix made by |
n |
Number of individuals. |
The function implements an estimator in which the hqm estimator is computed without information from the counting processes of the individuals in the left-out fold.
The estimated quantity is built from the hqm estimator, the exposure and the marker.
A matrix with one value for every individual i and time grid point t.
pbc2_id = to_id(pbc2)
n = max(as.numeric(pbc2$id))
b = 1.5
I = 104
h_xt_mat_list = prep_cv(pbc2, pbc2_id, 'serBilir', 'years', 'year', 'status2', n, I, b)
size_s_grid <- size_X_grid <- 100
s = pbc2$year
X = pbc2$serBilir
br_s = seq(0, max(s), max(s)/( size_s_grid-1))
br_X = seq(min(X), max(X), (max(X)-min(X))/( size_X_grid-1))
ss <- pbc2_id$years
delta <- pbc2_id$status2
X_lin = lin_interpolate(br_s, pbc2_id$id, pbc2$id, X, s)
int_X <- findInterval(X_lin, br_X)
int_s = rep(1:length(br_s), n)
Yi <- make_Yi(pbc2, pbc2_id, X_lin, br_X, br_s, size_s_grid, size_X_grid, int_s, int_X, 'years', n)
Ni <- make_Ni(br_s, size_s_grid, ss, delta, n)
R = R_K(h_xt_mat_list, int_X, size_X_grid, Yi, Ni, n)
R
Creates a data frame with only one entry per individual from a data frame with time dependent data. The resulting data frame focusses on the event time and the last observed marker value.
to_id(data_set)
data_set |
A data frame of time dependent data points. Missing values are allowed. |
The function to_id uses a data frame of time dependent marker data to create a smaller data frame with only one entry per individual, the last observed marker value and the event time. Note that the column indicating the individuals must have the name id. Note also that this data frame is similar to pbc2.id from the JM package, with the difference that the last observed marker value instead of the first one is captured.
A data frame with only one entry per individual.
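The idea of reducing long-format data to one row per individual can be sketched in base R as follows. This is not the package's to_id; the toy data frame `long`, the column `marker` and the rule of keeping the row with the largest visit time are assumptions for this example, and missing values are not handled.

```r
# Toy long-format data: two individuals with repeated marker measurements.
long <- data.frame(
  id     = c(1, 1, 1, 2, 2),
  year   = c(0, 0.5, 1.2, 0, 0.8),      # visit times
  years  = c(3.1, 3.1, 3.1, 1.4, 1.4),  # event time, constant within id
  marker = c(1.0, 1.3, 1.9, 0.7, 0.6)
)

# Keep, for each id, the row with the latest visit time.
last_rows <- do.call(
  rbind,
  lapply(split(long, long$id), function(d) d[which.max(d$year), ])
)
rownames(last_rows) <- NULL
```

The result has one row per individual, holding the event time and the last observed marker value.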
data_set.id = to_id(pbc2)