Recurrent events

Overview

For recurrent events data it is often of interest to compute basic descriptive quantities to get some basic understanding of the phenonmenon studied. We here demonstrate how one can compute:

  • the marginal mean
    • efficient marginal mean estimation
  • the Ghosh-Lin Cox type regression for the marginal mean, possibly with composite outcomes.
    • efficient regression augmentation of the Ghosh-Lin model
  • the variance of a recurrent events process
  • the probability of exceeding k events
  • the two-stage recurrent events model

We also show how to improve the efficiency of recurrents events marginal mean.

In addition several tools can be used for simulating recurrent events and bivariate recurrent events data, also with a possible terminating event:

  • recurrent events up to two causes and death, given rates of survivors and death on Cox form.
    • frailty extenstions
  • the Ghosh-Lin model when the survival rate is on Cox form.
    • frailty extenstions
  • The general illness death model with cox models for all hazards.

For bivariate recurrent events we also compute summary measures that describe their dependence such as

  • the covariance
  • directional dependence
  • the bivariate probability of exceeding (k1, k2) events

Simulation of recurrents events

We start by simulating some recurrent events data with two type of events with cumulative hazards

  • Λ1(t) (rate among survivors)
  • Λ2(t) (rate among survivors)
  • ΛD(t)

where we consider types 1 and 2 and with a rate of the terminal event given by ΛD(t). We let the events be independent, but could also specify a random effects structure to generate dependence.

When simulating data we can impose various random-effects structures to generate dependence

  • Dependence=0: The intensities can be independent.

  • Dependence=1: We can one gamma distributed random effects Z. Then the intensities are

    • Zλ1(t)
    • Zλ2(t)
    • ZλD(t)
  • Dependence=2: We can draw normally distributed random effects Z1, Z2, Zd were the variance (var.z) and correlation can be specified (cor.mat). Then the intensities are

    • exp (Z1)λ1(t)
    • exp (Z2)λ2(t)
    • exp (Z3)λD(t)
  • Dependence=3: We can draw gamma distributed random effects Z1, Z2, Zd were the sum-structure can be speicifed via a matrix cor.mat. We compute j = ∑kZkcor.mat(j, k) for j = 1, 2, 3.
    Then the intensities are

    • 1λ1(t)
    • 2λ2(t)
    • 3λD(t)

We return to how to run the different set-ups later and start by simulating independent processes.

The key functions are

  • simRecurrent
    • simple simulation with only one event type and death
  • simRecurrentII
    • extended version with possibly multiple types of recurrent events (but rates can be 0)
    • Allows Cox types rates with subject specific rates

In addition we can simulate data from the Ghosh-Lin model and where marginals of the rates among survivors are on on Cox form

  • simGLcox
    • can simulate data from Ghosh-Lin model (also simRecurrentCox)
    • with frailties
      • where survival model for terminal event is on Cox form
    • can simulate data where rates among survivors are are con Cox form
      • with frailties

see examples below for specific models.

Utility functions

We here mention two utility functions

  • tie.breaker for breaking ties among jump-times which is expected in the functions below.
  • count.history that counts the number of jumps previous for each subject that is N1(t−) and N2(t−).

Marginal Mean

We start by estimating the marginal mean E(N1(t ∧ D)) where D is the timing of the terminal event.

This is based on a rate model for

  • the type 1 events  ∼ E(dN1(t)|D > t)
  • the terminal event  ∼ E(dNd(t)|D > t)

and is defined as μ1(t) = E(N1*(t)) where S(t) = P(D ≥ t) and dR1(t) = E(dN1*(t)|D > t)

and can therefore be estimated by a

  • Kaplan-Meier estimator, (u)
  • Nelson-Aalen estimator for R1(t)

where Y(t) = ∑iYi(t) such that the estimator is

Cook & Lawless (1997), and developed further in Gosh & Lin (2000).

The variance can be estimated based on the asymptotic expansion of μ̂1(t) − μ1(t)

with mean-zero processes

  • Mid(t) = NiD(t) − ∫0tYi(s)dΛD(s),
  • Mi1(t) = Ni1(t) − ∫0tYi(s)dR1(s).

as in Gosh & Lin (2000)

Generating data

We start by generating some data to illustrate the computation of the marginal mean

library(mets)
library(timereg)
set.seed(1000) # to control output in simulatins for p-values below.
data(base1cumhaz)
data(base4cumhaz)
data(drcumhaz)
ddr <- drcumhaz
base1 <- base1cumhaz
base4 <- base4cumhaz
rr <- simRecurrent(200,base1,death.cumhaz=ddr)
rr$x <- rnorm(nrow(rr)) 
rr$strata <- floor((rr$id-0.01)/100)
dlist(rr,.~id| id %in% c(1,7,9))
#> id: 1
#>     entry  time   status rr rr2 dtime fdeath death start  stop   x       strata
#> 1      0.0  451.1 1      1  1   3291  1      0        0.0  451.1  1.5212 0     
#> 201  451.1 2687.9 1      1  1   3291  1      0      451.1 2687.9  0.3290 0     
#> 337 2687.9 3290.8 0      1  1   3291  1      1     2687.9 3290.8 -0.4887 0     
#> ------------------------------------------------------------ 
#> id: 7
#>   entry time  status rr rr2 dtime fdeath death start stop  x        strata
#> 7 0     658.3 0      1  1   658.3 1      1     0     658.3 -0.04719 0     
#> ------------------------------------------------------------ 
#> id: 9
#>     entry time  status rr rr2 dtime fdeath death start stop  x       strata
#> 9     0.0 433.5 1      1  1   505.3 1      0       0.0 433.5 -0.3530 0     
#> 205 433.5 505.3 0      1  1   505.3 1      1     433.5 505.3  0.7694 0

The status variable keeps track of the recurrent evnts and their type, and death the timing of death.

To compute the marginal mean we simly estimate the two rates functions of the number of events of interest and death by using the phreg function (to start without covariates). Then the estimates are combined with standard error computation in the recurrentMarginal function

#  to fit non-parametric models with just a baseline 
xr <- phreg(Surv(entry,time,status)~cluster(id),data=rr)
dr <- phreg(Surv(entry,time,death)~cluster(id),data=rr)
par(mfrow=c(1,3))
bplot(dr,se=TRUE)
title(main="death")
bplot(xr,se=TRUE)
# robust standard errors 
rxr <-   robust.phreg(xr,fixbeta=1)
bplot(rxr,se=TRUE,robust=TRUE,add=TRUE,col=4)

# marginal mean of expected number of recurrent events 
out <- recurrentMarginal(xr,dr)
bplot(out,se=TRUE,ylab="marginal mean",col=2)

We can also extract the estimate in different time-points

summary(out,times=c(1000,2000))
#>   times mean   se-mean  CI-2.5% CI-97.5% strata
#> 1  1000 1.29 0.0989616 1.109913 1.499306      0
#> 2  2000 1.81 0.1381837 1.558450 2.102153      0

The marginal mean can also be estimated in a stratified case:

xr <- phreg(Surv(entry,time,status)~strata(strata)+cluster(id),data=rr)
dr <- phreg(Surv(entry,time,death)~strata(strata)+cluster(id),data=rr)
par(mfrow=c(1,3))
bplot(dr,se=TRUE)
title(main="death")
bplot(xr,se=TRUE)
rxr <-   robust.phreg(xr,fixbeta=1)
bplot(rxr,se=TRUE,robust=TRUE,add=TRUE,col=1:2)

out <- recurrentMarginal(xr,dr)
bplot(out,se=TRUE,ylab="marginal mean",col=1:2)

Further, if we adjust for covariates for the two rates we can still do predictions of marginal mean, what can be plotted is the baseline marginal mean, that is for the covariates equal to 0 for both models. Predictions for specific covariates can also be obtained with the recmarg (recurren marginal mean used solely for predictions without standard error computation).

# cox case
xr <- phreg(Surv(entry,time,status)~x+cluster(id),data=rr)
dr <- phreg(Surv(entry,time,death)~x+cluster(id),data=rr)
par(mfrow=c(1,3))
bplot(dr,se=TRUE)
title(main="death")
bplot(xr,se=TRUE)
rxr <- robust.phreg(xr)
bplot(rxr,se=TRUE,robust=TRUE,add=TRUE,col=1:2)

out <- recurrentMarginal(xr,dr)
bplot(out,se=TRUE,ylab="marginal mean",col=1:2)

# predictions witout se's 
outX <- recmarg(xr,dr,Xr=1,Xd=1)
bplot(outX,add=TRUE,col=3)

Improving efficiency

We now simulate some data where there is strong heterogenity such that we can improve the efficiency for censored survival data. The augmentation is a regression on the history for each subject consisting of the specified terms terms: Nt, Nt2 (Nt squared), expNt (exp(-Nt)), NtexpNt (Nt*exp(-Nt)) or by simply specifying these directly. This was developed in Cortese and Scheike (2022).

rr <- simRecurrentII(200,base1,base4,death.cumhaz=ddr,cens=3/5000,dependence=4,var.z=1)
rr <-  count.history(rr)

rr <- transform(rr,statusD=status)
rr <- dtransform(rr,statusD=3,death==1)
dtable(rr,~statusD+status+death,level=2,response=1)
#> 
#>       statusD
#> status   0   1   2   3
#>      0  82   0   0 118
#>      1   0 243   0   0
#>      2   0   0  32   0
#> 
#>      statusD
#> death   0   1   2   3
#>     0  82 243  32   0
#>     1   0   0   0 118

xr <- phreg(Surv(start,stop,status==1)~cluster(id),data=rr)
dr <- phreg(Surv(start,stop,death)~cluster(id),data=rr)
# marginal mean of expected number of recurrent events 
out <- recurrentMarginal(xr,dr)

times <- 500*(1:10)
recEFF1 <- recurrentMarginalAIPCW(Event(start,stop,statusD)~cluster(id),data=rr,times=times,cens.code=0,
                   death.code=3,cause=1,augment.model=~Nt)
with( recEFF1, cbind(times,muP,semuP,muPAt,semuPAt,semuPAt/semuP))
#>       times       muP      semuP     muPAt    semuPAt          
#>  [1,]   500 0.7296607 0.08934148 0.7249944 0.08905594 0.9968040
#>  [2,]  1000 1.0253633 0.11701067 1.0152615 0.11651045 0.9957250
#>  [3,]  1500 1.2942861 0.15154426 1.2960853 0.15078170 0.9949681
#>  [4,]  2000 1.5821804 0.19166318 1.5505537 0.19135356 0.9983846
#>  [5,]  2500 1.7367039 0.20788926 1.6836330 0.20737040 0.9975041
#>  [6,]  3000 1.8824299 0.23035386 1.8141174 0.22956165 0.9965609
#>  [7,]  3500 2.0447577 0.26163802 1.9734070 0.25940041 0.9914477
#>  [8,]  4000 2.2148153 0.30878777 2.1219632 0.30363205 0.9833034
#>  [9,]  4500 2.2457349 0.32054957 2.1413394 0.31524586 0.9834543
#> [10,]  5000 2.2457349 0.32054957 2.1413394 0.31524586 0.9834543

times <- 500*(1:10)
###recEFF14 <- recurrentMarginalAIPCW(Event(start,stop,statusD)~cluster(id),data=rr,times=times,cens.code=0,
###death.code=3,cause=1,augment.model=~Nt+Nt2+expNt+NtexpNt)
###with(recEFF14,cbind(times,muP,semuP,muPAt,semuPAt,semuPAt/semuP))

recEFF14 <- recurrentMarginalAIPCW(Event(start,stop,statusD)~cluster(id),data=rr,times=times,cens.code=0,
death.code=3,cause=1,augment.model=~Nt+I(Nt^2)+I(exp(-Nt))+ I( Nt*exp(-Nt)))
with(recEFF14,cbind(times,muP,semuP,muPAt,semuPAt,semuPAt/semuP))
#>       times       muP      semuP     muPAt    semuPAt          
#>  [1,]   500 0.7296607 0.08934148 0.7237292 0.08900193 0.9961995
#>  [2,]  1000 1.0253633 0.11701067 1.0161522 0.11620521 0.9931163
#>  [3,]  1500 1.2942861 0.15154426 1.2667534 0.15011448 0.9905653
#>  [4,]  2000 1.5821804 0.19166318 1.5082667 0.18777951 0.9797370
#>  [5,]  2500 1.7367039 0.20788926 1.6111482 0.20257318 0.9744283
#>  [6,]  3000 1.8824299 0.23035386 1.7159860 0.22374298 0.9713012
#>  [7,]  3500 2.0447577 0.26163802 1.8292809 0.25298074 0.9669112
#>  [8,]  4000 2.2148153 0.30878777 1.9265532 0.29398871 0.9520737
#>  [9,]  4500 2.2457349 0.32054957 1.9218711 0.30431451 0.9493524
#> [10,]  5000 2.2457349 0.32054957 1.9218711 0.30431451 0.9493524

bplot(out,se=TRUE,ylab="marginal mean",col=2)
k <- 1
for (t in times) {
    ci1 <- c(recEFF1$muPAt[k]-1.96*recEFF1$semuPAt[k],
             recEFF1$muPAt[k]+1.96*recEFF1$semuPAt[k])
    ci2 <- c(recEFF1$muP[k]-1.96*recEFF1$semuP[k],
             recEFF1$muP[k]+1.96*recEFF1$semuP[k])
    lines(rep(t,2)-2,ci2,col=2,lty=1,lwd=2)
    lines(rep(t,2)+2,ci1,col=1,lty=1,lwd=2)
    k <- k+1
}
legend("bottomright",c("Eff-pred"),lty=1,col=c(1,3))

In the case where covariates might be important but we are still interested in the marginal mean we can also augment wrt these covariates

n <- 200
X <- matrix(rbinom(n*2,1,0.5),n,2)
colnames(X) <- paste("X",1:2,sep="")
###
r1 <- exp( X %*% c(0.3,-0.3))
rd <- exp( X %*% c(0.3,-0.3))
rc <- exp( X %*% c(0,0))
fz <- NULL
rr <- mets:::simGLcox(n,base1,ddr,var.z=0,r1=r1,rd=rd,rc=rc,fz,model="twostage",cens=3/5000) 
rr <- cbind(rr,X[rr$id+1,])

dtable(rr,~statusD+status+death,level=2,response=1)
#> 
#>       statusD
#> status   0   1   3
#>      0  93   0 107
#>      1   0 802   0
#> 
#>      statusD
#> death   0   1   3
#>     0  93 476   0
#>     1   0 326 107

times <- seq(500,5000,by=500)
recEFF1x <- recurrentMarginalAIPCW(Event(start,stop,statusD)~cluster(id),data=rr,times=times,
                   cens.code=0,death.code=3,cause=1,augment.model=~X1+X2)
with(recEFF1x, cbind(muP,muPA,muPAt,semuP,semuPA,semuPAt,semuPAt/semuP))
#>             muP      muPA    muPAt     semuP    semuPA   semuPAt          
#>  [1,]  1.239629  1.226904 1.219113 0.1113222 0.1104686 0.1102564 0.9904262
#>  [2,]  2.260753  2.237327 2.218786 0.1888720 0.1856083 0.1850181 0.9795950
#>  [3,]  3.380778  3.284093 3.299408 0.3116231 0.2998621 0.2970862 0.9533513
#>  [4,]  4.715639  4.652729 4.588619 0.4959756 0.4640347 0.4579825 0.9233974
#>  [5,]  5.978661  5.817999 5.773031 0.6718212 0.6155609 0.6038833 0.8988751
#>  [6,]  7.029352  6.955265 6.802368 0.8586233 0.7963062 0.7710258 0.8979792
#>  [7,]  8.520280  8.217099 8.009947 1.2296894 1.0756621 1.0568620 0.8594544
#>  [8,]  9.780973  9.143637 8.813562 1.7113961 1.4189071 1.4040811 0.8204302
#>  [9,] 10.871472 10.357563 9.395688 2.1044673 1.7855074 1.6866794 0.8014757
#> [10,] 11.296956 10.735081 9.648354 2.2567311 1.8849404 1.7901175 0.7932347

xr <- phreg(Surv(start,stop,status==1)~cluster(id),data=rr)
dr <- phreg(Surv(start,stop,death)~cluster(id),data=rr)
out <- recurrentMarginal(xr,dr)
mets::summaryTimeobject(out$times,out$mu,times=times,se.mu=out$se.mu)
#>    times      mean    se-mean   CI-2.5% CI-97.5%
#> 1    500 0.8806899 0.06519655 0.7617427 1.018211
#> 2   1000 1.2224722 0.10539353 1.0324110 1.447523
#> 3   1500 1.3702238 0.14414996 1.1149155 1.683996
#> 4   2000 1.4239958 0.17001617 1.1268827 1.799446
#> 5   2500 1.4334297 0.17708838 1.1251633 1.826153
#> 6   3000 1.4352968 0.17907319 1.1239334 1.832917
#> 7   3500 1.4359348 0.17990713 1.1232758 1.835621
#> 8   4000 1.4360308 0.18005391 1.1231443 1.836081
#> 9   4500 1.4360356 0.18006355 1.1231342 1.836110
#> 10  5000 1.4360358 0.18006394 1.1231338 1.836111

Regression models for the marginal mean

One can also do regression modelling , using the model then Ghost-Lin suggested IPCW score equations that are implemented in the recreg function of mets.

First we generate data that from a Ghosh-Lin model with β = (−0.3, 0.3) and the baseline given by base1, this is done under the assumption that the death rate given covariates are on Cox form with baseline ddr:

n <- 100
X <- matrix(rbinom(n*2,1,0.5),n,2)
colnames(X) <- paste("X",1:2,sep="")
###
r1 <- exp( X %*% c(0.3,-0.3))
rd <- exp( X %*% c(0.3,-0.3))
rc <- exp( X %*% c(0,0))
fz <- NULL
rr <- mets:::simGLcox(n,base1,ddr,var.z=1,r1=r1,rd=rd,rc=rc,fz,cens=1/5000,type=2) 
rr <- cbind(rr,X[rr$id+1,])

 out  <- recreg(Event(start,stop,statusD)~X1+X2+cluster(id),data=rr,cause=1,death.code=3,cens.code=0)
 outs <- recreg(Event(start,stop,statusD)~X1+X2+cluster(id),data=rr,cause=1,death.code=3,cens.code=0,
        cens.model=~strata(X1,X2))
 summary(out)$coef
#>     Estimate      S.E.    dU^-1/2   P-value
#> X1 0.3486562 0.3510510 0.09720215 0.3206230
#> X2 0.3159378 0.3388077 0.09967828 0.3510788
 summary(outs)$coef
#>     Estimate      S.E.    dU^-1/2   P-value
#> X1 0.2963320 0.3448975 0.09772802 0.3902364
#> X2 0.2551964 0.3390756 0.10024137 0.4516759

 ## checking baseline
 par(mfrow=c(1,1))
 bplot(out)
 bplot(outs,add=TRUE,col=2)
 lines(scalecumhaz(base1,1),col=3,lwd=2)

We note that for the extended censoring model we gain a little efficiency and that the estimates are close to the true values.

Also possible to do IPCW regression at fixed time-point

 outipcw  <- recregIPCW(Event(start,stop,statusD)~X1+X2+cluster(id),data=rr,cause=1,death.code=3,
            cens.code=0,times=2000)
 outipcws <- recregIPCW(Event(start,stop,statusD)~X1+X2+cluster(id),data=rr,cause=1,death.code=3,
            cens.code=0,times=2000,cens.model=~strata(X1,X2))
 summary(outipcw)$coef
#>              Estimate   Std.Err       2.5%     97.5%   P-value
#> (Intercept) 0.8376610 0.3071504  0.2356573 1.4396647 0.0063874
#> X1          0.4593771 0.3446980 -0.2162185 1.1349727 0.1826321
#> X2          0.1168024 0.3501243 -0.5694285 0.8030334 0.7386793
 summary(outipcws)$coef
#>               Estimate   Std.Err       2.5%     97.5%     P-value
#> (Intercept) 0.90967778 0.3061090  0.3097152 1.5096403 0.002961126
#> X1          0.36067230 0.3348409 -0.2956038 1.0169484 0.281415330
#> X2          0.08605183 0.3376382 -0.5757069 0.7478106 0.798828145

We can also do the Mao-Lin type composite outcome where we both count the cause 1 and deaths for example

 out  <- recreg(Event(start,stop,statusD)~X1+X2+cluster(id),data=rr,cause=c(1,3),
        death.code=3,cens.code=0)
 summary(out)$coef
#>     Estimate      S.E.    dU^-1/2   P-value
#> X1 0.3308270 0.2973404 0.08977266 0.2658714
#> X2 0.2612694 0.2855146 0.09152510 0.3601484

Also demonstrate that this can be done with competing risks death (change some of the cause 3 deaths to cause 4) and with weights w1, w2 that follow the causes, here 1 and 3.

 rr$binf <- rbinom(nrow(rr),1,0.5) 
 rr$statusDC <- rr$statusD
 rr <- dtransform(rr,statusDC=4, statusD==3 & binf==0)
 rr$weight <- 1
 rr <- dtransform(rr,weight=2,statusDC==3)

 outC  <- recreg(Event(start,stop,statusDC)~X1+X2+cluster(id),data=rr,cause=c(1,3),
         death.code=c(3,4),cens.code=0)
 summary(outC)$coef
#>     Estimate      S.E.    dU^-1/2   P-value
#> X1 0.3388982 0.3231606 0.09350977 0.2943166
#> X2 0.3144569 0.3110926 0.09592269 0.3121053

 outCW  <- recreg(Event(start,stop,statusDC)~X1+X2+cluster(id),data=rr,cause=c(1,3),
          death.code=c(3,4),cens.code=0,wcomp=c(1,2))
 summary(outCW)$coef
#>     Estimate      S.E.    dU^-1/2   P-value
#> X1 0.3304876 0.3003393 0.09021130 0.2711662
#> X2 0.3132135 0.2885475 0.09256241 0.2777077

 bplot(out,ylab="Mean composite")
 bplot(outC,col=2,add=TRUE)
 bplot(outCW,col=3,add=TRUE)

Predictions and standard errors can be computed via the iid decompositions of the baseline and the regression coefficients. We illustrate this for the standard Ghosh-Lin model and it requires that the model is fitted with the option cox.prep=TRUE

out  <- recreg(Event(start,stop,statusD)~X1+X2+cluster(id),data=rr,cause=1,death.code=3,
            cens.code=0,cox.prep=TRUE)
baseiid <- IIDbaseline.cifreg(out,time=3000)
GLprediid(baseiid,rr[1:5,])
#>          pred    se-log    lower    upper
#> [1,] 5.823597 0.3817794 2.755625 12.30729
#> [2,] 5.823597 0.3817794 2.755625 12.30729
#> [3,] 5.823597 0.3817794 2.755625 12.30729
#> [4,] 5.823597 0.3817794 2.755625 12.30729
#> [5,] 5.823597 0.3817794 2.755625 12.30729

The Ghosh-Lin model can be made more efficient by the regression augmentation method. First computing the augmentation and then in a second step the augmented estimator (Cortese and Scheike (2023)):

 outi  <- recreg(Event(start,stop,statusD)~X1+X2+cluster(id),data=rr,cause=1,death.code=3,cens.code=0,
        augment.model=~Nt+X1+X2)
 summary(outi)$coef
#>     Estimate      S.E.    dU^-1/2   P-value
#> X1 0.3486562 0.3510510 0.09720215 0.3206230
#> X2 0.3159378 0.3388077 0.09967828 0.3510788
 outA  <- recreg(Event(start,stop,statusD)~X1+X2+cluster(id),data=rr,cause=1,death.code=3,cens.code=0,
        augment.model=~Nt+X1+X2,augment=outi$lindyn.augment)
 summary(outA)$coef
#>     Estimate      S.E.    dU^-1/2   P-value
#> X1 0.2907480 0.3407406 0.09686369 0.3935027
#> X2 0.2017009 0.3255009 0.09837819 0.5354796

We note that the simple augmentation improves the standard errors as expected. The data was generated assuming independence with previous number of events so it would suffice to augment only with the covariates.

Two-stage modelling

Above we simulated data with a terminal event on Cox form and recurrent events satisfying the Ghosh-Lin model.

Now we fit the two-stage model (the recreg must be called with cox.prep=TRUE)

 out  <- recreg(Event(start,stop,statusD)~X1+X2+cluster(id),data=rr,
        cause=1,death.code=3,cens.code=0,cox.prep=TRUE)
 outs <- phreg(Event(start,stop,statusD==3)~X1+X2+cluster(id),data=rr)

 tsout <- twostageREC(outs,out,data=rr)
 summary(tsout)
#> Ghosh-Lin(recurrent)-Cox(terminal) mean model
#> 
#>  100 clusters
#> coeffients:
#>             Estimate Std.Err    2.5%   97.5% P-value
#> dependence1  1.34706 0.16891 1.01600 1.67813       0
#> 
#> var,shared:
#>             Estimate Std.Err    2.5%   97.5% P-value
#> dependence1  1.34706 0.16891 1.01600 1.67813       0

Standard errors are computed assuming that the parameters of out and outs are both known, and therefore propobly a bit to small. We could do a bootstrap to get more reliable standard errors.

Simulations with specific structure

The function simGLcox can simulate data where the recurrent process has mean on Ghosh-Lin form. The key is that where Z is a possible frailty. Therefore leads to a Ghosh-Lin model. We can choose the survival model to have Cox form among survivors by the option model=“twostage”, otherwise model=“frailty” uses the survival model with rate Zλd(t)rd. The Z is gamma distributed with a variance that can be specified. The simulations are based on a piecwise-linear approximation of the hazard functions for S(t|X, Z) and R(t|X, Z).

n <- 100
X <- matrix(rbinom(n*2,1,0.5),n,2)
colnames(X) <- paste("X",1:2,sep="")
###
r1 <- exp( X %*% c(0.3,-0.3))
rd <- exp( X %*% c(0.3,-0.3))
rc <- exp( X %*% c(0,0))
rr <- mets:::simGLcox(n,base1,ddr,var.z=0,r1=r1,rd=rd,rc=rc,model="twostage",cens=3/5000) 
rr <- cbind(rr,X[rr$id+1,])

We can also simulate from models where the terminal event is on Cox form and the rate among survivors is on Cox form.

  • E(dN1|D > t, X) = λ1(t)r1
  • E(dNd|D > t, X) = λd(t)rd

underlying these models we have a shared frailty model

rr <- mets:::simGLcox(100,base1,ddr,var.z=1,r1=r1,rd=rd,rc=rc,type=3,cens=3/5000) 
rr <- cbind(rr,X[rr$id+1,])
margsurv <- phreg(Surv(start,stop,statusD==3)~X1+X2+cluster(id),rr)
recurrent <- phreg(Surv(start,stop,statusD==1)~X1+X2+cluster(id),rr)
estimate(margsurv)
#>    Estimate Std.Err    2.5%   97.5%  P-value
#> X1   0.1210  0.2601 -0.3887  0.6307 0.641722
#> X2  -0.7358  0.2821 -1.2887 -0.1829 0.009104
estimate(recurrent)
#>    Estimate Std.Err    2.5%  97.5% P-value
#> X1  0.10518  0.2758 -0.4354 0.6458  0.7029
#> X2  0.08246  0.2833 -0.4729 0.6378  0.7710
par(mfrow=c(1,2)); 
plot(margsurv); lines(ddr,col=3); 
plot(recurrent); lines(base1,col=3)

Other marginal properties

The mean is a useful summary measure but it is very easy and useful to look at other simple summary measures such as the probability of exceeding k events

  • P(N1*(t) ≥ k)
    • cumulative incidence of Tk = inf {t : N1*(t) = k} with competing D.

that is thus equivalent to a certain cumulative incidence of Tk occurring before D. We denote this cumulative incidence as k(t).

We note also that N1*(t)2 can be written as with f(k) = (k + 1)2 − k2, such that its mean can be written as and estimated by That is very similar to the “product-limit” estimator for E((N1*(t))2)

We use the esimator of the probabilty of exceeding “k” events based on the fact that I(N1*(t) ≥ k) is equivalent to suggesting that its mean can be computed as and estimated by

To compute these estimators we need to set up the data by computing the number of previous events of type “1” by the count.history function

###cor.mat <- corM <- rbind(c(1.0, 0.6, 0.9), c(0.6, 1.0, 0.5), c(0.9, 0.5, 1.0))
rr <- simRecurrentII(200,base1,base4,death.cumhaz=ddr,cens=3/5000,dependence=4,var.z=1)
rr <-  count.history(rr)
dtable(rr,~death+status)
#> 
#>       status   0   1   2
#> death                   
#> 0             82 247  27
#> 1            118   0   0

oo <- prob.exceedRecurrent(rr,1)
bplot(oo)

We can also look at the mean and variance based on the estimators just described

par(mfrow=c(1,2))
with(oo,plot(time,mu,col=2,type="l"))
#
with(oo,plot(time,varN,type="l"))

We could also use the product-limit estimator to estimate the probability of exceeding “k” events, and then standard errors are also returned:

 oop <- prob.exceed.recurrent(rr,1)
 bplot(oo)
 matlines(oop$times,oop$prob,type="l")

 summaryTimeobject(oop$times,oop$prob,se.mu=oop$se.prob,times=1000)
#>            times        mean    se-mean     CI-2.5%   CI-97.5%
#> N=0         1000 0.552391858 0.03882814 0.481298393 0.63398667
#> exceed>=1   1000 0.447608142 0.03882814 0.377622992 0.53056369
#> exceed>=2   1000 0.233078989 0.03344195 0.175942726 0.30876989
#> exceed>=3   1000 0.139370028 0.02817357 0.093776998 0.20712973
#> exceed>=4   1000 0.088317936 0.02319248 0.052785841 0.14776799
#> exceed>=5   1000 0.044860816 0.01791420 0.020509349 0.09812563
#> exceed>=6   1000 0.015109770 0.01051121 0.003864586 0.05907623
#> exceed>=7   1000 0.008651055 0.00856731 0.001241915 0.06026236
#> exceed>=8   1000 0.000000000 0.00000000         NaN        NaN
#> exceed>=9   1000 0.000000000 0.00000000         NaN        NaN
#> exceed>=10  1000 0.000000000 0.00000000         NaN        NaN
#> exceed>=11  1000 0.000000000 0.00000000         NaN        NaN
#> exceed>=12  1000 0.000000000 0.00000000         NaN        NaN
#> exceed>=13  1000 0.000000000 0.00000000         NaN        NaN
#> exceed>=14  1000 0.000000000 0.00000000         NaN        NaN
#> exceed>=15  1000 0.000000000 0.00000000         NaN        NaN
#> exceed>=16  1000 0.000000000 0.00000000         NaN        NaN

We note from the plot that the estimates are quite similar.

Finally, we make a plot with 95% confidence intervals

matplot(oop$times,oop$prob,type="l")
for (i in seq(ncol(oop$prob))) 
    plotConfRegion(oop$times,cbind(oop$se.lower[,i],oop$se.upper[,i]),col=i)

Multiple events

We now generate recurrent events with two types of events. We start by generating data as before where all events are independent.

rr <- simRecurrentII(200,base1,cumhaz2=base4,death.cumhaz=ddr)
rr <-  count.history(rr)
dtable(rr,~death+status)
#> 
#>       status   0   1   2
#> death                   
#> 0             20 536  85
#> 1            180   0   0

Based on this we can estimate also the joint distribution function, that is the probability that (N1(t) ≥ k1, N2(t) ≥ k2)

# Bivariate probability of exceeding 
oo <- prob.exceedBiRecurrent(rr,1,2,exceed1=c(1,5),exceed2=c(1,2))
with(oo, matplot(time,pe1e2,type="s"))
nc <- ncol(oo$pe1e2)
legend("topleft",legend=colnames(oo$pe1e2),lty=1:nc,col=1:nc)

Dependence between events: Covariance

The dependence can also be summarised in other ways. For example by computing the covariance and comparing it to the covariance under the assumption of independence among survivors.

Covariance among two types of events where E(N1*(t)N2*(t)) can be computed as

Recall that we might have a terminal event present such that we only see N1*(t ∧ D) and N2*(t ∧ D).

To compute the covariance we thus compute estimated by * Yjk(t) = ∑Yi(t)I(Nji*(s−) = k) for j = 1, 2, * j, k(t) = ∑i0tI(Nijo(s−) = k)dNij(s) * jo gives the other type so that 1o = 2 and 2o = 1.

We thus estimate $ E(N_1^(t) N_2^(t))$ by

  • Without terminating event covariance is a useful nonparametric measure.
  • With terminating event dependence can be generated terminating event.
  • In reality what is of interest would be independence among survivors that is if
    • N1 is not predicitive for N2
    • N2 is not predicitive for N1

If the two processes are independent among survivors then so and where Nj(t) = ∑i0tdNj, i(s).

Under the independence E(N1*(t)N2*(t)) is estimated

Both estimators, (N1*(t)N2*(t)) and I(N1*(t)N2*(t)), as well as (N1*(t)) and (N2*(t)), have asymptotic expansions that can be written as a sum of iid processes, similarly to the arguments of Ghosh & Lin 2000, iΨi(t).
We here, however, use a simple block bootstrap to get standard errors.

We can thus estimate the standard errors and of the estimators and their difference (N1*(t)N2*(t)) − I(N1*(t)N2*(t)).

Note that we have terms for whether * N1 is predicitive for N2 * N1 -> N2 : E(∫0tN1*(s−)dN2*(s)) * this is equivalent to a weighted log-rank test * N2 is predicitive for N1 * N2 -> N1 : E(∫0tN2*(s−)dN1*(s)) * this is equivalent to a weighted log-rank test

rr$strata <- 1
dtable(rr,~death+status)
#> 
#>       status   0   1   2
#> death                   
#> 0             20 536  85
#> 1            180   0   0

covrp <- covarianceRecurrent(rr,1,2,status="status",death="death",
                        start="entry",stop="time",id="id",names.count="Count")
par(mfrow=c(1,3)) 
plot(covrp)


# with strata, each strata in matrix column, provides basis for fast Bootstrap
covrpS <- covarianceRecurrentS(rr,1,2,status="status",death="death",
        start="entry",stop="time",strata="strata",id="id",names.count="Count")

Bootstrap standard errors for terms

First fitting the model again to get our estimates of interst, and then computing them for some specific time-points

times <- seq(500,5000,500)

coo1 <- covarianceRecurrent(rr,1,2,status="status",start="entry",stop="time")
#
mug <- Cpred(cbind(coo1$time,coo1$EN1N2),times)[,2]
mui <- Cpred(cbind(coo1$time,coo1$EIN1N2),times)[,2]
mu2.1 <- Cpred(cbind(coo1$time,coo1$mu2.1),times)[,2]
mu2.i <- Cpred(cbind(coo1$time,coo1$mu2.i),times)[,2]
mu1.2 <- Cpred(cbind(coo1$time,coo1$mu1.2),times)[,2]
mu1.i <- Cpred(cbind(coo1$time,coo1$mu1.i),times)[,2]
cbind(times,mu2.1,mu2.i)
cbind(times,mu1.2,mu1.i)

To get the bootstrap standard errors there is a quick memory demanding function (with S for speed and strata) BootcovariancerecurrenceS and slower function that goes through the loops in R Bootcovariancerecurrence.

bt1 <- BootcovariancerecurrenceS(rr,1,2,status="status",start="entry",stop="time",K=100,times=times)
#bt1 <- Bootcovariancerecurrence(rr,1,2,status="status",start="entry",stop="time",K=K,times=times)

BCoutput <- list(bt1=bt1,mug=mug,mui=mui,
        bse.mug=bt1$se.mug,bse.mui=bt1$se.mui,
        dmugi=mug-mui,
    bse.dmugi=apply(bt1$EN1N2-bt1$EIN1N2,1,sd),
    mu2.1 = mu2.1 , mu2.i = mu2.i , dmu2.i=mu2.1-mu2.i,
    mu1.2 = mu1.2 , mu1.i = mu1.i , dmu1.i=mu1.2-mu1.i,
    bse.mu2.1=apply(bt1$mu2.i,1,sd), bse.mu2.1=apply(bt1$mu2.1,1,sd),
    bse.dmu2.i=apply(bt1$mu2.1-bt1$mu2.i,1,sd),
    bse.mu1.2=apply(bt1$mu1.2,1,sd), bse.mu1.i=apply(bt1$mu1.i,1,sd),
    bse.dmu1.i=apply(bt1$mu1.2-bt1$mu1.i,1,sd)
    )

We then look at the test for overall dependence in the different time-points. We here have no suggestion of dependence.

tt  <- BCoutput$dmugi/BCoutput$bse.dmugi
cbind(times,2*(1-pnorm(abs(tt))))

We can also take out the specific components for whether N1 is predictive for N2 and vice versa. We here have no suggestion of dependence.

t21  <- BCoutput$dmu1.i/BCoutput$bse.dmu1.i 
t12  <- BCoutput$dmu2.i/BCoutput$bse.dmu2.i 
cbind(times,2*(1-pnorm(abs(t21))),2*(1-pnorm(abs(t12))))

We finally plot the boostrap samples

par(mfrow=c(1,2))
matplot(BCoutput$bt1$time,BCoutput$bt1$EN1N2,type="l",lwd=0.3)
matplot(BCoutput$bt1$time,BCoutput$bt1$EIN1N2,type="l",lwd=0.3)

Looking at other simulations with dependence

Using the normally distributed random effects we plot 4 different settings. We have variance 0.5 for all random effects and change the correlation. We let the correlation between the random effect associated with N1 and N2 be denoted ρ12 and the correlation between the random effects associated between Nj and D the terminal event be denoted as ρj3, and organize all correlation in a vector ρ = (ρ12, ρ13, ρ23).

  • Scenario I ρ = (0, 0.0, 0.0) Independence among all efects.
  data(base1cumhaz)
  data(base4cumhaz)
  data(drcumhaz)
  dr <- drcumhaz
  base1 <- base1cumhaz
  base4 <- base4cumhaz

  par(mfrow=c(1,3))
  var.z <- c(0.5,0.5,0.5)
  # death related to  both causes in same way 
  cor.mat <- corM <- rbind(c(1.0, 0.0, 0.0), c(0.0, 1.0, 0.0), c(0.0, 0.0, 1.0))
  rr <- simRecurrentII(200,base1,base4,death.cumhaz=dr,var.z=var.z,cor.mat=cor.mat,dependence=2)
  rr <- count.history(rr,types=1:2)
  cor(attr(rr,"z"))
  coo <- covarianceRecurrent(rr,1,2,status="status",start="entry",stop="time")
  plot(coo,main ="Scenario I")
  • Scenario II ρ = (0, 0.5, 0.5) Independence among survivors but dependence on terminal event
  var.z <- c(0.5,0.5,0.5)
  # death related to  both causes in same way 
  cor.mat <- corM <- rbind(c(1.0, 0.0, 0.5), c(0.0, 1.0, 0.5), c(0.5, 0.5, 1.0))
  rr <- simRecurrentII(200,base1,base4,death.cumhaz=dr,var.z=var.z,cor.mat=cor.mat,dependence=2)
  rr <- count.history(rr,types=1:2)
  coo <- covarianceRecurrent(rr,1,2,status="status",start="entry",stop="time")
  par(mfrow=c(1,3))
  plot(coo,main ="Scenario II")
  • Scenario III ρ = (0.5, 0.5, 0.5) Positive dependence among survivors and dependence on terminal event
  var.z <- c(0.5,0.5,0.5)
  # positive dependence for N1 and N2 all related in same way
  cor.mat <- corM <- rbind(c(1.0, 0.5, 0.5), c(0.5, 1.0, 0.5), c(0.5, 0.5, 1.0))
  rr <- simRecurrentII(200,base1,base4,death.cumhaz=dr,var.z=var.z,cor.mat=cor.mat,dependence=2)
  rr <- count.history(rr,types=1:2)
  coo <- covarianceRecurrent(rr,1,2,status="status",start="entry",stop="time")
  par(mfrow=c(1,3))
  plot(coo,main="Scenario III")
  • Scenario IV ρ = (−0.4, 0.5, 0.5) Negative dependence among survivors and positive dependence on terminal event
  var.z <- c(0.5,0.5,0.5)
  # negative dependence for N1 and N2 all related in same way
  cor.mat <- corM <- rbind(c(1.0, -0.4, 0.5), c(-0.4, 1.0, 0.5), c(0.5, 0.5, 1.0))
  rr <- simRecurrentII(200,base1,base4,death.cumhaz=dr,var.z=var.z,cor.mat=cor.mat,dependence=2)
  rr <- count.history(rr,types=1:2)
  coo <- covarianceRecurrent(rr,1,2,status="status",start="entry",stop="time")
  par(mfrow=c(1,3))
  plot(coo,main="Scenario IV")

SessionInfo

sessionInfo()
#> R version 4.4.2 (2024-10-31)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.1 LTS
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> time zone: Etc/UTC
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] splines   stats     graphics  grDevices utils     datasets  methods  
#> [8] base     
#> 
#> other attached packages:
#> [1] ggplot2_3.5.1  cowplot_1.1.3  mets_1.3.4     timereg_2.0.6  survival_3.7-0
#> [6] rmarkdown_2.29
#> 
#> loaded via a namespace (and not attached):
#>  [1] sass_0.4.9          utf8_1.2.4          future_1.34.0      
#>  [4] lattice_0.22-6      listenv_0.9.1       digest_0.6.37      
#>  [7] magrittr_2.0.3      evaluate_1.0.1      grid_4.4.2         
#> [10] mvtnorm_1.3-2       fastmap_1.2.0       jsonlite_1.8.9     
#> [13] Matrix_1.7-1        fansi_1.0.6         scales_1.3.0       
#> [16] isoband_0.2.7       codetools_0.2-20    numDeriv_2016.8-1.1
#> [19] jquerylib_0.1.4     lava_1.8.0          cli_3.6.3          
#> [22] rlang_1.1.4         parallelly_1.40.1   future.apply_1.11.3
#> [25] munsell_0.5.1       withr_3.0.2         cachem_1.1.0       
#> [28] yaml_2.3.10         tools_4.4.2         parallel_4.4.2     
#> [31] ucminf_1.2.2        colorspace_2.1-1    globals_0.16.3     
#> [34] buildtools_1.0.0    vctrs_0.6.5         R6_2.5.1           
#> [37] lifecycle_1.0.4     MASS_7.3-61         pkgconfig_2.0.3    
#> [40] bslib_0.8.0         pillar_1.9.0        gtable_0.3.6       
#> [43] glue_1.8.0          Rcpp_1.0.13-1       xfun_0.49          
#> [46] tibble_3.2.1        sys_3.4.3           knitr_1.49         
#> [49] farver_2.1.2        htmltools_0.5.8.1   labeling_0.4.3     
#> [52] maketools_1.3.1     compiler_4.4.2