Title: | Estimate Inverse Probability Weights |
---|---|
Description: | Functions to estimate the probability to receive the observed treatment, based on individual characteristics. The inverse of these probabilities can be used as weights when estimating causal effects from observational data via marginal structural models. Both point treatment situations and longitudinal studies can be analysed. The same functions can be used to correct for informative censoring. |
Authors: | Willem M. van der Wal [aut, cre], Ronald B. Geskus [aut] (maintainer 2011-2022) |
Maintainer: | Willem M. van der Wal <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.2.1 |
Built: | 2025-01-05 06:55:59 UTC |
Source: | CRAN |
Simulated dataset. Baseline data of 386 HIV positive individuals, including time of first active tuberculosis, time of death, individual end time. Time varying CD4 measurements of these patients are included in dataset timedat
.
data(basdat)
data(basdat)
A data frame with 386 observations on the following 4 variables.
id
patient ID.
Ttb
time of first active tuberculosis, measured in days since HIV seroconversion.
Tdeath
time of death, measured in days since HIV seroconversion.
Tend
individual end time (either death or censoring), measured in days since HIV seroconversion.
These simulated data are used together with data in timedat
in a detailed causal modelling example using inverse probability weighting (IPW). See ipwtm
for the example. Data were simulated using the algorithm described in Van der Wal e.a. (2009).
Willem M. van der Wal [email protected], Ronald B. Geskus [email protected]
Van der Wal W.M. & Geskus R.B. (2011). ipw: An R Package for Inverse Probability Weighting. Journal of Statistical Software, 43(13), 1-23. doi:10.18637/jss.v043.i13.
Van der Wal W.M., Prins M., Lumbreras B. & Geskus R.B. (2009). A simple G-computation algorithm to quantify the causal effect of a secondary illness on the progression of a chronic disease. Statistics in Medicine, 28(18), 2325-2337.
basdat
, haartdat
, ipwplot
, ipwpoint
, ipwtm
, timedat
, tstartfun
.
#see ?ipwtm for example
#see ?ipwtm for example
Survival data measured in 1200 HIV positive patients. Start of follow-up is HIV seroconversion. Each row corresponds to a 100 day interval of follow-up time, using the counting process notation. Patients can initiate HAART therapy. CD4 count is a confounder for the effect of HAART on mortality.
data(haartdat)
data(haartdat)
patient
patient ID
tstart
starting time for each interval of follow-up, measured in days since HIV seroconversion
fuptime
end time for each interval of follow-up, measured in days since HIV seroconversion
haartind
indicator for the initiation of HAART therapy at the end of the interval (0=HAART not initiated/1=HAART initiated).
event
indicator for death at the end of the interval (0=alive/1=died)
sex
sex (0=male/1=female)
age
age at the start of follow-up (years)
cd4.sqrt
square root of CD4 count, measured at fuptime, before haartind
These data were simulated.
Patients can initiate HAART at fuptime=0
. Therefore, to allow the fitting of a model predicting initiation of HAART, starting time for the first interval within each patient is negative (-100).
Willem M. van der Wal [email protected], Ronald B. Geskus [email protected]
Van der Wal W.M. & Geskus R.B. (2011). ipw: An R Package for Inverse Probability Weighting. Journal of Statistical Software, 43(13), 1-23. doi:10.18637/jss.v043.i13.
basdat
, haartdat
, ipwplot
, ipwpoint
, ipwtm
, timedat
, tstartfun
.
#see ?ipwtm for example
#see ?ipwtm for example
A simulated dataset containing IQ, income and health score measurements in 1000 individuals.
data(healthdat)
data(healthdat)
A data frame with 1000 rows, with each row corresponding to a separate individual. The following variables are included:
id
individual ID.
iq
IQ score.
income
gross monthly income (EUR).
health
health score (0-100).
In these simulated data, IQ is a confounder for the effect of income on health.
Willem M. van der Wal [email protected], Ronald B. Geskus [email protected]
Van der Wal W.M. & Geskus R.B. (2011). ipw: An R Package for Inverse Probability Weighting. Journal of Statistical Software, 43(13), 1-23. doi:10.18637/jss.v043.i13.
basdat
, haartdat
, healthdat
, ipwplot
, ipwpoint
, ipwtm
, timedat
, tstartfun
.
#see ?ipwpoint for example
#see ?ipwpoint for example
For time varying weights: display boxplots within strata of follow-up time. For point treatment weights: display density plot.
ipwplot(weights, timevar = NULL, binwidth = NULL, logscale = TRUE, xlab = NULL, ylab = NULL, main = "", ref = TRUE, ...)
ipwplot(weights, timevar = NULL, binwidth = NULL, logscale = TRUE, xlab = NULL, ylab = NULL, main = "", ref = TRUE, ...)
weights |
numerical vector of inverse probability weights to plot. |
timevar |
numerical vector representing follow-up time. When specified, boxplots within strata of follow-up time are displayed. When left unspecified, a density plot is displayed. |
binwidth |
numerical value indicating the width of the intervals of follow-up time; for each interval a boxplot is made. Ignored when |
logscale |
logical value. If |
xlab |
label for the horizontal axis. |
ylab |
label for the vertical axis. |
main |
main title for the plot. |
ref |
logical value. If |
... |
additional arguments passed to |
A plot is displayed.
Willem M. van der Wal [email protected], Ronald B. Geskus [email protected]
Van der Wal W.M. & Geskus R.B. (2011). ipw: An R Package for Inverse Probability Weighting. Journal of Statistical Software, 43(13), 1-23. doi:10.18637/jss.v043.i13.
basdat
, haartdat
, ipwplot
, ipwpoint
, ipwtm
, timedat
, tstartfun
.
#see ?ipwpoint and ?ipwtm for examples
#see ?ipwpoint and ?ipwtm for examples
Estimate inverse probability weights to fit marginal structural models in a point treatment situation. The exposure for which we want to estimate the causal effect can be binomial, multinomial, ordinal or continuous. Both stabilized and unstabilized weights can be estimated.
ipwpoint(exposure, family, link, numerator = NULL, denominator, data, trunc = NULL, ...)
ipwpoint(exposure, family, link, numerator = NULL, denominator, data, trunc = NULL, ...)
exposure |
a vector, representing the exposure variable of interest. Both numerical and categorical variables can be used. A binomial exposure variable should be coded using values |
family |
is used to specify a family of link functions, used to model the relationship between the variables in |
link |
specifies the link function between the variables in |
numerator |
is a formula, specifying the right-hand side of the model used to estimate the elements in the numerator of the inverse probability weights. When left unspecified, unstabilized weights with a numerator of 1 are estimated. |
denominator |
is a formula, specifying the right-hand side of the model used to estimate the elements in the denominator of the inverse probability weights. This typically includes the variables specified in the numerator model, as well as confounders for which to correct. |
data |
is a dataframe containing |
trunc |
optional truncation percentile (0-0.5). E.g. when |
... |
are further arguments passed to the function that is used to estimate the numerator and denominator models (the function is chosen using |
For each unit under observation, this function computes an inverse probability weight, which is the ratio of two probabilities:
the numerator contains the probability of the observed exposure level given observed values of stabilization factors (usually a set of baseline covariates). These probabilities are estimated using the model regressing exposure
on the terms in numerator
, using the link function indicated by family
and link
.
the denominator contains the probability of the observed exposure level given the observed values of a set of confounders, as well as the stabilization factors in the numerator. These probabilities are estimated using the model regressing exposure
on the terms in denominator
, using the link function indicated by family
and link
.
When the models from which the elements in the numerator and denominator are predicted are correctly specified, and there is no unmeasured confounding, weighting the observations by the inverse probability weights adjusts for confounding of the effect of the exposure of interest. On the weighted dataset a marginal structural model can then be fitted, quantifying the causal effect of the exposure on the outcome of interest.
With numerator
specified, stabilized weights are computed, otherwise unstabilized weighs with a numerator of 1 are computed. With a continuous exposure, using family = "gaussian"
, weights are computed using the ratio of predicted densities. Therefore, for family = "gaussian"
only stabilized weights can be used, since unstabilized weights would have infinity variance.
A list containing the following elements:
ipw.weights |
is a vector containing inverse probability weights for each unit under observation. This vector is returned in the same order as the measurements contained in |
weights.trunc |
is a vector containing truncated inverse probability weights for each unit under observation. This vector is only returned when |
call |
is the original function call to |
num.mod |
is the numerator model, only returned when |
den.mod |
is the denominator model. |
Currently, the exposure
variable and the variables used in numerator
and denominator
should not contain missing values.
Willem M. van der Wal [email protected], Ronald B. Geskus [email protected]
Cole, S.R. & Hernán, M.A. (2008). Constructing inverse probability weights for marginal structural models. American Journal of Epidemiology, 168(6), 656-664.
Robins, J.M., Hernán, M.A. & Brumback, B.A. (2000). Marginal structural models and causal inference in epidemiology. Epidemiology, 11, 550-560.
Van der Wal W.M. & Geskus R.B. (2011). ipw: An R Package for Inverse Probability Weighting. Journal of Statistical Software, 43(13), 1-23. doi:10.18637/jss.v043.i13.
basdat
, haartdat
, ipwplot
, ipwpoint
, ipwtm
, timedat
, tstartfun
.
#Simulate data with continuous confounder and outcome, binomial exposure. #Marginal causal effect of exposure on outcome: 10. n <- 1000 simdat <- data.frame(l = rnorm(n, 10, 5)) a.lin <- simdat$l - 10 pa <- exp(a.lin)/(1 + exp(a.lin)) simdat$a <- rbinom(n, 1, prob = pa) simdat$y <- 10*simdat$a + 0.5*simdat$l + rnorm(n, -10, 5) simdat[1:5,] #Estimate ipw weights. temp <- ipwpoint( exposure = a, family = "binomial", link = "logit", numerator = ~ 1, denominator = ~ l, data = simdat) summary(temp$ipw.weights) #Plot inverse probability weights graphics.off() ipwplot(weights = temp$ipw.weights, logscale = FALSE, main = "Stabilized weights", xlim = c(0, 8)) #Examine numerator and denominator models. summary(temp$num.mod) summary(temp$den.mod) #Paste inverse probability weights simdat$sw <- temp$ipw.weights #Marginal structural model for the causal effect of a on y #corrected for confounding by l using inverse probability weighting #with robust standard error from the survey package. require("survey") msm <- (svyglm(y ~ a, design = svydesign(~ 1, weights = ~ sw, data = simdat))) coef(msm) confint(msm) ## Not run: #Compute basic bootstrap confidence interval . #require(boot) #boot.fun <- function(dat, index){ # coef(glm( # formula = y ~ a, # data = dat[index,], # weights = ipwpoint( # exposure = a, # family = "gaussian", # numerator = ~ 1, # denominator = ~ l, # data = dat[index,])$ipw.weights))[2] # } #bootres <- boot(simdat, boot.fun, 499);bootres #boot.ci(bootres, type = "basic") ## End(Not run)
#Simulate data with continuous confounder and outcome, binomial exposure. #Marginal causal effect of exposure on outcome: 10. n <- 1000 simdat <- data.frame(l = rnorm(n, 10, 5)) a.lin <- simdat$l - 10 pa <- exp(a.lin)/(1 + exp(a.lin)) simdat$a <- rbinom(n, 1, prob = pa) simdat$y <- 10*simdat$a + 0.5*simdat$l + rnorm(n, -10, 5) simdat[1:5,] #Estimate ipw weights. temp <- ipwpoint( exposure = a, family = "binomial", link = "logit", numerator = ~ 1, denominator = ~ l, data = simdat) summary(temp$ipw.weights) #Plot inverse probability weights graphics.off() ipwplot(weights = temp$ipw.weights, logscale = FALSE, main = "Stabilized weights", xlim = c(0, 8)) #Examine numerator and denominator models. summary(temp$num.mod) summary(temp$den.mod) #Paste inverse probability weights simdat$sw <- temp$ipw.weights #Marginal structural model for the causal effect of a on y #corrected for confounding by l using inverse probability weighting #with robust standard error from the survey package. require("survey") msm <- (svyglm(y ~ a, design = svydesign(~ 1, weights = ~ sw, data = simdat))) coef(msm) confint(msm) ## Not run: #Compute basic bootstrap confidence interval . #require(boot) #boot.fun <- function(dat, index){ # coef(glm( # formula = y ~ a, # data = dat[index,], # weights = ipwpoint( # exposure = a, # family = "gaussian", # numerator = ~ 1, # denominator = ~ l, # data = dat[index,])$ipw.weights))[2] # } #bootres <- boot(simdat, boot.fun, 499);bootres #boot.ci(bootres, type = "basic") ## End(Not run)
Estimate inverse probability weights to fit marginal structural models, with a time-varying exposure and time-varying confounders. Within each unit under observation this function computes inverse probability weights at each time point during follow-up. The exposure can be binomial, multinomial, ordinal or continuous. Both stabilized and unstabilized weights can be estimated.
ipwtm(exposure, family, link, numerator = NULL, denominator, id, tstart, timevar, type, data, corstr = "ar1", trunc = NULL, ...)
ipwtm(exposure, family, link, numerator = NULL, denominator, id, tstart, timevar, type, data, corstr = "ar1", trunc = NULL, ...)
exposure |
vector, representing the exposure of interest. Both numerical and categorical variables can be used. A binomial exposure variable should be coded using values |
family |
specifies a family of link functions, used to model the relationship between the variables in |
link |
specifies the specific link function between the variables in |
numerator |
is a formula, specifying the right-hand side of the model used to estimate the elements in the numerator of the inverse probability weights. When left unspecified, unstabilized weights with a numerator of 1 are estimated. |
denominator |
is a formula, specifying the right-hand side of the model used to estimate the elements in the denominator of the inverse probability weights. |
id |
vector, uniquely identifying the units under observation (typically patients) within which the longitudinal measurements are taken. |
tstart |
numerical vector, representing the starting time of follow-up intervals, using the counting process notation. This argument is only needed when |
timevar |
numerical vector, representing follow-up time, starting at |
type |
specifies the type of exposure. Alternatives are |
data |
dataframe containing |
corstr |
correlation structure, only needed when using |
trunc |
optional truncation percentile (0-0.5). E.g. when |
... |
are further arguments passed to the function that is used to estimate the numerator and denominator models (the function is chosen using |
Within each unit under observation i (usually patients), this function computes inverse probability weights at each time point j during follow-up. These weights are the cumulative product over all previous time points up to j of the ratio of two probabilities:
the numerator contains at each time point the probability of the observed exposure level given observed values of stabilization factors and the observed exposure history up to the time point before j. These probabilities are estimated using the model regressing exposure
on the terms in numerator
, using the link function indicated by family
and link
.
the denominator contains at each time point the probability of the observed exposure level given the observed history of time varying confounders up to j, as well as the stabilization factors in the numerator and the observed exposure history up to the time point before j. These probabilities are estimated using the model regressing exposure
on the terms in denominator
, using the link function indicated by family
and link
.
When the models from which the elements in the numerator and denominator are predicted are correctly specified, and there is no unmeasured confounding, weighting observations ij by the inverse probability weights adjusts for confounding of the effect of the exposure of interest. On the weighted dataset a marginal structural model can then be fitted, quantifying the causal effect of the exposure on the outcome of interest.
With numerator
specified, stabilized weights are computed, otherwise unstabilized weights with a numerator of 1 are computed. With a continuous exposure, using family = "gaussian"
, weights are computed using the ratio of predicted densities at each time point. Therefore, for family = "gaussian"
only stabilized weights can be used, since unstabilized weights would have infinity variance.
A list containing the following elements:
ipw.weights |
vector containing inverse probability weights for each observation. Returned in the same order as the observations in |
weights.trunc |
vector containing truncated inverse probability weights, only returned when |
call |
the original function call. |
selvar |
selection variable. With |
num.mod |
the numerator model, only returned when |
den.mod |
the denominator model. |
Currently, the exposure
variable and the variables used in numerator
and denominator
, id
, tstart
and timevar
should not contain missing values.
Willem M. van der Wal [email protected], Ronald B. Geskus [email protected]
Cole, S.R. & Hernán, M.A. (2008). Constructing inverse probability weights for marginal structural models. American Journal of Epidemiology, 168(6), 656-664. https://pubmed.ncbi.nlm.nih.gov:443/18682488/.
Robins, J.M., Hernán, M.A. & Brumback, B.A. (2000). Marginal structural models and causal inference in epidemiology. Epidemiology, 11, 550-560. https://pubmed.ncbi.nlm.nih.gov/10955408/.
Van der Wal W.M. & Geskus R.B. (2011). ipw: An R Package for Inverse Probability Weighting. Journal of Statistical Software, 43(13), 1-23. doi:10.18637/jss.v043.i13.
basdat
, haartdat
, ipwplot
, ipwpoint
, ipwtm
, timedat
, tstartfun
.
######################################################################## #EXAMPLE 1 #Load longitudinal data from HIV positive individuals. data(haartdat) #CD4 is confounder for the effect of initiation of HAART therapy on mortality. #Estimate inverse probability weights to correct for confounding. #Exposure allocation model is Cox proportional hazards model. temp <- ipwtm( exposure = haartind, family = "survival", numerator = ~ sex + age, denominator = ~ sex + age + cd4.sqrt, id = patient, tstart = tstart, timevar = fuptime, type = "first", data = haartdat) #plot inverse probability weights graphics.off() ipwplot(weights = temp$ipw.weights, timevar = haartdat$fuptime, binwidth = 100, ylim = c(-1.5, 1.5), main = "Stabilized inverse probability weights") #CD4 count has an effect both on dropout and mortality, which causes informative censoring. #Use inverse probability of censoring weighting to correct for effect of CD4 on dropout. #Use Cox proportional hazards model for dropout. temp2 <- ipwtm( exposure = dropout, family = "survival", numerator = ~ sex + age, denominator = ~ sex + age + cd4.sqrt, id = patient, tstart = tstart, timevar = fuptime, type = "cens", data = haartdat) #plot inverse probability of censoring weights graphics.off() ipwplot(weights = temp2$ipw.weights, timevar = haartdat$fuptime, binwidth = 100, ylim = c(-1.5, 1.5), main = "Stabilized inverse probability of censoring weights") #MSM for the causal effect of initiation of HAART on mortality. #Corrected both for confounding and informative censoring. #With robust standard error obtained using cluster(). require(survival) summary(coxph(Surv(tstart, fuptime, event) ~ haartind + cluster(patient), data = haartdat, weights = temp$ipw.weights*temp2$ipw.weights)) #uncorrected model summary(coxph(Surv(tstart, fuptime, event) ~ haartind, data = haartdat)) ######################################################################## #EXAMPLE 2 data(basdat) data(timedat) #Aim: to model the causal effect of active tuberculosis (TB) on mortality. #Longitudinal CD4 is a confounder as well as intermediate for the effect of TB. #process original measurements #check for ties (not allowed) table(duplicated(timedat[,c("id", "fuptime")])) #take square root of CD4 because of skewness timedat$cd4.sqrt <- sqrt(timedat$cd4count) #add TB time to dataframe timedat <- merge(timedat, basdat[,c("id", "Ttb")], by = "id", all.x = TRUE) #compute TB status timedat$tb.lag <- ifelse(with(timedat, !is.na(Ttb) & fuptime > Ttb), 1, 0) #longitudinal CD4-model require(nlme) cd4.lme <- lme(cd4.sqrt ~ fuptime + tb.lag, random = ~ fuptime | id, data = timedat) #build new dataset: #rows corresponding to TB-status switches, and individual end times times <- sort(unique(c(basdat$Ttb, basdat$Tend))) startstop <- data.frame( id = rep(basdat$id, each = length(times)), fuptime = rep(times, nrow(basdat))) #add baseline data to dataframe startstop <- merge(startstop, basdat, by = "id", all.x = TRUE) #limit individual follow-up using Tend startstop <- startstop[with(startstop, fuptime <= Tend),] startstop$tstart <- tstartfun(id, fuptime, startstop) #compute tstart (?tstartfun) #indicate TB status startstop$tb <- ifelse(with(startstop, !is.na(Ttb) & fuptime >= Ttb), 1, 0) #indicate TB status at previous time point startstop$tb.lag <- ifelse(with(startstop, !is.na(Ttb) & fuptime > Ttb), 1, 0) #indicate death startstop$event <- ifelse(with(startstop, !is.na(Tdeath) & fuptime >= Tdeath), 1, 0) #impute CD4, based on TB status at previous time point. startstop$cd4.sqrt <- predict(cd4.lme, newdata = data.frame(id = startstop$id, fuptime = startstop$fuptime, tb.lag = startstop$tb.lag)) #compute inverse probability weights temp <- ipwtm( exposure = tb, family = "survival", numerator = ~ 1, denominator = ~ cd4.sqrt, id = id, tstart = tstart, timevar = fuptime, type = "first", data = startstop) summary(temp$ipw.weights) ipwplot(weights = temp$ipw.weights, timevar = startstop$fuptime, binwidth = 100) #models #IPW-fitted MSM, using cluster() to obtain robust standard error estimate require(survival) summary(coxph(Surv(tstart, fuptime, event) ~ tb + cluster(id), data = startstop, weights = temp$ipw.weights)) #unadjusted summary(coxph(Surv(tstart, fuptime, event) ~ tb, data = startstop)) #adjusted using conditioning: part of the effect of TB is adjusted away summary(coxph(Surv(tstart, fuptime, event) ~ tb + cd4.sqrt, data = startstop)) ## Not run: #compute bootstrap CI for TB parameter (takes a few hours) #taking into account the uncertainty introduced by modelling longitudinal CD4 #taking into account the uncertainty introduced by estimating the inverse probability weights #robust with regard to weights unequal to 1 # require(boot) # boot.fun <- function(data, index, data.tm){ # data.samp <- data[index,] # data.samp$id.samp <- 1:nrow(data.samp) # data.tm.samp <- do.call("rbind", lapply(data.samp$id.samp, function(id.samp) { # cbind(data.tm[data.tm$id == data.samp$id[data.samp$id.samp == id.samp],], # id.samp = id.samp) # } # )) # cd4.lme <- lme(cd4.sqrt ~ fuptime + tb.lag, random = ~ fuptime | id.samp, data = data.tm.samp) # times <- sort(unique(c(data.samp$Ttb, data.samp$Tend))) # startstop.samp <- data.frame(id.samp = rep(data.samp$id.samp, each = length(times)), # fuptime = rep(times, nrow(data.samp))) # startstop.samp <- merge(startstop.samp, data.samp, by = "id.samp", all.x = TRUE) # startstop.samp <- startstop.samp[with(startstop.samp, fuptime <= Tend),] # startstop.samp$tstart <- tstartfun(id.samp, fuptime, startstop.samp) # startstop.samp$tb <- ifelse(with(startstop.samp, !is.na(Ttb) & fuptime >= Ttb), 1, 0) # startstop.samp$tb.lag <- ifelse(with(startstop.samp, !is.na(Ttb) & fuptime > Ttb), 1, 0) # startstop.samp$event <- ifelse(with(startstop.samp, !is.na(Tdeath) & fuptime >= Tdeath), 1, 0) # startstop.samp$cd4.sqrt <- predict(cd4.lme, newdata = data.frame(id.samp = # startstop.samp$id.samp, fuptime = startstop.samp$fuptime, tb.lag = startstop.samp$tb.lag)) # # return(coef(coxph(Surv(tstart, fuptime, event) ~ tb, data = startstop.samp, # weights = ipwtm( # exposure = tb, # family = "survival", # numerator = ~ 1, # denominator = ~ cd4.sqrt, # id = id.samp, # tstart = tstart, # timevar = fuptime, # type = "first", # data = startstop.samp)$ipw.weights))[1]) # } # bootres <- boot(data = basdat, statistic = boot.fun, R = 999, data.tm = timedat) # bootres # boot.ci(bootres, type = "basic") # ## End(Not run)
######################################################################## #EXAMPLE 1 #Load longitudinal data from HIV positive individuals. data(haartdat) #CD4 is confounder for the effect of initiation of HAART therapy on mortality. #Estimate inverse probability weights to correct for confounding. #Exposure allocation model is Cox proportional hazards model. temp <- ipwtm( exposure = haartind, family = "survival", numerator = ~ sex + age, denominator = ~ sex + age + cd4.sqrt, id = patient, tstart = tstart, timevar = fuptime, type = "first", data = haartdat) #plot inverse probability weights graphics.off() ipwplot(weights = temp$ipw.weights, timevar = haartdat$fuptime, binwidth = 100, ylim = c(-1.5, 1.5), main = "Stabilized inverse probability weights") #CD4 count has an effect both on dropout and mortality, which causes informative censoring. #Use inverse probability of censoring weighting to correct for effect of CD4 on dropout. #Use Cox proportional hazards model for dropout. temp2 <- ipwtm( exposure = dropout, family = "survival", numerator = ~ sex + age, denominator = ~ sex + age + cd4.sqrt, id = patient, tstart = tstart, timevar = fuptime, type = "cens", data = haartdat) #plot inverse probability of censoring weights graphics.off() ipwplot(weights = temp2$ipw.weights, timevar = haartdat$fuptime, binwidth = 100, ylim = c(-1.5, 1.5), main = "Stabilized inverse probability of censoring weights") #MSM for the causal effect of initiation of HAART on mortality. #Corrected both for confounding and informative censoring. #With robust standard error obtained using cluster(). require(survival) summary(coxph(Surv(tstart, fuptime, event) ~ haartind + cluster(patient), data = haartdat, weights = temp$ipw.weights*temp2$ipw.weights)) #uncorrected model summary(coxph(Surv(tstart, fuptime, event) ~ haartind, data = haartdat)) ######################################################################## #EXAMPLE 2 data(basdat) data(timedat) #Aim: to model the causal effect of active tuberculosis (TB) on mortality. #Longitudinal CD4 is a confounder as well as intermediate for the effect of TB. #process original measurements #check for ties (not allowed) table(duplicated(timedat[,c("id", "fuptime")])) #take square root of CD4 because of skewness timedat$cd4.sqrt <- sqrt(timedat$cd4count) #add TB time to dataframe timedat <- merge(timedat, basdat[,c("id", "Ttb")], by = "id", all.x = TRUE) #compute TB status timedat$tb.lag <- ifelse(with(timedat, !is.na(Ttb) & fuptime > Ttb), 1, 0) #longitudinal CD4-model require(nlme) cd4.lme <- lme(cd4.sqrt ~ fuptime + tb.lag, random = ~ fuptime | id, data = timedat) #build new dataset: #rows corresponding to TB-status switches, and individual end times times <- sort(unique(c(basdat$Ttb, basdat$Tend))) startstop <- data.frame( id = rep(basdat$id, each = length(times)), fuptime = rep(times, nrow(basdat))) #add baseline data to dataframe startstop <- merge(startstop, basdat, by = "id", all.x = TRUE) #limit individual follow-up using Tend startstop <- startstop[with(startstop, fuptime <= Tend),] startstop$tstart <- tstartfun(id, fuptime, startstop) #compute tstart (?tstartfun) #indicate TB status startstop$tb <- ifelse(with(startstop, !is.na(Ttb) & fuptime >= Ttb), 1, 0) #indicate TB status at previous time point startstop$tb.lag <- ifelse(with(startstop, !is.na(Ttb) & fuptime > Ttb), 1, 0) #indicate death startstop$event <- ifelse(with(startstop, !is.na(Tdeath) & fuptime >= Tdeath), 1, 0) #impute CD4, based on TB status at previous time point. startstop$cd4.sqrt <- predict(cd4.lme, newdata = data.frame(id = startstop$id, fuptime = startstop$fuptime, tb.lag = startstop$tb.lag)) #compute inverse probability weights temp <- ipwtm( exposure = tb, family = "survival", numerator = ~ 1, denominator = ~ cd4.sqrt, id = id, tstart = tstart, timevar = fuptime, type = "first", data = startstop) summary(temp$ipw.weights) ipwplot(weights = temp$ipw.weights, timevar = startstop$fuptime, binwidth = 100) #models #IPW-fitted MSM, using cluster() to obtain robust standard error estimate require(survival) summary(coxph(Surv(tstart, fuptime, event) ~ tb + cluster(id), data = startstop, weights = temp$ipw.weights)) #unadjusted summary(coxph(Surv(tstart, fuptime, event) ~ tb, data = startstop)) #adjusted using conditioning: part of the effect of TB is adjusted away summary(coxph(Surv(tstart, fuptime, event) ~ tb + cd4.sqrt, data = startstop)) ## Not run: #compute bootstrap CI for TB parameter (takes a few hours) #taking into account the uncertainty introduced by modelling longitudinal CD4 #taking into account the uncertainty introduced by estimating the inverse probability weights #robust with regard to weights unequal to 1 # require(boot) # boot.fun <- function(data, index, data.tm){ # data.samp <- data[index,] # data.samp$id.samp <- 1:nrow(data.samp) # data.tm.samp <- do.call("rbind", lapply(data.samp$id.samp, function(id.samp) { # cbind(data.tm[data.tm$id == data.samp$id[data.samp$id.samp == id.samp],], # id.samp = id.samp) # } # )) # cd4.lme <- lme(cd4.sqrt ~ fuptime + tb.lag, random = ~ fuptime | id.samp, data = data.tm.samp) # times <- sort(unique(c(data.samp$Ttb, data.samp$Tend))) # startstop.samp <- data.frame(id.samp = rep(data.samp$id.samp, each = length(times)), # fuptime = rep(times, nrow(data.samp))) # startstop.samp <- merge(startstop.samp, data.samp, by = "id.samp", all.x = TRUE) # startstop.samp <- startstop.samp[with(startstop.samp, fuptime <= Tend),] # startstop.samp$tstart <- tstartfun(id.samp, fuptime, startstop.samp) # startstop.samp$tb <- ifelse(with(startstop.samp, !is.na(Ttb) & fuptime >= Ttb), 1, 0) # startstop.samp$tb.lag <- ifelse(with(startstop.samp, !is.na(Ttb) & fuptime > Ttb), 1, 0) # startstop.samp$event <- ifelse(with(startstop.samp, !is.na(Tdeath) & fuptime >= Tdeath), 1, 0) # startstop.samp$cd4.sqrt <- predict(cd4.lme, newdata = data.frame(id.samp = # startstop.samp$id.samp, fuptime = startstop.samp$fuptime, tb.lag = startstop.samp$tb.lag)) # # return(coef(coxph(Surv(tstart, fuptime, event) ~ tb, data = startstop.samp, # weights = ipwtm( # exposure = tb, # family = "survival", # numerator = ~ 1, # denominator = ~ cd4.sqrt, # id = id.samp, # tstart = tstart, # timevar = fuptime, # type = "first", # data = startstop.samp)$ipw.weights))[1]) # } # bootres <- boot(data = basdat, statistic = boot.fun, R = 999, data.tm = timedat) # bootres # boot.ci(bootres, type = "basic") # ## End(Not run)
Simulated dataset. Time varying CD4 measurements of 386 HIV positive individuals. Time of first active tuberculosis, time of death and individual end time of the patients are included in dataset basdat
.
data(timedat)
data(timedat)
A data frame with 6291 observations on the following 3 variables.
id
patient ID.
fuptime
follow-up time (days since HIV seroconversion).
cd4count
CD4 count measured at fuptime.
These simulated data are used together with data in basdat
in a detailed causal modelling example using inverse probability weighting (IPW). See ipwtm
for the example. Data were simulated using the algorithm described in Van der Wal e.a. (2009).
Willem M. van der Wal [email protected], Ronald B. Geskus [email protected]
Cole, S.R. & Hernán, M.A. (2008). Constructing inverse probability weights for marginal structural models. American Journal of Epidemiology, 168(6), 656-664.
Robins, J.M., Hernán, M.A. & Brumback, B.A. (2000). Marginal structural models and causal inference in epidemiology. Epidemiology, 11, 550-560.
Van der Wal W.M. & Geskus R.B. (2011). ipw: An R Package for Inverse Probability Weighting. Journal of Statistical Software, 43(13), 1-23. doi:10.18637/jss.v043.i13.
Van der Wal W.M., Prins M., Lumbreras B. & Geskus R.B. (2009). A simple G-computation algorithm to quantify the causal effect of a secondary illness on the progression of a chronic disease. Statistics in Medicine, 28(18), 2325-2337.
basdat
, haartdat
, ipwplot
, ipwpoint
, ipwtm
, timedat
, tstartfun
.
#See ?ipwtm for example
#See ?ipwtm for example
Function to compute starting time for intervals of follow-up, when using the counting process notation. Within each unit under observation (usually individuals), computes starting time equal to:
time of previous record when there is a previous record.
-1 for first record.
tstartfun(id, timevar, data)
tstartfun(id, timevar, data)
id |
numerical vector, uniquely identifying the units under observation, within which the longitudinal measurements are taken. |
timevar |
numerical vector, representing follow-up time, starting at 0. |
data |
dataframe containing |
Numerical vector containing starting time for each record. In the same order as the records in data
, to facilitate merging.
Currently, id
and timevar
should not contain missing values.
Willem M. van der Wal [email protected], Ronald B. Geskus [email protected]
Van der Wal W.M. & Geskus R.B. (2011). ipw: An R Package for Inverse Probability Weighting. Journal of Statistical Software, 43(13), 1-23. doi:10.18637/jss.v043.i13.
basdat
, haartdat
, ipwplot
, ipwpoint
, ipwtm
, timedat
, tstartfun
.
#data mydata1 <- data.frame( patient = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2), time.days = c(14, 34, 41, 56, 72, 98, 0, 11, 28, 35)) #compute starting time for each interval mydata1$tstart <- tstartfun(patient, time.days, mydata1) #result mydata1 #see also ?ipwtm for example
#data mydata1 <- data.frame( patient = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2), time.days = c(14, 34, 41, 56, 72, 98, 0, 11, 28, 35)) #compute starting time for each interval mydata1$tstart <- tstartfun(patient, time.days, mydata1) #result mydata1 #see also ?ipwtm for example