Title: | Estimating Causal Dose Response Functions |
---|---|
Description: | Functions and data to estimate causal dose response functions given continuous, ordinal, or binary treatments. A description of the methods is given in Galagate (2016) <https://drum.lib.umd.edu/handle/1903/18170>. |
Authors: | Douglas Galagate [cre], Joseph Schafer [aut] |
Maintainer: | Douglas Galagate <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.4.2 |
Built: | 2024-12-01 08:32:57 UTC |
Source: | CRAN |
This function estimates the ADRF with an additive spline estimator described in Bia et al. (2014).
add_spl_est(Y, treat, treat_formula, data, grid_val, knot_num, treat_mod, link_function, ...)
add_spl_est(Y, treat, treat_formula, data, grid_val, knot_num, treat_mod, link_function, ...)
Y |
is the the name of the outcome variable contained in |
treat |
is the name of the treatment variable contained in
|
treat_formula |
an object of class "formula" (or one that can be
coerced to that class) that regresses |
data |
is a dataframe containing |
grid_val |
contains the treatment values to be evaluated. |
knot_num |
is the number of knots used in outcome model |
treat_mod |
a description of the error distribution to be used in the
model for treatment. Options include: |
link_function |
is either "log", "inverse", or "identity" for the
"Gamma" |
... |
additional arguments to be passed to the outcome regression fitting function. |
This function estimates the ADRF using additive splines in the outcome model described in Bia et al. (2014).
add_spl_est
returns an object of class "causaldrf",
a list that contains the following components:
param |
parameter estimates for a add_spl fit. |
t_mod |
the result of the treatment model fit. |
out_mod |
the result of the outcome model fit. |
call |
the matched call. |
Schafer, J.L., Galagate, D.L. (2015). Causal inference with a continuous treatment and outcome: alternative estimators for parametric dose-response models. Manuscript in preparation.
Bia, Michela, et al. (2014). A Stata package for the application of semiparametric estimators of dose response functions. Stata Journal 14.3, 580-604.
nw_est
, iw_est
, hi_est
, gam_est
,
bart_est
, etc. for other estimates.
t_mod
, overlap_fun
to prepare the data
for use in the different estimates.
## Example from Schafer (2015). example_data <- sim_data add_spl_list <- add_spl_est(Y = Y, treat = T, treat_formula = T ~ B.1 + B.2 + B.3 + B.4 + B.5 + B.6 + B.7 + B.8, data = example_data, grid_val = seq(8, 16, by = 1), knot_num = 3, treat_mod = "Normal") sample_index <- sample(1:1000, 100) plot(example_data$T[sample_index], example_data$Y[sample_index], xlab = "T", ylab = "Y", main = "additive spline estimate") lines(seq(8, 16, by = 1), add_spl_list$param, lty = 2, lwd = 2, col = "blue") legend('bottomright', "additive spline estimate", lty=2, lwd = 2, col = "blue", bty='Y', cex=1) rm(example_data, add_spl_list, sample_index) ## See Vignette for more examples.
## Example from Schafer (2015). example_data <- sim_data add_spl_list <- add_spl_est(Y = Y, treat = T, treat_formula = T ~ B.1 + B.2 + B.3 + B.4 + B.5 + B.6 + B.7 + B.8, data = example_data, grid_val = seq(8, 16, by = 1), knot_num = 3, treat_mod = "Normal") sample_index <- sample(1:1000, 100) plot(example_data$T[sample_index], example_data$Y[sample_index], xlab = "T", ylab = "Y", main = "additive spline estimate") lines(seq(8, 16, by = 1), add_spl_list$param, lty = 2, lwd = 2, col = "blue") legend('bottomright', "additive spline estimate", lty=2, lwd = 2, col = "blue", bty='Y', cex=1) rm(example_data, add_spl_list, sample_index) ## See Vignette for more examples.
This method combines the regression estimator with a residual bias correction for estimating a parametric ADRF.
aipwee_est(Y, treat, covar_formula = ~ 1, covar_lin_formula = ~ 1, covar_sq_formula = ~ 1, data, e_treat_1 = NULL, e_treat_2 = NULL, e_treat_3 = NULL, e_treat_4 = NULL, degree = 1, wt = NULL, method = "same", spline_df = NULL, spline_const = 1, spline_linear = 1, spline_quad = 1)
aipwee_est(Y, treat, covar_formula = ~ 1, covar_lin_formula = ~ 1, covar_sq_formula = ~ 1, data, e_treat_1 = NULL, e_treat_2 = NULL, e_treat_3 = NULL, e_treat_4 = NULL, degree = 1, wt = NULL, method = "same", spline_df = NULL, spline_const = 1, spline_linear = 1, spline_quad = 1)
Y |
is the the name of the outcome variable contained in |
treat |
is the name of the treatment variable contained in
|
covar_formula |
is the formula to describe the covariates needed
to estimate the constant term:
|
covar_lin_formula |
is the formula to describe the covariates needed
to estimate the linear term, t:
|
covar_sq_formula |
is the formula to describe the covariates needed
to estimate the quadratic term, t^2:
|
data |
is a dataframe containing |
e_treat_1 |
a vector, representing the conditional expectation of
|
e_treat_2 |
a vector, representing the conditional expectation of
|
e_treat_3 |
a vector, representing the conditional expectation of
|
e_treat_4 |
a vector, representing the conditional expectation of
|
degree |
is 1 for linear and 2 for quadratic outcome model. |
wt |
is weight used in lsfit for outcome regression. Default is wt = NULL. |
method |
is "same" if the same set of covariates are used to estimate the constant, linear, and/or quadratic term. If method = "different", then different sets of covariates can be used to estimate the constant, linear, and/or quadratic term. covar_lin_formula and covar_sq_formula must be specified if method = "different". |
spline_df |
degrees of freedom. The default, spline_df = NULL, corresponds to no knots. |
spline_const |
is the number of spline terms needed to estimate the constant term. |
spline_linear |
is the number of spline terms needed to estimate the linear term. |
spline_quad |
is the number of spline terms needed to estimate the quadratic term. |
This estimator bears a strong
resemblance to general regression estimators in the survey
literature, part of a more general class of calibration
estimators (Deville and Sarndal, 1992). It is
doubly robust, which means that it is consistent if
either of the models is true (Scharfstein, Rotnitzky and Robins
1999). If the Y-model is correct, then the first term in
the previous equation is unbiased for and the second term has mean
zero even if the T-model is wrong. If the Y-model is incorrect, the
first term is biased, but the second term gives a consistent estimate
of (minus one times) the bias from the Y-model if the T-model is
correct.
This function is a doubly-robust estimator that fits an outcome regression model with a bias correction term. For details see Schafer and Galagate (2015).
aipwee_est
returns an object of class "causaldrf_lsfit",
a list that contains the following components:
param |
parameter estimates for a add_spl fit. |
t_mod |
the result of the treatment model fit. |
out_mod |
the result of the outcome model fit. |
call |
the matched call. |
Schafer, J.L., Galagate, D.L. (2015). Causal inference with a continuous treatment and outcome: alternative estimators for parametric dose-response models. Manuscript in preparation.
Schafer, Joseph L, Kang, Joseph (2008). Average causal effects from nonrandomized studies: a practical guide and simulated example. Psychological methods, 13.4, 279.
Robins, James M and Rotnitzky, Andrea (1995). Semiparametric efficiency in multivariate regression models with missing data Journal of the American Statistical Association, 90.429, 122–129.
Scharfstein, Daniel O and Rotnitzky, Andrea and Robins, James M (1999). Adjusting for nonignorable drop-out using semiparametric nonresponse models Journal of the American Statistical Association, 94.448, 1096–1120.
Deville, Jean-Claude and Sarndal, Carl-Erik (1992). Calibration estimators in survey sampling Journal of the American Statistical Association, 87.418, 376–380.
iptw_est
, ismw_est
,
reg_est
, wtrg_est
,
##' etc. for other estimates.
t_mod
, overlap_fun
to prepare the data
for use in the different estimates.
## Example from Schafer (2015). example_data <- sim_data t_mod_list <- t_mod(treat = T, treat_formula = T ~ B.1 + B.2 + B.3 + B.4 + B.5 + B.6 + B.7 + B.8, data = example_data, treat_mod = "Normal") cond_exp_data <- t_mod_list$T_data full_data <- cbind(example_data, cond_exp_data) aipwee_list <- aipwee_est(Y = Y, treat = T, covar_formula = ~ B.1 + B.2 + B.3 + B.4 + B.5 + B.6 + B.7 + B.8, covar_lin_formula = ~ 1, covar_sq_formula = ~ 1, data = example_data, e_treat_1 = full_data$est_treat, e_treat_2 = full_data$est_treat_sq, e_treat_3 = full_data$est_treat_cube, e_treat_4 = full_data$est_treat_quartic, degree = 1, wt = NULL, method = "same", spline_df = NULL, spline_const = 1, spline_linear = 1, spline_quad = 1) sample_index <- sample(1:1000, 100) plot(example_data$T[sample_index], example_data$Y[sample_index], xlab = "T", ylab = "Y", main = "aipwee estimate") abline(aipwee_list$param[1], aipwee_list$param[2], lty = 2, lwd = 2, col = "blue") legend('bottomright', "aipwee estimate", lty = 2, lwd = 2, col = "blue", bty='Y', cex=1) rm(example_data, t_mod_list, cond_exp_data, full_data, aipwee_list, sample_index)
## Example from Schafer (2015). example_data <- sim_data t_mod_list <- t_mod(treat = T, treat_formula = T ~ B.1 + B.2 + B.3 + B.4 + B.5 + B.6 + B.7 + B.8, data = example_data, treat_mod = "Normal") cond_exp_data <- t_mod_list$T_data full_data <- cbind(example_data, cond_exp_data) aipwee_list <- aipwee_est(Y = Y, treat = T, covar_formula = ~ B.1 + B.2 + B.3 + B.4 + B.5 + B.6 + B.7 + B.8, covar_lin_formula = ~ 1, covar_sq_formula = ~ 1, data = example_data, e_treat_1 = full_data$est_treat, e_treat_2 = full_data$est_treat_sq, e_treat_3 = full_data$est_treat_cube, e_treat_4 = full_data$est_treat_quartic, degree = 1, wt = NULL, method = "same", spline_df = NULL, spline_const = 1, spline_linear = 1, spline_quad = 1) sample_index <- sample(1:1000, 100) plot(example_data$T[sample_index], example_data$Y[sample_index], xlab = "T", ylab = "Y", main = "aipwee estimate") abline(aipwee_list$param[1], aipwee_list$param[2], lty = 2, lwd = 2, col = "blue") legend('bottomright', "aipwee estimate", lty = 2, lwd = 2, col = "blue", bty='Y', cex=1) rm(example_data, t_mod_list, cond_exp_data, full_data, aipwee_list, sample_index)
This function estimates the ADRF using Bayesian additive regression trees (BART).
bart_est(Y, treat, outcome_formula, data, grid_val, ...)
bart_est(Y, treat, outcome_formula, data, grid_val, ...)
Y |
is the the name of the outcome variable contained in |
treat |
is the name of the treatment variable contained in
|
outcome_formula |
is the formula used for fitting the outcome surface.
gps is one of the independent variables to use in the outcome_formula. ie.
|
data |
is a dataframe containing |
grid_val |
contains the treatment values to be evaluated. |
... |
additional arguments to be passed to the bart() outcome function. |
BART is a prediction model that is applicable to many settings, one of which is causal inference problems. It is a sum of trees fit, but the influence of each tree is held back by a regularization prior so that each tree only contributes a small amount to the overall fit. Priors are put on the parameters to avoid overfitting the data and so that no single tree has a significant influence on the model fit. For more details see Chipman (2010).
BART does not require fitting a treatment model. Instead, it fits a
response surface to the whole dataset and if the response surface is
correctly specified, then the causal effect estimate is unbiased.
Although most of the focus on BART is for the binary treatment setting,
Hill (2011) also mentions an extension to the continuous or
multidose treatment setting. When using BART in this continuous treatment
setting, Hill (2011) compares the outcomes of units with
treatment level to their outcomes had
.
This method infers the treatment effect of units had they not received
treatment compared to their actual observed treatment. The comparison
is between
and
where
means that the unit is part of the treatment group.
The causal effect is comparing the predicted outcome of units that
received treatment with what their predicted outcome would have been
had they received zero treatment.
This method performs well in simulation studies. One drawback from BART is the amount of computing time needed.
bart_est
returns an object of class "causaldrf_simple",
a list that contains the following components:
param |
parameter estimates for a bart fit. |
out_mod |
the result of the bart fit. |
call |
the matched call. |
Schafer, J.L., Galagate, D.L. (2015). Causal inference with a continuous treatment and outcome: alternative estimators for parametric dose-response models. Manuscript in preparation.
Hill, Jennifer L. (2011). Bayesian nonparametric modeling for causal inference. Journal of Computational and Graphical Statistics 20.1 (2011).
Chipman, Hugh A and George, Edward I and McCulloch, Robert E and others (2010). BART: Bayesian additive regression trees. The Annals of Applied Statistics 4.1, 266–298.
nw_est
, iw_est
, hi_est
, gam_est
,
add_spl_est
, etc. for other estimates.
t_mod
, overlap_fun
to prepare the data
for use in the different estimates.
## Example from Schafer (2015). bart takes a few minutes to run (depending on computer). example_data <- sim_data # This estimate takes a long time to run... bart_list <- bart_est(Y = Y, treat = T, outcome_formula = Y ~ T + B.1 + B.2 + B.3 + B.4 + B.5 + B.6 + B.7 + B.8, data = example_data, grid_val = seq(8, 16, by = 1)) sample_index <- sample(1:1000, 100) plot(example_data$T[sample_index], example_data$Y[sample_index], xlab = "T", ylab = "Y", main = "bart estimate") lines(seq(8, 16, by = 1), bart_list$param, lty = 2, lwd = 2, col = "blue") legend('bottomright', "bart estimate", lty=2, lwd = 2, col = "blue", bty='Y', cex=1) rm(example_data, bart_list, sample_index)
## Example from Schafer (2015). bart takes a few minutes to run (depending on computer). example_data <- sim_data # This estimate takes a long time to run... bart_list <- bart_est(Y = Y, treat = T, outcome_formula = Y ~ T + B.1 + B.2 + B.3 + B.4 + B.5 + B.6 + B.7 + B.8, data = example_data, grid_val = seq(8, 16, by = 1)) sample_index <- sample(1:1000, 100) plot(example_data$T[sample_index], example_data$Y[sample_index], xlab = "T", ylab = "Y", main = "bart estimate") lines(seq(8, 16, by = 1), bart_list$param, lty = 2, lwd = 2, col = "blue") legend('bottomright', "bart estimate", lty=2, lwd = 2, col = "blue", bty='Y', cex=1) rm(example_data, bart_list, sample_index)
This estimates the ADRF using a method similar to that described in Hirano and Imbens (2004), but with spline basis terms in the outcome model.
gam_est(Y, treat, treat_formula, data, grid_val, treat_mod, link_function, ...)
gam_est(Y, treat, treat_formula, data, grid_val, treat_mod, link_function, ...)
Y |
is the the name of the outcome variable contained in |
treat |
is the name of the treatment variable contained in
|
treat_formula |
an object of class "formula" (or one that can be
coerced to that class) that regresses |
data |
is a dataframe containing |
grid_val |
contains the treatment values to be evaluated. |
treat_mod |
a description of the error distribution to be used in the
model for treatment. Options include: |
link_function |
is either "log", "inverse", or "identity" for the
"Gamma" |
... |
additional arguments to be passed to the gam() outcome function. |
This function estimates the ADRF similarly to the method described by Hirano and Imbens (2004), but with a generalized additive model in the outcome model.
gam_est
returns an object of class "causaldrf",
a list that contains the following components:
param |
parameter estimates for a gam fit. |
t_mod |
the result of the treatment model fit. |
out_mod |
the result of the outcome model fit. |
call |
the matched call. |
Schafer, J.L., Galagate, D.L. (2015). Causal inference with a continuous treatment and outcome: alternative estimators for parametric dose-response models. Manuscript in preparation.
Hirano, Keisuke, Imbens, Guido W (2004). The propensity score with continuous treatments. Applied Bayesian modeling and causal inference from incomplete-data perspectives.
Flores, Carlos A and Flores-Lagunes, Alfonso and Gonzalez, Arturo and Neumann, Todd C (2012). Estimating the effects of length of exposure to instruction in a training program: the case of job corps. Review of Economics and Statistics. 94.1, 153-171
nw_est
, iw_est
, hi_est
, gam_est
,
add_spl_est
,
bart_est
, etc. for other estimates.
t_mod
, overlap_fun
to prepare the data
for use in the different estimates.
## Example from Schafer (2015). example_data <- sim_data gam_list <- gam_est(Y = Y, treat = T, treat_formula = T ~ B.1 + B.2 + B.3 + B.4 + B.5 + B.6 + B.7 + B.8, data = example_data, grid_val = seq(8, 16, by = 1), treat_mod = "Normal") sample_index <- sample(1:1000, 100) plot(example_data$T[sample_index], example_data$Y[sample_index], xlab = "T", ylab = "Y", main = "gam estimate") lines(seq(8, 16, by = 1), gam_list$param, lty = 2, lwd = 2, col = "blue") legend('bottomright', "gam estimate", lty=2, lwd = 2, col = "blue", bty='Y', cex=1) rm(example_data, gam_list, sample_index)
## Example from Schafer (2015). example_data <- sim_data gam_list <- gam_est(Y = Y, treat = T, treat_formula = T ~ B.1 + B.2 + B.3 + B.4 + B.5 + B.6 + B.7 + B.8, data = example_data, grid_val = seq(8, 16, by = 1), treat_mod = "Normal") sample_index <- sample(1:1000, 100) plot(example_data$T[sample_index], example_data$Y[sample_index], xlab = "T", ylab = "Y", main = "gam estimate") lines(seq(8, 16, by = 1), gam_list$param, lty = 2, lwd = 2, col = "blue") legend('bottomright', "gam estimate", lty=2, lwd = 2, col = "blue", bty='Y', cex=1) rm(example_data, gam_list, sample_index)
This function takes a matrix containing the bootstrapped coefficients from a parametric ADRF estimator and returns upper and lower 95 percent confidence lines.
get_ci(grid_val, coef_mat, degree)
get_ci(grid_val, coef_mat, degree)
grid_val |
is the vector of grid values on |
coef_mat |
contains the bootstrapped parameter estimates. |
degree |
is 1 for linear and 2 for quadratic outcome model |
get_ci
returns upper and lower 95 percent confidence lines.
This function estimates the GPS function and estimates the ADRF. The GPS score is based on different treatment models. The treatment is linearly related to Xs.
hi_est(Y, treat, treat_formula, outcome_formula, data, grid_val, treat_mod, link_function, ...)
hi_est(Y, treat, treat_formula, outcome_formula, data, grid_val, treat_mod, link_function, ...)
Y |
is the the name of the outcome variable contained in |
treat |
is the name of the treatment variable contained in
|
treat_formula |
an object of class "formula" (or one that can be
coerced to that class) that regresses |
outcome_formula |
is the formula used for fitting the outcome surface. gps is one of the independent variables to use in the outcome_formula. ie.
or a variation
of this. Use |
data |
is a dataframe containing |
grid_val |
contains the treatment values to be evaluated. |
treat_mod |
a description of the error distribution to be used in the
model for treatment. Options include: |
link_function |
For |
... |
additional arguments to be passed to the outcome lm() function. |
Hirano (2004) (HI) introduced this imputation-type method that includes a GPS component. The idea is to fit a parametric observable (outcome) model, which includes the estimated GPS as a covariate, to impute missing potential outcomes.
The method requires several steps. First, a model is used to relate
treatment to the recorded covariates. For example,
and then estimate the
parameters.
Next, the GPS for each unit is estimated
These GPS estimates are used in the outcome or observable model.
The outcome is modeled as a function of and
parametrically. For example,
After collecting the estimated parameters in the outcome and treatment
models, plug-in the treatment values into the model to estimate the
missing potential outcomes of each individual at that treatment level.
For example, if we plug in into the estimated models, then
each unit will have a potential outcome estimated at treatment level
.
The next step is to aggregate these estimated potential outcomes
to get an average treatment effect at dose level .
The mean outcome at dose-level
is given by:
Different treatment levels are plugged into the previous equation
to estimate the missing potential outcomes. If many values
are evaluated, then it is possible to trace out an ADRF.
hi_est
returns an object of class "causaldrf",
a list that contains the following components:
param |
parameter estimates for a hi fit. |
t_mod |
the result of the treatment model fit. |
out_mod |
the result of the outcome model fit. |
call |
the matched call. |
Schafer, J.L., Galagate, D.L. (2015). Causal inference with a continuous treatment and outcome: alternative estimators for parametric dose-response models. Manuscript in preparation.
Hirano, Keisuke, Imbens, Guido W (2004). The propensity score with continuous treatments. Applied Bayesian modeling and causal inference from incomplete-data perspectives.
nw_est
, iw_est
, hi_est
, gam_est
,
add_spl_est
,
bart_est
, etc. for other estimates.
t_mod
, overlap_fun
to prepare the data
for use in the different estimates.
## Example from Schafer (2015). example_data <- sim_data hi_list <- hi_est(Y = Y, treat = T, treat_formula = T ~ B.1 + B.2 + B.3 + B.4 + B.5 + B.6 + B.7 + B.8, outcome_formula = Y ~ T + I(T^2) + gps + I(gps^2) + T * gps, data = example_data, grid_val = seq(8, 16, by = 1), treat_mod = "Normal") sample_index <- sample(1:1000, 100) plot(example_data$T[sample_index], example_data$Y[sample_index], xlab = "T", ylab = "Y", main = "hi estimate") lines(seq(8, 16, by = 1), hi_list$param, lty = 2, lwd = 2, col = "blue") legend('bottomright', "hi estimate", lty=2, lwd = 2, col = "blue", bty='Y', cex=1) rm(example_data, hi_list, sample_index) ## Example from van der Wal, Willem M., and Ronald B. Geskus. (2011) #Simulate data with continuous confounder and outcome, binomial exposure. #Marginal causal effect of exposure on outcome: 10. n <- 1000 simdat <- data.frame(l = rnorm(n, 10, 5)) a.lin <- simdat$l - 10 pa <- exp(a.lin)/(1 + exp(a.lin)) simdat$a <- rbinom(n, 1, prob = pa) simdat$y <- 10*simdat$a + 0.5*simdat$l + rnorm(n, -10, 5) simdat[1:5,] temp_hi <- hi_est(Y = y, treat = a, treat_formula = a ~ l, outcome_formula = y ~ gps, data = simdat, grid_val = c(0, 1), treat_mod = "Binomial", link_function = "logit") temp_hi[[1]] # estimated coefficients
## Example from Schafer (2015). example_data <- sim_data hi_list <- hi_est(Y = Y, treat = T, treat_formula = T ~ B.1 + B.2 + B.3 + B.4 + B.5 + B.6 + B.7 + B.8, outcome_formula = Y ~ T + I(T^2) + gps + I(gps^2) + T * gps, data = example_data, grid_val = seq(8, 16, by = 1), treat_mod = "Normal") sample_index <- sample(1:1000, 100) plot(example_data$T[sample_index], example_data$Y[sample_index], xlab = "T", ylab = "Y", main = "hi estimate") lines(seq(8, 16, by = 1), hi_list$param, lty = 2, lwd = 2, col = "blue") legend('bottomright', "hi estimate", lty=2, lwd = 2, col = "blue", bty='Y', cex=1) rm(example_data, hi_list, sample_index) ## Example from van der Wal, Willem M., and Ronald B. Geskus. (2011) #Simulate data with continuous confounder and outcome, binomial exposure. #Marginal causal effect of exposure on outcome: 10. n <- 1000 simdat <- data.frame(l = rnorm(n, 10, 5)) a.lin <- simdat$l - 10 pa <- exp(a.lin)/(1 + exp(a.lin)) simdat$a <- rbinom(n, 1, prob = pa) simdat$y <- 10*simdat$a + 0.5*simdat$l + rnorm(n, -10, 5) simdat[1:5,] temp_hi <- hi_est(Y = y, treat = a, treat_formula = a ~ l, outcome_formula = y ~ gps, data = simdat, grid_val = c(0, 1), treat_mod = "Binomial", link_function = "logit") temp_hi[[1]] # estimated coefficients
Simulated data used in the paper "The propensity score with continuous treatments."
data(hi_sim_data)
data(hi_sim_data)
A data frame with 1000 rows and 6 variables:
A dataset containing hi_sim_data.
use the hi_sample
function
Hirano, Keisuke, and Guido W. Imbens. "The propensity score with continuous treatments." Applied Bayesian modeling and causal inference from incomplete-data perspectives (2004): 73-84.
Moodie, Erica EM, and David A. Stephens. "Estimation of dose-response functions for longitudinal data using the generalised propensity score." Statistical methods in medical research 21.2 (2012): 149-166.
## Example from Hirano and Imbens (2004). data(hi_sim_data) head(hi_sim_data)
## Example from Hirano and Imbens (2004). data(hi_sim_data) head(hi_sim_data)
The iptw method or importance weighting method estimates the ADRF by weighting the data with stabilized or non-stabilized weights.
iptw_est(Y, treat, treat_formula, numerator_formula, data, degree, treat_mod, link_function, ...)
iptw_est(Y, treat, treat_formula, numerator_formula, data, degree, treat_mod, link_function, ...)
Y |
is the the name of the outcome variable contained in |
treat |
is the name of the treatment variable contained in
|
treat_formula |
an object of class "formula" (or one that can be
coerced to that class) that regresses |
numerator_formula |
an object of class "formula" (or one that can be
coerced to that class) that regresses |
data |
is a dataframe containing |
degree |
is 1 for linear and 2 for quadratic outcome model. |
treat_mod |
a description of the error distribution to be used in the
model for treatment. Options include: |
link_function |
specifies the link function between the variables in
numerator or denominator and exposure, respectively.
For |
... |
additional arguments to be passed to the low level treatment regression fitting functions. |
This method uses inverse probability of treatment weighting to adjust for possible biases. For more details see Schafer and Galagate (2015) and Robins, Hernan, and Brumback (2000).
iptw_est
returns an object of class "causaldrf",
a list that contains the following components:
param |
parameter estimates for a iptw fit. |
t_mod |
the result of the treatment model fit. |
num_mod |
the result of the numerator model fit. |
weights |
the estimated weights. |
weight_data |
the weights. |
out_mod |
the outcome model. |
call |
the matched call. |
Schafer, J.L., Galagate, D.L. (2015). Causal inference with a continuous treatment and outcome: alternative estimators for parametric dose-response models. Manuscript in preparation.
van der Wal, Willem M., and Ronald B. Geskus. "IPW: an R package for inverse probability weighting." Journal of Statistical Software 43.13 (2011): 1-23.
Robins, James M and Hernan, Miguel Angel and Brumback, Babette. Marginal structural models and causal inference in epidemiology. Epidemiology 11.5 (2000): 550–560.
Zhu, Yeying and Coffman, Donna L and Ghosh, Debashis. A Boosting Algorithm for Estimating Generalized Propensity Scores with Continuous Treatments. Journal of Causal Inference 3.1 (2015): 25–40.
iptw_est
, ismw_est
,
reg_est
, aipwee_est
, wtrg_est
,
etc. for other estimates.
t_mod
, overlap_fun
to prepare the data
for use in the different estimates.
## Example from Schafer (2015). example_data <- sim_data iptw_list <- iptw_est(Y = Y, treat = T, treat_formula = T ~ B.1 + B.2 + B.3 + B.4 + B.5 + B.6 + B.7 + B.8, numerator_formula = T ~ 1, data = example_data, degree = 1, treat_mod = "Normal") sample_index <- sample(1:1000, 100) plot(example_data$T[sample_index], example_data$Y[sample_index], xlab = "T", ylab = "Y", main = "iptw estimate") abline(iptw_list$param[1], iptw_list$param[2], lty=2, lwd = 2, col = "blue") legend('bottomright', "iptw estimate", lty=2, lwd = 2, col = "blue", bty='Y', cex=1) rm(example_data, iptw_list, sample_index) ## Example from van der Wal, Willem M., and Ronald B. Geskus. (2011) #Simulate data with continuous confounder and outcome, binomial exposure. #Marginal causal effect of exposure on outcome: 10. n <- 1000 simdat <- data.frame(l = rnorm(n, 10, 5)) a.lin <- simdat$l - 10 pa <- exp(a.lin)/(1 + exp(a.lin)) simdat$a <- rbinom(n, 1, prob = pa) simdat$y <- 10*simdat$a + 0.5*simdat$l + rnorm(n, -10, 5) simdat[1:5,] temp_iptw <- iptw_est(Y = y, treat = a, treat_formula = a ~ l, numerator_formula = a ~ 1, data = simdat, degree = 1, treat_mod = "Binomial", link_function = "logit") temp_iptw[[1]] # estimated coefficients
## Example from Schafer (2015). example_data <- sim_data iptw_list <- iptw_est(Y = Y, treat = T, treat_formula = T ~ B.1 + B.2 + B.3 + B.4 + B.5 + B.6 + B.7 + B.8, numerator_formula = T ~ 1, data = example_data, degree = 1, treat_mod = "Normal") sample_index <- sample(1:1000, 100) plot(example_data$T[sample_index], example_data$Y[sample_index], xlab = "T", ylab = "Y", main = "iptw estimate") abline(iptw_list$param[1], iptw_list$param[2], lty=2, lwd = 2, col = "blue") legend('bottomright', "iptw estimate", lty=2, lwd = 2, col = "blue", bty='Y', cex=1) rm(example_data, iptw_list, sample_index) ## Example from van der Wal, Willem M., and Ronald B. Geskus. (2011) #Simulate data with continuous confounder and outcome, binomial exposure. #Marginal causal effect of exposure on outcome: 10. n <- 1000 simdat <- data.frame(l = rnorm(n, 10, 5)) a.lin <- simdat$l - 10 pa <- exp(a.lin)/(1 + exp(a.lin)) simdat$a <- rbinom(n, 1, prob = pa) simdat$y <- 10*simdat$a + 0.5*simdat$l + rnorm(n, -10, 5) simdat[1:5,] temp_iptw <- iptw_est(Y = y, treat = a, treat_formula = a ~ l, numerator_formula = a ~ 1, data = simdat, degree = 1, treat_mod = "Binomial", link_function = "logit") temp_iptw[[1]] # estimated coefficients
This method estimates the ADRF by using weighting matrices instead of
scalars. The weight matrices require conditional expectations of the
treatment and higher order conditional expectations. It uses outputs from
the t_mod
function.
ismw_est(Y, treat, data, e_treat_1, e_treat_2, e_treat_3, e_treat_4, degree )
ismw_est(Y, treat, data, e_treat_1, e_treat_2, e_treat_3, e_treat_4, degree )
Y |
is the the name of the outcome variable contained in |
treat |
is the name of the treatment variable contained in
|
data |
is a dataframe containing |
e_treat_1 |
a vector, representing the conditional expectation of
|
e_treat_2 |
a vector, representing the conditional expectation of
|
e_treat_3 |
a vector, representing the conditional expectation of
|
e_treat_4 |
a vector, representing the conditional expectation of
|
degree |
is 1 for linear and 2 for quadratic outcome model. |
This function estimates the ADRF requires estimated moments and uses the outputs of the t_mod function as inputs. For more details, see Schafer and Galagate (2015).
ismw_est
returns an object of class "causaldrf_simple",
a list that contains the following components:
param |
the estimated parameters. |
call |
the matched call. |
Schafer, J.L., Galagate, D.L. (2015). Causal inference with a continuous treatment and outcome: alternative estimators for parametric dose-response models. Manuscript in preparation.
iptw_est
, ismw_est
,
reg_est
, aipwee_est
, wtrg_est
,
etc. for other estimates.
t_mod
, overlap_fun
to prepare the data
for use in the different estimates.
## Example from Schafer (2015). example_data <- sim_data t_mod_list <- t_mod(treat = T, treat_formula = T ~ B.1 + B.2 + B.3 + B.4 + B.5 + B.6 + B.7 + B.8, data = example_data, treat_mod = "Normal") cond_exp_data <- t_mod_list$T_data full_data <- cbind(example_data, cond_exp_data) ismw_list <- ismw_est(Y = Y, treat = T, data = full_data, e_treat_1 = full_data$est_treat, e_treat_2 = full_data$est_treat_sq, e_treat_3 = full_data$est_treat_cube, e_treat_4 = full_data$est_treat_quartic, degree = 1) sample_index <- sample(1:1000, 100) plot(example_data$T[sample_index], example_data$Y[sample_index], xlab = "T", ylab = "Y", main = "ismw estimate") abline(ismw_list$param[1], ismw_list$param[2], lty=2, lwd = 2, col = "blue") legend('bottomright', "ismw estimate", lty=2, lwd = 2, col = "blue", bty='Y', cex=1) rm(example_data, t_mod_list, cond_exp_data, full_data, ismw_list, sample_index)
## Example from Schafer (2015). example_data <- sim_data t_mod_list <- t_mod(treat = T, treat_formula = T ~ B.1 + B.2 + B.3 + B.4 + B.5 + B.6 + B.7 + B.8, data = example_data, treat_mod = "Normal") cond_exp_data <- t_mod_list$T_data full_data <- cbind(example_data, cond_exp_data) ismw_list <- ismw_est(Y = Y, treat = T, data = full_data, e_treat_1 = full_data$est_treat, e_treat_2 = full_data$est_treat_sq, e_treat_3 = full_data$est_treat_cube, e_treat_4 = full_data$est_treat_quartic, degree = 1) sample_index <- sample(1:1000, 100) plot(example_data$T[sample_index], example_data$Y[sample_index], xlab = "T", ylab = "Y", main = "ismw estimate") abline(ismw_list$param[1], ismw_list$param[2], lty=2, lwd = 2, col = "blue") legend('bottomright', "ismw estimate", lty=2, lwd = 2, col = "blue", bty='Y', cex=1) rm(example_data, t_mod_list, cond_exp_data, full_data, ismw_list, sample_index)
This is a nonparametric method that estimates the ADRF by using a local linear
regression of Y
on treat
with weighted kernel function. For
details, see Flores et. al. (2012).
iw_est(Y, treat, treat_formula, data, grid_val, bandw, treat_mod, link_function, ...)
iw_est(Y, treat, treat_formula, data, grid_val, bandw, treat_mod, link_function, ...)
Y |
is the the name of the outcome variable contained in |
treat |
is the name of the treatment variable contained in
|
treat_formula |
an object of class "formula" (or one that can be
coerced to that class) that regresses |
data |
is a dataframe containing |
grid_val |
contains the treatment values to be evaluated. |
bandw |
is the bandwidth. Default is 1. |
treat_mod |
a description of the error distribution to be used in the
model for treatment. Options include: |
link_function |
is either "log", "inverse", or "identity" for the
"Gamma" |
... |
additional arguments to be passed to the treatment regression function. |
The ADRF is estimated by
where
and
which is a local linear regression.
More details are given in Flores (2012).
iw_est
returns an object of class "causaldrf",
a list that contains the following components:
param |
parameter estimates for a iw fit. |
t_mod |
the result of the treatment model fit. |
call |
the matched call. |
Schafer, J.L., Galagate, D.L. (2015). Causal inference with a continuous treatment and outcome: alternative estimators for parametric dose-response models. Manuscript in preparation.
Flores, Carlos A., et al. "Estimating the effects of length of exposure to instruction in a training program: the case of job corps." Review of Economics and Statistics 94.1 (2012): 153-171.
nw_est
, iw_est
, hi_est
, gam_est
,
add_spl_est
,
bart_est
, etc. for other estimates.
## Example from Schafer (2015). example_data <- sim_data iw_list <- iw_est(Y = Y, treat = T, treat_formula = T ~ B.1 + B.2 + B.3 + B.4 + B.5 + B.6 + B.7 + B.8, data = example_data, grid_val = seq(8, 16, by = 1), bandw = bw.SJ(example_data$T), treat_mod = "Normal") sample_index <- sample(1:1000, 100) plot(example_data$T[sample_index], example_data$Y[sample_index], xlab = "T", ylab = "Y", main = "iw estimate") lines(seq(8, 16, by = 1), iw_list$param, lty = 2, lwd = 2, col = "blue") legend('bottomright', "iw estimate", lty=2, lwd = 2, col = "blue", bty='Y', cex=1) rm(example_data, iw_list, sample_index) ## Example from Imai & van Dyk (2004). data("nmes_data") head(nmes_data) # look at only people with medical expenditures greater than 0 nmes_nonzero <- nmes_data[which(nmes_data$TOTALEXP > 0), ] iw_list <- iw_est(Y = TOTALEXP, treat = packyears, treat_formula = packyears ~ LASTAGE + I(LASTAGE^2) + AGESMOKE + I(AGESMOKE^2) + MALE + RACE3 + beltuse + educate + marital + SREGION + POVSTALB, data = nmes_nonzero, grid_val = seq(5, 100, by = 5), bandw = bw.SJ(nmes_nonzero$packyears), treat_mod = "LogNormal") set.seed(307) sample_index <- sample(1:nrow(nmes_nonzero), 1000) plot(nmes_nonzero$packyears[sample_index], nmes_nonzero$TOTALEXP[sample_index], xlab = "packyears", ylab = "TOTALEXP", main = "iw estimate", ylim = c(0, 10000), xlim = c(0, 100)) lines(seq(5, 100, by = 5), iw_list$param, lty = 2, lwd = 2, col = "blue") legend('topright', "iw estimate", lty=2, lwd = 2, col = "blue", bty='Y', cex = 1) abline(0, 0)
## Example from Schafer (2015). example_data <- sim_data iw_list <- iw_est(Y = Y, treat = T, treat_formula = T ~ B.1 + B.2 + B.3 + B.4 + B.5 + B.6 + B.7 + B.8, data = example_data, grid_val = seq(8, 16, by = 1), bandw = bw.SJ(example_data$T), treat_mod = "Normal") sample_index <- sample(1:1000, 100) plot(example_data$T[sample_index], example_data$Y[sample_index], xlab = "T", ylab = "Y", main = "iw estimate") lines(seq(8, 16, by = 1), iw_list$param, lty = 2, lwd = 2, col = "blue") legend('bottomright', "iw estimate", lty=2, lwd = 2, col = "blue", bty='Y', cex=1) rm(example_data, iw_list, sample_index) ## Example from Imai & van Dyk (2004). data("nmes_data") head(nmes_data) # look at only people with medical expenditures greater than 0 nmes_nonzero <- nmes_data[which(nmes_data$TOTALEXP > 0), ] iw_list <- iw_est(Y = TOTALEXP, treat = packyears, treat_formula = packyears ~ LASTAGE + I(LASTAGE^2) + AGESMOKE + I(AGESMOKE^2) + MALE + RACE3 + beltuse + educate + marital + SREGION + POVSTALB, data = nmes_nonzero, grid_val = seq(5, 100, by = 5), bandw = bw.SJ(nmes_nonzero$packyears), treat_mod = "LogNormal") set.seed(307) sample_index <- sample(1:nrow(nmes_nonzero), 1000) plot(nmes_nonzero$packyears[sample_index], nmes_nonzero$TOTALEXP[sample_index], xlab = "packyears", ylab = "TOTALEXP", main = "iw estimate", ylim = c(0, 10000), xlim = c(0, 100)) lines(seq(5, 100, by = 5), iw_list$param, lty = 2, lwd = 2, col = "blue") legend('topright', "iw estimate", lty=2, lwd = 2, col = "blue", bty='Y', cex = 1) abline(0, 0)
Data set from the NMES. with 9708 observations and 12 variables.
data(nmes_data)
data(nmes_data)
A dataset containing 9708 observations and 12 variables.
Imai, K., & van Dyk, D.A. (2004). Causal Inference With General Treatment Regimes: Generalizing the Propensity Score. Journal of the American Statistical Association, 99(467).
National Center for Health Services Research and Health Care Technology Assessment. NATIONAL MEDICAL EXPENDITURE SURVEY, 1987: INSTITUTIONAL POPULATION COMPONENT. Rockville, MD: Westat, Inc. [producer], 1987. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 1990. doi:10.3886/ICPSR09280.v1
Bryer, Jason M. "TriMatch: An R Package for Propensity Score Matching of Non-binary Treatments." The R User Conference, useR! 2013 July 10-12 2013 University of Castilla-La Mancha, Albacete, Spain. Vol. 10. No. 30. 2013.
data(nmes_data) head(nmes_data)
data(nmes_data) head(nmes_data)
This is a kernel based regression method that uses a kernel as a weighting function to estimate the ADRF. The normal kernel is weighted by the inverse of the estimated GPS. See Flores et al. (2012) for more details.
nw_est(Y, treat, treat_formula, data, grid_val, bandw, treat_mod, link_function, ...)
nw_est(Y, treat, treat_formula, data, grid_val, bandw, treat_mod, link_function, ...)
Y |
is the the name of the outcome variable contained in |
treat |
is the name of the treatment variable contained in
|
treat_formula |
an object of class "formula" (or one that can be
coerced to that class) that regresses |
data |
is a dataframe containing |
grid_val |
contains the treatment values to be evaluated. |
bandw |
is the bandwidth. Default is 1. |
treat_mod |
a description of the error distribution to be used in the
model for treatment. Options include: |
link_function |
is either "log", "inverse", or "identity" for the
"Gamma" |
... |
additional arguments to be passed to the treatment regression function. |
This method is a version of the Nadarya-Watson estimator Nadaraya (1964) which is a local constant regression but weighted by the inverse of the estimated GPS.
nw_est
returns an object of class "causaldrf",
a list that contains the following components:
param |
parameter estimates for a nw fit. |
t_mod |
the result of the treatment model fit. |
call |
the matched call. |
Schafer, J.L., Galagate, D.L. (2015). Causal inference with a continuous treatment and outcome: alternative estimators for parametric dose-response models. Manuscript in preparation.
Flores, Carlos A., et al. "Estimating the effects of length of exposure to instruction in a training program: the case of job corps." Review of Economics and Statistics 94.1 (2012): 153-171.
Nadaraya, Elizbar A. "On estimating regression." Theory of Probability and Its Applications 9.1 (1964): 141–142.
nw_est
, iw_est
, hi_est
, gam_est
,
add_spl_est
,
bart_est
, etc. for other estimates.
t_mod
, overlap_fun
to prepare the data
for use in the different estimates.
## Example from Schafer (2015). example_data <- sim_data nw_list <- nw_est(Y = Y, treat = T, treat_formula = T ~ B.1 + B.2 + B.3 + B.4 + B.5 + B.6 + B.7 + B.8, data = example_data, grid_val = seq(8, 16, by = 1), bandw = bw.SJ(example_data$T), treat_mod = "Normal") sample_index <- sample(1:1000, 100) plot(example_data$T[sample_index], example_data$Y[sample_index], xlab = "T", ylab = "Y", main = "nw estimate") lines(seq(8, 16, by = 1), nw_list$param, lty = 2, lwd = 2, col = "blue") legend('bottomright', "nw estimate", lty=2, lwd = 2, col = "blue", bty='Y', cex=1) rm(example_data, nw_list, sample_index)
## Example from Schafer (2015). example_data <- sim_data nw_list <- nw_est(Y = Y, treat = T, treat_formula = T ~ B.1 + B.2 + B.3 + B.4 + B.5 + B.6 + B.7 + B.8, data = example_data, grid_val = seq(8, 16, by = 1), bandw = bw.SJ(example_data$T), treat_mod = "Normal") sample_index <- sample(1:1000, 100) plot(example_data$T[sample_index], example_data$Y[sample_index], xlab = "T", ylab = "Y", main = "nw estimate") lines(seq(8, 16, by = 1), nw_list$param, lty = 2, lwd = 2, col = "blue") legend('bottomright', "nw estimate", lty=2, lwd = 2, col = "blue", bty='Y', cex=1) rm(example_data, nw_list, sample_index)
This function ensures that the units overlap according to the estimated gps
values. The overlapping dataset depends on the number of classes
n_class
to subclassify on.
overlap_fun(Y, treat, treat_formula, data_set, n_class, treat_mod, link_function, ...)
overlap_fun(Y, treat, treat_formula, data_set, n_class, treat_mod, link_function, ...)
Y |
is the the name of the outcome variable contained in |
treat |
is the name of the treatment variable contained in
|
treat_formula |
an object of class "formula" (or one that can be
coerced to that class) that regresses |
data_set |
is a dataframe containing |
n_class |
is the number of classes to split |
treat_mod |
a description of the error distribution to be used in the
model for treatment. Options include: |
link_function |
is either "log", "inverse", or "identity" for the
"Gamma" |
... |
additional arguments to be passed to the treatment regression function |
overlap_fun
returns a list containing the following
elements:
overlap_dataset |
dataframe containing overlapping data. |
median_vec |
a vector containing median values. |
overlap_treat_result |
the resulting treatment fit. |
Schafer, J.L., Galagate, D.L. (2015). Causal inference with a continuous treatment and outcome: alternative estimators for parametric dose-response models. Manuscript in preparation.
Bia, Michela, et al. "A Stata package for the application of semiparametric estimators of dose response functions." Stata Journal 14.3 (2014): 580-604.
iptw_est
, ismw_est
,
reg_est
, aipwee_est
, wtrg_est
,
etc. for other estimates.
t_mod
, overlap_fun
to prepare the data
for use in the different estimates.
## Example from Schafer (2015). example_data <- sim_data overlap_list <- overlap_fun(Y = Y, treat = T, treat_formula = T ~ B.1 + B.2 + B.3 + B.4 + B.5 + B.6 + B.7 + B.8, data_set = example_data, n_class = 3, treat_mod = "Normal") overlapped_data <- overlap_list$overlap_dataset summary(overlapped_data) rm(example_data, overlap_list, overlapped_data)
## Example from Schafer (2015). example_data <- sim_data overlap_list <- overlap_fun(Y = Y, treat = T, treat_formula = T ~ B.1 + B.2 + B.3 + B.4 + B.5 + B.6 + B.7 + B.8, data_set = example_data, n_class = 3, treat_mod = "Normal") overlapped_data <- overlap_list$overlap_dataset summary(overlapped_data) rm(example_data, overlap_list, overlapped_data)
This method estimates the linear or quadratic parameters of the ADRF by estimating a least-squares fit on the basis functions which are composed of combinations of the covariates, propensity spline basis, and treatment values.
prop_spline_est(Y, treat, covar_formula = ~ 1, covar_lin_formula = ~ 1, covar_sq_formula = ~ 1, data, e_treat_1 = NULL, degree = 1, wt = NULL, method = "same", spline_df = NULL, spline_const = 1, spline_linear = 1, spline_quad = 1)
prop_spline_est(Y, treat, covar_formula = ~ 1, covar_lin_formula = ~ 1, covar_sq_formula = ~ 1, data, e_treat_1 = NULL, degree = 1, wt = NULL, method = "same", spline_df = NULL, spline_const = 1, spline_linear = 1, spline_quad = 1)
Y |
is the the name of the outcome variable contained in |
treat |
is the name of the treatment variable contained in
|
covar_formula |
is the formula to describe the covariates needed
to estimate the constant term:
|
covar_lin_formula |
is the formula to describe the covariates needed
to estimate the linear term, t:
|
covar_sq_formula |
is the formula to describe the covariates needed
to estimate the quadratic term, t^2:
|
data |
is a dataframe containing |
e_treat_1 |
a vector, representing the conditional expectation of
|
degree |
is 1 for linear and 2 for quadratic outcome model. |
wt |
is weight used in lsfit for outcome regression. Default is wt = NULL. |
method |
is "same" if the same set of covariates are used to estimate the constant, linear, and/or quadratic term with no spline terms. If method = "different", then different sets of covariates can be used to estimate the constant, linear, and/or quadratic term. To use spline terms, it is necessary to set method = "different". covar_lin_formula and covar_sq_formula must be specified if method = "different". |
spline_df |
degrees of freedom. The default, spline_df = NULL, corresponds to no knots. |
spline_const |
is the number of spline terms to include when estimating the constant term. |
spline_linear |
is the number of spline terms to include when estimating the linear term. |
spline_quad |
is the number of spline terms to include when estimating the quadratic term. |
This function estimates the ADRF by the method described in Schafer and Galagate (2015), that fits an outcome model using a function of the covariates and spline basis functions derived from the propensity function component.
prop_spline_est
returns an object of class "causaldrf_lsfit",
a list that contains the following components:
param |
the estimated parameters. |
out_mod |
the result of the outcome model fit using lsfit. |
call |
the matched call. |
Schafer, J.L., Galagate, D.L. (2015). Causal inference with a continuous treatment and outcome: alternative estimators for parametric dose-response models. Manuscript in preparation.
Little, Roderick and An, Hyonggin (2004). ROBUST LIKELIHOOD-BASED ANALYSIS OF MULTIVARIATE DATA WITH MISSING VALUES. Statistica Sinica. 14: 949–968.
Schafer, Joseph L, Kang, Joseph (2008). Average causal effects from nonrandomized studies: a practical guide and simulated example. Psychological methods, 13.4, 279.
iptw_est
, ismw_est
,
reg_est
, aipwee_est
, wtrg_est
,
etc. for other estimates.
t_mod
, overlap_fun
to prepare the data
for use in the different estimates.
## Example from Schafer (2015). example_data <- sim_data t_mod_list <- t_mod(treat = T, treat_formula = T ~ B.1 + B.2 + B.3 + B.4 + B.5 + B.6 + B.7 + B.8, data = example_data, treat_mod = "Normal") cond_exp_data <- t_mod_list$T_data full_data <- cbind(example_data, cond_exp_data) prop_spline_list <- prop_spline_est(Y = Y, treat = T, covar_formula = ~ B.1 + B.2 + B.3 + B.4 + B.5 + B.6 + B.7 + B.8, covar_lin_formula = ~ 1, covar_sq_formula = ~ 1, data = example_data, e_treat_1 = full_data$est_treat, degree = 1, wt = NULL, method = "different", spline_df = 5, spline_const = 4, spline_linear = 4, spline_quad = 4) sample_index <- sample(1:1000, 100) plot(example_data$T[sample_index], example_data$Y[sample_index], xlab = "T", ylab = "Y", main = "propensity spline estimate") abline(prop_spline_list$param[1], prop_spline_list$param[2], lty = 2, col = "blue", lwd = 2) legend('bottomright', "propensity spline estimate", lty = 2, bty = 'Y', cex = 1, col = "blue", lwd = 2) rm(example_data, prop_spline_list, sample_index)
## Example from Schafer (2015). example_data <- sim_data t_mod_list <- t_mod(treat = T, treat_formula = T ~ B.1 + B.2 + B.3 + B.4 + B.5 + B.6 + B.7 + B.8, data = example_data, treat_mod = "Normal") cond_exp_data <- t_mod_list$T_data full_data <- cbind(example_data, cond_exp_data) prop_spline_list <- prop_spline_est(Y = Y, treat = T, covar_formula = ~ B.1 + B.2 + B.3 + B.4 + B.5 + B.6 + B.7 + B.8, covar_lin_formula = ~ 1, covar_sq_formula = ~ 1, data = example_data, e_treat_1 = full_data$est_treat, degree = 1, wt = NULL, method = "different", spline_df = 5, spline_const = 4, spline_linear = 4, spline_quad = 4) sample_index <- sample(1:1000, 100) plot(example_data$T[sample_index], example_data$Y[sample_index], xlab = "T", ylab = "Y", main = "propensity spline estimate") abline(prop_spline_list$param[1], prop_spline_list$param[2], lty = 2, col = "blue", lwd = 2) legend('bottomright', "propensity spline estimate", lty = 2, bty = 'Y', cex = 1, col = "blue", lwd = 2) rm(example_data, prop_spline_list, sample_index)
This method estimates the linear or quadratic parameters of the ADRF by estimating a least-squares fit on the basis functions which are composed of combinations of the covariates and treatment values.
reg_est(Y, treat, covar_formula, covar_lin_formula = NULL, covar_sq_formula = NULL, data, degree, wt = NULL, method = "same")
reg_est(Y, treat, covar_formula, covar_lin_formula = NULL, covar_sq_formula = NULL, data, degree, wt = NULL, method = "same")
Y |
is the the name of the outcome variable contained in |
treat |
is the name of the treatment variable contained in
|
covar_formula |
is the formula to describe the covariates needed
to estimate the constant term:
|
covar_lin_formula |
is the formula to describe the covariates needed
to estimate the linear term, t:
|
covar_sq_formula |
is the formula to describe the covariates needed
to estimate the quadratic term, t^2:
|
data |
is a dataframe containing |
degree |
is 1 for linear and 2 for quadratic outcome model. |
wt |
is weight used in lsfit for outcome regression. Default is wt = NULL. |
method |
is "same" if the same set of covariates are used to estimate the constant, linear, and/or quadratic term. If method = "different", then different sets of covariates can be used to estimate the constant, linear, and/or quadratic term. covar_lin_formula and covar_sq_formula must be specified if method = "different". |
This function estimates the ADRF by the method described in Schafer and Galagate (2015) that fits an outcome model using a function of the covariates.
reg_est
returns an object of class "causaldrf_lsfit",
a list that contains the following components:
param |
the estimated parameters. |
out_mod |
the result of the outcome model fit using lsfit. |
call |
the matched call. |
Schafer, J.L., Galagate, D.L. (2015). Causal inference with a continuous treatment and outcome: alternative estimators for parametric dose-response models. Manuscript in preparation.
Schafer, Joseph L, Kang, Joseph (2008). Average causal effects from nonrandomized studies: a practical guide and simulated example. Psychological methods, 13.4, 279.
iptw_est
, ismw_est
,
aipwee_est
, wtrg_est
,
etc. for other estimates.
t_mod
, overlap_fun
to prepare the data
for use in the different estimates.
## Example from Schafer (2015). example_data <- sim_data reg_list <- reg_est(Y = Y, treat = T, covar_formula = ~ B.1 + B.2 + B.3 + B.4 + B.5 + B.6 + B.7 + B.8, covar_lin_formula = ~ 1, covar_sq_formula = ~ 1, data = example_data, degree = 1, wt = NULL, method = "same") sample_index <- sample(1:1000, 100) plot(example_data$T[sample_index], example_data$Y[sample_index], xlab = "T", ylab = "Y", main = "regression estimate") abline(reg_list$param[1], reg_list$param[2], lty = 2, col = "blue", lwd = 2) legend('bottomright', "regression estimate", lty = 2, bty = 'Y', cex = 1, col = "blue", lwd = 2) rm(example_data, reg_list, sample_index)
## Example from Schafer (2015). example_data <- sim_data reg_list <- reg_est(Y = Y, treat = T, covar_formula = ~ B.1 + B.2 + B.3 + B.4 + B.5 + B.6 + B.7 + B.8, covar_lin_formula = ~ 1, covar_sq_formula = ~ 1, data = example_data, degree = 1, wt = NULL, method = "same") sample_index <- sample(1:1000, 100) plot(example_data$T[sample_index], example_data$Y[sample_index], xlab = "T", ylab = "Y", main = "regression estimate") abline(reg_list$param[1], reg_list$param[2], lty = 2, col = "blue", lwd = 2) legend('bottomright', "regression estimate", lty = 2, bty = 'Y', cex = 1, col = "blue", lwd = 2) rm(example_data, reg_list, sample_index)
This function calculates the scalar weights
scalar_wts(treat, treat_formula, numerator_formula, data, treat_mod, link_function, ...)
scalar_wts(treat, treat_formula, numerator_formula, data, treat_mod, link_function, ...)
treat |
is the name of the treatment variable contained in
|
treat_formula |
an object of class "formula" (or one that can be
coerced to that class) that regresses |
numerator_formula |
an object of class "formula" (or one that can be
coerced to that class) that regresses |
data |
is a dataframe containing |
treat_mod |
a description of the error distribution to be used in the
model for treatment. Options include: |
link_function |
is either "log", "inverse", or "identity" for the
"Gamma" |
... |
additional arguments to be passed to the treatment regression fitting function. |
scalar_wts
returns an object of class "causaldrf_wts",
a list that contains the following components:
param |
summary of estimated weights. |
t_mod |
the result of the treatment model fit. |
num_mod |
the result of the numerator model fit. |
weights |
estimated weights for each unit. |
call |
the matched call. |
Schafer, J.L., Galagate, D.L. (2015). Causal inference with a continuous treatment and outcome: alternative estimators for parametric dose-response models. Manuscript in preparation.
iptw_est
, ismw_est
,
reg_est
, aipwee_est
, wtrg_est
,
etc. for other estimates.
t_mod
, overlap_fun
to prepare the data
for use in the different estimates.
## Example from Schafer (2015). example_data <- sim_data scalar_wts_list <- scalar_wts(treat = T, treat_formula = T ~ B.1 + B.2 + B.3 + B.4 + B.5 + B.6 + B.7 + B.8, numerator_formula = T ~ 1, data = example_data, treat_mod = "Normal") sample_index <- sample(1:1000, 100) plot(example_data$T[sample_index], scalar_wts_list$weights[sample_index], xlab = "T", ylab = "weights", main = "scalar_wts") rm(example_data, scalar_wts_list, sample_index)
## Example from Schafer (2015). example_data <- sim_data scalar_wts_list <- scalar_wts(treat = T, treat_formula = T ~ B.1 + B.2 + B.3 + B.4 + B.5 + B.6 + B.7 + B.8, numerator_formula = T ~ 1, data = example_data, treat_mod = "Normal") sample_index <- sample(1:1000, 100) plot(example_data$T[sample_index], scalar_wts_list$weights[sample_index], xlab = "T", ylab = "weights", main = "scalar_wts") rm(example_data, scalar_wts_list, sample_index)
Simulated data used in the paper "Causal inference with a continuous treatment and outcome: alternative estimators for parametric dose-response models".
data(sim_data)
data(sim_data)
A data frame with 1000 rows and 20 variables:
A dataset containing sim_data.
(A.1, A.2, A.3, A.4, A.5, A.6, A.7, A.8)
are the true measured covariates.
(B.1, B.2, B.3, B.4, B.5, B.6, B.7, B.8)
are the transformed covariates.
T |
treatment |
Theta.1 |
unit level intercept |
Theta.2 |
unit level slope |
Y |
outcome |
use the draw_sample
function
Schafer, J.L., Galagate, D.L. (2015). Causal inference with a continuous treatment and outcome: alternative estimators for parametric dose-response models. Manuscript in preparation.
## Example from Schafer (2015). data(sim_data) head(sim_data)
## Example from Schafer (2015). data(sim_data) head(sim_data)
This function fits a glm regression specified by the user to estimate conditional moments.
t_mod(treat, treat_formula, data, treat_mod, link_function, ...)
t_mod(treat, treat_formula, data, treat_mod, link_function, ...)
treat |
is the name of the treatment variable contained in |
treat_formula |
an object of class "formula" (or one that can be coerced to that class)
that regresses |
data |
is a dataframe containing |
treat_mod |
a description of the error distribution
to be used in the model for treatment. Options include:
|
link_function |
is either "log", "inverse", or "identity" for the "Gamma" |
... |
additional arguments to be passed to the low level treatment regression fitting functions. |
t_mod
returns a list containing the following elements:
T_data |
a dataframe containing estimated treatment, estimated treatment squared, estimated treatment cube, estimated treatment quartic, and estimated gps. |
T_result |
the result of the treatment model fit. |
Schafer, J.L., Galagate, D.L. (2015). Causal inference with a continuous treatment and outcome: alternative estimators for parametric dose-response models. Manuscript in preparation.
ismw_est
, reg_est
,
wtrg_est
, aipwee_est
, etc. for other estimates.
overlap_fun
to prepare the data
for use in the different estimates.
## Example from Schafer (2015). example_data <- sim_data t_mod_list <- t_mod(treat = T, treat_formula = T ~ B.1 + B.2 + B.3 + B.4 + B.5 + B.6 + B.7 + B.8, data = example_data, treat_mod = "Normal") cond_exp_data <- t_mod_list$T_data full_data <- cbind(example_data, cond_exp_data) rm(example_data, t_mod_list, cond_exp_data, full_data)
## Example from Schafer (2015). example_data <- sim_data t_mod_list <- t_mod(treat = T, treat_formula = T ~ B.1 + B.2 + B.3 + B.4 + B.5 + B.6 + B.7 + B.8, data = example_data, treat_mod = "Normal") cond_exp_data <- t_mod_list$T_data full_data <- cbind(example_data, cond_exp_data) rm(example_data, t_mod_list, cond_exp_data, full_data)
This method uses weight matrices to estimate parameters for an ADRF with quadratic or linear fits.
wtrg_est(Y, treat, covar_formula, data, e_treat_1, e_treat_2, e_treat_3, e_treat_4, degree)
wtrg_est(Y, treat, covar_formula, data, e_treat_1, e_treat_2, e_treat_3, e_treat_4, degree)
Y |
is the output |
treat |
is the treatment variable |
covar_formula |
is the formula for the covariates model of the form: ~ X.1 + .... |
data |
will contain all the data: X, treat, and Y |
e_treat_1 |
is estimated treatment |
e_treat_2 |
is estimated treatment squared |
e_treat_3 |
is estimated treatment cubed |
e_treat_4 |
is estimated treatment to the fourth |
degree |
is 1 for linear fit and 2 for quadratic fit |
This function estimates the ADRF by the method described in Schafer and Galagate (2015) which uses weight matrices to adjust for possible bias.
wtrg_est
returns an object of class "causaldrf",
a list that contains the following components:
param |
the estimated parameters. |
call |
the matched call. |
Schafer, J.L., Galagate, D.L. (2015). Causal inference with a continuous treatment and outcome: alternative estimators for parametric dose-response models. Manuscript in preparation.
iptw_est
, ismw_est
,
reg_est
, aipwee_est
, wtrg_est
,
etc. for other estimates.
t_mod
, overlap_fun
to prepare the data
for use in the different estimates.
## Example from Schafer (2015). example_data <- sim_data t_mod_list <- t_mod(treat = T, treat_formula = T ~ B.1 + B.2 + B.3 + B.4 + B.5 + B.6 + B.7 + B.8, data = example_data, treat_mod = "Normal") cond_exp_data <- t_mod_list$T_data full_data <- cbind(example_data, cond_exp_data) wtrg_list <- wtrg_est(Y = Y, treat = T, covar_formula = ~ B.1 + B.2 + B.3 + B.4 + B.5 + B.6 + B.7 + B.8, data = example_data, e_treat_1 = full_data$est_treat, e_treat_2 = full_data$est_treat_sq, e_treat_3 = full_data$est_treat_cube, e_treat_4 = full_data$est_treat_quartic, degree = 1) sample_index <- sample(1:1000, 100) plot(example_data$T[sample_index], example_data$Y[sample_index], xlab = "T", ylab = "Y", main = "weighted regression estimate") abline(wtrg_list$param[1], wtrg_list$param[2], lty = 2, lwd = 2, col = "blue") legend('bottomright', "weighted regression estimate", lty = 2, lwd = 2, col = "blue", bty='Y', cex=1) rm(example_data, t_mod_list, cond_exp_data, full_data, wtrg_list, sample_index)
## Example from Schafer (2015). example_data <- sim_data t_mod_list <- t_mod(treat = T, treat_formula = T ~ B.1 + B.2 + B.3 + B.4 + B.5 + B.6 + B.7 + B.8, data = example_data, treat_mod = "Normal") cond_exp_data <- t_mod_list$T_data full_data <- cbind(example_data, cond_exp_data) wtrg_list <- wtrg_est(Y = Y, treat = T, covar_formula = ~ B.1 + B.2 + B.3 + B.4 + B.5 + B.6 + B.7 + B.8, data = example_data, e_treat_1 = full_data$est_treat, e_treat_2 = full_data$est_treat_sq, e_treat_3 = full_data$est_treat_cube, e_treat_4 = full_data$est_treat_quartic, degree = 1) sample_index <- sample(1:1000, 100) plot(example_data$T[sample_index], example_data$Y[sample_index], xlab = "T", ylab = "Y", main = "weighted regression estimate") abline(wtrg_list$param[1], wtrg_list$param[2], lty = 2, lwd = 2, col = "blue") legend('bottomright', "weighted regression estimate", lty = 2, lwd = 2, col = "blue", bty='Y', cex=1) rm(example_data, t_mod_list, cond_exp_data, full_data, wtrg_list, sample_index)