Title: | Stable Balancing Weights for Causal Inference and Missing Data |
---|---|
Description: | Implements the Stable Balancing Weights by Zubizarreta (2015) <DOI:10.1080/01621459.2015.1023805>. These are the weights of minimum variance that approximately balance the empirical distribution of the observed covariates. For an overview, see Chattopadhyay, Hase and Zubizarreta (2020) <DOI:10.1002/sim.8659>. To solve the optimization problem in 'sbw', the default solver is 'quadprog', which is readily available through CRAN. The solver 'osqp' is also posted on CRAN. To enhance the performance of 'sbw', users are encouraged to install other solvers such as 'gurobi' and 'Rmosek', which require special installation. For the installation of gurobi and pogs, please follow the instructions at <https://www.gurobi.com/documentation/current/refman/r_ins_the_r_package.html> and <http://foges.github.io/pogs/stp/r>. |
Authors: | Jose R. Zubizarreta [aut, cre], Yige Li [aut], Kwangho Kim [aut], Amine Allouah [ctb], Noah Greifer [ctb] |
Maintainer: | Jose R. Zubizarreta <[email protected]> |
License: | GPL-2 | GPL-3 |
Version: | 1.1.9 |
Built: | 2024-12-25 06:30:33 UTC |
Source: | CRAN |
Function for estimating causal contrasts and population means using the output from sbw.
estimate(object, out = NULL, digits = 6, ...)
object |
an object from function |
out |
outcome, a vector of strings with the names of the outcome variables. The default is the |
digits |
a scalar with the number of significant digits used to display the estimates. The default is 6. |
... |
ignored arguments. |
An estimate for the estimand of interest. The standard error is calculated with a robust sandwich variance estimator.
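As a rough illustration of the kind of quantity being estimated, the sketch below computes a weighted difference in means by hand in base R. It assumes an ATT-style contrast in which treated units keep uniform weights and control units carry balancing weights. The data, the weights `w`, and all variable names are hypothetical. This is not the package's internal code, and it does not reproduce the sandwich standard error.

```r
# Hypothetical toy data: 4 treated and 6 control units.
y_treated <- c(12, 15, 11, 14)
y_control <- c(9, 10, 13, 8, 12, 11)

# Hypothetical balancing weights for the controls (nonnegative, summing to 1).
w <- c(0.10, 0.20, 0.25, 0.05, 0.25, 0.15)
stopifnot(all(w >= 0), isTRUE(all.equal(sum(w), 1)))

# Weighted contrast: treated mean minus weighted control mean.
att_hat <- mean(y_treated) - sum(w * y_control)
att_hat
```

In practice, estimate() should be preferred, since it also reports a standard error.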
# Please see the examples in the function sbw below.
Data set from the National Supported Work Demonstration (Lalonde 1986, Dehejia and Wahba 1999). This data set is publicly available at https://users.nber.org/~rdehejia/data/.nswdata2.html.
data(lalonde)
A data frame with 614 observations, corresponding to 185 treated and 429 control subjects, and 10 variables. The treatment assignment indicator is the first variable of the data frame; the next eight columns are the covariates; the last column is the outcome:
the treatment assignment indicator (1 if treated, 0 otherwise)
a covariate, measured in years
a covariate, measured in years
a covariate indicating race (1 if black, 0 otherwise)
a covariate indicating race (1 if Hispanic, 0 otherwise)
a covariate indicating marital status (1 if married, 0 otherwise)
a covariate indicating high school diploma (1 if no degree, 0 otherwise)
a covariate, real earnings in 1974
a covariate, real earnings in 1975
the outcome, real earnings in 1978
https://users.nber.org/~rdehejia/data/.nswdata2.html
Dehejia, R., and Wahba, S. (1999), "Causal Effects in Nonexperimental Studies: Reevaluating the Evaluation of Training Programs," Journal of the American Statistical Association, 94, 1053-1062.
Lalonde, R. (1986), "Evaluating the Econometric Evaluations of Training Programs," American Economic Review, 76, 604-620.
Function for finding stable weights (that is, weights of minimum variance) that approximately balance the empirical distribution of the observed covariates.
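To make the idea of minimum-variance weights under approximate balance concrete, the sketch below poses a small quadratic program directly with quadprog::solve.QP. It is only an illustration under stated assumptions, not the code that sbw runs: the covariate matrix `X`, the target means `target`, and the tolerance `delta` are hypothetical, and the tolerance is applied in raw units without any standardization.

```r
library(quadprog)

set.seed(1)
n <- 50
X <- matrix(rnorm(n * 2), n, 2)   # hypothetical covariates of the units to be weighted
target <- c(0.3, -0.2)            # hypothetical target means
delta <- 0.05                     # hypothetical balance tolerance

# solve.QP minimizes (1/2) w'Dw - d'w subject to t(A) w >= b,
# with the first meq constraints treated as equalities.
Dmat <- diag(n)                   # with sum(w) fixed, minimizing w'w minimizes the variance of w
dvec <- rep(0, n)
Amat <- cbind(rep(1, n),          # sum(w) = 1 (equality)
              X,                  # X'w >= target - delta
              -X,                 # -X'w >= -(target + delta)
              diag(n))            # w >= 0
bvec <- c(1, target - delta, -(target + delta), rep(0, n))

fit <- solve.QP(Dmat, dvec, Amat, bvec, meq = 1)
w <- fit$solution

# Inspect the constraints: the weights sum to one and the weighted
# means fall within delta of the target.
sum(w)
abs(drop(crossprod(X, w)) - target)
```

A smaller `delta` forces tighter balance at the cost of more variable weights; if the target lies outside what the data can reproduce, the solver reports that the constraints are inconsistent.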
sbw(
  dat,
  ind = NULL,
  out = NULL,
  bal = list(bal_cov, bal_alg = TRUE, bal_tol, bal_std = "group",
    bal_gri = c(1e-04, 0.001, 0.002, 0.005, 0.01, 0.02, 0.05, 0.1),
    bal_sam = 1000),
  wei = list(wei_sum = TRUE, wei_pos = TRUE),
  sol = list(sol_nam = "quadprog", sol_dis = FALSE),
  par = list(par_est = "att", par_tar = NULL),
  mes = TRUE
)
dat |
data, a data frame with a treatment assignment or missingness indicator, covariates, and possibly outcomes (which are optional). |
ind |
treatment assignment or missingness indicator, a string with the name of the binary treatment or missingness indicator, equal to 1 if treated (missing) and 0 otherwise.
When |
out |
outcome, a vector of strings with the names of the outcome variables. The default is |
bal |
balance requirements, a list with the requirements for covariate balance with the form
|
wei |
weighting constraints, a list with all the weighting constraints with the form
|
sol |
solver, a list that specifies the solver option with the form
See the POGS manual for details. |
par |
parameter of interest, a list describing the parameter of interest or estimand with the form
|
mes |
a logical variable indicating whether the messages are printed. |
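The list-valued arguments can be assembled ahead of the call. The sketch below builds them with the element names and default values shown in the usage above. The covariate names "X1" and "X2" and the tolerance 0.02 are hypothetical placeholders, and the lists are only constructed here, not passed to sbw.

```r
# Hypothetical covariate names; replace with columns of your own data frame.
bal <- list(
  bal_cov = c("X1", "X2"),
  bal_alg = FALSE,   # use the fixed tolerance bal_tol instead of the tuning algorithm
  bal_tol = 0.02,
  bal_std = "group"
)
wei <- list(wei_sum = TRUE, wei_pos = TRUE)
sol <- list(sol_nam = "quadprog", sol_dis = FALSE)
par <- list(par_est = "att", par_tar = NULL)

# Inspect the structure before handing the lists to sbw().
str(list(bal = bal, wei = wei, sol = sol, par = par))
```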
A list with the following elements:
dat_weights, a data frame with the optimal weights dat_weights$sbw_weights;
ind, an argument provided by the user;
out, an argument provided by the user;
bal, an argument provided by the user;
wei, an argument provided by the user;
sol, an argument provided by the user;
par, an argument provided by the user;
effective_sample_size, effective sample size/sizes for the weighted group/groups;
objective_value, value/values of the objective function/functions at the optimum;
status, status of the solution. If the optimal weights are found, status = optimal; otherwise, the solution may not be optimal or may not exist, in which case an error will be returned with details specific to the solver used. For the solver "quadprog", no status code is available, so status = NA;
time, time elapsed to find the optimal solution;
shadow_price, dual variables or shadow prices of the covariate balance constraints;
balance_parameters, details of the balance parameters;
cstat, covariate balance statistic used in Wang and Zubizarreta (2020). A magnitude to be minimized to select the degree of approximate balance in bal$bal_gri.
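For intuition about effective_sample_size, the sketch below computes Kish's effective sample size, a common definition for weighted samples. Whether sbw uses exactly this formula is an assumption here, and the weights are hypothetical.

```r
# Hypothetical weights for one weighted group.
w <- c(0.10, 0.20, 0.25, 0.05, 0.25, 0.15)

# Kish's effective sample size: (sum of weights)^2 / sum of squared weights.
ess <- sum(w)^2 / sum(w^2)
ess

# Uniform weights recover the nominal sample size of 6.
w_uniform <- rep(1 / 6, 6)
ess_uniform <- sum(w_uniform)^2 / sum(w_uniform^2)
ess_uniform
```

The more the weights vary, the further the effective sample size falls below the nominal one, which is why stable (low-variance) weights are attractive.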
https://www.ibm.com/products/ilog-cplex-optimization-studio
https://www.gurobi.com/products/gurobi-optimizer/
https://www.mosek.com/products/mosek/
http://foges.github.io/pogs/stp/r
Chattopadhyay, A., Hase, C. H., and Zubizarreta, J. R. (2020), "Balancing Versus Modeling Approaches to Weighting in Practice," Statistics in Medicine, 39, 3227-3254.
Kang, J. D. Y., and Schafer, J. L. (2007), "Demystifying Double Robustness: A Comparison of Alternative Strategies for Estimating a Population Mean from Incomplete Data," Statistical Science, 22, 523-539.
Stuart, E. A. (2010), "Matching Methods for Causal Inference: A Review and a Look Forward," Statistical Science, 25, 1-21.
Wang, Y., and Zubizarreta, J. R. (2020), "Minimal Dispersion Approximately Balancing Weights: Asymptotic Properties and Practical Considerations," Biometrika, 107, 93-105.
Zubizarreta, J. R. (2015), "Stable Weights that Balance Covariates for Estimation with Incomplete Outcome Data," Journal of the American Statistical Association, 110, 910-922.
library(sbw)

# Simulate data
kangschafer = function(n_obs) {
  # Z are the true covariates
  # t is the indicator for the respondents (treated)
  # y is the outcome
  # X are the observed covariates
  # Returns Z, p, t, y and X sorted in increasing order by t
  Z = MASS::mvrnorm(n_obs, mu=rep(0, 4), Sigma=diag(4))
  p = 1/(1+exp(Z[, 1]-.5*Z[, 2]+.25*Z[, 3]+.1*Z[, 4]))
  t = rbinom(n_obs, 1, p)
  Zt = cbind(Z, p, t)
  Zt = Zt[order(t), ]
  Z = Zt[, 1:4]
  p = Zt[, 5]
  t = Zt[, 6]
  y = 210+27.4*Z[, 1]+13.7*Z[, 2]+13.7*Z[, 3]+13.7*Z[, 4]+rnorm(n_obs)
  X = cbind(exp(Z[, 1]/2), (Z[, 2]/(1+exp(Z[, 1])))+10,
            (Z[, 1]*Z[, 3]/25+.6)^3, (Z[, 2]+Z[, 4]+20)^2)
  return(list(Z=Z, p=p, t=t, y=y, X=X))
}
set.seed(1234)
n_obs = 200
aux = kangschafer(n_obs)
Z = aux$Z
p = aux$p
t = aux$t
y = aux$y
X = aux$X

# Generate data frame
t_ind = t
bal_cov = X
data_frame = as.data.frame(cbind(t_ind, bal_cov, y))
names(data_frame) = c("t_ind", "X1", "X2", "X3", "X4", "Y")

# Define treatment indicator and moment covariates
t_ind = "t_ind"
bal = list()
bal$bal_cov = c("X1", "X2", "X3", "X4")

# Set tolerances
bal$bal_tol = 0.02
bal$bal_std = "group"

# Solve for the Average Treatment Effect on the Treated, ATT (default)
bal$bal_alg = FALSE
sbwatt_object = sbw(dat = data_frame, ind = t_ind, out = "Y", bal = bal)

# # Solve for a Conditional Average Treatment Effect, CATE
# sbwcate_object = sbw(dat = data_frame, ind = t_ind, out = "Y", bal = bal,
# sol = list(sol_nam = "quadprog"), par = list(par_est = "cate", par_tar = "X1 > 1 & X3 <= 0.22"))

# # Solve for the population mean, POP
# tar = colMeans(bal_cov)
# names(tar) = bal$bal_cov
# sbwpop_object = sbw(dat = data_frame, ind = t_ind, out = "Y", bal = bal,
# sol = list(sol_nam = "quadprog"), par = list(par_est = "pop"))

# # Solve for a target population mean, AUX
# sbwaux_object = sbw(dat = data_frame, bal = bal,
# sol = list(sol_nam = "quadprog"), par = list(par_est = "aux", par_tar = tar*1.05))

# # Solve for the ATT using the tuning algorithm
# bal$bal_alg = TRUE
# bal$bal_sam = 1000
# sbwatttun_object = sbw(dat = data_frame, ind = t_ind, out = "Y", bal = bal,
# sol = list(sol_nam = "quadprog"), par = list(par_est = "att", par_tar = NULL))

# Check
summarize(sbwatt_object)
# summarize(sbwcate_object)
# summarize(sbwpop_object)
# summarize(sbwaux_object)
# summarize(sbwatttun_object)

# Estimate
estimate(sbwatt_object)
# estimate(sbwcate_object)
# estimate(sbwpop_object)
# estimate(sbwatttun_object)

# Visualize
visualize(sbwatt_object)
# visualize(sbwcate_object)
# visualize(sbwpop_object)
# visualize(sbwaux_object)
# visualize(sbwatttun_object)
sbw
Function for summarizing the output from sbw.
summarize(object, digits = 6, ...)
object |
an object from the class |
digits |
the number of significant digits that will be displayed. The default is 6. |
... |
ignored arguments. |
A list with the following elements:
variance, variance of the weights;
coefficient_variation, coefficient of variation of the weights;
effective_sample_size, effective sample size;
balance_table, mean/TASDM balance tables for samples before/after weighting;
shadow_price, dual tables or shadow prices for the balanced groups.
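The dispersion and balance diagnostics listed above can be approximated by hand. The sketch below, in base R, computes the variance and coefficient of variation of a set of weights, and an absolute standardized difference between a covariate mean and a target before and after weighting. The data, the weights, the target mean, and the choice of the unweighted standard deviation as the scaling factor are all assumptions made for illustration; summarize() may define these quantities differently.

```r
set.seed(2)
x <- rnorm(6)                                # hypothetical covariate for the weighted group
w <- c(0.10, 0.20, 0.25, 0.05, 0.25, 0.15)   # hypothetical weights
target_mean <- 0.1                           # hypothetical target mean

# Dispersion of the weights.
w_var <- var(w)
w_cv <- sd(w) / mean(w)

# Absolute standardized difference between the group mean and the target,
# first unweighted, then weighted.
asdm_before <- abs(mean(x) - target_mean) / sd(x)
asdm_after <- abs(sum(w * x) / sum(w) - target_mean) / sd(x)

c(variance = w_var, cv = w_cv, before = asdm_before, after = asdm_after)
```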
# Please see the examples in the function sbw above.
sbw
Function for visualizing the output from sbw.
visualize(object, plot_cov, ask = TRUE, ...)
object |
an object from function |
plot_cov |
names of covariates for which balance is to be displayed. If |
ask |
logical. If |
... |
ignored arguments. |
No return value. The figures will be shown in the Plots window.
# Please see the examples in the function sbw above.