---
title: "Doubly robust estimation of the LATE and LATT with drlate"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Doubly robust estimation of the LATE and LATT with drlate}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(collapse = TRUE, comment = "#>",
                      fig.width = 7, fig.height = 4.2, dpi = 110,
                      out.width = "100%")
```

## Overview

`drlate` estimates the local average treatment effect (LATE) and the local
average treatment effect on the treated (LATT) from observational data with a
binary instrument. It implements the complete estimator suite of Słoczyński,
Uysal, and Wooldridge: the doubly robust estimators of their 2022 paper (the
Stata command `drlate`, Statistical Software Components S459708) and the
Abadie-kappa weighting estimators of their 2025 *JBES* paper (the Stata
command `kappalate`, S459257), unified behind one interface and one inference
architecture.

The estimation core supports:

* **Doubly robust and regression/weighting estimators** (`method`):
  inverse-probability-weighted regression adjustment (`"ipwra"`, the default,
  doubly robust), inverse probability weighting (`"ipw"`), augmented inverse
  probability weighting (`"aipw"`, doubly robust), and regression adjustment
  (`"ra"`).
* **Abadie-kappa weighting estimators**: `"kappa"` (kappalate's `tau_a`),
  `"kappa0"` (`tau_a,0`), and `"kappa10"` (`tau_a,10`); together with the two
  IPW variants (= `tau_u` and `tau_a,1`) these complete the five-estimator
  menu of the 2025 paper.
* **Outcome and treatment models**: linear, logistic, probit, or Poisson, plus
  fractional-logit and fractional-probit for outcomes in `[0, 1]`, so the
  response may be continuous, binary, a count, or a proportion (matching the
  Stata `lateffects` `omodel`/`tmodel` options).
* **Instrument propensity score models** (`ivmodel`): logistic regression by
  maximum likelihood (default), covariate balancing (`"cbps"`, Imai and
  Ratkovic 2014), inverse probability tilting (`"ipt"`, Graham, Pinto, and
  Egel 2012), or probit maximum likelihood (`"probit"`, for the weighting
  estimators).
* **Normalized** (default) or unnormalized weighting for IPW and AIPW.
* Sampling weights and cluster-robust standard errors.

Beyond the two Stata commands, the package adds a common workflow layer and
makes it available for the kappa weighting estimators too, where `kappalate`
itself offers only robust and cluster-robust standard errors:

* **Diagnostics**: `plot()` displays of propensity-score overlap, covariate
  balance, and implied weights; `balance()` tables and the `balance_test()`
  overidentification balance test; `complier_means()` complier profiling;
  first-stage strength on every printout.
* **Weak-instrument-robust inference**: Fieller confidence sets via
  `confint(method = "fieller")` (for the ratio-form estimators, including
  `"kappa"` and `"kappa0"`).
* **Bootstrap inference**: `vcov = "bootstrap"` (cluster-aware,
  parallelizable).
* **The DR Hausman test** of unconfoundedness from the 2022 paper's Section 5
  (`dr_hausman()`), with an analytic standard error from a jointly stacked
  moment system.
* **Estimator comparison**: `drlate_compare()` with a dot-whisker plot.

The estimators are validated against the authors' Stata commands by
golden-fixture parity (estimates and standard errors), and the inference
extensions by Monte Carlo.

### Joint inference

`drlate` computes point estimates from sequential weighted regressions. For
inference, it stacks the moment conditions of *every* estimation stage (the
instrument propensity score, the outcome regressions, the treatment
regressions, and the causal aggregates) into one just-identified M-estimation
system; the variance is the sandwich \(A^{-1} B A^{-\top} / n\) evaluated at
the estimates.
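
In generic M-estimation notation, the pieces of that sandwich can be written
out as follows. This is only a sketch: the symbols \(\theta\), \(\psi\), and
\(W_i\) are notation assumed here for illustration, not taken from the package
documentation. Let \(\theta\) collect the parameters of all stages and let
\(\psi(W_i, \theta)\) stack their moment functions, so that the estimate solves
\(\frac{1}{n}\sum_{i=1}^{n} \psi(W_i, \hat\theta) = 0\). Then

\[
A = \frac{1}{n}\sum_{i=1}^{n} \frac{\partial \psi(W_i, \hat\theta)}{\partial \theta^{\top}},
\qquad
B = \frac{1}{n}\sum_{i=1}^{n} \psi(W_i, \hat\theta)\,\psi(W_i, \hat\theta)^{\top},
\qquad
\widehat{\operatorname{Var}}(\hat\theta) = \frac{A^{-1} B A^{-\top}}{n}.
\]

Because later-stage moments depend on earlier-stage parameters, \(A\) has
nonzero off-diagonal blocks, and inverting the full matrix carries first-stage
uncertainty into the variance of the causal estimate.
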
This reproduces the Stata package's `gmm, onestep iterate(0)` construction:
standard errors account for the estimation uncertainty of each stage,
including the first-stage propensity score.

## Example

The bundled `drlate_sim` data simulates a binary instrument `rsncode`, a
binary treatment `nvstat` with two-sided noncompliance, and outcomes on three
scales. The true complier effect on `lwage` is 0.5.

```{r}
library(drlate)
data(drlate_sim)

fit <- drlate(lwage ~ age + educ,     # outcome model
              nvstat ~ age + educ,    # treatment model
              rsncode ~ age + educ,   # instrument propensity score model
              data = drlate_sim)
summary(fit)
```

The three reported quantities mirror the Stata package's output: the causal
estimate (LATE), the intent-to-treat effect of the instrument on the outcome
(numerator), and the first-stage effect of the instrument on the treatment
(denominator), with the LATE formed as their ratio.

```{r}
coef(fit)
confint(fit)
```

### Other estimators

```{r}
# AIPW with unnormalized moments
drlate(lwage ~ age + educ, nvstat ~ age + educ, rsncode ~ age + educ,
       data = drlate_sim, method = "aipw", normalized = FALSE)

# IPW: no covariates in the outcome/treatment equations
drlate(lwage ~ 1, nvstat ~ 1, rsncode ~ age + educ,
       data = drlate_sim, method = "ipw")

# Regression adjustment: no instrument covariates
drlate(lwage ~ age + educ, nvstat ~ age + educ, rsncode ~ 1,
       data = drlate_sim, method = "ra")
```

### Abadie-kappa weighting estimators

The kappa methods are pure weighting estimators: covariates enter only through
the instrument propensity score, so the outcome and treatment formulas are
intercept-only.
The printed output shows each estimator's `kappalate` name:

```{r}
# Normalized Abadie kappa (kappalate tau_a,10); reports the LATE only,
# since the estimator is a difference of two ratios
drlate(lwage ~ 1, nvstat ~ 1, rsncode ~ age + educ,
       data = drlate_sim, method = "kappa10")

# Unnormalized Abadie kappa (tau_a); Fieller sets available
fit_k <- drlate(lwage ~ 1, nvstat ~ 1, rsncode ~ age + educ,
                data = drlate_sim, method = "kappa")
confint(fit_k, method = "fieller")
```

### LATT, other model families, and IPT

```{r}
# LATT with an inverse-probability-tilted instrument propensity score
drlate(lwage ~ age + educ, nvstat ~ age + educ, rsncode ~ age + educ,
       data = drlate_sim, estimand = "latt", ivmodel = "ipt")

# Poisson outcome model for the positive wage level
drlate(kwage ~ age + educ, nvstat ~ age + educ, rsncode ~ 1,
       data = drlate_sim, method = "ra", omodel = "poisson")
```

### Clustered standard errors and weights

```{r}
drlate(lwage ~ age, nvstat ~ age, rsncode ~ age,
       data = drlate_sim, cluster = drlate_sim$educ)
```

## Diagnostics

`plot()` provides the standard design checks: propensity-score overlap,
covariate balance before and after weighting (the love plot), and the implied
weight distributions. `balance()` returns the standardized mean differences as
a data frame:

```{r diag-balance}
fit <- drlate(lwage ~ age + educ, nvstat ~ age + educ, rsncode ~ age + educ,
              data = drlate_sim)
plot(fit, type = "balance")
balance(fit)
```

```{r diag-overlap}
plot(fit, type = "overlap")
```

`complier_means()` profiles how the compliers differ from the population
(weighting by Abadie's kappa), and `balance_test()` runs the Imai--Ratkovic
overidentification test of whether the propensity-score model balances the
covariates. Both diagnostics mirror the postestimation suite of Stata's
`lateffects` command:

```{r diag-postest}
complier_means(fit)
balance_test(fit)
```

## Inference beyond the default sandwich

Every printout reports the first-stage z (with z² ≈ F for a single
binary instrument) and flags weakness below F = 10. The package adds two
inference tools:

```{r}
# Weak-instrument-robust Fieller confidence set (may be unbounded when
# the first stage is weak -- that is the honest answer)
confint(fit, method = "fieller")

# Nonparametric bootstrap (percentile CIs; clusters resampled whole
# when `cluster` is supplied)
fit_b <- drlate(lwage ~ age + educ, nvstat ~ age + educ, rsncode ~ age + educ,
                data = drlate_sim, vcov = "bootstrap",
                boot_reps = 199, boot_seed = 1)
confint(fit_b)
```

## The DR Hausman test of unconfoundedness

Under one-sided noncompliance (nobody takes the treatment without the
instrument), the instrument-based LATT equals the unconfoundedness-based ATT
if treatment assignment is unconfounded given the covariates. Section 5 of the
2022 paper turns this equality into a heterogeneity-robust Hausman test,
implemented here; the Stata package does not provide it:

```{r}
d_os <- drlate_sim
d_os$nvstat[d_os$rsncode == 0] <- 0L
dr_hausman(lwage ~ age + educ, nvstat ~ age + educ, rsncode ~ age + educ,
           data = d_os)
```

The simulated treatment is confounded by construction, and the test rejects.

## Comparing estimators

```{r compare-plot}
cmp <- drlate_compare(lwage ~ age + educ, nvstat ~ age + educ,
                      rsncode ~ age + educ, data = drlate_sim)
cmp
plot(cmp)
```

## Replicating the Stata examples

The Stata help file's examples use a public extract from the Survey of Income
and Program Participation (SIPP).
The equivalent R calls are:

```{r, eval = FALSE}
sipp <- haven::read_dta("https://people.brandeis.edu/~tslocz/sipp.dta")
sipp <- subset(as.data.frame(sipp),
               !is.na(kwage) & !is.na(educ) & rsncode != 999)
sipp$lwage <- log(sipp$kwage)

# Stata: drlate (lwage age_5) (nvstat age_5) (rsncode age_5)
drlate(lwage ~ age_5, nvstat ~ age_5, rsncode ~ age_5, data = sipp)

# Stata: drlate (lwage age_5) (nvstat age_5) (rsncode age_5, ipt), latt
drlate(lwage ~ age_5, nvstat ~ age_5, rsncode ~ age_5, data = sipp,
       ivmodel = "ipt", estimand = "latt")

# Stata: kappalate lwage (nvstat = rsncode) age_5, zmodel(logit) which(all)
drlate(lwage ~ 1, nvstat ~ 1, rsncode ~ age_5, data = sipp,
       method = "kappa")   # tau_a; likewise "kappa0", "kappa10",
                           # and method = "ipw" for tau_u / tau_a,1

# Stata: kappalate lwage (nvstat = rsncode) age_5, zmodel(probit)
drlate(lwage ~ 1, nvstat ~ 1, rsncode ~ age_5, data = sipp,
       method = "kappa", ivmodel = "probit")
```

The package's test suite verifies numerical equivalence of estimates and
standard errors against fixtures generated by both Stata commands on this
dataset (see `inst/stata/make-fixtures.do` and
`inst/stata/make-kappalate-fixtures.do`).

## Citation

If you use drlate in your research, please cite the R package, the
methodological paper for the estimators you use, and the original Stata module
(see `citation("drlate")` for BibTeX entries):

> Venkitasubramanian, K. (2026). drlate: Doubly Robust Estimation of the
> Local Average Treatment Effect in R. R package version
> `r as.character(packageVersion("drlate"))`.
> https://github.com/kvenkita/drlate

> Słoczyński, T., Uysal, S. D., & Wooldridge, J. M. (2025). Abadie's Kappa
> and Weighting Estimators of the Local Average Treatment Effect.
> *Journal of Business & Economic Statistics* 43(1), 164–177.

> Uysal, D., Słoczyński, T., & Wooldridge, J. M. (2026).
DRLATE: Stata
> module to perform doubly robust estimation of the local average
> treatment effect (LATE) and the local average treatment effect on the
> treated (LATT). Statistical Software Components S459708, Boston College
> Department of Economics.

## References

* Słoczyński, T., S. D. Uysal, and J. M. Wooldridge (2022). "Doubly Robust
  Estimation of Local Average Treatment Effects Using Inverse Probability
  Weighted Regression Adjustment." arXiv:2208.01300.
* Słoczyński, T., S. D. Uysal, and J. M. Wooldridge (2025). "Abadie's Kappa
  and Weighting Estimators of the Local Average Treatment Effect." *Journal
  of Business & Economic Statistics* 43(1), 164–177.
* Abadie, A. (2003). "Semiparametric Instrumental Variable Estimation of
  Treatment Response Models." *Journal of Econometrics* 113(2), 231–263.
* Donald, S. G., Y.-C. Hsu, and R. P. Lieli (2014). "Testing the
  Unconfoundedness Assumption via Inverse Probability Weighted Estimators of
  (L)ATT." *Journal of Business & Economic Statistics* 32(3), 395–415.
* Fieller, E. C. (1954). "Some Problems in Interval Estimation." *Journal of
  the Royal Statistical Society, Series B* 16(2), 175–185.
* Graham, B. S., C. C. de Xavier Pinto, and D. Egel (2012). "Inverse
  Probability Tilting for Moment Condition Models with Missing Data." *Review
  of Economic Studies* 79(3), 1053–1079.
* Imai, K., and M. Ratkovic (2014). "Covariate Balancing Propensity Score."
  *Journal of the Royal Statistical Society, Series B* 76(1), 243–263.