| Title: | Sensitivity of Formula Instrument to Shock Design |
|---|---|
| Description: | Functions to implement the formula instrument method in Borusyak and Hull (2023) <doi:10.3982/ECTA19367> and examine its sensitivity to the assumed distributional of counterfactual shocks. |
| Authors: | Peizan Sheng [aut, cre] |
| Maintainer: | Peizan Sheng <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.1.0 |
| Built: | 2026-06-24 13:25:21 UTC |
| Source: | https://github.com/cran/formulaiv |
A named list of four formulaiv() bound tables for the Borusyak and Hull
(2023) China market-access application, used as the expected values in
tests/testthat/test-bh2023.R. Each element is the $beta data frame
returned by formulaiv() for one specification. Because they are produced by
the package itself, that test is a self-regression check; regenerate the list
with source("data-raw/BH_precomputed_results.R").
BH_precomputed_resultsBH_precomputed_results
A named list of four data frames:
BH_sens_joint_cons_no_controlsjoint set, no controls,
eps = seq(1, 20, 0.2)
BH_sens_joint_cons_with_controlsjoint set, with controls,
eps = seq(1, 20, 0.2)
BH_sens_marginal_cons_no_controlsmarginal set, no controls,
eps = seq(1, 2.5, 0.25)
BH_sens_marginal_cons_with_controlsmarginal set, with controls,
eps = seq(1, 2.5, 0.25)
Each data frame has columns eps, lb, and ub (the sensitivity bounds).
formulaiv() evaluates the sensitivity of formula instrument estimates to
small or large deviations away from an assumed baseline distribution of shocks.
formulaiv(y, x, z, f, eps, cons, controls = NULL, denom_tol = 1e-08)formulaiv(y, x, z, f, eps, cons, controls = NULL, denom_tol = 1e-08)
y |
outcome ( |
x |
endogenous regressor / treatment ( |
z |
formula instrument ( |
f |
counterfactual shock realizations, one scenario per column
( |
eps |
degree of sensitivity of interest (
|
cons |
constraint list specifying the sensitivity set. One of:
|
controls |
control variables
( |
denom_tol |
numeric tolerance below which the first-stage denominator is
treated as zero when reading off its sign over the feasible set (default
|
A list of two data frames:
beta — the sensitivity bounds, one row per eps
(columns eps, lb, ub).
p_opt — the optimal weights achieving each bound, in tidy long form
(columns eps, scenario, lb, ub; MS rows).
Each bound is the optimal value of a linear program solved with
lpSolve::lp(), whose simplex routine is sensitive to the magnitude of the
problem coefficients. Inputs on a very large scale — for example population-
or weight-aggregated sums — can make the solver report a numerical failure
(lpSolve status 5) instead of returning a bound. The estimate is invariant
to multiplying y and x by a common positive constant, so when this happens
it is enough to divide both y and x by a single scale factor that brings
them to within a few orders of magnitude of 1: every bound (beta$lb,
beta$ub) is unchanged. (Rescaling only one of y or x, or rescaling them by
different factors, does change the estimate.)
# BH market-access data (bundled with the package) y <- ma$emp_growth # outcome (N x 1) x <- ma$dma0 # endogenous regressor (N x 1) z <- x # formula instrument (N x 1) f <- ma[, paste0("ma_nlink", 1:1999)] - ma$ma2007 # shock draws (N x S) pbar <- rep(1 / 1999, 1999) # baseline weights (S x 1) # Joint sensitivity set without controls, evaluated at eps = 1.5 and 2. # Wrapped in \donttest{} because solving the linear-fractional programs over # all S = 1999 shock draws takes longer than CRAN's 5s example budget. formulaiv( y = y, x = x, z = z, f = f, eps = c(1.5, 2), cons = list(name = "joint", pbar = pbar) )$beta# BH market-access data (bundled with the package) y <- ma$emp_growth # outcome (N x 1) x <- ma$dma0 # endogenous regressor (N x 1) z <- x # formula instrument (N x 1) f <- ma[, paste0("ma_nlink", 1:1999)] - ma$ma2007 # shock draws (N x S) pbar <- rep(1 / 1999, 1999) # baseline weights (S x 1) # Joint sensitivity set without controls, evaluated at eps = 1.5 and 2. # Wrapped in \donttest{} because solving the linear-fractional programs over # all S = 1999 shock draws takes longer than CRAN's 5s example budget. formulaiv( y = y, x = x, z = z, f = f, eps = c(1.5, 2), cons = list(name = "joint", pbar = pbar) )$beta
high speed railway (HSR) data from Borusyak and Hull (2023) replication file
lineline
A data frame with 150 rows and 2022 columns. The package directly uses the following groups of columns:
line characteristics used in the ML model: line_type_en, speed,
computed_length, nlinks, year_approved, plan_type
realized opening year: year_operate0
simulated opening years: year_operate1 to year_operate1999
https://zenodo.org/records/8286785
city-level data from Borusyak and Hull (2023) replication file
mama
A data frame with 275 rows and 2013 columns. The sensitivity analysis uses the following groups of columns:
identifiers and geography: cityid, year, latitude, longitude,
scaled_lat, scaled_lon, distance_B
outcomes and baseline market access: emp_growth, dma0, ma0, ma2007
market-access simulations: ma_nlink1 to ma_nlink1999
expected market access: ma_nlink_pscore, dma_nlink_pscore
recentered market access: ma_nlink_rc
https://zenodo.org/records/8286785