china-application

library(formulaiv)

Borusyak and Hull (2023) develop a new approach to estimating the causal effects of treatments or instruments that combine multiple sources of variation according to a known formula. The key challenge of applying this “formula instrument approach” is in specifying the distribution over counterfactual shocks. The formulaiv package enables researchers to evaluate the sensitivity of their estimates to small or large deviations away from an assumed baseline distribution of shocks. This guide walks through the method using the China high-speed rail (HSR) application from Borusyak and Hull (2023).

Setup data

The first step is to prepare the market access and high-speed railway data from Borusyak and Hull (2023). Every object is annotated with its shape (N is the number of observations, S is the number of counterfactual shocks, J is the number of controls, L is the number of lines).

# Set up BH market access data
y <- ma$emp_growth # N x 1 vector
x <- ma$dma0 # N x 1 vector
z <- x # N x 1 vector
controls <- ma[, c("distance_B", "scaled_lat", "scaled_lon")] # N x J dataframe
f <- ma[, paste0("ma_nlink", 1:1999)] - ma$ma2007 # N x S dataframe
pbar <- rep(1 / 1999, 1999) # S x 1 vector
# In the BH market access example, N = 275, S = 1999, J = 3

# Set up BH high speed railway data
# Generate dummies of line opening status in 2016
for (i in 1:1999) {
  line[[paste0("open2016_sim", i)]] <- as.integer(
    line[[paste0("year_operate", i)]] <= 2016
  )
}

# Generate probability of line opening across S simulations
g <- line[, paste0("open2016_sim", 1:1999)] # L x S dataframe
qbar <- as.numeric(as.matrix(g) %*% pbar) # L x 1 vector
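
The construction of g and qbar can be checked by hand on a small example. The sketch below uses made-up opening years (the `_toy` objects are hypothetical and are not part of the BH data or the formulaiv package) for L = 3 lines and S = 4 simulations. It builds the opening dummies in one vectorised step instead of a loop and then averages them with the baseline weights.

```r
# Toy data (hypothetical, not the BH data): L = 3 lines, S = 4 simulations
year_toy <- matrix(
  c(2012, 2018, 2015, 2020,
    2017, 2016, 2019, 2014,
    2010, 2011, 2013, 2016),
  nrow = 3, byrow = TRUE
)
pbar_toy <- rep(1 / 4, 4) # S x 1 vector of uniform baseline weights

# Opening dummies for all simulations at once (L x S integer matrix)
g_toy <- (year_toy <= 2016) * 1L

# Baseline probability that each line is open by 2016 (L x 1 vector)
qbar_toy <- as.numeric(g_toy %*% pbar_toy)
qbar_toy
# [1] 0.5 0.5 1.0
```

Each entry of qbar_toy is the share of simulations in which that line opens by 2016, which is what qbar measures in the real data when pbar is uniform.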

Sensitivity analysis to Borusyak and Hull (2023)

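Next, we assess how sensitive the formula instrument estimates in Borusyak and Hull (2023) are to the assumed distribution of the underlying shocks. We consider two ways of specifying the sensitivity set, each capturing a different way of measuring deviations from the baseline: a joint set, specified through the baseline probabilities pbar over the S counterfactual shocks, and a marginal set, specified through the line-opening indicators g and their baseline probabilities qbar. Each is estimated with and without geographic controls.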

  • Joint sensitivity set without geographic controls
BH_sens_joint_cons_no_controls <- formulaiv(
  y = y,
  x = x,
  z = z,
  f = f,
  eps = seq(1, 20, 0.2),
  cons = list(name = "joint", pbar = pbar)
)$beta
  • Joint sensitivity set with geographic controls
BH_sens_joint_cons_with_controls <- formulaiv(
  y = y,
  x = x,
  z = z,
  f = f,
  eps = seq(1, 20, 0.2),
  cons = list(name = "joint", pbar = pbar),
  controls = controls
)$beta
  • Marginal sensitivity set without geographic controls
BH_sens_marginal_cons_no_controls <- formulaiv(
  y = y,
  x = x,
  z = z,
  f = f,
  eps = seq(1, 2.5, 0.25),
  cons = list(name = "marginal", g = g, qbar = qbar)
)$beta
  • Marginal sensitivity set with geographic controls
BH_sens_marginal_cons_with_controls <- formulaiv(
  y = y,
  x = x,
  z = z,
  f = f,
  eps = seq(1, 2.5, 0.25),
  cons = list(name = "marginal", g = g, qbar = qbar),
  controls = controls
)$beta

Our sensitivity analysis shows that small changes in the distribution of shocks underlying the Borusyak and Hull (2023) formula instrument produce instrumental variable estimates that range from large negative effects to large positive effects.
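
To build intuition for why the shock distribution matters, recall that Borusyak and Hull (2023) recenter the instrument by subtracting its expectation over counterfactual shocks. The sketch below is a toy illustration with made-up numbers, not the algorithm that formulaiv runs and not the package's definition of a sensitivity set. The `_toy` objects and `p_alt` are hypothetical. It shows that reweighting the counterfactual draws changes the expected instrument, and therefore the recentered instrument that enters the IV estimate.

```r
# Toy data (hypothetical): N = 2 locations, S = 3 counterfactual shocks
f_toy <- matrix(
  c(1, 0, -1,
    2, 2,  2),
  nrow = 2, byrow = TRUE
)                            # N x S counterfactual instrument values
z_toy <- c(0.5, 2)           # N x 1 realized instrument

pbar_toy <- rep(1 / 3, 3)    # baseline: all draws equally likely
p_alt <- c(0.5, 0.3, 0.2)    # alternative: first draw upweighted

# Expected instrument under each distribution (N x 1 vectors)
mu_base <- as.numeric(f_toy %*% pbar_toy)
mu_alt <- as.numeric(f_toy %*% p_alt)

# Recentered instrument under each distribution
z_recentered_base <- z_toy - mu_base
z_recentered_alt <- z_toy - mu_alt
```

In this example the first location's recentered instrument moves from 0.5 to 0.2 when the weights change, while the second location, whose counterfactual values are identical across draws, is unaffected. Locations whose exposure varies a lot across counterfactual shocks are the ones most affected by the choice of distribution.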

Reference

Borusyak, K. and Hull, P. (2023). “Nonrandom Exposure to Exogenous Shocks.” Econometrica, 91(6), 2155–2185. https://doi.org/10.3982/ECTA19367