---
title: "china-application"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{china-application}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup}
library(formulaiv)
```

Borusyak and Hull (2023) develop a new approach to estimating the causal effects of treatments or instruments that combine multiple sources of variation according to a known formula. The key challenge in applying this "formula instrument approach" is specifying the distribution over counterfactual shocks. The `formulaiv` package lets researchers evaluate how sensitive their estimates are to small or large deviations from an assumed baseline distribution of shocks. This guide walks through the method using the China high-speed rail (HSR) application from Borusyak and Hull (2023).

## Setup data

The first step is to prepare the market access and high-speed rail data from Borusyak and Hull (2023), stored in the data frames `ma` and `line`. Every object is annotated with its shape (`N` is the number of observations, `S` is the number of counterfactual shocks, `J` is the number of controls, `L` is the number of rail lines).

```{r}
# Set up BH market access data
y <- ma$emp_growth # N x 1 vector
x <- ma$dma0 # N x 1 vector
z <- x # N x 1 vector
controls <- ma[, c("distance_B", "scaled_lat", "scaled_lon")] # N x J dataframe
f <- ma[, paste0("ma_nlink", 1:1999)] - ma$ma2007 # N x S dataframe
pbar <- rep(1 / 1999, 1999) # S x 1 vector
# In the BH market access example, N = 275, S = 1999, J = 3

# Set up BH high-speed rail data
# Generate dummies for whether each line is open by 2016 in each simulation
for (i in 1:1999) {
  line[[paste0("open2016_sim", i)]] <- as.integer(
    line[[paste0("year_operate", i)]] <= 2016
  )
}
# Generate probability of line opening across S simulations
g <- line[, paste0("open2016_sim", 1:1999)] # L x S dataframe
qbar <- as.numeric(as.matrix(g) %*% pbar) # L x 1 vector
```

## Sensitivity analysis to Borusyak and Hull (2023)

Next, we assess how sensitive the formula instrument estimates in Borusyak and Hull (2023) are to the assumed distribution of the underlying shocks. We consider two ways of specifying the sensitivity set, joint and marginal, intended to capture different ways of measuring deviations from the baseline distribution, each with and without geographic controls.

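The marginal specifications take `g` and `qbar` as inputs, so it may help to see what `qbar` holds on a small case. The sketch below uses made-up numbers (2 lines, 4 simulations) rather than the BH data, and the names `g_toy`, `pbar_toy`, and `qbar_toy` are hypothetical. It only illustrates the matrix product used in the setup chunk: each entry of `qbar` is a line's opening probability under the baseline weights.

```r
# Toy inputs (made-up, not the BH data): L = 2 lines, S = 4 simulations.
# Each row is a line; each column is one simulated counterfactual.
g_toy <- matrix(
  c(1, 0, 1, 1,
    0, 0, 1, 0),
  nrow = 2, byrow = TRUE
)

# Uniform baseline weights over the S simulations, as in the setup chunk
pbar_toy <- rep(1 / 4, 4)

# Same product as in the setup chunk: weighted share of simulations
# in which each line is open
qbar_toy <- as.numeric(g_toy %*% pbar_toy)
qbar_toy
# 0.75 0.25
```

With uniform weights this reduces to `rowMeans(g_toy)`; non-uniform weights would tilt the opening probabilities toward the more heavily weighted simulations.
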
- Joint sensitivity set without geographic controls

```{r, eval=FALSE}
BH_sens_joint_cons_no_controls <- formulaiv(
  y = y, x = x, z = z, f = f,
  eps = seq(1, 20, 0.2),
  cons = list(name = "joint", pbar = pbar)
)$beta
```

- Joint sensitivity set with geographic controls

```{r, eval=FALSE}
BH_sens_joint_cons_with_controls <- formulaiv(
  y = y, x = x, z = z, f = f,
  eps = seq(1, 20, 0.2),
  cons = list(name = "joint", pbar = pbar),
  controls = controls
)$beta
```

- Marginal sensitivity set without geographic controls

```{r, eval=FALSE}
BH_sens_marginal_cons_no_controls <- formulaiv(
  y = y, x = x, z = z, f = f,
  eps = seq(1, 2.5, 0.25),
  cons = list(name = "marginal", g = g, qbar = qbar)
)$beta
```

- Marginal sensitivity set with geographic controls

```{r, eval=FALSE}
BH_sens_marginal_cons_with_controls <- formulaiv(
  y = y, x = x, z = z, f = f,
  eps = seq(1, 2.5, 0.25),
  cons = list(name = "marginal", g = g, qbar = qbar),
  controls = controls
)$beta
```

Our sensitivity analysis shows that small changes in the distribution of shocks used in their formula instrument lead to instrumental variable estimates that range anywhere from large negative effects to large positive effects.

## Reference

Borusyak, K. and Hull, P. (2023). "Nonrandom Exposure to Exogenous Shocks." *Econometrica*, 91(6), 2155–2185.