Package 'ivDiag'

Title: Estimation and Diagnostic Tools for Instrumental Variables Designs
Description: Estimation and diagnostic tools for instrumental variables designs, implementing the guidelines proposed in Lal et al. (2023) <arXiv:2303.11399>, including bootstrapped confidence intervals, effective F-statistic, Anderson-Rubin test, valid t-ratio test, and local-to-zero tests.
Authors: Apoorva Lal [aut], Yiqing Xu [aut, cre]
Maintainer: Yiqing Xu <[email protected]>
License: MIT + file LICENSE
Version: 1.0.6
Built: 2024-12-12 07:06:59 UTC
Source: CRAN


IV Estimation and Diagnostics

Description

Conducts various estimation and diagnostic procedures for instrumental variable designs in one shot.

Details

Provides estimation and diagnostic tools for instrumental variables designs, implementing the guidelines proposed in Lal et al. (2023) <arXiv:2303.11399>, including bootstrapped confidence intervals, effective F-statistic, Anderson-Rubin test, valid t-ratio test, and local-to-zero tests.

See ivDiag for details.

Author(s)

Apoorva Lal; Yiqing Xu

References

Lal, Apoorva, Mackenzie William Lockhart, Yiqing Xu, and Ziwen Zu. 2023. "How Much Should We Trust Instrumental Variable Estimates in Political Science? Practical Advice Based on 67 Replicated Studies." Available at: https://yiqingxu.org/papers/english/2021_iv/LLXZ.pdf


Anderson-Rubin Test

Description

Performs the Anderson-Rubin test, which is robust to weak instruments.

Usage

AR_test(data, Y, D, Z, controls, FE = NULL, cl = NULL, weights = NULL, 
  prec = 4, CI = TRUE, alpha = 0.05, parallel = NULL, cores = NULL)

Arguments

data

name of a dataframe.

Y

a string indicating the outcome variable.

D

a string indicating the treatment variable.

Z

a vector of strings indicating the instrumental variables.

controls

a vector of strings indicating the control variables.

FE

a vector of strings indicating the fixed effects variables.

cl

a string indicating the clustering variable.

weights

a string indicating the variable that stores weights.

CI

a logical flag controlling whether to calculate the confidence interval using the inversion method.

prec

precision of results (4 by default).

alpha

level of statistical significance; the default is 0.05.

parallel

a logical flag controlling parallel computing.

cores

setting the number of cores.

Value

Fstat

F statistic, degrees of freedom, and p-value.

ci.print

Confidence interval via inversion (printed version).

ci

Confidence interval via inversion (numeric version).

bounded

Whether the confidence interval is bounded.

References

Chernozhukov, Victor, and Christian Hansen. 2008. "The Reduced Form: A Simple Approach to Inference with Weak Instruments." Economics Letters 100 (1): 68–71.

See Also

ivDiag

Examples

data(ivDiag)
AR.out <- AR_test(data = rueda, Y = "e_vote_buying", D = "lm_pob_mesa", 
    Z = "lz_pob_mesa_f", controls = c("lpopulation", "lpotencial"),
    cl = "muni_code", CI = FALSE)
library(testthat)    
test_that("Check AR results", {
  expect_equal(as.numeric(AR.out$Fstat[1]), 48.4768)
})
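
The logic of the test can be sketched in base R, following the reduced-form approach of Chernozhukov and Hansen (2008): under the null that the treatment effect equals beta0, the instrument should have no explanatory power for Y - beta0 * D once the controls are included. The sketch below uses simulated data and classical (homoskedastic) F tests. The helper ar_stat() and all variable names are hypothetical, and this is not the package's implementation, which also handles fixed effects, clustering, and weights.

```r
# Simulated just-identified IV design (all names are illustrative).
set.seed(1)
n <- 2000
x <- rnorm(n)                            # control
z <- rnorm(n)                            # instrument
u <- rnorm(n)                            # unobserved confounder
d <- 0.5 * z + 0.3 * x + u + rnorm(n)    # treatment
y <- 1.0 * d + 0.2 * x + u + rnorm(n)    # outcome; true effect of d is 1

# AR statistic for H0: beta = beta0. Regress y - beta0 * d on the
# instrument and the control, then F-test the instrument's coefficient.
ar_stat <- function(beta0) {
  r <- y - beta0 * d
  restricted   <- lm(r ~ x)
  unrestricted <- lm(r ~ x + z)
  tab <- anova(restricted, unrestricted)
  c(F = tab$F[2], p = tab$`Pr(>F)`[2])
}

ar_null <- ar_stat(0)   # beta0 = 0 reproduces the reduced-form F test
ar_true <- ar_stat(1)   # evaluated at the true effect

# Confidence set by inversion: keep every grid value that is not rejected.
grid  <- seq(-1, 3, by = 0.01)
pvals <- vapply(grid, function(b) ar_stat(b)[["p"]], numeric(1))
ci    <- range(grid[pvals > 0.05])
```

Because the set is built by collecting non-rejected values, it can be unbounded when the instrument is weak, which is what the bounded element of the returned list reports.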

Effective F

Description

Computes the effective F statistic.

Usage

eff_F(data, Y, D, Z, controls = NULL, FE = NULL, cl = NULL, 
  weights = NULL, prec = 4)

Arguments

data

name of a dataframe.

Y

a string indicating the outcome variable.

D

a string indicating the treatment variable.

Z

a vector of strings indicating the instrumental variables.

controls

a vector of strings indicating the control variables.

FE

a vector of strings indicating the fixed effects variables.

cl

a string indicating the clustering variable.

weights

a string indicating the variable that stores weights.

prec

precision of results (4 by default).

Value

the effective F statistic.

References

Olea, José Luis Montiel, and Carolin Pflueger. 2013. "A Robust Test for Weak Instruments." Journal of Business & Economic Statistics 31 (3): 358–69.

See Also

ivDiag

Examples

data(ivDiag)
effF <- eff_F(data = rueda, Y = "e_vote_buying", D = "lm_pob_mesa", 
    Z = "lz_pob_mesa_f", controls = c("lpopulation", "lpotencial"), 
    cl = "muni_code")
library(testthat)
test_that("Check effective F", {
  expect_equal(floor(as.numeric(effF)), 8598)
})
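
With a single instrument, the effective F statistic coincides with the heteroskedasticity-robust first-stage F statistic, that is, the squared robust t-ratio on the instrument. The base R sketch below illustrates this on simulated data with a hand-computed HC1 sandwich variance. The variable names are hypothetical and the choice of HC1 is an assumption made for illustration; eff_F may use a different small-sample correction, and it uses a cluster-robust variance when cl is supplied.

```r
set.seed(2)
n <- 1000
x <- rnorm(n)                                      # control
z <- rnorm(n)                                      # instrument
d <- 0.4 * z + 0.3 * x + rnorm(n) * (1 + abs(z))   # heteroskedastic noise

fs <- lm(d ~ z + x)          # first-stage regression
X  <- model.matrix(fs)
e  <- residuals(fs)

# HC1 sandwich variance: (X'X)^-1 X' diag(e^2) X (X'X)^-1 * n / (n - k)
bread <- solve(crossprod(X))
meat  <- crossprod(X * e)    # each row of X is scaled by its residual
V_hc1 <- bread %*% meat %*% bread * n / (n - ncol(X))

robust_F    <- unname(coef(fs)["z"]^2 / V_hc1["z", "z"])
classical_F <- unname(summary(fs)$coefficients["z", "t value"]^2)
```

In this simulation the noise variance grows with the magnitude of the instrument, so the classical F overstates instrument strength relative to the robust version.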

Data from GSZ (2016)

Description

Data from Guiso, Sapienza, and Zingales (2016)

Format

A data frame with 5357 rows and 11 columns.

Details

The authors revisit Putnam, Leonardi, and Nanetti (1992)’s celebrated conjecture that Italian cities that achieved self-government in the Middle Ages have higher modern-day levels of social capital. More specifically, they study the effects of free city-state status on social capital as measured by the number of nonprofit organizations and organ donations per capita, and a measure of whether students cheat in mathematics. We focus on the first outcome, the number of nonprofit organizations.

References

Guiso, Luigi, Paola Sapienza, and Luigi Zingales. 2016. "Long-Term Persistence." Journal of the European Economic Association 14 (6): 1401–36.


Data from GSZ (2016): Subsample

Description

Data from Guiso, Sapienza, and Zingales (2016); southern Italian cities

Format

A data frame with 2175 rows and 11 columns.

Details

The authors revisit Putnam, Leonardi, and Nanetti (1992)’s celebrated conjecture that Italian cities that achieved self-government in the Middle Ages have higher modern-day levels of social capital. More specifically, they study the effects of free city-state status on social capital as measured by the number of nonprofit organizations and organ donations per capita, and a measure of whether students cheat in mathematics. We focus on the first outcome, the number of nonprofit organizations.

This dataset is a subsample of southern Italian cities, which is used as a zero-first-stage sample.

References

Guiso, Luigi, Paola Sapienza, and Luigi Zingales. 2016. "Long-Term Persistence." Journal of the European Economic Association 14 (6): 1401–36.


Omnibus Function for IV Estimation and Diagnostics

Description

Conducts various estimation and diagnostic procedures for instrumental variable designs in one shot.

Usage

ivDiag(data, Y, D, Z, controls = NULL, FE = NULL, cl = NULL, weights = NULL,
  bootstrap = TRUE, run.AR = TRUE,
  nboots = 1000, parallel = TRUE, cores = NULL, 
  seed = 94305, prec = 4, debug = FALSE)

Arguments

data

name of a dataframe.

Y

a string indicating the outcome variable.

D

a string indicating the treatment variable.

Z

a vector of strings indicating the instrumental variables.

controls

a vector of strings indicating the control variables.

FE

a vector of strings indicating the fixed effects variables.

cl

a string indicating the clustering variable.

weights

a string indicating the variable that stores weights.

bootstrap

whether to turn on bootstrap (TRUE by default).

run.AR

whether to run AR test (TRUE by default).

nboots

a numeric value indicating the number of bootstrap runs.

parallel

a logical flag controlling parallel computing.

cores

setting the number of cores.

prec

precision of CI in string (4 by default).

seed

the random seed (94305 by default).

debug

for debugging purposes.

Value

est_ols

results from an OLS regression.

est_2sls

results from a 2SLS regression.

AR

results from an Anderson-Rubin test.

F_stat

various F statistics.

rho

Pearson correlation coefficient between the treatment and predicted treatment from the first stage regression (all covariates are partialled out).

tF

results from the tF procedure based on Lee et al. (2022).

est_rf

results from the reduced form regression.

est_fs

results from the first stage regression.

p_iv

the number of instruments.

N

the number of observations.

N_cl

the number of clusters.

df

the degrees of freedom left from the 2SLS regression.

nvalues

the number of unique values of the outcome Y, the treatment D, and each instrument in Z in the 2SLS regression.

Author(s)

Apoorva Lal; Yiqing Xu

References

Lal, Apoorva, Mackenzie William Lockhart, Yiqing Xu, and Ziwen Zu. 2023. "How Much Should We Trust Instrumental Variable Estimates in Political Science? Practical Advice Based on 67 Replicated Studies." Available at: https://yiqingxu.org/papers/english/2021_iv/LLXZ.pdf

Lee, David S, Justin McCrary, Marcelo J Moreira, and Jack Porter. 2022. "Valid t-Ratio Inference for IV." American Economic Review 112 (10): 3260–90.

See Also

plot_coef, eff_F, AR_test, tF

Examples

data(ivDiag)
g <- ivDiag(data = rueda, Y = "e_vote_buying", D = "lm_pob_mesa", 
    Z = "lz_pob_mesa_f", controls = c("lpopulation", "lpotencial"), 
    cl = "muni_code", bootstrap = FALSE, run.AR = FALSE)
plot_coef(g)
library(testthat)    
test_that("Check ivDiag output", {
  expect_equal(as.numeric(g$est_2sls[1,1]), -0.9835)
})
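
How several of the returned components relate to one another can be sketched in base R for a single instrument: the 2SLS coefficient equals the reduced-form coefficient divided by the first-stage coefficient, and rho is the correlation between the treatment and its first-stage prediction after the controls are partialled out. The data are simulated and every name below is hypothetical; this illustrates the quantities, not the package's code.

```r
set.seed(3)
n <- 1500
x <- rnorm(n)                            # control
z <- rnorm(n)                            # instrument
u <- rnorm(n)                            # unobserved confounder
d <- 0.6 * z + 0.3 * x + u + rnorm(n)    # treatment
y <- -1.0 * d + 0.2 * x + u + rnorm(n)   # outcome; true effect of d is -1

ols <- coef(lm(y ~ d + x))[["d"]]        # biased by the confounder u
fs  <- coef(lm(d ~ z + x))[["z"]]        # first stage: treatment on instrument
rf  <- coef(lm(y ~ z + x))[["z"]]        # reduced form: outcome on instrument
iv_ratio <- rf / fs                      # 2SLS with one instrument

# Same point estimate via two explicit stages. The second-stage lm()
# standard errors are not valid 2SLS standard errors.
d_hat   <- fitted(lm(d ~ z + x))
iv_2sls <- coef(lm(y ~ d_hat + x))[["d_hat"]]

# rho: correlation between treatment and predicted treatment after
# partialling out the control.
d_res     <- residuals(lm(d ~ x))
d_hat_res <- residuals(lm(d_hat ~ x))
rho <- cor(d_res, d_hat_res)
```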

Local-to-Zero Test

Description

Estimates Local-to-Zero IV coefficients and SEs for a single instrument.

Usage

ltz(data, Y, D, Z, controls, FE = NULL, cl = NULL, weights = NULL, prior, prec = 4)

Arguments

data

name of a dataframe.

Y

a string indicating the outcome variable.

D

a string indicating the treatment variable.

Z

a vector of strings indicating the instrumental variables.

controls

a vector of strings indicating the control variables.

FE

a vector of strings indicating the fixed effects variables.

cl

a string indicating the clustering variable.

weights

a string indicating the variable that stores weights.

prior

prior mean and standard deviation of the direct effect of instrument on outcome.

prec

precision of results (4 by default).

Value

iv

results from a 2SLS regression.

ltz

results after local-to-zero adjustment.

prior

prior mean and standard deviation.

References

Conley, Timothy G, Christian B Hansen, and Peter E Rossi. 2012. "Plausibly Exogenous." Review of Economics and Statistics 94 (1): 260–72.

See Also

plot_coef

Examples

data(ivDiag)
controls <- c('altitudine', 'escursione', 'costal', 'nearsea', 'population', 
    'pop2', 'gini_land', 'gini_income')
ltz_out <- ltz(data = gsz, Y = "totassoc_p", D = "libero_comune_allnord", 
    Z = "bishopcity", controls = controls, weights = "population", 
    prior = c(0.178, 0.137))
plot_ltz(ltz_out)    
    
library(testthat)    
test_that("Check local-to-zero adjustment", {
  expect_equal(as.numeric(ltz_out$ltz[1]), 3.6088)
})
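
For one instrument, the adjustment in Conley, Hansen, and Rossi (2012) has a simple closed form: with a normal prior on the direct effect of the instrument on the outcome (mean mu, standard deviation s) and first-stage coefficient pi, the adjusted estimate is the 2SLS estimate minus mu / pi, and the adjusted variance is the 2SLS variance plus (s / pi)^2. The base R sketch below applies this to simulated data without controls and with homoskedastic standard errors. All names and the prior values are hypothetical, and ltz may differ in how it computes the 2SLS variance.

```r
set.seed(4)
n <- 5000
z <- rnorm(n)                                   # instrument
u <- rnorm(n)                                   # unobserved confounder
gamma_true <- 0.2                               # direct effect of z on y
d <- 0.8 * z + u + rnorm(n)                     # treatment
y <- 1.0 * d + gamma_true * z + u + rnorm(n)    # true effect of d is 1

# Conventional 2SLS with one instrument and no controls.
pi_hat   <- coef(lm(d ~ z))[["z"]]              # first-stage coefficient
beta_iv  <- coef(lm(y ~ z))[["z"]] / pi_hat     # reduced form / first stage
alpha_iv <- mean(y) - beta_iv * mean(d)
res_iv   <- y - alpha_iv - beta_iv * d
sigma2   <- sum(res_iv^2) / (n - 2)
se_iv    <- sqrt(sigma2 / (pi_hat^2 * sum((z - mean(z))^2)))

# Local-to-zero adjustment under the prior gamma ~ N(mean, sd^2).
prior    <- c(mean = 0.2, sd = 0.1)             # illustrative prior
beta_ltz <- beta_iv - prior[["mean"]] / pi_hat
se_ltz   <- sqrt(se_iv^2 + (prior[["sd"]] / pi_hat)^2)
```

Here the exclusion restriction is violated by construction, so the unadjusted estimate is shifted away from the true effect; centering the prior at the true direct effect moves the estimate back toward it, at the cost of a larger standard error.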

Plot OLS and IV Coefficients

Description

Visualise point estimates and confidence intervals of OLS and IV estimates.

Usage

plot_coef(out, 
  ols.methods = c("analy","bootc","boott"),
  iv.methods = c("analy","bootc","boott","ar","tf"),
  main = NULL, ylab = "Coefficient", grid = TRUE,
  stats = TRUE, ylim = NULL)

Arguments

out

output from ivDiag::ivDiag.

ols.methods

a vector specifying the inferential methods for OLS to be shown. The default is c("analy","bootc","boott").

iv.methods

a vector specifying inferential methods for 2SLS to be shown. The default is c("analy","bootc","boott","ar","tf").

main

a string specifying the title of the plot.

ylab

a string specifying the y-axis label of the plot.

grid

a logical flag indicating whether to show grid lines.

stats

a logical flag indicating whether to show the statistics, including the effective F, the number of observations, and the number of clusters (if applicable).

ylim

a two-element vector specifying the range of the y-axis.

Value

A base R plot object.

See Also

ivDiag


Visualizing Local-to-Zero Adjustment

Description

Visualise approximate sampling distributions for a scalar IV coefficient with local-to-zero adjustment.

Usage

plot_ltz(out = NULL, iv_est = NULL, ltz_est = NULL, prior = NULL, xlim = NULL)

Arguments

out

output from ivDiag::ltz.

iv_est

a two-element vector of IV estimate and standard error.

ltz_est

a two-element vector of local-to-zero estimate and standard error.

prior

a two-element vector of prior mean and standard deviation.

xlim

a two-element vector specifying the range of the x-axis.

Value

A ggplot2 object.

References

Conley, Timothy G, Christian B Hansen, and Peter E Rossi. 2012. "Plausibly Exogenous." Review of Economics and Statistics 94 (1): 260–72.

See Also

ltz


Data from Rueda (2017)

Description

Data from Rueda (2017) AJPS.

Format

A data frame with 4352 rows and 6 columns.

Details

Rueda (2017) studies the persistence of vote buying in developing democracies despite the use of secret ballots. It argues that brokers condition future payments on published electoral results to enforce these transactions, and that this is effective only when the results of small voting groups are available. The study examines the relationship between polling station size and vote buying using three different measures of the incidence of vote buying, two at the municipality level and one at the individual level.

The size of the polling station, predicted by the rules limiting the number of voters per polling station, is used as an instrument of the actual polling place size. The institutional rule predicts sharp reductions in the size of the average polling station of a municipality every time the number of registered voters reaches a multiple of the maximum number of voters allowed to vote in a polling station. Such sharp reductions are used as a source of exogenous variation in polling place size to estimate the causal effect of this variable on vote buying.

References

Rueda, Miguel R. 2017. "Small Aggregates, Big Manipulation: Vote Buying Enforcement and Collective Monitoring." American Journal of Political Science 61 (1): 163–177.


Valid t-Ratio Procedure

Description

Performs the valid t-ratio procedure.

Usage

tF(coef, se, Fstat, prec = 4)

Arguments

coef

a 2SLS coefficient.

se

a standard error estimate for the estimated 2SLS coefficient.

Fstat

a first-stage partial F statistic.

prec

precision of results (4 by default).

Value

Results from a valid t-ratio test given the first-stage F statistic.

References

Lee, David S, Justin McCrary, Marcelo J Moreira, and Jack Porter. 2022. "Valid t-Ratio Inference for IV." American Economic Review 112 (10): 3260–90.

Examples

tf.out <- tF(coef = -0.9835, se = 0.1540, Fstat = 8598)
library(testthat)
test_that("Check tF cF", {
  expect_equal(as.numeric(tf.out[2]), 1.96)
})