Title: | Manipulation Testing Based on Density Discontinuity |
---|---|
Description: | Density discontinuity testing (a.k.a. manipulation testing) is commonly employed in regression discontinuity designs and other program evaluation settings to detect perfect self-selection (manipulation) around a cutoff where treatment/policy assignment changes. This package implements manipulation testing procedures using the local polynomial density estimators: rddensity() to construct test statistics and p-values given a prespecified cutoff, rdbwdensity() to perform data-driven bandwidth selection, and rdplotdensity() to construct density plots. |
Authors: | Matias D. Cattaneo [aut], Michael Jansson [aut], Xinwei Ma [aut, cre] |
Maintainer: | Xinwei Ma <[email protected]> |
License: | GPL-2 |
Version: | 2.6 |
Built: | 2024-12-06 06:53:22 UTC |
Source: | CRAN |
Density discontinuity testing (a.k.a. manipulation testing) is commonly employed in regression discontinuity designs and other program evaluation settings to detect perfect self-selection (manipulation) around a cutoff where treatment/policy assignment changes.
This package implements manipulation testing procedures using the local polynomial density estimators proposed in Cattaneo, Jansson and Ma (2020), and implements graphical procedures with valid confidence bands using the results in Cattaneo, Jansson and Ma (2022, 2023). In addition, this package provides complementary manipulation testing based on finite sample exact binomial testing following the results in Cattaneo, Frandsen and Titiunik (2015) and Cattaneo, Titiunik and Vazquez-Bare (2017).
A companion Stata package is described in Cattaneo, Jansson and Ma (2018).
Commands: rddensity for manipulation (density discontinuity) testing, rdbwdensity for data-driven bandwidth selection, and rdplotdensity for density plots.
Related Stata and R packages useful for inference in regression discontinuity (RD) designs are described in the website: https://rdpackages.github.io/.
Matias D. Cattaneo, Princeton University [email protected].
Michael Jansson, University of California Berkeley. [email protected].
Xinwei Ma (maintainer), University of California San Diego. [email protected].
Calonico, S., M. D. Cattaneo, and M. H. Farrell. 2018. On the Effect of Bias Estimation on Coverage Accuracy in Nonparametric Inference. Journal of the American Statistical Association 113(522): 767-779. doi:10.1080/01621459.2017.1285776
Calonico, S., M. D. Cattaneo, and M. H. Farrell. 2022. Coverage Error Optimal Confidence Intervals for Local Polynomial Regression. Bernoulli, 28(4): 2998-3022. doi:10.3150/21-BEJ1445
Cattaneo, M. D., B. Frandsen, and R. Titiunik. 2015. Randomization Inference in the Regression Discontinuity Design: An Application to the Study of Party Advantages in the U.S. Senate. Journal of Causal Inference 3(1): 1-24. doi:10.1515/jci-2013-0010
Cattaneo, M. D., M. Jansson, and X. Ma. 2018. Manipulation Testing based on Density Discontinuity. Stata Journal 18(1): 234-261. doi:10.1177/1536867X1801800115
Cattaneo, M. D., M. Jansson, and X. Ma. 2020. Simple Local Polynomial Density Estimators. Journal of the American Statistical Association, 115(531): 1449-1455. doi:10.1080/01621459.2019.1635480
Cattaneo, M. D., M. Jansson, and X. Ma. 2022. lpdensity: Local Polynomial Density Estimation and Inference. Journal of Statistical Software, 101(2): 1–25. doi:10.18637/jss.v101.i02
Cattaneo, M. D., M. Jansson, and X. Ma. 2023. Local Regression Distribution Estimators. Journal of Econometrics, 240(2): 105074. doi:10.1016/j.jeconom.2021.01.006
Cattaneo, M. D., R. Titiunik and G. Vazquez-Bare. 2017. Comparing Inference Approaches for RD Designs: A Reexamination of the Effect of Head Start on Child Mortality. Journal of Policy Analysis and Management 36(3): 643-681. doi:10.1002/pam.21985
McCrary, J. 2008. Manipulation of the Running Variable in the Regression Discontinuity Design: A Density Test. Journal of Econometrics 142(2): 698-714. doi:10.1016/j.jeconom.2007.05.005
rdbwdensity implements several data-driven bandwidth selection methods useful to construct manipulation testing procedures using the local polynomial density estimators proposed in Cattaneo, Jansson and Ma (2020).
A companion Stata package is described in Cattaneo, Jansson and Ma (2018).
Companion command: rddensity for manipulation (density discontinuity) testing.
Related Stata and R packages useful for inference in regression discontinuity (RD) designs are described in the website: https://rdpackages.github.io/.
rdbwdensity( X, c = 0, p = 2, fitselect = "", kernel = "", vce = "", massPoints = TRUE, regularize = TRUE, nLocalMin = NULL, nUniqueMin = NULL )
X |
Numeric vector or one dimensional matrix/data frame, the running variable. |
c |
Numeric, specifies the threshold or cutoff value in the support of X. Default is 0. |
p |
Nonnegative integer, specifies the local polynomial order used to construct
the density estimators. Default is 2. |
fitselect |
String, specifies the density estimation method.
|
kernel |
String, specifies the kernel function used to construct the local
polynomial estimators.
|
vce |
String, specifies the procedure used to compute the variance-covariance matrix estimator.
|
massPoints |
|
regularize |
|
nLocalMin |
Nonnegative integer, specifies the minimum number of observations in each local neighborhood.
This option will be ignored if set to |
nUniqueMin |
Nonnegative integer, specifies the minimum number of unique observations in
each local neighborhood. This option will be ignored if set to |
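To make the role of nLocalMin and nUniqueMin concrete, the sketch below computes, for each side of the cutoff, the smallest half-width that captures at least a given number of observations and of distinct values. It is a hypothetical base-R illustration that assumes the minimums are enforced by widening the bandwidth; the helper min_bandwidth() and its arguments are not part of the package, and rdbwdensity() may apply these constraints differently.

```r
# Hypothetical helper (not part of rddensity): smallest half-width on each
# side of the cutoff that contains at least n_local observations and at
# least n_unique distinct values. Assumes both sides are nonempty.
min_bandwidth <- function(x, cutoff = 0, n_local = 20, n_unique = 20) {
  side_min <- function(d) {
    # d: distances |x - cutoff| for the observations on one side
    d <- sort(d)
    u <- unique(d)  # on one side, distinct values give distinct distances
    h_local  <- d[min(n_local, length(d))]
    h_unique <- u[min(n_unique, length(u))]
    max(h_local, h_unique)
  }
  c(left  = side_min(cutoff - x[x < cutoff]),
    right = side_min(x[x >= cutoff] - cutoff))
}

# The right side has a mass point at 1: three observations lie within
# distance 1, but a second distinct value is only reached at distance 2.
min_bandwidth(c(-3, -2, -1, 1, 1, 1, 2), cutoff = 0, n_local = 3, n_unique = 2)
# left: 3, right: 2
```

The example shows why a count of unique observations matters when the running variable has mass points: a neighborhood can contain enough observations while still offering too few distinct values for a polynomial fit.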
h |
Bandwidths for density discontinuity test, left and right to the cutoff, and asymptotic variance and bias. |
N |
|
opt |
Options passed to the function. |
X_min |
Smallest observations to the left and right of the cutoff. |
X_max |
Largest observations to the left and right of the cutoff. |
Matias D. Cattaneo, Princeton University [email protected].
Michael Jansson, University of California Berkeley. [email protected].
Xinwei Ma (maintainer), University of California San Diego. [email protected].
Cattaneo, M. D., M. Jansson, and X. Ma. 2018. Manipulation Testing based on Density Discontinuity. Stata Journal 18(1): 234-261. doi:10.1177/1536867X1801800115
Cattaneo, M. D., M. Jansson, and X. Ma. 2020. Simple Local Polynomial Density Estimators. Journal of the American Statistical Association, 115(531): 1449-1455. doi:10.1080/01621459.2019.1635480
library(rddensity)
# Generate a random sample
set.seed(42)
x <- rnorm(2000, mean = -0.5)
# Bandwidth selection
summary(rdbwdensity(X = x, vce = "jackknife"))
rddensity implements manipulation testing procedures using the local polynomial density estimators proposed in Cattaneo, Jansson and Ma (2020), and implements graphical procedures with valid confidence bands using the results in Cattaneo, Jansson and Ma (2022, 2023). In addition, the command provides complementary manipulation testing based on finite sample exact binomial testing following the results in Cattaneo, Frandsen and Titiunik (2015) and Cattaneo, Titiunik and Vazquez-Bare (2017). For an introduction to manipulation testing see McCrary (2008).
A companion Stata package is described in Cattaneo, Jansson and Ma (2018).
Companion commands: rdbwdensity for data-driven bandwidth selection, and rdplotdensity for density plots.
Related Stata and R packages useful for inference in regression discontinuity (RD) designs are described in the website: https://rdpackages.github.io/.
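As a rough illustration of the estimator that rddensity builds on, the sketch below follows the idea in Cattaneo, Jansson and Ma (2020): the empirical distribution function is regressed on a polynomial in (X - c) with kernel weights, separately on each side of the cutoff, and the coefficient on the linear term estimates the density at c. This is a simplified base-R sketch under stated assumptions (triangular kernel, a user-supplied bandwidth h, no standard errors, no mass-point or regularization adjustments); the helper lp_density_side() is hypothetical and is not the package's implementation.

```r
# Hypothetical helper (not part of rddensity): local polynomial density
# estimate at the cutoff, using only the observations on one side.
lp_density_side <- function(x, cutoff = 0, h, p = 2,
                            side = c("left", "right")) {
  side <- match.arg(side)
  Fhat <- ecdf(x)(x)  # full-sample empirical CDF at each observation
  keep <- if (side == "left") {
    x >= cutoff - h & x < cutoff
  } else {
    x >= cutoff & x <= cutoff + h
  }
  d <- x[keep] - cutoff
  w <- pmax(1 - abs(d / h), 0)  # triangular kernel weights
  y <- Fhat[keep]
  # Weighted polynomial fit of the empirical CDF in (x - cutoff)
  fit <- lm(y ~ poly(d, p, raw = TRUE), weights = w)
  unname(coef(fit)[2])  # slope at the cutoff estimates the density there
}

# Same data-generating process as in the package examples
set.seed(42)
x <- rnorm(2000, mean = -0.5)
c(left  = lp_density_side(x, 0, h = 0.5, side = "left"),
  right = lp_density_side(x, 0, h = 0.5, side = "right"))

# Stretching the positive values by 2 halves the density to the right of 0
x2 <- x
x2[x2 > 0] <- x2[x2 > 0] * 2
c(left  = lp_density_side(x2, 0, h = 0.5, side = "left"),
  right = lp_density_side(x2, 0, h = 0.5, side = "right"))
```

In the first call both estimates should be roughly equal to the true value dnorm(0.5), about 0.35; in the second call the right-hand estimate should fall to roughly half of that. A manipulation test compares the two one-sided estimates, which is why each side is fitted separately rather than smoothing across the cutoff.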
rddensity( X, c = 0, p = 2, q = 0, fitselect = "", kernel = "", vce = "", massPoints = TRUE, h = c(), bwselect = "", all = FALSE, regularize = TRUE, nLocalMin = NULL, nUniqueMin = NULL, bino = TRUE, binoW = NULL, binoN = NULL, binoWStep = NULL, binoNStep = NULL, binoNW = 10, binoP = 0.5 )
X |
Numeric vector or one dimensional matrix/data frame, the running variable. |
c |
Numeric, specifies the threshold or cutoff value in the support of X. Default is 0. |
p |
Nonnegative integer, specifies the local polynomial order used to construct
the density estimators. Default is 2. |
q |
Nonnegative integer, specifies the local polynomial order used to construct
the bias-corrected density estimators. Default is |
fitselect |
String, specifies the density estimation method.
|
kernel |
String, specifies the kernel function used to construct the local
polynomial estimators.
|
vce |
String, specifies the procedure used to compute the variance-covariance matrix estimator.
|
massPoints |
|
h |
Numeric, specifies the bandwidth used to construct the density estimators on the two
sides of the cutoff. If not specified, the bandwidth h is computed by the companion command rdbwdensity. |
bwselect |
String, specifies the bandwidth selection procedure to be used.
|
all |
|
regularize |
|
nLocalMin |
Nonnegative integer, specifies the minimum number of observations in each local neighborhood.
This option will be ignored if set to |
nUniqueMin |
Nonnegative integer, specifies the minimum number of unique observations in
each local neighborhood. This option will be ignored if set to |
bino |
|
binoW |
Numeric, specifies the half length(s) of the initial window. If two values are provided, they will be used for the data below and above the cutoff separately. |
binoN |
Nonnegative integer, specifies the minimum number of observations on each side of the cutoff used for
the binomial test. This option will be ignored if |
binoWStep |
Numeric, specifies the increment in half length(s). |
binoNStep |
Nonnegative integer, specifies the minimum increment in sample size (on each side of the cutoff).
This option will be ignored if |
binoNW |
Nonnegative integer, specifies the total number of windows. Default is 10. |
binoP |
Numeric, specifies the null hypothesis of the binomial test. Default is 0.5. |
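The binomial test options above can be read as a sequence of exact tests in nested windows around the cutoff. The sketch below is a simplified, hypothetical version in base R: it assumes symmetric windows whose half-length starts at binoW and grows by binoWStep, runs binoNW such windows, and tests with stats::binom.test() whether the share of in-window observations below the cutoff equals binoP. The helper binomial_windows() and its argument names are illustrative only; the package's own window rules (for example binoN, binoNStep, or separate half-lengths below and above the cutoff) are not reproduced here.

```r
# Hypothetical helper (not part of rddensity): exact binomial tests in
# nested symmetric windows; window k has half-length w + (k - 1) * w_step.
binomial_windows <- function(x, cutoff = 0, w, w_step,
                             n_windows = 10, p0 = 0.5) {
  half <- w + (seq_len(n_windows) - 1) * w_step
  res <- lapply(half, function(hw) {
    n_left  <- sum(x >= cutoff - hw & x < cutoff)
    n_right <- sum(x >= cutoff & x <= cutoff + hw)
    n <- n_left + n_right
    # H0: each in-window observation falls below the cutoff with prob. p0
    pval <- if (n > 0) binom.test(n_left, n, p = p0)$p.value else NA_real_
    data.frame(half_length = hw, n_left = n_left,
               n_right = n_right, p_value = pval)
  })
  do.call(rbind, res)
}

set.seed(42)
x <- rnorm(2000, mean = -0.5)
binomial_windows(x, cutoff = 0, w = 0.05, w_step = 0.05, n_windows = 5)
```

Small windows make the "locally balanced" null hypothesis more plausible but leave few observations, so reporting a sequence of windows shows how the conclusion changes as the window grows.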
hat |
|
sd_asy |
|
sd_jk |
|
test |
|
hat_p |
Same as |
sd_asy_p |
Same as |
sd_jk_p |
Same as |
test_p |
Same as |
N |
|
h |
|
opt |
Options passed to the function. |
bino |
Binomial test results. |
X_min |
|
X_max |
|
Matias D. Cattaneo, Princeton University [email protected].
Michael Jansson, University of California Berkeley. [email protected].
Xinwei Ma (maintainer), University of California San Diego. [email protected].
Cattaneo, M. D., B. Frandsen, and R. Titiunik. 2015. Randomization Inference in the Regression Discontinuity Design: An Application to the Study of Party Advantages in the U.S. Senate. Journal of Causal Inference 3(1): 1-24. doi:10.1515/jci-2013-0010
Cattaneo, M. D., M. Jansson, and X. Ma. 2018. Manipulation Testing based on Density Discontinuity. Stata Journal 18(1): 234-261. doi:10.1177/1536867X1801800115
Cattaneo, M. D., M. Jansson, and X. Ma. 2020. Simple Local Polynomial Density Estimators. Journal of the American Statistical Association, 115(531): 1449-1455. doi:10.1080/01621459.2019.1635480
Cattaneo, M. D., M. Jansson, and X. Ma. 2022. lpdensity: Local Polynomial Density Estimation and Inference. Journal of Statistical Software, 101(2): 1–25. doi:10.18637/jss.v101.i02
Cattaneo, M. D., M. Jansson, and X. Ma. 2023. Local Regression Distribution Estimators. Journal of Econometrics, 240(2): 105074. doi:10.1016/j.jeconom.2021.01.006
Cattaneo, M. D., R. Titiunik and G. Vazquez-Bare. 2017. Comparing Inference Approaches for RD Designs: A Reexamination of the Effect of Head Start on Child Mortality. Journal of Policy Analysis and Management 36(3): 643-681. doi:10.1002/pam.21985
McCrary, J. 2008. Manipulation of the Running Variable in the Regression Discontinuity Design: A Density Test. Journal of Econometrics 142(2): 698-714. doi:10.1016/j.jeconom.2007.05.005
library(rddensity)
### Continuous Density
set.seed(42)
x <- rnorm(2000, mean = -0.5)
rdd <- rddensity(X = x, vce = "jackknife")
summary(rdd)
### Bandwidth selection using rdbwdensity()
rddbw <- rdbwdensity(X = x, vce = "jackknife")
summary(rddbw)
### Plotting using rdplotdensity()
# 1. From -2 to 2 with 25 evaluation points at each side
plot1 <- rdplotdensity(rdd, x, plotRange = c(-2, 2), plotN = 25)
# 2. Plotting a uniform confidence band
set.seed(42) # fix the seed for simulating critical values
plot2 <- rdplotdensity(rdd, x, plotRange = c(-2, 2), plotN = 25, CIuniform = TRUE)
### Density discontinuity at 0
# Stretching the positive values by 2 halves the density to the right of 0
x[x > 0] <- x[x > 0] * 2
rdd2 <- rddensity(X = x, vce = "jackknife")
summary(rdd2)
plot3 <- rdplotdensity(rdd2, x, plotRange = c(-2, 2), plotN = 25)
Extract of the dataset constructed by Cattaneo, Frandsen, and Titiunik (2015), which includes measures of incumbency advantage in the U.S. Senate for the period 1914-2010.
Numeric vector containing 1390 observations:
Numeric vector. See Cattaneo, Frandsen and Titiunik (2015) regarding details about this dataset.
Cattaneo, M. D., B. Frandsen, and R. Titiunik. 2015. Randomization Inference in the Regression Discontinuity Design: An Application to the Study of Party Advantages in the U.S. Senate. Journal of Causal Inference 3(1): 1-24. doi:10.1515/jci-2013-0010
rdplotdensity constructs density plots. It is based on the local polynomial density estimator proposed in Cattaneo, Jansson and Ma (2020, 2023).
A companion Stata package is described in Cattaneo, Jansson and Ma (2018).
Companion command: rddensity for manipulation (density discontinuity) testing.
Related Stata and R packages useful for inference in regression discontinuity (RD) designs are described in the website: https://rdpackages.github.io/.
rdplotdensity( rdd, X, plotRange = NULL, plotN = 10, plotGrid = c("es", "qs"), alpha = 0.05, type = NULL, lty = NULL, lwd = NULL, lcol = NULL, pty = NULL, pwd = NULL, pcol = NULL, CItype = NULL, CIuniform = FALSE, CIsimul = 2000, CIshade = NULL, CIcol = NULL, bwselect = NULL, hist = TRUE, histBreaks = NULL, histFillCol = 3, histFillShade = 0.2, histLineCol = "white", title = "", xlabel = "", ylabel = "", legendTitle = NULL, legendGroups = NULL, noPlot = FALSE )
rdd |
Object returned by rddensity. |
X |
Numeric vector or one dimensional matrix/data frame, the running variable. |
plotRange |
Numeric, specifies the lower and upper bound of the plotting region. Default is
|
plotN |
Numeric, specifies the number of grid points used for plotting on the two sides of the cutoff.
Default is 10. |
plotGrid |
String, specifies how the grid points are positioned. Options are "es" and "qs". |
alpha |
Numeric scalar between 0 and 1, the significance level for plotting confidence regions. If more than one is provided, they will be applied to the two sides accordingly. |
type |
String, one of |
lty |
Line type for point estimates, only effective if |
lwd |
Line width for point estimates, only effective if |
lcol |
Line color for point estimates, only effective if |
pty |
Scatter plot type for point estimates, only effective if |
pwd |
Scatter plot size for point estimates, only effective if |
pcol |
Scatter plot color for point estimates, only effective if |
CItype |
String, one of |
CIuniform |
|
CIsimul |
Positive integer, the number of simulations used to construct critical values (default is 2000). This
option is ignored if |
CIshade |
Numeric, opaqueness of the confidence region, should be between 0 (transparent) and 1. Default is 0.2. If more than one is provided, they will be applied to the two sides accordingly. |
CIcol |
Color of the confidence region. |
bwselect |
String, the method for data-driven bandwidth selection. Available options
are (1) |
hist |
|
histBreaks |
Numeric vector, giving the breakpoints between histogram cells. |
histFillCol |
Color of the histogram cells. |
histFillShade |
Opaqueness of the histogram cells, should be between 0 (transparent) and 1. Default is 0.2. |
histLineCol |
Color of the histogram lines. |
title, xlabel, ylabel |
Strings, title of the plot and labels for x- and y-axis. |
legendTitle |
String, title of legend. |
legendGroups |
String Vector, group names used in legend. |
noPlot |
No density plot will be generated if set to |
Bias correction is used only to construct the confidence intervals/bands, not for point estimation. The point estimates, denoted by f_p, are constructed using local polynomial estimates of order p, while the centering of the confidence intervals/bands, denoted by f_q, is constructed using local polynomial estimates of order q. The confidence intervals/bands take the form

[f_q - cv * SE(f_q) , f_q + cv * SE(f_q)],

where cv denotes the appropriate critical value and SE(f_q) denotes a standard error estimate for the centering of the confidence interval/band. As a result, the confidence intervals/bands may not be centered at the point estimates, because they have been bias-corrected. Setting q equal to p results in confidence intervals/bands centered at the point estimates, but requires undersmoothing for valid inference (i.e., the (I)MSE-optimal bandwidth for the density point estimator cannot be used). Hence the bandwidth would need to be specified manually when q = p, and the point estimates will not be (I)MSE-optimal. See Cattaneo, Jansson and Ma (2022, 2023) for details, and also Calonico, Cattaneo, and Farrell (2018, 2022) for robust bias correction methods.
Sometimes the density point estimates may lie outside of the confidence intervals/bands, which can happen if
the underlying distribution exhibits high curvature at some evaluation point(s). One possible solution in this
case is to increase the polynomial order p
or to employ a smaller bandwidth.
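The band formula above can be sketched directly. Given bias-corrected centers f_q and standard errors SE(f_q) at a set of evaluation points, a pointwise interval uses the normal quantile as cv, while a uniform band needs a larger cv that controls the maximum absolute t-statistic over all evaluation points. The helper ci_band() below is a hypothetical illustration, assuming the t-statistics are approximately jointly normal with a known correlation matrix corr; it approximates the uniform critical value by simulation, in the spirit of the CIsimul option, but it is not the code used by rdplotdensity.

```r
# Hypothetical helper (not part of rddensity): intervals of the form
# [f_q - cv * SE(f_q), f_q + cv * SE(f_q)], pointwise or uniform.
ci_band <- function(f_q, se_q, alpha = 0.05, uniform = FALSE,
                    corr = NULL, n_sim = 2000) {
  if (!uniform) {
    cv <- qnorm(1 - alpha / 2)  # pointwise normal critical value
  } else {
    # Assumption: corr is the correlation matrix of the t-statistics
    # across evaluation points. Rows of z are draws from N(0, corr).
    L <- chol(corr)  # upper triangular, t(L) %*% L == corr
    z <- matrix(rnorm(n_sim * ncol(corr)), nrow = n_sim) %*% L
    # (1 - alpha) quantile of the maximum absolute t-statistic
    cv <- unname(quantile(apply(abs(z), 1, max), 1 - alpha))
  }
  list(cv = cv, lower = f_q - cv * se_q, upper = f_q + cv * se_q)
}

# Pointwise 95% interval around a single bias-corrected estimate
ci_band(f_q = 0.30, se_q = 0.05)$cv  # qnorm(0.975), about 1.96

# Uniform band over 25 points (independence assumed only for illustration)
set.seed(42)
band <- ci_band(rep(0.30, 25), rep(0.05, 25), uniform = TRUE,
                corr = diag(25), n_sim = 2000)
band$cv  # larger than the pointwise value
```

Because the intervals are centered at f_q rather than at the point estimate f_p, a plotted point estimate can sit off-center in its band, and in regions of high curvature it can even fall outside it.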
Estl, Estr |
Matrices containing estimation results:
(1) |
Estplot |
A standard |
Matias D. Cattaneo, Princeton University [email protected].
Michael Jansson, University of California Berkeley. [email protected].
Xinwei Ma (maintainer), University of California San Diego. [email protected].
Calonico, S., M. D. Cattaneo, and M. H. Farrell. 2018. On the Effect of Bias Estimation on Coverage Accuracy in Nonparametric Inference. Journal of the American Statistical Association 113(522): 767-779. doi:10.1080/01621459.2017.1285776
Calonico, S., M. D. Cattaneo, and M. H. Farrell. 2022. Coverage Error Optimal Confidence Intervals for Local Polynomial Regression. Bernoulli, 28(4): 2998-3022. doi:10.3150/21-BEJ1445
Cattaneo, M. D., M. Jansson, and X. Ma. 2018. Manipulation Testing based on Density Discontinuity. Stata Journal 18(1): 234-261. doi:10.1177/1536867X1801800115
Cattaneo, M. D., M. Jansson, and X. Ma. 2020. Simple Local Polynomial Density Estimators. Journal of the American Statistical Association, 115(531): 1449-1455. doi:10.1080/01621459.2019.1635480
Cattaneo, M. D., M. Jansson, and X. Ma. 2022. lpdensity: Local Polynomial Density Estimation and Inference. Journal of Statistical Software, 101(2): 1–25. doi:10.18637/jss.v101.i02
Cattaneo, M. D., M. Jansson, and X. Ma. 2023. Local Regression Distribution Estimators. Journal of Econometrics, 240(2): 105074. doi:10.1016/j.jeconom.2021.01.006
library(rddensity)
# Generate a random sample with a density discontinuity at 0
set.seed(42)
x <- rnorm(2000, mean = -0.5)
x[x > 0] <- x[x > 0] * 2
# Estimation
rdd <- rddensity(X = x)
summary(rdd)
# Density plot (from -2 to 2 with 25 evaluation points at each side)
plot1 <- rdplotdensity(rdd, x, plotRange = c(-2, 2), plotN = 25)
# Plotting a uniform confidence band
set.seed(42) # fix the seed for simulating critical values
plot3 <- rdplotdensity(rdd, x, plotRange = c(-2, 2), plotN = 25, CIuniform = TRUE)