Title: | Regression Discontinuity Estimation |
---|---|
Description: | Provides the tools to undertake estimation in Regression Discontinuity Designs. Both sharp and fuzzy designs are supported. Estimation is accomplished using local linear regression. A provided function will utilize Imbens-Kalyanaraman optimal bandwidth calculation. A function is also included to test the assumption of no-sorting effects. |
Authors: | Drew Dimmery |
Maintainer: | Drew Dimmery <[email protected]> |
License: | Apache License (== 2.0) |
Version: | 0.57 |
Built: | 2024-10-31 22:25:40 UTC |
Source: | CRAN |
Regression discontinuity estimation package
rdd supports both sharp and fuzzy RDD, utilizing the AER package for 2SLS regression under the fuzzy design. Local linear regressions are performed to either side of the cutpoint using the Imbens-Kalyanaraman optimal bandwidth calculation, IKbandwidth.
Drew Dimmery [email protected]
RDestimate, DCdensity, IKbandwidth, summary.RD, plot.RD, kernelwts
DCdensity implements the McCrary (2008) sorting test.
DCdensity(runvar, cutpoint, bin = NULL, bw = NULL, verbose = FALSE, plot = TRUE, ext.out = FALSE, htest = FALSE)
runvar | numerical vector of the running variable |
cutpoint | the cutpoint (defaults to 0) |
bin | the binwidth (defaults to |
bw | the bandwidth to use (by default, the bandwidth selection calculation from McCrary (2008) is used) |
verbose | logical flag specifying whether to print diagnostic information to the terminal (defaults to FALSE) |
plot | logical flag indicating whether to plot the histogram and density estimations (defaults to TRUE) |
ext.out | logical flag indicating whether to return extended output. When |
htest | logical flag indicating whether to return an |
If ext.out is FALSE, only the p-value will be returned. Additional output is enabled when ext.out is TRUE. In this case, a list will be returned with the following elements:
theta | the estimated log difference in heights at the cutpoint |
se | the standard error of theta |
z | the z statistic of the test |
p | the p-value of the test. A p-value below the significance threshold indicates that the user can reject the null hypothesis of no sorting. |
binsize | the calculated size of bins for the test |
bw | the calculated bandwidth for the test |
cutpoint | the cutpoint used |
data | a dataframe for the binning of the histogram. Columns are |
Drew Dimmery <[email protected]>
McCrary, Justin. (2008) "Manipulation of the running variable in the regression discontinuity design: A density test," Journal of Econometrics. 142(2): 698-714. http://dx.doi.org/10.1016/j.jeconom.2007.05.005
# No discontinuity
x <- runif(1000, -1, 1)
DCdensity(x, 0)

# Discontinuity
x <- runif(1000, -1, 1)
x <- x + 2 * (runif(1000, -1, 1) > 0 & x < 0)
DCdensity(x, 0)
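The extended output listed above can be inspected element by element. The following is a minimal sketch, assuming the rdd package is installed; the seed and the variable name `res` are illustrative choices and not part of the package.

```r
library(rdd)

set.seed(1)                      # illustrative seed, for reproducibility only
x <- runif(1000, -1, 1)          # running variable with no sorting at 0

# Request the full list instead of the bare p-value, and skip the plot
res <- DCdensity(x, cutpoint = 0, ext.out = TRUE, plot = FALSE)

res$theta   # estimated log difference in heights at the cutpoint
res$se      # standard error
res$z       # z statistic of the test
res$p       # p-value of the test
```

With data drawn uniformly, as here, a large p-value is the expected outcome, since there is no sorting to detect.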
IKbandwidth calculates the Imbens-Kalyanaraman optimal bandwidth for local linear regression in regression discontinuity designs.
IKbandwidth(X, Y, cutpoint = NULL, verbose = FALSE, kernel = "triangular")
X | a numerical vector which is the running variable |
Y | a numerical vector which is the outcome variable |
cutpoint | the cutpoint |
verbose | logical flag indicating whether to print more information to the terminal. Default is FALSE. |
kernel | string indicating which kernel to use. Options are |
The optimal bandwidth
Drew Dimmery <[email protected]>
Imbens, Guido and Karthik Kalyanaraman. (2009) "Optimal Bandwidth Choice for the regression discontinuity estimator," NBER Working Paper Series. 14726. http://www.nber.org/papers/w14726
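As a hedged illustration of how the bandwidth might be computed once and then reused, the sketch below assumes the rdd package is installed and uses simulated data; the seed and the variable names are illustrative only.

```r
library(rdd)

set.seed(42)                                   # illustrative seed
x <- runif(1000, -1, 1)                        # running variable
y <- 3 + 2 * x + 10 * (x >= 0) + rnorm(1000)   # outcome with a jump of 10 at 0

# Imbens-Kalyanaraman bandwidth with the default triangular kernel
bw <- IKbandwidth(x, y, cutpoint = 0)
bw

# Pass the bandwidth on explicitly instead of letting RDestimate compute it
rd <- RDestimate(y ~ x, cutpoint = 0, bw = bw)
```

Computing the bandwidth separately makes it easy to report it alongside the estimates or to compare it against hand-picked alternatives.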
This function will calculate the appropriate kernel weights for a vector. This is useful when, for instance, one wishes to perform local regression.
kernelwts(X, center, bw, kernel = "triangular")
X | input x values. This variable represents the axis along which kernel weighting should be performed. |
center | the point from which distances should be calculated. |
bw | the bandwidth. |
kernel | a string indicating the kernel to use. Options are |
A vector of weights with length equal to that of the X input (one weight per element of X).
Drew Dimmery <[email protected]>
require(graphics)

X <- seq(-1, 1, .01)
triang.wts <- kernelwts(X, 0, 1, kernel = "triangular")
plot(X, triang.wts, type = "l")

cos.wts <- kernelwts(X, 0, 1, kernel = "cosine")
plot(X, cos.wts, type = "l")
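To make the idea of kernel weighting concrete, here is a base R sketch of a triangular kernel. The helper `tri_weights` is hypothetical and written only for illustration; in particular, rescaling the weights to sum to one is an assumption of this sketch, and kernelwts may scale its output differently.

```r
# Hypothetical helper: triangular kernel weights around `center`.
# Weights fall linearly with distance and are zero at or beyond one bandwidth.
tri_weights <- function(x, center, bw) {
  u <- abs(x - center) / bw      # distance in bandwidth units
  w <- pmax(1 - u, 0)            # linear decay, truncated at zero
  w / sum(w)                     # assumption: rescale so the weights sum to one
}

X <- seq(-1, 1, 0.5)
w <- tri_weights(X, center = 0, bw = 1)
w
```

Observations at the center receive the largest weight, and those a full bandwidth away receive none, which is what makes the subsequent regression local.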
Plot the relationship between the running variable and the outcome
## S3 method for class 'RD'
plot(x, gran = 400, bins = 100, which = 1, range, ...)
x | |
gran | the granularity of the plot. This specifies the number of points to either side of the cutpoint for which the estimate is calculated. |
bins | if the dependent variable is binary, include the number of bins within which to average |
which | identifies which of the available plots to display. For a sharp design, the only possibility is |
range | the range of values of the running variable for which to plot. This should be a vector of length two of the format |
... | unused |
Note that this function plots the discontinuity using only the first bandwidth in the vector of bandwidths passed to RDestimate.
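The sketch below illustrates that behavior under the assumption that the rdd package is installed. The data are simulated and the two bandwidths are arbitrary, hand-picked values.

```r
library(rdd)

set.seed(7)                                    # illustrative seed
x <- runif(1000, -1, 1)
y <- 3 + 2 * x + 10 * (x >= 0) + rnorm(1000)

# Two bandwidths are estimated, but only the first (0.5) is used for plotting
rd <- RDestimate(y ~ x, bw = c(0.5, 0.25))
plot(rd)
```

To plot at a different bandwidth, one option is to re-run RDestimate with that bandwidth listed first.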
Drew Dimmery <[email protected]>
Print a very basic summary of the regression discontinuity
## S3 method for class 'RD'
print(x, digits = max(3, getOption("digits") - 3), ...)
x | |
digits | number of digits to print |
... | unused |
Drew Dimmery <[email protected]>
RDestimate supports both sharp and fuzzy RDD, utilizing the AER package for 2SLS regression under the fuzzy design. Local linear regressions are performed to either side of the cutpoint using the Imbens-Kalyanaraman optimal bandwidth calculation, IKbandwidth.
RDestimate(formula, data, subset = NULL, cutpoint = NULL, bw = NULL, kernel = "triangular", se.type = "HC1", cluster = NULL, verbose = FALSE, model = FALSE, frame = FALSE)
formula | the formula of the RDD. This is supplied in the format of |
data | an optional data frame |
subset | an optional vector specifying a subset of observations to be used |
cutpoint | the cutpoint. If omitted, it is assumed to be 0. |
bw | a numeric vector specifying the bandwidths at which to estimate the RD. If omitted, the bandwidth is calculated using the Imbens-Kalyanaraman method, and then estimated with that bandwidth, half that bandwidth, and twice that bandwidth. If only a single value is passed into the function, the RD will similarly be estimated at that bandwidth, half that bandwidth, and twice that bandwidth. |
kernel | a string specifying the kernel to be used in the local linear fitting. |
se.type | this specifies the robust SE calculation method to use. Options are, as in |
cluster | an optional vector specifying clusters within which the errors are assumed to be correlated. This will result in reporting cluster robust SEs. This option overrides anything specified in |
verbose | will provide some additional information printed to the terminal. |
model | logical. If |
frame | logical. If |
Covariates are problematic for inclusion in the regression discontinuity design. This package allows their inclusion, but cautions against it wherever possible. When covariates are included in the specification, they enter as exogenous regressors. In the sharp design, this means they are added to the regression equation, uninteracted with treatment. Likewise, in the fuzzy design they are added as regressors in both stages of estimation.
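The sharp-design specification described above can be sketched in base R. This is not the package's internal code: it is a plain weighted lm fit under stated assumptions (a hand-picked bandwidth `bw`, triangular weights, simulated data), and it reports no robust standard errors.

```r
set.seed(123)                                  # illustrative seed
n <- 1000
d <- data.frame(x = runif(n, -1, 1), cov = rnorm(n))
d$y <- 3 + 2 * d$x + 3 * d$cov + 10 * (d$x >= 0) + rnorm(n)

cutpoint <- 0
bw <- 0.5                                      # hypothetical bandwidth, chosen by hand

d$xc    <- d$x - cutpoint                      # running variable centered at the cutpoint
d$treat <- as.numeric(d$xc >= 0)               # sharp design: treated at or above the cutpoint
d$w     <- pmax(1 - abs(d$xc) / bw, 0)         # triangular kernel weights

# Separate slopes on each side via treat:xc; the covariate enters once,
# uninteracted with treatment
fit <- lm(y ~ treat + xc + treat:xc + cov, data = d, weights = w, subset = w > 0)

est <- unname(coef(fit)["treat"])              # estimated jump at the cutpoint
est
```

The coefficient on `treat` is the estimated discontinuity. In this simulation the true jump is 10, so the estimate should land close to that value.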
RDestimate returns an object of class "RD". The functions summary and plot are used to obtain and print a summary and plot of the estimated regression discontinuity. The object of class RD is a list containing the following components:
type | a string denoting either |
est | numeric vector of the estimate of the discontinuity in the outcome under a sharp design, or the Wald estimator in the fuzzy design for each corresponding bandwidth |
se | numeric vector of the standard error for each corresponding bandwidth |
z | numeric vector of the z statistic for each corresponding bandwidth |
p | numeric vector of the p value for each corresponding bandwidth |
ci | the matrix of the 95% confidence interval for each corresponding bandwidth |
bw | numeric vector of each bandwidth used in estimation |
obs | vector of the number of observations within the corresponding bandwidth |
call | the matched call |
na.action | the observations removed from fitting due to missingness |
model | (if requested) For a sharp design, a list of the |
frame | (if requested) Returns the model frame used in fitting. |
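As a hedged illustration of working with the returned list, the sketch below assumes the rdd package is installed. The data are simulated and the single bandwidth of 0.4 is an arbitrary choice.

```r
library(rdd)

set.seed(2024)                                 # illustrative seed
x <- runif(1000, -1, 1)
y <- 3 + 2 * x + 10 * (x >= 0) + rnorm(1000)

# A single bandwidth yields estimates at that bandwidth, half of it, and twice it
rd <- RDestimate(y ~ x, bw = 0.4)

rd$bw    # bandwidths used in estimation
rd$est   # estimated discontinuity at each bandwidth
rd$se    # standard errors
rd$obs   # observations within each bandwidth
rd$ci    # 95% confidence intervals
```

Calling `summary(rd)` prints the same quantities in tabular form.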
Drew Dimmery <[email protected]>
Lee, David and Thomas Lemieux. (2010) "Regression Discontinuity Designs in Economics," Journal of Economic Literature. 48(2): 281-355. http://www.aeaweb.org/articles.php?doi=10.1257/jel.48.2.281
Imbens, Guido and Thomas Lemieux. (2008) "Regression discontinuity designs: A guide to practice," Journal of Econometrics. 142(2): 615-635. http://dx.doi.org/10.1016/j.jeconom.2007.05.001
Lee, David and David Card. (2008) "Regression discontinuity inference with specification error," Journal of Econometrics. 142(2): 655-674. http://dx.doi.org/10.1016/j.jeconom.2007.05.003
Angrist, Joshua and Jorn-Steffen Pischke. (2009) Mostly Harmless Econometrics. Princeton: Princeton University Press.
summary.RD, plot.RD, DCdensity, IKbandwidth, kernelwts, vcovHC, ivreg, lm
x <- runif(1000, -1, 1)
cov <- rnorm(1000)
y <- 3 + 2 * x + 3 * cov + 10 * (x >= 0) + rnorm(1000)
RDestimate(y ~ x)

# Efficiency gains can be made by including covariates
RDestimate(y ~ x | cov)
summary method for class "RD"
## S3 method for class 'RD'
summary(object, digits = max(3, getOption("digits") - 3), ...)
object | an object of class |
digits | number of digits to display |
... | unused |
summary.RD returns an object of class "summary.RD" which has the following components:
coefficients | A matrix containing bandwidths, number of observations, estimates, SEs, z-values and p-values for each estimated bandwidth. |
fstat | A global F-test of the corresponding model |
Drew Dimmery <[email protected]>