Title: | Binscatter Estimation and Inference |
---|---|
Description: | Provides tools for statistical analysis using the binscatter methods developed by Cattaneo, Crump, Farrell and Feng (2024a) <doi:10.48550/arXiv.1902.09608>, Cattaneo, Crump, Farrell and Feng (2024b) <https://nppackages.github.io/references/Cattaneo-Crump-Farrell-Feng_2024_NonlinearBinscatter.pdf> and Cattaneo, Crump, Farrell and Feng (2024c) <doi:10.48550/arXiv.1902.09615>. Binscatter provides a flexible way of describing the relationship between two variables based on partitioning/binning of the independent variable of interest. binsreg(), binsqreg() and binsglm() implement binscatter least squares regression, quantile regression and generalized linear regression respectively, with particular focus on constructing binned scatter plots. They also implement robust (pointwise and uniform) inference of regression functions and derivatives thereof. binstest() implements hypothesis testing procedures for parametric functional forms of and nonparametric shape restrictions on the regression function. binspwc() implements hypothesis testing procedures for pairwise group comparison of binscatter estimators. binsregselect() implements data-driven procedures for selecting the number of bins for binscatter estimation. All the commands allow for covariate adjustment, smoothness restrictions and clustering. |
Authors: | Matias D. Cattaneo, Richard K. Crump, Max H. Farrell, Yingjie Feng |
Maintainer: | Yingjie Feng <[email protected]> |
License: | GPL-2 |
Version: | 1.1 |
Built: | 2024-10-22 06:43:25 UTC |
Source: | CRAN |
Binscatter provides a flexible, yet parsimonious way of visualizing and summarizing large data sets
and has been a popular methodology in applied microeconomics and other social sciences. The binsreg package provides tools for
statistical analysis using the binscatter methods developed in
Cattaneo, Crump, Farrell and Feng (2024a) and
Cattaneo, Crump, Farrell and Feng (2024b).
binsreg implements binscatter least squares regression with robust inference and plots, including
curve estimation, pointwise confidence intervals and uniform confidence bands.
binsqreg implements binscatter quantile regression with robust inference and plots, including
curve estimation, pointwise confidence intervals and uniform confidence bands.
binsglm implements binscatter generalized linear regression with robust inference and plots, including
curve estimation, pointwise confidence intervals and uniform confidence bands.
binstest implements binscatter-based hypothesis testing procedures for parametric specifications
of and shape restrictions on the unknown function of interest.
binspwc implements hypothesis testing procedures for pairwise group comparison of binscatter estimators and plots confidence
bands for the difference in binscatter parameters between each pair of groups.
binsregselect implements data-driven selectors for the number of bins,
using either quantile-spaced or evenly-spaced binning/partitioning.
All the commands allow for covariate adjustment, smoothness restrictions, and clustering,
among other features.
The companion software article, Cattaneo, Crump, Farrell and Feng (2024c), provides further implementation details and empirical illustration. For related Stata, R and Python packages useful for nonparametric data analysis and statistical inference, visit https://nppackages.github.io/.
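To make the idea of partitioning/binning concrete, the following base R sketch builds a canonical binned scatter plot by hand: quantile-spaced bins with one dot per bin at the within-bin means. It is only an illustration and not the package's implementation; the seed, the sample size and the number of bins J are arbitrary assumptions, and the simulated data mimic the package examples.

```r
# Hand-rolled canonical binscatter (illustration only, not binsreg's code).
set.seed(42)                       # assumed seed, for reproducibility
n <- 500
x <- runif(n)
y <- sin(x) + rnorm(n)

J <- 10                            # assumed number of bins
# Quantile-spaced knots: each bin holds roughly n / J observations.
knots <- quantile(x, probs = seq(0, 1, length.out = J + 1), names = FALSE)
bin   <- cut(x, breaks = knots, include.lowest = TRUE, labels = FALSE)

# One dot per bin: within-bin mean of x against within-bin mean of y.
dots <- data.frame(
  x = as.vector(tapply(x, bin, mean)),
  y = as.vector(tapply(y, bin, mean))
)
plot(dots$x, dots$y, pch = 19, xlab = "x", ylab = "within-bin mean of y")
```

The package commands add data-driven bin selection, covariate adjustment and valid inference on top of this basic construction.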
Matias D. Cattaneo, Princeton University, Princeton, NJ. [email protected].
Richard K. Crump, Federal Reserve Bank of New York, New York, NY. [email protected].
Max H. Farrell, UC Santa Barbara, Santa Barbara, CA. [email protected].
Yingjie Feng (maintainer), Tsinghua University, Beijing, China. [email protected].
Cattaneo, M. D., R. K. Crump, M. H. Farrell, and Y. Feng. 2024a: On Binscatter. American Economic Review 114(5): 1488-1514.
Cattaneo, M. D., R. K. Crump, M. H. Farrell, and Y. Feng. 2024b: Nonlinear Binscatter Methods. Working Paper.
Cattaneo, M. D., R. K. Crump, M. H. Farrell, and Y. Feng. 2024c: Binscatter Regressions. Working Paper.
binsglm implements binscatter generalized linear regression with robust inference procedures and plots, following the
results in Cattaneo, Crump, Farrell and Feng (2024a) and
Cattaneo, Crump, Farrell and Feng (2024b).
Binscatter provides a flexible way to describe the relationship between two variables, after
possibly adjusting for other covariates, based on partitioning/binning of the independent variable of interest.
The main purpose of this function is to generate binned scatter plots with curve estimates, robust pointwise confidence intervals and
a uniform confidence band. If the binning scheme is not set by the user, the companion function
binsregselect is used to implement binscatter in a data-driven way. Hypothesis testing about the function of interest can be conducted via the companion
function binstest.
binsglm(y, x, w = NULL, data = NULL, at = NULL, family = gaussian(), deriv = 0, nolink = F, dots = NULL, dotsgrid = 0, dotsgridmean = T, line = NULL, linegrid = 20, ci = NULL, cigrid = 0, cigridmean = T, cb = NULL, cbgrid = 20, polyreg = NULL, polyreggrid = 20, polyregcigrid = 0, by = NULL, bycolors = NULL, bysymbols = NULL, bylpatterns = NULL, legendTitle = NULL, legendoff = F, nbins = NULL, binspos = "qs", binsmethod = "dpi", nbinsrot = NULL, pselect = NULL, sselect = NULL, samebinsby = F, randcut = NULL, nsims = 500, simsgrid = 20, simsseed = NULL, vce = "HC1", cluster = NULL, asyvar = F, level = 95, noplot = F, dfcheck = c(20, 30), masspoints = "on", weights = NULL, subset = NULL, plotxrange = NULL, plotyrange = NULL, ...)
y |
outcome variable. A vector. |
x |
independent variable of interest. A vector. |
w |
control variables. A matrix, a vector or a |
data |
an optional data frame containing variables in the model. |
at |
value of |
family |
a description of the error distribution and link function to be used in the generalized linear model. (See |
deriv |
derivative order of the regression function for estimation, testing and plotting.
The default is deriv = 0. |
nolink |
if true, the function within the inverse link function is reported instead of the conditional mean function for the outcome. |
dots |
a vector or a logical value. If |
dotsgrid |
number of dots within each bin to be plotted. Given the choice, these dots are point estimates
evaluated over an evenly-spaced grid within each bin. The default is |
dotsgridmean |
If true, the dots corresponding to the point estimates evaluated at the mean of |
line |
a vector or a logical value. If |
linegrid |
number of evaluation points of an evenly-spaced grid within each bin used for evaluation of
the point estimate set by the |
ci |
a vector or a logical value. If |
cigrid |
number of evaluation points of an evenly-spaced grid within each bin used for evaluation of the point
estimate set by the |
cigridmean |
If true, the confidence intervals corresponding to the point estimates evaluated at the mean of |
cb |
a vector or a logical value. If |
cbgrid |
number of evaluation points of an evenly-spaced grid within each bin used for evaluation of the point
estimate set by the |
polyreg |
degree of a global polynomial regression model for plotting. By default, this fit is not included
in the plot unless explicitly specified. Recommended specification is |
polyreggrid |
number of evaluation points of an evenly-spaced grid within each bin used for evaluation of
the point estimate set by the |
polyregcigrid |
number of evaluation points of an evenly-spaced grid within each bin used for constructing
confidence intervals based on polynomial regression set by the |
by |
a vector containing the group indicator for subgroup analysis; both numeric and string variables
are supported. When |
bycolors |
an ordered list of colors for plotting each subgroup series defined by the option |
bysymbols |
an ordered list of symbols for plotting each subgroup series defined by the option |
bylpatterns |
an ordered list of line patterns for plotting each subgroup series defined by the option |
legendTitle |
String, title of legend. |
legendoff |
If true, no legend is added. |
nbins |
number of bins for partitioning/binning of |
binspos |
position of binning knots. The default is binspos = "qs". |
binsmethod |
method for data-driven selection of the number of bins. The default is binsmethod = "dpi". |
nbinsrot |
initial number of bins value used to construct the DPI number of bins selector. If not specified, the data-driven ROT selector is used instead. |
pselect |
vector of numbers within which the degree of polynomial |
sselect |
vector of numbers within which the number of smoothness constraints |
samebinsby |
if true, a common partitioning/binning structure across all subgroups specified by the option |
randcut |
upper bound on a uniformly distributed variable used to draw a subsample for bins/degree/smoothness selection.
Observations for which |
nsims |
number of random draws for constructing confidence bands. The default is nsims = 500. |
simsgrid |
number of evaluation points of an evenly-spaced grid within each bin used for evaluation of
the supremum operation needed to construct confidence bands. The default is simsgrid = 20. |
simsseed |
seed for simulation. |
vce |
Procedure to compute the variance-covariance matrix estimator. Options are
|
cluster |
cluster ID. Used to compute cluster-robust standard errors. |
asyvar |
if true, the standard error of the nonparametric component is computed and the uncertainty related to control
variables is omitted. Default is asyvar = FALSE. |
level |
nominal confidence level for confidence interval and confidence band estimation. Default is level = 95. |
noplot |
if true, no plot is produced. |
dfcheck |
adjustments for minimum effective sample size checks, which take into account number of unique
values of |
masspoints |
how mass points in
|
weights |
an optional vector of weights to be used in the fitting process. Should be |
subset |
optional rule specifying a subset of observations to be used. |
plotxrange |
a vector. |
plotyrange |
a vector. |
... |
optional arguments used by |
bins_plot |
A |
data.plot |
A list containing data for plotting. Each item is a sublist of data frames for each group. Each sublist may contain the following data frames:
|
imse.var.rot |
Variance constant in IMSE, ROT selection. |
imse.bsq.rot |
Bias constant in IMSE, ROT selection. |
imse.var.dpi |
Variance constant in IMSE, DPI selection. |
imse.bsq.dpi |
Bias constant in IMSE, DPI selection. |
cval.by |
A vector of critical values for constructing confidence band for each group. |
opt |
A list containing options passed to the function, as well as |
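As a rough sketch of how the returned list might be inspected, the snippet below fits a logistic binscatter without plotting and looks at the components documented above. The simulated data and the seed are assumptions made for illustration, and the requireNamespace() guard simply skips the call when binsreg is not installed.

```r
set.seed(1)                        # assumed seed
x <- runif(500)
d <- 1 * (runif(500) <= x)         # binary outcome with P(d = 1 | x) = x

if (requireNamespace("binsreg", quietly = TRUE)) {
  # noplot = TRUE suppresses the plot; the estimation results are still returned.
  est <- binsreg::binsglm(d, x, family = binomial(), noplot = TRUE)

  print(names(est))                   # components of the returned list
  str(est$data.plot, max.level = 2)   # per-group data frames used for plotting
  print(names(est$opt))               # options recorded with the fit
}
```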
Matias D. Cattaneo, Princeton University, Princeton, NJ. [email protected].
Richard K. Crump, Federal Reserve Bank of New York, New York, NY. [email protected].
Max H. Farrell, UC Santa Barbara, Santa Barbara, CA. [email protected].
Yingjie Feng (maintainer), Tsinghua University, Beijing, China. [email protected].
Cattaneo, M. D., R. K. Crump, M. H. Farrell, and Y. Feng. 2024a: On Binscatter. American Economic Review 114(5): 1488-1514.
Cattaneo, M. D., R. K. Crump, M. H. Farrell, and Y. Feng. 2024b: Nonlinear Binscatter Methods. Working Paper.
Cattaneo, M. D., R. K. Crump, M. H. Farrell, and Y. Feng. 2024c: Binscatter Regressions. Working Paper.
x <- runif(500); d <- 1*(runif(500)<=x)
## Binned scatterplot
binsglm(d, x, family=binomial())
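A slightly richer variant of the example above, offered as a sketch rather than a recommended specification. It assumes that line = c(p, s) and ci = c(p, s) request a piecewise polynomial of degree p with s smoothness constraints (cubic with 3 constraints here, an arbitrary choice), and it uses nolink = TRUE to report the function inside the inverse link, which for the logit link is the log-odds.

```r
set.seed(10)                       # assumed seed
x <- runif(500)
d <- 1 * (runif(500) <= x)

if (requireNamespace("binsreg", quietly = TRUE)) {
  # Conditional probability P(d = 1 | x) with a fitted line and pointwise CIs.
  est_prob <- binsreg::binsglm(d, x, family = binomial(),
                               line = c(3, 3), ci = c(3, 3))

  # Same model, reported on the index (log-odds) scale instead of the mean.
  est_index <- binsreg::binsglm(d, x, family = binomial(), nolink = TRUE)
}
```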
binspwc implements hypothesis testing procedures for pairwise group comparison of binscatter estimators
and plots confidence bands for the difference in binscatter parameters between each pair of groups, following the
results in Cattaneo, Crump, Farrell and Feng (2024a) and
Cattaneo, Crump, Farrell and Feng (2024b).
If the binning scheme is not set by the user, the companion function
binsregselect is used to implement binscatter in a data-driven way. Binned scatter plots based on different methods
can be constructed using the companion functions binsreg, binsqreg or binsglm.
Hypothesis testing for parametric functional forms of and shape restrictions on the regression function of interest can
be conducted via the companion function binstest.
binspwc(y, x, w = NULL, data = NULL, estmethod = "reg", family = gaussian(), quantile = NULL, deriv = 0, at = NULL, nolink = F, by = NULL, pwc = NULL, testtype = "two-sided", lp = Inf, bins = NULL, bynbins = NULL, binspos = "qs", pselect = NULL, sselect = NULL, binsmethod = "dpi", nbinsrot = NULL, samebinsby = FALSE, randcut = NULL, nsims = 500, simsgrid = 20, simsseed = NULL, vce = NULL, cluster = NULL, asyvar = F, dfcheck = c(20, 30), masspoints = "on", weights = NULL, subset = NULL, numdist = NULL, numclust = NULL, estmethodopt = NULL, plot = FALSE, dotsngrid = 0, plotxrange = NULL, plotyrange = NULL, colors = NULL, symbols = NULL, level = 95, ...)
y |
outcome variable. A vector. |
x |
independent variable of interest. A vector. |
w |
control variables. A matrix, a vector or a |
data |
an optional data frame containing variables used in the model. |
estmethod |
estimation method. The default is estmethod = "reg". |
family |
a description of the error distribution and link function to be used in the generalized linear model when |
quantile |
the quantile to be estimated. A number strictly between 0 and 1. |
deriv |
derivative order of the regression function for estimation, testing and plotting.
The default is deriv = 0. |
at |
value of |
nolink |
if true, the function within the inverse link function is reported instead of the conditional mean function for the outcome. |
by |
a vector containing the group indicator for subgroup analysis; both numeric and string variables
are supported. When |
pwc |
a vector or a logical value. If |
testtype |
type of pairwise comparison test. The default is testtype = "two-sided". |
lp |
an Lp metric used for pairwise comparison tests. The default is lp = Inf. |
bins |
A vector. If |
bynbins |
a vector of the number of bins for partitioning/binning of |
binspos |
position of binning knots. The default is binspos = "qs". |
pselect |
vector of numbers within which the degree of polynomial |
sselect |
vector of numbers within which the number of smoothness constraints |
binsmethod |
method for data-driven selection of the number of bins. The default is binsmethod = "dpi". |
nbinsrot |
initial number of bins value used to construct the DPI number of bins selector. If not specified, the data-driven ROT selector is used instead. |
samebinsby |
if true, a common partitioning/binning structure across all subgroups specified by the option |
randcut |
upper bound on a uniformly distributed variable used to draw a subsample for bins/degree/smoothness selection.
Observations for which |
nsims |
number of random draws for hypothesis testing. The default is nsims = 500. |
simsgrid |
number of evaluation points of an evenly-spaced grid within each bin used for evaluation of
the supremum (infimum or Lp metric) operation needed to construct hypothesis testing
procedures. The default is simsgrid = 20. |
simsseed |
seed for simulation. |
vce |
procedure to compute the variance-covariance matrix estimator. For least squares regression and generalized linear regression, the allowed options are the same as that for |
cluster |
cluster ID. Used to compute cluster-robust standard errors. |
asyvar |
if true, the standard error of the nonparametric component is computed and the uncertainty related to control
variables is omitted. Default is asyvar = FALSE. |
dfcheck |
adjustments for minimum effective sample size checks, which take into account number of unique
values of |
masspoints |
how mass points in
|
weights |
an optional vector of weights to be used in the fitting process. Should be |
subset |
optional rule specifying a subset of observations to be used. |
numdist |
number of distinct values for selection. Used to speed up computation. |
numclust |
number of clusters for selection. Used to speed up computation. |
estmethodopt |
a list of optional arguments used by |
plot |
if true, the confidence bands for all pairwise group comparisons (the difference between each pair of groups) are plotted.
The degree and smoothness of polynomials used to construct the bands are the same as those specified for testing. The default is plot = FALSE. |
dotsngrid |
number of dots to be added to the plot for confidence bands. Given the choice, these dots are point estimates of the difference between groups
evaluated over an evenly-spaced grid within the common support of all groups. The default is dotsngrid = 0. |
plotxrange |
a vector. |
plotyrange |
a vector. |
colors |
an ordered list of colors for plotting the difference between each pair of groups. |
symbols |
an ordered list of symbols for plotting the difference between each pair of groups. |
level |
nominal confidence level for confidence band estimation. Default is level = 95. |
... |
optional arguments to control bootstrapping if |
stat |
A matrix. Each row corresponds to the comparison between two groups. The first column is the test statistic. The second and third columns give the corresponding group numbers.
The null hypothesis is |
pval |
A vector of p-values for all pairwise group comparisons. |
bins_plot |
A |
data.plot |
A list containing data for plotting. Each item is a sublist of data frames for comparison between each pair of groups. Each sublist may contain the following data frames:
|
cval.cb |
A vector of critical values for all pairwise group comparisons. |
imse.var.rot |
Variance constant in IMSE expansion, ROT selection. |
imse.bsq.rot |
Bias constant in IMSE expansion, ROT selection. |
imse.var.dpi |
Variance constant in IMSE expansion, DPI selection. |
imse.bsq.dpi |
Bias constant in IMSE expansion, DPI selection. |
opt |
A list containing options passed to the function, as well as |
Matias D. Cattaneo, Princeton University, Princeton, NJ. [email protected].
Richard K. Crump, Federal Reserve Bank of New York, New York, NY. [email protected].
Max H. Farrell, UC Santa Barbara, Santa Barbara, CA. [email protected].
Yingjie Feng (maintainer), Tsinghua University, Beijing, China. [email protected].
Cattaneo, M. D., R. K. Crump, M. H. Farrell, and Y. Feng. 2024a: On Binscatter. American Economic Review 114(5): 1488-1514.
Cattaneo, M. D., R. K. Crump, M. H. Farrell, and Y. Feng. 2024b: Nonlinear Binscatter Methods. Working Paper.
Cattaneo, M. D., R. K. Crump, M. H. Farrell, and Y. Feng. 2024c: Binscatter Regressions. Working Paper.
binsreg, binsqreg, binsglm, binsregselect, binstest.
x <- runif(500); y <- sin(x)+rnorm(500); t <- 1*(runif(500)>0.5)
## Pairwise group comparison
binspwc(y,x, by=t)
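With more than two groups, every pair is compared. The sketch below simulates three groups, one of which is shifted upward, and reads the stat and pval components described in the value section. The group labels, the shift of 0.5, the sample size and the seed are assumptions chosen for illustration only.

```r
set.seed(2)                        # assumed seed
n <- 900
x <- runif(n)
g <- sample(c("a", "b", "c"), n, replace = TRUE)   # hypothetical group labels
y <- sin(x) + 0.5 * (g == "c") + rnorm(n)          # group "c" is shifted up

if (requireNamespace("binsreg", quietly = TRUE)) {
  res <- binsreg::binspwc(y, x, by = g)

  # One row per pair: test statistic followed by the two group numbers.
  print(res$stat)
  # p-values in the same order as the rows of res$stat.
  print(res$pval)
}
```

Because each row of stat carries the group numbers, the p-values can be matched to pairs without relying on the order of the group labels.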
binsqreg implements binscatter quantile regression with robust inference procedures and plots, following the
results in Cattaneo, Crump, Farrell and Feng (2024a) and
Cattaneo, Crump, Farrell and Feng (2024b).
Binscatter provides a flexible way to describe the quantile relationship between two variables, after
possibly adjusting for other covariates, based on partitioning/binning of the independent variable of interest.
The main purpose of this function is to generate binned scatter plots with curve estimates, robust pointwise confidence intervals and
a uniform confidence band. If the binning scheme is not set by the user, the companion function
binsregselect is used to implement binscatter in a data-driven way. Hypothesis testing about the function of interest
can be conducted via the companion function binstest.
binsqreg(y, x, w = NULL, data = NULL, at = NULL, quantile = 0.5, deriv = 0, dots = NULL, dotsgrid = 0, dotsgridmean = T, line = NULL, linegrid = 20, ci = NULL, cigrid = 0, cigridmean = T, cb = NULL, cbgrid = 20, polyreg = NULL, polyreggrid = 20, polyregcigrid = 0, by = NULL, bycolors = NULL, bysymbols = NULL, bylpatterns = NULL, legendTitle = NULL, legendoff = F, nbins = NULL, binspos = "qs", binsmethod = "dpi", nbinsrot = NULL, pselect = NULL, sselect = NULL, samebinsby = F, randcut = NULL, nsims = 500, simsgrid = 20, simsseed = NULL, vce = "nid", cluster = NULL, asyvar = F, level = 95, noplot = F, dfcheck = c(20, 30), masspoints = "on", weights = NULL, subset = NULL, plotxrange = NULL, plotyrange = NULL, qregopt = NULL, ...)
y |
outcome variable. A vector. |
x |
independent variable of interest. A vector. |
w |
control variables. A matrix, a vector or a |
data |
an optional data frame containing variables in the model. |
at |
value of |
quantile |
the quantile to be estimated. A number strictly between 0 and 1. |
deriv |
derivative order of the regression function for estimation, testing and plotting.
The default is deriv = 0. |
dots |
a vector or a logical value. If |
dotsgrid |
number of dots within each bin to be plotted. Given the choice, these dots are point estimates
evaluated over an evenly-spaced grid within each bin. The default is |
dotsgridmean |
If true, the dots corresponding to the point estimates evaluated at the mean of |
line |
a vector or a logical value. If |
linegrid |
number of evaluation points of an evenly-spaced grid within each bin used for evaluation of
the point estimate set by the |
ci |
a vector or a logical value. If |
cigrid |
number of evaluation points of an evenly-spaced grid within each bin used for evaluation of the point
estimate set by the |
cigridmean |
If true, the confidence intervals corresponding to the point estimates evaluated at the mean of |
cb |
a vector or a logical value. If |
cbgrid |
number of evaluation points of an evenly-spaced grid within each bin used for evaluation of the point
estimate set by the |
polyreg |
degree of a global polynomial regression model for plotting. By default, this fit is not included
in the plot unless explicitly specified. Recommended specification is |
polyreggrid |
number of evaluation points of an evenly-spaced grid within each bin used for evaluation of
the point estimate set by the |
polyregcigrid |
number of evaluation points of an evenly-spaced grid within each bin used for constructing
confidence intervals based on polynomial regression set by the |
by |
a vector containing the group indicator for subgroup analysis; both numeric and string variables
are supported. When |
bycolors |
an ordered list of colors for plotting each subgroup series defined by the option |
bysymbols |
an ordered list of symbols for plotting each subgroup series defined by the option |
bylpatterns |
an ordered list of line patterns for plotting each subgroup series defined by the option |
legendTitle |
String, title of legend. |
legendoff |
If true, no legend is added. |
nbins |
number of bins for partitioning/binning of |
binspos |
position of binning knots. The default is binspos = "qs". |
binsmethod |
method for data-driven selection of the number of bins. The default is binsmethod = "dpi". |
nbinsrot |
initial number of bins value used to construct the DPI number of bins selector. If not specified, the data-driven ROT selector is used instead. |
pselect |
vector of numbers within which the degree of polynomial |
sselect |
vector of numbers within which the number of smoothness constraints |
samebinsby |
if true, a common partitioning/binning structure across all subgroups specified by the option |
randcut |
upper bound on a uniformly distributed variable used to draw a subsample for bins/degree/smoothness selection.
Observations for which |
nsims |
number of random draws for constructing confidence bands. The default is nsims = 500. |
simsgrid |
number of evaluation points of an evenly-spaced grid within each bin used for evaluation of
the supremum operation needed to construct confidence bands. The default is simsgrid = 20. |
simsseed |
seed for simulation. |
vce |
Procedure to compute the variance-covariance matrix estimator (see
|
cluster |
cluster ID. Used to compute cluster-robust standard errors. |
asyvar |
if true, the standard error of the nonparametric component is computed and the uncertainty related to control
variables is omitted. Default is asyvar = FALSE. |
level |
nominal confidence level for confidence interval and confidence band estimation. Default is level = 95. |
noplot |
if true, no plot is produced. |
dfcheck |
adjustments for minimum effective sample size checks, which take into account number of unique
values of |
masspoints |
how mass points in
|
weights |
an optional vector of weights to be used in the fitting process. Should be |
subset |
optional rule specifying a subset of observations to be used. |
plotxrange |
a vector. |
plotyrange |
a vector. |
qregopt |
a list of optional arguments used by |
... |
optional arguments to control bootstrapping. See |
bins_plot |
A |
data.plot |
A list containing data for plotting. Each item is a sublist of data frames for each group. Each sublist may contain the following data frames:
|
imse.var.rot |
Variance constant in IMSE, ROT selection. |
imse.bsq.rot |
Bias constant in IMSE, ROT selection. |
imse.var.dpi |
Variance constant in IMSE, DPI selection. |
imse.bsq.dpi |
Bias constant in IMSE, DPI selection. |
cval.by |
A vector of critical values for constructing confidence band for each group. |
opt |
A list containing options passed to the function, as well as |
Matias D. Cattaneo, Princeton University, Princeton, NJ. [email protected].
Richard K. Crump, Federal Reserve Bank of New York, New York, NY. [email protected].
Max H. Farrell, UC Santa Barbara, Santa Barbara, CA. [email protected].
Yingjie Feng (maintainer), Tsinghua University, Beijing, China. [email protected].
Cattaneo, M. D., R. K. Crump, M. H. Farrell, and Y. Feng. 2024a: On Binscatter. American Economic Review 114(5): 1488-1514.
Cattaneo, M. D., R. K. Crump, M. H. Farrell, and Y. Feng. 2024b: Nonlinear Binscatter Methods. Working Paper.
Cattaneo, M. D., R. K. Crump, M. H. Farrell, and Y. Feng. 2024c: Binscatter Regressions. Working Paper.
x <- runif(500); y <- sin(x)+rnorm(500)
## Binned scatterplot
binsqreg(y,x)
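The quantile argument can be varied to trace out several conditional quantiles. The following is a minimal sketch under assumed settings: the heteroskedastic noise, the three quantile levels and the seed are illustrative choices, and the plotting data are accessed by position because the exact names of the sublists are not spelled out here.

```r
set.seed(3)                        # assumed seed
x <- runif(500)
y <- sin(x) + (0.5 + x) * rnorm(500)   # noise scale grows with x

taus <- c(0.25, 0.5, 0.75)         # assumed quantile levels

if (requireNamespace("binsreg", quietly = TRUE)) {
  # One binscatter quantile regression per level, without plotting.
  fits <- lapply(taus, function(tau) {
    binsreg::binsqreg(y, x, quantile = tau, noplot = TRUE)
  })
  names(fits) <- paste0("q", taus)

  # Plotting data for the median fit (first group).
  str(fits[["q0.5"]]$data.plot[[1]], max.level = 1)
}
```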
binsreg implements binscatter least squares regression with robust inference procedures and plots, following the
results in Cattaneo, Crump, Farrell and Feng (2024a) and
Cattaneo, Crump, Farrell and Feng (2024b).
Binscatter provides a flexible way to describe the mean relationship between two variables, after
possibly adjusting for other covariates, based on partitioning/binning of the independent variable of interest.
The main purpose of this function is to generate binned scatter plots with curve estimates, robust pointwise confidence intervals and
a uniform confidence band. If the binning scheme is not set by the user, the companion function
binsregselect is used to implement binscatter in a data-driven (optimal)
way. Hypothesis testing about the regression function can be conducted via the companion
function binstest.
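The description above can be illustrated with a short sketch that contrasts a data-driven number of bins with a user-specified one, and then adds a fitted line, confidence intervals, a confidence band and a global polynomial fit. The simulated data, the seed, nbins = 20 and the c(3, 3) specifications (read here as degree 3 with 3 smoothness constraints) are assumptions for illustration, not recommendations.

```r
set.seed(4)                        # assumed seed
x <- runif(500)
y <- sin(x) + rnorm(500)

if (requireNamespace("binsreg", quietly = TRUE)) {
  # Number of bins left unset: chosen in a data-driven way.
  est_auto  <- binsreg::binsreg(y, x, noplot = TRUE)

  # Number of bins fixed by the user.
  est_fixed <- binsreg::binsreg(y, x, nbins = 20, noplot = TRUE)

  # Dots plus cubic line, pointwise CIs, uniform band and a global cubic fit.
  est_full <- binsreg::binsreg(y, x, line = c(3, 3), ci = c(3, 3),
                               cb = c(3, 3), polyreg = 3)
}
```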
binsreg(y, x, w = NULL, data = NULL, at = NULL, deriv = 0, dots = NULL, dotsgrid = 0, dotsgridmean = T, line = NULL, linegrid = 20, ci = NULL, cigrid = 0, cigridmean = T, cb = NULL, cbgrid = 20, polyreg = NULL, polyreggrid = 20, polyregcigrid = 0, by = NULL, bycolors = NULL, bysymbols = NULL, bylpatterns = NULL, legendTitle = NULL, legendoff = F, nbins = NULL, binspos = "qs", binsmethod = "dpi", nbinsrot = NULL, pselect = NULL, sselect = NULL, samebinsby = F, randcut = NULL, nsims = 500, simsgrid = 20, simsseed = NULL, vce = "HC1", cluster = NULL, asyvar = F, level = 95, noplot = F, dfcheck = c(20, 30), masspoints = "on", weights = NULL, subset = NULL, plotxrange = NULL, plotyrange = NULL)
y |
outcome variable. A vector. |
x |
independent variable of interest. A vector. |
w |
control variables. A matrix, a vector or a |
data |
an optional data frame containing variables used in the model. |
at |
value of |
deriv |
derivative order of the regression function for estimation, testing and plotting.
The default is deriv = 0. |
dots |
a vector or a logical value. If |
dotsgrid |
number of dots within each bin to be plotted. Given the choice, these dots are point estimates
evaluated over an evenly-spaced grid within each bin. The default is |
dotsgridmean |
If true, the dots corresponding to the point estimates evaluated at the mean of |
line |
a vector or a logical value. If |
linegrid |
number of evaluation points of an evenly-spaced grid within each bin used for evaluation of
the point estimate set by the |
ci |
a vector or a logical value. If |
cigrid |
number of evaluation points of an evenly-spaced grid within each bin used for evaluation of the point
estimate set by the |
cigridmean |
If true, the confidence intervals corresponding to the point estimates evaluated at the mean of |
cb |
a vector or a logical value. If |
cbgrid |
number of evaluation points of an evenly-spaced grid within each bin used for evaluation of the point
estimate set by the |
polyreg |
degree of a global polynomial regression model for plotting. By default, this fit is not included
in the plot unless explicitly specified. Recommended specification is |
polyreggrid |
number of evaluation points of an evenly-spaced grid within each bin used for evaluation of
the point estimate set by the |
polyregcigrid |
number of evaluation points of an evenly-spaced grid within each bin used for constructing
confidence intervals based on polynomial regression set by the |
by |
a vector containing the group indicator for subgroup analysis; both numeric and string variables
are supported. When |
bycolors |
an ordered list of colors for plotting each subgroup series defined by the option |
bysymbols |
an ordered list of symbols for plotting each subgroup series defined by the option |
bylpatterns |
an ordered list of line patterns for plotting each subgroup series defined by the option |
legendTitle |
String, title of legend. |
legendoff |
If true, no legend is added. |
nbins |
number of bins for partitioning/binning of |
binspos |
position of binning knots. The default is |
binsmethod |
method for data-driven selection of the number of bins. The default is |
nbinsrot |
initial number of bins value used to construct the DPI number of bins selector. If not specified, the data-driven ROT selector is used instead. |
pselect |
vector of numbers within which the degree of polynomial |
sselect |
vector of numbers within which the number of smoothness constraints |
samebinsby |
if true, a common partitioning/binning structure across all subgroups specified by the option |
randcut |
upper bound on a uniformly distributed variable used to draw a subsample for bins/degree/smoothness selection.
Observations for which |
nsims |
number of random draws for constructing confidence bands. The default is
|
simsgrid |
number of evaluation points of an evenly-spaced grid within each bin used for evaluation of
the supremum operation needed to construct confidence bands. The default is |
simsseed |
seed for simulation. |
vce |
Procedure to compute the variance-covariance matrix estimator. Options are
|
cluster |
cluster ID. Used to compute cluster-robust standard errors. |
asyvar |
If true, the standard error of the nonparametric component is computed and the uncertainty related to control
variables is omitted. Default is |
level |
nominal confidence level for confidence interval and confidence band estimation. Default is |
noplot |
if true, no plot is produced. |
dfcheck |
adjustments for minimum effective sample size checks, which take into account number of unique
values of |
masspoints |
how mass points in
|
weights |
an optional vector of weights to be used in the fitting process. Should be |
subset |
Optional rule specifying a subset of observations to be used. |
plotxrange |
a vector. |
plotyrange |
a vector. |
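As a rough illustration of what the binning arguments control, the canonical binned scatter plot (piecewise-constant fit on quantile-spaced bins) can be sketched in base R. This is a conceptual sketch only, not the package's implementation: the helper name `binned_means` and the choice of 10 bins are assumptions made for the example.

```r
# Conceptual sketch only: quantile-spaced bins and within-bin means.
# `binned_means` is a hypothetical helper, not part of binsreg.
binned_means <- function(y, x, nbins = 10) {
  # Knots at empirical quantiles of x (analogous to quantile-spaced binning)
  knots <- quantile(x, probs = seq(0, 1, length.out = nbins + 1), names = FALSE)
  # Assign each observation to a bin; the smallest x falls in bin 1
  bin <- cut(x, breaks = knots, include.lowest = TRUE, labels = FALSE)
  data.frame(
    bin    = sort(unique(bin)),
    x_mean = as.numeric(tapply(x, bin, mean)),
    y_mean = as.numeric(tapply(y, bin, mean)),
    n      = as.numeric(tapply(x, bin, length))
  )
}

set.seed(1)
x <- runif(500); y <- sin(x) + rnorm(500)
dots <- binned_means(y, x, nbins = 10)
dots   # one row per bin: the "dots" of a basic binned scatter plot
```

The package generalizes this idea with higher-order piecewise polynomials, smoothness constraints, covariate adjustment and valid inference, none of which the sketch attempts.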
bins_plot |
A |
data.plot |
A list containing data for plotting. Each item is a sublist of data frames for each group. Each sublist may contain the following data frames:
|
imse.var.rot |
Variance constant in IMSE, ROT selection. |
imse.bsq.rot |
Bias constant in IMSE, ROT selection. |
imse.var.dpi |
Variance constant in IMSE, DPI selection. |
imse.bsq.dpi |
Bias constant in IMSE, DPI selection. |
cval.by |
A vector of critical values for constructing confidence band for each group. |
opt |
A list containing options passed to the function, as well as |
Matias D. Cattaneo, Princeton University, Princeton, NJ. [email protected].
Richard K. Crump, Federal Reserve Bank of New York, New York, NY. [email protected].
Max H. Farrell, UC Santa Barbara, Santa Barbara, CA. [email protected].
Yingjie Feng (maintainer), Tsinghua University, Beijing, China. [email protected].
Cattaneo, M. D., R. K. Crump, M. H. Farrell, and Y. Feng. 2024a: On Binscatter. American Economic Review 114(5): 1488-1514.
Cattaneo, M. D., R. K. Crump, M. H. Farrell, and Y. Feng. 2024b: Nonlinear Binscatter Methods. Working Paper.
Cattaneo, M. D., R. K. Crump, M. H. Farrell, and Y. Feng. 2024c: Binscatter Regressions. Working Paper.
x <- runif(500); y <- sin(x)+rnorm(500)
## Binned scatterplot
binsreg(y,x)
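A slightly richer call may help show how the plotting and inference arguments fit together. The sketch below uses only arguments listed in the usage line above and assumes that `line`, `ci` and `cb` each accept a vector giving the polynomial degree and the number of smoothness constraints. The simulated data, the control variable `w` and the choice of 10 bins are made up for illustration, and the code is skipped if the package is not installed.

```r
# Sketch: assumes the binsreg package is installed; skipped otherwise.
if (requireNamespace("binsreg", quietly = TRUE)) {
  library(binsreg)
  set.seed(42)
  n <- 500
  x <- runif(n)
  w <- rnorm(n)                       # hypothetical control variable
  y <- sin(x) + 0.5 * w + rnorm(n)

  # Fitted line, confidence intervals and a confidence band, each based on
  # degree 3 with 3 smoothness constraints (assumed c(degree, smoothness)),
  # with the number of bins fixed at 10 and plotting suppressed
  est <- binsreg(y, x, w = w, nbins = 10,
                 line = c(3, 3), ci = c(3, 3), cb = c(3, 3),
                 simsseed = 123, noplot = TRUE)

  # Data prepared for plotting and the critical value(s) used for the band
  str(est$data.plot, max.level = 2)
  est$cval.by
}
```

Setting `noplot = TRUE` is convenient when the returned `data.plot` list is meant to feed a custom plot rather than the default one.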
binsregselect
implements data-driven procedures for selecting the number of bins for binscatter
estimation. The selected number is optimal in minimizing integrated mean squared error (IMSE).
binsregselect(y, x, w = NULL, data = NULL, deriv = 0, bins = NULL, pselect = NULL, sselect = NULL, binspos = "qs", nbins = NULL, binsmethod = "dpi", nbinsrot = NULL, simsgrid = 20, savegrid = F, vce = "HC1", useeffn = NULL, randcut = NULL, cluster = NULL, dfcheck = c(20, 30), masspoints = "on", weights = NULL, subset = NULL, norotnorm = F, numdist = NULL, numclust = NULL)
y |
outcome variable. A vector. |
x |
independent variable of interest. A vector. |
w |
control variables. A matrix, a vector or a |
data |
an optional data frame containing variables used in the model. |
deriv |
derivative order of the regression function for estimation, testing and plotting.
The default is |
bins |
a vector. |
pselect |
vector of numbers within which the degree of polynomial |
sselect |
vector of numbers within which the number of smoothness constraints |
binspos |
position of binning knots. The default is |
nbins |
number of bins for degree/smoothness selection. If |
binsmethod |
method for data-driven selection of the number of bins. The default is |
nbinsrot |
initial number of bins value used to construct the DPI number of bins selector. If not specified, the data-driven ROT selector is used instead. |
simsgrid |
number of evaluation points of an evenly-spaced grid within each bin used for evaluation of
the supremum (infimum or Lp metric) operation needed to construct confidence bands and hypothesis testing
procedures. The default is |
savegrid |
if true, a data frame containing the grid is produced. |
vce |
procedure to compute the variance-covariance matrix estimator. Options are
|
useeffn |
effective sample size to be used when computing the (IMSE-optimal) number of bins. This option is useful for extrapolating the optimal number of bins to larger (or smaller) datasets than the one used to compute it. |
randcut |
upper bound on a uniformly distributed variable used to draw a subsample for bins/degree/smoothness selection.
Observations for which |
cluster |
cluster ID. Used to compute cluster-robust standard errors. |
dfcheck |
adjustments for minimum effective sample size checks, which take into account number of unique
values of |
masspoints |
how mass points in
|
weights |
an optional vector of weights to be used in the fitting process. Should be |
subset |
optional rule specifying a subset of observations to be used. |
norotnorm |
if true, a uniform density rather than a normal density is used for ROT selection. |
numdist |
number of distinct values for selection. Used to speed up computation. |
numclust |
number of clusters for selection. Used to speed up computation. |
nbinsrot.poly |
ROT number of bins, unregularized. |
nbinsrot.regul |
ROT number of bins, regularized. |
nbinsrot.uknot |
ROT number of bins, unique knots. |
nbinsdpi |
DPI number of bins. |
nbinsdpi.uknot |
DPI number of bins, unique knots. |
prot.poly |
ROT degree of polynomials, unregularized. |
prot.regul |
ROT degree of polynomials, regularized. |
prot.uknot |
ROT degree of polynomials, unique knots. |
pdpi |
DPI degree of polynomials. |
pdpi.uknot |
DPI degree of polynomials, unique knots. |
srot.poly |
ROT number of smoothness constraints, unregularized. |
srot.regul |
ROT number of smoothness constraints, regularized. |
srot.uknot |
ROT number of smoothness constraints, unique knots. |
sdpi |
DPI number of smoothness constraints. |
sdpi.uknot |
DPI number of smoothness constraints, unique knots. |
imse.var.rot |
Variance constant in IMSE expansion, ROT selection. |
imse.bsq.rot |
Bias constant in IMSE expansion, ROT selection. |
imse.var.dpi |
Variance constant in IMSE expansion, DPI selection. |
imse.bsq.dpi |
Bias constant in IMSE expansion, DPI selection. |
int.result |
Intermediate results, including a matrix of degree and smoothness ( |
opt |
A list containing options passed to the function, as well as total sample size |
data.grid |
A data frame containing the grid. |
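To see how variance and bias constants of the kind reported above can drive the choice of the number of bins, here is a minimal sketch. It assumes, following the expansion in Cattaneo, Crump, Farrell and Feng (2024a), that the leading IMSE terms have the form V * J^(1 + 2v) / n + B * J^(-2(p + 1 - v)), where J is the number of bins, p the polynomial degree and v the derivative order. The function names and the constants V = 1 and B = 5 are made up for illustration, and the package's own selector may differ in details such as rounding or regularization.

```r
# Hypothetical leading-order IMSE as a function of the number of bins J
imse_approx <- function(J, n, V, B, p = 0, v = 0) {
  V * J^(1 + 2 * v) / n + B * J^(-2 * (p + 1 - v))
}

# Closed-form minimizer of the expression above (first-order condition)
j_opt <- function(n, V, B, p = 0, v = 0) {
  (2 * (p - v + 1) * B * n / ((1 + 2 * v) * V))^(1 / (2 * p + 3))
}

n <- 500; V <- 1; B <- 5   # made-up constants, for illustration only
J_closed  <- j_opt(n, V, B)
J_numeric <- optimize(imse_approx, interval = c(1, 200),
                      n = n, V = V, B = B)$minimum

# Extrapolating to a sample ten times larger: J scales like n^(1 / (2p + 3))
ratio <- j_opt(10 * n, V, B) / J_closed   # equals 10^(1/3) when p = 0
```

Under this assumed form, the optimal number of bins grows slowly with the sample size, which is the kind of extrapolation the `useeffn` option is described as supporting.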
Matias D. Cattaneo, Princeton University, Princeton, NJ. [email protected].
Richard K. Crump, Federal Reserve Bank of New York, New York, NY. [email protected].
Max H. Farrell, UC Santa Barbara, Santa Barbara, CA. [email protected].
Yingjie Feng (maintainer), Tsinghua University, Beijing, China. [email protected].
Cattaneo, M. D., R. K. Crump, M. H. Farrell, and Y. Feng. 2024a: On Binscatter. American Economic Review 114(5): 1488-1514.
Cattaneo, M. D., R. K. Crump, M. H. Farrell, and Y. Feng. 2024b: Nonlinear Binscatter Methods. Working Paper.
Cattaneo, M. D., R. K. Crump, M. H. Farrell, and Y. Feng. 2024c: Binscatter Regressions. Working Paper.
x <- runif(500); y <- sin(x)+rnorm(500)
est <- binsregselect(y,x)
summary(est)
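The two selectors mentioned above can also be compared directly. The following sketch assumes that `binsmethod` accepts `"rot"` for the rule-of-thumb selector in addition to the default `"dpi"`, and reads the selected numbers of bins from the `nbinsdpi` and `nbinsrot.regul` components listed among the return values. The simulated data are made up, and the code is skipped if the package is not installed.

```r
# Sketch: assumes the binsreg package is installed; skipped otherwise.
if (requireNamespace("binsreg", quietly = TRUE)) {
  library(binsreg)
  set.seed(7)
  x <- runif(500); y <- sin(x) + rnorm(500)

  # Default: direct plug-in (DPI) selector
  sel_dpi <- binsregselect(y, x)

  # Rule-of-thumb (ROT) selector only
  sel_rot <- binsregselect(y, x, binsmethod = "rot")

  # Selected numbers of bins stored in the returned objects
  c(dpi = sel_dpi$nbinsdpi, rot = sel_rot$nbinsrot.regul)
}
```

The ROT choice is cheaper to compute and also serves as the starting point for the DPI selector when `nbinsrot` is not supplied.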
binstest
implements binscatter-based hypothesis testing procedures for parametric functional
forms of and nonparametric shape restrictions on the regression function of interest, following the results
in Cattaneo, Crump, Farrell and Feng (2024a) and
Cattaneo, Crump, Farrell and Feng (2024b).
If the binning scheme is not set by the user,
the companion function binsregselect
is used to implement binscatter in a
data-driven way and inference procedures are based on robust bias correction.
Binned scatter plots based on different methods can be constructed using the companion functions binsreg, binsqreg or binsglm.
binstest(y, x, w = NULL, data = NULL, estmethod = "reg", family = gaussian(), quantile = NULL, deriv = 0, at = NULL, nolink = F, testmodel = NULL, testmodelparfit = NULL, testmodelpoly = NULL, testshape = NULL, testshapel = NULL, testshaper = NULL, testshape2 = NULL, lp = Inf, bins = NULL, nbins = NULL, pselect = NULL, sselect = NULL, binspos = "qs", binsmethod = "dpi", nbinsrot = NULL, randcut = NULL, nsims = 500, simsgrid = 20, simsseed = NULL, vce = NULL, cluster = NULL, asyvar = F, dfcheck = c(20, 30), masspoints = "on", weights = NULL, subset = NULL, numdist = NULL, numclust = NULL, estmethodopt = NULL, ...)
y |
outcome variable. A vector. |
x |
independent variable of interest. A vector. |
w |
control variables. A matrix, a vector or a |
data |
an optional data frame containing variables used in the model. |
estmethod |
estimation method. The default is |
family |
a description of the error distribution and link function to be used in the generalized linear model when |
quantile |
the quantile to be estimated. A number strictly between 0 and 1. |
deriv |
derivative order of the regression function for estimation, testing and plotting.
The default is |
at |
value of |
nolink |
if true, the function within the inverse link function is reported instead of the conditional mean function for the outcome. |
testmodel |
a vector or a logical value. It sets the degree of polynomial and the number of smoothness constraints for parametric model specification
testing. If |
testmodelparfit |
a data frame or matrix which contains the evaluation grid and fitted values of the model(s) to be tested against. One column contains the series of evaluation points at which the binscatter model and the parametric model of interest are compared with each other. Each parametric model is represented by one of the other columns, which must contain its fitted values at the corresponding evaluation points. |
testmodelpoly |
degree of a global polynomial model to be tested against. |
testshape |
a vector or a logical value. It sets the degree of polynomial and the number of smoothness constraints for nonparametric shape restriction
testing. If |
testshapel |
a vector of null boundary values for hypothesis testing. Each number |
testshaper |
a vector of null boundary values for hypothesis testing. Each number |
testshape2 |
a vector of null boundary values for hypothesis testing. Each number |
lp |
an Lp metric used for parametric model specification testing and/or shape restriction testing. The default is |
bins |
a vector. If |
nbins |
number of bins for partitioning/binning of |
pselect |
vector of numbers within which the degree of polynomial |
sselect |
vector of numbers within which the number of smoothness constraints |
binspos |
position of binning knots. The default is |
binsmethod |
method for data-driven selection of the number of bins. The default is |
nbinsrot |
initial number of bins value used to construct the DPI number of bins selector. If not specified, the data-driven ROT selector is used instead. |
randcut |
upper bound on a uniformly distributed variable used to draw a subsample for bins/degree/smoothness selection.
Observations for which |
nsims |
number of random draws for hypothesis testing. The default is
|
simsgrid |
number of evaluation points of an evenly-spaced grid within each bin used for evaluation of
the supremum (infimum or Lp metric) operation needed to construct hypothesis testing
procedures. The default is |
simsseed |
seed for simulation. |
vce |
procedure to compute the variance-covariance matrix estimator. For least squares regression and generalized linear regression, the allowed options are the same as those for |
cluster |
cluster ID. Used to compute cluster-robust standard errors. |
asyvar |
if true, the standard error of the nonparametric component is computed and the uncertainty related to control
variables is omitted. Default is |
dfcheck |
adjustments for minimum effective sample size checks, which take into account number of unique
values of |
masspoints |
how mass points in
|
weights |
an optional vector of weights to be used in the fitting process. Should be |
subset |
optional rule specifying a subset of observations to be used. |
numdist |
number of distinct values for selection. Used to speed up computation. |
numclust |
number of clusters for selection. Used to speed up computation. |
estmethodopt |
a list of optional arguments used by |
... |
optional arguments to control bootstrapping if |
testshapeL |
Results for |
testshapeR |
Results for |
testshape2 |
Results for |
testpoly |
Results for |
testmodel |
Results for |
imse.var.rot |
Variance constant in IMSE, ROT selection. |
imse.bsq.rot |
Bias constant in IMSE, ROT selection. |
imse.var.dpi |
Variance constant in IMSE, DPI selection. |
imse.bsq.dpi |
Bias constant in IMSE, DPI selection. |
opt |
A list containing options passed to the function, as well as total sample size |
Matias D. Cattaneo, Princeton University, Princeton, NJ. [email protected].
Richard K. Crump, Federal Reserve Bank of New York, New York, NY. [email protected].
Max H. Farrell, UC Santa Barbara, Santa Barbara, CA. [email protected].
Yingjie Feng (maintainer), Tsinghua University, Beijing, China. [email protected].
Cattaneo, M. D., R. K. Crump, M. H. Farrell, and Y. Feng. 2024a: On Binscatter. American Economic Review 114(5): 1488-1514.
Cattaneo, M. D., R. K. Crump, M. H. Farrell, and Y. Feng. 2024b: Nonlinear Binscatter Methods. Working Paper.
Cattaneo, M. D., R. K. Crump, M. H. Farrell, and Y. Feng. 2024c: Binscatter Regressions. Working Paper.
binsreg, binsqreg, binsglm, binsregselect.
x <- runif(500); y <- sin(x)+rnorm(500)
est <- binstest(y,x, testmodelpoly=1)
summary(est)
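Shape restrictions can be tested in a similar way. The sketch below rests on two assumptions, since the corresponding argument descriptions are cut off above: that `testshaper = a` sets up a one-sided test with null hypothesis that the function is at least `a` everywhere, and that `testshape` takes a vector giving the polynomial degree and the number of smoothness constraints. Combined with `deriv = 1`, this would test whether the regression function is non-decreasing. The data and the choice of 10 bins are made up, and the code is skipped if the package is not installed.

```r
# Sketch: assumes the binsreg package is installed; skipped otherwise.
if (requireNamespace("binsreg", quietly = TRUE)) {
  library(binsreg)
  set.seed(11)
  x <- runif(500); y <- sin(x) + rnorm(500)

  # Test the first derivative against the null boundary 0, using degree 2
  # with 2 smoothness constraints (assumed c(degree, smoothness)) on 10 bins
  tst <- binstest(y, x, deriv = 1, nbins = 10,
                  testshape = c(2, 2), testshaper = 0,
                  simsseed = 123)
  summary(tst)
  tst$testshapeR   # results stored for the right-sided shape test
}
```

Fixing `simsseed` makes the simulated critical values reproducible across runs.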