Title: | Extreme Bounds Analysis (EBA) |
---|---|
Description: | An implementation of Extreme Bounds Analysis (EBA), a global sensitivity analysis that examines the robustness of determinants in regression models. The package supports both Leamer's and Sala-i-Martin's versions of EBA, and allows users to customize all aspects of the analysis. |
Authors: | Marek Hlavac <[email protected]> |
Maintainer: | Marek Hlavac <[email protected]> |
License: | GPL (>= 2) |
Version: | 0.1.7 |
Built: | 2024-12-19 06:53:56 UTC |
Source: | CRAN |
eba is used to perform extreme bounds analysis (EBA), a global sensitivity test that examines the robustness of the association between a dependent variable and a variety of possible determinants. The eba function performs a demanding version of EBA, proposed by Leamer (1985), that focuses on the upper and lower extreme bounds of regression estimates, as well as a more flexible version proposed by Sala-i-Martin (1997). Sala-i-Martin's EBA considers the entire distribution of regression coefficients. For Sala-i-Martin's version of extreme bounds analysis, eba estimates results for both the normal model (in which regression coefficients are assumed to be normally distributed across models) and the generic model (where no such assumption is made).
eba(formula = NULL, data, y = NULL, free = NULL, doubtful = NULL,
    focus = NULL, k = 0:3, mu = 0, level = 0.95, vif = NULL,
    exclusive = NULL, draws = NULL, reg.fun = lm, se.fun = NULL,
    include.fun = NULL, weights = NULL, ...)
formula |
a formula that specifies the EBA model that the function will run. Most generally, the formula is of the following format: |
data |
a data frame containing the variables used in the extreme bounds analysis. |
y |
a character string that specifies the dependent variable. |
free |
a character vector that specifies the 'free' variables to be used in the analysis. These variables are included in each regression model. |
doubtful |
a character vector that specifies the 'doubtful' variables to be used in the analysis. These variables will be included, in various combinations, in the estimated regression models. |
focus |
a character vector that specifies the 'focus' variables of the extreme bounds analysis. These are the variables whose robustness the user wants to test. The focus variables must be a subset of the variables included in the argument |
k |
a vector of integers that specifies the number of doubtful variables that will be included in each estimated regression model in addition to the focus variable. Following Levine and Renelt (1992), the default is set to |
mu |
a named vector of numeric values that specifies regression coefficients under the null hypothesis. The names of the vector's elements indicate which variable the null hypothesis coefficients belong to. These null hypothesis coefficient values will be used in all hypothesis testing. Alternatively, the argument |
level |
a numeric value between 0 and 1 that indicates the confidence level to be used in determining the robustness/fragility of determinants. |
vif |
a numeric value that sets the maximum limit on a coefficient's variance inflation factor (VIF), a rule-of-thumb indicator of multicollinearity. Only coefficient estimates whose VIF does not exceed the limit will be considered in the analysis. If |
exclusive |
a list of character vectors, or a formula with sets of mutually exclusive variables separated by |
draws |
a positive integer value that specifies how many regressions |
reg.fun |
a function that estimates the desired regression model. The function must accept arguments |
se.fun |
a function that calculates the standard errors for regression coefficient estimates. The function must accept the regression model object as its first argument, and must return a numeric vector with element names that identify the corresponding regressors. |
include.fun |
a function that determines whether the results from a particular regression model will be included in the analysis. The function must accept the regression model object as its first argument, and must return a logical value. Only regression models for which the function returns a value of TRUE will be included in the extreme bounds analysis. |
weights |
a character string or a function that specifies what weights will be applied to the results from each estimated regression model. The default value of |
... |
additional arguments that will be passed on to the regression function specified by |
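The contracts described for se.fun and include.fun can be illustrated without running eba itself. The following is a minimal sketch in base R; the helper names se_classical and include_r2 and the 0.5 threshold are hypothetical and chosen only for illustration.

```r
# A model object of the kind that reg.fun = lm would produce
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Hypothetical se.fun: takes the model as its first argument and returns
# a numeric vector whose names identify the regressors
se_classical <- function(model, ...) {
  sqrt(diag(vcov(model)))
}

# Hypothetical include.fun: takes the model as its first argument and
# returns a single logical value (here: keep models with R-squared > 0.5)
include_r2 <- function(model, ...) {
  summary(model)$r.squared > 0.5
}

se_classical(fit)   # named numeric vector: (Intercept), wt, hp
include_r2(fit)     # a single logical value
```

Functions of this shape could then be supplied to eba through its se.fun and include.fun arguments.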
If the argument focus is NULL, it is populated by the content of doubtful. Conversely, if doubtful is NULL, it will be filled in with values from focus. It is thus sufficient to specify only one of doubtful or focus to test the robustness of all doubtful variables.
The character strings in arguments y, free, doubtful, focus and exclusive can contain model formula operators described in formula (such as :, *, ^, %in%), as well as the function I. In addition, the variables in character strings can be enclosed within other functions: "log(x)", for instance, represents the natural logarithm of x.
The summary object obtained from the regression function specified in argument reg.fun should contain a coefficients matrix component. eba will collect the coefficient estimates, standard errors, test statistics and p-values from the first, second, third and fourth columns of the coefficients matrix, respectively. The number of observations is equal to length(x$residuals), where x is the regression model object.
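For the default reg.fun = lm, the quantities described above can be inspected directly in base R. This is a small illustration of the expected layout, not a reproduction of eba's internal code; the variable names are chosen for this example only.

```r
fit   <- lm(mpg ~ wt + hp, data = mtcars)
coefs <- summary(fit)$coefficients   # matrix with one row per regressor

estimate  <- coefs[, 1]   # column 1: coefficient estimates
std_error <- coefs[, 2]   # column 2: standard errors
statistic <- coefs[, 3]   # column 3: test statistics
p_value   <- coefs[, 4]   # column 4: p-values

# number of observations, counted as described in the text
n_obs <- length(fit$residuals)
```

A custom regression function is easiest to use when its summary object follows the same four-column layout.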
The calculation of weights based on McFadden's likelihood ratio index (see argument weights) relies on the generic accessor function logLik. If weights are based on the regression's R-squared or adjusted R-squared, eba obtains the values of these statistics from the model object's components r.squared and adj.r.squared, respectively.
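McFadden's likelihood ratio index is commonly defined as one minus the ratio of the fitted model's log-likelihood to that of an intercept-only model. The sketch below computes it with logLik for a logistic regression; the choice of model and the object names are assumptions made for illustration, and the exact computation inside eba may differ.

```r
# Fitted model and intercept-only (null) model on the same data
full <- glm(am ~ wt, family = binomial, data = mtcars)
null <- glm(am ~ 1,  family = binomial, data = mtcars)

# McFadden's likelihood ratio index: 1 - logLik(full) / logLik(null)
mcfadden <- 1 - as.numeric(logLik(full)) / as.numeric(logLik(null))
mcfadden
```

Any model class used with this weighting scheme therefore needs a working logLik method.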
eba returns an object of class "eba". The corresponding summary function (i.e., summary.eba) returns the same object.
An object of class "eba" is a list containing the following components:
bounds |
a data frame with the results of the extreme bounds analysis. The data frame
|
call |
the matched call. |
coefficients |
a list that contains data frames with selected quantities of interest that emerge from the extreme bounds analysis. This list can also be extracted by calling the generic accessor function
|
mu |
a named vector of regression coefficients under the null hypothesis for each variable. |
level |
a number between 0 and 1 that indicates the confidence level for hypothesis testing. |
ncomb |
total number of doubtful variable combinations that include at least one focus variable. |
nreg |
total number of regressions that were estimated as part of the extreme bounds analysis. When |
nreg.variable |
a named vector containing the number of estimated regressions that included each variable. |
ncoef.variable |
a named vector containing the number of estimated coefficients that were used in the extreme bounds analysis. This number can differ from |
regressions |
a list that contains estimation results for each regression that was run as part of the extreme bounds analysis. This list contains several components which store quantities such as coefficient or standard error estimates for each of the estimated regressions. Each of these components is a matrix whose number of rows corresponds to the total number of regressions (equal to
|
Hlavac, Marek (2016). ExtremeBounds: Extreme Bounds Analysis in R. Journal of Statistical Software, 72(9), 1-22. doi: 10.18637/jss.v072.i09.
Marek Hlavac < mhlavac at alumni.princeton.edu >
Research Fellow, Central European Labour Studies Institute (CELSI), Bratislava, Slovakia
McFadden, Daniel L. (1974). Conditional Logit Analysis of Qualitative Choice Behavior. In: P. Zarembka (Ed.), Frontiers in Econometrics, Academic Press: New York, 105-142.
Leamer, Edward E. (1985). Sensitivity Analysis Would Help. American Economic Review, 75(3), 308-313.
Levine, Ross, and David Renelt. (1992). A Sensitivity Analysis of Cross-Country Growth Regressions. American Economic Review, 82(4), 942-963.
Sala-i-Martin, Xavier. (1997). I Just Ran Two Million Regressions. American Economic Review, 87(2), 178-183. doi:10.3386/w6252.
# perform Extreme Bounds Analysis
eba.results <- eba(formula = mpg ~ wt | hp + gear | cyl + disp + drat + qsec + vs + am + carb,
                   data = mtcars[1:10, ],
                   exclusive = ~ cyl + disp + hp | am + gear)

# The same result can be achieved by running:
# eba.results <- eba(data = mtcars[1:10, ], y = "mpg", free = "wt",
#                    doubtful = c("cyl", "disp", "hp", "drat", "qsec",
#                                 "vs", "am", "gear", "carb"),
#                    focus = c("hp", "gear"),
#                    exclusive = list(c("cyl", "disp", "hp"),
#                                     c("am", "gear")))

# print out results
print(eba.results)

# create histograms
hist(eba.results, variables = c("hp", "gear"),
     main = c("hp" = "Gross horsepower", "gear" = "Number of forward gears"))
The package ExtremeBounds performs extreme bounds analysis (EBA), a global sensitivity test that examines the robustness of the association between a dependent variable and a variety of possible determinants. It supports a demanding version of EBA, proposed by Leamer (1985), that focuses on the upper and lower extreme bounds of regression estimates, as well as a more flexible version proposed by Sala-i-Martin (1997). Sala-i-Martin's EBA considers the entire distribution of regression coefficients. For Sala-i-Martin's version of extreme bounds analysis, the package ExtremeBounds estimates results for both the normal model (in which regression coefficients are assumed to be normally distributed across models) and the generic model (where no such assumption is made).
The most important function is eba, which performs the extreme bounds analysis and stores the results in an object of class "eba". This object can then be passed on to the print.eba and hist.eba functions to obtain, respectively, a printed summary of EBA results and a set of histograms that illustrate the EBA results graphically.
If you have any comments or suggestions, please do not hesitate to contact the author.
hist.eba is used to generate a set of histograms that present the results of extreme bounds analysis graphically. Each histogram illustrates the distribution of regression coefficients across the models estimated in the course of EBA. In addition, function hist.eba can overlay each histogram with lines that indicate the value of the regression coefficient assumed under the null hypothesis (argument mu.show), as well as with curves that indicate the distribution's kernel density (argument density.show) and a normally distributed approximation (argument normal.show). Additional formatting options are available.
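The kind of plot described above can be approximated in base R graphics. The sketch below is not the code behind hist.eba: it fits a small, arbitrarily chosen set of regressions on mtcars, collects the coefficient on hp, and overlays the three elements mentioned (null-hypothesis line, kernel density, normal curve). As a simplification, the normal curve uses the sample mean and standard deviation of the coefficients.

```r
# Collect the hp coefficient across models that add 1 to 3 extra regressors
doubtful <- c("cyl", "disp", "drat", "qsec", "vs", "am")
subsets  <- do.call(c, lapply(1:3, function(j) combn(doubtful, j, simplify = FALSE)))
beta <- sapply(subsets, function(vars) {
  fit <- lm(reformulate(c("wt", "hp", vars), response = "mpg"), data = mtcars)
  coef(fit)[["hp"]]
})

mu <- 0      # coefficient value under the null hypothesis
pdf(NULL)    # null graphics device, so that no file is written

# Histogram on the density scale; xlim is widened so that mu stays visible
h <- hist(beta, freq = FALSE, col = "gray", xlim = range(c(beta, mu)),
          main = "Gross horsepower", xlab = "Coefficient on hp")
abline(v = mu, col = "red", lwd = 2)              # line at mu
lines(density(beta), col = "blue", lwd = 2)       # kernel density curve
curve(dnorm(x, mean = mean(beta), sd = sd(beta)), # normal approximation
      add = TRUE, col = "darkgreen", lwd = 2)
invisible(dev.off())
```

Widening xlim to include mu mirrors what the mu.visible argument is described as doing.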
## S3 method for class 'eba'
hist(x, variables = NULL, col = "gray", freq = FALSE, main = NULL,
     mu.show = TRUE, mu.col = "red", mu.lwd = 2, mu.visible = TRUE,
     density.show = TRUE, density.col = "blue", density.lwd = 2,
     density.args = NULL, normal.show = FALSE, normal.col = "darkgreen",
     normal.lwd = 2, normal.weighted = FALSE, xlim = NULL, ylim = NULL, ...)
x |
an object of class |
variables |
a character vector that specifies the variables for which histograms are requested. If NULL, histograms for all variables will be produced. |
col |
a color to be used to fill the histogram bars. Default is |
freq |
logical; if TRUE, the histogram graphic is a representation of frequencies, the counts component of the result; if FALSE, probability densities, component density, are plotted (so that the histogram has a total area of one). The default for hist.eba is FALSE. |
main |
a named character vector that specifies the histogram title labels for the requested variables. The name of the vector component specifies the variable, while the content of the component itself contains the title label. If the vector's components are not named, variables are labelled in the order that they appear in the argument |
mu.show |
logical; if TRUE (default), a vertical line for each variable's regression coefficient value assumed under the null hypothesis (specified by the |
mu.col |
a color to be used to draw the vertical line at |
mu.lwd |
the line width for vertical line at |
mu.visible |
logical; if TRUE (default), make sure that the histograms' horizontal axes are scaled so that the vertical line at |
density.show |
logical; if TRUE, a kernel density curve for the regression coefficients' distribution will be drawn over the histograms. The kernel densities are calculated using the standard |
density.col |
a color to be used to draw the kernel density curve. Default is |
density.lwd |
the line width for the kernel density curve. Default is 2. |
density.args |
a list of additional arguments that will be passed on to the kernel |
normal.show |
logical; if TRUE, a density curve for the normal distribution function will be drawn over the histograms. For each variable, the distribution's mean and standard error will be the means and standard errors of the corresponding regression coefficients. |
normal.col |
a color to be used to draw the normal distribution density curve. Default is |
normal.lwd |
the line width for the normal distribution density curve. For more detail, see the documentation for |
normal.weighted |
logical; if TRUE, the normal distribution density shown by |
xlim |
the range of x values with sensible defaults. |
ylim |
the range of y values with sensible defaults. |
... |
additional arguments that will be passed on to the |
hist.eba returns an object of class "hist.eba".
An object of class "hist.eba" is a list containing the following components:
call |
the matched call. |
histograms |
a list of objects of class |
# perform Extreme Bounds Analysis
eba.results <- eba(formula = mpg ~ wt | hp + gear | cyl + disp + drat + qsec + vs + am + carb,
                   data = mtcars[1:10, ], k = 0:2)

# The same result can be achieved by running:
# eba.results <- eba(data = mtcars[1:10, ], y = "mpg", free = "wt",
#                    doubtful = c("cyl","disp","hp","drat","qsec","vs","am","gear","carb"),
#                    focus = c("hp","gear"), k = 0:2)

# create histograms, keeping the default settings
hist(eba.results)

# re-create histograms with customized settings
hist(eba.results, variables = c("hp", "gear"),
     main = c("hp" = "Gross horsepower", "gear" = "Number of forward gears"),
     mu.visible = FALSE, normal.show = TRUE, normal.lwd = 1)
print.eba prints the results of extreme bounds analysis (EBA; performed by the eba function) and returns the printed object invisibly (via invisible(x)). The function prints out information about the distribution and significance of estimated regression coefficients, the results of Leamer's EBA, as well as those of Sala-i-Martin's EBA (both the normal and generic model).
## S3 method for class 'eba'
print(x, digits = 3, ...)
x |
an object of class |
digits |
number of decimal places to which the output will be rounded. |
... |
further arguments passed to |
print.eba prints the following information in its output:
Call: the matched call (based on x$call).
Confidence level: the confidence level for hypothesis testing (x$level).
Number of combinations: the total number of doubtful variable combinations that contain at least one focus variable (x$ncomb).
Regressions estimated: the number of regressions that were estimated in the course of EBA (x$nreg). When no random sampling of regression models was requested (i.e., when eba's argument draws is NULL), the number of combinations (above) will equal the number of regressions estimated.
Number of regressions by variable: the number of regressions estimated, by variable (x$nreg.variable).
Number of coefficients used by variable: the number of coefficients used in the extreme bounds analysis, by variable (x$ncoef.variable).
Beta coefficients:
Coef (Wgt Mean): the weighted mean of the estimated regression coefficients. Individual regression models receive a weight specified by eba's argument weights.
SE (Wgt Mean): the weighted mean of the standard errors on estimated regression coefficients. Individual regression models receive a weight specified by eba's argument weights.
Min Coef and SE (Min Coef): the value of the lowest regression coefficient across the estimated models and the corresponding standard error.
Max Coef and SE (Max Coef): the value of the highest regression coefficient across the estimated models and the corresponding standard error.
Distribution of beta coefficients:
Pct(beta < mu): proportion of estimated regression coefficients whose value is less than mu.
Pct(beta > mu): proportion of estimated regression coefficients whose value is greater than mu.
Pct(significant != mu): proportion of regression models in which the estimated coefficient is statistically significantly different from mu.
Pct(signif & beta < mu): proportion of estimated regression coefficients that are both statistically significantly different from mu and less than mu.
Pct(signif & beta > mu): proportion of estimated regression coefficients that are both statistically significantly different from mu and greater than mu.
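The five proportions listed above can be computed by hand from a set of fitted models. The following base R sketch uses an arbitrary set of regressions on mtcars with hp as the variable of interest; the model set, the object names and the use of unweighted shares are assumptions for illustration rather than a description of what eba does internally.

```r
# Regressions that add 1 or 2 extra regressors to mpg ~ wt + hp
doubtful <- c("cyl", "disp", "drat", "qsec", "vs", "am")
subsets  <- do.call(c, lapply(1:2, function(j) combn(doubtful, j, simplify = FALSE)))

# Keep the estimate and p-value of the hp coefficient from each model
res <- t(sapply(subsets, function(vars) {
  fit <- lm(reformulate(c("wt", "hp", vars), response = "mpg"), data = mtcars)
  summary(fit)$coefficients["hp", c("Estimate", "Pr(>|t|)")]
}))
colnames(res) <- c("beta", "p")

mu    <- 0      # with mu = 0, the reported p-values test beta against mu
alpha <- 0.05   # corresponds to level = 0.95
sig   <- res[, "p"] < alpha

# Shares of models, expressed as percentages
pct <- 100 * c(
  "beta < mu"          = mean(res[, "beta"] < mu),
  "beta > mu"          = mean(res[, "beta"] > mu),
  "significant != mu"  = mean(sig),
  "signif & beta < mu" = mean(sig & res[, "beta"] < mu),
  "signif & beta > mu" = mean(sig & res[, "beta"] > mu)
)
round(pct, 1)
```

For a nonzero mu, the p-values would have to come from a test of the coefficient against mu rather than from the default summary output.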
Leamer's Extreme Bounds Analysis (EBA):
Lower Extreme Bound: Leamer's lower extreme bound at the specified confidence level.
Upper Extreme Bound: Leamer's upper extreme bound at the specified confidence level.
Robust/Fragile?: a character string indicating whether the variable is robust or fragile based on Leamer's extreme bounds analysis.
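Leamer's bounds are usually described as the lowest lower confidence limit and the highest upper confidence limit across all estimated models, with a variable called robust when both bounds lie on the same side of the null value. The sketch below follows that description in base R for a single focus variable (hp). The variable lists are arbitrary, and the normal critical value is an assumption; the package may use a different one.

```r
free     <- "wt"
focus    <- "hp"
doubtful <- c("cyl", "disp", "drat", "qsec")
k        <- 0:2
level    <- 0.95
mu       <- 0

# All subsets of the doubtful variables with 0, 1 or 2 members
subsets <- do.call(c, lapply(k, function(j) {
  if (j == 0) list(character(0)) else combn(doubtful, j, simplify = FALSE)
}))

# One regression per subset; keep the focus coefficient and its standard error
est <- t(sapply(subsets, function(vars) {
  fit <- lm(reformulate(c(free, focus, vars), response = "mpg"), data = mtcars)
  summary(fit)$coefficients[focus, c("Estimate", "Std. Error")]
}))

crit  <- qnorm(1 - (1 - level) / 2)   # assumption: normal critical value
lower <- min(est[, "Estimate"] - crit * est[, "Std. Error"])
upper <- max(est[, "Estimate"] + crit * est[, "Std. Error"])

# Robust only if the whole interval [lower, upper] excludes mu
verdict <- if (lower > mu || upper < mu) "robust" else "fragile"
c(lower = lower, upper = upper)
verdict
```

Because a single model with a wide confidence interval is enough to pull a bound across mu, this criterion is the more demanding of the two versions of EBA.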
Sala-i-Martin's Extreme Bounds Analysis (EBA):
N: CDF(beta <= 0): the value of the cumulative distribution function at mu, i.e., the proportion of coefficients that are estimated to be lower than or equal to mu, based on Sala-i-Martin's EBA that assumes that regression coefficients are normally distributed across the estimated models. Weights specified by eba's argument weights are applied.
N: CDF(beta > 0): the proportion of coefficients that are estimated to be greater than mu, based on Sala-i-Martin's EBA that assumes that regression coefficients are normally distributed across the estimated models. Weights specified by eba's argument weights are applied.
G: CDF(beta <= 0): the value of the cumulative distribution function at mu, based on Sala-i-Martin's EBA that does not assume any particular distribution of regression coefficients across the estimated models. Weights specified by eba's argument weights are applied.
G: CDF(beta > 0): the proportion of coefficients that are estimated to be greater than mu, based on Sala-i-Martin's EBA that does not assume any particular distribution of regression coefficients across the estimated models. Weights specified by eba's argument weights are applied.
Note that all values of cumulative distribution functions for Sala-i-Martin's EBA are printed as percentages.
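The difference between the normal and the generic model can be made concrete with a short base R sketch. In the normal model, the weighted mean coefficient and weighted mean variance define a single normal distribution that is evaluated at mu; in the generic model, each regression's own normal CDF is evaluated at mu and the results are averaged with the same weights. The model set and the equal weights used here are assumptions for illustration, not the package's defaults or its internal code.

```r
# A handful of regressions that each add one extra regressor to mpg ~ wt + hp
doubtful <- c("cyl", "disp", "drat", "qsec")
fits <- lapply(doubtful, function(v) {
  lm(reformulate(c("wt", "hp", v), response = "mpg"), data = mtcars)
})

beta <- sapply(fits, function(f) coef(f)[["hp"]])
se   <- sapply(fits, function(f) sqrt(diag(vcov(f)))[["hp"]])
w    <- rep(1 / length(fits), length(fits))   # assumption: equal weights
mu   <- 0

# Normal model: one normal distribution built from weighted averages
beta_bar   <- sum(w * beta)
var_bar    <- sum(w * se^2)
cdf_normal <- pnorm(mu, mean = beta_bar, sd = sqrt(var_bar))

# Generic model: weighted average of the individual CDFs at mu
cdf_generic <- sum(w * pnorm(mu, mean = beta, sd = se))

# Shares at or below mu and above mu, as percentages
100 * c(normal_below = cdf_normal,   normal_above = 1 - cdf_normal,
        generic_below = cdf_generic, generic_above = 1 - cdf_generic)
```

The two models agree closely when the coefficients and standard errors are similar across regressions, and diverge as the estimates become more dispersed.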
# perform Extreme Bounds Analysis
eba.results <- eba(formula = mpg ~ wt | hp + gear | cyl + disp + drat + qsec + vs + am + carb,
                   data = mtcars[1:10, ], k = 0:2)

# The same result can be achieved by running:
# eba.results <- eba(data = mtcars[1:10, ], y = "mpg", free = "wt",
#                    doubtful = c("cyl","disp","hp","drat","qsec","vs","am","gear","carb"),
#                    focus = c("hp","gear"), k = 0:2)

# print out results, rounded to 2 decimal places
print(eba.results, digits = 2)