Title: | Blinder-Oaxaca Decomposition |
---|---|
Description: | An implementation of the Blinder-Oaxaca decomposition for linear regression models. |
Authors: | Marek Hlavac <mhlavac at alumni.princeton.edu> |
Maintainer: | Marek Hlavac <mhlavac at alumni.princeton.edu> |
License: | GPL (>= 2) |
Version: | 0.1.5 |
Built: | 2024-12-03 06:45:36 UTC |
Source: | CRAN |
Data from a 2013 sample of employed Hispanic workers in metropolitan Chicago. It is a subset of the 2013 Current Population Survey (CPS) Outgoing Rotation Groups (ORG) data set provided by the Center for Economic and Policy Research in Washington, DC (CEPR, 2014).
data("chicago")
A data frame containing 712 observations on 9 variables. The 9 variables contain labor market and demographic information on a sample of employed Hispanic workers in the Chicago metropolitan area.
[, 1] | age | the worker's age, expressed in years |
[, 2] | female | an indicator for female gender |
[, 3] | foreign.born | an indicator for foreign-born status |
[, 4] | LTHS | an indicator for having completed less than a high school (LTHS) education |
[, 5] | high.school | an indicator for having completed a high school education |
[, 6] | some.college | an indicator for having completed some college education |
[, 7] | college | an indicator for having completed a college education |
[, 8] | advanced.degree | an indicator for having completed an advanced degree |
[, 9] | ln.real.wage | the natural logarithm of the worker's real wage (in 2013 U.S. dollars) |
Center for Economic and Policy Research (CEPR). 2014. CPS ORG Uniform Extracts, Version 1.9. Washington, DC.
data("chicago")
summary(chicago)
oaxaca performs a Blinder-Oaxaca decomposition for linear regression models (Blinder, 1973; Oaxaca, 1973). This statistical method decomposes the difference in the means of outcome variables across two groups into a part that is due to cross-group differences in explanatory variables and a part that is due to differences in group-specific coefficients. Economists have used Blinder-Oaxaca decompositions extensively to study labor market discrimination. In principle, however, the method is appropriate for the exploration of cross-group differences in any outcome variable.
The oaxaca function allows users to estimate both a threefold and a twofold variant of the decomposition, as described and implemented by Jann (2008). It supports a variety of reference coefficient weights, as well as pooled model estimation. It can also adjust coefficients on indicator variables to be invariant to the choice of the omitted reference category. Bootstrapped standard errors are calculated (e.g., Efron, 1979). The function returns an object of class "oaxaca" that can be visualized using the plot.oaxaca method.
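To make the description above concrete, the following is a minimal base R sketch of the threefold decomposition as formulated by Jann (2008), taken from the viewpoint of Group B. It does not call the oaxaca package. The data are simulated, and every variable name and parameter value is an illustrative assumption rather than something taken from the package.

```r
# Simulated data: all names and numbers here are illustrative assumptions
set.seed(1)
n <- 500
group <- rep(c("A", "B"), each = n)
x <- rnorm(2 * n, mean = ifelse(group == "A", 12, 11), sd = 2)
y <- ifelse(group == "A", 1.0 + 0.10 * x, 0.8 + 0.08 * x) + rnorm(2 * n, sd = 0.3)
d <- data.frame(y = y, x = x, group = group)

# Separate OLS regressions for each group
fit.A <- lm(y ~ x, data = d[d$group == "A", ])
fit.B <- lm(y ~ x, data = d[d$group == "B", ])
beta.A <- coef(fit.A)
beta.B <- coef(fit.B)

# Mean regressor vectors, including the intercept column
xbar.A <- colMeans(model.matrix(fit.A))
xbar.B <- colMeans(model.matrix(fit.B))

# Threefold decomposition from the viewpoint of Group B
threefold <- c(
  endowments   = sum((xbar.A - xbar.B) * beta.B),
  coefficients = sum(xbar.B * (beta.A - beta.B)),
  interaction  = sum((xbar.A - xbar.B) * (beta.A - beta.B))
)

# Raw difference in mean outcomes between the two groups
gap <- mean(d$y[d$group == "A"]) - mean(d$y[d$group == "B"])

print(threefold)
print(c(sum.of.parts = sum(threefold), gap = gap))
```

Because each OLS fit includes an intercept, the fitted line passes through the group means, so the three parts add up to the raw gap up to floating-point error.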
oaxaca(formula, data, group.weights = NULL, R = 100, reg.fun = lm, ...)
formula |
a formula that specifies the model that the function will run. Typically, the formula is of the following form: |
data |
a data frame containing the data to be used in the Blinder-Oaxaca decomposition. |
group.weights |
a vector of numeric values between 0 and 1. These values specify the weight given to Group A relative to Group B in determining the reference set of coefficients (Oaxaca and Ransom, 1994). By default, the following weights are included in each estimation:
|
R |
number of bootstrapping replicates for the calculation of standard errors. No bootstrapping is performed when the value of |
reg.fun |
a function that estimates the desired regression model. The function must accept arguments |
... |
additional arguments that will be passed on to the regression function specified by |
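The role of group.weights can be illustrated with a short base R sketch of the twofold decomposition. It assumes that the reference coefficients are a weighted average of the two groups' coefficients, with weight w given to Group A (in the spirit of Oaxaca and Ransom, 1994). The function twofold() and all data below are hypothetical and are not part of the package.

```r
# Simulated data: names and values are illustrative assumptions
set.seed(2)
n <- 400
d <- data.frame(group = rep(c("A", "B"), each = n))
d$x <- rnorm(2 * n, mean = ifelse(d$group == "A", 12, 11), sd = 2)
d$y <- ifelse(d$group == "A", 1.0 + 0.10 * d$x, 0.8 + 0.08 * d$x) +
  rnorm(2 * n, sd = 0.3)

fit.A <- lm(y ~ x, data = d[d$group == "A", ])
fit.B <- lm(y ~ x, data = d[d$group == "B", ])
beta.A <- coef(fit.A)
beta.B <- coef(fit.B)
xbar.A <- colMeans(model.matrix(fit.A))
xbar.B <- colMeans(model.matrix(fit.B))

# Hypothetical helper: twofold decomposition for a weight w on Group A
twofold <- function(w) {
  beta.ref <- w * beta.A + (1 - w) * beta.B      # reference coefficients
  explained <- sum((xbar.A - xbar.B) * beta.ref)
  unexplained.A <- sum(xbar.A * (beta.A - beta.ref))
  unexplained.B <- sum(xbar.B * (beta.ref - beta.B))
  c(weight = w,
    explained = explained,
    unexplained = unexplained.A + unexplained.B,
    unexplained.A = unexplained.A,
    unexplained.B = unexplained.B)
}

# One row per weight: 0 uses Group B's coefficients, 1 uses Group A's
results <- t(sapply(c(0, 0.5, 1), twofold))
gap <- mean(d$y[d$group == "A"]) - mean(d$y[d$group == "B"])
print(round(results, 4))
```

Whatever the weight, the explained and unexplained parts sum to the same raw gap. The weight only changes how the gap is split between them.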
oaxaca returns an object of class "oaxaca". The corresponding summary function (i.e., summary.oaxaca) returns the same object.
An object of class "oaxaca" is a list containing the following components:
beta |
a list that contains information about the regression coefficients used in estimating the decomposition. If dummy variables
|
call |
the matched call. |
n |
a list that contains information about the number of observations used in the analysis. It contains the following components:
|
R |
a numeric vector that contains the number of bootstrapping replicates. |
reg |
a list that contains estimated regression objects:
|
threefold |
a list that contains the result of the threefold Blinder-Oaxaca decomposition. It decomposes the difference in mean outcomes into three parts:
The list |
twofold |
a list that contains the result of the twofold Blinder-Oaxaca decomposition. It decomposes the difference in mean outcomes into two parts:
The list |
x |
a list that contains:
|
y |
a list that contains the mean values of the dependent variable (i.e., the outcome variable). It contains the following components:
|
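The R argument and the R component above refer to bootstrap replicates used for standard errors (e.g., Efron, 1979). The base R sketch below shows how a nonparametric bootstrap standard error for a single decomposition component could be obtained. It assumes that whole rows of the data are resampled with replacement. Whether the package resamples in exactly this way is not stated here, so treat the helper endowments() and the simulated data as hypothetical.

```r
# Simulated data: names and values are illustrative assumptions
set.seed(3)
n <- 300
d <- data.frame(group = rep(c("A", "B"), each = n))
d$x <- rnorm(2 * n, mean = ifelse(d$group == "A", 12, 11), sd = 2)
d$y <- ifelse(d$group == "A", 1.0 + 0.10 * d$x, 0.8 + 0.08 * d$x) +
  rnorm(2 * n, sd = 0.3)

# Hypothetical helper: endowments part of the threefold decomposition
endowments <- function(data) {
  a <- data[data$group == "A", ]
  b <- data[data$group == "B", ]
  beta.B <- coef(lm(y ~ x, data = b))
  # the intercept column has the same mean (1) in both groups, so it drops out
  xbar.diff <- c(0, mean(a$x) - mean(b$x))
  sum(xbar.diff * beta.B)
}

# Resample rows with replacement and re-estimate the component each time
R <- 100
boot.estimates <- replicate(R, {
  rows <- sample(nrow(d), replace = TRUE)
  endowments(d[rows, ])
})

estimate <- endowments(d)
std.error <- sd(boot.estimates)   # bootstrap standard error
print(c(estimate = estimate, std.error = std.error))
```

A larger R gives a more stable standard error at the cost of more regressions.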
Hlavac, Marek (2022). oaxaca: Blinder-Oaxaca Decomposition in R.
R package version 0.1.5. https://CRAN.R-project.org/package=oaxaca
Dr. Marek Hlavac < mhlavac at alumni.princeton.edu >
Social Policy Institute, Bratislava, Slovakia
Blinder, Alan S. (1973). Wage Discrimination: Reduced Form and Structural Estimates. Journal of Human Resources, 8(4), 436-455.
Cotton, Jeremiah. (1988). On the Decomposition of Wage Differentials. Review of Economics and Statistics, 70(2), 236-243.
Efron, Bradley. (1979). Bootstrap Methods: Another Look at the Jackknife. Annals of Statistics, 7(1), 1-26.
Gardeazabal, Javier and Arantza Ugidos. (2004). More on Identification in Detailed Wage Decompositions. Review of Economics and Statistics, 86(4), 1034-1036.
Jann, Ben. (2008). The Blinder-Oaxaca Decomposition for Linear Regression Models. Stata Journal, 8(4), 453-479.
Neumark, David. (1988). Employers' Discriminatory Behavior and the Estimation of Wage Discrimination. Journal of Human Resources, 23(3), 279-295.
Oaxaca, Ronald L. (1973). Male-Female Wage Differentials in Urban Labor Markets. International Economic Review, 14(3), 693-709.
Oaxaca, Ronald L. and Michael R. Ransom. (1994). On Discrimination and the Decomposition of Wage Differentials. Journal of Econometrics, 61(1), 5-21.
Reimers, Cordelia W. (1983). Labor Market Discrimination Against Hispanic and Black Men. Review of Economics and Statistics, 65(4), 570-579.
# load the package
library(oaxaca)

# set random seed
set.seed(03104)

# load data set of Hispanic workers in Chicago
data("chicago")

# perform Blinder-Oaxaca Decomposition:
# explain differences in log real wages across native and foreign-born groups
oaxaca.results.1 <- oaxaca(ln.real.wage ~ age + female + LTHS + some.college +
                             college + advanced.degree | foreign.born,
                           data = chicago, R = 30)

# print the results
print(oaxaca.results.1)

# Next:
# - adjust gender and education dummy variable coefficients to make results
#   invariant to the choice of omitted baseline (reference category)
# - include additional weights for the twofold decomposition that give
#   weights of 0.2 and 0.4 to Group A relative to Group B in the choice
#   of reference coefficients
oaxaca.results.2 <- oaxaca(ln.real.wage ~ age + female + LTHS + some.college +
                             college + advanced.degree | foreign.born |
                             LTHS + some.college + college + advanced.degree,
                           data = chicago, group.weights = c(0.2, 0.4), R = 30)

# plot the results
plot(oaxaca.results.2)
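The second call above asks for dummy variable coefficients that do not depend on the omitted reference category. One way to see why such an adjustment is possible is the deviation contrast idea discussed by Gardeazabal and Ugidos (2004) and Jann (2008): the category effects are re-centered so that they sum to zero, and the shift is absorbed by the intercept. The base R sketch below illustrates this on simulated data. The function normalize() is hypothetical and is not the package's internal code.

```r
# Simulated data: one categorical regressor with three education levels
set.seed(4)
n <- 600
educ <- factor(sample(c("LTHS", "high.school", "college"), n, replace = TRUE))
true.means <- c(LTHS = 2.4, high.school = 2.6, college = 3.1)
y <- unname(true.means[as.character(educ)]) + rnorm(n, sd = 0.3)

# Hypothetical helper: fit with a chosen baseline, then re-center the effects
normalize <- function(baseline) {
  f <- relevel(educ, ref = baseline)   # baseline becomes the omitted category
  b <- coef(lm(y ~ f))
  effects <- c(0, b[-1])               # the omitted category has effect 0
  names(effects) <- levels(f)
  shift <- mean(effects)
  adjusted <- c("(Intercept)" = unname(b[1]) + shift, effects - shift)
  adjusted[c("(Intercept)", levels(educ))]   # common ordering for comparison
}

adj.1 <- normalize("LTHS")
adj.2 <- normalize("college")
print(round(rbind(baseline.LTHS = adj.1, baseline.college = adj.2), 4))
```

Both rows of the printed matrix agree, which shows that the adjusted coefficients no longer depend on which category was left out.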
plot.oaxaca is used to generate a set of coefficient bar plots that present the results of a Blinder-Oaxaca decomposition graphically.
## S3 method for class 'oaxaca'
plot(x, decomposition = "threefold", type = "variables", group.weight = NULL,
     unexplained.split = FALSE, variables = NULL, components = NULL,
     component.left = FALSE, component.labels = NULL, variable.labels = NULL,
     ci = TRUE, ci.level = 0.95, title = "", xlab = "", ylab = "",
     bar.color = NULL, ...)
x |
an object of class |
decomposition |
specifies which type of decomposition will be presented. Can be either |
type |
specifies whether the results of an overall decomposition or a variable-by-variable decomposition will be presented. Can be either |
group.weight |
a numeric value that specifies the group weight for which the twofold decomposition results will be presented. Only relevant when argument |
unexplained.split |
a logical value that toggles whether, in the twofold decomposition, the presentation of the |
variables |
a character vector that specifies the variables for which coefficient bar plots are requested. If NULL, plots for all variables will be produced. Only relevant when argument |
components |
a character vector that specifies which decomposition components will be presented. For threefold decomposition, must be a subset of |
component.left |
a logical value that specifies whether the decomposition components will be presented along the left side of the coefficient bar plot. By default, the argument is set to |
component.labels |
a named character vector that specifies custom labels for individual decomposition components. The character vector elements contain the new labels, while the elements' names must correspond to the appropriate decomposition components: |
variable.labels |
a named character vector that specifies custom labels for the presented variables. The character vector elements contain the new labels, while the elements' names must contain the appropriate variable name. Only relevant when argument |
ci |
a logical value that toggles the presentation of confidence intervals (standard error bars) in the coefficient bar plots. |
ci.level |
a numeric value between 0 and 1 that specifies the confidence level for the presented confidence intervals. |
title |
a character string that contains the title of the coefficient bar plot. |
xlab |
a character string that specifies the horizontal axis label. |
ylab |
a character string that specifies the vertical axis label. |
bar.color |
a named vector that specifies the color of bars in the plot. The vector elements' names contain a string that identifies either the variable or the decomposition component that the color will be applied to. If no names are specified, the bars will be colored from top to bottom. |
... |
additional arguments passed on to the aesthetic mapping ( |
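The relationship between ci.level and the width of an error bar can be sketched in base R. The example assumes a normal approximation around a point estimate with a bootstrapped standard error. Whether plot.oaxaca computes its intervals in exactly this way is an assumption, and the numbers are made up for illustration.

```r
# Hypothetical point estimate and standard error (illustrative values only)
estimate <- 0.12
std.error <- 0.03
ci.level <- 0.95

# Two-sided critical value under a normal approximation (an assumption)
z <- qnorm(1 - (1 - ci.level) / 2)

# Lower and upper ends of the error bar
ci <- c(lower = estimate - z * std.error,
        upper = estimate + z * std.error)
print(round(ci, 4))
```

Raising ci.level widens the bar, since the critical value z grows with the confidence level.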
Wickham, Hadley. (2009). ggplot2: Elegant Graphics for Data Analysis. Springer Science & Business Media.
# load the package
library(oaxaca)

# set random seed
set.seed(08544)

# load data set of Hispanic workers in Chicago
data("chicago")

# perform Blinder-Oaxaca Decomposition:
# explain differences in log real wages across native and foreign-born groups
oaxaca.results <- oaxaca(ln.real.wage ~ age + female + LTHS + some.college +
                           college + advanced.degree | foreign.born,
                         data = chicago, R = 50)

# plot results of the threefold decomposition, variable-by-variable
# only include educational variables
# decomposition components along the left side of the plot
plot(oaxaca.results, component.left = TRUE,
     variables = c("LTHS", "some.college", "college", "advanced.degree"),
     variable.labels = c("LTHS" = "less than high school",
                         "some.college" = "some college",
                         "advanced.degree" = "advanced degree"))

# plot results of the twofold decomposition (overall results)
# equal weight for Group A and B in reference coefficient determination (weight = 0.5)
# unexplained portion split into A and B
plot(oaxaca.results, decomposition = "twofold", type = "overall",
     group.weight = 0.5, unexplained.split = TRUE,
     bar.color = c("limegreen", "hotpink", "steelblue"))