Title: | Detailed Distributional Decomposition |
---|---|
Description: | Implements the Oaxaca-Blinder decomposition method and generalizations of it that decompose differences in distributional statistics beyond the mean. The function ob_decompose() decomposes differences in the mean outcome between two groups into one part explained by different covariates (composition effect) and into another part due to differences in the way covariates are linked to the outcome variable (structure effect). The function further divides the two effects into the contribution of each covariate and allows for weighted doubly robust decompositions. For distributional statistics beyond the mean, the function performs the recentered influence function (RIF) decomposition proposed by Firpo, Fortin, and Lemieux (2018). The function dfl_decompose() divides differences in distributional statistics into a composition effect and a structure effect using inverse probability weighting as introduced by DiNardo, Fortin, and Lemieux (1996). The function also allows the composition effect to be decomposed sequentially into the contributions of single covariates. References: Firpo, Sergio, Nicole M. Fortin, and Thomas Lemieux. (2018) <doi:10.3390/econometrics6020028>. "Decomposing Wage Distributions Using Recentered Influence Function Regressions." Fortin, Nicole M., Thomas Lemieux, and Sergio Firpo. (2011) <doi:10.3386/w16045>. "Decomposition Methods in Economics." DiNardo, John, Nicole M. Fortin, and Thomas Lemieux. (1996) <doi:10.2307/2171954>. "Labor Market Institutions and the Distribution of Wages, 1973-1992: A Semiparametric Approach." Oaxaca, Ronald. (1973) <doi:10.2307/2525981>. "Male-Female Wage Differentials in Urban Labor Markets." Blinder, Alan S. (1973) <doi:10.2307/144855>. "Wage Discrimination: Reduced Form and Structural Estimates." |
Authors: | David Gallusser [aut], Samuel Meier [aut, cre] |
Maintainer: | Samuel Meier <[email protected]> |
License: | GPL (>= 3) |
Version: | 1.0.0 |
Built: | 2024-12-04 07:11:13 UTC |
Source: | CRAN |
The function aggregates decomposition terms and calculates their covariance matrix based on detailed decomposition results.
aggregate_terms( x, aggregate_factors = TRUE, custom_aggregation = NULL, reweighting )
x |
an object of class "ob_decompose", usually a result of a call to [ob_decompose()]. |
aggregate_factors |
boolean, if 'TRUE' (default) terms associated with detailed factor levels are aggregated to a single term for every factor variable. |
custom_aggregation |
list specifying the aggregation of detailed decomposition terms. The parameter 'custom_aggregation' overrides the parameter 'aggregate_factors'. If 'NULL' (default), then either all detailed terms or all terms associated with a single variable are returned. |
reweighting |
boolean, if 'TRUE' the decomposition in 'x' contains reweighting (i.e., specification and reweighting error) |
The function returns an updated object of class "ob_decompose" containing the aggregated decomposition terms.
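The aggregation step can be illustrated with a small linear-algebra sketch in base R. It does not call aggregate_terms(); the term names, values, and covariance matrix below are made up for illustration. The only idea shown is that an aggregated term is a sum of detailed terms, so its covariance follows from the usual rule for linear combinations.

```r
# Hypothetical detailed decomposition terms (two education levels, one experience term)
detailed_terms <- c(education_high = 0.04, education_low = -0.01, experience = 0.02)

# Hypothetical covariance matrix of the detailed terms
vcov_detailed <- matrix(
  c(4, 1, 0,
    1, 9, 2,
    0, 2, 16) * 1e-4,
  nrow = 3, byrow = TRUE,
  dimnames = list(names(detailed_terms), names(detailed_terms))
)

# Aggregation matrix: one row per aggregated term, one column per detailed term
aggregation <- rbind(
  education  = c(1, 1, 0),
  experience = c(0, 0, 1)
)
colnames(aggregation) <- names(detailed_terms)

# Aggregated terms are sums of detailed terms
aggregated_terms <- drop(aggregation %*% detailed_terms)

# Covariance of linear combinations: A V A'
vcov_aggregated <- aggregation %*% vcov_detailed %*% t(aggregation)
aggregated_se <- sqrt(diag(vcov_aggregated))
```

The variance of the aggregated education term includes twice the covariance between the two education levels, which is why summing standard errors or variances alone would be incorrect.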
The function resamples observations and re-estimates the OB decomposition with the new sample.
bootstrap_estimate_ob_decompose( formula_decomposition, formula_reweighting, data_used, group, reference_0, normalize_factors, reweighting, reweighting_method, trimming, trimming_threshold, rifreg, rifreg_statistic, rifreg_probs, custom_rif_function, na.action, cluster = NULL, ... )
formula_decomposition |
|
formula_reweighting |
|
data_used |
|
group |
name of a binary variable (numeric or factor) identifying the two groups that will be compared. The group identified by the lower ranked value in 'group' (i.e., 0 in the case of a dummy variable or the first level of a factor variable) is defined as group 0. |
reference_0 |
boolean: indicating if group 0 is the reference group and if its coefficients are used to compute the counterfactual mean. |
normalize_factors |
boolean: If 'TRUE', then factor variables are normalized as proposed by Gardeazabal/Ugidos (2004) |
reweighting |
boolean: if 'TRUE', then the decomposition is performed with respect to the reweighted reference group. |
reweighting_method |
specifies the method to fit and predict conditional probabilities
used to derive the reweighting factor. Currently, |
trimming |
boolean: If |
trimming_threshold |
numeric: threshold defining the maximal accepted
relative weight of the reweighting factor value (i.e., inverse probability weight)
of a single observation. If |
rifreg |
boolean: if 'TRUE', then RIF decomposition is performed |
rifreg_statistic |
string containing the distributional statistic for which to compute the RIF. |
rifreg_probs |
a vector of length 1 or more with probabilities of quantiles. |
custom_rif_function |
the RIF function to compute the RIF of the custom distributional statistic. |
na.action |
generic function that defines how NAs in the data should be handled. |
cluster |
numeric vector of same length as |
... |
additional parameters passed to custom_rif_function |
dfl_decompose
divides between-group differences in distributional
statistics of an outcome variable into a structure effect and a composition
effect. Following DiNardo, Fortin, and Lemieux (1996), the procedure reweights
the sample distribution of a reference group such that the group's covariates
distribution matches the covariates distribution of a comparison group.
The function derives counterfactual distributions with inverse probability weighting. Reweighting factors are estimated by modelling the probability of belonging to the comparison group conditional on covariates.
The function allows detailed decompositions of the composition effect by sequentially reweighting (conditional) covariate distributions. Standard errors can be bootstrapped.
dfl_decompose( formula, data, weights, group, na.action = na.exclude, reference_0 = TRUE, subtract_1_from_0 = FALSE, right_to_left = TRUE, method = "logit", estimate_statistics = TRUE, statistics = c("quantiles", "mean", "variance", "gini", "iq_range_p90_p10", "iq_range_p90_p50", "iq_range_p50_p10"), probs = c(1:9)/10, custom_statistic_function = NULL, trimming = FALSE, trimming_threshold = NULL, return_model = TRUE, estimate_normalized_difference = TRUE, bootstrap = FALSE, bootstrap_iterations = 100, bootstrap_robust = FALSE, cores = 1, ... )
formula |
a |
data |
a |
weights |
name of the observation weights variable or vector of observation weights. |
group |
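name of a binary variable (numeric or factor) identifying the
two groups for which the differences are to be decomposed. The group
identified by the lower ranked value in 'group' (i.e., 0 in the case of a dummy variable or the first level of a factor variable) is defined as group 0. |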
na.action |
a function to filter missing data (default na.exclude). |
reference_0 |
boolean: if |
subtract_1_from_0 |
boolean: By default ('FALSE'), the distributional statistic of group 0 is subtracted from the one of group 1 to compute the overall difference. Setting 'subtract_1_from_0' to 'TRUE' merely changes the sign of the decomposition results. |
right_to_left |
determines the direction of a sequential decomposition.
If |
method |
specifies the method to fit and predict conditional probabilities
used to derive the reweighting factor. At the moment, "logit", "fastglm", and "random_forest" are available. |
estimate_statistics |
boolean: if |
statistics |
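a character vector that defines the distributional statistics
for which the decomposition is performed. By default,
c("quantiles", "mean", "variance", "gini", "iq_range_p90_p10", "iq_range_p90_p50", "iq_range_p50_p10") are estimated. |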
probs |
a vector of length 1 or more with the probabilities of the quantiles
to be estimated with default c(1:9)/10. |
custom_statistic_function |
a function estimating a custom distributional statistic
that will be decomposed ( |
trimming |
boolean: If |
trimming_threshold |
numeric: threshold defining the maximal accepted
relative weight of the reweighting factor value (i.e., inverse probability weight)
of a single observation. If |
return_model |
boolean: If |
estimate_normalized_difference |
boolean: If |
bootstrap |
boolean: If |
bootstrap_iterations |
positive integer with default 100. |
bootstrap_robust |
boolean: if |
cores |
positive integer with default 1. |
... |
other parameters passed to the function estimating the conditional probabilities. |
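The trimming arguments target observations whose reweighting factor dominates the reweighted sample. A minimal base R sketch of such a rule follows. It assumes the threshold sqrt(N)/N mentioned in the examples of this page and treats N as the number of reweighted observations; the package may define N and the relative weight differently. The vector psi is made up.

```r
# Hypothetical reweighting factors: the last observation carries an extreme weight
psi <- c(rep(1, 99), 50)
n_obs <- length(psi)

# Relative weight of each observation within the reweighted sample
relative_weight <- psi / sum(psi)

# Assumed threshold: sqrt(N) / N
threshold <- sqrt(n_obs) / n_obs

# Observations that would be flagged for trimming under this rule
trimmed_observations <- which(relative_weight > threshold)
```

Here only the observation with psi = 50 exceeds the threshold of 0.1; all others have a relative weight below 0.01.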
The observed difference to be decomposed equals the difference between the values of the distributional statistic of group 1 and group 0, respectively:

$$\Delta_O^\nu = \nu_1 - \nu_0,$$

where $\nu_g = \nu(F_g)$ denotes the statistic of the outcome distribution $F_g$ of group $g$. Group 0 is identified by the lower ranked value of the group variable.

If reference_0=TRUE, then group 0 is the reference group and its observations are reweighted such that they match the covariates distribution of group 1, the comparison group. The counterfactual combines the covariates distribution $F_1(x)$ of group 1 with the conditional outcome distribution $F_0(y|x)$ of group 0 and is derived by reweighting group 0:

$$F_C(y) = \int F_0(y|x) \, dF_1(x) = \int F_0(y|x) \Psi(x) \, dF_0(x),$$

where $\Psi(x)$ is the reweighting factor, i.e., the inverse probabilities of belonging to the comparison group conditional on covariates x.

The distributional statistic of the counterfactual distribution, $\nu_C = \nu(F_C)$, makes it possible to decompose the observed difference into a (wage) structure effect ($\Delta_S^\nu = \nu_1 - \nu_C$) and a composition effect ($\Delta_C^\nu = \nu_C - \nu_0$).

If reference_0=FALSE, then the counterfactual is derived by combining the covariates distribution $F_0(x)$ of group 0 with the conditional outcome distribution $F_1(y|x)$ of group 1 and, thus, reweighting group 1:

$$F_C(y) = \int F_1(y|x) \, dF_0(x) = \int F_1(y|x) \Psi(x) \, dF_1(x).$$

The composition effect becomes $\Delta_C^\nu = \nu_1 - \nu_C$ and the structure effect $\Delta_S^\nu = \nu_C - \nu_0$, respectively.
The covariates are defined in formula. The reweighting factor is estimated in the pooled sample with observations from both groups. method = "logit" uses a logit model to fit the conditional probabilities. method = "fastglm" also fits a logit model but with a faster algorithm from fastglm. method = "random_forest" uses the ranger implementation of the random forest classifier.
The counterfactual statistics are then estimated with the observed data of the reference group and the fitted reweighting factors.
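The reweighting logic can be sketched in base R for the mean as the statistic of interest and group 0 as the reference group. This is an illustration on simulated data, not the package's implementation: the data-generating process and the variable names (sim, psi, nu_0, nu_1, nu_c) are assumptions, and a plain glm() logit stands in for the estimation methods listed above.

```r
set.seed(42)
n <- 2000

# Simulated data: group 1 has higher values of the covariate x on average
group <- rep(c(0, 1), each = n / 2)
x <- rnorm(n, mean = 0.5 * group)
y <- 1 + 0.8 * x + 0.3 * group + rnorm(n, sd = 0.5)
sim <- data.frame(y, x, group)

# Logit model for P(group = 1 | x), fitted in the pooled sample
fit <- glm(group ~ x, family = binomial(link = "logit"), data = sim)
p1_x <- predict(fit, type = "response")
p1 <- mean(sim$group)

# Reweighting factor: odds of belonging to group 1 given x,
# rescaled by the inverse unconditional odds
psi <- (p1_x / (1 - p1_x)) * ((1 - p1) / p1)

# Observed statistics and counterfactual statistic (reweighted group 0)
is_ref <- sim$group == 0
nu_0 <- mean(sim$y[is_ref])
nu_1 <- mean(sim$y[!is_ref])
nu_c <- weighted.mean(sim$y[is_ref], w = psi[is_ref])

observed_difference <- nu_1 - nu_0
structure_effect <- nu_1 - nu_c    # same covariates, different conditional outcomes
composition_effect <- nu_c - nu_0  # same conditional outcomes, different covariates
```

By construction, the structure and composition effects add up to the observed difference. In this simulation the composition effect reflects the shift in x and the structure effect the group-specific intercept.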
Interaction terms in the conditional probability models can be specified in formula. If you are interested in an aggregate decomposition, then all covariates have to be entered at once, e.g., Y ~ X + Z.
The procedure allows for sequential decomposition of the composition effect. In this case, more than one reweighting factor based on different sets of covariates are estimated.
If you are interested in a sequential decomposition, the decomposition sequence has to be distinguished by the | operator in the formula object. For instance, Y ~ X | Z would decompose the aggregate composition effect into the contribution of covariate(s) X and the one of covariate(s) Z, respectively.
In this two-fold sequential decomposition, we have the detailed composition effects
and
which sum up to the aggregate composition effect .
is defined as above. It captures the contribution of all
covariates (i.e., X and Z). In contrast,
corresponds
to the statistic of the counterfactual distribution isolating the contribution
of covariate(s) X in contrast to the one of covariate(s) Z.
If right_to_left=TRUE
, then the counterfactual is defined as
where is the conditional distribution of X given Z of
group 1 and
the distribution of Z. If
right_to_left=FALSE
,
we have
Note that it is possible to specify the detailed models in every part of formula. This is useful if you want to estimate a fully saturated model in every step, e.g., Y ~ X * Z | Z. If not further specified, the variables are additively included in the model used to derive the aggregate reweighting factor.
The detailed decomposition terms are path-dependent. The results depend on the sequence in which the covariates enter the decomposition (e.g., Y ~ X | Z yields different detailed decomposition terms than Y ~ Z | X). Even for the same sequence, the results differ depending on the 'direction' of the decomposition. In the example above using right_to_left=TRUE, the contribution of Z is evaluated using the conditional distribution of X given Z from group 0. If we use right_to_left=FALSE instead, the same contribution is evaluated using the conditional distribution from group 1.
By default, the distributional statistics for which the between-group differences are decomposed are quantiles, the mean, the variance, the Gini coefficient, and the interquantile ranges between the 9th and the 1st decile, the 9th decile and the median, and the median and the 1st decile, respectively. The interquantile ratios between the same quantiles are implemented as well.
The quantiles can be specified by probs, which sets the corresponding probabilities of the quantiles of interest. For other distributional statistics, please use custom_statistic_function.
The function bootstraps standard errors and derives a bootstrapped Kolmogorov-Smirnov distribution to construct uniform confidence bands. The Kolmogorov-Smirnov distribution is estimated as in Chen et al. (2017).
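The difference between pointwise and uniform bands can be sketched in base R with a common sup-t construction. This is an illustration under stated assumptions, not necessarily the exact procedure of the package or of Chen et al. (2017): the point estimates and the bootstrap replications are simulated, and the critical value is taken as the 95% quantile of the maximal studentized deviation across quantiles.

```r
set.seed(1)
probs <- 1:9 / 10
n_boot <- 200

# Hypothetical point estimates of one decomposition term at the nine deciles
estimate <- seq(0.05, 0.15, length.out = length(probs))

# Hypothetical bootstrap replications (rows) of that term (columns = deciles)
boot_estimates <- sweep(
  matrix(rnorm(n_boot * length(probs), sd = 0.02), nrow = n_boot),
  2, estimate, "+"
)

# Bootstrap standard error for every quantile
se <- apply(boot_estimates, 2, sd)

# Absolute studentized deviations and their maximum within each replication
t_stat <- abs(sweep(sweep(boot_estimates, 2, estimate, "-"), 2, se, "/"))
ks_distribution <- apply(t_stat, 1, max)

# Critical values: uniform (sup-t) versus pointwise (normal approximation)
crit_uniform <- quantile(ks_distribution, 0.95, names = FALSE)
crit_pointwise <- qnorm(0.975)

uniform_band <- cbind(lower = estimate - crit_uniform * se,
                      upper = estimate + crit_uniform * se)
pointwise_band <- cbind(lower = estimate - crit_pointwise * se,
                        upper = estimate + crit_pointwise * se)
```

Because the uniform critical value is based on the maximum over all quantiles, the uniform band is wider than the pointwise band and is meant to cover the whole quantile function jointly.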
an object of class dfl_decompose
containing a data.frame with the
decomposition results for the quantiles and for the other distributional
statistics, respectively, a data.frame with the estimated reweighting factor
for every observation, a data.frame with sample quantiles of the reweighting
factors and a list with standard errors for the decomposition terms, the
quantiles of the reweighting factor, the bootstrapped
Kolmogorov-Smirnov distribution to construct uniform confidence bands for
quantiles, as well as a list with the normalized differences between the
covariate means of the comparison group and the reweighted reference group.
Chen, Mingli, Victor Chernozhukov, Iván Fernández-Val, and Blaise Melly. 2017. "Counterfactual: An R Package for Counterfactual Analysis." *The R Journal* 9(1): 370-384.
DiNardo, John, Nicole M. Fortin, and Thomas Lemieux. 1996. "Labor Market Institutions and the Distribution of Wages, 1973-1992: A Semiparametric Approach." Econometrica, 64(5), 1001-1044.
Firpo, Sergio P., Nicole M. Fortin, and Thomas Lemieux. 2018. "Decomposing Wage Distributions Using Recentered Influence Function Regressions." Econometrics 6(2), 28.
Fortin, Nicole M., Thomas Lemieux, and Sergio Firpo. 2011. "Decomposition methods in economics." In Orley Ashenfelter and David Card, eds., Handbook of Labor Economics. Vol. 4. Elsevier, 1-102.
Firpo, Sergio P., and Cristine Pinto. 2016. "Identification and Estimation of Distributional Impacts of Interventions Using Changes in Inequality Measures." Journal of Applied Econometrics, 31(3), 457-486.
Huber, Martin, Michael Lechner, and Conny Wunsch. 2013. "The performance of estimators based on the propensity score." Journal of Econometrics, 175(1), 1-21.
## Example from handbook chapter of Fortin, Lemieux, and Firpo (2011: 67)
## with a sample of the original data
data("men8305")

flf_model <- log(wage) ~ union * (education + experience) + education * experience

# Reweighting sample from 1983-85
flf_male_inequality <- dfl_decompose(flf_model,
  data = men8305,
  weights = weights,
  group = year
)

# Summarize results
summary(flf_male_inequality)

# Plot decomposition of quantile differences
plot(flf_male_inequality)

# Use alternative reference group (i.e., reweight sample from 2003-05)
flf_male_inequality_reference_0305 <- dfl_decompose(flf_model,
  data = men8305,
  weights = weights,
  group = year,
  reference_0 = FALSE
)
summary(flf_male_inequality_reference_0305)

# Bootstrap standard errors (using smaller sample for the sake of illustration)
set.seed(123)
flf_male_inequality_boot <- dfl_decompose(flf_model,
  data = men8305[1:1000, ],
  weights = weights,
  group = year,
  bootstrap = TRUE,
  bootstrap_iterations = 100,
  cores = 1
)

# Get standard errors and confidence intervals
summary(flf_male_inequality_boot)

# Plot quantile differences with pointwise confidence intervals
plot(flf_male_inequality_boot)

# Plot quantile differences with uniform confidence intervals
plot(flf_male_inequality_boot, uniform_bands = TRUE)

## Sequential decomposition
# Here we distinguish the contribution of education and experience
# from the contribution of unionization conditional on education and experience.
model_sequential <- log(wage) ~ union * (education + experience) +
  education * experience |
  education * experience

# First variant:
# Contribution of union is evaluated using composition of
# education and experience from 2003-2005 (group 1)
male_inequality_sequential <- dfl_decompose(model_sequential,
  data = men8305,
  weights = weights,
  group = year
)

# Summarize results
summary(male_inequality_sequential)

# Second variant:
# Contribution of union is evaluated using composition of
# education and experience from 1983-1985 (group 0)
male_inequality_sequential_2 <- dfl_decompose(model_sequential,
  data = men8305,
  weights = weights,
  group = year,
  right_to_left = FALSE
)

# Summarize results
summary(male_inequality_sequential_2)

# The decomposition effects associated with (conditional) unionization for deciles
cbind(
  male_inequality_sequential$decomposition_quantiles$prob,
  male_inequality_sequential$decomposition_quantiles$`Comp. eff. X1|X2`,
  male_inequality_sequential_2$decomposition_quantiles$`Comp. eff. X1|X2`
)

## Trim observations with weak common support
## (i.e., observations with relative factor weights > sqrt(N)/N)
set.seed(123)
data_weak_common_support <- data.frame(
  d = factor(c(
    c("A", "A", rep("B", 98)),
    c(rep("A", 90), rep("B", 10))
  )),
  group = rep(c(0, 1), each = 100)
)
data_weak_common_support$y <- ifelse(data_weak_common_support$d == "A", 1, 2) +
  data_weak_common_support$group +
  rnorm(200, 0, 0.5)

decompose_results_trimmed <- dfl_decompose(y ~ d,
  data_weak_common_support,
  group = group,
  trimming = TRUE
)

identical(
  decompose_results_trimmed$trimmed_observations,
  which(data_weak_common_support$d == "A")
)

## Pass a custom statistic function to decompose income share of top 10%
top_share <- function(dep_var, weights, top_percent = 0.1) {
  threshold <- Hmisc::wtd.quantile(dep_var, weights = weights, probs = 1 - top_percent)
  share <- sum(weights[which(dep_var > threshold)] * dep_var[which(dep_var > threshold)]) /
    sum(weights * dep_var)
  return(share)
}

flf_male_inequality_custom_stat <- dfl_decompose(flf_model,
  data = men8305,
  weights = weights,
  group = year,
  custom_statistic_function = top_share
)
summary(flf_male_inequality_custom_stat)
The function resamples observations and re-estimates the DFL decomposition with the new sample.
dfl_decompose_bootstrap( formula, dep_var, data_used, weights, group_variable, reference_group, estimate_statistics, statistics, probs, custom_statistic_function, right_to_left, trimming, trimming_threshold, ... )
formula |
|
dep_var |
dependent variable |
data_used |
|
weights |
weights variable |
group_variable |
group variable |
reference_group |
reference_group to be reweighted |
estimate_statistics |
boolean: if |
statistics |
a character vector that defines the distributional statistics for which the decomposition is performed. |
probs |
a vector of length 1 or more with the probabilities of the quantiles to be estimated. |
custom_statistic_function |
a function estimating a custom distributional statistic that will be decomposed. |
right_to_left |
determines the direction of a sequential decomposition. |
trimming |
boolean: If |
trimming_threshold |
numeric: threshold defining the maximal accepted
relative weight of the reweighting factor value (i.e., inverse probability weight)
of a single observation. If |
... |
other parameters passed to the function estimating the conditional probabilities. |
This function performs the DFL decomposition. It derives the reweighting factors, estimates the distributional statistics and calculates the decomposition terms.
dfl_decompose_estimate( formula, dep_var, data_used, weights, group_variable, reference_group, method, estimate_statistics, statistics, probs, custom_statistic_function, right_to_left, trimming, trimming_threshold, return_model, estimate_normalized_difference, ... )
formula |
|
dep_var |
dependent variable |
data_used |
|
weights |
weights variable |
group_variable |
group variable |
reference_group |
reference_group to be reweighted |
method |
method used to estimate conditional probabilities |
estimate_statistics |
boolean: if |
statistics |
a character vector that defines the distributional statistics for which the decomposition is performed. |
probs |
a vector of length 1 or more with the probabilities of the quantiles to be estimated. |
custom_statistic_function |
a function estimating a custom distributional statistic that will be decomposed. |
right_to_left |
determines the direction of a sequential decomposition. |
trimming |
boolean: If |
trimming_threshold |
numeric: threshold defining the maximal accepted
relative weight of the reweighting factor value (i.e., inverse probability weight)
of a single observation. If |
return_model |
boolean: If |
estimate_normalized_difference |
boolean: If |
... |
other parameters passed to the function estimating the conditional probabilities. |
Interquantile range
estimate_iq_range(dep_var, weights, probs = c(0.1, 0.9))
dep_var |
numeric vector of outcome variable |
weights |
numeric vector of weights |
probs |
a vector with probabilities whose range defines the interquantile range |
a numeric value indicating the (weighted) interquantile range
Interquantile ratio
estimate_iq_ratio(dep_var, weights, probs = c(0.1, 0.9))
dep_var |
numeric vector of outcome variable |
weights |
numeric vector of weights |
probs |
a vector with probabilities whose range defines the interquantile range |
a numeric value indicating the (weighted) interquantile ratio
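The two interquantile statistics can be sketched in base R as follows. The helper weighted_quantile() and the *_sketch functions are hypothetical names, not package functions. The quantile definition used here (smallest value whose cumulative weight share reaches the probability) and the orientation of the ratio (upper over lower quantile) are assumptions; the package may rely on a different weighted quantile estimator.

```r
# Hypothetical helper: weighted quantile as the smallest value whose
# cumulative weight share reaches the requested probability
weighted_quantile <- function(dep_var, weights, probs) {
  ord <- order(dep_var)
  cum_share <- cumsum(weights[ord]) / sum(weights)
  sapply(probs, function(p) dep_var[ord][which(cum_share >= p)[1]])
}

# Difference between the upper and the lower quantile
iq_range_sketch <- function(dep_var, weights, probs = c(0.1, 0.9)) {
  q <- weighted_quantile(dep_var, weights, range(probs))
  q[2] - q[1]
}

# Ratio of the upper to the lower quantile (assumed orientation)
iq_ratio_sketch <- function(dep_var, weights, probs = c(0.1, 0.9)) {
  q <- weighted_quantile(dep_var, weights, range(probs))
  q[2] / q[1]
}

wage <- c(10, 20, 30, 40)
equal_weights <- c(1, 1, 1, 1)
unequal_weights <- c(5, 1, 1, 1)

range_equal <- iq_range_sketch(wage, equal_weights, probs = c(0.25, 0.75))
ratio_equal <- iq_ratio_sketch(wage, equal_weights, probs = c(0.25, 0.75))
range_unequal <- iq_range_sketch(wage, unequal_weights, probs = c(0.25, 0.75))
```

Putting more weight on the lowest wage pulls the upper quartile down, so the weighted interquartile range shrinks relative to the equally weighted case.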
The function performs the linear Oaxaca-Blinder decomposition.
estimate_ob_decompose( formula, data_used, reference_0, normalize_factors, compute_analytical_se, return_model_fit, reweighting, rifreg, rifreg_statistic, rifreg_probs, custom_rif_function, na.action, vcov, ... )
formula |
|
data_used |
|
reference_0 |
boolean: indicating if group 0 is the reference group and if its coefficients are used to compute the counterfactual mean. |
normalize_factors |
boolean: If 'TRUE', then factor variables are normalized as proposed by Gardeazabal/Ugidos (2004) |
compute_analytical_se |
boolean: If 'TRUE', then analytical standard errors for decomposition terms are calculated (assuming independence between groups). |
return_model_fit |
boolean: If 'TRUE', then model objects are returned. |
reweighting |
boolean: if 'TRUE', then the decomposition is performed with respect to the reweighted reference group. |
rifreg |
boolean: if 'TRUE', then RIF decomposition is performed |
rifreg_statistic |
string containing the distributional statistic for which to compute the RIF. |
rifreg_probs |
a vector of length 1 or more with probabilities of quantiles. |
custom_rif_function |
the RIF function to compute the RIF of the custom distributional statistic. |
na.action |
generic function that defines how NAs in the data should be handled. |
vcov |
function estimating the covariance matrix of regression coefficients if |
... |
additional parameters passed to custom_rif_function |
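The linear Oaxaca-Blinder decomposition at the mean can be sketched with two lm() fits in base R. The example uses simulated data and assumes that group 0 is the reference group, so its coefficients define the counterfactual mean, and that the difference is taken as group 1 minus group 0. Variable names are illustrative; this is not the internal code of estimate_ob_decompose().

```r
set.seed(7)
n <- 500

# Simulated data: group 1 has more education and a higher return to it
group <- rep(c(0, 1), each = n)
educ <- rnorm(2 * n, mean = 12 + group)
lwage <- 1 + (0.08 + 0.02 * group) * educ + rnorm(2 * n, sd = 0.3)
sim <- data.frame(lwage, educ, group)

# Group-specific regressions
fit_0 <- lm(lwage ~ educ, data = sim, subset = group == 0)
fit_1 <- lm(lwage ~ educ, data = sim, subset = group == 1)

# Mean covariates (including the intercept column) by group
x_bar_0 <- colMeans(model.matrix(fit_0))
x_bar_1 <- colMeans(model.matrix(fit_1))

# Counterfactual mean: covariates of group 1, coefficients of group 0
counterfactual_mean <- sum(x_bar_1 * coef(fit_0))

observed_difference <- mean(sim$lwage[sim$group == 1]) - mean(sim$lwage[sim$group == 0])
composition_effect <- counterfactual_mean - sum(x_bar_0 * coef(fit_0))
structure_effect <- sum(x_bar_1 * coef(fit_1)) - counterfactual_mean

# Detailed contributions of every regressor
detailed_composition <- (x_bar_1 - x_bar_0) * coef(fit_0)
detailed_structure <- x_bar_1 * (coef(fit_1) - coef(fit_0))
```

Because OLS with an intercept fits the group means exactly, the two aggregate effects add up to the observed mean difference, and the detailed terms add up to their respective aggregate effect.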
This function fits a binary choice model and predicts probabilities for every observation.
fit_and_predict_probabilities( formula, data_used, weights, method = "logit", return_model = FALSE, newdata = NULL, ... )
formula |
|
data_used |
|
weights |
weights variable |
method |
method to estimate conditional probabilities |
return_model |
boolean: If |
newdata |
|
... |
other parameters passed to the estimation function. |
Estimate weighted distributional statistics for the reference or the counterfactual group.
get_distributional_statistics( dep_var, weights, group_variable, group, statistics, custom_statistic_function = NULL, probs = 1:9/10, log_transformed )
dep_var |
vector of outcome variable |
weights |
vector of observation weights |
group_variable |
vector of group assignment |
group |
identifier of group for which distributional statistics are calculated |
statistics |
vector of statistics to be calculated |
custom_statistic_function |
a custom statistic function to be evaluated |
probs |
probabilities of quantiles to be calculated |
log_transformed |
indicator if outcome variable is log transformed |
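A few of the weighted statistics can be sketched in base R for a single group. The function weighted_statistics() is a hypothetical helper, not part of the package. It assumes a variance normalized by the sum of weights and a Gini coefficient computed from the weighted Lorenz curve with the trapezoid rule; the package's estimators may differ in these details.

```r
# Hypothetical helper returning weighted mean, variance, and Gini coefficient
weighted_statistics <- function(dep_var, obs_weights) {
  w_mean <- sum(obs_weights * dep_var) / sum(obs_weights)
  w_var <- sum(obs_weights * (dep_var - w_mean)^2) / sum(obs_weights)

  # Gini coefficient from the weighted Lorenz curve (trapezoid rule)
  ord <- order(dep_var)
  pop_share <- cumsum(obs_weights[ord]) / sum(obs_weights)
  inc_share <- cumsum(obs_weights[ord] * dep_var[ord]) / sum(obs_weights * dep_var)
  gini <- 1 - sum(diff(c(0, pop_share)) * (inc_share + c(0, head(inc_share, -1))))

  c(mean = w_mean, variance = w_var, gini = gini)
}

# Stacked outcome, weights, and group assignment for two groups
dep_var <- c(0, 0, 0, 1, 2, 2, 2, 2)
obs_weights <- rep(1, 8)
group_variable <- rep(c(0, 1), each = 4)

# Select one group at a time, as a function with a group identifier would do
stats_group_0 <- weighted_statistics(dep_var[group_variable == 0],
                                     obs_weights[group_variable == 0])
stats_group_1 <- weighted_statistics(dep_var[group_variable == 1],
                                     obs_weights[group_variable == 1])
```

In group 0 a single observation holds all of the outcome, which gives a Gini coefficient of 0.75 with four observations. In group 1 all outcomes are equal, so variance and Gini coefficient are zero.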
The function calculates normalized differences between covariate means of comparison group and reweighted reference group.
get_normalized_difference( formula, data_used, weights, psi, group_variable, reference_group )
formula |
model formula used to calculate the conditional probabilities of the reweighting factor |
data_used |
|
weights |
vector with observation weights |
psi |
vector with the estimated reweighting factor |
group_variable |
variable with group identifier |
reference_group |
identifier of (reweighted) reference group |
Imbens, Guido W. and Jeffrey M. Wooldridge. 2009. Recent developments in the econometrics of program evaluation. Journal of Economic Literature 47, no. 1: 5-86.
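A normalized difference in the spirit of Imbens and Wooldridge (2009) divides the difference in covariate means by the square root of the sum of the two group variances. The base R sketch below applies this to a single covariate and a reweighted reference group. The function normalized_difference() is a hypothetical helper; the choice of weighted variances normalized by the sum of weights is an assumption and may not match the package in detail.

```r
# Hypothetical helper for one covariate x
normalized_difference <- function(x, obs_weights, psi, group_variable, reference_group) {
  is_ref <- group_variable == reference_group
  w_ref <- obs_weights[is_ref] * psi[is_ref]  # reweighted reference group
  w_comp <- obs_weights[!is_ref]              # comparison group, not reweighted

  w_mean <- function(v, w) sum(w * v) / sum(w)
  w_var <- function(v, w) sum(w * (v - w_mean(v, w))^2) / sum(w)

  mean_ref <- w_mean(x[is_ref], w_ref)
  mean_comp <- w_mean(x[!is_ref], w_comp)
  (mean_comp - mean_ref) /
    sqrt(w_var(x[!is_ref], w_comp) + w_var(x[is_ref], w_ref))
}

# Binary covariate: share of x = 1 is 25% in group 0 and 75% in group 1
x <- c(0, 0, 0, 1, 0, 1, 1, 1)
group_variable <- rep(c(0, 1), each = 4)
obs_weights <- rep(1, 8)

# Without reweighting (psi = 1) the covariate means differ markedly
unweighted <- normalized_difference(x, obs_weights, rep(1, 8), group_variable, 0)

# Reweighting factor that aligns the share of x = 1 in group 0 with group 1
psi <- ifelse(x == 1, 3, 1 / 3)
reweighted <- normalized_difference(x, obs_weights, psi, group_variable, 0)
```

A normalized difference close to zero after reweighting indicates that the reweighted reference group resembles the comparison group in this covariate.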
The function performs the normalization of the factor variables proposed by Gardeazabal and Ugidos (2004, GU) to estimate detailed decompositions that do not depend on the chosen reference levels of the factor variables.
GU_normalization(formula, data, weights, group)
formula |
an object of class "formula". See lm for further details. |
data |
a data frame containing the variables in the model. |
weights |
numeric vector of non-negative observation weights, hence of same length as |
group |
name of a binary variable (numeric or factor) identifying the two groups that will be compared. |
a list containing the adjusted formula, adjusted data, adjusted coefficient names, and the normalized regressors for prediction and the
Gardeazabal, Javier, and Arantza Ugidos. 2004. "More on identification in detailed wage decompositions." Review of Economics and Statistics 86(4): 1034-1036.
data("men8305")

mod1 <- log(wage) ~ union + married + nonwhite + education + experience
normalized_data <- GU_normalization(
  formula = mod1,
  data = men8305,
  weights = weights,
  group = year
)
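The idea behind the normalization can be sketched in base R for a model with a single factor variable. The helper normalize_factor() is hypothetical and not part of the package: it re-expresses the level effects as deviations from their mean across all levels (with the omitted reference level counted as zero) and shifts the intercept accordingly. The simulated data and names are assumptions.

```r
set.seed(3)
n <- 300
educ <- factor(sample(c("low", "mid", "high"), n, replace = TRUE),
               levels = c("low", "mid", "high"))
y <- 1 + 0.5 * (educ == "mid") + 1.2 * (educ == "high") + rnorm(n, sd = 0.4)
sim <- data.frame(y, educ)

# Hypothetical helper: level effects as deviations from their mean
normalize_factor <- function(fit, factor_name) {
  lev <- fit$xlevels[[factor_name]]
  effects <- setNames(numeric(length(lev)), lev)  # reference level has effect 0
  effects[lev[-1]] <- coef(fit)[paste0(factor_name, lev[-1])]
  shift <- mean(effects)
  list(intercept = unname(coef(fit)["(Intercept)"]) + shift,
       effects = effects - shift)
}

# Same model with two different reference levels
fit_ref_low <- lm(y ~ educ, data = sim)
sim_high <- transform(sim, educ = relevel(educ, ref = "high"))
fit_ref_high <- lm(y ~ educ, data = sim_high)

norm_low <- normalize_factor(fit_ref_low, "educ")
norm_high <- normalize_factor(fit_ref_high, "educ")
```

The raw dummy coefficients of the two fits differ because they are measured relative to different reference levels, whereas the normalized effects and the normalized intercept coincide. This invariance is what makes detailed decomposition terms independent of the chosen reference level.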
This function sums the coefficients of each factor variable to construct additional coefficients for their originally left-out reference levels and adds them to the estimated coefficients vector.
GU_normalization_get_coefficients(coef_names, est_coef)
coef_names |
list with coefficients of every factor variable that need to be adjusted |
est_coef |
vector of estimated coefficients |
This function adjusts the covariance matrix for the additional coefficients of the originally left-out reference levels of all factor variables.
GU_normalization_get_vcov(coef_names, Cov_beta)
coef_names |
list with coefficients of every factor variable that need to be adjusted |
Cov_beta |
estimated covariance matrix of the regression coefficients |
This function sums the coefficients of a single factor variable to construct an additional coefficient for the left-out reference level.
GU_normalization_sum_coefficients(coef_names, est_coef)
coef_names |
names of the dummy coefficients of a factor variable |
est_coef |
estimated coefficient vector |
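The summation can be illustrated in base R. The following is a minimal sketch, not the package's internal code: it assumes the model was fitted on normalized regressors (each dummy minus the reference-level dummy), so that the coefficient of the left-out level equals the negative sum of the estimated dummy coefficients. The simulated data and all object names are hypothetical.

```r
set.seed(1)
n <- 300
region <- factor(sample(c("north", "south", "west"), n, replace = TRUE))
y <- 1 + 0.5 * (region == "south") - 0.3 * (region == "west") + rnorm(n)

# Standard dummy coding: "north" is the left-out reference level
fit_dummy <- lm(y ~ region)

# Normalized regressors: each dummy minus the dummy of the reference level
D <- model.matrix(~ region)[, -1, drop = FALSE]
d_ref <- as.numeric(region == levels(region)[1])
D_norm <- D - d_ref
fit_norm <- lm(y ~ D_norm)

# Coefficients of the non-reference levels under the sum-to-zero restriction
g <- coef(fit_norm)[-1]

# Additional coefficient of the left-out reference level
g_ref <- -sum(g)
g_all <- c(g_ref, g)
names(g_all) <- paste0("region", levels(region))
g_all
```

By construction the coefficients in `g_all` sum to zero, so they no longer depend on which level was left out.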
This function adjusts the covariance matrix for the additional coefficient of the originally left-out reference level of a single factor variable.
GU_normalization_sum_vcov(coef_names, Cov_beta)
coef_names |
names of the dummy coefficients of a factor variable |
Cov_beta |
estimated covariance matrix of the regression coefficients |
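Because the additional coefficient is a linear combination of the estimated dummy coefficients, its variance and covariances follow from the usual rule for linear transformations. The sketch below assumes the reference-level coefficient is defined as the negative sum of the dummy coefficients; the covariance matrix `V` and the level names are made up for illustration, and the code is not taken from the package.

```r
# Hypothetical covariance matrix of the two dummy coefficients of one factor
V <- matrix(c(0.040, 0.010,
              0.010, 0.090),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("regionsouth", "regionwest"),
                            c("regionsouth", "regionwest")))

# Linear map from the estimated coefficients to (reference, estimated):
# the first row builds the reference-level coefficient as minus the sum
A <- rbind(rep(-1, ncol(V)), diag(ncol(V)))
rownames(A) <- c("regionnorth", rownames(V))

# Covariance matrix extended by the reference-level coefficient
V_ext <- A %*% V %*% t(A)
V_ext
```

The variance of the added coefficient is the sum of all elements of `V`, and its covariance with each dummy coefficient is the negative column sum of `V`.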
A sample of the Merged Outgoing Rotation Group of the Current Population Survey from 1983 to 1985 and 2003 to 2005, respectively, used as an example by Fortin, Lemieux & Firpo (2011) in their handbook chapter. The data set contains a selection of 8 variables and a sample of 40,347 observations of male workers (i.e., a tenth of the original data set).
men8305
A data frame with 40,347 rows and 8 variables.
wage |
Hourly wage in US dollars at constant prices |
union |
Union status indicator |
education |
Factor variable with 6 education levels: high-school graduates (reference), elementary, high-school dropouts, some college, college graduates, post-college graduates |
experience |
Factor variable with 9 potential experience levels, each spanning five years (20 to 24 years as reference level) |
married |
Married indicator |
nonwhite |
Non-white indicator |
year |
Indicator distinguishing pooled observations from the 1983 to 1985 period and those from 2003 to 2005 |
weights |
CPS sample weights |
Fortin, Nicole M., Thomas Lemieux, and Sergio Firpo. 2011. "Decomposition Methods in Economics." In Orley Ashenfelter and David Card, eds., Handbook of Labor Economics, Volume 4a, Chapter 1, 1-102.
Sample of the National Longitudinal Survey of Youth 1979 (NLSY79) containing wage data from the year 2000 of workers who were aged 35 to 43 in that year. The data is from O'Neill and O'Neill (2006) and is used as an illustration of the Oaxaca-Blinder mean decomposition in Fortin, Lemieux, and Firpo (2011). The data contains 2655 male and 2654 female observations, respectively.
nlys00
A data frame with 5,396 rows and 15 variables.
female |
Female indicator |
wage |
Hourly wage in US dollars |
age |
Age in years |
central_city |
Central city indicator |
msa |
Metropolitan statistical area (MSA) indicator |
region |
Factor variable distinguishing 4 large regions |
black |
Black indicator |
hispanic |
Hispanic indicator |
education |
Factor variable indicating highest attained education |
afqt |
Percentile score of the Armed Forces Qualification Test (AFQT) divided by 10 |
family_responsibility |
Family responsibility indicator |
years_worked_civilian |
Years worked in civilian labor force |
years_worked_military |
Years worked in military |
part_time |
Share of years worked part-time |
industry |
Factor variable identifying 4 industries |
Fortin, Nicole M., Thomas Lemieux, and Sergio Firpo. 2011. "Decomposition Methods in Economics." In Orley Ashenfelter and David Card, eds., Handbook of Labor Economics, Volume 4a, Chapter 1, 1-102.
ob_decompose
implements the Oaxaca-Blinder decomposition that
divides differences in the mean outcome between two groups into one part explained
by different covariate means (composition effect) and into another part due to
differences in linear regression coefficients linking covariates to the outcome
variable (structure effect).
The function allows for 'doubly robust' decompositions where the sample of one group is reweighted such that it matches the covariates distribution of the other group before the regression coefficients are estimated.
For distributional statistics beyond the mean, the function performs the RIF regression decomposition proposed by Firpo, Fortin, and Lemieux (2018).
ob_decompose( formula, data, group, weights = NULL, reweighting = FALSE, normalize_factors = FALSE, reference_0 = TRUE, subtract_1_from_0 = FALSE, reweighting_method = "logit", trimming = FALSE, trimming_threshold = NULL, rifreg_statistic = NULL, rifreg_probs = c(1:9)/10, custom_rif_function = NULL, na.action = na.omit, bootstrap = FALSE, bootstrap_iterations = 100, bootstrap_robust = FALSE, cluster = NULL, cores = 1, vcov = stats::vcov, ... )
formula |
a |
data |
a data frame containing the variables in the model. |
group |
name of a binary variable (numeric or factor) identifying the two groups that will be compared. The group identified by the lower ranked value in 'group' (i.e., 0 in the case of a dummy variable or the first level of a factor variable) is defined as group 0. |
weights |
numeric vector of non-negative observation weights, hence of same length as |
reweighting |
boolean: if 'TRUE', then the decomposition is performed with respect to a reweighted reference group, yielding either a 'doubly robust' Oaxaca-Blinder decomposition or a reweighted RIF decomposition. |
normalize_factors |
boolean: If 'TRUE', then factor variables are normalized as
proposed by Gardeazabal/Ugidos (2004) and results are not dependent on the factor's
reference group. Per default ( |
reference_0 |
boolean: if 'TRUE' (default), then the group 0 – i.e.,
the group identified by the lower ranked value in 'group' – will be defined
as reference group. The reference group will be reweighted to match the
covariates distribution of the counterfactual sample.
By default, the composition effect is computed as |
subtract_1_from_0 |
boolean: By default ('FALSE'), X0 is subtracted from X1 and beta0 from beta1 (X1b1 - X0b0)
to compute the overall difference. Setting 'subtract_1_from_0' to 'TRUE' merely changes the sign of the decomposition results.
This means the composition effect is computed as |
reweighting_method |
specifies the method to fit and predict the conditional probabilities
used to derive the reweighting factor. Currently, |
trimming |
boolean: If |
trimming_threshold |
numeric: threshold defining the maximal accepted
relative weight of the reweighting factor value (i.e., inverse probability weight)
of a single observation. If |
rifreg_statistic |
string containing the distributional statistic for which to compute the RIF.
If 'NULL' (default), no RIF regression decomposition is computed.
If an available statistic is selected, 'ob_decompose' estimates a RIF regression decomposition.
The 'rifreg_statistic' can be one of
"quantiles", "mean", "variance", "gini", "interquantile_range", "interquantile_ratio", or "custom".
If "custom" is selected, a |
rifreg_probs |
a vector of length 1 or more with probabilities of quantiles. Each quantile is indicated with a value between 0 and 1.
Default is |
custom_rif_function |
the RIF function to compute the RIF of the custom distributional statistic.
Default is NULL. Only needs to be provided if |
na.action |
generic function that defines how NAs in the data should be handled.
Default is |
bootstrap |
boolean: If 'FALSE' (default), then no bootstrapped standard errors are calculated and, in the case of a standard Oaxaca-Blinder decomposition, analytical standard errors are estimated (assuming independence between groups). |
bootstrap_iterations |
positive integer indicating the number of bootstrap
iterations to execute. Only required if |
bootstrap_robust |
boolean: if 'FALSE' (default), then bootstrapped standard errors are estimated as the standard deviations of the bootstrap estimates. Otherwise, the function uses the bootstrap interquartile range rescaled by the interquartile range of the standard normal distribution to estimate standard errors. |
cluster |
numeric vector of same length as |
cores |
positive integer indicating the number of cores to use when
computing bootstrap standard errors. Only required if |
vcov |
function estimating covariance matrix of regression coefficients if
standard errors are not bootstrapped (i.e., |
... |
additional parameters passed to the custom_rif_function. Apart from dep_var, weights and probs they must have a different name than the ones in rifreg. For instance, if you want to pass a parameter statistic to the custom_rif_function, name it custom_statistic. Additional parameters can also be passed to the density function used to estimate the RIF of quantiles. |
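The rescaling described for 'bootstrap_robust' can be sketched in base R. This is an illustration under stated assumptions rather than the package's implementation: the bootstrap replicates are simulated, and the robust standard error is taken to be the interquartile range of the replicates divided by the interquartile range of the standard normal distribution.

```r
set.seed(42)
# Hypothetical bootstrap replicates of a single decomposition term
boot_estimates <- rnorm(200, mean = 0.12, sd = 0.03)
# A few outlying replicates, e.g. from poorly behaved bootstrap samples
boot_estimates[1:3] <- c(1.5, -1.2, 2.0)

# Conventional bootstrap standard error: standard deviation of the replicates
se_standard <- sd(boot_estimates)

# Robust alternative: bootstrap IQR rescaled by the IQR of N(0, 1)
se_robust <- IQR(boot_estimates) / (qnorm(0.75) - qnorm(0.25))

c(standard = se_standard, robust = se_robust)
```

The robust estimate is much less affected by the three outlying replicates than the standard deviation.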
ob_decompose()
provides four different decomposition methods for
observed group differences.
1. The original Oaxaca-Blinder decomposition (default)
2. A 'doubly robust' Oaxaca-Blinder decomposition (reweighting=TRUE)
3. A RIF regression decomposition (e.g., rifreg_statistic="quantiles")
4. A reweighted RIF regression decomposition (reweighting=TRUE and rifreg_statistic="quantiles")
The doubly robust OB decomposition is a robust and path-independent alternative for detailed decompositions at the mean. It combines reweighting with the linear Oaxaca-Blinder method (see Fortin et al., 2011: 48-51). This approach has the valuable side effect of accounting for potential errors introduced by an incomplete inverse probability weighting and the linear model specification, respectively.
A path independent method that goes beyond the mean is the RIF decomposition
of Firpo, Fortin, and Lemieux (2018). The approach approximates the expected value
of the 'recentered influence function' (RIF) of the distributional statistic
(e.g., quantile, variance, or Gini coefficient) of an outcome variable
conditional on covariates with linear regressions. RIF regression coefficients can
be consistent estimates of the marginal effect
of a small change in the expected value of a covariate on the distributional statistic of
an outcome variable (see documentation of the companion package rifreg).
Thus, they can be used to decompose between-group differences in distributional statistics.
Firpo et al. (2018) combine the RIF regressions again with the reweighting estimator to avoid specification errors.
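The RIF regression step can be illustrated for a single quantile in base R. The sketch below is not the code of ob_decompose() or rifreg; it uses the standard RIF of a quantile, RIF(y; q) = q + (tau - 1{y <= q}) / f(q), with the density f estimated by a kernel density estimate. The simulated data, the choice of tau = 0.9 and all object names are assumptions for illustration.

```r
set.seed(123)
n <- 1000
x <- rnorm(n)
y <- 1 + 0.5 * x + rnorm(n)

tau <- 0.9
q_tau <- unname(quantile(y, probs = tau))

# Kernel density estimate of y, evaluated at the sample quantile
dens <- density(y)
f_q <- approx(dens$x, dens$y, xout = q_tau)$y

# Recentered influence function of the tau-quantile
rif <- q_tau + (tau - as.numeric(y <= q_tau)) / f_q

# RIF regression: linear regression of the RIF on the covariate
rif_fit <- lm(rif ~ x)
coef(rif_fit)
```

The mean of `rif` reproduces the sample quantile (up to discreteness), which is what allows the quantile difference between two groups to be decomposed like a difference in means.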
an object of class ob_decompose
containing a data.frame with the
decomposition results for the quantiles and for the other distributional
statistics, respectively, a data.frame with the estimated reweighting factor
for every observation, a data.frame with sample quantiles of the reweighting
factors and a list with standard errors for the decomposition terms, the
quantiles of the reweighting factor, the bootstrapped
Kolmogorov-Smirnov distribution to construct uniform confidence bands for
quantiles, as well as a list with the normalized differences between the
covariate means of the comparison group and the reweighted reference group.
A list object of class 'ob_decompose' containing the following components:
- 'ob_decompose': A list containing the decomposition results, covariance matrix, model fits and more detailed result information.
- 'group_variable_name': A string indicating the name of the group variable.
- 'group_variable_levels': A string indicating the levels of the group variable.
- 'reference_group': A string indicating which level of the group variable was used as reference group.
- 'reweighting_estimates': A list containing the reweighting estimates if reweighting=TRUE, else NA.
- 'input_parameters': A list of input parameters used for the estimation.
Firpo, Sergio, Nicole M. Fortin, and Thomas Lemieux. 2018. "Decomposing Wage Distributions Using Recentered Influence Function Regressions." Econometrics, 6(2):28.
Fortin, Nicole, Thomas Lemieux, and Sergio Firpo. 2011. "Decomposition methods in economics." In Orley Ashenfelter and David Card, eds., Handbook of labor economics. Vol. 4. Elsevier, 1-102.
Gardeazabal, Javier, and Arantza Ugidos. 2004. "More on identification in detailed wage decompositions." Review of Economics and Statistics, 86(4): 1034-1036.
## Oaxaca-Blinder decomposition of gender wage gap
## with NLSY79 data like in Fortin, Lemieux, & Firpo (2011: 41)
data("nlys00")

mod1 <- log(wage) ~ age + central_city + msa + region + black +
  hispanic + education + afqt + family_responsibility +
  years_worked_civilian + years_worked_military + part_time + industry

# Using female coefficients (reference_0 = TRUE) to estimate counterfactual mean
decompose_female_as_reference <- ob_decompose(
  formula = mod1, data = nlys00, group = female,
  reference_0 = TRUE
)
decompose_female_as_reference

# Using male coefficients (reference_0 = FALSE)
decompose_male_as_reference <- ob_decompose(
  formula = mod1, data = nlys00, group = female,
  reference_0 = FALSE
)
decompose_male_as_reference

# Replicate first and third column in Table 3 in Fortin, Lemieux, & Firpo (2011: 41)
# Define aggregation of decomposition terms
custom_aggregation <- list(
  `Age, race, region, etc.` = c(
    "age", "blackyes", "hispanicyes", "regionNorth-central",
    "regionSouth", "regionWest", "central_cityyes", "msayes"
  ),
  `Education` = c(
    "education<10 yrs", "educationHS grad (diploma)", "educationHS grad (GED)",
    "educationSome college", "educationBA or equiv. degree",
    "educationMA or equiv. degree", "educationPh.D or prof. degree"
  ),
  `AFTQ` = "afqt",
  `L.T. withdrawal due to family` = "family_responsibility",
  `Life-time work experience` = c(
    "years_worked_civilian", "years_worked_military", "part_time"
  ),
  `Industrial sectors` = c(
    "industryManufacturing", "industryEducation, Health, Public Admin.",
    "industryOther services"
  )
)

# First column
summary(decompose_male_as_reference, custom_aggregation = custom_aggregation)

# Third column
summary(decompose_female_as_reference, custom_aggregation = custom_aggregation)

## Compare bootstrapped standard errors...
decompose_female_as_reference_bs <- ob_decompose(
  formula = mod1, data = nlys00, group = female,
  bootstrap = TRUE, bootstrap_iterations = 100
)
summary(decompose_female_as_reference_bs, custom_aggregation = custom_aggregation)

# ... to analytical standard errors (assuming independence between groups and
# homoscedasticity)
decompose_female_as_reference <- ob_decompose(
  formula = mod1, data = nlys00, group = female,
  reference_0 = TRUE
)
summary(decompose_female_as_reference, custom_aggregation = custom_aggregation)

# Return standard errors for all detailed terms
summary(decompose_female_as_reference, aggregate_factors = FALSE)

## 'Doubly robust' Oaxaca-Blinder decomposition of gender wage gap
mod2 <- log(wage) ~ age + central_city + msa + region + black +
  hispanic + education + afqt + family_responsibility +
  years_worked_civilian + years_worked_military + part_time + industry |
  age + (central_city + msa) * region + (black + hispanic) * (education + afqt) +
    family_responsibility * (years_worked_civilian + years_worked_military) +
    part_time * industry

decompose_male_as_reference_robust <- ob_decompose(
  formula = mod2, data = nlys00, group = female,
  reference_0 = FALSE, reweighting = TRUE
)

# ... using random forests instead of logit to estimate weights
decompose_male_as_reference_robust_rf <- ob_decompose(
  formula = mod1, data = nlys00, group = female,
  reference_0 = FALSE, reweighting = TRUE,
  reweighting_method = "random_forest"
)

# Reweighted RIF Regression Decomposition
data("men8305")

model_rifreg <- log(wage) ~ union + education + experience |
  union * (education + experience) + education * experience

# Variance
variance_decomposition <- ob_decompose(
  formula = model_rifreg, data = men8305, group = year,
  reweighting = TRUE, rifreg_statistic = "variance"
)

# Deciles
deciles_decomposition <- ob_decompose(
  formula = model_rifreg, data = men8305, group = year,
  reweighting = TRUE, rifreg_statistic = "quantiles",
  rifreg_probs = c(1:9) / 10
)
# plot(deciles_decomposition)

# RIF regression decomposition with custom function

# custom function
custom_variance_function <- function(dep_var, weights, probs = NULL) {
  weighted_mean <- weighted.mean(x = dep_var, w = weights)
  rif <- (dep_var - weighted_mean)^2
  rif <- data.frame(rif, weights)
  names(rif) <- c("rif_variance", "weights")
  return(rif)
}

custom_decomposition <- ob_decompose(
  formula = model_rifreg, data = men8305, group = year,
  reweighting = TRUE, rifreg_statistic = "custom",
  custom_rif_function = custom_variance_function
)
The function calculates the decomposition terms of the linear
Oaxaca-Blinder decomposition based on the estimated OLS
coefficients and the respective model.matrix.
ob_decompose_calculate_terms( beta0, beta1, X0, X1, weights0, weights1, reference_0 )
beta0 |
vector of estimated coefficients of group 0 |
beta1 |
vector of estimated coefficients of group 1 |
X0 |
|
X1 |
|
weights0 |
vector of observation weights of group 0 |
weights1 |
vector of observation weights of group 1 |
reference_0 |
boolean: indicating if group 0 is the reference group and if its coefficients are used to compute the counterfactual mean. |
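The arithmetic behind these terms can be sketched in base R. The example is a simplified illustration, not the package's code. It assumes one common convention with group 0 as reference: the composition effect is (mean(X1) - mean(X0)) * beta0 and the structure effect is mean(X1) * (beta1 - beta0), so that group 0's coefficients define the counterfactual mean. The simulated data and object names are hypothetical.

```r
set.seed(1)
n0 <- 400
n1 <- 500
x0 <- rnorm(n0, mean = 10)
x1 <- rnorm(n1, mean = 11)
y0 <- 1.0 + 0.20 * x0 + rnorm(n0)
y1 <- 1.5 + 0.25 * x1 + rnorm(n1)
weights0 <- runif(n0, 0.5, 1.5)
weights1 <- runif(n1, 0.5, 1.5)

# Group-specific weighted OLS fits
fit0 <- lm(y0 ~ x0, weights = weights0)
fit1 <- lm(y1 ~ x1, weights = weights1)
beta0 <- coef(fit0)
beta1 <- coef(fit1)
X0 <- model.matrix(fit0)
X1 <- model.matrix(fit1)

# Weighted covariate means (including the intercept column)
X0_bar <- colSums(X0 * weights0) / sum(weights0)
X1_bar <- colSums(X1 * weights1) / sum(weights1)

# Decomposition with group 0 as reference (assumed convention)
observed_difference <- sum(X1_bar * beta1) - sum(X0_bar * beta0)
composition_effect <- (X1_bar - X0_bar) * beta0 # detailed terms per covariate
structure_effect <- X1_bar * (beta1 - beta0)    # detailed terms per covariate

c(
  observed = observed_difference,
  composition = sum(composition_effect),
  structure = sum(structure_effect)
)
```

The detailed terms add up to the aggregate effects, and the two aggregate effects add up to the observed difference in weighted means.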
The function calculates the covariance matrix for the decomposition terms of the linear Oaxaca-Blinder decomposition assuming independence between groups.
ob_decompose_calculate_vcov( beta0, beta1, X0, X1, weights0, weights1, Cov_beta0, Cov_beta1, reference_0 )
beta0 |
vector of estimated coefficients of group 0 |
beta1 |
vector of estimated coefficients of group 1 |
X0 |
|
X1 |
|
weights0 |
vector of observation weights of group 0 |
weights1 |
vector of observation weights of group 1 |
Cov_beta0 |
estimated covariance matrix of coefficients of group 0 |
Cov_beta1 |
estimated covariance matrix of coefficients of group 1 |
reference_0 |
boolean: indicating if group 0 is the reference group and if its coefficients are used to compute the counterfactual mean. |
Jann, Ben, 2005. "Standard errors for the Blinder-Oaxaca decomposition." *3rd German Stata Users’ Group Meeting 2005*. Available from [https://boris.unibe.ch/69506/1/oaxaca_se_handout.pdf](https://boris.unibe.ch/69506/1/oaxaca_se_handout.pdf).
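How such standard errors can be obtained is sketched below for the two aggregate effects. This is a first-order approximation in the spirit of Jann (2005), written for unweighted data. It is not the package's implementation and makes three assumptions: the two groups are independent, covariate means and coefficients are treated as uncorrelated, and the higher-order trace term is ignored. The convention (group 0 as reference) and all object names are assumptions as well.

```r
set.seed(2)
n0 <- 400
n1 <- 500
x0 <- rnorm(n0, mean = 10)
x1 <- rnorm(n1, mean = 11)
y0 <- 1.0 + 0.20 * x0 + rnorm(n0)
y1 <- 1.5 + 0.25 * x1 + rnorm(n1)
fit0 <- lm(y0 ~ x0)
fit1 <- lm(y1 ~ x1)

beta0 <- coef(fit0)
beta1 <- coef(fit1)
Cov_beta0 <- vcov(fit0)
Cov_beta1 <- vcov(fit1)
X0 <- model.matrix(fit0)
X1 <- model.matrix(fit1)
X0_bar <- colMeans(X0)
X1_bar <- colMeans(X1)

# Covariance matrices of the covariate means (zero for the intercept column)
Cov_X0_bar <- cov(X0) / n0
Cov_X1_bar <- cov(X1) / n1

# Composition effect (X1_bar - X0_bar)' beta0
d <- X1_bar - X0_bar
var_composition <- drop(
  t(d) %*% Cov_beta0 %*% d +
    t(beta0) %*% (Cov_X0_bar + Cov_X1_bar) %*% beta0
)

# Structure effect X1_bar' (beta1 - beta0)
b_diff <- beta1 - beta0
var_structure <- drop(
  t(X1_bar) %*% (Cov_beta0 + Cov_beta1) %*% X1_bar +
    t(b_diff) %*% Cov_X1_bar %*% b_diff
)

se <- sqrt(c(composition = var_composition, structure = var_structure))
se
```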
The function plots decomposition terms for quantiles estimated
with dfl_decompose
over the unit interval.
## S3 method for class 'dfl_decompose' plot( x, ..., confidence_bands = TRUE, confidence_level = 0.95, uniform_bands = FALSE )
x |
an object of class "dfl_decompose", usually, a result of a call to [dfl_decompose()] with [statistics = "quantiles"]. |
... |
other parameters to be passed through to plot function. |
confidence_bands |
If 'TRUE' (default) and if standard errors have been bootstrapped, confidence bands are plotted. |
confidence_level |
numeric value between 0 and 1 (default = 0.95) that defines the confidence interval
plotted as a ribbon and defined as |
uniform_bands |
If 'FALSE' (default), pointwise confidence bands are computed. Otherwise, uniform bands are constructed based on the bootstrapped Kolmogorov-Smirnov statistic (see summary.dfl_decompose). |
a ggplot illustrating the decomposition terms for quantiles.
data("men8305")

flf_model <- log(wage) ~ union * (education + experience) + education * experience

flf_male_inequality <- dfl_decompose(
  flf_model,
  data = men8305,
  weights = weights,
  group = year
)
plot(flf_male_inequality)
The function plots decomposition terms for quantiles estimated
with ob_decompose
over the unit interval.
## S3 method for class 'ob_decompose' plot( x, ..., detailed_effects = TRUE, aggregate_factors = TRUE, custom_aggregation = NULL, confidence_bands = FALSE, confidence_level = 0.95 )
x |
an object of class "ob_decompose", usually a result of a call to [ob_decompose()] with [rifreg_statistic = "quantiles"]. |
... |
other parameters to be passed through to plot function. |
detailed_effects |
If 'TRUE' (default), then the detailed effects are plotted. Otherwise only the total (aggregate) effects are plotted. |
aggregate_factors |
boolean, if 'TRUE' (default) terms associated with detailed factor levels are aggregated to a single term for every factor variable. |
custom_aggregation |
list specifying the aggregation of detailed decomposition terms. The parameter 'custom_aggregation' overrides the parameter 'aggregate_factors'. If 'NULL' (default), then either all detailed terms or all terms associated with a single variable are returned. |
confidence_bands |
If 'TRUE' and if standard errors have been bootstrapped, confidence bands are plotted. |
confidence_level |
numeric value between 0 and 1 (default = 0.95) that defines the confidence interval
plotted as a ribbon and defined as |
a ggplot illustrating the decomposition terms for quantiles.
data("nlys00")

mod1 <- log(wage) ~ age + central_city + msa + region + black +
  hispanic + education + afqt + family_responsibility +
  years_worked_civilian + years_worked_military + part_time + industry

# plotting RIF regression decomposition of deciles
decompose_rifreg_deciles <- ob_decompose(
  formula = mod1, data = nlys00, group = female,
  reweighting = TRUE, rifreg_statistic = "quantiles",
  bootstrap = TRUE, bootstrap_iterations = 50,
  reference_0 = FALSE
)
plot(decompose_rifreg_deciles)
plot(decompose_rifreg_deciles, confidence_bands = TRUE)

# plotting Oaxaca-Blinder decomposition
decompose_ob_mean <- ob_decompose(
  formula = mod1, data = nlys00, group = female,
  reweighting = TRUE, bootstrap = FALSE,
  reference_0 = FALSE
)
plot(decompose_ob_mean)
plot(decompose_ob_mean, detailed_effects = FALSE)

# With custom aggregation
custom_aggregation <- list(
  `Age, race, region, etc.` = c(
    "age", "blackyes", "hispanicyes", "regionNorth-central",
    "regionSouth", "regionWest", "central_cityyes", "msayes"
  ),
  `Education` = c(
    "education<10 yrs", "educationHS grad (diploma)", "educationHS grad (GED)",
    "educationSome college", "educationBA or equiv. degree",
    "educationMA or equiv. degree", "educationPh.D or prof. degree"
  ),
  `AFTQ` = "afqt",
  `L.T. withdrawal due to family` = "family_responsibility",
  `Life-time work experience` = c(
    "years_worked_civilian", "years_worked_military", "part_time"
  ),
  `Industrial sectors` = c(
    "industryManufacturing", "industryEducation, Health, Public Admin.",
    "industryOther services"
  )
)
plot(decompose_ob_mean, custom_aggregation = custom_aggregation)
print method for class "dfl_decompose"
## S3 method for class 'dfl_decompose' print(x, ...)
x |
an object of class "dfl_decompose", usually a result of a call to [dfl_decompose()]. |
... |
other parameters to be passed through to printing functions. |
The function print.dfl_decompose()
displays the decomposition terms saved in x.
print method for class "ob_decompose"
## S3 method for class 'ob_decompose' print(x, ...)
x |
an object of class "ob_decompose", usually a result of a call to [ob_decompose()]. |
... |
other parameters to be passed through to printing functions. |
The function print.ob_decompose()
displays the decomposition terms saved in x.
This function implements the trimming rule proposed by Huber, Lechner, and Wunsch (2014). Observations above the trimming threshold are trimmed in the reference group and in the comparison group. By default, the trimming threshold is set to sqrt(N)/N, where N is the number of observations in the reweighted reference group. The function returns a vector with the indices of the observations to be trimmed.
select_observations_to_be_trimmed( reweighting_factor, group_variable, group, trimming_threshold = NULL )
reweighting_factor |
Estimated reweighting factor |
group_variable |
Variable identifying the reference and comparison group, respectively. |
group |
Identifier of reference group |
trimming_threshold |
threshold defining the maximal accepted relative weight of a reweighting factor/observation. If 'NULL', the threshold is set to 'sqrt(N)/N', where N is the number of observations in the reference group. |
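The rule can be sketched with a small helper in base R. The function `sketch_trim_index()` is hypothetical and only mirrors the description above; it is not the package's function. In particular, how observations of the comparison group are selected is an assumption here: every observation whose reweighting factor is at least as large as the smallest offending factor in the reference group is marked for trimming.

```r
# Hypothetical helper illustrating the trimming rule (not part of the package)
sketch_trim_index <- function(reweighting_factor, group_variable, group,
                              trimming_threshold = NULL) {
  in_reference <- group_variable == group
  psi_reference <- reweighting_factor[in_reference]
  n_reference <- length(psi_reference)

  # Default threshold: sqrt(N) / N with N observations in the reference group
  if (is.null(trimming_threshold)) {
    trimming_threshold <- sqrt(n_reference) / n_reference
  }

  # Relative weight of every reference-group observation
  relative_weight <- psi_reference / sum(psi_reference)
  too_large <- psi_reference[relative_weight > trimming_threshold]
  if (length(too_large) == 0) {
    return(integer(0))
  }

  # Assumption: trim all observations (in either group) whose factor is at
  # least as large as the smallest factor exceeding the threshold
  which(reweighting_factor >= min(too_large))
}

# Nine reference-group observations (group 0) and three comparison observations
psi <- c(1, 1, 1, 1, 1, 1, 1, 1, 50, 1, 2, 60)
grp <- rep(c(0, 1), times = c(9, 3))
sketch_trim_index(psi, group_variable = grp, group = 0)
```

In this toy example the default threshold is sqrt(9)/9 = 1/3. The reference-group observation with factor 50 has a relative weight of 50/58 and is selected, together with the comparison-group observation whose factor is 60.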
summary method for class "dfl_decompose"
## S3 method for class 'dfl_decompose' summary(object, ..., confidence_level = 0.95, digits = 4)
object |
an object of class "dfl_decompose", a result of a call to [dfl_decompose()]. |
... |
other parameters to be passed through to printing functions. |
confidence_level |
numeric value between 0 and 1 (default = 0.95) that defines the confidence level of the printed confidence intervals. |
digits |
number of digits to be printed. |
If standard errors were bootstrapped, standard
errors and confidence bands are given. Pointwise confidence bands are defined
as qnorm((1-confidence_level)/2) * standard error. Uniform bands
are constructed by multiplying the standard error with the confidence_level-quantile
of the bootstrapped Kolmogorov-Smirnov statistic as in Chen et al. (2017).
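The difference between the two kinds of bands can be sketched in base R. The example is illustrative only and not taken from the package: the point estimates and bootstrap replicates are simulated, the bands are centered at the point estimates, and the Kolmogorov-Smirnov type statistic is assumed to be the maximum absolute studentized deviation across quantiles within each bootstrap replicate.

```r
set.seed(7)
probs <- 1:9 / 10
estimate <- 0.05 + 0.10 * probs # hypothetical decomposition term at the deciles
B <- 500

# Hypothetical bootstrap replicates: rows are replicates, columns are deciles
boot <- matrix(
  rnorm(B * length(probs), mean = rep(estimate, each = B), sd = 0.02),
  nrow = B
)
se <- apply(boot, 2, sd)
confidence_level <- 0.95

# Pointwise bands based on the normal approximation
z <- qnorm(1 - (1 - confidence_level) / 2)
pointwise <- cbind(lower = estimate - z * se, upper = estimate + z * se)

# Uniform bands: critical value from the bootstrapped sup-t statistic
t_stat <- sweep(abs(sweep(boot, 2, estimate)), 2, se, "/")
ks_stat <- apply(t_stat, 1, max)
crit_uniform <- unname(quantile(ks_stat, probs = confidence_level))
uniform <- cbind(
  lower = estimate - crit_uniform * se,
  upper = estimate + crit_uniform * se
)

c(pointwise_critical_value = z, uniform_critical_value = crit_uniform)
```

Because the uniform critical value accounts for all quantiles jointly, the uniform bands are wider than the pointwise ones.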
The function summary.dfl_decompose()
displays the decomposition
terms saved in object. The function further returns a list with the displayed
decomposition terms and, if standard errors were bootstrapped, the corresponding
standard errors and confidence bands.
Chen, Mingli, Victor Chernozhukov, Iván Fernández-Val, and Blaise Melly. 2017. "Counterfactual: An R Package for Counterfactual Analysis." The R Journal 9(1): 370-384.
Apart from displaying the (detailed) decomposition results with standard errors, summary.ob_decompose() allows customizing the aggregation of the detailed decomposition terms.
## S3 method for class 'ob_decompose' summary( object, ..., aggregate_factors = TRUE, custom_aggregation = NULL, confidence_level = 0.95 )
object |
an object of class "ob_decompose", usually a result of a call to [ob_decompose()]. |
... |
other parameters to be passed through to summary function. |
aggregate_factors |
boolean, if 'TRUE' (default) terms associated with detailed factor levels are aggregated to a single term for every factor variable. |
custom_aggregation |
list specifying the aggregation of detailed decomposition terms. The parameter 'custom_aggregation' overrides the parameter 'aggregate_factors'. If 'NULL' (default), then either all detailed terms or all terms associated with a single variable are returned. |
confidence_level |
numeric value between 0 and 1 (default = 0.95) that defines the printed confidence interval. |
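To make the role of a custom_aggregation list concrete, the following base R sketch sums made-up detailed terms into named groups. It assumes that aggregating point estimates amounts to adding the detailed terms listed under each name. The numbers are purely illustrative, and the sketch does not cover how standard errors of aggregated terms are obtained.

```r
# Made-up detailed decomposition terms, named like model coefficients
detailed_terms <- c(
  age = 0.010,
  `educationHS grad (diploma)` = 0.020,
  `educationSome college` = 0.030,
  industryManufacturing = -0.015
)

# Each list element names the detailed terms that form one aggregate term
custom_aggregation <- list(
  Age = "age",
  Education = c("educationHS grad (diploma)", "educationSome college"),
  `Industrial sectors` = "industryManufacturing"
)

# Assumption: an aggregate term is the sum of its detailed terms
aggregated_terms <- vapply(
  custom_aggregation,
  function(term_names) sum(detailed_terms[term_names]),
  numeric(1)
)
print(aggregated_terms)
```

The names in each list element have to match the detailed term names exactly, which for factor variables combine the variable name and the level label.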
The function summary.ob_decompose() summarizes the decomposition terms saved in object.
data("nlys00")

mod1 <- log(wage) ~ age + education + years_worked_civilian +
  years_worked_military + part_time + industry

decompose_results <- ob_decompose(
  formula = mod1,
  data = nlys00,
  group = female,
  reference_0 = TRUE
)

# Print standard errors
summary(decompose_results)

# Aggregate decomposition terms associated with factor levels
summary(decompose_results, aggregate_factors = TRUE)

# Custom aggregation of decomposition terms
custom_aggregation <- list(
  `Age` = c("age"),
  `Education` = c(
    "education<10 yrs",
    "educationHS grad (diploma)",
    "educationHS grad (GED)",
    "educationSome college",
    "educationBA or equiv. degree",
    "educationMA or equiv. degree",
    "educationPh.D or prof. degree"
  ),
  `Life-time work experience` = c(
    "years_worked_civilian",
    "years_worked_military",
    "part_time"
  ),
  `Industrial sectors` = c(
    "industryManufacturing",
    "industryEducation, Health, Public Admin.",
    "industryOther services"
  )
)
summary(decompose_results, custom_aggregation = custom_aggregation)