--- title: "Callback: Components Models" author: "Emmanuel Duguet" date: "`r Sys.Date()`" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Callback: Components Models} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` # Difference estimators In standard experiments, discrimination can be measured directly by the difference in callback proportions. When there are more than one source of discrimination, differences in differences may be used. Components models can be interpreted as a generalization of all these difference estimators. They can be used to test restrictions on the discrimination coefficients, and provide an optimal estimator when such restriction holds. Consider a simple gender discrimination example. Let $p_m$ the callback rate of a man and $p_f$ the callback probability of women. They can be written: \begin{align*} p_m &= p_0\\ p_f &= p_0 + \delta_g \end{align*} where $p_0$ is the benchmark probability and $\delta_g$ the gender discrimination coefficient. It is negative when there is discrimination. By convention, the candidate who does not suffer from discrimination is the benchmark candidate since it measures how the labor market should work. This is why we took the male candidate as the benchmark candidate: he should not suffer from gender discrimination. The probability of the female candidate is $p_f$ and it is the sum of the benchmark probability ($p_0$) and of the gender discrimination coefficient ($\delta_g$). Discrimination occurs whenever $\delta_g<0\Leftrightarrow p_f0$, having both characteristics lead to a smaller discrimination and we have the subadditive case. The solution to the previous system is: \begin{align*} \delta_o&=p_{mf}-p_{m\ell}\\ \delta_g&=p_{f\ell}-p_{m\ell}\\ \delta_{go}&=p_{ff}-p_{mf}-(p_{f\ell}-p_{m\ell}) \end{align*} The two discrimination coefficients are still measured by differences, but the interaction term requires a difference in differences estimation. More complex situation an arise. The previous model is said to be just identified because there is only one way to get from the callback probabilities to the discrimination coefficient. Sometimes, there are parameters restrictions, like $\delta_{go}=0$, and the system becomes overidentified. In this situation, there are several ways to retrieve the discrimination coefficients from the callback probabilities. Consider the case $\delta_{go}=0$. One can estimate, for example, $\delta_o$ in two ways: \begin{align*} \delta_0&=p_{mf}-p_{m\ell}\\ \delta_0&=p_{ff}-p_{f\ell}=p_0+\delta_o+\delta_g-(p_0+\delta_g) \end{align*} In such a case, one should use an optimal estimator that uses the redundancy of the constraint. Intuitively, it should provide some weighted average of the available definitions. # Components models Components models are useful when there is more than one source of discrimination. We consider that the discrimination coefficients $\delta=(\delta_o,\delta_g,\delta_{og})$ are the *parameters of interest*, since their estimation is the goal of the analysis, and that the callback proportions are the *auxiliary parameters* since they are the source of the estimation method. 
The constraint between the parameters of interest and the auxiliary parameters can be written:

\begin{align*}
\left(
\begin{array}{c}
p_{mf}-p_{m\ell}\\
p_{f\ell}-p_{m\ell}\\
p_{ff}-p_{m\ell}
\end{array}\right)
&=
\left(
\begin{array}{ccc}
1&0&0\\
0&1&0\\
1&1&1
\end{array}\right)
\left(
\begin{array}{c}
\delta_o\\
\delta_g\\
\delta_{go}
\end{array}\right)
\Leftrightarrow \pi = A \delta
\end{align*}

where $\pi$ is the vector of the *auxiliary parameters* (here, the callback rate differences). The matrix $A$ is called the Boolean matrix because it simply indicates the presence or absence of a component in the callback rate differences. If $A$ is square and invertible, the system is just identified. Here, its determinant equals 1, so this is the case.

Now consider the constraint $\delta_{go}=0$; the Boolean matrix is not square anymore:

\begin{align*}
\left(
\begin{array}{c}
p_{mf}-p_{m\ell}\\
p_{f\ell}-p_{m\ell}\\
p_{ff}-p_{m\ell}
\end{array}\right)
&=
\left(
\begin{array}{cc}
1&0\\
0&1\\
1&1
\end{array}\right)
\left(
\begin{array}{c}
\delta_o\\
\delta_g
\end{array}\right)
\Leftrightarrow \pi = A \delta
\end{align*}

and we should look at the column rank of the Boolean matrix. The two columns are linearly independent, so the model is overidentified.

In the general case, we get the following linear relationship:

\begin{align*}
\pi = A \delta
\end{align*}

with $A$ of full column rank. This leads us to a minimum distance estimator (Asymptotic Least Squares). In practice, we do not observe $\pi$ but its estimate from the correspondence test, denoted $\hat{\pi}$, so that there is an error term, denoted $\omega$, defined as:

\begin{align*}
\hat{\pi}=\pi+\omega\Leftrightarrow \hat{\pi}=A\delta +\omega
\end{align*}

with $\hat{\Omega}=\hat{\mathbb{V}}(\omega)=\hat{\mathbb{V}}(\hat{\pi})$, computable directly from the callback data. The Ordinary Least Squares estimator $\hat{\delta}$ gives the difference estimators when the system is just identified:

\begin{align*}
\hat{\delta}=(A^\prime A)^{-1}A^\prime \hat{\pi}
\end{align*}

and the optimal estimator $\hat{\delta}^\ast$ should be used when the system is overidentified:

\begin{align*}
\hat{\delta}^\ast&=(A^\prime\hat{\Omega}^{-1} A)^{-1}A^\prime \hat{\Omega}^{-1} \hat{\pi}.
\end{align*}

We also have the property that $\hat{\delta}=\hat{\delta}^\ast$ when the system is just identified.

# Application

## Standard case

We use the data from an experiment about mobility, for the profession of management controller in the Paris area. The candidates can be women or men, and have either no driving license or both the car and motorcycle licenses. The data set is:

```{r}
library(callback)
m <- mobility1
str(m)
```

and the raw callback rates are:

```{r,fig.width=7,fig.height=4.4}
c <- callback(data = m, cluster = "offer", candid = c("gender","licenses"),
              callback = "callback")
r <- stat_raw(c)
print(r)
plot(r)
```

We see that women with no driving licenses were preferred to the other candidates. In order to investigate this issue, we write the following components model:

\begin{align*}
p_{m1}&=p_0\\
p_{m0}&=p_0+\delta_\ell\\
p_{f0}&=p_0+\delta_g+\delta_\ell+\delta_{g\ell}\\
p_{f1}&=p_0+\delta_g
\end{align*}

where $p_0$ is the benchmark probability, $\delta_\ell$ the penalty for not having any license (if negative), $\delta_g$ the effect of gender and $\delta_{g\ell}$ the intersectionality parameter (the effect of being a woman without a license).
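Before estimating this model with the package, it may help to see the estimators of the previous section written out in base R. The sketch below reuses the toy overidentified system shown above (the Boolean matrix under $\delta_{go}=0$) with purely hypothetical values for $\hat{\pi}$ and $\hat{\Omega}$; in practice, `callback_comp()` and `reg()` perform these computations internally on the real data:

```{r}
# Boolean matrix of the overidentified system with delta_go = 0 (see above)
A <- rbind(c(1, 0),
           c(0, 1),
           c(1, 1))
qr(A)$rank  # full column rank (2): delta is identified

# Hypothetical estimated differences and their covariance matrix
pi_hat    <- c(-0.030, -0.020, -0.045)
Omega_hat <- diag(c(4e-4, 4e-4, 5e-4))

# Ordinary least squares: (A'A)^{-1} A' pi_hat
delta_ols <- solve(crossprod(A), crossprod(A, pi_hat))

# Optimal minimum distance estimator: (A' Omega^{-1} A)^{-1} A' Omega^{-1} pi_hat
W <- solve(Omega_hat)
delta_opt <- solve(t(A) %*% W %*% A, t(A) %*% W %*% pi_hat)

cbind(ols = drop(delta_ols), optimal = drop(delta_opt))
```

The optimal estimator weights the redundant definitions of each coefficient by their precision, which is the weighted-average intuition given in the previous section.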
Consider the model on the probability differences:

\begin{align*}
\left(
\begin{array}{c}
p_{m0}-p_{m1}\\
p_{f0}-p_{m1}\\
p_{f1}-p_{m1}
\end{array}\right)
=
\left(
\begin{array}{ccc}
1&0&0\\
1&1&1\\
0&1&0
\end{array}\right)
\left(
\begin{array}{c}
\delta_\ell\\
\delta_g\\
\delta_{g\ell}
\end{array}\right)
\end{align*}

Before creating the components model, we check the reference levels of our factors:

```{r}
levels(m$gender)
levels(m$licenses)
```

so that the default reference candidate is the man with no license. This will not work, because our benchmark candidate is the man with both licenses. We have to fix this before making the callback object again:

```{r}
m2 <- m
m2$licenses <- relevel(m2$licenses, ref = "Yes")
levels(m2$licenses)
```

Now we can create the components model, an object of class `callback_comp`. The first equation is useless since it defines the benchmark candidate, and $p_0$ should not be indicated since it is present in all the equations. Starting with the second equation:

```{r}
model <- list(
  c("licenses"),
  c("licenses","gender","inter"),
  c("gender"))
cpm <- callback_comp(data = m2, cluster = "offer",
                     candid = c("gender","licenses"),
                     callback = "callback", model = model)
```

and the model is checked with the instruction:

```{r}
print(cpm)
```

Since the model is identified, we can proceed to the estimation:

```{r}
estim <- reg(cpm)
print(estim)
```

The reference candidate had a 10% callback rate; the licenses component would have a negative effect and the gender component a positive one. But we need to know which effects are significant:

```{r}
summary(estim)
```

We see that the licenses do not have a significant effect on their own, but they do in interaction with gender ($\delta_{g\ell}<0$). Women would be discriminated against when they hold both driving licenses. In order to complete the study, we would like to drop the terms that are not significant. In a components model, this requires regrouping the candidates.

## Candidates grouping

Dropping $\delta_\ell$ from the model implies that:

\begin{align*}
p_{m1}=p_{m0}
\end{align*}

so that all the male candidates should be put together, with common callback probability $p_m$. The model becomes:

\begin{align*}
\left(
\begin{array}{c}
p_{f0}-p_{m}\\
p_{f1}-p_{m}
\end{array}\right)
&=
\left(
\begin{array}{cc}
1&0\\
1&1
\end{array}\right)
\left(
\begin{array}{c}
\delta_g\\
\delta_{g\ell}
\end{array}\right)
\end{align*}

where $\delta_{g\ell}$ now measures the additional effect, for a woman, of holding both licenses. We first redefine our data set with the new factor:

```{r}
m2 <- m
m2$cand <- as.factor(ifelse(m2$gender == "Man", "m",
                     ifelse(m2$licenses == "Yes", "f1", "f0")))
m2$cand <- relevel(m2$cand, ref = "m")
levels(m2$cand)
```

and write the new components model:

```{r}
model <- list(
  c("gender"),
  c("gender","inter"))
cpm2 <- callback_comp(data = m2, cluster = "offer", candid = "cand",
                      callback = "callback", model = model)
summary(reg(cpm2))
```

All the components are significant at the 10% level. While women experience an advantage over men in this job (+2.7%), they fully lose it when they hold both the motorcycle and car driving licenses (-5.3%).

# References

Duguet E., du Parquet L., L’Horty Y., Petit P. (2018), "Counterproductive hiring discrimination against women: Evidence from a French correspondence test", *International Journal of Manpower*, Vol. 39, Issue 1, pp. 37-50. https://doi.org/10.1108/IJM-01-2016-0004

Duguet E., Le Gall R., L’Horty Y., Petit P. (2018), "How does labour market history influence the access to hiring interviews?", *International Journal of Manpower*, Vol. 39, Issue 4, pp. 519-533. https://doi.org/10.1108/IJM-09-2017-0231