---
title: "An R package for discrimination measurement"
author: "Emmanuel Duguet"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{An R package for discrimination measurement}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

# Discrimination tests

In discrimination experiments, candidates are sent to the same test (e.g. a job opening, a house rental) and one examines whether they receive the same outcome. The numbers of non-negative or explicitly positive answers are examined in detail, looking for outcome differences. In what follows, we consider a test of the effect of gender and origin on the recruitment of software developers (the `inter1` data set). The candidates can have a French, Moroccan, Senegalese or Vietnamese origin, suggested by their first and last names.

```{r}
library(callback)
m <- inter1
table(m$origin, m$lastn)
table(m$origin, m$firstn)
```

The contents of the data set are:

```{r}
str(m)
```

The `offer` variable is very important: it contains the job offer identifier. It matters because, in order to test discrimination, the candidates must apply to the *same* job offer. This is the `cluster` parameter of the `callback()` function. With `cluster = "offer"` we are sure that all the computations will be paired, which means that we will always compare candidates on the very same job offer. This is essential to produce meaningful results, since otherwise a difference in answers could come from differences between recruiters and not from differences in gender or origin.

The next important variables are the ones that define the candidates. Here, there are two: the suggested origin (F for French, M for Moroccan, V for Vietnamese and S for Senegalese) and the gender. Combined, they give the candidate variable that we use in the analysis.
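The combination of the two factors into a single candidate identifier can be illustrated with base R's `interaction()`. This is only a toy sketch with made-up values, not the actual internals of `callback()`:

```{r}
# Hypothetical factors, not the inter1 data set
origin <- factor(c("F", "M", "S", "V"))
gender <- factor(c("Man", "Woman", "Man", "Woman"))

# Combine them into a single candidate factor, e.g. "F.Man"
candidate <- interaction(origin, gender, sep = ".", drop = TRUE)
as.character(candidate)
```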
The `origin` and `gender` variables are factors, and the reference levels of these factors implicitly define the reference candidate. By convention, the reference candidate is the one least likely to be discriminated against. Here, the French-origin man should be taken, because his French origin and his gender should not be sources of discrimination in the French labor market. In practice, we will check that this candidate really had the highest callback rate. We can find the reference levels of our factors by looking at the first level returned by the `levels()` function.

```{r}
levels(m$origin)
levels(m$gender)
```

By default, the levels are ordered alphabetically. It is pure chance that the French-origin man is the reference here. The reference can be changed with the `relevel()` function. For instance, to take the woman as the reference, enter:

```{r}
m$gender2 <- relevel(m$gender, ref = "Woman")
levels(m$gender2)
```

and the new factor `gender2` has "Woman" as its reference.

The last element we need is, obviously, the outcome of the job application. It is given by the `callback` variable, a Boolean variable equal to TRUE when the recruiter gives a non-negative callback (in this data set), and FALSE otherwise.

We can now launch the `callback()` function, which prepares the data for statistical analysis. Here we need to choose the `comp` parameter. Indeed, there are $n=8$ candidates, so that $n(n-1)/2=8\times 7/2=28$ comparisons are possible. This is a large number, which is why, by default (`comp = "ref"`), `callback()` performs the statistical analysis against the reference candidate only. This reduces our analysis to $n-1=7$ comparisons. One can get all 28 comparisons by setting `comp = "all"` instead.

```{r}
dtest <- callback(
  data = m,
  cluster = "offer",
  candid = c("origin", "gender"),
  callback = "callback"
)
```

The `dtest` object contains the formatted data needed for the callback analysis.
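The comparison counts quoted above are straightforward combinatorics, which can be checked in base R:

```{r}
n <- 8        # number of candidate profiles in inter1
choose(n, 2)  # all pairwise comparisons (comp = "all"): 28
n - 1         # comparisons against the reference only (comp = "ref"): 7
```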
Using `print()` gives the main characteristics of the experiment:

```{r}
print(dtest)
```

We find that the experiment is standard, since all the candidates have been sent to all the tests. When a candidate of the same type is sent several times to a test, the most favorable answer is kept (the "max" rule). The reader is informed that there are other ways to deal with this issue.

# Global statistics

We can take a look at the global callback rates of the candidates by entering:

```{r}
print(stat_raw(dtest))
```

and get a graphical representation with:

```{r,fig.width=7,fig.height=4.4}
plot(stat_raw(dtest))
```

It is possible to change the definition of the confidence intervals, the confidence level and the colors in the plot. If you prefer the Clopper-Pearson definition, a 90% confidence interval, a "steelblue3" bar and a black confidence interval, enter:

```{r,fig.width=7,fig.height=4.4}
g <- stat_raw(dtest, level = 0.9, method = "cp")
print(g)
plot(g, col = c("steelblue3", "black"))
```

When all the candidates are sent to all the tests, the previous figures may be used to measure discrimination. However, when there is a rotation of the candidates, so that only a subset of them is sent on each test, this may not be the case. For this reason, we prefer to use *matched statistics*, which only compare candidates that have been sent to the same tests. Since we do pairwise comparisons, we will consider two candidates $1$ and $2$ that are sent on the same test. There are four possible outcomes: no callback (denoted $0$ for both candidates), one of the two candidates is called back (denoted $1$ for the candidate called back, $0$ for the other), or both candidates are called back (denoted $1$ for both candidates).
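These four outcomes can be tabulated directly from two paired Boolean vectors. A minimal base-R sketch with made-up callbacks (an illustration, not the package's internal code):

```{r}
# Made-up paired outcomes, one element per test
cand1 <- c(TRUE,  TRUE, FALSE, FALSE, TRUE)   # callbacks of candidate 1
cand2 <- c(TRUE, FALSE,  TRUE, FALSE, FALSE)  # callbacks of candidate 2

counts <- c(
  n00 = sum(!cand1 & !cand2),  # neither candidate is called back
  n10 = sum( cand1 & !cand2),  # candidate 1 only
  n01 = sum(!cand1 &  cand2),  # candidate 2 only
  n11 = sum( cand1 &  cand2)   # both candidates
)
counts
```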
We count the corresponding cases and use the following notations:

* $n_{00}$: no candidate is called back
* $n_{10}$: candidate $1$ only is called back
* $n_{01}$: candidate $2$ only is called back
* $n_{11}$: both candidates are called back

In order to get the result of the discrimination tests, we use the `stat_paired()` function. Its output can be saved into an object for further exports, or printed. The following instruction:

```{r}
sp <- stat_paired(dtest)
```

does not produce any printed output, but saves an object into `sp`. We can get the statistics with:

```{r}
print(sp)
```

The callback counts describe the results of the paired experiments. The first column defines the comparison in the form "candidate 1 vs candidate 2". Here, "F.Man vs F.Woman" means that we compare the French-origin men ("F.Man") with the French-origin women ("F.Woman"). Out of 310 tests, 113 got at least one callback. The French-origin men got 86 callbacks and the French-origin women 70. The difference, called net discrimination, equals $86-70=16$ callbacks. We can go further into the details thanks to the next columns. Out of the 310 tests, neither candidate was called back on $n_{00}=197$ of the job offers, $n_{10}=43$ called the men only, $n_{01}=27$ called the women only and $n_{11}=43$ called both. Discrimination only occurs when a single candidate is called back. The net discrimination is thus $n_{10}-n_{01}=43-27=16$ (the "Difference" column). The corresponding line percentages are available with `sp$props`.

```{r}
sp$props
```

We can save the output or print it, as in the previous example. Printing is the default.

# Matched statistics

## Callback rates

In fact, there are three ways to compute proportions in discrimination studies. First, you can divide the number of callbacks by the number of tests. We call these "matched callback rates", given by the function `stat_mcr()`. Second, you can restrict the analysis to the tests which got at least one callback.
We call these "total callback shares", given by the function `stat_tcs()`. Last, you can divide by the number of tests where only one candidate has been called back. We call these "exclusive callback shares", given by the function `stat_ecs()`.

The callback rates of candidates $1$ and $2$, denoted respectively $p_1$ and $p_2$, are obtained by dividing the number of callbacks of each candidate by the total number of discrimination tests $n$:

$$
\begin{align*}
p_1 &= \frac{n_{10}+n_{11}}{n}\\
p_2 &= \frac{n_{01}+n_{11}}{n}\\
\text{with } n &= n_{00}+n_{10}+n_{01}+n_{11}
\end{align*}
$$

The absence of discrimination is measured by:

$$p_1=p_2\Leftrightarrow n_{10}=n_{01}$$

The `stat_mcr()` function provides the proportions, the confidence intervals and the equality tests. By default, the level is 95% and can be changed with the `level` option. The Student definition is obtained with:

```{r}
mcr <- stat_mcr(dtest)
print(mcr)
```

and a corresponding plot with:

```{r,fig.width=7,fig.height=4.4}
plot(mcr)
```

This represents the difference of proportions and its confidence interval. Another plot is available, with the confidence intervals of the callback rates of the two candidates. However, the reader is informed that these confidence intervals at level $1-\alpha$ can be misleading, because their overlap does not guarantee the equality of the callback rates at the $\alpha$ level. To get it anyway, enter:

```{r,fig.width=7,fig.height=4.4}
plot(mcr, dif = FALSE)
```

The difference analysis is not available with the Clopper-Pearson intervals.

# Total callback shares

With the total callback shares approach, we restrict the analysis to the tests with at least one callback.
The total callback shares of candidates $1$ and $2$, denoted respectively $s_1$ and $s_2$, are defined by:

$$
\begin{align*}
s_1 &= \frac{n_{10}+n_{11}}{n_c}\\
s_2 &= \frac{n_{01}+n_{11}}{n_c}\\
n_c &= n_{10}+n_{01}+n_{11}
\end{align*}
$$

and the equal treatment test is:

$$s_1=s_2\Leftrightarrow n_{10}=n_{01}$$

The matched callback rates and the total callback shares are related by:

$$p_i=s_i\times \pi_c$$

where $\pi_c=n_c/n$ is the overall response rate of the study. This is equivalent to the previous approach, with a different normalization. Here as well, three tests are available: the Fisher independence test, the chi-squared test and the asymptotic Student test. For the confidence intervals, enter:

```{r}
tcs <- stat_tcs(dtest)
print(tcs)
```

with the graphical representation of the confidence intervals:

```{r,fig.width=7,fig.height=4.4}
plot(tcs)
```

or with the representation of the callback percentages:

```{r,fig.width=7,fig.height=4.4}
plot(tcs, dif = FALSE)
```

For the Fisher test, enter:

```{r}
print(stat_tcs(dtest, method = "cp"))
```

In this case, notice that the statistic equals the p-value of the test.

# Exclusive callback shares

This third approach considers discrimination cases only. They are defined by:

$$
\begin{align*}
e_1&=\frac{n_{10}}{n_d}\\
e_2&=\frac{n_{01}}{n_d}\\
n_d&=n_{10}+n_{01}
\end{align*}
$$

and the equal treatment case is obtained when:

$$
e_1=e_2\Leftrightarrow n_{10}=n_{01}
$$

which is equivalent to the two other approaches. We also have:

$$p_i-\frac{n_{11}}{n}=e_i\times \pi_d$$

where $\pi_d=n_d/n$ is the sample discrimination proportion. The function to use is now `stat_ecs()`. In order to get the difference test and graphic, use:

```{r}
ecs <- stat_ecs(dtest)
print(ecs)
plot(ecs)
```

For the callback shares, enter:

```{r,fig.width=7,fig.height=4.4}
plot(ecs, dif = FALSE)
```

For the Pearson test with the Wilson intervals, enter:

```{r,fig.width=7,fig.height=4.4}
ecs <- stat_ecs(dtest, method = "wilson")
print(ecs)
plot(ecs)
```
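As a closing illustration, the equal-treatment condition $n_{10}=n_{01}$ can be checked by hand: under equal treatment, and conditionally on $n_d$, $n_{10}$ follows a Binomial$(n_d, 1/2)$ distribution, which is the classical sign test. The base-R sketch below uses the F.Man vs F.Woman counts reported above ($n_{10}=43$, $n_{01}=27$); it only illustrates the idea and is not necessarily the exact test implemented by `stat_ecs()`:

```{r}
n10 <- 43   # tests where only the man is called back
n01 <- 27   # tests where only the woman is called back
nd  <- n10 + n01

e1 <- n10 / nd   # exclusive callback share of the man
e2 <- n01 / nd   # exclusive callback share of the woman

# Exact sign test of equal treatment: n10 ~ Binomial(nd, 1/2)
test <- binom.test(n10, nd, p = 0.5)
round(c(e1 = e1, e2 = e2, p.value = test$p.value), 3)
```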