In discrimination experiments, candidates are sent to the same test
(e.g. a job offer or a house rental) and one examines whether they receive the same
outcome. The numbers of non-negative or explicitly positive answers are
examined in detail, looking for outcome differences. In what follows,
we consider a test of the effect of gender and origin on the
recruitment of software developers (the inter1 data set). The
candidates can have a French, Moroccan, Senegalese or Vietnamese origin,
suggested by their first and last names.
library(callback)
m <- inter1
table(m$origin, m$lastn)
#>
#> Bertrand Diallo Diouf Kaidi Moreau Pham Tran Zalegh
#> F 310 0 0 0 310 0 0 0
#> M 0 0 0 310 0 0 0 310
#> S 0 310 310 0 0 0 0 0
#> V 0 0 0 0 0 310 310 0
table(m$origin, m$firstn)
#>
#> Abdallah Amadou Anthony Fatou Jamila Minh Trang Sophie Tien Hiep
#> F 0 0 310 0 0 0 310 0
#> M 310 0 0 0 310 0 0 0
#> S 0 310 0 310 0 0 0 0
#> V 0 0 0 0 0 310 0 310
The contents of the data set are:
str(m)
#> 'data.frame': 2480 obs. of 11 variables:
#> $ offer : Factor w/ 310 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 2 2 ...
#> $ firstn : Factor w/ 8 levels "Abdallah","Amadou",..: 7 4 5 6 1 2 3 8 1 2 ...
#> $ lastn : Factor w/ 8 levels "Bertrand","Diallo",..: 5 3 4 7 8 2 1 6 8 2 ...
#> $ origin : Factor w/ 4 levels "F","M","S","V": 1 3 2 4 2 3 1 4 2 3 ...
#> $ sentorder: int 3 7 6 2 1 5 4 8 8 4 ...
#> $ gender : Factor w/ 2 levels "Man","Woman": 2 2 2 2 1 1 1 1 1 1 ...
#> $ callback : logi TRUE TRUE TRUE TRUE FALSE FALSE ...
#> $ paris : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...
#> $ cont : Factor w/ 2 levels "LTC","STC": 1 1 1 1 1 1 1 1 1 1 ...
#> $ ansorder : int 1 2 3 4 9 9 9 9 9 9 ...
#> $ date : Factor w/ 3 levels "April 2009","February 2009",..: 2 2 2 2 2 2 2 2 2 2 ...
The offer variable is very important: it identifies the job offer. In
order to test discrimination, the candidates must apply to the same job
offer. This is the cluster parameter of the callback() function. With
cluster = "offer" we are sure that all the computations will be paired,
which means that we will always compare candidates on the very same job
offer. This is essential to produce meaningful results, since otherwise
the difference in answers could come from differences between recruiters
and not from differences in gender or origin.
The second set of important variables defines the candidates. Here,
there are two variables: the suggested origin (F for French, M for
Moroccan, V for Vietnamese and S for Senegalese) and the gender.
Combined, they give the candidate variable that we use in the analysis.
The origin and gender variables are factors, and the reference levels of
these factors implicitly define the reference candidate. By convention,
the reference candidate is the one that is the least likely to be
discriminated against. Here the man of French origin should be taken,
because his origin and gender should not be sources of discrimination in
the French labor market. In practice, we will check that this candidate
really had the highest callback rate. We can find the reference levels
of our factors by looking at the first level given by the levels()
function.
By default, the levels are sorted alphabetically, so it is pure chance
that we find the French man as the reference. The reference can be
changed with the relevel() function. For instance, to take the woman as
the reference, one can apply relevel() to gender with ref = "Woman" and
store the result in a new factor gender2, which then has “Woman” as its
reference level.
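The behaviour of levels() and relevel() described above can be checked on a toy example. This is a minimal sketch in base R: the two small factors below are made up for illustration and only mimic the levels of the origin and gender variables; they are not the inter1 data.

```r
# Toy factors mimicking the levels of origin and gender (not the real data)
origin <- factor(c("F", "M", "S", "V", "F"))
gender <- factor(c("Woman", "Man", "Man", "Woman", "Man"))

# Levels are sorted alphabetically; the first one is the reference level
levels(origin)[1]
levels(gender)[1]

# relevel() returns a new factor whose first level is the requested one
gender2 <- relevel(gender, ref = "Woman")
levels(gender2)
```

The original factor is left untouched, so both codings can coexist in the same data frame.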
The last element we need is, obviously, the outcome of the job
application. It is given by the callback variable. It is a Boolean
variable, TRUE when the recruiter gives a non-negative callback
(in this data set), and FALSE otherwise.
We can now launch the callback() function, which prepares the data for
statistical analysis. Here we need to choose the comp parameter. Indeed,
there are n = 8 candidates, so that n(n − 1)/2 = 8 × 7/2 = 28
comparisons are possible. This is a large number, which is why
callback() by default performs the statistical analysis against the
reference candidate only, with comp = "ref". This reduces our analysis
to n − 1 = 7 comparisons. One can get the 28 comparisons by setting
comp = "all" instead.
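The two comparison counts can be illustrated by enumerating the pairs directly. The sketch below uses base R only and does not call the package; the candidate labels are copied from the printed output of the experiment, and the way the pairs are built is an illustration, not a description of what callback() does internally.

```r
# Candidate labels, with the reference candidate first
cand <- c("F.Man", "F.Woman", "M.Man", "M.Woman",
          "S.Man", "S.Woman", "V.Man", "V.Woman")

# All unordered pairs: n * (n - 1) / 2 columns
all_pairs <- combn(cand, 2)
ncol(all_pairs)

# Pairs that involve the reference candidate only: n - 1 columns
ref_pairs <- all_pairs[, all_pairs[1, ] == cand[1], drop = FALSE]
ncol(ref_pairs)

# Labels in the "candidate 1 vs candidate 2" form
labels <- paste(ref_pairs[1, ], "vs", ref_pairs[2, ])
labels
```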
dtest <- callback(
data = m,
cluster = "offer",
candid = c("origin", "gender"),
callback = "callback"
)
The dtest object contains the formatted data needed for the callback
analysis. Using print() gives the main characteristics of the
experiment:
print(dtest)
#>
#> Structure of the experiment
#> ---------------------------
#>
#> Candidates defined by: origin gender
#> Callback variable: callback
#>
#> Number of tests for each candidate:
#>
#> F.Man F.Woman M.Man M.Woman S.Man S.Woman V.Man V.Woman
#> 310 310 310 310 310 310 310 310
#>
#>
#> Number of tests for each pair of candidates:
#>
#> F.Man.vs.F.Woman F.Man.vs.M.Man F.Man.vs.M.Woman F.Man.vs.S.Man
#> 310 310 310 310
#> F.Man.vs.S.Woman F.Man.vs.V.Man F.Man.vs.V.Woman
#> 310 310 310
#>
#>
#> Number of tests with all the candidates: 310
We find that the experiment is standard since all the candidates have been sent to all the tests. When a candidate of the same type is sent several times to a test, the most favorable answer is kept (the “max” rule). There are other ways to deal with this issue.
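The “max” rule can be sketched on a small made-up data frame. This is an illustration in base R under the assumption that the rule amounts to keeping, for each test and each candidate type, TRUE as soon as one application was called back; the toy data and the use of aggregate() are not taken from the package.

```r
# Toy data: F.Man applied twice to offer 1 (hypothetical records)
toy <- data.frame(
  offer    = c(1, 1, 1, 2, 2),
  candid   = c("F.Man", "F.Man", "F.Woman", "F.Man", "F.Woman"),
  callback = c(FALSE, TRUE, FALSE, FALSE, TRUE)
)

# Keep the most favorable answer per (offer, candidate) pair
best <- aggregate(callback ~ offer + candid, data = toy, FUN = max)
best$callback <- as.logical(best$callback)
best
```

After this step each candidate type appears at most once per offer, which is what paired comparisons require.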
We can take a look at the global callback rates of the candidates by entering:
print(stat_raw(dtest))
#>
#> Proportions: raw callback rates
#> Confidence intervals: Student at 95 %
#>
#> tests callback inf_p_callback p_callback sup_p_callback
#> F.Man 310 86 0.22730239 0.27741935 0.3275363
#> F.Woman 310 70 0.17900426 0.22580645 0.2726086
#> M.Man 310 65 0.16411033 0.20967742 0.2552445
#> M.Woman 310 32 0.06916861 0.10322581 0.1372830
#> S.Man 310 43 0.10001944 0.13870968 0.1773999
#> S.Woman 310 26 0.05284271 0.08387097 0.1148992
#> V.Man 310 38 0.08587036 0.12258065 0.1592909
#> V.Woman 310 62 0.15522525 0.20000000 0.2447748
A graphical representation is obtained by applying plot() to this object.
It is possible to change the definition of the confidence intervals, the confidence level and the colors in the plot. If you prefer the Clopper-Pearson definition, a 90% confidence interval, a “steelblue3” bar and a black confidence interval, enter:
g <- stat_raw(dtest, level = 0.9, method = "cp")
print(g)
#>
#> Proportions: raw callback rates
#> Confidence intervals: Clopper-Pearson at 90 %
#>
#> tests callback inf_p_callback p_callback sup_p_callback
#> F.Man 310 86 0.23570047 0.27741935 0.3223433
#> F.Woman 310 70 0.18721293 0.22580645 0.2683608
#> M.Man 310 65 0.17222662 0.20967742 0.2513276
#> M.Woman 310 32 0.07612170 0.10322581 0.1361644
#> S.Man 310 43 0.10749071 0.13870968 0.1752003
#> S.Woman 310 26 0.05943210 0.08387097 0.1144672
#> V.Man 310 38 0.09312657 0.12258065 0.1575588
#> V.Woman 310 62 0.16327759 0.20000000 0.2410655
plot(g, col = c("steelblue3","black"))
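The two interval definitions can be reproduced by hand for one candidate. The sketch below uses base R only, with the F.Man counts read from the tables above (86 callbacks out of 310 tests). The formulas are the textbook Student and Clopper-Pearson intervals; that the package computes them in exactly this way is an assumption, although the Student bounds agree with the printed F.Man row.

```r
n <- 310   # tests for F.Man
k <- 86    # callbacks for F.Man
p <- k / n

# Student interval at 95%: treat the callbacks as a 0/1 sample
x <- c(rep(1, k), rep(0, n - k))
student <- p + c(-1, 1) * qt(0.975, df = n - 1) * sd(x) / sqrt(n)
student

# Clopper-Pearson interval at 90%, from Beta quantiles
alpha <- 0.10
cp <- c(qbeta(alpha / 2, k, n - k + 1),
        qbeta(1 - alpha / 2, k + 1, n - k))
cp
```

The Clopper-Pearson bounds can be compared with the F.Man row of the second table; they coincide with what binom.test(k, n, conf.level = 0.9) returns.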
When all the candidates are sent to all the tests, the previous figures may be used to measure discrimination. However, when the candidates are rotated so that only some of them are sent to each test, this may no longer be the case. For this reason, we prefer to use matched statistics, which only compare candidates that have been sent to the same tests.
Since we do pairwise comparisons, we will consider two candidates 1 and 2 that are sent to the same test. There are four possible outcomes: no callback (denoted 0 for both candidates), only one of the two candidates is called back (denoted 1 for the candidate called back, 0 for the other), or both candidates are called back (denoted 1 for both candidates). We count the corresponding cases and denote them n00 (neither candidate called back), n10 (only candidate 1 called back), n01 (only candidate 2 called back) and n11 (both called back).
In order to get the result of the discrimination tests, we will use
the stat_count() function. Its result can be saved into an object for
further exports, or printed. Assigning the result of stat_count(dtest)
to sp does not produce any printed output, but saves an object of class
stat_count into sp. We can get the statistics with:
print(sp)
#>
#> Callback counts:
#> ----------------
#> tests callback disc callback1 Neither Only 1 Only 2 Both
#> F.Man vs F.Woman 310 113 70 86 70 197 43 27
#> F.Man vs M.Man 310 106 61 86 65 204 41 20
#> F.Man vs M.Woman 310 96 74 86 32 214 64 10
#> F.Man vs S.Man 310 100 71 86 43 210 57 14
#> F.Man vs S.Woman 310 97 82 86 26 213 71 11
#> F.Man vs V.Man 310 97 70 86 38 213 59 11
#> F.Man vs V.Woman 310 111 74 86 62 199 49 25
#> Difference calldif
#> F.Man vs F.Woman 43 16
#> F.Man vs M.Man 45 21
#> F.Man vs M.Woman 22 54
#> F.Man vs S.Man 29 43
#> F.Man vs S.Woman 15 60
#> F.Man vs V.Man 27 48
#> F.Man vs V.Woman 37 24
The callback counts describe the results of the paired experiments. The first column defines the comparison in the form “candidate 1 vs candidate 2”. Here “F.Man vs F.Woman” means that we compare the men of French origin (“F.Man”) with the women of French origin (“F.Woman”). Out of 310 tests, 113 got at least one callback. The men of French origin got 86 callbacks and the women of French origin 70. The difference, called net discrimination, equals 86 − 70 = 16 callbacks. The remaining values give more detail. Out of 310 tests, neither candidate was called back in n00 = 197 of the job offers, n10 = 43 called back only the man, n01 = 27 called back only the woman and n11 = 43 called back both. Discrimination only occurs when a single candidate is called back. The net discrimination is thus n10 − n01 = 43 − 27 = 16. Each row of the printout holds ten values, in this order: tests, tests with at least one callback, tests with exactly one callback (n10 + n01 = 70), callbacks of candidate 1, callbacks of candidate 2, n00, n10, n01, n11 and the net difference; the column labels shown in the printout are offset from some of these values. The corresponding row proportions are available with:
sp$props
#> p_callback p_cand1 p_cand2 p_c00 p_c10 p_c01
#> F.Man vs F.Woman 0.3645161 0.2774194 0.22580645 0.6354839 0.1387097 0.08709677
#> F.Man vs M.Man 0.3419355 0.2774194 0.20967742 0.6580645 0.1322581 0.06451613
#> F.Man vs M.Woman 0.3096774 0.2774194 0.10322581 0.6903226 0.2064516 0.03225806
#> F.Man vs S.Man 0.3225806 0.2774194 0.13870968 0.6774194 0.1838710 0.04516129
#> F.Man vs S.Woman 0.3129032 0.2774194 0.08387097 0.6870968 0.2290323 0.03548387
#> F.Man vs V.Man 0.3129032 0.2774194 0.12258065 0.6870968 0.1903226 0.03548387
#> F.Man vs V.Woman 0.3580645 0.2774194 0.20000000 0.6419355 0.1580645 0.08064516
#> p_c11 p_cand_dif
#> F.Man vs F.Woman 0.13870968 0.05161290
#> F.Man vs M.Man 0.14516129 0.06774194
#> F.Man vs M.Woman 0.07096774 0.17419355
#> F.Man vs S.Man 0.09354839 0.13870968
#> F.Man vs S.Woman 0.04838710 0.19354839
#> F.Man vs V.Man 0.08709677 0.15483871
#> F.Man vs V.Woman 0.11935484 0.07741935
We can save the output or print it, as in the previous example. Printing is the default.
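The link between the paired outcomes and the counts can be made explicit with a few lines of base R. In this sketch the two 0/1 vectors are rebuilt from the F.Man vs F.Woman counts quoted above, so they are a stand-in for the real paired data; the names used in the result are chosen for readability and are not the package's column names.

```r
# Hypothetical paired outcomes rebuilt from the F.Man vs F.Woman counts
# (1 = called back, 0 = not called back), one element per job offer
times <- c(197, 43, 27, 43)
c1 <- rep(c(0, 1, 0, 1), times = times)  # candidate 1 (F.Man)
c2 <- rep(c(0, 0, 1, 1), times = times)  # candidate 2 (F.Woman)

n00 <- sum(c1 == 0 & c2 == 0)  # neither called back
n10 <- sum(c1 == 1 & c2 == 0)  # only candidate 1
n01 <- sum(c1 == 0 & c2 == 1)  # only candidate 2
n11 <- sum(c1 == 1 & c2 == 1)  # both

counts <- c(
  tests      = length(c1),
  callback   = n10 + n01 + n11,  # at least one callback
  disc       = n10 + n01,        # exactly one callback
  callback1  = n10 + n11,        # callbacks of candidate 1
  callback2  = n01 + n11,        # callbacks of candidate 2
  difference = n10 - n01         # net discrimination
)
print(counts)
```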
In fact, there are three ways to compute proportions in discrimination
studies. First, you can divide the number of callbacks by the number of
tests. We call these “matched callback rates”, given by the function
stat_mcr(). Second, you can restrict your analysis to the tests which
got at least one callback. We call these “total callback shares”, given
by the function stat_tcs(). Last, you can divide by the number of tests
where only one candidate has been called back. We call these “exclusive
callback shares”, given by the function stat_ecs().
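The effect of the three denominators can be seen on the net difference n10 − n01, which does not depend on how the joint callbacks are treated. The sketch below is plain arithmetic in base R on the F.Man vs F.Woman counts; it only illustrates the denominators described above and is not meant to reproduce the full output of stat_tcs() or stat_ecs(), whose exact columns are not shown here.

```r
# Counts for F.Man vs F.Woman, taken from the callback counts above
n00 <- 197; n10 <- 43; n01 <- 27; n11 <- 43

denominators <- c(
  all_tests       = n00 + n10 + n01 + n11,  # matched callback rates
  any_callback    = n10 + n01 + n11,        # total callback shares
  single_callback = n10 + n01               # exclusive callback shares
)

# Net difference in callbacks, expressed under each convention
net_difference <- (n10 - n01) / denominators
print(round(net_difference, 4))
```

The same 16 callbacks of net difference therefore look larger as the reference population shrinks from all tests to the tests where only one candidate was called back.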
The callback rates of candidates 1 and 2, denoted respectively p1 and p2, are obtained by dividing the number of callbacks of each candidate by the total number of discrimination tests n:
$$ \begin{align*} p_1 &= \frac{n_{10}+n_{11}}{n}\\ p_2 &= \frac{n_{01}+n_{11}}{n}\\ \text{with } n &= n_{00}+n_{10}+n_{01}+n_{11} \end{align*} $$
The absence of discrimination is measured by: p1 = p2 ⇔ n10 = n01
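The equivalence follows from subtracting the two definitions, since the joint callbacks n11 cancel. As a worked step, using only the symbols already defined:

$$ \begin{align*} p_1 - p_2 &= \frac{(n_{10}+n_{11})-(n_{01}+n_{11})}{n} = \frac{n_{10}-n_{01}}{n} \end{align*} $$

For the comparison of F.Man with F.Woman this gives (43 − 27)/310 ≈ 0.0516, which is the p_cand_dif value reported in sp$props.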
The stat_mcr() function provides the proportions, the confidence
intervals and the equality tests. By default, the confidence level is
95%; it can be changed with the level option. The Student definition is
obtained with:
mcr <- stat_mcr(dtest)
print(mcr)
#>
#> Proportions: matched callback rates
#> Confidence intervals: Student at 95 %
#>
#> tests inf_p_callback p_callback sup_p_callback inf_p_cand1
#> F.Man vs F.Woman 310 0.3106416 0.3645161 0.4183907 0.2273024
#> F.Man vs M.Man 310 0.2888373 0.3419355 0.3950337 0.2273024
#> F.Man vs M.Woman 310 0.2579222 0.3096774 0.3614326 0.2273024
#> F.Man vs S.Man 310 0.2702542 0.3225806 0.3749071 0.2273024
#> F.Man vs S.Woman 310 0.2610009 0.3129032 0.3648056 0.2273024
#> F.Man vs V.Man 310 0.2610009 0.3129032 0.3648056 0.2273024
#> F.Man vs V.Woman 310 0.3043985 0.3580645 0.4117306 0.2273024
#> p_cand1 sup_p_cand1 inf_p_cand2 p_cand2 sup_p_cand2
#> F.Man vs F.Woman 0.2774194 0.3275363 0.17900426 0.22580645 0.2726086
#> F.Man vs M.Man 0.2774194 0.3275363 0.16411033 0.20967742 0.2552445
#> F.Man vs M.Woman 0.2774194 0.3275363 0.06916861 0.10322581 0.1372830
#> F.Man vs S.Man 0.2774194 0.3275363 0.10001944 0.13870968 0.1773999
#> F.Man vs S.Woman 0.2774194 0.3275363 0.05284271 0.08387097 0.1148992
#> F.Man vs V.Man 0.2774194 0.3275363 0.08587036 0.12258065 0.1592909
#> F.Man vs V.Woman 0.2774194 0.3275363 0.15522525 0.20000000 0.2447748
#> inf_cand_dif p_cand_dif sup_cand_dif
#> F.Man vs F.Woman -0.001263807 0.05161290 0.1044896
#> F.Man vs M.Man 0.018669997 0.06774194 0.1168139
#> F.Man vs M.Woman 0.123097544 0.17419355 0.2252896
#> F.Man vs S.Man 0.087439176 0.13870968 0.1899802
#> F.Man vs S.Woman 0.140210120 0.19354839 0.2468867
#> F.Man vs V.Man 0.104550333 0.15483871 0.2051271
#> F.Man vs V.Woman 0.023420286 0.07741935 0.1314184
#>
#> Student test
#> statistic p_stat c_stat
#> F.Man vs F.Woman 1.920642 5.569699e-02 .
#> F.Man vs M.Man 2.716294 6.973512e-03 **
#> F.Man vs M.Woman 6.708070 9.404184e-11 ***
#> F.Man vs S.Man 5.323431 1.961932e-07 ***
#> F.Man vs S.Woman 7.140081 6.704916e-12 ***
#> F.Man vs V.Man 6.058490 3.990671e-09 ***
#> F.Man vs V.Woman 2.821082 5.095962e-03 **
#>
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.10 ' ' 1
A corresponding plot represents the differences of proportions and their
confidence intervals. Another plot is available, with the confidence
intervals of the callback rates of the two candidates. However, these
confidence intervals at level 1 − α can be misleading, because the fact
that they overlap does not imply that the equality of the callback rates
is accepted at the α level. This second plot can still be requested if
needed.
The difference analysis is not available with the Clopper-Pearson intervals.
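The Student test on the difference of matched callback rates can be reproduced from the counts alone. The sketch below assumes that the test is a one-sample Student test on the within-offer differences (callback of candidate 1 minus callback of candidate 2); this is inferred from the printed figures for F.Man vs F.Woman, which it matches, and is not a statement about the package internals.

```r
# Within-offer differences for F.Man vs F.Woman, rebuilt from the counts:
# 197 offers with no callback (0), 43 with only the man (+1),
# 27 with only the woman (-1), 43 with both (0)
d <- rep(c(0, 1, -1, 0), times = c(197, 43, 27, 43))

# One-sample Student test of a zero mean difference
tt <- t.test(d)
tt$estimate    # p_cand_dif
tt$conf.int    # confidence interval of the difference at 95%
tt$statistic   # Student statistic
tt$p.value
```

Because the differences are computed offer by offer, the test uses the pairing of the candidates, which a comparison of the two separate confidence intervals would ignore.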