We will use the callback package to analyze hiring discrimination based
on gender and origin in France. The experiment was conducted in 2009 for
software developer jobs (Petit et al., 2013). The data is available in
the data frame inter1. Let’s examine its contents.
library(callback)
data(inter1)
str(inter1)
#> 'data.frame': 2480 obs. of 11 variables:
#> $ offer : Factor w/ 310 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 2 2 ...
#> $ firstn : Factor w/ 8 levels "Abdallah","Amadou",..: 7 4 5 6 1 2 3 8 1 2 ...
#> $ lastn : Factor w/ 8 levels "Bertrand","Diallo",..: 5 3 4 7 8 2 1 6 8 2 ...
#> $ origin : Factor w/ 4 levels "F","M","S","V": 1 3 2 4 2 3 1 4 2 3 ...
#> $ sentorder: int 3 7 6 2 1 5 4 8 8 4 ...
#> $ gender : Factor w/ 2 levels "Man","Woman": 2 2 2 2 1 1 1 1 1 1 ...
#> $ callback : logi TRUE TRUE TRUE TRUE FALSE FALSE ...
#> $ paris : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...
#> $ cont : Factor w/ 2 levels "LTC","STC": 1 1 1 1 1 1 1 1 1 1 ...
#> $ ansorder : int 1 2 3 4 9 9 9 9 9 9 ...
#> $ date : Factor w/ 3 levels "April 2009","February 2009",..: 2 2 2 2 2 2 2 2 2 2 ...
The first variable, offer, is very important. It identifies the job
offer. It matters because, in order to test discrimination, the
candidates must apply to the same job offer. This is the cluster
parameter of the callback() function. With cluster = "offer", we are
sure that all the computations will be paired, which means that we will
always compare candidates on the very same job offer. This is essential
to produce meaningful results, since otherwise the difference in answers
could come from differences between recruiters and not from differences
in gender or origin.
The next important variables are the ones that define the candidates.
Here, there are two variables: gender and origin. These are factors, and
their reference levels implicitly define the reference candidate. Which
one? By convention, it is the candidate that is the least likely to be
discriminated against. Here, the reference candidate is male, because
male candidates should not be discriminated against because of their
gender, and of French origin, because French-origin candidates should
not be discriminated against in the French labor market because of their
origin. In practice, we will check that this candidate really had the
highest callback rate. We can find the reference levels of our two
factors by looking at the first level given by the levels() function.
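As a minimal sketch of how reference levels work, the following base R snippet builds two toy factors with the same levels as gender and origin and reads the reference level as the first level. The vectors are made up for illustration and are not taken from inter1.

```r
# Toy factors with the same levels as in inter1 (values are illustrative only)
gender <- factor(c("Woman", "Man", "Man", "Woman"))
origin <- factor(c("S", "F", "V", "M"))

# factor() sorts the levels alphabetically, so the first level is the reference
ref_gender <- levels(gender)[1]
ref_origin <- levels(origin)[1]

# relevel() moves another level to the first position if a different
# reference is wanted; the remaining levels keep their order
origin_v <- relevel(origin, ref = "V")
levels(origin_v)
```

On the real data, the same idea would apply to levels(inter1$gender) and levels(inter1$origin).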
There are two genders: “Man” (reference) and “Woman”. There are four
origins: French (F, reference), Moroccan (M), Senegalese (S) and
Vietnamese (V). You do not need to combine the two candidate variables
gender and origin before using callback(); it will do it for you.
The last element we need is, obviously, the outcome of the job
application. It is given by the callback variable. It is a Boolean
variable, TRUE when the recruiter gives a non-negative callback, and
FALSE otherwise.
We can now launch the callback() function, which prepares the data for
statistical analysis. Here we need to choose the comp parameter. There
are 8 candidates, so 8 x 7/2 = 28 comparisons are possible. This is a
large number, which is why callback() performs the statistical analysis
relative to the reference candidate by default, with comp = "ref". This
reduces our analysis to 7 comparisons. You could get the 28 comparisons
by choosing comp = "all" instead.
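The arithmetic behind the 28 and 7 comparisons can be checked with a short base R sketch. The candidate labels are rebuilt by hand here to mimic the names printed by the package; this does not call callback() itself.

```r
# Rebuild the 8 candidate labels from the two factors (illustrative only)
genders <- c("Man", "Woman")
origins <- c("F", "M", "S", "V")
candidates <- as.vector(outer(genders, origins, paste, sep = "."))

# All unordered pairs of candidates: 8 x 7 / 2
n_all <- choose(length(candidates), 2)
all_pairs <- combn(candidates, 2)

# Comparisons against the reference candidate only
ref <- "Man.F"
ref_pairs <- paste(ref, "vs", setdiff(candidates, ref), sep = ".")
length(ref_pairs)
```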
m <- callback(data = inter1, cluster = "offer", candid = c("gender","origin"), callback = "callback")
The m object contains the formatted data needed for the analysis. Using
print() gives the main characteristics of the experiment:
print(m)
#>
#> Structure of the experiment
#> ---------------------------
#>
#> Candidates defined by: gender origin
#> Callback variable: callback
#>
#> Number of tests for each candidate:
#>
#> Man.F Man.M Man.S Man.V Woman.F Woman.M Woman.S Woman.V
#> 310 310 310 310 310 310 310 310
#>
#>
#> Number of tests for each pair of candidates:
#>
#> Man.F.vs.Man.M Man.F.vs.Man.S Man.F.vs.Man.V Man.F.vs.Woman.F Man.F.vs.Woman.M
#> 310 310 310 310 310
#> Man.F.vs.Woman.S Man.F.vs.Woman.V
#> 310 310
#>
#>
#> Number of tests with all the candidates: 310
We find that the experiment is standard, in the sense that all the
candidates were sent to all the job offers. Notice that this is not
required to use callback; it will work fine if fewer candidates are sent
to each offer. However, when more than one candidate of the same type is
sent to a test, the most favorable answer is kept (the “max” rule).
There are other ways to deal with this issue.
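The “max” rule can be illustrated with a small base R sketch. The toy data frame and its column names are hypothetical, and this is only an illustration of the rule as described above, not the package's internal code.

```r
# Toy data: on offer 1, two candidates of type Man.F were sent (hypothetical)
toy <- data.frame(
  offer     = c(1, 1, 1, 2, 2),
  candidate = c("Man.F", "Man.F", "Woman.F", "Man.F", "Woman.F"),
  callback  = c(FALSE, TRUE, FALSE, FALSE, TRUE)
)

# Keep the most favorable answer per offer and candidate type:
# any() on logicals is TRUE as soon as one application got a callback
collapsed <- aggregate(callback ~ offer + candidate, data = toy, FUN = any)
collapsed
```

After collapsing, each offer has at most one row per candidate type, which is what a paired comparison needs.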
We can also take a look at the global callback rates of the candidates by entering:
print(stat_glob(m))
#>
#> Global callback rates
#> ---------------------
#>
#> Wilson confidence intervals at the 95 percent level
#>
#> inf p_callback sup
#> Man.F 0.22902951 0.27741935 0.3314330
#> Man.M 0.16658655 0.20967742 0.2601261
#> Man.S 0.10322749 0.13870968 0.1834024
#> Man.V 0.08923308 0.12258065 0.1655700
#> Woman.F 0.18130574 0.22580645 0.2772501
#> Woman.M 0.07270828 0.10322581 0.1439115
#> Woman.S 0.05654641 0.08387097 0.1219046
#> Woman.V 0.15780504 0.20000000 0.2498025
and get a graphical representation with:
It is possible to change the definition of the confidence intervals, the confidence level and the color of the plot. If you prefer the Clopper-Pearson definition, a 90% confidence level and a “steelblue3” color, enter:
s <- stat_glob(m,level=0.9)
print(s,method="cp")
#>
#> Global callback rates
#> ---------------------
#>
#> Clopper-Pearson confidence intervals at the 90 percent level
#>
#> inf p_callback sup
#> Man.F 0.23570047 0.27741935 0.3223433
#> Man.M 0.17222662 0.20967742 0.2513276
#> Man.S 0.10749071 0.13870968 0.1752003
#> Man.V 0.09312657 0.12258065 0.1575588
#> Woman.F 0.18721293 0.22580645 0.2683608
#> Woman.M 0.07612170 0.10322581 0.1361644
#> Woman.S 0.05943210 0.08387097 0.1144672
#> Woman.V 0.16327759 0.20000000 0.2410655
graph(s,method="cp",col="steelblue3")
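For readers who want to check the intervals by hand, the sketch below recomputes them for the Man.F candidate (86 callbacks out of 310 tests) with base R. The Wilson figures printed earlier agree with prop.test() and its default continuity correction; whether the package calls these functions internally is an assumption.

```r
# Man.F: 86 callbacks out of 310 tests (0.27741935 * 310)
x <- 86
n <- 310

# Wilson score interval with continuity correction, 95 percent level
wilson_95 <- prop.test(x, n, conf.level = 0.95)$conf.int

# Clopper-Pearson (exact binomial) interval, 90 percent level
cp_90 <- binom.test(x, n, conf.level = 0.90)$conf.int

# The exact interval can also be written with beta quantiles
cp_90_beta <- c(qbeta(0.05, x, n - x + 1), qbeta(0.95, x + 1, n - x))

round(rbind(wilson_95 = as.numeric(wilson_95), cp_90 = as.numeric(cp_90)), 4)
```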
When all the candidates are sent to all the tests, the previous figures may be used to measure discrimination. However, when the candidates are rotated so that only some of them are sent to each test, this may no longer hold. For this reason, we prefer to use matched statistics, which only compare candidates that have been sent to the same tests.
In order to get the results of the discrimination tests, we will use
the stat_count() function. Its output can be saved into an object for
later export, or printed. The following instruction:
s <- stat_count(m)
does not produce any printed output, but saves a stat_count object into
s. We can get the statistics with:
print(s)
#>
#> Callback counts:
#> ----------------
#> tests callback callback1 callback2 Neither Only 1 Only 2 Both
#> Man.F.vs.Man.M 310 106 86 65 204 41 20 45
#> Man.F.vs.Man.S 310 100 86 43 210 57 14 29
#> Man.F.vs.Man.V 310 97 86 38 213 59 11 27
#> Man.F.vs.Woman.F 310 113 86 70 197 43 27 43
#> Man.F.vs.Woman.M 310 96 86 32 214 64 10 22
#> Man.F.vs.Woman.S 310 97 86 26 213 71 11 15
#> Man.F.vs.Woman.V 310 111 86 62 199 49 25 37
#> Difference
#> Man.F.vs.Man.M 21
#> Man.F.vs.Man.S 43
#> Man.F.vs.Man.V 48
#> Man.F.vs.Woman.F 16
#> Man.F.vs.Woman.M 54
#> Man.F.vs.Woman.S 60
#> Man.F.vs.Woman.V 24
The callback counts describe the results of the paired experiments. The first column defines the comparison in the form “candidate 1 vs candidate 2”. Here “Man.F vs Woman.F” means that we compare men and women of French origin. Out of 310 tests, 113 got at least one callback. The men got 86 callbacks and the women 70. The difference, called net discrimination, equals 16 callbacks. The next columns give more detail. Out of 310 job offers, neither candidate was called back in 197, only the man in 43, only the woman in 27, and both in 43. Discrimination only shows up when a single candidate is called back, so the net discrimination is 43-27=16. The corresponding row percentages are available with:
s$props
#> p_callback p_cand1 p_cand2 p_c00 p_c10 p_c01
#> Man.F.vs.Man.M 0.3419355 0.2774194 0.20967742 0.6580645 0.1322581 0.06451613
#> Man.F.vs.Man.S 0.3225806 0.2774194 0.13870968 0.6774194 0.1838710 0.04516129
#> Man.F.vs.Man.V 0.3129032 0.2774194 0.12258065 0.6870968 0.1903226 0.03548387
#> Man.F.vs.Woman.F 0.3645161 0.2774194 0.22580645 0.6354839 0.1387097 0.08709677
#> Man.F.vs.Woman.M 0.3096774 0.2774194 0.10322581 0.6903226 0.2064516 0.03225806
#> Man.F.vs.Woman.S 0.3129032 0.2774194 0.08387097 0.6870968 0.2290323 0.03548387
#> Man.F.vs.Woman.V 0.3580645 0.2774194 0.20000000 0.6419355 0.1580645 0.08064516
#> p_c11 p_cand_dif
#> Man.F.vs.Man.M 0.14516129 0.06774194
#> Man.F.vs.Man.S 0.09354839 0.13870968
#> Man.F.vs.Man.V 0.08709677 0.15483871
#> Man.F.vs.Woman.F 0.13870968 0.05161290
#> Man.F.vs.Woman.M 0.07096774 0.17419355
#> Man.F.vs.Woman.S 0.04838710 0.19354839
#> Man.F.vs.Woman.V 0.11935484 0.07741935
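The relation between the four paired cells and the other columns can be verified with a few lines of base R. The sketch uses the Man.F vs Woman.F counts printed above; the variable names are chosen for illustration and are not the package's.

```r
# Paired outcomes for Man.F vs Woman.F, taken from the callback counts table
counts <- c(neither = 197, only1 = 43, only2 = 27, both = 43)

tests        <- sum(counts)                              # number of job offers
callback1    <- counts[["only1"]] + counts[["both"]]     # callbacks of candidate 1
callback2    <- counts[["only2"]] + counts[["both"]]     # callbacks of candidate 2
at_least_one <- tests - counts[["neither"]]              # offers with a callback

# Net discrimination: only the discordant cells matter
net_discrimination <- counts[["only1"]] - counts[["only2"]]

# Row percentages, as in the props table
props      <- counts / tests
p_cand_dif <- net_discrimination / tests
round(c(props, p_cand_dif = p_cand_dif), 7)
```

Note that callback1 - callback2 and only1 - only2 give the same difference, because the “both” cell cancels out.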
Now, we can move on to the analysis of proportions. We can save the
output or print it, as in the previous example. Printing is the default.
There are three ways to compute proportions in discrimination studies.
First, you can divide the number of callbacks by the number of tests. We
call this “matched callback rates”, given by the function stat_mcr().
Second, you can restrict your analysis to the tests that got at least
one callback. We call this “total callback shares”, given by the
function stat_tcs(). Last, you can divide by the number of tests where
only one candidate has been called back. We call this “exclusive
callback shares”, given by the function stat_ecs(). With the first
convention, we get:
stat_mcr(m)
#>
#>
#> Equality of proportions - matched callback rates
#> ------------------------------------------------
#>
#> Fisher test:
#> p_cand_dif p_Fisher s_Fisher
#> Man.F.vs.Man.M 0.06774194 6.108741e-02 .
#> Man.F.vs.Man.S 0.13870968 2.859620e-05 ***
#> Man.F.vs.Man.V 0.15483871 1.886620e-06 ***
#> Man.F.vs.Woman.F 0.05161290 1.649424e-01
#> Man.F.vs.Woman.M 0.17419355 3.790180e-08 ***
#> Man.F.vs.Woman.S 0.19354839 3.229317e-10 ***
#> Man.F.vs.Woman.V 0.07741935 3.004301e-02 *
#>
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.10 ' ' 1
#>
#> Chi-squared test:
#> p_cand_dif Pearson p_Pearson s_Pearson
#> Man.F.vs.Man.M 0.06774194 3.501885 6.129902e-02 .
#> Man.F.vs.Man.S 0.13870968 17.267087 3.247638e-05 ***
#> Man.F.vs.Man.V 0.15483871 22.268145 2.371075e-06 ***
#> Man.F.vs.Woman.F 0.05161290 1.927221 1.650627e-01
#> Man.F.vs.Woman.M 0.17419355 29.400702 5.885631e-08 ***
#> Man.F.vs.Woman.S 0.19354839 37.932719 7.322680e-10 ***
#> Man.F.vs.Woman.V 0.07741935 4.695087 3.024897e-02 *
#>
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.10 ' ' 1
#>
#> Student test:
#> p_cand_dif Student p_Student s_Student
#> Man.F.vs.Man.M 0.06774194 2.716294 6.973512e-03 **
#> Man.F.vs.Man.S 0.13870968 5.323431 1.961932e-07 ***
#> Man.F.vs.Man.V 0.15483871 6.058490 3.990671e-09 ***
#> Man.F.vs.Woman.F 0.05161290 1.920642 5.569699e-02 .
#> Man.F.vs.Woman.M 0.17419355 6.708070 9.404184e-11 ***
#> Man.F.vs.Woman.S 0.19354839 7.140081 6.704916e-12 ***
#> Man.F.vs.Woman.V 0.07741935 2.821082 5.095962e-03 **
#>
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.10 ' ' 1
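To make the three conventions concrete, the sketch below divides the same net difference of 16 callbacks (Man.F vs Woman.F) by each of the three denominators. Only the first value can be compared with the p_cand_dif column above; the other two are a reading of the definitions given in the text, not output taken from stat_tcs() or stat_ecs().

```r
# Man.F vs Woman.F paired counts
neither <- 197; only1 <- 43; only2 <- 27; both <- 43

net <- only1 - only2                       # 16 callbacks

# The three denominators described in the text
tests        <- neither + only1 + only2 + both   # all tests
at_least_one <- only1 + only2 + both             # tests with at least one callback
only_one     <- only1 + only2                    # tests with exactly one callback

dif_mcr <- net / tests          # matched callback rates convention
dif_tcs <- net / at_least_one   # total callback shares convention (assumed reading)
dif_ecs <- net / only_one       # exclusive callback shares convention (assumed reading)

round(c(mcr = dif_mcr, tcs = dif_tcs, ecs = dif_ecs), 4)
```

The numerator is identical in all three cases, so the conventions differ only in how large the same gap looks.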
The output of stat_mcr() includes three tests: the Fisher exact
independence test between the candidate type and the callback variable,
the chi-squared test for the equality of the candidates’ callback rates,
and the asymptotic Student test for the equality of the candidates’
callback rates. A code indicates the significance of the difference,
with the same convention as in the lm() function. With the Student test,
all the differences are significant at the 5% level except the one
between the two French-origin candidates. With the Fisher and
chi-squared tests, the difference between Man.F and Man.M is also
significant only at the 10% level. The associated graphical
representation is obtained by:
The colors can be changed with the option col and the definition of the
confidence intervals with the option method.
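The three tests reported by stat_mcr() can be approximated in base R for the Man.F vs Woman.F comparison. The calls below reproduce the Pearson and Student statistics printed earlier (1.927221 and 1.920642); that the package relies on these same functions is an assumption, and the variable names are illustrative.

```r
# Man.F vs Woman.F paired counts from the callback counts table
neither <- 197; only1 <- 43; only2 <- 27; both <- 43
n <- neither + only1 + only2 + both

# 2 x 2 table: rows = candidate, columns = called back / not called back
tab <- matrix(c(only1 + both, only2 + both,
                n - (only1 + both), n - (only2 + both)),
              nrow = 2,
              dimnames = list(c("Man.F", "Woman.F"),
                              c("callback", "no_callback")))

# Fisher exact test and chi-squared test on the 2 x 2 table
# (chisq.test() applies the Yates continuity correction by default)
fisher_p <- fisher.test(tab)$p.value
chi2     <- chisq.test(tab)

# Student test on the paired differences per job offer: +1, -1 or 0
d       <- c(rep(1, only1), rep(-1, only2), rep(0, neither + both))
student <- t.test(d)

c(fisher_p = fisher_p,
  pearson  = unname(chi2$statistic),
  student  = unname(student$statistic))
```

In this sketch the first two tests treat the two candidates as independent samples, while the t-test works on the paired differences. This may explain why the Student p-values are smaller than the other two in the output.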
There is a second graphical representation, which shows the confidence intervals of both candidates. However, this representation can be misleading: overlapping confidence intervals do not imply that the proportions are equal. The only correct representation is the previous one, given by default. To get a comparison of the confidence intervals, enter:
The statistics for the other conventions, total callback shares and
exclusive callback shares, can be obtained by changing the function name
to stat_tcs() and stat_ecs() respectively. The graphical representations
for the differences are also similar. The only difference is that it is
also possible to plot the total or exclusive callback shares themselves.
For the total callback shares, we have:
For the total callback shares, we get:
and for the exclusive callback shares, we get: