Callback: getting started

Emmanuel Duguet

We will use the callback package to analyze hiring discrimination based on gender and origin in France. The experiment was conducted in 2009 on software developer jobs (Petit et al., 2013). The data are stored in the data frame inter1. Let's examine its contents.

library(callback)
data(inter1)
str(inter1)
#> 'data.frame':    2480 obs. of  11 variables:
#>  $ offer    : Factor w/ 310 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 2 2 ...
#>  $ firstn   : Factor w/ 8 levels "Abdallah","Amadou",..: 7 4 5 6 1 2 3 8 1 2 ...
#>  $ lastn    : Factor w/ 8 levels "Bertrand","Diallo",..: 5 3 4 7 8 2 1 6 8 2 ...
#>  $ origin   : Factor w/ 4 levels "F","M","S","V": 1 3 2 4 2 3 1 4 2 3 ...
#>  $ sentorder: int  3 7 6 2 1 5 4 8 8 4 ...
#>  $ gender   : Factor w/ 2 levels "Man","Woman": 2 2 2 2 1 1 1 1 1 1 ...
#>  $ callback : logi  TRUE TRUE TRUE TRUE FALSE FALSE ...
#>  $ paris    : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...
#>  $ cont     : Factor w/ 2 levels "LTC","STC": 1 1 1 1 1 1 1 1 1 1 ...
#>  $ ansorder : int  1 2 3 4 9 9 9 9 9 9 ...
#>  $ date     : Factor w/ 3 levels "April 2009","February 2009",..: 2 2 2 2 2 2 2 2 2 2 ...

The first variable, offer, is very important. It identifies the job offer. It matters because, in order to test for discrimination, the candidates must apply to the same job offer. It is passed to the cluster parameter of the callback() function. With cluster = "offer", all the computations are paired, which means that we always compare candidates who applied to the very same job offer. This is essential to produce meaningful results, since otherwise the difference in callbacks could come from differences between recruiters rather than from differences in gender or origin.
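To make the pairing concrete, here is a minimal base-R sketch on a small hypothetical data frame (toy, cb, tab and dif are illustrative names, not part of the callback package). Cross-tabulating the callbacks by offer and candidate puts all the candidates sent to the same job offer on the same row, so that differences are computed within offers. This only illustrates the idea behind cluster = "offer"; it is not how callback() is implemented.

```r
# Hypothetical toy data: two job offers, two candidates sent to each offer
toy <- data.frame(
  offer    = c("1", "1", "2", "2"),
  cand     = c("Man.F", "Woman.F", "Man.F", "Woman.F"),
  callback = c(TRUE, FALSE, TRUE, TRUE)
)
toy$cb <- as.integer(toy$callback)  # 1 = callback, 0 = no callback

# One row per offer, one column per candidate: each row is a paired test
tab <- xtabs(cb ~ offer + cand, data = toy)
print(tab)

# Within-offer difference (Man.F minus Woman.F) for each offer
dif <- tab[, "Man.F"] - tab[, "Woman.F"]
dif
```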

The next important variables are the ones that define the candidates. Here, there are two of them: gender and origin. These are factors, and the reference levels of these factors implicitly define the reference candidate. Which one? By convention, it is the candidate who is the least likely to be discriminated against. Here, the reference candidate is a man, because male candidates should not be discriminated against on the basis of their gender, and of French origin, because French origin candidates should not be discriminated against on the French labor market on the basis of their origin. In practice, we will check that this candidate really had the highest callback rate. The reference level of each factor is the first level returned by the levels() function.

levels(inter1$gender)
#> [1] "Man"   "Woman"
levels(inter1$origin)
#> [1] "F" "M" "S" "V"

There are two genders: “Man” (reference) and “Woman”. There are four origins: French (F, reference), Moroccan (M), Senegalese (S) and Vietnamese (V). You do not need to combine the two candidate variables gender and origin yourself to use callback(); it does it for you.
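To visualize how the two factors define the eight candidates, the following base-R sketch builds all gender × origin combinations with interaction(). The resulting labels have the same form as those printed by callback() (Man.F, Woman.S, ...), and the first label combines the first level of each factor, that is, the reference candidate. This is only an illustration; we do not claim that callback() builds its labels this way. Since the reference candidate is defined by the first levels, the last line shows how base R's relevel() moves another level to the first position.

```r
# Factors with the same levels as gender and origin in inter1
gender <- factor(c("Man", "Woman"))
origin <- factor(c("F", "M", "S", "V"))

# All gender x origin combinations, one candidate per row
grid <- expand.grid(origin = origin, gender = gender)
cand <- interaction(grid$gender, grid$origin, sep = ".", lex.order = TRUE)
levels(cand)     # "Man.F" "Man.M" ... "Woman.V"
levels(cand)[1]  # reference candidate: first level of each factor

# relevel() puts another level first, e.g. Vietnamese origin
levels(relevel(origin, ref = "V"))
```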

The last element we need is, of course, the outcome of the application. It is given by the callback variable, a Boolean (logical) variable that is TRUE when the recruiter gives a non-negative answer, and FALSE otherwise.

We can now launch the callback() function, which prepares the data for the statistical analysis. Here we need to choose the comp parameter. Since there are 8 candidates, 8 x 7/2 = 28 pairwise comparisons are possible. This is a large number, which is why, by default (comp = "ref"), callback() only compares each candidate with the reference candidate. This reduces the analysis to 7 comparisons. You can get all 28 comparisons with comp = "all" instead.

m <- callback(data = inter1, cluster = "offer", candid = c("gender","origin"), callback = "callback")
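The number of comparisons is easy to check in base R. The sketch below lists the 28 unordered pairs of candidates that comp = "all" would consider, and the 7 pairs involving the reference candidate that comp = "ref" keeps. The labels mimic those printed by the package, but the code is ours and does not call callback().

```r
cands <- c("Man.F", "Man.M", "Man.S", "Man.V",
           "Woman.F", "Woman.M", "Woman.S", "Woman.V")

# comp = "all": every unordered pair of candidates
all_pairs <- combn(cands, 2, FUN = paste, collapse = ".vs.")
length(all_pairs)  # 28, i.e. choose(8, 2) = 8 * 7 / 2

# comp = "ref": only the pairs involving the reference candidate
ref_pairs <- paste(cands[1], cands[-1], sep = ".vs.")
ref_pairs          # 7 pairs, from "Man.F.vs.Man.M" to "Man.F.vs.Woman.V"
```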

The m object contains the formatted data needed for the analysis. Using print() gives the main characteristics of the experiment:

print(m)
#> 
#>  Structure of the experiment 
#>  ---------------------------
#>  
#>  Candidates defined by: gender origin 
#>  Callback variable: callback 
#>  
#>  Number of tests for each candidate:
#> 
#>   Man.F   Man.M   Man.S   Man.V Woman.F Woman.M Woman.S Woman.V 
#>     310     310     310     310     310     310     310     310 
#> 
#>  
#>  Number of tests for each pair of candidates:
#> 
#>  Man.F.vs.Man.M Man.F.vs.Man.S Man.F.vs.Man.V Man.F.vs.Woman.F Man.F.vs.Woman.M
#>             310            310            310              310              310
#>  Man.F.vs.Woman.S Man.F.vs.Woman.V
#>               310              310
#> 
#>  
#>  Number of tests with all the candidates: 310

We find that the experiment is standard, in the sense that all the candidates were sent to all the job offers. This is not required by callback(), which also works when fewer candidates are sent to each offer. However, when more than one candidate of the same type is sent to the same test, the most favorable answer is kept (the “max” rule). Note that there are other ways to deal with this issue.
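The “max” rule can be sketched with base R's aggregate() on a hypothetical toy data frame in which two applications of the same candidate type (Man.F) were sent to offer 1. Keeping the most favorable answer amounts to taking any() of the callbacks within each offer and candidate type. The data and names are illustrative only; callback() applies its own rule internally.

```r
# Hypothetical toy data: offer 1 received two "Man.F" applications
toy <- data.frame(
  offer    = c(1, 1, 1, 2, 2),
  cand     = c("Man.F", "Man.F", "Woman.F", "Man.F", "Woman.F"),
  callback = c(FALSE, TRUE, FALSE, FALSE, TRUE)
)

# "max" rule: a candidate type is called back on an offer
# if at least one of its applications was called back
kept <- aggregate(callback ~ offer + cand, data = toy, FUN = any)
kept  # one row per (offer, candidate type)
```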

We can also look at the global callback rates of the candidates by entering:

print(stat_glob(m))
#> 
#>  Global callback rates 
#>  ---------------------
#>  
#>  Wilson confidence intervals at the 95 percent level 
#> 
#>                inf p_callback       sup
#> Man.F   0.22902951 0.27741935 0.3314330
#> Man.M   0.16658655 0.20967742 0.2601261
#> Man.S   0.10322749 0.13870968 0.1834024
#> Man.V   0.08923308 0.12258065 0.1655700
#> Woman.F 0.18130574 0.22580645 0.2772501
#> Woman.M 0.07270828 0.10322581 0.1439115
#> Woman.S 0.05654641 0.08387097 0.1219046
#> Woman.V 0.15780504 0.20000000 0.2498025

and get a graphical representation with:

graph(stat_glob(m))
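The printed Wilson bounds can be cross-checked with base R's prop.test(), whose one-sample confidence interval is the Wilson score interval with a continuity correction by default. We checked by hand that this reproduces the Man.F row; that the other rows match too is an assumption, and this is not a description of how stat_glob() is implemented. The callback counts are the p_callback values above multiplied by the 310 tests.

```r
# Callback counts: p_callback above times 310 tests
calls <- c(Man.F = 86, Man.M = 65, Man.S = 43, Man.V = 38,
           Woman.F = 70, Woman.M = 32, Woman.S = 26, Woman.V = 62)
n <- 310

# Wilson score intervals (continuity-corrected by default in prop.test)
wilson <- t(sapply(calls, function(k) prop.test(k, n, conf.level = 0.95)$conf.int))
res <- data.frame(inf = wilson[, 1], p_callback = calls / n, sup = wilson[, 2],
                  row.names = names(calls))
res
```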

It is possible to change the definition of the confidence intervals, the confidence level and the color of the plot. For instance, to get Clopper-Pearson intervals at the 90% level and a “steelblue3” color, enter:

s <- stat_glob(m,level=0.9)
print(s,method="cp")
#> 
#>  Global callback rates 
#>  ---------------------
#>  
#>  Clopper-Pearson confidence intervals at the 90 percent level 
#> 
#>                inf p_callback       sup
#> Man.F   0.23570047 0.27741935 0.3223433
#> Man.M   0.17222662 0.20967742 0.2513276
#> Man.S   0.10749071 0.13870968 0.1752003
#> Man.V   0.09312657 0.12258065 0.1575588
#> Woman.F 0.18721293 0.22580645 0.2683608
#> Woman.M 0.07612170 0.10322581 0.1361644
#> Woman.S 0.05943210 0.08387097 0.1144672
#> Woman.V 0.16327759 0.20000000 0.2410655
graph(s,method="cp",col="steelblue3")
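Similarly, Clopper-Pearson intervals are the exact binomial intervals returned by base R's binom.test(). Assuming that stat_glob() uses this standard definition, the following sketch should match the 90% bounds printed above; we checked the Man.F row by hand.

```r
calls <- c(Man.F = 86, Man.M = 65, Man.S = 43, Man.V = 38,
           Woman.F = 70, Woman.M = 32, Woman.S = 26, Woman.V = 62)
n <- 310

# Exact (Clopper-Pearson) intervals at the 90% level
cp <- t(sapply(calls, function(k) binom.test(k, n, conf.level = 0.90)$conf.int))
res_cp <- data.frame(inf = cp[, 1], p_callback = calls / n, sup = cp[, 2],
                     row.names = names(calls))
res_cp
```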

When all the candidates are sent to all the tests, the previous figures may be used to measure discrimination. However, when the candidates are rotated so that only some of them are sent to each test, this may no longer hold, since the candidates would then be compared on different sets of job offers. For this reason, we prefer matched statistics, which only compare candidates that have been sent to the same tests.

To get the results of the discrimination tests, we use the stat_count() function. Its result can be saved into an object for further export, or printed. The following instruction:

s <- stat_count(m)

does not print anything, but saves a stat_count object into s. We can display the statistics with:

print(s)
#> 
#>  Callback counts:
#>  ----------------
#>                  tests callback callback1 callback2 Neither Only 1 Only 2 Both
#> Man.F.vs.Man.M     310      106        86        65     204     41     20   45
#> Man.F.vs.Man.S     310      100        86        43     210     57     14   29
#> Man.F.vs.Man.V     310       97        86        38     213     59     11   27
#> Man.F.vs.Woman.F   310      113        86        70     197     43     27   43
#> Man.F.vs.Woman.M   310       96        86        32     214     64     10   22
#> Man.F.vs.Woman.S   310       97        86        26     213     71     11   15
#> Man.F.vs.Woman.V   310      111        86        62     199     49     25   37
#>                  Difference
#> Man.F.vs.Man.M           21
#> Man.F.vs.Man.S           43
#> Man.F.vs.Man.V           48
#> Man.F.vs.Woman.F         16
#> Man.F.vs.Woman.M         54
#> Man.F.vs.Woman.S         60
#> Man.F.vs.Woman.V         24

The callback counts describe the results of the paired experiments. The first column defines the comparison in the form “candidate 1 vs candidate 2”. Here, “Man.F vs Woman.F” means that we compare French origin men and women. Out of 310 tests, 113 got at least one callback. The men got 86 callbacks and the women 70. The difference, called net discrimination, equals 16 callbacks. The next columns give more details. Out of the 310 job offers, neither candidate was called back in 197, 43 called only the man, 27 called only the woman and 43 called both. Discrimination only occurs when a single candidate is called back, so the net discrimination is 43 - 27 = 16. The corresponding row proportions are available in s$props:

s$props
#>                  p_callback   p_cand1    p_cand2     p_c00     p_c10      p_c01
#> Man.F.vs.Man.M    0.3419355 0.2774194 0.20967742 0.6580645 0.1322581 0.06451613
#> Man.F.vs.Man.S    0.3225806 0.2774194 0.13870968 0.6774194 0.1838710 0.04516129
#> Man.F.vs.Man.V    0.3129032 0.2774194 0.12258065 0.6870968 0.1903226 0.03548387
#> Man.F.vs.Woman.F  0.3645161 0.2774194 0.22580645 0.6354839 0.1387097 0.08709677
#> Man.F.vs.Woman.M  0.3096774 0.2774194 0.10322581 0.6903226 0.2064516 0.03225806
#> Man.F.vs.Woman.S  0.3129032 0.2774194 0.08387097 0.6870968 0.2290323 0.03548387
#> Man.F.vs.Woman.V  0.3580645 0.2774194 0.20000000 0.6419355 0.1580645 0.08064516
#>                       p_c11 p_cand_dif
#> Man.F.vs.Man.M   0.14516129 0.06774194
#> Man.F.vs.Man.S   0.09354839 0.13870968
#> Man.F.vs.Man.V   0.08709677 0.15483871
#> Man.F.vs.Woman.F 0.13870968 0.05161290
#> Man.F.vs.Woman.M 0.07096774 0.17419355
#> Man.F.vs.Woman.S 0.04838710 0.19354839
#> Man.F.vs.Woman.V 0.11935484 0.07741935
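The relations between the columns of the counts table, and with s$props, can be checked by hand. The sketch below uses the Man.F vs Woman.F row: each candidate's callbacks add its exclusive callbacks and the joint callbacks, so the joint callbacks cancel out of the difference, and dividing by the number of tests gives p_cand_dif. The variable names are ours.

```r
# Man.F vs Woman.F row of the callback counts table
tests <- 310; neither <- 197; only1 <- 43; only2 <- 27; both <- 43

n_any      <- tests - neither        # tests with at least one callback: 113
callback1  <- only1 + both           # callbacks of Man.F: 86
callback2  <- only2 + both           # callbacks of Woman.F: 70
difference <- callback1 - callback2  # net discrimination: 16

# "both" cancels out: only exclusive callbacks drive the difference
difference == only1 - only2          # TRUE
difference / tests                   # p_cand_dif, about 0.0516
```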

We now turn to the analysis of proportions. As before, we can save the output or print it; printing is the default. There are three ways to compute proportions in discrimination studies. First, you can divide the number of callbacks by the number of tests. We call these “matched callback rates”, given by the function stat_mcr(). Second, you can restrict the analysis to the tests that got at least one callback. We call these “total callback shares”, given by the function stat_tcs(). Finally, you can divide by the number of tests where only one candidate was called back. We call these “exclusive callback shares”, given by the function stat_ecs(). With the first convention, we get:

stat_mcr(m)
#> 
#>  
#>  Equality of proportions - matched callback rates
#>  ------------------------------------------------ 
#>  
#>  Fisher test: 
#>                  p_cand_dif     p_Fisher s_Fisher
#> Man.F.vs.Man.M   0.06774194 6.108741e-02      .  
#> Man.F.vs.Man.S   0.13870968 2.859620e-05      ***
#> Man.F.vs.Man.V   0.15483871 1.886620e-06      ***
#> Man.F.vs.Woman.F 0.05161290 1.649424e-01         
#> Man.F.vs.Woman.M 0.17419355 3.790180e-08      ***
#> Man.F.vs.Woman.S 0.19354839 3.229317e-10      ***
#> Man.F.vs.Woman.V 0.07741935 3.004301e-02      *  
#> 
#>  Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.10 ' ' 1
#>  
#>  Chi-squared test: 
#>                  p_cand_dif   Pearson    p_Pearson s_Pearson
#> Man.F.vs.Man.M   0.06774194  3.501885 6.129902e-02       .  
#> Man.F.vs.Man.S   0.13870968 17.267087 3.247638e-05       ***
#> Man.F.vs.Man.V   0.15483871 22.268145 2.371075e-06       ***
#> Man.F.vs.Woman.F 0.05161290  1.927221 1.650627e-01          
#> Man.F.vs.Woman.M 0.17419355 29.400702 5.885631e-08       ***
#> Man.F.vs.Woman.S 0.19354839 37.932719 7.322680e-10       ***
#> Man.F.vs.Woman.V 0.07741935  4.695087 3.024897e-02       *  
#> 
#>  Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.10 ' ' 1
#>  
#>  Student test: 
#>                  p_cand_dif  Student    p_Student s_Student
#> Man.F.vs.Man.M   0.06774194 2.716294 6.973512e-03       ** 
#> Man.F.vs.Man.S   0.13870968 5.323431 1.961932e-07       ***
#> Man.F.vs.Man.V   0.15483871 6.058490 3.990671e-09       ***
#> Man.F.vs.Woman.F 0.05161290 1.920642 5.569699e-02       .  
#> Man.F.vs.Woman.M 0.17419355 6.708070 9.404184e-11       ***
#> Man.F.vs.Woman.S 0.19354839 7.140081 6.704916e-12       ***
#> Man.F.vs.Woman.V 0.07741935 2.821082 5.095962e-03       ** 
#> 
#>  Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.10 ' ' 1

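The printed output includes three tests: the Fisher exact test of independence between the candidate type and the callback variable, the chi-squared test for the equality of the candidates' callback rates, and the asymptotic Student test for the equality of the candidates' callback rates. A code indicates the significance of the difference, with the same convention as in the summary of an lm() fit. With the Student test, all the differences are significant at the 5% level except the one between French origin men and women (p = 0.056). With the Fisher and chi-squared tests, the difference between French origin men and women is not significant, and the difference between French and Moroccan origin men is only significant at the 10% level (p = 0.061); all the other differences are significant at the 5% level. The associated graphical representation is obtained by: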

graph(stat_mcr(m))

The colors can be changed with the option col and the definition of the confidence intervals with the option method.
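As a cross-check, the statistics printed for Man.F vs Man.M can be recomputed from the counts table with base R. We found by hand that a chi-squared test with Yates' continuity correction (the chisq.test() default for 2 x 2 tables) on the candidate by callback table gives 3.50, and that a paired t-test on the per-offer 0/1 outcomes gives 2.72, matching the Pearson and Student columns. This suggests, but does not prove, that stat_mcr() computes equivalent statistics; the Fisher p-value is only compared loosely.

```r
# Man.F vs Man.M row of the callback counts table
only1 <- 41; only2 <- 20; both <- 45; neither <- 204

# Rebuild the paired 0/1 outcomes, one element per test (job offer)
x <- rep(c(1, 0, 1, 0), times = c(only1, only2, both, neither))  # Man.F
y <- rep(c(0, 1, 1, 0), times = c(only1, only2, both, neither))  # Man.M

# 2 x 2 table: candidate (rows) by callback yes / no (columns)
tab <- rbind(Man.F = c(sum(x), length(x) - sum(x)),
             Man.M = c(sum(y), length(y) - sum(y)))

p_fisher <- fisher.test(tab)$p.value                 # Fisher exact test
chi2     <- chisq.test(tab)$statistic                # Yates-corrected chi-squared
t_stu    <- t.test(x, y, paired = TRUE)$statistic    # paired Student test
c(p_fisher = p_fisher, chi2 = unname(chi2), t = unname(t_stu))
```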

There is a second graphical representation, which shows the confidence intervals of both candidates. However, this representation can be misleading: overlapping confidence intervals do not imply that the proportions are equal. The only correct representation of the difference is the previous one, given by default. To compare the confidence intervals of the two candidates, enter:

graph(stat_mcr(m),dif=FALSE)
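The warning above can be illustrated with the figures of this vignette. For Man.F and Woman.V, the 95% Wilson intervals printed earlier overlap (0.229 to 0.331 versus 0.158 to 0.250), yet the Fisher test reported for Man.F.vs.Woman.V rejects equality at the 5% level (p = 0.030). The base-R sketch below redoes both computations, using prop.test() and fisher.test() as stand-ins for the package's own computations.

```r
# Man.F (86/310) and Woman.V (62/310) callbacks, from the tables above
ci_man_f   <- prop.test(86, 310)$conf.int
ci_woman_v <- prop.test(62, 310)$conf.int

# Do the two 95% intervals overlap?
overlap <- ci_man_f[1] < ci_woman_v[2] && ci_woman_v[1] < ci_man_f[2]
overlap  # TRUE

# Yet the test of the difference rejects equality at the 5% level
p_fisher <- fisher.test(rbind(c(86, 310 - 86), c(62, 310 - 62)))$p.value
p_fisher  # compare with p_Fisher for Man.F.vs.Woman.V above
```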

The statistics for the other conventions, total callback shares and exclusive callback shares, are obtained by replacing stat_mcr() with stat_tcs() and stat_ecs() respectively. The graphical representations of the differences are also similar. The only difference is that it is also possible to plot the total or exclusive callback shares themselves. For the total callback shares, we get:

graph(stat_tcs(m),dif=FALSE)

and for the exclusive callback shares, we get:

graph(stat_ecs(m),dif=FALSE)
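To make the three conventions concrete, the sketch below computes the three proportions for the Man.F vs Woman.F comparison directly from the counts table. The matched callback rates follow the definition given above; for the total and exclusive callback shares we assume, as a hypothesis about what stat_tcs() and stat_ecs() report, that the numerators are the candidate's callbacks and its exclusive callbacks respectively.

```r
# Man.F vs Woman.F row of the callback counts table
tests <- 310; neither <- 197; only1 <- 43; only2 <- 27; both <- 43

n_any  <- tests - neither  # tests with at least one callback: 113
n_excl <- only1 + only2    # tests with exactly one callback: 70

# Matched callback rates: callbacks / tests
mcr <- c(only1 + both, only2 + both) / tests
# Total callback shares (assumed numerators): callbacks / n_any
tcs <- c(only1 + both, only2 + both) / n_any
# Exclusive callback shares (assumed numerators): exclusive callbacks / n_excl
ecs <- c(only1, only2) / n_excl

props <- rbind(mcr, tcs, ecs)
colnames(props) <- c("Man.F", "Woman.F")
props
```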