=Paper=
{{Paper
|id=Vol-3218/paper3
|storemode=property
|title=Closing the Gender Wage Gap: Adversarial Fairness in Job Recommendation
|pdfUrl=https://ceur-ws.org/Vol-3218/RecSysHR2022-paper_3.pdf
|volume=Vol-3218
|authors=Clara Rus,Jeffrey Luppes,Harrie Oosterhuis,Gido H. Schoenmacker
|dblpUrl=https://dblp.org/rec/conf/hr-recsys/RusLOS22
}}
==Closing the Gender Wage Gap: Adversarial Fairness in Job Recommendation==
Clara Rus¹, Jeffrey Luppes², Harrie Oosterhuis¹ and Gido H. Schoenmacker²
¹ Radboud University, Houtlaan 4, 6525XZ, Nijmegen, the Netherlands
² DPG Media Online Services, Jacob Bontiusplaats 9, 1018LL, Amsterdam, the Netherlands
Abstract
The goal of this work is to help mitigate the already existing gender wage gap by supplying unbiased job recommendations
based on resumes from job seekers. We employ a generative adversarial network to remove gender bias from word2vec
representations of 12M job vacancy texts and 900k resumes. Our results show that representations created from recruitment
texts contain algorithmic bias and that this bias results in real-world consequences for recommendation systems. Without
controlling for bias, women are recommended jobs with significantly lower salary in our data. With adversarially fair
representations, this wage gap disappears, meaning that our debiased job recommendations reduce wage discrimination. We
conclude that adversarial debiasing of word representations can increase real-world fairness of systems and thus may be part
of the solution for creating fairness-aware recommendation systems.
Keywords
Generative adversarial networks, Fairness-aware machine learning, Recruitment, Gender bias
RecSys in HR'22: The 2nd Workshop on Recommender Systems for Human Resources, in conjunction with the 16th ACM Conference on Recommender Systems, September 18–23, 2022, Seattle, USA.
Contact: clara.rus@ru.nl (C. Rus); jeffrey.luppes@dpgmedia.nl (J. Luppes); harrie.oosterhuis@ru.nl (H. Oosterhuis); gido.schoenmacker@dpgmedia.nl (G. H. Schoenmacker)
ORCID: 0000-0002-0465-535X (J. Luppes); 0000-0002-0458-9233 (H. Oosterhuis); 0000-0003-3946-928X (G. H. Schoenmacker)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org).

1. Introduction

The recruitment industry relies more and more on automation for processing, searching, and matching job vacancies to job seekers. However, automation of the recruitment process can lead to discriminatory results with respect to certain groups, based on gender, ethnicity, or age [1]. Inequality in employment and remuneration still exists between, for example, ethnic groups [2, 3, 4] and gender groups [5, 6]; thus, naive implementations of AI recruitment systems are at risk of copying and perpetuating these inequalities.

One reason for an algorithm to show discriminatory behaviour is the input data [7]. If the data is under-representative or if historical bias is present, then the system can propagate this in its predictions [1]. Ignoring the presence of bias in the data can perpetuate existing (gender) stereotypes and inequalities in employment. Examples of systems that have shown biased behaviour with respect to gender include the Amazon recruitment system (https://www.reuters.com/article/idUSKCN1MK08G) and the Facebook ad algorithm [8]. Widely used models such as BERT [9] and word2vec [10] have also been shown to create biased representations [11, 12]. Obtaining fair representations could eliminate the bias present in the data and help a system achieve fairer predictions [13].

One way to learn debiased representations is through adversarial learning. State-of-the-art adversarial debiasing methods [14, 15, 16, 17, 18] rely on the same general approach as generative adversarial networks [19]. A generator model is trained to produce new data representations, which are critiqued by an adversary neural network. The adversary tries to predict the sensitive variable (in our case, gender) from the produced representation. By training the representations together with an adversary and a classifier, the representations are aimed to be both fair and useful for the task.

This work is motivated by the desire to supply unbiased job recommendations to job seekers. We focus specifically on mitigating gender bias in word embeddings obtained from recruitment texts using adversarial learning. Our work adds to existing research by applying state-of-the-art debiasing [14, 20] to industrial-sized free-format recruitment textual data. Firstly, we investigate gender bias in the existing representations and the unfairness it results in. Secondly, we apply two debiasing methods to create new representations. These methods balance multi-label classification, to ensure that task-relevant information has been preserved, with an adversarial setup that attempts to remove the effects of gender bias. The resulting new representations are tested in a job recommendation setting where the difference in wage between jobs recommended based on female and male resumes is evaluated.

To summarize, our contributions are three-fold: (i) we measure whether adversarial learning can mitigate gender bias in representations of industrial-sized free-format recruitment textual data; (ii) we show whether debiased representations help achieve fairness and performance on a multi-label classification task; and (iii) to the authors' best knowledge, we are the first to successfully apply debiased representations to help close the gender wage gap in a job recommendation setting. Moreover, our implementation of the adversarial debiasing method is publicly available.

In the next section, our data and methods are described in detail. After that, the results are presented. Lastly, these results are discussed together with our final conclusions and suggestions for future directions.
2. Data and Method

2.1. Data

The recruitment data set used throughout this research consists of job vacancies and job seeker information provided by DPG Recruitment. Job vacancy information included (i) salary ranges, (ii) working hours, and (iii) anonymised free-format job vacancy texts. In total, there are 12 million vacancies.

Job seeker information consisted of (i) one or more industry group(s) that the job seeker expressed interest in (out of a total of 21 pre-defined groups), (ii) inferred dichotomous gender, and (iii) anonymised free-format resume texts. Gender of the job seeker was inferred based on first name. From the total of available resumes, entries with missing data (65%) or an ambiguous first name (3%) were excluded, leaving 904,576 (32%) complete resumes with a female to male ratio of 0.93. Anonymisation included removal of all names (including company names), dates, addresses, telephone numbers, email addresses, websites, and other contact information. A more complete overview of this data is given in Appendix A.

Both vacancy and resume texts were embedded into 300-dimensional word vectors using a word2vec [10] model trained on all vacancy texts. Finally, each text was represented as the mean over the embeddings of the words composing the text.
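As an illustration of this embedding step, the sketch below averages word2vec vectors into one document vector. It is a minimal sketch, not the released implementation; the training parameters and function names are our own assumptions.

```python
# Minimal sketch (illustrative, not the authors' code): a word2vec model is
# trained on the vacancy corpus and each text is represented as the mean of
# the embeddings of its known words.
import numpy as np
from gensim.models import Word2Vec

def train_word2vec(tokenized_vacancies, dim=300):
    """Train word2vec on the vacancy corpus (a list of token lists)."""
    return Word2Vec(sentences=tokenized_vacancies, vector_size=dim,
                    window=5, min_count=5, workers=4)

def document_embedding(tokens, model):
    """Represent a text as the mean of the embeddings of its known words."""
    vectors = [model.wv[t] for t in tokens if t in model.wv]
    if not vectors:                      # no known words: fall back to zeros
        return np.zeros(model.vector_size, dtype=np.float32)
    return np.mean(vectors, axis=0)

# Example (hypothetical corpus): resumes are embedded with a model that was
# trained on vacancy texts only, as described above.
# vacancy_corpus = [["data", "engineer", "wanted"], ...]
# w2v = train_word2vec(vacancy_corpus)
# resume_vec = document_embedding(["python", "developer"], w2v)
```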
2.2. Bias and debiasing

Previous research has shown that popular models such as BERT [9] and word2vec [10] can create biased representations [11, 12, 21]. In this work, two debiasing methods were employed to combat this bias.

Firstly, to create a simple baseline, we attempt to debias the representations by replacing gendered words with neutral words. For example, the gendered pronouns "she"/"he" and "her"/"his" are replaced with the neutral pronouns "they" and "theirs", and gendered words such as "woman"/"man" and "girl"/"boy" are replaced with the word "person". The full list of substitutions can be found in Appendix B. A new word2vec model was trained on this augmented corpus, resulting in new representations for both the resumes and the vacancies. In the remaining text, "original representations" will refer to the representations trained on the original texts, whereas "word-substitution representations" will refer to the representations trained on the altered texts.
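A minimal sketch of this word-substitution baseline is given below, assuming a simple regular-expression pass over the raw texts before retraining word2vec; the exact preprocessing in the released code may differ. The pairs follow the English part of Table 4.

```python
# Minimal sketch (assumed preprocessing, not the released implementation) of
# the word-substitution baseline: gendered words are mapped to neutral ones
# before a new word2vec model is trained on the altered corpus.
import re

SUBSTITUTIONS = {
    "he": "they", "she": "they",
    "his": "theirs", "hers": "theirs",
    "himself": "themselves", "herself": "themselves",
    "male": "person", "female": "person",
    "boy": "person", "girl": "person",
    "man": "person", "woman": "person",
}
_PATTERN = re.compile(r"\b(" + "|".join(SUBSTITUTIONS) + r")\b", re.IGNORECASE)

def neutralize(text: str) -> str:
    """Replace gendered words (Table 4, English) with neutral counterparts."""
    return _PATTERN.sub(lambda m: SUBSTITUTIONS[m.group(0).lower()], text)

# Example: neutralize("He is a man and she is a woman")
# -> "they is a person and they is a person"
```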
Secondly, we applied the adversarial approach as proposed by Edwards and Storkey [14]. This method consists of three neural network components: a generator, a classifier, and an adversary. Inspired by Özdenizci et al. [20], we chose the following architecture: the generator is a multilayer perceptron with three hidden layers of 128 neurons that outputs a 300-dimensional vector representing the new representation. The classifier and the adversary each have one hidden layer of 128 neurons. The output dimension of the classifier is 21 (industry group classes), and the output dimension of the adversary is one (gender). An architecture schematic is included as Figure 1.

Figure 1: Architecture of the adversarial setup. The left section (green) represents the generator, consisting of an input layer (d=300) for the word2vec representations, three hidden layers (d=128), and an output layer (d=300) for the debiased representations. The top section (blue) represents the classifier, consisting of a hidden layer (d=128) and an output layer Ŷ (d=21) encoding the industry groups. The bottom section (red) represents the adversary, consisting of a hidden layer (d=128) and an output neuron Ŝ (d=1) encoding the sensitive variable gender.

The generator creates new representations for the classification task, while the adversary attempts to predict the sensitive variable gender from these new representations. The goal of the generator is to create representations that fool the adversary in such a way that the sensitive variable can no longer be predicted, while also obtaining a good performance on the classification task. The classification task is a multi-label task with 21 classes, predicting the industry group(s) of each job seeker. This means that the classification loss should be minimized while the adversarial loss should be maximized. The final loss (Equation 1) of the model is a weighted sum of the classification loss and the adversarial loss, where Z are the newly generated representations, Y′ are the predictions of the classifier, and S′ are the predictions of the adversary:

L = α·L_cla(Z, Y′) + β·L_adv(Z, S′).   (1)

We will call representations created by this method "adversarial representations". Because the adversarial process can be unstable, all results pertaining to these representations are the mean of 5 independent complete training runs.
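To make the setup concrete, the following PyTorch sketch mirrors the layer sizes described above and the weighted loss of Equation 1. The alternating update scheme, the ReLU activations, and all variable names are our assumptions rather than the released implementation; the Adam optimizer, learning rate of 1e−5, binary cross-entropy loss, and α = β = 1 follow the experimental setup in Section 2.4.

```python
# Minimal sketch of the generator / classifier / adversary setup (Equation 1).
# Shapes: x (batch, 300), y_industry (batch, 21) multi-hot floats,
# s_gender (batch, 1) floats in {0, 1}.
import torch
import torch.nn as nn

def mlp(sizes):
    """Fully connected network with ReLU between hidden layers."""
    layers = []
    for i in range(len(sizes) - 1):
        layers.append(nn.Linear(sizes[i], sizes[i + 1]))
        if i < len(sizes) - 2:
            layers.append(nn.ReLU())
    return nn.Sequential(*layers)

generator = mlp([300, 128, 128, 128, 300])   # debiased representation Z
classifier = mlp([300, 128, 21])             # industry groups (multi-label)
adversary = mlp([300, 128, 1])               # sensitive variable (gender)

bce = nn.BCEWithLogitsLoss()                 # binary cross-entropy throughout
alpha, beta = 1.0, 1.0                       # weights of Equation 1
opt_gen = torch.optim.Adam(list(generator.parameters()) +
                           list(classifier.parameters()), lr=1e-5)
opt_adv = torch.optim.Adam(adversary.parameters(), lr=1e-5)

def training_step(x, y_industry, s_gender):
    """One alternating update: the adversary learns to predict gender from Z,
    then generator + classifier learn to predict industries while fooling it."""
    # 1) Update the adversary on the current (detached) representations.
    z = generator(x).detach()
    loss_adv = bce(adversary(z), s_gender)
    opt_adv.zero_grad(); loss_adv.backward(); opt_adv.step()

    # 2) Update generator + classifier: minimize alpha * L_cla - beta * L_adv,
    #    i.e. minimize the classification loss while maximizing the adversarial
    #    loss, as required by Equation 1.
    z = generator(x)
    loss_cla = bce(classifier(z), y_industry)
    loss_fool = bce(adversary(z), s_gender)
    loss = alpha * loss_cla - beta * loss_fool
    opt_gen.zero_grad(); loss.backward(); opt_gen.step()
    return loss_cla.item(), loss_adv.item()
```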
2.3. Evaluation

Classifiers for both the industry groups and the sensitive variable are evaluated in terms of accuracy and area under the receiver operating characteristic curve (AUC). Fairness was evaluated using statistical parity [22]:

P(cla(Z) = 1 | S = 1) − P(cla(Z) = 1 | S = 0) < ε.   (2)
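For reference, a small sketch of how the statistical parity of Equation 2 can be computed for a single industry label is shown below (the absolute difference is reported, as in Table 1). This is our own illustration, not the evaluation code used for the paper, and the gender encoding is assumed.

```python
# Minimal sketch of the statistical parity check of Equation 2 for one label:
# the difference in positive-prediction rates between the two gender groups.
import numpy as np

def statistical_parity(y_pred, s):
    """y_pred: binary predictions for one industry group (0/1 array);
    s: sensitive attribute per sample (1 = female, 0 = male, encoding assumed)."""
    y_pred, s = np.asarray(y_pred), np.asarray(s)
    rate_s1 = y_pred[s == 1].mean()   # P(cla(Z)=1 | S=1)
    rate_s0 = y_pred[s == 0].mean()   # P(cla(Z)=1 | S=0)
    return abs(rate_s1 - rate_s0)

# Example: statistical_parity([1, 0, 1, 1], [1, 1, 0, 0]) -> |0.5 - 1.0| = 0.5
```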
In the recruitment industry, if a system designed to match resumes and vacancies perpetuates biased associations, it could lead to a wage gap between the salaries of women and men [23]. To specifically test differences in salary, a salary association test was performed between the representations of the resumes and of the vacancies. Using the embeddings of the resumes and the vacancies, the L2 distance matrix was computed and each resume was matched to the closest vacancy. The salary distribution of the matched vacancies of the female-inferred group was then compared with that of the male-inferred group.
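A compact sketch of this salary association test is given below. The nearest-vacancy matching by L2 distance follows the description above, while the use of Welch's t-test for significance is our assumption; the paper reports p-values without naming the test.

```python
# Minimal sketch (our reconstruction, not the released code) of the salary
# association test: each resume is matched to its nearest vacancy by L2
# distance, and the matched salaries of the female- and male-inferred groups
# are compared.
import numpy as np
from scipy.spatial.distance import cdist
from scipy.stats import ttest_ind

def salary_association(resume_vecs, vacancy_vecs, vacancy_salary, is_female):
    """resume_vecs: (n_resumes, 300); vacancy_vecs: (n_vacancies, 300);
    vacancy_salary: hourly salary per vacancy; is_female: boolean per resume."""
    dist = cdist(resume_vecs, vacancy_vecs, metric="euclidean")  # L2 distances
    matched = vacancy_salary[np.argmin(dist, axis=1)]            # closest vacancy
    female, male = matched[is_female], matched[~is_female]
    stat, p_value = ttest_ind(female, male, equal_var=False)     # assumed test
    return female.mean(), male.mean(), p_value
```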
2.4. Experimental Setup

The validation split was created by taking a random 30% of the samples; the rest of the full data is used for training. The full data set was not used for the salary association test due to computational limitations. Instead, 10,000 resumes were associated with all jobs from the time period June 2020–June 2021 that provided salary information, resulting in 23,501 vacancies in total. All experiments were conducted using a fixed 70-30% split and the Adam optimizer with a learning rate of 1e−5. For all components the binary cross-entropy loss was used. The parameters of the final loss (Equation 1) were set to α = 1 and β = 1. The implementation of the adversarial debiasing method can be found at: https://github.com/ClaraRus/Debias-Embeddings-Recruitment.
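The fixed split and the accuracy/AUC metrics reported below could be computed as in the following sketch; the use of scikit-learn utilities here is our assumption, not part of the released implementation.

```python
# Minimal sketch (illustrative) of the fixed 70-30 split and the accuracy/AUC
# evaluation used for the gender and industry classifiers.
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, roc_auc_score

def fixed_split(X, y, seed=0):
    """70% train / 30% validation, as in the experimental setup."""
    return train_test_split(X, y, test_size=0.30, random_state=seed)

def evaluate_binary(y_true, scores, threshold=0.5):
    """Accuracy and AUC for a binary classifier (e.g. gender prediction)."""
    return (accuracy_score(y_true, scores >= threshold),
            roc_auc_score(y_true, scores))
```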
3. Results

3.1. Prediction of sensitive variable

Firstly, the discriminatory power to predict the sensitive variable gender was tested using the original, word-substitution, and adversarial representations. Using the original representations, gender is predicted with 94% AUC and an accuracy of 86%; the word-substitution representations result in a 93% AUC and an accuracy of 85%; lastly, the adversarial representations lowered both the accuracy and the AUC to 82%.

3.2. Prediction of industry group

Secondly, the information content and statistical parity of the three representation types were tested by attempting to predict the industry group(s) based on the resume representation. Table 1 shows the results obtained in terms of performance and statistical parity.

Training a classifier with the original representations of the resumes obtained a statistical parity of 0.076. The word-substitution representations obtained similar results. Using an adversarial approach improved the statistical parity by 21%, at the cost of lowering the accuracy by 2 percentage points and the true positive rate by 16 percentage points.

3.3. Salary Association Test

Thirdly, a salary association test was performed using the three representation types. Table 2 describes the salary distribution of the female and male groups for each debiasing method. There are 4827 samples in the female group and 5173 in the male group.

Using the original representations, female-inferred resumes were associated with a mean salary of €25.28 per hour, whereas male-inferred resumes were associated with a mean salary of €26.09 per hour, which is significantly (p<1e−5) higher. This results in an estimated average annual wage gap of €1680.

Using the word-substitution representations, female-inferred resumes were associated with a mean salary of €25.19 per hour, whereas male-inferred resumes were associated with a mean salary of €26.14 per hour. The difference between the means of the female group and the male group increased, broadening the annual wage gap to €1900 (with a significant difference between groups, p<1e−7).

With the adversarial representations, female-inferred resumes were associated with a mean salary of €27.06 per hour, whereas male-inferred resumes were associated with a mean salary of €27.15 per hour. Using the adversarial method to generate fair representations for both the resumes and the vacancies decreased the mean gap, lowering the annual wage gap to €180. As a result, the female/male difference is no longer significant (p=0.47).

Table 2 also shows the mean salary per hour for each industry group. Ideally, females and males belonging to the same industry group should have similar salaries. The word-substitution representations lowered the wage gap in 13 of the industry groups by €620 per year, while increasing the gap for 7 groups by an average of €460 per year.
Table 1
Statistical parity and performance of multi-label classification of 21 industry groups from three different types of representations.
Original representations were obtained using word2vec. Word-substitution representations were obtained using a word-
substitution debiasing method. Adversarial representations were obtained using the adversarial debiasing method. The
“Overall” row represents the weighted mean. Parity: statistical parity (Eq. 2), closer to zero is better. TPR: True positive rate.
Original Word-substitution Adversarial
Parity Accuracy TPR Parity Accuracy TPR Parity Accuracy TPR
Overall 0.076 0.90 0.37 0.076 0.90 0.38 0.060 0.89 0.21
Administration/Secretarial 0.267 0.74 0.52 0.271 0.74 0.53 0.260 0.72 0.45
Automation/Internet 0.066 0.83 0.46 0.069 0.83 0.47 0.045 0.82 0.32
Policy/Executive 0.000 0.79 0.19 0.001 0.79 0.21 0.005 0.79 0.08
Security/Defence/Police 0.010 0.83 0.16 0.009 0.83 0.16 0.000 0.83 0.03
Commercial/Sales 0.074 0.74 0.36 0.070 0.74 0.35 0.059 0.72 0.19
Consultancy/Advice 0.026 0.75 0.20 0.033 0.75 0.22 0.012 0.74 0.07
Design/Creative/Journalism 0.004 0.82 0.26 0.005 0.82 0.30 0.001 0.82 0.09
Management 0.070 0.78 0.32 0.063 0.78 0.29 0.052 0.77 0.22
Financial/Accounting 0.021 0.81 0.47 0.021 0.81 0.48 0.020 0.80 0.29
Financial services 0.012 0.79 0.22 0.012 0.79 0.28 0.007 0.79 0.10
HR/Training 0.041 0.80 0.32 0.043 0.80 0.34 0.014 0.78 0.09
Catering/Retail 0.037 0.76 0.33 0.023 0.76 0.27 0.018 0.75 0.14
Procurement/Logistics/Transport 0.115 0.77 0.38 0.102 0.77 0.35 0.087 0.76 0.24
Legal 0.015 0.85 0.45 0.015 0.85 0.44 0.002 0.84 0.09
Customer service/Call centre/Front office 0.039 0.76 0.12 0.031 0.76 0.10 0.001 0.76 0.01
Marketing/PR/Communications 0.031 0.77 0.41 0.031 0.77 0.45 0.028 0.76 0.27
Medical/Healthcare 0.115 0.76 0.40 0.116 0.77 0.40 0.100 0.75 0.27
Education/Research/Science 0.045 0.77 0.32 0.057 0.77 0.39 0.031 0.75 0.16
Other 0.005 0.68 0.04 0.009 0.68 0.05 0.000 0.68 0.00
Production/Operational 0.063 0.78 0.27 0.062 0.78 0.27 0.043 0.77 0.15
Technology 0.165 0.79 0.51 0.169 0.79 0.52 0.153 0.78 0.43
Table 2
Salary Association Test between resumes and vacancies. For each resume the most similar vacancy was assigned based on
Euclidean distance in the representation space. The values represent the salary per hour in Euros (€). Original representations
were obtained using word2vec. Word-substitution representations were obtained using the word-substitution debiasing
method. Adversarial representations were obtained using the adversarial debiasing method. The top three rows represent the
weighted summary statistics. The industry names with an asterisk (*) are the ones for which the adversarial method reduced
the wage gap.
Original Word-substitution Adversarial
Female Male Wage gap Female Male Wage gap Female Male Wage gap
Mean 25.28 26.09 0.81 25.19 26.14 0.95 27.06 27.15 0.09
Standard deviation 9.43 9.90 0.47 9.54 10.07 0.53 10.14 9.94 -0.20
Median 23.40 23.62 0.22 22.95 23.62 0.67 23.97 24.30 0.33
Administration/Secretarial* 23.50 24.94 1.44 23.44 24.95 1.51 26.45 26.41 -0.04
Automation/Internet 28.34 28.58 0.24 28.05 29.02 0.97 29.94 28.44 -1.50
Policy/Executive* 29.90 31.23 1.33 30.16 31.53 1.37 30.35 31.18 0.83
Security/Defence/Police* 24.81 22.78 -2.03 24.81 23.09 -1.72 25.51 26.44 0.93
Commercial/Sales* 23.76 25.77 2.01 23.88 25.53 1.65 26.66 27.39 0.73
Consultancy/Advice* 29.27 30.49 1.22 29.25 30.42 1.17 29.92 30.63 0.71
Design/Creative/Journalism 26.39 26.33 -0.06 26.13 26.12 -0.01 28.22 28.02 -0.20
Management* 29.49 31.22 1.73 29.49 31.47 1.98 30.66 30.31 -0.35
Financial/Accounting* 24.30 27.94 3.64 24.43 28.07 3.64 27.20 28.62 1.42
Financial services* 24.33 27.85 3.52 24.19 27.76 3.57 26.80 28.67 1.87
HR/Training 28.59 28.87 0.28 28.80 29.10 0.30 29.52 29.15 -0.37
Catering/Retail* 22.80 23.76 0.96 22.76 23.49 0.73 25.15 24.50 0.65
Procurement/Logistics/Transport 23.70 23.28 -0.42 23.46 23.30 -0.16 25.96 25.10 -0.86
Legal* 24.89 28.79 3.90 25.52 28.91 3.39 28.82 29.01 0.19
Customer service/Call centre/Front office* 22.89 23.78 0.89 23.01 23.85 0.84 25.35 25.96 0.61
Marketing/PR/Communications* 26.64 27.65 1.01 26.71 27.55 0.84 28.86 29.22 0.36
Medical/Healthcare* 26.30 27.51 1.21 26.11 27.19 1.08 27.17 28.07 0.90
Education/Research/Science 28.82 27.43 -1.39 28.30 27.65 -0.65 27.66 29.07 1.41
Other* 24.91 24.58 -0.33 24.79 24.84 0.05 26.07 26.32 0.25
Production/Operational* 21.69 22.61 0.92 21.15 22.49 1.34 24.39 23.94 -0.45
Technology* 25.51 24.09 -1.42 24.57 24.07 -0.50 25.79 25.79 0.00
For “Financial/Accounting” there is no change in the salary association. The adversarial method lowered the wage gap in 16 out of the 21 industries by an average of €2160 per year, but it increased the gap in the remaining industries by an average of €780 per year.

4. Discussion and Conclusion

This work focused on removing gender bias from word embeddings of vacancy texts and resumes with the goal of creating debiased job recommendations. It showed that gender can be predicted extremely well from anonymised resume embeddings and that naive resume-to-job recommendations based on these embeddings can perpetuate the “wage gap” that exists between women and men. Adversarial debiasing improved statistical parity for industry classification based on resumes and eliminated the female/male salary difference in job recommendations. This suggests that adversarial debiasing can help make fairer recommendations in realistic scenarios.

Our results indicate that anonymisation alone is not enough to remove indirect information about the gender of the job seeker. Namely, from our 900k anonymised resumes, gender could be predicted with an AUC of 0.94. This exceeds similar results that have been shown on a smaller data set (AUC=0.81) [24]. This is a common problem in fairness-aware machine learning, where removal of directly sensitive information is undermined by correlated features that allow the sensitive information to be inferred [22].

The difficulty of removing gender bias from language was further illustrated by our data augmentation attempt to substitute a selection of gendered words by neutral words before word2vec training. The resulting embeddings did not effect much change in any of our tests. Previous work on word substitution data augmentation has been shown to be effective [25, 26], so it may be that our results are limited by the quantity and/or selection of our word substitution pairs (Table 4), which were taken from [21]. While it is possible to improve upon our substitution pairs, creating a complete list of gendered words as used in vacancies and resumes is challenging if not unfeasible, especially in multiple languages.

In contrast, the adversarial approach improved both statistical parity and the wage gap in our data. Using the adversarial representations, prediction of gender dropped from an AUC of 0.94 to 0.82, while performance of industry group prediction, in terms of accuracy, dropped only minimally (Table 1). However, the true positive rate was decreased, indicating that performance was affected. These results are linked and can be adapted by changing the α and β parameters in Equation 1: more gender-neutral embeddings will likely lead to improved statistical parity but decreased industry prediction performance. Since statistical parity balances the rate of positive predictions rather than the true positive rate, the false positive and false negative rates are likely to be affected.

Our analysis reveals that ignoring the presence of bias in recruitment texts that are used to match resumes and vacancies could lead to severe unwanted discriminatory behaviour. The original representations produced a wage gap of €1680 per year between the female group and the male group. The adversarial representations reduced this wage gap to a statistically insignificant difference. This result is especially important because it shows that the adversarial representations did not just perform better on selected in-vitro metrics, but also improved fairness in a real application. This suggests that the adversarial representations do not remove bias only “cosmetically” [27], but instead are effective for improving fairness in job recommendation. The adversarial method increased the mean salary for both the female group and the male group, with a higher increase for the female group, which balances the gap. This is a positive outcome, as the method did not sacrifice the salaries of one of the groups in order to reduce the wage gap.

This work was limited by several factors. Firstly, while the fairness of job recommendations was assessed, the quality of the recommendations could not be evaluated due to unavailability of data. This was mitigated by performing a related classification task: predicting which industry groups a job seeker is interested in based on their resume. The accuracy of 0.89 on this task suggests that salient information relevant to job placement has been preserved. However, since the true positive rate was impacted, it seems likely that the recall of the recommendations would be impacted too. Secondly, the recommender system used to suggest jobs based on representation distances was relatively simple; if job-to-resume association data were available, a more complex solution might be preferable. Thirdly, because gender was inferred for this research, it was not possible to include non-binary gender identities [28]. Since this group is vulnerable to employment discrimination [29, 30, 31], it should not be overlooked and more research here is needed. Fourthly, the results reported in this research use only word2vec document embeddings; other types of embeddings are not considered. Lastly, training of the models was performed using a fixed split instead of cross-validation, which was infeasible due to time and costs. However, the results are likely to be representative given the large size of the data set.

The strengths of our work include the application of adversarial debiasing for fairness-aware machine learning on real and large industry data. While adversarial debiasing for fairness is not novel [14, 17, 18, 32, 33], applications are generally limited to publicly available benchmark data sets, which makes it difficult to assess their applicability to real-world recommendation systems. Our work is one of the first to show the results of adversarial fairness in a real, industrial-scale system. In addition, this research obtained an acceptable trade-off between fairness and performance for a complex multi-label classification task. Finally, this work showed that the adversarial approach eliminated the female/male wage gap in our job recommendations, even though it was not trained for this task.

In conclusion, this work identified gender bias in word representations and salary associations based on recruitment industry texts and successfully applied adversarial debiasing to combat gender bias in job recommendation. With adversarial representations, the mean female/male wage gap was no longer statistically significant, having been reduced by 89% from €1680 to €180 annually. Our results show that adversarial debiasing of word representations can increase the real-world fairness of recommendation systems and thus may contribute to creating fairness-aware machine learning systems.
References

[1] A. Köchling, M. C. Wehner, Discriminated by an algorithm: A systematic review of discrimination and fairness by algorithmic decision-making in the context of HR recruitment and HR development, Business Research 13 (2020) 795–848. doi:10.1007/s40685-020-00134-w.
[2] L. Thijssen, B. Lancee, S. Veit, R. Yemane, Discrimination against Turkish minorities in Germany and the Netherlands: field experimental evidence on the effect of diagnostic information on labour market outcomes, Journal of Ethnic and Migration Studies 47 (2021) 1222–1239. doi:10.1080/1369183X.2019.1622793.
[3] P. Bisschop, B. ter Weel, J. Zwetsloot, Ethnic employment gaps of graduates in the Netherlands, De Economist 168 (2020) 577–598. doi:10.1007/s10645-020-09375-w.
[4] M. Ramos, L. Thijssen, M. Coenders, Labour market discrimination against Moroccan minorities in the Netherlands and Spain: A cross-national and cross-regional comparison, Journal of Ethnic and Migration Studies 47 (2021) 1261–1284. doi:10.1080/1369183X.2019.1622824.
[5] E. Matteazzi, A. Pailhé, A. Solaz, Part-time employment, the gender wage gap and the role of wage-setting institutions: Evidence from 11 European countries, European Journal of Industrial Relations 24 (2018) 221–241. doi:10.1177/0959680117738857.
[6] G. Ciminelli, C. Schwellnus, B. Stadler, Sticky floors or glass ceilings? The role of human capital, working time flexibility and discrimination in the gender wage gap, OECD Economics Department Working Papers (2021). doi:10.1787/02ef3235-en.
[7] A. Chouldechova, A. Roth, A snapshot of the frontiers of fairness in machine learning, Communications of the ACM 63 (2020) 82–89. doi:10.1145/3376898.
[8] M. Ali, P. Sapiezynski, M. Bogen, A. Korolova, A. Mislove, A. Rieke, Discrimination through optimization: How Facebook's ad delivery can lead to biased outcomes, Proceedings of the ACM on Human-Computer Interaction 3 (2019) 1–30. doi:10.1145/3359301.
[9] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT: Pre-training of deep bidirectional transformers for language understanding, arXiv preprint (2018). doi:10.48550/arXiv.1810.04805.
[10] T. Mikolov, K. Chen, G. Corrado, J. Dean, Efficient estimation of word representations in vector space, Proceedings of Workshop at ICLR 2013 (2013). doi:10.48550/arXiv.1301.3781.
[11] K. Kurita, N. Vyas, A. Pareek, A. W. Black, Y. Tsvetkov, Measuring bias in contextualized word representations, arXiv preprint (2019). doi:10.48550/arXiv.1906.07337.
[12] T. Bolukbasi, K.-W. Chang, J. Y. Zou, V. Saligrama, A. T. Kalai, Man is to computer programmer as woman is to homemaker? Debiasing word embeddings, Advances in Neural Information Processing Systems 29 (2016). doi:10.48550/arXiv.1607.06520.
[13] A. Beutel, J. Chen, Z. Zhao, E. H. Chi, Data decisions and theoretical implications when adversarially learning fair representations, arXiv preprint (2017). doi:10.48550/arXiv.1707.00075.
[14] H. Edwards, A. J. Storkey, Censoring representations with an adversary, in: Y. Bengio, Y. LeCun (Eds.), 4th International Conference on Learning Representations, ICLR 2016, San Juan, Puerto Rico, May 2-4, 2016, Conference Track Proceedings, 2016, pp. 1–14. doi:10.48550/arXiv.1511.05897.
[15] D. Xu, S. Yuan, L. Zhang, X. Wu, FairGAN+: Achieving fair data generation and classification through generative adversarial nets, in: 2019 IEEE International Conference on Big Data (Big Data), 2019, pp. 1401–1406. doi:10.1109/BigData47090.2019.9006322.
[16] D. Madras, E. Creager, T. Pitassi, R. Zemel, Learning adversarially fair and transferable representations, in: J. Dy, A. Krause (Eds.), Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, PMLR, 2018, pp. 3384–3393. URL: https://proceedings.mlr.press/v80/madras18a.html. doi:10.48550/arXiv.1802.06309.
[17] C. Wu, F. Wu, X. Wang, Y. Huang, X. Xie, Fairness-aware news recommendation with decomposed adversarial learning, in: Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, 2021, pp. 4462–4469. doi:10.48550/arXiv.2006.16742.
[18] H. Liu, N. Zhao, X. Zhang, H. Lin, L. Yang, B. Xu, Y. Lin, W. Fan, Dual constraints and adversarial learning for fair recommenders, Knowledge-Based Systems 239 (2022) 108058. doi:10.1016/j.knosys.2021.108058.
[19] I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, Y. Bengio, Generative adversarial networks, arXiv preprint (2014). doi:10.48550/arXiv.1406.2661.
[20] O. Özdenizci, Y. Wang, T. Koike-Akino, D. Erdoğmuş, Learning invariant representations from EEG via adversarial inference, IEEE Access 8 (2020) 27074–27085. doi:10.1109/ACCESS.2020.2971600.
[21] A. Caliskan, J. Bryson, A. Narayanan, Semantics derived automatically from language corpora contain human-like biases, Science 356 (2017) 183–186. doi:10.1126/science.aal4230.
[22] N. Mehrabi, F. Morstatter, N. Saxena, K. Lerman, A. Galstyan, A survey on bias and fairness in machine learning, ACM Computing Surveys 54 (2021). doi:10.1145/3457607.
[23] F. Calanca, L. Sayfullina, L. Minkus, C. Wagner, E. Malmi, Responsible team players wanted: an analysis of soft skill requirements in job advertisements, EPJ Data Science 8 (2019) 1–20. doi:10.1140/epjds/s13688-019-0190-z.
[24] P. Parasurama, J. Sedoc, Gendered information in resumes and its role in algorithmic and human hiring bias, in: Academy of Management Proceedings, volume 2022, Academy of Management Briarcliff Manor, NY 10510, 2022, p. 17133. doi:10.5465/AMBPP.2022.285.
[25] R. Hall Maudslay, H. Gonen, R. Cotterell, S. Teufel, It's all in the name: Mitigating gender bias with name-based counterfactual data substitution, in: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Association for Computational Linguistics, Hong Kong, China, 2019, pp. 5267–5275. doi:10.18653/v1/D19-1530.
[26] Y. Pruksachatkun, S. Krishna, J. Dhamala, R. Gupta, K.-W. Chang, Does robustness improve fairness? Approaching fairness with word substitution robustness methods for text classification, in: Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, 2021, pp. 3320–3331. doi:10.48550/arXiv.2106.10826.
[27] H. Gonen, Y. Goldberg, Lipstick on a pig: Debiasing methods cover up systematic gender biases in word embeddings but do not remove them, in: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Association for Computational Linguistics, Minneapolis, Minnesota, 2019, pp. 609–614. doi:10.18653/v1/N19-1061.
[28] C. Richards, W. P. Bouman, L. Seal, M. J. Barker, T. O. Nieder, G. T'Sjoen, Non-binary or genderqueer genders, International Review of Psychiatry 28 (2016) 95–102. doi:10.3109/09540261.2015.1106446.
[29] J. Harrison, J. Grant, J. L. Herman, A gender not listed here: Genderqueers, gender rebels, and otherwise in the national transgender discrimination survey, LGBTQ Public Policy Journal at the Harvard Kennedy School 2 (2012) 13. URL: https://escholarship.org/uc/item/2zj46213.
[30] S. Davidson, Gender inequality: Nonbinary transgender people in the workplace, Cogent Social Sciences 2 (2016) 1236511. doi:10.1080/23311886.2016.1236511.
[31] A. A. Fogarty, L. Zheng, Gender ambiguity in the workplace: Transgender and gender-diverse discrimination, ABC-CLIO, 2018. URL: https://publisher.abc-clio.com/9781440863233/.
[32] B. H. Zhang, B. Lemoine, M. Mitchell, Mitigating unwanted biases with adversarial learning, in: Proceedings of the 2018 AAAI/ACM Conference on AI, Ethics, and Society, 2018, pp. 335–340. doi:10.1145/3278721.3278779.
[33] P. Sattigeri, S. C. Hoffman, V. Chenthamarakshan, K. R. Varshney, Fairness GAN: Generating datasets with fairness properties using a generative adversarial network, IBM Journal of Research and Development 63 (2019) 3–1. doi:10.1147/JRD.2019.2945519.

A. Industry and resume distributions

Table 3 shows the distribution of the samples in each industry group over the whole data set. In more than half of the industries, the resumes from the female group are under-represented. The “Technology” industry has the fewest samples from the female group. There are also industries where the resumes from the female group are over-represented, such as the “Administration/Secretarial” and the “Customer service/Call centre/Front office” industries, where resumes from the male group are outnumbered by more than three to one.

B. Word-substitution debiasing method

Table 4 shows the substitutions of the gendered words with the neutral words for both English and Dutch.
Table 3
Distribution of samples over each industry group. Counts and percentages per industry group do not sum to the expected
totals, because job seekers were free to select multiple groups. “F-M Ratio” represents the ratio between the number of females
within an industry group and the number of males.
Male Female Total F-M Ratio
Overall 467173 437403 904576 0.93
Administration/Secretarial 45293 (9%) 167585 (38%) 212878 (23%) 3.7
Automation/Internet 49527 (10%) 8547 (1%) 58074 (6%) 0.17
Policy/Executive 40086 (8%) 33541 (7%) 73627 (8%) 0.83
Security/Defence/Police 23134 (4%) 8821 (2%) 31955 (3%) 0.38
Commercial/Sales 92801 (19%) 66461 (15%) 159262 (17%) 0.71
Consultancy/Advice 69914 (14%) 42245 (9%) 112159 (12%) 0.6
Design/Creative/Journalism 19279 (4%) 24839 (5%) 44118 (4%) 1.28
Management 67412 (14%) 32153 (7%) 99565 (11%) 0.48
Financial/Accounting 34233 (7%) 25523 (5%) 59756 (6%) 0.74
Financial services 34342 (7%) 29882 (6%) 64224 (7%) 0.87
Catering/Retail 44647 (9%) 60588 (13%) 105235 (11%) 1.35
HR/Training 26852 (5%) 53679 (12%) 80531 (8%) 1.99
Procurement/Logistics/Transport 99429 (21%) 29677 (6%) 129106 (14%) 0.29
Legal 8638 (1%) 18488 (4%) 27126 (2%) 2.14
Customer service/Call centre/Front office 20000 (4%) 71090 (16%) 91090 (10%) 3.55
Marketing/PR/Communications 46832 (10%) 58598 (13%) 105430 (11%) 1.25
Medical/Healthcare 24018 (5%) 85414 (19%) 109432 (12%) 1.25
Education/Research/Science 38430 (8%) 66318 (15%) 104748 (11%) 1.72
Other 86749 (18%) 82728 (18%) 169477 (18%) 0.95
Production/Operational 77790 (5%) 25452 (25%) 103242 (11%) 0.32
Technology 102798 (22%) 9097 (2%) 111895 (12%) 0.08
Table 4
Substitutions of gendered words with neutral words used in the word-substitution debiasing method in both English (top) and
Dutch (bottom).
Male Word Female Word Neutral Word
he she they
his hers theirs
himself herself themselves
male female person
boy girl person
man woman person
hij zij/ze u
zijn haar uw
hijzelf zijzelf uzelf
jongen meisje persoon
man vrouw persoon