<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>RecSys in HR’22: The 2nd Workshop on Recommender Systems for Human Resources</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Closing the Gender Wage Gap: Adversarial Fairness in Job Recommendation</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Clara Rus</string-name>
          <email>clara.rus@ru.nl</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jeffrey Luppes</string-name>
          <email>jeffrey.luppes@dpgmedia.nl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Harrie Oosterhuis</string-name>
          <email>harrie.oosterhuis@ru.nl</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Gido H. Schoenmacker</string-name>
          <email>gido.schoenmacker@dpgmedia.nl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>DPG Media Online Services</institution>
          ,
          <addr-line>Jacob Bontiusplaats 9, 1018LL, Amsterdam</addr-line>
          ,
          <country country="NL">the Netherlands</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Radboud University</institution>
          ,
          <addr-line>Houtlaan 4, 6525XZ, Nijmegen</addr-line>
          ,
          <country country="NL">the Netherlands</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2022</year>
      </pub-date>
      <abstract>
        <p>The goal of this work is to help mitigate the already existing gender wage gap by supplying unbiased job recommendations based on resumes from job seekers. We employ a generative adversarial network to remove gender bias from word2vec representations of 12M job vacancy texts and 900k resumes. Our results show that representations created from recruitment texts contain algorithmic bias and that this bias results in real-world consequences for recommendation systems. Without controlling for bias, women are recommended jobs with significantly lower salary in our data. With adversarially fair representations, this wage gap disappears, meaning that our debiased job recommendations reduce wage discrimination. We conclude that adversarial debiasing of word representations can increase real-world fairness of systems and thus may be part of the solution for creating fairness-aware recommendation systems.</p>
      </abstract>
      <kwd-group>
        <kwd>Generative adversarial networks</kwd>
        <kwd>Fairness-aware machine learning</kwd>
        <kwd>Recruitment</kwd>
        <kwd>Gender bias</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        The recruitment industry relies more and more on automation for processing, searching, and matching job vacancies to job seekers. However, automation of the recruitment process can lead to discriminatory results with respect to certain groups, based on gender, ethnicity, or age [
        <xref ref-type="bibr" rid="ref13 ref16 ref29">1</xref>
        ]. Inequality in employment and remuneration still exists between, for example, ethnic groups [
        <xref ref-type="bibr" rid="ref1 ref5">2, 3, 4</xref>
        ] and gender groups [5, 6]; thus, naive implementations of AI recruitment systems are at risk of copying and perpetuating these inequalities.
      </p>
    </sec>
    <sec id="sec-2">
      <p>
        One reason for an algorithm to show discriminatory behaviour is the input data [7]. If the data is under-representative or if historical bias is present, then the system can propagate this in its predictions [
        <xref ref-type="bibr" rid="ref13 ref16 ref29">1</xref>
        ]. Ignoring the presence of bias in the data can perpetuate existing (gender) stereotypes and inequalities in employment.
      </p>
    </sec>
    <sec id="sec-3">
      <p>
        Examples of systems that have shown biased behaviour with respect to gender include the Amazon recruitment system (https://www.reuters.com/article/idUSKCN1MK08G) and the Facebook Ad algorithm [8]. Also, widely used models such as BERT [
        <xref ref-type="bibr" rid="ref26">9</xref>
        ] and word2vec [10] have been shown to create biased representations [11, 12].
      </p>
    </sec>
    <sec id="sec-4">
      <p>Obtaining fair representations could eliminate the bias from the resulting predictions [13].</p>
      <p>RecSys in HR’22: The 2nd Workshop on Recommender Systems for Human Resources, in conjunction with the 16th ACM Conference on Recommender Systems.</p>
    </sec>
    <sec id="sec-5">
      <p>One way to learn debiased representations is through adversarial learning. State-of-the-art adversarial debiasing methods [14, 15, 16, 17, 18] rely on the same general approach as generative adversarial networks [19]: a generator model is trained to produce new data representations that are critiqued by an adversary neural network.</p>
    </sec>
    <sec id="sec-6">
      <p>The adversary tries to predict the sensitive variable (in our case, gender) from the produced representation. By training the representations together with an adversary and a classifier, they are aimed to be both fair and useful for the task.</p>
      <p>This work is motivated by the desire to supply unbiased job recommendations to job seekers. We focus specifically on mitigating gender bias in word embeddings obtained from recruitment texts using adversarial learning. Our work adds to existing research by applying state-of-the-art debiasing [14, 20] to industrial-sized free-format recruitment textual data. Firstly, we investigate gender bias in the existing representations and the unfairness it results in. Secondly, we apply two debiasing methods to create new representations. These methods balance multi-label classification, to ensure that task-relevant information has been preserved, with an adversarial setup that attempts to remove the effects of gender bias. The resulting new representations are tested in a job recommendation setting where the difference in wage between jobs recommended based on female/male resumes is evaluated.</p>
      <p>To summarize, our contributions are three-fold: (i) we investigate gender bias in representations of industrial-sized free-format recruitment textual data; (ii) we show whether debiased representations help achieve fairness and performance on a multi-label classification task; and (iii) to the authors’ best knowledge, we are the first to successfully apply debiased representations to help solve the gender wage gap in a job recommendation setting. Moreover, our implementation of the adversarial debiasing method is publicly available.</p>
      <p>In the next section, our data and methods are described in detail. After that, the results are presented. Lastly, these results are discussed together with our final conclusions and suggestions for future directions.</p>
    </sec>
    <sec id="sec-method">
      <title>2. Data and Method</title>
      <sec id="sec-method-1">
        <title>2.1. Data</title>
        <p>The recruitment data set used throughout this research consists of job vacancies and job seeker information provided by DPG Recruitment. Job vacancy information included (i) salary ranges, (ii) working hours, and (iii) anonymised free-format job vacancy texts. In total there are 12 million vacancies.</p>
        <p>Job seeker information consisted of (i) one or more industry group(s) that the job seeker expressed interest in (out of a total of 21 pre-defined groups), (ii) inferred dichotomous gender, and (iii) anonymised free-format resume texts. Gender of the job seeker was inferred based on first name. From the total of available resumes, entries with missing data (65%) or an ambiguous first name (3%) were excluded, leaving 904,576 (32%) complete resumes with a female-to-male ratio of 0.93. Anonymisation included removal of all names (including company names), dates, addresses, telephone numbers, email addresses, websites, and other contact information. A more complete overview of this data is given in Appendix A.</p>
        <p>Both vacancy and resume texts were embedded into 300-dimensional word vectors using a word2vec [10] model trained on all vacancy texts. Finally, each text was represented as the mean over the embeddings of the words composing the text.</p>
      </sec>
      <sec id="sec-method-2">
        <p>Secondly, we applied the adversarial approach as proposed by Edwards and Storkey [14]. This method consists of three neural network components: a generator, a classifier, and an adversary. Inspired by Özdenizci et al. [20], we chose the following architecture: the generator is a multilayer perceptron with three hidden layers of 128 neurons that outputs a 300-dimensional vector representing the new representations. The classifier and the adversary have one hidden layer of 128 neurons. The output dimension of the classifier is 21 (industry group classes), and the output dimension of the adversary is one (gender). An architecture schematic is included as Figure 1.</p>
        <p>The generator creates new representations for the classification task, while the adversary attempts to predict the sensitive variable gender from these new representations. The goal of the generator is to create representations that can fool the adversary in such a way that the sensitive variable can no longer be predicted, while also obtaining a good performance on the classification task. The classification task is considered to be a multi-label task of 21 classes, predicting the industry group(s) for each job seeker. This means that the classification loss should be minimized while the adversarial loss should be maximized. The final loss (Equation 1) of the model is a weighted sum of the classification loss L_C and the adversarial loss L_A, where Z are the newly generated representations, Y′ are the predictions of the classifier and S′ are the predictions of the adversary:</p>
        <p>L = α · L_C(Y, Y′) + β · L_A(S, S′). (1)</p>
      </sec>
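      <p>For concreteness, the objective in Equation 1 can be written down in a few lines of numpy. This is an illustrative sketch, not the released implementation: it assumes binary cross-entropy for both the multi-label classification term and the adversarial term, with α = β = 1 as in the experimental setup.</p>
      <p>
```python
import numpy as np

def bce(targets, preds, eps=1e-7):
    # Mean binary cross-entropy over samples and labels.
    p = np.clip(preds, eps, 1.0 - eps)
    return float(np.mean(-(targets * np.log(p) + (1.0 - targets) * np.log(1.0 - p))))

def final_loss(y, y_hat, s, s_hat, alpha=1.0, beta=1.0):
    # Equation 1: weighted sum of the classification loss (21 industry
    # groups, multi-label) and the adversarial loss (gender). In the
    # adversarial game, the generator minimises the classification term
    # while trying to push the adversary's term up.
    return alpha * bce(y, y_hat) + beta * bce(s, s_hat)
```
      </p>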
    </sec>
    <sec id="sec-7">
      <p>We will call representations created by this method “adversarial representations”. Because the adversarial process could be unstable, all results pertaining to these are the mean of 5 independent complete training runs.</p>
      <sec id="sec-7-1">
        <title>2.2. Bias and debiasing</title>
      </sec>
    </sec>
    <sec id="sec-8">
      <p>Previous research has shown that popular models such as BERT [9] and word2vec [10] can create biased representations [11, 12, 21]. In this work, two debiasing methods were employed to combat this bias.</p>
      <p>Firstly, to create a simple baseline, we attempt to debias the representations by replacing gendered words with neutral words. For example, the gendered pronouns “she”/“he” and “her”/“his” are replaced with the neutral pronouns “they” and “theirs”. Gendered words such as “woman”/“man” and “girl”/“boy” are replaced with the word “person”. The full list of substitutions can be found in Appendix B. A new word2vec model was trained on this augmented corpus, resulting in new representations for both the resumes and the vacancies. In the remaining text, “original representations” will refer to the representations trained on the original texts, whereas “word-substitution representations” will refer to the representations trained on the altered texts.</p>
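      <p>As an illustration of this baseline, a word-substitution pass might look as follows. This is a hedged sketch: the substitution table shown is an example subset, not the full list from Appendix B.</p>
      <p>
```python
import re

# Illustrative subset of the substitution pairs; the paper's full list
# is given in Appendix B, so the exact pairs here are assumptions.
SUBSTITUTIONS = {
    "she": "they", "he": "they",
    "her": "theirs", "his": "theirs",
    "woman": "person", "man": "person",
    "girl": "person", "boy": "person",
}

# Note: "woman" is listed before "man" so the longer token wins.
PATTERN = re.compile(r"\b(" + "|".join(SUBSTITUTIONS) + r")\b", re.IGNORECASE)

def neutralise(text):
    # Replace gendered tokens case-insensitively, leaving other words intact.
    return PATTERN.sub(lambda m: SUBSTITUTIONS[m.group(0).lower()], text)
```
      </p>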
      <sec id="sec-8-1">
        <title>2.3. Evaluation</title>
      </sec>
    </sec>
    <sec id="sec-9">
      <p>Classifiers for both the industry groups and the sensitive variable are evaluated in terms of accuracy and area under the receiver operating characteristic curve (AUC). Fairness was evaluated using statistical parity [22]:</p>
      <p>P(C(X) = 1 | S = 1) − P(C(X) = 1 | S = 0) &lt; ε. (2)</p>
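      <p>The statistical parity difference in Equation 2 can be computed directly from binary predictions and the sensitive attribute; a minimal numpy sketch (not the paper’s evaluation code):</p>
      <p>
```python
import numpy as np

def statistical_parity_diff(y_pred, s):
    # P(C(X)=1 | S=1) - P(C(X)=1 | S=0) for binary predictions y_pred
    # and a binary sensitive attribute s (Equation 2); fairness requires
    # the absolute value to stay below a small epsilon.
    y_pred, s = np.asarray(y_pred), np.asarray(s)
    return float(y_pred[s == 1].mean() - y_pred[s == 0].mean())
```
      </p>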
    </sec>
    <sec id="sec-10">
      <p>In the recruitment industry, if a system designed to match resumes and vacancies perpetuates biased associations, it could lead to a wage gap between the salaries of women and men [23].</p>
    </sec>
    <sec id="sec-11">
      <title>3. Results</title>
      <sec id="sec-11-0">
        <title>3.1. Prediction of the sensitive variable</title>
        <p>Firstly, the discriminatory power to predict the sensitive variable gender was tested using the original, word-substitution, and adversarial representations. Using the original representations, gender is predicted with 94% AUC and an accuracy of 86%; the word-substitution representations result in a 93% AUC and an accuracy of 85%; lastly, the adversarial representations lowered both the accuracy and the AUC to 82%.</p>
      </sec>
      <sec id="sec-11-1">
        <title>3.2. Prediction of industry group</title>
      </sec>
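      <p>The AUC values reported here correspond to the rank-based (Mann–Whitney) formulation, which can be computed without any ML library; a small sketch, not the paper’s evaluation code:</p>
      <p>
```python
import numpy as np

def auc(scores, labels):
    # Probability that a randomly chosen positive example receives a
    # higher score than a randomly chosen negative one, counting ties
    # as one half (the rank-based definition of ROC AUC).
    scores, labels = np.asarray(scores, float), np.asarray(labels)
    pos, neg = scores[labels == 1], scores[labels == 0]
    diff = pos[:, None] - neg[None, :]
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size)
```
      </p>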
    </sec>
    <sec id="sec-12">
      <p>Secondly, the information contents and statistical parity of the three representation types were tested by attempting to predict the function group based on resume representation. Table 1 shows the results obtained in terms of performance and statistical parity.</p>
      <p>Training a classifier with the original representations of the resumes obtained a statistical parity of 0.076. The word-substitution representations obtained similar results. Using an adversarial approach improved the statistical parity by 21%, at the cost of lowering the accuracy by 2 percentage points and the true positive rate by 16 percentage points.</p>
      <p>Figure 1: Architecture of the adversarial setup. The left section (green) represents the generator, consisting of an input layer (d=300) for the word2vec representations, three hidden layers (d=128), and an output layer (d=300) for the debiased representations. The top section (blue) represents the classifier, consisting of a hidden layer (d=128) and an output layer Ŷ (d=21) encoding the industry groups. The bottom section (red) represents the adversary, consisting of a hidden layer (d=128) and an output neuron Ŝ (d=1) encoding the sensitive variable gender.</p>
      <sec id="sec-12-1">
        <title>3.3. Salary Association Test</title>
        <p>To specifically test differences in salary, a salary association test was performed between the representations of the resumes and of the vacancies. Using the embeddings of the resumes and the vacancies, the L2 distance matrix was computed and each resume was matched to the closest vacancy. The salary distribution of the matched vacancies of the female-inferred group was compared with that of the male-inferred group.</p>
        <p>2.4. Experimental Setup</p>
        <p>The train split was created by taking 30% random samples for the validation split; the rest of the full data is used for training. The full data set was not used for the salary association due to computational limitations. Instead, 10,000 resumes were associated with all jobs from the time period June 2020–June 2021 that provided salary information. This resulted in 23,501 total vacancies. All experiments were conducted using a fixed 70-30% split and the Adam optimizer with a learning rate of 1e−5. For all components the binary cross-entropy loss was used. Parameters of the final loss (Equation 1) are set as α = 1, β = 1. The implementation of the adversarial debiasing method can be found at: https://github.com/ClaraRus/Debias-Embeddings-Recruitment.</p>
        <p>Thirdly, a salary association test was performed using the three representation types. Table 2 describes the salary distribution of the female and male groups for each debiasing method. In the female group there are 4827 samples and in the male group 5173.</p>
        <p>Using the original representations, female-inferred resumes were associated with a mean salary of €25.28 per hour, whereas male-inferred resumes were associated with a mean salary of €26.09 per hour, which is significantly (p&lt;1e−5) higher. This results in an estimated average annual wage gap of €1680.</p>
        <p>Using the word-substitution representations, female-inferred resumes were associated with a mean salary of €25.19 per hour, whereas male-inferred resumes were associated with a mean salary of €26.14 per hour. The difference between the means of the female group and the male group increased, broadening the annual wage gap to €1900 (with a significant difference between groups, p&lt;1e−7).</p>
        <p>With the adversarial representations, female-inferred resumes were associated with a mean salary of €27.06 an hour, whereas male-inferred resumes were associated with a mean salary of €27.15 an hour. Using the adversarial method to generate fair representations for both the resumes and the vacancies decreased the mean gap, lowering the annual wage gap to €180. The female/male difference was now non-significant (p = 0.47).</p>
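        <p>The matching step of the salary association test (nearest vacancy by L2 distance, then group-wise mean salary) can be sketched with synthetic data; all embeddings and salaries below are illustrative, not the paper’s data:</p>
        <p>
```python
import numpy as np

rng = np.random.default_rng(0)
resumes = rng.normal(size=(6, 4))      # toy resume embeddings
vacancies = rng.normal(size=(5, 4))    # toy vacancy embeddings
salary = np.array([20.0, 22.0, 24.0, 26.0, 28.0])  # hourly salary per vacancy
group = np.array([0, 1, 0, 1, 0, 1])   # inferred group per resume

# Full L2 distance matrix (resumes x vacancies); match each resume to
# its closest vacancy, then compare mean matched salaries per group.
dist = np.linalg.norm(resumes[:, None, :] - vacancies[None, :, :], axis=-1)
closest = dist.argmin(axis=1)
matched = salary[closest]
gap = matched[group == 1].mean() - matched[group == 0].mean()
```
        </p>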
        <p>Table 2 shows the mean salary per hour for each industry group. Ideally, females and males belonging to the same industry group should have similar salaries. The word-substitution representations lowered the wage gap in 13 of the industry groups by €620 per year, while increasing the gap for 7 by an average of €460 per year. For “Financial/Accounting” there is no change in the salary association. The adversarial method lowered the wage gap in 16 out of the 21 industries by an average of €2160 per year, but it increased the gap in the rest of the industries by an average of €780 per year.</p>
        <p>4. Discussion and Conclusion</p>
        <p>This work focused on removing gender bias from word embeddings of vacancy texts and resumes with the goal of creating debiased job recommendations. It showed that gender can be predicted extremely well from anonymised resume embeddings and that naive resume-to-job recommendations based on these embeddings can perpetuate the “wage gap” that exists between women and men. Adversarial debiasing improved statistical parity for industry classification based on resume and eliminated the female/male salary difference in job recommendations. This suggests that adversarial debiasing can help make fairer recommendations in realistic scenarios.</p>
        <p>Our analysis reveals that ignoring the presence of bias in recruitment texts that are used to match resumes and vacancies could lead to severe unwanted discriminatory behaviour. The original representations produced a wage gap of €1680 per year between the female group and the male group. The adversarial representations eliminated this wage gap to a statistically insignificant difference. This result is especially important because it shows that the adversarial representations did not just perform better on selected in-vitro metrics, but also improved fairness in a real application. This suggests that the adversarial representations do not remove bias only “cosmetically” [27], but instead are effective for improving fairness in job recommendation. The adversarial method increased the mean salary for both the female group and the male group, with a higher increase for the female group to balance the gap. This is a positive outcome, as the method did not sacrifice the salaries of one of the groups in order to reduce the wage gap.</p>
        <p>Our results indicate that anonymisation alone is not enough to remove indirect information about the gender of the job seeker. Namely, from our 900k anonymised resumes, gender could be predicted with an AUC of 0.94. This exceeds similar results that have been shown on a smaller data set (AUC=0.81) [24]. This is a common problem in fairness-aware machine learning, where removal of directly sensitive information is undermined by correlated features that allow the sensitive information to be inferred [22].</p>
        <p>The difficulty of removing gender bias from language was further illustrated by our data augmentation attempt to substitute a selection of gendered words with neutral words before word2vec training. The resulting embeddings did not effect much change in any of our tests. Previous work on word-substitution data augmentation has been shown to be effective [25, 26], so it may be that our results are limited by the quantity and/or selection of our word-substitution pairs (Table 4), which were taken from [21]. While it is possible to improve upon our substitution pairs, creating a complete list of gendered words as used in vacancies and resumes is challenging if not unfeasible, especially in multiple languages.</p>
        <p>In contrast, the adversarial approach improved both statistical parity and the wage gap in our data. Using the adversarial representations, prediction of gender dropped from an AUC of 0.94 to 0.82, while the performance of industry group prediction, in terms of accuracy, dropped only minimally (Table 1). However, the true positive rate was decreased, indicating that performance was affected. These results are linked and can be adapted by changing the α and β parameters in Equation 1: more gender-neutral embeddings will likely lead to improved statistical parity but decreased industry prediction performance. Since statistical parity balances for equal true positive rate, the false positive and negative rates are likely to be affected.</p>
        <p>This work was limited by several factors. Firstly, while the fairness of job recommendations was assessed, the quality of recommendations could not be evaluated due to unavailability of data. This was mitigated by performing a related classification task: predicting which industry groups a job seeker is interested in based on resume. The accuracy of 0.89 on this task suggests that salient information relevant to job placement has been preserved. However, since the true positive rate was impacted, it seems likely that the recall of the recommendations would be impacted too. Secondly, the recommender system to suggest jobs based on representation distances was relatively simple; if job-to-resume association data were available, a more complex solution might be preferable. Thirdly, because gender was inferred for this research, it was not possible to include non-binary gender identities [28]. Since this group is vulnerable to employment discrimination [29, 30, 31], it should not be overlooked and more research here is needed. Fourthly, results reported in this research use only word2vec document embeddings; other types of embeddings are not considered. Lastly, training of the models was performed using a fixed split instead of cross-validation, which was infeasible due to time and costs. However, the results are likely to be representative given the large size of the data set.</p>
        <p>The strengths of our work include the application of adversarial debiasing for fairness-aware machine learning on real and large industry data. While adversarial debiasing for fairness is not novel [14, 17, 18, 32, 33], applications generally extend to publicly available benchmark data sets, which makes it difficult to assess applicability to real-world recommendation systems. Our work is one of the first to show the results of adversarial fairness in a real, industrial-scale system. In addition, this research obtained an acceptable trade-off between fairness and performance for a complex multi-label classification task.</p>
        <p>Finally, this work showed that the adversarial approach eliminated the female/male wage gap in our job recommendations, even though it was not trained for this task.</p>
        <p>In conclusion, this work identified gender bias in word representations and salary associations based on recruitment-industry texts and successfully applied adversarial debiasing to combat gender bias in job recommendation. With adversarial representations, the mean female/male wage gap was no longer statistically significant, being reduced by 89% from €1680 to €180 annually. Our results show that adversarial debiasing of word representations can increase the real-world fairness of recommendation systems and thus may contribute to creating fairness-aware machine learning systems.</p>
        <p>References</p>
        <p>[<xref ref-type="bibr" rid="ref13 ref16 ref29">1</xref>] A. Köchling, M. C. Wehner, Discriminated by an algorithm: A systematic review of discrimination and fairness by algorithmic decision-making in the context of HR recruitment and HR development, Business Research 13 (2020) 795–848. doi:10.1007/s40685-020-00134-w.</p>
        <p>[<xref ref-type="bibr" rid="ref1 ref5">2</xref>] L. Thijssen, B. Lancee, S. Veit, R. Yemane, Discrimination against Turkish minorities in Germany and the Netherlands: field experimental evidence on the effect of diagnostic information on labour market outcomes, Journal of Ethnic and Migration Studies 47 (2021) 1222–1239. doi:10.1080/1369183X.2019.1622793.</p>
        <p>[3] P. Bisschop, B. ter Weel, J. Zwetsloot, Ethnic employment gaps of graduates in the Netherlands, De Economist 168 (2020) 577–598. doi:10.1007/s10645-020-09375-w.</p>
        <p>[4] M. Ramos, L. Thijssen, M. Coenders, Labour market discrimination against Moroccan minorities in the Netherlands and Spain: A cross-national and cross-regional comparison, Journal of Ethnic and Migration Studies 47 (2021) 1261–1284. doi:10.1080/1369183X.2019.1622824.</p>
        <p>[5] E. Matteazzi, A. Pailhé, A. Solaz, Part-time employment, the gender wage gap and the role of wage-setting institutions: Evidence from 11 European countries, European Journal of Industrial Relations 24 (2018) 221–241. doi:10.1177/0959680117738857.</p>
        <p>[6] G. Ciminelli, C. Schwellnus, B. Stadler, Sticky floors or glass ceilings? The role of human capital, working time flexibility and discrimination in the gender wage gap, OECD Economics Department Working Papers (2021). doi:10.1787/02ef3235-en.</p>
        <p>[7] A. Chouldechova, A. Roth, A snapshot of the frontiers of fairness in machine learning, Communications of the ACM 63 (2020) 82–89. doi:10.1145/3376898.</p>
        <p>[8] M. Ali, P. Sapiezynski, M. Bogen, A. Korolova, A. Mislove, A. Rieke, Discrimination through optimization: How Facebook’s ad delivery can lead to biased outcomes, Proceedings of the ACM on Human-Computer Interaction 3 (2019) 1–30. doi:10.1145/3359301.</p>
        <p>[<xref ref-type="bibr" rid="ref26">9</xref>] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT: Pre-training of deep bidirectional transformers for language understanding, arXiv preprint (2018). doi:10.48550/arXiv.1810.04805.</p>
        <p>[10] T. Mikolov, K. Chen, G. Corrado, J. Dean, Efficient estimation of word representations in vector space, Proceedings of Workshop at ICLR 2013 (2013). doi:10.48550/arXiv.1301.3781.</p>
        <p>[11] K. Kurita, N. Vyas, A. Pareek, A. W. Black, Y. Tsvetkov, Measuring bias in contextualized word representations, arXiv preprint (2019). doi:10.48550/arXiv.1906.07337.</p>
        <p>[12] T. Bolukbasi, K.-W. Chang, J. Y. Zou, V. Saligrama, A. T. Kalai, Man is to computer programmer as woman is to homemaker? Debiasing word embeddings, Advances in Neural Information Processing Systems 29 (2016). doi:10.48550/arXiv.1607.06520.</p>
        <p>[13] A. Beutel, J. Chen, Z. Zhao, E. H. Chi, Data decisions and theoretical implications when adversarially learning fair representations, arXiv preprint (2017). doi:10.48550/arXiv.1707.00075.</p>
        <p>[14] H. Edwards, A. J. Storkey, Censoring representations with an adversary, in: Y. Bengio, Y. LeCun (Eds.), 4th International Conference on Learning Representations, ICLR 2016, San Juan, Puerto Rico, May 2-4, 2016, Conference Track Proceedings, 2016, pp. 1–14. doi:10.48550/arXiv.1511.05897.</p>
        <p>[15] D. Xu, S. Yuan, L. Zhang, X. Wu, FairGAN+: Achieving fair data generation and classification through generative adversarial nets, in: 2019 IEEE International Conference on Big Data (Big Data), 2019, pp. 1401–1406. doi:10.1109/BigData47090.2019.9006322.</p>
        <p>[16] D. Madras, E. Creager, T. Pitassi, R. Zemel, Learning adversarially fair and transferable representations, in: J. Dy, A. Krause (Eds.), Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, PMLR, 2018, pp. 3384–3393. URL: https://proceedings.mlr.press/v80/madras18a.html. doi:10.48550/arXiv.1802.06309.</p>
        <p>[17] C. Wu, F. Wu, X. Wang, Y. Huang, X. Xie, Fairness-aware news recommendation with decomposed adversarial learning, in: Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, 2021, pp. 4462–4469. doi:10.48550/arXiv.2006.16742.</p>
        <sec id="sec-12-1-1">
          <title>Appendix A</title>
          <p>Number of job seekers per industry group (21 pre-defined groups: Administration/Secretarial, Automation/Internet, Policy/Executive, Security/Defence/Police, Commercial/Sales, Consultancy/Advice, Design/Creative/Journalism, Management, Financial/Accounting, Financial services, Catering/Retail, HR/Training, Procurement/Logistics/Transport, Legal, Customer service/Call centre/Front office, Marketing/PR/Communications, Medical/Healthcare, Education/Research/Science, Other, Production/Operational, Technology), with counts and percentages for the male, female, and overall columns. Overall F-M ratio: 0.93.</p>
        </sec>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>[18] H. Liu, N. Zhao, X. Zhang, H. Lin, L. Yang, B. Xu, Y. Lin, W. Fan, Dual constraints and adversarial learning for fair recommenders, Knowledge-Based Systems 239 (2022) 108058. doi:10.1016/j.knosys.2021.108058.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>… (Long and Short Papers), Association for Computational Linguistics, Minneapolis, Minnesota, 2019, pp. 609–614. doi:10.18653/v1/N19-1061.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>[28] C. Richards, W. P. Bouman, L. Seal, M. J. Barker, T. O. Nieder, G. T’Sjoen, Non-binary or genderqueer genders, International Review of Psychiatry 28 (2016).</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <source>2 0 2 1 . 1 0</source>
          <volume>8 0 5 8</volume>
          . genders,
          <source>International Review of Psychiatry</source>
          <volume>28</volume>
          [19]
          <string-name>
            <given-names>I. J.</given-names>
            <surname>Goodfellow</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Pouget-Abadie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mirza</surname>
          </string-name>
          ,
          <string-name>
            <surname>B. Xu,</surname>
          </string-name>
          (
          <year>2016</year>
          )
          <fpage>95</fpage>
          -
          <lpage>102</lpage>
          .
          <source>doi:1 0 . 3 1</source>
          <volume>0 9 / 0 9 5 4 0 2 6 1 . 2 0 1 5 . 1 1 0 6 4 4 6 .</volume>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <given-names>D.</given-names>
            <surname>Warde-Farley</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ozair</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Courville</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Bengio</surname>
          </string-name>
          , [29]
          <string-name>
            <given-names>J.</given-names>
            <surname>Harrison</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Grant</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. L.</given-names>
            <surname>Herman</surname>
          </string-name>
          , A gender
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          (
          <year>2014</year>
          ).
          <article-title>doi:1 0 . 4 8 5 5 0 / a r X i v . 1 4 0 6 . 2 6 6 1 . otherwise in the national transgender discrimina</article-title>
          [20]
          <string-name>
            <given-names>O.</given-names>
            <surname>Özdenizci</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Koike-Akino</surname>
          </string-name>
          ,
          <string-name>
            <surname>D.</surname>
          </string-name>
          <article-title>Erdoğ- tion survey</article-title>
          ,
          <source>LGBTQ Public Policy Journal at the</source>
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <surname>muş</surname>
          </string-name>
          ,
          <source>Learning invariant representations from EEG Harvard Kennedy School</source>
          <volume>2</volume>
          (
          <year>2012</year>
          )
          <article-title>13</article-title>
          . URL: https:
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <article-title>via adversarial inference</article-title>
          ,
          <source>IEEE access 8</source>
          (
          <year>2020</year>
          ) //escholarship.org/uc/item/2zj46213.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          27074-
          <fpage>27085</fpage>
          .
          <source>doi:1 0 . 1 1 0</source>
          <string-name>
            <given-names>9</given-names>
            <surname>/ A C C E S S</surname>
          </string-name>
          .
          <volume>2 0 2 0 . 2 9 7 1 6 0 0</volume>
          . [30]
          <string-name>
            <given-names>S.</given-names>
            <surname>Davidson</surname>
          </string-name>
          , Gender inequality: Nonbinary trans[21]
          <string-name>
            <given-names>A.</given-names>
            <surname>Caliskan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bryson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Narayanan</surname>
          </string-name>
          ,
          <article-title>Semantics gender people in the workplace</article-title>
          , Cogent Social Sci-
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <article-title>derived automatically from language corpora con- ences 2 (</article-title>
          <year>2016</year>
          )
          <article-title>1236511</article-title>
          .
          <source>doi:1 0 . 1 0</source>
          <volume>8 0 / 2 3 3 1 1 8 8 6 . 2 0 1 6 .</volume>
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <article-title>tain human-like biases</article-title>
          ,
          <source>Science</source>
          <volume>356</volume>
          (
          <year>2017</year>
          )
          <fpage>183</fpage>
          -
          <lpage>186</lpage>
          .
          <fpage>1</fpage>
          <volume>2 3 6 5 1 1 .</volume>
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <source>doi:1 0 . 1 1 2</source>
          <article-title>6 / s c i e n c e . a a l 4 2 3 0</article-title>
          . [31]
          <string-name>
            <given-names>A. A.</given-names>
            <surname>Fogarty</surname>
          </string-name>
          , L. Zheng, Gender ambiguity in [22]
          <string-name>
            <given-names>N.</given-names>
            <surname>Mehrabi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Morstatter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Saxena</surname>
          </string-name>
          ,
          <string-name>
            <surname>K.</surname>
          </string-name>
          <article-title>Lerman, the workplace: Transgender and gender-diverse</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <string-name>
            <given-names>A.</given-names>
            <surname>Galstyan</surname>
          </string-name>
          ,
          <article-title>A survey on bias and fairness in discrimination</article-title>
          ,
          <source>ABC-CLIO</source>
          ,
          <year>2018</year>
          . URL: https://
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <article-title>machine learning</article-title>
          ,
          <source>ACM Comput. Surv</source>
          .
          <volume>54</volume>
          (
          <year>2021</year>
          ). publisher.abc-clio.com/9781440863233/.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          <source>doi:1 0 . 1 1</source>
          <volume>4 5 / 3 4 5 7 6 0 7</volume>
          . [32]
          <string-name>
            <given-names>B. H.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Lemoine</surname>
          </string-name>
          , M. Mitchell, Mitigat[23]
          <string-name>
            <given-names>F.</given-names>
            <surname>Calanca</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Sayfullina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Minkus</surname>
          </string-name>
          ,
          <string-name>
            <surname>C.</surname>
          </string-name>
          <article-title>Wagner, ing unwanted biases with adversarial learning,</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          <string-name>
            <given-names>E.</given-names>
            <surname>Malmi</surname>
          </string-name>
          ,
          <article-title>Responsible team players wanted: an in:</article-title>
          <source>Proceedings of the 2018 AAAI/ACM</source>
          Confer-
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          <article-title>analysis of soft skill requirements in job advertise- ence on AI, Ethics, and</article-title>
          <string-name>
            <surname>Society</surname>
          </string-name>
          ,
          <year>2018</year>
          , pp.
          <fpage>335</fpage>
          -
          <lpage>340</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          <string-name>
            <surname>ments</surname>
          </string-name>
          ,
          <source>EPJ Data Science</source>
          <volume>8</volume>
          (
          <year>2019</year>
          )
          <fpage>1</fpage>
          -
          <lpage>20</lpage>
          .
          <source>doi:1 0 . 1 1 4 0 / doi:1 0 . 1 1</source>
          <volume>4 5 / 3 2 7 8 7 2 1 . 3 2 7 8 7 7 9 .</volume>
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          <source>e p j d s / s 1 3</source>
          <volume>6 8 8 - 0 1 9 - 0 1 9</volume>
          <fpage>0</fpage>
          -
          <lpage>z</lpage>
          . [33]
          <string-name>
            <given-names>P.</given-names>
            <surname>Sattigeri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. C.</given-names>
            <surname>Hofman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Chenthamarakshan</surname>
          </string-name>
          , [24]
          <string-name>
            <given-names>P.</given-names>
            <surname>Parasurama</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Sedoc</surname>
          </string-name>
          ,
          <string-name>
            <surname>Gendered information K. R. Varshney</surname>
          </string-name>
          ,
          <string-name>
            <surname>Fairness</surname>
            <given-names>GAN</given-names>
          </string-name>
          : Generating datasets
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          <string-name>
            <surname>Proceedings</surname>
          </string-name>
          , volume
          <volume>2022</volume>
          , Academy of Manage- ment
          <volume>63</volume>
          (
          <year>2019</year>
          )
          <fpage>3</fpage>
          -
          <lpage>1</lpage>
          .
          <source>doi:1 0 . 1 1 4 7 / J R D . 2 0</source>
          <volume>1 9 . 2 9 4 5 5 1 9 .</volume>
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          <string-name>
            <surname>ment Briarclif</surname>
            <given-names>Manor</given-names>
          </string-name>
          , NY
          <volume>10510</volume>
          ,
          <year>2022</year>
          , p.
          <fpage>17133</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          doi:h t t p s : / / d o i .
          <source>o r g / 1 0 . 5 4 6</source>
          <string-name>
            <given-names>5</given-names>
            <surname>/ A M B P P .</surname>
          </string-name>
          <article-title>2 0 2 2 . 2 8 5</article-title>
          . [25]
          <string-name>
            <given-names>R. Hall</given-names>
            <surname>Maudslay</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Gonen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Cotterell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Teufel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Industry</surname>
          </string-name>
          and resumes
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          <source>Proceedings of the 2019 Conference on Empirical</source>
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          <article-title>Methods in Natural Language Processing and the Table 3 shows the distribution of the samples in each</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          <string-name>
            <surname>9th International Joint</surname>
          </string-name>
          <article-title>Conference on Natural Lan- industry group over the whole data set</article-title>
          .
          <source>In more than</source>
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          <year>2019</year>
          , pp.
          <fpage>5267</fpage>
          -
          <lpage>5275</lpage>
          .
          <source>doi:1 0 . 1 8</source>
          <volume>6 5 3</volume>
          / v 1 / D 1 9
          <article-title>- 1 5 3 0 . the fewest samples from the female group</article-title>
          . The are in[26]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Pruksachatkun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Krishna</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Dhamala</surname>
          </string-name>
          , R. Gupta,
          <article-title>dustries where the resumes from the female group are</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          <source>guistics: ACL-IJCNLP</source>
          <year>2021</year>
          ,
          <year>2021</year>
          , pp.
          <fpage>3320</fpage>
          -
          <lpage>3331</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          <source>doi:1 0 . 4 8 5 5</source>
          <article-title>0 / a r X i v . 2 1 0 6 . 1 0 8 2 6</article-title>
          . B. Word-substitution debiasing [27]
          <string-name>
            <given-names>H.</given-names>
            <surname>Gonen</surname>
          </string-name>
          ,
          <string-name>
            <surname>Y. Goldberg,</surname>
          </string-name>
          <article-title>Lipstick on a pig: Debiasing</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          <article-title>ings of the 2019 Conference of the North American Table 4 shows the substitutions of the gendered words</article-title>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>