<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Recommender Systems, September</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Mitigation in Candidate Recommender Systems with Fairness Gates</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Adam Mehdi Arafan</string-name>
          <email>adammehdiarafan@gmail.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>David Graus</string-name>
          <email>david.graus@randstadgroep.nl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Fernando P. Santos</string-name>
          <email>f.p.santos@uva.nl</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Emma Beauxis-Aussalet</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Randstad Groep Nederland</institution>
          ,
          <addr-line>Diemen</addr-line>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Amsterdam</institution>
          ,
          <addr-line>Amsterdam</addr-line>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Vrije Universiteit Amsterdam</institution>
          ,
          <addr-line>Amsterdam</addr-line>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2022</year>
      </pub-date>
      <volume>1</volume>
      <fpage>8</fpage>
      <lpage>23</lpage>
      <abstract>
        <p>Recommender Systems (RS) have proven successful in a wide variety of domains, and the human resources (HR) domain is no exception. RS proved valuable for recommending candidates for a position, although the ethical implications have recently been identified as high-risk by the European Commission. In this study, we apply RS to match candidates with job requests. The RS pipeline includes two fairness gates at two different steps: pre-processing (using GAN-based synthetic candidate generation) and post-processing (with greedily searched candidate re-ranking). While prior research studied fairness at the pre- and post-processing steps separately, our approach combines both in the same pipeline, applicable to the HR domain. We show that the combination of gender-balanced synthetic training data with pair re-ranking increased fairness with satisfactory levels of ranking utility. Our findings show that using only the gender-balanced synthetic data for bias mitigation is fairer by a negligible margin when compared to using real data. However, when implemented together with the pair re-ranker, candidate recommendation fairness improved considerably, while maintaining a satisfactory utility score. In contrast, using only the pair re-ranker achieved a similar fairness level, but had a consistently lower utility.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Despite the many benefits of ML-enabled tools, biases can occur and be amplified through the highly scalable nature of ML-enabled systems. Algorithms used in applications such as recidivism prediction, predictive policing, or facial recognition have revealed bias with respect to race, gender, or both [1, 2]. These biases can also arise through proxy (unobservable) correlations with sensitive attributes such as gender, and through poorly defined decision boundaries [3, 4]. […] candidate recommender systems (CRS). The goal of such a system is to recommend the best candidates for a specific job, often by computing ranked lists of candidates in descending order of relevance. A variety of fairness issues may arise from the large and diverse pools of candidates and job offers.</p>
      <p>In the case of the HR industry, bias in recommendations comes with a high risk of harm, as candidates can […] pipeline. We aim to close this gap by testing SOTA bias mitigation methods in both pre- and post-processing, and observing the impact on the fairness of candidate ranking. We propose a pipeline for a CRS that integrates two bias mitigation mechanisms (called Fairness Gates, FG) at the pre- and post-processing steps. By FG, we refer to the enforcement of bias mitigation techniques within the pipeline. The FGs are a synthetic data generator and a greedy re-ranker.</p>
      <p>The synthetic data generator enforces gender balance in the sampling size, while the greedy re-ranker optimizes for both utility (the quality or usefulness of candidate recommendations) and gender balance in candidate ranking. In this paper, we explore the fairness-utility trade-offs among re-ranked CRS outputs trained using synthetic data or only real data. More specifically, we explore the impacts and trade-offs between utility and fairness that arise from combining synthetic data generation at the pre-processing level with greedy pair re-ranking at the post-processing level.</p>
      <p>Our experimental results show that the best compromise between fairness and utility is achieved when combining the two FGs rather than using just one.</p>
      <p>RecSys in HR’22: The 2nd Workshop on Recommender Systems for Human Resources, in conjunction with the 16th ACM Conference on Recommender Systems. †Work done while on internship at Randstad Groep Nederland.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Background and Related Work</title>
      <p>Before presenting the experiments conducted within our novel candidate recommendation pipeline, essential terminology needs to be defined alongside the state of the art in the (sub)task(s) at hand. More specifically, we will first introduce candidate synthesis, which serves as our first FG, before introducing fairness and specifying its relation to our CRS pipeline and to the academic and domain gaps, and discussing the relevant techniques used in the CRS pipeline. Finally, we will conclude with the research gap and a summary of how the discussed techniques fit in our CRS.</p>
    </sec>
    <sec id="sec-3">
      <title>2.1. Data Synthesis</title>
      <p>Originally proposed by Rubin in 1993, synthetic data was initially intended to overcome confidentiality concerns in surveys [8]. Although confidentiality has become more important with new, stricter European regulations such as the General Data Protection Regulation (GDPR), current applications of synthetic data have also shown their strength in generating fair and private data. In fact, synthetic data applications extend far beyond survey data synthesis: use cases range from missing data imputation and data augmentation in semi-supervised learning to media applications such as image-to-image translation and image super-resolution [9].</p>
      <p>Data synthesis has evolved from Bayesian bootstrapping methods and predictive posterior distributions to deeper techniques such as Autoencoders (AE), Variational Autoencoders (VAEs), autoregressive models, Boltzmann machines, deep belief networks, and generative adversarial networks (GANs) after the advent of deep learning [10]. These deeper models, more specifically GANs, afforded the synthesis of more complex unstructured data such as images and videos. In the context of this thesis project, GANs will be used to generate tabular (structured) synthetic candidate data.</p>
      <p>Despite their popularity, GANs are mainly used for unstructured data synthesis tasks such as image and video synthesis; the generation of synthetic tabular data such as job candidates is uncommon not only from a domain perspective but also from a technical one. This is caused by the difficulty of learning discrete features with potentially imbalanced classes, a challenge that Xu et al. addressed by integrating a Gumbel-Softmax (GS) activation function in their CTGAN [11]. The GS is based on the Gumbel-Max trick, a common method for sampling from categorical distributions, and relaxes it into a differentiable approximation [12].</p>
      <p>Even with the ability to generate categorical features, other issues can hinder the tabular candidate synthesis process. Input datasets with mixed distributions (as is the case for our input data) can severely affect generative performance. For these problems, Xu et al. propose two solutions: mode-specific normalization for continuous columns and conditional sampling to enforce class balance, addressing two problems that are well known in generative modelling. Therefore, CTGAN is an ideal generator for the task at hand, as it can balance imbalanced datasets and handle mixtures of data types.</p>
      <p>Before outlining the fairness-related work, we discuss candidate synthesis with respect to both our CRS pipeline and the academic and domain gaps. Candidate synthesis is uncommon: although fairness research showed successful use of tabular GANs to generate fair data, and more domain-relevant research showed the use of Gaussian copulas for synthetic candidate generation, considerations on using CTGANs to support downstream tasks are rare, if not unavailable [5, 13]. In the synthetic candidate generation domain, van Els et al. [13] provide the only example we know of for our high-risk task. We therefore expect the use of GANs, more specifically CTGANs, to generate candidates to improve the fairness of our CRS pipeline.</p>
      <p>As outlined by Xu et al., conditional sampling allows us to easily synthesize balanced training data, which can then be used downstream as a fair, balanced basis to train candidate-scoring algorithms and mitigate bias. The use of conditional sampling alongside rejection sampling (introduced in the methodology section) is how we link candidate synthesis with fairness and, ultimately, bias mitigation in our end-to-end CRS pipeline. To our knowledge, the use of CTGANs is novel in the candidate recommendation domain. With the synthetic pre-processing techniques outlined, we now turn to the fairness literature, focusing more specifically on post-processing methods.</p>
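      <p>To illustrate how the Gumbel-Softmax (GS) activation discussed in this section lets a generator emit categorical values, the following is a minimal Python sketch of the Gumbel-Max trick and its softmax relaxation [12]. It is not CTGAN’s implementation [11]: the function names, the temperature value and the standard-library-only setup are our own illustrative assumptions.</p>
      <code language="python"><![CDATA[
```python
import math
import random


def _uniform_open(rng):
    """Uniform draw strictly inside (0, 1); rng.random() may return exactly 0.0."""
    u = rng.random()
    while u == 0.0:
        u = rng.random()
    return u


def _gumbel(rng):
    """Standard Gumbel(0, 1) noise by inverse transform: -log(-log(u))."""
    return -math.log(-math.log(_uniform_open(rng)))


def gumbel_max(logits, rng=random):
    """Gumbel-Max trick: the argmax of noise-perturbed logits is an exact categorical sample."""
    perturbed = [logit + _gumbel(rng) for logit in logits]
    return max(range(len(perturbed)), key=perturbed.__getitem__)


def gumbel_softmax(logits, tau=0.5, rng=random):
    """Gumbel-Softmax: softmax((logits + Gumbel noise) / tau), a relaxed one-hot vector.

    Smaller tau gives outputs closer to one-hot; larger tau gives smoother outputs.
    """
    scaled = [(logit + _gumbel(rng)) / tau for logit in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in scaled]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical categorical column with three categories (probabilities 0.7 / 0.2 / 0.1).
    logits = [math.log(p) for p in (0.7, 0.2, 0.1)]
    print(gumbel_softmax(logits, tau=0.5, rng=rng))  # relaxed one-hot, sums to 1
    draws = [gumbel_max(logits, rng=rng) for _ in range(10_000)]
    print(draws.count(0) / len(draws))  # close to 0.7
```
]]></code>
      <p>The relaxed output of <monospace>gumbel_softmax</monospace> is a differentiable function of the logits, which is what makes it usable inside a GAN, whereas the argmax in <monospace>gumbel_max</monospace> is not; lowering <monospace>tau</monospace> trades smoothness for outputs closer to one-hot vectors.</p>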
      <sec id="sec-3-1">
        <title>2.2. Fairness</title>
        <p>With the relevant background and related work on candidate synthesis introduced, we now proceed further down our CRS pipeline towards the second FG, which mitigates bias at the post-processing level, that is, after the models trained on synthetic data have scored the real candidates. The scored candidates are then evaluated according to a relevant fairness metric and re-ranked using a suitable post-processing technique.</p>
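        <p>Currently, multiple fairness metrics exist, each with its own strengths and weaknesses. In our case, we only consider demographic parity, which Kusner et al. [14] define as follows:</p>
        <list list-type="bullet">
          <list-item>
            <p>Demographic Parity: “A predictor <inline-formula><tex-math>\hat{Y}</tex-math></inline-formula> satisfies demographic parity if <inline-formula><tex-math>P(\hat{Y} \mid A = 0) = P(\hat{Y} \mid A = 1)</tex-math></inline-formula>.”</p>
          </list-item>
        </list>
        <p>Here, <inline-formula><tex-math>A</tex-math></inline-formula> denotes the sensitive attribute, in our case gender with two levels.</p>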
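        <p>Demographic parity is stated above for a generic predictor. As a hedged illustration of how it could be checked on a ranking, the sketch below treats “appearing in the top <italic>k</italic>” as the positive prediction and compares selection rates per gender. This top-<italic>k</italic> framing and the helper names are our own assumptions, not the paper’s evaluation protocol, which relies on NDKL.</p>
        <code language="python"><![CDATA[
```python
from collections import Counter


def selection_rates(groups, selected):
    """P(selected | group = g) for every group g, e.g. every gender."""
    totals = Counter(groups)
    hits = Counter(g for g, s in zip(groups, selected) if s)
    return {g: hits[g] / totals[g] for g in totals}


def demographic_parity_gap(groups, selected):
    """Largest difference in selection rate between groups (0 means parity)."""
    rates = selection_rates(groups, selected)
    return max(rates.values()) - min(rates.values())


def top_k_selected(ranked_groups, k):
    """Assumption: 'ranked in the top k' plays the role of the positive prediction."""
    return [rank < k for rank in range(len(ranked_groups))]


if __name__ == "__main__":
    ranking = ["m", "m", "f", "m", "f", "f", "m", "f"]  # genders, best rank first
    chosen = top_k_selected(ranking, k=4)
    print(selection_rates(ranking, chosen))         # {'m': 0.75, 'f': 0.25}
    print(demographic_parity_gap(ranking, chosen))  # 0.5
```
]]></code>
        <p>In the example, three of the four men but only one of the four women reach the top 4, giving a gap of 0.5; with equally sized groups, a perfectly alternating ranking and an even <italic>k</italic> give a gap of 0.</p>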
      </sec>
    </sec>
    <sec id="sec-4">
      <p>Many other fairness techniques exist, such as the removal of sensitive attributes. We stress that simply removing sensitive attributes is not guaranteed to remove bias: this practice, known as fairness through unawareness, was shown to perpetuate unfairness [14]. In fact, our CRS pipeline follows the opposite logic and pursues fairness through awareness, by explicitly using gender to re-rank candidates in the post-processing step.</p>
    </sec>
    <sec id="sec-5">
      <title>2.2.1. Fairness in Rankings</title>
      <p>While demographic parity is useful for quantifying fairness, how to enforce it has yet to be defined. Fairness can be enforced either through a data cleaning process that checks for class imbalances and for the existence of sensitive (proxy) variables (pre-processing), or by modifying the model output after training with approaches such as re-ranking (post-processing) [7]. Although we consider both approaches in this project, the evaluation of our model follows the SOTA post-processing techniques presented below.</p>
      <p>For our CRS pipeline, we use the approach of Geyik et al. [7], since it is already used in the HR domain (the task at hand was the recommendation of candidates on LinkedIn). Additionally, Geyik et al. achieved SOTA performance, with a more than 4-fold reduction in unfairness and a reduction in utility of only 6%. From a research gap perspective, candidate re-ranking is widely used in industry and researched in the Information Retrieval literature. However, although re-ranking itself is not novel for this sub-task, our CRS pipeline fills a research gap by re-ranking candidates scored by models trained on synthetic data.</p>
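      <p>The re-ranking algorithm of Geyik et al. [7] is reused here rather than re-specified. The following Python sketch shows the general idea in the style of their greedy variant: in every prefix of length <italic>k</italic>, each gender’s count is kept between <inline-formula><tex-math>\lfloor k p \rfloor</tex-math></inline-formula> and <inline-formula><tex-math>\lceil k p \rceil</tex-math></inline-formula> for a target share <italic>p</italic>, and otherwise the highest-scored remaining candidate wins. The function name, the tuple layout and the fallback used when a group runs out of candidates are illustrative assumptions, not the paper’s code.</p>
      <code language="python"><![CDATA[
```python
import math
from collections import defaultdict


def greedy_fair_rerank(candidates, target, k):
    """Greedy re-ranking that keeps group shares close to `target` in every prefix.

    candidates: list of (candidate_id, group, score) tuples, e.g. group = gender.
    target: desired share per group, e.g. {"f": 0.5, "m": 0.5}; groups missing
            from `target` are ignored (an assumption of this sketch).
    k: length of the re-ranked list.
    """
    # One queue per group, best score first.
    queues = defaultdict(list)
    for cand in sorted(candidates, key=lambda c: c[2], reverse=True):
        queues[cand[1]].append(cand)

    counts = defaultdict(int)
    ranking = []
    for pos in range(1, k + 1):
        available = [g for g in target if queues[g]]
        if not available:
            break
        # Groups below their minimum count floor(pos * p) must be served first ...
        below_min = [g for g in available if counts[g] < math.floor(pos * target[g])]
        # ... otherwise any group still under its maximum ceil(pos * p) may be chosen.
        below_max = [g for g in available if counts[g] < math.ceil(pos * target[g])]
        # Fallback (our assumption): if constraints cannot be met, use any non-empty group.
        pool = below_min or below_max or available
        best = max(pool, key=lambda g: queues[g][0][2])  # highest-scored group head
        ranking.append(queues[best].pop(0))
        counts[best] += 1
    return ranking


if __name__ == "__main__":
    scored = [("c1", "m", 0.9), ("c2", "m", 0.8), ("c3", "m", 0.7),
              ("c4", "f", 0.6), ("c5", "f", 0.5), ("c6", "m", 0.4)]
    reranked = greedy_fair_rerank(scored, {"f": 0.5, "m": 0.5}, k=6)
    print([cid for cid, _, _ in reranked])  # ['c1', 'c4', 'c2', 'c5', 'c3', 'c6']
```
]]></code>
      <p>With a 50/50 target, the example alternates genders at the top ranks while female candidates remain, and then falls back to the remaining male candidates in score order.</p>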
      <p>This is where our end-to-end CRS pipeline contributes to both the domain and the relevant literature: we test how candidate synthesis for training the scoring models combines with re-ranking methods into a better end-to-end bias mitigation process. This combination is novel both in the HR domain and in the literature on fairness and generative modelling.</p>
      <sec id="sec-5-1">
        <title>2.3. Summary and Research Gap</title>
        <p>The above mini-literature review outlined the key areas of (candidate) synthesis and fairness processing techniques. As shown, to our knowledge, the combination of multiple processing techniques within one CRS pipeline has not been attempted before. Therefore, our pipeline is presented as a combination of the related work above, and it will be evaluated based on the output candidate rankings. For the evaluation, we will not compare our CRS pipeline’s CTGAN to Xu et al., nor our re-ranker to Geyik et al., as we use drastically different datasets. Instead, we will develop our own evaluation framework for the candidate data at hand, which we outline in Section 3.</p>
        <p>The goal of this section was to provide a high-level
overview of the literature and techniques used all while
exposing the academic gap where our pipeline resides. In
the following section, we use the provided background
to introduce our experiments with in-depth technical
detail and apply the SOTA related work to the candidate
recommendation problem with our novel CRS pipeline.</p>
        <sec id="sec-5-1-1">
          <title>3. Methodology</title>
        </sec>
      </sec>
    </sec>
    <sec id="sec-6">
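      <p>Our CRS follows a point-wise learning to rank approach: for a given job <inline-formula><tex-math>j</tex-math></inline-formula>, we fetch and rank candidates <inline-formula><tex-math>c</tex-math></inline-formula>, much like, given a query, the goal is to rank documents in the traditional document retrieval scenario. In other words, our recommender system predicts relevance scores <inline-formula><tex-math>\hat{y}_{c,j}</tex-math></inline-formula> given the candidate and job features <inline-formula><tex-math>x_{c,j}</tex-math></inline-formula>.</p>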
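      <p>The point-wise formulation can be made concrete with a small sketch: each candidate-job pair is scored on its own features, and candidates are then sorted per job by descending score. The logistic scorer, its weights and the feature names below are hypothetical stand-ins; the text does not specify which scoring model the CRS uses.</p>
      <code language="python"><![CDATA[
```python
import math
from collections import defaultdict


def pointwise_score(features, weights, bias=0.0):
    """Relevance estimate for ONE candidate-job pair from its own features x_{c,j}.

    A logistic model is a hypothetical stand-in for the CRS scorer.
    """
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def rank_per_job(pairs, weights, bias=0.0):
    """pairs: iterable of (job_id, candidate_id, features).

    Scores every pair independently (point-wise), then sorts each job's
    candidates by descending score, like documents for a query.
    """
    by_job = defaultdict(list)
    for job_id, cand_id, feats in pairs:
        by_job[job_id].append((cand_id, pointwise_score(feats, weights, bias)))
    return {job: sorted(scored, key=lambda cs: cs[1], reverse=True)
            for job, scored in by_job.items()}


if __name__ == "__main__":
    # Hypothetical features: [worked_in_industry_before, distance_in_10km]
    weights = [1.5, -0.8]
    pairs = [("j1", "c1", [1.0, 2.0]), ("j1", "c2", [0.0, 0.5]), ("j1", "c3", [1.0, 0.2])]
    ranking = rank_per_job(pairs, weights)
    print([cand for cand, _ in ranking["j1"]])  # ['c3', 'c1', 'c2']
```
]]></code>
      <p>Because each score depends only on its own pair, any other point-wise model could replace <monospace>pointwise_score</monospace> without changing the ranking step.</p>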
      <p>We use real data from an international HR company. For training purposes, the features <inline-formula><tex-math>x_{c,j}</tex-math></inline-formula> of each candidate-job pair are associated with a ground truth label <inline-formula><tex-math>y_{c,j}</tex-math></inline-formula>, where <inline-formula><tex-math>y_{c,j} = 1</tex-math></inline-formula> if candidate <inline-formula><tex-math>c</tex-math></inline-formula> has been recruited or shortlisted for job <inline-formula><tex-math>j</tex-math></inline-formula>, and 0 otherwise.</p>
      <p>The data used for training is structured, spanning real-valued, categorical, and binary features. These comprise candidate features (e.g., job seekers’ preferences such as minimum salary, preferred working hours, or maximum travel distance, in addition to data related to their work experience or level of education), job features (e.g., industry of the company, company size, geographical location), and candidate-job features that represent their overlap (e.g., geographical distance between candidate and job, or a binary feature indicating whether the candidate has worked in the job’s industry before). This mirrors the way query, document, and query-document features are designed in a traditional learning to rank scenario for information retrieval.</p>
      <sec id="sec-6-1">
        <title>3.1. Gender balance and synthetic data</title>
        <p>Imbalanced data is very common in CRSs, and we focus on gender imbalance, which is common in the job market. To effectively study the issue of imbalance, we construct various explicitly (im)balanced scenarios through a rejection sampling algorithm based on John von Neumann’s technique [15]. We first sampled re-balanced subsets of the original training data, considering gender as the sensitive attribute. We only considered two genders (female, male), as unfortunately our dataset does not contain enough samples of non-binary genders.</p>
        <p>To construct our (im)balanced subsets, we randomly sampled job candidates from each job request <inline-formula><tex-math>j</tex-math></inline-formula> with a constrained proportion of candidates from each gender. We generated two datasets with heavy imbalance (one with 20% female candidates, one with 20% male candidates); two datasets with minor imbalance (one with 45% female candidates, one with 45% male candidates); and a balanced dataset (with 50% male and 50% female candidates). For each training dataset, 10% of the data points were kept as a held-out test set. To avoid data leakage, the job requests <inline-formula><tex-math>j</tex-math></inline-formula> in each test set were unique to that test set. The test dataset sizes, in number of unique <inline-formula><tex-math>\langle c, j \rangle</tex-math></inline-formula>-pairs after rejection sampling, are shown in Table 1.</p>
        <table-wrap>
          <label>Table 1</label>
          <table>
            <thead>
              <tr><th>Test Data</th></tr>
            </thead>
            <tbody>
              <tr><td>heavy imbalance (20% males)</td></tr>
              <tr><td>heavy imbalance (20% females)</td></tr>
              <tr><td>minor imbalance (45% males)</td></tr>
              <tr><td>minor imbalance (45% females)</td></tr>
              <tr><td>balanced</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <p>We trained 5 synthetic data models, using each re-balanced dataset as training data for the CTGAN algorithm [11]. Using the models’ conditional sampling parameters, we generated balanced synthetic data in which each gender represents 50% of the dataset, for both positive (<inline-formula><tex-math>y_{c,j} = 1</tex-math></inline-formula>) and negative (<inline-formula><tex-math>y_{c,j} = 0</tex-math></inline-formula>) examples.</p>
        <p>The synthetic data generation is our first fairness gate (FG) in the CRS pipeline. This FG aims to improve the fairness of the candidate scores <inline-formula><tex-math>\hat{y}_{c,j}</tex-math></inline-formula> by training the CRS on balanced data. The full overview of the experimental pipeline is shown in Figure 1.</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>3.2. Candidate scoring and re-ranking</title>
      <p>We trained CRS models to score candidates <inline-formula><tex-math>c</tex-math></inline-formula> by estimating their relevance score <inline-formula><tex-math>\hat{y}_{c,j}</tex-math></inline-formula> for the jobs <inline-formula><tex-math>j</tex-math></inline-formula>. We trained a total of 10 CRS models, using real or synthetic job candidates as training data (5 datasets of each type). The jobs for which candidates are scored remain those of the real data, more specifically, the real held-out test data. We tested the CRS models on their respective held-out test sets, comprising real data with the same gender balance. For each test set, we scored candidates using either the CRS trained with synthetic data or the CRS trained with real data (of the same gender balance); i.e., we use 2 CRS models for each of the 5 test sets, and thus obtain a total of 10 sets of scores. After scoring, we rank candidates in descending order of relevance score and obtain 10 sets of rankings.</p>
      <p>After the candidates are scored and ranked, we introduce our second Fairness Gate (FG) at the post-processing level of the CRS pipeline. This FG aims to improve the fairness of candidate ranking by using a re-ranking algorithm that interleaves males and females equally at the top ranks (e.g., Figure 2). For our experimental CRS pipeline, we reused the re-ranking algorithm from Geyik et al. [7], and obtained 10 sets of re-rankings (Figure 1).</p>
    </sec>
    <sec id="sec-8">
      <title>3.3. Metrics and Evaluation</title>
    </sec>
    <sec id="sec-9">
      <p>The impact of the re-ranking is evaluated in terms of utility using Normalised Discounted Cumulative Gain (NDCG), a common ranking metric to maximise [16]. To measure the impact of the re-ranking, we compared the ranking before re-ranking, taken as the ideal ranking, with the ranking after re-ranking. A lower NDCG score means that re-ranking had a stronger negative impact on the original rankings; a higher NDCG score means that re-ranking had less impact. Because NDCG is calculated on the re-ranked output, only one NDCG score is reported per setting, and we use it as a single impact metric. The originally predicted ranks thus serve as ground truth (ideal ranking) against which the re-ranked candidates are measured. To ensure that these ideal ranks are valid, we used common classification metrics such as F1 and AUC.</p>
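      <p>The text does not spell out which gains enter the NDCG computation. Assuming that the originally predicted scores act as graded relevance and that the original (pre-re-ranking) order is the ideal ranking, the computation could look like the Python sketch below; the <monospace>1 / log2(i + 1)</monospace> discount is the standard one [16], while the function names are hypothetical.</p>
      <code language="python"><![CDATA[
```python
import math


def dcg(gains):
    """Discounted cumulative gain with the standard 1 / log2(i + 1) discount, i = 1-based rank."""
    return sum(g / math.log2(i + 1) for i, g in enumerate(gains, start=1))


def ndcg_against_original(original_scores, reranked_ids, k=None):
    """NDCG of a re-ranked list, taking the original (pre-re-ranking) order as ideal.

    original_scores: {candidate_id: score predicted before re-ranking}; used as gains
                     (an assumption of this sketch).
    reranked_ids: candidate ids in the order produced by the re-ranker.
    """
    k = k or len(reranked_ids)
    gains = [original_scores[cid] for cid in reranked_ids[:k]]
    ideal = sorted(original_scores.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    if ideal_dcg == 0.0:
        return 0.0  # degenerate case: all gains are zero
    return dcg(gains) / ideal_dcg


if __name__ == "__main__":
    original = {"a": 0.9, "b": 0.6, "c": 0.3}
    print(ndcg_against_original(original, ["a", "b", "c"]))  # 1.0
    print(ndcg_against_original(original, ["b", "a", "c"]))  # below 1.0
```
]]></code>
      <p>An unchanged order yields 1.0, and moving a strictly lower-scored candidate above a higher-scored one yields a value below 1, consistent with reading a lower NDCG as a larger impact of the re-ranker.</p>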
      <p>In terms of fairness, we used NDKL (normalized discounted cumulative Kullback-Leibler divergence), a divergence-based measure of the dissimilarity between distributions, such as rank distributions [7].</p>
      <p>Here, NDKL quantifies the dissimilarity between the rank distributions of males and females, with more weight on the top ranks. We consider that demographic parity is achieved when the rank distributions of males and females are similar (i.e., NDKL = 0).</p>
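      <p>NDKL, as defined by Geyik et al. [7], averages the KL divergence between the gender distribution of each top-<italic>i</italic> prefix of the ranking and a desired distribution, using weights <inline-formula><tex-math>1/\log_2(i+1)</tex-math></inline-formula> normalised to sum to one. The Python sketch below follows that definition under our own assumptions: natural logarithms, a 50/50 desired distribution in the example, and hypothetical function names.</p>
      <code language="python"><![CDATA[
```python
import math
from collections import Counter


def kl_divergence(p, q):
    """KL(p || q) in nats; terms with p_g = 0 contribute 0 by convention."""
    return sum(pg * math.log(pg / q[g]) for g, pg in p.items() if pg > 0)


def ndkl(ranked_groups, desired):
    """Normalized discounted cumulative KL divergence of a ranking.

    ranked_groups: group (e.g. gender) of each ranked candidate, best rank first;
                   every group is assumed to appear in `desired` with share > 0.
    desired: target distribution, e.g. {"f": 0.5, "m": 0.5}.
    """
    counts = Counter()
    weighted_sum, normaliser = 0.0, 0.0
    for i, group in enumerate(ranked_groups, start=1):
        counts[group] += 1
        prefix = {g: counts[g] / i for g in desired}  # distribution in the top-i
        weight = 1.0 / math.log2(i + 1)
        weighted_sum += weight * kl_divergence(prefix, desired)
        normaliser += weight
    return weighted_sum / normaliser if normaliser else 0.0


if __name__ == "__main__":
    target = {"f": 0.5, "m": 0.5}
    print(ndkl(["f", "m", "f", "m"], target))  # interleaved: lower divergence
    print(ndkl(["m", "m", "f", "f"], target))  # grouped: higher divergence
```
]]></code>
      <p>A prefix of odd length can never be split exactly 50/50, so even a perfectly interleaved list has a small positive NDKL in this sketch (about 0.28 for the interleaved example, against about 0.45 for the grouped one); NDKL = 0 is therefore an ideal that lower values approach.</p>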
      <sec id="sec-9-1">
        <title>4. Results and Analysis</title>
        <p>We present the results of the CRS that include one, two, or none of our Fairness Gates (FG): re-balancing the training set with synthetic data (1st FG), and re-ranking the job candidates (2nd FG). We consider 3 levels of data imbalance, and summarise the NDCG and NDKL for each level in Table 2.</p>
        <p>The NDCG difference is noticeable between CRS models trained with real or synthetic datasets (i.e., between pairs of rows in Table 2). For the heavy imbalance case, the increase in utility is almost two-fold (+45%).</p>
        <p>The NDKL difference is very small between CRS models trained with real or synthetic datasets, and shows a negligible improvement of fairness. These results show that using balanced synthetic data to train CRS models (1st FG) considerably improved utility (NDCG) while maintaining the same level of fairness (NDKL).</p>
        <p>The NDKL decreases from before to after re-ranking (i.e., last two columns in Table 2), showing that the rank distributions of male and female candidates are more similar after re-ranking. The decrease is of similar magnitude for each level of data imbalance, regardless of whether the CRS model is trained with real or synthetic data.</p>
        <p>We also explored the score distributions for male and female candidates. The scores attributed by CRS models trained with real data are unevenly skewed toward the left, even when the real data is balanced (balanced dataset). However, for CRS models trained with synthetic data, the score distributions of both genders shift further to the right, creating a more normally-shaped score distribution for both studied genders.</p>
      </sec>
      <sec id="sec-9-2">
        <title>5. Discussion</title>
        <p>Despite the promising results shown in Section 4, our CRS pipeline has shown some pitfalls. More specifically, computing NDCG with the ranked candidates as ground truth, and only evaluating the re-ranked performance, can come with additional validity issues. However, it should be noted that these validity issues can easily be averted by adding another NDCG calculation that also evaluates the non-re-ranked candidates against a ground truth constructed, for example, from another held-out set.</p>
        <p>Additionally, supplementary validation methods could have been considered. For instance, it could have been beneficial to use future job requests <inline-formula><tex-math>j</tex-math></inline-formula>, not included in the data, in further evaluations. Statistical tests could also have been conducted, while user-based approaches, such as an evaluation with recruiters, could have reinforced the validity of this project. These extra validation steps should be implemented before deploying the proposed fairness mechanisms.</p>
        <p>Furthermore, some findings could not be explained with the current analysis. For instance, the NDKL scores for CRSs trained on real minor imbalanced datasets are lower than those for CRSs trained on real balanced datasets, which also holds after re-ranking. Although the scores differ by a small margin, such behaviour is difficult to explain given the complexity of our pipeline, which makes debugging equally complex.</p>
        <p>Additional unexplained results appear in the synthetic-to-real comparison: CRSs trained on synthetic datasets, such as heavy imbalance, show slightly more unfairness than their real-trained counterparts. These findings are even more puzzling considering that Figure 3 shows more balanced scoring for all synthetically-trained CRSs, which should result in a lower NDKL before re-ranking.</p>
        <p>Finally, the simple implementation of demographic parity, which enforces equal proportions between genders, oversimplifies the complexity of the candidate hiring landscape. This oversimplification can be resolved in future research at the cost of a lesser degree of generalizability. Future research can be more specific by adjusting fairness rules to the domain of the job request <inline-formula><tex-math>j</tex-math></inline-formula>. For instance, certain jobs, such as security personnel, can show real-world skewness towards a certain gender. A future CRS pipeline needs to adjust its fairness rules at the level of <inline-formula><tex-math>j</tex-math></inline-formula>.</p>
        <p>Despite these limitations and suggestions for future work, overall, our research showed that combining synthetic data and re-ranking contributed to both fairness and utility, even when compared to CRSs trained on real balanced data such as the balanced dataset. Therefore, as expected, a combination of pre-processing and post-processing FGs proved to be useful.</p>
      </sec>
    </sec>
    <sec id="sec-10">
      <title>6. Conclusion</title>
      <p>The goal of our CRS pipeline was never to produce SOTA synthetic candidates and recommendations, despite our satisfactory results. The goal was to build a recommendation pipeline using both real and synthetic data, in order to experiment with fair processing techniques and, as a result, mitigate bias in candidate recommendations. From this perspective, the double fairness-gated CRS pipeline was successfully built, and the generation of synthetic candidates was successful, valid and accurate throughout the pipeline.</p>
      <p>The generated data proved accurate at all (im)balance levels, validating the expectations on mode-specific normalization and conditional sampling in CTGANs, while also demonstrating the benefits of rejection sampling for re-balancing imbalanced data and of using the synthetic candidates generated from it to score real (im)balanced test subsets fairly. It was also shown that scorers trained on synthetic candidates outperform scorers trained on balanced real data in terms of utility.</p>
      <p>Although the issues outlined in Section 5 concerning the lack of measurement of pre-re-ranked utility raise some minor validity concerns, the evidence shows that synthetically-trained CRSs provide fair, useful candidate recommendations when integrated in such a pipeline.</p>
    </sec>
    <sec>
      <title>7. Future Work</title>
      <p>In future work, the recommendations shared in the discussion can be considered. More specifically, additional evaluation methods with future job requests, or human-in-the-loop evaluation with recruiters, can be used to test the CRS pipeline. Additionally, future researchers should consider less data-greedy rejection sampling techniques, as we lost more than 80% of the held-out data available at the start of the pipeline. This can be resolved with more elegant rejection sampling constraints, the use of larger datasets, or data augmentation techniques, for instance through synthetic data. The latter could have been considered in this project had it been within the scope of our research. Finally, with the data scarcity problem solved, future researchers can consider the discussed domain-adjustable fairness rules for more specific fairness constraints to overcome real-world skewness.</p>
    </sec>
    <sec id="sec-11">
      <title>8. Acknowledgements</title>
      <p>We acknowledge the University of Amsterdam - Master programme Information Studies for creating the conditions to perform this research and for financially supporting this publication.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref><label>[1]</label><mixed-citation>A. Chouldechova, Fair prediction with disparate impact: A study of bias in recidivism prediction instruments, Big Data 5 (2017) 153–163. URL: https://doi.org/10.1089/big.2016.0047. doi:10.1089/big.2016.0047. PMID: 28632438.</mixed-citation></ref>
      <ref><label>[2]</label><mixed-citation>J. Buolamwini, T. Gebru, Gender shades: Intersectional accuracy disparities in commercial gender classification, in: S. A. Friedler, C. Wilson (Eds.), Proceedings of the 1st Conference on Fairness, Accountability and Transparency, volume 81 of Proceedings of Machine Learning Research, PMLR, 2018, pp. 77–91. URL: https://proceedings.mlr.press/v81/buolamwini18a.html.</mixed-citation></ref>
      <ref><label>[3]</label><mixed-citation>S. Hajian, J. Domingo-Ferrer, A methodology for direct and indirect discrimination prevention in data mining, IEEE Transactions on Knowledge and Data Engineering 25 (2013) 1445–1459. doi:10.1109/TKDE.2012.72.</mixed-citation></ref>
      <ref><label>[4]</label><mixed-citation>A. Prince, D. Schwarcz, Proxy discrimination in the age of artificial intelligence and big data, Iowa Law Review 105 (2020) 1257–1318.</mixed-citation></ref>
      <ref><label>[5]</label><mixed-citation>A. Rajabi, O. O. Garibay, TabFairGAN: Fair tabular data generation with generative adversarial networks, arXiv preprint arXiv:2109.00666 (2021).</mixed-citation></ref>
      <ref><label>[6]</label><mixed-citation>Y. Li, H. Chen, S. Xu, Y. Ge, Y. Zhang, Towards personalized fairness based on causal notion, CoRR abs/2105.09829 (2021). URL: https://arxiv.org/abs/2105.09829. arXiv:2105.09829.</mixed-citation></ref>
      <ref><label>[7]</label><mixed-citation>S. C. Geyik, S. Ambler, K. Kenthapadi, Fairness-aware ranking in search &amp; recommendation systems with application to LinkedIn talent search, 2019. URL: https://doi.org/10.1145/3292500.3330691. doi:10.1145/3292500.3330691.</mixed-citation></ref>
      <ref><label>[8]</label><mixed-citation>D. B. Rubin, Discussion: Statistical disclosure limitation, Journal of Official Statistics 9 (1993) 461–468.</mixed-citation></ref>
      <ref><label>[9]</label><mixed-citation>I. Goodfellow, NIPS 2016 tutorial: Generative adversarial networks, 2017. URL: https://arxiv.org/abs/1701.00160. doi:10.48550/ARXIV.1701.00160.</mixed-citation></ref>
      <ref><label>[10]</label><mixed-citation>I. Goodfellow, Y. Bengio, A. Courville, Deep Learning, 1st ed., MIT Press, Cambridge, Massachusetts, United States, 2016.</mixed-citation></ref>
      <ref><label>[11]</label><mixed-citation>L. Xu, M. Skoularidou, A. Cuesta-Infante, K. Veeramachaneni, Modeling tabular data using conditional GAN, 2019. URL: https://arxiv.org/abs/1907.00503. doi:10.48550/ARXIV.1907.00503.</mixed-citation></ref>
      <ref><label>[12]</label><mixed-citation>E. Jang, S. Gu, B. Poole, Categorical reparameterization with Gumbel-softmax, 2016. URL: https://arxiv.org/abs/1611.01144. doi:10.48550/ARXIV.1611.01144.</mixed-citation></ref>
      <ref><label>[13]</label><mixed-citation>S.-J. van Els, D. Graus, E. Beauxis-Aussalet, Improving fairness assessments with synthetic data: a practical use case with a recommender system for human resources, 2022.</mixed-citation></ref>
      <ref><label>[14]</label><mixed-citation>M. J. Kusner, J. R. Loftus, C. Russell, R. Silva, Counterfactual fairness, 2018. arXiv:1703.06856.</mixed-citation></ref>
      <ref><label>[15]</label><mixed-citation>J. von Neumann, Various techniques used in connection with random digits, National Bureau of Standards, Applied Math Series 12 (1951) 768–770.</mixed-citation></ref>
      <ref><label>[16]</label><mixed-citation>K. Järvelin, J. Kekäläinen, Cumulated gain-based evaluation of IR techniques, ACM Transactions on Information Systems (TOIS) 20 (2002) 422–446.</mixed-citation></ref>
    </ref-list>
  </back>
</article>