<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Fairness in job recommendations: estimating, explaining, and reducing gender gaps</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Guillaume Bied</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Christophe Gaillac</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Morgane Hofmann</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Philippe Caillou</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Bruno Crépon</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Solal Nathan</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Michèle Sebag</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Centre de Recherche en Economie et Statistique (CREST)</institution>
          ,
          <addr-line>Palaiseau</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Laboratoire Interdisciplinaire des Sciences du Numérique (LISN)</institution>
          ,
          <addr-line>Orsay</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Nuffield College and Oxford University</institution>
          ,
          <addr-line>Oxford</addr-line>
          ,
          <country country="UK">United Kingdom</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Pôle emploi</institution>
          ,
          <addr-line>Paris</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Algorithmic recommendations of job ads have the potential to reduce frictional unemployment, but raise concerns about fairness due to biases in past data. Our research investigates the issue of algorithmic fairness with a specific focus on gender in a hybrid job recommendation system developed in partnership with the French Public Employment Service (PES), which is trained on past hires. First, by viewing job ads as a set of characteristics (such as wage and contract type), we document how the algorithm treats job seekers differently based on gender, both unconditionally and conditionally on their search parameters and qualifications. Second, we discuss the notion(s) of algorithmic fairness applicable in this context and the trade-offs involved. We show that the considered system reflects some existing differences in hiring or applications but does not exacerbate them. Finally, we consider an adversarial de-biasing technique as a practical tool to demonstrate the trade-offs between recall and reduced differentiated treatment.</p>
      </abstract>
      <kwd-group>
        <kwd>Fairness</kwd>
        <kwd>Job recommender systems</kwd>
        <kwd>Adversarial de-biasing</kwd>
        <kwd>Gender gaps</kwd>
        <kwd>Human resources</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        At the core of e-business, recommender systems leverage past data to help users locate relevant
items among large amounts of possible ones that would be costly to explore otherwise. Since an
important part of unemployment can be explained by informational frictions [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], including the
costs of acquiring information and cognitive limitations, recommender systems could improve
matching on the labor market. As labor market outcomes shape livelihoods, social positions
and individual identities, helping job seekers find the right jobs matters.
      </p>
      <p>
        Yet job recommender systems are also a textbook case of fairness issues in machine learning
[
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Algorithms trained on real-world data, which involve human biases and discriminatory
practices, may reproduce, or even increase, past undesirable behavior such as gender stereotypes,
and widen labor market inequalities. Ensuring this does not happen is a major concern for the
scientific community, Public Employment Services as well as for all citizens.
      </p>
      <p>
        This paper investigates the issue of gender fairness within the context of the audit of a
recommender system called MUlti-head Sparse E-recruitment (MUSE hereafter) [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], developed
in partnership with the French Public Employment Service (PES). MUSE leverages extensive
data about job seekers’ and job ads’ characteristics and learns from past hiring patterns. Our
contributions are threefold. Firstly, we discuss the appropriate notion of algorithmic fairness
that should be adopted in the PES setting. Gender disparities in hirings, viewed in terms of
job characteristics such as occupation, distance, wage, and full- or part-time status, can arise from
differentiated application choices reflecting job seekers’ preferences. The algorithm’s
replication of this behavior appears justified in maximizing users’ welfare (see the related individual,
envy-freeness, and preference-based notions of fairness respectively in [
        <xref ref-type="bibr" rid="ref4 ref5 ref6">4, 5, 6</xref>
        ]). However,
these gaps can also arise from differential valuations of job seekers’ inherent characteristics by
recruiters based on gender, which can be seen as discriminatory or unfair. Secondly, we propose
to disentangle the impact of job search fundamentals (search parameters and qualifications)
from other job seekers’ characteristics in explaining observed gaps by using double machine
learning [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. We analyse job ad recommendations and document gender disparities both
unconditionally and conditionally on job search fundamentals, showing that these alone do
not fully account for the observed gender gaps. Nevertheless, the system does not exacerbate
existing differences in hiring or applications. This discussion brings forth a tension between a
PES’s missions and values: providing optimal person-dependent recommendations regarding
access to employment while ensuring fair treatment between women and men. Finally, we
illustrate this trade-off by developing an adversarial de-biasing approach [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] aiming at making
recommendations gender-blind. Although this approach reduces differential treatment, it also
leads to an overall performance loss and reduction in access to employment, which is more
pronounced for women.
      </p>
      <p>
        The rest of the paper is structured as follows. Section 2 describes the data and the MUSE
algorithm. Section 3 proposes to leverage the Double Machine Learning method (DML hereafter)
[
        <xref ref-type="bibr" rid="ref10 ref7 ref9">7, 9, 10</xref>
        ] to make inference on the effect of gender on the recommendations, while controlling
for the channel of the job search fundamentals. Section 4 audits the algorithm in terms of
recommendation performance, provides evidence of differentiated treatment, and compares
these differences to those found in hiring and application behavior. Section 5 introduces
adversarial techniques to reduce recommendation reliance on gender, and documents their
impact on performance metrics and differentiated treatment. Section 6 concludes and provides
perspectives for further work. Appendix D contains a simple model explaining the different
potential sources of differential treatment and relating them to gender inequalities in observed
applications and hires.
      </p>
      <p>
        Related work. Fairness in the context of recommender systems draws an increasing amount
of work, surveyed by [
        <xref ref-type="bibr" rid="ref11 ref12 ref13">11, 12, 13</xref>
        ]. Depending on the application domain, fairness issues may
arise w.r.t. items (sharing users’ attention in an equitable way), w.r.t. users (presenting a fair
selection of items to the users), or both [
        <xref ref-type="bibr" rid="ref14 ref15 ref16">14, 15, 16</xref>
        ]. In the present work, we focus on user
fairness.
      </p>
      <p>
        Some approaches to user fairness question whether recommendations are equally relevant for
different groups of users in terms of standard metrics such as recall or NDCG. [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] audits search
engines for differential satisfaction between demographics. [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] extends this investigation to
several public recommendation datasets, discussing whether different groups of users (in terms
of age or gender) retrieve the same utility from recommendations based on standard metrics.
Such differences may be due to class imbalance, which may lead a recommender system to
better capture the interaction patterns of a majority group in a collaborative filtering setting [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ].
[
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] measure fairness both in terms of differentiated values of predicted ratings conditionally
on characteristics, and w.r.t. prediction errors between genders.
      </p>
      <p>
        Other works emphasize the trade-off between recommendation performance and other
fairness measures. Among them, [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ] approach the problem of collaborative filtering through the
lens of a notion of neutrality akin to demographic parity: recommendations should not vary
according to a user-specified viewpoint such as gender. However, with labor market applications
in mind, [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] argue that such metrics possibly ignore some legitimate links between gender
and preferences. In a labor market context, [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ] is concerned with occupation recommendation
while reducing the gender wage gap. [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ] conduct a correspondence study of several Chinese
job boards, demonstrating that some profiles are recommended different job ads depending on
whether they are labelled women or men, thus showing a significant causal impact of gender.
      </p>
      <p>
        Finally, several approaches exist to prevent fairness issues: pre-processing, in-processing
and post-processing. Adversarial in-processing methods, initially proposed in the classification
setting [
        <xref ref-type="bibr" rid="ref24 ref25 ref8">8, 24, 25</xref>
        ], attempt to decorrelate neural representations with gender. The approach has
been proposed for neural recommenders in a labor market setting [
        <xref ref-type="bibr" rid="ref20 ref22 ref26">22, 20, 26</xref>
        ] with different
motivations and notions of fairness in mind.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Experimental setting</title>
      <p>Overview of the data. The proprietary dataset provided by the French PES contains
characteristics of job ads and registered job seekers, as well as their interactions, from 2019 to mid-2022
in the Auvergne-Rhône-Alpes region.</p>
      <p>The <inline-formula><tex-math>i</tex-math></inline-formula>-th job seeker’s characteristics, represented as a vector <inline-formula><tex-math>x_i \in \mathbb{R}^{483}</tex-math></inline-formula> after pre-processing,
include job search criteria, labor market profile information, and administrative data (see
Appendix B for more details). Within <inline-formula><tex-math>x_i</tex-math></inline-formula>, job seekers’ search fundamentals (search criteria and
qualifications, denoted <inline-formula><tex-math>z_i</tex-math></inline-formula>) include desired wage, occupation, geographic location and accepted
mobility, search for a full-time or part-time job, qualification level of the desired position, and
accepted working hours.</p>
      <p>
        Overall, the labor market profile information in <inline-formula><tex-math>x_i</tex-math></inline-formula> includes experience, hard and soft skills
provided in the PES’s ontology, possession of a driver’s license, educational achievements,
textual data (CV, description of past work experience), and administrative data (number of past
unemployment spells, reasons for registration, and the type of follow-up provided by the PES).
Skills and textual descriptions are each reduced by singular value decomposition [
        <xref ref-type="bibr" rid="ref27">27</xref>
        ]. It is
emphasized that job seekers’ gender is available as a binary variable, although it is not provided
to the recommender system.
      </p>
      <p>Similarly, the <inline-formula><tex-math>j</tex-math></inline-formula>-th job ad is represented by a vector <inline-formula><tex-math>y_j \in \mathbb{R}^{469}</tex-math></inline-formula> after pre-processing. Available
features include lower and upper bounds for the offered wage, workplace postcode, desired
skills, requirements in terms of education, contract type, working hours, and textual descriptions
of the firm and position. Textual information and skills are also reduced by singular value
decomposition. We also observe whether a job seeker <inline-formula><tex-math>i</tex-math></inline-formula> applied to a job ad <inline-formula><tex-math>j</tex-math></inline-formula>, and whether he
or she was hired for that position. The train and test sets cover 1.2 million job seekers and 2.2
million job ads. The 285,992 observed hires are split between train and test on a weekly basis:
85% of weeks are assigned to the train set (representing 241,715 hires), and the rest to the test
set (44,277 hires).</p>
      <p>Datasets used for the analysis. The algorithm’s recommendations will be studied using
several distinct datasets.</p>
      <p>
        To study gender gaps conditional on job seekers’ search fundamentals (see Section 3), we
restrict the analysis to men and women that cannot be perfectly distinguished on the basis of
their characteristics, following the overlap / common support assumption [
        <xref ref-type="bibr" rid="ref28">28</xref>
        ]. More precisely, if
individuals’ gender could be accurately predicted on the basis of characteristics, one could hardly
disentangle the impact of such characteristics and that of gender on the recommendations.
      </p>
      <p>
        The population with common support is selected as follows. The prediction of gender is
achieved using a random forest [
        <xref ref-type="bibr" rid="ref29">29</xref>
        ] considering selected features including education, desired
wage, experience, geographic location, desired contract type, occupation, level of qualification,
search for a part-time job, and accepted mobility. The learned classifier, whose predicted probability is referred to as the propensity score,
has an accuracy of circa 88%. It is used to select the job seekers in the common support, retaining
individuals with a propensity score in [0.05, 0.95].
      </p>
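      <p>As an illustration, the trimming step can be sketched as follows. This is a minimal sketch under stated assumptions, not the code used in the study: the function name <monospace>common_support_mask</monospace> and the example scores are hypothetical, and the propensity scores are taken as already estimated (in the paper, by a random forest).</p>
      <preformat preformat-type="code">
```python
from typing import List, Sequence


def common_support_mask(scores: Sequence[float],
                        low: float = 0.05,
                        high: float = 0.95) -> List[bool]:
    """Flag job seekers whose estimated propensity score lies in [low, high]."""
    return [score >= low and high >= score for score in scores]


# Hypothetical propensity scores P(woman | characteristics) for six job seekers.
scores = [0.01, 0.05, 0.40, 0.72, 0.95, 0.99]
mask = common_support_mask(scores)

# Job seekers whose gender is almost perfectly predictable are dropped.
kept = [score for score, keep in zip(scores, mask) if keep]
print(kept)
```
      </preformat>
      <p>Both bounds are treated as inclusive here, matching the closed interval [0.05, 0.95] stated in the text.</p>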
      <p>To study recommendations issued to all job seekers at a given point in time, we consider
all job seekers registered during a randomly chosen week of the test set (the fourteenth ISO
week of 2022). In order to measure recommendation performance, and to contrast differentiated
treatment by the algorithm with differences observed in hiring behavior, we also consider
recommendations to all job seekers who are hired during the test weeks. To study application
behavior, we consider the average characteristics (all weeks pooled together) of the applications
of job seekers for whom hires are observed in the test set (we observe 169,325 such applications
after restriction to the common support).</p>
      <p>The sizes and gender compositions of the datasets of interest, as well as their sizes after restriction to job seekers in the
common support, are reported in Table 4 in the Appendix.</p>
      <sec id="sec-2-1">
        <title>2.1. Algorithm</title>
        <p>
          The algorithm MUSE is briefly described for the sake of self-containedness, referring the reader
to [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] for a more comprehensive presentation.
        </p>
        <p>Architecture. MUSE is a two-tier hybrid job recommender system, designed to address
the sparsity and cold start issues inherent to the job recommendation setting, and to meet
computational requirements. It is trained on hiring data. Hires, rather than other types of
interactions, are chosen as training labels since they indicate strong mutual interest of the job
seeker and recruiter.</p>
        <p>The first tier of the algorithm aims at efficiently retrieving a subset of 1,000 job ads (to be re-ranked by
the second tier). It is a two-tower model, trained with a triplet margin loss, which
constructs embeddings for job seekers and job ads based on their contextual information <inline-formula><tex-math>x_i</tex-math></inline-formula>
and <inline-formula><tex-math>y_j</tex-math></inline-formula>. It correctly keeps 82.25% of matches in the test set among its top-1,000 selection. In the
following, we take this first-stage operation as given, discarding all job ads but those ranked
among the top 1,000 for each job seeker.</p>
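        <p>The second tier of the algorithm takes as input: the job seeker’s description <inline-formula><tex-math>x_i</tex-math></inline-formula>; the job ad’s
description w.r.t. the <inline-formula><tex-math>i</tex-math></inline-formula>-th job seeker, noted <inline-formula><tex-math>y_{i,j}</tex-math></inline-formula>, formed of the job ad description <inline-formula><tex-math>y_j</tex-math></inline-formula> concatenated
with the score and rank associated to job ad <inline-formula><tex-math>j</tex-math></inline-formula> for <inline-formula><tex-math>i</tex-math></inline-formula> by the first tier of the algorithm, and
the distance in kilometers between <inline-formula><tex-math>i</tex-math></inline-formula> and <inline-formula><tex-math>j</tex-math></inline-formula>. Two embeddings, respectively denoted <inline-formula><tex-math>\phi</tex-math></inline-formula> and <inline-formula><tex-math>\psi</tex-math></inline-formula>, are
learned on top of <inline-formula><tex-math>x_i</tex-math></inline-formula> and <inline-formula><tex-math>y_{i,j}</tex-math></inline-formula>, with <inline-formula><tex-math>h_{i,j}</tex-math></inline-formula> formed as the concatenation of these embeddings and
their element-wise product (<inline-formula><tex-math>h_{i,j} = [\phi(x_i), \psi(y_{i,j}), \phi(x_i) \odot \psi(y_{i,j})]</tex-math></inline-formula>). The recommendation
score <inline-formula><tex-math>\hat{s}_{i,j}</tex-math></inline-formula> is learned as a standard neural net on top of <inline-formula><tex-math>h_{i,j}</tex-math></inline-formula>:</p>
        <p>
          <disp-formula><tex-math>\hat{s}_{i,j} = f_\theta(h_{i,j}),</tex-math></disp-formula>
where <inline-formula><tex-math>f_\theta</tex-math></inline-formula> is a one-hidden-layer feedforward neural network parameterized by <inline-formula><tex-math>\theta</tex-math></inline-formula>. Model
parameters are learned end-to-end with a cross-entropy loss:
          <disp-formula><tex-math>\min_{\phi, \psi, \theta} \mathcal{L} := -\sum_{i,j} \left[ m_{i,j} \log(\hat{s}_{i,j}) + (1 - m_{i,j}) \log(1 - \hat{s}_{i,j}) \right],</tex-math></disp-formula>
where <inline-formula><tex-math>m_{i,j}</tex-math></inline-formula> is 1 if <inline-formula><tex-math>j</tex-math></inline-formula> hired <inline-formula><tex-math>i</tex-math></inline-formula> and 0 otherwise. In practice, negative examples (pairs which are not
matches) are sampled uniformly at random within the first tier’s top-1,000 selection. To issue
recommendations, job ads are ranked by decreasing <inline-formula><tex-math>\hat{s}_{i,j}</tex-math></inline-formula>.</p>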
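        <p>The second-tier computation described above can be sketched in a few lines. This is an illustrative sketch only, not the MUSE implementation: all function and variable names are hypothetical, the two embeddings are taken as already computed vectors, and the ReLU hidden activation and sigmoid output are assumptions (the text only specifies a one-hidden-layer feedforward network trained with a cross-entropy loss).</p>
        <preformat preformat-type="code">
```python
import math
from typing import List, Sequence


def interaction_features(phi_x: Sequence[float], psi_y: Sequence[float]) -> List[float]:
    """Concatenate both embeddings with their element-wise product."""
    if len(phi_x) != len(psi_y):
        raise ValueError("embeddings must share the same dimension")
    product = [a * b for a, b in zip(phi_x, psi_y)]
    return list(phi_x) + list(psi_y) + product


def score(features, hidden_weights, hidden_bias, out_weights, out_bias) -> float:
    """One hidden layer (ReLU, an assumption) followed by a sigmoid output."""
    hidden = [max(0.0, sum(w * v for w, v in zip(row, features)) + b)
              for row, b in zip(hidden_weights, hidden_bias)]
    logit = sum(w * h for w, h in zip(out_weights, hidden)) + out_bias
    return 1.0 / (1.0 + math.exp(-logit))


def cross_entropy(labels, scores) -> float:
    """Negative log-likelihood of binary match labels under predicted scores."""
    return -sum(m * math.log(s) + (1 - m) * math.log(1 - s)
                for m, s in zip(labels, scores))


# Hypothetical two-dimensional embeddings of one job seeker and one job ad.
h = interaction_features([1.0, 2.0], [0.5, -1.0])
print(h)  # [1.0, 2.0, 0.5, -1.0, 0.5, -2.0]
```
        </preformat>
        <p>Ranking job ads for a job seeker then amounts to sorting the candidate ads by decreasing score.</p>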
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Measuring the effect of gender on recommendations controlling for the preferences</title>
      <sec id="sec-3-1">
        <title>3.1. Measures of interest</title>
        <p>We seek to measure how the algorithm’s recommendation performance varies between men
and women, but also how the characteristics of the job ads depend on gender, unconditionally
and conditionally on job search fundamentals.</p>
        <p>Recommendation performance will be measured by the recall@<inline-formula><tex-math>k</tex-math></inline-formula>, defined as the share of
hires correctly ranked among the algorithm’s top <inline-formula><tex-math>k</tex-math></inline-formula> recommendations in the test set.</p>
        <p>Characteristics of recommended jobs. We study gendered differences in terms of the
following characteristics of the top recommended job ad: 1) The logarithm of the ad’s wage; 2)
The distance in kilometers of the job’s workplace to the job seeker’s zip code; 3) Whether the
job ad corresponds to an executive position in the company; 4) Whether the contract is defined
for an indefinite duration or not; 5) The number of hours worked per week; 6) Whether the
share of women among job seekers searching for a job in the occupation is less than 20%.</p>
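        <p>For concreteness, the recall measure defined above can be computed as in the following minimal sketch. The function name, the identifiers of job seekers and job ads, and the toy data are hypothetical; each job seeker is assumed to have exactly one observed hire.</p>
        <preformat preformat-type="code">
```python
from typing import Dict, Sequence


def recall_at_k(rankings: Dict[str, Sequence[str]],
                hires: Dict[str, str],
                k: int) -> float:
    """Share of observed hires found among each job seeker's top-k ads."""
    if not hires:
        raise ValueError("at least one hire is required")
    hits = sum(1 for seeker, ad in hires.items()
               if ad in rankings.get(seeker, [])[:k])
    return hits / len(hires)


# Hypothetical ranked recommendations (best first) and observed hires.
rankings = {
    "s1": ["a3", "a1", "a7"],
    "s2": ["a2", "a9", "a4"],
    "s3": ["a5", "a6", "a8"],
}
hires = {"s1": "a1", "s2": "a4", "s3": "a0"}

print(recall_at_k(rankings, hires, k=2))
print(recall_at_k(rankings, hires, k=3))
```
        </preformat>
        <p>Computing the same quantity separately on the hires of women and of men gives the per-gender recall compared in the results.</p>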
        <p>We also consider an aggregate indicator of the fit between the job seeker’s search criteria and
the recommended job,<sup>1</sup> defined as an average of five binary indicators describing the fit w.r.t.
the job seeker’s i) accepted geographic mobility; ii) desired type of occupation; iii) desired
wage; iv) desired type of contract; v) desired working hours.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Methodology</title>
        <p>
          Parameters of interest. We seek to document whether different jobs are recommended
to women and men on average and conditionally on their job search fundamentals (search
parameters and qualifications). Previous studies in economics [
          <xref ref-type="bibr" rid="ref30">30</xref>
          ] have documented gendered
preferences for commuting time, contract type and wage. These preferences are reflected in
job seekers’ search parameters and may partially explain observed gender dissimilarities in
recommendations. However, the different job search parameters might not be the only ones
contributing to the differences in recommendations: the study will try to identify whether other
gendered features have an impact on recommendations. Our method disentangles disparities
due to different job search fundamentals from those due to other characteristics and their
valuation by the algorithm.
        </p>
        <p>While disparities due to preferences are potentially justifiable from the users’ perspective, the
others could be considered a sign of unfair algorithmic treatment.</p>
        <p>In the following, the covariate <inline-formula><tex-math>X</tex-math></inline-formula> stands for the whole set of variables describing the job
seekers and used by the recommendation system; it includes information on past employment
history, demographics (e.g., number of children), and self-description in the text of the resume. The
control <inline-formula><tex-math>Z</tex-math></inline-formula> stands for the variables describing the job search fundamentals (job search parameters
and qualifications, detailed in Appendix B; <inline-formula><tex-math>Z \subset X</tex-math></inline-formula>). The outcome <inline-formula><tex-math>Y</tex-math></inline-formula> of the recommendation
system includes the set of variables describing the recommended job ad (job type, wage, whether
the job is part-time or full-time) and cross-features (distance between the locations of the job
seeker and the job ad, fit w.r.t. the job seeker’s aggregated search criteria).</p>
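        <p>The question of gender-related bias arises when men and women with the same search
fundamentals <inline-formula><tex-math>Z</tex-math></inline-formula> are recommended substantially different job ads (different outcomes <inline-formula><tex-math>Y</tex-math></inline-formula>): even though
the system has no direct access to the gender <inline-formula><tex-math>G</tex-math></inline-formula>, it might value the characteristics in <inline-formula><tex-math>X - Z</tex-math></inline-formula> in
a gender-biased way.</p>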
        <p>To assess this potential differential treatment, we focus on two quantities separately. First,
we consider the naive average gap <inline-formula><tex-math>\Delta</tex-math></inline-formula> in the characteristics of the recommended offers:</p>
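        <p>
          <disp-formula><tex-math>\Delta = \mathbb{E}[Y \mid G = 1] - \mathbb{E}[Y \mid G = 0],</tex-math></disp-formula>
<inline-formula><tex-math>G = 1</tex-math></inline-formula> and <inline-formula><tex-math>G = 0</tex-math></inline-formula> denoting respectively women and men hereafter. This parameter can simply
be estimated by taking a difference in means. Following our discussion, it is questionable whether
it is the role of a (fair) recommender system to directly disregard job search fundamentals
<inline-formula><tex-math>Z</tex-math></inline-formula>. Our parameter of interest is thus the gender-related gap <inline-formula><tex-math>\gamma</tex-math></inline-formula> in one characteristic <inline-formula><tex-math>Y</tex-math></inline-formula> of the recommended job ads,
while controlling for the effects of <inline-formula><tex-math>Z</tex-math></inline-formula>. Taking inspiration from [
          <xref ref-type="bibr" rid="ref9">31, 9</xref>
          ] regarding
the estimation of the gender wage gap, we thus consider the following model:<sup>2</sup>
          <disp-formula><label>(1)</label><tex-math>Y = \mu_0(Z) + \gamma G + \varepsilon, \qquad \mathbb{E}(\varepsilon \mid G, Z) = 0,</tex-math></disp-formula>
where <inline-formula><tex-math>\mu_0(z) := \mathbb{E}(Y \mid G = 0, Z = z)</tex-math></inline-formula> are the expected characteristics of jobs for men with
preferences <inline-formula><tex-math>z</tex-math></inline-formula>, and <inline-formula><tex-math>\varepsilon</tex-math></inline-formula> is a noise variable. To be able to identify <inline-formula><tex-math>\gamma</tex-math></inline-formula>, we also make the standard
assumption of common support, stating that there exist both men and women sharing all types
of search parameters <inline-formula><tex-math>Z</tex-math></inline-formula>, i.e., for all <inline-formula><tex-math>z</tex-math></inline-formula>, there exists <inline-formula><tex-math>\eta &gt; 0</tex-math></inline-formula> s.t. <inline-formula><tex-math>p(z) := \mathbb{P}(G = 1 \mid Z = z) \in [\eta, 1 - \eta]</tex-math></inline-formula>.
        </p>
        <p><sup>1</sup> This indicator is inspired by the proprietary PES indicator used for querying job ads.</p>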
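        <p>
          Let us give more intuition about the interpretation of the effect <inline-formula><tex-math>\gamma</tex-math></inline-formula> in our context of the impact
of gender on recommendations. Consider a linear specification of the effect of the different job
search parameters on the recommendations in (1), i.e., <inline-formula><tex-math>\mu_0(Z) = Z'\beta_0</tex-math></inline-formula>. Here, denoting by <inline-formula><tex-math>\beta_1</tex-math></inline-formula> and
<inline-formula><tex-math>\beta_0</tex-math></inline-formula> the coefficients of the regression of <inline-formula><tex-math>Y</tex-math></inline-formula> on <inline-formula><tex-math>Z</tex-math></inline-formula> for women and men respectively, we obtain the
Oaxaca decomposition, used in the literature on the gender wage gap [
          <xref ref-type="bibr" rid="ref9">31, 9</xref>
          ], of the average effect:
          <disp-formula><tex-math>\Delta = \underbrace{\beta_0' \left( \mathbb{E}(Z \mid G = 1) - \mathbb{E}(Z \mid G = 0) \right)}_{\text{effect explained by } Z} + \underbrace{(\beta_1 - \beta_0)' \, \mathbb{E}(Z \mid G = 1)}_{= \gamma, \text{ unexplained effect}},</tex-math></disp-formula>
where <inline-formula><tex-math>\gamma</tex-math></inline-formula> is the residual of the average gender difference <inline-formula><tex-math>\Delta</tex-math></inline-formula> that cannot be explained by <inline-formula><tex-math>Z</tex-math></inline-formula>.
        </p>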
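        <p>The decomposition can be checked numerically on a toy example. The sketch below is illustrative only: it assumes a single covariate plus an intercept, fits one least-squares line per gender, and verifies that the explained and unexplained parts add up to the raw gap. The data and all names are hypothetical.</p>
        <preformat preformat-type="code">
```python
from statistics import mean
from typing import List, Tuple


def ols_line(z: List[float], y: List[float]) -> Tuple[float, float]:
    """Intercept and slope of the least-squares regression of y on z."""
    z_bar, y_bar = mean(z), mean(y)
    slope = (sum((a - z_bar) * (b - y_bar) for a, b in zip(z, y))
             / sum((a - z_bar) ** 2 for a in z))
    return y_bar - slope * z_bar, slope


def oaxaca(z_w, y_w, z_m, y_m) -> Tuple[float, float, float]:
    """Return (raw gap, explained part, unexplained part), women minus men."""
    a_m, b_m = ols_line(z_m, y_m)  # men's coefficients (intercept, slope)
    a_w, b_w = ols_line(z_w, y_w)  # women's coefficients (intercept, slope)
    raw_gap = mean(y_w) - mean(y_m)
    # The intercept is a constant regressor, so it drops out of the explained part.
    explained = b_m * (mean(z_w) - mean(z_m))
    unexplained = (a_w - a_m) + (b_w - b_m) * mean(z_w)
    return raw_gap, explained, unexplained


# Hypothetical data: z is a desired wage, y the wage of the recommended ad.
z_men, y_men = [10.0, 12.0, 14.0, 16.0], [10.5, 12.1, 14.2, 15.8]
z_women, y_women = [9.0, 11.0, 12.0, 14.0], [9.1, 10.8, 11.9, 13.4]

gap, explained, unexplained = oaxaca(z_women, y_women, z_men, y_men)
print(round(gap, 3), round(explained, 3), round(unexplained, 3))
```
        </preformat>
        <p>The identity holds exactly for least-squares fits with an intercept, because each group’s fitted line passes through that group’s means.</p>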
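        <p>Estimation of the gender gap <inline-formula><tex-math>\gamma</tex-math></inline-formula> is performed using the double machine learning method
(DML) [see, e.g., 7, 10]. This method provides an estimator of <inline-formula><tex-math>\gamma</tex-math></inline-formula> which is asymptotically normal and
robust to the preliminary estimation of the nuisance parameters, <inline-formula><tex-math>\ell(Z) := \mathbb{E}(Y \mid Z)</tex-math></inline-formula> and the
propensity score <inline-formula><tex-math>p(Z) := \mathbb{P}(G = 1 \mid Z)</tex-math></inline-formula>, which can be obtained using different machine learning estimators. Details are
given in Appendix C.</p>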
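        <p>To fix ideas, a cross-fitted residual-on-residual estimator in the spirit of DML can be sketched as follows. This is a simplified illustration under stated assumptions, not the procedure of Appendix C: the search fundamentals are reduced to a single discrete cell label, within-cell averages are swapped in for the random forests used in the paper, folds are assigned by index parity, and all names and data are hypothetical.</p>
        <preformat preformat-type="code">
```python
from collections import defaultdict
from typing import Dict, Hashable, Sequence


def cell_means(keys: Sequence[Hashable], values: Sequence[float]) -> Dict[Hashable, float]:
    """Nuisance learner (assumption): average of `values` within each cell."""
    sums, counts = defaultdict(float), defaultdict(int)
    for key, value in zip(keys, values):
        sums[key] += value
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


def dml_gap(z: Sequence[Hashable], g: Sequence[int], y: Sequence[float],
            n_folds: int = 2) -> float:
    """Cross-fitted residual-on-residual estimate of the conditional gap."""
    numerator = denominator = 0.0
    for fold in range(n_folds):
        train = [i for i in range(len(y)) if i % n_folds != fold]
        test = [i for i in range(len(y)) if i % n_folds == fold]
        # Both nuisance functions are fitted on the other fold(s) only.
        y_hat = cell_means([z[i] for i in train], [y[i] for i in train])
        g_hat = cell_means([z[i] for i in train], [float(g[i]) for i in train])
        for i in test:
            g_res = g[i] - g_hat[z[i]]  # gender minus estimated propensity
            y_res = y[i] - y_hat[z[i]]  # outcome minus estimated conditional mean
            numerator += g_res * y_res
            denominator += g_res * g_res
    return numerator / denominator


# Hypothetical data: the outcome equals a cell baseline minus 0.5 for women (g = 1).
z = ["part", "part", "part", "part", "full", "full", "full", "full"]
g = [1, 1, 1, 0, 0, 0, 1, 0]
baseline = {"part": 20.0, "full": 35.0}
y = [baseline[cell] - 0.5 * gender for cell, gender in zip(z, g)]

print(dml_gap(z, g, y))
```
        </preformat>
        <p>On this noise-free example the estimate recovers the gap of -0.5 built into the data. With real data, flexible learners would replace the cell averages and a standard error would accompany the point estimate.</p>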
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Results</title>
      <p>In all the tables presented hereafter in this section, the column “p-value” presents the p-value
indicating the significance of the measure reported in the adjacent left column. Results presented
in this section use random forest estimators for functions <inline-formula><tex-math>\ell</tex-math></inline-formula> and <inline-formula><tex-math>p</tex-math></inline-formula>. However, our results are
not sensitive to the choice of the estimator as shown in Appendix E.</p>
      <sec id="sec-4-1">
        <title>4.1. Recommendation performance is higher for women</title>
        <p>We first report the recall@<inline-formula><tex-math>k</tex-math></inline-formula> for all hires in the test set, as well as for male and female job
seekers separately. For instance, the algorithm correctly ranks within its top 20 (resp. top 50)
recommendations the job ad on which a job seeker was hired in 35% (resp. 49%) of cases. This
success rate is 33.3% for men (resp. 47.5%), and 36.6% for women (resp. 50%), with a statistically
significant difference (see Table 5 in the Appendix). More generally, we find the recall@<inline-formula><tex-math>k</tex-math></inline-formula> to
be higher for women than for men at all values of <inline-formula><tex-math>k</tex-math></inline-formula> considered. While the magnitude of the
difference is limited, it is statistically significant. The observed higher performance of the
algorithm for women could be explained by the importance given by the model to the distance
criterion. Women assign greater value to proximity when searching for a job (see Table 2 on
hires and applications), which could make their job choices easier to predict.</p>
        <p><sup>2</sup> The presented methodology follows the CATE identification procedure [see, 28], being granted that the gender
cannot be considered a treatment.</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Characteristics of job ads recommended to men and women are different</title>
        <p>Table 1 provides conditional and unconditional estimates for gender differences in recommended
offer characteristics for all registered job seekers and the selected sub-population (Section 2).</p>
        <p>The first and third columns show that, whatever the restrictions on the population, women
are on average recommended different jobs than men. Their recommended job ads pay 2.3%
less than men’s; are half a kilometer closer to home; are shorter in terms of weekly working hours (by
2.9 hours); and are less often of indefinite duration (4 percentage points less often) or of executive status
(0.4 percentage points). Recommended jobs are also less often in male-dominated occupations
(41 percentage points less often). Women’s recommended jobs also have a lesser degree of fit with their own
search criteria (a loss of 0.028 points in the aggregate fit measure between 0 and 1). All of these
differences are statistically significant.</p>
        <p>However, the results using the DML estimation (Table 1, column Cond. <inline-formula><tex-math>\gamma</tex-math></inline-formula>) show that
restricting the analysis to the population of job seekers with common support and conditioning
on job seekers’ search fundamentals <inline-formula><tex-math>Z</tex-math></inline-formula> leads to a reduced gender gap in all discussed job ad
characteristics. Nevertheless, after conditioning on <inline-formula><tex-math>Z</tex-math></inline-formula>, women’s recommended jobs still fit
less with their search parameters (by 0.011 points), and remain significantly different in all
discussed dimensions. For instance, 17% of the gender wage gap is left unexplained by job
search characteristics and qualifications of job seekers.</p>
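        <table-wrap>
          <label>Table 1</label>
          <table>
            <thead>
              <tr>
                <th />
                <th>Uncond. <inline-formula><tex-math>\Delta</tex-math></inline-formula> (full pop.)</th>
                <th>p-value</th>
                <th>Uncond. <inline-formula><tex-math>\Delta</tex-math></inline-formula> (common support)</th>
                <th>p-value</th>
                <th>Cond. <inline-formula><tex-math>\gamma</tex-math></inline-formula> (common support)</th>
                <th>p-value</th>
              </tr>
            </thead>
            <tbody>
              <tr><td>Wage (log)</td><td>-0.023</td><td>0.0</td><td>-0.016</td><td>0.0</td><td>-0.004</td><td>0.000</td></tr>
              <tr><td>Distance (km)</td><td>-0.474</td><td>0.0</td><td>-0.231</td><td>0.0</td><td>0.400</td><td>0.000</td></tr>
              <tr><td>Executive</td><td>-0.004</td><td>0.0</td><td>-0.009</td><td>0.0</td><td>-0.002</td><td>0.032</td></tr>
              <tr><td>Long term contract</td><td>-0.040</td><td>0.0</td><td>-0.034</td><td>0.0</td><td>-0.014</td><td>0.000</td></tr>
              <tr><td>%Women &lt; 20</td><td>-0.411</td><td>0.0</td><td>-0.219</td><td>0.0</td><td>-0.033</td><td>0.000</td></tr>
              <tr><td>Hours worked per week</td><td>-2.934</td><td>0.0</td><td>-1.957</td><td>0.0</td><td>-0.381</td><td>0.000</td></tr>
              <tr><td>Fit to job search parameters</td><td>-0.028</td><td>0.0</td><td>-0.019</td><td>0.0</td><td>-0.011</td><td>0.000</td></tr>
            </tbody>
          </table>
          <table-wrap-foot>
            <p>Notes: The first column reports the gender gap <inline-formula><tex-math>\Delta</tex-math></inline-formula> in terms of job characteristics on average. The third column reports the
gender gap on the population of job seekers with a propensity score between 0.05 and 0.95. The fifth column reports, on
the population of job seekers with sufficiently comparable characteristics, the estimates for the gender gap <inline-formula><tex-math>\gamma</tex-math></inline-formula> controlling
for search parameters using DML. Results are given using random forests as estimators for the functions <inline-formula><tex-math>\ell</tex-math></inline-formula> and <inline-formula><tex-math>p</tex-math></inline-formula> and
are robust to this choice as shown in Appendix E.</p>
          </table-wrap-foot>
        </table-wrap>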
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Inequalities in recommendations against the ones observed in hiring and applications</title>
        <p>Inequalities in recommendations are comparable to or smaller than those observed
in hirings. We turn to the comparison of the characteristics of the recommended job ads to
those observed in real-world hires (<inline-formula><tex-math>\gamma^{\mathrm{Hire}}</tex-math></inline-formula>). We focus on the job seekers in the test set for whom
we observe hires.</p>
        <p>The first column of the upper section of Table 2 shows that, for the population with common
support and conditionally on job seekers’ search criteria, there exist differences in hiring behavior
<inline-formula><tex-math>\gamma^{\mathrm{Hire}}</tex-math></inline-formula> between women and men. Women are hired on job ads that have a lower aggregate fit
(by 0.019) with their search criteria than men. They are hired less often in male-dominated
occupations (14.1pp), are less often hired on indefinite-duration contracts (3.4pp), and
work fewer hours (1.11 hours). All of these differences are statistically significant. Moreover, they
are hired on jobs that are paid less,<sup>3</sup> and are less often hired in executive positions.</p>
        <p>On the other hand, the third column of Table 2, which reports estimates for <inline-formula><tex-math>\gamma</tex-math></inline-formula> in
recommendations for the subsample of hired job seekers, illustrates that the patterns are similar to those
established on the whole population in the fifth column of Table 1. However, the gap between
the characteristics of hires and the characteristics of recommended job ads after conditioning
on <inline-formula><tex-math>Z</tex-math></inline-formula> (<inline-formula><tex-math>\gamma^{\mathrm{DifH}}</tex-math></inline-formula>), presented in the fifth column of Table 2, shows that they are broadly comparable.
Indeed, the algorithm has little impact on the fit between job seekers’ search criteria and the job
ads, and does not increase the gap in wages, executive status or long-term contracts. Surprisingly,
the algorithm seems to recommend job ads in occupations where men are over-represented less
often, and recommends positions with more working hours, thus slightly reducing gender gaps.</p>
        <p>Overall, although the algorithm recommends different types of offers to men and women, there is
no evidence that it increases the inequalities already observed on the labor market once we
condition on job seekers’ search fundamentals.</p>
        <p>Observed differences largely replicate those observed in application behavior.
Differential treatment in hires may originate from job seekers’ application behavior and from
recruiters’ discriminatory behavior (see, e.g., the formal model in Appendix D). In the present
section, we wish to compare the magnitude of the gender gap in the algorithm’s recommendations
<inline-formula><tex-math>\gamma</tex-math></inline-formula> to the magnitude of the gender gap found in job seekers’ applications <inline-formula><tex-math>\gamma^{\mathrm{App}}</tex-math></inline-formula>. As applications
can also be seen as a noisy proxy for job seekers’ utility, especially if application costs are low
(see Appendix D), if the differences <inline-formula><tex-math>\gamma^{\mathrm{DifA}}</tex-math></inline-formula> were large, this would indicate that the algorithm’s
learned recommendations reflect not only job seekers’ preferences but also recruiter biases.</p>
        <p>Due to different data sources, we study the sub-population of job seekers with hires in the
test weeks for whom we observe applications (all weeks pooled together).</p>
        <p>The first column of the second panel of Table 2 reports estimates of the gender gaps in
applications (App), conditional on job seekers’ search fundamentals. The conditional gender gaps are
significant in application behavior, in terms of fit to search criteria (a significant difference of
0.029 points in the aggregate index), wages, long-term contracts, full-time jobs, weekly working
hours, and occupations where men are over-represented.</p>
        <p>Crucially, based on the results in the fifth column of the second panel of Table 2, the
conditional estimates of the difference between applications’ characteristics and the algorithm’s
recommendations (DifA) are not statistically significantly different from zero with respect to fit
to search criteria and to all objective job characteristics aside from occupations where men
are over-represented and number of hours worked. In the two latter cases, the
conditional gender gap is reduced in the algorithm’s recommendations.</p>
        <p>Table 2 (column headings). First panel, differences between women and men: Hire, p-value; MUSE, p-value. Difference of differences: DifH (MUSE), p-value. Second panel: App (Observed), p-value; MUSE, p-value; DifA (MUSE).</p>
        <p><sup>3</sup> An estimate of 1% for the gender wage gap on the job offers, conditional on search criteria, might be surprising
considering the larger magnitudes generally discussed in the economics literature. It should be noted that we have
a large set of stated preferences and that the analysis focuses on registered job seekers (rather than on the working
population as a whole), with jobs closer to the national minimum wage than those in the national population.</p>
        <p>Altogether, gender gaps exist in the algorithm’s recommendations even after conditioning on
job seekers’ search fundamentals, but those gaps are not larger than those found in hires or in
job seekers’ application behavior. These results suggest that the recall, the relevance w.r.t. job
seekers’ search fundamentals, and the reduction of the gender-related gaps in recommendation
might be antagonistic.</p>
        <p>This conjecture will be investigated empirically using adversarial techniques in the next
section.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Limiting differential treatment with adversarial methods</title>
      <p>
        The goal of this section is to investigate the consequences of de-correlating the latent
representations from gender, using an adversarial method [
        <xref ref-type="bibr" rid="ref20 ref22 ref24 ref8">24, 8, 22, 20</xref>
        ], in terms of gender gaps
and recall.
      </p>
      <sec id="sec-5-1">
        <title>5.1. Methodology: gender-blind recommendation through adversarial learning</title>
        <p>In the following, we take the pre-selection of 1,000 job ads by the first tier of the algorithm as
given (considering job ads ranked beyond 1,000 to be irrelevant), and incorporate the adversarial
setup into the second tier of the recommender system. Recall that in the usual setting, the
algorithm solves
<disp-formula><tex-math>\min_{\theta} \; \mathcal{L}_{\mathrm{rec}} := -\sum_{i,j} \Bigl[ y_{i,j} \log(\hat{y}_{i,j}) + (1 - y_{i,j}) \log(1 - \hat{y}_{i,j}) \Bigr],</tex-math></disp-formula>
where θ corresponds to the weights parameterizing the latent representation φ<sub>i,j</sub> of job seeker
i and job ad j (viewed with respect to its relation to i). The adversary is instantiated as a
three-hidden-layer feedforward neural network predicting gender from the latent. Denoting its
prediction for the gender g<sub>i</sub> by ĝ<sub>i,j</sub> = A<sub>ψ</sub>(φ<sub>i,j</sub>), the adversary tries to solve
<disp-formula><tex-math>\min_{\psi} \; \mathcal{L}_{\mathrm{adv}} := -\sum_{i,j} \Bigl[ g_i \log(\hat{g}_{i,j}) + (1 - g_i) \log(1 - \hat{g}_{i,j}) \Bigr],</tex-math></disp-formula>
whereas the recommender system incurs a penalty if the adversary’s predictions perform well,
leading to the program
<disp-formula><tex-math>\min_{\theta} \; \mathcal{L}_{\mathrm{rec}} - \lambda \, \mathcal{L}_{\mathrm{adv}},</tex-math></disp-formula>
where λ &gt; 0 is a hyperparameter prioritizing
amongst the two objectives. In practice, we alternate between stochastic gradient updates of
the two sets of parameters, those of the adversary (ψ) and those of the recommender (θ).</p>
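        <p>As an illustration of the penalized objective described in this subsection, the following minimal Python sketch evaluates the two cross-entropy terms and their combination on toy values. The function names (<monospace>cross_entropy</monospace>, <monospace>recommender_objective</monospace>), the argument <monospace>lam</monospace> for the adversarial weight and all numbers are assumptions made for illustration; they are not taken from the MUSE implementation, and the neural networks and gradient updates are not shown.</p>
        <preformat><![CDATA[
```python
import math


def cross_entropy(labels, probs, eps=1e-12):
    """Summed binary cross-entropy between 0/1 labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total


def recommender_objective(y, y_hat, g, g_hat, lam):
    """Recommendation loss minus lam times the adversary's loss.

    The recommender is penalized when the adversary predicts gender well,
    i.e. when the adversary's cross-entropy is low.
    """
    loss_rec = cross_entropy(y, y_hat)
    loss_adv = cross_entropy(g, g_hat)
    return loss_rec - lam * loss_adv


# Toy example (hypothetical values): same match predictions, two adversaries.
y = [1, 0, 1, 0]
y_hat = [0.8, 0.3, 0.6, 0.2]
g = [1, 1, 0, 0]
accurate_adv = [0.9, 0.8, 0.2, 0.1]  # gender is easy to read from the latent
blind_adv = [0.5, 0.5, 0.5, 0.5]     # gender cannot be recovered

obj_leaky = recommender_objective(y, y_hat, g, accurate_adv, lam=1.0)
obj_blind = recommender_objective(y, y_hat, g, blind_adv, lam=1.0)
```
]]></preformat>
        <p>In this toy setting the objective is lower when the adversary is reduced to guessing, which is the direction in which the recommender's parameters are pushed; with a weight of zero the objective reduces to the usual recommendation loss.</p>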
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Results</title>
        <p>We consider recommendations obtained using the adversarial strategy, letting the adversarial weight λ range over {0.001, 0.01, 0.1, 1}.</p>
        <p>Adopting the adversarial penalization strategy leads to a slight loss in recall@20: a difference
of 0.016 points between λ = 0 and λ = 1. While recall remains higher for women than for
men, women bear most of the loss due to adversarial de-biasing (0.018 points, against 0.013 for men;
see [32] for a theoretical discussion of this risk). As λ increases, the gender predictions made
by the adversary become less accurate (the accuracy drops from 85% when λ = 0.001 to a
near-random accuracy of 53% when λ = 1).</p>
        <p>In terms of unconditional gaps, adopting the adversarial strategy, at least for these levels
of penalization, does not reduce the gender gaps to zero for all characteristics, as might
have been expected. Indeed, statistically significant differences in terms of contract
type, occupations, hours worked and fit to search criteria remain. Yet, for all values of λ, all
unconditional gender gaps are considerably reduced. For instance, the log wage gap is divided
by 12 (when comparing λ = 0 to λ = 1). All conditional gender gaps are also decreased.</p>
        <p>Altogether, the use of adversarial de-biasing techniques, aiming at making
recommendation gender-blind, entails a slight loss in recommendation performance. Moreover, it reduces
unconditional and conditional gender gaps, without suppressing them.</p>
        <p>Note that the presented adversarial strategy decorrelates the latent from gender, regardless
of whether it represents features from job search fundamentals or from the remaining features. An adaptation
of the strategy aiming to only target gaps conditional on job search fundamentals is left for further work.</p>
        <p>Notes: Results are presented on the subsample of hired job seekers, for different weights λ given to the adversarial term in the loss function. Column λ = 0
restates the standard algorithm’s performance for convenience in comparisons. Recall and adversary accuracy are computed on the test set (all hired job seekers).
Unconditional and conditional gaps are computed on the population of hired job seekers with common support. Unconditional gaps correspond to a difference in
means between men and women. Conditional gaps are obtained by DML, using random forests to estimate the nuisance functions.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion / perspectives</title>
      <p>Our main contribution is an audit of the gender fairness of the MUSE recommender system,
trained on real-world hiring data. First, we find recall to be slightly higher for women than
for men. Second, we provide evidence of differentiated treatment of men and women by the
algorithm in terms of recommended job characteristics, even conditionally on job seekers’ search
criteria. In the latter case, we find female job seekers to be recommended jobs that fit their own
search criteria less often. Third, we find that the recommendations
do not increase the gendered gaps observed in hires or applications, and even decrease them
in the cases of occupation type and working hours. A comparison of recommended job ads to
application behavior leads to similar conclusions. Finally, we investigate the trade-offs between
recommendation performance and gender gaps entailed by the use of adversarial de-biasing
techniques. The use of such techniques entails a slight loss in terms of recall, but narrows some
of the conditional and unconditional gender gaps without eliminating them.</p>
      <p>Ultimately, the merits of de-biased algorithms attempting to reduce gender gaps in
recommendations hinge on the acceptability of the proposed job ads in terms of job seekers’ (possibly
gendered) preferences. An algorithm straying too far from job seekers’ search behavior
might lead to a deadweight loss: a loss in recommendation quality without any effect on labor
market inequalities if recommendations are simply discarded as irrelevant. Answering whether
a suitable equilibrium can be found requires interacting with job seekers.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>We warmly thank C. Vessereau, S. Robidou and P. Beurnier from Pôle emploi for making this
research possible and granting access to the proprietary data. The first author was funded by a
grant from the DataIA Institute, Saclay.</p>
      <p>[31] N. Fortin, T. Lemieux, S. Firpo, Decomposition methods in economics, in: Handbook of
labor economics, volume 4, Elsevier, 2011, pp. 1–102.</p>
      <p>[32] M. P. Kim, A. Korolova, G. N. Rothblum, G. Yona, Preference-informed fairness, arXiv
preprint arXiv:1904.01793 (2019).</p>
      <p>[33] P. M. Robinson, Root-n-consistent semiparametric regression, Econometrica: Journal of
the Econometric Society (1988) 931–954.</p>
    </sec>
    <sec id="sec-8">
      <title>A. Additional tables</title>
      <sec id="sec-8-1">
        <title>Sample size</title>
      </sec>
      <sec id="sec-8-2">
        <title>Number men</title>
      </sec>
      <sec id="sec-8-3">
        <title>Number women % men</title>
      </sec>
      <sec id="sec-8-4">
        <title>Full week</title>
      </sec>
      <sec id="sec-8-5">
        <title>Full week (overlap)</title>
      </sec>
      <sec id="sec-8-6">
        <title>Hires</title>
      </sec>
      <sec id="sec-8-7">
        <title>Hires (overlap)</title>
        <p>Hires &amp; Applications (overlap)</p>
        <p>Notes: The first column presents the total sample size for the different datasets used in the analysis:
“Full week” and “Full week (overlap)” present the sample size for a week in the test set before and after
restriction to job seekers satisfying the overlap condition required in the Double Machine Learning
method of Section 3; “Hires”, “Hires (overlap)”, and “Hires &amp; Applications (overlap)” present respectively
the sample sizes for the subsamples of job seekers in the test set who have been hired, who have been hired and for
whom the overlap condition holds, and the subset of the latter for whom we also observe applications.</p>
        <p>Notes: Recall@k is the recall over the whole population for
the top k recommendations. Columns “Men” and
“Women” present the same recall@k separately for men
and women. The last column reports a test of equality
between columns 2 and 3.</p>
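        <p>The recall measure described in these notes can be sketched as follows. This is a minimal illustration under the assumption that, for each hired job seeker, recall@k equals one when the job ad of the actual hire appears among the top k recommendations; the record layout, the gender codes and the function names are hypothetical, and the test of equality between groups is not shown.</p>
        <preformat><![CDATA[
```python
def recall_at_k(ranked_ads, hired_ad, k):
    """Return 1.0 if the ad of the actual hire is among the top k recommendations."""
    return 1.0 if hired_ad in ranked_ads[:k] else 0.0


def recall_by_gender(records, k):
    """Average recall@k overall and by gender.

    records: iterable of (gender, ranked_ads, hired_ad), gender in {"M", "W"}.
    """
    hits = {"all": [], "M": [], "W": []}
    for gender, ranked_ads, hired_ad in records:
        hit = recall_at_k(ranked_ads, hired_ad, k)
        hits["all"].append(hit)
        hits[gender].append(hit)
    return {group: sum(v) / len(v) for group, v in hits.items() if v}


# Hypothetical toy data: four hired job seekers with three ranked ads each.
records = [
    ("W", ["a1", "a2", "a3"], "a2"),
    ("W", ["a4", "a5", "a6"], "a6"),
    ("M", ["a1", "a7", "a8"], "a1"),
    ("M", ["a2", "a3", "a9"], "a5"),
]
recall_top2 = recall_by_gender(records, k=2)
recall_top3 = recall_by_gender(records, k=3)
```
]]></preformat>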
      </sec>
    </sec>
    <sec id="sec-9">
      <title>B. Details on variables used</title>
      <p><sup>4</sup> QPV refers to poor urban areas in need of public intervention, particularly in terms of urban renewal.</p>
    </sec>
    <sec id="sec-10">
      <title>C. Details on the estimation of the heterogeneous effect of gender on the recommendations using the double machine learning method (DML).</title>
      <p>To estimate the gender gap γ, we use the double machine learning method
(DML) [see, e.g., 7, 10]. This method is based on a rewriting of (1), following the intuition of
[33], as
<disp-formula><label>(2)</label><tex-math>Y - m(Z) = \bigl(G - p(Z)\bigr)\,\gamma + \varepsilon,</tex-math></disp-formula>
where m(Z) := E(Y | Z) is a regression function and p(Z) := P(G = 1 | Z) is the propensity
score, i.e. the probability of being a woman (G = 1) given the observed preferences and
qualifications Z. The latter are nuisance parameters, which have to be estimated in a first step, but the
reformulation (2) allows the estimation of γ to be doubly robust to this first-stage estimation
error. This means that we can obtain an estimator of γ which is asymptotically normal under
theoretical conditions which are satisfied by many machine learning methods. Estimation
thus consists of 1) estimating m and p using machine learning estimators m̂ and p̂; 2) estimating
the gender gap γ via minimization of the mean squared error associated with (2), using plug-in
leave-one-out versions of m̂ and p̂, i.e. predicting without using the i-th example [see 10].</p>
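      <p>The second estimation step can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it assumes that out-of-sample predictions of the regression function and of the propensity score are already available from some first-stage learner (the first stage is not shown), and the names <monospace>partialling_out</monospace>, <monospace>m_hat</monospace> and <monospace>p_hat</monospace>, the robust standard error formula and the toy data are assumptions made for the example.</p>
      <preformat><![CDATA[
```python
import math


def partialling_out(y, g, m_hat, p_hat):
    """Residual-on-residual least squares for Y - m(Z) = (G - p(Z)) * gamma + eps.

    y, g: outcomes and 0/1 gender indicators.
    m_hat, p_hat: out-of-sample predictions of E[Y | Z] and P(G = 1 | Z).
    Returns the estimate and a heteroskedasticity-robust standard error.
    """
    ry = [yi - mi for yi, mi in zip(y, m_hat)]  # outcome residuals
    rg = [gi - pi for gi, pi in zip(g, p_hat)]  # gender residuals
    denom = sum(r * r for r in rg)
    gamma = sum(a * b for a, b in zip(rg, ry)) / denom
    resid = [a - gamma * b for a, b in zip(ry, rg)]
    var = sum((b * e) ** 2 for b, e in zip(rg, resid)) / denom ** 2
    return gamma, math.sqrt(var)


# Noise-free toy data (hypothetical) with known nuisance functions, so that
# the estimator must recover the gap exactly.
true_gamma = -0.5
z = [0.0, 1.0, 2.0, 3.0, 0.0, 1.0, 2.0, 3.0]
g = [1, 1, 1, 1, 0, 0, 0, 0]
p = [0.5] * 8                       # each value of z has one woman and one man
f = [2.0 * zi for zi in z]          # effect of the fundamentals on the outcome
y = [true_gamma * gi + fi for gi, fi in zip(g, f)]
m = [true_gamma * pi + fi for pi, fi in zip(p, f)]  # E[Y | Z]

gamma_hat, std_err = partialling_out(y, g, m, p)
```
]]></preformat>
      <p>With estimated rather than exact nuisance functions, the same function would be called on leave-one-out or cross-fitted predictions.</p>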
    </sec>
    <sec id="sec-11">
      <title>D. Gendered recommendations, applications, and hires: a simple formal model</title>
      <p>To discuss the different sources of bias which can appear in the recommendations, and how
they compare to those appearing both in the realized job applications and hires, we consider
the following simple model of the decision to apply for a job and of the hiring.</p>
      <p>For job seekers and job ads having respective types x and y, we denote the chances that the
interview yields a hiring by p(x, y). Job seekers may not be rational and may have expectations
about their opportunities p̃(x, y) which differ from the objective ones, p(x, y) ≠ p̃(x, y). We
assume that job seekers expect to have a utility U(x, y) + ε − c if hired, where ε is an unobserved
random part and c is the cost of application. Otherwise, they expect to have their baseline
utility U<sub>0</sub>(x) minus the cost c. In this model, job seekers decide to apply for a job of type y
if their expected utility when applying is greater than their utility without applying, namely
<disp-formula><label>(Decision applying)</label><tex-math>\underbrace{\tilde{p}(x,y)\,\bigl(U(x,y) + \varepsilon\bigr) + \bigl(1 - \tilde{p}(x,y)\bigr)\,U_0(x) - c}_{\text{Expected utility when applying}} \;\geq\; \underbrace{U_0(x)}_{\text{Utility without applying}}.</tex-math></disp-formula>
In this model, the probability of observing an application of x on a job ad of type y is
<disp-formula><label>(Probability of observing an application)</label><tex-math>a(x,y) = F_{-\varepsilon}\!\left( U(x,y) - U_0(x) - \frac{c}{\tilde{p}(x,y)} \right),</tex-math></disp-formula>
where F<sub>−ε</sub> denotes the cdf of −ε. We note that, when the cost of application is zero, c = 0, we
find the intuitive idea that only the utility matters in the job seekers’ decisions. Otherwise, their
expected chances p̃(x, y) of a positive outcome weight their utility gains and might censor their
decision to apply, hence the observed data. This simply underlines that realized applications
are then not a pure expression of preferences, but also mix in possibly wrong expectations.
It is finally of interest to consider the form taken by the probability of observing a hiring, which
is simply the product of the probability of application and the objective probability of a positive
outcome after the interview: h(x, y) = a(x, y) p(x, y).</p>
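      <p>To make the roles of the application cost and of the perceived hiring chances concrete, the model can be evaluated numerically. The sketch below is only an illustration: it assumes that the random utility term is standard normal (the model itself does not specify a distribution), and the function names, argument names and numerical values are hypothetical.</p>
      <preformat><![CDATA[
```python
import math


def normal_cdf(t):
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def application_probability(utility, baseline, cost, expected_chance):
    """P(apply) = F(utility - baseline - cost / expected_chance).

    Assumes a standard normal random utility term, so that the cdf of its
    negative is also the standard normal cdf.
    """
    if not 0.0 < expected_chance <= 1.0:
        raise ValueError("expected_chance must lie in (0, 1]")
    return normal_cdf(utility - baseline - cost / expected_chance)


def hiring_probability(utility, baseline, cost, expected_chance, true_chance):
    """P(hire) = P(apply) * objective chance of a positive outcome."""
    apply_prob = application_probability(utility, baseline, cost, expected_chance)
    return apply_prob * true_chance


# Two hypothetical job seekers with the same preferences and the same
# objective chances, who differ only in how they perceive their chances.
confident = hiring_probability(1.0, 0.0, 0.2, expected_chance=0.5, true_chance=0.3)
pessimistic = hiring_probability(1.0, 0.0, 0.2, expected_chance=0.1, true_chance=0.3)

# With a zero application cost, perceived chances no longer matter.
free_low = application_probability(1.0, 0.0, 0.0, expected_chance=0.1)
free_high = application_probability(1.0, 0.0, 0.0, expected_chance=0.9)
```
]]></preformat>
      <p>In this toy example the pessimistic job seeker is observed being hired less often although preferences and objective chances are identical, which is the self-censorship channel discussed in this appendix.</p>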
      <p>
        This model helps us discuss several mechanisms that could yield a differential treatment
along the lines of gender. First, preferences (the utility U) might be gender-specific [
        <xref ref-type="bibr" rid="ref30">30</xref>
        ], e.g. women tend to
value commuting time relative to wages differently from men. An algorithm
learning from past hires or applications could reproduce these differences in preferences. If
the latter are the product of social norms or other constraints, a policymaker might find it
unfair that the algorithm conveys these differences, which would justify imposing parity along these
lines. Second, even if job seekers are rational, there might be gendered differences in the hiring
chances p, e.g. taste-based or statistical discrimination against a gender by recruiters.
The algorithm could also reproduce these differences.
      </p>
      <p>A final pitfall is that the hiring
expectations p̃ might also be gendered: there might be differences in the perceptions and the
representations of the chances of being hired, leading to differences in self-censorship or
overconfidence. In our model, this could directly create or exacerbate the differences which might
already be present in the objective chances p, and affect the training data of the algorithm.</p>
    </sec>
    <sec id="sec-12">
      <title>E. Robustness checks</title>
      <p>To ensure that the estimates of the gender gap obtained by Double Machine Learning are robust to the choice
of machine learning technique used for the approximation of the nuisance functions, we report alternative
estimates obtained using XGBoost and Lasso estimators, as well as the associated p-values,
in Table 8, columns 3-6. Results are consistent with what we find with a random forest estimator
(columns 1-2).</p>
      <table-wrap>
        <label>Table 8</label>
        <table>
          <thead>
            <tr><th>Outcome</th><th>Cond. gap (Random Forest)</th></tr>
          </thead>
          <tbody>
            <tr><td>Wage (log)</td><td>-0.004</td></tr>
            <tr><td>Distance (km)</td><td>0.400</td></tr>
            <tr><td>Executive</td><td>-0.002</td></tr>
            <tr><td>Long term contract</td><td>-0.014</td></tr>
            <tr><td>%Women &lt; 20</td><td>-0.033</td></tr>
            <tr><td>Hours worked per week</td><td>-0.381</td></tr>
            <tr><td>Fit to job search parameters</td><td>-0.011</td></tr>
          </tbody>
        </table>
        <table-wrap-foot>
          <p>Notes: Columns 1, 3 and 5 report, on the population of job seekers with sufficiently comparable characteristics, the
estimates of the gender gap controlling for search parameters using DML and, respectively, a random forest, XGBoost
and Lasso estimator for the nuisance functions.</p>
        </table-wrap-foot>
      </table-wrap>
      <p>
        Our main specification (equation 1) focuses on average gender gaps γ (after controlling for job
search fundamentals Z). However, gender gaps are likely to be heterogeneous, at least for a
subset Z<sub>0</sub> of Z. For instance, the gender gaps in recommendations may be greater for women
looking for high wages than for those seeking low wages. Accordingly, we propose to study
gender gaps conditional on Z<sub>0</sub>, in line with the estimation of so-called Conditional Average
Treatment Effects in the causal estimation literature [
        <xref ref-type="bibr" rid="ref28">28</xref>
        ]. More precisely, to provide insights
about this potential heterogeneity, we assume
<disp-formula><label>(3)</label><tex-math>Y - m(Z) = \bigl(G - p(Z)\bigr)\,\gamma(Z_0) + \varepsilon,</tex-math></disp-formula>
with γ(Z<sub>0</sub>) a linear function. In the following, we consider Z<sub>0</sub> as an expansion of a single
feature of interest - job seekers’ monthly reservation wage in euros - on a basis of B-splines to
increase the specification’s flexibility. To reduce sensitivity to outliers, we top-code at the 90% quantile and
bottom-code at the 10% quantile.
      </p>
      <p>Figure 1 shows the conditional gender wage gap (solid line) in the characteristics of
recommendations according to reservation wage, together with its 95% confidence interval.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>M.</given-names>
            <surname>Belot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Kircher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Muller</surname>
          </string-name>
          ,
          <article-title>Providing advice to jobseekers at low cost: An experimental study on online advice, The review of economic studies 86 (</article-title>
          <year>2019</year>
          )
          <fpage>1411</fpage>
          -
          <lpage>1447</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>S.</given-names>
            <surname>Barocas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hardt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Narayanan</surname>
          </string-name>
          ,
          <article-title>Fairness and machine learning</article-title>
          .
          <source>fairmlbook.org</source>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>G.</given-names>
            <surname>Bied</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Nathan</surname>
          </string-name>
          , E. Perennes,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hofmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Caillou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Crépon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Gaillac</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sebag</surname>
          </string-name>
          ,
          <article-title>Toward job recommendation for all</article-title>
          , Working paper (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>C.</given-names>
            <surname>Dwork</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hardt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Pitassi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Reingold</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Zemel</surname>
          </string-name>
          ,
          <article-title>Fairness through awareness</article-title>
          ,
          <source>in: Proceedings of the 3rd innovations in theoretical computer science conference</source>
          ,
          <year>2012</year>
          , pp.
          <fpage>214</fpage>
          -
          <lpage>226</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>H.</given-names>
            <surname>Varian</surname>
          </string-name>
          , Efficiency, equity and envy,
          <source>Journal of Economic Theory</source>
          <volume>9</volume>
          (
          <year>1974</year>
          )
          <fpage>63</fpage>
          -
          <lpage>91</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>M. B.</given-names>
            <surname>Zafar</surname>
          </string-name>
          , I. Valera,
          <string-name>
            <given-names>M.</given-names>
            <surname>Rodriguez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Gummadi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Weller</surname>
          </string-name>
          ,
          <article-title>From parity to preference-based notions of fairness in classification</article-title>
          ,
          <source>Advances in neural information processing systems</source>
          <volume>30</volume>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>V.</given-names>
            <surname>Chernozhukov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Chetverikov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Demirer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Duflo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Hansen</surname>
          </string-name>
          , W. Newey, Double/debiased/Neyman machine learning of treatment effects,
          <source>American Economic Review</source>
          <volume>107</volume>
          (
          <year>2017</year>
          )
          <fpage>261</fpage>
          -
          <lpage>265</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>H.</given-names>
            <surname>Edwards</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Storkey</surname>
          </string-name>
          ,
          <article-title>Censoring representations with an adversary, 2015</article-title>
          . URL: https://arxiv.org/abs/1511.05897. doi:10.48550/ARXIV.1511.05897.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>P.</given-names>
            <surname>Bach</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Chernozhukov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Spindler</surname>
          </string-name>
          ,
          <article-title>Closing the US gender wage gap requires understanding its heterogeneity</article-title>
          , arXiv preprint arXiv:1812.04345
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>X.</given-names>
            <surname>Nie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Wager</surname>
          </string-name>
          ,
          <article-title>Quasi-oracle estimation of heterogeneous treatment effects</article-title>
          ,
          <source>Biometrika</source>
          <volume>108</volume>
          (
          <year>2021</year>
          )
          <fpage>299</fpage>
          -
          <lpage>319</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <surname>M. D. Ekstrand</surname>
          </string-name>
          ,
          <string-name>
            <surname>A. Das</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          <string-name>
            <surname>Burke</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          <string-name>
            <surname>Diaz</surname>
          </string-name>
          , et al.,
          <source>Fairness in information access systems, Foundations and Trends® in Information Retrieval</source>
          <volume>16</volume>
          (
          <year>2022</year>
          )
          <fpage>1</fpage>
          -
          <lpage>177</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          , W. Ma, M. Zhang*, Y. Liu,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ma</surname>
          </string-name>
          ,
          <article-title>A survey on the fairness of recommender systems</article-title>
          ,
          <source>ACM Journal of the ACM (JACM)</source>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Ge</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Tan</surname>
          </string-name>
          , S. Liu,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , Fairness in recommendation: A survey,
          <year>2022</year>
          . URL: https://arxiv.org/abs/2205.13619. doi:10.48550/ARXIV.2205.13619.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>A.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Joachims</surname>
          </string-name>
          ,
          <article-title>Fairness of exposure in rankings</article-title>
          ,
          <source>in: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery &amp; Data Mining, KDD '18</source>
          ,
          <string-name>
            <surname>Association</surname>
          </string-name>
          for Computing Machinery, New York, NY, USA,
          <year>2018</year>
          , p.
          <fpage>2219</fpage>
          -
          <lpage>2228</lpage>
          . URL: https://doi.org/10.1145/3219819.3220088. doi:10.1145/3219819.3220088.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>A.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Joachims</surname>
          </string-name>
          ,
          <article-title>Policy learning for fairness in ranking</article-title>
          , in: H.
          <string-name>
            <surname>Wallach</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          <string-name>
            <surname>Larochelle</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Beygelzimer</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          <article-title>d'Alché-</article-title>
          <string-name>
            <surname>Buc</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          <string-name>
            <surname>Fox</surname>
          </string-name>
          , R. Garnett (Eds.),
          <source>Advances in Neural Information Processing Systems</source>
          , volume
          <volume>32</volume>
          ,
          <string-name>
            <surname>Curran</surname>
            <given-names>Associates</given-names>
          </string-name>
          , Inc.,
          <year>2019</year>
          . URL: https://proceedings.neurips.cc/paper/2019/file/9e82757e9a1c12cb710ad680db11f6f1-Paper.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>S. C.</given-names>
            <surname>Geyik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ambler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Kenthapadi</surname>
          </string-name>
          ,
          <article-title>Fairness-aware ranking in search &amp; recommendation systems with application to LinkedIn talent search</article-title>
          , in:
          <source>Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery &amp; Data Mining</source>
          , ACM,
          <year>2019</year>
          . URL: https://doi.org/10.1145%2F3292500.3330691. doi:10.1145/3292500.3330691.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>R.</given-names>
            <surname>Mehrotra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Anderson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Diaz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Sharma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Wallach</surname>
          </string-name>
          , E. Yilmaz,
          <article-title>Auditing search engines for differential satisfaction across demographics</article-title>
          , in:
          <source>Proceedings of the 26th International Conference on World Wide Web Companion - WWW '17 Companion</source>
          , ACM Press,
          <year>2017</year>
          . URL: https://doi.org/10.1145%2F3041021.3054197. doi:10.1145/3041021.3054197.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>M. D.</given-names>
            <surname>Ekstrand</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Tian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I. M.</given-names>
            <surname>Azpiazu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. D.</given-names>
            <surname>Ekstrand</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Anuyah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>McNeill</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. S.</given-names>
            <surname>Pera</surname>
          </string-name>
          ,
          <article-title>All the cool kids, how do they fit in?: Popularity and demographic biases in recommender evaluation and effectiveness</article-title>
          , in:
          <source>Conference on Fairness, Accountability and Transparency</source>
          , PMLR
          ,
          <year>2018</year>
          , pp.
          <fpage>172</fpage>
          -
          <lpage>186</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>A. B.</given-names>
            <surname>Melchiorre</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Rekabsaz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Parada-Cabaleiro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Brandl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Lesota</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Schedl</surname>
          </string-name>
          ,
          <article-title>Investigating gender fairness of recommendation algorithms in the music domain</article-title>
          ,
          <source>Information Processing &amp; Management</source>
          <volume>58</volume>
          (
          <year>2021</year>
          )
          <fpage>102666</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>R.</given-names>
            <surname>Islam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. N.</given-names>
            <surname>Keya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zeng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Pan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Foulds</surname>
          </string-name>
          ,
          <article-title>Debiasing career recommendations with neural fair collaborative filtering</article-title>
          ,
          in:
          <source>Proceedings of the Web Conference 2021</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>3779</fpage>
          -
          <lpage>3790</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>T.</given-names>
            <surname>Kamishima</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Akaho</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Asoh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Sakuma</surname>
          </string-name>
          ,
          <article-title>Enhancement of the neutrality in recommendation</article-title>
          , in:
          <source>Decisions@RecSys</source>
          ,
          <year>2012</year>
          , pp.
          <fpage>8</fpage>
          -
          <lpage>14</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>C.</given-names>
            <surname>Rus</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Luppes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Oosterhuis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. H.</given-names>
            <surname>Schoenmacker</surname>
          </string-name>
          ,
          <article-title>Closing the gender wage gap: Adversarial fairness in job recommendation</article-title>
          (
          <year>2022</year>
          ). URL: https://arxiv.org/abs/2209.09592. doi:10.48550/ARXIV.2209.09592.
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>S.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Kuhn</surname>
          </string-name>
          ,
          <article-title>Understanding algorithmic bias in job recommender systems: An audit study approach</article-title>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>C.</given-names>
            <surname>Wadsworth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Vera</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Piech</surname>
          </string-name>
          ,
          <article-title>Achieving fairness through adversarial learning: an application to recidivism prediction</article-title>
          ,
          <year>2018</year>
          . URL: https://arxiv.org/abs/1807.00199. doi:10.48550/ARXIV.1807.00199.
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>A.</given-names>
            <surname>Beutel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. H.</given-names>
            <surname>Chi</surname>
          </string-name>
          ,
          <article-title>Data decisions and theoretical implications when adversarially learning fair representations</article-title>
          ,
          <year>2017</year>
          . URL: https://arxiv.org/abs/1707.00075. doi:10.48550/ARXIV.1707.00075.
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Ge</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <article-title>Towards personalized fairness based on causal notion</article-title>
          ,
          in:
          <source>Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval</source>
          , ACM,
          <year>2021</year>
          . URL: https://doi.org/10.1145%2F3404835.3462966. doi:10.1145/3404835.3462966.
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>S.</given-names>
            <surname>Deerwester</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. T.</given-names>
            <surname>Dumais</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. W.</given-names>
            <surname>Furnas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. K.</given-names>
            <surname>Landauer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Harshman</surname>
          </string-name>
          ,
          <article-title>Indexing by latent semantic analysis</article-title>
          ,
          <source>Journal of the American Society for Information Science</source>
          <volume>41</volume>
          (
          <year>1990</year>
          )
          <fpage>391</fpage>
          -
          <lpage>407</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>G. W.</given-names>
            <surname>Imbens</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. B.</given-names>
            <surname>Rubin</surname>
          </string-name>
          ,
          <source>Causal inference in statistics, social, and biomedical sciences</source>
          ,
          <publisher-name>Cambridge University Press</publisher-name>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>L.</given-names>
            <surname>Breiman</surname>
          </string-name>
          ,
          <article-title>Random forests</article-title>
          ,
          <source>Machine Learning</source>
          <volume>45</volume>
          (
          <year>2001</year>
          )
          <fpage>5</fpage>
          -
          <lpage>32</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>T.</given-names>
            <surname>Le Barbanchon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Rathelot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Roulet</surname>
          </string-name>
          ,
          <article-title>Gender differences in job search: Trading off commute against wage</article-title>
          ,
          <source>The Quarterly Journal of Economics</source>
          <volume>136</volume>
          (
          <year>2021</year>
          )
          <fpage>381</fpage>
          -
          <lpage>426</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>