<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>CEUR Workshop Proceedings</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Causal Metrics of Fairness in Machine Learning on Data-Driven Systems</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Chiara Criscuolo</string-name>
          <email>chiara.criscuolo@polimi.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Davide Martinenghi</string-name>
          <email>davide.martinenghi@polimi.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jing Huang</string-name>
          <email>jing1.huang@mail.polimi.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Politecnico di Milano - Department of Electronics, Information, and Bioengineering</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <addr-line>Via G. Ponzio 34/5, 20133 Milano</addr-line>
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <volume>000</volume>
      <fpage>0</fpage>
      <lpage>0003</lpage>
      <abstract>
        <p>In the digital age, machine learning (ML) algorithms are becoming increasingly important in decision-making processes across a wide range of domains, including criminal justice, healthcare, and finance. While these algorithms provide significant benefits, they also pose the risk of perpetuating and exacerbating societal biases, especially when fairness is not taken into account during their design and implementation. We address the critical issue of fairness in machine learning, with a focus on combining statistical and causal fairness metrics to provide a more comprehensive approach to evaluate and ensure fairness by selecting the most suitable metric. To tackle this problem, we developed a research methodology aimed at systematically reviewing the existing literature while focusing on four research questions targeting the relationship between statistical and causal fairness metrics, which drove our analysis and categorization of papers. Based on the results of this review, we built a new fairness decision tree that integrates both types of metrics, which can guide users to choose the most suitable metric.</p>
      </abstract>
      <kwd-group>
        <kwd>Fairness</kwd>
        <kwd>Statistical Fairness Metrics</kwd>
        <kwd>Causal Fairness Metrics</kwd>
        <kwd>Machine Learning</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        In the digital age, machine learning (ML) systems have become essential to many aspects of daily living
and social functions. Machine learning covers a wide range of critical applications, from criminal
justice and credit scoring to healthcare diagnostics. However, a serious concern for justice has emerged
along with the technological advancements that machine learning has brought forth. It is imperative to
ensure that these systems do not perpetuate or exacerbate societal inequalities and biases that already
exist [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ]. Machine learning algorithms, being data-driven, may inadvertently encode human bias. One
striking example is the COMPAS Risk Assessment Tool [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], which uses information about a defendant’s
criminal record, type of offense, record of contact with the community, and history of failing to appear
in court to assist judges in making bail decisions. Regarding this last aspect, the ProPublica team
found that the software used by U.S. courts incorrectly labeled Black defendants as high-risk at almost
twice the rate of White defendants [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Similar biases have been identified in other domains, such as
e-commerce, where differentiated pricing strategies unfairly target returning customers based on their
online behavior [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
      <p>These examples show the need for a systematic approach to assessing and mitigating bias in machine
learning systems. The problem’s complexity is increased by the fact that fairness in ML can be understood
from multiple perspectives, including statistical fairness, which focuses on ensuring equitable outcomes,
and causal fairness, which aims to understand and address the underlying causal mechanisms that lead
to biased results.</p>
      <p>In this paper, we study the topic of fairness in machine learning, with a focus on the difference
between statistical and causal fairness metrics; in particular, we study the possibility of incorporating
them into a common vision. Our key contributions are as follows:</p>
      <p>• Research Methodology: We develop a systematic research methodology to identify and
categorize the relevant literature.
• Results of Literature Analysis: Our systematic analysis of the selected papers yields significant
insights into the distinction between statistical and causal fairness and into the datasets most commonly
used.
• Fairness Decision Tree: We present a new fairness decision tree framework that integrates
both statistical and causal fairness metrics.</p>
      <p>The rest of this paper is organized as follows. Section 2 introduces some preliminary concepts.
Section 3 presents the adopted research methodology. Section 4 describes the analysis of the results.
Section 5 presents the fairness decision tree. Section 6 concludes the paper.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Preliminaries</title>
      <p>
        Fairness can be defined as the absence of any prejudice or favoritism toward an individual or a group
based on their inherent or acquired characteristics [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], and it holds significant relevance within the
domain of Machine Learning (ML). In this context, ML algorithms are expected to embody a decision-making
paradigm characterized by impartiality and the absence of bias. It is important to acknowledge that existing
biases in the data can significantly influence the performance and outcomes of these algorithms,
rendering the data and results unfair [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. In terms of fairness, binary classification plays a crucial role
in decision-making systems where outcomes significantly impact individuals, such as loan approvals,
hiring decisions, and medical diagnoses. It maps input features to one of two possible outcomes: positive
(1) or negative (0). These predictions are then compared to the actual outcomes to evaluate the model’s
performance. Ensuring fairness in these models is essential to prevent discrimination [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
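      <p>To make this concrete, the following minimal sketch (with invented predictions and group labels, not data from any of the surveyed works) computes one common statistical fairness metric, the statistical parity difference, over such binary predictions:</p>

```python
# Illustrative only: statistical parity difference on a toy set of
# binary predictions, grouped by a hypothetical sensitive attribute S.
def statistical_parity_difference(y_pred, sensitive):
    """P(Y_hat = 1 | S = 1) - P(Y_hat = 1 | S = 0)."""
    g1 = [y for y, s in zip(y_pred, sensitive) if s == 1]
    g0 = [y for y, s in zip(y_pred, sensitive) if s == 0]
    return sum(g1) / len(g1) - sum(g0) / len(g0)

# Toy data: four predictions per group.
y_pred    = [1, 1, 0, 1, 1, 0, 0, 0]
sensitive = [1, 1, 1, 1, 0, 0, 0, 0]
print(statistical_parity_difference(y_pred, sensitive))  # 0.75 - 0.25 = 0.5
```

      <p>A value of zero would indicate that positive predictions are equally distributed across the two groups.</p>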
      <p>
        Existing fairness definitions in ML algorithms can be classified into two categories: statistical fairness
and causal fairness. Statistical fairness focuses on frequency statistics, ensuring equitable outcomes
across demographic groups. In contrast, causal fairness explores causal relationships between attributes
and outcomes, intervening to eliminate biases rooted in causal mechanisms. Statistical-based fairness
metrics are categorized in [
        <xref ref-type="bibr" rid="ref7 ref8">7, 8</xref>
        ] and the Fairness Decision Tree presented by Baresi et al. [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] is designed to identify the most suitable fairness interpretations for ML-based systems.
      </p>
      <p>
        While statistics offer the tools to identify patterns and correlations within data [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], Judea Pearl’s
work on causality challenges us to understand the “why” behind these patterns. This means that
causal fairness differs from statistical fairness in that it is not entirely determined by observed data and
necessitates the introduction of additional cause-and-effect assumptions. Causal fairness is a definition
of fairness based on a causal connection between protected attributes and decisions. Causal graph
models have limitations in their very structure, derived from domain knowledge, and inconsistencies in
assumptions may occur [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. Based on observed data, causal graph models often suffer from model
non-uniqueness, which refers to the possibility that multiple diferent causal graph models can fit
the same set of observed data equally well. This non-uniqueness implies that there may be multiple
plausible explanations for the causal relationships in the data.
      </p>
      <p>
        Pearl’s structural causal models [
        <xref ref-type="bibr" rid="ref11 ref9">11, 9</xref>
        ] are mathematical objects, based on structural equations, that describe the causal mechanisms
of a system. Each causal model is associated with a causal graph, which visualizes causal inference in a
more user-friendly way; causal effects are carried by
the causal paths that trace arrows pointing from the cause to the effect [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. To better illustrate these
notions, we introduce the Ladder of Causation taken from [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], a causal hierarchy presented by Pearl,
which affirms that causation has three levels: association, intervention, and counterfactual.
1. Association [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] can be inferred directly from the observed data using conditional probabilities
and conditional expectations, which correspond to statistically-based fairness metrics.
2. Intervention [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] involves not only seeing what is but also changing what we see. Interventional
questions deal with expressions of the type P(y | do(x), z), which denote “the probability of event
Y = y, given that we intervene and set the value of X to x and subsequently observe event Z = z”.
It can be estimated experimentally from randomized trials or analytically using causal Bayesian
networks.
3. Counterfactual [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] deals with expressions of the type P(y_x | x′, y′), which denote “the probability
that event Y = y would be observed had X been x, given that we actually observed X to be x′
and Y to be y′”. It can be computed only when the model is based on functional relations or is
structural.
      </p>
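      <p>The gap between the first two rungs can be illustrated with a toy causal Bayesian network (all probabilities are invented for illustration): a confounder Z influences both X and Y, so the associational quantity P(y | x) differs from the interventional quantity P(y | do(x)) obtained via the back-door adjustment formula:</p>

```python
# Toy causal Bayesian network Z -> X, Z -> Y, X -> Y, with hypothetical
# probabilities; Z confounds the effect of X on Y.
p_z = {1: 0.5, 0: 0.5}                       # P(Z = z)
p_x_given_z = {1: 0.8, 0: 0.2}               # P(X = 1 | Z = z)
p_y_given_xz = {(1, 1): 0.9, (1, 0): 0.6,    # P(Y = 1 | X = x, Z = z)
                (0, 1): 0.5, (0, 0): 0.2}

def p_y1_given_x1():
    # Rung 1, association: conditioning on X = 1 reweights Z by Bayes' rule.
    num = sum(p_y_given_xz[(1, z)] * p_x_given_z[z] * p_z[z] for z in (0, 1))
    den = sum(p_x_given_z[z] * p_z[z] for z in (0, 1))
    return num / den

def p_y1_do_x1():
    # Rung 2, intervention: do(X = 1) cuts the Z -> X edge, so Z keeps its
    # prior distribution (back-door adjustment).
    return sum(p_y_given_xz[(1, z)] * p_z[z] for z in (0, 1))

print(round(p_y1_given_x1(), 3))  # 0.84  (observing X = 1)
print(round(p_y1_do_x1(), 3))     # 0.75  (setting   X = 1)
```

      <p>The two quantities disagree precisely because of the confounding path through Z, which is why associational (statistical) metrics alone cannot capture interventional fairness notions.</p>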
      <p>
        The majority of causal-based fairness notions are defined in terms of the non-observable quantities of
interventions and counterfactuals, so their applicability depends heavily on the identification of those
quantities in the data. An overview of the principal causal-based fairness metrics is presented in [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
1. No unresolved discrimination [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]: Requires that there exists no directed path from the protected
attribute S to the predicted outcome Ŷ, except via a resolving variable.
2. Total Causal Efect [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]: Is defined as the effect of changing the sensitive attribute S from S = 0
to S = 1 on the decision Y = y along all causal paths from S to Y; it is considered fair if the
difference between the two post-intervention distributions is within the fair threshold.
      </p>
      <p>It is defined as follows:</p>
      <p>
        TCE(s) = P(y | do(S = 1)) − P(y | do(S = 0))
3. Path-specific Effect [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]: Given a causal path set π, the path-specific effect is defined as the effect of changing
the sensitive attribute S from S = 0 to S = 1 on the decision Y = y along the specific causal
paths in π; it is considered fair if the difference is within the fair threshold.
      </p>
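      <p>As an illustration, the Total Causal Effect can be estimated on a toy structural causal model (the structural equations and coefficients below are invented for illustration, not taken from the cited works) by forcing S to each value and propagating through the model:</p>

```python
# Hypothetical structural causal model: S -> M -> Y and S -> Y.
# TCE(s) = P(y | do(S = 1)) - P(y | do(S = 0)), estimated by Monte Carlo
# simulation of the interventions.
import random

def sample_y(do_s, rng):
    m = 1 if (0.7 if do_s == 1 else 0.3) > rng.random() else 0   # M := f(S, U_m)
    y = 1 if (0.2 + 0.3 * do_s + 0.4 * m) > rng.random() else 0  # Y := f(S, M, U_y)
    return y

def total_causal_effect(n=100_000, seed=0):
    rng = random.Random(seed)
    p1 = sum(sample_y(1, rng) for _ in range(n)) / n  # P(y | do(S = 1))
    p0 = sum(sample_y(0, rng) for _ in range(n)) / n  # P(y | do(S = 0))
    return p1 - p0

# Analytic value for this toy model: 0.3 + 0.4 * (0.7 - 0.3) = 0.46
print(round(total_causal_effect(), 2))
```

      <p>The estimate converges to the analytic value because both the direct path S → Y and the mediated path S → M → Y contribute to the total effect.</p>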
      <p>It is defined as follows:</p>
      <p>
        PE_π(s) = P(y | do(S = 1|π, S = 0|π̄)) − P(y | do(S = 0))
where P(y | do(S = 1|π, S = 0|π̄)) represents the post-intervention distribution of Y in which the effect
of the intervention do(S = 1) is transmitted only along π while the effect of the reference intervention
do(S = 0) is transmitted along the other paths.
4. No proxy discrimination [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]: Requires that there exists no path from the protected attribute S to
the predicted outcome Ŷ that is blocked by a proxy variable P.
      </p>
      <p>
        It is defined as follows:
P(ŷ | do(P = p)) = P(ŷ | do(P = p′)) ∀p, p′ ∈ dom(P)
5. Counterfactual Fairness [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]: Requires that the predicted outcome Ŷ in the graph does not
depend on any descendant of the protected attribute S. This means that an outcome Ŷ achieves
counterfactual fairness towards an individual if the probability of Ŷ = y for that individual is
the same as the probability of Ŷ = y for the same individual had they belonged to a different sensitive
group.
      </p>
      <p>It is defined as follows:
P(ŷ_{S=1}(U) = y | X = x, S = 0) = P(ŷ_{S=0}(U) = y | X = x, S = 0)
where X is the subset of observed variables V excluding the sensitive and decision variables.</p>
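      <p>For illustration, counterfactual fairness can be checked on a toy linear SCM (coefficients invented, not from the cited works) by the usual abduction, action, and prediction steps: recover the individual's exogenous noise from the factual observation, flip S, and recompute the prediction:</p>

```python
# Sketch with invented coefficients: X := a*S + U_x, Y_hat := b*X + c*S.
# Abduction recovers U_x from the factual observation, then S is flipped
# while U_x is held fixed.
a, b, c = 2.0, 1.0, 0.5

def predict(s, u_x):
    x = a * s + u_x          # structural equation for the feature
    return b * x + c * s     # the predictor under scrutiny

# Factual individual: S = 0, X = 1.3  =>  U_x = X - a*S = 1.3 (abduction)
u_x = 1.3 - a * 0
factual        = predict(0, u_x)  # prediction in the actual world
counterfactual = predict(1, u_x)  # prediction had S been 1, same U_x

print(factual, counterfactual)    # 1.3 vs 3.8: not counterfactually fair
```

      <p>Because S influences the prediction both directly (through c) and through its descendant X, the counterfactual prediction differs from the factual one, so the toy predictor violates counterfactual fairness.</p>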
      <p>
        Any context X = x represents a certain sub-group of the population.
6. Individual direct discrimination [
        <xref ref-type="bibr" rid="ref16 ref2">16, 2</xref>
        ]: It aims to discover direct discrimination at
the individual level. It is based on situation testing: the target individual is compared with similar
individuals from both groups (protected and unprotected). This means that, for a target individual i,
the top-K individuals most similar to i are selected from group S = 1, denoted as K+, and from group
S = 0, denoted as K−. The target individual is considered discriminated against if the difference
observed between the rates of positive decisions in K− and K+ is higher than a predefined threshold
(typically 5%).
      </p>
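      <p>The situation-testing procedure just described can be sketched as follows (toy features and decisions, with a plain Euclidean distance standing in for the causal distance function; all values are hypothetical):</p>

```python
# Sketch of situation testing: compare the positive-decision rates among
# the top-K neighbours of a target drawn from each of the two groups.
def situation_test(target, individuals, k=3, threshold=0.05):
    """individuals: list of (features, s, decision) triples."""
    def dist(f1, f2):
        # Stand-in for the causal distance d(i, i'); Euclidean here.
        return sum((p - q) ** 2 for p, q in zip(f1, f2)) ** 0.5

    def top_k_rate(group):
        ranked = sorted(group, key=lambda ind: dist(target, ind[0]))[:k]
        return sum(d for _, _, d in ranked) / len(ranked)

    k_plus  = [i for i in individuals if i[1] == 1]  # candidates for K+ (S = 1)
    k_minus = [i for i in individuals if i[1] == 0]  # candidates for K- (S = 0)
    gap = top_k_rate(k_plus) - top_k_rate(k_minus)
    return gap, gap > threshold   # discriminated if the gap exceeds the threshold

individuals = [([1.0], 1, 1), ([1.1], 1, 1), ([0.9], 1, 1),
               ([1.0], 0, 0), ([1.2], 0, 1), ([0.8], 0, 0)]
print(situation_test([1.0], individuals))  # gap of 2/3, flagged as discrimination
```
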
      <p>Causal inference is used to define the distance function d(i, i′) that measures similarity between
individuals: given a causal graph, only the variables that are direct parent nodes of the decision
variable are considered to compute the similarity between individuals, denoted as
Q = Parents(Y) \ {S}. The formal definition of d(i, i′) is:</p>
      <p>d(i, i′) = Σ_{k=1}^{|Q|} |VD(q_k, q_k′) · CE(q_k)|</p>
      <p>
        where VD(q_k, q_k′) is a distance function proposed by Luong et al. in [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] and CE(q_k) represents
the causal effect of each of the selected variables q_k ∈ Q on the actual outcome. In particular,
CE(q_k) is defined as follows:
CE(q_k) = P(y | do(q)) − P(y | do(q_k′, q \ q_k))
      </p>
      <p>where P(y | do(q)) is the effect of the intervention that forces Q to take the set of values q, and
P(y | do(q_k′, q \ q_k)) is the effect of the intervention that forces q_k to take value q_k′ and the other
attributes in Q to take the same values as in q.</p>
      <p>7. Equality of Effort</p>
      <p>
        [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]: It detects discrimination by comparing the effort required to reach the
same level of the actual outcome between individuals from advantaged and disadvantaged groups who
are similar to the target individual. A treatment variable T is selected and used to address the
question: “To what extent should the treatment variable T change to make the individual (or a
group of individuals) achieve a certain outcome level?”.
      </p>
      <p>Equality of effort notions are defined, based on the potential outcome framework, as individual γ-Equal
effort and system γ-Equal effort. Both criteria can be used to measure the effort discrepancy
between protected and unprotected groups.</p>
      <p>Let Y_i^{(t)} be the potential outcome for individual i had T been t, E[Y_i^{(t)}] the expected outcome
under treatment t for individual i, and consider, similarly to Individual direct discrimination, I+ and
I− as two sets of individuals similar to i from the groups S = 0 and S = 1, respectively.</p>
      <p>In consequence, E[Y^{(t)} | I+] is the expected outcome under treatment t for the subgroup I+, and the
minimal value of the treatment variable T needed to achieve a γ-level of outcome within the subgroup
I+ is defined as follows:</p>
      <p>Ψ_{I+}(γ) = argmin_{t ∈ T} {E[Y^{(t)} | I+] ≥ γ}</p>
      <p>For a certain outcome level γ, individual γ-Equal effort is satisfied for individual i if:</p>
      <p>Ψ_{I+}(γ) = Ψ_{I−}(γ)</p>
      <p>When I+ and I− are extended to the entire groups with sensitive attribute S = 0 and S = 1,
respectively, D+ is used to denote the first set and D− the second one. System γ-Equal effort is
satisfied for a sub-population if:</p>
      <p>Ψ_{D+}(γ) = Ψ_{D−}(γ)</p>
      <p>
        8. Path-specific Counterfactual Fairness (PC Fairness) [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]: Is defined to cover various
causality-based fairness notions. Given a factual condition O = o, where O ⊆ V, and a causal path set π,
a predictor Ŷ achieves PC fairness if it satisfies the following expression:
|P(ŷ_{S=1|π, S=0|π̄} | o) − P(ŷ_{S=0} | o)| ≤ τ
where τ is a predefined fairness threshold. Consequently, for example, if we set π to contain all causal
paths and π̄ to be the empty set, PC Fairness corresponds to the Total Causal Effect.
      </p>
      <p>Finally, regarding unfairness mitigation, depending on the stage of the ML pipeline, pre-processing,
in-processing, and post-processing mechanisms can be used to intervene in the algorithm to achieve
fair ML.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Research Methodology</title>
      <p>This section defines the research questions and the methodological approach taken to address them,
including the specific search techniques and keywords that were employed to locate pertinent material,
discussing the inclusion and exclusion criteria employed to identify the most relevant studies.</p>
      <sec id="sec-3-1">
        <title>3.1. Research Questions</title>
        <p>The methodology is based on a set of structured research questions designed to explore fairness in
ML, comprehending both statistical and causal dimensions. These questions guide the entire research
process, from the initial literature review to the final analysis and synthesis. Here, we discuss each of
these questions.</p>
        <p>RQ1: What are the main concepts and differences between statistical-based and causal-based
fairness?</p>
        <p>This question aims to provide a clear and simple review of the fundamental ideas and differences between
statistical and causal approaches to fairness in ML. Understanding these differences is important for
comprehending each perspective’s specific advantages and limits.</p>
        <p>RQ2: Which datasets are most commonly used in fairness research, and are there differences
between those used in causal-based and statistical-based studies?</p>
        <p>Identifying commonly used datasets in both causal and statistical fairness research is crucial, as it helps in
understanding the contexts in which fairness metrics are tested and validated. This question attempts
to determine not just the most often used datasets in general, but also whether there are differences in dataset
utilization across causal and statistical fairness studies.</p>
        <p>RQ3: Is it possible to have a common vision between causal and statistical fairness?
This question investigates the theoretical feasibility and applicability of combining causal-based and
statistical-based fairness metrics.</p>
        <p>RQ4: How to choose the most suitable metric considering both perspectives?
This final question discusses the research’s practical consequences, suggesting an approach for
selecting the most appropriate fairness metric while balancing causal and statistical aspects.</p>
        <p>By addressing these questions, this work provides the foundation for a detailed investigation of
fairness research, ensuring an in-depth knowledge of both the theoretical foundations and practical
applications of the subject.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Search Strategy</title>
        <p>We delineate the comprehensive search strategy employed to gather relevant literature, focusing on
evaluating fairness in ML with a particular emphasis on fairness metrics.</p>
        <p>The literature collection process is illustrated in Figure 1, which delineates the steps from research
question formulation to final paper selection: Keywords and Query Creation, Setting Database
and Inclusion and Exclusion Criteria, Paper Analysis, and Snowballing.</p>
        <p>To ensure targeted and relevant database searches, keywords were identified from the research
questions. These keywords, along with supportive terms, guide the search process effectively and are:
AI, Machine Learning, ML, Data, Fairness, Metric, Definition, Solution, Mitigation, Ethic, Measure, Causal,
Statistical, Group, Individual, Counterfactual, Interventional.</p>
        <p>The query was constructed using boolean operators to refine search results and ensure relevance to
our objectives. Furthermore, the search was conducted across prominent databases including Scopus,
ACM Digital Library, IEEE Xplore, and Google Scholar, focusing on publications from the past five
years in the field of computer science. The final query is the following:</p>
        <p>(fair* OR discriminat* OR unfair* OR bias*) AND (causal* OR statistic* OR
individual* OR group* OR counterfact* OR intervention* OR parity*) AND (metric*
OR measur* OR defin* OR solut* OR mitigat*) AND ((machine learning) OR data* OR
ethic* OR (artificial intelligence))</p>
        <p>The search provided a significant number of results: Scopus 5,270, ACM Digital Library 910, IEEE
Xplore 2,335, Google Scholar 16,800.</p>
        <p>For the selection strategy, we present the inclusion and exclusion criteria utilized during the selection
process for identifying relevant literature. These criteria serve as guidelines to ensure the systematic
and targeted inclusion of papers that align with our objectives while excluding those that do not meet
the specified thematic requirements. The Inclusion Criteria are:
• Article or Paper: Articles, conference papers, or research papers that contribute to the discourse
on fairness in ML.
• More than 5 citations: Papers cited more than five times, indicating the paper’s impact and
relevance.
• Discussion on fairness: This includes any study that addresses fairness in the context of ML,
covering both theoretical and practical aspects.
• New mitigation technique: Studies that suggest methods to reduce or eliminate biases in ML
models.
• Tool: Studies that introduce software or tools designed to assess, measure, or enhance fairness in
ML models.
• New perspective: Papers that provide innovative viewpoints or conceptual frameworks for
understanding fairness.
• New fairness metric: Studies that develop and validate new metrics for evaluating fairness in
ML.</p>
        <p>The Exclusion Criteria are:
• Theses or reports: Academic theses and technical reports, as these often serve as preliminary or
non-peer-reviewed documents.
• Surveys: Summary of existing research rather than contributing new findings or perspectives.
• Papers that primarily focus on techniques without discussing fairness: Studies focusing
on techniques or algorithms without addressing their fairness implications.
• Papers not written in English: Studies written in languages other than English.</p>
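        <p>The selection criteria above can be read as a simple filter. The following sketch shows the intended logic; the field names and sample records are hypothetical, not an artifact of the actual review:</p>

```python
# Illustrative sketch of applying the inclusion/exclusion criteria as a
# filter over candidate bibliographic records (fields are hypothetical).
def passes_selection(paper):
    included = (
        paper["type"] in {"article", "conference paper", "research paper"}
        and paper["citations"] > 5          # more than 5 citations
        and paper["discusses_fairness"]     # addresses fairness in ML
    )
    excluded = (
        paper["type"] in {"thesis", "report", "survey"}
        or paper["language"] != "English"
    )
    return included and not excluded

candidates = [
    {"type": "article", "citations": 12, "discusses_fairness": True,  "language": "English"},
    {"type": "survey",  "citations": 40, "discusses_fairness": True,  "language": "English"},
    {"type": "article", "citations": 3,  "discusses_fairness": True,  "language": "English"},
]
print([passes_selection(p) for p in candidates])  # [True, False, False]
```
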
        <p>
          Through the selection process based on inclusion and exclusion criteria, a total of 26 papers were
initially identified as relevant. Subsequently, employing a snowballing technique to expand the pool of
selected papers, an additional 3 papers were incorporated, bringing the total number of selected papers
to 29. The snowballing technique, also known as snowball sampling or iterative citation searching [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ], is
commonly used in research to identify additional relevant studies beyond those initially
retrieved: the reference lists of selected papers are reviewed to identify sources that
may not have been captured in the initial search. The incorporation of additional papers through snowballing thus enriches
the research process by capturing potentially overlooked or lesser-known studies that contribute to a
more comprehensive understanding of the subject matter. Finally, the paper analysis is designed to
collect all the information needed to address the research questions from the selected papers.
        </p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Results of Literature Analysis</title>
      <p>The following data were systematically collected from the papers found (besides paper’s title, authors,
year of publication, venue and URL/DOI):
• Fairness Metrics (Causal or Not or Both): Whether the paper discusses causal fairness metrics
or statistical fairness metrics or both.
• Analysis Type: The type of analysis conducted in the paper: Classification, Evaluation, Definition,
or Solution Proposal (a new tool, perspective or algorithm).
• Content: The paper’s main findings and contributions.
• Context: The specific domain or application area of the research.
• Experimental Datasets: The datasets used in the paper, if any.</p>
      <p>• Methods: Mitigation techniques type, if pre-, in- or post-processing.</p>
      <p>This comprehensive data collection approach ensures that each paper’s relevant information is
captured accurately and thoroughly, facilitating further analysis and synthesis of findings in subsequent
stages of the research process. A portion of our findings is encapsulated in Table 1.</p>
      <sec id="sec-4-1">
        <title>4.1. Answering RQ1-RQ2</title>
        <p>In this section, we are going to address the research questions RQ1 and RQ2 by analyzing the selected
papers. The distribution of causal and non-causal papers among selected papers is: 70% focus on causal
fairness metrics and 30% on statistical fairness metrics.</p>
        <p>RQ1: What are the main concepts and differences between statistical-based and causal-based
fairness?</p>
        <p>Statistical-based fairness metrics are grounded in ensuring that the observed outcomes of a machine
learning model are distributed equitably. These metrics focus on the fairness of the model’s predictions
without delving into the underlying causal mechanisms that generate the data. Causal-based fairness
metrics, in contrast, emphasize the importance of understanding and addressing the causal relationships
between variables. These metrics, including Counterfactual fairness and Interventional fairness, seek to
identify and mitigate biases that arise from the causal influence of protected attributes on the outcomes.
By examining the causal pathways, thus, causal fairness metrics aim to ensure that decisions are not
only fair in an observational sense but also in a causal sense, addressing deeper, more systemic biases.</p>
        <p>To summarize, the fundamental difference between statistical and causal fairness lies in their approach.
While statistical fairness focuses on the distribution of outcomes, causal fairness delves into the root
causes of biases, examining how and why these biases occur.</p>
        <p>RQ2: Which datasets are most commonly used in fairness research, and are there differences
between those used in causal-based and statistical-based studies?</p>
        <p>
          The visualization in Figure 2 provides the number of datasets used across papers and shows a detailed
comparison of the frequency of dataset usage in these studies. From the chart, it is evident that the
Adult dataset [41] is predominantly used in both causal and statistical studies, highlighting its relevance
and utility in fairness research, given its rich demographic features and applicability to numerous
fairness metrics. The Compas [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] dataset is widely used in causal studies, which underscores the
necessity of understanding fairness within specific domains, such as criminal justice, where biases can
have significant societal implications. Similarly, the German Credit dataset [42] is commonly used
in statistical studies to evaluate fairness in financial decision-making, such as credit scoring models,
whose use reflects the importance of ensuring equitable treatment in financial services, an important
area where biases can impact individuals’ economic opportunities. The chart also highlights the use of
synthetic datasets in causal studies: artificially created by researchers, they are essential
for testing causal metrics because they allow researchers to design experiments that can isolate and
examine the effects of specific variables on fairness, providing insights that might not be possible with
real-world data.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Fairness Decision Tree</title>
      <p>
        In this section, we are going to present our new fairness decision tree (Figure 3), which aims to extend
and update the original proposal by Baresi et al. [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. Our goal is to assist people in selecting the most
appropriate fairness definitions for their machine learning (ML) systems by combining both statistical
and causal fairness metrics. The proposed decision tree is open and can further be extended if new
needs and definitions arise.
      </p>
      <p>First, we will provide a detailed summary of the original decision tree, covering nodes A to F and
metrics 1 to 12. Following that, we will introduce the new causal part of the tree (nodes G, H, I, L, M,
and metrics 13 to 19), which is the original contribution of this work.</p>
      <p>The tree begins with the question, “Is past knowledge relevant?” (A). If the answer is yes, the tree
helps experts decide about the importance of past decisions compared to new predictions. If the answer
is no, the focus shifts to predictions and legitimate attributes (B). Based on the responses, the tree
suggests different statistical fairness definitions. When past knowledge is relevant, the next question
(C) asks which type of predictions the expert is interested in: wrong, correct, or both. If the focus is on
wrong predictions, the tree further asks whether the interest is in negative or positive predictions (D)
and how conservative the decisions should be (E). Based on the responses, the tree suggests specific
statistical fairness definitions. If the expert is interested in correct predictions, the next question (F)
asks how to balance predictions and past decisions, leading to other three possible statistical definitions.</p>
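      <p>
        As an illustration, the statistical branch described above can be encoded as a small lookup
structure that an implementer might navigate programmatically. The node questions and leaf
descriptions paraphrase the text; the dictionary encoding and the function name are our own
hypothetical sketch, not part of the original proposal. For example, answering yes at (A) and then
choosing correct predictions at (C) leads to node F.
      </p>
      <p>
```python
# Hypothetical sketch of the statistical branch of the fairness decision tree.
# Node labels (A-F) follow the text; the dict encoding itself is illustrative.
STATISTICAL_TREE = {
    "A: Is past knowledge relevant?": {
        "no": "B: focus on predictions and legitimate attributes",
        "yes": {
            "C: Which predictions matter?": {
                "wrong": "D/E: negative vs. positive predictions, conservativeness",
                "correct": "F: balance predictions and past decisions",
                "both": "metrics combining correct and wrong predictions",
            }
        },
    }
}

def navigate(tree, answers):
    """Walk the tree with a sequence of answers and return the suggested leaf."""
    node = tree
    for answer in answers:
        question = next(iter(node))  # each level holds a single question
        node = node[question][answer]
        if not isinstance(node, dict):
            return node
    return node
```
      </p>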
      <p>The introduction of causal fairness metrics expands the tree with new questions (G, H, I, L, M).
These additions address the limitations of purely statistical approaches by also considering causal
relationships. The new causal fairness metrics section starts with the question, “Are you interested in
modifying the predictions?” (G). This question is important because modifying predictions involves
engaging with causal relationships; it suggests an interest not just in observing what is, but also in
altering and understanding the causal relationships that lead to specific outcomes. If the answer is
no, we move on to node (C), returning to the statistical part of the tree. If the answer is yes, the tree explores
the possibility of adding or modifying cause-effect relationships through the question “Do you
want to add or modify cause-effect relations?” (H). Here, it is essential to distinguish between the two
paths that follow from the response to this question. This leads to two different nodes depending on
whether the expert chooses to add (Counterfactual fairness metrics) or modify (Interventional fairness
metrics) cause-effect relationships. In both cases, the decision tree differentiates between analyzing
by group or by individual with the question “Do you want to analyze by group or by individual?” (I),
which separates the path into group-level and individual-level analysis. This distinction is significant
as it acknowledges that fairness can be assessed by considering the overall impact on groups, where
the focus is on collective fairness, or by individual circumstances, which demand a more granular,
personalized assessment of fairness.</p>
      <p>
        In the counterfactual case (ADD), if the interest is in group analysis, we suggest using
Unresolved Discrimination [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] (13). This metric captures any discrimination that remains after accounting for
all known causal pathways. For those interested in individual analysis, Counterfactual fairness [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]
(14) is proposed. This metric assesses fairness by comparing the actual outcome with the outcome that
would have occurred in a counterfactual world where the protected attribute is different.
      </p>
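      <p>
        The comparison behind Counterfactual fairness (14) can be sketched in a few lines. This is a
minimal illustration assuming that a predictive model and a counterfactual generator (in practice
derived from a fitted structural causal model) are available; the function and parameter names here
are hypothetical.
      </p>
      <p>
```python
def counterfactually_fair(model, individual, flip_protected, tol=0.0):
    """Sketch of Counterfactual fairness: the prediction for an individual
    should match the prediction for the same individual in a counterfactual
    world where the protected attribute differs. `flip_protected` stands in
    for a real counterfactual generator built from a structural causal model
    (which would also propagate changes to the attribute's descendants)."""
    actual = model(individual)
    counterfactual = model(flip_protected(individual))
    return abs(actual - counterfactual) <= tol
```
      </p>
      <p>
        A model that ignores the protected attribute trivially passes this check, while one that scores
otherwise identical individuals differently fails it.
      </p>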
      <p>
        In the interventional case (MODIFY), for group analysis, if the interest is in modifying the overall
impact of the causal relationships, the decision tree guides the expert to consider the Total Causal
Effect [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] (15). This metric quantifies the total influence of a protected attribute on the outcome,
considering all possible pathways. If the interest is in indirect influences, the No Proxy Discrimination [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]
(16) metric is recommended. This metric ensures that the protected attribute does not influence the
outcome indirectly through proxy variables. For those interested in specific causal pathways, the
Path-Specific Effect [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] (17) is highlighted. This metric allows experts to dissect the causal graph
and analyze the effect of the protected attribute along specific pathways. When the analysis
is at the individual level, the tree distinguishes between actions and similarities (L). This bifurcation
addresses two different dimensions of individual-level fairness. Actions refer to the behaviors or efforts
that individuals must undertake to achieve certain outcomes; in this case, the Equality of Effort [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]
(18) metric is recommended. This metric assesses fairness by evaluating the level of effort required by
different individuals to attain the same result, focusing on the processes or actions rather than just
the outcomes. Similarities, on the other hand, refer to the comparison between individuals who have
similar attributes. Here, the Individual Direct Discrimination [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] (19) metric is presented. This metric
compares individuals with similar attributes (nodes) to ensure that decisions are not biased against
similar individuals.
      </p>
      <sec id="sec-5-1">
        <title>5.1. Answering RQ3-RQ4</title>
        <p>This integration of causal and statistical fairness metrics into a unique fairness decision tree answers
the following research questions.</p>
        <p>RQ3: Is it possible to have a common vision between causal and statistical fairness?
While statistical fairness and causal fairness each take a different approach, they share a common
vision of fairness in ML models. Both approaches recognize biases and inequities that may exist in the
models that process data, but they address these issues from different angles: statistical metrics assess
the fairness of a model by comparing observed data using conditional probabilities and conditional
expectations, focusing on the distribution of model outputs and ensuring that the outcomes are equitable.
Causal metrics, instead, delve into the underlying causal relationships that drive model decisions. This
approach is more concerned with understanding and eliminating the effects of sensitive attributes on
model outputs through causal pathways. Thus, it provides a deeper analysis by examining how changes
in a protected attribute causally influence the outcome, considering both direct and indirect effects.
Despite their different methodologies, both statistical and causal fairness strive to achieve the same
goal: reducing unfairness in ML models. Therefore, it is possible, and also beneficial, to have a common
vision between causal and statistical fairness, as they complement each other in addressing different
facets of bias and discrimination in ML.</p>
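        <p>
          The statistical side of this contrast can be made concrete: a metric such as demographic
parity compares conditional probabilities of positive outcomes across groups, with no reference to any
causal graph. The following sketch (our own illustration, not taken from a specific library) computes
such a gap from observed predictions.
        </p>
        <p>
```python
def demographic_parity_difference(predictions, groups):
    """Statistical fairness via conditional probabilities (illustration):
    compute P(Y_hat = 1 | A = a) for each group and return the largest gap.
    A value near 0 means positive outcomes are equally distributed across
    groups; what counts as "near" is application-specific."""
    counts = {}
    for y_hat, a in zip(predictions, groups):
        pos, total = counts.get(a, (0, 0))
        counts[a] = (pos + y_hat, total + 1)
    rates = {a: pos / total for a, (pos, total) in counts.items()}
    return max(rates.values()) - min(rates.values())
```
        </p>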
        <p>RQ4: How to choose the most suitable metric considering both perspectives?
Choosing the most suitable fairness metric requires an approach that balances insights from both
statistical and causal perspectives. This decision is complex and context-dependent, necessitating a
thorough understanding of the dataset, the model, and the specific fairness goals of the application.
The fairness decision tree presented in this work provides a method for navigating these
choices. Below is a detailed approach to selecting the most appropriate metric.</p>
        <p>• Deeper Understanding of the Current Dataset and Model: The first step is to analyze the
features and attributes contained in the dataset. It is crucial to focus on protected attributes that
might trigger inequality, such as race, gender, and age. Understanding the distribution of these
attributes is essential for selecting appropriate metrics, as it helps identify potential sources of
bias.
• Choose the Appropriate Fairness Metrics: Use the fairness decision tree as a guide to navigate
through various fairness metrics. The definition of fairness can vary depending on the context.
Thus, it is also crucial to clarify the specific fairness objectives to align the choice of metrics with
the desired outcomes.
• Comprehensive Assessment and Comparison: Use a combination of multiple metrics to
perform a comprehensive assessment. Comparing the results of different metrics can provide a
better understanding of fairness. Additionally, it is beneficial to compare the chosen metrics with
those used in other studies to ensure consistency and validity.</p>
        <p>This approach ensures a deeper understanding and mitigation of underlying biases, considering both
perspectives.</p>
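        <p>
          The comprehensive-assessment step can be sketched as computing several metrics side by side
on the same predictions. The two metrics below are simple illustrative stand-ins (one outcome-based,
one accuracy-based); a real assessment would use the metrics selected via the tree, and all names here
are our own.
        </p>
        <p>
```python
def _group_rates(values, groups):
    """Average `values` separately within each group."""
    totals = {}
    for v, g in zip(values, groups):
        s, n = totals.get(g, (0, 0))
        totals[g] = (s + v, n + 1)
    return {g: s / n for g, (s, n) in totals.items()}

def positive_rate_gap(preds, labels, groups):
    """Gap in P(Y_hat = 1 | A = a): ignores past decisions (labels)."""
    rates = _group_rates(preds, groups)
    return max(rates.values()) - min(rates.values())

def accuracy_gap(preds, labels, groups):
    """Gap in P(Y_hat = Y | A = a): takes past decisions into account."""
    correct = [int(p == y) for p, y in zip(preds, labels)]
    rates = _group_rates(correct, groups)
    return max(rates.values()) - min(rates.values())

def fairness_report(preds, labels, groups):
    """Evaluate several fairness metrics side by side for comparison."""
    return {
        "positive_rate_gap": positive_rate_gap(preds, labels, groups),
        "accuracy_gap": accuracy_gap(preds, labels, groups),
    }
```
        </p>
        <p>
          Reporting both values at once shows how the two families of definitions can disagree on the
same predictions, which is precisely why comparing multiple metrics is recommended.
        </p>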
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusions</title>
      <p>
        We investigated the topic of fairness in machine learning (ML), with a focus on the difference between
statistical and causal fairness metrics and, in particular, on the possibility of incorporating them into a common
vision. Our approach was motivated by the need to investigate, understand, and unify existing fairness
metrics, which usually either concentrate solely on statistical outcomes or delve into causal mechanisms
without considering both perspectives. We introduced the fairness decision tree, a novel solution that
integrates causal fairness metrics into the original tree proposed by Baresi et al. [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], offering a
comprehensive view for evaluating fairness in ML models. This tree guides users through a structured
process to select the most suitable fairness metrics based on the specific context of their application. This
solution also shows that statistical and causal fairness metrics can share a common
vision, since both perspectives have the same goal: reducing unfairness in ML models, although they
address different facets of bias.
      </p>
      <sec id="sec-6-1">
        <title>6.1. Limitations and Future Work</title>
        <p>One limitation of this work is that the fairness decision tree framework has yet to be validated and
thoroughly tested in real-world scenarios. This validation could be achieved through user testing,
surveys, or field experiments to determine its practicality and effectiveness in a variety of applications.
Another limitation is that the reviewed literature covers only the last seven years, potentially excluding
older but still relevant studies. Thus, expanding the time frame to include earlier work could provide a
more comprehensive understanding of how fairness metrics have evolved over time. Furthermore, the
fairness decision tree may need refinement when applied to various contexts, as real-world applications
could present challenges not fully anticipated by the current framework.</p>
        <p>Future research may improve the findings of this work in a variety of ways. One direction could be
the application of the tree to various datasets, potentially through experiments involving real-world
scenarios or user interactions to validate its utility and robustness across different domains. Furthermore,
as fairness in ML evolves, new metrics and perspectives are likely to emerge. Thus, future research
should be adaptive, incorporating these advancements while continuing to investigate the relationships
between various fairness perspectives.</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>We thank Professor L. Tanca and T. Dolci for their support. This work was supported in part by project
SERICS (PE00000014) under the NRRP MUR program funded by the EU - NGEU and from the Italian
PRIN project 2022XERWK9 “S-PIC4CHU” – Semantics-based Provenance, Integrity, and Curation for
Consistent, High-quality, and Unbiased data science.</p>
    </sec>
    <sec id="sec-8">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, GPT-4 was used to check grammar and spelling. After
using this tool, the authors reviewed and edited the content as needed and take full responsibility for
the publication’s content.</p>
      <p>[23] P. Saleiro, B. Kuester, L. Hinkson, J. London, A. Stevens, A. Anisfeld, K. T. Rodolfa, R. Ghani,
Aequitas: A bias and fairness audit toolkit, arXiv preprint arXiv:1811.05577 (2018).
[24] B. Salimi, L. Rodriguez, B. Howe, D. Suciu, Capuchin: Causal database repair for algorithmic
fairness, arXiv preprint arXiv:1902.08283 (2019).
[25] N. Kilbertus, P. J. Ball, M. J. Kusner, A. Weller, R. Silva, The sensitivity of counterfactual fairness
to unmeasured confounding, in: Uncertainty in artificial intelligence, PMLR, 2020, pp. 616–626.
[26] J. N. Yan, Z. Gu, H. Lin, J. M. Rzeszotarski, Silva: Interactively assessing machine learning fairness
using causality, in: Proceedings of the 2020 chi conference on human factors in computing systems,
2020, pp. 1–13.
[27] M. Sato, S. Takemori, J. Singh, T. Ohkuma, Unbiased learning for the causal effect of
recommendation, in: Proceedings of the 14th ACM conference on recommender systems, 2020, pp.
378–387.
[28] R. Guo, J. Li, H. Liu, Learning individual causal efects from networked observational data, in:</p>
      <p>Proceedings of the 13th international conference on web search and data mining, 2020, pp. 232–240.
[29] R. Binns, On the apparent conflict between individual and group fairness, in: Proceedings of the
2020 conference on fairness, accountability, and transparency, 2020, pp. 514–524.
[30] M. Morik, A. Singh, J. Hong, T. Joachims, Controlling fairness and bias in dynamic learning-to-rank,
in: Proceedings of the 43rd international ACM SIGIR conference on research and development in
information retrieval, 2020, pp. 429–438.
[31] B. Salimi, B. Howe, D. Suciu, Database repair meets algorithmic fairness, ACM SIGMOD Record
49 (2020) 34–41.
[32] A. Castelnovo, R. Crupi, G. Del Gamba, G. Greco, A. Naseer, D. Regoli, B. S. M. Gonzalez, Befair:
Addressing fairness in the banking sector, in: 2020 IEEE International Conference on Big Data
(Big Data), IEEE, 2020, pp. 3652–3661.
[33] B. Van Breugel, T. Kyono, J. Berrevoets, M. Van der Schaar, Decaf: Generating fair synthetic data
using causally-aware generative networks, Advances in Neural Information Processing Systems
34 (2021) 22221–22233.
[34] W. Pan, S. Cui, J. Bian, C. Zhang, F. Wang, Explaining algorithmic fairness through fairness-aware
causal path decomposition, in: Proceedings of the 27th ACM SIGKDD Conference on Knowledge
Discovery &amp; Data Mining, 2021, pp. 1287–1297.
[35] A. Kasirzadeh, A. Smart, The use and misuse of counterfactuals in ethical machine learning, in:
Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, 2021, pp.
228–236.
[36] Y. Li, H. Chen, S. Xu, Y. Ge, Y. Zhang, Towards personalized fairness based on causal notion, in:
Proceedings of the 44th International ACM SIGIR Conference on Research and Development in
Information Retrieval, 2021, pp. 1054–1063.
[37] M. Du, F. Yang, N. Zou, X. Hu, Fairness in deep learning: A computational perspective, IEEE</p>
      <p>Intelligent Systems 36 (2020) 25–34.
[38] J. Ma, R. Guo, M. Wan, L. Yang, A. Zhang, J. Li, Learning fair node representations with graph
counterfactual fairness, in: Proceedings of the Fifteenth ACM International Conference on Web
Search and Data Mining, 2022, pp. 695–703.
[39] A. N. Carey, X. Wu, The causal fairness field guide: Perspectives from social and formal sciences,</p>
      <p>Frontiers in big Data 5 (2022) 892837.
[40] S. Guha, F. A. Khan, J. Stoyanovich, S. Schelter, Automated data cleaning can hurt fairness in
machine learning-based decision making, IEEE Transactions on Knowledge and Data Engineering
(2024).
[41] B. Becker, R. Kohavi, Adult, UCI Machine Learning Repository, 1996. DOI:
https://doi.org/10.24432/C5XW20.
[42] M. Lichman, et al., UCI Machine Learning Repository, 2013.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>N.</given-names>
            <surname>Mehrabi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Morstatter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Saxena</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lerman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Galstyan</surname>
          </string-name>
          ,
          <article-title>A survey on bias and fairness in machine learning</article-title>
          ,
          <source>ACM computing surveys (CSUR) 54</source>
          (
          <year>2021</year>
          )
          <fpage>1</fpage>
          -
          <lpage>35</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>K.</given-names>
            <surname>Makhlouf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Zhioua</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Palamidessi</surname>
          </string-name>
          ,
          <article-title>Survey on causal-based machine learning fairness notions</article-title>
          , arXiv preprint arXiv:2010.09553 (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>W.</given-names>
            <surname>Dieterich</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Mendoza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Brennan</surname>
          </string-name>
          ,
          <article-title>Compas risk scales: Demonstrating accuracy equity and predictive parity</article-title>
          ,
          <source>Northpointe Inc 7</source>
          (
          <year>2016</year>
          )
          <fpage>1</fpage>
          -
          <lpage>36</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>A.</given-names>
            <surname>Hannak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Soeller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Lazer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mislove</surname>
          </string-name>
          , C. Wilson,
          <article-title>Measuring price discrimination and steering on e-commerce web sites</article-title>
          ,
          <source>in: Proceedings of the 2014 Conference on Internet Measurement Conference</source>
          , IMC '14,
          <string-name>
            <surname>Association</surname>
          </string-name>
          for Computing Machinery, New York, NY, USA,
          <year>2014</year>
          , p.
          <fpage>305</fpage>
          -
          <lpage>318</lpage>
          . URL: https://doi.org/10.1145/2663716.2663744.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>N. A.</given-names>
            <surname>Saxena</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Huang</surname>
          </string-name>
          , E. DeFilippis, G. Radanovic,
          <string-name>
            <given-names>D. C.</given-names>
            <surname>Parkes</surname>
          </string-name>
          , Y. Liu,
          <article-title>How do fairness definitions fare? examining public attitudes towards algorithmic definitions of fairness</article-title>
          ,
          <source>in: Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>99</fpage>
          -
          <lpage>106</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>S.</given-names>
            <surname>Caton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Haas</surname>
          </string-name>
          ,
          <article-title>Fairness in machine learning: A survey</article-title>
          ,
          <source>ACM Computing Surveys</source>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>S.</given-names>
            <surname>Verma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Rubin</surname>
          </string-name>
          ,
          <article-title>Fairness definitions explained</article-title>
          ,
          <source>in: Proceedings of the international workshop on software fairness</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>7</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>L.</given-names>
            <surname>Baresi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Criscuolo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Ghezzi</surname>
          </string-name>
          ,
          <article-title>Understanding fairness requirements for ml-based software</article-title>
          ,
          <source>in: 2023 IEEE 31st International Requirements Engineering Conference (RE)</source>
          , IEEE,
          <year>2023</year>
          , pp.
          <fpage>341</fpage>
          -
          <lpage>346</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>J.</given-names>
            <surname>Pearl</surname>
          </string-name>
          ,
          <source>Causality: Models, Reasoning and Inference</source>
          , 2nd ed., Cambridge University Press, USA,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>A.</given-names>
            <surname>Khademi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Foley</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Honavar</surname>
          </string-name>
          ,
          <article-title>Fairness in algorithmic decision making: An excursion through the lens of causality</article-title>
          ,
          <source>in: The World Wide Web Conference</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>2907</fpage>
          -
          <lpage>2914</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Tong</surname>
          </string-name>
          ,
          <article-title>Pc-fairness: A unified framework for measuring causality-based fairness</article-title>
          ,
          <source>Advances in neural information processing systems</source>
          <volume>32</volume>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>L.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <article-title>Causal modeling-based discrimination discovery and removal: Criteria, bounds, and algorithms</article-title>
          ,
          <source>IEEE Transactions on Knowledge and Data Engineering</source>
          <volume>31</volume>
          (
          <year>2018</year>
          )
          <fpage>2035</fpage>
          -
          <lpage>2050</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>J.</given-names>
            <surname>Pearl</surname>
          </string-name>
          ,
          <article-title>The seven tools of causal inference, with reflections on machine learning</article-title>
          ,
          <source>Communications of the ACM</source>
          <volume>62</volume>
          (
          <year>2019</year>
          )
          <fpage>54</fpage>
          -
          <lpage>60</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>N.</given-names>
            <surname>Kilbertus</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. Rojas</given-names>
            <surname>Carulla</surname>
          </string-name>
          , G. Parascandolo,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hardt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Janzing</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Schölkopf</surname>
          </string-name>
          ,
          <article-title>Avoiding discrimination through causal reasoning</article-title>
          ,
          <source>Advances in neural information processing systems</source>
          <volume>30</volume>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>M. J.</given-names>
            <surname>Kusner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Loftus</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Russell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Silva</surname>
          </string-name>
          ,
          <article-title>Counterfactual fairness</article-title>
          ,
          <source>Advances in neural information processing systems</source>
          <volume>30</volume>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>L.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <article-title>Situation testing-based discrimination discovery: A causal inference approach</article-title>
          ., in: IJCAI, volume
          <volume>16</volume>
          ,
          <year>2016</year>
          , pp.
          <fpage>2718</fpage>
          -
          <lpage>2724</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>B. T.</given-names>
            <surname>Luong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ruggieri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Turini</surname>
          </string-name>
          ,
          <article-title>k-nn as an implementation of situation testing for discrimination discovery and prevention</article-title>
          ,
          <source>in: Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining</source>
          ,
          <year>2011</year>
          , pp.
          <fpage>502</fpage>
          -
          <lpage>510</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>W.</given-names>
            <surname>Huan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , X. Wu,
          <article-title>Fairness through equality of effort</article-title>
          ,
          <source>in: Companion Proceedings of the Web Conference</source>
          <year>2020</year>
          ,
          <year>2020</year>
          , pp.
          <fpage>743</fpage>
          -
          <lpage>751</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>C.</given-names>
            <surname>Wohlin</surname>
          </string-name>
          ,
          <article-title>Guidelines for snowballing in systematic literature studies and a replication in software engineering</article-title>
          ,
          <source>in: Proceedings of the 18th International Conference on Evaluation and Assessment in Software Engineering</source>
          , EASE '14,
          <string-name>
            <surname>Association</surname>
          </string-name>
          for Computing Machinery, New York, NY, USA,
          <year>2014</year>
          . URL: https://doi.org/10.1145/2601248.2601268.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>S.</given-names>
            <surname>Galhotra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Brun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Meliou</surname>
          </string-name>
          ,
          <article-title>Fairness testing: testing software for discrimination</article-title>
          ,
          <source>in: Proceedings of the 2017 11th Joint meeting on foundations of software engineering</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>498</fpage>
          -
          <lpage>510</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>J. R.</given-names>
            <surname>Loftus</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Russell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. J.</given-names>
            <surname>Kusner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Silva</surname>
          </string-name>
          ,
          <article-title>Causal reasoning for algorithmic fairness</article-title>
          , arXiv preprint arXiv:
          <year>1805</year>
          .
          <volume>05859</volume>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>R.</given-names>
            <surname>Nabi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Shpitser</surname>
          </string-name>
          ,
          <article-title>Fair inference on outcomes</article-title>
          ,
          <source>in: Proceedings of the AAAI Conference on Artificial Intelligence</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>