<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Fairness-aware Naive Bayes Classifier for Data with Multiple Sensitive Features</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Stelios Boulitsakis-Logothetis</string-name>
          <email>stelios.b.logothetis@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Durham</institution>, Durham,
          <country country="UK">United Kingdom</country>
        </aff>
      </contrib-group>
      <fpage>56</fpage>
      <lpage>65</lpage>
      <abstract>
        <p>Fairness-aware machine learning seeks to maximise utility in generating predictions while avoiding unfair discrimination based on sensitive attributes such as race, sex, religion, etc. An important line of work in this field is enforcing fairness during the training step of a classifier. A simple yet effective binary classification algorithm that follows this strategy is two-naive-Bayes (2NB), which enforces statistical parity requiring that the groups comprising the dataset receive positive labels with the same likelihood. In this paper, we generalise this algorithm into N-naive-Bayes (NNB) to eliminate the simplification of assuming only two sensitive groups in the data and instead apply it to an arbitrary number of groups. We propose an extension of the original algorithm's statistical parity constraint and the post-processing routine that enforces statistical independence of the label and the single sensitive attribute. Then, we investigate its application on data with multiple sensitive features and propose a new constraint and post-processing routine to enforce differential fairness, an extension of established group-fairness constraints focused on intersectionalities. We empirically demonstrate the effectiveness of the NNB algorithm on US Census datasets and compare its accuracy and debiasing performance, as measured by disparate impact and DF-ϵ score, with similar group-fairness algorithms. Finally, we lay out important considerations users should be aware of before incorporating this algorithm into their application, and direct them to further reading on the pros, cons, and ethical implications of using statistical parity as a fairness criterion.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Today, countless machine learning-based systems are in use
that autonomously make decisions or aid human
decisionmakers in applications that significantly impact
individuals’ lives. This has made it vital to develop ways of
ensuring these models are trustworthy, ethical, and fair. The
field of fairness-aware machine learning is centered on
enhancing the fairness, explainability, and auditability of ML
models. A goal many research works in this field share is
to maximise utility in generating predictions while avoiding
discrimination against people based on specific sensitive
attributes, such as race, sex, religion, nationality, etc.</p>
      <p>
        Researchers have devised many formalisations to try and
capture intuitive notions of fairness, each with different
priorities and limitations. We summarise the ones we will
mention here in Table 1. Traditionally, the proposed notions have
been classified into two categories. The simplest and most
well-studied, group fairness, is based on defining distinct
protected groups in the given data. Then, for each of these
groups, a user-selected statistical constraint must be
satisfied. This has notable disadvantages: it requires groups to be
treated fairly in aggregate, but this guarantee does not
necessarily extend to individuals
        <xref ref-type="bibr" rid="ref3">(Awasthi et al. 2020)</xref>
        . Further,
different statistical constraints prioritise different aspects of
fairness. Many of them have also been shown to be
incompatible with each other, making the choice even more
difficult for users. Finally, the choice of the protected groups that
should be considered is an open question
        <xref ref-type="bibr" rid="ref13 ref5">(Blum et al. 2018;
Kleinberg, Mullainathan, and Raghavan 2017)</xref>
        .
      </p>
      <p>In T. Kido, K. Takadama (Eds.), Proceedings of the AAAI 2022 Spring Symposium
“How Fair is Fair? Achieving Wellbeing AI”, Stanford University, Palo Alto, California,
USA, March 21–23, 2022. Copyright © 2022 for this paper by its authors. Use permitted
under Creative Commons License Attribution 4.0 International (CC BY 4.0).</p>
      <p>
        An orthogonal notion to group fairness is individual
fairness. Put simply, this notion requires that “similar
individuals be treated similarly”
        <xref ref-type="bibr" rid="ref9">(Dwork et al. 2012)</xref>
        . This approach
addresses the previous lack of any individual-level
guarantees. However, it requires strong functional assumptions and
still requires the step of choosing an underlying metric over
the dataset features
        <xref ref-type="bibr" rid="ref3">(Awasthi et al. 2020)</xref>
        .
      </p>
      <p>
        Alternative models of fairness have been proposed to
address the disadvantages of the two traditional definitions.
One model is causal fairness, which examines the unfair
causal effect the sensitive attribute value may have on the
prediction made by an algorithm
        <xref ref-type="bibr" rid="ref14 ref21">(Mhasawade and Chunara
2021)</xref>
        . Another, which is explored in this paper, is
differential fairness (DF). This is an extension of the established
group fairness concepts that applies them to the case of
intersectionalities, meaning groups that are defined by multiple
overlapping sensitive attributes
        <xref ref-type="bibr" rid="ref15">(Foulds et al. 2020; Morina
et al. 2019)</xref>
        .
      </p>
      <p>
        A similar model is statistical parity subgroup fairness
(SF), which focuses on mitigating intersectional bias by
applying group fairness to the case of infinitely many, very
small subgroups
        <xref ref-type="bibr" rid="ref12">(Kearns et al. 2018)</xref>
        . SF and DF are
notable because they both enable a more nuanced
understanding of unfairness than when a single sensitive attribute and
broad, coarse groups are considered. A key difference
between them, however, is DF’s focus on minority groups. The
SF measure of subgroup parity weighs larger groups more
heavily than very small ones, while DF-parity considers all
groups equally. This means DF can provide greater
protection to very small minority groups since, in SF, their impact
on the overall score is reduced (Foulds et al. 2020).
      </p>
      <p>
        Despite the lack of consensus on any universal notion of
fairness, research has proceeded using the existing models.
A major line of work in the development of fair learning
algorithms is enforcing fairness during the training step of a
classifier
        <xref ref-type="bibr" rid="ref8">(Donini et al. 2018)</xref>
        . A simple yet effective
algorithm that follows this strategy is Calders and Verwer’s
two-naive-Bayes algorithm
        <xref ref-type="bibr" rid="ref6">(Calders and Verwer 2010)</xref>
        (2NB).
This algorithm was originally proposed as one of three ways
of pursuing fairness in naive Bayes classification. It received
further attention in the 2013 publication
        <xref ref-type="bibr" rid="ref10">(Kamishima et al.
2013)</xref>
        , which asserted its effectiveness in enforcing group
fairness in binary classification and explored its underlying
statistics. It works by training separate naive Bayes
classifiers for each of the two groups that (by assumption) comprise
the dataset, the privileged and the non-privileged group.
Then, the algorithm iteratively assesses the fairness of the
combined model and makes small changes to the observed
probabilities in the direction of making them more fair
(Friedler et al. 2019).
      </p>
      <p>
        A recent publication exploring the arguments for and
against statistical parity
        <xref ref-type="bibr" rid="ref18">(Räz 2021)</xref>
        has served as
motivation to re-visit algorithms based around it. Statistical parity
(also referred to as demographic parity or independence) is
a group fairness notion which requires that the groups
comprising the dataset receive positive labels with the same
likelihood. An assumption that is at the core of 2NB and many
other research works, however, is that of a single, binary
sensitive feature
        <xref ref-type="bibr" rid="ref17">(Oneto, Donini, and Pontil 2020)</xref>
        . This
assumption has been noted to rarely hold in the real world, and
eliminating it is one of the essential goals of the previously
introduced notions of differential fairness and subgroup
parity fairness
        <xref ref-type="bibr" rid="ref12">(Foulds et al. 2020; Kearns et al. 2018)</xref>
        .
      </p>
      <p>This opens the question of how 2NB can be applied to
data with multiple, overlapping sensitive attributes while
avoiding oversimplification. The 2NB algorithm is
applicable to a wide range of tasks and its effectiveness, even in
comparison to more complex algorithms, has been
demonstrated (Kamishima et al. 2013; Friedler et al. 2019). At the
same time, its design is sufficiently elegant and intuitive to
be approachable to practitioners across many disciplines - an
important advantage. Thus, extending the algorithm to cover
more use cases will be the focus of this work.</p>
    </sec>
    <sec id="sec-2">
      <title>Contributions</title>
      <p>This paper seeks to build upon Calders and Verwer’s work
by exploring the following:
• We adapt the original 2NB structure and balancing
routine to support multiple, polyvalent (categorical)
sensitive features.
• We use this new property of the algorithm to apply it to
differential fairness.
• To support the above, we examine the extended
algorithm’s performance on real-world US Census data.
• Finally, we lay out important considerations users should
be aware of before using this algorithm. We draw upon
the literature to lay out the pros, cons, and ethical
implications of using statistical parity as a fairness criterion.</p>
      <table-wrap id="table-1">
        <label>Table 1</label>
        <caption><p>Some notable formalisations of fairness.</p></caption>
        <table>
          <thead>
            <tr><th>Name</th><th>Definition</th></tr>
          </thead>
          <tbody>
            <tr><td>Statistical Parity</td><td>Likelihood of positive prediction given group membership should be equal for all groups.</td></tr>
            <tr><td>Disparate Impact</td><td>Mean ratio of positive predictions for each pair of groups should be 1 or greater than p%.</td></tr>
            <tr><td>Subgroup Fairness</td><td>Group fairness applied to infinite number of very small groups.</td></tr>
            <tr><td>Differential Fairness</td><td>Group fairness applied to groups defined by multiple overlapping sensitive attributes.</td></tr>
            <tr><td>Individual Fairness</td><td>Distance between the likelihood of outcomes between any two individuals should be no greater than similarity distance between them.</td></tr>
            <tr><td>Causal Fairness</td><td>Use of causal modelling to find effect of sensitive attributes on predictions.</td></tr>
          </tbody>
        </table>
      </table-wrap>
    </sec>
    <sec id="sec-3">
      <title>Related Work</title>
      <p>
        Naive Bayes Naive Bayes is a probabilistic data mining
and classification algorithm. In spite of its relative
simplicity, it has been shown to be very competent in real-world
applications that require classification or class probability
estimation and ranking1. Various strategies have been explored
for improving the algorithm’s performance by weakening its
conditional independence assumption. These include
structure extension, attribute weighting, etc. These techniques
focus on maximising accuracy or averaged conditional log
likelihood (Jiang 2011). Calders and Verwer’s proposal of
composing multiple naive Bayes models instead aims to
enforce independence of predictions with respect to a binary
sensitive feature, thus satisfying the statistical parity
constraint between the two groups
        <xref ref-type="bibr" rid="ref6">(Calders and Verwer 2010)</xref>
        .
Fair Classification There is a large body of research into
designing learning methods that do not use sensitive
information in discriminatory ways
        <xref ref-type="bibr" rid="ref17">(Oneto, Donini, and Pontil
2020)</xref>
        . As mentioned, various formalisations of fairness
exist but the most well-studied one is group fairness
        <xref ref-type="bibr" rid="ref5">(Blum
et al. 2018)</xref>
        . Many algorithms designed around this notion
are introduced as part of the comparative experiment in
Section 3.
      </p>
      <p>
        A more recent proposal, differential fairness (DF),
extends existing group fairness concepts to protect subgroups
defined by intersections of sensitive attributes, as well as by
individual attributes. The original papers by (Foulds et al. 2020) and
        <xref ref-type="bibr" rid="ref15">(Morina et al. 2019)</xref>
        explore the context of intersectionality,
and provide comparisons of DF with established concepts.
The first paper asserts DF’s distinction from subgroup parity
and demonstrates its usefulness in protecting small minority
groups. The latter paper gives methods to robustly estimate
the DF metrics and proposes a post-processing technique to
enforce DF on classifiers.
      </p>
      <p>
        1: Recent, novel applications include
        <xref ref-type="bibr" rid="ref16 ref22">(Valdiviezo-Diaz et al.
2019; Feng et al. 2018; Niazi et al. 2019)</xref>
        among others.
Humanistic Analysis A line of work that is parallel to
fair algorithm development focuses on analysing these
proposals from an ethical, philosophical, and moral standpoint.
A recent such publication, which examines statistical
parity among other notions, and which motivated and
influenced this paper, is by Hertweck, Heitz, and Loi
        <xref ref-type="bibr" rid="ref14 ref21">(Hertweck, Heitz, and Loi 2021)</xref>
        . They propose
philosophically grounded criteria for justifying the enforcement of
independence/statistical parity in a given task. They include
scenarios where enforcing statistical parity is ethical and justified,
as well as counter-examples where the criteria are met but
independence should not be enforced. As with many
similar works, they conclude by directing the reader to strike
a balance between fairness and utilitarian concerns (such as
accuracy) in their task. (Heidari et al. 2019) do similar work,
laying out the moral assumptions underlying several
popular notions of fairness. In
        <xref ref-type="bibr" rid="ref18">(Räz 2021)</xref>
        , Räz critically
examines the advantages and shortcomings of statistical parity as
a fairness criterion and makes an overall positive case for it.
      </p>
      <p>
        (Friedler, Scheidegger, and Venkatasubramanian 2016)
introduce the concept of distinct worldviews which influence
how we pursue fairness. One of them is We’re All Equal
(WAE), i.e. there is no association between the construct (the
latent feature that is truly relevant for the prediction) and the
sensitive attribute. The orthogonal worldview is that What
You See Is What You Get, wherein the observed labels are
accurate reflections of the construct. In
        <xref ref-type="bibr" rid="ref14 ref21">(Yeom and Tschantz
2021)</xref>
        , Yeom and Tschantz give a measure of disparity
amplification and dissect the popular group fairness models of
statistical parity, equalised odds, calibration, and predictive
parity through the lens of worldviews. They argue that
under WAE, statistical parity is required to eliminate disparity
amplification. However, deviating from this worldview
introduces inaccuracy when we enforce parity.
      </p>
      <sec id="sec-3-1">
        <title>N-Naive-Bayes Algorithm</title>
        <p>The proposed N-naive-Bayes algorithm is a supervised
binary classifier that allows the enforcement of a statistical
fairness constraint in its predictions. Given an (ideally large)
training set of labelled instances, the algorithm partitions the
data based on sensitive attribute value and trains a separate
naive Bayes sub-estimator on each of the sub-sets. This is an
extension of the original two-naive-Bayes structure, where
exactly two sub-estimators are trained. The next step of the
training stage is for the conditional probabilities P (Y |S) to
be empirically estimated from the training set. Where Ns is
the number of instances that belong to group s, and Ny,s
the number of instances of that group that have label y, the
empirical conditional probability2 is given as:</p>
        <p>P (y|s) = (Ny,s + α) / (Ns + 2 ∗ α) (1)</p>
        <p>Finally, the algorithm modifies the joint distribution
P (Y, S) to enforce the given fairness constraint. Then, the
final predicted class probabilities, for a sample xs (where x
is the feature vector excluding the sensitive feature s), are:
P (y|xs) = P (x|y) ∗ P (s|y) ∗ P (y) (2)
        = Cs(x) ∗ P (s|y) ∗ P (y) (3)
        = Cs(x) ∗ P (s ∩ y) (4)
Where Cs is the sub-estimator for sensitive group s ∈ S.</p>
        <p>2: Equation (1) gives a smoothed empirical probability, where the
constant α is the parameter of a symmetric Dirichlet prior with
concentration parameter 2 ∗ α, since a binary label is assumed.</p>
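        <p>As an illustrative sketch only (not the authors’ released code), the training stage described above, partitioning by sensitive group, fitting one Gaussian naive Bayes sub-estimator Cs per partition, and estimating the smoothed P (y|s) of Equation (1), could look as follows; the class name NNBSketch and its attributes are our own naming assumptions:</p>
        <p>
```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

class NNBSketch:
    """Illustrative sketch of the NNB training stage: one naive Bayes
    sub-estimator per sensitive group, plus the smoothed estimate of
    P(y=1|s) from Equation (1)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.sub_estimators = {}  # Cs for each group s
        self.p_y_given_s = {}     # smoothed empirical P(y=1|s)

    def fit(self, X, y, s):
        for group in np.unique(s):
            mask = (s == group)
            # Train a separate sub-estimator Cs on each data partition
            self.sub_estimators[group] = GaussianNB().fit(X[mask], y[mask])
            # Equation (1): (N_{y,s} + alpha) / (N_s + 2*alpha)
            n_s = mask.sum()
            n_pos = (y[mask] == 1).sum()
            self.p_y_given_s[group] = (n_pos + self.alpha) / (n_s + 2 * self.alpha)
        return self
```
        </p>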
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Enforcing Statistical Parity</title>
      <p>
        To satisfy the statistical parity constraint, the original 2NB
algorithm runs a heuristic post-processing routine that
iteratively adjusts the conditional probabilities P (Y |S) of the
groups in the direction of making them equal. During its
execution, this probability-balancing routine alternates
between reducing N (Y = 1, S = 1) and increasing N (Y =
1, S = 0) depending on the number of positive labels
outputted by the model at each iteration. This is to try and keep
the resultant marginal distribution of Y stable. Once
balancing is complete, the value of P (S|Y ) can be induced from
Ny,s similar to (1). The first contribution of this paper is to
extend this routine to suit the polyvalent definition of
statistical parity we will use:
Definition 1. Statistical (Conditional) Parity for Polyvalent
S
        <xref ref-type="bibr" rid="ref19">(Ritov, Sun, and Zhao 2017)</xref>
        :
      </p>
      <p>For predicted binary labels yˆ and polyvalent sensitive
feature S, statistical (conditional) parity requires 3:
P (yˆ = 1|s) = P (yˆ = 1|s′) ∀ s, s′ ∈ S (5)</p>
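      <p>Equation (5) holds exactly when every group has the same positive-prediction rate. As a small illustrative helper (our own naming, not from the paper), the largest violation of this constraint can be measured directly from predictions:</p>
      <p>
```python
import numpy as np

def statistical_parity_gap(y_hat, s):
    """Largest gap in positive-prediction rate across sensitive groups.
    Equation (5) is satisfied exactly when this gap is zero."""
    rates = {g: (y_hat[s == g] == 1).mean() for g in np.unique(s)}
    return max(rates.values()) - min(rates.values())
```
      </p>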
      <p>We modify the probability-balancing routine to subtract
and add probability to the group with the highest (max) and
lowest (min) current P (Y = 1|s) respectively. These
probabilities are re-computed with each iteration, and the max
and min groups re-selected. Further, we introduce the
constraint that only groups designated by the user as privileged
can receive a reduction in their likelihood of getting a
positive label 4. This is to avoid making any assumptions about
which groups it would be appropriate to demote positive
instances of. It allows the balancing routine to terminate
immediately if it over-corrects, or if the data is such that
P (yˆ = 1|snp) &gt; P (yˆ = 1|sp) to begin with, as is the case in
the well-known UCI Adult dataset, for example. This gives
us the final form of our statistical parity criterion:
Definition 2. Statistical Parity Criterion for NNB:
For predicted binary labels yˆ and sensitive feature S:
P (yˆ = 1|sp) = P (yˆ = 1|snp) ∀ (sp, snp) ∈ Sp × Snp (6)
Where Sp and Snp are the sub-sets of all privileged and
non-privileged sub-groups of S respectively.</p>
      <p>We adapt the above definition into a score that the
algorithm can minimise:
disc = max P (yˆ = 1|sp) − min P (yˆ = 1|snp) (7)</p>
      <p>3: The cited definition requires this to hold for all values of yˆ;
however, for a binary label it is sufficient to check yˆ = 1.
4: A similar constraint is explored by (Zafar et al. 2017).</p>
      <p>Algorithm 1: Pseudocode for a probability-balancing routine
to enforce statistical parity
1: Calculate the parity score, disc, of the predicted classes
by the current model and store smax, smin
2: while disc &gt; disc0 do
3: Let numpos be the number of positive samples by
the current model
4: if numpos &lt; the number of positive samples in the
training set then
5: N (y = 1, smin) += ∆ ∗ N (y = 0, smin)
6: N (y = 0, smin) −= ∆ ∗ N (y = 0, smin)
7: else
8: N (y = 1, smax) −= ∆ ∗ N (y = 1, smax)
9: N (y = 0, smax) += ∆ ∗ N (y = 1, smax)
10: end if
11: If any N (y, s) is now negative, roll back the changes
and terminate
12: Recalculate P (Y |S), disc, smax, smin
13: end while</p>
      <p>Note that the above criterion can easily be relaxed to
apply the four-fifths rule for removing disparate impact (or its
more general form, the p% rule (Zafar et al. 2017)) instead
of perfect statistical parity. For the purposes of this paper,
however, we explore the effect of statistical parity in its base
form.</p>
      <p>We also note the definition of disparate impact we use in
the evaluation stage:</p>
    </sec>
    <sec id="sec-5">
      <title>Definition 3.</title>
      <p>Disparate Impact (Mean) for Polyvalent S:
DI = (1 / |Sp × Snp|) Σ over (sp, snp) ∈ Sp × Snp of P (yˆ = 1|snp) / P (yˆ = 1|sp)</p>
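      <p>As a small illustrative helper (the function name and inputs are our own assumptions), this mean disparate impact over all privileged/non-privileged pairs can be computed from per-group positive rates:</p>
      <p>
```python
from itertools import product

def mean_disparate_impact(p_pos, privileged, non_privileged):
    """Mean DI over all (privileged, non-privileged) group pairs.
    p_pos: dict mapping each group to P(y^=1 | group)."""
    pairs = list(product(privileged, non_privileged))
    return sum(p_pos[snp] / p_pos[sp] for sp, snp in pairs) / len(pairs)
```
      </p>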
      <p>Algorithm 1 describes the extended probability
balancing heuristic for enforcing parity. The values of sp, snp in
the parity criterion (Equation 7) are referred to as smax and
smin respectively. At each iteration, the routine determines
these groups and adjusts their conditional probabilities. A
further modification from the original is that the proportion
by which the probabilities are adjusted with each iteration is
now proportional to the size of the group itself, instead of the
size of the opposite group. In experiments, this yields a great
performance improvement, especially where the distribution
of samples over S is very imbalanced.</p>
    </sec>
    <sec id="sec-6">
      <title>Enforcing Differential Fairness</title>
      <p>An alternative measure of fairness we explore is differential
fairness, as given in (Foulds et al. 2020).</p>
      <p>Definition 4. A classifier is ϵ-differentially fair if:
e^−ϵ ≤ P (yˆ|s) / P (yˆ|s′) ≤ e^ϵ ∀ s, s′ ∈ S, yˆ ∈ Y (8)</p>
      <p>The (smoothed) empirical differential fairness score, from
the empirical counts in the data, assuming a binary label, is:
e^−ϵ ≤ ((N (yˆ, s) + α) / (N (s) + β)) ∗ ((N (s′) + β) / (N (yˆ, s′) + α)) ≤ e^ϵ
∀ s, s′ ∈ S, yˆ ∈ Y (9)</p>
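      <p>A minimal sketch of this empirical ϵ-score for binary predictions, under our own naming assumptions, takes the tightest ϵ for which all pairwise smoothed ratios of Equation (9) fit inside [e^−ϵ, e^ϵ]:</p>
      <p>
```python
import numpy as np
from itertools import permutations

def empirical_df_epsilon(y_hat, s, alpha=1.0, beta=2.0):
    """Smoothed empirical DF epsilon-score (Equation 9), binary labels.
    Returns the smallest eps such that every pairwise ratio of smoothed
    conditional probabilities lies within [exp(-eps), exp(eps)]."""
    groups = np.unique(s)
    eps = 0.0
    for g1, g2 in permutations(groups, 2):
        for label in (0, 1):
            p1 = ((y_hat[s == g1] == label).sum() + alpha) / ((s == g1).sum() + beta)
            p2 = ((y_hat[s == g2] == label).sum() + alpha) / ((s == g2).sum() + beta)
            eps = max(eps, abs(np.log(p1 / p2)))
    return eps
```
      </p>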
      <p>This is used in experiments to estimate the value of ϵ (the
ϵ-score) from the predicted labels on the dataset 5. In
experiments we set β = 2 ∗ α and substitute with the observed
conditional probability estimates from the dataset. An
additional measure given in (Foulds et al. 2020) to assess
fairness from the standpoint of intersectionality is differential
fairness bias amplification. This measure gives an indication
of how much a black-box classifier increases the unfairness
over the original data (Foulds et al. 2020; Zhao et al. 2017).
Definition 5. Differential Fairness Bias Amplification:</p>
      <p>A classifier C satisfies (ϵ2 − ϵ1)-DF bias amplification
w.r.t. dataset D if C is ϵ2-DF fair and D is ϵ1-DF fair.</p>
      <p>To adjust the joint distribution P (Y, S) to satisfy
DF-fairness and minimise the ϵ-score, we propose a
new heuristic probability-balancing routine and associated
discrimination score. The distinction from the balancing
routine given in Algorithm 1 is that this focuses on
outputting a narrower range of probabilities, while still
avoiding negatively impacting groups that are designated as
non-privileged. To form the new discrimination score, we
apply the principle of separating privileged and non-privileged
sub-groups of S from the previous section to the ϵ-score
definition:
e^−ϵ ≤ P (yˆ = 1|snp) / P (yˆ = 1|sp) ≤ e^ϵ ∀ (sp, snp) ∈ Sp × Snp (10)</p>
      <p>We then express this restricted ϵ-score as the maximum of
two ratios: e^ϵ = max(ρ d, ρ u), where, for (sp, snp) ∈ Sp × Snp:
ρ d = max P (yˆ = 1|snp) / P (yˆ = 1|sp), ρ u = max P (yˆ = 1|sp) / P (yˆ = 1|snp)
(11)</p>
      <p>The execution of the proposed balancing routine is
determined by these ratios. If ρ d is greater, then the
non-privileged sub-group with the smallest probability at that
iteration receives an increase in probability. If ρ u is greater, then
the privileged group with highest probability receives a
decrease in probability. These conditions can be expected to
alternate as the conditional probabilities P (Y |S) converge.
Iteration continues until ρ d falls below the threshold disc0. The smax and
smin groups are determined as in the previous section.</p>
      <p>
        This routine disregards the number of positive labels the
model produces, while Algorithm 1 attempts to keep that
number close to the number of positive labels in the
training data. This allows it to avoid situations where a single,
non-privileged sub-group with small probability would
require the probabilities of the privileged groups to be reduced
significantly. In such cases, other non-privileged sub-groups
might maintain much higher probabilities, therefore giving
a poor ϵ-score. A further difference is that the proportion by
which each Ny,s is modified grows/decreases exponentially.
In experiments, this allows the routine to escape local minima
that occur during the adjustment of P (Y |S) and lead to
inefficiency. This routine does, however, offer a theoretical
accuracy trade-off compared to Algorithm 1, which we
investigate in the following section.
      </p>
      <p>
        5: Note that this definition produces noisier estimates for
subgroups with fewer members.
        <xref ref-type="bibr" rid="ref15">(Morina et al. 2019)</xref>
        shows that as the
dataset grows, the given estimate converges to the true value, and
that this happens regardless of the chosen smoothing parameters.
However, for small or imbalanced datasets, more robust estimation
methods should be used.
      </p>
      <p>Algorithm 2: Pseudocode for a probability-balancing routine
to enforce DF parity
1: Calculate the ratios ρ d, ρ u empirically from the
predicted classes by the current model, store smax, smin
2: while ρ d &gt; disc0 do
3: if ρ u ≤ ρ d then
4: N (y = 0, smin) −= ∆ ∗ N (y = 0, smin)
5: N (y = 1, smin) += ∆ ∗ N (y = 1, smin)
6: else
7: N (y = 0, smax) += ∆ ∗ N (y = 0, smax)
8: N (y = 1, smax) −= ∆ ∗ N (y = 1, smax)
9: end if
10: Recalculate P (Y |S), ρ d, ρ u, smax, smin
11: end while</p>
      <p>
        Finally, note that all the above probability-balancing
routines (including Calders and Verwer’s original one) are
based around the assumption that the distribution of labels
over the sensitive feature(s) in the training set is reflective of
the test setting. This assumption is not unique to this model
(see
        <xref ref-type="bibr" rid="ref1">(Agarwal et al. 2018; Hardt, Price, and Srebro 2016)</xref>
        ),
and under it, we can conclude that minimising the given
fairness measure on the training set generalises to the test data
        <xref ref-type="bibr" rid="ref20">(Singh et al. 2021)</xref>
        .
      </p>
      <sec id="sec-6-1">
        <title>Experimental Results</title>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>Setup</title>
      <p>We implement NNB in Python within the scikit-learn
framework, using Gaussian naive Bayes as the
sub-estimator. We then evaluate its performance in two
experiments.</p>
      <p>
        For both experiments, we use real-world data from the US
Census Bureau6.
        <xref ref-type="bibr" rid="ref7">(Ding et al. 2021)</xref>
        define several
classification tasks on this data, each involving a sub-set of the total
features available. We consider two:
• Income: Predict whether an individual’s income is
above $50,000. The data for this problem is filtered so
that it serves as a comparable replacement to the
well-known UCI Adult dataset.
• Employment: Predict whether an individual is
employed.
      </p>
      <p>
        The details of which features are included in each task and
what filtering takes place can be found in the paper
        <xref ref-type="bibr" rid="ref7">(Ding
et al. 2021)</xref>
        and the associated page on GitHub7. To
evaluate NNB we use data from the 2018 census in the state
of California. The sensitive feature(s) used in each task are
indicated after its name, e.g. Income-Race-Sex is the
Income task using race and sex as the sensitive features.
To best capture intersectional fairness when using multiple
sensitive features, we follow the approach from (Foulds et al.
2020) and define each group s as a tuple of the sub-groups
of each sensitive feature that each sample belongs to.
      </p>
      <p>6: https://www.census.gov/programs-surveys/acs/microdata/documentation.html
7: https://github.com/zykls/folktables</p>
      <sec id="sec-7-1">
        <title>Experiments</title>
        <p>First Experiment This experiment compares NNB’s
performance with other algorithms. The comparison includes
”vanilla” models as baselines for performance, and several
group-fairness-aware algorithms that have a similar focus to
NNB - ensuring non-discrimination across protected groups
by optimising metrics such as statistical parity or disparate
impact. Specifically, we consider the following:
• GaussianNB, DecisionTree, LR, SVM:
scikit-learn’s Gaussian naive Bayes, Decision Trees, Logistic
Regression, and SVM.
• Feldman-DT, Feldman-NB: A pre-processing
algorithm that aims to remove disparate impact. It equalises
the marginal distributions of the subsets of each attribute
with each sensitive value (Feldman et al. 2015). The
resulting “repaired” data is then used to train scikit-learn
classifiers - Decision Trees (DT) and Gaussian naive
Bayes (NB).
• Kamishima: An in-processing method that introduces a
regularisation term to logistic regression to enforce
independence of labels from the sensitive feature (Kamishima
et al. 2012).
• ZafarAccuracy, ZafarFairness: An
in-processing algorithm that applies fairness constraints
to convex margin-based classifiers (Zafar et al. 2017).
Specifically, we test two variations of a modified logistic
regression classifier: The first maximises accuracy
subject to fairness (disparate impact) constraints, while
the latter prioritises removing disparate impact.
• 2NB: Calders and Verwer’s original algorithm, using the
same GaussianNB sub-estimator as NNB.
• NNB-Parity, NNB-DF: N-naive-Bayes tuned to
satisfy statistical parity using Algorithm 1, and DF-parity
using Algorithm 2.</p>
        <p>For the comparison we use the benchmark provided by
(Friedler et al. 2019). The fairness-aware algorithms are
tuned via grid-search to optimise accuracy. The performance
of the algorithms is then measured over ten random train-test
splits of the data.</p>
        <p>Second Experiment This experiment demonstrates how
NNB performs in finer detail. We consider GaussianNB,
NNB-Parity, and NNB-DF as before, and we further
include 2NB, the original two-naive-Bayes algorithm
implemented identically to NNB. Finally, we include Perfect
as a secondary baseline, to illustrate the scores that would
be achieved by a perfect classifier.</p>
        <p>To evaluate the performance of the above algorithms, we
note the mean and variance of the following measures over
10 random train-test splits: accuracy, AUC, disparate
impact score (the mean of the DI between all privileged and
non-privileged groups), statistical parity score (as defined in 2),
DF-ϵ (as defined in 4), and DF-bias amplification score (as
defined in 5). We also compare the resultant distribution of
labels over groups of S on a single random train-test split.</p>
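<p>As a concrete reading of these measures, the sketch below computes empirical per-group positive rates, a simplified mean disparate impact against a designated privileged group, and an empirical DF-ϵ restricted to the positive outcome. Note that the full differential fairness definition of Foulds et al. (2020) ranges over both outcomes and uses smoothed probability estimates; function names here are illustrative.</p>

```python
import numpy as np

def group_positive_rates(y_pred, groups):
    """Empirical P(Y=1 | S=g) for each group g."""
    return {g: y_pred[groups == g].mean() for g in np.unique(groups)}

def disparate_impact_score(y_pred, groups, privileged):
    """Mean DI between the privileged group and every other group
    (a simplification of the score described in the text)."""
    rates = group_positive_rates(y_pred, groups)
    p_priv = rates[privileged]
    ratios = [rates[g] / p_priv for g in rates if g != privileged]
    return float(np.mean(ratios))

def df_epsilon(y_pred, groups, smoothing=1e-6):
    """Empirical DF-eps restricted to the positive outcome: the maximum
    |log P(Y=1|S=g) - log P(Y=1|S=g')| over all group pairs."""
    rates = group_positive_rates(y_pred, groups)
    logs = [np.log(r + smoothing) for r in rates.values()]
    return float(max(logs) - min(logs))
```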
      </sec>
    </sec>
    <sec id="sec-8">
      <title>Results</title>
      <p>First Experiment Figure 1 gives the accuracy vs. the
disparate impact and DF-ϵ scores on the Income-Race
and Income-Race-Sex tasks. Figure 2 shows the same for
Employment-Race and Employment-Race-Sex. It can be
seen that on Income-Race, NNB results in a higher DI score
than 2NB and often over-favours non-privileged groups,
causing a score &gt; 1. Its accuracy is on par with 2NB and
the baseline naive Bayes, DT, and LR models. Feldman’s
algorithm with decision trees yields a similar disparate impact
score in some splits, but lower accuracy. The same is true for
the DF-ϵ score on this task. On Income-Race-Sex, NNB-DF
beats all other algorithms in achieving DI ∼ 1; however,
NNB-Parity has higher accuracy than both NNB-DF and
naive Bayes. NNB-DF is also the most successful at
minimising the ϵ-score for this task, though again this comes at
the cost of lower accuracy than the baseline model.</p>
      <p>On Employment-Race all naive Bayes models achieve
similar accuracy, while DT and LR-based models rank
higher, and SVM the highest. The same can be observed for
Employment-Race-Sex, and for both tasks NNB-DF again
gives the ϵ-scores closest to zero.</p>
      <p>Second Experiment Table 2 gives the scores achieved
on the Income-Race task, and Table 3 gives the
same Employment-Race-Sex. On Income-Race,
both NNB models gave an improved parity score compared
to the perfect classifier and GaussianNB. NNB and 2NB
also gave improved disparate impact scores over the baseline
models, but 2NB under-corrected while the NNB models
gave a score &gt; 1 indicating they favoured the non-privileged
groups over the privileged group.</p>
      <p>NNB-Parity and NNB-DF gave similar disparate
impact scores, but the former gave higher accuracy while the
latter produced a narrower range of positive label
proportions, and thus better parity, ϵ, and DF-bias amplification
scores. The evident accuracy trade-off is more pronounced
in the latter task, with NNB-Parity achieving an accuracy
of 0.7445 ± 0.00, and NNB-DF achieving 0.7199 ± 0.00.
On Employment-Race-Sex, NNB-DF outperformed
NNB-Parity on all scores. This was also the case for
Employment-Race, where both models had similar
accuracy but NNB-DF displayed less over-correction in its
disparate impact score (1.0336 ± 0.0001 versus 1.2760 ±
0.0002), in addition to the expected improvement in ϵ-score
(0.1068 ± 0.001 versus 0.3434 ± 0.0001). This suggests the
DF balancing routine is better suited for the Employment
task than the parity-based routine.</p>
      <sec id="sec-8-1">
        <title>Discussion</title>
        <p>In this work we presented an extension of the
two-naiveBayes algorithm, adapting it to suit datasets with
multiple, polyvalent sensitive features. We applied the proposed
N-naive-Bayes structure to intersectionality and
differential fairness by giving an alternative probability-balancing
routine. Our experiments on real-world datasets yielded
favourable results and demonstrated the effectiveness and
the differences between the parity and DF-based approaches.</p>
        <p>
          We conclude by laying out key considerations users
should take into account before using N-naive-Bayes:
Statistical Parity as a Fairness Criterion Statistical
parity stands opposed to the (aggregate) accuracy of a classifier,
except in degenerate cases where the data is already fair, so
it is recommended that a balance between the two is pursued
          <xref ref-type="bibr" rid="ref14 ref21">(Hertweck, Heitz, and Loi 2021)</xref>
          . This also applies to the
extended, but still parity-based, DF measure that was explored
in Section 2. In their worldview-based analysis, Yeom and
Tschantz caution us that even under WAE, blind
enforcement of statistical parity can introduce new discrimination
into the system
          <xref ref-type="bibr" rid="ref14 ref21">(Yeom and Tschantz 2021)</xref>
          . Thus, users must
be aware of the ethical implications of using parity as a core
fairness constraint, the possible impact it may have on
individuals, and the moral objections these individuals may
justifiably raise.
        </p>
        <p>
          We recommend further reading on the advantages and
disadvantages of group fairness in general
<xref ref-type="bibr" rid="ref18 ref9">(Räz 2021; Dwork
et al. 2012; Heidari et al. 2019)</xref>
          , as well as parity specifically
          <xref ref-type="bibr" rid="ref14 ref14 ref21 ref21">(Hertweck, Heitz, and Loi 2021; Yeom and Tschantz 2021)</xref>
          ,
so users can make informed decisions on how to apply
statistical parity and N-naive-Bayes to their application.
Limitations of NNB N-naive-Bayes (as with
two-naive-Bayes) has inherent limitations. The algorithm does not
automatically make a classification task fair when it is
applied; that is generally considered possible only through
extensive domain-specific investigation (Hardt, Price, and Srebro
2016). Rather, the algorithm introduces a form of affirmative
action to the task, increasing and decreasing the likelihood
of different groups receiving a positive label in an attempt to
satisfy the given parity constraint. This intentional
manipulation of the original distribution over the data can be done
to correct for structural biases in the data, for the purposes
of compliance with regulations, or even as part of an effort
to counteract historical inequalities.
        </p>
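<p>The probability-tuning mechanism described above can be illustrated with a simple balancing loop. This is a hedged sketch, not the paper’s Algorithm 1: NNB adjusts the per-group class distributions of the naive Bayes model itself, whereas this toy version moves per-group decision thresholds over prediction scores until positive rates meet a common target.</p>

```python
import numpy as np

def balance_positive_rates(scores, groups, target, n_iter=300, step=0.05):
    """Illustrative parity post-processing in the spirit of 2NB/NNB:
    iteratively nudge each group's decision threshold until its empirical
    positive rate matches a common target rate."""
    scores = np.asarray(scores, dtype=float)
    groups = np.asarray(groups)
    thresholds = {g: 0.5 for g in np.unique(groups)}
    for _ in range(n_iter):
        for g in thresholds:
            mask = groups == g
            rate = (scores[mask] >= thresholds[g]).mean()
            # Raising a threshold lowers that group's positive rate,
            # so move it proportionally to the remaining gap.
            thresholds[g] += step * (rate - target)
    return thresholds
```

A design note: equalising rates this way is exactly the "affirmative action" the text describes, since each group's likelihood of receiving a positive label is deliberately raised or lowered relative to the unconstrained model.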
        <p>
          Users should always consider the implications of
estimating probability distributions for each group separately (as
is done at the beginning of the training stage), as well as
the mechanism behind any post-facto probability tuning they
decide on. Further, users should understand the implications
of affirmative action, its downstream effects, and ensure it
is appropriate to their application. As a starting point for
further reading, see
          <xref ref-type="bibr" rid="ref11 ref9">(Dwork et al. 2012; Kannan, Roth, and
Ziani 2019)</xref>
          . Sociological and legal works such as
          <xref ref-type="bibr" rid="ref2">(Kalev,
Dobbin, and Kelly 2006; Anderson 2003)</xref>
          are also
recommended.
        </p>
        <p>
          Finally, the explicit choice of sensitive features to
consider when enforcing statistical parity is a simplification of
the real world and should be done carefully. One should
consider the ontology behind observed values in the dataset:
race, for example, has varying definitions, each of which
comes with its own assumptions. Further, identifying groups
in the data using a set of observable qualities, whatever those
may be, also carries implicit assumptions about how all the
factors involved interact with each other and the validity of
decomposing them into discrete features
          <xref ref-type="bibr" rid="ref4">(Barocas, Hardt,
and Narayanan 2019, Ch. 5)</xref>
          .
Dwork, C.; Hardt, M.; Pitassi, T.; Reingold, O.; and Zemel,
R. 2012. Fairness through Awareness. In Proceedings of
the 3rd Innovations in Theoretical Computer Science
Conference, ITCS ’12, 214–226. New York, NY, USA:
Association for Computing Machinery. ISBN 9781450311151.
Feldman, M.; Friedler, S. A.; Moeller, J.; Scheidegger, C.;
and Venkatasubramanian, S. 2015. Certifying and
Removing Disparate Impact. In Proceedings of the 21th ACM
SIGKDD International Conference on Knowledge
Discovery and Data Mining, KDD ’15, 259–268. New York,
NY, USA: Association for Computing Machinery. ISBN
9781450336642.
        </p>
        <p>Feng, X.; Li, S.; Yuan, C.; Zeng, P.; and Sun, Y. 2018.
Prediction of Slope Stability using Naive Bayes Classifier.
KSCE Journal of Civil Engineering, 22(3): 941–950.
Foulds, J. R.; Islam, R.; Keya, K. N.; and Pan, S. 2020. An
Intersectional Definition of Fairness. In 2020 IEEE 36th
International Conference on Data Engineering (ICDE), 1918–
1921. Dallas, Texas, USA: IEEE.</p>
        <p>Friedler, S. A.; Scheidegger, C.; and Venkatasubramanian,
S. 2016. On the (im)possibility of fairness. CoRR,
abs/1609.07236: 16.</p>
        <p>Friedler, S. A.; Scheidegger, C.; Venkatasubramanian, S.;
Choudhary, S.; Hamilton, E. P.; and Roth, D. 2019. A
Comparative Study of Fairness-Enhancing Interventions in
Machine Learning. In Proceedings of the Conference on
Fairness, Accountability, and Transparency, FAT* ’19, 329–338.
New York, NY, USA: Association for Computing
Machinery. ISBN 9781450361255.</p>
        <p>Hardt, M.; Price, E.; and Srebro, N. 2016. Equality of
Opportunity in Supervised Learning. In Proceedings of the 30th
International Conference on Neural Information Processing
Systems, NIPS’16, 3323–3331. Red Hook, NY, USA:
Curran Associates Inc. ISBN 9781510838819.</p>
        <p>Heidari, H.; Loi, M.; Gummadi, K. P.; and Krause, A.
2019. A Moral Framework for Understanding Fair ML
through Economic Models of Equality of Opportunity. In
Proceedings of the Conference on Fairness,
Accountability, and Transparency, FAT* ’19, 181–190. New York,
NY, USA: Association for Computing Machinery. ISBN
9781450361255.</p>
        <p>Hertweck, C.; Heitz, C.; and Loi, M. 2021. On the Moral
Justification of Statistical Parity. In Proceedings of the 2021
ACM Conference on Fairness, Accountability, and
Transparency, FAccT ’21, 747–757. New York, NY, USA:
Association for Computing Machinery. ISBN 9781450383097.
Jiang, L. 2011. Random one-dependence estimators. Pattern
Recognition Letters, 32(3): 532–539.</p>
        <p>Kalev, A.; Dobbin, F.; and Kelly, E. 2006. Best Practices or
Best Guesses? Assessing the Efficacy of Corporate
Affirmative Action and Diversity Policies. American Sociological
Review, 71(4): 589–617.</p>
        <p>Kamishima, T.; Akaho, S.; Asoh, H.; and Sakuma, J. 2012.
Fairness-Aware Classifier with Prejudice Remover
Regularizer. In Flach, P. A.; De Bie, T.; and Cristianini, N., eds.,
Machine Learning and Knowledge Discovery in Databases, 35–
50. Berlin, Heidelberg: Springer Berlin Heidelberg. ISBN
978-3-642-33486-3.
Yeom, S.; and Tschantz, M. C. 2021. Avoiding
Disparity Amplification under Different Worldviews. In
Proceedings of the 2021 ACM Conference on Fairness,
Accountability, and Transparency, FAccT ’21, 273–283. New York,
NY, USA: Association for Computing Machinery. ISBN
9781450383097.</p>
        <p>Zafar, M. B.; Valera, I.; Rogriguez, M. G.; and Gummadi,
K. P. 2017. Fairness Constraints: Mechanisms for Fair
Classification. In Singh, A.; and Zhu, J., eds., Proceedings of the
20th International Conference on Artificial Intelligence and
Statistics, volume 54 of Proceedings of Machine Learning
Research, 962–970. Ft. Lauderdale, FL, USA: PMLR.
Zhao, J.; Wang, T.; Yatskar, M.; Ordonez, V.; and Chang,
K. 2017. Men Also Like Shopping: Reducing Gender
Bias Amplification using Corpus-level Constraints. CoRR,
abs/1707.09457.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <surname>Agarwal</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Beygelzimer</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Dudik</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Langford</surname>
            , J.; and Wallach,
            <given-names>H.</given-names>
          </string-name>
          <year>2018</year>
          .
          <article-title>A Reductions Approach to Fair Classification</article-title>
          . In Dy, J.; and
          <string-name>
            <surname>Krause</surname>
          </string-name>
          , A., eds.,
          <source>Proceedings of the 35th International Conference on Machine Learning</source>
          , volume
          <volume>80</volume>
          <source>of Proceedings of Machine Learning Research</source>
          ,
          <volume>60</volume>
          -
          <fpage>69</fpage>
          . Stockholm, Sweden: PMLR.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <surname>Anderson</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          <year>2003</year>
. Integration, Affirmative Action, and Strict Scrutiny
          . New York University Law Review,
          <volume>77</volume>
          (
          <issue>5</issue>
          ):
          <fpage>1195</fpage>
          -
          <lpage>1271</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <surname>Awasthi</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Cortes</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Mansour</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ; and Mohri,
          <string-name>
            <surname>M.</surname>
          </string-name>
          <year>2020</year>
. Beyond Individual and Group Fairness. CoRR, abs/2008.09490.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <surname>Barocas</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ; Hardt,
          <string-name>
            <given-names>M.</given-names>
            ; and
            <surname>Narayanan</surname>
          </string-name>
          ,
          <string-name>
            <surname>A.</surname>
          </string-name>
          <year>2019</year>
          .
          <article-title>Fairness and Machine Learning</article-title>
. fairmlbook.org. http://www.fairmlbook.org.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <surname>Blum</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Gunasekar</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ; Lykouris,
          <string-name>
            <given-names>T.</given-names>
            ; and
            <surname>Srebro</surname>
          </string-name>
          ,
          <string-name>
            <surname>N.</surname>
          </string-name>
          <year>2018</year>
          .
          <article-title>On Preserving Non-Discrimination When Combining Expert Advice</article-title>
          .
          <source>In Proceedings of the 32nd International Conference on Neural Information Processing Systems</source>
          , NIPS'
          <volume>18</volume>
          ,
          <fpage>8386</fpage>
          -
          <lpage>8397</lpage>
          . Red Hook,
          <string-name>
            <surname>NY</surname>
          </string-name>
          , USA: Curran Associates Inc.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <surname>Calders</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Verwer</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <year>2010</year>
          .
          <article-title>Three naive Bayes approaches for discrimination-free classification</article-title>
          .
          <source>Data Mining and Knowledge Discovery</source>
          ,
          <volume>21</volume>
          (
          <issue>2</issue>
          ):
          <fpage>277</fpage>
          -
          <lpage>292</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <surname>Ding</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Hardt</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Miller</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Schmidt</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          <year>2021</year>
          .
          <article-title>Retiring Adult: New Datasets for Fair Machine Learning</article-title>
          .
          <source>CoRR, abs/2108</source>
          .04884.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <surname>Donini</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Oneto</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Ben-David</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Shawe-Taylor</surname>
            , J.; and Pontil,
            <given-names>M.</given-names>
          </string-name>
          <year>2018</year>
          .
          <article-title>Empirical Risk Minimization under Fairness Constraints</article-title>
          .
          <source>In Proceedings of the 32nd International Conference on Neural Information Processing Systems</source>
          , NIPS'
          <volume>18</volume>
          ,
          <fpage>2796</fpage>
          -
          <lpage>2806</lpage>
          . Red Hook,
          <string-name>
            <surname>NY</surname>
          </string-name>
          , USA: Curran Associates Inc.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <surname>Dwork</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Hardt</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Pitassi</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Reingold</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ; and Zemel,
          <string-name>
            <surname>R.</surname>
          </string-name>
          <year>2012</year>
          .
          <article-title>Fairness through Awareness</article-title>
. Kamishima, T.;
          <string-name>
            <surname>Akaho</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ; Asoh, H.; and
          <string-name>
            <surname>Sakuma</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <article-title>The Independence of Fairness-Aware Classifiers</article-title>
          .
          <source>In Proceedings of the 2013 IEEE 13th International Conference on Data Mining Workshops</source>
          , ICDMW '
          <volume>13</volume>
          ,
          <fpage>849</fpage>
          -
          <lpage>858</lpage>
          . USA:
          <article-title>IEEE Computer Society</article-title>
          . ISBN 9781479931422.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <surname>Kannan</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Roth</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Ziani</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <year>2019</year>
          .
          <article-title>Downstream Effects of Affirmative Action</article-title>
          .
          <source>In Proceedings of the Conference on Fairness, Accountability, and Transparency</source>
          , FAT* '
          <volume>19</volume>
          ,
          <fpage>240</fpage>
          -
          <lpage>248</lpage>
          . New York, NY, USA:
          <article-title>Association for Computing Machinery</article-title>
          .
          <source>ISBN 9781450361255.</source>
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <surname>Kearns</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Neel</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Roth</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Wu</surname>
            ,
            <given-names>Z. S.</given-names>
          </string-name>
          <year>2018</year>
          .
          <article-title>Preventing Fairness Gerrymandering: Auditing and Learning for Subgroup Fairness</article-title>
          . In Dy, J.; and
          <string-name>
            <surname>Krause</surname>
          </string-name>
          , A., eds.,
          <source>Proceedings of the 35th International Conference on Machine Learning</source>
          , volume
          <volume>80</volume>
          <source>of Proceedings of Machine Learning Research</source>
          ,
          <volume>2564</volume>
          -
          <fpage>2572</fpage>
          . Stockholmsmassan, Stockholm, Sweden: PMLR.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <string-name>
            <surname>Kleinberg</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ; Mullainathan,
          <string-name>
            <surname>S.</surname>
          </string-name>
          ; and Raghavan,
          <string-name>
            <surname>M.</surname>
          </string-name>
          <year>2017</year>
          .
          <article-title>Inherent Trade-Offs in the Fair Determination of Risk Scores</article-title>
          . In Papadimitriou, C. H., ed.,
          <source>8th Innovations in Theoretical Computer Science Conference (ITCS</source>
          <year>2017</year>
          ), volume
          <volume>67</volume>
          <source>of Leibniz International Proceedings in Informatics (LIPIcs)</source>
          ,
          <volume>43</volume>
          :
          <fpage>1</fpage>
          -
          <lpage>43</lpage>
          :
          <fpage>23</fpage>
          . Germany:
          <article-title>Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik</article-title>
          .
          <source>ISBN 978-3-95977-029-3.</source>
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <string-name>
            <surname>Mhasawade</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ; and Chunara,
          <string-name>
            <surname>R.</surname>
          </string-name>
          <year>2021</year>
          .
          <article-title>Causal Multi-Level Fairness</article-title>
          .
          <source>In Proceedings of the 2021 AAAI/ACM Conference on AI</source>
          ,
          <string-name>
            <surname>Ethics</surname>
          </string-name>
          , and Society, AIES '
          <volume>21</volume>
          ,
          <fpage>784</fpage>
          -
          <lpage>794</lpage>
          . New York, NY, USA:
          <article-title>Association for Computing Machinery</article-title>
          .
          <source>ISBN 9781450384735.</source>
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <string-name>
            <surname>Morina</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Oliinyk</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Waton</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Marusic</surname>
            ,
            <given-names>I.;</given-names>
          </string-name>
          and
          <string-name>
            <surname>Georgatzis</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          <year>2019</year>
          .
          <article-title>Auditing and Achieving Intersectional Fairness in Classification Problems</article-title>
. arXiv:1911.01468.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          <string-name>
            <surname>Niazi</surname>
            ,
            <given-names>K. A. K.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Akhtar</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ; Khan,
          <string-name>
            <given-names>H. A.</given-names>
            ;
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            ; and
            <surname>Athar</surname>
          </string-name>
          ,
          <string-name>
            <surname>S.</surname>
          </string-name>
          <year>2019</year>
          .
          <article-title>Hotspot diagnosis for solar photovoltaic modules using a Naive Bayes classifier</article-title>
          .
          <source>Solar Energy</source>
          ,
          <volume>190</volume>
          :
          <fpage>34</fpage>
          -
          <lpage>43</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          <string-name>
            <surname>Oneto</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Donini</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ; and Pontil,
          <string-name>
            <surname>M.</surname>
          </string-name>
          <year>2020</year>
          .
          <article-title>General Fair Empirical Risk Minimization</article-title>
          .
          <source>In 2020 International Joint Conference on Neural Networks (IJCNN)</source>
          ,
          <fpage>1</fpage>
          -
          <lpage>8</lpage>
          . Glasgow, United Kingdom: IEEE.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
<string-name>
            <surname>Räz</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          <year>2021</year>
          . Group Fairness:
          <article-title>Independence Revisited</article-title>
          .
          <source>In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency</source>
          , FAccT '
          <volume>21</volume>
          ,
          <fpage>129</fpage>
          -
          <lpage>137</lpage>
          . New York, NY, USA:
          <article-title>Association for Computing Machinery</article-title>
          .
          <source>ISBN 9781450383097.</source>
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          <string-name>
            <surname>Ritov</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Sun</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Zhao</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          <year>2017</year>
          .
          <article-title>On conditional parity as a notion of non-discrimination in machine learning</article-title>
          .
arXiv:1706.08519.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          <string-name>
            <surname>Singh</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Singh</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Mhasawade</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ; and Chunara,
          <string-name>
            <surname>R.</surname>
          </string-name>
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
<article-title>Fairness Violations and Mitigation under Covariate Shift</article-title>
          .
          <source>In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency</source>
          ,
          <source>FAccT '21</source>
          ,
          <fpage>3</fpage>
          -
          <lpage>13</lpage>
          . New York, NY, USA:
          <article-title>Association for Computing Machinery</article-title>
          .
          <source>ISBN 9781450383097.</source>
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          <string-name>
            <surname>Valdiviezo-Diaz</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Ortega</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Cobos</surname>
            , E.; and LaraCabrera,
            <given-names>R.</given-names>
          </string-name>
          <year>2019</year>
          .
<article-title>A Collaborative Filtering Approach Based on Naïve Bayes Classifier</article-title>
          .
          <source>IEEE Access</source>
          ,
          <volume>7</volume>
          :
          <fpage>108581</fpage>
          -
          <lpage>108592</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>