<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Counterfactual Reasoning for Responsible AI Assessment</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Giandomenico Cornacchia</string-name>
          <email>giandomenico.cornacchia@poliba.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Vito Walter Anelli</string-name>
          <email>vitowalter.anelli@poliba.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Fedelucio Narducci</string-name>
          <email>fedelucio.narducci@poliba.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Azzurra Ragone</string-name>
          <email>azzurra.ragone@uniba.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Eugenio Di Sciascio</string-name>
          <email>eugenio.disciascio@poliba.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Counterfactual Reasoning</institution>
          ,
          <addr-line>Fairness, Audit, Explainability, Responsibility</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Polytechnic University of Bari</institution>
          ,
          <addr-line>Via Orabona, 4, Bari, 70125</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Università degli Studi di Bari Aldo Moro</institution>
          ,
          <addr-line>Piazza Umberto I, 1, Bari, 70125</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2020</year>
      </pub-date>
      <volume>2116</volume>
      <fpage>29</fpage>
      <lpage>31</lpage>
      <abstract>
        <p>As the use of AI and ML models continues to grow, concerns about potential unfairness have become more prominent. Many researchers have focused on developing new definitions of fairness or identifying biased predictions, but these approaches have limited scope and fail to analyze the minimum changes in user characteristics required for positive outcomes (i.e. counterfactuals). In response, this proposed methodology aims to use counterfactual reasoning to identify unfair behaviours in the case of fairness under unawareness. Furthermore, counterfactual reasoning can serve as a comprehensive methodology for evaluating all the essential conditions for a reliable, responsible, and trustworthy model.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>As stated by the World Economic Forum’s Global Future
CEUR
Workshop
Proce dings
htp:/ceur-ws.org
ISN1613-073</p>
      <p>CEUR</p>
      <p>Workshop Proceedings (CEUR-WS.org)
https://www.weforum.org/communities/
gfc-on-artificial-intelligence-for-humanity
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License
Attribution 4.0 International (CC BY 4.0).
most significant barrier to AI adoption and acceptance by
users. In fact, AI systems often amplify social and ethical
issues such as gender and demographic discrimination,
and they lack interpretability and explainability.</p>
      <p>As an example, in the financial domain, the decision
cise and detailed regulatory compliance requirements
(i.e., Equal Credit Opportunity Act , Federal Fair Lending
Act , and Consumer Credit Directive for EU
Community).These rules aim to prevent discrimination in
human decision-making processes. However, they do not
ift scenarios involving Machine Learning (ML) or, more
when AI replaces human decisions, like in the case of
instant lending, there is a risk of revealing a loophole
and international organizations have released guidelines,
norms, and principles to prevent the irresponsible usage
of AI, e.g., the EU Commission with “The Proposal for
Harmonized Rule on AI” and the expert group on “AI in
Society” of the Organisation for Economic Co-operation
and Development (OECD).</p>
      <p>
        Although scientists train their models without explicit
discriminating intent, deploying AI systems without
taking ethical concerns into account may lead to
discrimination [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Even more problematic is figuring out which
type of discrimination is being implemented.
      </p>
      <sec id="sec-1-1">
        <title>1.1. Counterfactual Reasoning as a</title>
      </sec>
      <sec id="sec-1-2">
        <title>Responsible AI practice</title>
        <p>
          Counterfactual Reasoning is an active and flourishing
ifeld in artificial intelligence research [
          <xref ref-type="bibr" rid="ref2 ref3">2, 3</xref>
          ]. This research
was initially born to investigate causal links [4], and
today it can count on several contributions [5]. Most
of them define and employ counterfactuals as a helpful
tools to explain the decisions taken by modern decision
to leverage a counterfactual generation tool to reveal the
support systems. The underlying rationale is that some
presence of implicit biases in a decision support system.
aspects of past events could predict future events. In
The approach aims to answer the question: “How would
detail, some studies focus on identifying causality-related
the system have decided if we had replaced some user
aspects to discover the link between the counterfactuals
characteristics? These characteristics identify a protected
or a non-protected group?”.
        </p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2. Preliminaries</title>
      <p>and the analyzed phenomenon.</p>
      <p>Counterfactual Reasoning finds application in various
ifelds. To summarize what we have briefly detailed before,
machine learning research has positively valued these
contributions ranging from Explainable AI [6] to the most
recent counterfactual fairness measures [7, 8].</p>
      <p>Beyond the theoretical aspects, Counterfactual
Reasoning is extensively applied to interactive systems [9,
10, 11, 12]. Unfortunately, this important application
showed some limitations. These systems employ
machine learning models that reflect the data they use for
learning. Consequently, the same information influences
the reasoning, and the contribution of Counterfactual
Reasoning could be limited or somehow biased. The
explaining policy, coming from Counterfactual Reasoning,
exhibits a bias toward the implemented learning model.
Researchers devoted considerable efort to tackle this
issue and proposed new models such as doubly robust
estimators [13]. Overall, even though limitations that need
a solution, Counterfactual Reasoning is taking over
ExIn this respect, the European Union’s “right to
explanation” played a crucial role in arousing a further interest in
this methodologies [15]. Indeed, they are compliant with
the regulation and easily interpreted by either a domain
expert or a layperson [16].</p>
      <p>Decision support systems particularly benefited from
main is vital, the more the fairness problem emerges. For
instance, the issue cannot be overlooked in sensitive
domains such as justice, risk assessment, or clinical risk
prediction. This need promoted the most promising
research in the Counterfactual Reasoning field to analyze
and mitigate this issue. A further important issue under
the lens of European regulators is the discrimination of
AI models. On this point, the EU Commission proposes
a conformity assessment before AI systems are put into
service or placed on the market 2. In fact, their tools
are subject to fair and trustworthy audit assessments to
the input characteristics suficient to determine that a
predictor will not suggest unfair treatment? Even though
the user does not provide protected characteristics, the
system could predict sensitive features from variables,
i.e., proxy variables, that still represent protected
characteristics[17, 18, 19]. In this regard, our investigation aims</p>
      <sec id="sec-2-1">
        <title>2https://digital-strategy.ec.europa.eu/en/policies/</title>
        <p>regulatory-framework-ai
feature.</p>
        <p>This section introduces the notation adopted hereinafter.
Data points:</p>
        <p>We assume the dataset 
is an 
dimensional space containing  non-sensitive features, 
sensitive features, and a target attribute. In other words,
we have  ⊆ ℝ  , with  =  + +1
. A data point  ∈ 
is
then represented as  = ⟨ x, s,  ⟩ , with x = ⟨ 1,  2, ...,   ⟩
representing the sub-vector of non-sensitive features,
s = ⟨ 1,  2, ...,   ⟩ the sub-vector of sensitive features and 
being a binary target feature. Given a vector of sensitive
features, ∀  ∈ s,   = 0 refers to the unprivileged group
and   = 1 to the privileged group of the  -th sensitive
Target Labels: Given a target feature  ∈ {0, 1} ,  = 1 is
the positive outcome and  = 0 is the negative one.
Outcome Prediction:  ∈̂ {0, 1} represents the
prediction for x ⊂  estimated by  (⋅) , a function such that
Sensitive Feature Prediction:  ̂ ∈ {0, 1} represents the
prediction of the  -th sensitive feature for a given data
point estimated by    (⋅), a function s.t. 
 (x) =  ̂ .

Counterfactual samples: Given a vector x and a
perturbation  = ⟨ 1,  2, ...,   ⟩, we say that a vector cx =
⟨  1</p>
        <p>,   2, ...,    ⟩ = x +  is a counterfactual (CF) of x if
| x| =  , to denote the set of possible counterfactual
samples for x. A function ( x) compute 
counterfactuals for x.</p>
        <p>For simplicity, we denote  (⋅) ,    (⋅), and (⋅) as the
Decision Maker, the Sensitive-Feature Classifier , and the
Counterfactual Generator respectively.
3.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Methodology</title>
      <p>Our study proposes a novel fairness definition, two novel
metrics for detecting bias in a scenario where sensitive
the training process, and an explanation methodology.</p>
      <sec id="sec-3-1">
        <title>3.1. Fairness through the counterfactual lens</title>
        <p>Excluding sensitive features makes verifying that all
users are treated equally incredibly challenging. In the
instant lending case, imagine that a customer applies for
a loan, and his/her request is rejected. Understanding
plainable AI, and it is becoming the de facto standard for
explaining decisions taken by autonomous systems [14].  ( x) =  .̂
these models. However, the more the application do-  ( cx) = 1 −  ( x) = 1 −  .̂ We use the set  x
, with
check their conformity. However, is a shallow check of features are omitted (i.e., fairness under unawareness) in
(a) male on Classic ML model (b) female on Classic ML model (c) male on Debiasing model (d) female on Debiasing model
samples with a positive outcome, respectively, for a Classic ML model (i.e. XGB) and a Debiasing one (i.e. Adversarial Debiasing).
if the customer has been discriminated is hard to ver- guaranteed both for the privileged and the unprivileged
ify when sensitive information is not used. Our process
pipeline is as follows: the Decision Maker makes
decisions without exploiting sensitive features, then if the
outcome is negative (e.g. loan rejected), the
Counterfactual Generator is exploited to propose modifications to
user characteristics and request for reaching a positive
outcome (e.g. loan approved). For each data point  with
a negative prediction  ( x) = 0, we generate a set of
counterfactual samples  x that reach a positive outcome (i.e.,
∀cx ∈  x s.t.  ( cx) = 1). Afterward, each
counterfactual (CF) sample is evaluated by the Sensitive-Feature
Classifier</p>
        <p>that predicts the value of the (omitted)
sensitive feature for the given CF sample. If the CF sample
is classified as e.g. male (privileged group), while the
original sample was e.g. female (unprivileged group), the
decision model could be biased and its unfairness can be
quantified (Eq. 3 and 4).</p>
        <p>Indeed, each CF sample derives from the original
sample x plus a perturbation  , where  is the distance from
the original sample for getting a positive outcome, and
it should be independent from the user-sensitive
characteristics. Figure 1 depicts a scenario in which male (blu
color) is the privileged category, and female (red color) is
the unprivileged one. For each subfigure, a sample with
an unfavorable decision and its corresponding CFs are
depicted. A classic ML model (i.e., XGB) is compared with a
debiasing ML model (i.e., AdvDeb). We can observe that
for the male sample and classic ML model (Figure 1(a)),
the CF samples belong to the same sensitive category
(i.e., male). For the female sample (Figure 1 (b)), this is
not true, revealing a bias of the model. Conversely, the
debiasing model (Figure 1 (c) and (d)) shows no
predominance in the generated counterfactuals of one value of
the sensitive class. However, a change of the outcome,
e.g. from negative to positive, should not be determined
by a flip of the value(s) of the sensitive feature(s). Now,
we introduce our fairness criteria and metrics.
group [20].</p>
        <p>P(  ( | −=0 ) ≠  ∣  (
| −=0 ) = 1, | −=0 ) = P(  ( | −=1 ) ≠  ∣  (
| −=1 ) = 1, | −=1 )</p>
        <p>(1)</p>
        <p>To define a sort of discrimination score of a given
decision model, we propose a metric that we call
Counterfactual Flips. The metric quantifies the discriminatory
behavior the model might put in place.</p>
        <p>Definition 3.2 (Counterfactual Flips). Given a sample x
belonging to a demographic group  whose model output
is denoted as  ( x), a generated set  x of  counterfactuals
with desired  ∗ outcome. ∀cx ∈  x s.t.  ( cx) =  ∗, the
Counterfactual Flips indicate the percentage of
counterfactual samples belonging to another demographic group (i.e.,
  (cx) ≠   (x), with   (x) =  ).</p>
        <p>CFlips(x,  x,   (⋅)) ≜</p>
        <p>∑=1 (1(cx)) where 1(cx) = {10 iiff  

(cx) =   (x) = 
(cx) ≠   (x) ≠  (2)
data point. However, from an individual-fairness wise, a
debated issue is the definition of a metric that considers
Definition 3.1 (Counterfactual Fair Opportunity). A de- that distance [21]. Accordingly, we propose a new metric
cision model is fair if the counterfactual samples of individ- that considers CFs ranked based on the Mean Absolute
uals with unfavorable decisions maintain the same sensitive
value to reach a positive outcome. This behavior must be
Deviation from the original sample and other criteria [6].</p>
        <p>The insight behind this metric is that the more the CF
more its impact on the metric value. Thus, the metric
penalizes CFs ranked in the top positions for which the
value of the sensitive feature is flipped. More formally:
Definition 3.3 (Discounted Cumulative Counterfactual
Fairness). Given a set of Counterfactuals Cx for a sample
x , the Discounted Cumulative Counterfactual Fairness
DCCFx measures the cumulative gain of the ranking of</p>
        <p>counterfactuals w.r.t. the sensitive group of the original
sample:</p>
        <p>DCCFx ≜
∑

  ,cx ∈ x
2(1−1( x )) − 1
log2(  + 1)</p>
        <p>(4)
where   is the rank of cx in  x and 1( x ) from Eq. 2.</p>
        <p>If more CF samples belonging to the same sensitive
group as the original data point are in a higher ranking
position, the result will be a higher DCCF. Thereby, we
factual Fairness (IDCCF) as an ideal ranking in which
each CF sample cx belongs to the same sensitive group as
the original sample x (Eq. 5), and the normalized DCCF
(nDCCF) (Eq. 6).</p>
        <p>IDCCFx ≜
∑

  ,cx ∈ x</p>
        <p>2(1) − 1
log2(  + 1)</p>
        <p>(5)
1
| | −| x</p>
        <p />
        <p>In the same way as CFlips, given a set of samples
can formulate the Ideal Discounted Cumulative Counter- ing the features of the nearest counterfactual sample (i.e.,
nDCCFx ≜ IDCCFx</p>
        <p>DCCFx
(6)
4. Experimental Analysis
4.1. Experimental setting
is ranked high (in the top positions of the ranking), the
can be useful in that direction. Indeed A counterfactual
cx can be seen as a perturbation from a starting sample
x of a quantity  (i.e., cx = x +  ). For a numerical or
ordinal feature  ,   can be expressed as the diference
between the counterfactual and the feature of the sample</p>
        <p>−   . For a categorical feature  ,   can be expressed
in a one-hot encoding form as -1 to the category that
is removed and 1 to the category that is engaged. Let
be  the diference between the posterior conditional
probability of predicting a counterfactual sample and the
original sample as belonging to the privileged group (i.e.,
 =</p>
        <p>P( ( cx) = 1|cx) − P( ( x) = 1|x)). We can identify
the most influential features for  (⋅) evaluating the
Pearson correlation between  and  : (,  )</p>
        <p>. In the same way,
we can identify the proxy feature influencing a
discrimination in the decision maker through the investigation
of   (⋅) [17, 23, 24]. The ranked correlation can be used
to generate a Natural Language based explanation for
the knowledge expert and a user-based explanation
usthrough the investigation of  as actionable recommended
step) [12, 25].</p>
        <p>Dataset. The experimental evaluation has been carried
out on state-of-the-art benchmark datasets (i.e., Adult3
with gender as sensitive information). We do not include
any sensitive features for training the model,
guaranteeing the fairness under unawareness setting.</p>
        <p>Decision Maker. To keep the approach as general as
possible, we opted for Logistic Regression4 (LR),
SupportVector Machines4 (SVM), XGBOOST4 (XGB) , and
LightDebiased Decision Maker. To investigate the quality
and the reliability of our metrics we used also two
debiased classifiers, Adversarial Debiasing4 (AdvDeb)
proposed by Zhang et al. [26] and Linear Fair Empirical Risk
Minimization4 (lferm) proposed by Donini et al. [27] as
in-processing algorithms.</p>
        <p>DiCE [6], with | x| equal to 1005.</p>
        <p>Counterfactual Generator. For the sake of
reproducibility and reliability, the counterfactuals are
generated</p>
        <p>with an external counterfactual framework,
menting this component due to its capability to learn
3https://archive.ics.uci.edu/ml/datasets/adult
4LR, SVM: https://scikit-learn.org/; XGB: https://github.com/dmlc/
xgboost; LGBM: https://github.com/microsoft/LightGBM; AdvDeb:
https://github.com/Trusted-AI/AIF360; lferm: https://github.com/
jmikko/fair_ERM;
diference (i.e.,
being close to zero.</p>
        <p>For both CFlips and nDCCF, we are interested in the</p>
        <p>GBM4 (LGBM).</p>
        <p>Δ), between privileged and unprivileged,</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Explainability through the counterfactual lens</title>
        <p>Several methods have been proposed to explain
blackbox models. SHAP is inspired by the cooperative game
theory based on the Shapley Values [22]. Each feature
is considered a player that contributes diferently to the
planation provided by this method probably is not so
clear for a customer who does not have experience with
how an algorithm works. Furthermore, Shapley value
does not give in to which extent changing a feature can
result in a diferent outcome. For this reason, if we want
to improve the user’s trust and, in general, the user
exoutcome (i.e., the algorithm decision). However, the ex- Sensitive-Feature Classifier.</p>
        <sec id="sec-3-2-1">
          <title>We used XGB for imple</title>
          <p>perience with the system, we need to make the expla- 5DiCE ofers several strategies for generating candidate
counterfacnation more understandable. Counterfactual Reasoning
tual samples, but we choose to only exploit the Genetic one.
Following a brief analysis of how our methodology can
be useful not only to investigate unfair model behaviour
but also to explain and quantify proxy discriminative
non-linear dependencies. features.</p>
          <p>Metrics. We evaluate the models’ performance with the In Figure 2, we can find the rank of features correlation
Accuracy (ACC) and model fairness by measuring Equal with a Flip in   (⋅) with MLP as  (⋅) decision boundary
Opportunity6 (DEO). for the generation of cx and XGB as   (⋅) for the
AdultSplit and Hyperparameter Tuning. The datasets have debiased dataset. The analysis is restricted to only
sambeen split with the hold-out method 90/10 train-test set, ples negatively predicted in order to specifically quantify
with stratified sampling w.r.t. the target and sensitive la- the proxy-features that lead to a positive prediction with
bels, to respect the original distribution in each split. The also a change in the sensitive information. In detail,
Decision Maker, the Debiased models, and the Sensitive- a negatively correlated feature (e.g., Adm-Clerical) is a
Feature Classifier have been tuned on the training set feature that has an opposite direction with respect to
with a Grid Search k-fold (k=5) cross-validation method- E[  ( −) ∣  −] while a positively correlated one (e.g.,
ology, the first two optimizing AUC metric, and the latter hours per week) has the same direction.
F1 score to prevent unbalanced predictions on the
sensitive feature.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>5. Conclusion</title>
      <sec id="sec-4-1">
        <title>4.2. Fairness Results</title>
        <p>Now that the setting is clear enough, we can move on
to analyze how well they perform in terms of fairness.
The performance of the Decision Makers on the metrics
DEO, as well as our suggested metrics CFlips and nDCCF
are reported in Table 1. It is important to point out that
the CFlips metric indicates how often a change of result
for the Decision Maker corresponds to a change in the
classification of the sensitive feature (e.g., from female
to male and vice-versa). Conversely, the nDCCF metric
gives more importance to counterfactuals with highest
positions in the ranking (the most similar to the original
sample) that do not change the sensitive class.</p>
        <p>For the three debiased models (i.e., AdvDeb, lferm, and
FairC) the Δ is close to zero for both our metrics, meaning
that there is not a great diference in the CFlips for both
groups (privileged and unprivileged one). The debiased
models perform the same both with standard fairness
metrics and our metrics (i.e., CFlips, nDCCF).
6 =
|P( =̂ 1 ∣  = 1,  = 1) −</p>
        <p>P( =̂ 1 ∣  = 0,  = 1) |
In this work, we present a novel methodology for
detecting bias in decision-making models that do not use
sensitive features and work in a context of fairness under
unawareness. Furthermore, we propose a new fairness
concept (i.e., Counterfactual Fair Opportunity), two
related fairness metrics (i.e., CFlis and nDCCF), and an
explainability methodology.</p>
        <p>In the future, we plan to define a strategy to generate
fair and actionable counterfactual samples with the aim
of developing a debiasing model that could be efectively
fair in the context of fairness under unawareness.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>S.</given-names>
            <surname>Corbett-Davies</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Pierson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Feller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Goel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Huq</surname>
          </string-name>
          ,
          <article-title>Algorithmic decision making and the cost of fairness</article-title>
          , in: KDD, ACM,
          <year>2017</year>
          , pp.
          <fpage>797</fpage>
          -
          <lpage>806</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>M. L.</given-names>
            <surname>Ginsberg</surname>
          </string-name>
          , Counterfactuals, Artif. Intell.
          <volume>30</volume>
          (
          <year>1986</year>
          )
          <fpage>35</fpage>
          -
          <lpage>79</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>T.</given-names>
            <surname>Miller</surname>
          </string-name>
          ,
          <article-title>Explanation in artificial intelligence: Insights from the social sciences</article-title>
          ,
          <source>Artif. Intell</source>
          .
          <volume>267</volume>
          (
          <year>2019</year>
          )
          <fpage>1</fpage>
          -
          <lpage>38</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>