1. Introduction

Counterfactual Reasoning for Responsible AI Assessment

Giandomenico Cornacchia

giandomenico.cornacchia@poliba.it 0 1

Vito Walter Anelli

vitowalter.anelli@poliba.it 0 1

Fedelucio Narducci

fedelucio.narducci@poliba.it 0 1

Azzurra Ragone

azzurra.ragone@uniba.it 0 2

Eugenio Di Sciascio

eugenio.disciascio@poliba.it 0 1 0 Counterfactual Reasoning , Fairness, Audit, Explainability, Responsibility 1 Polytechnic University of Bari , Via Orabona, 4, Bari, 70125 , Italy 2 Università degli Studi di Bari Aldo Moro , Piazza Umberto I, 1, Bari, 70125 , Italy

2020

2116 29 31

As the use of AI and ML models continues to grow, concerns about potential unfairness have become more prominent. Many researchers have focused on developing new definitions of fairness or identifying biased predictions, but these approaches have limited scope and fail to analyze the minimum changes in user characteristics required for positive outcomes (i.e. counterfactuals). In response, this proposed methodology aims to use counterfactual reasoning to identify unfair behaviours in the case of fairness under unawareness. Furthermore, counterfactual reasoning can serve as a comprehensive methodology for evaluating all the essential conditions for a reliable, responsible, and trustworthy model.

1. Introduction

As stated by the World Economic Forum’s Global Future CEUR Workshop Proce dings htp:/ceur-ws.org ISN1613-073

CEUR

Workshop Proceedings (CEUR-WS.org) https://www.weforum.org/communities/ gfc-on-artificial-intelligence-for-humanity © 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). most significant barrier to AI adoption and acceptance by users. In fact, AI systems often amplify social and ethical issues such as gender and demographic discrimination, and they lack interpretability and explainability.

As an example, in the financial domain, the decision cise and detailed regulatory compliance requirements (i.e., Equal Credit Opportunity Act , Federal Fair Lending Act , and Consumer Credit Directive for EU Community).These rules aim to prevent discrimination in human decision-making processes. However, they do not ift scenarios involving Machine Learning (ML) or, more when AI replaces human decisions, like in the case of instant lending, there is a risk of revealing a loophole and international organizations have released guidelines, norms, and principles to prevent the irresponsible usage of AI, e.g., the EU Commission with “The Proposal for Harmonized Rule on AI” and the expert group on “AI in Society” of the Organisation for Economic Co-operation and Development (OECD).

Although scientists train their models without explicit discriminating intent, deploying AI systems without taking ethical concerns into account may lead to discrimination [ 1 ]. Even more problematic is figuring out which type of discrimination is being implemented.

1.1. Counterfactual Reasoning as a Responsible AI practice

Counterfactual Reasoning is an active and flourishing ifeld in artificial intelligence research [ 2, 3 ]. This research was initially born to investigate causal links [4], and today it can count on several contributions [5]. Most of them define and employ counterfactuals as a helpful tools to explain the decisions taken by modern decision to leverage a counterfactual generation tool to reveal the support systems. The underlying rationale is that some presence of implicit biases in a decision support system. aspects of past events could predict future events. In The approach aims to answer the question: “How would detail, some studies focus on identifying causality-related the system have decided if we had replaced some user aspects to discover the link between the counterfactuals characteristics? These characteristics identify a protected or a non-protected group?”.

2. Preliminaries

and the analyzed phenomenon.

Counterfactual Reasoning finds application in various ifelds. To summarize what we have briefly detailed before, machine learning research has positively valued these contributions ranging from Explainable AI [6] to the most recent counterfactual fairness measures [7, 8].

Beyond the theoretical aspects, Counterfactual Reasoning is extensively applied to interactive systems [9, 10, 11, 12]. Unfortunately, this important application showed some limitations. These systems employ machine learning models that reflect the data they use for learning. Consequently, the same information influences the reasoning, and the contribution of Counterfactual Reasoning could be limited or somehow biased. The explaining policy, coming from Counterfactual Reasoning, exhibits a bias toward the implemented learning model. Researchers devoted considerable efort to tackle this issue and proposed new models such as doubly robust estimators [13]. Overall, even though limitations that need a solution, Counterfactual Reasoning is taking over ExIn this respect, the European Union’s “right to explanation” played a crucial role in arousing a further interest in this methodologies [15]. Indeed, they are compliant with the regulation and easily interpreted by either a domain expert or a layperson [16].

Decision support systems particularly benefited from main is vital, the more the fairness problem emerges. For instance, the issue cannot be overlooked in sensitive domains such as justice, risk assessment, or clinical risk prediction. This need promoted the most promising research in the Counterfactual Reasoning field to analyze and mitigate this issue. A further important issue under the lens of European regulators is the discrimination of AI models. On this point, the EU Commission proposes a conformity assessment before AI systems are put into service or placed on the market 2. In fact, their tools are subject to fair and trustworthy audit assessments to the input characteristics suficient to determine that a predictor will not suggest unfair treatment? Even though the user does not provide protected characteristics, the system could predict sensitive features from variables, i.e., proxy variables, that still represent protected characteristics[17, 18, 19]. In this regard, our investigation aims

2https://digital-strategy.ec.europa.eu/en/policies/

regulatory-framework-ai feature.

This section introduces the notation adopted hereinafter. Data points:

We assume the dataset is an dimensional space containing non-sensitive features, sensitive features, and a target attribute. In other words, we have ⊆ ℝ , with = + +1 . A data point ∈ is then represented as = ⟨ x, s, ⟩ , with x = ⟨ 1, 2, ..., ⟩ representing the sub-vector of non-sensitive features, s = ⟨ 1, 2, ..., ⟩ the sub-vector of sensitive features and being a binary target feature. Given a vector of sensitive features, ∀ ∈ s, = 0 refers to the unprivileged group and = 1 to the privileged group of the -th sensitive Target Labels: Given a target feature ∈ {0, 1} , = 1 is the positive outcome and = 0 is the negative one. Outcome Prediction: ∈̂ {0, 1} represents the prediction for x ⊂ estimated by (⋅) , a function such that Sensitive Feature Prediction: ̂ ∈ {0, 1} represents the prediction of the -th sensitive feature for a given data point estimated by (⋅), a function s.t. (x) = ̂ . Counterfactual samples: Given a vector x and a perturbation = ⟨ 1, 2, ..., ⟩, we say that a vector cx = ⟨ 1

, 2, ..., ⟩ = x + is a counterfactual (CF) of x if | x| = , to denote the set of possible counterfactual samples for x. A function ( x) compute counterfactuals for x.

For simplicity, we denote (⋅) , (⋅), and (⋅) as the Decision Maker, the Sensitive-Feature Classifier , and the Counterfactual Generator respectively. 3.

Methodology

Our study proposes a novel fairness definition, two novel metrics for detecting bias in a scenario where sensitive the training process, and an explanation methodology.

3.1. Fairness through the counterfactual lens

Excluding sensitive features makes verifying that all users are treated equally incredibly challenging. In the instant lending case, imagine that a customer applies for a loan, and his/her request is rejected. Understanding plainable AI, and it is becoming the de facto standard for explaining decisions taken by autonomous systems [14]. ( x) = .̂ these models. However, the more the application do- ( cx) = 1 − ( x) = 1 − .̂ We use the set x , with check their conformity. However, is a shallow check of features are omitted (i.e., fairness under unawareness) in (a) male on Classic ML model (b) female on Classic ML model (c) male on Debiasing model (d) female on Debiasing model samples with a positive outcome, respectively, for a Classic ML model (i.e. XGB) and a Debiasing one (i.e. Adversarial Debiasing). if the customer has been discriminated is hard to ver- guaranteed both for the privileged and the unprivileged ify when sensitive information is not used. Our process pipeline is as follows: the Decision Maker makes decisions without exploiting sensitive features, then if the outcome is negative (e.g. loan rejected), the Counterfactual Generator is exploited to propose modifications to user characteristics and request for reaching a positive outcome (e.g. loan approved). For each data point with a negative prediction ( x) = 0, we generate a set of counterfactual samples x that reach a positive outcome (i.e., ∀cx ∈ x s.t. ( cx) = 1). Afterward, each counterfactual (CF) sample is evaluated by the Sensitive-Feature Classifier

that predicts the value of the (omitted) sensitive feature for the given CF sample. If the CF sample is classified as e.g. male (privileged group), while the original sample was e.g. female (unprivileged group), the decision model could be biased and its unfairness can be quantified (Eq. 3 and 4).

Indeed, each CF sample derives from the original sample x plus a perturbation , where is the distance from the original sample for getting a positive outcome, and it should be independent from the user-sensitive characteristics. Figure 1 depicts a scenario in which male (blu color) is the privileged category, and female (red color) is the unprivileged one. For each subfigure, a sample with an unfavorable decision and its corresponding CFs are depicted. A classic ML model (i.e., XGB) is compared with a debiasing ML model (i.e., AdvDeb). We can observe that for the male sample and classic ML model (Figure 1(a)), the CF samples belong to the same sensitive category (i.e., male). For the female sample (Figure 1 (b)), this is not true, revealing a bias of the model. Conversely, the debiasing model (Figure 1 (c) and (d)) shows no predominance in the generated counterfactuals of one value of the sensitive class. However, a change of the outcome, e.g. from negative to positive, should not be determined by a flip of the value(s) of the sensitive feature(s). Now, we introduce our fairness criteria and metrics. group [20].

P( ( | −=0 ) ≠ ∣ ( | −=0 ) = 1, | −=0 ) = P( ( | −=1 ) ≠ ∣ ( | −=1 ) = 1, | −=1 )

(1)

To define a sort of discrimination score of a given decision model, we propose a metric that we call Counterfactual Flips. The metric quantifies the discriminatory behavior the model might put in place.

Definition 3.2 (Counterfactual Flips). Given a sample x belonging to a demographic group whose model output is denoted as ( x), a generated set x of counterfactuals with desired ∗ outcome. ∀cx ∈ x s.t. ( cx) = ∗, the Counterfactual Flips indicate the percentage of counterfactual samples belonging to another demographic group (i.e., (cx) ≠ (x), with (x) = ).

CFlips(x, x, (⋅)) ≜

∑=1 (1(cx)) where 1(cx) = {10 iiff (cx) = (x) = (cx) ≠ (x) ≠ (2) data point. However, from an individual-fairness wise, a debated issue is the definition of a metric that considers Definition 3.1 (Counterfactual Fair Opportunity). A de- that distance [21]. Accordingly, we propose a new metric cision model is fair if the counterfactual samples of individ- that considers CFs ranked based on the Mean Absolute uals with unfavorable decisions maintain the same sensitive value to reach a positive outcome. This behavior must be Deviation from the original sample and other criteria [6].

The insight behind this metric is that the more the CF more its impact on the metric value. Thus, the metric penalizes CFs ranked in the top positions for which the value of the sensitive feature is flipped. More formally: Definition 3.3 (Discounted Cumulative Counterfactual Fairness). Given a set of Counterfactuals Cx for a sample x , the Discounted Cumulative Counterfactual Fairness DCCFx measures the cumulative gain of the ranking of

counterfactuals w.r.t. the sensitive group of the original sample:

DCCFx ≜ ∑ ,cx ∈ x 2(1−1( x )) − 1 log2( + 1)

(4) where is the rank of cx in x and 1( x ) from Eq. 2.

If more CF samples belonging to the same sensitive group as the original data point are in a higher ranking position, the result will be a higher DCCF. Thereby, we factual Fairness (IDCCF) as an ideal ranking in which each CF sample cx belongs to the same sensitive group as the original sample x (Eq. 5), and the normalized DCCF (nDCCF) (Eq. 6).

IDCCFx ≜ ∑ ,cx ∈ x

2(1) − 1 log2( + 1)

(5) 1 | | −| x

In the same way as CFlips, given a set of samples can formulate the Ideal Discounted Cumulative Counter- ing the features of the nearest counterfactual sample (i.e., nDCCFx ≜ IDCCFx

DCCFx (6) 4. Experimental Analysis 4.1. Experimental setting is ranked high (in the top positions of the ranking), the can be useful in that direction. Indeed A counterfactual cx can be seen as a perturbation from a starting sample x of a quantity (i.e., cx = x + ). For a numerical or ordinal feature , can be expressed as the diference between the counterfactual and the feature of the sample

− . For a categorical feature , can be expressed in a one-hot encoding form as -1 to the category that is removed and 1 to the category that is engaged. Let be the diference between the posterior conditional probability of predicting a counterfactual sample and the original sample as belonging to the privileged group (i.e., =

P( ( cx) = 1|cx) − P( ( x) = 1|x)). We can identify the most influential features for (⋅) evaluating the Pearson correlation between and : (, )

. In the same way, we can identify the proxy feature influencing a discrimination in the decision maker through the investigation of (⋅) [17, 23, 24]. The ranked correlation can be used to generate a Natural Language based explanation for the knowledge expert and a user-based explanation usthrough the investigation of as actionable recommended step) [12, 25].

Dataset. The experimental evaluation has been carried out on state-of-the-art benchmark datasets (i.e., Adult3 with gender as sensitive information). We do not include any sensitive features for training the model, guaranteeing the fairness under unawareness setting.

Decision Maker. To keep the approach as general as possible, we opted for Logistic Regression4 (LR), SupportVector Machines4 (SVM), XGBOOST4 (XGB) , and LightDebiased Decision Maker. To investigate the quality and the reliability of our metrics we used also two debiased classifiers, Adversarial Debiasing4 (AdvDeb) proposed by Zhang et al. [26] and Linear Fair Empirical Risk Minimization4 (lferm) proposed by Donini et al. [27] as in-processing algorithms.

DiCE [6], with | x| equal to 1005.

Counterfactual Generator. For the sake of reproducibility and reliability, the counterfactuals are generated

with an external counterfactual framework, menting this component due to its capability to learn 3https://archive.ics.uci.edu/ml/datasets/adult 4LR, SVM: https://scikit-learn.org/; XGB: https://github.com/dmlc/ xgboost; LGBM: https://github.com/microsoft/LightGBM; AdvDeb: https://github.com/Trusted-AI/AIF360; lferm: https://github.com/ jmikko/fair_ERM; diference (i.e., being close to zero.

For both CFlips and nDCCF, we are interested in the

GBM4 (LGBM).

Δ), between privileged and unprivileged,

3.2. Explainability through the counterfactual lens

Several methods have been proposed to explain blackbox models. SHAP is inspired by the cooperative game theory based on the Shapley Values [22]. Each feature is considered a player that contributes diferently to the planation provided by this method probably is not so clear for a customer who does not have experience with how an algorithm works. Furthermore, Shapley value does not give in to which extent changing a feature can result in a diferent outcome. For this reason, if we want to improve the user’s trust and, in general, the user exoutcome (i.e., the algorithm decision). However, the ex- Sensitive-Feature Classifier.

We used XGB for imple

perience with the system, we need to make the expla- 5DiCE ofers several strategies for generating candidate counterfacnation more understandable. Counterfactual Reasoning tual samples, but we choose to only exploit the Genetic one. Following a brief analysis of how our methodology can be useful not only to investigate unfair model behaviour but also to explain and quantify proxy discriminative non-linear dependencies. features.

Metrics. We evaluate the models’ performance with the In Figure 2, we can find the rank of features correlation Accuracy (ACC) and model fairness by measuring Equal with a Flip in (⋅) with MLP as (⋅) decision boundary Opportunity6 (DEO). for the generation of cx and XGB as (⋅) for the AdultSplit and Hyperparameter Tuning. The datasets have debiased dataset. The analysis is restricted to only sambeen split with the hold-out method 90/10 train-test set, ples negatively predicted in order to specifically quantify with stratified sampling w.r.t. the target and sensitive la- the proxy-features that lead to a positive prediction with bels, to respect the original distribution in each split. The also a change in the sensitive information. In detail, Decision Maker, the Debiased models, and the Sensitive- a negatively correlated feature (e.g., Adm-Clerical) is a Feature Classifier have been tuned on the training set feature that has an opposite direction with respect to with a Grid Search k-fold (k=5) cross-validation method- E[ ( −) ∣ −] while a positively correlated one (e.g., ology, the first two optimizing AUC metric, and the latter hours per week) has the same direction. F1 score to prevent unbalanced predictions on the sensitive feature.

5. Conclusion 4.2. Fairness Results

Now that the setting is clear enough, we can move on to analyze how well they perform in terms of fairness. The performance of the Decision Makers on the metrics DEO, as well as our suggested metrics CFlips and nDCCF are reported in Table 1. It is important to point out that the CFlips metric indicates how often a change of result for the Decision Maker corresponds to a change in the classification of the sensitive feature (e.g., from female to male and vice-versa). Conversely, the nDCCF metric gives more importance to counterfactuals with highest positions in the ranking (the most similar to the original sample) that do not change the sensitive class.

For the three debiased models (i.e., AdvDeb, lferm, and FairC) the Δ is close to zero for both our metrics, meaning that there is not a great diference in the CFlips for both groups (privileged and unprivileged one). The debiased models perform the same both with standard fairness metrics and our metrics (i.e., CFlips, nDCCF). 6 = |P( =̂ 1 ∣ = 1, = 1) −

P( =̂ 1 ∣ = 0, = 1) | In this work, we present a novel methodology for detecting bias in decision-making models that do not use sensitive features and work in a context of fairness under unawareness. Furthermore, we propose a new fairness concept (i.e., Counterfactual Fair Opportunity), two related fairness metrics (i.e., CFlis and nDCCF), and an explainability methodology.

In the future, we plan to define a strategy to generate fair and actionable counterfactual samples with the aim of developing a debiasing model that could be efectively fair in the context of fairness under unawareness.

[1]

Corbett-Davies ,

Pierson ,

Feller ,

Goel ,

Huq , Algorithmic decision making and the cost of fairness , in: KDD, ACM, 2017 , pp. 797 - 806 .

[2]

M. L.

Ginsberg , Counterfactuals, Artif. Intell. 30 ( 1986 ) 35 - 79 .

[3]

Miller , Explanation in artificial intelligence: Insights from the social sciences , Artif. Intell . 267 ( 2019 ) 1 - 38 .