<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A GUI for the Fair &amp; Explainable Selective Classifier IFAC</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Daphne Lenders</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Roberto Pellungrini</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Fosca Giannotti</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Scuola Normale Superiore</institution>
          ,
          <addr-line>Pisa</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In this paper, we present a Graphical User Interface (GUI) for IFAC, a selective classification model that refrains from making decisions in case they are uncertain or unfair. Since IFAC makes use of explainable-by-design methods to detect potentially unfair decisions, our GUI visualizes these explanations to let users understand the reason for abstention. We demonstrate how users can interpret the explanations, allowing them to contextualize, validate, and challenge the detected bias patterns.</p>
      </abstract>
      <kwd-group>
        <kwd>Selective Classification</kwd>
        <kwd>Bias Audit</kwd>
        <kwd>Explainable AI</kwd>
        <kwd>Human-in-the-loop</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
    </sec>
    <sec id="sec-2">
      <title>2. Background &amp; General Intuition</title>
      <p>Our GUI visualizes the instances whose original predictions were rejected by IFAC due to uncertainty or unfairness. For instances rejected for the latter reason, users can view an explanation of why the original prediction was deemed unfair. This explanation consists of a global part, displaying which larger at-risk subgroup an instance was part of, and a local part, showing an individual discrimination analysis for the instance. In this section, we describe the basic intuition behind the rejection framework and both types of fairness analysis. To do so, we use the folktables dataset as our running example. This dataset contains sensitive information about people's gender and race, as well as some neutral characteristics, like their occupation and working hours. The associated task is to predict a person's income level (high vs. low). Classification models developed for this task typically favour the group of white men: they have higher positive decision rates and make fewer False Negatives and more False Positives for this group compared to other demographics.</p>
      <sec id="sec-2-1">
        <title>2.1. Selective Classification</title>
        <p>We describe a dataset as a triplet (L, S, Y), where L represents the legally-grounded features and takes values in ℒ ⊆ ℝˡ; S refers to the sensitive attributes and takes values in 𝒮 ⊆ ℝˢ; Y is the binary target variable, with domain 𝒴 = {0, 1}. We use X to describe the pair of legally grounded and sensitive features (L, S). Hence, in our running example, Y is the income level that needs to be predicted based on L, which includes features like education level and working hours, and S, which are gender and race.</p>
        <p>To find a mapping between the feature space of X and 𝒴, we can learn a classification model ℎ(x), minimizing some empirical risk function. To prevent ℎ from discriminating based on features in S and to increase its predictive accuracy, the selective classification framework behind IFAC proposes to learn an abstention mechanism over its predictions. We denote the selective function, determining which of ℎ's predictions are kept, as 𝑔. In the case of IFAC, 𝑔 considers the fairness and uncertainty of predictions. IFAC measures uncertainty through the prediction probability p̂(x) = P(Y = ŷ | X = x) outputted by ℎ for predicted label ℎ(x) = ŷ. Depending on whether the predictions are considered fair/unfair and certain/uncertain, there are four different scenarios that IFAC must deal with, as shown in Figure 1. The easiest case is when a prediction is both deemed fair and certain, in which case the prediction can be kept. Predictions that are fair yet uncertain get rejected, in line with the classical selective classification framework. In the case of prediction unfairness, there are two scenarios: if a prediction is both unfair and uncertain, and hence there are double reasons to doubt ℎ's original decision, a fairness intervention is performed and ℎ's original label is flipped. In case the prediction is unfair yet certain, human expertise is required to assess this prediction, and IFAC rejects it.</p>
        <p>
          To prevent IFAC from rejecting all predictions, a user-defined coverage parameter determines the minimum fraction of predictions that must be made. Additionally, IFAC takes a fairness-weight parameter that denotes the ratio of rejections that can be made out of unfairness concerns, and how much room should be left for rejecting (fair but) uncertain predictions. These parameters serve to tune two separate thresholds, t_fair_certain and t_unfair_certain, that respectively determine at which prediction probabilities fair and unfair predictions are viewed as certain/uncertain, to consequently keep, reject, or intervene on the predictions. For full details on tuning these parameters we refer to the original paper behind IFAC [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ].
        </p>
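        <p>The four scenarios can be sketched as a small decision function. The following is a minimal illustration in Python, not the actual IFAC implementation; the threshold values and the (label, probability, fairness flag) input format are hypothetical:</p>

```python
def ifac_decision(label, prob, is_fair, t_fair_certain=0.8, t_unfair_certain=0.9):
    """Map one binary prediction to IFAC's four scenarios.

    Hypothetical thresholds: a fair prediction counts as 'certain' if its
    probability reaches t_fair_certain; an unfair one if it reaches
    t_unfair_certain.
    """
    if is_fair:
        # Fair & certain: keep. Fair & uncertain: classical rejection.
        return ("keep", label) if prob >= t_fair_certain else ("reject", None)
    if prob < t_unfair_certain:
        # Unfair & uncertain: intervene by flipping the binary label.
        return ("flip", 1 - label)
    # Unfair yet certain: defer to a human expert.
    return ("reject", None)

print(ifac_decision(0, 0.95, True))   # fair & certain: kept
print(ifac_decision(0, 0.70, False))  # unfair & uncertain: flipped
```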
      </sec>
      <sec id="sec-2-2">
        <title>2.2. At-Risk Subgroups</title>
        <p>
          The first step in IFAC's fairness assessment is identifying population subgroups at risk of discrimination by a classifier ℎ, for which the methodology of discriminatory association rule mining is adopted [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ].
        </p>
        <p>Let us assume access to a dataset of realizations, 𝒟, that consists of the features X = (L, S):</p>
        <p>• A specific realization of a single feature in X is called an item.</p>
        <p>• An itemset, denoted by I, is a combination of multiple items. It can be decomposed into (A, B), where A is an itemset consisting of only legally grounded features, and B one consisting of only sensitive features.</p>
        <p>• A transaction, denoted by T, represents an itemset corresponding to one instance in 𝒟, where each feature is assigned exactly one value.</p>
        <p>• We say T verifies itemset (A, B) if (A, B) ⊆ T.</p>
        <p>For example, in the folktables dataset, consider the feature race. A specific realization, such as (race=Black), is an item. An itemset is a combination of multiple items, such as (race=Black, education=Masters), and can be decomposed into B = (race=Black) and A = (education=Masters). One single row from the dataset is called a transaction.</p>
        <p>To learn associations between the data's features and the decision outcome in 𝒟, we can extract decision rules of the form (A, B) → C. The support of a decision rule regarding 𝒟 is calculated as
supp((A, B) → C) = supp((A, B), C), with supp(I) = |{T ∈ 𝒟 : I ⊆ T}| / |𝒟|,
where |⋅| is the cardinality operator. Further, the confidence of a rule is defined as
conf((A, B) → C) = supp((A, B), C) / supp((A, B)).</p>
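        <p>The support and confidence definitions above can be computed directly. A minimal sketch over a toy set of transactions (the rows and values are illustrative, not taken from folktables):</p>

```python
# Each transaction is one row of the dataset: a dict of feature -> value.
transactions = [
    {"race": "Black", "education": "Masters", "income": "low"},
    {"race": "Black", "education": "Masters", "income": "low"},
    {"race": "White", "education": "Masters", "income": "high"},
    {"race": "White", "education": "Masters", "income": "low"},
]

def support(itemset, rows):
    """supp(I): fraction of transactions that verify the itemset (I subset of T)."""
    hits = sum(all(t.get(f) == v for f, v in itemset.items()) for t in rows)
    return hits / len(rows)

def confidence(antecedent, consequent, rows):
    """conf((A, B) -> C) = supp((A, B), C) / supp((A, B))."""
    return support({**antecedent, **consequent}, rows) / support(antecedent, rows)

ab = {"race": "Black", "education": "Masters"}
print(support(ab, transactions))                        # 0.5
print(confidence(ab, {"income": "low"}, transactions))  # 1.0
```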
        <p>
          Finally, IFAC assesses how problematic a decision rule is by measuring the impact of the sensitive features in B on C, through the Selective Lift (slift) [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. In this paper, we use the definition of slift by difference, which measures how much the confidence of a rule decreases when negating its sensitive part:
slift_d((A, B) → C) = conf((A, B) → C) − conf((A, ¬B) → C) (1)
        </p>
        <p>Computing conf((A, ¬B) → C) requires taking the confidence over all transactions that verify A but do not verify B. If the slift of some rule exceeds a user-defined threshold, we describe the itemset (A, B) as an at-risk subgroup of the data.</p>
        <p>Example: Consider the decision rule (race=Black, education=Masters) → income=low with a confidence of 0.9. If we find that the rule (¬race=Black, education=Masters) → income=low has a confidence of 0.3, the slift is 0.6. This relatively high measure can indicate that the subgroup (race=Black, education=Masters) is at risk of discrimination.</p>
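        <p>The slift by difference from Equation 1 can be sketched by comparing a rule's confidence against its sensitively-negated counterpart. The transactions below are illustrative toy data, chosen so the numbers are easy to follow:</p>

```python
# Toy transactions; rule (A, B) -> C with A = (education=Masters),
# B = (race=Black), C = (income=low).
transactions = [
    {"race": "Black", "education": "Masters", "income": "low"},
    {"race": "Black", "education": "Masters", "income": "low"},
    {"race": "White", "education": "Masters", "income": "high"},
    {"race": "White", "education": "Masters", "income": "low"},
    {"race": "White", "education": "Masters", "income": "high"},
]

def conf(antecedent, consequent_value, rows):
    """Confidence: share of rows matching `antecedent` whose income equals the consequent."""
    matching = [t for t in rows if antecedent(t)]
    return sum(t["income"] == consequent_value for t in matching) / len(matching)

# conf((A, B) -> C): Masters holders who are Black.
c_b = conf(lambda t: t["education"] == "Masters" and t["race"] == "Black", "low", transactions)
# conf((A, not-B) -> C): Masters holders who are *not* Black.
c_not_b = conf(lambda t: t["education"] == "Masters" and t["race"] != "Black", "low", transactions)

slift_d = c_b - c_not_b
print(c_b, c_not_b, round(slift_d, 2))
```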
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Individual Discrimination</title>
        <p>
          If an instance falls under any subgroup at risk of discrimination, IFAC also performs Situation Testing, an explainable-by-design method to determine fair treatment on a local level [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. Given some individual instance x, Situation Testing searches 𝒟 for its k most similar instances from the favoured and non-favoured group, which we denote respectively as K_fav and K_nonfav. Recall that in the folktables dataset we see white men as favoured, and all other demographic groups as non-favoured. To define x's individual discrimination score, we compute the difference in positive decision ratios between K_fav and K_nonfav. Hence:
disc(x) = (|{t ∈ K_fav : y_t = 1}| − |{t ∈ K_nonfav : y_t = 1}|) / k (2)
        </p>
        <p>If this discrimination score exceeds some user-defined threshold, x is deemed to be treated unfairly.</p>
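        <p>A short sketch of this idea: for a test instance we take its k nearest neighbours in the favoured and non-favoured group and compare their positive-decision ratios. This is a simplification assuming plain Euclidean distance on numeric features and hypothetical data; the actual distance function of Situation Testing [8] also handles categorical features:</p>

```python
import math

def disc_score(x, favoured, non_favoured, k=5):
    """Difference in positive-decision ratios among x's k nearest
    neighbours from the favoured and the non-favoured group."""
    def knn_labels(group):
        ranked = sorted(group, key=lambda pair: math.dist(x, pair[0]))
        return [label for _, label in ranked[:k]]
    fav = knn_labels(favoured)
    nonfav = knn_labels(non_favoured)
    return sum(fav) / k - sum(nonfav) / k

# Hypothetical (features, label) pairs; label 1 = positive decision.
favoured = [([30, 40], 1), ([32, 45], 1), ([50, 20], 0),
            ([31, 42], 1), ([29, 38], 1), ([60, 10], 0)]
non_favoured = [([30, 41], 0), ([33, 44], 0), ([51, 21], 0),
                ([30, 40], 1), ([28, 39], 0), ([61, 12], 0)]

print(disc_score([31, 41], favoured, non_favoured, k=5))  # about 0.6
```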
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Methodology</title>
      <p>The GUI behind IFAC visualizes the instances in a decision task, their predictions, as well as IFAC's decisions to keep, reject, or intervene on these predictions. In case IFAC rejects or intervenes on predictions based on unfairness concerns, the GUI also visualizes the explanations behind these rejections, i.e. the at-risk subgroups these instances belong to and the outcome of their individual discrimination analysis.</p>
      <p>Currently, the GUI is only available as a prototype meant to explore the rejections of an IFAC model trained on the folktables dataset. However, the platform is built to be extensible and flexible to various classification tasks. Before describing the (visual) components of the tool, we shortly outline the classification model used for building this prototype and explain how we ran the global and local discrimination analysis on it.</p>
      <p>Classification Model. After preprocessing, we split the initial dataset into a training part (n=9600), two validation sets (val_1 and val_2, both with n=3600) and a test set (n=1200).</p>
      <p>For our classification model, we train a Random Forest Classifier using the default sklearn hyperparameters. The GUI visualizes the model's predictions on the test set, while in the background both val_1 and val_2 are used for the discrimination analysis as described in the next paragraphs.</p>
      <p>At-Risk Subgroups. The at-risk subgroups that are visualized in the GUI are extracted after applying the initial random forest model on val_1. To display at-risk groups for each single-axis and intersectional demographic group, we split val_1 according to each sensitive feature value and their combinations. In our case, using sensitive attributes race (black or white) and gender (male or female), we end up splitting the data according to (race = white), (race = black), (sex = male), (sex = female), (race = white, sex = male), (race = white, sex = female), (race = black, sex = male), (race = black, sex = female). On the data belonging to each of these groups, we separately apply the apriori algorithm to mine decision rules of the form (A, B) → C, where C represents the classifier ℎ's decision outcome (i.e. people's income), which is either high or low. Since we assume the group of white men to be favoured, we only extract rules with C = high for them, while for all other demographic groups that are potentially discriminated, we only select rules with C = low. Using Equation 1, we compute the slift for each of the associations, and filter rules with slift_d &gt; 0.4. Further, we assess statistical significance with a Z-test and only retain rules with p &lt; 0.01.</p>
      <p>Situation Testing. For all of the test instances that belong to one of the identified at-risk subgroups, we also compute an individual discrimination score as described in Section 2.3. To compute these scores, we search val_2 for each instance's top 5 nearest neighbours from both the favoured and non-favoured group, and compute their difference in positive decision ratio. If an instance's disc_score exceeds 0.2, we view its prediction as unfair.</p>
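      <p>The significance filter can be sketched as a standard two-proportion z-test comparing the confidence of a rule against that of its negated counterpart. A stdlib-only illustration; the confidences and sample sizes below are made up, and the exact test used in the implementation may differ:</p>

```python
import math

def two_proportion_z_test(p1, n1, p2, n2):
    """Two-sided two-proportion z-test; returns (z, p_value)."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Standard-normal CDF via the error function.
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# conf = 0.9 over 80 transactions vs conf = 0.3 over 120 (illustrative).
z, p = two_proportion_z_test(0.9, 80, 0.3, 120)
print(round(z, 2), p < 0.01)
```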
      <p>Learning Rejection Thresholds. IFAC deems instances unfair if they fall under an at-risk subgroup and if they are individually discriminated against. How many of these instances can then be rejected depends on the coverage, which we set at 0.8, meaning that 20% of the test-set instances can be rejected. Moreover, we set IFAC's fairness weight to 0.9, meaning that out of those rejected instances 90% should get rejected out of unfairness concerns and the remainder should be rejected solely because of uncertainty. Based on these two parameters we tune the two thresholds (t_fair_certain and t_unfair_certain) on val_2, such that:
(ℎ, 𝑔)(x) =
  ℎ(x)    if fair(x) and p̂(x) ≥ t_fair_certain
  abstain if fair(x) and p̂(x) &lt; t_fair_certain
  flip    if ¬fair(x) and p̂(x) &lt; t_unfair_certain
  abstain if ¬fair(x) and p̂(x) ≥ t_unfair_certain
For full information we refer to our GitHub: https://github.com/daphnetje/IFAC_GUI.</p>
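      <p>The interplay of coverage and fairness weight can be sketched as follows: with n test instances, (1 − coverage)·n predictions may be rejected, of which a fraction fairness_weight over unfairness concerns. Below is a simplified quantile-style tuning sketch, not the actual procedure of [6]; it assumes the validation prediction probabilities have already been split into a fair and an unfair group:</p>

```python
def tune_thresholds(fair_probs, unfair_probs, coverage=0.8, fairness_weight=0.9):
    """Pick thresholds so roughly (1 - coverage) of all predictions are
    rejected, a fraction `fairness_weight` of them over unfairness concerns."""
    n = len(fair_probs) + len(unfair_probs)
    n_reject = round((1 - coverage) * n)
    n_unfair = min(round(fairness_weight * n_reject), len(unfair_probs))
    n_fair = n_reject - n_unfair

    fair_sorted = sorted(fair_probs)      # least certain first
    unfair_sorted = sorted(unfair_probs)

    # Fair predictions below t_fair_certain are rejected: cut off after
    # the n_fair least certain ones.
    t_fair_certain = fair_sorted[n_fair] if n_fair < len(fair_sorted) else 1.0
    # Unfair predictions at or above t_unfair_certain are rejected (sent to
    # a human): the n_unfair most certain ones fall above the cut.
    t_unfair_certain = unfair_sorted[-n_unfair] if n_unfair else 1.1
    return t_fair_certain, t_unfair_certain

# 10 validation predictions: 6 deemed fair, 4 deemed unfair (illustrative).
print(tune_thresholds([0.55, 0.6, 0.7, 0.8, 0.9, 0.95], [0.5, 0.65, 0.85, 0.99]))
```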
    </sec>
    <sec id="sec-4">
      <title>4. The Tool Through the Eyes of a User</title>
      <p>Now that we have described all the theoretical background and methodology behind our GUI, we are
going to describe each of its components and how a user can interact with them.</p>
      <sec id="sec-4-1">
        <title>4.1. Inspecting Single-Axis and Intersectional Demographic Groups</title>
        <p>The first thing a user sees when opening the GUI are the different single-axis and intersectional groups of the data, along with some statistical information and the at-risk subgroups based on which IFAC makes its rejections. In Figure 2 a fragment of this starting screen is visualized. Based on the statistical information displayed, a user can quickly assess how the group of white men is being favoured for this decision task, as their positive decision ratio is 44%, considerably higher than for white and black women. The at-risk groups within each demographic group are displayed inside a slide carousel that a user can browse through to understand where the biggest fairness concerns lie. Here users can click on a specific at-risk group, like for instance group #6 within white women. This group consists of white women with an associate degree, working more than 50 hours a week. As indicated by the confidence measure, they receive a low-income prediction 100% of the time. This group could be of interest as their high education level and number of working hours would intuitively be associated with high incomes. This is further confirmed by the slift of 0.68, indicating that white men with the same degree and working hours are only associated with a low income 32% of the time. In the next section, we visualize the interface after a user has selected this at-risk group.</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Selecting an At-Risk Subgroup</title>
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Checking Individual Discrimination</title>
        <p>Ultimately, this explanation behind IFAC's rejection serves to assist human experts in deciding whether the evidence of discrimination is strong enough to override the original low-income prediction. In this case, experts must decide whether high-income predictions for similar white men are justified by differences in age and occupation or whether they indicate unfair treatment. At this point, they can also seek additional details for the affected instance, such as their exact occupation within the healthcare sector, to assess whether it deserves a high income. These considerations underscore the essential role of human domain experts within the selective classification framework of IFAC. While computational methods can effectively identify at-risk subgroups and potential individual discrimination, these should be viewed as decision-support tools rather than definitive arbiters of fairness. Visualizing IFAC's rejected instances along with the explanations behind them can guide experts in understanding unfairness issues and making more just predictions.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion &amp; Future Work</title>
      <p>
        In this paper, we introduced a prototype GUI that visualizes instances rejected by the selective classification algorithm IFAC. For all of IFAC's unfairness-based rejections, it visualizes explanations of why predictions are seen as unfair, making use of the explainable-by-design methods of discriminatory association rule mining [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] and situation testing [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. Through a practical scenario, we demonstrated how the GUI assists users in reviewing rejected instances while underscoring the indispensable role of human expertise in contextualizing, interpreting, and, when necessary, challenging the underlying patterns of bias.
      </p>
      <p>Despite its potential, the GUI remains a prototype with room for further development. First, the
current implementation is limited to a single decision task: income prediction on the folktables dataset.
To evaluate the practical usability of the tool and its impact in the real world, future iterations should
extend its functionality to support any classification task.</p>
      <p>Additionally, the GUI currently serves only as a tool to view rejected predictions, without allowing users to modify them or explore the impact of corrective actions. Incorporating an interactive intervention feature that enables users to adjust decisions and observe how fairness metrics evolve would transform the GUI into an active bias mitigation solution.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>This work was funded by the European Union under TANGO "It takes two to tango: a synergistic approach to human-machine decision making", Grant Agreement 101120763, CUP E53C23001170006. The work was also supported by the PNRR-M4C2-Investimento 1.3, Partenariato Esteso PE00000013 "FAIR - Future Artificial Intelligence Research", Spoke 1 "Human-centered AI", funded by the European Commission under the NextGeneration EU programme. Views and opinions expressed are however those of the authors only and do not necessarily reflect those of the European Union or the European Health and Digital Executive Agency (HaDEA). Neither the European Union nor the granting authority can be held responsible for them.</p>
    </sec>
    <sec id="sec-7">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used ChatGPT to perform grammar checks. All
generated content was reviewed by the authors, who take full responsibility for this publication.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>T.</given-names>
            <surname>Kamishima</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Akaho</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Sakuma</surname>
          </string-name>
          ,
          <article-title>Fairness-aware learning through regularization approach</article-title>
          ,
          <source>in: 2011 IEEE 11th international conference on data mining workshops, IEEE</source>
          ,
          <year>2011</year>
          , pp.
          <fpage>643</fpage>
          -
          <lpage>650</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>C.</given-names>
            <surname>Wadsworth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Vera</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Piech</surname>
          </string-name>
          ,
          <article-title>Achieving fairness through adversarial learning: an application to recidivism prediction</article-title>
          , arXiv preprint arXiv:1807.00199 (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>D.</given-names>
            <surname>Lenders</surname>
          </string-name>
          , T. Calders,
          <article-title>Real-life performance of fairness interventions-introducing a new benchmarking dataset for fair ml</article-title>
          ,
          <source>in: Proceedings of the 38th ACM/SIGAPP symposium on applied computing</source>
          ,
          <year>2023</year>
          , pp.
          <fpage>350</fpage>
          -
          <lpage>357</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>M.</given-names>
            <surname>Favier</surname>
          </string-name>
          , T. Calders,
          <article-title>Cherry on the cake: fairness is not an optimization problem</article-title>
          ,
          <source>Machine Learning</source>
          <volume>114</volume>
          (
          <year>2025</year>
          )
          <fpage>160</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>S.</given-names>
            <surname>Goethals</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Calders</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Martens</surname>
          </string-name>
          ,
          <article-title>Beyond accuracy-fairness: Stop evaluating bias mitigation methods solely on between-group metrics</article-title>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>D.</given-names>
            <surname>Lenders</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Pugnana</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Pellungrini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Calders</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Pedreschi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Giannotti</surname>
          </string-name>
          ,
          <article-title>Interpretable and fair mechanisms for abstaining classifiers</article-title>
          ,
          <source>in: Joint European Conference on Machine Learning and Knowledge Discovery in Databases</source>
          , Springer,
          <year>2024</year>
          , pp.
          <fpage>416</fpage>
          -
          <lpage>433</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>D.</given-names>
            <surname>Pedreschi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ruggieri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Turini</surname>
          </string-name>
          ,
          <article-title>Measuring Discrimination in Socially-Sensitive Decision Records</article-title>
          ,
          <source>in: Proceedings of the SIAM International Conference on Data Mining, SIAM</source>
          ,
          <year>2009</year>
          , pp.
          <fpage>581</fpage>
          -
          <lpage>592</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>B. L.</given-names>
            <surname>Thanh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ruggieri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Turini</surname>
          </string-name>
          ,
          <article-title>k-nn as an implementation of situation testing for discrimination discovery and prevention</article-title>
          , in: KDD, ACM,
          <year>2011</year>
          , pp.
          <fpage>502</fpage>
          -
          <lpage>510</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>