A Fair Selective Classifier to Put Humans in the Loop

Daphne Lenders1,2, Andrea Pugnana3, Roberto Pellungrini4, Toon Calders1,2, Fosca Giannotti4 and Dino Pedreschi3

1 Adrem Data Lab, University of Antwerp, Antwerp, Belgium
2 DigiTax, University of Antwerp, Antwerp, Belgium
3 KDD Lab, University of Pisa, Pisa, Italy
4 KDD Lab, Scuola Normale Superiore, Pisa, Italy

Abstract
In this paper we propose a practical human-in-the-loop approach for algorithmic fairness, utilizing the selective classification framework. We describe a classification model that abstains from making predictions in cases of unfairness or uncertainty. Rejected predictions can be passed on to a human expert, who can review possible unfairness issues and make the decisions more just.

Keywords
Fair Classification, Selective Classification, Human in the Loop

EWAF'24: European Workshop on Algorithmic Fairness, July 01–03, 2024, Mainz, Germany
Contact: daphne.lenders@uantwerpen.be (D. Lenders)
Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073.

1. Introduction

Fairness in automated decision-making tasks has been an ongoing research area for the last 15 years. While fairness has so far often been treated as a mathematical notion to be optimized, recently more attention has been paid to its highly context-dependent nature. Computer science and legal scholars have argued that the fairness of an entire system cannot be expressed through a single number; instead, a system's fairness should be assessed by studying where unfairness occurs, which subgroups are affected by it, and in which cases any disparate treatment might be justifiable [1, 2, 3]. Hence, to improve the fairness of a decision-making model, optimizing a single fairness notion is not sufficient; one should instead take a context-dependent approach and fix unfairness where it occurs [1, 3]. Since it is difficult to automate such nuanced considerations, the call for involving a human expert with sufficient domain knowledge is growing. This call is backed by AI legislation: the recently passed EU AI Act states in its Article 14 that any "high-risk" AI system should be overseen and adaptable by a human, to minimise any risks that might otherwise be posed by the system [4]. While the necessity of having a human in the loop is clear, no practical guidelines are given on how a human could oversee a decision-making process, especially if a system makes decisions for thousands of individuals that cannot all be manually reviewed.

In our paper, we propose to utilize the framework of selective classification [5]. Selective classification allows building classifiers that can refrain from predicting when they are not confident enough. This allows one to trade off predictive performance for coverage, i.e. the percentage of instances for which a prediction is provided. We extend the selective classification framework to take into account not only the uncertainty around a prediction but also its unfairness. Possibly unfair instances can then be passed on to human experts for review. Our selective classifier also provides explanations for why predictions are perceived as unfair, which can further help experts in making well-informed decisions.
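As background, the reject-option mechanism of a standard, purely uncertainty-based selective classifier can be sketched in a few lines of Python. The snippet below is only a minimal illustration of this general idea and not part of our method; the model, threshold value and synthetic data are placeholder assumptions, and any classifier exposing predict_proba (here a scikit-learn random forest) could be used.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Minimal sketch of plain, uncertainty-based selective classification:
# abstain whenever the classifier's confidence is below a threshold and
# report accuracy only over the covered (non-rejected) instances.

def selective_predict(model, X, threshold=0.75):
    proba = model.predict_proba(X)
    confidence = proba.max(axis=1)                   # confidence in the predicted class
    labels = model.classes_[proba.argmax(axis=1)]
    accepted = confidence >= threshold               # reject everything below the threshold
    return labels, accepted

X, y = make_classification(n_samples=2000, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X[:1500], y[:1500])
labels, accepted = selective_predict(model, X[1500:], threshold=0.75)
coverage = accepted.mean()                           # fraction of instances receiving a prediction
accuracy = (labels[accepted] == y[1500:][accepted]).mean()
```

Raising the threshold lowers coverage but typically increases accuracy on the instances that are still predicted; our method replaces this purely confidence-based rule with one that also accounts for unfairness.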
2. Methodology

In this section, we illustrate our selective classifier on the running example of the "income dataset" [6]. This dataset consists of information about individuals' working lives, their education, and their demographics. The associated classification task is to predict people's income level, specifically whether it is above 50K a year. To get a clearer idea of how this prediction task may be relevant in real life, imagine a bank using the "high income" prediction as a proxy for whether an individual has the financial means to pay back a loan. In this task, we consider the attributes "sex" and "race" as sensitive information that may serve as grounds for illegal discrimination. In this data, the possible values for sex are "male" and "female", and the possible values for race are "white", "black" and "other".

A standard classification model that we trained on this data performs best on the reference group of white men, while other groups are at higher risk of unfair treatment. This unfair treatment shows both in the model's predictions and in its errors. In other words, the ratio of positive decision outcomes (high income) is considerably lower for individuals who are not white men than for white men. Also, the "False Negative" errors for these groups are severe, meaning that even if individuals from these groups have a high income, the classifier is likely to predict a low income for them.

We have created a selective classification model that can increase both the fairness and the accuracy of such a base classifier, by not making predictions for (a) individuals it is not certain about and (b) individuals it is certain about, but whose prediction is biased. In Figure 1 we describe the basic intuition of how this selective classification model works. It consists of a base classifier, which makes an initial prediction for an instance, and a rejector, which decides whether to keep, reject or intervene on this prediction. To do so, the rejector receives an instance along with the associated prediction label and prediction probability and first analyses the label's fairness on a global and a local level. For the former, it checks whether the instance and its prediction fall under any global patterns of unfairness that have been established for the base classifier, using the methodology of possibly-discriminated subgroups of [7]. For the local fairness check, the Situation Testing algorithm is executed, and the prediction label of the instance in question is compared to the labels of similar instances in the dataset [8]. If both the global and the local fairness check fail, we consider the prediction label to be unfair. The rejector then takes the prediction probability of the base classifier as a proxy for its certainty and, depending on that, performs a fairness intervention or abstains from making a decision. If the certainty of the prediction is below a certain threshold, a fairness intervention is performed; otherwise, the rejector rejects the original prediction. The reasoning behind the fairness intervention is that an uncertain and unfair prediction is likely to be inaccurate, so it is safe to alter it. If the rejector has not deemed a predicted label unfair, it may still abstain from predicting in case the prediction probability falls below a dedicated threshold. Thus, the rejector rejects fair but uncertain predictions and only keeps the original prediction if it is both fair and certain.

Figure 1: Basic illustration of our selective classification framework.
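To make the rejector's decision logic concrete, the sketch below summarizes it in Python. This is an illustrative reconstruction rather than our actual implementation: fails_global_check and fails_local_check are hypothetical stand-ins for the possibly-discriminated-subgroups check [7] and the Situation Testing check [8], and the two thresholds correspond to t_unfair_certain and t_fair_certain in Figure 1; their default values here are arbitrary.

```python
# Illustrative sketch of the rejector's decision rule (not the actual implementation).
# fails_global_check / fails_local_check are hypothetical stand-ins for the
# possibly-discriminated-subgroups check [7] and the Situation Testing check [8].

def rejector_decision(instance, pred_label, pred_proba,
                      fails_global_check, fails_local_check,
                      t_unfair_certain=0.7, t_fair_certain=0.6):
    unfair = (fails_global_check(instance, pred_label)
              and fails_local_check(instance, pred_label))
    if unfair:
        if pred_proba >= t_unfair_certain:
            return "REJECT"      # unfair and certain: pass on to a human expert
        return "INTERVENE"       # unfair and uncertain: alter the predicted label
    if pred_proba < t_fair_certain:
        return "REJECT"          # fair but uncertain: abstain as in standard selective classification
    return "KEEP"                # fair and certain: keep the base classifier's prediction
```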
Illustrative Example

To better illustrate the idea behind our selective classifier, we go through an example showing how it rejects the base prediction for a woman at risk of discrimination. In Figure 2 we see that the baseline classifier needs to make a prediction for a married woman, aged between 60 and 69 years and working in the Sales sector. The baseline classifier predicts that she has a low income with a probability of 74.17%. To decide whether to keep this prediction, our rejector first assesses if it falls under any global patterns of unfairness. To do so, it consults a list of subgroups on which the classifier is known to behave unfairly [7]. In this case, our instance falls under the subgroup of women aged between 60 and 69 working in the Sales sector. This subgroup is deemed to be at risk of discrimination because the classifier is known to predict a low income for them 90% of the time, compared to only 40% of the time for the corresponding subgroup of individuals who are not female. Because of this large difference, the prediction fails the global fairness check, and a local fairness check is performed. Here, the 3 most similar instances from the reference group of white men and from the non-reference group are selected, and their positive label ratios are compared. This allows the rejector to not just focus on the classifier's behaviour on individuals working in Sales and aged between 60 and 69, but also to take other relevant characteristics, like their education and their number of working hours, into account. Because even in this fine-grained analysis the reference group receives more favourable treatment (2/3 positive labels compared to 0/3), the local fairness check fails as well. The overall prediction is therefore deemed unfair, and the rejector needs to decide whether to perform a fairness intervention or reject the prediction. To do so, it checks whether the prediction probability of 74.17% lies above a certainty threshold (which is learned in a separate step not described here). Since in this case it does, we are in the case of an unfair but certain prediction, and the rejector rejects the originally predicted label. In a next step, the instance could be passed on to a human expert who can review the decision in more detail. The rejector's global and local fairness analysis can help make the rejection process more transparent and let human experts make well-informed decisions.

3. Preliminary Results & Discussion

In Table 1 we show some preliminary results of applying our fair selective classifier to the income dataset. We compare the performance of a full-coverage baseline classifier (BC) with the performance, over all non-rejected instances, of a regular uncertainty-based selective classifier (USC) and our fair selective classifier (FSC). Both selective classifiers had a coverage of 80%, meaning they could reject 20% of the instances. All performances are averaged over 10 test sets.
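The local check in this example can be sketched as a k-nearest-neighbour comparison of positive label ratios, in the spirit of Situation Testing [8]. The code below is only an illustration under simplifying assumptions (numeric, already-encoded features; an arbitrary decision threshold; y_ref and y_other holding the predicted labels of the neighbours) and is not the implementation used in our experiments.

```python
from sklearn.neighbors import NearestNeighbors

# Illustrative Situation-Testing-style local check [8] (not the code used in our
# experiments): compare the positive-label ratios among the k most similar
# instances from the reference group (white men) and from the non-reference group.

def local_discrimination_score(x, X_ref, y_ref, X_other, y_other, k=3):
    ref_idx = NearestNeighbors(n_neighbors=k).fit(X_ref).kneighbors(
        x.reshape(1, -1), return_distance=False)[0]
    oth_idx = NearestNeighbors(n_neighbors=k).fit(X_other).kneighbors(
        x.reshape(1, -1), return_distance=False)[0]
    # in the example of Figure 2: 2/3 - 0/3 = 2/3
    return y_ref[ref_idx].mean() - y_other[oth_idx].mean()

def is_locally_unfair(score, threshold=0.5):
    # flag the prediction as locally unfair if the gap exceeds a chosen threshold
    return score >= threshold
```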
Figure 2: An example of how a prediction of the base classifier is rejected because of unfairness concerns (panels: Base Classifier Prediction, Global Fairness Check, Local Fairness Check, Certainty Check; Individual Discrimination Score = (2/3) - (0/3) = 2/3).

Table 1
Performances of a baseline classifier (BC), a regular uncertainty-based selective classifier (USC) and our fair selective classifier (FSC)

Overall (All):
              BC          USC         FSC
Accuracy      .78 ± .01   .83 ± .01   .80 ± .01
Precision     .65 ± .03   .69 ± .03   .64 ± .03
Recall        .57 ± .02   .62 ± .02   .59 ± .04

Per demographic group:
                  M. Wh.    F. Wh.    M. Bl.    F. Bl.    M. Oth.   F. Oth.
FNR         BC    .33±.03   .57±.03   .57±.09   .60±.11   .44±.18   .59±.22
            USC   .26±.03   .54±.04   .61±.11   .67±.10   .30±.18   .54±.26
            FSC   .37±.04   .44±.06   .57±.08   .49±.11   .41±.17   .52±.25
FPR         BC    .24±.03   .10±.01   .12±.04   .05±.01   .08±.07   .05±.05
            USC   .20±.03   .06±.01   .07±.03   .02±.01   .07±.08   .03±.04
            FSC   .18±.03   .11±.01   .10±.04   .04±.02   .08±.07   .05±.05
Pos. Ratio  BC    .43±.02   .17±.01   .17±.03   .09±.01   .18±.07   .13±.07
            USC   .43±.03   .13±.01   .12±.03   .05±.02   .16±.07   .10±.07
            FSC   .36±.02   .20±.01   .16±.03   .09±.02   .17±.08   .15±.07

Regarding overall performance, we see that both selective classifiers manage to increase the performance of the base classifier by abstaining from some of its decisions. The performance increase of the uncertainty-based classifier is higher, but when focusing on the False Positive Rates, False Negative Rates and Positive Decision Ratios across different demographics, we see that this comes at the cost of fairness. On all of these measures, the non-selective baseline shows large differences between the group of white men and the other groups. In many cases, this difference is further increased by the uncertainty-based classifier: it mostly improves the performance for the reference group, but decreases it for the others. Our selective classifier manages to make the performance measures across groups more equal, decreasing the model's errors in terms of its False Positive Rates for white men and its False Negative Rates for the other groups. This also results in smaller differences in Positive Decision Ratios across demographics. While the results of our method are not perfect and the error rates for, e.g., the group of black men are still quite high, we believe that a human in the loop can further enhance the fairness of the system: for instance, they can equalize the positive decision ratios by giving more positive decision labels over the rejected instances of minority groups. Further, they could improve the system by embedding their domain knowledge in it, e.g. by specifying known subgroups at risk of discrimination (used for the global fairness check) that might have been missed by our system.
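For reference, the per-group rates in Table 1 can be computed over the non-rejected instances with a few lines of pandas. The sketch below is not our evaluation code; the column names (group, y_true, y_pred, accepted) are illustrative assumptions.

```python
import pandas as pd

# Illustrative sketch (not our evaluation code) of computing the per-group
# error rates of Table 1 over the non-rejected instances.
# Assumed columns: 'group', 'y_true', 'y_pred' (0/1) and 'accepted' (bool).

def group_metrics(df: pd.DataFrame) -> pd.DataFrame:
    rows = []
    for group, g in df[df["accepted"]].groupby("group"):
        pos, neg = g["y_true"] == 1, g["y_true"] == 0
        rows.append({
            "group": group,
            "FNR": (g.loc[pos, "y_pred"] == 0).mean(),   # misses among true positives
            "FPR": (g.loc[neg, "y_pred"] == 1).mean(),   # false alarms among true negatives
            "Pos. Ratio": (g["y_pred"] == 1).mean(),     # positive decision ratio
        })
    return pd.DataFrame(rows)
```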
Acknowledgments

D. Lenders and T. Calders were funded by the DigiTax Centre of Excellence (UAntwerp) and by the Research Foundation Flanders under FWO file number V467123N. A. Pugnana, R. Pellungrini, D. Pedreschi and F. Giannotti have received funding from PNRR - M4C2 - Investimento 1.3, Partenariato Esteso PE00000013 - "FAIR - Future Artificial Intelligence Research" - Spoke 1 "Human-centered AI", funded by the European Commission under the NextGenerationEU programme, from ERC-2018-ADG G.A. 834756 "XAI: Science and technology for the eXplanation of AI decision making", and from Prot. IR0000013. This work was also funded by the European Union under Grant Agreement no. 101120763 - TANGO. Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union or the European Health and Digital Executive Agency (HaDEA). Neither the European Union nor the granting authority can be held responsible for them. The work has also been realised thanks to NextGenerationEU - National Recovery and Resilience Plan (PNRR) - Project: "SoBigData.it - Strengthening the Italian RI for Social Mining and Big Data Analytics" - Prot. IR0000013 - Notice n. 3264 of 12/28/2021.

References

[1] D. Lenders, T. Calders, Users' needs in interactive bias auditing tools: Introducing a requirement checklist and evaluating existing tools, AI and Ethics (2023) 1–29.
[2] S. Costanza-Chock, I. D. Raji, J. Buolamwini, Who audits the auditors? Recommendations from a field scan of the algorithmic auditing ecosystem, in: Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency, 2022, pp. 1571–1583.
[3] S. Wachter, B. Mittelstadt, C. Russell, Why fairness cannot be automated: Bridging the gap between EU non-discrimination law and AI, Computer Law & Security Review 41 (2021) 105567.
[4] The European Commission, The EU Artificial Intelligence Act - Article 14, 2023. https://artificialintelligenceact.com/title-iii/chapter-2/article-14/.
[5] K. Hendrickx, L. Perini, D. Van der Plas, W. Meert, J. Davis, Machine learning with a reject option: A survey, ArXiv abs/2107.11277 (2021). URL: https://api.semanticscholar.org/CorpusID:236318084.
[6] F. Ding, M. Hardt, J. Miller, L. Schmidt, Retiring adult: New datasets for fair machine learning, Advances in Neural Information Processing Systems 34 (2021) 6478–6490.
[7] D. Pedreschi, S. Ruggieri, F. Turini, Discrimination-aware data mining, in: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2008, pp. 560–568.
[8] B. T. Luong, S. Ruggieri, F. Turini, k-NN as an implementation of situation testing for discrimination discovery and prevention, in: Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2011, pp. 502–510.