=Paper=
{{Paper
|id=Vol-3908/paper_59
|storemode=property
|title=On Prediction-modelers and Decision-makers: Why Fairness Requires More Than a Fair Prediction Model
|pdfUrl=https://ceur-ws.org/Vol-3908/paper_59.pdf
|volume=Vol-3908
|authors=Teresa Scantamburlo,Joachim Baumann,Christoph Heitz
|dblpUrl=https://dblp.org/rec/conf/ewaf/Scantamburlo0H24
}}
==On Prediction-modelers and Decision-makers: Why Fairness Requires More Than a Fair Prediction Model==
On Prediction-Modelers and Decision-Makers: Why Fairness Requires More Than a Fair Prediction Model

Teresa Scantamburlo1,*, Joachim Baumann2,3,* and Christoph Heitz3,*

1 Ca’ Foscari University of Venice, European Centre for Living Technology, Italy
2 University of Zurich, Switzerland
3 Zurich University of Applied Sciences, Switzerland

EWAF’24: European Workshop on Algorithmic Fairness, July 01–03, 2024, Mainz, Germany
* All authors contributed equally.
teresa.scantamburlo@unive.it (T. Scantamburlo); baumann@ifi.uzh.ch (J. Baumann); christoph.heitz@zhaw.ch (C. Heitz)
ORCID: 0000-0002-3769-8874 (T. Scantamburlo); 0000-0003-2019-4829 (J. Baumann); 0000-0002-6683-4150 (C. Heitz)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract
This paper addresses the ambiguous relationship between prediction and decision in the field of prediction-based decision-making. Many studies blur these concepts, referring to ‘fair prediction’ without a clear differentiation. We argue that distinguishing between prediction and decision is crucial for ensuring algorithmic fairness, as fairness concerns the consequences that decisions, not predictions, create for human lives. We clarify the distinction between the concepts of prediction and decision, and demonstrate how these two elements influence the final fairness properties of a prediction-based decision system. To this aim, we propose a framework that enables a better understanding of, and reasoning about, the conceptual logic of creating fairness in prediction-based decision-making. Our framework delineates different roles, specifically the ‘prediction-modeler’ and the ‘decision-maker,’ and identifies the information required from each to implement fair systems. This framework facilitates the derivation of distinct responsibilities for both roles and fosters discussion on insights related to ethical and legal requirements. This is an extended abstract based on the full paper published in AI & SOCIETY [1].

Keywords
prediction-based decision, algorithmic fairness, ethical decision-making, human-in-the-loop

1. Introduction

Algorithmic fairness has emerged as an important topic within the Machine Learning (ML) research community in recent years [2, 3], attracting attention not only from a technical standpoint but also from philosophical, political, and legal perspectives [4, 5]. This literature builds upon established scholarship investigating the limits of classification systems and power asymmetries in data collection practices [6]. Concerned with the consequences of prediction-based decisions on individuals and groups from a social justice viewpoint [7], the discourse on algorithmic fairness has focused on the fairness of prediction models, which represent the core of ML research [8, 9, 10, 11, 12]. Given this focus, it is unsurprising that much of the debate has revolved around how prediction models can cause unfairness.

We argue that the prediction model as such cannot be the reason for unfairness. Rather, it is the usage of the prediction model within its specific context that leads to unfairness. For instance, the racial discrimination attributed to the COMPAS tool’s recidivism risk model [13] is not an inherent feature of the model itself.
Discrimination arises only when judges make decisions based on the COMPAS risk scores. Consequently, the connection between the characteristics of a prediction model, such as its false-positive or false-negative rates, and the potential harm to specific societal groups, like African Americans in the case of COMPAS, hinges on the assumption of how the model’s outputs translate into tangible consequences for people’s lives.

As a prototypical case of how prediction models are implemented in real-world applications, we focus our analysis on prediction-based decision systems, in which the outcomes of ML prediction algorithms are leveraged to make decisions impacting human lives. We imagine a (human or automated) decision-maker who makes decisions about or for individuals, where these decisions are informed by a prediction regarding certain characteristics of the people involved. This scenario encapsulates many of the cases commonly associated with discussions on algorithmic fairness, including banks making loan decisions based on repayment predictions, companies making hiring decisions based on job performance predictions, or universities making admission decisions based on academic achievement predictions.1

1 It is important to note that other scenarios, such as recommender systems where predictions inform individuals making decisions about themselves, also exist. While the findings of this paper may not be directly applicable in such contexts, they could serve as inspiration for future research.

In such prediction-based decision systems, we may distinguish two conceptually different functions. First, we have the function of prediction, performed by a prediction model that processes individual data of a person to produce a prediction of a target variable associated with this person, which is not known to the decision-maker at the time of decision-making. This prediction might come in the form of a score, a probability, or a point prediction. Second, we have the function of decision, which is informed by the prediction but, in nearly all cases, also influenced by additional parameters. For example, in the case of a bank’s loan decision, not only the repayment probability but also the interest rate and the bank’s business strategy may be decisive parameters. This idea has been studied in so-called cost-sensitive learning problems [14]. However, it remains unclear how incorporating a fairness requirement alters the cost-sensitive approach and how prediction and decision functions interact in this process.
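To make the decision function concrete, the following is a minimal sketch (not taken from the paper) of a cost-sensitive decision rule in the loan example: the decision-maker turns a predicted repayment probability into a decision by maximizing expected utility. The utility values are hypothetical, chosen only for illustration.

```python
# Minimal sketch of a cost-sensitive decision rule (hypothetical utility values).
# The prediction model supplies p = P(Y = 1), the probability of repayment;
# the decision-maker chooses D in {0, 1} (deny / grant) to maximize expected utility.

def expected_utility_grant(p, u_repay=1.0, u_default=-5.0):
    """Expected utility of granting a loan, given repayment probability p."""
    return p * u_repay + (1 - p) * u_default

def decide(p, u_repay=1.0, u_default=-5.0, u_deny=0.0):
    """Grant (1) iff the expected utility of granting exceeds that of denying."""
    return int(expected_utility_grant(p, u_repay, u_default) > u_deny)

# The same rule can be written as a threshold on p:
#   grant iff p >= (u_deny - u_default) / (u_repay - u_default)
# With the values above, the threshold is 5/6: the decision depends on the
# prediction *and* on business parameters, not on the prediction alone.
threshold = (0.0 - (-5.0)) / (1.0 - (-5.0))
print(decide(0.9), decide(0.7), round(threshold, 3))  # -> 1 0 0.833
```

Changing the assumed utilities (e.g., the interest earned or the loss on default) moves the threshold, even though the prediction model stays exactly the same.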
This paper is primarily concerned with group fairness, the most established and widely adopted category of fairness. Group fairness seeks to prevent systematic disadvantages in algorithmic decisions based on sensitive attributes (such as gender, age, or race) [15, 2]. While there are also other types of fairness, such as counterfactual fairness [16], individual fairness [11], and procedural fairness definitions [17], these are not addressed in this study.

2. The relation between predictions and decisions

In popular narratives of algorithmic decision-making, the distinction between the idea of decision and that of prediction seems to be blurred. Neologisms like ‘fair prediction’ [18] or ‘fairness-aware learning’ [10] have become familiar within the ML community, inadvertently promoting the notion that fairness is an intrinsic property of prediction models. This conflation of concepts does not necessarily stem from an explicit ideological stance, and some studies clearly specify that fairness is a characteristic pertinent to decision rules [19]. Nonetheless, formal characterizations often apply fairness criteria to the prediction model (e.g., the classifier), presupposing that the decision equates to the prediction’s outcome (e.g., see [20, 21, 22]). Such formulations suggest that the relation between prediction and decision is fixed and given, implying that a specific prediction directly leads to a specific decision. However, this is not true in many realistic examples, where the optimal decision depends on the prediction as well as on other parameters, as is explicitly acknowledged by the idea of cost-sensitive learning [14]. Thus, qualifying a prediction as ‘fair’ is misleading unless we explicitly assume how a prediction is transformed into a decision. In general, the fairness attribution applies more properly to the entire system (i.e., the combination of prediction and decision rules) rather than to the prediction alone.

Abstract formalization facilitates the overlap between the concepts of prediction and decision. For example, in classification tasks, the objective of prediction – to choose from among several options – can be seen as a special type of decision-making. From this viewpoint, the functions performed by a prediction model and a decision-maker appear similar. However, moving beyond mathematical simplifications to consider ethical implications reveals that decision-making encompasses more than selecting alternatives; it involves actions that affect individuals and the environment. In other words, decisions change the status quo and thus bear consequences for the decision-maker, the decision subjects, and potentially the broader environment. In contrast, a prediction in itself has no direct impact; its influence on decision-making is enabled only through a policy or decision rule that outlines the consequences of future actions.

Take, for instance, a bank that decides whether to approve loan applications based on the predicted likelihood of repayment. Approving a loan has a tangible impact, offering the recipient enhanced financial flexibility and new purchasing opportunities. This benefit is withheld from applicants who are denied. Evidently, the prediction algorithm influences the decision, but it is not the prediction itself that creates (un)fairness; it is the decision rule specifying how to use the prediction estimate. Note that even if the decision is fully determined by the prediction – a case which is rarely met – the question of whether the prediction algorithm is fair or not is conditioned on the assumed relation between prediction and decision rule. This is why we suggest conceptually distinguishing clearly between the two elements of prediction and decision, both of which are ingredients of any prediction-based decision system, whether fully automatic or also influenced by humans. Most importantly, distinguishing these concepts encourages a broader examination of algorithmic decision-making as a process embedded in social constructs, reflective of value judgments and power asymmetries.
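As a hedged illustration of this point (not part of the original paper), the sketch below applies two different decision rules to the same set of hypothetical, calibrated repayment scores: the group-level outcome, here the selection rate per group, is determined by the decision rule, not by the prediction model alone.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical calibrated repayment scores for two groups A and B.
# The prediction model is identical under both rules below; only the decision rule changes.
scores_a = rng.beta(6, 2, size=10_000)  # group A tends to receive higher scores
scores_b = rng.beta(4, 3, size=10_000)  # group B tends to receive lower scores

def selection_rate(scores, threshold):
    """Fraction of applicants granted a loan under a simple threshold rule."""
    return float(np.mean(scores >= threshold))

# Decision rule 1: one threshold for everyone.
uniform = {"A": selection_rate(scores_a, 0.8), "B": selection_rate(scores_b, 0.8)}

# Decision rule 2: group-specific thresholds chosen to equalize selection rates
# (a post-processing route to equal selection rates across groups).
target = selection_rate(np.concatenate([scores_a, scores_b]), 0.8)
thr_a = np.quantile(scores_a, 1 - target)
thr_b = np.quantile(scores_b, 1 - target)
parity = {"A": selection_rate(scores_a, thr_a), "B": selection_rate(scores_b, thr_b)}

print("Rule 1 (same threshold):   ", uniform)  # unequal selection rates
print("Rule 2 (group thresholds): ", parity)   # (approximately) equal rates
```

The scores never change between the two runs; whether the resulting treatment of the groups is judged (un)fair depends entirely on the rule that converts scores into decisions.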
3. Responsibilities of prediction-modelers and decision-makers

To study the interaction of prediction and decision, we introduce a framework that allows us to distinguish the tasks and responsibilities of two different roles: the role of the ‘prediction-modeler’ and the role of the ‘decision-maker.’ Drawing on decision-theoretic concepts, we may think of two different agents, one being responsible for the prediction model and the other for the decision-making. Our motivation for distinguishing these roles is fed not only by the theoretical analysis of how predictions are converted into (un)fair treatment, as discussed above, but also by the practical observation that these two roles are often split organizationally and covered by different people, different departments, or even different companies.

From a responsibility standpoint, the decision-maker is responsible for the decisions, and hence for their consequences. Conversely, prediction-modelers have their own set of responsibilities. They are responsible for creating the basis for a good decision, which consists in (a) delivering a meaningful and robust prediction (e.g., think of the transparency and safety requirements in [23]), and (b) supplying the decision-maker with all necessary information to address fairness and other relevant ethical obligations (refer to the accountability and fairness considerations in [23] and the requirements outlined by [24]). The rationale for differentiating these roles stems from their distinct objectives. Prediction-modelers focus on achieving high prediction performance, such as accuracy, which poses challenges in the context of consequential decisions (see, e.g., [25] and [26]).

Our framework2 is grounded in a decision-theoretic analysis of prediction-based decision-making, assuming binary outcomes for both the decision 𝐷 and the decision-critical unknown variable 𝑌, and links to existing literature that conceptualizes fairness as a decision-theoretic problem (see, for example, [27] and [28]). The analysis demonstrates that rational decision-making is an optimization problem, reliant on the decision-maker’s utility function and the prediction 𝑃(𝑌 = 1). Accordingly, the ideal output of a prediction model for a decision-maker is a probabilistic model, in particular a calibrated score. Interestingly, this does not change if a fairness constraint is added to the decision problem – the only change is in the decision rule. These results suggest that, to achieve fairness while still optimizing for a decision-maker’s goal, the task of the prediction-modeler is not to deliver a ‘fair’ prediction model, but to deliver an unskewed estimate of the outcome of interest. The task of the decision-maker, on the other hand, is to combine the prediction with their own goals.

2 We focus on a post-processing approach to fairness. For a more extensive discussion of how our approach relates to pre-processing and in-processing techniques, see the full article [1].

We also analyze the interaction between the two roles that is necessary to enable the development of fair decision systems. The decision-maker should communicate details about the target variable 𝑌 and group attributes to the prediction-modeler, who, in turn, must relay information on model performance, calibration functions, and group-specific baseline distributions.

This study focuses on human decision-makers at the final stage of the AI decision process. We know there are other important decision-makers involved earlier, such as human annotators and trainers. While their roles are also important, discussing all human influences is beyond the scope of this paper.
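To ground this division of labor in something executable, here is a minimal, hypothetical sketch (not the formal framework of the full paper): the prediction-modeler delivers calibrated scores together with group base rates, and the decision-maker searches for group-specific thresholds that maximize its expected utility subject to a simple group fairness constraint (selection-rate parity). The data, utility values, and tolerance are illustrative assumptions.

```python
import numpy as np

# Sketch of the two roles and the information exchanged between them (all numbers hypothetical).
rng = np.random.default_rng(1)

def prediction_modeler():
    """Delivers calibrated scores P(Y=1) plus group-specific base-rate information."""
    scores = {"A": rng.beta(6, 2, 5_000), "B": rng.beta(4, 3, 5_000)}
    base_rates = {g: float(s.mean()) for g, s in scores.items()}  # calibrated scores: mean ~ base rate
    return scores, base_rates

def expected_utility(scores, threshold, u_tp=1.0, u_fp=-5.0):
    """Decision-maker's expected utility per applicant under a threshold rule."""
    d = scores >= threshold
    return float(np.mean(d * (scores * u_tp + (1 - scores) * u_fp)))

def decision_maker(scores, parity_tolerance=0.02):
    """Chooses group-specific thresholds maximizing utility under a selection-rate parity constraint."""
    grid = np.linspace(0, 1, 51)
    best, best_u = None, -np.inf
    for ta in grid:
        for tb in grid:
            rate_a = np.mean(scores["A"] >= ta)
            rate_b = np.mean(scores["B"] >= tb)
            if abs(rate_a - rate_b) > parity_tolerance:
                continue  # violates the group fairness constraint
            u = expected_utility(scores["A"], ta) + expected_utility(scores["B"], tb)
            if u > best_u:
                best, best_u = (ta, tb), u
    return best

scores, base_rates = prediction_modeler()
print("base rates reported by the prediction-modeler:", base_rates)
print("fair decision rule (group-specific thresholds):", decision_maker(scores))
```

Consistent with the post-processing view adopted in the paper, the fairness constraint leaves the prediction model untouched; only the decision rule (the pair of thresholds) changes relative to the unconstrained utility-maximizing rule.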
4. Discussion and conclusions

Our findings underscore that different actors bear varying responsibilities towards the collective aim of creating a fair system.3 In algorithmic decision-making under group fairness constraints, these obligations translate into specific pieces of information that each role is expected to deliver to the other. The deliverables suggested in our framework are not optional. They can be derived from the very nature of the decision problem and establish a strong interdependence between the roles involved. While we acknowledge that the ultimate responsibility for ensuring fairness in decisions lies with the decision-maker, their ability to address fairness issues depends heavily on the work of the prediction-modeler. Without a minimum set of essential information regarding both the training population and the performance characteristics of the prediction model, a decision-maker cannot guarantee fair decisions.

3 Here, we focus more specifically on professional responsibility, that is, the set of obligations based on a role played in a certain context. Responsibility, of course, extends beyond roles; for a broader discussion see [29].

The interdependence between the roles recalls the problem of creating meaningful communication channels among the parties involved in the design and implementation of artificial intelligence (AI) systems. Specifically, it clarifies the need for a structured exchange of information critical for achieving fairness in prediction-based decision-making.

The line between organizations that design and those that operate decision-making systems can be unclear and complicated. For example, when banks work with credit agencies, it can be hard to separate their roles in ensuring fairness – credit agencies might use banks’ preferences in their models. This highlights the need to look at fairness in decision-making not just in theory but also by considering the actual roles and institutions involved.

References

[1] T. Scantamburlo, J. Baumann, C. Heitz, On prediction-modelers and decision-makers: why fairness requires more than a fair prediction model, AI & SOCIETY (2024) 1–17. doi:10.1007/s00146-024-01886-3.
[2] S. Barocas, M. Hardt, A. Narayanan, Fairness and Machine Learning, fairmlbook.org, 2019.
[3] M. Kearns, A. Roth, The Ethical Algorithm: The Science of Socially Aware Algorithm Design, Oxford University Press, Inc., USA, 2019.
[4] R. Binns, Fairness in Machine Learning: Lessons from Political Philosophy, Technical Report, 2018. URL: http://proceedings.mlr.press/v81/binns18a.html.
[5] S. Barocas, A. D. Selbst, Big Data’s Disparate Impact, California Law Review 104 (2016) 671–732. URL: http://www.jstor.org/stable/24758720.
[6] G. C. Bowker, S. L. Star, Sorting things out: Classification and its consequences, MIT Press, 2000.
[7] D. K. Mulligan, J. A. Kroll, N. Kohli, R. Y. Wong, This thing called fairness: Disciplinary confusion realizing a value in technology, Proceedings of the ACM on Human-Computer Interaction 3 (2019) 1–36.
[8] D. Pedreschi, S. Ruggieri, F. Turini, Discrimination-Aware Data Mining, in: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’08, Association for Computing Machinery, New York, NY, USA, 2008, pp. 560–568. URL: https://doi.org/10.1145/1401890.1401959. doi:10.1145/1401890.1401959.
[9] T. Calders, S. Verwer, Three Naive Bayes Approaches for Discrimination-Free Classification, Data Min. Knowl. Discov. 21 (2010) 277–292. URL: https://doi.org/10.1007/s10618-010-0190-x. doi:10.1007/s10618-010-0190-x.
[10] T. Kamishima, S. Akaho, H. Asoh, J. Sakuma, Fairness-Aware Classifier with Prejudice Remover Regularizer, in: P. A. Flach, T. De Bie, N. Cristianini (Eds.), Machine Learning and Knowledge Discovery in Databases, Springer Berlin Heidelberg, Berlin, Heidelberg, 2012, pp. 35–50.
[11] C. Dwork, M. Hardt, T. Pitassi, O. Reingold, R. Zemel, Fairness through awareness, in: ITCS 2012 - Innovations in Theoretical Computer Science Conference, ACM Press, New York, New York, USA, 2012, pp. 214–226. URL: http://dl.acm.org/citation.cfm?doid=2090236.2090255. doi:10.1145/2090236.2090255.
[12] R. Zemel, Y. Wu, K. Swersky, T. Pitassi, C. Dwork, Learning Fair Representations, in: S. Dasgupta, D. McAllester (Eds.), Proceedings of the 30th International Conference on Machine Learning, volume 28 of Proceedings of Machine Learning Research, PMLR, Atlanta, Georgia, USA, 2013, pp. 325–333. URL: https://proceedings.mlr.press/v28/zemel13.html.
[13] J. Angwin, J. Larson, S. Mattu, L. Kirchner, Machine bias, ProPublica, May 23 (2016) 139–159. URL: https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing.
[14] C. Elkan, The Foundations of Cost-Sensitive Learning, in: Proceedings of the 17th International Joint Conference on Artificial Intelligence - Volume 2, IJCAI’01, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2001, pp. 973–978.
[15] R. Binns, On the apparent conflict between individual and group fairness, in: FAT* 2020 - Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, Association for Computing Machinery, Inc, New York, NY, USA, 2020, pp. 514–524. URL: https://dl.acm.org/doi/10.1145/3351095.3372864. doi:10.1145/3351095.3372864.
[16] M. J. Kusner, J. Loftus, C. Russell, R. Silva, Counterfactual Fairness, in: I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, R. Garnett (Eds.), Advances in Neural Information Processing Systems, volume 30, Curran Associates, Inc., 2017. URL: https://proceedings.neurips.cc/paper/2017/file/a486cd07e4ac3d270571622f4f316ec5-Paper.pdf.
[17] N. Grgić-Hlača, M. B. Zafar, K. P. Gummadi, A. Weller, Beyond Distributive Fairness in Algorithmic Decision Making: Feature Selection for Procedurally Fair Learning, Proceedings of the AAAI Conference on Artificial Intelligence 32 (2018). URL: https://ojs.aaai.org/index.php/AAAI/article/view/11296.
[18] A. Chouldechova, Fair Prediction with Disparate Impact: A Study of Bias in Recidivism Prediction Instruments, Big Data 5 (2017) 153–163. doi:10.1089/big.2016.0047.
[19] S. Corbett-Davies, S. Goel, The measure and mismeasure of fairness: A critical review of fair machine learning, arXiv preprint arXiv:1808.00023 (2018).
[20] M. B. Zafar, I. Valera, M. G. Rodriguez, K. P. Gummadi, Learning fair classifiers, arXiv preprint arXiv:1507.05259 1 (2015).
[21] A. K. Menon, R. C. Williamson, The cost of fairness in binary classification, in: S. A. Friedler, C. Wilson (Eds.), Proceedings of the 1st Conference on Fairness, Accountability and Transparency, volume 81 of Proceedings of Machine Learning Research, PMLR, New York, NY, USA, 2018, pp. 107–118. URL: http://proceedings.mlr.press/v81/menon18a.html.
[22] R. Berk, H. Heidari, S. Jabbari, M. Kearns, A. Roth, Fairness in Criminal Justice Risk Assessments: The State of the Art, Sociological Methods & Research 50 (2021) 3–44. URL: https://doi.org/10.1177/0049124118782533. doi:10.1177/0049124118782533.
[23] High-Level Expert Group on Artificial Intelligence, Ethics guidelines for trustworthy AI, Technical Report, European Commission, Brussels, 2019. URL: https://op.europa.eu/en/publication-detail/-/publication/d3988569-0434-11ea-8c1f-01aa75ed71a1.
[24] European Commission, Proposal for a Regulation of the European Parliament and of the Council laying down harmonised rules on AI and amending certain union legislative acts, Technical Report, Brussels, 2021. URL: https://eur-lex.europa.eu/legal-content/EN/TXT/?qid=1623335154975&uri=CELEX%3A52021PC0206.
[25] S. Athey, The Impact of Machine Learning on Economics, in: A. Agrawal, J. Gans, A. Goldfarb (Eds.), The Economics of Artificial Intelligence: An Agenda, University of Chicago Press, 2019, pp. 507–552.
[26] S. Athey, Beyond prediction: Using big data for policy problems, Science 355 (2017) 483–485. doi:10.1126/science.aal4321.
[27] N. S. Petersen, An Expected Utility Model for “Optimal” Selection, Journal of Educational Statistics 1 (1976) 333–358. URL: https://doi.org/10.3102/10769986001004333. doi:10.3102/10769986001004333.
[28] R. L. Sawyer, N. S. Cole, J. W. L. Cole, Utilities and the Issue of Fairness in a Decision Theoretic Model for Selection, Journal of Educational Measurement 13 (1976) 59–76. URL: http://www.jstor.org/stable/1434493.
[29] I. van de Poel, L. Royakkers, Ethics, Technology and Engineering: An Introduction, Wiley-Blackwell, 2011.