A Frictional Design Approach: Towards Judicial AI and its Possible Applications

Caterina Fregosi1,*, Federico Cabitza1,2
1 Università degli Studi di Milano-Bicocca, Milan, Italy
2 IRCCS Ospedale Galeazzi-Sant'Ambrogio, Milan, Italy

Abstract
Decision support systems (DSS) are increasingly being integrated into high-stakes domains such as healthcare, law, and finance, where critical decisions have significant consequences. Traditional DSS often provide a single, clear-cut recommendation, which can lead to automation bias and diminish the user's sense of agency; as a result, there is growing concern about over-reliance on these systems and the potential for deskilling among users. The knowledge gap we aim to address is the development of decision support systems that effectively encourage critical reflection and maintain user engagement and responsibility in decision-making processes. In this workshop contribution, we report on the development of Judicial AI, a novel approach inspired by Frictional AI. Judicial AI diverges from traditional DSS by offering multiple, contrasting explanations that support different potential outcomes. This design encourages users to engage in deeper cognitive processing, thereby promoting critical reflection, reducing automation bias, and preserving the user's sense of agency. This ongoing study employs a two-arm experiment to investigate the effects of this approach in the context of content classification tasks, comparing it with a traditional protocol. We expect that the Judicial protocol could not only mitigate automation bias but also safeguard users' sense of agency and promote long-term skill retention.

Keywords
Frictional AI, Judicial AI, Human-AI decision-making process, eXplainable AI (XAI)

HHAI-WS 2024: Workshops at the Third International Conference on Hybrid Human-Artificial Intelligence (HHAI), June 10–14, 2024, Malmö, Sweden
* Corresponding author.
Email: c.fregosi@campus.unimib.it (C. Fregosi); federico.cabitza@unimib.it (F. Cabitza)
ORCID: 0009-0004-7626-8131 (C. Fregosi); 0000-0002-4065-3415 (F. Cabitza)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1. Introduction

In domains where decision-makers face high-stakes scenarios with significant consequences, Decision Support Systems (DSS) are increasingly implemented. It is essential to support users not only in identifying the optimal decision but also in effectively managing the decision-making process. This approach aims to mitigate the detrimental effects of interaction, such as overconfidence or underconfidence [1], while fostering appropriate reliance on the decision support system [2]. This involves providing users with support to critically assess both their own reasoning processes and the AI system's recommendations, a feature often absent in oracular decision support systems [3]. Such systems tend to offer clear-cut answers, thereby fostering an uncritical reliance on the system. Cooper (1999) introduced the concept of cognitive friction, defined as "the resistance encountered by a human intellect when it engages with a complex system of rules" [4].
In the technology domain, a design inspired by the concept of friction intentionally incorporates what Cox et al. (2016) describe as design frictions, "points of difficulty encountered during interaction with technology" [5], or what Cabitza et al. (2019) term programmed inefficiencies [6]. Contrary to trends aiming to create seamless interactions that promote speed and efficiency, a "positive friction" strategy deliberately integrates these elements to improve user engagement and reflection [7]. The term Frictional AI was introduced by Cabitza et al. (2024) as an umbrella term for a variety of methods aimed at encouraging reflection in human-AI decision-making processes by introducing cognitive friction [8].

In the domain of Human-Computer Interaction (HCI), the design of decision support systems (DSS) that offer multiple, well-argued explanations for different hypotheses, rather than simply presenting the (allegedly) correct answer, presents two significant advantages that address cognitive and ethical concerns.

Firstly, such a system is designed to mitigate the risk of automation bias, a well-documented phenomenon where users over-rely on automated systems [9], often accepting their outputs uncritically even when they are wrong [10]. By presenting multiple plausible explanations, each backing up a different option, the DSS compels the user to engage in deeper cognitive processing, comparing and contrasting the arguments put forth for each hypothesis. This engagement naturally limits automation bias, as users are less likely to defer uncritically to a single system-provided solution. Even when one explanation appears more convincing, the presence of alternative perspectives serves as a safeguard, keeping the user critical and reducing (but not eradicating) the chances of endorsing a false or irrelevant conclusion.

Secondly, offering multiple explanations helps address a less explored but equally important issue in HCI: the potential loss of agency in human-AI interaction [11, 12], especially when the system is renowned for its accuracy and reliability. When users are presented with only one "right" answer, they may gradually lose their sense of control and responsibility over decision-making processes [13]. This phenomenon, which can be assimilated to the concept of deresponsibilization [14], reflects the risk that users may come to perceive themselves as mere executors of the system's decisions rather than as active, responsible agents who remain accountable for the final decision (as they still are). Over time, this can lead to long-term consequences, including loss of motivation, loss of skill, and hampered learning [8]. By fostering an environment where the user must evaluate and decide between multiple, well-supported hypotheses, the DSS preserves and even enhances the user's sense of agency. The user remains an active participant in the decision-making process, fully responsible for the final choice, which in turn helps maintain and develop their cognitive skills.

To this end, we have designed an experiment introducing a Judicial system, one of the protocols associated with Frictional AI, in which an AI system provides contrasting plausible explanations that each support a different decision outcome [15, 8]. This resonates with the "agonistic machine learning" models [16] and with the Evaluative AI paradigm introduced by Miller (2023) [3] for explainable decision support.
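For illustration only, the sketch below shows one way the contrasting explanations of a Judicial-style DSS could be obtained from a generative language model. The judicial_explanations function, the generate callable standing in for the text-generation backend, and the prompt wording are all assumptions made for this sketch and do not describe the system actually used in the study.

```python
# Illustrative sketch only: obtaining two contrasting arguments from a generative
# language model. The `generate` callable stands in for any text-generation backend
# (e.g., a hosted LLM or a local model); its name, the prompt wording, and the
# function below are assumptions, not the study's implementation.
from typing import Callable, Dict

def judicial_explanations(sentence: str,
                          generate: Callable[[str], str]) -> Dict[str, str]:
    """Return one argument per possible outcome of the classification task."""
    outcomes = {
        "hate_speech": "argue that the sentence below SHOULD be classified as hate speech",
        "not_hate_speech": "argue that the sentence below should NOT be classified as hate speech",
    }
    explanations = {}
    for label, instruction in outcomes.items():
        prompt = (
            "You are assisting a content moderator. In three or four sentences, "
            f"{instruction}, pointing to concrete cues in the text.\n\n"
            f"Sentence: {sentence!r}"
        )
        explanations[label] = generate(prompt)
    return explanations
```

The two arguments produced in this way would then be shown side by side to the user, as described in the Methods section below.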
The novelty introduced by the Judicial AI system is that, taking inspiration from the judicial domain, it proposes distinct explanations to support each of the two possible outcomes. In this project we investigate the textual generative setting of the Judicial protocol for sentence classification and its effects in terms of accuracy, confidence, reliance, perceived responsibility, and sense of agency for the decisions made.

Figure 1: A schematic view of the study design. Participants are divided into two groups: one group receives AI contrastive explanations (Judicial protocol), while the other group receives AI advice (Traditional protocol). Both groups make an initial decision and indicate their initial confidence level before interacting with the AI. After receiving the AI input, participants make a final decision and report their final confidence. The study also assesses participants' perceived responsibility, sense of agency, and AI influence.

2. Methods

As illustrated in Figure 1, we will conduct a two-arm experiment to examine how different interaction designs influence decision-making in content moderation tasks. Participants in each arm will be presented with the same set of 30 sentences, previously identified as complex by state-of-the-art hate speech detection systems and sourced from a social media platform. The two arms of the study will employ distinct interaction protocols, which will be randomly assigned to participants.

• Judicial: Participants will be presented with a sentence and asked to classify it as either hate speech or not hate speech. Additionally, they will be required to rate their confidence in their decision using a four-level ordinal scale, ranging from "not at all confident" to "completely confident". Following this initial judgment, participants will be shown the sentence again, accompanied by two arguments generated by the Judicial AI system and presented in colored boxes: one in a pastel red box on the left, advocating for the classification of the sentence as hate speech, and another in a pastel blue box on the right, presenting the opposite argument. Participants will then be asked to provide their final decision and confidence rating using the same scale as before. This process will be systematically repeated for all 30 cases. To minimize potential order bias in the decision-making process, half of the participants assigned to this protocol will see the boxes in the reverse arrangement: the pastel blue boxes, representing arguments for classifying the content as not hate speech, will be positioned on the left, while the pastel red boxes, supporting the classification of the content as hate speech, will be on the right.

• Traditional: The initial screen for each case will be identical to the Judicial arm. Participants will first be asked to make an initial decision on whether the sentence constitutes hate speech or not, and to rate their confidence in this decision. After this, the system will present its classification of the sentence (hate speech or not), and participants will be asked to either confirm or reject this classification, providing their confidence level in this final decision.

Pre-test and post-test questionnaires will be administered to assess participants' trust in the AI system. Additionally, the post-test questionnaire will evaluate the sense of agency and responsibility participants perceive regarding the decisions they made during the study.
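As a purely illustrative sketch of the interaction flow just described (not the actual experimental software), the snippet below shows how protocol assignment, the counterbalanced box layout, and the per-case decision loop could be organized. The ui object, the two intermediate labels of the confidence scale, and all identifiers are assumptions made for illustration.

```python
# Illustrative sketch of the two-arm design and per-case decision loop; the `ui`
# object, the intermediate confidence labels, and all identifiers are assumptions
# made for illustration, not the study's actual experimental software.
import random

PROTOCOLS = ("judicial", "traditional")
# The paper specifies only the endpoints of the four-level scale; the two
# intermediate labels below are placeholders.
CONFIDENCE_SCALE = ("not at all confident", "slightly confident",
                    "fairly confident", "completely confident")

def assign_condition(participant_id: int) -> dict:
    """Randomly assign a protocol; in the Judicial arm, counterbalance box order."""
    protocol = random.choice(PROTOCOLS)
    condition = {"participant": participant_id, "protocol": protocol}
    if protocol == "judicial":
        # Counterbalance which argument appears in the left box across participants.
        condition["left_box"] = "hate_speech" if participant_id % 2 == 0 else "not_hate_speech"
    return condition

def run_case(sentence: str, condition: dict, ui) -> dict:
    """One of the 30 cases: initial decision, AI input, final decision."""
    initial, conf_initial = ui.ask_classification(sentence, CONFIDENCE_SCALE)
    if condition["protocol"] == "judicial":
        ui.show_contrasting_arguments(sentence, left=condition["left_box"])
    else:
        ui.show_ai_classification(sentence)
    final, conf_final = ui.ask_classification(sentence, CONFIDENCE_SCALE)
    return {"initial": initial, "conf_initial": conf_initial,
            "final": final, "conf_final": conf_final}
```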
We expect to address the following research questions:

R1: Is the Traditional protocol associated with higher accuracy compared to the Judicial protocol?
R2: Are respondents of the Judicial protocol more confident than Traditional ones in their own final decision?
R3: Is there a significant difference in reliance behavior between the Judicial and Traditional protocols?
R4: Do Judicial respondents feel a higher sense of agency and responsibility regarding their decisions compared to Traditional respondents?

To address these research questions, a series of analyses will be conducted on the groups of users subjected to the Traditional and Judicial interaction protocols (using the tool available at https://mudilab.github.io/dss-quality-assessment/), as outlined in Table 1.

3. Expected results

A DSS that offers multiple plausible explanations not only aligns with the principles of user-centered design but also plays a crucial role in maintaining critical engagement, preserving user agency, and ensuring the retention of decision-making skills, thereby addressing both automation bias and the risk of deskilling in human-AI interactions. By encouraging deeper and more critical reflection, this design reduces the risk of fostering undue user trust, which can contribute to the White Box Paradox [17]. However, it is important to note that the protocol could still inadvertently introduce bias if one explanation seems more convincing, even if it is incorrect. The adoption of the Judicial protocol in human-AI interaction is expected to have a significant impact on the quality of the decisions made by users and, in particular, on their perceived agency and control over their choices. We therefore believe that Judicial AI could represent a promising direction in the study of improved decision support processes, potentially increasing both the effectiveness of these systems and user satisfaction. Further research focused on refining Judicial protocols and examining their long-term effects could have significant implications for the design and implementation of future decision support systems.

Table 1
Research questions (RQs) and corresponding planned analyses to evaluate the impact of the Judicial and Traditional protocols on user accuracy, confidence, reliance, and sense of agency in decision-making processes.

RQ1
  Research question: Is the Traditional protocol associated with higher accuracy compared to the Judicial protocol?
  Planned analysis: Compare initial and final accuracy levels of users in both the Traditional and Judicial protocol groups to assess if the absence of direct advice in the Judicial protocol affects final accuracy.

RQ2
  Research question: Are respondents of the Judicial protocol more confident than Traditional ones in their own final decision?
  Planned analysis: Analyze and compare the final confidence levels and the differences between initial and final confidence for both groups to determine if the Judicial protocol increases confidence in the final decision.

RQ3
  Research question: Is there a significant difference in reliance between the Judicial and Traditional protocols?
  Planned analysis: Compare reliance by analyzing how users in both groups rely on correct or incorrect advice/explanations. Examine cases where the initial decision differs from the final decision to identify reliance patterns.

RQ4
  Research question: Do Judicial respondents feel a higher sense of agency and responsibility regarding their decisions compared to Traditional respondents?
  Planned analysis: Analyze the final questionnaire responses to compare the perceived levels of AI influence, responsibility, and sense of agency between the two groups.
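As a concrete illustration of the comparisons planned in Table 1, the following is a minimal sketch of how per-group measures for RQ1 to RQ3 could be computed from per-case records. The field names and the operationalization of reliance as decision switches are assumptions made for this sketch, not a description of the analysis tool cited above.

```python
# Illustrative sketch of per-group measures behind RQ1-RQ3, computed from per-case
# records; the field names and the reliance operationalization (decision switches)
# are assumptions, not the cited analysis tool.
from statistics import mean

def group_measures(records):
    """records: list of dicts with keys 'initial', 'final', 'ground_truth',
    'conf_initial', 'conf_final' (confidence coded 1-4)."""
    measures = {
        "initial_accuracy": mean(r["initial"] == r["ground_truth"] for r in records),
        "final_accuracy": mean(r["final"] == r["ground_truth"] for r in records),
        "mean_confidence_shift": mean(r["conf_final"] - r["conf_initial"] for r in records),
    }
    # Reliance pattern: among cases where the final decision differs from the
    # initial one, count switches toward vs. away from the correct label.
    switches = [r for r in records if r["final"] != r["initial"]]
    to_correct = sum(r["final"] == r["ground_truth"] for r in switches)
    measures.update({
        "switches": len(switches),
        "switches_to_correct": to_correct,
        "switches_to_incorrect": len(switches) - to_correct,
    })
    return measures
```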
Acknowledgments

C. Fregosi and F. Cabitza acknowledge funding support provided by the Italian project PRIN PNRR 2022 InXAID - Interaction with eXplainable Artificial Intelligence in (medical) Decision making, CUP: H53D23008090001, funded by the European Union - Next Generation EU.

References

[1] T. Kliegr, Š. Bahník, J. Fürnkranz, A review of possible effects of cognitive biases on interpretation of rule-based machine learning models, Artificial Intelligence 295 (2021) 103458.
[2] F. Cabitza, A. Campagner, R. Angius, C. Natali, C. Reverberi, AI shall have no dominion: On how to measure technology dominance in AI-supported human decision-making, in: Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems, 2023, pp. 1–20.
[3] T. Miller, Explainable AI is dead, long live explainable AI! Hypothesis-driven decision support using evaluative AI, in: Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency, 2023, pp. 333–342.
[4] A. Cooper, The Inmates Are Running the Asylum, Springer, 1999.
[5] A. L. Cox, S. J. Gould, M. E. Cecchinato, I. Iacovides, I. Renfree, Design frictions for mindful interactions: The case for microboundaries, in: Proceedings of the 2016 CHI Conference Extended Abstracts on Human Factors in Computing Systems, 2016, pp. 1389–1397.
[6] F. Cabitza, A. Campagner, D. Ciucci, A. Seveso, Programmed inefficiencies in DSS-supported human decision making, in: Modeling Decisions for Artificial Intelligence: 16th International Conference, MDAI 2019, Milan, Italy, September 4–6, 2019, Proceedings 16, Springer, 2019, pp. 201–212.
[7] Z. Chen, R. Schmidt, Exploring a behavioral model of "positive friction" in human-AI interaction, arXiv preprint arXiv:2402.09683 (2024).
[8] F. Cabitza, C. Natali, L. Famiglini, A. Campagner, V. Caccavella, E. Gallazzi, Never tell me the odds: Investigating pro-hoc explanations in medical decision making, Artificial Intelligence in Medicine (2024) 102819.
[9] Z. Buçinca, M. B. Malaya, K. Z. Gajos, To trust or to think: Cognitive forcing functions can reduce overreliance on AI in AI-assisted decision-making, Proceedings of the ACM on Human-Computer Interaction 5 (2021) 1–21.
[10] M. Vered, T. Livni, P. D. L. Howe, T. Miller, L. Sonenberg, The effects of explanations on automation bias, Artificial Intelligence 322 (2023) 103952.
[11] H. Limerick, J. W. Moore, D. Coyle, Empirical evidence for a diminished sense of agency in speech interfaces, in: Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems, 2015, pp. 3967–3970.
[12] A. Galsgaard, T. Doorschodt, A.-L. Holten, F. C. Müller, M. P. Boesen, M. Maas, Artificial intelligence and multidisciplinary team meetings; a communication challenge for radiologists' sense of agency and position as spider in a web?, European Journal of Radiology 155 (2022) 110231.
[13] R. Legaspi, W. Xu, T. Konishi, S. Wada, N. Kobayashi, Y. Naruse, Y. Ishikawa, The sense of agency in human–AI interactions, Knowledge-Based Systems 286 (2024) 111298.
[14] C. Sureau, Medical deresponsibilization, Journal of Assisted Reproduction and Genetics 12 (1995) 552–558.
[15] C. Natali, et al., Per aspera ad astra, or flourishing via friction: Stimulating cognitive activation by design through frictional decision support systems, in: CEUR Workshop Proceedings, volume 3481, 2023, pp. 15–19.
[16] M. Hildebrandt, Privacy as protection of the incomputable self: From agnostic to agonistic machine learning, Theoretical Inquiries in Law 20 (2019) 83–121.
[17] F. Cabitza, A. Campagner, L. Ronzio, M. Cameli, G. E. Mandoli, M. C. Pastore, L. M. Sconfienza, D. Folgado, M. Barandas, H. Gamboa, Rams, hounds and white boxes: Investigating human–AI collaboration protocols in medical diagnosis, Artificial Intelligence in Medicine 138 (2023) 102506.