<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>How to Explain in XAI? - Investigating Explanation Protocols in Decision Support Systems</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Caterina Fregosi</string-name>
        </contrib>
      </contrib-group>
      <abstract>
        <p>The objective of this proposal is to bridge the gap between Deep Learning (DL) and System Dynamics (SD) by developing an interpretable neural system dynamics framework. While DL excels at learning complex models and making accurate predictions, it lacks interpretability and causal reliability. Traditional SD approaches, on the other hand, provide transparency and causal insights but are limited in scalability and require extensive domain knowledge. To overcome these limitations, this project introduces a Neural System Dynamics pipeline, integrating Concept-Based Interpretability, Mechanistic Interpretability, and Causal Machine Learning. This framework combines the predictive power of DL with the interpretability of traditional SD models, resulting in both causal reliability and scalability. The efficacy of the proposed pipeline will be validated through real-world applications of the EU-funded AutoMoTIF project, which is focused on autonomous multimodal transportation systems. The long-term goal is to collect actionable insights that support the integration of explainability and safety in autonomous systems.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Context and Motivation</title>
      <sec id="sec-1-1">
        <title>1.1. Technical and Ethical Risks in Human-AI Collaboration</title>
        <p>
          This quiet moment of reliance illustrates a growing concern in the integration of AI into high-stakes
decision-making: not only how to improve immediate performance, but also how to preserve human
agency and responsibility. Clinical Decision Support Systems (CDSSs) that incorporate AI models
are increasingly capable of supporting clinicians with diagnostic and treatment decisions, offering
the promise of improved efficiency and accuracy. However, these systems also introduce a complex
set of risks. At a functional level, there is the danger of inappropriate reliance—users may over-trust
AI and accept its outputs uncritically, or they may reject beneficial AI support. A central challenge
is to foster appropriate reliance, the ability to accept AI-generated recommendations when they are
correct, and to reject them when they are misleading [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]. Beyond performance-related concerns lie
more structural risks that pertain to the quality and ethics of human-AI collaboration itself. As AI
systems become increasingly authoritative in their presentation, they may gradually erode clinicians’
sense of responsibility, reflective reasoning, and agency [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]. Over time, this can contribute to the
erosion of diagnostic capabilities (i.e., deskilling) [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] and inhibit the development of these skills. Clinicians
may lose the motivation or even the capacity to critically engage with complex cases, particularly
when AI systems offer confident recommendations with limited scope for deliberation. These risks are
particularly salient in medical contexts, where accountability and diagnostic reasoning are integral to
both professional identity and clinical accuracy.
        </p>
      </sec>
      <sec id="sec-1-2">
        <title>1.2. Explainable AI: Promise and Limitations</title>
        <p>
          In response to these unintended consequences of human-AI interaction, Explainable AI (XAI) has
emerged as a prominent research direction, aiming to make certain aspects of a system more
understandable and interpretable [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ].
        </p>
        <p>
          In this context, we adopt a specific view of explanations as “the (characterizing) output of a XAI system,
that is, the output of any computational system aimed at making AI-generated advice more
understandable, appropriable and exploitable by their intended users” [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. In other words, XAI explanations are
not necessarily "explanations" in the everyday sense, but rather clarifying outputs about the system’s
primary recommendation. Nevertheless, growing empirical evidence shows that explanations alone
do not necessarily improve decision quality [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]; in fact, explanations can sometimes mislead users in
various ways. In a phenomenon known as the white-box paradox, users may develop overtrust in AI
systems simply because the system’s internal logic appears intelligible or well-justified, regardless
of the actual correctness of the advice. In such cases, the more we see (or think we see) inside
the system, the more we trust it, and persuasive but flawed explanations can mask underlying errors,
leading users to accept incorrect recommendations. In our own research, we identified a complementary
effect (the XAI Halo Effect) in which misleading explanations degraded decision quality even when
the AI recommendations themselves were accurate [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. These effects challenge the assumption that
increased explainability inherently leads to improved outcomes in hybrid human-AI collaboration.
        </p>
        <p>
          Moreover, current systems’ interaction protocols often assume that users will interpret advice and
explanatory cues as intended by designers. In practice, however, users bring their own mental models,
domain expertise, and cognitive habits. Thus, what matters is not only the content of the explanation, but
also the interaction protocol, that is, the way system outputs are framed, structured, and integrated into
decision-making workflows. In this light, explanations must be not only intelligible but also behaviorally
effective, that is, capable of promoting appropriate reliance, mitigating cognitive biases, and supporting
collaborative decision-making. One commonly used explanatory strategy is the display of confidence
scores, which serve as indicators of the system’s certainty in its own recommendations. However, our
recent study, currently under review, demonstrates that such cues are far from neutral [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. When
well-calibrated, confidence scores significantly improved decision accuracy and reduced both automation bias
and conservatism bias. In contrast, miscalibrated confidence—for instance, expressing high certainty in
incorrect outputs—led users to follow flawed advice or disregard accurate recommendations, ultimately
impairing performance. These findings underscore a broader point: effective explainability is not solely
a matter of transparency, but of alignment—between what is presented, how it is interpreted, and
the behaviors it elicits. Designing explanations that are user-tailored and that support collaborative
reasoning, rather than passive acceptance, is essential to minimizing unintended consequences.
        </p>
      </sec>
      <sec id="sec-1-3">
        <title>1.3. Rethinking Human-AI Interaction: Beyond the Oracular Model</title>
        <p>These concerns call for a shift from explanation as output to explanation as interaction design, able to
foster critical reflection and to preserve user agency and responsibility.</p>
        <p>
          Nevertheless, most CDSSs are still designed according to an “Oracular” model, in which a single
recommendation is presented with the intent to persuade rather than promote dialogue [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]. This
approach can lead users to accept AI-generated recommendations passively. As a response, researchers
have begun to explore alternative explanation paradigms [
          <xref ref-type="bibr" rid="ref10 ref11 ref9">10, 9, 11</xref>
          ], designed to prompt users to actively
engage with the decision space, weighing competing arguments rather than passively accepting AI
outputs. Building on this line of research, we propose a protocol, Judicial, which draws inspiration from
deliberative practices in judicial settings. Rather than offering a definitive recommendation, the system
presents two contrastive explanations, each supporting a different decision outcome. The aim is to
re-engage the user’s discriminative capacities by requiring an active choice between alternatives. This
approach motivates my broader research objective: not merely to improve diagnostic accuracy, but to
investigate which forms of explanation, and which aspects of their content, structure, and framing, influence users’
engagement with AI systems. Specifically, I aim to examine how different explanatory strategies affect
critical reflection, perceived agency, and responsibility, and how these effects may vary depending on
individual expertise, cognitive style, and decision-making preferences.
        </p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2. Background and Related Work</title>
      <p>
        There is a growing consensus that Human-AI collaboration should not be conceived in terms of a
single optimal design, but rather as a socio-technical intervention whose impact depends on how the
system is embedded into users’ workflows and cognitive routines. Our recent work, currently under
review, on protocol-driven design in hybrid intelligence, emphasizes that different interaction protocols
influence not only decision accuracy, but also patterns of user reliance, the degree of dependence on AI,
and the long-term risk of professional deskilling [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. This shift reflects a broader move away from
one-size-fits-all systems that deliver authoritative recommendations, toward adaptive and reflective
interaction models tailored to users’ roles, expertise, and decision-making needs [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. In this view,
supporting agency and responsibility in AI-supported decision-making requires not just technical
robustness or explainability, but also careful attention to the mode and timing of human-AI interaction.
      </p>
      <sec id="sec-2-1">
        <title>2.1. Agency</title>
        <p>
          Human agency, understood as the subjective experience of authorship and responsibility over one’s
decisions, involves more than the capacity to choose—it entails experiencing those choices as
authentically one’s own. In decision-making contexts supported by AI, this sense of experiential ownership
can be undermined. Specifically, Explainable AI systems that provide a single recommendation and
a persuasive explanation in an authoritative or overly confident manner may lead users to gradually
disengage from the decision-making process, thereby reducing opportunities for reflective thinking [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ].
        </p>
        <p>
          Over time, this erosion of agency may give rise to a phenomenon akin to deresponsibilization, in
which users come to view themselves less as accountable decision-makers and more as passive executors
of algorithmic outputs [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ]. These risks are particularly pronounced in high-stakes domains such as
clinical diagnostics, where active user engagement with AI outputs is critical for ensuring both safety
and the quality of decision-making. Addressing these challenges requires the development of interaction
models that go beyond mere explanation and instead support the user’s role as a responsible agent
within the decision-making loop.
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Frictional AI and the Judicial Protocol</title>
        <p>
          While decision support systems have demonstrated clear benefits in improving diagnostic accuracy, their
widespread adoption has raised significant concerns regarding uncritical reliance [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ] and the gradual
erosion of sense of agency, responsibility, and diagnostic skills [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]. These risks are particularly evident
in systems that follow an oracular interaction model, in which a single, confident recommendation is
presented [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]. In response, recent research has emphasized the importance of interaction protocols that
actively support users’ cognitive engagement and promote reflective, accountable decision-making.
Such protocols include mechanisms that prompt users to interrogate both their own reasoning and
the system’s recommendations. Building on the concept of cognitive friction, some design strategies
deliberately introduce “positive friction” to foster human reflection and reduce uncritical acceptance [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ].
        </p>
        <p>To this end, our research group (MUDI Lab, University of Milano-Bicocca) introduced the term
Frictional AI as an umbrella concept for a variety of methods aimed at encouraging reflection in human-AI
decision making processes by intentionally introducing cognitive friction.</p>
        <p>
          One such method is the Judicial protocol, inspired by the reasoning practices of judges. Rather than
providing a single recommendation, this protocol presents multiple, contrasting diagnostic alternatives,
each supported by persuasive, yet potentially fallible, justifications. These justifications are not
necessarily factual; instead, they are constructed to argue persuasively in favor of one classification over
another. For this reason, we refer to them as perorative explanations. This interaction model aligns
with approaches such as Evaluative AI [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ], the Reflection Machine [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ], and the concept of Dissenting
Explanations [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]. These approaches support a model of human-AI collaboration in which users remain
central, accountable agents, a design shift increasingly recognized in both HCI and XAI communities
as critical for preserving human agency, professional judgment, and long-term competence. These
theoretical perspectives collectively inform the present research project, which operationalizes the
principles of Frictional AI through a novel interaction protocol. The following sections present the
Judicial paradigm and its empirical evaluation in a clinical decision-making context.
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Research Questions, Hypotheses and Objectives</title>
      <p>To assess the practical implications of the Judicial approach, this study investigates the impact of three
different explanation conditions in a diagnostic task: (1) a Traditional protocol, offering a single
recommendation with explanation; (2) an Alternative Judicial protocol, where a single AI system
provides two contrasting diagnoses with justifications; and (3) an Antagonist Judicial protocol, in
which two separate systems each advocate for a different diagnosis with distinct explanatory arguments.
By stratifying the sample of medical students and experienced clinicians based on their level of clinical
expertise, the study will evaluate the impact of the three explanation protocols on users’ diagnostic
accuracy, confidence, sense of agency, sense of responsibility, perceived influence, and perceived utility
of the system. By exploring these variables, the study seeks to contribute to the design of DSS that
better support critical decision-making in clinical practice.</p>
      <p>
        However, before evaluating these dimensions, it is essential to determine whether the cognitive demands
imposed by the Judicial protocol might detrimentally affect diagnostic performance. Indeed, in
high-stakes contexts such as healthcare, accuracy remains a priority. For this reason, our exploratory
study [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ], detailed in the Preliminary Results section, was designed to first assess whether the Judicial
protocol could preserve or even enhance diagnostic accuracy, rather than impair it due to increased
cognitive load, yielding encouraging preliminary findings.
      </p>
      <sec id="sec-3-1">
        <title>3.1. Research questions</title>
        <p>We aim to address the following research questions:</p>
        <p>1. RQ1: Are there significant differences in users’ perceived sense of agency and responsibility
between the Traditional DSS and the Judicial explanation protocols?
2. RQ2: Are there significant differences in diagnostic accuracy and user confidence between the
Traditional DSS and the Judicial explanation format?
3. RQ3: Are there significant differences in diagnostic accuracy and user confidence between the
Antagonist and Alternative Judicial conditions?
4. RQ4: Are there significant differences in the perceived influence and perceived utility of the AI
system between the Traditional DSS and the Judicial explanation formats?
5. RQ5: Are there significant differences in the perceived utility, influence, sense of agency and
sense of responsibility between the Antagonist and Alternative Judicial protocols?</p>
        <p>To further explore potential moderating effects, the sample will be stratified by level of clinical experience,
allowing us to assess whether these variables vary as a function of users’ expertise.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Hypotheses</title>
        <p>We formulate the following hypotheses:
• H1: Participants are expected to report a stronger sense of agency and responsibility in response
to Judicial protocol cases compared to Traditional DSS cases. The Judicial formats are specifically
designed to preserve agency by promoting critical thinking and encouraging users to adjudicate
between diagnostic alternatives. In contrast, the Traditional protocol positions the user as a
passive recipient of advice, which may diminish the perceived sense of agency.
• H2: We hypothesize that Traditional protocol cases will be associated with higher diagnostic
accuracy, due to the presentation of a single, high-confidence recommendation that minimizes
ambiguity and cognitive load. However, Judicial protocol cases (Alternative or Antagonist)—by
encouraging deliberation and critical evaluation through contrastive explanations—may foster
higher post-decision confidence, particularly when users reach a conclusion after resolving
conflicting information.
• H3: A significant difference in diagnostic accuracy and user confidence is expected between the
Antagonist and Alternative Judicial protocols. Receiving two opposing recommendations from
distinct systems (Antagonist) may be perceived as more epistemically legitimate, as it mirrors
inter-expert disagreement. In contrast, encountering contradictory outputs from a single system
(Alternative) may elicit skepticism regarding the system’s internal coherence. These differences
in perceived epistemic authority may influence both user confidence and diagnostic decisions,
depending on how participants interpret the source and implications of the disagreement.
• H4: Perceived influence and utility of the AI system on users’ decisions are expected to differ
significantly between the Traditional and Judicial protocols. The Traditional format presents a
single recommendation with a persuasive explanation, which may lead to a stronger subjective
sense of system influence—users may feel they are merely following the AI’s advice. In contrast,
Judicial protocols do not provide direct advice, potentially diminishing the perception of being
directly “guided” by the system. However, the Judicial formats may be perceived as more useful
by users who value deliberation and autonomy, as they offer richer informational input and
promote active diagnostic reasoning. Thus, while influence may be lower, perceived utility could
be equal or even higher, particularly among experienced users.
• H5: Perceived utility, influence, agency, and responsibility are expected to differ between the
Alternative and Antagonist Judicial protocols. When conflicting explanations are presented by
a single system (Alternative), users may question the system’s credibility, potentially reducing
perceived utility and trust. In contrast, when divergent recommendations are attributed to distinct
systems (Antagonist), the disagreement may be interpreted as reflective of epistemic complexity
rather than internal inconsistency, potentially enhancing user engagement and the perceived
value of the AI as a decision support tool.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Methods</title>
      <p>To test these hypotheses, a between- and within-subjects experimental design will be implemented (see
Figure 1).</p>
      <p>Participants Participants—comprising clinicians and medical students from the University of Milano
Statale—will be randomly assigned to one of two experimental conditions: Alternative Judicial or
Antagonist Judicial. We aim to recruit a minimum of 50 participants, all of whom will participate on
a voluntary basis. Responses to the online survey will be collected anonymously.</p>
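<p>The balanced between-subject assignment described above can be sketched as follows; this is an illustrative sketch only (the function name, seed, and exact balancing scheme are assumptions, not the study’s actual implementation):</p>

```python
import random

def assign_conditions(participant_ids, seed=42):
    """Randomly assign participants to the two between-subject conditions,
    keeping group sizes balanced (illustrative sketch only)."""
    rng = random.Random(seed)
    ids = list(participant_ids)
    rng.shuffle(ids)
    conditions = ("Alternative Judicial", "Antagonist Judicial")
    # Alternate through the shuffled list so the two groups
    # differ in size by at most one participant.
    return {pid: conditions[i % 2] for i, pid in enumerate(ids)}

# With the minimum target of 50 participants, each group receives 25.
assignment = assign_conditions(range(50))
```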
      <p>Procedure During the experimental session, each participant will evaluate a total of 10 clinical cases,
presented in a fixed, predefined order. These cases comprise 5 supported by the Traditional DSS, which
provides a single diagnostic recommendation accompanied by an explanatory rationale, and 5 supported
by either the Alternative Judicial or Antagonist Judicial format, depending on the participant’s group
assignment. In the Alternative Judicial group, a single DSS presents two alternative diagnoses, each
with a corresponding explanation. In the Antagonist Judicial group, two distinct DSSs each advocate for
a different diagnosis, accompanied by their respective justifications.</p>
      <p>After each AI-assisted decision, participants will be asked to rate their confidence in the decision,
their perception of the case’s complexity, and the utility of the support system. This experimental
design enables both within-subject comparisons (Traditional vs. Judicial cases) and between-subject
comparisons (Alternative vs. Antagonist Judicial conditions). All AI recommendations used in the
study are simulated to ensure consistency across participants and to maintain full control over the
diagnostic content. Clinical cases are adapted from the New England Journal of Medicine (https://www.nejm.org/) and include
symptomatology, medical history, and laboratory results, together forming realistic complex diagnostic
scenarios. Each case includes multiple differential diagnoses, with the correct diagnosis explicitly
identified in the clinical discussion sections of the source articles. For the Judicial conditions, the correct
and alternative diagnoses were selected with the assistance of a clinician among the study’s authors,
who identified the most plausible alternative diagnosis for each case based on clinical reasoning. In
all conditions, AI explanations are designed to be persuasive and are grounded in the clinical features
of the respective case. To simulate a realistic yet imperfect decision support system, the AI accuracy
in Traditional cases was fixed at 80%, with incorrect recommendations introduced in a controlled
and systematic manner. Participants interact with the AI via an online interface developed using
LimeSurvey (https://www.limesurvey.org/it). For each clinical case, they first view the AI output and then choose between two
diagnostic options, followed by a self-assessed confidence rating on a 4-point ordinal scale. After each
condition (i.e., after completing the 5 Traditional cases and again after the 5 Judicial cases), participants
also evaluate the sense of agency (AGC), responsibility (RESP), and the influence of the AI system
on their decision (INF) (see Figure 1). These measures aim to assess participants’ sense of agency
under each explanation condition. After the experimental session, participants will complete two
standardized psychometric instruments to assess stable individual differences relevant to
decision-making. Specifically, they will complete the Italian version of the Short Big Five Inventory (BFI) to
measure personality traits according to the five-factor model, and the Italian adaptation of the Decision
Styles Scale (DSS), which focuses on the rational decision style and the intuitive decision style. These
measures will enable the exploration of potential interactions between dispositional traits and responses
to different AI explanation protocols.</p>
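<p>The per-participant case schedule described in this procedure can be sketched programmatically. The following is a hypothetical illustration of how the fixed 80% AI accuracy in the Traditional block and the two-diagnosis Judicial block could be encoded; all names, the dictionary fields, and the position of the incorrect case are assumptions, not the study’s actual implementation:</p>

```python
def build_case_schedule(condition, incorrect_traditional_case=3):
    """Build the fixed 10-case schedule for one participant (illustrative sketch).

    The 5 Traditional cases embed exactly one incorrect AI recommendation,
    giving the 4/5 = 80% AI accuracy fixed in the study design; the remaining
    5 cases use the participant's assigned Judicial format, which presents
    two contrasting diagnoses instead of a single recommendation.
    """
    schedule = []
    for i in range(5):  # Traditional block: single recommendation + rationale
        schedule.append({
            "case": i + 1,
            "protocol": "Traditional",
            "ai_correct": i != incorrect_traditional_case,
        })
    for i in range(5):  # Judicial block: Alternative or Antagonist format
        schedule.append({
            "case": i + 6,
            "protocol": condition,
            # Two contrastive outputs, each with its own perorative explanation.
            "outputs": ["diagnosis_A", "diagnosis_B"],
        })
    return schedule

schedule = build_case_schedule("Antagonist Judicial")
```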
      <p>Sense of Agency: Participants’ sense of agency is assessed through a set of post-decision items measuring
perceived influence, ownership over decisions, and sense of responsibility in each decision-making task.
These items are inspired by constructs from the Sense of Agency literature and were adapted to fit the
clinical decision-making context.</p>
      <p>Accuracy: Diagnostic accuracy is coded dichotomously as correct (1) or incorrect (0), based on the
reference diagnosis reported in the source medical literature.</p>
      <p>Confidence, Utility: These variables are measured using self-reported 4-point ordinal scales, designed
to minimize central tendency bias.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Preliminary Results</title>
      <p>
        We conducted an exploratory user study in a controlled setting [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]. Sixteen medical professionals (8
spine surgeons and 8 musculoskeletal radiologists) were recruited to participate. The task involved
assessing 18 vertebral X-ray images for the presence of fractures. Participants first examined each image
independently and recorded their diagnosis and confidence. In a second phase, they were exposed to the
Judicial AI protocol, which presented two activation maps ofering contrasting, perorative explanations
supporting either a positive (fracture) or negative (no fracture) classification. After reviewing these
maps, participants could revise their diagnosis and re-rate their confidence. This human-first protocol
ensured that participants’ initial judgments were uninfluenced by AI outputs. The non-inferiority test
showed that overall diagnostic accuracy with Judicial AI support was not inferior to unaided
decision-making (Z = 3.94, p &lt; .001), and even improved significantly among experienced clinicians (Glass’s Delta
= 0.99, 95% CI [0.50, 1.47], p = .045). Regarding confidence, the Wilcoxon signed-rank test confirmed
non-inferiority overall (p &lt; .001), and confidence gains were more pronounced in complex cases (Cliff’s
Delta = 0.296, p = .034). Less experienced users exhibited a modest improvement in confidence but no
significant accuracy benefit. The perceived utility of Judicial support was rated positively by 57% of
participants, significantly above chance (Binomial test, p = .015), with no significant differences across
expertise levels (Mann-Whitney, p = .32). These preliminary findings support the feasibility of the
Judicial AI in clinical diagnostic tasks. The protocol maintained or improved diagnostic performance
overall, with particularly strong effects for experienced clinicians and in more complex cases, contexts
where interpretive support is arguably most needed. The system also enhanced users’ confidence,
suggesting its value as a reflective support tool. The modest impact on less experienced users may be
due to the increased cognitive load associated with interpreting two explanations. This suggests the
need for adaptive support tailored to users’ interaction needs.
      </p>
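<p>For reference, the effect sizes and the exact binomial test reported above follow standard definitions, which can be computed with the Python standard library. This is a generic sketch of those definitions, not the study’s actual analysis code:</p>

```python
import math

def cliffs_delta(xs, ys):
    """Cliff's delta: P(x > y) - P(x < y) over all pairs, a nonparametric
    effect size ranging from -1 to 1."""
    gt = sum(1 for x in xs for y in ys if x > y)
    lt = sum(1 for x in xs for y in ys if x < y)
    return (gt - lt) / (len(xs) * len(ys))

def glass_delta(treatment, control):
    """Glass's delta: mean difference standardized by the control
    (baseline) group's sample standard deviation."""
    mc = sum(control) / len(control)
    mt = sum(treatment) / len(treatment)
    sd = math.sqrt(sum((c - mc) ** 2 for c in control) / (len(control) - 1))
    return (mt - mc) / sd

def binomial_p_above_chance(successes, n, p=0.5):
    """One-sided exact binomial test: P(X >= successes) under Binomial(n, p),
    e.g. for testing whether positive utility ratings exceed chance."""
    return sum(math.comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(successes, n + 1))
```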
    </sec>
    <sec id="sec-6">
      <title>6. Next Steps / Future Work</title>
      <p>The next phase of this research involves the implementation of the experimental protocol described
above. This study will assess whether contrastive explanation formats, as instantiated in the Judicial
paradigm, can enhance not only diagnostic accuracy but also users’ sense of agency and responsibility.
A central objective is to determine whether the deliberative framing promoted by Judicial protocols
meaningfully reinforces users’ experiential ownership of the decision-making process. In addition, the
study will examine whether the effects of different explanation protocols are moderated by clinical
experience, as suggested by preliminary findings. This will help determine whether more experienced
clinicians are better equipped to engage with contrastive information, while less experienced users may
benefit from more structured or guided support. Such insights are critical for designing adaptive decision
support systems that tailor explanatory strategies to users’ expertise, cognitive style, and interaction
needs. A further line of research concerns the role of framing within the Judicial paradigm itself. By
comparing the Antagonist and Alternative versions, where conflicting explanations are attributed to two
distinct systems or to a single source, respectively, we aim to clarify whether the origin of disagreement
influences perceptions of trust, responsibility, and reliance. If framing effects are substantial, future
research will need to investigate how users interpret epistemic authority under diferent configurations,
and how these interpretations shape collaborative dynamics. Taken together, these research directions
contribute to a broader goal: the development of human–AI interaction protocols that move beyond
accuracy and transparency to actively support accountable, user-centered collaboration.</p>
    </sec>
    <sec id="sec-7">
      <title>Declaration on Generative AI</title>
      <p>The author has not employed any Generative AI tools.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>J. D.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. A.</given-names>
            <surname>See</surname>
          </string-name>
          ,
          <article-title>Trust in automation: Designing for appropriate reliance</article-title>
          ,
          <source>Human factors 46</source>
          (
          <year>2004</year>
          )
          <fpage>50</fpage>
          -
          <lpage>80</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>J. W.</given-names>
            <surname>Moore</surname>
          </string-name>
          ,
          <article-title>What is the sense of agency and why does it matter?</article-title>
          ,
          <source>Frontiers in psychology 7</source>
          (
          <year>2016</year>
          )
          <fpage>1272</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Y. S. J.</given-names>
            <surname>Aquino</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W. A.</given-names>
            <surname>Rogers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Braunack-Mayer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Frazer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. T.</given-names>
            <surname>Win</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Houssami</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Degeling</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Semsarian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. M.</given-names>
            <surname>Carter</surname>
          </string-name>
          ,
          <article-title>Utopia versus dystopia: professional perspectives on the impact of healthcare artificial intelligence on clinical roles and skills</article-title>
          ,
          <source>International Journal of Medical Informatics</source>
          <volume>169</volume>
          (
          <year>2023</year>
          )
          <fpage>104903</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>L.</given-names>
            <surname>Longo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Brcic</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Cabitza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Choi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Confalonieri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. Del</given-names>
            <surname>Ser</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Guidotti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Hayashi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Herrera</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Holzinger</surname>
          </string-name>
          , et al.,
          <article-title>Explainable artificial intelligence (XAI) 2.0: A manifesto of open challenges and interdisciplinary research directions</article-title>
          ,
          <source>Information Fusion</source>
          (
          <year>2024</year>
          )
          <fpage>102301</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>F.</given-names>
            <surname>Cabitza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Campagner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Malgieri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Natali</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Schneeberger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Stoeger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Holzinger</surname>
          </string-name>
          ,
          <article-title>Quod erat demonstrandum? - towards a typology of the concept of explanation for the design of explainable ai</article-title>
          ,
          <source>Expert systems with Applications</source>
          <volume>213</volume>
          (
          <year>2023</year>
          )
          <fpage>118888</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>G.</given-names>
            <surname>Bansal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Fok</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Nushi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Kamar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. T.</given-names>
            <surname>Ribeiro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Weld</surname>
          </string-name>
          ,
          <article-title>Does the whole exceed its parts? the effect of ai explanations on complementary team performance</article-title>
          ,
          <source>in: Proceedings of the 2021 CHI conference on human factors in computing systems</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>16</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>F.</given-names>
            <surname>Cabitza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Fregosi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Campagner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Natali</surname>
          </string-name>
          ,
          <article-title>Explanations considered harmful: The impact of misleading explanations on accuracy in hybrid human-ai decision making</article-title>
          ,
          <source>in: World Conference on Explainable Artificial Intelligence</source>
          , Springer,
          <year>2024</year>
          , pp.
          <fpage>255</fpage>
          -
          <lpage>269</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>C.</given-names>
            <surname>Fregosi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Vicente</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Campagner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Cabitza</surname>
          </string-name>
          ,
          <article-title>Too sure for our own good: A user study on ai confidence and human reliance</article-title>
          . Submitted to the 41st
          <source>Conference on Uncertainty in Artificial Intelligence (UAI)</source>
          ,
          <year>2025</year>
          . Manuscript under review.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>T.</given-names>
            <surname>Miller</surname>
          </string-name>
          ,
          <article-title>Explainable ai is dead, long live explainable ai! hypothesis-driven decision support using evaluative ai</article-title>
          ,
          <source>in: Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency</source>
          ,
          <year>2023</year>
          , pp.
          <fpage>333</fpage>
          -
          <lpage>342</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>P.</given-names>
            <surname>Haselager</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Schraffenberger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Thill</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Fischer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Lanillos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Van De Groes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Van Hoof</surname>
          </string-name>
          ,
          <article-title>Reflection machines: Supporting effective human oversight over medical decision support systems</article-title>
          ,
          <source>Cambridge Quarterly of Healthcare Ethics</source>
          <volume>33</volume>
          (
          <year>2024</year>
          )
          <fpage>380</fpage>
          -
          <lpage>389</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>O.</given-names>
            <surname>Reingold</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. H.</given-names>
            <surname>Shen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Talati</surname>
          </string-name>
          ,
          <article-title>Dissenting explanations: Leveraging disagreement to reduce model overreliance</article-title>
          ,
          <source>in: Proceedings of the AAAI Conference on Artificial Intelligence</source>
          , volume
          <volume>38</volume>
          ,
          <year>2024</year>
          , pp.
          <fpage>21537</fpage>
          -
          <lpage>21544</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>A.</given-names>
            <surname>Campagner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Fregosi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Cabitza</surname>
          </string-name>
          ,
          <article-title>Five degrees of separation: Investigating the unexpected potential of displaced human-ai collaboration protocols for apter ai support</article-title>
          ,
          <year>2025</year>
          . Submitted to the 28th CSCW. Manuscript under review.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Q.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Steinfeld</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Rosé</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zimmerman</surname>
          </string-name>
          ,
          <article-title>Re-examining whether, why, and how human-ai interaction is uniquely difficult to design</article-title>
          ,
          <source>in: Proceedings of the 2020 CHI conference on human factors in computing systems</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>13</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>R.</given-names>
            <surname>Legaspi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Konishi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Wada</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Kobayashi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Naruse</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Ishikawa</surname>
          </string-name>
          ,
          <article-title>The sense of agency in human-ai interactions</article-title>
          ,
          <source>Knowledge-Based Systems</source>
          <volume>286</volume>
          (
          <year>2024</year>
          )
          <fpage>111298</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>C.</given-names>
            <surname>Sureau</surname>
          </string-name>
          ,
          <article-title>Medical deresponsibilization</article-title>
          ,
          <source>Journal of assisted reproduction and genetics 12</source>
          (
          <year>1995</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Buçinca</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. B.</given-names>
            <surname>Malaya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. Z.</given-names>
            <surname>Gajos</surname>
          </string-name>
          ,
          <article-title>To trust or to think: cognitive forcing functions can reduce overreliance on ai in ai-assisted decision-making</article-title>
          ,
          <source>Proceedings of the ACM on Human-Computer Interaction</source>
          <volume>5</volume>
          (
          <year>2021</year>
          )
          <fpage>1</fpage>
          -
          <lpage>21</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Schmidt</surname>
          </string-name>
          ,
          <article-title>Exploring a behavioral model of “positive friction” in human-ai interaction</article-title>
          ,
          <source>in: International Conference on Human-Computer Interaction</source>
          , Springer,
          <year>2024</year>
          , pp.
          <fpage>3</fpage>
          -
          <lpage>22</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>F.</given-names>
            <surname>Cabitza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Famiglini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Fregosi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Pe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Parimbelli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. A.</given-names>
            <surname>La Maida</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Gallazzi</surname>
          </string-name>
          ,
          <article-title>From oracular to judicial: Enhancing clinical decision making through contrasting explanations and a novel interaction protocol</article-title>
          ,
          <source>in: Proceedings of the 30th International Conference on Intelligent User Interfaces</source>
          ,
          <year>2025</year>
          , pp.
          <fpage>745</fpage>
          -
          <lpage>754</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>