<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Agentic CBR in Action: Empowering Loan Approvals Through Interactive, Counterfactual Explanations</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Pedram Salimi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nirmalie Wiratunga</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>David Corsar</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Robert Gordon University</institution>
          ,
          <addr-line>Garthdee House, Garthdee Rd, Aberdeen AB10 7AQ</addr-line>
        </aff>
      </contrib-group>
      <abstract>
        <p>Large Language Models (LLMs) have demonstrated impressive conversational capabilities, yet their susceptibility to hallucinations and inconsistent recommendations poses significant risks in high-stakes domains such as finance. This paper presents an interactive chatbot for loan application guidance that leverages a case-based reasoning (CBR) approach to generate actionable counterfactual explanations within an agentic framework. Our system employs a supervisor agent, built using the LangGraph framework, to orchestrate four specialised agents: a classifier agent that provides an initial loan prediction, a causally-aware counterfactual explanation agent that proposes minimal yet feasible modifications to reverse an unfavourable decision, a Feature Actionability Taxonomy (FAT) agent that updates user-specific immutability constraints based on feedback, and a template-based natural language generation (NLG) agent that transforms counterfactual suggestions into clear, user-friendly explanations. A key strength of our design is the automated feedback loop: when users indicate that certain suggestions are unworkable, the FAT agent revises the constraints and instructs the counterfactual generation agent to produce a refined explanation. We detail the system architecture and workflow and outline an experimental plan that compares our full agentic chatbot to ablated variants and an LLM-Only Baseline. Finally, we outline a planned user study to evaluate how controlled reasoning affects trust in high-stakes lending.</p>
      </abstract>
      <kwd-group>
        <kwd>Conversational AI</kwd>
        <kwd>Counterfactual explanations</kwd>
        <kwd>Agentic workflow</kwd>
        <kwd>CBR</kwd>
        <kwd>Large Language Models</kwd>
        <kwd>Hallucinations</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Large language models (LLMs) such as GPT-4 have demonstrated remarkable conversational abilities,
making them attractive for use as automated assistants in decision-making domains. However, in
high-stakes applications like financial loan assessments, the unreliability of LLM outputs, especially their
tendency to generate hallucinations, i.e., plausible-sounding but incorrect or unfounded information,
poses a significant risk [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Users seeking loan advice or explanations for an AI-driven loan decision
require accurate and trustworthy information; any incorrect guidance could lead to poor financial
decisions or loss of trust [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. This creates a need for techniques to ground LLM responses in verifiable
logic and data.
      </p>
      <p>
        Explainable AI (XAI) is crucial in lending and other domains to justify algorithmic decisions and
provide users with recourse options [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Among XAI methods, counterfactual explanations have gained
prominence: they answer “What if ?” scenarios by describing how a small change in the input features
could alter the decision outcome. For example, an applicant might be told, “If your annual income were
$5,000 higher, then your loan would be approved.” Such explanations not only reveal key factors behind a
decision but also inform the user of actionable steps to potentially achieve a desired outcome in the
future [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
      <p>
        Despite their usefulness, counterfactual explanations in practice face challenges regarding feasibility
and personalisation [
        <xref ref-type="bibr" rid="ref5 ref6">5, 6</xref>
        ]. Many methods assume any input feature can be freely changed, which is
not true for real users, as some features (like age or credit history) cannot be altered, and others can
be changed only indirectly or with great effort. Recent work has begun to formalise these constraints
via a feature actionability taxonomy [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], and to incorporate causal relationships to avoid suggesting
implausible changes [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. Another challenge is how to present and interact with such explanations: a
static list of numerical feature changes can be overwhelming or confusing to users without additional
context or an opportunity to ask questions [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. In other words, an explanation interface should ideally
guide the user through understanding and possibly acting on the recommendations, in a sensitive and
interactive manner.
      </p>
      <p>In this paper, we address these challenges by combining case-based reasoning (CBR) with LLM
capabilities to create a conversational agent for loan application guidance. CBR, an approach where
reasoning is grounded in specific instances or “cases”, naturally complements LLMs by providing concrete
examples or analogies that the generative model can use to formulate explanations. By retrieving and
adapting similar past cases (or generating counterfactual cases), a CBR component can supply factual
anchors that reduce the risk of hallucination from the LLM. Meanwhile, the LLM enables a flexible
dialogue with the user, clarifying their context and preferences and presenting explanations in fluent
natural language.</p>
      <p>We present a novel interactive conversational AI system that implements this CBR-LLM synergy in the
context of loan application decisions. In this synergy, CBR anchors explanations in concrete cases while
the LLM contextualises those cases into conversational, user-specific advice, drawing on the strengths
of both symbolic retrieval and generative language. Our system is built with an agentic workflow
using a framework called LangGraph, which allows the LLM to act as a controller orchestrating
multiple specialised modules. The supervisor loan agent integrates four key agents: (1) an agent with a
trained loan approval classifier that provides an initial prediction, (2) a causally-aware instance-based
counterfactual explanation discovery agent that finds how the unsuccessful input case could change to
yield a positive outcome, (3) a Feature Actionability Taxonomy (FAT) agent that encodes user-specific
immutability and ethical constraints on feature changes, and (4) a template-based natural language
generation (NLG) agent that guides the LLM to convert explanations into a proper format, considering
fairness and actionability, before presenting them as user-friendly dialogue responses. By decomposing the task
among these modules, we use the strengths of each (e.g., reliable numeric computation and causal logic
in the structured modules, and language expressiveness in the LLM) while mitigating the weaknesses
of an unconstrained LLM.</p>
      <p>In summary, our contributions are as follows:
• We introduce a novel interactive CBR-enhanced LLM agentic framework that ensures
modularity: the supervisor agent interacts with dedicated agents for prediction, explanation, updating
user constraints, and NLG, improving the handling of numerical computations, counterfactual
generation, and ethical constraints compared to a monolithic LLM approach.
• We integrate a Feature Actionability Taxonomy into the explanation process, enabling the system
to respect user-specific immutable features and generate personalised, feasible, and respectful
suggestions.
• We demonstrate how an LLM can serve as a dialogue orchestrator that parses user intent, invokes
specialised tools via LangGraph, and tailors the final wording, thereby contributing reasoning,
tool-use, and adaptive NLG beyond simple template filling.
• We outline an evaluation strategy comparing our full system against an LLM-Only Baseline and
an ablated version, to quantify the benefits in terms of explanation quality, user trust, and the
reduction of hallucinated or infeasible recommendations.</p>
      <p>To our knowledge, this is one of the first works to systematically combine case-based reasoning
with large language models in an agentic setting, aiming to enhance explainability and reliability in
high-stakes decision support. Next, we discuss related literature before detailing the methodology of
our system and experimental setup, followed by our planned user study and conclusions.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        Conversational XAI and Interactive CBR. Researchers have increasingly recognised the need for
making AI explanations more conversational and interactive. Traditional explanation interfaces (e.g.,
static texts or visualisations) do not allow users to seek clarification or explore “what-if” scenarios in
depth. Conversational XAI attempts to fill this gap by using dialogue to deliver and refine explanations.
For instance, Wijekoon et al. [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] propose a CBR-driven interactive XAI approach where users can
iteratively query an explanation system and receive case-based clarifications. Their work indicates
that users may benefit from explanations that reference similar prior cases or counterfactual examples
in a dialogue, which can enhance understanding. Our chatbot follows this paradigm, enabling
back-and-forth interaction about a model’s decision and potential changes. In contrast to purely scripted
dialogues, however, our approach leverages an LLM for flexible natural conversation management,
guided by a structured workflow to keep the dialogue grounded.
      </p>
      <p>
        Counterfactual explanations and actionable recourse. Counterfactual explanations have become
a cornerstone of interpretable machine learning [
        <xref ref-type="bibr" rid="ref11 ref3">3, 11</xref>
        ]. They provide individuals subject to an automated
decision with a description of how things could be different to obtain a desirable outcome. Numerous
algorithms exist to generate counterfactuals. Some optimise feature perturbations to achieve minimal
changes while flipping the model’s prediction [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]; others search within a database of past instances
for a nearest neighbour with a different outcome [
        <xref ref-type="bibr" rid="ref13 ref14">13, 14</xref>
        ]. Ensuring these counterfactuals are not only
technically valid but also actionable and realistic has been a focus of recent work. Ustun et al. [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]
introduced the notion of actionable recourse, emphasising that recommendations should correspond to
feasible interventions a user can actually perform. Poyiadzi et al. [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] similarly proposed FACE, which
finds feasible counterfactuals that lie on a manifold of plausible data. The idea of incorporating causal
constraints is explored by Mahajan et al. [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], who argue that counterfactuals should not violate known
causal relationships in the domain (e.g., one should not suggest altering a feature in a way that is
causally impossible). Our system builds on these principles by generating counterfactual explanations
that are informed by real instances and by filtering or adjusting them according to feasibility constraints
(drawing on ideas from [
        <xref ref-type="bibr" rid="ref5 ref6 ref8">5, 6, 8</xref>
        ]). Importantly, we embed this capability within an interactive dialogue,
whereas most prior methods assume a single-shot explanation delivery.
      </p>
      <p>
        LLM augmentation and hallucination mitigation. The advent of powerful LLMs has led to
explorations of how they can be integrated with existing AI models or knowledge sources to improve
reliability. A key concern is the phenomenon of hallucination in generative models, where the model
outputs incorrect information with high confidence [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. One line of work to mitigate this is
retrieval-augmented generation [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ], in which the language model is provided with pertinent documents or data
retrieved from a knowledge base, ensuring it has factual grounding for its responses. Another line is
enabling LLMs to use external tools or calculators for tasks outside their core language abilities [
        <xref ref-type="bibr" rid="ref17 ref18">17, 18</xref>
        ].
For example, the ReAct framework [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] and Toolformer [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] show that by interleaving reasoning
steps with tool invocations, an LLM can solve problems more accurately and avoid making up facts
that could be computed. Similarly, Shen et al. [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] present HuggingGPT, where an LLM orchestrates
various AI models (for vision, speech, etc.) to tackle complex multi-modal tasks. These approaches
inspire our design: we use the LLM not as an isolated decision-maker, but as a coordinator that queries
a dedicated classifier for the actual loan decision and a reasoning module for valid counterfactuals,
thereby anchoring the conversation in truthful, model-verified information. In essence, our approach
can be seen as applying the tool-augmented LLM paradigm specifically for XAI: the LLM “tool” here
is a case-based reasoner that supplies concrete instance-based counterfactual explanations to ensure
fidelity and robustness in the dialogue.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Methodology</title>
      <sec id="sec-3-1">
        <title>3.1. Agentic System Workflow with LangGraph</title>
        <p>For our interactive chatbot, we developed a workflow using the LangGraph framework, which allows us
to explicitly define the sequence and branching of actions the agent (driven by an LLM) should perform.
The architecture is illustrated in Figure 1. At a high level, the system involves a supervisor agent (which
is powered by an LLM) interacting with external modules through a structured graph of operations.
This design ensures that the supervisor agent consults the right agent or tool at the right time, rather
than relying on the LLM to do everything in one prompt.</p>
        <p>At the heart of this system lies a large language model (LLM) supervisor, responsible for more than
just template polishing. It acts as (i) a conversational router that understands free-form user utterances
and maps them onto the appropriate branch of the LangGraph, (ii) an orchestrator that conditionally
calls external tools (classifier, counterfactual module, FAT checker, template NLG) and stitches their
outputs together, and (iii) an adaptive explainer that revises template text to match the user’s tone,
education level, and follow-up questions while strictly preserving the factual content returned by the
downstream agents. These three capabilities are essential: without them the workflow could neither
decide which agent to invoke next nor maintain a coherent, context-aware dialogue.</p>
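As a concrete illustration, the routing role can be sketched as a few keyword rules. This is purely illustrative: the actual supervisor uses the LLM itself for intent understanding, and the branch names below are our own, not the system's.

```python
# Hypothetical sketch of the supervisor's "conversational router" role:
# free-form user utterances are mapped onto branches of the workflow graph.
# The keyword rules and branch names are illustrative assumptions only.

def route(utterance: str) -> str:
    text = utterance.lower()
    if "cannot" in text or "can't" in text:
        return "fat_update"       # user rejects a suggestion -> FAT agent
    if "how can i" in text or "improve" in text:
        return "counterfactual"   # recourse question -> counterfactual agent
    if "why" in text:
        return "explanation"      # follow-up question -> explanation branch
    return "classify"             # default: classify the application details

branch = route("How can I get approved?")
# branch == "counterfactual"
```

In the real system, this decision is made by the LLM supervisor rather than by fixed keywords, which is what allows it to handle paraphrases and mixed intents.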
        <p>
          The interaction begins with the user either providing their loan application details or asking a
question about their application status. The supervisor agent orchestrates the following steps:
1. Classification: The agent first calls a classifier Agent, passing in the user’s application features
(e.g., income, loan amount, credit score, etc.). This agent has a classifier as a tool (a neural network
with four linear layers) that has been trained on the German Credit dataset, which contains 32,581
instances with 11 features (9 mutable), to predict approval or rejection of the loan. It returns a
prediction (approved/rejected).
2. Counterfactual Generation: If the loan is predicted to be rejected (or if the user asks “How
can I get approved?” or “How can I improve my loan application?”), then the supervisor agent
triggers the counterfactual explanation agent. This agent uses an instance-based approach to
identify one or more plausible modifications to the user’s input that would result in an approval
(see algorithm 1). In our case-based approach, the module searches a database of past successful
applications for a case similar to the user’s to find the nearest decision boundary crossing [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ].
The output is a set of candidate counterfactual changes (e.g., increase income by $5,000; reduce
loan amount by $10,000).
3. Actionability Check and Constraints Update: The agent then passes the proposed
counterfactual changes through the Feature Actionability Taxonomy Agent (FAT Agent). Each suggested
feature change is assessed for feasibility: this agent categorises the feature as mutable or
immutable (and if immutable, whether it’s sensitive). Sensitive features are protected attributes
such as age, gender, or ethnicity, which must not influence creditworthiness, in order to ensure
equal opportunity for everyone. Based on this, the module may filter out or annotate certain
suggestions. For example, if a counterfactual generator naively suggested “be 5 years older”,
the FAT Agent would label age as immutable (and likely sensitive) and prevent this suggestion
from being presented as an action for the user. For changes that are actionable but indirect (e.g.,
increase credit score, which typically requires other actions), FAT can mark them to be phrased
differently. Another important task of the FAT agent is updating user constraints, enabling more
personalised suggestions. This aligns with preference-based case-based reasoning, which explicitly
models user constraints during retrieval [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ].
4. Explanation Synthesis (NLG): Finally, the supervisor agent invokes the NLG template agent.
        </p>
        <p>This agent takes the refined set of counterfactual recommendations and constructs a
conversational explanation. It uses predefined sentence templates that incorporate the user’s data, the
model’s decision, and the suggested changes. The template choice and phrasing are informed
by the FAT categories for each feature to ensure the explanation is polite, understandable, and
appropriately hedged (especially for sensitive factors). The LLM may also be used at this stage in
a fill-in-the-blank manner to smooth out the text or adjust it to the user’s tone (e.g., formal vs.
informal), without altering the factual content.
5. User Interaction: The compiled explanation is presented to the user by the supervisor agent. The
conversation can continue: the user might ask follow-up questions (e.g., “Why is my credit score
considered low?” or “I can’t change X, what else can I do?”). The supervisor agent handles these
by invoking different branches of the agentic workflow as needed. For instance, if the user asks
“I cannot decrease my loan amount by $5,000”, then the FAT agent will update the Feature Actionability
Taxonomy for that feature in the user’s profile and loop back to the counterfactual generation
agent to provide another solution tailored to the user’s specific situation.</p>
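The five steps above can be sketched in plain Python as follows. All agent functions, thresholds, and values are hypothetical stubs; the real system implements this routing as a LangGraph graph with the trained classifier and counterfactual modules as tools.

```python
# Minimal plain-Python sketch of the supervisor loop (steps 1-5 above).
# Every function here is an illustrative stub, not the real agent.

def classifier_agent(app):
    # Step 1: initial prediction (stub rule standing in for the trained model).
    return "approved" if app["income"] >= 5000 else "rejected"

def counterfactual_agent(app, immutable):
    # Step 2: propose a change that would flip the decision (stub).
    return [] if "income" in immutable else [("income", 5000)]

def fat_agent(suggestions, immutable):
    # Step 3: drop any suggestion touching an immutable feature.
    return [(f, v) for f, v in suggestions if f not in immutable]

def nlg_agent(decision, suggestions):
    # Step 4: turn the refined suggestions into a templated explanation.
    if decision == "approved":
        return "Congratulations, your loan is approved."
    if not suggestions:
        return "Given the constraints, no feasible changes can overturn the decision."
    steps = "; ".join(f"bring {f} to at least {v}" for f, v in suggestions)
    return f"Your loan was declined. To improve your chances: {steps}."

def supervisor(app, immutable=frozenset()):
    # Step 5 re-enters this loop whenever the user adds a new constraint.
    decision = classifier_agent(app)
    suggestions = []
    if decision == "rejected":
        suggestions = fat_agent(counterfactual_agent(app, immutable), immutable)
    return nlg_agent(decision, suggestions)

msg = supervisor({"income": 4200})
# msg explains the rejection and suggests raising income
```

The value of the graph formulation over this flat loop is that each call site becomes a node with conditional edges, so branches can be added or replaced without touching the others.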
        <p>
          Algorithm 1 Iterative Adaptation for Counterfactual Generation (adapted from [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ])
Input: Query instance x, nearest unlike neighbour n, immutable constraint set I
Output: Counterfactual instance x′, or FAIL if none found
x′ ← x
        </p>
        <p>foreach feature f in priority order do
    if f ∉ I and x′[f] ≠ n[f] then
        x′[f] ← n[f]
        if Classifier(x′) = approved then
            return x′
return FAIL</p>
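Algorithm 1 can be rendered as runnable Python roughly as follows; the toy classifier, feature ordering, and values are illustrative stand-ins for the trained model and the NUN retrieved by the agent.

```python
# Illustrative implementation of Algorithm 1 (iterative adaptation).
# `classifier`, the priority order, and all values are assumptions for
# demonstration; the real agent wraps the trained neural classifier.

def iterative_adaptation(query, nun, immutable, priority, classifier):
    """Copy NUN feature values into the query one feature at a time,
    skipping immutable features, until the classifier flips to 'approved'."""
    candidate = dict(query)                      # x' <- x
    for feature in priority:                     # foreach feature in priority order
        if feature in immutable:                 # f in I: user cannot change it
            continue
        if candidate[feature] == nun[feature]:   # already matches the NUN
            continue
        candidate[feature] = nun[feature]        # x'[f] <- n[f]
        if classifier(candidate) == "approved":
            return candidate                     # minimal successful adaptation
    return None                                  # FAIL: no counterfactual found

# Toy classifier: approve if income is high enough.
toy_classifier = lambda x: "approved" if x["income"] >= 5000 else "rejected"

query = {"income": 4200, "loan_amount": 15000, "age": 25}
nun = {"income": 5200, "loan_amount": 10000, "age": 30}
cf = iterative_adaptation(query, nun, immutable={"age"},
                          priority=["income", "loan_amount"],
                          classifier=toy_classifier)
# cf changes only income (4200 -> 5200), which flips the decision
```

Note that the loop stops at the first successful adaptation, which is what makes the returned counterfactual minimal with respect to the priority order.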
        <p>This agentic loop continues until the user’s queries are resolved. By explicitly encoding the workflow
graph, we ensure that at each turn the supervisor agent knows which agent outputs to incorporate,
maintaining consistency and factuality across the dialogue. One benefit of this modularisation is that
the LLM never has to internally compute or assume the outcome of changes; instead, it always queries
the actual model, thereby eliminating a potential source of error or hallucination. Similarly, knowledge
about what is changeable is codified in FAT rather than left to the LLM’s general knowledge (which
might be incomplete or biased regarding what a user can or should change).</p>
        <p>Our use of LangGraph diferentiates from simpler pipeline approaches by allowing conditional
branching and iterative flows. For example, based on whether the model outcome is positive or negative,
diferent branches in the graph are followed; based on whether a user-proposed change is feasible or not,
the agent can decide how to respond (perhaps by invoking the FAT logic again or by politely explaining
limitations). The graph structure thus ofers flexibility in the conversation flow beyond a fixed sequence,
which is crucial for interactive dialogue. This modular architecture means improvements or updates to
one component (say, a better classifier or a more nuanced FAT categorisation) can be integrated without
retraining the entire system.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Communication between Agents</title>
        <p>In our agentic framework, every arrow in the figure 1 represents a directed communication act which is
an intentional message exchange that drives the system through its reasoning, acting, and interacting
phases. Each specialised agent, including the supervisor, is configured with its own system prompt.
These prompts guide the agents on how to interact with one another in a cost-efficient and
purpose-driven manner. For instance, when the supervisor agent receives a user’s request, it issues a directive to
the classifier agent. The classifier, informed by its system prompt, responds with a brief result (e.g.,
“accepted” or “rejected”) to efficiently update the supervisor’s internal belief state.</p>
        <p>
          This structured communication mirrors the principles of speech act theory, where the expressive act
(such as a request or assertion) not only transmits information but also performs an action. Here, the
supervisor’s directive (a “request” act) and the classifier’s corresponding response (an “inform” act) are
essential steps in the internal virtuous cycle [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ]. This cycle is designed to integrate reasoning (through
chain-of-thought [
          <xref ref-type="bibr" rid="ref23">23</xref>
          ]), acting (by executing targeted decisions), and interacting (via feedback loops
that enable self-reflection and data augmentation), all while reducing computational cost and ensuring
clarity in each module’s role.
        </p>
        <p>Together, these communication acts are guided by predefined system prompts to ensure that the
agentic LLM operates as a cohesive, efficient, and adaptive system.</p>
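As an illustration, such a request/inform exchange might be modelled as follows; the Message structure and the stub reply are our own constructs, not the framework's API.

```python
# Sketch of a directed communication act between agents. The performative
# labels follow the speech act framing above; everything else is assumed.
from dataclasses import dataclass

@dataclass
class Message:
    sender: str
    receiver: str
    performative: str   # e.g. "request" or "inform"
    content: str

def classifier_reply(request: Message) -> Message:
    # The classifier answers a supervisor "request" with a terse "inform",
    # keeping token cost low while updating the supervisor's belief state.
    decision = "rejected"   # stub outcome for illustration
    return Message("classifier", request.sender, "inform", decision)

req = Message("supervisor", "classifier", "request", "classify application #42")
rep = classifier_reply(req)
# rep carries an "inform" act back to the supervisor
```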
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Causally-Aware Counterfactual Explanation Agent</title>
        <p>
          The counterfactual explanation agent addresses the question “What minimal changes would flip the
decision to a positive outcome?” by integrating instance-based search with causal reasoning. At a
high level, the agent first performs a nearest unlike neighbour (NUN) search in the training dataset.
This search uses a weighted feature distance where the weights are computed from two components: (1)
Individual Treatment Effect (ITE) values obtained via DECI, a causal discovery framework
[
          <xref ref-type="bibr" rid="ref24">24</xref>
], applied to a Directed Acyclic Graph (DAG) that encapsulates the global causal relationships, and (2) feature
actionability weights from a Feature Actionability Taxonomy (FAT) that categorises features based on
their mutability. The resulting hybrid weights ensure that features with strong causal influence and
practical potential for change are prioritised.
        </p>
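A small sketch of how such hybrid weights could enter the NUN distance is given below. The ITE magnitudes, actionability scores, and the multiplicative combination rule are all assumptions for illustration, not the system's actual values.

```python
# Hypothetical hybrid feature weighting for the NUN search: causal influence
# (assumed ITE magnitudes, e.g. from DECI) combined with FAT actionability.

ite = {"income": 0.8, "loan_amount": 0.5, "age": 0.9}            # causal effect sizes
actionability = {"income": 1.0, "loan_amount": 1.0, "age": 0.0}  # 0 = immutable

weights = {f: abs(ite[f]) * actionability[f] for f in ite}
# age gets weight 0: strong causal effect, but not actionable.

def weighted_distance(a, b, w):
    """Weighted L1 distance used to rank candidate unlike neighbours."""
    return sum(w[f] * abs(a[f] - b[f]) for f in w)

query = {"income": 4200, "loan_amount": 15000, "age": 25}
cand1 = {"income": 5200, "loan_amount": 15000, "age": 60}  # differs on income, age
cand2 = {"income": 4200, "loan_amount": 10000, "age": 25}  # differs on loan amount
d1 = weighted_distance(query, cand1, weights)  # age difference is ignored
d2 = weighted_distance(query, cand2, weights)
# d1 < d2, so cand1 is the preferred unlike neighbour under these weights
```

The effect is exactly the prioritisation described above: a candidate that differs only on causally influential, actionable features ranks as "nearer" than one requiring a larger actionable change.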
        <p>After selecting the nearest unlike neighbour, the agent generates a counterfactual explanation
by iteratively adapting the query instance. For each prioritised feature, the agent updates its value
to match that of the selected instance; importantly, any modification to a feature also triggers a
recursive adaptation of its causally dependent (child) features as defined by the DAG. This parent/child
adaptation ensures that the proposed changes maintain the inherent causal structure, thereby generating
counterfactuals that are both causally effective and practically actionable.</p>
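The parent/child adaptation can be sketched as a short recursion over an assumed DAG; the edges and values below are illustrative, not the learned causal graph.

```python
# Sketch of DAG-consistent adaptation: copying a feature from the NUN also
# triggers recursive adaptation of its causally dependent (child) features.
dag_children = {"income": ["savings"], "savings": []}  # illustrative DAG edges

def adapt(candidate, nun, feature, dag):
    """Copy `feature` from the NUN, then recursively adapt its children."""
    candidate[feature] = nun[feature]
    for child in dag.get(feature, []):
        adapt(candidate, nun, child, dag)
    return candidate

query = {"income": 4200, "savings": 1000, "age": 25}
nun = {"income": 5200, "savings": 3000, "age": 30}
out = adapt(dict(query), nun, "income", dag_children)
# Changing income also propagates to its child, savings; age is untouched.
```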
        <p>Once a valid counterfactual is obtained (i.e., one that reverses the model’s decision to the desired
outcome) the counterfactual explanation agent communicates this result back to the supervisor agent,
completing its role within the overall agentic workflow.</p>
      </sec>
      <sec id="sec-3-4">
        <title>3.4. Feature Actionability Taxonomy (FAT) Agent Integration</title>
        <p>The FAT agent is responsible for ensuring that counterfactual explanations are personalised and
practically actionable by updating the user’s feature constraints based on their feedback. Conceptually,
the FAT agent analyses counterfactual suggestions and identifies if any proposed changes involve
features that the user deems immutable (e.g., the loan amount, normally adjustable by applicants,
becomes immutable if the user specifically needs exactly $5,000 to cover a planned expense). When
such features are detected, the FAT agent revises its internal FAT profile to reflect these constraints.</p>
        <p>Once the FAT agent updates this information, it first communicates the revised FAT details back to
the supervisor agent. The supervisor agent, acting as the central coordinator, then relays the updated
FAT guidelines to the counterfactual generation agent, instructing it to generate a new counterfactual
explanation that respects the user’s specified immutability. This dynamic exchange among the FAT
agent, supervisor agent, and counterfactual generation agent ensures that the final counterfactual
recommendations are both causally informed and tailored to the user’s real-world capabilities. If, under
these constraints, no valid counterfactual can be found, the system will explicitly inform the user:
“Given the constraints, no feasible changes can overturn the decision.”</p>
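This feedback loop might be sketched as follows; the fixed candidate list and the agent stub are hypothetical stand-ins for the causally-aware search.

```python
# Sketch of the FAT feedback loop: when the user says "I cannot change X",
# the feature is marked immutable and a new counterfactual is requested.
# The candidate list and stub agent are illustrative assumptions.

def counterfactual_agent(query, immutable):
    # Stub: propose the first allowed option from a fixed candidate list.
    options = [("loan_amount", -5000), ("income", 800)]
    for feature, delta in options:
        if feature not in immutable:
            return (feature, delta)
    return None  # no feasible counterfactual under these constraints

profile_immutable = set()
first = counterfactual_agent({}, profile_immutable)   # suggests loan_amount change

# User: "I cannot decrease my loan amount" -> FAT marks it immutable, retry.
profile_immutable.add("loan_amount")
second = counterfactual_agent({}, profile_immutable)  # falls back to income

profile_immutable.add("income")
third = counterfactual_agent({}, profile_immutable)
if third is None:
    msg = "Given the constraints, no feasible changes can overturn the decision."
```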
      </sec>
      <sec id="sec-3-5">
        <title>3.5. Template-Based Natural Language Generation (NLG) Agent</title>
        <p>
          In our agentic framework the LLM co-authors the final explanation; it fills the slots of a safe template
and then dynamically re-voices the text (e.g., adjusting formality, adding clarifications requested in
the previous turn), something a rigid template engine alone cannot do [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. Template conditioning
constrains the LLM’s free-form generation, which significantly reduces factual drift; this mirrors the
findings of Upadhyay et al. [
          <xref ref-type="bibr" rid="ref25">25</xref>
          ], who reported higher factual accuracy when using templates for obituary
generation. The process unfolds in several coordinated steps: First, the counterfactual generation agent
generates a counterfactual suggestion, which it communicates to the supervisor agent. The supervisor
then forwards this counterfactual to the NLG agent. The NLG agent, governed by its system prompt and
provided tool, fills in predefined templates, such as an introductory sentence, actionable recommendation
templates for mutable or indirectly mutable features, templates that neutrally acknowledge sensitive
and non-sensitive immutable factors, and a concluding sentence to construct a clear and consistent
natural language explanation.
        </p>
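A sketch of the slot-filling step is shown below; the template wordings and FAT category names are illustrative, not the system's actual templates.

```python
# Hypothetical template-based NLG step: a FAT category selects a template,
# whose slots are then filled with the counterfactual values.

TEMPLATES = {
    "mutable": "Take steps to bring your {feature} to {target} (currently {current}).",
    "indirect": "Work towards improving your {feature}, aiming for {target}.",
    "immutable": "Unfortunately, your {feature} ({current}) cannot be changed.",
}

def render(changes):
    lines = ["Your loan application was declined. To improve your chances:"]
    for c in changes:
        # str.format ignores the extra "category" key in each change dict.
        lines.append(TEMPLATES[c["category"]].format(**c))
    return "\n".join(lines)

text = render([
    {"category": "mutable", "feature": "monthly income",
     "target": "$5,000", "current": "$4,200"},
    {"category": "immutable", "feature": "age", "target": None, "current": "25"},
])
```

Because the factual content lives in the change dicts, the LLM's later re-voicing can alter tone without touching the numbers.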
        <p>For example, a final output might look like:</p>
        <p>Your loan application was declined due to several factors. To improve your chances:
Take steps to increase your monthly income to at least $5,000 (currently $4,200).</p>
        <p>Reduce your requested loan amount closer to $10,000 (currently $15,000).</p>
        <p>Unfortunately, you cannot change your age (25), and younger applicants often have shorter credit
histories.</p>
        <p>These changes would address the risk factors identified by the model. I hope this helps, good luck
with your next application!</p>
        <p>After generating its explanation, the NLG agent returns its output to the supervisor agent. The supervisor
agent then refines the explanation as needed, enhancing its naturalness and realism, before presenting
the final, user-tailored message. This chain of communication ensures that every counterfactual
recommendation is accurately translated into clear, actionable, and contextually appropriate advice.</p>
      </sec>
      <sec id="sec-3-6">
        <title>3.6. Summary of Agent Interactions</title>
        <p>The proposed agentic workflow forms a cohesive, feedback-driven system in which each specialised
agent contributes to delivering personalised and actionable explanations. Initially, the classifier agent
predicts the loan decision. If the loan is rejected or if a user requests guidance, the counterfactual
explanation agent identifies the nearest unlike neighbour and adapts its features to generate a counterfactual
explanation for the query.</p>
        <p>If a participant indicates that a suggested change is unworkable (e.g., stating “I cannot change
X”), the FAT agent is activated. In response, the FAT agent updates its internal profile to mark the
specified feature as immutable and promptly communicates these updated constraints to the supervisor
agent. The supervisor agent then relays the revised FAT information to the counterfactual explanation
agent, which regenerates a personalised counterfactual that adheres to the updated constraints. This
revised explanation is forwarded to the template-based NLG agent, which transforms it into a clear
and structured natural language message. Finally, after further refinement by the supervisor agent to
ensure naturalness and realism, the final explanation is presented to the user.</p>
        <p>This iterative process, whereby the system continuously refines its counterfactual based on user
feedback via the FAT agent, is central to ensuring that the provided explanations are both causally
consistent and tailored to the user’s real-world constraints. In the next section, we describe the
experimental setup and planned user study that will evaluate the impact of our integrated agentic
framework on explanation quality and user trust.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experimental setup</title>
      <p>To assess the effectiveness of the proposed system, we outline experiments focusing on two
aspects: (1) the quality and feasibility of the explanations, and (2) the impact on user understanding
and trust.</p>
      <p>We compare three versions of the chatbot:
• Full Agentic System: the complete system as described, using the classifier, counterfactual
reasoning, FAT, and template NLG modules (the CBR-LLM agent).
• No-FAT Variant: an ablation in which the FAT module is disabled. The agent still generates
counterfactuals, but it cannot update user-specific constraints to tailor a counterfactual to the
individual user.
• LLM-Only Baseline: a baseline using a state-of-the-art LLM (e.g., GPT-4) with only the classifier
tool. This baseline is prompted to perform the entire task on its own: we provide it
with the user’s input and ask it to give an explanation or advice. The prompt can be engineered
with few-shot examples to encourage counterfactual-style outputs. However, the LLM has no
access to the FAT agent or the counterfactual agent, and relies only on
its general knowledge and reasoning.</p>
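<p>Conceptually, the three variants differ only in which agents the supervisor may route to. A hypothetical configuration sketch (the field names are ours, for illustration, not the system’s):</p>

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class VariantConfig:
    """Which agents the supervisor may route to (illustrative ablation switches)."""
    classifier: bool = True
    counterfactual: bool = True
    fat: bool = True
    template_nlg: bool = True

FULL_AGENTIC = VariantConfig()
NO_FAT = VariantConfig(fat=False)
LLM_ONLY = VariantConfig(counterfactual=False, fat=False, template_nlg=False)
```

<p>All three configurations keep the classifier enabled, reflecting that even the LLM-Only Baseline retains the classifier tool.</p>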
      <sec id="sec-4-1">
        <title>For the LLM we use GPT-4o-mini for all three versions.</title>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. User Study Evaluation</title>
      <p>To assess the effectiveness of our proposed system, we will conduct a user study focusing on
two main aspects: the quality and feasibility of the counterfactual explanations, and the overall impact
on user understanding and trust. In this study, participants will engage in an interactive chat
environment implemented using Streamlit (https://streamlit.io, an open-source Python framework for
rapidly building data-driven web applications with minimal boilerplate), where the supervisor agent
coordinates among internal agents, namely the classifier agent, the counterfactual generation agent,
the FAT agent, and the natural language generation (NLG) agent.</p>
      <p>The central focus of our evaluation will be the performance of the automated feedback loop.
Specifically, the FAT agent will update the user’s constraints based on their feedback regarding unworkable
suggestions, and these updated constraints will prompt the counterfactual generation agent to generate
refined and more actionable counterfactual explanations. This iterative process, which continually
updates the explanation based on user input, is expected to significantly enhance the personalisation
and practical utility of the system’s advice.</p>
      <p>
        Following this interactive phase, participants will complete a detailed survey measuring dimensions
such as overall satisfaction, clarity, helpfulness, fairness, and trustworthiness of the provided
explanations. This comprehensive evaluation approach, informed by recent best practices in user-centred XAI
research [
        <xref ref-type="bibr" rid="ref26 ref27 ref28">26, 27, 28</xref>
        ], will allow us to rigorously assess the impact of our integrated agentic framework
on user experience.
      </p>
      <sec id="sec-5-1">
        <title>5.1. Participant Recruitment and Screening</title>
        <p>Participants will be recruited from local university networks and professional online communities. We
will use a screening process to select individuals who can realistically assume the role of a loan applicant
and have an elementary understanding of financial decision-making. The required sample size will be
determined using G*Power analysis, targeting a power of 0.80, an alpha level of 0.05, and an anticipated
medium effect size (Cohen’s d = 0.5), which suggests a minimum of 34 participants. To account for
potential dropouts and to enhance the robustness of our findings, approximately 40 participants will be
recruited. All participants will be provided with an information sheet and will give informed consent
prior to their involvement.</p>
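<p>The reported minimum of 34 participants can be reproduced without G*Power itself, using the standard normal approximation for a two-sided paired t-test plus Guenther’s small-sample correction; a self-contained sketch:</p>

```python
import math
from statistics import NormalDist

def paired_t_sample_size(d, alpha=0.05, power=0.80):
    """Approximate n for a two-sided paired t-test (normal approximation
    with Guenther's correction term z_alpha^2 / 2)."""
    z = NormalDist().inv_cdf
    z_a, z_b = z(1 - alpha / 2), z(power)
    n = ((z_a + z_b) / d) ** 2 + z_a ** 2 / 2
    return math.ceil(n)
```

<p>For d = 0.5, alpha = 0.05, and power = 0.80 this yields 34, matching the G*Power estimate quoted above.</p>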
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Study Design and Procedure</title>
        <sec id="sec-5-2-1">
          <title>Participants are presented with the following scenario:</title>
          <p>• Scenario: You are John Doe, a 36-year-old applicant whose loan request for purchasing a new
car has been rejected. Your task is to interact with the intelligent assistant and explore ways to
improve your loan application.</p>
          <p>In this study, participants interact with each of the three system variants described earlier for up to
five turns (one user utterance plus one system reply constitutes a turn). For each of the three chatbot
variants, we collect the complete five-turn dialogue, five per-variant Likert-scale ratings, and optional
free-form feedback, and analyse these data jointly to assess how specific dialogue features (e.g., invoked
modules) influence user evaluations. We adopt a within-subject counterbalanced design: the
order of the three variants (3! = 6 possible sequences) is randomly assigned, and the same decision
scenario is reused across variants to keep task difficulty constant. Examples of these chat interactions
and UI screenshots will be provided in Appendix I.</p>
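<p>Order assignment over the six possible sequences can be sketched in a few lines (illustrative; the study platform’s actual assignment code may differ):</p>

```python
import itertools
import random

VARIANTS = ("Full Agentic", "No-FAT", "LLM-Only")

# All 3! = 6 possible presentation orders of the three variants.
ORDERS = list(itertools.permutations(VARIANTS))

def assign_order(rng=random):
    """Randomly assign one of the six variant orders to a participant."""
    return rng.choice(ORDERS)
```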
          <p>During these interactions, the supervisor agent orchestrates responses to both general queries and
specific requests for guidance on improving a loan application. For general questions, the assistant
provides prompt, context-aware answers; however, when a participant seeks concrete solutions or
strategies to improve their rejected loan application, the counterfactual generation agent is activated.
This agent first evaluates the current scenario (i.e., a loan rejection) and generates a counterfactual
suggestion that details minimal modifications to the input features.</p>
          <p>This generated counterfactual is then communicated to the supervisor agent, which passes it on
to the template-based NLG agent. The NLG agent transforms the counterfactual suggestion into a
clear and structured explanation using predefined templates. If the resulting explanation includes
any suggestions that the participant finds unworkable, for example due to personal constraints, the
FAT agent is triggered. The FAT agent updates the user’s feature constraints (e.g., marking features
as non-negotiable) and communicates these updated FAT details back to the supervisor agent. The
supervisor agent subsequently instructs the counterfactual generation agent to generate a revised
counterfactual explanation that adheres to the updated constraints.</p>
          <p>After further refinement by the supervisor to enhance naturalness and realism, the final, personalised
explanation is returned to the participant. This coordinated multi-agent interaction ensures that the
counterfactual explanations are not only feasible and actionable but also tailored to the user’s
real-world circumstances. To safeguard data quality, we embed an instructed-response attention check
(“Select Disagree”), a recall question on the chatbot’s last suggestion, and a five-minute minimum
interaction time per variant; any participant failing two of these criteria is excluded from analysis.</p>
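<p>A minimal sketch of this exclusion rule (fail two or more of the three checks), with illustrative argument names of our own choosing:</p>

```python
def passes_quality_checks(attention_ok, recall_ok, minutes_per_variant):
    """A participant is excluded if they fail two or more of the three checks."""
    failures = sum([
        not attention_ok,                         # instructed-response item ("Select Disagree")
        not recall_ok,                            # recall of the chatbot's last suggestion
        any(m < 5 for m in minutes_per_variant),  # five-minute minimum per variant
    ])
    return failures < 2
```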
        </sec>
      </sec>
      <sec id="sec-5-3">
        <title>5.3. Evaluation Survey and Measures</title>
        <p>After the interaction phase, participants will complete a detailed survey designed to capture their
perceptions of the system’s explanations.</p>
        <p>
          The survey will include Likert-scale items measuring:
• Overall Satisfaction: How satisfied the participant is with the overall assistance provided by
the chatbot.
• Clarity: The degree to which the counterfactual explanations are clear and understandable.
• Helpfulness: The extent to which the suggestions aid the participant in understanding what
changes could improve their loan application.
• Fairness: The perceived fairness and ethical appropriateness of the recommendations.
• Trustworthiness: The degree to which the participant trusts the assistant’s advice [
          <xref ref-type="bibr" rid="ref29">29</xref>
          ].
        </p>
        <p>For example, an item measuring clarity might ask, “How clear and understandable were the
counterfactual explanations provided by the assistant?” with response options ranging from 1 (not clear at all)
to 5 (extremely clear). Similar items will be framed for the other dimensions.</p>
        <p>In addition to these Likert-scale measures, the survey will include open-ended questions where
participants outline the specific steps they would take based on the suggestions they received. This qualitative
feedback will ofer further insight into the actionability and practical utility of the explanations.</p>
        <p>Because each participant uses all three variants, we will compare their mean Likert ratings with
paired t-tests for the three variant pairs (full agentic vs. no-FAT, full agentic vs. LLM-only, and no-FAT
vs. LLM-only). We will first check the normality of the rating differences with the Shapiro-Wilk test; if
this assumption is violated, we will use the Wilcoxon signed-rank test instead. The G*Power sample-size
estimate (medium effect size d = 0.5, α = 0.05, power = 0.80) was calculated for this paired-test design.</p>
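<p>This decision procedure maps directly onto standard SciPy routines; a sketch, assuming per-participant mean ratings for two variants:</p>

```python
import numpy as np
from scipy import stats

def compare_pair(ratings_a, ratings_b, alpha=0.05):
    """Paired comparison of Likert ratings: t-test if the differences
    look normal under Shapiro-Wilk, otherwise Wilcoxon signed-rank."""
    diffs = np.asarray(ratings_a, float) - np.asarray(ratings_b, float)
    _, p_normal = stats.shapiro(diffs)      # normality of the paired differences
    if p_normal >= alpha:
        stat, p = stats.ttest_rel(ratings_a, ratings_b)
        return "paired t-test", stat, p
    stat, p = stats.wilcoxon(ratings_a, ratings_b)
    return "Wilcoxon signed-rank", stat, p
```

<p>With three variants, the function would be called once per variant pair; a multiple-comparison correction could additionally be applied, though the text above does not specify one.</p>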
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusions and Future Work</title>
      <p>We introduced an agentic framework for loan-application guidance that combines case-based reasoning
with large language models. By orchestrating specialised agents, including a classifier, a causally aware
counterfactual generator, a Feature Actionability Taxonomy (FAT) checker, and a template-based NLG
module, the system generates explanations that respect the user’s real-world constraints
and causal dependencies.</p>
      <p>Although empirical validation is still to come, the modular architecture is intended to reduce common
LLM issues such as hallucination and to support a feedback loop that should improve clarity and trust.
These hypotheses will be tested in a forthcoming user study.</p>
      <p>We posit that this agentic-CBR approach can provide personalised, actionable recourse in high-stakes
finance scenarios and, once validated, could generalise to other domains where trustworthy decision
support is required.</p>
    </sec>
    <sec id="sec-7">
      <title>Declaration on Generative AI</title>
      <p>During preparation of this work, the authors used ChatGPT for the purpose of: grammar and spelling
check, paraphrase and reword. After using this tool, the authors reviewed and edited the content and
take full responsibility for the publication’s content.</p>
    </sec>
    <sec id="sec-8">
      <title>A. Supplementary Material</title>
      <sec id="sec-8-1">
        <title>The code, data, and user study materials can be found at: https://github.com/pedramsalimi/CBRAgent</title>
        <sec id="sec-8-1-1">
          <title>A.1. Study Instructions and Interactions</title>
          <p>Figures 2 and 3 present two screenshots of the instructions that will be provided to participants at the
beginning of the user study. These images depict the comprehensive guidelines for the study scenario,
including task descriptions, interaction guidelines, and post-interaction procedures.</p>
        </sec>
        <sec id="sec-8-1-2">
          <title>A.2. Example of System Interaction</title>
        </sec>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>E. M.</given-names>
            <surname>Bender</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Gebru</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>McMillan-Major</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Shmitchell</surname>
          </string-name>
          ,
          <article-title>On the dangers of stochastic parrots: Can language models be too big?</article-title>
          ,
          <source>in: Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency</source>
          , FAccT '21, Association for Computing Machinery, New York, NY, USA,
          <year>2021</year>
          , pp.
          <fpage>610</fpage>
          -
          <lpage>623</lpage>
          . URL: https://doi.org/10.1145/3442188.3445922. doi:10.1145/3442188.3445922.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>P.</given-names>
            <surname>Salimi</surname>
          </string-name>
          ,
          <article-title>Addressing trust and mutability issues in xai utilising case based reasoning</article-title>
          .,
          <source>ICCBR Doctoral Consortium</source>
          <volume>1613</volume>
          (
          <year>2022</year>
          )
          <fpage>0073</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>S.</given-names>
            <surname>Wachter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Mittelstadt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Russell</surname>
          </string-name>
          ,
          <article-title>Counterfactual explanations without opening the black box: Automated decisions and the gdpr</article-title>
          ,
          <source>Harv. JL &amp; Tech. 31</source>
          (
          <year>2017</year>
          )
          <fpage>841</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>A.-H.</given-names>
            <surname>Karimi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Schölkopf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Valera</surname>
          </string-name>
          ,
          <article-title>Algorithmic recourse: from counterfactual explanations to interventions</article-title>
          ,
          <source>in: Proceedings of the 2021 ACM conference on fairness, accountability, and transparency</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>353</fpage>
          -
          <lpage>362</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>B.</given-names>
            <surname>Ustun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Spangher</surname>
          </string-name>
          , Y. Liu,
          <article-title>Actionable recourse in linear classification</article-title>
          ,
          <source>in: Proceedings of the conference on fairness, accountability, and transparency</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>10</fpage>
          -
          <lpage>19</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>R.</given-names>
            <surname>Poyiadzi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Sokol</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Santos-Rodriguez</surname>
          </string-name>
          , T. De Bie,
          <string-name>
            <given-names>P.</given-names>
            <surname>Flach</surname>
          </string-name>
          ,
          <article-title>Face: feasible and actionable counterfactual explanations</article-title>
          ,
          <source>in: Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>344</fpage>
          -
          <lpage>350</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>P.</given-names>
            <surname>Salimi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Wiratunga</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Corsar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Wijekoon</surname>
          </string-name>
          ,
          <article-title>Towards feasible counterfactual explanations: A taxonomy guided template-based nlg method</article-title>
          ,
          <source>in: ECAI</source>
          <year>2023</year>
          , IOS Press,
          <year>2023</year>
          , pp.
          <fpage>2057</fpage>
          -
          <lpage>2064</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>D.</given-names>
            <surname>Mahajan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Tan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Sharma</surname>
          </string-name>
          ,
          <article-title>Preserving causal constraints in counterfactual explanations for machine learning classifiers</article-title>
          ,
          <source>arXiv preprint arXiv:1912.03277</source>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>A.</given-names>
            <surname>Wijekoon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Corsar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Wiratunga</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Martin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Salimi</surname>
          </string-name>
          ,
          <article-title>Tell me more: Intent fulfilment framework for enhancing user experiences in conversational xai</article-title>
          ,
          <source>arXiv preprint arXiv:2405.10446</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>A.</given-names>
            <surname>Wijekoon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Wiratunga</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Martin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Corsar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Nkisi-Orji</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Palihawadana</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Bridge</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Pradeep</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. D.</given-names>
            <surname>Agudo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Caro-Martínez</surname>
          </string-name>
          ,
          <article-title>Cbr driven interactive explainable ai</article-title>
          ,
          <source>in: International conference on case-based reasoning</source>
          , Springer,
          <year>2023</year>
          , pp.
          <fpage>169</fpage>
          -
          <lpage>184</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>S.</given-names>
            <surname>Verma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Boonsanong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hoang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Hines</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Dickerson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Shah</surname>
          </string-name>
          ,
          <article-title>Counterfactual explanations and algorithmic recourses for machine learning: A review</article-title>
          ,
          <source>ACM Computing Surveys</source>
          <volume>56</volume>
          (
          <year>2024</year>
          )
          <fpage>1</fpage>
          -
          <lpage>42</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>R. K.</given-names>
            <surname>Mothilal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Sharma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Tan</surname>
          </string-name>
          ,
          <article-title>Explaining machine learning classifiers through diverse counterfactual explanations</article-title>
          ,
          <source>in: Proc. Conf. on Fairness, Accountability, and Transparency</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>607</fpage>
          -
          <lpage>617</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>N.</given-names>
            <surname>Wiratunga</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Wijekoon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Nkisi-Orji</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Martin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Palihawadana</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Corsar</surname>
          </string-name>
          , Discern:
          <article-title>Discovering counterfactual explanations using relevance features from neighbourhoods</article-title>
          ,
          <source>in: 2021 IEEE 33rd International Conference on Tools with Artificial Intelligence (ICTAI)</source>
          , IEEE,
          <year>2021</year>
          , pp.
          <fpage>1466</fpage>
          -
          <lpage>1473</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>D.</given-names>
            <surname>Brughmans</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Leyman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Martens</surname>
          </string-name>
          ,
          <article-title>Nice: an algorithm for nearest instance counterfactual explanations</article-title>
          ,
          <source>Data Mining and Knowledge Discovery</source>
          (
          <year>2023</year>
          )
          <fpage>1</fpage>
          -
          <lpage>39</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Ji</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Frieske</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Su</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Ishii</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y. J.</given-names>
            <surname>Bang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Madotto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Fung</surname>
          </string-name>
          ,
          <article-title>Survey of hallucination in natural language generation</article-title>
          ,
          <source>ACM Computing Surveys</source>
          <volume>55</volume>
          (
          <year>2023</year>
          )
          <fpage>1</fpage>
          -
          <lpage>38</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>P.</given-names>
            <surname>Lewis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Perez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Piktus</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Petroni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Karpukhin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Goyal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Küttler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lewis</surname>
          </string-name>
          , W.-t. Yih,
          <string-name>
            <given-names>T.</given-names>
            <surname>Rocktäschel</surname>
          </string-name>
          , et al.,
          <article-title>Retrieval-augmented generation for knowledge-intensive nlp tasks</article-title>
          ,
          <source>Advances in neural information processing systems</source>
          <volume>33</volume>
          (
          <year>2020</year>
          )
          <fpage>9459</fpage>
          -
          <lpage>9474</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>S.</given-names>
            <surname>Yao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Du</surname>
          </string-name>
          , I. Shafran,
          <string-name>
            <given-names>K.</given-names>
            <surname>Narasimhan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Cao</surname>
          </string-name>
          , React:
          <article-title>Synergizing reasoning and acting in language models</article-title>
          ,
          <source>in: International Conference on Learning Representations (ICLR)</source>
          ,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>T.</given-names>
            <surname>Schick</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Dwivedi-Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Dessì</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Raileanu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lomeli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Hambro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zettlemoyer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Cancedda</surname>
          </string-name>
          , T. Scialom,
          <article-title>Toolformer: Language models can teach themselves to use tools</article-title>
          ,
          <source>Advances in Neural Information Processing Systems</source>
          <volume>36</volume>
          (
          <year>2023</year>
          )
          <fpage>68539</fpage>
          -
          <lpage>68551</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Shen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Song</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Tan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Lu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhuang</surname>
          </string-name>
          ,
          <article-title>HuggingGPT: Solving AI tasks with ChatGPT and its friends in Hugging Face</article-title>
          ,
          <source>Advances in Neural Information Processing Systems</source>
          <volume>36</volume>
          (
          <year>2023</year>
          )
          <fpage>38154</fpage>
          -
          <lpage>38180</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>E.</given-names>
            <surname>Hüllermeier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Schlegel</surname>
          </string-name>
          ,
          <article-title>Preference-based CBR: First steps toward a methodological framework</article-title>
          ,
          <source>in: International Conference on Case-Based Reasoning</source>
          , Springer,
          <year>2011</year>
          , pp.
          <fpage>77</fpage>
          -
          <lpage>91</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>B.</given-names>
            <surname>Smyth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. T.</given-names>
            <surname>Keane</surname>
          </string-name>
          ,
          <article-title>A few good counterfactuals: generating interpretable, plausible and diverse counterfactual explanations</article-title>
          ,
          <source>in: International Conference on Case-Based Reasoning</source>
          , Springer,
          <year>2022</year>
          , pp.
          <fpage>18</fpage>
          -
          <lpage>32</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Moriyama</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.-Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Gangopadhyay</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Takamatsu</surname>
          </string-name>
          ,
          <article-title>Talk structurally, act hierarchically: A collaborative framework for LLM multi-agent systems</article-title>
          ,
          <source>arXiv preprint arXiv:2502.11098</source>
          (
          <year>2025</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>J.</given-names>
            <surname>Wei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Schuurmans</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bosma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Xia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Chi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q. V.</given-names>
            <surname>Le</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Zhou</surname>
          </string-name>
          , et al.,
          <article-title>Chain-of-thought prompting elicits reasoning in large language models</article-title>
          ,
          <source>Advances in Neural Information Processing Systems</source>
          <volume>35</volume>
          (
          <year>2022</year>
          )
          <fpage>24824</fpage>
          -
          <lpage>24837</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>T.</given-names>
            <surname>Geffner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Antoran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Foster</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Gong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Ma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Kiciman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Sharma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Lamb</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kukla</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hilmkil</surname>
          </string-name>
          , et al.,
          <article-title>Deep end-to-end causal inference</article-title>
          ,
          <source>in: NeurIPS 2022 Workshop on Causality for Real-world Impact</source>
          ,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>A.</given-names>
            <surname>Upadhyay</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Massie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Clogher</surname>
          </string-name>
          ,
          <article-title>Case-based approach to automated natural language generation for obituaries</article-title>
          ,
          <source>in: International Conference on Case-Based Reasoning</source>
          , Springer,
          <year>2020</year>
          , pp.
          <fpage>279</fpage>
          -
          <lpage>294</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Rong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Leemann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.-T.</given-names>
            <surname>Nguyen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Fiedler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Qian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Unhelkar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Seidel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Kasneci</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Kasneci</surname>
          </string-name>
          ,
          <article-title>Towards human-centered explainable AI: A survey of user studies for model explanations</article-title>
          ,
          <source>IEEE Transactions on Pattern Analysis and Machine Intelligence</source>
          <volume>46</volume>
          (
          <year>2023</year>
          )
          <fpage>2104</fpage>
          -
          <lpage>2122</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>J.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Maathuis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Sent</surname>
          </string-name>
          ,
          <article-title>Human-centered evaluation of explainable AI applications: A systematic review</article-title>
          ,
          <source>Frontiers in Artificial Intelligence</source>
          <volume>7</volume>
          (
          <year>2024</year>
          )
          <fpage>1456486</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>L.</given-names>
            <surname>Gates</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Leake</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Wilkerson</surname>
          </string-name>
          ,
          <article-title>Cases are king: A user study of case presentation to explain CBR decisions</article-title>
          ,
          <source>in: International Conference on Case-Based Reasoning</source>
          , Springer,
          <year>2023</year>
          , pp.
          <fpage>153</fpage>
          -
          <lpage>168</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>R. R.</given-names>
            <surname>Hoffman</surname>
          </string-name>
          ,
          <article-title>A taxonomy of emergent trusting in the human-machine relationship</article-title>
          ,
          <source>Cognitive Systems Engineering</source>
          (
          <year>2017</year>
          )
          <fpage>137</fpage>
          -
          <lpage>164</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>