<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
<journal-title>CEUR Workshop Proceedings</journal-title>
      </journal-title-group>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
<article-title>Deeper Understanding: Effects of Domain Knowledge Integration for Conversational XAI</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Isabel Feustel</string-name>
<email>isabel.feustel@uni-ulm.de</email>
          <uri>https://nt.uni-ulm.de/feustel</uri>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Carolin Schindler</string-name>
<email>carolin.schindler@uni-ulm.de</email>
          <uri>https://nt.uni-ulm.de/schindler</uri>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Niklas Rach</string-name>
<email>niklas.rach@tensor-solutions.com</email>
          <uri>https://tensor-solutions.com/</uri>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Wolfgang Minker</string-name>
<email>wolfgang.minker@uni-ulm.de</email>
          <uri>https://nt.uni-ulm.de/minker</uri>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Stefan Ultes</string-name>
<email>stefan.ultes@uni-bamberg.de</email>
          <uri>https://www.uni-bamberg.de/ds/</uri>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Tensor AI Solutions GmbH</institution>
          ,
          <addr-line>Magirus-Deutz-Straße 2, 89075 Ulm</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Ulm University</institution>
          ,
          <addr-line>89081 Ulm</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>University of Bamberg</institution>
          ,
          <addr-line>96045 Bamberg</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2024</year>
      </pub-date>
      <volume>202</volume>
      <fpage>4</fpage>
      <lpage>04</lpage>
      <abstract>
        <p>Recent advances in explainable artificial intelligence (XAI) have emphasized the need for interactive systems that foster user understanding through dialogue. One key challenge in this context is ensuring that explanations are not only accessible but also faithful to the underlying model behavior. To address this, we combine two complementary approaches: a dialogue strategy for integrating domain knowledge into explanatory interactions and an automated pipeline for acquiring and structuring this knowledge as argumentation trees. We refine the dialogue strategy and implement it in a prototype system, which we evaluate in an online study (n = 80) assessing users’ perceptions of plausibility and acceptance of AI-generated predictions. Our findings indicate that incorporating domain knowledge into conversational XAI impacts the users’ acceptance of the AI’s responses, with variation across domains and system settings. These findings underscore the potential of domain-informed dialogue to support more transparent XAI. At the same time, the persuasive nature of structured argumentation introduces ethical risks, as it may lead users to uncritically accept AI decisions. We conclude that this dual potential calls for careful and responsible design of conversational XAI systems.</p>
      </abstract>
      <kwd-group>
        <kwd>explainable artificial intelligence (XAI)</kwd>
        <kwd>conversational XAI</kwd>
        <kwd>domain knowledge integration</kwd>
        <kwd>argumentation structures</kwd>
        <kwd>bipolar argumentation trees</kwd>
        <kwd>user-centered evaluation</kwd>
        <kwd>interactive explanations</kwd>
      </kwd-group>
      <conference>
        <conf-name>ArgXAI-25: 3rd International Workshop on Argumentation for eXplainable AI</conf-name>
      </conference>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        AI systems are becoming increasingly common, and regulatory frameworks like the EU AI Act [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]
are driving the need for transparency and accountability. Explainable AI (XAI) addresses this need by
providing insights into how AI makes decisions, fostering trust and understanding [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. A particularly
promising area within this field is Conversational XAI, which facilitates interactive, user-centered
explanations [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. However, current XAI approaches are often limited to providing information about
the underlying model or data alone [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Although such explanations can offer valuable insight into
decision-making processes, they can present an overly narrow or misleading view, especially when
(non-expert) users require a broader understanding of the underlying task domain or contextual implications.
To provide more comprehensive context, we propose integrating domain knowledge (DK) into XAI
explanations. This approach goes beyond surface-level details, offering richer, more meaningful insights
that improve user comprehension and trust. Building on our prior work [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], we demonstrate that
integrating structured DK significantly influences user engagement and perception of AI outputs. In
this paper, we address key limitations of our prior system and refine our approach through three key
contributions:
1. Enhanced Dialogue Strategy: We extend the existing framework [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] by adding example-based
explanations and introducing a proactive strategy for DK delivery to increase the use of DK.
2. Robust, Real-World Evaluation: We conduct a larger user study (n=80) across two complex
scenarios: credit loan approval and diabetes risk assessment.
3. Automated Scalability: We integrate an automatic pipeline [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] to extract and structure DK using
argument mining, demonstrating the feasibility of scaling the approach beyond manually curated
content.
      </p>
      <p>Our results show that DK not only increases user agreement with AI decisions but also improves
their perception of plausibility, effects that vary across application domains. Moreover, we find that DK
shapes both user interpretation of the AI’s reasoning and their overall interaction behavior, highlighting
its importance for adaptive and user-centered explanation design. However, this persuasive power
presents an ethical challenge: structured arguments may lead users to accept flawed AI decisions
uncritically. We emphasize the need for careful and responsible design to ensure these systems remain
faithful and transparent.</p>
      <p>The remainder of this paper is structured as follows: Section 2 discusses related work in conversational
XAI, argumentation-based explanations, and knowledge-grounded dialogue systems. Section 3 outlines
the method for incorporating DK into explanatory dialogues. Section 4 describes the experimental
setup, and Section 5 presents the evaluation results. Section 6 offers a detailed analysis, and Section 7
concludes with a summary and outlook.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        Conversational Explainable AI (XAI) aims to provide interactive, user-centered explanations of AI
decisions. While traditional XAI methods often rely on static, one-time explanations, conversational
approaches facilitate dynamic, multi-turn dialogues that can adapt to users’ varying knowledge levels
and information needs [
        <xref ref-type="bibr" rid="ref7 ref8">7, 8</xref>
        ]. However, developing effective conversational XAI systems presents
several challenges. A systematic review by Mindlin et al. [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] highlights the scarcity of training data
for building such systems, as well as the difficulty in maintaining context over multiple dialogue turns.
Additionally, many existing systems lack the ability to tailor explanations to individual users, limiting
their effectiveness in diverse real-world scenarios.
      </p>
      <p>
        Integrating argumentation theory into conversational XAI has been proposed to improve the clarity
and depth of explanations. Argumentation frameworks enable systems to present supporting and
opposing viewpoints, facilitating a more comprehensive understanding of AI decisions [
        <xref ref-type="bibr" rid="ref10 ref11">10, 11</xref>
        ]. For
example, EQRbot [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] delivers structured argument-based explanations in a chatbot format.
Similarly, argumentative dialogical agents have been used to generate reasoned, ontology-guided review
aggregations with dialogical justifications in recommendation scenarios [13].
      </p>
      <p>Incorporating domain knowledge (DK) into dialogue systems improves contextual relevance and
response accuracy. Knowledge-enhanced dialogue systems typically leverage external sources such as
knowledge graphs to guide response generation [14]. Approaches include injecting domain-specific
knowledge via lightweight adapters into pre-trained models [15], or directly integrating structured
knowledge into generation models, as in DialoKG [16]. Recent work also explores dynamically
constructing short-term knowledge graphs from unstructured texts to anticipate follow-up questions
in exploratory dialogues [17]. These efforts underscore the value of DK for supporting coherent,
informative, and user-aligned conversational AI.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Explanatory Dialogue System with Domain Knowledge</title>
      <p>
        This section outlines our methodology for integrating domain knowledge (DK) into explanatory
dialogues using bipolar argumentation trees [18, 19], based on Feustel et al. [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. We first present the
architecture of the explanatory dialogue system, then explain how XAI outputs are linked to relevant
DK and how this DK is modeled as argumentation trees. We describe how DK is automatically gathered
from domain-specific documents, and finally, how it is accessed and integrated into the dialogue system
to support interactive, coherent explanations.
      </p>
      <sec id="sec-3-1">
        <title>User</title>
      </sec>
      <sec id="sec-3-2">
        <title>System</title>
      </sec>
      <sec id="sec-3-3">
        <title>User</title>
      </sec>
      <sec id="sec-3-4">
        <title>System</title>
      </sec>
      <sec id="sec-3-5">
        <title>User</title>
      </sec>
      <sec id="sec-3-6">
        <title>System</title>
      </sec>
      <sec id="sec-3-7">
        <title>Hello there, I’m your personal assistant. I’m here to help you understand the prediction you’ve</title>
        <p>received. If you don’t know what to do, you can always ask for help. You can ask about the
features that played a role or what changes could afect the outcome.</p>
      </sec>
      <sec id="sec-3-8">
        <title>What are the main reasons why I would currently be rejected</title>
      </sec>
      <sec id="sec-3-9">
        <title>All right, let’s have a look. In your case, the most influential value was checking account, as</title>
        <p>you can see in the figure below. You can also see the influence of all other values, ordered by
their importance. Purple bars indicate a positive contribution, blue bars a negative contribution.</p>
      </sec>
      <sec id="sec-3-10">
        <title>Do you want to know more about why checking account is relevant?</title>
      </sec>
      <sec id="sec-3-11">
        <title>Yes please explain that aspect in more detail</title>
      </sec>
      <sec id="sec-3-12">
        <title>Setting up direct deposit of income into the checking account is often viewed positively by lenders, as it provides an added level of assurance regarding the applicant’s regular income stream and their ability to repay the loan.</title>
      </sec>
      <sec id="sec-3-13">
        <title>How does duration afect likelihood of being accepted?</title>
      </sec>
      <sec id="sec-3-14">
        <title>Loans for purposes like home improvements or business expansion might be viewed more favorably for larger amounts compared to discretionary spending.</title>
        <sec id="sec-3-14-1">
          <title>3.1. Dialogue System</title>
          <p>
            To support faithful and coherent explanatory dialogues, we rely on the modular dialogue system
architecture from Feustel et al. [
            <xref ref-type="bibr" rid="ref5">5</xref>
            ]. This modular design ensures high controllability, which is essential
for maintaining explanation fidelity. The system is composed of three general modules: a Natural
Language Understanding (NLU) component, a Dialogue Manager (DM), and a template-based Natural
Language Generation (NLG) module. In addition to the general dialogue architecture, two specialized
components are integrated to handle explanation generation. The Explanation Module provides access to
various types of XAI outputs. Concurrently, the Knowledge Handler interacts with the structured domain
knowledge, which is organized as bipolar argumentation trees [18, 19]. Within the dialogue, users
can explicitly request different types of explanations: feature importance through Shapley values [20],
example-based [21], counterfactual [22], or DK explanations. Further, initial why-questions are mapped
to feature importance explanations, offering users a first insight into the model’s reasoning for a
prediction. Following this, the system proactively prioritizes offering DK explanations to provide
deeper contextual understanding. If no relevant DK is available, the system falls back on providing
counterfactual explanations. A limitation observed in previous work [
            <xref ref-type="bibr" rid="ref5">5</xref>
            ] is that DK had to be requested
manually, leading to its insufficient use. To address this, we introduce a proactive prompt following
feature importance explanations (e.g., “Do you want to know more about why savings is relevant?”).
This addition aims to encourage users to engage with the DK without altering the core dialogue flow.
An example dialogue snippet is shown at the beginning of this section.
          </p>
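          <p>For illustration, the selection logic described above can be sketched as follows. This is a minimal Python sketch; the class, method, and intent names (DialogueManager, KnowledgeHandler, “why”) are our illustrative assumptions, not the system’s actual implementation.</p>
          <preformat>
# Minimal sketch of the proactive explanation-selection policy; illustrative
# names, not the system's actual code.
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class KnowledgeHandler:
    # Maps (feature, outcome) pairs to the roots of bipolar argumentation trees.
    trees: Dict[Tuple[str, str], object] = field(default_factory=dict)

    def has_dk(self, feature: str, outcome: str) -> bool:
        return (feature, outcome) in self.trees


@dataclass
class DialogueManager:
    knowledge: KnowledgeHandler
    pending_dk_feature: Optional[str] = None

    def next_system_acts(self, intent: str, feature: str, outcome: str) -> List[tuple]:
        # Initial "why" questions are mapped to feature importance explanations.
        if intent == "why":
            acts = [("feature_importance", feature)]
            if self.knowledge.has_dk(feature, outcome):
                # Proactive prompt, e.g. "Do you want to know more about
                # why savings is relevant?"
                self.pending_dk_feature = feature
                acts.append(("offer_dk", feature))
            else:
                # Fall back on counterfactuals when no relevant DK exists.
                acts.append(("counterfactual", feature))
            return acts
        if intent == "accept_dk_offer" and self.pending_dk_feature:
            return [("domain_knowledge", self.pending_dk_feature)]
        return [("help", None)]
          </preformat>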
        </sec>
        <sec id="sec-3-14-2">
          <title>3.2. Domain Knowledge Integration</title>
          <p>To effectively incorporate DK into explanatory dialogues, the approach involves four key steps: linking
XAI-generated outputs to domain-relevant concepts, modeling this knowledge using argumentation
trees, gathering domain-relevant knowledge from various textual sources, and enabling dynamic access
to it during interaction. The following paragraphs describe each of these steps in detail.</p>
          <p>Linking: A crucial component of the method is aligning XAI-generated explanations with relevant
aspects of domain knowledge. We assume that most local explanation methods produce feature-outcome
relations, which indicate how specific input features influence a particular prediction (see Figure 2).
These feature-outcome pairs serve as anchor points for mapping XAI outputs to corresponding domain
concepts. For instance, an explanation such as “a low income negatively impacts credit approval” can
be linked to DK that discusses the role of income in creditworthiness. This mapping allows the system
to contextualize raw model behavior within interpretable and domain-relevant concepts. By forming
this connection, we ensure that explanations are both technically accurate and semantically meaningful,
enabling users to better understand model decisions by relating them to familiar domain concepts.</p>
          <p>Modeling: To represent the domain knowledge, we employ bipolar argumentation trees [18, 19], where
each node represents a sentential argument, and the edges indicate either a supporting or an attacking
relationship. Each tree is rooted in a central claim addressing the feature-outcome relation (e.g., “Savings
are relevant for acquiring a credit”), with subsequent nodes elaborating or contesting the root argument.
This tree structure enables modular and interpretable organization of domain knowledge. A key strength
of the approach is its suitability for dialogical interaction [23, 24]. Each node in the tree can be selected as
a response to a user query, facilitating explanations that are both structured and contextually adaptive.</p>
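          <p>As a minimal illustration of this structure, the following sketch represents a bipolar argumentation tree in Python. The node layout and the two example arguments are invented for demonstration and are not taken from the trees used in the study.</p>
          <preformat>
# Illustrative sketch of a bipolar argumentation tree; the example arguments
# are invented for demonstration, not taken from the study's trees.
from dataclasses import dataclass, field
from typing import List


@dataclass
class ArgumentNode:
    text: str                       # sentential argument
    relation: str = "root"          # "support" or "attack" towards the parent
    children: List["ArgumentNode"] = field(default_factory=list)


# Tree rooted in a central claim addressing a feature-outcome relation.
savings_tree = ArgumentNode(
    "Savings are relevant for acquiring a credit.",
    children=[
        ArgumentNode("Savings show the applicant can absorb financial shocks.",
                     relation="support"),
        ArgumentNode("For small, short-term loans, savings matter less.",
                     relation="attack"),
    ],
)
          </preformat>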
          <p>Gathering: DK can be acquired automatically using the argument mining pipeline introduced by
Schindler et al. [25], which supports the extraction and structuring of arguments from heterogeneous
sources such as domain-specific documents. This pipeline encompasses several stages: initially, the
argument search API by summetix [26] extracts relevant argumentative statements from the input
documents. Afterwards, these arguments are pre-grouped by the features of the underlying AI prediction
scenario (e.g., income, savings). Finally, for each feature, an argumentation tree is constructed by
identifying relationships among the arguments and classifying them as either supporting or attacking.
This automated approach streamlines the creation of structured argumentation trees, enhancing the
scalability and consistency of DK representation in explainable AI systems.</p>
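          <p>The following hedged, runnable sketch mirrors the three pipeline stages. The helper functions are naive stand-ins: the actual pipeline [25] queries the summetix argument search API [26] and uses trained relation classifiers rather than the keyword heuristics shown here.</p>
          <preformat>
# Hedged, runnable sketch of the three gathering stages (extract, group by
# feature, build trees); all helpers are naive stand-ins for illustration.
from collections import defaultdict


def extract_arguments(documents):
    # Stage 1 stand-in: keep sentence-like spans as candidate arguments.
    return [s.strip() for doc in documents for s in doc.split(".")
            if len(s.split()) > 4]


def group_by_feature(arguments, features):
    # Stage 2: pre-group arguments by the prediction features (e.g., income).
    grouped = defaultdict(list)
    for arg in arguments:
        for feature in features:
            if feature in arg.lower():
                grouped[feature].append(arg)
    return grouped


def build_trees(grouped):
    # Stage 3 stand-in: attach every argument to the central claim; the real
    # pipeline identifies pairwise relations and classifies support/attack.
    return {f: {"claim": f"{f} is relevant for the prediction.",
                "children": [{"text": a, "relation": "support"} for a in args]}
            for f, args in grouped.items()}


docs = ["A higher income reduces the risk of default. Savings provide a "
        "buffer against unexpected expenses."]
print(build_trees(group_by_feature(extract_arguments(docs), ["income", "savings"])))
          </preformat>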
          <p>Accessing: The selection of DK is handled through a traversal of the argumentation tree. Upon a
user’s request or acceptance of the proactive prompt, the system randomly selects a subnode linked to
the original feature-related claim as the entry point. As users continue the dialogue, the system traverses
downward through the argumentation tree, presenting child nodes as responses. If no further arguments
are available, the system backtracks up the tree. To prevent repetition and maintain conversational
novelty, each argument is presented only once during an interaction.</p>
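          <p>A minimal sketch of this traversal strategy, assuming the illustrative node structure from above, could look as follows; the bookkeeping shown here is our simplification, not the system’s actual state tracking.</p>
          <preformat>
# Runnable sketch of the accessing strategy: random entry below the root,
# downward traversal, backtracking, and presenting each argument only once.
import random
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    text: str
    children: List["Node"] = field(default_factory=list)


def find_parent(root: Node, target: Node) -> Optional[Node]:
    for child in root.children:
        if child is target:
            return root
        hit = find_parent(child, target)
        if hit:
            return hit
    return None


def next_argument(current: Node, root: Node, seen: set) -> Optional[Node]:
    # Prefer an unseen child (downward traversal) ...
    fresh = [c for c in current.children if id(c) not in seen]
    if not fresh and current is not root:
        # ... otherwise backtrack towards the root and retry.
        parent = find_parent(root, current)
        return next_argument(parent, root, seen) if parent else None
    if not fresh:
        return None  # tree exhausted: every argument was presented once
    node = random.choice(fresh)
    seen.add(id(node))
    return node


# Entry point: a random subnode of the feature-related claim; afterwards the
# caller passes the last presented node as `current` to continue downward.
          </preformat>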
        </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Study Setup</title>
      <p>
        This section outlines the setup of our user study, building on a preliminary study conducted in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
While the overall structure of the study remains similar, several key adjustments were made to improve
the validity and interpretability of the evaluation. Most notably, we separated the prediction component
from the explanatory dialogue system and introduced an additional scenario to better investigate the
role of DK in conversational XAI.
      </p>
      <sec id="sec-4-1">
        <title>4.1. Overall Study Setup</title>
        <p>The primary goal of our study is to investigate the influence of DK on users’ understanding and
evaluation of an AI prediction system. Participants interact with a two-part system: (1) a prediction
system that provides binary classification outputs, and (2) a dialogue-based XAI system that offers
explanations including domain-specific information. To explore how participants react to different AI
behaviors, we trained two separate AI models: a true AI, trained on correctly labeled data, with a
dataset that allowed for high predictive accuracy, and a false AI, trained on the same data but with all
labels flipped, resulting in systematically incorrect predictions.</p>
        <p>Each participant is exposed to only one of these AI systems to prevent prior exposure from affecting
their expectations or interaction behavior. Additionally, each participant engages in two separate
explanatory dialogue sessions: one enriched with DK and one without. To control for order effects, the
presentation of DK is counterbalanced across participants so that some begin with the DK condition
while others start without it. Furthermore, each dialogue session involves a different scenario to reduce
carryover effects, ensuring that prior exposure to DK does not affect user perception in the subsequent
interaction. After each dialogue interaction, participants answer two core evaluation questions:</p>
        <p>Q1 I agree with the decisions made by the prediction system.</p>
        <p>Q2 The prediction system’s decisions are plausible.</p>
        <p>
          Furthermore, the participants evaluate the quality and usability of the dialogue system using the SASSI
questionnaire [27]. We hypothesize that DK will increase agreement and perceived plausibility in the
true AI condition. Conversely, for the false AI condition, we expect DK to expose inconsistencies,
thereby decreasing agreement and perceived plausibility. This contrasts with findings from a preliminary
study by Feustel et al. [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ], where participants still showed high agreement with the false AI. This is
presumably due to the integrated system design, which made it challenging to distinguish between
the prediction and the explanation system. In this study, that limitation is addressed by explicitly separating
the prediction and dialogue components.
        </p>
        <p>At the beginning of the study, participants receive a brief overview and instructions. They then
complete a structured input form, where they are asked to enter values for all input features required
by the AI prediction model. To support this process, helpful hints and descriptions are provided for
each feature (e.g., explaining units or normal value ranges). Once the form is completed, participants
can submit their inputs to receive a prediction from the system. Following this, they engage in an
open-ended dialogue with the XAI system, exploring the model’s reasoning in a natural, self-directed
manner. Participants may also request new predictions at any point during the session by modifying the
input values. After completing both dialogues, participants evaluate each interaction using five-point
Likert scale questions (Q1, Q2, and SASSI), and provide demographic data, their prior experience with
AI, and attitudes toward it. They are also asked to indicate which scenario they preferred, enabling the
analysis of subjective preferences alongside objective measures.</p>
        <p>A total of 80 participants (52 male, 28 female; mean age = 40.72), all native English speakers from the
UK, were recruited via Clickworker to complete the online study. To evaluate the statistical significance
of our findings, we use the Mann–Whitney U test [28], the independent samples t-test, and apply the
Holm–Bonferroni correction [29] for the three comparisons of interest: the overall dataset, the credit
scenario data, and the diabetes scenario data.</p>
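        <p>For concreteness, the testing procedure can be sketched with off-the-shelf implementations from SciPy and statsmodels; all numeric values below are illustrative placeholders, not study data.</p>
        <preformat>
# Hedged sketch of the significance testing; values are placeholders.
from scipy.stats import mannwhitneyu, ttest_ind
from statsmodels.stats.multitest import multipletests

# Ordinal Likert ratings (Q1/Q2) compared with the Mann-Whitney U test [28].
q1_dk = [4, 5, 3, 4, 5, 2]
q1_no_dk = [3, 4, 2, 4, 3, 3]
p_ratings = mannwhitneyu(q1_dk, q1_no_dk).pvalue

# Dialogue lengths compared with an independent samples t-test.
turns_dk, turns_no_dk = [15, 12, 18, 9], [10, 11, 8, 12]
p_turns = ttest_ind(turns_dk, turns_no_dk).pvalue

# Holm-Bonferroni correction [29] over the three comparisons of interest
# (overall dataset, credit scenario, diabetes scenario); p-values illustrative.
reject, p_corrected, _, _ = multipletests(
    [0.010, 0.040, 0.300], alpha=0.05, method="holm")
print(p_corrected)  # -> [0.030, 0.080, 0.300]
        </preformat>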
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Scenarios</title>
        <p>The preliminary study included two scenarios: Titanic survival and credit approval. While the Titanic
scenario proved helpful for introducing the concept, it was overly simplified and widely familiar,
limiting the impact of domain knowledge. To better assess the influence of background information,
we introduced a new scenario: diabetes risk assessment.</p>
        <p>The diabetes scenario utilizes the Pima Indians dataset [30], with features including age, insulin,
glucose, BMI, skin thickness, and blood pressure. To make the task gender-neutral and more accessible,
we excluded features such as pregnancies and diabetes pedigree function. For each remaining feature,
we provided a simple reference range (e.g., Glucose: Plasma glucose concentration 2 hours post oral
glucose tolerance test in mg/dl. Normal range: 70–99 mg/dl). The inclusion of the diabetes use case
introduces a setting where domain knowledge is more critical and less familiar to users, enabling a
more robust test of DK’s role.</p>
        <p>The credit approval scenario, based on the German Credit Data [31], includes features such as savings,
checking account status, credit amount, loan purpose, and duration.</p>
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Acquisition of Domain Knowledge</title>
        <p>To construct structured DK for this study, we applied the argumentation pipeline (see Section 3.2) to two
sets of documents per scenario. All materials were collected through manual online research, including
publicly available articles and websites. However, we found that much of this content was not tailored
to user-centric XAI scenarios. For example, while credit-related documents often discussed financial
decision-making in general, they rarely addressed specific user questions such as “why are savings
relevant?”. Similarly, the diabetes-related content tended to be highly technical and difficult for lay users
to interpret. Critically, the quality and faithfulness of the resulting argument structures are directly
dependent on the reliability of the input sources. If the documents lack relevance, are inaccurate, or are
not tailored to the explanatory goals of the dialogue system, the resulting argumentative reasoning may
be misleading or unhelpful. Thus, control over the data source is essential for ensuring contextually
grounded explanations.</p>
        <p>To better align the source material with the explanatory goals of the system, we generated a second
set of documents using large language models (LLMs), including ChatGPT, DeepSeek, and Gemini.
These models were prompted to generate concise, accessible summaries that directly answer
XAI-relevant, feature-based questions (e.g., “why is glucose important for diabetes risk?”). All generated
outputs were manually source-checked. As a promising direction for future work, retrieval-augmented
generation (RAG) approaches could be employed to constrain LLM outputs to a curated set of trusted
sources, thereby improving the traceability and reliability of the generated domain knowledge.</p>
        <p>To ensure quality and usability in the explanatory dialogue system, we conducted a manual review
and refinement of the generated argument trees. This included: (1) simplifying and cleaning the text of
the arguments for improved readability in the dialogue; (2) removing arguments irrelevant to the XAI
context; (3) restructuring the argumentation tree in specific cases where a parent node was removed or
where the argumentative reasoning was inconsistent. In such cases, we either attached the orphaned
node to its grandparent or moved the node together with all its subnodes directly under the main
claim to preserve logical coherence; and (4) correcting the assigned type of relation (support or attack)
to better reflect their contextual meaning. While manual steps ensured high-quality evaluation data,
future work could replace parts of this process with scalable alternatives such as semantic filtering,
domain-specific retrieval, or RAG-based pipelines.</p>
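        <p>Restructuring step (3) can be sketched as two simple tree operations, again using an illustrative node structure rather than the actual implementation.</p>
        <preformat>
# Sketch of refinement step (3): reattaching orphaned nodes to the grandparent
# after removing a parent, or moving a subtree directly under the main claim.
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    text: str
    children: List["Node"] = field(default_factory=list)


def remove_and_reattach(grandparent: Node, parent: Node) -> None:
    """Remove `parent` and attach its orphaned children to `grandparent`."""
    grandparent.children.remove(parent)
    grandparent.children.extend(parent.children)


def move_under_claim(parent: Node, node: Node, claim: Node) -> None:
    """Move `node` together with all its subnodes directly under the claim."""
    parent.children.remove(node)
    claim.children.append(node)
        </preformat>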
        <p>The resulting DK structures include 52 arguments across all features for the credit domain and 60
arguments for the diabetes scenario. Each AI feature is supported by a minimum of six arguments,
enabling rich and context-aware interaction within the dialogue system.</p>
        <p>[Table 1: Mean ratings for agreement (Q1) and plausibility (Q2) across the true AI and false AI conditions, with and without DK, where p* is the corrected p-value using the Holm–Bonferroni method [29].]</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Results</title>
      <p>The evaluation investigates how DK affects user perception and interaction in explanatory XAI dialogues.
We first analyze participants’ responses to agreement (Q1) and plausibility (Q2), first across all scenarios
and then broken down by individual scenario. We then examine scenario-specific effects on dialogue
quality (SASSI), followed by engagement metrics such as dialogue length and DK usage. Finally, we
assess how DK availability influences the use of different explanation types.</p>
      <sec id="sec-5-1">
        <title>5.1. Scenario-Dependent Effects on Perceived AI Reasoning</title>
        <p>Across all scenarios, we observed no significant difference in participants’ ratings of how much they
agreed with the AI’s decisions (Q1) or how plausible they found those decisions (Q2), regardless of
whether DK was available or whether the AI system was correct (true AI) or incorrect (false AI), as
shown in Table 1. However, a scenario-specific analysis reveals more nuanced effects. In the diabetes
scenario, the presence of DK was associated with higher Q1 and Q2 ratings when the AI system was
correct (true AI), while it had minimal impact in the false AI condition. In contrast, in the credit
scenario, DK increased both the perceived plausibility and agreement ratings only when the AI was
incorrect (false AI). However, when applying the Holm–Bonferroni correction for three comparisons, only
the difference in the false AI condition for the credit scenario remains significant. Despite these effects,
participants expressed no clear overall preference for one scenario over the other, with 39 preferring
credit and 41 preferring diabetes, as indicated by a direct question included in the final questionnaire.</p>
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Effects on Dialogue Experience</title>
        <p>The SASSI questionnaire revealed no statistically significant differences in participants’ perception
of the dialogue experience across AI setting and domain knowledge (DK) conditions after applying
the Holm–Bonferroni correction. However, several trends emerged that may inform future investigations.
In the diabetes scenario with a true AI, the presence of DK was associated with higher perceived
system accuracy (p* = 0.062) and lower ratings of boredom (p* = 0.113), suggesting that DK may have
contributed to greater engagement. In the credit scenario with a false AI, users rated the system as more
pleasant and friendly when DK was present (p* = 0.137), with further indications that DK may have
enhanced enjoyment (p* = 0.171), though again, none of these differences were statistically reliable
after correction. No meaningful trends were observed in the remaining conditions (diabetes/false AI
and credit/true AI), underscoring the contextual dependency of DK’s effects. While these results do not
provide conclusive evidence, they suggest that DK may positively influence user perceptions in specific
combinations of domain and AI correctness, warranting deeper exploration in future studies.</p>
        <p>[Table 2: Mean number of turns per dialogue (d_t) with domain knowledge (DK) and without (No DK), for all scenarios, the credit scenario, and the diabetes scenario, along with the corresponding standard deviation (σ). p denotes the p-value from an independent samples t-test comparing d_t, and p* is the corrected value using the Holm–Bonferroni method [29]. d_DK represents the mean number of DK turns per dialogue in which DK was provided to the user.]</p>
      </sec>
      <sec id="sec-5-3">
        <title>5.3. Engagement and Use of Domain Knowledge</title>
        <p>Participants engaged in significantly longer dialogues when DK was available, as shown in Table 2. An
independent samples t-test revealed a significant increase in dialogue length across all scenarios (p* =
0.014), particularly in the diabetes scenario (p* = 0.012), where users explored the argumentation tree
more actively. In contrast, no significant difference was observed in the credit scenario (p* = 0.620).</p>
        <p>The average dialogue length was higher in the diabetes scenario with DK (mean = 15.2 turns, SD
= 12.8) compared to the credit scenario (mean = 12.14 turns, SD = 5.2), indicating not only more
extensive interaction but also greater variability in user engagement when domain knowledge was
present. On average, participants interacted with 2.39 DK arguments per dialogue, with a higher number
of arguments explored in the diabetes scenario compared to the credit domain. In total, 44 unique
arguments were discussed across all sessions. The most frequently explored features were glucose in
the diabetes scenario (8 arguments, 56 turns) and credit amount in the credit scenario (6 arguments, 36
turns).</p>
      </sec>
      <sec id="sec-5-4">
        <title>5.4. Interaction with XAI Explanations</title>
        <p>To evaluate how users interacted with different explanation types, we analyzed the frequency of their
use across all dialogues. Table 3 shows the percentage of dialogues in which each explanation type
was used, separated by the availability of DK. Importantly, this analysis considers whether DK was
available in the dialogue session, not whether it was explicitly requested by the user, as evaluated
earlier. We also include the use of new predictions in this analysis, interpreting them as a form of
trial-and-error explanation that reflects the user’s reasoning process. We observed that counterfactual
explanations were used significantly more often in dialogues where domain knowledge was not available.
Conversely, domain knowledge explanations were significantly more frequent when domain knowledge
was present, supporting the impact of its availability on explanation behavior. For all other explanation
types, including feature importance and example-based explanations, no significant differences were
observed.</p>
        <sec id="sec-5-4-1">
          <title>Explanation</title>
        </sec>
        <sec id="sec-5-4-2">
          <title>Counterfactuals</title>
        </sec>
        <sec id="sec-5-4-3">
          <title>Shapley Values</title>
        </sec>
        <sec id="sec-5-4-4">
          <title>Example-based</title>
        </sec>
        <sec id="sec-5-4-5">
          <title>New Prediction</title>
        </sec>
        <sec id="sec-5-4-6">
          <title>Domain Knowledge</title>
          <p>No DK
available available
71%
90%
18%
21%
36%
90%
26%
24%
76%</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Discussion</title>
      <sec id="sec-6-1">
        <title>Percentage of dialogues using each explanation type across domain knowledge availability. “New Prediction”</title>
        <p>indicates that the user entered new input data to request a new prediction.  refers to the  -value from Fisher’s
Our findings ofer several insights into the impact of domain knowledge (DK) in explanatory dialogues.
Specifically, we examine how DK interacts with scenario context, argument quality, explanation type,
and user expectations. We also address the limitations of our approach and outline ethical considerations
that arise from our observations.</p>
        <sec id="sec-6-1-1">
          <title>6.1. Interpretation of Scenario Effects</title>
          <p>A key result is the scenario-dependent influence of domain knowledge. In the diabetes scenario, DK
improved users’ ratings of agreement and plausibility when the AI made correct predictions. In contrast,
in the credit scenario, similar trends occurred when the AI was incorrect. These differences point to
the central role of domain familiarity, perceived relevance, and users’ expectations in shaping how
explanations are interpreted. Importantly, this asymmetry raises ethical concerns. In the credit scenario,
DK may have had a persuasive effect, making incorrect predictions appear more plausible. Although
the explanations were technically faithful, the presence of structured domain knowledge appeared to
lend undue credibility to flawed AI decisions. This finding illustrates a broader ethical challenge in
explainable AI: the same mechanisms that aim to enhance understanding may inadvertently encourage
over-reliance and reduce critical engagement. When explanations sound authoritative, particularly in
familiar domains like finance or health, users may assume correctness even in the presence of model
errors. Future conversational XAI systems must therefore consider how to balance helpfulness with
transparency, potentially by signaling uncertainty or encouraging user reflection in cases where model
confidence is low or explanations may be misleading.</p>
        </sec>
        <sec id="sec-6-1-2">
          <title>6.2. Quality and Perception of Domain Knowledge</title>
          <p>Participant feedback reinforces the importance of how domain knowledge is formulated and perceived.
Users consistently described the DK in the diabetes scenario as more “natural,” “human-like,” and
“interesting.” For instance, one participant noted, “The responses to the diabetes scenario were more
accurate and human,” while another remarked, “This type of system would be helpful for preventative
health care [...] the AI assistant changes the tone much more humane.” These responses suggest that
the alignment between explanation content, tone, and user expectations enhances engagement.</p>
          <p>However, this perceived quality is not solely a function of domain familiarity; it also reflects the
quality of the arguments themselves. Coherent, well-structured, and contextually appropriate arguments
contribute to trust and usability. Conversely, vague or irrelevant DK may confuse users or undermine
credibility. Future work should investigate methods for evaluating the clarity, relevance, and
appropriateness of arguments in relation to the user’s goals and the underlying model behavior. Semi-automated
techniques or curated authoring tools may assist in adapting arguments to dialogue-specific formats
while preserving factual integrity.</p>
        </sec>
        <sec id="sec-6-1-3">
          <title>6.3. Contextual Shifts in Explanation Use</title>
          <p>
            Another insight from our study is that DK affects how users interact with other explanation types.
In particular, the presence of DK was associated with reduced use of counterfactual explanations,
suggesting a complementary or even substitutive relationship between explanation forms. This supports
previous work emphasizing that explanation effectiveness depends on context, user intent, and prior
knowledge [
            <xref ref-type="bibr" rid="ref3">3, 33</xref>
            ]. Nevertheless, these observations are shaped by the limits of our current system. Our
explanations are derived from a Random Forest model, chosen for its interpretability and compatibility
with XAI techniques like Shapley values and counterfactual reasoning. While suitable for real-time
dialogue, this model may not reflect the complexity of many real-world tasks. Likewise, our explanation
set, restricted to feature attributions, examples, and counterfactuals, only captures a subset of possible
explanation forms. Incorporating broader techniques such as LIME, saliency maps, or concept-based
reasoning could offer richer and more diverse interactions, potentially altering how users balance
or select explanations. Future systems should explore adaptive explanation policies that respond to
prediction reliability, user profiles, and interaction history.
          </p>
        </sec>
        <sec id="sec-6-1-4">
          <title>6.4. Design Implications, Scalability, and Future Directions</title>
          <p>This study highlights several design principles for effective conversational XAI: the need for
scenario-sensitive explanation strategies, high-quality argument content, and modular system design. Our
pipeline demonstrates the feasibility of automatically generating DK argument structures, but the need
for manual refinement underscores the value of a human-in-the-loop process. Template-based natural
language generation (NLG) ensures factual accuracy, yet limits conversational fluency. As generative
models continue to improve, future systems might leverage hybrid NLG approaches that maintain
grounding while enhancing expressiveness. Scalability also remains a challenge. While our current
setup generalizes across models and explanation types in principle, real-world applications demand
robustness across domains, languages, and user groups. Evaluating DK quality across use cases and
tailoring it to specific interaction goals will be crucial. Additionally, our results suggest that users’
prior knowledge and personal interest, in this case in health-related content, play a significant role
in shaping perceptions. Personalization based on user profiles, topic familiarity, or trust levels may
further improve explanatory effectiveness and mitigate risks.</p>
        </sec>
    </sec>
    <sec id="sec-7">
      <title>7. Conclusion</title>
      <p>
        In this paper, we presented a modular dialogue-based XAI system that integrates DK via structured
argumentation trees to support more contextualized explanations of AI predictions. Extending prior
work [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], our approach introduces an automated pipeline for acquiring domain knowledge [25],
a proactive dialogue strategy to increase DK usage, and example-based explanations to enhance
interactivity and user engagement.
      </p>
      <p>
        In a user study (n = 80) across two scenarios, credit approval and diabetes risk assessment, we found
that DK shaped how users perceived and interacted with AI explanations. While most effects did
not reach statistical significance after correction, several trends suggest that DK influences perceived
plausibility and engagement, depending on the domain and the correctness of the AI output. In the
diabetes scenario with correct predictions, DK tended to increase user agreement and plausibility ratings.
In the credit scenario, DK appeared to act more persuasively, even when the AI’s predictions were
incorrect. These findings support theoretical claims by Miller [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] and Wang et al. [33], who emphasize
that the effectiveness of explanations depends on factors such as presentation quality, domain relevance,
and users’ prior knowledge and goals. Moreover, our results reveal a significant shift in explanation use:
when DK was available, users relied less on counterfactual explanations, indicating a complementary
or even substitutive relationship between explanation types. This highlights that DK does not simply
add to the explanation set; it reorients user interaction. While these effects offer opportunities for
more adaptive and context-sensitive explanation strategies, they also raise ethical concerns: if the
reasoning surfaced by DK is unfaithful to the model’s true decision process, it may mislead users and
foster over-reliance, particularly in high-stakes domains. These insights underscore the importance of
not only tailoring explanations to user context, but also ensuring their faithfulness and transparency
to uphold responsible AI interaction. Our work offers insights into integrating DK via structured
dialogues and automated pipelines for conversational XAI. Future research should explore scalable
DK evaluation, adaptive explanation policies, and personalization based on user goals and expertise to
support responsible and effective AI communication.
      </p>
    </sec>
    <sec id="sec-8">
      <title>Acknowledgments</title>
      <p>We thank summetix GmbH for supporting our research with access to their Classify API.</p>
    </sec>
    <sec id="sec-9">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used GPT-4o and DeepL-Write to check grammar
and spelling and to rephrase text. After using these tools, the authors reviewed and edited the content as
needed and take full responsibility for the publication’s content.</p>
      <p>[13] A. Rago, O. Cocarascu, J. Oksanen, F. Toni, Argumentative review aggregation and dialogical
explanations, Artificial Intelligence (2025).
[14] P. Priya, D. Varshney, M. Firdaus, A. Ekbal, Knowledge-enhanced response generation in dialogue
systems: Current advancements and emerging horizons, in: Proc. of the 2024 Joint Int. Conf. on
Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024): Tutorial
Summaries, 2024, pp. 80–87.
[15] D. Emelin, D. Bonadiman, S. Alqahtani, Y. Zhang, S. Mansour, Injecting domain knowledge in
language models for task-oriented dialogue systems, in: Proc. of the 2022 Conf. on Empirical
Methods in Natural Language Processing, 2022, pp. 11962–11974.
[16] M. R. A. H. Rony, R. Usbeck, J. Lehmann, Dialokg: Knowledge-structure aware task-oriented
dialogue generation, in: Findings of the Association for Computational Linguistics: NAACL 2022,
2022, pp. 2557–2571.
[17] G. Wilcock, Anticipating follow-up questions in exploratory information search, in: Proc. of the
25th Annual Meeting of the Special Interest Group on Discourse and Dialogue, 2024, pp. 103–109.
[18] C. Stab, I. Gurevych, Identifying argumentative discourse structures in persuasive essays, in: Proc.</p>
      <p>of the 2014 conf. on empirical methods in natural language processing (EMNLP), 2014, pp. 46–56.
[19] A. Pazienza, S. Ferilli, F. Esposito, S. Bistarelli, M. Giacomin, Constructing and evaluating bipolar
weighted argumentation frameworks for online debating systems., in: AI3@ AI* IA, 2017, pp.
111–125.
[20] S. M. Lundberg, S.-I. Lee, A unified approach to interpreting model predictions, in: Advances in</p>
      <p>Neural Information Processing Systems, 2017, pp. 4768 – 4777.
[21] A. Renkl, Toward an instructionally oriented theory of example-based learning, Cognitive science
(2014).
[22] S. Wachter, B. Mittelstadt, C. Russell, Counterfactual explanations without opening the black box:</p>
      <p>Automated decisions and the gdpr, Harv. JL &amp; Tech. (2017).
[23] A. Aicher, N. Rach, W. Minker, S. Ultes, Opinion building based on the argumentative dialogue
system bea, in: Increasing Naturalness and Flexibility in Spoken Dialogue Interaction: 10th Int.</p>
      <p>Workshop on Spoken Dialogue Systems, 2021, pp. 307–318.
[24] N. Rach, C. Schindler, I. Feustel, J. Daxenberger, W. Minker, S. Ultes, From argument search to
argumentative dialogue: A topic-independent approach to argument acquisition for dialogue
systems, in: Proc. of the 22nd Annual Meeting of the Special Interest Group on Discourse and
Dialogue, 2021, pp. 368–379.
[25] C. Schindler, I. Feustel, N. Rach, W. Minker, Automatic generation of structured domain knowledge
for dialogue-based xai systems, The 15th Int. Workshop on Spoken Dialogue Systems Technology
(2025).
[26] C. Stab, J. Daxenberger, C. Stahlhut, T. Miller, B. Schiller, C. Tauchmann, S. Eger, I. Gurevych,
ArgumenText: Searching for arguments in heterogeneous sources, in: Proc. of the 2018 Conf. of
the North American Chapter of the Association for Computational Linguistics: Demonstrations,
2018, pp. 21–25.
[27] K. S. Hone, R. Graham, Towards a tool for the subjective assessment of speech system interfaces
(sassi), Natural Language Engineering (2000).
[28] P. E. McKnight, J. Najab, Mann-Whitney U test, The Corsini encyclopedia of psychology (2010).
[29] S. Holm, A simple sequentially rejective multiple test procedure, Scandinavian journal of statistics
(1979).
[30] J. W. Smith, J. E. Everhart, W. C. Dickson, W. C. Knowler, R. S. Johannes, Using the adap learning
algorithm to forecast the onset of diabetes mellitus, in: Proc. of the annual symposium on computer
application in medical care, 1988, p. 261.
[31] H. Hofmann, Statlog (German Credit Data), UCI Machine Learning Repository, 1994.
[32] G. J. Upton, Fisher’s exact test, Journal of the Royal Statistical Society: Series A (Statistics in</p>
      <p>Society) (1992).
[33] D. Wang, Q. V. Yang, A. Abdul, B. Y. Lim, Designing theory-driven user-centric explainable ai, in:
Proc. of the 2019 CHI Conf. on Human Factors in Computing Systems, 2019, pp. 1–15.
</p>
      <p>[Table: Results of the SASSI questionnaire for all scenarios. Mean values for each category, based on a 5-point Likert scale ranging from 1 (strongly disagree) to 5 (strongly agree); the column ‘p’ shows the results of the Mann-Whitney U test.]</p>
    </sec>
    <sec id="sec-10">
      <title>Additional Evaluation Information</title>
      <p>This appendix presents additional evaluation information, including the demographic details of the participants. For this specific analysis, the Holm–Bonferroni correction
was not applied because a preliminary review indicated that the low number of significant results made
the correction unnecessary, allowing for a clearer presentation of the raw data.</p>
    </sec>
    <sec id="sec-11">
      <title>User study</title>
      <p>Within this section we show the introductory and task texts utilized in the study.</p>
      <p>[Table: Results of the SASSI questionnaire for the Credit - True AI and Diabetes - False AI conditions. Mean values for each category, based on a 5-point Likert scale ranging from 1 (strongly disagree) to 5 (strongly agree); the column ‘p’ shows the results of the Mann-Whitney U test.]</p>
      <p>[Table: Participant demographics. Total: 80 participants (28 female, 52 male); age: average 40.72, youngest 20, oldest 60. Prior experience with AI was rated from 0 (no experience) to 5 (expert).]</p>
        <sec id="sec-11-8-1">
          <title>B.1. General Instruction</title>
          <p>Enabling conversational Explainable AI.
Welcome to our online study, where we are evaluating the effectiveness of an explainable dialogue
system. In this study, you will interact with two systems: a prediction system
powered by an AI model, and the dialogue system Athena. For each interaction, you will begin by
filling out a form to provide the information needed for the AI model (Prediction System) to generate
a prediction based on a specific scenario. Once the prediction is made, a conversation with Athena
will begin. Athena (Dialogue System) will offer explanations and support your understanding through
interactive dialogue. You will go through this process twice, with different scenarios. After each
interaction, you will be asked to complete a questionnaire about your experience. In the end, we kindly
ask you to complete a demographic questionnaire. Your participation, taking approximately 15 minutes,
will provide valuable insights into enhancing the transparency of machine learning models through
the usage of dialogue systems. Thank you for your participation. Note: The AI models (Prediction
Systems) are based on patterns in data that may not always align with reality. Their results should be seen
as estimations or risk assessments, not exact predictions. Please interpret the outcomes with care, and
consider them as part of a broader decision-making process rather than definitive conclusions.</p>
        </sec>
        <sec id="sec-11-8-2">
          <title>B.2. Task Description</title>
          <p>In this study, you have the opportunity to engage with our dialogue system Athena in a conversation
about a predefined scenario. Your first task is to fill out a short form with the necessary information
to receive a prediction from an AI model (Prediction System). Once the prediction is generated, a
dialogue with Athena (Dialogue System) will begin. You are encouraged to interact with Athena freely,
asking questions, exploring different aspects of the prediction, and sharing your thoughts or concerns. At any
point during the study, you may return to the form of the prediction system to modify your inputs and
request a new prediction. This will initiate a new conversation with Athena, allowing you to discuss
and compare different outcomes. While the web study will eventually guide you to continue with the
study, feel free to explore and interact as long as you wish to support a more in-depth evaluation of the
experience with Athena (Dialogue System).</p>
          <p>Credit Scenario In this session, we invite you to explore the process of applying for a credit loan
and to consider whether you would be accepted by a bank for such a loan. You’ll begin by providing
details such as income, credit history, and purpose through a short form (Prediction System). Using this
information, the underlying prediction system will estimate whether you would likely be approved or
denied for a credit loan by a bank. After the prediction is made, Athena (Dialogue System) will engage
with you in a dialogue to explain the outcome and help you understand the reasoning behind it. Please
note: This is a fictitious scenario, and the information you provide does not need to reflect your real
financial situation. You are free to use any values you wish.</p>
        </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <collab>European Parliament, Council</collab>
          ,
          <source>Artificial Intelligence Act</source>
          , https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX:52021PC0206,
          <year>2021</year>
          . COM(2021) 206 final.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>A.</given-names>
            <surname>Das</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Rad</surname>
          </string-name>
          ,
          <article-title>Opportunities and challenges in explainable artificial intelligence (XAI): A survey</article-title>
          , arXiv preprint arXiv:2006.11371 (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>T.</given-names>
            <surname>Miller</surname>
          </string-name>
          ,
          <article-title>Explanation in artificial intelligence: Insights from the social sciences</article-title>
          ,
          <source>Artificial intelligence</source>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>L. H.</given-names>
            <surname>Gilpin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Bau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. Z.</given-names>
            <surname>Yuan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bajwa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Specter</surname>
          </string-name>
          , L. Kagal,
          <article-title>Explaining explanations: An overview of interpretability of machine learning</article-title>
          ,
          <source>in: 2018 IEEE 5th Int. Conf. on data science and advanced analytics (DSAA)</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>80</fpage>
          -
          <lpage>89</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>I.</given-names>
            <surname>Feustel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Rach</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Minker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ultes</surname>
          </string-name>
          ,
          <article-title>Enhancing model transparency: A dialogue system approach to XAI with domain knowledge</article-title>
          ,
          <source>in: Proc. of the 25th Annual Meeting of the Special Interest Group on Discourse and Dialogue</source>
          ,
          <year>2024</year>
          , pp.
          <fpage>248</fpage>
          -
          <lpage>258</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>C.</given-names>
            <surname>Schindler</surname>
          </string-name>
          , I. Feustel,
          <string-name>
            <given-names>N.</given-names>
            <surname>Rach</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Minker</surname>
          </string-name>
          ,
          <article-title>Automatic generation of structured domain knowledge for dialogue-based XAI systems</article-title>
          ,
          <source>in: Proc. of the 15th Int. Workshop on Spoken Dialogue Systems Technology</source>
          ,
          <year>2025</year>
          , p.
          <fpage>1</fpage>
          -
          <lpage>11</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Q. V.</given-names>
            <surname>Liao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Gruen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Miller</surname>
          </string-name>
          ,
          <article-title>Questioning the ai: informing design practices for explainable ai user experiences</article-title>
          ,
          <source>in: Proc. of the 2020 CHI conf. on human factors in computing systems</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>15</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>T.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , W. Y.
          <string-name>
            <surname>Low</surname>
            ,
            <given-names>X. J.</given-names>
          </string-name>
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>B. A.</given-names>
          </string-name>
          <string-name>
            <surname>Li</surname>
          </string-name>
          ,
          <article-title>Conversational explanations: Discussing explainable ai with non-ai experts</article-title>
          ,
          <source>in: Proc. of the 30th Int. Conf. on Intelligent User Interfaces</source>
          ,
          <year>2025</year>
          , pp.
          <fpage>409</fpage>
          -
          <lpage>424</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>D.</given-names>
            <surname>Mindlin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Beer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. N.</given-names>
            <surname>Sieger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Heindorf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Esposito</surname>
          </string-name>
          , A.
          <string-name>
            <surname>-C. Ngonga Ngomo</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <string-name>
            <surname>Cimiano</surname>
          </string-name>
          ,
          <article-title>Beyond one-shot explanations: a systematic literature review of dialogue-based xai approaches</article-title>
          ,
          <source>Artificial Intelligence Review</source>
          (
          <year>2025</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>A.</given-names>
            <surname>Vassiliades</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Bassiliades</surname>
          </string-name>
          , T. Patkos,
          <article-title>Argumentation and explainable artificial intelligence: a survey, The Knowledge Engineering Review (</article-title>
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>M.</given-names>
            <surname>Demollin</surname>
          </string-name>
          ,
          <string-name>
            <surname>Q.-U.-A. Shaheen</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          <string-name>
            <surname>Budzynska</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          <string-name>
            <surname>Sierra</surname>
          </string-name>
          ,
          <article-title>Argumentation theoretical frameworks for explainable artificial intelligence</article-title>
          ,
          <source>in: 2nd Workshop on Interactive Natural Language Technology for Explainable Artificial Intelligence</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>44</fpage>
          -
          <lpage>49</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>F.</given-names>
            <surname>Castagna</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Garton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>McBurney</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Parsons</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Sassoon</surname>
          </string-name>
          ,
          <string-name>
            <surname>E. I. Sklar</surname>
          </string-name>
          ,
          <article-title>Eqrbot: A chatbot delivering eqr argument-based explanations</article-title>
          ,
          <source>Frontiers in Artificial Intelligence</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>