<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
<journal-title>Cagliari, Italy. Corresponding author: azaninello@fbk.eu (A. Zaninello); pbodlovic@ifzg.hr (P. Bodlović); m.lewinski@fcsh.unl.pt (M. Lewinski); magnini@fbk.eu (B. Magnini)</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>“I understand, but...”: Towards a Comprehensive Account of the Explainee's Voice in Explanatory Dialogues</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Andrea Zaninello</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Petar Bodlović</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marcin Lewinski</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Bernardo Magnini</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Fondazione Bruno Kessler</institution>
          ,
          <addr-line>Trento</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Free University of Bolzano</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Institute of Philosophy</institution>
          ,
          <addr-line>Zagreb</addr-line>
          ,
          <country country="HR">Croatia</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Universidade Nova de Lisboa</institution>
          ,
          <country country="PT">Portugal</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <volume>000</volume>
      <fpage>0</fpage>
      <lpage>0001</lpage>
      <abstract>
        <p>In this paper, we introduce IUBAS, the first annotation scheme that provides an in-depth analysis of the Explainee's reactions in explanatory dialogues. Current schemes, mainly focusing on answers to what, how, and, occasionally, why questions, lack the granularity to capture the full range of the Explainee's contribution. Our richer framework, grounded in argumentation and philosophical theory, distinguishes different kinds of explanation requests, feedback types, and critical questions. We provide empirical evidence of the effectiveness of the scheme through a set of experiments with three SOTA LLMs. The IUBAS scheme provides a more detailed understanding of how Explainees interact with Explainers in a dialogical setting, contributing to the development of more sophisticated and human-like conversational agents.</p>
      </abstract>
      <kwd-group>
        <kwd>explanatory dialogues</kwd>
        <kwd>annotation scheme</kwd>
        <kwd>explanations</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Consequently, a comprehensive model of explanatory interactions should not only focus on the explanations provided, but also on the request for and reception of such explanations. In addition, the development of annotation schemes for explanatory dialogues is also crucial for training automatic dialogue systems and evaluating their ability to engage in effective knowledge transfer.</p>
      <sec id="sec-1-1">
        <title>Definitions and Notation</title>
        <p>For simplicity, throughout the paper, we will assume the following definitions and notation:</p>
        <list list-type="bullet">
          <list-item><p>Phenomenon (p): event, fact, evidence, or effect discussed in the dialogue; its existence is a precondition for explanatory dialogue (e.g., a medical condition).</p></list-item>
          <list-item><p>Explanandum (E): event, fact, evidence, or effect in that it requires explanation or understanding (e.g., a medical symptom).</p></list-item>
          <list-item><p>Explanans (H): event, fact, hypothesis, or cause of E that provides explanation or understanding (e.g., a disease or medical injury).</p></list-item>
          <list-item><p>Explainer (Er): the participant who clarifies or transfers understanding of E through the stating of H.</p></list-item>
          <list-item><p>Explainee (Ee): the participant who requests, gives feedback on, or challenges an explanation H for some given E.</p></list-item>
        </list>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2. Background</title>
      <p>The study of explanations in philosophy and argumentation theory covers a wide range of questions. Researchers have focused on distinguishing explanations from other forms of reasoning, such as clarifications and arguments, highlighting the difference in their core function. Explanations differ from clarifications in that, while the latter simply aim at understanding, explanations aim at increasing knowledge, carrying greater illocutionary force. Moreover, while arguments aim to provide evidence for a doubted claim, explanations seek to account for (e.g., provide a cause for) an already accepted, uncontroversial statement [<xref ref-type="bibr" rid="ref8">8, 9, 10, 11</xref>]. This distinction becomes evident in the medical context, where a doctor might request an explanation for a patient’s dark urine (a belief in an already accepted symptom in which no justification is required) but may seek an argument for the diagnosis of hemolytic anemia (a hypothesis that requires justification).</p>
      <p>Further research has investigated the formal and normative dimensions of explanations, concentrating on developing argument schemes and critical questions associated with common explanatory inferences, such as Inference to the Best Explanation [12, 13, 14, 15]. Pragmatic studies, on the other hand, focus on defining the speech act of explaining [16] and its communicative function in various contexts. A key pragmatic function attributed to explanations is the transfer or enhancement of understanding [17, 18, 19], which becomes particularly crucial when communication is triggered by a lack of shared beliefs between the participants. In such instances, explanations act as a local move within a broader argumentative dialogue, facilitating smoother communication. For instance, an arguer will more easily develop an effective argumentative strategy once she understands “where the opponent is coming from”, i.e., once the opponent explains why she doubts or rejects the arguer’s thesis [20].</p>
      <p>Analyzing explanations as individual moves within broader argumentative contexts, however, differs from studying genuinely explanatory dialogues. Explanatory dialogues2 are strict dialectical procedures specifically designed to promote the transfer or enhancement of understanding. In explanatory dialogues, the prototypical setting is that of an Explainer clarifying or transferring their understanding of a phenomenon (represented as E) in response to an Explainee’s “Why E?”, “What is E?”, “How does E work?” etc. questions [22, 23, 19]. The inherent dialogical nature of explanations stems from their communicative goal, which is strictly connected with the Explainee’s level of understanding, (social and professional) role, curiosity, interests, beliefs, and doubts.</p>
      <p>2Sometimes also referred to as “explaining dialogues” or “dialogical explanations” [<xref ref-type="bibr" rid="ref6">21, 6</xref>].</p>
      <sec id="sec-3-rw">
        <title>3. Related work</title>
        <p>Models of Explanatory Dialogues. Despite its importance, the field of explanatory dialogues remains relatively understudied compared to that of argumentation in general. Nonetheless, some researchers have studied this phenomenon, contributing to the understanding and modeling of such interactions. Cawsey [22] focuses on human-computer interactions, emphasizing the need for AI systems to respond to user feedback and refine explanations based on their understanding and background knowledge. Cawsey proposes content-related rules for structuring non-interactive explanations and dialogue rules for guiding the interactive process. Moore [23] highlights the role of explanations in facilitating understanding and learning. She proposes four key requirements for interactive explanation systems: naturalness, responsiveness, flexibility, and sensitivity. These requirements emphasize the need for AI systems to engage in natural conversation, adapt to user needs, and be sensitive to contextual factors. Walton [9, 19, 24] presents a broader model of explanatory dialogue, characterizing it based on initial situations, collective goals, and rules governing different dialogue stages. Walton [25] distinguishes between explanatory and clarificatory dialogues, noting that clarifications focus on resolving ambiguities in expressions or speech acts while explanations target the understanding of events or facts.</p>
      </sec>
      <p>In recent years, Arioua and Croitoru [26] formalized and extended Walton’s model, proposing a more flexible protocol that allows for backtracking and dialectical shifts between explanatory and argumentative dialogue. Rohlfing et al. [27] advocated for a social and interactive approach to AI explainability, emphasizing the co-construction of understanding through dialogue. Wachsmuth and Alshomary [<xref ref-type="bibr" rid="ref6">6</xref>] analyzed human-to-human explanatory dialogues, focusing on linguistic patterns and adaptations based on user proficiency levels, and Feldhus et al. [<xref ref-type="bibr" rid="ref7">7</xref>] revised Wachsmuth and Alshomary [<xref ref-type="bibr" rid="ref6">6</xref>]’s proposal with an adaptation to a pedagogical setting. More recently, Zaninello and Magnini [28] focused on the co-construction of knowledge in the medical domain, showing that LLMs benefit from a dialogical structure of explanations. Similarly, Fichtel et al. [29] presented a study demonstrating that LLMs can partly engage in co-constructive explanation by fostering user engagement but still struggle to adapt explanations based on user understanding. However, while recognizing the central role of the Explainee, they do not provide a comprehensive framework to model the Explainee’s contribution in the co-construction of understanding.</p>
      <p>The explanation moves include checking understanding or prior knowledge, giving or requesting explanations, signaling (non-)understanding, providing feedback, assessments, or extra information, and a catch-all for any other moves (see Table ??). The 5-levels scheme was used to annotate the Wired [<xref ref-type="bibr" rid="ref6">6</xref>] and the ELI5 [21] datasets (see Section 3). In both datasets, annotation is realized at the turn level on the three dimensions (T, D, and E), where a turn corresponds to either the Explainer or the Explainee taking the floor. Each turn can be made of one or more utterances. This scheme provides a high-level categorization of explanatory dialogue acts but, as mentioned, mainly focuses on the Explainer’s contribution, as can be seen from Table 11.</p>
      <sec id="sec-2-1">
        <title>The Rewired scheme</title>
        <p>The Rewired scheme [<xref ref-type="bibr" rid="ref7">7</xref>] is an extension of the 5-levels scheme that proposes to add a new layer of annotation on top of the three proposed by Wachsmuth and Alshomary [<xref ref-type="bibr" rid="ref6">6</xref>], drawing from pedagogical studies and teaching practice. The primary difference lies in the introduction of 10 teaching acts (T) in the new scheme.</p>
      </sec>
      <sec id="sec-2-2">
        <p>This new layer, focused on teaching strategies such as assessing prior knowledge, proposing lesson steps, and engaging in active experience, allows for a more granular analysis of the instructional process, highlighting how teachers manage classroom interactions and instructional delivery.</p>
        <sec id="sec-2-2-1">
          <title>Annotation Schemes</title>
          <p>As mentioned in the previous sections, various models of explanatory dialogues have been proposed, each focusing on different aspects of the interaction. However, within the field of computational linguistics, few comprehensive annotation schemes can be found. In the following section, we introduce two of the most prominent annotation schemes: the 5-levels scheme, proposed by Wachsmuth and Alshomary [<xref ref-type="bibr" rid="ref6">6</xref>], and the Rewired scheme by [<xref ref-type="bibr" rid="ref7">7</xref>], an extension of the 5-levels proposal.</p>
        </sec>
        <sec id="sec-2-2-2">
          <title>Datasets</title>
        </sec>
      </sec>
      <sec id="sec-2-3">
        <p>Despite their importance and relevance, explanatory dialogue data are scarce, as they are difficult to collect and analyze. One of the few available datasets is the 5-levels “Wired” dataset [6].</p>
      </sec>
      <sec id="sec-2-4">
        <title>The 5-levels “Wired” dataset</title>
        <p>The 5-levels “Wired” dataset [6] is a corpus of 65 English dialogues from Wired’s 5 Levels video series, where 13 topics are discussed and explained to five explainees of varying expertise, resulting in 65 dialogues for a total of 1550 turns. Other available datasets rely on the crawling of discussions online, such as those in blogs and forums. For instance, the ELI5-dialogues corpus contains 399 daily-life explanatory dialogues from the Reddit forum “Explain Like I’m Five” (ELI5).</p>
        <p>The 5-levels scheme [<xref ref-type="bibr" rid="ref6">6</xref>] annotates each turn of a dialogue according to three different dimensions, resulting in a three-dimensional annotation for each turn where only one tag per dimension is allowed. The dimensions are: the discussed topic (T), the dialogue act (D), and the explanation move (E).</p>
        <p>Dimension (T) recognizes that participants might be discussing the main topic (e.g., climate change), a subtopic (e.g., temperature increase), or some (un)related topic (e.g., greenhouse gas emissions). Dimension (D) is based on speech act theory and is derived from the DIT++ Taxonomy of Dialogue Acts3 [<xref ref-type="bibr" rid="ref5">30, 5</xref>], providing a coarse account of the type of question asked, whether an answer confirms or disconfirms what was previously asked, and whether a given statement agrees, disagrees, or provides more information on a certain concept. The third dimension (E) provides a taxonomy of the explanation moves made in dialogue.</p>
        <p>We introduce one example dialogue from this dataset in Table 9.</p>
        <sec id="sec-4-iubas">
          <title>4. Accounting for the Explainee’s Contribution: The IUBAS Annotation Scheme</title>
          <p>As highlighted in Section 3, current dialogue annotation schemes recognize basic explanatory requests, modelled as “what-”, “how-”, and “why-” questions, which they categorize under “information-seeking” dialogical functions [<xref ref-type="bibr" rid="ref5 ref6">5, 19, 6</xref>]. Such schemes also acknowledge basic feedback like “signal understanding” or “signal non-understanding” [<xref ref-type="bibr" rid="ref6">6</xref>]. However, they usually do not recognize complex requests that include contrast classes and motivations, and different kinds of complex feedback that might include, e.g., qualifications, explanatory remarks, or critical questions. Complex requests and feedback are typical in real-world explanatory dialogues. While current accounts underline the dynamic nature of explanatory dialogues, they underestimate the importance of directly considering the Explainee’s needs, contextual factors, and the co-construction of understanding, which are, however, vital to fully understand explanatory interactions. A finer-grained comparison with the 5-levels scheme can also be found in the Appendix, Table 11.</p>
          <sec id="sec-4-1">
            <title>4.1. Explanation Requests</title>
            <p>Explanation requests are the dialogical moves that, typically, initiate the explanatory process. They signal the Explainee’s need for understanding and provide a target for the Explainer’s efforts. We distinguish between different types of requests based on two key criteria: contrastivity and motivation.</p>
            <sec id="sec-4-1-1">
              <title>4.1.1. Contrastivity: Basic vs. Contrastive Request</title>
            </sec>
          </sec>
        </sec>
      <p>These limited approaches neglect the contrastive nature of explanations, where an Explainee might seek to understand why a particular explanandum (E) is the case, instead of alternative possibilities (E*) [31, 18]. Furthermore, the motivations behind the Explainee’s questions are often ignored, neglecting the valuable contextual information that motivates their doubts and inquiries [20], which in turn also has important implications for the Explainer’s reaction itself. For instance, once the Explainer understands what, exactly, puzzles or confuses the Explainee (where her explanatory request “comes from”), the Explainer can provide a more effective, tailor-made, explainee-centered response. She can focus on the aspects of the problem that the Explainee considers most relevant and choose an effective communicative strategy sensitive to the required level of detail, the requested type of information, etc.</p>
      </sec>
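<p>The two request criteria compose mechanically into the four templates of Table 1. A minimal sketch in Python (the function name and signature are ours, for illustration only, not part of any released IUBAS tooling):</p>

```python
def render_request(explanandum, contrast=None, motivation=None):
    """Compose a why-request from the two IUBAS criteria:
    contrastivity (adds a contrast class E*) and motivation (adds M)."""
    parts = [f"Why {explanandum}"]
    if contrast is not None:       # contrastive request: "instead of E*"
        parts.append(f"instead of {contrast}")
    if motivation is not None:     # motivated request: "given that M"
        parts.append(f"given that {motivation}")
    return ", ".join(parts) + "?"
```

<p>For example, render_request("E", contrast="E*", motivation="M") yields “Why E, instead of E*, given that M?”, the fully contrastive, motivated form.</p>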
      <sec id="sec-2-5">
        <p>To improve the current research, we propose to integrate existing accounts with IUBAS, a multi-dimensional annotation scheme that captures the diverse nature of Explainees’ dialogical contributions and reactions. Our proposed scheme aims to address the limitations of the previous schemes through three main contributions.</p>
      </sec>
      <sec id="sec-2-7">
        <title>Basic explanation requests</title>
        <p>Basic explanation requests simply refer to (or target) the explanandum, the event or phenomenon requiring explanation (Table 2).</p>
        <p>
          The basic explanatory why-request is recognized in
argumentation theory [32, 9, 19, 24, 33, 20], but, for the most
part, ignored in contemporary annotation schemes. For
instance, although [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] acknowledge that dialogue acts
can be used to provide justifications and explanations,
they focus on "check questions," "choice questions" and
"set questions." Along similar lines, [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] emphasize the
importance of "check", "how" and "what" questions.
        </p>
      </sec>
      <sec id="sec-2-8">
        <title>Contrastive explanation requests</title>
        <p>Contrastive explanation requests, on the other hand, explicitly introduce a contrastive class, highlighting the specific aspects of the explanandum that require clarification (Table 3).</p>
      </sec>
      <sec id="sec-2-9">
        <p>This distinction, while prevalent in the philosophical literature on explanation [31, 17, 18, 16], is often overlooked in dialogue annotation schemes. Incorporating contrastive requests allows for a more precise representation of the Explainee’s information needs.</p>
      </sec>
      <sec id="sec-2-10">
        <p>Contrastive requests emphasize the specific aspects of the explanandum that should be understood. Basic and contrastive requests, as exemplified in Tables 2 and 3, introduce questions that (might) require different explanations. So, defining a contrast class sets initial normative boundaries for selecting an adequate explanans.</p>
        <p>Our scheme addresses the limitations of the previous schemes by:</p>
        <list list-type="order">
          <list-item><p>Providing a more fine-grained categorization of explanation moves, capturing specific actions within the explanatory process, by applying the annotation at the utterance level and allowing one utterance to receive zero or more (E) tags4.</p></list-item>
          <list-item><p>Explicitly considering the Explainee’s perspective and their active role in seeking and integrating new information.</p></list-item>
        </list>
      </sec>
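<p>The utterance-level tagging convention described in footnote 4 amounts to a forward fill of tags; a minimal sketch, with illustrative names of our own:</p>

```python
def project_tags(utterances, tags):
    """Sketch of the footnote-4 convention: an (E) tag expressed on an
    utterance is projected onto the following utterances until a new
    tag is expressed. `tags` holds a tag string or None per utterance."""
    projected, current = [], None
    for _utterance, tag in zip(utterances, tags):
        if tag is not None:
            current = tag          # a new explicit tag replaces the projection
        projected.append(current)  # stays None until the first tag appears
    return projected
```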
      <sec id="sec-2-11">
        <title>3. Empirically demonstrating the effectiveness of modelling the Explainee’s role in the dialogue through a set of experiments on dialogue quality prediction.</title>
        <p>Pure explanation requests directly inquire about the explanandum (or some aspect of the explanandum, if they include a contrast class) without further elaboration. In contrast, motivated explanation requests introduce further information about the Explainee’s cognitive and communicative needs (Table 4). By motivating their requests, Explainees explicitly inform the Explainer what confuses them about the explanandum, or, in other words, what exactly stands in the way of transferring understanding.</p>
        <p>Such additional information promotes effective communication, and might at times even be necessary for formulating an adequate explanans. Such additional considerations, inspired by the works of [20] and [<xref ref-type="bibr" rid="ref9">34</xref>], allow us to capture the broader context of the Explainee’s request.</p>
        <p>Table 1 presents a summary of our proposed scheme, which we explain, motivate, and exemplify in the next sections.</p>
        <p>4As exemplified in Table 9, we implicitly assume that a tag is expressed at the utterance level and is automatically projected onto the next utterances until a new (E) tag is expressed.</p>
        <sec id="sec-4-1-2">
          <title>4.1.2. Motivation: Pure vs. Motivated Request</title>
          <table-wrap id="tbl1">
            <label>Table 1</label>
            <caption><p>Summary of the IUBAS annotation scheme.</p></caption>
            <table>
              <thead>
                <tr><th>Dimension</th><th>Type</th><th>Subtype</th><th>Example</th></tr>
              </thead>
              <tbody>
                <tr><td>(R)</td><td>Basic</td><td>Pure</td><td>Why E?</td></tr>
                <tr><td>(R)</td><td>Basic</td><td>Motivated</td><td>Why E, given that M?</td></tr>
                <tr><td>(R)</td><td>Contrastive</td><td>Pure</td><td>Why E, instead of E*?</td></tr>
                <tr><td>(R)</td><td>Contrastive</td><td>Motivated</td><td>Why E, instead of E*, given that M?</td></tr>
                <tr><td>(F)</td><td>Positive Basic</td><td>Assert understanding</td><td>I understand H.</td></tr>
                <tr><td>(F)</td><td>Positive Complex</td><td>Demonstrate understanding</td><td>I understand. So...</td></tr>
                <tr><td>(F)</td><td>Positive Complex</td><td>Qualified understanding</td><td>I understand. But...</td></tr>
                <tr><td>(F)</td><td>Positive Complex</td><td>Critical challenge</td><td>I understand. However... [critical question]</td></tr>
                <tr><td>(F)</td><td>Negative Basic</td><td>Assert non-understanding</td><td>I don’t think H explains E. I rather think H*.</td></tr>
                <tr><td>(F)</td><td>Negative Complex</td><td>Request for clarification</td><td>I don’t think H explains E. Can you clarify H?</td></tr>
                <tr><td>(F)</td><td>Negative Complex</td><td>Critical challenge</td><td>I don’t think H... In fact [critical question]</td></tr>
                <tr><td>(C)</td><td colspan="3">Types of Critical Challenges: Comparative plausibility; Epistemic distance; Generative completeness; Non-comparative plausibility; Causal accuracy; Causal responsibility; Explanandum reliability; Pragmatic considerations</td></tr>
              </tbody>
            </table>
          </table-wrap>
        </sec>
        <sec id="sec-2-11-1">
          <title>4.2. Explainee’s Feedback</title>
        </sec>
      </sec>
      <sec id="sec-2-12">
        <p>Once the Explainer offers an explanation, the Explainee typically provides feedback, signaling her understanding or lack thereof. We differentiate between positive and negative feedback, further distinguishing between basic and complex variants.</p>
        <sec id="sec-4-2-2">
          <title>4.2.2. Complexity: Basic vs. Complex Feedback</title>
        </sec>
      </sec>
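<p>The feedback dimension (F) can be read as a small two-criteria taxonomy (polarity and complexity) with the subtype labels of Table 1; the dictionary encoding below is ours, for illustration only:</p>

```python
# Feedback dimension (F): (polarity, complexity) -> subtype labels from Table 1.
FEEDBACK_SUBTYPES = {
    ("positive", "basic"): ["Assert understanding"],
    ("positive", "complex"): ["Demonstrate understanding",
                              "Qualified understanding",
                              "Critical challenge"],
    ("negative", "basic"): ["Assert non-understanding"],
    ("negative", "complex"): ["Request for clarification",
                              "Critical challenge"],
}
```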
      <sec id="sec-2-13">
        <p>Basic feedback provides a straightforward assessment of understanding without further elaboration. In contrast, complex feedback incorporates additional remarks, questions, or challenges.</p>
        <sec id="sec-4-2-3">
          <title>4.2.3. Types of Complex Positive Feedback</title>
          <p>Complex Positive Feedback can take several forms: “I understand that lung cancer explains this kind of cough. However, is another diagnosis still possible? Can you still run some more tests?”</p>
        </sec>
      </sec>
      <sec id="sec-2-14">
        <p>This type of feedback, both positive and negative (see Section 4.2.4), often introduces critical questions (see Section 4.3 and Table 7).</p>
      </sec>
      <sec id="sec-2-16">
        <title>4.2.4. Types of Complex Negative Feedback</title>
        <p>Complex negative feedback can also be analyzed into:</p>
        <p>1. Request for clarification: The Explainee may point to specific concepts or aspects of the explanation they find unclear.</p>
      </sec>
      <sec id="sec-2-17">
        <p>2. Critical challenge: The Explainee may directly challenge the plausibility of the explanation, either categorically rejecting it or requesting further justification.</p>
      </sec>
      <sec id="sec-2-18">
        <p>As seen for their positive counterparts, critical challenges can introduce critical questions (Section 4.3).</p>
        <sec id="sec-2-18-1">
          <title>4.3. Explainee’s Critical Questions</title>
          <p>The forms of Complex Positive Feedback (Section 4.2.3) are: 1. Demonstration of understanding: The Explainee may provide additional information or draw inferences to demonstrate their grasp of the explanation (Table 6). 2. Qualified understanding: The Explainee may signal partial understanding, acknowledging the need for further clarification on specific aspects of the explanation (Table 7). 3. Understanding with Critical Challenge.</p>
          <p>Critical questions challenge the explanation and its underlying assumptions. They target various aspects of the explanation, testing its plausibility, completeness, and relevance. Inspired by existing literature on Inference to the Best Explanation, argument schemes, and critical questions [<xref ref-type="bibr" rid="ref10 ref11 ref12">35, 36, 37, 12, 15</xref>], we propose a typology of critical questions tailored to why-explanations. We categorize critical questions according to the specific aspect of explanation they target, as summarized in Table 1 and further exemplified in Table 7.</p>
        </sec>
      </sec>
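<p>For annotation tooling, the critical-challenge types of the (C) dimension listed in Table 1 can be kept as a flat inventory; the constant below is an illustrative encoding of our own, not part of any released implementation:</p>

```python
# The (C) dimension: types of critical challenges, as listed in Table 1.
CRITICAL_CHALLENGE_TYPES = [
    "Comparative plausibility",
    "Epistemic distance",
    "Generative completeness",
    "Non-comparative plausibility",
    "Causal accuracy",
    "Causal responsibility",
    "Explanandum reliability",
    "Pragmatic considerations",
]
```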
    <sec id="sec-2-19">
      <p>While understanding the nature of the explanation (or conditionally understanding the phenomenon), the Explainee may challenge its plausibility, demanding further justification (Table 8).</p>
      <sec id="sec-5-comp">
        <title>5. Comparative annotation</title>
      </sec>
    </sec>
      <sec id="sec-2-20">
      <p>In Table 9, we present a comparative analysis of an example dialogue from the ELI5 corpus, annotated through the “5-levels” and our “IUBAS” scheme.</p>
      <p>IUBAS allows for a finer-grained account of the Explainee’s request (e.g., U0, where we can specify that the explanation request is based on an implicit comparison with a complementary group). Also, we can better account for shifts in the explanation move within a turn (e.g., U4-6), as well as combinations of moves within a single turn (e.g., U7). This provides a more precise account of the conversational flow and, crucially, as this example suggests, it seems that providing explanations is not limited to the Explainer’s role, and neither does feedback only originate from the Explainee. This observation, once generalised over a broader set of examples, could challenge the traditional view of the Explainer/Explainee roles, a phenomenon which can be analysed in detail through our scheme.</p>
      <p>Also, our account of the different types of feedback and request (e.g., U7, U8-9) highlights that the Explainee’s reaction strongly influences the kind of explanation provided and participates in the co-construction of the explanation process. Finally, IUBAS is organized hierarchically, which makes it possible to navigate its tree-like structure and easily reconstruct the analysis of the explanatory move (Figure 1). Moreover, its structure allows for flexibility in terms of the level of granularity needed for a specific analysis.</p>
      <p>This automatic labeling process produced a set of IUBAS annotations for all relevant Explainee turns in the ELI5 corpus, increasing the original labelling by approx. 20%. The resulting enriched dataset contains, for each relevant Explainee utterance, an associated label (R, F, C) indicating the Explainee’s needs or feedback in that turn. We manually inspected a sample of the GPT-4.1 annotations to ensure they were coherent with the scheme’s guidelines, and overall found the labels to be reasonable, providing a fine-grained view of the Explainee’s role in the dialogue.</p>
      <p>Quality Prediction Task Setup. Using the automatically annotated corpus, we replicate the dialogue quality prediction setup of Alshomary et al. (2024) to evaluate how the additional IUBAS metadata influences performance. The goal of the task is to predict the human-assigned quality score of a dialogue given the dialogue transcript (with or without annotations). We compare four input conditions:</p>
      <list list-type="bullet">
        <list-item><p>No Annotation: Each dialogue is given to the model as plain text, with no turn-level labels (baseline condition).</p></list-item>
        <list-item><p>Original ELI5 Labels: Each turn in the dialogue is followed by the original annotation tags for explanation move, dialogue act, and topic.</p></list-item>
        <list-item><p>IUBAS Labels: Each explainee turn is prefixed with its IUBAS labels (R, F, C values) as metadata, while explainer turns remain unlabeled.</p></list-item>
        <list-item><p>Combined (ELI5 + IUBAS): Both the original ELI5 turn labels and the IUBAS labels for Explainee turns are included.</p></list-item>
      </list>
      </sec>
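<p>The four input conditions can be sketched as a single formatting function; the tuple layout, condition names, and bracket placement below are our own illustration of the setup, not the authors’ code:</p>

```python
def format_dialogue(turns, condition):
    """Assemble the model input for one dialogue under one condition.
    Each turn is (speaker, text, eli5_tags, iubas_tags)."""
    lines = []
    for speaker, text, eli5, iubas in turns:
        line = f"{speaker}: {text}"
        if condition in ("eli5", "combined") and eli5:
            line += " [" + ", ".join(eli5) + "]"          # original turn-level tags
        if condition in ("iubas", "combined") and iubas and speaker == "Explainee":
            line = "[" + ", ".join(iubas) + "] " + line   # IUBAS labels on Explainee turns only
        lines.append(line)
    return "\n".join(lines)
```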
    </sec>
    <sec id="sec-3">
      <title>6. Experiments</title>
      <sec id="sec-3-1">
        <p>We conduct our experiments on the ELI5 dialogue quality assessment task introduced by Alshomary et al. (2024). This corpus consists of explanatory dialogues (399 in total) from the Reddit “Explain Like I’m Five” forum, each labeled with a ground-truth explanation quality score on a 1–5 Likert scale. We integrate the proposed IUBAS scheme into this task by automatically annotating the Explainee turns and evaluating its impact on quality prediction.</p>
      </sec>
      <sec id="sec-3-4">
        <p>IUBAS Annotation with GPT-4.1. To obtain IUBAS labels for the Explainee’s turns, we employed the GPT-4.15 model to perform annotation in a zero-shot manner. We targeted only those turns where the Explainee explicitly participates in the dialogue, corresponding to the categories E04 (Request Explanation) and E07 (Request Feedback) in the original 5-level annotation scheme of Alshomary et al. These are the turns where the Explainee asks a question or provides feedback, i.e., the utterances that reflect the Explainee’s reaction and understanding. For each such turn, GPT-4.1 was prompted with the dialogue context and the definition of the IUBAS categories, and it generated an IUBAS tag capturing the turn’s properties, choosing among: R (type of explanation request, e.g., basic vs. contrastive), F (feedback type, e.g., positive vs. negative understanding), and C (presence of any critical follow-up or clarification request).</p>
        <p>We format the prompt for each dialogue by inserting the turn-level metadata (if any) immediately after each utterance, between square brackets, with a concise description of the tag itself (for example, [(F01) Positive Basic Feedback - Assert understanding]). After presenting the entire dialogue, we append a final instruction asking the model to “Rate the overall explanation quality on a 1–5 scale.” The model then outputs a single rating. We evaluate three instruction-tuned LLMs: Llama-3.1-8B-Instruct [<xref ref-type="bibr" rid="ref13">38</xref>], Gemma-3-4b-it6, and Qwen2.5-14B-Instruct-1M [<xref ref-type="bibr" rid="ref14">39</xref>]. We use HuggingFace’s lm_eval harness [<xref ref-type="bibr" rid="ref15">40</xref>] in the multiple choice mode, asking the model to choose a number from 1 to 5, indicating the dialogue quality. We report RMSE and MAE against human ratings of each model’s prediction, and assess significance using a paired t-test.</p>
      </sec>
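        <p>The prompt-formatting step described above can be sketched as follows. This is an illustrative sketch, not the authors’ code: the function name, the example dialogue, and all tag descriptions except “(F01) Positive Basic Feedback - Assert understanding” are our own assumptions.</p>

```python
# Sketch: build a quality-rating prompt by inserting each turn's
# metadata in square brackets right after the utterance, then append
# the final rating instruction, as described in the paper.
# Only the (F01) description comes from the text; the rest is illustrative.
TAG_DESCRIPTIONS = {
    "F01": "(F01) Positive Basic Feedback - Assert understanding",
    "E04": "(E04) Request Explanation",  # hypothetical description
}

def format_dialogue_prompt(turns):
    """turns: list of (speaker, utterance, tag-or-None) triples."""
    lines = []
    for speaker, utterance, tag in turns:
        line = "{}: {}".format(speaker, utterance)
        if tag is not None:
            # turn-level metadata goes immediately after the utterance
            line += " [{}]".format(TAG_DESCRIPTIONS.get(tag, tag))
        lines.append(line)
    lines.append("Rate the overall explanation quality on a 1-5 scale.")
    return "\n".join(lines)

prompt = format_dialogue_prompt([
    ("Explainer", "A black hole is a region where gravity is extreme.", None),
    ("Explainee", "I see, so nothing can escape it?", "F01"),
])
print(prompt)
```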
      <sec id="sec-3-5">
        <p>GPT-4.1: https://openai.com/index/gpt-4-1/</p>
        <p>[Table: quality-prediction results (RMSE/MAE) for LLaMA, Gemma, and Qwen under the prompt conditions No annotation, ELI5-only, IUBAS-only, IUBAS-only (C), IUBAS-only (F), IUBAS-only (R), and ELI5 + IUBAS; numeric values not recoverable.]</p>
        <sec id="sec-3-5-1">
          <title>6.1. Results and Analysis</title>
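          <p>The evaluation metrics described above (RMSE and MAE against human ratings, with a paired t-test for significance) can be sketched as follows; a minimal stdlib-only illustration with made-up ratings, not the authors’ code (in practice, e.g., scipy.stats.ttest_rel would also return the p-value).</p>

```python
# Sketch: RMSE/MAE of model ratings against human ratings, plus a
# paired t statistic on the absolute errors of two models.
import math

def rmse(pred, gold):
    # root mean squared error over paired ratings
    return math.sqrt(sum((p - g) ** 2 for p, g in zip(pred, gold)) / len(gold))

def mae(pred, gold):
    # mean absolute error over paired ratings
    return sum(abs(p - g) for p, g in zip(pred, gold)) / len(gold)

def paired_t(a, b):
    """t statistic for paired samples a, b (same dialogues, two models)."""
    d = [x - y for x, y in zip(a, b)]
    n = len(d)
    mean = sum(d) / n
    var = sum((x - mean) ** 2 for x in d) / (n - 1)  # sample variance
    return mean / math.sqrt(var / n)

human   = [3, 4, 5, 2, 4, 3]   # hypothetical human ratings
model_a = [3, 3, 5, 2, 4, 4]   # hypothetical model predictions
model_b = [2, 3, 4, 1, 3, 3]

err_a = [abs(p - g) for p, g in zip(model_a, human)]
err_b = [abs(p - g) for p, g in zip(model_b, human)]
print(rmse(model_a, human), mae(model_a, human), paired_t(err_a, err_b))
```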
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>7. Conclusion</title>
      <p>In this paper, we introduced IUBAS, a framework that
contributes to a richer understanding of the Explainee’s
role within explanatory dialogues. We incorporate
contrastivity and motivation alongside a categorization of
feedback and critical questions, providing a more
comprehensive account for analyzing and modeling such
interactions. By adopting this scheme, we can move
towards developing more sophisticated conversational AI
systems capable of engaging in truly human-like
explanatory dialogues, ultimately enhancing communication
effectiveness and fostering deeper understanding.
</p>
      <p>[8, cont.] dialogue agents (arda): An unexpected journey from pragmatics to conversational agents, Open Linguistics 11 (2025).</p>
      <p>[9] D. Walton, A new dialectical theory of explanation, Philosophical Explorations 7 (2004) 71–89. doi:10.1080/1386979032000186863.</p>
      <p>[10] G. R. Mayes, Argument explanation complementarity and the structure of informal reasoning, Informal Logic 30 (2010) 92–111. doi:10.22329/il.v30i1.419.</p>
      <p>[11] T. Govier, Problems in Argument Analysis and Evaluation, Windsor Studies in Argumentation, University of Windsor, 2018. URL: https://books.google.hr/books?id=pulfDwAAQBAJ.</p>
      <p>[12] D. Walton, C. Reed, F. Macagno, Argumentation Schemes, Cambridge University Press, New York, 2008.</p>
      <p>[13] J. H. M. Wagemans, Argumentative patterns for justifying scientific explanations, Argumentation 30 (2015) 97–108. URL: https://api.semanticscholar.org/CorpusID:56085286.</p>
      <p>[14] S. Yu, F. Zenker, Peirce knew why abduction isn’t IBE - a scheme and critical questions for abductive argument, Argumentation 32 (2017) 569–587. doi:10.1007/s10503-017-9443-9.</p>
      <p>[15] P. Olmos, Metaphilosophy and argument: The case of the justification of abduction, Informal Logic 41 (2021) 131–164. doi:10.22329/il.v41i2.6249.</p>
      <p>[16] G. Gaszczyk, Helping others to understand: A normative account of the speech act of explanation, Topoi 42 (2023) 385–396. doi:10.1007/s11245-022-09878-y.</p>
      <p>[17] P. Lipton, Inference to the Best Explanation, International Library of Philosophy and Scientific Method, Routledge/Taylor and Francis Group, 2004. URL: https://books.google.hr/books?id=WIfYNExpSC0C.</p>
      <p>[18] S. R. Grimm, The goal of explanation, Studies in History and Philosophy of Science Part A 41 (2010) 337–344. doi:10.1016/j.shpsa.2010.10.006.</p>
      <p>[19] D. Walton, A dialogue system specification for explanation, Synthese 182 (2011) 349–374. doi:10.1007/s11229-010-9745-z.</p>
      <p>[20] J. A. van Laar, E. C. W. Krabbe, The burden of criticism: Consequences of taking a critical stance, Argumentation 27 (2013) 201–224. doi:10.1007/s10503-012-9272-9.</p>
      <p>[21] M. Alshomary, F. Lange, M. Booshehri, M. Sengupta, P. Cimiano, H. Wachsmuth, Modeling the quality of dialogical explanations, in: N. Calzolari, M.-Y. Kan, V. Hoste, A. Lenci, S. Sakti, N. Xue (Eds.), Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), ELRA and ICCL, Torino, Italy, 2024, pp. 11523–11536. URL: https://aclanthology.org/2024.lrec-main.1007.</p>
      <p>[22] A. Cawsey, Explanation and Interaction: The Computer Generation of Explanatory Dialogues, MIT Press series in natural-language processing, Bradford Book, 1992. URL: https://books.google.hr/books?id=hQt1-7gA334C.</p>
      <p>[23] J. Moore, Participating in Explanatory Dialogues: Interpreting and Responding to Questions in Context, A Bradford book, CogNet, 1995. URL: https://books.google.hr/books?id=nRx0QgAACAAJ.</p>
      <p>[24] D. Walton, Abductive Reasoning, University of Alabama Press, 2014. URL: https://books.google.hr/books?id=DNqKAwAAQBAJ.</p>
      <p>[25] D. Walton, The speech act of clarification in a dialogue model, Studies in Communication Sciences 7 (2007). URL: https://api.semanticscholar.org/CorpusID:149373911.</p>
      <p>[26] A. Arioua, M. Croitoru, Formalizing explanatory dialogues, in: Scalable Uncertainty Management, 2015. URL: https://api.semanticscholar.org/CorpusID:7365540.</p>
      <p>[27] K. J. Rohlfing, P. Cimiano, I. Scharlau, T. Matzner, H. M. Buhl, H. Buschmeier, E. Esposito, A. Grimminger, B. Hammer, R. Häb-Umbach, I. Horwath, E. Hüllermeier, F. Kern, S. Kopp, K. Thommes, A.-C. Ngonga Ngomo, C. Schulte, H. Wachsmuth, P. Wagner, B. Wrede, Explanation as a social practice: Toward a conceptual framework for the social design of AI systems, IEEE Transactions on Cognitive and Developmental Systems 13 (2021) 717–728. doi:10.1109/TCDS.2020.3044366.</p>
      <p>[28] A. Zaninello, B. Magnini, MedExpDial: Machine-to-machine generation of explanatory dialogues for medical QA, in: Proceedings of the 28th Workshop on the Semantics and Pragmatics of Dialogue, 2024.</p>
      <p>[29] L. Fichtel, M. Spliethöver, E. Hüllermeier, P. Jimenez, N. Klowait, S. Kopp, A.-C. N. Ngomo, A. Robrecht, I. Scharlau, L. Terfloth, A.-L. Vollmer, H. Wachsmuth, Investigating co-constructive behavior of large language models in explanation dialogues, 2025. URL: https://arxiv.org/abs/2504.18483. arXiv:2504.18483.</p>
      <p>[30] H. Bunt, D. K. J. Heylen, C. Pelachaud, R. Catizone, D. R. Traum, The DIT++ taxonomy for functional dialogue markup, 2009. URL: https://api.semanticscholar.org/CorpusID:60074224.</p>
      <p>[31] F. I. Dretske, Contrastive statements, Philosophical Review 81 (1972) 411–437. doi:10.2307/2183886.</p>
      <p>[32] C. Hamblin, Fallacies, University Paperbacks, Methuen, 1970. URL: https://books.google.hr/books?id=bYYIAQAAIAAJ.</p>
      <p>[33] J. Blair, C. Tindale, Groundwork in the Theory of Argumentation: Selected Papers of J. Anthony Blair, Argumentation Library, Springer Netherlands, 2011. URL: https://books.google.hr/books?id=IM9p6GgnJAcC.</p>
    </sec>
    <sec id="sec-5">
      <title>Appendix</title>
      <sec id="sec-5-1">
        <title>Limitations</title>
        <sec id="sec-5-1-1">
          <p>While the manual annotation of a full dataset falls outside the scope of our current proposal, we believe that future work should involve testing the agreement between the automated annotation and human annotation.</p>
        </sec>
        <sec id="sec-5-1-2">
          <p>Additionally, the proposed typology could be expanded to account for the different kinds of explanations and reasoning patterns on the Explainer’s side, too.</p>
        </sec>
      </sec>
      <sec id="sec-5-2">
        <title>Ethical Considerations</title>
        <p>This research focuses on analyzing explanatory dialogue, and it is crucial to acknowledge the potential ethical implications of applying such schemes to real-world situations, especially in sensitive domains like healthcare, or when covering topics such as ethnicity, physical ability, gender, and sexual orientation (as in the case of the reported example in Table 9). Careful consideration should also be given to data privacy, informed consent, and potential biases in the annotation process.</p>
        <p>[Tables: descriptions of IUBAS moves, including checking whether the listener understood the explanation; checking the listener’s prior knowledge of the topic; explaining a concept or topic to the listener; requesting an explanation from the listener; direct vs. contrastive requests about E, the event or phenomenon requiring explanation (Why E?; Why E, given that M?; Why E, instead of E*?; Why E, instead of E*, given that M?); confirming or disconfirming feedback on H, simple or with elaboration (demonstrative understanding: “I understand. So...”; qualified understanding: “I understand. But...”; critical challenge: “I understand. However... [critical question]”; pure disagreement: “I don’t think H explains E. I rather think H*.”; clarification request: “Can you clarify h ∈ H?”; see Table 7); assessing the listener by rephrasing their utterance or giving a hint; giving additional information to foster a complete understanding; and making any other explanation move. Example critical questions for the lung-cancer case: Does ‘lung cancer’ cause Mark’s condition, or does the diagnosis fail to explain all the symptoms? Is ‘lung cancer’ the cause we are looking for, or are multiple causes involved (e.g., lung cancer and COVID-19)? Is cough the only symptom that needs to be explained, and is it a real symptom (or is the patient faking it)? What is the cost of being mistaken if one proceeds as if the patient has cancer, or as if she has asthma?]</p>
        <p>Declaration on Generative AI: During the preparation of this work, the author(s) used ChatGPT (OpenAI) in order to: paraphrase and reword, improve writing style, and check grammar and spelling. After using these tool(s)/service(s), the author(s) reviewed and edited the content as needed and take(s) full responsibility for the publication’s content.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>T.</given-names>
            <surname>Lombrozo</surname>
          </string-name>
          , Explanation and abductive inference,
          <source>The Oxford Handbook of Thinking and Reasoning</source>
          (
          <year>2012</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>M. T.</given-names>
            <surname>Chi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bassok</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. W.</given-names>
            <surname>Lewis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Reimann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Glaser</surname>
          </string-name>
          ,
          <article-title>Self-explanations: How students study and use examples in learning to solve problems</article-title>
          ,
          <source>Cognitive science 13</source>
          (
          <year>1989</year>
          )
          <fpage>145</fpage>
          -
          <lpage>182</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>B.</given-names>
            <surname>Mittelstadt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Russell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Wachter</surname>
          </string-name>
          ,
          <article-title>Explaining explanations in ai</article-title>
          ,
          <source>in: Proceedings of the Conference on Fairness, Accountability, and Transparency</source>
          , FAT* '19,
          <string-name>
            <surname>ACM</surname>
          </string-name>
          ,
          <year>2019</year>
          . URL: http://dx.doi.org/10.1145/3287560.3287574. doi:10.1145/3287560.3287574.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>T.</given-names>
            <surname>Miller</surname>
          </string-name>
          ,
          <article-title>Explanation in artificial intelligence: Insights from the social sciences</article-title>
          ,
          <source>Artificial Intelligence</source>
          <volume>267</volume>
          (
          <year>2019</year>
          )
          <fpage>1</fpage>
          -
          <lpage>38</lpage>
          . URL: https://www.sciencedirect.com/science/article/pii/S0004370218305988. doi:10.1016/j.artint.2018.07.007.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>H.</given-names>
            <surname>Bunt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Alexandersson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Carletta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-W.</given-names>
            <surname>Choe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. C.</given-names>
            <surname>Fang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Hasida</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Petukhova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Popescu-Belis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Romary</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Soria</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Traum</surname>
          </string-name>
          ,
          <article-title>Towards an ISO standard for dialogue act annotation</article-title>
          , in: N.
          <string-name>
            <surname>Calzolari</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          <string-name>
            <surname>Choukri</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          <string-name>
            <surname>Maegaard</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Mariani</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Odijk</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <string-name>
            <surname>Piperidis</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Rosner</surname>
          </string-name>
          , D. Tapias (Eds.),
          <source>Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)</source>
          ,
          <source>European Language Resources Association (ELRA)</source>
          , Valletta, Malta,
          <year>2010</year>
          . URL: http://www.lrec-conf.org/proceedings/lrec2010/pdf/560_Paper.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>H.</given-names>
            <surname>Wachsmuth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Alshomary</surname>
          </string-name>
          ,
          <article-title>"mama always had a way of explaining things so i could understand”: A dialogue corpus for learning to construct explanations</article-title>
          ,
          <year>2022</year>
          . arXiv:2209.02508.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>N.</given-names>
            <surname>Feldhus</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Anagnostopoulou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Alshomary</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Wachsmuth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Sonntag</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Möller</surname>
          </string-name>
          ,
          <article-title>Towards modeling and evaluating instructional explanations in teacher-student dialogues</article-title>
          ,
          <source>in: Proceedings of the 2024 International Conference on Information Technology for Social Good</source>
          , GoodIT '24, Association for Computing Machinery, New York, NY, USA,
          <year>2024</year>
          , p.
          <fpage>225</fpage>
          -
          <lpage>230</lpage>
          . URL: https://doi.org/10.1145/3677525.3678665. doi:10.1145/3677525.3678665.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>M.</given-names>
            <surname>Di Maro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. Di</given-names>
            <surname>Bratto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Mennella</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Origlia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Cutugno</surname>
          </string-name>
          , et al.,
          <source>Argumentation in recommender dialogue agents (arda): An unexpected journey from pragmatics to conversational agents, Open Linguistics 11 (2025)</source>
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [34]
          <string-name>
            <given-names>M.</given-names>
            <surname>Rescorla</surname>
          </string-name>
          , Shifting the Burden of Proof?,
          <source>The Philosophical Quarterly</source>
          <volume>59</volume>
          (
          <year>2008</year>
          )
          <fpage>86</fpage>
          -
          <lpage>109</lpage>
          . URL: https://doi.org/10.1111/j.1467-9213.2008.555.x. doi:10.1111/j.1467-9213.2008.555.x.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [35]
          <string-name>
            <given-names>G. H.</given-names>
            <surname>Harman</surname>
          </string-name>
          ,
          <article-title>The inference to the best explanation</article-title>
          ,
          <source>Philosophical Review</source>
          <volume>74</volume>
          (
          <year>1965</year>
          )
          <fpage>88</fpage>
          -
          <lpage>95</lpage>
          . doi:10.2307/2183532.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [36]
          <string-name>
            <given-names>J. R.</given-names>
            <surname>Josephson</surname>
          </string-name>
          , S. G. Josephson (Eds.), Abductive Inference: Computation, Philosophy, Technology, Cambridge University Press, New York,
          <year>1994</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [37]
          <string-name>
            <given-names>D.</given-names>
            <surname>Walton</surname>
          </string-name>
          , Abductive, presumptive and plausible arguments,
          <source>Informal Logic</source>
          <volume>21</volume>
          (
          <year>2001</year>
          ). doi:10.22329/il.v21i2.2241.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [38]
          <string-name>
            <given-names>A.</given-names>
            <surname>Grattafiori</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Dubey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Jauhri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Pandey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kadian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Al-Dahle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Letman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mathur</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Schelten</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Vaughan</surname>
          </string-name>
          , et al.,
          <source>The llama 3 herd of models, arXiv preprint arXiv:2407.21783</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [39]
          <string-name>
            <given-names>A.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Tu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhou</surname>
          </string-name>
          , et al.,
          <source>Qwen2.5-1M technical report, arXiv preprint arXiv:2501.15383</source>
          (
          <year>2025</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [40]
          <string-name>
            <given-names>L.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Tow</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Abbasi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Biderman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Black</surname>
          </string-name>
          , A. DiPofi,
          <string-name>
            <given-names>C.</given-names>
            <surname>Foster</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Golding</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Hsu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Le Noac'h</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>McDonell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Muennighof</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Ociepa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Phang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Reynolds</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Schoelkopf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Skowron</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Sutawika</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Tang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Thite</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Zou</surname>
          </string-name>
          ,
          <article-title>A framework for few-shot language model evaluation</article-title>
          ,
          <year>2024</year>
          . URL: https://zenodo.org/records/12608602. doi:10.5281/zenodo.12608602.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>