<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
<journal-title>Cagliari, Italy. Corresponding author: azaninello@fbk.eu (A. Zaninello); pbodlovic@ifzg.hr (P. Bodlović); m.lewinski@fcsh.unl.pt (M. Lewinski); magnini@fbk.eu (B. Magnini)</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>“I understand, but...”: Towards a Comprehensive Account of the Explainee's Voice in Explanatory Dialogues</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Andrea Zaninello</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Petar Bodlović</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marcin Lewinski</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Bernardo Magnini</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Fondazione Bruno Kessler</institution>
          ,
          <addr-line>Trento</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Free University of Bolzano</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Institute of Philosophy</institution>
          ,
          <addr-line>Zagreb</addr-line>
          ,
          <country country="HR">Croatia</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Universidade Nova de Lisboa</institution>
          ,
          <country country="PT">Portugal</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <volume>000</volume>
      <fpage>0</fpage>
      <lpage>0001</lpage>
      <abstract>
        <p>In this paper, we introduce IUBAS, the first annotation scheme that provides an in-depth analysis of the Explainee's reactions in explanatory dialogues. Current schemes, mainly focusing on answers to what, how, and, occasionally, why questions, lack the granularity to capture the full range of the Explainee's contribution. Our richer framework, grounded in argumentation and philosophical theory, distinguishes different kinds of explanation requests, feedback types, and critical questions. We provide empirical evidence of the effectiveness of the scheme through a set of experiments with three SOTA LLMs. The IUBAS scheme provides a more detailed understanding of how Explainees interact with Explainers in a dialogical setting, contributing to the development of more sophisticated and human-like conversational agents.</p>
      </abstract>
      <kwd-group>
        <kwd>explanatory dialogues</kwd>
        <kwd>annotation scheme</kwd>
        <kwd>explanations</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Consequently, a comprehensive model of explanatory interactions should not only focus on the explanations provided, but also on the request for and reception of such explanations. In addition, the development of annotation schemes for explanatory dialogues is also crucial for training automatic dialogue systems and evaluating their ability to engage in effective knowledge transfer.</p>
      <sec id="sec-1-1">
        <title>Definitions and Notation</title>
        <p>For simplicity, throughout the paper, we will assume the following definitions and notation:</p>
        <list list-type="bullet">
          <list-item><p>Phenomenon (p): event, fact, evidence, or effect discussed in the dialogue; its existence is a precondition for explanatory dialogue (e.g., a medical condition).</p></list-item>
          <list-item><p>Explanandum (E): event, fact, evidence, or effect in that it requires explanation or understanding (e.g., a medical symptom).</p></list-item>
          <list-item><p>Explanans (H): event, fact, hypothesis, or cause of E that provides explanation or understanding (e.g., a disease or medical injury).</p></list-item>
          <list-item><p>Explainer (Er): the participant who clarifies or transfers understanding of E through the stating of H.</p></list-item>
          <list-item><p>Explainee (Ee): the participant who requests, gives feedback on, or challenges an explanation H for some given E.</p></list-item>
        </list>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2. Background</title>
      <p>The study of explanations in philosophy and argumentation theory covers a wide range of questions. Researchers have focused on distinguishing explanations from other forms of reasoning, such as clarifications and arguments, highlighting the difference in their core function. Explanations differ from clarifications in that, while the latter simply aim at understanding, explanations aim at increasing knowledge, carrying greater illocutionary force. Moreover, while arguments aim to provide evidence for a doubted claim, explanations seek to account for (e.g., provide a cause for) an already accepted, uncontroversial statement [<xref ref-type="bibr" rid="ref8">8, 9, 10, 11</xref>]. This distinction becomes evident in the medical context, where a doctor might request an explanation for a patient’s dark urine (a belief in an already accepted symptom in which no justification is required) but may seek an argument for the diagnosis of hemolytic anemia (a hypothesis that requires justification).</p>
      <p>Further research has investigated the formal and normative dimensions of explanations, concentrating on developing argument schemes and critical questions associated with common explanatory inferences, such as Inference to the Best Explanation [12, 13, 14, 15]. Pragmatic studies, on the other hand, focus on defining the speech act of explaining [16] and its communicative function in various contexts. A key pragmatic function attributed to explanations is the transfer or enhancement of understanding [17, 18, 19], which becomes particularly crucial when communication is triggered by a lack of shared beliefs between the participants. In such instances, explanations act as a local move within a broader argumentative dialogue, facilitating smoother communication. For instance, an arguer will more easily develop an effective argumentative strategy once she understands “where the opponent is coming from”, i.e., once the opponent explains why she doubts or rejects the arguer’s thesis [20].</p>
      <p>Analyzing explanations as individual moves within broader argumentative contexts, however, differs from studying genuinely explanatory dialogues. Explanatory dialogues2 are strict dialectical procedures specifically designed to promote the transfer or enhancement of understanding. In explanatory dialogues, the prototypical setting is that of an Explainer clarifying or transferring their understanding of a phenomenon (represented as E) in response to an Explainee’s “Why E?”, “What is E?”, “How does E work?” etc. questions [22, 23, 19]. The inherent dialogical nature of explanations stems from their communicative goal, which is strictly connected with the Explainee’s level of understanding, (social and professional) role, curiosity, interests, beliefs, and doubts.</p>
      <p>2Sometimes also referred to as “explaining dialogues” or “dialogical explanations” [<xref ref-type="bibr" rid="ref6">21, 6</xref>].</p>
      <sec id="sec-3-rw">
        <title>3. Related work</title>
        <p>Models of Explanatory Dialogues. Despite its importance, the field of explanatory dialogues remains relatively understudied compared to that of argumentation in general. Nonetheless, some researchers have studied this phenomenon, contributing to the understanding and modeling of such interactions. Cawsey [22] focuses on human-computer interactions, emphasizing the need for AI systems to respond to user feedback and refine explanations based on their understanding and background knowledge. Cawsey proposes content-related rules for structuring non-interactive explanations and dialogue rules for guiding the interactive process. Moore [23] highlights the role of explanations in facilitating understanding and learning. She proposes four key requirements for interactive explanation systems: naturalness, responsiveness, flexibility, and sensitivity. These requirements emphasize the need for AI systems to engage in natural conversation, adapt to user needs, and be sensitive to contextual factors. Walton [9, 19, 24] presents a broader model of explanatory dialogue, characterizing it based on initial situations, collective goals, and rules governing different dialogue stages. Walton [25] distinguishes between explanatory and clarificatory dialogues, noting that clarifications focus on resolving ambiguities in expressions or speech acts while explanations target the understanding of events or facts.</p>
      </sec>
      <p>In recent years, Arioua and Croitoru [26] formalized and extended Walton’s model, proposing a more flexible protocol that allows for backtracking and dialectical shifts between explanatory and argumentative dialogue. Rohlfing et al. [27] advocated for a social and interactive approach to AI explainability, emphasizing the co-construction of understanding through dialogue. Wachsmuth and Alshomary [<xref ref-type="bibr" rid="ref6">6</xref>] analyzed human-to-human explanatory dialogues, focusing on linguistic patterns and adaptations based on user proficiency levels, and Feldhus et al. [<xref ref-type="bibr" rid="ref7">7</xref>] revised Wachsmuth and Alshomary [<xref ref-type="bibr" rid="ref6">6</xref>]’s proposal with an adaptation to a pedagogical setting. More recently, Zaninello and Magnini [28] focused on the co-construction of knowledge in the medical domain, showing that LLMs benefit from a dialogical structure of explanations. Similarly, Fichtel et al. [29] presented a study demonstrating that LLMs can partly engage in co-constructive explanation by fostering user engagement but still struggle to adapt explanations based on user understanding. However, while recognizing the central role of the Explainee, they do not provide a comprehensive framework to model the Explainee’s contribution in the co-construction of understanding.</p>
      <p>The explanation moves include checking understanding or prior knowledge, giving or requesting explanations, signaling (non-)understanding, providing feedback, assessments, or extra information, and a catch-all for any other moves (see Table ??). The 5-levels scheme was used to annotate the Wired [<xref ref-type="bibr" rid="ref6">6</xref>] and the ELI5 [21] datasets (see Section 3). In both datasets, annotation is realized at the turn level on the three dimensions (T, D, and E), where a turn corresponds to either the Explainer or the Explainee taking the floor. Each turn can be made of one or more utterances. This scheme provides a high-level categorization of explanatory dialogue acts but, as mentioned, mainly focuses on the Explainer’s contribution, as can be seen from Table 11.</p>
      <sec id="sec-2-1">
        <title>The Rewired scheme</title>
        <p>The Rewired scheme [<xref ref-type="bibr" rid="ref7">7</xref>] is an extension of the 5-levels scheme that proposes to add a new layer of annotation on top of the three proposed by Wachsmuth and Alshomary [<xref ref-type="bibr" rid="ref6">6</xref>], drawing from pedagogical studies and teaching practice. The primary difference lies in the introduction of 10 teaching acts (T) in the new scheme.</p>
      </sec>
      <sec id="sec-2-2">
        <p>This new layer, focused on teaching strategies such as assessing prior knowledge, proposing lesson steps, and engaging in active experience, allows for a more granular analysis of the instructional process, highlighting how teachers manage classroom interactions and instructional delivery.</p>
        <sec id="sec-2-2-1">
          <title>Annotation Schemes</title>
          <p>As mentioned in the previous sections, various models of explanatory dialogues have been proposed, each focusing on different aspects of the interaction. However, within the field of computational linguistics, few comprehensive annotation schemes can be found. In the following section, we introduce two of the most prominent annotation schemes: the 5-levels scheme, proposed by Wachsmuth and Alshomary [<xref ref-type="bibr" rid="ref6">6</xref>], and the Rewired scheme by [<xref ref-type="bibr" rid="ref7">7</xref>], an extension of the 5-levels proposal.</p>
        </sec>
        <sec id="sec-2-2-2">
          <title>Datasets</title>
        </sec>
      </sec>
      <sec id="sec-2-3">
        <p>Despite their importance and relevance, explanatory dialogue data are scarce, as they are difficult to collect and analyze. One of the few available datasets is the 5-levels “Wired” dataset [6].</p>
      </sec>
      <sec id="sec-2-4">
        <title>The 5-levels “Wired” dataset</title>
        <p>The 5-levels “Wired” dataset [6] is a corpus of 65 English dialogues from Wired’s 5 Levels video series, where 13 topics are discussed and explained to five explainees of varying expertise, resulting in 65 dialogues for a total of 1550 turns. Other available datasets rely on the crawling of discussions online, such as those in blogs and forums. For instance, the ELI5-dialogues corpus contains 399 daily-life explanatory dialogues from the Reddit forum “Explain Like I’m Five” (ELI5).</p>
        <p>The 5-levels scheme [<xref ref-type="bibr" rid="ref6">6</xref>] annotates each turn of a dialogue according to three different dimensions, resulting in a three-dimensional annotation for each turn where only one tag per dimension is allowed. The dimensions are: the discussed topic (T), the dialogue act (D), and the explanation move (E).</p>
        <p>Dimension (T) recognizes that participants might be discussing the main topic (e.g., climate change), a subtopic (e.g., temperature increase), or some (un)related topic (e.g., greenhouse gas emissions). Dimension (D) is based on speech act theory and is derived from the DIT++ Taxonomy of Dialogue Acts3 [<xref ref-type="bibr" rid="ref5">30, 5</xref>], providing a coarse account of the type of question asked, whether an answer confirms or disconfirms what was previously asked, and whether a given statement agrees, disagrees, or provides more information on a certain concept. The third dimension (E) provides a taxonomy of the explanation moves made in dialogue.</p>
        <p>We introduce one example dialogue from this dataset in Table 9.</p>
        <sec id="sec-4-iubas">
          <title>4. Accounting for the Explainee’s Contribution: The IUBAS Annotation Scheme</title>
          <p>As highlighted in Section 3, current dialogue annotation schemes recognize basic explanatory requests, modelled as “what-”, “how-”, and “why-” questions, which they categorize under “information-seeking” dialogical functions [<xref ref-type="bibr" rid="ref5 ref6">5, 19, 6</xref>]. Such schemes also acknowledge basic feedback like “signal understanding” or “signal non-understanding” [<xref ref-type="bibr" rid="ref6">6</xref>]. However, they usually do not recognize complex requests that include contrast classes and motivations, and different kinds of complex feedback that might include, e.g., qualifications, explanatory remarks, or critical questions. Complex requests and feedback are typical in real-world explanatory dialogues. While current accounts underline the dynamic nature of explanatory dialogues, they underestimate the importance of directly considering the Explainee’s needs, contextual factors, and the co-construction of understanding, which are, however, vital to fully understand explanatory interactions. A finer-grained comparison with the 5-levels scheme can also be found in the Appendix, Table 11.</p>
          <sec id="sec-4-1">
            <title>4.1. Explanation Requests</title>
            <p>Explanation requests are the dialogical moves that, typically, initiate the explanatory process. They signal the Explainee’s need for understanding and provide a target for the Explainer’s efforts. We distinguish between different types of requests based on two key criteria: contrastivity and motivation.</p>
            <sec id="sec-4-1-1">
              <title>4.1.1. Contrastivity: Basic vs. Contrastive Request</title>
            </sec>
          </sec>
        </sec>
      <p>These limited approaches neglect the contrastive nature of explanations, where an Explainee might seek to understand why a particular explanandum (E) is the case, instead of alternative possibilities (E*) [31, 18]. Furthermore, the motivations behind the Explainee’s questions are often ignored, neglecting the valuable contextual information that motivates their doubts and inquiries [20], which in turn also has important implications for the Explainer’s reaction itself. For instance, once the Explainer understands what, exactly, puzzles or confuses the Explainee (where her explanatory request “comes from”), the Explainer can provide a more effective, tailor-made, explainee-centered response. She can focus on the aspects of the problem that the Explainee considers most relevant and choose an effective communicative strategy sensitive to the required level of detail, the requested type of information, etc.</p>
      </sec>
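<p>The two request criteria compose mechanically into the four templates of Table 1. A minimal sketch in Python (the function name and signature are ours, for illustration only, not part of any released IUBAS tooling):</p>

```python
def render_request(explanandum, contrast=None, motivation=None):
    """Compose a why-request from the two IUBAS criteria:
    contrastivity (adds a contrast class E*) and motivation (adds M)."""
    parts = [f"Why {explanandum}"]
    if contrast is not None:       # contrastive request: "instead of E*"
        parts.append(f"instead of {contrast}")
    if motivation is not None:     # motivated request: "given that M"
        parts.append(f"given that {motivation}")
    return ", ".join(parts) + "?"
```

<p>For example, render_request("E", contrast="E*", motivation="M") yields “Why E, instead of E*, given that M?”, the fully contrastive, motivated form.</p>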
      <sec id="sec-2-5">
        <p>To improve the current research, we propose to integrate existing accounts with IUBAS, a multi-dimensional annotation scheme that captures the diverse nature of Explainees’ dialogical contributions and reactions. Our proposed scheme aims to address the limitations of the previous schemes through three main contributions.</p>
      </sec>
      <sec id="sec-2-7">
        <title>Basic explanation requests</title>
        <p>Basic explanation requests simply refer to (or target) the explanandum, the event or phenomenon requiring explanation (Table 2).</p>
        <p>
          The basic explanatory why-request is recognized in
argumentation theory [32, 9, 19, 24, 33, 20], but, for the most
part, ignored in contemporary annotation schemes. For
instance, although [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] acknowledge that dialogue acts
can be used to provide justifications and explanations,
they focus on "check questions," "choice questions" and
"set questions." Along similar lines, [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] emphasize the
importance of "check", "how" and "what" questions.
        </p>
      </sec>
      <sec id="sec-2-8">
        <title>Contrastive explanation requests</title>
        <p>Contrastive explanation requests, on the other hand, explicitly introduce a contrastive class, highlighting the specific aspects of the explanandum that require clarification (Table 3).</p>
      </sec>
      <sec id="sec-2-9">
        <p>This distinction, while prevalent in the philosophical literature on explanation [31, 17, 18, 16], is often overlooked in dialogue annotation schemes. Incorporating contrastive requests allows for a more precise representation of the Explainee’s information needs.</p>
      </sec>
      <sec id="sec-2-10">
        <p>Contrastive requests emphasize the specific aspects of the explanandum that should be understood. Basic and contrastive requests, as exemplified in Tables 2 and 3, introduce questions that (might) require different explanations. So, defining a contrast class sets initial normative boundaries for selecting an adequate explanans.</p>
        <p>Our scheme addresses the limitations of the previous schemes by:</p>
        <list list-type="order">
          <list-item><p>Providing a more fine-grained categorization of explanation moves, capturing specific actions within the explanatory process, by applying the annotation at the utterance level and allowing one utterance to receive zero or more (E) tags4.</p></list-item>
          <list-item><p>Explicitly considering the Explainee’s perspective and their active role in seeking and integrating new information.</p></list-item>
        </list>
      </sec>
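<p>The utterance-level tagging convention described in footnote 4 amounts to a forward fill of tags; a minimal sketch, with illustrative names of our own:</p>

```python
def project_tags(utterances, tags):
    """Sketch of the footnote-4 convention: an (E) tag expressed on an
    utterance is projected onto the following utterances until a new
    tag is expressed. `tags` holds a tag string or None per utterance."""
    projected, current = [], None
    for _utterance, tag in zip(utterances, tags):
        if tag is not None:
            current = tag          # a new explicit tag replaces the projection
        projected.append(current)  # stays None until the first tag appears
    return projected
```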
      <sec id="sec-2-11">
        <title>3. Empirically demonstrating the effectiveness of modelling the Explainee’s role in the dialogue through a set of experiments on dialogue quality prediction.</title>
        <p>Pure explanation requests directly inquire about the explanandum (or some aspect of the explanandum, if they include a contrast class) without further elaboration. In contrast, motivated explanation requests introduce further information about the Explainee’s cognitive and communicative needs (Table 4). By motivating their requests, Explainees explicitly inform the Explainer what confuses them about the explanandum, or, in other words, what exactly stands in the way of transferring understanding.</p>
        <p>Such additional information promotes effective communication, and might at times even be necessary for formulating an adequate explanans. Such additional considerations, inspired by the works of [20] and [<xref ref-type="bibr" rid="ref9">34</xref>], allow us to capture the broader context of the Explainee’s request.</p>
        <p>Table 1 presents a summary of our proposed scheme, which we explain, motivate, and exemplify in the next sections.</p>
        <p>4As exemplified in Table 9, we implicitly assume that a tag is expressed at the utterance level and is automatically projected onto the next utterances until a new (E) tag is expressed.</p>
        <sec id="sec-4-1-2">
          <title>4.1.2. Motivation: Pure vs. Motivated Request</title>
          <table-wrap id="tbl1">
            <label>Table 1</label>
            <caption><p>Summary of the IUBAS annotation scheme.</p></caption>
            <table>
              <thead>
                <tr><th>Dimension</th><th>Type</th><th>Subtype</th><th>Example</th></tr>
              </thead>
              <tbody>
                <tr><td>(R)</td><td>Basic</td><td>Pure</td><td>Why E?</td></tr>
                <tr><td>(R)</td><td>Basic</td><td>Motivated</td><td>Why E, given that M?</td></tr>
                <tr><td>(R)</td><td>Contrastive</td><td>Pure</td><td>Why E, instead of E*?</td></tr>
                <tr><td>(R)</td><td>Contrastive</td><td>Motivated</td><td>Why E, instead of E*, given that M?</td></tr>
                <tr><td>(F)</td><td>Positive Basic</td><td>Assert understanding</td><td>I understand H.</td></tr>
                <tr><td>(F)</td><td>Positive Complex</td><td>Demonstrate understanding</td><td>I understand. So...</td></tr>
                <tr><td>(F)</td><td>Positive Complex</td><td>Qualified understanding</td><td>I understand. But...</td></tr>
                <tr><td>(F)</td><td>Positive Complex</td><td>Critical challenge</td><td>I understand. However... [critical question]</td></tr>
                <tr><td>(F)</td><td>Negative Basic</td><td>Assert non-understanding</td><td>I don’t think H explains E. I rather think H*.</td></tr>
                <tr><td>(F)</td><td>Negative Complex</td><td>Request for clarification</td><td>I don’t think H explains E. Can you clarify H?</td></tr>
                <tr><td>(F)</td><td>Negative Complex</td><td>Critical challenge</td><td>I don’t think H... In fact [critical question]</td></tr>
                <tr><td>(C)</td><td colspan="3">Types of Critical Challenges: Comparative plausibility; Epistemic distance; Generative completeness; Non-comparative plausibility; Causal accuracy; Causal responsibility; Explanandum reliability; Pragmatic considerations</td></tr>
              </tbody>
            </table>
          </table-wrap>
        </sec>
        <sec id="sec-2-11-1">
          <title>4.2. Explainee’s Feedback</title>
        </sec>
      </sec>
      <sec id="sec-2-12">
        <p>Once the Explainer offers an explanation, the Explainee typically provides feedback, signaling her understanding or lack thereof. We differentiate between positive and negative feedback, further distinguishing between basic and complex variants.</p>
        <sec id="sec-4-2-2">
          <title>4.2.2. Complexity: Basic vs. Complex Feedback</title>
        </sec>
      </sec>
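<p>The feedback dimension (F) can be read as a small two-criteria taxonomy (polarity and complexity) with the subtype labels of Table 1; the dictionary encoding below is ours, for illustration only:</p>

```python
# Feedback dimension (F): (polarity, complexity) -> subtype labels from Table 1.
FEEDBACK_SUBTYPES = {
    ("positive", "basic"): ["Assert understanding"],
    ("positive", "complex"): ["Demonstrate understanding",
                              "Qualified understanding",
                              "Critical challenge"],
    ("negative", "basic"): ["Assert non-understanding"],
    ("negative", "complex"): ["Request for clarification",
                              "Critical challenge"],
}
```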
      <sec id="sec-2-13">
        <p>Basic feedback provides a straightforward assessment of understanding without further elaboration. In contrast, complex feedback incorporates additional remarks, questions, or challenges.</p>
        <sec id="sec-4-2-3">
          <title>4.2.3. Types of Complex Positive Feedback</title>
          <p>Complex Positive Feedback can take several forms: “I understand that lung cancer explains this kind of cough. However, is another diagnosis still possible? Can you still run some more tests?”</p>
        </sec>
      </sec>
      <sec id="sec-2-14">
        <p>This type of feedback, both positive and negative (see Section 4.2.4), often introduces critical questions (see Section 4.3 and Table 7).</p>
      </sec>
      <sec id="sec-2-16">
        <title>4.2.4. Types of Complex Negative Feedback</title>
        <p>Complex negative feedback can also be analyzed into:</p>
        <p>1. Request for clarification: The Explainee may point to specific concepts or aspects of the explanation they find unclear.</p>
      </sec>
      <sec id="sec-2-17">
        <p>2. Critical challenge: The Explainee may directly challenge the plausibility of the explanation, either categorically rejecting it or requesting further justification.</p>
      </sec>
      <sec id="sec-2-18">
        <p>As seen for their positive counterparts, critical challenges can introduce critical questions (Section 4.3).</p>
        <sec id="sec-2-18-1">
          <title>4.3. Explainee’s Critical Questions</title>
          <p>The forms of Complex Positive Feedback (Section 4.2.3) are: 1. Demonstration of understanding: The Explainee may provide additional information or draw inferences to demonstrate their grasp of the explanation (Table 6). 2. Qualified understanding: The Explainee may signal partial understanding, acknowledging the need for further clarification on specific aspects of the explanation (Table 7). 3. Understanding with Critical Challenge.</p>
          <p>Critical questions challenge the explanation and its underlying assumptions. They target various aspects of the explanation, testing its plausibility, completeness, and relevance. Inspired by existing literature on Inference to the Best Explanation, argument schemes, and critical questions [<xref ref-type="bibr" rid="ref10 ref11 ref12">35, 36, 37, 12, 15</xref>], we propose a typology of critical questions tailored to why-explanations. We categorize critical questions according to the specific aspect of explanation they target, as summarized in Table 1 and further exemplified in Table 7.</p>
        </sec>
      </sec>
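<p>For annotation tooling, the critical-challenge types of the (C) dimension listed in Table 1 can be kept as a flat inventory; the constant below is an illustrative encoding of our own, not part of any released implementation:</p>

```python
# The (C) dimension: types of critical challenges, as listed in Table 1.
CRITICAL_CHALLENGE_TYPES = [
    "Comparative plausibility",
    "Epistemic distance",
    "Generative completeness",
    "Non-comparative plausibility",
    "Causal accuracy",
    "Causal responsibility",
    "Explanandum reliability",
    "Pragmatic considerations",
]
```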
    <sec id="sec-2-19">
      <p>While understanding the nature of the explanation (or conditionally understanding the phenomenon), the Explainee may challenge its plausibility, demanding further justification (Table 8).</p>
      <sec id="sec-5-comp">
        <title>5. Comparative annotation</title>
      </sec>
    </sec>
      <sec id="sec-2-20">
      <p>In Table 9, we present a comparative analysis of an example dialogue from the ELI5 corpus, annotated through the “5-levels” and our “IUBAS” scheme.</p>
      <p>IUBAS allows for a finer-grained account of the Explainee’s request (e.g., U0, where we can specify that the explanation request is based on an implicit comparison with a complementary group). Also, we can better account for shifts in the explanation move within a turn (e.g., U4-6), as well as combinations of moves within a single turn (e.g., U7). This provides a more precise account of the conversational flow and, crucially, as this example suggests, it seems that providing explanations is not limited to the Explainer’s role, and neither does feedback only originate from the Explainee. This observation, once generalised over a broader set of examples, could challenge the traditional view of the Explainer/Explainee roles, a phenomenon which can be analysed in detail through our scheme.</p>
      <p>Also, our account of the different types of feedback and request (e.g., U7, U8-9) highlights that the Explainee’s reaction strongly influences the kind of explanation provided and participates in the co-construction of the explanation process. Finally, IUBAS is organized hierarchically, which makes it possible to navigate its tree-like structure and easily reconstruct the analysis of the explanatory move (Figure 1). Moreover, its structure allows for flexibility in terms of the level of granularity needed for a specific analysis.</p>
      <p>This automatic labeling process produced a set of IUBAS annotations for all relevant Explainee turns in the ELI5 corpus, increasing the original labelling by approx. 20%. The resulting enriched dataset contains, for each relevant Explainee utterance, an associated label (R, F, C) indicating the Explainee’s needs or feedback in that turn. We manually inspected a sample of the GPT-4.1 annotations to ensure they were coherent with the scheme’s guidelines, and overall found the labels to be reasonable, providing a fine-grained view of the Explainee’s role in the dialogue.</p>
      <p>Quality Prediction Task Setup. Using the automatically annotated corpus, we replicate the dialogue quality prediction setup of Alshomary et al. (2024) to evaluate how the additional IUBAS metadata influences performance. The goal of the task is to predict the human-assigned quality score of a dialogue given the dialogue transcript (with or without annotations). We compare four input conditions:</p>
      <list list-type="bullet">
        <list-item><p>No Annotation: Each dialogue is given to the model as plain text, with no turn-level labels (baseline condition).</p></list-item>
        <list-item><p>Original ELI5 Labels: Each turn in the dialogue is followed by the original annotation tags for explanation move, dialogue act, and topic.</p></list-item>
        <list-item><p>IUBAS Labels: Each explainee turn is prefixed with its IUBAS labels (R, F, C values) as metadata, while explainer turns remain unlabeled.</p></list-item>
        <list-item><p>Combined (ELI5 + IUBAS): Both the original ELI5 turn labels and the IUBAS labels for Explainee turns are included.</p></list-item>
      </list>
      </sec>
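<p>The four input conditions can be sketched as a single formatting function; the tuple layout, condition names, and bracket placement below are our own illustration of the setup, not the authors’ code:</p>

```python
def format_dialogue(turns, condition):
    """Assemble the model input for one dialogue under one condition.
    Each turn is (speaker, text, eli5_tags, iubas_tags)."""
    lines = []
    for speaker, text, eli5, iubas in turns:
        line = f"{speaker}: {text}"
        if condition in ("eli5", "combined") and eli5:
            line += " [" + ", ".join(eli5) + "]"          # original turn-level tags
        if condition in ("iubas", "combined") and iubas and speaker == "Explainee":
            line = "[" + ", ".join(iubas) + "] " + line   # IUBAS labels on Explainee turns only
        lines.append(line)
    return "\n".join(lines)
```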
    </sec>
    <sec id="sec-3">
      <title>6. Experiments</title>
      <sec id="sec-3-1">
        <p>We conduct our experiments on the ELI5 dialogue quality assessment task introduced by Alshomary et al. (2024). This corpus consists of explanatory dialogues (399 in total) from the Reddit “Explain Like I’m Five” forum, each labeled with a ground-truth explanation quality score on a 1–5 Likert scale. We integrate the proposed IUBAS scheme into this task by automatically annotating the Explainee turns and evaluating its impact on quality prediction.</p>
      </sec>
      <sec id="sec-3-4">
        <p>IUBAS Annotation with GPT-4.1. To obtain IUBAS labels for the Explainee’s turns, we employed the GPT-4.15 model to perform annotation in a zero-shot manner. We targeted only those turns where the Explainee explicitly participates in the dialogue, corresponding to the categories E04 (Request Explanation) and E07 (Request Feedback) in the original 5-level annotation scheme of Alshomary et al. These are the turns where the Explainee asks a question or provides feedback, i.e., the utterances that reflect the Explainee’s reaction and understanding. For each such turn, GPT-4.1 was prompted with the dialogue context and the definition of the IUBAS categories, and it generated an IUBAS tag capturing the turn’s properties, choosing among: R (type of explanation request, e.g., basic vs. contrastive), F (feedback type, e.g., positive vs. negative understanding), and C (presence of any critical follow-up or clarification request).</p>
        <p>We format the prompt for each dialogue by inserting the turn-level metadata (if any) immediately after each utterance, between square brackets, with a concise description of the tag itself (for example, [(F01) Positive Basic Feedback - Assert understanding]). After presenting the entire dialogue, we append a final instruction asking the model to “Rate the overall explanation quality on a 1–5 scale.” The model then outputs a single rating. We evaluate three instruction-tuned LLMs: Llama-3.1-8B-Instruct [<xref ref-type="bibr" rid="ref13">38</xref>], Gemma-3-4b-it6, and Qwen2.5-14B-Instruct-1M [<xref ref-type="bibr" rid="ref14">39</xref>]. We use HuggingFace’s lm_eval harness [<xref ref-type="bibr" rid="ref15">40</xref>] in the multiple choice mode, asking the model to choose a number from 1 to 5, indicating the dialogue quality. We report RMSE and MAE against human ratings of each model’s prediction, and assess significance using a paired t-test.</p>
      </sec>
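        <p>The prompt-formatting step described above can be sketched as follows. This is an illustrative sketch, not the authors’ code: the function name, the example dialogue, and all tag descriptions except “(F01) Positive Basic Feedback - Assert understanding” are our own assumptions.</p>

```python
# Sketch: build a quality-rating prompt by inserting each turn's
# metadata in square brackets right after the utterance, then append
# the final rating instruction, as described in the paper.
# Only the (F01) description comes from the text; the rest is illustrative.
TAG_DESCRIPTIONS = {
    "F01": "(F01) Positive Basic Feedback - Assert understanding",
    "E04": "(E04) Request Explanation",  # hypothetical description
}

def format_dialogue_prompt(turns):
    """turns: list of (speaker, utterance, tag-or-None) triples."""
    lines = []
    for speaker, utterance, tag in turns:
        line = "{}: {}".format(speaker, utterance)
        if tag is not None:
            # turn-level metadata goes immediately after the utterance
            line += " [{}]".format(TAG_DESCRIPTIONS.get(tag, tag))
        lines.append(line)
    lines.append("Rate the overall explanation quality on a 1-5 scale.")
    return "\n".join(lines)

prompt = format_dialogue_prompt([
    ("Explainer", "A black hole is a region where gravity is extreme.", None),
    ("Explainee", "I see, so nothing can escape it?", "F01"),
])
print(prompt)
```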
      <sec id="sec-3-5">
        <p>GPT-4.1: https://openai.com/index/gpt-4-1/</p>
        <p>[Table: quality-prediction results (RMSE/MAE) for LLaMA, Gemma, and Qwen under the prompt conditions No annotation, ELI5-only, IUBAS-only, IUBAS-only (C), IUBAS-only (F), IUBAS-only (R), and ELI5 + IUBAS; numeric values not recoverable.]</p>
        <sec id="sec-3-5-1">
          <title>6.1. Results and Analysis</title>
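          <p>The evaluation metrics described above (RMSE and MAE against human ratings, with a paired t-test for significance) can be sketched as follows; a minimal stdlib-only illustration with made-up ratings, not the authors’ code (in practice, e.g., scipy.stats.ttest_rel would also return the p-value).</p>

```python
# Sketch: RMSE/MAE of model ratings against human ratings, plus a
# paired t statistic on the absolute errors of two models.
import math

def rmse(pred, gold):
    # root mean squared error over paired ratings
    return math.sqrt(sum((p - g) ** 2 for p, g in zip(pred, gold)) / len(gold))

def mae(pred, gold):
    # mean absolute error over paired ratings
    return sum(abs(p - g) for p, g in zip(pred, gold)) / len(gold)

def paired_t(a, b):
    """t statistic for paired samples a, b (same dialogues, two models)."""
    d = [x - y for x, y in zip(a, b)]
    n = len(d)
    mean = sum(d) / n
    var = sum((x - mean) ** 2 for x in d) / (n - 1)  # sample variance
    return mean / math.sqrt(var / n)

human   = [3, 4, 5, 2, 4, 3]   # hypothetical human ratings
model_a = [3, 3, 5, 2, 4, 4]   # hypothetical model predictions
model_b = [2, 3, 4, 1, 3, 3]

err_a = [abs(p - g) for p, g in zip(model_a, human)]
err_b = [abs(p - g) for p, g in zip(model_b, human)]
print(rmse(model_a, human), mae(model_a, human), paired_t(err_a, err_b))
```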
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>7. Conclusion</title>
      <p>In this paper, we introduced IUBAS, a framework that
contributes to a richer understanding of the Explainee’s
role within explanatory dialogues. We incorporate
contrastivity and motivation alongside a categorization of
feedback and critical questions, providing a more
comprehensive account for analyzing and modeling such
interactions. By adopting this scheme, we can move
towards developing more sophisticated conversational AI
systems capable of engaging in truly human-like
explanatory dialogues, ultimately enhancing communication
effectiveness and fostering deeper understanding.
</p>
      <p>[8, cont.] dialogue agents (arda): An unexpected journey from pragmatics to conversational agents, Open Linguistics 11 (2025).</p>
      <p>[9] D. Walton, A new dialectical theory of explanation, Philosophical Explorations 7 (2004) 71–89. doi:10.1080/1386979032000186863.</p>
      <p>[10] G. R. Mayes, Argument explanation complementarity and the structure of informal reasoning, Informal Logic 30 (2010) 92–111. doi:10.22329/il.v30i1.419.</p>
      <p>[11] T. Govier, Problems in Argument Analysis and Evaluation, Windsor Studies in Argumentation, University of Windsor, 2018. URL: https://books.google.hr/books?id=pulfDwAAQBAJ.</p>
      <p>[12] D. Walton, C. Reed, F. Macagno, Argumentation Schemes, Cambridge University Press, New York, 2008.</p>
      <p>[13] J. H. M. Wagemans, Argumentative patterns for justifying scientific explanations, Argumentation 30 (2015) 97–108. URL: https://api.semanticscholar.org/CorpusID:56085286.</p>
      <p>[14] S. Yu, F. Zenker, Peirce knew why abduction isn’t IBE - a scheme and critical questions for abductive argument, Argumentation 32 (2017) 569–587. doi:10.1007/s10503-017-9443-9.</p>
      <p>[15] P. Olmos, Metaphilosophy and argument: The case of the justification of abduction, Informal Logic 41 (2021) 131–164. doi:10.22329/il.v41i2.6249.</p>
      <p>[16] G. Gaszczyk, Helping others to understand: A normative account of the speech act of explanation, Topoi 42 (2023) 385–396. doi:10.1007/s11245-022-09878-y.</p>
      <p>[17] P. Lipton, Inference to the Best Explanation, International Library of Philosophy and Scientific Method, Routledge/Taylor and Francis Group, 2004. URL: https://books.google.hr/books?id=WIfYNExpSC0C.</p>
      <p>[18] S. R. Grimm, The goal of explanation, Studies in History and Philosophy of Science Part A 41 (2010) 337–344. doi:10.1016/j.shpsa.2010.10.006.</p>
      <p>[19] D. Walton, A dialogue system specification for explanation, Synthese 182 (2011) 349–374. doi:10.1007/s11229-010-9745-z.</p>
      <p>[20] J. A. van Laar, E. C. W. Krabbe, The burden of criticism: Consequences of taking a critical stance, Argumentation 27 (2013) 201–224. doi:10.1007/s10503-012-9272-9.</p>
      <p>[21] M. Alshomary, F. Lange, M. Booshehri, M. Sengupta, P. Cimiano, H. Wachsmuth, Modeling the quality of dialogical explanations, in: N. Calzolari, M.-Y. Kan, V. Hoste, A. Lenci, S. Sakti, N. Xue (Eds.), Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), ELRA and ICCL, Torino, Italy, 2024, pp. 11523–11536. URL: https://aclanthology.org/2024.lrec-main.1007.</p>
      <p>[22] A. Cawsey, Explanation and Interaction: The Computer Generation of Explanatory Dialogues, MIT Press series in natural-language processing, Bradford Book, 1992. URL: https://books.google.hr/books?id=hQt1-7gA334C.</p>
      <p>[23] J. Moore, Participating in Explanatory Dialogues: Interpreting and Responding to Questions in Context, A Bradford book, CogNet, 1995. URL: https://books.google.hr/books?id=nRx0QgAACAAJ.</p>
      <p>[24] D. Walton, Abductive Reasoning, University of Alabama Press, 2014. URL: https://books.google.hr/books?id=DNqKAwAAQBAJ.</p>
      <p>[25] D. Walton, The speech act of clarification in a dialogue model, Studies in Communication Sciences 7 (2007). URL: https://api.semanticscholar.org/CorpusID:149373911.</p>
      <p>[26] A. Arioua, M. Croitoru, Formalizing explanatory dialogues, in: Scalable Uncertainty Management, 2015. URL: https://api.semanticscholar.org/CorpusID:7365540.</p>
      <p>[27] K. J. Rohlfing, P. Cimiano, I. Scharlau, T. Matzner, H. M. Buhl, H. Buschmeier, E. Esposito, A. Grimminger, B. Hammer, R. Häb-Umbach, I. Horwath, E. Hüllermeier, F. Kern, S. Kopp, K. Thommes, A.-C. Ngonga Ngomo, C. Schulte, H. Wachsmuth, P. Wagner, B. Wrede, Explanation as a social practice: Toward a conceptual framework for the social design of AI systems, IEEE Transactions on Cognitive and Developmental Systems 13 (2021) 717–728. doi:10.1109/TCDS.2020.3044366.</p>
      <p>[28] A. Zaninello, B. Magnini, MedExpDial: Machine-to-machine generation of explanatory dialogues for medical QA, in: Proceedings of the 28th Workshop on the Semantics and Pragmatics of Dialogue, 2024.</p>
      <p>[29] L. Fichtel, M. Spliethöver, E. Hüllermeier, P. Jimenez, N. Klowait, S. Kopp, A.-C. N. Ngomo, A. Robrecht, I. Scharlau, L. Terfloth, A.-L. Vollmer, H. Wachsmuth, Investigating co-constructive behavior of large language models in explanation dialogues, 2025. URL: https://arxiv.org/abs/2504.18483. arXiv:2504.18483.</p>
      <p>[30] H. Bunt, D. K. J. Heylen, C. Pelachaud, R. Catizone, D. R. Traum, The DIT++ taxonomy for functional dialogue markup, 2009. URL: https://api.semanticscholar.org/CorpusID:60074224.</p>
      <p>[31] F. I. Dretske, Contrastive statements, Philosophical Review 81 (1972) 411–437. doi:10.2307/2183886.</p>
      <p>[32] C. Hamblin, Fallacies, University Paperbacks, Methuen, 1970. URL: https://books.google.hr/books?id=bYYIAQAAIAAJ.</p>
      <p>[33] J. Blair, C. Tindale, Groundwork in the Theory of Argumentation: Selected Papers of J. Anthony Blair, Argumentation Library, Springer Netherlands, 2011. URL: https://books.google.hr/books?id=IM9p6GgnJAcC.</p>
    </sec>
    <sec id="sec-5">
      <title>Appendix</title>
      <sec id="sec-5-1">
        <title>Limitations</title>
        <sec id="sec-5-1-1">
          <p>While the manual annotation of a full dataset falls outside the scope of our current proposal, we believe that future work should involve testing the agreement between the automated annotation and human annotation.</p>
        </sec>
        <sec id="sec-5-1-2">
          <p>Additionally, the proposed typology could be expanded to account for the different kinds of explanations and reasoning patterns on the Explainer’s side, too.</p>
        </sec>
      </sec>
      <sec id="sec-5-2">
        <title>Ethical Considerations</title>
        <p>This research focuses on analyzing explanatory dialogue, and it is crucial to acknowledge the potential ethical implications of applying such schemes to real-world situations, especially in sensitive domains like healthcare, or when covering topics such as ethnicity, physical ability, gender, and sexual orientation (as in the case of the reported example in Table 9). Careful consideration should also be given to data privacy, informed consent, and potential biases in the annotation process.</p>
        <p>[Tables: descriptions of IUBAS moves, including checking whether the listener understood the explanation; checking the listener’s prior knowledge of the topic; explaining a concept or topic to the listener; requesting an explanation from the listener; direct vs. contrastive requests about E, the event or phenomenon requiring explanation (Why E?; Why E, given that M?; Why E, instead of E*?; Why E, instead of E*, given that M?); confirming or disconfirming feedback on H, simple or with elaboration (demonstrative understanding: “I understand. So...”; qualified understanding: “I understand. But...”; critical challenge: “I understand. However... [critical question]”; pure disagreement: “I don’t think H explains E. I rather think H*.”; clarification request: “Can you clarify h ∈ H?”; see Table 7); assessing the listener by rephrasing their utterance or giving a hint; giving additional information to foster a complete understanding; and making any other explanation move. Example critical questions for the lung-cancer case: Does ‘lung cancer’ cause Mark’s condition, or does the diagnosis fail to explain all the symptoms? Is ‘lung cancer’ the cause we are looking for, or are multiple causes involved (e.g., lung cancer and COVID-19)? Is cough the only symptom that needs to be explained, and is it a real symptom (or is the patient faking it)? What is the cost of being mistaken if one proceeds as if the patient has cancer, or as if she has asthma?]</p>
        <p>Declaration on Generative AI: During the preparation of this work, the author(s) used ChatGPT (OpenAI) in order to: paraphrase and reword, improve writing style, and check grammar and spelling. After using these tool(s)/service(s), the author(s) reviewed and edited the content as needed and take(s) full responsibility for the publication’s content.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>T.</given-names>
            <surname>Lombrozo</surname>
          </string-name>
          , Explanation and abductive inference,
          <source>The Oxford Handbook of Thinking and Reasoning</source>
          (
          <year>2012</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>M. T.</given-names>
            <surname>Chi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bassok</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. W.</given-names>
            <surname>Lewis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Reimann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Glaser</surname>
          </string-name>
          ,
          <article-title>Self-explanations: How students study and use examples in learning to solve problems</article-title>
          ,
          <source>Cognitive science 13</source>
          (
          <year>1989</year>
          )
          <fpage>145</fpage>
          -
          <lpage>182</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>B.</given-names>
            <surname>Mittelstadt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Russell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Wachter</surname>
          </string-name>
          ,
          <article-title>Explaining explanations in ai</article-title>
          ,
          <source>in: Proceedings of the Conference on Fairness, Accountability, and Transparency</source>
          , FAT* '19,
          <string-name>
            <surname>ACM</surname>
          </string-name>
          ,
          <year>2019</year>
          . URL: http://dx.doi.org/10.1145/3287560.3287574. doi:10.1145/3287560.3287574.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>T.</given-names>
            <surname>Miller</surname>
          </string-name>
          ,
          <article-title>Explanation in artificial intelligence: Insights from the social sciences</article-title>
          ,
          <source>Artificial Intelligence</source>
          <volume>267</volume>
          (
          <year>2019</year>
          )
          <fpage>1</fpage>
          -
          <lpage>38</lpage>
          . URL: https://www.sciencedirect.com/science/article/pii/S0004370218305988. doi:10.1016/j.artint.2018.07.007.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>H.</given-names>
            <surname>Bunt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Alexandersson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Carletta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-W.</given-names>
            <surname>Choe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. C.</given-names>
            <surname>Fang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Hasida</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Petukhova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Popescu-Belis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Romary</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Soria</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Traum</surname>
          </string-name>
          ,
          <article-title>Towards an ISO standard for dialogue act annotation</article-title>
          , in: N.
          <string-name>
            <surname>Calzolari</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          <string-name>
            <surname>Choukri</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          <string-name>
            <surname>Maegaard</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Mariani</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Odijk</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <string-name>
            <surname>Piperidis</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Rosner</surname>
          </string-name>
          , D. Tapias (Eds.),
          <source>Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)</source>
          ,
          <source>European Language Resources Association (ELRA)</source>
          , Valletta, Malta,
          <year>2010</year>
          . URL: http://www.lrec-conf.org/proceedings/lrec2010/pdf/560_Paper.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>H.</given-names>
            <surname>Wachsmuth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Alshomary</surname>
          </string-name>
          ,
          <article-title>"mama always had a way of explaining things so i could understand”: A dialogue corpus for learning to construct explanations</article-title>
          ,
          <year>2022</year>
          . arXiv:2209.02508.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>N.</given-names>
            <surname>Feldhus</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Anagnostopoulou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Alshomary</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Wachsmuth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Sonntag</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Möller</surname>
          </string-name>
          ,
          <article-title>Towards modeling and evaluating instructional explanations in teacher-student dialogues</article-title>
          ,
          <source>in: Proceedings of the 2024 International Conference on Information Technology for Social Good</source>
          , GoodIT '24, Association for Computing Machinery, New York, NY, USA,
          <year>2024</year>
          , p.
          <fpage>225</fpage>
          -
          <lpage>230</lpage>
          . URL: https://doi.org/10.1145/3677525.3678665. doi:10.1145/3677525.3678665.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>M.</given-names>
            <surname>Di Maro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. Di</given-names>
            <surname>Bratto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Mennella</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Origlia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Cutugno</surname>
          </string-name>
          , et al.,
          <source>Argumentation in recommender dialogue agents (arda): An unexpected journey from pragmatics to conversational agents, Open Linguistics 11 (2025)</source>
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [34]
          <string-name>
            <given-names>M.</given-names>
            <surname>Rescorla</surname>
          </string-name>
          , Shifting the Burden of Proof?,
          <source>The Philosophical Quarterly</source>
          <volume>59</volume>
          (
          <year>2008</year>
          )
          <fpage>86</fpage>
          -
          <lpage>109</lpage>
          . URL: https://doi.org/10.1111/j.1467-9213.2008.555.x. doi:10.1111/j.1467-9213.2008.555.x.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [35]
          <string-name>
            <given-names>G. H.</given-names>
            <surname>Harman</surname>
          </string-name>
          ,
          <article-title>The inference to the best explanation</article-title>
          ,
          <source>Philosophical Review</source>
          <volume>74</volume>
          (
          <year>1965</year>
          )
          <fpage>88</fpage>
          -
          <lpage>95</lpage>
          . doi:10.2307/2183532.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [36]
          <string-name>
            <given-names>J. R.</given-names>
            <surname>Josephson</surname>
          </string-name>
          , S. G. Josephson (Eds.), Abductive Inference: Computation, Philosophy, Technology, Cambridge University Press, New York,
          <year>1994</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [37]
          <string-name>
            <given-names>D.</given-names>
            <surname>Walton</surname>
          </string-name>
          , Abductive, presumptive and plausible arguments,
          <source>Informal Logic</source>
          <volume>21</volume>
          (
          <year>2001</year>
          ). doi:10.22329/il.v21i2.2241.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [38]
          <string-name>
            <given-names>A.</given-names>
            <surname>Grattafiori</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Dubey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Jauhri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Pandey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kadian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Al-Dahle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Letman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mathur</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Schelten</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Vaughan</surname>
          </string-name>
          , et al.,
          <source>The llama 3 herd of models, arXiv preprint arXiv:2407.21783</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [39]
          <string-name>
            <given-names>A.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Tu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhou</surname>
          </string-name>
          , et al.,
          <source>Qwen2.5-1M technical report, arXiv preprint arXiv:2501.15383</source>
          (
          <year>2025</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [40]
          <string-name>
            <given-names>L.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Tow</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Abbasi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Biderman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Black</surname>
          </string-name>
          , A. DiPofi,
          <string-name>
            <given-names>C.</given-names>
            <surname>Foster</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Golding</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Hsu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Le Noac'h</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>McDonell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Muennighof</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Ociepa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Phang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Reynolds</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Schoelkopf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Skowron</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Sutawika</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Tang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Thite</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Zou</surname>
          </string-name>
          ,
          <article-title>A framework for few-shot language model evaluation</article-title>
          ,
          <year>2024</year>
          . URL: https://zenodo.org/records/12608602. doi:10.5281/zenodo.12608602.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>