<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>A. Tocchetti);</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Rationale Trees: Towards a Formalization of Human Knowledge for Explainable Natural Language Processing</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Andrea Tocchetti</string-name>
          <email>andrea.tocchetti@polimi.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jie Yang</string-name>
          <email>j.yang-3@tudelft.nl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marco Brambilla</string-name>
          <email>marco.brambilla@polimi.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="editor">
          <string-name>Human Knowledge, Knowledge Formalization, Argumentation Mining, Argumentation Theory, Natural</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>20133</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Delft University of Technology</institution>
          ,
          <addr-line>Mekelweg 5, Delft, 2628 CD</addr-line>
          ,
          <country country="NL">Netherlands</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Language</institution>
          ,
          <addr-line>Natural Language Processing, Explainable AI, Explainability</addr-line>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Politecnico di Milano, Dipartimento di Elettronica, Informazione e Bioingegneria</institution>
          ,
          <addr-line>Via Giuseppe Ponzio 34, Milano</addr-line>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2022</year>
      </pub-date>
      <volume>000</volume>
      <fpage>0</fpage>
      <lpage>0002</lpage>
      <abstract>
        <p>As powerful and complex language models are being released to the public, understanding their behaviour is more important than ever. Although Explainable Artificial Intelligence (XAI) approaches have been widely applied to NLP models, the explanations they provide may still be complex to understand for human interpreters as these may not be aligned with the reasoning process they apply in language-based tasks. Furthermore, such a misalignment is also present in most XAI datasets as they are not structured to reflect such a fundamental property. Striving to bridge the gap between model and human reasoning, we propose ad hoc formalizations to structure and detail the thought process applied by human interpreters when performing a set of NLP tasks of interest. Hence, we define rationale mappings, i.e., representations that organize humans' analytical reasoning steps when identifying and associating the essential parts of the texts involved in a language-based task leading to its output. These are organized in tree structures referred to as rationale trees and characterized for each task to enhance their expressiveness. Furthermore, we describe their data collection and storage process. We argue these structures would result in a better alignment between model and human reasoning, hence improving models' explanations, while still being suited for standard explainability processes.</p>
      </abstract>
      <kwd-group>
        <kwd>Natural Language</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        The recent spread of Large Language Models (LLM) with the release of ChatGPT raised the
research community’s interest in questioning the capabilities, understandability, and
explainability of AI models like never before. Hence, researchers began exploring the potentiality of
such systems with a particular focus on Natural Language Processing (NLP) tasks [
        <xref ref-type="bibr" rid="ref1 ref2 ref3 ref4 ref5">1, 2, 3, 4, 5</xref>
        ].
Likewise, understanding and explaining models’ behaviour has always been of fundamental
interest to the AI community [
        <xref ref-type="bibr" rid="ref6 ref7">6, 7</xref>
        ]. Such a multi-faceted scenario spreads across various
technical and human-centred research fields, like computer science, philosophy, and many more.
(M. Brambilla)
Consequently, two fundamental objectives must be considered when explaining models: the
faithful representation of their behaviour and the design of humanly understandable
explanations. Acknowledged the plethora of explainability methods available in the literature [
        <xref ref-type="bibr" rid="ref8 ref9">8, 9</xref>
        ],
recent trends in Explainable Artificial Intelligence (XAI) revealed the increasing importance
and research interest in human-centred AI [
        <xref ref-type="bibr" rid="ref10 ref11">10, 11</xref>
        ]. As human actors are progressively more
involved in diferent aspects of the explainability process, researchers developed various
approaches to collect and employ human knowledge to assess and improve explanations [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ].
Hence, an efort to bridge the gap between humans and models in XAI is of fundamental interest
to the AI community.
      </p>
      <p>
        In the context of NLP, researchers defined explanations by identifying the most important
words in the input(s) [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], providing the most influential training examples afecting an outcome
[
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], or generating textual explanations [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. One might think such explanations are always
interpretable for humans as they rely on their ability to understand and reason on natural
language. However, they may sometimes be too complex [16, 17], or their structure may not be
intuitive enough for a human interpreter to promptly understand the model’s behaviour. For
example, saliency map scores assigned to the words of a sentence in a sentiment analysis task
may not be fully understood, as these scores represent which parts of the input were deemed
important and do not represent the actual model’s reasoning [18]. Similar representations (i.e.,
highlights and textual explanations) are employed when collecting human knowledge to train
or improve models and evaluate their explanations [19] as these are pretty simple for humans to
describe. Despite the practicality of collecting human rationale abiding by these representations,
they may not fully explain the actual human reasoning applied to perform the NLP task at hand
or the intrinsic logic humans use when reasoning on the texts involved.
      </p>
      <p>Striving to provide an even more complete representation of human rationale for a set of NLP
tasks of interest (i.e., Sentiment Analysis, Text Summarization, Natural Language Inference,
Claim Verification, and Question Answering), we propose ad hoc formalizations to structure
human knowledge defined by drawing inspiration from Argumentation Mining [ 20, 21] and the
recent literature in Data Structuring in XAI. These are referred to as rationale mappings. Since
we identified them as fundamental discerning factors, we analyzed and organized the tasks
based on their type (i.e., text classification or generation) and the number of inputs (i.e., single
or multiple inputs). We inspected the processes and the nature of the considered NLP tasks from
both the human and model perspective and designed the formalizations. A standard structure
is described and then characterized for each of the considered tasks to reduce complexity
and enhance expressiveness when possible. These are further hierarchically organized in tree
structures, referred to as rationale trees. Such representations organize the reasoning steps
humans apply when identifying and reasoning on the essential parts of the texts they are
provided with. The process applied to collect such structures and an example are described
for each of the considered tasks. In the end, a characterization of how these structures will be
stored is also provided. To the best of the authors’ knowledge, we are the first to provide ad
hoc human knowledge formalizations applied to various NLP tasks. In summary, we answer
the following research questions
• RQ1: How can we structure human knowledge to represent the analytical reasoning
steps humans apply to NLP tasks?
• RQ2: Can rationale mappings be further organized to provide an even more
comprehensive and detailed representation of the rationale they describe?</p>
      <p>The remainder of the paper is structured as follows. Chapter 2 details the literature on
collecting, structuring, and employing human knowledge in XAI and argumentation mining.
Chapter 3 classifies the NLP tasks of interest based on their features and describe the structure
of rationale mappings, the way they are organized into rationale trees, and their characterization
for each task. Finally, Chapter 4 summarizes the article’s content and provide insights about
future works and developments.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Works &amp; Background</title>
      <sec id="sec-2-1">
        <title>2.1. Human Knowledge and Reasoning in NLP and XAI</title>
        <p>
          Natural Language Processing (NLP) is a research field aimed at interpreting, analyzing, and
manipulating natural language data to learn, understand and produce human language content
[22, 23]. NLP include a broad variety of tasks, some aimed at human language understanding
(e.g., Coreference Resolution, Natural Language Parsing, etc.), while more complex tasks focus on
classifying (e.g., Sentiment Analysis, Natural Language Inference, etc.) or generating (e.g., Text
Summarization, Question Answering, etc.) text. In language-based tasks, humans can reason on
and understand the provided text(s) through their inherent linguistic knowledge to generate a
desired output. Such data are usually collected as couples of input-output texts and employed to
train models capable of achieving specific tasks [ 24]. However, such a data collection approach
does not include how humans performed the task and reasoned on the provided text(s). In
particular, whenever model explainability is desired, human actors are also requested to provide
a description or evidence of the thought process they applied. These are usually collected as
free text or highlights of the input’s words and sentences [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ]. In particular, while the first
is more expressive and readable, the latter provides a compact, suficient, and comprehensive
representation [19]. Such information can be used to train so-called self-explainable models [25],
i.e., models capable of providing explanations for their outputs [19], or to assess the explanations
extracted through other XAI techniques [26, 19]. Even though explanations are mainly collected
through crowdsourcing approaches using the aforementioned representations, a wider variety of
formats is available whenever an explanation is provided to a human interpreter. In the context
of Explainable AI, NLP tasks’ explanations are represented as saliency maps [
          <xref ref-type="bibr" rid="ref16">27</xref>
          ], declarative
representations (i.e., trees and rules) [
          <xref ref-type="bibr" rid="ref17">28</xref>
          ], examples [
          <xref ref-type="bibr" rid="ref18">29</xref>
          ], or machine-generated natural language
[
          <xref ref-type="bibr" rid="ref19">30</xref>
          ]. While lay users can easily understand the latter [16], the others may not be directly
understandable to human interpreters since a deep understanding of XAI may be required
[16, 17]. Although the similarities in the shape explanations are provided by and provided
to humans, there’s a significant gap when it comes to their interpretability. Furthermore,
although some explanations may be humanly understandable, they are not structured to match
human reasoning. Hence, a misalignment between how humans think when performing a task
involving natural language and how explanations provide evidence for the model’s reasoning
can be identified.
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Data Structuring in NLP and XAI</title>
        <p>
          Over the last few years, various datasets organized human knowledge applied to the
explainability of NLP tasks in the form of free-text [
          <xref ref-type="bibr" rid="ref20">31</xref>
          ], highlights of the most important words or sentences
[
          <xref ref-type="bibr" rid="ref21 ref22">32, 33</xref>
          ], or a combination of both [
          <xref ref-type="bibr" rid="ref23">34</xref>
          ]. Although such simple structures were proven efective,
researchers demonstrated that an enhanced level of detail also contributes to improving models’
performances [
          <xref ref-type="bibr" rid="ref24 ref25">35, 36</xref>
          ] and understandability [
          <xref ref-type="bibr" rid="ref26">37</xref>
          ]. Most structures have been designed in the
context of Question Answering as it is one of the most complex NLP tasks. Lamm et al. [
          <xref ref-type="bibr" rid="ref26">37</xref>
          ]
defined annotation triples for Question Answering tasks by identifying relationships between
the question and the provided passage. The annotator selects the passage entailing the answer,
then chooses a short text span with the answer within the entailed text and marks the equivalent
noun phrases in the question and the answer. Finally, entailment patterns are extracted. In
the context of machine reading comprehension, Ye et al. [
          <xref ref-type="bibr" rid="ref25">36</xref>
          ] defined quadruples of question,
paragraph, answer, and a textual explanation that motivates the human reasoning applied
to build the annotation. WorldTree [
          <xref ref-type="bibr" rid="ref27">38</xref>
          ] and WorldTree V2 [39] are explanation graphs that
motivate answers to science questions. They are built by defining and labelling relationships
between words in the question, answer, and explanations generated through domain and world
knowledge. Although the described processes and structures significantly advance the
state-ofthe-art in question answering in the corresponding contexts, these are task-specific and their
aligned with human reasoning has yet to be proven.
        </p>
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Argumentation Mining</title>
        <p>Argumentation Mining is the process of detecting arguments in a textual document, their
relationships, and their internal structure [21, 40]. The basic argumentation unit is an argument
whose structure involves implicit or explicit premises and a conclusion or, more generally, a
set of at least two propositions [20]. For each argument, a schema defining relations between
prepositions following human reasoning patterns is defined. In particular, Pragma-Dialects
theory [41] describes argumentation structures that represent the relation between arguments
through coordination, subordination, or forming multiple arguments, as depicted in Figure 1(a).
(a) Pragma-Dialect Theory
(b) Argumentation Tree Example</p>
        <p>While a more complex graph structure is usually employed [40], Mochales and Moens [20]
applied the Pragma-Dialects theory to define a tree-structure representation in which every tree
and sub-tree represents a single argumentation structure. In such a setting, all arguments are
uniquely related to another argument of a tree for which they represent a premise. An example
of such a structure is represented in Figure 1(b).</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Formalization</title>
      <sec id="sec-3-1">
        <title>3.1. NLP Task Classification</title>
        <p>This article considers five diferent Natural Language Processing tasks: Sentiment Analysis,
Text Summarization, Natural Language Inference, Claim Verification, and Question Answering.
We identified which features make these tasks substantially diferent (e.g., objective, process,
number of inputs, type of output, etc.). Considering such diferences, our research acknowledged
the similarity in the nature of the inputs (i.e., all these tasks accept free-text inputs) and the
number of outputs (i.e., all these tasks accept a single output) while identifying a significant
diference in the process, the type of task (i.e., whether the task generates or classifies text),
and the number of inputs (i.e., whether the task handles one or multiple inputs). While the
process is unique for each considered NLP task, the type and number of inputs can be used to
categorize them. Table 1 reports the outcome of this classification.</p>
        <p>Task
Sentiment Analysis</p>
        <p>Text Summarization
Natural Language Inference</p>
        <p>Claim Verification
Question Answering</p>
        <p>Task Type
Classification
Generation
Classification
Classification
Generation</p>
        <p>N Inputs Input(s) Type</p>
        <p>Single Free-text
Single Free-text
Multiple Free-text
Multiple Free-text
Multiple Free-text</p>
        <p>Output Type</p>
        <p>Discrete
Free-Text
Discrete
Discrete
Free-text</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. RQ1 - Rationale Mappings</title>
        <p>When reasoning on text, humans are so used to finding logical, syntactical, and semantical
connections between words that they are unaware of such behaviour. A simple example is the
capability of humans to find all the expressions that refer to the same entity in a text (so-called
Coreference Resolution in NLP). Such a task rarely requires complex human reasoning as it
can be promptly achieved thanks to the linguistic flexibility and knowledge we have developed.
On the other hand, extensive human reasoning may be necessary for complex language-based
activities, like question answering, in which a human interpreter must understand the paragraph,
the question, and the relations between their content, to answer it. We consider such reasoning
fundamental building blocks to define and structure human rationale in Natural Language
Processing tasks. We refer to them as rationale mappings, i.e., representations that organize
humans’ analytical reasoning steps when identifying and associating the essential parts of the
texts involved in a language-based task leading to its output. In particular, we characterise three
types of mappings common to the considered NLP tasks:
• External mappings represent the reasoning a human interpreter applies between two
terms and/or parts of text belonging to diferent texts.
• Internal mappings represent the reasoning a human interpreter applies between two
diferent terms and/or parts of text in the same text.
• Resolution mappings are internal mappings representing anaphora or coreference
resolution reasoning between two terms and/or parts of text in the same text.</p>
        <p>We define the structure of rationale mappings by combining the literature about data structuring
in XAI and the argument structure described by Mochales and Moens [20]. In our definition, we
constrained the number of propositions and extended it with their relationship, finally merging
them into a single representation. Hence, we define rationale mappings as triples
〈 text, text, label 〉
where text is a word or a set of consecutive words from any text involved in the human reasoning
applied to the language-based task and label is a term that defines the relationship between
the texts. The latter is defined based on the type of mapping. In external mappings, they are
specific to the NLP task to which the mappings are applied, i.e., when the task involves a
discrete output (i.e., a finite and well-defined set of outputs is possible) or specific terms that
describe the applied approach, these are employed as labels as they represent both human- and
model-understandable concepts. Otherwise, more generic linguistic labels are considered, i.e.,
semantic or syntactic, respectfully representing the semantic or syntactic similarity between
texts. Whenever a semantic label is applied, mappings can be extended to include a textual
description of the rationale a human interpreter applies, enhancing the level of detail. Such
generic labels are also applied to internal mappings as they define a syntactic or semantic
relationship between the texts.</p>
        <p>External mappings may be simplified in specific cases, hence defining these mappings as
couples</p>
        <p>〈 text, label 〉
where text is a word or a set of consecutive words from any text involved in the human reasoning
applied to the language-based task and label is a term that specifies the text. In particular, we
consider simplifications only when the nature of the task and the labels allow for them. The
fact that the two texts in a mapping coincide is not considered a simplification, even though
it might be helpful for what concerns data storage. Internal mappings are not subject to any
simplifications as there is no meaningful overlap between texts and label. However, they may
be subject to slight changes to improve their expressiveness when applied to specific tasks.
Resolution mappings can’t be simplified as it is necessary to specify the type of resolution used
and the parts of text involved. Further clarifications will be made for each considered NLP task
in their dedicated sections.</p>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. RQ2 - Rationale Trees</title>
        <p>While defining such mappings is useful to understand the human reasoning applied to a task,
these can be further hierarchically structured to describe better the rationale involved in a
specific instance of a task. Hence, mappings are organized in a tree structure in which each
rationale mapping is a tree node with diferent meanings and constraints based on its type. We
refer to these structures as rationale trees. The root node represents the (generic) relationship
between the input(s) and the output, i.e., a standard input-output representation of the task.
Each other node (i.e., internal nodes and leaves) details the mapping between the texts in its
parent node. In particular, considering a parent node p and its child node c defined as
p 〈 p_text_I, p_text_II, p_label 〉
c 〈 c_text_I, c_text_II, c_label 〉
and assuming that their corresponding texts (i.e., p_text_I and c_text_I, and p_text_II and
c_text_II ) are extracted from the same text, either one of the following constraints is enforced.
• c_text_I ⊂ p_text_I , i.e., c_text_I is a word or a set of consecutive words that are a subset
of p_text_I, and c_text_II ⊂ p_text_II , i.e., c_text_II is a word or a set of consecutive
words that are a subset of p_text_II.
• c_text_I ⊂ p_text_I , i.e., c_text_I is a word or a set of consecutive words that are a subset
of p_text_I, and c_text_II ⊂ p_text_I , i.e., c_text_II is a word or a set of consecutive words
that are a subset of p_text_I. The same can be applied considering p_text_II.
Such conditions define a structure in which the deeper the node, the more specific the rationale
it describes. Furthermore, while external and internal mappings can either be internal nodes
or leaves that detail the parent node’s rationale, resolution mappings can only be leaf nodes
and define rationale to be applied to their parent and sibling nodes whenever meaningful.
Child nodes are considered to be in a coordinative relationship towards their parent node,
simultaneously contributing to specifying the parent’s node mapping. Moreover, while external
mappings can have both internal and external mappings as child nodes, internal mappings can
only have other internal mappings as child nodes since they are mainly employed to detail the
rationale applied in external mappings and not vice-versa. Additionally, resolution mappings
can be child nodes for both internal and external mappings. A generic example of a rationale
tree is depicted in Figure 2.</p>
        <p>The only condition enforced between sibling nodes is that their text_I and text_II should not
completely overlap simultaneously, i.e., considering any pair of sibling nodes s1 and s2 defined
as
〈 s1_text_I, s1_text_II, s1_label 〉
〈 s2_text_I, s2_text_II, s2_label 〉</p>
        <p>Labels</p>
        <p>Positive, Negative</p>
        <p>Extractive, Abstractive
Neutral, Contradiction, Entailment</p>
        <p>Support, Refute
Syntactic, Semantic</p>
        <p>Simplification</p>
        <p>Yes
Yes
No
No
No
and assuming that their corresponding texts (e.g., s1_text_I and s2_text_I, and s1_text_II and
s2_text_II ) are extracted from the same text, the following constraints are enforced.
• if s1_text_I ≡ s2_text_I ⇒ s1_text_II ≠ s2_text_II.</p>
        <p>• if s1_text_II ≡ s2_text_II ⇒ s1_text_I ≠ s2_text_I.</p>
        <p>Such conditions define a structure where the same mapping can’t be duplicated, although they
still allow a fine granularity in the diferences between the mappings associated with a parent
node.</p>
        <p>The following sections describe the formalizations, detailing a set of features of interest. In
particular, the simplest ones are summarized in Table 1, while the most complex ones are
detailed in the corresponding sections, summarized in Table 2, and explained below.
• Labels, i.e., the concepts applied as labels when defining the mappings. These are mainly
employed in external mappings, although internal mappings may sometimes benefit from
such labels.
• Mappings Interpretation, i.e., a task-specific description for internal and external
mappings, if needed. Whenever no specific interpretation is provided, we consider them
aligned with their general description.
• Simplifications , i.e., whether any simplification can be applied to the mappings, their
description and structure.
• Mapping Guidelines, i.e., the process a human interpreter applies to define mappings
and a rationale tree for a task of interest. We consider human interpreters to be performing
the task themselves, although we do not include details of such a process in the guidelines.
The same approach can be applied even when the interpreter is provided with all the
texts involved in the task.
• Example, i.e., an example of a rationale tree collected by applying the process to a task
entry from a specified NLP dataset. Each mapping type is identified by its starting letters
(e.g., EM stands for external mapping). An example is provided for Sentiment Analysis,
while all the others can be found in Appendix A as they follow the same principles.</p>
      </sec>
      <sec id="sec-3-4">
        <title>3.4. Sentiment Analysis</title>
        <p>Sentiment Analysis is an NLP task in which a human interpreter defines the output by assigning
a sentiment (either positive, negative, or sometimes neutral) to an input sentence or text. For
such a task, internal mappings are defined as</p>
        <p>〈 input_text, input_text, label 〉
where input_text is a word or a set of consecutive words from the input text and label is the
sentiment between the texts (i.e., positive or negative, in our case). On the other hand, external
mappings are defined as</p>
        <p>〈 input_text, output_text, label 〉
where input_text is a word or a set of consecutive words from the input text, and output_text and
label represent the sentiment associated with the input_text. Acknowledged the overlapping
between output_text and label, external mappings are applied a simplification. Hence, they are
defined as</p>
        <p>〈 input_text, label 〉
where input_text is a word or a set of consecutive words from the input text and label is the
sentiment associated with input_text.</p>
        <p>A human interpreter providing rationale mappings for a Sentiment Analysis task performs the
following assignments.</p>
        <p>• They define external mappings between the input and the output texts.
• For each of the previously defined external mapping, they recursively define internal and
external mappings detailing the texts involved until a desired level of detail is achieved.</p>
        <p>The same process is applied to the newly found mappings.
• For each of the previously defined internal and external mappings, they define any
resolution mapping that was applied, as child nodes or sibling nodes based on where they are
applied.</p>
        <p>We picked a data point from the Large Movie Review Dataset [42] and built its rationale tree as
an example, represented in Figure 3.</p>
      </sec>
      <sec id="sec-3-5">
        <title>3.5. Text Summarization</title>
        <p>Text Summarization is an NLP task in which a human interpreter is provided with an input
text and they provide a summarized output text. Two diferent approaches can be applied and
combined together. An extractive approach reports parts of the input text into the output text,
maintaining the same syntax. An abstractive approach formulates the output text to have the
same semantics as parts of the input text while using a diferent syntax. For such a task, external
mappings are detailed as</p>
        <p>〈 input_text, output_text, label 〉
where input_text is a word or a set of consecutive words from the input text, output_text is
a word or a set of consecutive words from the output text, and label is the summarization
approach (i.e., abstractive or extractive) applied to input_text to generate output_text.
Whenever an extractive approach is applied, external mappings can be simplified as such an
approach involves reporting the same text from the input in the output text. Hence, they are
defined as couples</p>
        <p>〈 input_text, “extractive” 〉
where input_text is a word or a set of consecutive words from the input text.
A human interpreter providing rationale mappings for a Text Summarization task performs the
following assignments.</p>
        <p>• They define external mappings between the input and the output texts.
• For each of the previously defined external mapping that is assigned the “extractive” label,
they recursively define internal mappings detailing the texts involved until a desired level
of detail is achieved. Instead, for each of the previously described external mapping that
is assigned the “abstractive” label, they recursively define internal and external mappings
detailing the texts involved until a desired level of detail is achieved. The same approach
is applied to the newly found mappings.
• For each of the previously defined internal and external mappings, they define any
resolution mapping that was applied, as child nodes or sibling nodes based on where they are
applied.</p>
        <p>We picked a data point from the CNN/Daily Mail Dataset [43] and built its rationale tree as an
example, represented in Figure 6 in Appendix A.</p>
      </sec>
      <sec id="sec-3-6">
        <title>3.6. Natural Language Inference</title>
        <p>Natural Language Inference is an NLP task in which a human interpreter is provided with
two texts, an hypothesis and a premise, and they define whether they are in an entailment,
contradiction, or neutral relationship. For such a task, external mappings are defined as
〈 premise_text, hypothesis_text, label 〉
where premise_text is a word or a set of consecutive words from the premise, hypothesis_text
is a word or a set of consecutive words from the hypothesis, and label is the relationship (i.e.,
entailment, contradiction, or neutral) between premise_text and hypothesis_text.
A human interpreter providing rationale mappings for a Natural Language Inference task
performs the following assignments.</p>
        <p>• They identify external mappings between the premise and the hypothesis texts.
• For each of the previously defined external mappings, they recursively define internal and
external mappings detailing the texts involved until a desired level of detail is achieved.</p>
        <p>The same approach is applied to the newly found mappings.
• For each of the previously defined internal and external mappings, they define whether
any resolution mapping that was applied, as child nodes or sibling nodes based on where
they are applied.</p>
        <p>
          We picked a data point from the e-SNLI Dataset [
          <xref ref-type="bibr" rid="ref23">34</xref>
          ] and built its rationale tree as an example,
represented in Figure 4 in Appendix A.
        </p>
      </sec>
      <sec id="sec-3-7">
        <title>3.7. Claim Verification</title>
        <p>Claim Verification is an NLP task in which a human interpreter is provided with two texts, i.e.,
a claim and an evidence, and they define whether the evidence supports or refutes the claim. For
such a task, external mappings are defined as</p>
        <p>〈 claim_text, evidence_text, label 〉
where claim_text is a word or a set of consecutive words from the claim, evidence_text is a word
or a set of consecutive words from the evidence, and label is the relationship (i.e., support or
refute) between claim_text and evidence_text.</p>
        <p>A human interpreter providing rationale mappings for a Claim Verification task performs the
following assignments.</p>
        <p>• They identify external mappings between the claim and the evidence.
• For each of the previously defined external mappings, they recursively define internal and
external mappings detailing the texts involved until a desired level of detail is achieved.</p>
        <p>The same approach is applied to the newly found mappings.
• For each of the previously defined internal and external mappings, they define any
resolution mapping that was applied, as child nodes or sibling nodes based on where they are
applied.</p>
        <p>We picked a data point from the FEVER Dataset [44] and built its rationale tree as an example,
represented in Figure 5 in Appendix A.</p>
      </sec>
      <sec id="sec-3-8">
        <title>3.8. Question Answering</title>
        <p>Question Answering is an NLP task in which a human interpreter is provided with a question
and a paragraph, and they provide an answer to the question through the paragraph. For such a
task, mappings are defined as</p>
        <p>〈 text, text, label 〉
where text is a word or a set of consecutive words from the same (in internal mappings) or
diferent (in external mappings) texts, i.e., the question, the paragraph, or the answer, and label
describes whether there’s a semantic or syntactic relationship between the texts. Similarly to
internal mappings, whenever a semantic label is applied, the mapping is further detailed by
collecting comments detailing the relationship between the texts.</p>
        <p>Rationale trees increase in complexity in Question Answering tasks as the process is more
convoluted than the other considered NLP tasks. First of all, a new type of mapping has to be
defined. Abstractive mappings define which word or set of consecutive words of the question
contributed to defining its class among the following question types.</p>
        <p>• Yes/No Question, i.e., questions looking for confirmation in the paragraph.
• Wh-Question, i.e., questions looking for the answer based on the type of wh-question
(e.g., Who, What, etc.).
• Choice Question, i.e., questions picking the answer among the ones proposed in the
question based on the paragraph.</p>
        <p>• Disjunctive Questions, i.e., questions looking for confirmation in the paragraph.
Such mappings are introduced to be aligned with the question-answering process in which a
human interpreter identifies which information they should look for to answer the question
before reading the paragraph [45, 46, 47]. Abstractive mappings are defined as couples
〈 question_text, question_class 〉
where question_text is a word or a set of consecutive words from the question and the
question_class describes the question class chosen from a list of values defined from the question
types described, i.e., yes/no question, disjunctive question, choice question, and wh-question. The
latter is further detailed based on the type of wh-question, defining a specialization (described
in Table 3 in Appendix A). Moreover, each rationale tree can only have one abstractive mapping
and must be a child node of the root node.</p>
        <p>A human interpreter providing rationale mappings for a Question Answering task performs
the following assignments.</p>
        <p>• They define an abstractive mapping associated with the question.
• They define external mappings between the question and the paragraph. The same is
done for internal and resolution mappings in these texts. These are recursively refined
until the desired level of detail is achieved.
• They define external mappings between the paragraph and the answer. The same is done
for internal and resolution mappings in these texts. These are recursively refined until the
desired level of detail is achieved.
• They detail the previously defined abstractive mapping by defining external mappings
between the question and the answer. These are recursively refined until the desired level
of detail is achieved.</p>
        <p>We picked a data point from the SQuAD 2.0 Dataset [48] and built its rationale tree as an example,
represented in Figure 7 in Appendix A.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Conclusions</title>
      <p>This article described a novel approach to structuring human knowledge for a set of Natural
Language Processing tasks of interest. We reported on the literature about human knowledge
and data structuring in NLP and XAI and Argumentation Mining that extensively inspired our
work. We explained the concept of rationale mapping, its specializations, and how these can be
structured into rationale trees to describe the reasoning process a human interpreter applies
in language-based tasks. Task-specific mappings, potential simplifications, and extensions
were detailed for each one. We argue these representations contribute towards representing
human knowledge to be applied to XAI tasks while also being a suitable way of shaping
explanations provided by XAI methods or self-explaining models (e.g., LLMs). Future work will
involve collecting and assessing the understandability of rationale trees for datasets of interest,
improving and detailing the labels applied to some of the proposed mappings, and exploring
the applicability of rationale trees to other NLP tasks.
for machine learning models performance: A table-to-text task, in: Proceedings of the
Thirteenth Language Resources and Evaluation Conference, European Language Resources
Association, Marseille, France, 2022, pp. 3542–3551. URL: https://aclanthology.org/2022.
lrec-1.379.
[16] M. Danilevsky, K. Qian, R. Aharonov, Y. Katsis, B. Kawas, P. Sen, A survey of the state
of explainable AI for natural language processing, in: Proceedings of the 1st Conference
of the Asia-Pacific Chapter of the Association for Computational Linguistics and the
10th International Joint Conference on Natural Language Processing, Association for
Computational Linguistics, Suzhou, China, 2020, pp. 447–459. URL: https://aclanthology.
org/2020.aacl-main.46.
[17] N. Feldhus, L. Hennig, M. D. Nasert, C. Ebert, R. Schwarzenberg, S. Moller, Constructing
natural language explanations via saliency map verbalization, ArXiv abs/2210.07222 (2022).
[18] H. Schuf, A. Jacovi, H. Adel, Y. Goldberg, N. T. Vu, Human interpretation of
saliencybased explanation over text, in: 2022 ACM Conference on Fairness, Accountability, and
Transparency, FAccT ’22, Association for Computing Machinery, New York, NY, USA,
2022, p. 611–636. URL: https://doi.org/10.1145/3531146.3533127. doi:10.1145/3531146.
3533127.
[19] S. Wiegrefe, A. Marasović, Teach me to explain: A review of datasets for explainable
natural language processing, 2021. URL: https://arxiv.org/abs/2102.12060. doi:10.48550/
ARXIV.2102.12060.
[20] R. Mochales, M.-F. Moens, Argumentation mining, Artificial Intelligence and Law 19
(2011) 1–22. doi:10.1007/s10506-010-9104-x.
[21] R. M. Palau, M.-F. Moens, Argumentation mining: The detection, classification and
structure of arguments in text, in: Proceedings of the 12th International Conference on
Artificial Intelligence and Law, ICAIL ’09, Association for Computing Machinery, New
York, NY, USA, 2009, p. 98–107. URL: https://doi.org/10.1145/1568234.1568246. doi:10.
1145/1568234.1568246.
[22] D. Khurana, A. Koli, K. Khatter, S. Singh, Natural language processing: state of the art,
current trends and challenges, Multimedia Tools and Applications 82 (2023) 3713–3744.</p>
      <p>URL: https://doi.org/10.1007/s11042-022-13428-4. doi:10.1007/s11042-022-13428-4.
[23] J. Hirschberg, C. D. Manning, Advances in natural language
processing, Science 349 (2015) 261–266. URL: https://www.science.
org/doi/abs/10.1126/science.aaa8685. doi:10.1126/science.aaa8685.
arXiv:https://www.science.org/doi/pdf/10.1126/science.aaa8685.
[24] M. Sabou, K. Bontcheva, A. Scharl, Crowdsourcing research opportunities: Lessons from
natural language processing, in: Proceedings of the 12th International Conference on
Knowledge Management and Knowledge Technologies, i-KNOW ’12, Association for
Computing Machinery, New York, NY, USA, 2012. URL: https://doi.org/10.1145/2362456.
2362479. doi:10.1145/2362456.2362479.
[25] D. Alvarez-Melis, T. S. Jaakkola, Towards robust interpretability with self-explaining neural
networks, 2018. arXiv:1806.07538.
[26] P. Atanasova, J. G. Simonsen, C. Lioma, I. Augenstein, A diagnostic study of explainability
techniques for text classification, in: Proceedings of the 2020 Conference on Empirical
Methods in Natural Language Processing (EMNLP), Association for Computational
LinarXiv:1802.03052.
[39] Z. Xie, S. Thiem, J. Martin, E. Wainwright, S. Marmorstein, P. Jansen, WorldTree v2:
A corpus of science-domain structured explanations and inference patterns supporting
multi-hop inference, in: Proceedings of the Twelfth Language Resources and
Evaluation Conference, European Language Resources Association, Marseille, France, 2020, pp.
5456–5473. URL: https://aclanthology.org/2020.lrec-1.671.
[40] L. Carstens, F. Toni, Towards relation based argumentation mining, in: Proceedings of
the 2nd Workshop on Argumentation Mining, Association for Computational Linguistics,
Denver, CO, 2015, pp. 29–34. URL: https://aclanthology.org/W15-0504. doi:10.3115/v1/
W15-0504.
[41] F. van Eemeren, R. Grootendorst, A systematic theory of argumentation: The
pragmadialectical approach (2003). doi:10.1017/CBO9780511616389.
[42] A. L. Maas, R. E. Daly, P. T. Pham, D. Huang, A. Y. Ng, C. Potts, Learning word
vectors for sentiment analysis, in: Proceedings of the 49th Annual Meeting of the
Association for Computational Linguistics: Human Language Technologies,
Association for Computational Linguistics, Portland, Oregon, USA, 2011, pp. 142–150. URL:
http://www.aclweb.org/anthology/P11-1015.
[43] R. Nallapati, B. Zhou, C. dos Santos, Ç. Gulçehre, B. Xiang, Abstractive text summarization
using sequence-to-sequence RNNs and beyond, in: Proceedings of the 20th SIGNLL
Conference on Computational Natural Language Learning, Association for Computational
Linguistics, Berlin, Germany, 2016, pp. 280–290. URL: https://aclanthology.org/K16-1028.
doi:10.18653/v1/K16-1028.
[44] J. Thorne, A. Vlachos, C. Christodoulopoulos, A. Mittal, FEVER: a large-scale dataset for
fact extraction and VERification, in: Proceedings of the 2018 Conference of the North
American Chapter of the Association for Computational Linguistics: Human Language
Technologies, Volume 1 (Long Papers), Association for Computational Linguistics, New
Orleans, Louisiana, 2018, pp. 809–819. URL: https://aclanthology.org/N18-1074. doi:10.
18653/v1/N18-1074.
[45] E. Rilof, M. Thelen, A rule-based question answering system for reading comprehension
tests, in: Proceedings of the 2000 ANLP/NAACL Workshop on Reading Comprehension
Tests as Evaluation for Computer-Based Language Understanding Sytems - Volume 6,
ANLP/NAACL-ReadingComp ’00, Association for Computational Linguistics, USA, 2000,
p. 13–19. URL: https://doi.org/10.3115/1117595.1117598. doi:10.3115/1117595.1117598.
[46] M. A. Calijorne Soares, F. S. Parreiras, A literature review on question answering
techniques, paradigms and systems, Journal of King Saud University - Computer and
Information Sciences 32 (2020) 635–646. URL: https://www.sciencedirect.com/science/article/pii/
S131915781830082X. doi:https://doi.org/10.1016/j.jksuci.2018.08.005.
[47] P. Katyayan, N. Joshi, Design and development of rule-based open-domain
questionanswering system on squad v2.0 dataset, 2022. arXiv:2204.09659.
[48] P. Rajpurkar, R. Jia, P. Liang, Know what you don’t know: Unanswerable questions for
squad, 2018. arXiv:1806.03822.</p>
    </sec>
    <sec id="sec-5">
      <title>A. Appendix</title>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>C.</given-names>
            <surname>Qin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , J. Chen,
          <string-name>
            <given-names>M.</given-names>
            <surname>Yasunaga</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <article-title>Is chatgpt a general-purpose natural language processing task solver</article-title>
          ?,
          <year>2023</year>
          . URL: https://arxiv.org/abs/2302.06476. doi:
          <volume>10</volume>
          .48550/ARXIV.2302.06476.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>W.</given-names>
            <surname>Jiao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Wang</surname>
          </string-name>
          , J.-t. Huang,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Tu</surname>
          </string-name>
          ,
          <article-title>Is chatgpt a good translator? a preliminary study</article-title>
          ,
          <year>2023</year>
          . URL: https://arxiv.org/abs/2301.08745. doi:
          <volume>10</volume>
          .48550/ARXIV.2301.08745.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Bang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Cahyawijaya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Dai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Su</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Wilie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Lovenia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Ji</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Chung</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q. V.</given-names>
            <surname>Do</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Fung</surname>
          </string-name>
          ,
          <article-title>A multitask, multilingual, multimodal evaluation of chatgpt on reasoning, hallucination</article-title>
          , and interactivity,
          <year>2023</year>
          . URL: https://arxiv.org/abs/2302.04023. doi:
          <volume>10</volume>
          .48550/ARXIV.2302.04023.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>X.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Chen</surname>
          </string-name>
          , W. Cheng,
          <article-title>Exploring the limits of chatgpt for query or aspect-based text summarization</article-title>
          ,
          <year>2023</year>
          . URL: https://arxiv.org/abs/2302.08081. doi:
          <volume>10</volume>
          . 48550/ARXIV.2302.08081.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>B.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Ding</surname>
          </string-name>
          , L. Jing,
          <article-title>How would stance detection techniques evolve after the launch of chatgpt</article-title>
          ?,
          <year>2022</year>
          . URL: https://arxiv.org/abs/2212.14548. doi:
          <volume>10</volume>
          .48550/ARXIV.2212.14548.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>W.</given-names>
            <surname>Samek</surname>
          </string-name>
          ,
          <string-name>
            <surname>K.-R. Müller</surname>
          </string-name>
          ,
          <source>Towards Explainable Artificial Intelligence</source>
          , Springer International Publishing, Cham,
          <year>2019</year>
          , pp.
          <fpage>5</fpage>
          -
          <lpage>22</lpage>
          . URL: https://doi.org/10.1007/978-3-
          <fpage>030</fpage>
          -28954-
          <issue>6</issue>
          _1. doi:
          <volume>10</volume>
          .1007/978-3-
          <fpage>030</fpage>
          -28954-
          <issue>6</issue>
          _
          <fpage>1</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>R.</given-names>
            <surname>Goebel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Chander</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Holzinger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Lecue</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Akata</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Stumpf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Kieseberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Holzinger</surname>
          </string-name>
          , Explainable ai: The new 42?, in: A.
          <string-name>
            <surname>Holzinger</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <string-name>
            <surname>Kieseberg</surname>
            ,
            <given-names>A. M.</given-names>
          </string-name>
          <string-name>
            <surname>Tjoa</surname>
          </string-name>
          , E. Weippl (Eds.),
          <source>Machine Learning and Knowledge Extraction</source>
          , Springer International Publishing, Cham,
          <year>2018</year>
          , pp.
          <fpage>295</fpage>
          -
          <lpage>303</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>G.</given-names>
            <surname>Vilone</surname>
          </string-name>
          , L. Longo,
          <source>Explainable artificial intelligence: a systematic review</source>
          ,
          <year>2020</year>
          . URL: https://arxiv.org/abs/
          <year>2006</year>
          .00093. doi:
          <volume>10</volume>
          .48550/ARXIV.
          <year>2006</year>
          .
          <volume>00093</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>R.</given-names>
            <surname>Guidotti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Monreale</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ruggieri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Turini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Pedreschi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Giannotti</surname>
          </string-name>
          ,
          <article-title>A survey of methods for explaining black box models</article-title>
          ,
          <year>2018</year>
          . URL: https://arxiv.org/abs/
          <year>1802</year>
          .
          <year>01933</year>
          . doi:
          <volume>10</volume>
          .48550/ARXIV.
          <year>1802</year>
          .
          <year>01933</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Q. V.</given-names>
            <surname>Liao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. R.</given-names>
            <surname>Varshney</surname>
          </string-name>
          ,
          <article-title>Human-centered explainable ai (xai): From algorithms to user experiences</article-title>
          ,
          <year>2021</year>
          . URL: https://arxiv.org/abs/2110.10790. doi:
          <volume>10</volume>
          .48550/ARXIV.2110. 10790.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>U.</given-names>
            <surname>Ehsan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Wintersberger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q. V.</given-names>
            <surname>Liao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mara</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Streit</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Wachter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Riener</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. O.</given-names>
            <surname>Riedl</surname>
          </string-name>
          ,
          <article-title>Operationalizing human-centered perspectives in explainable ai</article-title>
          ,
          <source>in: Extended Abstracts of the 2021 CHI Conference on Human Factors in Computing Systems, CHI EA '21</source>
          ,
          <string-name>
            <surname>Association</surname>
          </string-name>
          for Computing Machinery, New York, NY, USA,
          <year>2021</year>
          . URL: https: //doi.org/10.1145/3411763.3441342. doi:
          <volume>10</volume>
          .1145/3411763.3441342.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>A.</given-names>
            <surname>Tocchetti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Brambilla</surname>
          </string-name>
          ,
          <article-title>The role of human knowledge in explainable ai</article-title>
          ,
          <source>Data</source>
          <volume>7</volume>
          (
          <year>2022</year>
          ). URL: https://www.mdpi.com/2306-5729/7/7/93. doi:
          <volume>10</volume>
          .3390/data7070093.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>T.</given-names>
            <surname>Lei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Barzilay</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Jaakkola</surname>
          </string-name>
          ,
          <source>Rationalizing neural predictions</source>
          ,
          <year>2016</year>
          . URL: https://arxiv. org/abs/1606.04155. doi:
          <volume>10</volume>
          .48550/ARXIV.1606.04155.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>X.</given-names>
            <surname>Han</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. C.</given-names>
            <surname>Wallace</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Tsvetkov</surname>
          </string-name>
          ,
          <article-title>Explaining black box predictions and unveiling data artifacts through influence functions</article-title>
          ,
          <year>2020</year>
          . URL: https://arxiv.org/abs/
          <year>2005</year>
          .06676. doi:
          <volume>10</volume>
          . 48550/ARXIV.
          <year>2005</year>
          .
          <volume>06676</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>I.</given-names>
            <surname>Ampomah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Burton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Enshaei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. Al</given-names>
            <surname>Moubayed</surname>
          </string-name>
          ,
          <article-title>Generating textual explanations guistics</article-title>
          ,
          <source>Online</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>3256</fpage>
          -
          <lpage>3274</lpage>
          . URL: https://aclanthology.org/
          <year>2020</year>
          .emnlp-main.
          <volume>263</volume>
          . doi:
          <volume>10</volume>
          .18653/v1/
          <year>2020</year>
          .emnlp- main.263.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>K.</given-names>
            <surname>Simonyan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Vedaldi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Zisserman</surname>
          </string-name>
          ,
          <article-title>Deep inside convolutional networks: Visualising image classification models and saliency maps</article-title>
          ,
          <year>2014</year>
          . arXiv:
          <volume>1312</volume>
          .
          <fpage>6034</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>N.</given-names>
            <surname>Pröllochs</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Feuerriegel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Neumann</surname>
          </string-name>
          ,
          <article-title>Learning interpretable negation rules via weak supervision at document level: A reinforcement learning approach, in: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Association for Computational Linguistics</article-title>
          , Minneapolis, Minnesota,
          <year>2019</year>
          , pp.
          <fpage>407</fpage>
          -
          <lpage>413</lpage>
          . URL: https://aclanthology.org/N19-1038. doi:
          <volume>10</volume>
          .18653/v1/
          <fpage>N19</fpage>
          - 1038.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>H.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Rajani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Hase</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bansal</surname>
          </string-name>
          , C. Xiong,
          <article-title>FastIF: Scalable influence functions for eficient model interpretation and debugging</article-title>
          ,
          <source>in: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing</source>
          , Association for Computational Linguistics, Online and
          <string-name>
            <given-names>Punta</given-names>
            <surname>Cana</surname>
          </string-name>
          , Dominican Republic,
          <year>2021</year>
          , pp.
          <fpage>10333</fpage>
          -
          <lpage>10350</lpage>
          . URL: https: //aclanthology.org/
          <year>2021</year>
          .emnlp-main.
          <volume>808</volume>
          . doi:
          <volume>10</volume>
          .18653/v1/
          <year>2021</year>
          .emnlp- main.808.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>W.</given-names>
            <surname>Ling</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Yogatama</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Dyer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Blunsom</surname>
          </string-name>
          ,
          <article-title>Program induction by rationale generation: Learning to solve and explain algebraic word problems, in: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics</article-title>
          (Volume
          <volume>1</volume>
          :
          <string-name>
            <surname>Long</surname>
            <given-names>Papers)</given-names>
          </string-name>
          ,
          <source>Association for Computational Linguistics</source>
          , Vancouver, Canada,
          <year>2017</year>
          , pp.
          <fpage>158</fpage>
          -
          <lpage>167</lpage>
          . URL: https://aclanthology.org/P17-1015. doi:
          <volume>10</volume>
          .18653/v1/
          <fpage>P17</fpage>
          - 1015.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [31]
          <string-name>
            <given-names>N.</given-names>
            <surname>Kotonya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Toni</surname>
          </string-name>
          ,
          <article-title>Explainable automated fact-checking for public health claims</article-title>
          ,
          <source>in: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)</source>
          ,
          <article-title>Association for Computational Linguistics</article-title>
          , Online,
          <year>2020</year>
          , pp.
          <fpage>7740</fpage>
          -
          <lpage>7754</lpage>
          . URL: https://aclanthology.org/
          <year>2020</year>
          .emnlp-main.
          <volume>623</volume>
          . doi:
          <volume>10</volume>
          .18653/v1/
          <year>2020</year>
          . emnlp- main.623.
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [32]
          <string-name>
            <given-names>R.</given-names>
            <surname>Socher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Perelygin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Chuang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. D.</given-names>
            <surname>Manning</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Potts</surname>
          </string-name>
          ,
          <article-title>Recursive deep models for semantic compositionality over a sentiment treebank</article-title>
          ,
          <source>in: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing</source>
          , Association for Computational Linguistics, Seattle, Washington, USA,
          <year>2013</year>
          , pp.
          <fpage>1631</fpage>
          -
          <lpage>1642</lpage>
          . URL: https: //aclanthology.org/D13-1170.
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [33]
          <string-name>
            <surname>J. DeYoung</surname>
            , S. Jain,
            <given-names>N. F.</given-names>
          </string-name>
          <string-name>
            <surname>Rajani</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          <string-name>
            <surname>Lehman</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          <string-name>
            <surname>Xiong</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          <string-name>
            <surname>Socher</surname>
            ,
            <given-names>B. C.</given-names>
          </string-name>
          <string-name>
            <surname>Wallace</surname>
          </string-name>
          ,
          <article-title>Eraser: A benchmark to evaluate rationalized nlp models</article-title>
          ,
          <year>2019</year>
          . URL: https://arxiv.org/abs/
          <year>1911</year>
          .03429. doi:
          <volume>10</volume>
          .48550/ARXIV.
          <year>1911</year>
          .
          <volume>03429</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [34]
          <string-name>
            <surname>O.-M. Camburu</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          <string-name>
            <surname>Rocktäschel</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          <string-name>
            <surname>Lukasiewicz</surname>
          </string-name>
          , P. Blunsom, e
          <article-title>-snli: Natural language inference with natural language explanations</article-title>
          ,
          <year>2018</year>
          . URL: https://arxiv.org/abs/
          <year>1812</year>
          .01193. doi:
          <volume>10</volume>
          .48550/ARXIV.
          <year>1812</year>
          .
          <volume>01193</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [35]
          <string-name>
            <given-names>T.</given-names>
            <surname>Khot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Clark</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Guerquin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Jansen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Sabharwal</surname>
          </string-name>
          ,
          <article-title>Qasc: A dataset for question answering via sentence composition</article-title>
          ,
          <year>2020</year>
          . arXiv:
          <year>1910</year>
          .11473.
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [36]
          <string-name>
            <given-names>Q.</given-names>
            <surname>Ye</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Boschee</surname>
          </string-name>
          ,
          <string-name>
            <surname>X. Ren,</surname>
          </string-name>
          <article-title>Teaching machine comprehension with compositional explanations</article-title>
          ,
          <year>2020</year>
          . arXiv:
          <year>2005</year>
          .00806.
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [37]
          <string-name>
            <given-names>M.</given-names>
            <surname>Lamm</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Palomaki</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Alberti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Andor</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Choi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. B.</given-names>
            <surname>Soares</surname>
          </string-name>
          , M. Collins,
          <article-title>Qed: A framework and dataset for explanations in question answering</article-title>
          ,
          <year>2020</year>
          . arXiv:
          <year>2009</year>
          .06354.
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [38]
          <string-name>
            <given-names>P. A.</given-names>
            <surname>Jansen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Wainwright</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Marmorstein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. T.</given-names>
            <surname>Morrison</surname>
          </string-name>
          ,
          <article-title>Worldtree: A corpus of explanation graphs for elementary science questions supporting multi-hop inference</article-title>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>