<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Applying a Model of Text Comprehension to Automated Verbalizations of EL Derivations</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Tanja Perleth</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marvin Schiller</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Birte Glimm</string-name>
          <email>birte.glimm@uni-ulm.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Ulm</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Ontology verbalization techniques have been introduced to generate natural-language texts from ontology axioms and deduction steps. This allows users without knowledge of formal languages (e.g. OWL) to follow deductive inferences derived in ontologies. Since these explanations can be generated from different ontologies (not necessarily developed with verbalization in mind) and from potentially long, complex derivations, the question is to what extent these explanations are readable, understandable and useful for human readers. We apply the cognitive-psychology-based model of Kintsch and van Dijk to explanations generated automatically using a verbalization system for derivations in the EL fragment of description logic to model the reading process of a human reader. This allows for checking whether the generated explanations have a coherent text base (according to Kintsch and van Dijk's model) and for re-ordering the presented steps accordingly.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        To facilitate the work with ontologies, verbalization techniques and tools have
been developed (e.g. [
        <xref ref-type="bibr" rid="ref1 ref11 ref15 ref9">15, 9, 11, 1</xref>
        ] to mention just a few). They serve to generate
natural-language texts for ontology axioms and inferences, which is helpful both
for ontology debugging (to explain why an inference holds) and for users not
familiar with formal languages such as OWL. Systems for verbalizing inferences
(e.g. [
        <xref ref-type="bibr" rid="ref11 ref14">11, 14</xref>
        ]) typically use consequence-based reasoning (i.e. inference rules) and
text patterns to generate natural-language explanations, which, depending on
the complexity of the required inference steps, can become long and hard to
read. Therefore, it needs to be established to what extent these explanations are
understandable and usable for human readers and (if necessary) how they can
be improved. We explore the application of a cognitive text processing model
to automatically generated explanations to assess the complexity involved in
reading and understanding them. This complements previous work aiming at
characterizing the cognitive complexity of individual inference rules and the
cognitive complexity of so-called justifications (cf. Sect. 2).
      </p>
      <p>
        The contribution of this work consists in an implementation of the cognitive
text processing model proposed by Kintsch and van Dijk [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], and its application
to automatically generated explanations for subsumptions in the EL fragment of
description logic (DL) by a verbalization tool, as detailed in Sect. 3. This includes
the transformation of generated texts to an abstract semantic representation of
their surface structure, a so-called text base that is formed out of propositions
(as defined by Kintsch and van Dijk). We demonstrate how the model helps
to identify explanations that are deemed to be difficult to understand, namely
those that lack a coherent text base, and those that require long-term memory
search. The main idea is that the complexity of understanding an explanation
depends not only on the employed inference steps, but is also affected by text
construction. In Sect. 4 we report on a first study that compares generated
explanations with a coherent text base with corresponding versions without
a coherent text base.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>Related work includes verbalization techniques for ontologies and measures for
the cognitive complexity of inferences (and explanations generated for them).</p>
      <sec id="sec-2-1">
        <title>Verbalization</title>
        <p>
          Ontology verbalization techniques have been proposed to present ontological
axioms and derivations in the form of natural-language texts. Approaches that
focus on the verbalization of axioms (and descriptions of classes based on sets
of axioms) include a tool developed by the SWAT project [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ], the OntoVerbal
verbalizer [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] and NaturalOWL [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]. Approaches that address the verbalization
of derivations include the Classic system [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] and the work of Borgida et al. [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]
and Nguyen et al. [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]. In this work, we use our own verbalization tool
(henceforth referred to as "verbalizer") [
          <xref ref-type="bibr" rid="ref13 ref14">14, 13</xref>
          ]. Similarly to the above-mentioned
approaches (and similarly to the "tracing" facility of the ELK reasoner [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]), it uses
consequence-based reasoning to construct a proof tree for a derivation. These
proofs are then translated to natural-language texts using patterns (a
comprehensive list is found in [14, Fig. 1]). During the translation, the proof tree is
traversed using post-order traversal; i.e. for an inference step to be explained,
the derivations of the premises are explained before the conclusion is stated.
In this paper, we consider an entailment from (a smaller version of) the Galen
ontology<sup>1</sup> as a running example: The entailment Bursa ⊑ HollowBodyStructure
("a bursa is a hollow body structure") can be derived from the following axioms.
        </p>
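        <p>The post-order traversal described above can be sketched as follows. This is a minimal illustration under our own assumptions, not the verbalizer's implementation: ProofNode and explanation_order are hypothetical names, and the example tree encodes the running example in an ASCII notation with abbreviated concept names (B for Bursa, GIS for GenericInternalStructure, GBS for GenericBodyStructure, BS for BodyStructure, HBS for HollowBodyStructure).</p>
        <preformat><![CDATA[
```python
from dataclasses import dataclass, field


@dataclass
class ProofNode:
    """One node of a proof tree: a conclusion and the sub-proofs of its premises."""
    conclusion: str
    premises: list = field(default_factory=list)  # empty for axioms


def explanation_order(node: ProofNode) -> list:
    """Return the derived conclusions in post-order (premises before conclusion)."""
    ordered = []
    for premise in node.premises:
        ordered.extend(explanation_order(premise))
    if node.premises:  # axioms are leaves and do not form a step of their own
        ordered.append(node.conclusion)
    return ordered


# Running example; "some" and "and" stand in for the DL constructors.
filler = "some hT.(T and some hS.hollow)"
b_bs = ProofNode("B subClassOf BS", [
    ProofNode("B subClassOf GIS"),
    ProofNode("GIS subClassOf GBS"),
    ProofNode("GBS subClassOf BS"),
])
b_conj = ProofNode("B subClassOf (BS and " + filler + ")",
                   [b_bs, ProofNode("B subClassOf " + filler)])
root = ProofNode("B subClassOf HBS",
                 [b_conj, ProofNode("HBS equivalentTo (BS and " + filler + ")")])

steps = explanation_order(root)
```
]]></preformat>
        <p>In this sketch, steps lists the three derived statements in the order in which the explanation reaches them, ending with the entailment itself.</p>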
        <sec id="sec-2-1-1">
          <title>Bursa ⊑ GenericInternalStructure</title>
        </sec>
        <sec id="sec-2-1-2">
          <title>GenericInternalStructure ⊑ GenericBodyStructure</title>
        </sec>
        <sec id="sec-2-1-3">
          <title>GenericBodyStructure ⊑ BodyStructure</title>
        </sec>
        <sec id="sec-2-1-4">
          <title>Bursa ⊑ ∃hasTopology.(Topology ⊓ ∃hasState.hollow)</title>
        </sec>
        <sec id="sec-2-1-5">
          <title>HollowBodyStructure ≡ BodyStructure ⊓ ∃hasTopology.(Topology ⊓ ∃hasState.hollow)</title>
          <p>To establish how the entailment follows from the axioms, a proof tree is constructed
(concept names are abbreviated, e.g. "GIS" for "GenericInternalStructure"):</p>
          <p><sup>1</sup> http://www.cs.man.ac.uk/~horrocks/OWL/Ontologies/galen.owl</p>
          <p>B ⊑ GIS    GIS ⊑ GBS    GBS ⊑ BS</p>
        </sec>
        <sec id="sec-2-1-6">
          <title>B ⊑ BS    B ⊑ ∃hT.(T ⊓ ∃hS.hollow)</title>
        </sec>
        <sec id="sec-2-1-7">
          <title>B ⊑ (BS ⊓ ∃hT.(T ⊓ ∃hS.hollow))</title>
        </sec>
        <sec id="sec-2-1-8">
          <title>HBS ≡ (BS ⊓ ∃hT.(T ⊓ ∃hS.hollow))    B ⊑ HBS</title>
          <p>This derivation is then translated to text (numbers are inserted for reference):
"(1) Since a bursa is a generic internal structure, which is a generic body structure,
which is a body structure, a bursa is a body structure.
(2) Furthermore, since a bursa is something that has a topology that has a hollow state,
a bursa is a body structure that has a topology that has a hollow state.
(3) A hollow body structure is a body structure that has a topology that has a hollow state.
(4) Thus, a bursa is a hollow body structure."</p>
          <p>
            In our previous work [
            <xref ref-type="bibr" rid="ref14">14</xref>
            ], we experimentally obtained first indications of the
understandability of explanations generated from derivations up to a length of
seven inference steps. Furthermore, techniques to shorten the generated
explanations were discussed and evaluated (e.g. omitting inference steps considered
"trivial" from the explanations and introducing "shortcut" inference rules). In
this work, we consider a further extension to the inference rule set by an
additional "shortcut" rule. The new rule (which is used in step (1) of the above
example) represents an n-fold application of the transitivity rule R<sub>⊑</sub>:
          </p>
          <p>New shortcut rule:
(P<sub>1</sub>) C<sub>1</sub> ⊑ C<sub>2</sub>    ...    (P<sub>n+1</sub>) C<sub>n+1</sub> ⊑ C<sub>n+2</sub>
(C) C<sub>1</sub> ⊑ C<sub>n+2</sub></p>
          <p>Rule R<sub>⊑</sub>:
(P<sub>1</sub>) C<sub>1</sub> ⊑ C<sub>2</sub>    (P<sub>2</sub>) C<sub>2</sub> ⊑ C<sub>3</sub>
(C) C<sub>1</sub> ⊑ C<sub>3</sub></p>
          <p>
            The verbalization pattern for the new rule is: "Since verb(P<sub>1</sub>), which is verb(C<sub>3</sub>), ... ,
which is verb(C<sub>n+2</sub>), verb(C).", where verb() represents the application of
verbalization patterns to basic OWL formulae (cf. [
            <xref ref-type="bibr" rid="ref14">14</xref>
            ]).
          </p>
        </sec>
      </sec>
      <sec id="sec-2-2">
        <title>Cognitive Complexity</title>
        <p>
          To measure the cognitive complexity of OWL inferences, several models have
been proposed. A first step in understanding why an entailment holds consists
of finding minimal sets of axioms from which a consequence can be derived,
so-called justifications [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]. Horridge et al. [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] describe how the cognitive difficulty of
a justification can be determined. Their measure is based on twelve dimensions,
which were established through an exploratory study and the authors' intuitions.
The resulting complexity score takes into account the structure and semantics
of a justification and its entailed conclusion. Justifications are classed as hard if
they exceed a threshold value; otherwise they are considered easy.
        </p>
        <p>
          Nguyen et al. [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] tested single description logic inference rules and
established a so-called facility index for each rule. The facility indices were obtained
in a study where the subjects had to judge whether a given inference is correct
or not. The facility index for a certain rule represents the ratio of correct answers
to the total number of answers. The model of Nguyen et al. [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ] assumes that
the difficulties of rule applications are multiplicative. This hypothesis was tested
(and to some degree confirmed) using derivations made up of two inference steps.
However, longer derivations were not tested.
        </p>
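        <p>The multiplicative assumption can be illustrated with a short sketch. The function names and the answer counts below are hypothetical and are not taken from the cited studies; the sketch only shows how per-rule facility indices would combine into a prediction for a derivation.</p>
        <preformat><![CDATA[
```python
from math import prod


def facility_index(correct_answers: int, total_answers: int) -> float:
    """Ratio of correct answers to all answers given for one inference rule."""
    if total_answers <= 0:
        raise ValueError("total_answers must be positive")
    return correct_answers / total_answers


def derivation_facility(indices) -> float:
    """Predicted facility of a derivation under the multiplicative assumption."""
    return prod(indices)


# Hypothetical answer counts for a two-step derivation (not measured data).
step_a = facility_index(9, 10)
step_b = facility_index(4, 5)
combined = derivation_facility([step_a, step_b])
```
]]></preformat>
        <p>With these made-up counts, the predicted facility of the two-step derivation is lower than that of either single step, which is the behaviour the multiplicative assumption implies.</p>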
        <p>Whereas the above-mentioned approaches deal with the cognitive complexity
of inference rules and justifications, the readability of verbalizations as such,
and the modeling of the reading process by a human reader, are not taken into
consideration. Our work thus provides an additional perspective by addressing
the text comprehension aspect.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Modeling</title>
      <sec id="sec-3-1">
        <title>Theory</title>
        <p>We first present a brief summary of the text comprehension model by Kintsch
and van Dijk, focusing only on those parts of the theory relevant for our work.
Then we apply this approach to generated verbalizations.</p>
        <p>
          The text comprehension model of Kintsch and van Dijk specifies the construction
of a semantic representation of a text, called the text base. This representation is
based on propositions and relations between them. The elements of a proposition
are defined by Kintsch [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] as word concepts, each being a lexical unit in its base
form. Propositions are represented as follows.
        </p>
        <p>(PREDICATE, ARGUMENT1, ..., ARGUMENTn)</p>
        <p>Predicates are often realized on the surface structure as verbs, adjectives,
adverbs or conjunctions. Arguments are mostly nouns, prepositions and embedded
propositions which fulfill different semantic functions such as subject, object
or goal. The order in which the predicates appear in the text determines the
sequence of the propositions in the text base, and the propositions are numbered
accordingly.</p>
        <p>We introduce a further kind of proposition to combine several propositions
into one to constitute more complex expressions. The original idea to expand
the model by introducing the notion of facts came from the authors themselves
[6, p. 390]. In this work, facts are defined as an n-tuple of propositions.</p>
        <p>(PROPOSITION1, ..., PROPOSITIONn)</p>
        <p>The text processing model assumes a text base to be coherent. That is, each
proposition must have at least one referentially cohesive relationship to another
preceding proposition. If the text base is not coherent, then the model cannot be
used. Referential coherence is established by the overlap of arguments between
two statements. For instance, a proposition (P1, A1, A2) is defined as
referentially coherent with (P2, A2, A3) due to sharing the argument A2. If a
proposition is embedded in another, e.g. (P3, A4, (P1, A1, A2)), these two are also
considered referentially coherent. This establishes the coherent tree structure of
all propositions of a cohesive text base connected in a so-called coherence graph.</p>
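        <p>The two conditions for referential coherence (a shared argument, or embedding of one proposition in another) can be sketched as follows. The names Proposition, referentially_coherent and is_coherent are hypothetical, and treating a text base as coherent exactly when its coherence graph is connected is a simplifying assumption of this sketch. The example data are the propositions P1 to P9 listed in Fig. 1 (a).</p>
        <preformat><![CDATA[
```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Proposition:
    name: str                  # label such as "P1"
    predicate: Optional[str]   # None for facts, which only group propositions
    args: tuple                # word concepts or labels of embedded propositions


def referentially_coherent(p: Proposition, q: Proposition) -> bool:
    """True if p and q share an argument or one is embedded in the other."""
    if set(p.args) & set(q.args):
        return True
    return p.name in q.args or q.name in p.args


def is_coherent(text_base: list) -> bool:
    """Assumption: coherent means every proposition is reachable from the first."""
    if not text_base:
        return True
    reached = {text_base[0].name}
    queue = deque([text_base[0]])
    while queue:
        current = queue.popleft()
        for other in text_base:
            if other.name not in reached and referentially_coherent(current, other):
                reached.add(other.name)
                queue.append(other)
    return len(reached) == len(text_base)


# Propositions of the first sentence of the running example (Fig. 1 (a)).
CYCLE_1 = [
    Proposition("P1", "IS", ("BURSA", "P4")),
    Proposition("P2", "GENERIC", ("STRUCTURE",)),
    Proposition("P3", "INTERNAL", ("STRUCTURE",)),
    Proposition("P4", None, ("P2", "P3")),
    Proposition("P5", "IS", ("P4", "P7")),
    Proposition("P6", "BODY", ("STRUCTURE",)),
    Proposition("P7", None, ("P2", "P6")),
    Proposition("P8", "IS", ("P7", "P6")),
    Proposition("P9", "IS", ("BURSA", "P6")),
]
```
]]></preformat>
        <p>Only arguments are compared, not predicates, since referential coherence is defined through argument overlap.</p>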
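        <p>[Fig. 1: (a) Propositions and facts; (b) Coherence graph for cycle 1; (c) Coherence graph for cycle 2]</p>
        <table-wrap>
          <caption>
            <p>Fig. 1 (a): Propositions and facts</p>
          </caption>
          <table>
            <thead>
              <tr><th>No.</th><th>Proposition/Fact</th></tr>
            </thead>
            <tbody>
              <tr><td>P1</td><td>(IS, BURSA, P4)</td></tr>
              <tr><td>P2</td><td>(GENERIC, STRUCTURE)</td></tr>
              <tr><td>P3</td><td>(INTERNAL, STRUCTURE)</td></tr>
              <tr><td>P4</td><td>(P2, P3)</td></tr>
              <tr><td>P5</td><td>(IS, P4, P7)</td></tr>
              <tr><td>P6</td><td>(BODY, STRUCTURE)</td></tr>
              <tr><td>P7</td><td>(P2, P6)</td></tr>
              <tr><td>P8</td><td>(IS, P7, P6)</td></tr>
              <tr><td>P9</td><td>(IS, BURSA, P6)</td></tr>
              <tr><td>P10</td><td>(IS, BURSA, P11)</td></tr>
              <tr><td>P11</td><td>(THAT HAS, SOMETHING, P12)</td></tr>
              <tr><td>P12</td><td>(THAT HAS, TOPOLOGY, P13)</td></tr>
              <tr><td>P13</td><td>(HOLLOW, STATE)</td></tr>
              <tr><td>P14</td><td>(IS, BURSA, P15)</td></tr>
              <tr><td>P15</td><td>(THAT HAS, P6, P12)</td></tr>
            </tbody>
          </table>
        </table-wrap>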
        <p>
          When processing a text, Kintsch and van Dijk assume that the propositions
are processed sentence by sentence, in so-called cycles. This stepwise processing
serves to model the capacity limitation of human short-term memory, which is a
part of working memory. While reading a sentence, n propositions are processed
in working memory, of which only s propositions can be stored in the short-term
memory buffer to be carried over to the next processing cycle. The capacity of
the short-term memory depends on the individual characteristics of the reader.
In the following, we restrict the capacity to four propositions (cf. [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]). In each
cycle, all n propositions of the current sentence are processed by connecting
them to referentially coherent propositions that were stored in the buffer during
the prior cycle. If no referential coherence is found, the search for a connection
includes all previously processed propositions. This procedure is called long-term
memory search. Each cycle creates a sub-graph, all of which are finally combined
into a coherence graph of the complete text base.
        </p>
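        <p>The cycle-wise processing and the fallback to long-term memory search can be sketched as follows. This is a deliberately simplified illustration: propositions are hypothetical (label, arguments) pairs, and the buffer keeps the last propositions of the previous sentence only, which is a recency-only stand-in and not the buffer selection strategy of the model.</p>
        <preformat><![CDATA[
```python
def related(p, q) -> bool:
    """Referential coherence for (label, args) pairs: shared argument or embedding."""
    (p_name, p_args), (q_name, q_args) = p, q
    return bool(set(p_args) & set(q_args)) or p_name in q_args or q_name in p_args


def count_long_term_searches(cycles, capacity=4) -> int:
    """Process sentences cycle by cycle and count long-term memory searches.

    Assumption: the buffer holds the last `capacity` propositions of the
    previous sentence (a simplification of the model's buffer selection).
    """
    processed, buffer, searches = [], [], 0
    for index, sentence in enumerate(cycles):
        linked = any(related(p, b) for p in sentence for b in buffer)
        if index > 0 and not linked:
            # No link to the buffer: search everything read so far.
            searches += 1
            if not any(related(p, old) for p in sentence for old in processed):
                raise ValueError("cycle %d cannot be connected" % (index + 1))
        processed.extend(sentence)
        buffer = sentence[-capacity:]
    return searches


# Hypothetical three-sentence text: the third sentence only links back to the first.
sentence_1 = [("P1", ("A", "B"))]
sentence_2 = [("P2", ("B", "C"))]
sentence_3 = [("P3", ("A", "D"))]
searches = count_long_term_searches([sentence_1, sentence_2, sentence_3])
```
]]></preformat>
        <p>In this example the third sentence shares no argument with the buffered proposition of the second sentence, so one long-term memory search is counted; a sentence that cannot be linked to any earlier proposition raises an error, mirroring the requirement of a coherent text base.</p>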
        <p>
          We illustrate this process in Fig. 1 using the first two sentences of the
running example (from Sect. 2.1). Initially, the first proposition for initiating the
construction of the coherence graph must be selected. The first proposition of
the list is selected if it is not embedded in any directly following fact. Otherwise,
the latter fact is selected. In our example, P1 is taken as the root and the
propositions P2 to P9 are incorporated into the coherence graph. This process begins
at the first level of the graph by connecting all propositions having a
referential coherence to P1. This applies to P9, P4 and P5 since P1 and P9 share the
common argument BURSA, P4 is embedded in P1 and P1 shares with P5 the
proposition P4 as an argument. These propositions form the second level. Now
each proposition in this level is checked (in ascending order of their number) for
a connection to the remaining propositions. Once all current propositions have
been connected into the coherence graph by repeating this procedure, four of
them are stored in the short-term memory buffer using the so-called
leading-edge strategy [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. This strategy models the aspects of frequency and recency in
human short-term memory. Starting from the subgraph of the first cycle, all
propositions from Fig. 1 (b) along the path with the highest number including
the top proposition are selected, as long as each number is higher than the
previous one. Next, the propositions are selected level by level in ascending order
of their number starting with the highest level possible. If the storage capacity
is reached in the meantime or all available propositions are stored, the process
terminates. The selected propositions in the first cycle are P1, P9, P5 and P4. These
connected propositions form the initial coherence graph for the next cycle where
the propositions P10 to P15 are included in the graph.
        </p>
      </sec>
      <sec id="sec-3-2">
        <title>Application to EL Explanations</title>
        <p>For the applicability of Kintsch and van Dijk's model, explanations with a
coherent text base are assumed. The model does not specify whether texts without
a coherent text base are understandable, (much) more difficult, or not at all
understandable. In this work, we hypothesize that explanations without a coherent
text base are more difficult to understand than explanations with a coherent text
base. The reason is that humans have to resort to their long-term memory to
connect the current sentence to the previous text to form a coherence graph. The
process of long-term memory search is further described as resource-consuming.
As a possible measure of complexity (in the case of texts with a coherent text
base), it is therefore appropriate to determine whether, and how often, long-term
memory search takes place to obtain an estimate of the cognitive difficulty of
explanations. Based on these assumptions, the cognitive complexity of
explanations can be divided into three levels.</p>
        <p>Complexity level 1: explanations without a coherent text base
(the most difficult to understand)
Complexity level 2: explanations with a coherent text base but also with
long-term memory search (difficulty depends on the number of instances of
long-term memory search)
Complexity level 3: explanations with a coherent text base and without
long-term memory search (easiest to understand)</p>
        <p>As a result, explanations without a coherent text base should be restructured
in such a way that they have a coherent text base and require no long-term
memory search.</p>
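        <p>The three levels can be expressed as a small decision function. The function name and its two inputs are hypothetical; the sketch assumes that the coherence check and the count of long-term memory searches have already been computed for an explanation.</p>
        <preformat><![CDATA[
```python
def complexity_level(coherent: bool, long_term_searches: int) -> int:
    """Map an explanation to complexity level 1 (hardest) to 3 (easiest)."""
    if long_term_searches < 0:
        raise ValueError("long_term_searches must not be negative")
    if not coherent:
        return 1  # no coherent text base
    if long_term_searches > 0:
        return 2  # coherent, but long-term memory search is needed
    return 3      # coherent and no long-term memory search


levels = [
    complexity_level(False, 0),
    complexity_level(True, 2),
    complexity_level(True, 0),
]
```
]]></preformat>
        <p>Within level 2, the number of long-term memory searches could additionally serve to rank explanations against each other.</p>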
        <p>
          The input for the model of Kintsch and van Dijk is a list of propositions
that are generated from the explanations produced by the verbalizer. These
explanations are not unconstrained texts, for which building an all-encompassing
translation to propositions would be laborious. Rather, the explanations
follow a fixed set of verbalization patterns based on the available inference rules and
the structure of OWL formulae (cf. [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ]), so only these text patterns need to be
taken into account (instantiated with concept and role names). Filler words in these
patterns such as "since" or "thus", as well as the phrase "according to its
definition", which improve reading fluency, are ignored. During verbalization, labels
for concept and role names are used where available in an ontology; otherwise
camel-cased names are simply split into words.
        </p>
        <p>Predicates. Predicates are mostly determined by the respective logical
constructor. This results in the following representation.</p>
        <p>P<sub>⊑</sub> = (IS, ARGUMENT1, ARGUMENT2)</p>
        <p>P<sub>⊓</sub> = (AND, ARGUMENT1, ..., ARGUMENTn)</p>
        <p>P<sub>∃</sub> is formed differently since the predicate is determined from the role name of
the existential restriction (usually a verb). For better readability of the
propositions, "THAT" is prefixed to the predicate. Depending on which word types are
included in the role name, the predicate may have two or three arguments.</p>
        <p>P<sub>∃</sub> = (THAT + verb, ARGUMENT1, ..., ARGUMENTn) with n = 2 or n = 3</p>
        <p>Arguments. For P<sub>⊑</sub> and P<sub>⊓</sub>, the children of the corresponding constructor
determine the arguments of the respective proposition. If the children of a
⊓-node are a concept name and an ∃-node, proposition generation is delegated to
the ∃-node, and the concept name is passed on to become ARGUMENT1 in
the proposition generated for the ∃-node (as in P12 in Fig. 1, generated from
Topology ⊓ ∃hasState.hollow). If an ∃-node has no parent or a ⊑-node as a parent,
the word concept SOMETHING is used for ARGUMENT1 (cf. P11 in Fig. 1).</p>
        <p>Facts. The semantic representation of some concept and role names requires
the construction of several propositions. This is the case when class and role
names consist of several words (e.g. GenericBodyStructure). To construct
(possibly nested) arguments from these, the nouns, verbs, adjectives, etc. contained
in such composite names need to be distinguished. For this purpose, WordNet<sup>2</sup>
is employed together with an additional list of prepositions and conjunctions.
The more complex expression is obtained by combining the references to the
corresponding propositions.</p>
        <p>P<sub>FACT</sub> = (PROPOSITION1, ..., PROPOSITIONn)</p>
        <p>For better illustration consider the logical expression LateralFemoralCondyle ⊓
∃isDivisionOf.Femur. A tree structure (Fig. 2 (a)) is created from the initial
logical structure, with nodes representing constructors and leaves representing
concept or role names. The propositions (Fig. 2 (b)) are formed by going through
this structure recursively. As described above, proposition generation at the first
node (⊓-node) is skipped and begins at the ∃-node where the left-hand leaf of
the ∃-node's parent node is processed first.</p>
        <p><sup>2</sup> https://wordnet.princeton.edu/</p>
        <p>[Fig. 2: (a) Tree structure; (b) Proposition list]</p>
        <p>P1 = (LATERAL, CONDYLE)
P2 = (FEMORAL, CONDYLE)
P3 = (P1, P2)
P4 = (THAT IS, P3, P5)
P5 = (OF, DIVISION, FEMUR)</p>
        <p>
          Since the array consists of two
adjectives and one noun at the end (as determined by WordNet), P1 and P2 are
constructed. In line with [
          <xref ref-type="bibr" rid="ref16 ref7">7, 16</xref>
          ], attributes belonging to the same concept are
represented as separate individual propositions (e.g. P1 and P2) of the same rank.
To join these two propositions, the P<sub>FACT</sub> proposition P3 is generated, which
becomes ARGUMENT1 of the ∃-node. The propositions for ARGUMENT2 are
determined by part-of-speech analysis of the ∃-node's children (role name and
filler), with IS (the verb) becoming part of P4's predicate, and OF (a preposition)
relating the noun DIVISION with FEMUR (P5).
        </p>
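        <p>The construction of propositions and facts from a composite concept name can be sketched as follows. The function names are hypothetical, and the sketch replaces the WordNet-based part-of-speech analysis with a much cruder assumption: the last word of the camel-cased name is taken to be the head noun and every preceding word an attribute of it.</p>
        <preformat><![CDATA[
```python
import re


def split_camel_case(name: str) -> list:
    """Split a camel-cased concept or role name into its words."""
    return re.findall(r"[A-Z][a-z]*|[a-z]+", name)


def propositions_for_name(name: str, start: int = 1) -> list:
    """Build (label, proposition) pairs for a composite concept name.

    Assumption: the last word is the head noun and all other words are
    attributes of it (no real part-of-speech analysis is performed).
    """
    words = [word.upper() for word in split_camel_case(name)]
    head, attributes = words[-1], words[:-1]
    props = [("P%d" % (start + i), (attribute, head))
             for i, attribute in enumerate(attributes)]
    if len(props) > 1:
        # Join several attribute propositions of the same rank into one fact.
        fact = tuple(label for label, _ in props)
        props.append(("P%d" % (start + len(props)), fact))
    return props


condyle = propositions_for_name("LateralFemoralCondyle")
```
]]></preformat>
        <p>For LateralFemoralCondyle this yields the two attribute propositions and the joining fact shown as P1 to P3 in Fig. 2 (b); names containing verbs or prepositions, such as isDivisionOf, would need the part-of-speech information that this sketch leaves out.</p>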
        <p>Implementing the described cyclic text processing model helped us to check
if the explanations generated by the verbalizer have a coherent text base. We
discovered situations (labeled 1 and 3) where this was not the case:</p>
        <p>Situation 1: premises A ⊑ B, B ⊑ C and C ⊑ D (where C ⊑ D is itself derived); conclusion A ⊑ D.
Situation 2: premises A ⊑ B and B ⊑ C yield A ⊑ C; then premises A ⊑ C and C ⊑ D (where C ⊑ D is itself derived) yield A ⊑ D.
Situation 3: premises A ⊑ C and C ⊑ D (where C ⊑ D is itself derived); conclusion A ⊑ D.</p>
        <p>When verbalizing an inference step that includes a premise that is provided as
an axiom (e.g. B ⊑ C in situation 1 and A ⊑ C in situation 3), and a later premise
that itself requires some derivation (C ⊑ D), our procedure explains how C ⊑ D
is derived before stating that the premises together yield the conclusion.
However, when explaining the derivation of C ⊑ D, C and D appear in the
explanation without necessarily being mentioned in the previous text, in which case
they cannot serve to establish referential coherence. In case of situation 1, a
restructuring (situation 2) of the derivation (by introducing an auxiliary inference
step) solves this problem (since the statement that A ⊑ C is derived provides
referential coherence for explaining the derivation of C ⊑ D). In case of situation 3,
this restructuring is not possible. Here, the solution is to mention A ⊑ C before
explaining C ⊑ D (with a shared C for referential coherence), before stating that
together they yield the conclusion. While the above illustrations show the
inference rules for transitivity of ⊑, the above observations also apply to some other
inference rules used by the verbalizer with at least two premises, e.g. situation 3
also applies to:</p>
        <p>Premises: A ⊑ ∃r.C and C ⊑ D (where C ⊑ D is itself derived); conclusion: A ⊑ ∃r.D.</p>
        <p>An online study was carried out to assess how a coherent text base affects the
understandability of generated explanations. We compared generated explanations
without a coherent text base with a corresponding explanation for which a
coherent text base was established. To find candidate explanations among those that
can be generated using the "verbalizer" tool, we employed our implementation
of the text understanding model described above. Since we are not interested
in "toy example" ontologies, we selected four explanations from the
(aforementioned version of the) Galen ontology that were not too long (5–7 inference steps)
and had an incoherent text base (cf. e.g. Fig. 3).</p>
        <p>As an objective measure for participants' understanding, the explanations
were manipulated to contain errors. Participants were asked to indicate if "the
presented reasoning is logically correct (i.e. each step is a consequence of the
available knowledge)". Thus, for each explanation, four different versions were
created (with/without error and with/without coherent text base).
Unfortunately, a mistake was made during experiment preparation that affected one of
the presented explanations. The results therefore only refer to three explanations
(named E1, E2 and E3 in the following).</p>
        <p>
          Participants. English-speaking participants were recruited through
advertisements at Ulm University, social media and personal contact, with a raffle for 25 €
gift vouchers as an incentive. Eighteen participants (15 male, three female)
completed the study (among 42 who started the survey). One participant was
excluded due to suspected "straight-lining" (cf. [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]), thus 17 remained.
Procedure. Participants were split into experimental groups according to
Table 1. After a short introduction to the study's requirements, participants were
introduced to the task using an example explanation. They were informed that
the explanations to be judged are generated from a knowledge database, and
that they may contain errors. After completing a pre-test questionnaire, each
participant was shown four explanations (e.g. Fig. 3). They were asked to read
the explanations step by step. For each of them, the participants were asked to indicate
whether the explanation is correct, and they had to provide subjective ratings
concerning understandability and the adequacy of the order in which the
reasoning steps are presented. Further input fields were provided for the participants
to indicate which steps they considered to be erroneous or hard to understand.
Results. The classification accuracy is shown in Table 2 for the three different
explanations E1–3 in their original version (incoherent text base) and their
improved version with coherent text base. As mentioned, respondents were asked
in which step they suspected the error. If this indication did not match the
"actual" error, the response was excluded (cf. Table 2).
        </p>
        <p>Regarding the question whether the coherence of the text base may have
affected the classification accuracy, only the data for the explanation E1 hint at a
potential effect (slightly better classification performance when the text base is
coherent). A binomial test yields p = 0.074, and a Chi-Square goodness-of-fit test
yields χ²(1) = 4.5125, p &lt; 0.05, though both these indications should be taken
with great caution due to the small sample size. When considering all responses
for explanations E1–3, the experiment did not yield significant improvements in
classification accuracy for a coherent vs. an incoherent text base.</p>
        <p>Figure 4 shows the mean scores of participants' answers (for correctly classified
explanations only) to our questions regarding readability. Participants had a
neutral to slightly positive tendency with regard to question Q1 "[The] explanation
is easy to understand", both with and without coherent text base. Responses to
Q2 "The order in which the reasoning steps are presented is appropriate" and
Q3 "The order in which the reasoning steps are presented should be changed"
indicated that participants mostly agreed with the presented order. They had a
tendency to agree to Q4 "Each step by itself is understandable", though answers
were mixed. Similarly, answers to Q5 "Some sentences are difficult to read" were
mixed. Overall, the answers suggest that participants in most cases did not
consider the explanations hard to understand. However, coherence of the text base
did not lead to a more positive assessment by the participants.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Conclusions &amp; Discussion</title>
      <p>The application of the text understanding model by Kintsch and van Dijk
revealed that some explanations generated by our verbalization tool lacked a
coherent text base (already in the inexpressive EL fragment of DL). This allowed
us to take the issue into account and to explore in a first study whether this
helped to improve understandability. Our first results are inconclusive in this
respect. Improving the coherence of the text base did not yield a clear effect:
Classification accuracy had a tendency to improve for one of the explanations
employed as material, but remained the same for two others. Understandability
as reported by the participants was also not found to improve.</p>
      <p>Several factors may have played a role: the overall difficulty of the task (as
evidenced by the number of participants who quit) and the resulting small sample
size, the set of "naturalistic" explanations taken from the Galen ontology, and other
inadequacies of the generated explanations (length, repetitiveness, ambiguities) that
might in addition to incoherence affect readability. Furthermore, most participants
were not native English speakers. Nevertheless,
the reported observations are useful for setting up further, more targeted studies
with a larger number of participants. The examination of other predictions made
by Kintsch and van Dijk's model, e.g. whether long-term memory search affects
the readability of the explanations, remains for future work, as well as
applying our methodology to more expressive fragments of DL (since the employed
verbalization tool is only gradually being extended to more expressive logics than EL).</p>
      <p>Fig. 4: Mean responses to questions Q1-Q5 for explanations E1-E3 on a 5-point Likert scale
(1: agree – 5: disagree), together with standard errors of the mean.</p>
      <p>Acknowledgments. We acknowledge the support of the German Research
Foundation (DFG) within the project "Live Ontologies" (KA 3470/2-2) and the
technology transfer project "Do it yourself, but not alone: Companion Technology
for Home Improvement" of the Transregional Collaborative Research Center
SFB/TRR 62 with the industrial project partner Robert Bosch GmbH. We thank
four reviewers and F. S. for helpful comments and all experiment participants.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Androutsopoulos</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lampouras</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Galanis</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Generating natural language descriptions from OWL ontologies: The NaturalOWL system</article-title>
          .
          <source>Journal of Artificial Intelligence Research</source>
          <volume>48</volume>
          ,
          <volume>671</volume>
          –
          <fpage>715</fpage>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Borgida</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Franconi</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Horrocks</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          :
          <article-title>Explaining ALC subsumption</article-title>
          . In: Horn,
          <string-name>
            <surname>W</surname>
          </string-name>
          . (ed.)
          <source>Proceedings of the 14th European Conference on Artificial Intelligence</source>
          , pp.
          <volume>209</volume>
          –
          <fpage>213</fpage>
          . IOS Press (
          <year>2000</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Horridge</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bail</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Parsia</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sattler</surname>
            ,
            <given-names>U.</given-names>
          </string-name>
          :
          <article-title>The cognitive complexity of OWL justifications</article-title>
          . In: Aroyo,
          <string-name>
            <given-names>L.</given-names>
            ,
            <surname>Welty</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            ,
            <surname>Alani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            ,
            <surname>Taylor</surname>
          </string-name>
          , J.,
          <string-name>
            <surname>Bernstein</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kagal</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Noy</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Blomqvist</surname>
          </string-name>
          , E. (eds.)
          <source>The Semantic Web – ISWC 2011. LNCS</source>
          , vol.
          <volume>7031</volume>
          , pp.
          <volume>241</volume>
          –
          <fpage>256</fpage>
          . Springer (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Jones</surname>
            ,
            <given-names>M.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>House</surname>
            ,
            <given-names>L.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gao</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          :
          <article-title>Respondent screening and revealed preference axioms: Testing quarantining methods for enhanced data quality in web panel surveys</article-title>
          .
          <source>Public Opinion Quarterly</source>
          <volume>79</volume>
          (
          <issue>3</issue>
          ),
          <volume>687</volume>
          –
          <fpage>709</fpage>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Kazakov</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Klinov</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Goal-directed tracing of inferences in EL ontologies</article-title>
          . In: Mika,
          <string-name>
            <given-names>P.</given-names>
            ,
            <surname>Tudorache</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            ,
            <surname>Bernstein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            ,
            <surname>Welty</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            ,
            <surname>Knoblock</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            ,
            <surname>Vrandečić</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            ,
            <surname>Groth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            ,
            <surname>Noy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            ,
            <surname>Janowicz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            ,
            <surname>Goble</surname>
          </string-name>
          , C. (eds.)
          <source>The Semantic Web – ISWC</source>
          <year>2014</year>
          ,
          <article-title>LNCS</article-title>
          , vol.
          <volume>8797</volume>
          , pp.
          <volume>196</volume>
          –
          <fpage>211</fpage>
          . Springer (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Kintsch</surname>
            ,
            <given-names>W.:</given-names>
          </string-name>
          <article-title>The representation of meaning in memory</article-title>
          . Oxford, England: Lawrence Erlbaum (
          <year>1974</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Kintsch</surname>
            , W., van Dijk,
            <given-names>T.A.</given-names>
          </string-name>
          :
          <article-title>Toward a model of text comprehension and production</article-title>
          .
          <source>Psychological Review 85(5)</source>
          ,
          <volume>363</volume>
          –
          <fpage>394</fpage>
          (
          <year>1978</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Kintsch</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vipond</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Reading comprehension and readability in educational practice and psychological theory</article-title>
          . In: Nilsson,
          <string-name>
            <surname>L.G</surname>
          </string-name>
          . (ed.)
          <article-title>Memory: Processes and problems</article-title>
          . Hillsdale, N.J.:
          <string-name>
            <surname>Erlbaum</surname>
          </string-name>
          (
          <year>1978</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Liang</surname>
            ,
            <given-names>S.F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Scott</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stevens</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rector</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Ontoverbal: A generic tool and practical application to SNOMED CT</article-title>
          .
          <source>International Journal of Advanced Computer Science and Applications (IJACSA) 4</source>
          (
          <issue>6</issue>
          ),
          <volume>227</volume>
          –
          <fpage>239</fpage>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>McGuinness</surname>
            ,
            <given-names>D.L.</given-names>
          </string-name>
          :
          <article-title>Explaining Reasoning in Description Logics</article-title>
          .
          <source>Ph.D. thesis</source>
          , Rutgers University (
          <year>1996</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Nguyen</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Power</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Piwek</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Williams</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Measuring the understandability of deduction rules for OWL</article-title>
          . In: Lambrix,
          <string-name>
            <given-names>P.</given-names>
            ,
            <surname>Qi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            ,
            <surname>Horridge</surname>
          </string-name>
          , M. (eds.)
          <source>First International Workshop on Debugging Ontologies and Ontology Mappings (WoDOOM12)</source>
          , pp.
          <volume>1</volume>
          –
          <fpage>12</fpage>
          . Linkoping University Electronic Press (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Nguyen</surname>
            ,
            <given-names>T.A.T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Power</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Piwek</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Williams</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Predicting the understandability of OWL inferences</article-title>
          . In: Cimiano,
          <string-name>
            <given-names>P.</given-names>
            ,
            <surname>Corcho</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            ,
            <surname>Presutti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            ,
            <surname>Hollink</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            ,
            <surname>Rudolph</surname>
          </string-name>
          , S. (eds.)
          <source>The Semantic Web: Semantics and Big Data</source>
          , pp.
          <volume>109</volume>
          –
          <fpage>123</fpage>
          . Springer (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Schiller</surname>
            ,
            <given-names>M.R.G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Glimm</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Towards explicative inference for OWL</article-title>
          . In: Eiter,
          <string-name>
            <given-names>T.</given-names>
            ,
            <surname>Glimm</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            ,
            <surname>Kazakov</surname>
          </string-name>
          ,
          <string-name>
            <surname>Y.</surname>
          </string-name>
          , Krotzsch, M. (eds.)
          <source>Proceedings of the 26th International Workshop on Description Logics, CEUR Workshop Proceedings</source>
          , vol.
          <volume>1014</volume>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Schiller</surname>
            ,
            <given-names>M.R.G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schiller</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Glimm</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Testing the adequacy of automated explanations of EL subsumptions</article-title>
          . In: Artale,
          <string-name>
            <given-names>A.</given-names>
            ,
            <surname>Glimm</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            ,
            <surname>Kontchakov</surname>
          </string-name>
          ,
          <string-name>
            <surname>R</surname>
          </string-name>
          . (eds.)
          <source>Proceedings of the 30th International Workshop on Description Logics (DL)</source>
          ,
          <source>CEUR Workshop Proceedings</source>
          , vol.
          <source>1879</source>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Stevens</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Malone</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Williams</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Power</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Third</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Automating generation of textual class definitions from OWL to English</article-title>
          .
          <source>Journal of Biomedical Semantics</source>
          <volume>2</volume>
          (
          <issue>Suppl</issue>
          . 2),
          <source>S5</source>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16. van Dijk,
          <string-name>
            <given-names>T.A.</given-names>
            ,
            <surname>Kintsch</surname>
          </string-name>
          ,
          <string-name>
            <surname>W.</surname>
          </string-name>
          :
          <article-title>Strategies of discourse comprehension</article-title>
          . New York: Academic Press (
          <year>1983</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>