<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Completion Graph Caching for Expressive Description Logics</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Andreas Steigmiller</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Birte Glimm</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Thorsten Liebig</string-name>
          <email>liebig@derivo.de</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Ulm</institution>
          ,
          <addr-line>Ulm, Germany, &lt;first name&gt;.&lt;last</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>derivo GmbH</institution>
          ,
          <addr-line>Ulm</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>Reasoning in very expressive Description Logics (DLs) such as SROIQ is often hard
since non-determinism, e.g., from disjunctions or cardinality restrictions, requires a
case-by-case analysis and since a strict separation between intensional (TBox) and
extensional (ABox) knowledge is not possible due to nominals. Current state-of-the-art
reasoners for SROIQ are typically based on (hyper-)tableau algorithms, where higher
level reasoning tasks such as classification are reduced to possibly many consistency
tests. For large ABoxes and more expressive DLs, reasoning then easily becomes
infeasible since the ABox has to be considered in each test.</p>
      <p>For less expressive DLs, optimisations have been proposed that allow for
considering only parts of the ABox for checking whether an individual is an instance of a
specific concept [22,23]. It is, however, not clear how these optimisations can be
extended to SROIQ. Furthermore, approaches based on partitioning and modularisation
require a syntactic pre-analysis of the concepts, roles, and individuals in a KB, which
can be quite costly for more expressive DLs and, due to the static analysis, only queries
with specific concepts are supported. Rewriting axioms such that ABox reasoning is
improved also carries the risk that this negatively influences other reasoning tasks for
which ABox reasoning is potentially not or less relevant.</p>
      <p>
        Another possibility is the caching of the completion graph that is built by the tableau
algorithm during the initial consistency check. The re-use of (parts of) the cached
completion graph in subsequent tests reduces the number of consequences that have to be
re-derived for the individuals of the ABox. Existing approaches based on this idea (e.g.,
[
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]) are, however, not very suitable for handling non-determinism. Since caching and
re-using non-deterministically derived facts can easily cause unsound consequences,
the non-deterministically derived facts are usually simply discarded and re-derived if
necessary. We address this and present a technique that allows for identifying
nondeterministic facts that can safely be re-used. The approach is based on a set of
conditions that can be checked locally and, thus, allows for e ciently identifying individuals
step-by-step for which non-deterministic consequences have to be re-considered in
subsequent tests. The presented technique can directly be integrated into existing
tableaubased reasoning systems without significant adaptations and reduces reasoning e ort
for all tasks for which consequences from the ABox are potentially relevant. Moreover,
The author acknowledges the support of the doctoral scholarship under the Postgraduate
Scholarships Act of the Land of Baden-Wuerttemberg (LGFG).
it can directly be used for the DL SROIQ, does not produce a significant overhead, and
can easily be extended, e.g., to improve incremental ABox reasoning. Note that
completion graph caching is complementary to many other optimisations that try to reduce
the number of subsequent tests for higher level reasoning tasks, e.g., summarisation [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ],
abstraction and refinement [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], bulk processing and binary retrieval [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], (pseudo) model
merging [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], extraction of known/possible instances from model abstractions [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ].
      </p>
      <p>
        The paper is organised as follows: We next introduce some preliminaries and then
present the basic completion graph caching technique in Section 3. In Section 4, we
present several extensions and applications to support nominals in ordinary satisfiability
caching, to perform incremental reasoning for changing ABoxes, and to handle very
large ABoxes by storing data in a representative way. Finally, we present an evaluation
of the presented techniques in Section 5 and conclude in Section 6. Further details and
an extended evaluation are available in a technical report [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ].
2
      </p>
    </sec>
    <sec id="sec-2">
      <title>Preliminaries</title>
      <p>
        For brevity, we do not introduce DLs (see, e.g., [
        <xref ref-type="bibr" rid="ref1 ref10">1,10</xref>
        ]). We use (possibly with
subscripts) C; D for (possibly complex) concepts, A; B for atomic concepts, and r; s for
roles. We assume that a SROIQ knowledge base K is expressed as the union of a
TBox T consisting of GCIs of the form C v D, a role hierarchy R, and an ABox A,
such that the e ects of complex roles are implicitly encoded (e.g., based on automata
[
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] or regular expressions [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]). We write nnf(C) to denote the negation normal form
of C. For r a role, we set inv(r) := r and inv(r ) := r.
      </p>
      <p>
        Model construction calculi such as tableau [
        <xref ref-type="bibr" rid="ref10 ref13">10,13</xref>
        ] decide the consistency of a KB
K by trying to construct an abstraction of a model for K , a so-called completion graph.
For the purpose of this paper, we use a tuple of the form (V; E; L; ,˙ ; M) for a completion
graph G, where each node x 2 V (edge hx; yi 2 E) represents one or more (pairs of)
individuals. Each node x (edge hx; yi) is labelled with a set of concepts (roles), L(x)
(L(hx; yi)), which the (pairs of) individuals represented by x (hx; yi) are instances of.
The relation ,˙ records inequalities, which must hold between nodes, e.g., due to
atleast cardinality restrictions, and the mapping M tracks merging activities, e.g., due
to at-most cardinality restrictions or nominals. The algorithm works by decomposing
concepts in the completion graph with a set of expansion rules. For example, if C1 t
C2 2 L(v) for some node v, but neither C1 2 L(v) nor C2 2 L(v), then the rule for
handling disjunctions non-deterministically adds one of the disjuncts to L(v). Similarly,
if 9r:C 2 L(v), but v does not have an r-successor with C in its label, then the algorithm
expands the completion graph by adding the required successor node. If a node v is
merged into a node v0, M is extended by v 7! v0. We use mergedToM(v) to return the
node into which a node v has been merged, i.e., mergedToM(v) = mergedToM(w) for
v 7! w 2 M and mergedToM(v) = v otherwise.
      </p>
      <p>
        Unrestricted application of rules for existential or at-least cardinality restrictions
can lead to the introduction of infinitely many new nodes. To guarantee termination,
a cycle detection technique called (pairwise) blocking [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] restricts the application of
these rules. The rules are repeatedly applied until either the completion graph is fully
expanded (no more rules are applicable), in which case the graph can be used to
construct a model that witnesses the consistency of K , or an obvious contradiction (called
a clash) is discovered (e.g., both C and :C in a node label), proving that the completion
graph does not correspond to a model. A knowledge base K is consistent if the rules
can be applied such that they build a fully expanded and clash-free completion graph.
3
      </p>
    </sec>
    <sec id="sec-3">
      <title>Completion Graph Caching and Reusing</title>
      <p>In the following, we use superscripts to distinguish di erent versions of completion
graphs. We denote with Gd = (Vd; Ed; Ld; ,˙d; Md) the last completion graph of the
initial consistency test that is obtained with only deterministic rule applications and with
Gn = (Vn; En; Ln; ,˙n; Mn) the fully expanded (and clash-free) completion graph that
possibly also contains non-deterministic choices. Obviously, instead of starting with
the initial completion graph G0 to which no rule has (yet) been applied, we can use
Gd to initialise a completion graph G for subsequent consistency tests which are, for
example, required to prove or refute assumptions of higher level reasoning tasks. To be
more precise, we extend Gd to G by the new individuals or by additional assertions to
original individuals as required for a subsequent test and then we can apply the tableau
expansion rules to G. Note, in order to be able to distinguish the nodes/nominals in the
di erent completion graphs, we assume that all nodes/nominals that are newly created
for G do not occur in existing completion graphs, such as Gd or Gn.</p>
      <p>
        This re-use of Gd is an obvious and straightforward optimisation and it is already
used by many state-of-the-art reasoning systems to successfully reduce the work that
has to be repeated in subsequent consistency tests [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]. Especially if the knowledge
base is deterministic, the tableau expansion rules only have to be applied for the newly
added assertions in G. In principle, also Gn can be re-used instead of Gd [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ], but
this causes problems if non-deterministically derived facts of Gn are involved in new
clashes. In particular, it is required to do backtracking in such cases, i.e., we have to
jump back to the last version of the initial completion graph that does not contain the
consequences of the last non-deterministic decision that is involved in the clash. Then,
we have to continue the processing by choosing another alternative. Obviously, if we
have to jump back to a very early version of the completion graph, then potentially
many non-deterministic decisions must be re-processed. Moreover, after jumping back,
we also have to add and re-process the newly added individuals and/or assertions.
      </p>
      <p>To improve the handling of non-deterministic knowledge bases, our approach uses
criteria to check whether nodes in G (or the nodes in a completion graph G0 obtained
from G by further rule applications) are “cached”, i.e., there exist corresponding nodes
in the cached completion graph of the initial consistency test and, therefore, it is not
required to process them again. These caching criteria check whether the expansion of
nodes is possible as in the cached completion graph Gn without influencing modified
nodes in G0, thus, only the processing of new and modified nodes is required.
Definition 1 (Caching Criteria). Let Gd = (Vd; Ed; Ld; ,˙d; Md) be a completion graph
with only deterministically derived consequences and Gn = (Vn; En; Ln; ,˙ n; Mn) a fully
expanded and clash-free expansion of Gd. Moreover, let G be an extension of Gd and
G0 = (V0; E0; L0; ,˙0; M0) a completion graph that is obtained from G by applying
tableau expansion rules.</p>
      <p>A node v0 2 V0 is cached in G0 if caching of the node is not invalid, where the
caching is invalid (we then also refer to the node as non-cached) if
C1 v0 &lt; Vd or mergedToMn (v0) &lt; Vn;</p>
      <p>n
C2 L0(v0) * Ln(mergedToM (v0));</p>
      <p>n
C3 8r:C 2 Ln(mergedToM (v0)) and there is an r-neighbour node w0 of v0 such that w0
is not cached and C &lt; L(w0);
C4 6 m r:C 2 Ln(mergedToMn (v0)) and the number of the non-cached r-neighbour
nodes of v0 without nnf(:C) in their labels together with the r-neighbours in Gn of
n
mergedToM (v0) with C in their labels is greater than m;</p>
      <p>n
C5 9r:C 2 Ln(mergedToM (v0)) and every r-neighbour node w0 of v0 with C &lt; L(w0)
n
and C 2 Ln(mergedToM (w0)) is not cached;</p>
      <p>n
C6 &gt; m r:C 2 Ln(mergedToM (v0)) and the number of r-neighbour nodes wn1; : : : ; wkn
of mergedToMn (v0) with C 2 Ln(wn); : : : ; C 2 Ln(wkn), for which there is either no
n 1 n
node wi0 2 V0 with mergedToM (wi0) = win or wi0 with mergedToM (wi0) = win and
C &lt; L(wi0) is not cached for 1 i k, is less than m;
C7 mergedToMn (v0) is a nominal node with 6 m r:C in its label and there exists a
blockable and non-cached inv(r)-predecessor node w0 of v0 with nnf(:C) &lt; L(w0);
n n
C8 mergedToM (w0) is an r-neighbour node of mergedToM (v0) such that w0 is not
cached and w0 is not an r-neighbour node of v0;</p>
      <p>n
C9 mergedToM (v0) has a successor node un such that un or a descendant of un has
n
a successor node mergedToM (w0) for which w0 2 V0, w0 is not cached, and there
exists no node u0 2 V0 with mergedToMn (u0) = un;</p>
      <p>n n
C10 mergedToM (v0) is blocked by mergedToM (w0) and w0 2 V0 is non-cached;
n n
C11 there is a non-cached node w0 2 V0 and mergedToM (v0) = mergedToM (w0); or
n n
C12 there is a node w0 2 V0 such that v0 ,˙0w0 and mergedToM (v0) = mergedToM (w0).</p>
      <p>Conditions C1 and C2 ensure that a node also exists in the cached completion graph
Gn and that its label is a subset of the corresponding label in Gn such that the same
expansion is possible. C3 checks whether the expansion of a node would add a concept
of the form 8r:C such that it could propagate C to a non-cached neighbour node.
Analogously, C4 checks for potentially violated at-most cardinality restrictions by counting
the new or modified neighbours in G0 and the neighbours in Gn. C5 and C6 verify that
existential and at-least cardinality restrictions are still satisfied if the cached nodes are
expanded identically. C7 checks whether the NN-rule of the tableau algorithm would
be applicable after the expansion, i.e., we check whether all potentially relevant
neighbours in G0 are nominal nodes. C8 checks whether the expansion would add additional
roles to edge labels between cached and non-cached nodes, which could be
problematic for disjoint roles. For C9: If a node w0, for which caching is invalid, is connected to
nodes in Gn that are only available in Gn (e.g., non-deterministically created ones), then
we have to identify caching as invalid for those ancestors of these nodes that are also in
G0 such that these connections to w0 can be re-built. Otherwise, we would potentially
miss new consequences that could be propagated from w0. C10 is used to reactivate the
processing of nodes for which the caching of the blocker node is invalid. C11 and C12
ensure that merging is possible as in Gn: C11 checks whether the node into which the
node is merged is also cached and C12 ensures that there is no additional entry for ,˙0
that would cause a clash if the nodes were merged as in Gn.</p>
      <p>L(va) = n &gt;; fag; 9r:fbg; B t A1; A1 o
va
r</p>
      <p>vb
L(v1) = n &gt;; 9r:(fcg u A2) o</p>
      <p>s
r
v1
r</p>
      <p>vc
L(vc) =
( &gt;; fcg; 9s:fbg; )</p>
      <p>fcg u A2;A2;A3</p>
      <p>L(vb) = ( &gt;; fbg; B9tr:99rr::9(frc:g(fucgAu2)A;82)s; B:At3 8s :A3; )</p>
      <p>
        Since only those nodes can be cached, which are available in the “deterministic”
completion graph Gd, it is important to maximise the deterministic processing of the
completion graph. This can, for example, be achieved by processing the completion
graph with deterministic rules only until the generation of new nodes is subset blocked.
Subset blocking is not su cient for more expressive DLs, but it prevents the expansion
of too many successor nodes in case non-deterministic rule applications merge and
prune some parts of the completion graph. Absorption techniques (e.g., [
        <xref ref-type="bibr" rid="ref14 ref18">14,18,21</xref>
        ]) are
also essential since they reduce the overall non-determinism in an ontology.
Example 1. Assume that the tableau algorithm builds a completion graph as depicted
in Figure 1 for testing the consistency of a knowledge base K containing the axioms
fag v B t A1
fbg v B t 9r:9r:(fcg u A2)
fcg v 9s:fbg
fag v 9r:fbg fbg v B t 8s :A3:
The deterministic version of the completion graph, Gd, contains the elements coloured
in black in Figure 1 and the non-deterministic version, Gn, additionally contains the
elements coloured in grey. If we want to determine which individuals are instances of
the concept 9r:&gt;, then we have to check, for each individual i in K , the consistency
of K extended by nnf(:9r:&gt;)(i), which is equivalent to adding the axiom fig v 8r:?.
The individual a is obviously an instance of this concept, which can also be observed
by the fact that expanding the extended completion graph adds 8r:? to Ld(va), which
immediately results in a clash. In contrast, if we extend Ld(vb) by 8r:?, we have to
process the disjunctions Bt9r:9r:(fcgu A2) and Bt8s :A3. The disjunct 9r:9r:(fcgu A2)
would result in a clash and, therefore, we choose B, which also satisfies the second
disjunction. Note that va and vc do not have to be processed since va is cached, i.e.,
its expansion is possible in the same way as in Gn, and vc does not have any concepts
for which processing is required. Last but not least, we have to test whether Ld(vc)
extended by 8r:? is satisfiable. Now, the caching of vc is obviously invalid (due to C2)
and, therefore, also the caching of vb is invalid: C3 can be applied for 8s :A3 and C9
for the successor node v1 of vb in Gn which also has the non-cached successor node
vc. Since va is still cached, we only have to process vb again, which does, however, not
result in a clash. Hence, only a is an instance of 9r:&gt;.
      </p>
      <p>Most conditions can be checked locally: For a non-cached node, we simply follow
its edges w.r.t. G0 and Gn to (potentially indirect) neighbour nodes that are also available
in Gd. Exceptions are Conditions C10, C11, and C12, for which we can, however,
simply trace back established blocking relations or Mn to find nodes for which the caching
criteria have to be checked. The implementation can also be simplified by keeping the
caching of a node invalid once it has been identified as invalid (even if the caching
becomes possible again after the node has been expanded identically). Instead of directly
checking the caching criteria, the relevant nodes can also be stored in a set/queue to
check the conditions when it is (more) convenient. Clearly, it depends on the ontology,
whether it is more beneficial to test the caching criteria earlier or later. If the criteria
are tested early, then we could unnecessarily re-process some parts of the cached
completion graph since the application of (non-deterministic) expansion rules can satisfy
some conditions. A later testing could cause a repeated re-processing of the same parts
if a lot of backtracking is required and consequences of re-activated nodes are involved
in clashes. Alternatively, one could learn for the entire ontology or for single nodes,
whether the criteria should be checked earlier or later. Unfortunately, this cannot easily
be realised for all reasoning systems since it requires that dependencies between
derived facts are precisely tracked in order to identify nodes that are often involved in the
creation of clashes and for which the criteria should be checked early.</p>
      <p>It is also possible to non-deterministically re-use derived consequences from Gn,
n
i.e., if the caching is invalid for a node v and mergedToM (v) is in Gn, then we can
nonn
deterministically add the missing concepts from Ln(mergedToM (v)) to L(v). Since the
resulting completion graph is very similar to the cached one, caching can often be
established quickly for many nodes. Of course, if some of the non-deterministically added
concepts are involved in clashes, then we potentially have to backtrack and process the
alternative where this node is ordinarily processed. Another nice side e ect of storing
Gn is that we can use the non-deterministic decisions from Gn as an orientation in
subsequent consistency tests. By prioritising the processing of the same non-deterministic
alternatives as for Gn, we can potentially find a solution that is very similar to Gn
without exploring much of the search space.
4</p>
    </sec>
    <sec id="sec-4">
      <title>Caching Extensions and Applications</title>
      <p>In this section, we sketch additional applications of the caching technique, which allow
for supporting nominals for the satisfiability caching of node labels and for reducing
the incremental reasoning e ort for changing ABoxes. Furthermore, we describe an
extension that allows for caching (parts of) the completion graph in a representative
way, whereby an explicit representation of many nodes in the completion graph can be
avoided and, therefore, the memory consumption can be reduced.</p>
      <p>
        Satisfiability Caching with Nominals: Caching the satisfiability status of labels of
(blockable) nodes in completion graphs is an important optimisation technique for
tableau-based reasoning systems [
        <xref ref-type="bibr" rid="ref3 ref4">3,4</xref>
        ]. If one obtains a fully expanded and clash-free
completion graph, then the contained node labels can be cached and, if identical labels
occur in other completion graphs, then their expansion (i.e., the creation of required
successor nodes) is not necessary since their satisfiability has already been proven. Of
course, for more expressive DLs that include, for example, inverse roles and cardinality
restrictions, we have to consider pairs of node labels as well as the edge labels between
these nodes. Unfortunately, this kind of satisfiability caching does not work for DLs
with nominals. In particular, connections to nominal nodes can be used to propagate
new concepts from one blockable node in the completion graph to any other blockable
node, whereas for DLs without nominals, the consequences can only be propagated
from or to successor nodes. Hence, the caching of node labels is not easily possible and
many reasoners deactivate this kind of caching for knowledge bases with nominals.
      </p>
      <p>However, in combination with completion graph caching, we re-gain the possibility
to cache some labels of (blockable) nodes for knowledge bases with nominals. Roughly
speaking, we first identify which nodes “depend on” which nominals, i.e., which
nominal nodes are existentially quantified as a successor/descendant for a node. The labels
of such nodes can then be cached together with the dependent nominals, i.e., with those
nominals on which the nodes depend, if all nodes for the dependent nominals are still
cached w.r.t. the initial completion graph. These cache entries can be re-used for nodes
with identical labels in subsequent completion graphs as long as the completion graph
caching of the nodes for the dependent nominals is not invalid. Of course, the blocked
processing of nodes due to matching cache entries has to be reactivated if the
completion graph caching of a node for a dependent nominal becomes invalid. Moreover, we
cannot cache the labels of blockable nodes that depend on newly generated nominals
since possible interactions over these nodes are not detected.</p>
      <p>
        Incremental Reasoning for Changing ABoxes: Many reasoning systems re-start
reasoning from scratch if a few axioms in the knowledge base have been changed. However,
it is not very likely that a few changes in the ABox of a knowledge base have a huge
impact on reasoning. In particular, many ABox assertions only propagate consequences
to the local neighbourhood of the modified individuals and, therefore, the results of
reasoning tasks such as classification are often not a ected, even if nominals are used in
the knowledge base. With the presented completion graph caching, we can easily track
which nodes from the cached completion graph are modified in subsequent tests and
for which caching is invalidated to perform higher level reasoning tasks. Hence, if the
changed ABox assertions have only a known, locally limited influence, then the
reasoning e ort for many (higher level) reasoning tasks can be reduced by checking whether
a tracked node is a ected by the changes. To detect the influences of the changes, an
incremental consistency check can be performed where all modified individuals and
their neighbours are deterministically re-constructed step-by-step until “compatibility”
with the previous deterministic completion graph is achieved, i.e., the same
deterministic consequences are derived for the nodes as in the previous completion graph. The
non-deterministic version of the new completion graph can then be obtained by
continuing the (non-deterministic) processing of the re-constructed nodes and by using
the presented completion graph caching for all remaining nodes. Hence, we can
identify the influenced nodes by comparing the newly obtained completion graph with the
previous one. Compared to other incremental consistency checking approaches (e.g.,
[
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]), the re-construction of changed parts supported by the completion graph caching
enables a better handling of non-determinism and does not require a memory
intensive tracking of which consequences are caused by which ABox assertions. The idea
of tracking those parts of a completion graph that are potentially relevant/used for the
calculation of higher level reasoning tasks and comparing them with the changed parts
in new completion graphs has already been proposed for answering conjunctive queries
under incremental ABox updates [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], but the completion graph caching simplifies the
realisation of this technique and significantly reduces the overhead for identifying those
parts of higher level reasoning tasks that have to be re-computed. Moreover, with the
completion graph caching, also very expressive DLs such as SROIQ can be supported.
Representative Caching: In order to reduce the memory consumption for caching the
completion graph, the technique can be adapted such that all relevant data is stored in a
representative way, which allows for building “local” completion graphs for small
subsets of the entire ABox until the existence of a complete completion graph considering
all individuals can be guaranteed. To be more precise, if a fully expanded and clash-free
completion graph is constructed for a subset of the ABox (e.g., a subset of all
individuals and their assertions), then we extract and generalise information from the processed
individuals and store them in a representative cache. If we then try to build a completion
graph for another subset of the ABox that has some overlapping with a previously
handled subset (e.g., role assertions for which edges to previous handled individuals have
to be created), then we load the available data from the cache and continue the
processing of the overlapping part until it is “compatible”, i.e., the expansion of the remaining
individuals in the cache can be guaranteed as in the previously constructed completion
graphs. Of course, this only works well for knowledge bases for which there is not too
much non-deterministic interaction between the separately handled ABox parts.
Moreover, compared to the ordinary completion graph caching, we are clearly trading a lower
memory consumption against an increased runtime since more work potentially has to
be repeated to establish compatibility.
      </p>
    </sec>
    <sec id="sec-5">
      <title>5 Implementation and Evaluation</title>
      <p>The completion graph caching introduced in Section 3 is integrated in our reasoning
system Konclude [20] and we selected several well-known benchmark ontologies (cf.
Table 1) for the evaluation of the presented techniques. The evaluation was carried out
on a Dell PowerEdge R420 server running with two Intel Xeon E5-2440 hexa core
processors at 2.4 GHz with Hyper-Threading and 144 GB RAM under a 64bit Ubuntu
12.04.2 LTS. In order to make the evaluation independent of the number of CPU cores,
we used only one worker thread for Konclude. We ignored the time spent for parsing
ontologies and writing results and we used a time limit of 5 minutes, i.e., 300 seconds.</p>
      <p>Table 2 shows the reasoning times for consistency checking (including
preprocessing), classification, and realisation (in seconds) with di erent completion graph caching
techniques integrated in Konclude. Please note that the class hierarchy is required to
realise an ontology, i.e., classification is a prerequisite of realisation, and, analogously,
consistency checking as well as preprocessing are prerequisites of classification. Thus,
realisation cannot be performed if the time limit is already reached for classification.</p>
      <p>If no completion graph caching is activated (No-C), then the realisation and
classification can often require a large amount of time since Konclude has to re-process the
entire ABox for all instance and subsumption tests (if the ontology uses nominals). For
several ontologies, such as Wine and DOLCE, the caching and re-use of the
deterministic completion graph from the consistency check (Det-C) already leads to significant
improvements. Nevertheless, with the two variants ET-C and LT-C of the presented
completion graph caching technique, where ET-C uses an “early testing” and LT-C a
“late testing” of the defined caching criteria, Konclude can further reduce the
reasoning times. In particular, with both completion graph caching techniques, all evaluated
ontologies can easily be classified and also the realisation can be realised e ciently
for all but UOBM-1. Table 2 also reveals that only for COMSO there is a remarkable
di erence between ET-C and LT-C, where this di erence can be explained by the fact
that there is often more interaction with the individuals from the ABox for instance
tests than for satisfiability and subsumption tests and, therefore, ET-C can be better for
realisation due to the potentially lower e ort in combination with backtracking.</p>
      <p>The e ects of the di erent completion graph caching techniques can also be
observed for the OWL DL Realisation dataset of the ORE 2014 competition,3 which
contains several ontologies with non-deterministic language features and non-trivial
ABoxes. By excluding 1; 331 s spent for preprocessing and consistency checking, the
accumulated classification times over the contained 200 ontologies are 3; 996 s for the
version No-C, 2; 503 s for Det-C, 1; 801 s for ET-C, and 1; 606 s for LT-C. Clearly,
the dataset also contains many ontologies for which completion graph caching does not
seem to have a significant impact, for example, if the ontologies can be processed
deterministically or the ontologies are too di cult to even perform consistency checking
(which is the case for 3 ontologies). Nevertheless, our completion graph caching
improves the classification time with similar performances for LT-C and ET-C. By using
3 http://www.easychair.org/smart-program/VSL2014/ORE-index.html
Ontology j j 100</p>
      <p>jKj
USDA-5
USDA-5
USDA-5
COSMO
COSMO
COSMO
DPC1
DPC1
DPC1
UOBM-1
UOBM-1
UOBM-1
the satisfiability caching extension for nominals, as presented in Section 4, the
accumulated classification time can be further improved to 725 s. Similar results are also
achieved for the realisation of these ontologies. By excluding the times for all
prerequisites, the accumulated realisation times over all 200 ontologies are 1; 740 s for No-C,
1; 498 s for Det-C, 1; 061 s for ET-C, 1; 256 s for LT-C, and 923 s for the version where
the satisfiability caching extension for nominals is additionally activated.
Incremental Reasoning Experiments: To test the incremental reasoning based on the
presented completion graph caching, we used those ontologies of Table 1 that have a
large amount of ABox assertions and for which Konclude still has a clearly measurable
reasoning time, i.e., USDA-5, COSMO, DPC1, and UOBM-1. We simulated a changed
ABox for these ontologies by randomly removing a certain amount of assertions from
the ontology (denoted by K ) and by re-adding the removed assertions and removing
new assertions (denoted by K ). For each ontology, we evaluated 10 random
modifications that have 1, 2, and 4 % of the size of the ontology’s ABox. For simplicity, all
modifications have the same amount of removed and added assertions. The obtained
results for the presented incremental reasoning approach are shown in Table 3.</p>
      <p>For consistency, the first two columns show the (incremental) consistency
checking time (in seconds) for K and K , respectively, and the third column shows the
percentage of the nodes in the completion graph for K that has been changed for the
application of the modification. It can be observed that, especially for smaller
modifications, the incremental consistency check often requires much less time than the initial
consistency check. In particular, the individuals of USDA-5 are sparsely connected via
object properties and, therefore, often only the modified individuals have to be re-built.
For larger modifications, the incremental consistency checking time increases
significantly for some ontologies, e.g., UOBM-1, and does almost catch up to the consistency
checking time for the initial ontology. On the one hand, our incremental consistency
checking approach clearly has some additional overhead due to the fact that nodes are
re-built step by step until an expansion as for the initial completion graph can be
guaranteed, but, on the other hand, our prototypical implementation has still a lot of room for
improvements. For example, we currently also re-build nodes for individuals for which
only new assertions have been added although it would be su cient to simply extend
the nodes of the previous deterministic completion graph by the new consequences.</p>
      <p>For classification, the first two columns show analogously the (incremental)
classification time for K and K , respectively, and the third column represents the percentage
of the classes for which satisfiability and subsumption tests were re-calculated. At the
moment, the used/modified nodes from the cached completion graph are tracked
together for all satisfiability and subsumption tests and we mark those classes for which
nodes have been tracked. It can be observed that for the ontologies with nominals
(e.g., UOBM-1), only a few classes have to be re-classified and, in several cases,
reclassification is not required at all.</p>
      <p>Also for realisation, the first two columns show the (incremental) realisation time
for K and K , respectively, and the last column shows the percentage of the number
of individuals that are potentially a ected by the changes and for which the (possible)
types have to be re-computed. For this, we separately track, for each individual, the
nodes of the cached completion graph that are used/modified by the instance tests.
Representative Caching Experiments: We integrated a first prototypical version of the
presented representative caching in our reasoning system Konclude, which is, however,
not yet compatible with all other integrated features and optimisations. As of now, the
integrated representative caching is primarily used for “simple individuals” that do not
have too much interaction with other individuals in the ABox. In cases where
representative caching could easily cause performance deficits (e.g., through the intensive use of
nominals), Konclude caches the relevant parts of such completion graphs by using the
ordinary technique. Moreover, data property assertions are, at the moment, internally
transformed into class assertions and, as a consequence, nodes for individuals with
data property assertions can currently not be representatively cached. However, first
experiments are very encouraging. For example, Homo_sapiens is a very large SROIQ
ontology from the Oxford ontology library with 244; 232 classes, 255 object
properties, and 289; 236 individuals for which Konclude requires 10; 211 MB in total to check
the consistency by using representative caching, whereas 19; 721 MB are required by
Konclude for consistency checking with a fully expanded completion graph. Note that a
large amount (9; 875 MB) of the required memory is used for the (unoptimised) internal
representation, the data from the preprocessing, and the parsed OWL objects which are
kept in memory to facilitate the handling of added/removed axioms.
6</p>
    </sec>
    <sec id="sec-6">
      <title>Conclusions</title>
      <p>We have presented a refinement of the completion graph caching technique that
improves ABox reasoning also for very expressive Description Logics through a more
sophisticated handling of non-deterministic consequences. In addition, we sketched
extensions and applications of the caching technique, which allow for supporting
nominals for the satisfiability caching of node labels, for reducing the incremental reasoning
e ort for changing ABoxes, and for handling very large ABoxes by storing partially
processed parts of the completion graph in a representative way.
20. Steigmiller, A., Liebig, T., Glimm, B.: Konclude: system description. J. of Web Semantics
27(1) (2014)
21. Tsarkov, D., Horrocks, I.: E cient reasoning with range and domain constraints. In: Proc.</p>
      <p>17th Int. Workshop on Description Logics (DL’04). vol. 104. CEUR (2004)
22. Wandelt, S., Möller, R.: Towards ABox modularization of semi-expressive description
logics. Applied Ontology 7(2), 133–167 (2012)
23. Wu, J., Hudek, A.K., Toman, D., Weddell, G.E.: Absorption for ABoxes. J. of Automated
Reasoning 53(3), 215–243 (2014)</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Baader</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Calvanese</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McGuinness</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nardi</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Patel-Schneider</surname>
            ,
            <given-names>P</given-names>
          </string-name>
          . (eds.):
          <article-title>The Description Logic Handbook: Theory, Implementation, and Applications</article-title>
          . Cambridge University Press, second edn. (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Dolby</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fokoue</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kalyanpur</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schonberg</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Srinivas</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Scalable highly expressive reasoner (SHER)</article-title>
          .
          <source>J. of of Web Semantics</source>
          <volume>7</volume>
          (
          <issue>4</issue>
          ),
          <fpage>357</fpage>
          -
          <lpage>361</lpage>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Donini</surname>
            ,
            <given-names>F.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Massacci</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>EXPTIME tableaux for ALC</article-title>
          .
          <source>J. of Artificial Intelligence</source>
          <volume>124</volume>
          (
          <issue>1</issue>
          ),
          <fpage>87</fpage>
          -
          <lpage>138</lpage>
          (
          <year>2000</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Glimm</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Horrocks</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Motik</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stoilos</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          :
          <article-title>Hermit: An OWL 2 reasoner</article-title>
          .
          <source>J. of Automated Reasoning</source>
          <volume>53</volume>
          (
          <issue>3</issue>
          ),
          <fpage>1</fpage>
          -
          <lpage>25</lpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Glimm</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kazakov</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liebig</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tran</surname>
            ,
            <given-names>T.K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vialard</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>Abstraction refinement for ontology materialization</article-title>
          .
          <source>In: Proc. 13th Int. Semantic Web Conf. (ISWC'14)</source>
          . LNCS, vol.
          <volume>8797</volume>
          , pp.
          <fpage>180</fpage>
          -
          <lpage>195</lpage>
          . Springer (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Haarslev</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Möller</surname>
          </string-name>
          , R.:
          <article-title>On the scalability of description logic instance retrieval</article-title>
          .
          <source>J. of Automated Reasoning</source>
          <volume>41</volume>
          (
          <issue>2</issue>
          ),
          <fpage>99</fpage>
          -
          <lpage>142</lpage>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Haarslev</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Möller</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Turhan</surname>
            ,
            <given-names>A.Y.</given-names>
          </string-name>
          :
          <article-title>Exploiting pseudo models for TBox and ABox reasoning in expressive description logics</article-title>
          .
          <source>In: Proc. 1st Int. Joint Conf. on Automated Reasoning (IJCAR'01)</source>
          . LNCS, vol.
          <year>2083</year>
          , pp.
          <fpage>61</fpage>
          -
          <lpage>75</lpage>
          . Springer (
          <year>2001</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Halaschek-Wiener</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hendler</surname>
          </string-name>
          , J.:
          <article-title>Toward expressive syndication on the web</article-title>
          .
          <source>In: Proc. 16th Int. Conf. on World Wide Web (WWW'07)</source>
          . ACM (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Halaschek-Wiener</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Parsia</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sirin</surname>
          </string-name>
          , E.:
          <article-title>Description logic reasoning with syntactic updates</article-title>
          .
          <source>In: Proc. 4th Confederated Int. Conf. On the Move to Meaningful Internet Systems</source>
          <year>2006</year>
          :
          <article-title>CoopIS, DOA, GADA, and ODBASE(OTM'06), LNCS</article-title>
          , vol.
          <volume>4275</volume>
          , pp.
          <fpage>722</fpage>
          -
          <lpage>737</lpage>
          . Springer (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Horrocks</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kutz</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sattler</surname>
            ,
            <given-names>U.</given-names>
          </string-name>
          :
          <article-title>The even more irresistible SROIQ</article-title>
          .
          <source>In: Proc. 10th Int. Conf. on Principles of Knowledge Representation and Reasoning (KR'06)</source>
          . pp.
          <fpage>57</fpage>
          -
          <lpage>67</lpage>
          . AAAI Press (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Horrocks</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sattler</surname>
            ,
            <given-names>U.</given-names>
          </string-name>
          :
          <article-title>A description logic with transitive and inverse roles and role hierarchies</article-title>
          .
          <source>J. of Logic and Computation</source>
          <volume>9</volume>
          (
          <issue>3</issue>
          ),
          <fpage>385</fpage>
          -
          <lpage>410</lpage>
          (
          <year>1999</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Horrocks</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sattler</surname>
            ,
            <given-names>U.</given-names>
          </string-name>
          :
          <article-title>Decidability of SH IQ with complex role inclusion axioms</article-title>
          .
          <source>Artificial Intelligence</source>
          <volume>160</volume>
          (
          <issue>1</issue>
          ),
          <fpage>79</fpage>
          -
          <lpage>104</lpage>
          (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Horrocks</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sattler</surname>
            ,
            <given-names>U.</given-names>
          </string-name>
          :
          <article-title>A tableau decision procedure for SH OIQ</article-title>
          .
          <source>J. of Automated Resoning</source>
          <volume>39</volume>
          (
          <issue>3</issue>
          ),
          <fpage>249</fpage>
          -
          <lpage>276</lpage>
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Hudek</surname>
            ,
            <given-names>A.K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Weddell</surname>
            ,
            <given-names>G.E.</given-names>
          </string-name>
          :
          <article-title>Binary absorption in tableaux-based reasoning for description logics</article-title>
          .
          <source>In: Proc. 19th Int. Workshop on Description Logics (DL'06)</source>
          . vol.
          <volume>189</volume>
          .
          <string-name>
            <surname>CEUR</surname>
          </string-name>
          (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Kollia</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Glimm</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Optimizing SPARQL query answering over OWL ontologies</article-title>
          .
          <source>J. of Artificial Intelligence Research</source>
          <volume>48</volume>
          ,
          <fpage>253</fpage>
          -
          <lpage>303</lpage>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Simancˇík</surname>
          </string-name>
          , F.:
          <article-title>Elimination of complex RIAs without automata</article-title>
          .
          <source>In: Proc. 25th Int. Workshop on Description Logics (DL'12)</source>
          .
          <source>CEUR Workshop Proceedings</source>
          , vol.
          <volume>846</volume>
          .
          <string-name>
            <surname>CEUR-WS.org</surname>
          </string-name>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Sirin</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cuenca Grau</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Parsia</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>From wine to water: Optimizing description logic reasoning for nominals</article-title>
          .
          <source>In: Proc. 10th Int. Conf. on Principles of Knowledge Representation and Reasoning (KR'06)</source>
          . pp.
          <fpage>90</fpage>
          -
          <lpage>99</lpage>
          . AAAI Press (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Steigmiller</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Glimm</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liebig</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Optimised absorption for expressive description logics</article-title>
          .
          <source>In: Proc. 27th Int. Workshop on Description Logics (DL'14)</source>
          . vol.
          <volume>1193</volume>
          .
          <string-name>
            <surname>CEUR</surname>
          </string-name>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Steigmiller</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Glimm</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liebig</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Completion graph caching extensions and applications for expressive description logics</article-title>
          .
          <source>Tech. rep.</source>
          , Ulm University, Ulm, Germany (
          <year>2015</year>
          ), available online at https://www.uni-ulm.de/fileadmin/website_uni_ulm/iui.inst. 090/Publikationen/2015/DL2015CGCTR.pdf
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>