=Paper= {{Paper |id=Vol-1350/paper-35 |storemode=property |title=Completion Graph Caching for Expressive Description Logics |pdfUrl=https://ceur-ws.org/Vol-1350/paper-35.pdf |volume=Vol-1350 |dblpUrl=https://dblp.org/rec/conf/dlog/SteigmillerGL15 }} ==Completion Graph Caching for Expressive Description Logics== https://ceur-ws.org/Vol-1350/paper-35.pdf
                       Completion Graph Caching for
                       Expressive Description Logics

                  Andreas Steigmiller∗1 , Birte Glimm1 , and Thorsten Liebig2
           1
               University of Ulm, Ulm, Germany, .@uni-ulm.de
                          2
                            derivo GmbH, Ulm, Germany, liebig@derivo.de



1      Introduction
Reasoning in very expressive Description Logics (DLs) such as SROIQ is often hard
since non-determinism, e.g., from disjunctions or cardinality restrictions, requires a
case-by-case analysis and since a strict separation between intensional (TBox) and ex-
tensional (ABox) knowledge is not possible due to nominals. Current state-of-the-art
reasoners for SROIQ are typically based on (hyper-)tableau algorithms, where higher
level reasoning tasks such as classification are reduced to possibly many consistency
tests. For large ABoxes and more expressive DLs, reasoning then easily becomes infea-
sible since the ABox has to be considered in each test.
    For less expressive DLs, optimisations have been proposed that allow for consid-
ering only parts of the ABox for checking whether an individual is an instance of a
specific concept [22,23]. It is, however, not clear how these optimisations can be ex-
tended to SROIQ. Furthermore, approaches based on partitioning and modularisation
require a syntactic pre-analysis of the concepts, roles, and individuals in a KB, which
can be quite costly for more expressive DLs and, due to the static analysis, only queries
with specific concepts are supported. Rewriting axioms such that ABox reasoning is
improved also carries the risk that this negatively influences other reasoning tasks for
which ABox reasoning is potentially not or less relevant.
    Another possibility is the caching of the completion graph that is built by the tableau
algorithm during the initial consistency check. The re-use of (parts of) the cached com-
pletion graph in subsequent tests reduces the number of consequences that have to be
re-derived for the individuals of the ABox. Existing approaches based on this idea (e.g.,
[17]) are, however, not very suitable for handling non-determinism. Since caching and
re-using non-deterministically derived facts can easily cause unsound consequences,
the non-deterministically derived facts are usually simply discarded and re-derived if
necessary. We address this and present a technique that allows for identifying non-
deterministic facts that can safely be re-used. The approach is based on a set of condi-
tions that can be checked locally and, thus, allows for efficiently identifying individuals
step-by-step for which non-deterministic consequences have to be re-considered in sub-
sequent tests. The presented technique can directly be integrated into existing tableau-
based reasoning systems without significant adaptations and reduces reasoning effort
for all tasks for which consequences from the ABox are potentially relevant. Moreover,
 ∗
     The author acknowledges the support of the doctoral scholarship under the Postgraduate Schol-
     arships Act of the Land of Baden-Wuerttemberg (LGFG).
it can directly be used for the DL SROIQ, does not produce a significant overhead, and
can easily be extended, e.g., to improve incremental ABox reasoning. Note that com-
pletion graph caching is complementary to many other optimisations that try to reduce
the number of subsequent tests for higher level reasoning tasks, e.g., summarisation [2],
abstraction and refinement [5], bulk processing and binary retrieval [6], (pseudo) model
merging [7], extraction of known/possible instances from model abstractions [15].
     The paper is organised as follows: We next introduce some preliminaries and then
present the basic completion graph caching technique in Section 3. In Section 4, we
present several extensions and applications to support nominals in ordinary satisfiability
caching, to perform incremental reasoning for changing ABoxes, and to handle very
large ABoxes by storing data in a representative way. Finally, we present an evaluation
of the presented techniques in Section 5 and conclude in Section 6. Further details and
an extended evaluation are available in a technical report [19].


2   Preliminaries

For brevity, we do not introduce DLs (see, e.g., [1,10]). We use (possibly with sub-
scripts) C, D for (possibly complex) concepts, A, B for atomic concepts, and r, s for
roles. We assume that a SROIQ knowledge base K is expressed as the union of a
TBox T consisting of GCIs of the form C v D, a role hierarchy R, and an ABox A,
such that the effects of complex roles are implicitly encoded (e.g., based on automata
[12] or regular expressions [16]). We write nnf(C) to denote the negation normal form
of C. For r a role, we set inv(r) := r− and inv(r− ) := r.
    Model construction calculi such as tableau [10,13] decide the consistency of a KB
K by trying to construct an abstraction of a model for K, a so-called completion graph.
For the purpose of this paper, we use a tuple of the form (V, E, L, ,̇, M) for a completion
graph G, where each node x ∈ V (edge hx, yi ∈ E) represents one or more (pairs of)
individuals. Each node x (edge hx, yi) is labelled with a set of concepts (roles), L(x)
(L(hx, yi)), which the (pairs of) individuals represented by x (hx, yi) are instances of.
The relation ,̇ records inequalities, which must hold between nodes, e.g., due to at-
least cardinality restrictions, and the mapping M tracks merging activities, e.g., due
to at-most cardinality restrictions or nominals. The algorithm works by decomposing
concepts in the completion graph with a set of expansion rules. For example, if C1 t
C2 ∈ L(v) for some node v, but neither C1 ∈ L(v) nor C2 ∈ L(v), then the rule for
handling disjunctions non-deterministically adds one of the disjuncts to L(v). Similarly,
if ∃r.C ∈ L(v), but v does not have an r-successor with C in its label, then the algorithm
expands the completion graph by adding the required successor node. If a node v is
merged into a node v0 , M is extended by v 7→ v0 . We use mergedToM (v) to return the
node into which a node v has been merged, i.e., mergedToM (v) = mergedToM (w) for
v 7→ w ∈ M and mergedToM (v) = v otherwise.
    Unrestricted application of rules for existential or at-least cardinality restrictions
can lead to the introduction of infinitely many new nodes. To guarantee termination,
a cycle detection technique called (pairwise) blocking [11] restricts the application of
these rules. The rules are repeatedly applied until either the completion graph is fully
expanded (no more rules are applicable), in which case the graph can be used to con-
struct a model that witnesses the consistency of K, or an obvious contradiction (called
a clash) is discovered (e.g., both C and ¬C in a node label), proving that the completion
graph does not correspond to a model. A knowledge base K is consistent if the rules
can be applied such that they build a fully expanded and clash-free completion graph.


3   Completion Graph Caching and Reusing
In the following, we use superscripts to distinguish different versions of completion
graphs. We denote with Gd = (V d , E d , Ld , ,̇d , Md ) the last completion graph of the
initial consistency test that is obtained with only deterministic rule applications and with
Gn = (V n , E n , Ln , ,̇n , Mn ) the fully expanded (and clash-free) completion graph that
possibly also contains non-deterministic choices. Obviously, instead of starting with
the initial completion graph G0 to which no rule has (yet) been applied, we can use
Gd to initialise a completion graph G for subsequent consistency tests which are, for
example, required to prove or refute assumptions of higher level reasoning tasks. To be
more precise, we extend Gd to G by the new individuals or by additional assertions to
original individuals as required for a subsequent test and then we can apply the tableau
expansion rules to G. Note, in order to be able to distinguish the nodes/nominals in the
different completion graphs, we assume that all nodes/nominals that are newly created
for G do not occur in existing completion graphs, such as Gd or Gn .
     This re-use of Gd is an obvious and straightforward optimisation and it is already
used by many state-of-the-art reasoning systems to successfully reduce the work that
has to be repeated in subsequent consistency tests [17]. Especially if the knowledge
base is deterministic, the tableau expansion rules only have to be applied for the newly
added assertions in G. In principle, also Gn can be re-used instead of Gd [17], but
this causes problems if non-deterministically derived facts of Gn are involved in new
clashes. In particular, it is required to do backtracking in such cases, i.e., we have to
jump back to the last version of the initial completion graph that does not contain the
consequences of the last non-deterministic decision that is involved in the clash. Then,
we have to continue the processing by choosing another alternative. Obviously, if we
have to jump back to a very early version of the completion graph, then potentially
many non-deterministic decisions must be re-processed. Moreover, after jumping back,
we also have to add and re-process the newly added individuals and/or assertions.
     To improve the handling of non-deterministic knowledge bases, our approach uses
criteria to check whether nodes in G (or the nodes in a completion graph G0 obtained
from G by further rule applications) are “cached”, i.e., there exist corresponding nodes
in the cached completion graph of the initial consistency test and, therefore, it is not
required to process them again. These caching criteria check whether the expansion of
nodes is possible as in the cached completion graph Gn without influencing modified
nodes in G0 , thus, only the processing of new and modified nodes is required.
Definition 1 (Caching Criteria). Let Gd = (V d , E d , Ld , ,̇d , Md ) be a completion graph
with only deterministically derived consequences and Gn = (V n , E n , Ln , ,̇n , Mn ) a fully
expanded and clash-free expansion of Gd . Moreover, let G be an extension of Gd and
G0 = (V 0 , E 0 , L0 , ,̇0 , M0 ) a completion graph that is obtained from G by applying tab-
leau expansion rules.
    A node v0 ∈ V 0 is cached in G0 if caching of the node is not invalid, where the
 caching is invalid (we then also refer to the node as non-cached) if
                              n
 C1 v0 < V d or mergedToM (v0 ) < V n ;
                            n
 C2 L0 (v0 ) * Ln (mergedToM (v0 ));
                          Mn 0
 C3 ∀r.C ∈ L (mergedTo (v )) and there is an r-neighbour node w0 of v0 such that w0
               n

    is not cached and C < L(w0 );  n
 C4 6 m r.C ∈ Ln (mergedToM (v0 )) and the number of the non-cached r-neighbour
    nodes of v0 without nnf(¬C) in their labels together with the r-neighbours in Gn of
                 n
    mergedToM (v0 ) with C in their labels is greater than m;
                               n
 C5 ∃r.C ∈ Ln (mergedToM (v0 )) and every r-neighbour node w0 of v0 with C < L(w0 )
                                 n
    and C ∈ Ln (mergedToM (w        0
                                      )) is not cached;
                                 Mn 0
 C6 > m r.C ∈ L (mergedTo (v )) and the number of r-neighbour nodes wn1 , . . . , wnk
                  n
                      n
    of mergedToM (v0 ) with C ∈ Ln (wn1 ), . . . , C ∈ Ln (wnk ), for which there is either no
                                          n                                   n
    node w0i ∈ V 0 with mergedToM (w0i ) = wni or w0i with mergedToM (w0i ) = wni and
              0
    C < L(wi ) is not cached for 1 ≤ i ≤ k, is less than m;
                 n
 C7 mergedToM (v0 ) is a nominal node with 6 m r.C in its label and there exists a
    blockable andn
                      non-cached inv(r)-predecessor node w0 of vn0 with nnf(¬C) < L(w0 );
 C8 mergedToM (w0 ) is an r-neighbour node of mergedToM (v0 ) such that w0 is not
                    0
    cached and w      is not an r-neighbour node of v0 ;
                Mn 0
 C9 mergedTo (v ) has a successor node un such that un or a descendant of un has
                                        n
    a successor node mergedToM (w0 ) for which w0 ∈ V 0 , w0 is not cached, and there
                                                  n
    exists no node    u ∈ V with mergedToM (u0n) = un ;
                        0    0
                Mn 0
C10 mergedTo (v ) is blocked by mergedToM (w0 ) and w0 ∈ V 0 is non-cached;
                                                              n                     n
C11 there is a non-cached node w0 ∈ V 0 and mergedToM (v0 ) = mergedToM (w0 ); or
                                               0                    n                    n
C12 there is a node w0 ∈ V 0 such that v0 ,̇ w0 and mergedToM (v0 ) = mergedToM (w0 ).
     Conditions C1 and C2 ensure that a node also exists in the cached completion graph
 Gn and that its label is a subset of the corresponding label in Gn such that the same ex-
 pansion is possible. C3 checks whether the expansion of a node would add a concept
 of the form ∀r.C such that it could propagate C to a non-cached neighbour node. Anal-
 ogously, C4 checks for potentially violated at-most cardinality restrictions by counting
 the new or modified neighbours in G0 and the neighbours in Gn . C5 and C6 verify that
 existential and at-least cardinality restrictions are still satisfied if the cached nodes are
 expanded identically. C7 checks whether the NN-rule of the tableau algorithm would
 be applicable after the expansion, i.e., we check whether all potentially relevant neigh-
 bours in G0 are nominal nodes. C8 checks whether the expansion would add additional
 roles to edge labels between cached and non-cached nodes, which could be problem-
 atic for disjoint roles. For C9: If a node w0 , for which caching is invalid, is connected to
 nodes in Gn that are only available in Gn (e.g., non-deterministically created ones), then
 we have to identify caching as invalid for those ancestors of these nodes that are also in
 G0 such that these connections to w0 can be re-built. Otherwise, we would potentially
 miss new consequences that could be propagated from w0 . C10 is used to reactivate the
 processing of nodes for which the caching of the blocker node is invalid. C11 and C12
 ensure that merging is possible as in Gn : C11 checks whether the node into which the
 node is merged is also cached and C12 ensures that there is no additional entry for ,̇0
 that would cause a clash if the nodes were merged as in Gn .
                                                  >, {b}, B t ∃r.∃r.({c} u A2 ), B t ∀s− .A3 ,
          n                             o         (                                              )
 L(va ) = >, {a}, ∃r.{b}, B t A1 , A1   L(vb ) =
                                                           ∃r.∃r.({c} u A2 ),∀s− .A3
                         r                           s
              va                  vb                                    vc
                                                r            r           (                   )
                           n                  o
                                                   v1                        >, {c}, ∃s.{b},
                   L(v1 ) = >, ∃r.({c} u A2 )                   L(vc ) =
                                                                            {c} u A2 ,A2 ,A3


Fig. 1. Constructed and cached completion graph for Example 1 with deterministically (coloured
black) and non-deterministically (coloured grey) derived facts

    Since only those nodes can be cached, which are available in the “deterministic”
completion graph Gd , it is important to maximise the deterministic processing of the
completion graph. This can, for example, be achieved by processing the completion
graph with deterministic rules only until the generation of new nodes is subset blocked.
Subset blocking is not sufficient for more expressive DLs, but it prevents the expansion
of too many successor nodes in case non-deterministic rule applications merge and
prune some parts of the completion graph. Absorption techniques (e.g., [14,18,21]) are
also essential since they reduce the overall non-determinism in an ontology.

Example 1. Assume that the tableau algorithm builds a completion graph as depicted
in Figure 1 for testing the consistency of a knowledge base K containing the axioms
          {a} v B t A1             {b} v B t ∃r.∃r.({c} u A2 )             {c} v ∃s.{b}
          {a} v ∃r.{b}             {b} v B t ∀s− .A3 .
The deterministic version of the completion graph, Gd , contains the elements coloured
in black in Figure 1 and the non-deterministic version, Gn , additionally contains the
elements coloured in grey. If we want to determine which individuals are instances of
the concept ∃r.>, then we have to check, for each individual i in K, the consistency
of K extended by nnf(¬∃r.>)(i), which is equivalent to adding the axiom {i} v ∀r.⊥.
The individual a is obviously an instance of this concept, which can also be observed
by the fact that expanding the extended completion graph adds ∀r.⊥ to Ld (va ), which
immediately results in a clash. In contrast, if we extend Ld (vb ) by ∀r.⊥, we have to
process the disjunctions Bt∃r.∃r.({c}uA2 ) and Bt∀s− .A3 . The disjunct ∃r.∃r.({c}uA2 )
would result in a clash and, therefore, we choose B, which also satisfies the second
disjunction. Note that va and vc do not have to be processed since va is cached, i.e.,
its expansion is possible in the same way as in Gn , and vc does not have any concepts
for which processing is required. Last but not least, we have to test whether Ld (vc )
extended by ∀r.⊥ is satisfiable. Now, the caching of vc is obviously invalid (due to C2)
and, therefore, also the caching of vb is invalid: C3 can be applied for ∀s− .A3 and C9
for the successor node v1 of vb in Gn which also has the non-cached successor node
vc . Since va is still cached, we only have to process vb again, which does, however, not
result in a clash. Hence, only a is an instance of ∃r.>.

     Most conditions can be checked locally: For a non-cached node, we simply follow
its edges w.r.t. G0 and Gn to (potentially indirect) neighbour nodes that are also available
in Gd . Exceptions are Conditions C10, C11, and C12, for which we can, however, sim-
ply trace back established blocking relations or Mn to find nodes for which the caching
criteria have to be checked. The implementation can also be simplified by keeping the
caching of a node invalid once it has been identified as invalid (even if the caching be-
comes possible again after the node has been expanded identically). Instead of directly
checking the caching criteria, the relevant nodes can also be stored in a set/queue to
check the conditions when it is (more) convenient. Clearly, it depends on the ontology,
whether it is more beneficial to test the caching criteria earlier or later. If the criteria
are tested early, then we could unnecessarily re-process some parts of the cached com-
pletion graph since the application of (non-deterministic) expansion rules can satisfy
some conditions. A later testing could cause a repeated re-processing of the same parts
if a lot of backtracking is required and consequences of re-activated nodes are involved
in clashes. Alternatively, one could learn for the entire ontology or for single nodes,
whether the criteria should be checked earlier or later. Unfortunately, this cannot easily
be realised for all reasoning systems since it requires that dependencies between de-
rived facts are precisely tracked in order to identify nodes that are often involved in the
creation of clashes and for which the criteria should be checked early.
     It is also possible to non-deterministically re-use derived consequences from Gn ,
                                                             n
i.e., if the caching is invalid for a node v and mergedToM (v) is in Gn , then we can non-
                                                                     n
deterministically add the missing concepts from Ln (mergedToM (v)) to L(v). Since the
resulting completion graph is very similar to the cached one, caching can often be es-
tablished quickly for many nodes. Of course, if some of the non-deterministically added
concepts are involved in clashes, then we potentially have to backtrack and process the
alternative where this node is ordinarily processed. Another nice side effect of storing
Gn is that we can use the non-deterministic decisions from Gn as an orientation in sub-
sequent consistency tests. By prioritising the processing of the same non-deterministic
alternatives as for Gn , we can potentially find a solution that is very similar to Gn with-
out exploring much of the search space.


4   Caching Extensions and Applications
In this section, we sketch additional applications of the caching technique, which allow
for supporting nominals for the satisfiability caching of node labels and for reducing
the incremental reasoning effort for changing ABoxes. Furthermore, we describe an
extension that allows for caching (parts of) the completion graph in a representative
way, whereby an explicit representation of many nodes in the completion graph can be
avoided and, therefore, the memory consumption can be reduced.
Satisfiability Caching with Nominals: Caching the satisfiability status of labels of
(blockable) nodes in completion graphs is an important optimisation technique for
tableau-based reasoning systems [3,4]. If one obtains a fully expanded and clash-free
completion graph, then the contained node labels can be cached and, if identical labels
occur in other completion graphs, then their expansion (i.e., the creation of required
successor nodes) is not necessary since their satisfiability has already been proven. Of
course, for more expressive DLs that include, for example, inverse roles and cardinality
restrictions, we have to consider pairs of node labels as well as the edge labels between
these nodes. Unfortunately, this kind of satisfiability caching does not work for DLs
with nominals. In particular, connections to nominal nodes can be used to propagate
new concepts from one blockable node in the completion graph to any other blockable
node, whereas for DLs without nominals, the consequences can only be propagated
from or to successor nodes. Hence, the caching of node labels is not easily possible and
many reasoners deactivate this kind of caching for knowledge bases with nominals.
    However, in combination with completion graph caching, we re-gain the possibility
to cache some labels of (blockable) nodes for knowledge bases with nominals. Roughly
speaking, we first identify which nodes “depend on” which nominals, i.e., which nom-
inal nodes are existentially quantified as a successor/descendant for a node. The labels
of such nodes can then be cached together with the dependent nominals, i.e., with those
nominals on which the nodes depend, if all nodes for the dependent nominals are still
cached w.r.t. the initial completion graph. These cache entries can be re-used for nodes
with identical labels in subsequent completion graphs as long as the completion graph
caching of the nodes for the dependent nominals is not invalid. Of course, the blocked
processing of nodes due to matching cache entries has to be reactivated if the comple-
tion graph caching of a node for a dependent nominal becomes invalid. Moreover, we
cannot cache the labels of blockable nodes that depend on newly generated nominals
since possible interactions over these nodes are not detected.

Incremental Reasoning for Changing ABoxes: Many reasoning systems re-start reason-
ing from scratch if a few axioms in the knowledge base have been changed. However,
it is not very likely that a few changes in the ABox of a knowledge base have a huge
impact on reasoning. In particular, many ABox assertions only propagate consequences
to the local neighbourhood of the modified individuals and, therefore, the results of rea-
soning tasks such as classification are often not affected, even if nominals are used in
the knowledge base. With the presented completion graph caching, we can easily track
which nodes from the cached completion graph are modified in subsequent tests and
for which caching is invalidated to perform higher level reasoning tasks. Hence, if the
changed ABox assertions have only a known, locally limited influence, then the reason-
ing effort for many (higher level) reasoning tasks can be reduced by checking whether
a tracked node is affected by the changes. To detect the influences of the changes, an
incremental consistency check can be performed where all modified individuals and
their neighbours are deterministically re-constructed step-by-step until “compatibility”
with the previous deterministic completion graph is achieved, i.e., the same determi-
nistic consequences are derived for the nodes as in the previous completion graph. The
non-deterministic version of the new completion graph can then be obtained by con-
tinuing the (non-deterministic) processing of the re-constructed nodes and by using
the presented completion graph caching for all remaining nodes. Hence, we can iden-
tify the influenced nodes by comparing the newly obtained completion graph with the
previous one. Compared to other incremental consistency checking approaches (e.g.,
[9]), the re-construction of changed parts supported by the completion graph caching
enables a better handling of non-determinism and does not require a memory inten-
sive tracking of which consequences are caused by which ABox assertions. The idea
of tracking those parts of a completion graph that are potentially relevant/used for the
calculation of higher level reasoning tasks and comparing them with the changed parts
in new completion graphs has already been proposed for answering conjunctive queries
under incremental ABox updates [8], but the completion graph caching simplifies the
Table 1. Ontology metrics for selected benchmark ontologies (A stands for Axioms, C for
Classes, P for Properties, I for Individuals, CA for Class Assertions, OPA for Object Property
Assertions, and DPA for Data Property Assertions)

Ontology     Expressivity            #A       #C     #P        #I     #CA     #OPA     #DPA
OGSF        SROIQ(D)              1, 235      386   179        57      45       58      20
Wine        SHOIN(D)              1, 546      214    31       367     409      492       2
DOLCE       SHOIN                 1, 667      209   317        42     101       36       0
OBI         SROIQ(D)             28, 770   3, 549   152       161     273       19       1
USDA-5      ALCIF (D)             1, 401       30   147    1, 214 1, 214        12       0
COSMO       SHOIN(D)             29, 655   7, 790   941    7, 817 8, 675 3, 240        665
DPC1        ALCIF (D)            55, 020   1, 920    94   28, 023 15, 445 39, 453        0
UOBM-1      SHOIN(D)            260, 728       69    44   25, 453 46, 403 143, 549 70, 628


realisation of this technique and significantly reduces the overhead for identifying those
parts of higher level reasoning tasks that have to be re-computed. Moreover, with the
completion graph caching, also very expressive DLs such as SROIQ can be supported.
Representative Caching: In order to reduce the memory consumption for caching the
completion graph, the technique can be adapted such that all relevant data is stored in a
representative way, which allows for building “local” completion graphs for small sub-
sets of the entire ABox until the existence of a complete completion graph considering
all individuals can be guaranteed. To be more precise, if a fully expanded and clash-free
completion graph is constructed for a subset of the ABox (e.g., a subset of all individu-
als and their assertions), then we extract and generalise information from the processed
individuals and store them in a representative cache. If we then try to build a completion
graph for another subset of the ABox that has some overlapping with a previously han-
dled subset (e.g., role assertions for which edges to previous handled individuals have
to be created), then we load the available data from the cache and continue the process-
ing of the overlapping part until it is “compatible”, i.e., the expansion of the remaining
individuals in the cache can be guaranteed as in the previously constructed completion
graphs. Of course, this only works well for knowledge bases for which there is not too
much non-deterministic interaction between the separately handled ABox parts. More-
over, compared to the ordinary completion graph caching, we are clearly trading a lower
memory consumption against an increased runtime since more work potentially has to
be repeated to establish compatibility.


5   Implementation and Evaluation
The completion graph caching introduced in Section 3 is integrated in our reasoning
system Konclude [20] and we selected several well-known benchmark ontologies (cf.
Table 1) for the evaluation of the presented techniques. The evaluation was carried out
on a Dell PowerEdge R420 server running with two Intel Xeon E5-2440 hexa core
processors at 2.4 GHz with Hyper-Threading and 144 GB RAM under a 64bit Ubuntu
12.04.2 LTS. In order to make the evaluation independent of the number of CPU cores,
we used only one worker thread for Konclude. We ignored the time spent for parsing
ontologies and writing results and we used a time limit of 5 minutes, i.e., 300 seconds.
     Table 2. Reasoning times for different completion graph caching techniques (in seconds)

              Prep.+             Classification                        Realisation
Ontology
              Cons.      No-C     Det-C ET-C        LT-C      No-C    Det-C     ET-C      LT-C
OGSF            0.0        3.8     1.0       0.2      0.2       0.1     0.0     0.0     0.0
Wine            0.0       49.5    29.6       0.8      0.8      49.1    25.8     0.2     0.1
OBI             0.2       65.9    19.2       1.5      1.5       2.2     2.0     0.0     0.1
DOLCE           0.0        6.7     1.1       0.2      0.2   ≥ 300.0     5.2     0.1     0.1
USDA-5          4.1        0.8     1.0       1.0      0.8   ≥ 300.0    38.7    20.5    20.2
COSMO           0.6    ≥ 300.0 ≥ 300.0      42.2     11.2       n/a     n/a    11.2    19.9
DPC1            6.5        0.1     0.2       0.1      0.1   ≥ 300.0    53.3    19.4    20.5
UOBM-1          6.7      240.6     4.8       1.3      1.1   ≥ 300.0 ≥ 300.0 ≥ 300.0 ≥ 300.0


    Table 2 shows the reasoning times for consistency checking (including preprocess-
ing), classification, and realisation (in seconds) with different completion graph caching
techniques integrated in Konclude. Please note that the class hierarchy is required to re-
alise an ontology, i.e., classification is a prerequisite of realisation, and, analogously,
consistency checking as well as preprocessing are prerequisites of classification. Thus,
realisation cannot be performed if the time limit is already reached for classification.
    If no completion graph caching is activated (No-C), then the realisation and classi-
fication can often require a large amount of time since Konclude has to re-process the
entire ABox for all instance and subsumption tests (if the ontology uses nominals). For
several ontologies, such as Wine and DOLCE, the caching and re-use of the determi-
nistic completion graph from the consistency check (Det-C) already leads to significant
improvements. Nevertheless, with the two variants ET-C and LT-C of the presented
completion graph caching technique, where ET-C uses an “early testing” and LT-C a
“late testing” of the defined caching criteria, Konclude can further reduce the reason-
ing times. In particular, with both completion graph caching techniques, all evaluated
ontologies can easily be classified and also the realisation can be realised efficiently
for all but UOBM-1. Table 2 also reveals that only for COMSO there is a remarkable
difference between ET-C and LT-C, where this difference can be explained by the fact
that there is often more interaction with the individuals from the ABox for instance
tests than for satisfiability and subsumption tests and, therefore, ET-C can be better for
realisation due to the potentially lower effort in combination with backtracking.
    The effects of the different completion graph caching techniques can also be ob-
served for the OWL DL Realisation dataset of the ORE 2014 competition,3 which
contains several ontologies with non-deterministic language features and non-trivial
ABoxes. By excluding 1, 331 s spent for preprocessing and consistency checking, the
accumulated classification times over the contained 200 ontologies are 3, 996 s for the
version No-C, 2, 503 s for Det-C, 1, 801 s for ET-C, and 1, 606 s for LT-C. Clearly,
the dataset also contains many ontologies for which completion graph caching does not
seem to have a significant impact, for example, if the ontologies can be processed de-
terministically or the ontologies are too difficult to even perform consistency checking
(which is the case for 3 ontologies). Nevertheless, our completion graph caching im-
proves the classification time with similar performances for LT-C and ET-C. By using
 3
     http://www.easychair.org/smart-program/VSL2014/ORE-index.html
    Table 3. Incremental reasoning effort for different reasoning tasks on changed ABoxes

                        Consistency           Classification               Realisation
           |∆|·100
Ontology     |K|     Time [s]   changed    Time [s] reclassified     Time [s] recomp. indi-
                     K K ∓∆ nodes [%]      K K ∓∆ classes [%]        K K ∓∆ viduals [%]
USDA-5       1       3.1   0.0    0.0      0.9 −           −        18.2 0.0         1.0
USDA-5       2       3.2   0.0    0.0      0.8 −           −        14.9 0.0         2.0
USDA-5       4       3.2   0.1    0.0      0.8 −           −        17.0 0.0         3.9
COSMO        1       0.4   0.0    3.1     24.4 17.4      73.0       0.4 0.2          3.1
COSMO        2       0.4   0.1    5.5     25.9 18.0      73.0        0.5 0.3         5.6
COSMO        4       0.4   0.1    9.9     25.2 18.6      72.8       0.6 0.4         10.0
DPC1         1       5.0   0.6    1.5      0.1 −           −        29.1 14.0       10.0
DPC1         2       4.9   1.1    2.6      0.1 −           −        27.7 19.9       16.9
DPC1         4       5.0   2.0    4.4      0.1 −           −        29.3 26.7       26.5
UOBM-1       1       3.6   2.3   10.0      1.3 0.8        5.9        ≥ 300.0        n/a
UOBM-1       2       3.7   2.9   14.0      1.4 1.1        7.9        ≥ 300.0        n/a
UOBM-1       4       3.7   3.5   18.2      1.4 1.7       11.3        ≥ 300.0        n/a


the satisfiability caching extension for nominals, as presented in Section 4, the accu-
mulated classification time can be further improved to 725 s. Similar results are also
achieved for the realisation of these ontologies. By excluding the times for all prereq-
uisites, the accumulated realisation times over all 200 ontologies are 1, 740 s for No-C,
1, 498 s for Det-C, 1, 061 s for ET-C, 1, 256 s for LT-C, and 923 s for the version where
the satisfiability caching extension for nominals is additionally activated.

Incremental Reasoning Experiments: To test the incremental reasoning based on the
presented completion graph caching, we used those ontologies of Table 1 that have a
large amount of ABox assertions and for which Konclude still has a clearly measurable
reasoning time, i.e., USDA-5, COSMO, DPC1, and UOBM-1. We simulated a changed
ABox for these ontologies by randomly removing a certain amount of assertions from
the ontology (denoted by K) and by re-adding the removed assertions and removing
new assertions (denoted by K ∓∆ ). For each ontology, we evaluated 10 random modifi-
cations that have 1, 2, and 4 % of the size of the ontology’s ABox. For simplicity, all
modifications have the same amount of removed and added assertions. The obtained
results for the presented incremental reasoning approach are shown in Table 3.
    For consistency, the first two columns show the (incremental) consistency check-
ing time (in seconds) for K and K ∓∆ , respectively, and the third column shows the
percentage of the nodes in the completion graph for K that has been changed for the
application of the modification. It can be observed that, especially for smaller modifica-
tions, the incremental consistency check often requires much less time than the initial
consistency check. In particular, the individuals of USDA-5 are sparsely connected via
object properties and, therefore, often only the modified individuals have to be re-built.
For larger modifications, the incremental consistency checking time increases signifi-
cantly for some ontologies, e.g., UOBM-1, and does almost catch up to the consistency
checking time for the initial ontology. On the one hand, our incremental consistency
checking approach clearly has some additional overhead due to the fact that nodes are
re-built step by step until an expansion as for the initial completion graph can be guaran-
teed, but, on the other hand, our prototypical implementation has still a lot of room for
improvements. For example, we currently also re-build nodes for individuals for which
only new assertions have been added although it would be sufficient to simply extend
the nodes of the previous deterministic completion graph by the new consequences.
    For classification, the first two columns show analogously the (incremental) classifi-
cation time for K and K ∓∆ , respectively, and the third column represents the percentage
of the classes for which satisfiability and subsumption tests were re-calculated. At the
moment, the used/modified nodes from the cached completion graph are tracked to-
gether for all satisfiability and subsumption tests and we mark those classes for which
nodes have been tracked. It can be observed that for the ontologies with nominals
(e.g., UOBM-1), only a few classes have to be re-classified and, in several cases, re-
classification is not required at all.
    Also for realisation, the first two columns show the (incremental) realisation time
for K and K ∓∆ , respectively, and the last column shows the percentage of the number
of individuals that are potentially affected by the changes and for which the (possible)
types have to be re-computed. For this, we separately track, for each individual, the
nodes of the cached completion graph that are used/modified by the instance tests.
Representative Caching Experiments: We integrated a first prototypical version of the
presented representative caching in our reasoning system Konclude, which is, however,
not yet compatible with all other integrated features and optimisations. As of now, the
integrated representative caching is primarily used for “simple individuals” that do not
have too much interaction with other individuals in the ABox. In cases where represen-
tative caching could easily cause performance deficits (e.g., through the intensive use of
nominals), Konclude caches the relevant parts of such completion graphs by using the
ordinary technique. Moreover, data property assertions are, at the moment, internally
transformed into class assertions and, as a consequence, nodes for individuals with
data property assertions can currently not be representatively cached. However, first
experiments are very encouraging. For example, Homo_sapiens is a very large SROIQ
ontology from the Oxford ontology library with 244, 232 classes, 255 object proper-
ties, and 289, 236 individuals for which Konclude requires 10, 211 MB in total to check
the consistency by using representative caching, whereas 19, 721 MB are required by
Konclude for consistency checking with a fully expanded completion graph. Note that a
large amount (9, 875 MB) of the required memory is used for the (unoptimised) internal
representation, the data from the preprocessing, and the parsed OWL objects which are
kept in memory to facilitate the handling of added/removed axioms.


6   Conclusions
We have presented a refinement of the completion graph caching technique that im-
proves ABox reasoning also for very expressive Description Logics through a more
sophisticated handling of non-deterministic consequences. In addition, we sketched ex-
tensions and applications of the caching technique, which allow for supporting nomi-
nals for the satisfiability caching of node labels, for reducing the incremental reasoning
effort for changing ABoxes, and for handling very large ABoxes by storing partially
processed parts of the completion graph in a representative way.
References
 1. Baader, F., Calvanese, D., McGuinness, D., Nardi, D., Patel-Schneider, P. (eds.): The De-
    scription Logic Handbook: Theory, Implementation, and Applications. Cambridge Univer-
    sity Press, second edn. (2007)
 2. Dolby, J., Fokoue, A., Kalyanpur, A., Schonberg, E., Srinivas, K.: Scalable highly expressive
    reasoner (SHER). J. of of Web Semantics 7(4), 357–361 (2009)
 3. Donini, F.M., Massacci, F.: EXPTIME tableaux for ALC. J. of Artificial Intelligence 124(1),
    87–138 (2000)
 4. Glimm, B., Horrocks, I., Motik, B., Stoilos, G., Wang, Z.: Hermit: An OWL 2 reasoner. J. of
    Automated Reasoning 53(3), 1–25 (2014)
 5. Glimm, B., Kazakov, Y., Liebig, T., Tran, T.K., Vialard, V.: Abstraction refinement for ontol-
    ogy materialization. In: Proc. 13th Int. Semantic Web Conf. (ISWC’14). LNCS, vol. 8797,
    pp. 180–195. Springer (2014)
 6. Haarslev, V., Möller, R.: On the scalability of description logic instance retrieval. J. of Auto-
    mated Reasoning 41(2), 99–142 (2008)
 7. Haarslev, V., Möller, R., Turhan, A.Y.: Exploiting pseudo models for TBox and ABox reason-
    ing in expressive description logics. In: Proc. 1st Int. Joint Conf. on Automated Reasoning
    (IJCAR’01). LNCS, vol. 2083, pp. 61–75. Springer (2001)
 8. Halaschek-Wiener, C., Hendler, J.: Toward expressive syndication on the web. In: Proc. 16th
    Int. Conf. on World Wide Web (WWW’07). ACM (2007)
 9. Halaschek-Wiener, C., Parsia, B., Sirin, E.: Description logic reasoning with syntactic up-
    dates. In: Proc. 4th Confederated Int. Conf. On the Move to Meaningful Internet Systems
    2006: CoopIS, DOA, GADA, and ODBASE(OTM’06), LNCS, vol. 4275, pp. 722–737.
    Springer (2006)
10. Horrocks, I., Kutz, O., Sattler, U.: The even more irresistible SROIQ. In: Proc. 10th Int.
    Conf. on Principles of Knowledge Representation and Reasoning (KR’06). pp. 57–67. AAAI
    Press (2006)
11. Horrocks, I., Sattler, U.: A description logic with transitive and inverse roles and role hierar-
    chies. J. of Logic and Computation 9(3), 385–410 (1999)
12. Horrocks, I., Sattler, U.: Decidability of SHIQ with complex role inclusion axioms. Artifi-
    cial Intelligence 160(1), 79–104 (2004)
13. Horrocks, I., Sattler, U.: A tableau decision procedure for SHOIQ. J. of Automated Reson-
    ing 39(3), 249–276 (2007)
14. Hudek, A.K., Weddell, G.E.: Binary absorption in tableaux-based reasoning for description
    logics. In: Proc. 19th Int. Workshop on Description Logics (DL’06). vol. 189. CEUR (2006)
15. Kollia, I., Glimm, B.: Optimizing SPARQL query answering over OWL ontologies. J. of
    Artificial Intelligence Research 48, 253–303 (2013)
16. Simančík, F.: Elimination of complex RIAs without automata. In: Proc. 25th Int. Workshop
    on Description Logics (DL’12). CEUR Workshop Proceedings, vol. 846. CEUR-WS.org
    (2012)
17. Sirin, E., Cuenca Grau, B., Parsia, B.: From wine to water: Optimizing description logic
    reasoning for nominals. In: Proc. 10th Int. Conf. on Principles of Knowledge Representation
    and Reasoning (KR’06). pp. 90–99. AAAI Press (2006)
18. Steigmiller, A., Glimm, B., Liebig, T.: Optimised absorption for expressive description log-
    ics. In: Proc. 27th Int. Workshop on Description Logics (DL’14). vol. 1193. CEUR (2014)
19. Steigmiller, A., Glimm, B., Liebig, T.: Completion graph caching extensions and applications
    for expressive description logics. Tech. rep., Ulm University, Ulm, Germany (2015), avail-
    able online at https://www.uni-ulm.de/fileadmin/website_uni_ulm/iui.inst.
    090/Publikationen/2015/DL2015CGCTR.pdf
20. Steigmiller, A., Liebig, T., Glimm, B.: Konclude: system description. J. of Web Semantics
    27(1) (2014)
21. Tsarkov, D., Horrocks, I.: Efficient reasoning with range and domain constraints. In: Proc.
    17th Int. Workshop on Description Logics (DL’04). vol. 104. CEUR (2004)
22. Wandelt, S., Möller, R.: Towards ABox modularization of semi-expressive description log-
    ics. Applied Ontology 7(2), 133–167 (2012)
23. Wu, J., Hudek, A.K., Toman, D., Weddell, G.E.: Absorption for ABoxes. J. of Automated
    Reasoning 53(3), 215–243 (2014)