=Paper=
{{Paper
|id=Vol-1350/paper-35
|storemode=property
|title=Completion Graph Caching for Expressive Description Logics
|pdfUrl=https://ceur-ws.org/Vol-1350/paper-35.pdf
|volume=Vol-1350
|dblpUrl=https://dblp.org/rec/conf/dlog/SteigmillerGL15
}}
==Completion Graph Caching for Expressive Description Logics==
Completion Graph Caching for Expressive Description Logics

Andreas Steigmiller∗¹, Birte Glimm¹, and Thorsten Liebig²

¹ University of Ulm, Ulm, Germany, @uni-ulm.de
² derivo GmbH, Ulm, Germany, liebig@derivo.de

1 Introduction

Reasoning in very expressive Description Logics (DLs) such as SROIQ is often hard since non-determinism, e.g., from disjunctions or cardinality restrictions, requires a case-by-case analysis, and since a strict separation between intensional (TBox) and extensional (ABox) knowledge is not possible due to nominals. Current state-of-the-art reasoners for SROIQ are typically based on (hyper-)tableau algorithms, where higher-level reasoning tasks such as classification are reduced to possibly many consistency tests. For large ABoxes and more expressive DLs, reasoning then easily becomes infeasible since the ABox has to be considered in each test.

For less expressive DLs, optimisations have been proposed that allow for considering only parts of the ABox for checking whether an individual is an instance of a specific concept [22,23]. It is, however, not clear how these optimisations can be extended to SROIQ. Furthermore, approaches based on partitioning and modularisation require a syntactic pre-analysis of the concepts, roles, and individuals in a knowledge base, which can be quite costly for more expressive DLs and, due to the static analysis, only queries with specific concepts are supported. Rewriting axioms such that ABox reasoning is improved also carries the risk of negatively influencing other reasoning tasks for which ABox reasoning is less or not at all relevant.

Another possibility is the caching of the completion graph that is built by the tableau algorithm during the initial consistency check. The re-use of (parts of) the cached completion graph in subsequent tests reduces the number of consequences that have to be re-derived for the individuals of the ABox.
Existing approaches based on this idea (e.g., [17]) are, however, not very suitable for handling non-determinism. Since caching and re-using non-deterministically derived facts can easily cause unsound consequences, the non-deterministically derived facts are usually simply discarded and re-derived if necessary. We address this and present a technique that allows for identifying non-deterministic facts that can safely be re-used. The approach is based on a set of conditions that can be checked locally and, thus, allows for efficiently identifying, step by step, the individuals for which non-deterministic consequences have to be re-considered in subsequent tests. The presented technique can directly be integrated into existing tableau-based reasoning systems without significant adaptations and reduces the reasoning effort for all tasks for which consequences from the ABox are potentially relevant. Moreover, it can directly be used for the DL SROIQ, does not produce a significant overhead, and can easily be extended, e.g., to improve incremental ABox reasoning. Note that completion graph caching is complementary to many other optimisations that try to reduce the number of subsequent tests for higher-level reasoning tasks, e.g., summarisation [2], abstraction and refinement [5], bulk processing and binary retrieval [6], (pseudo) model merging [7], and the extraction of known/possible instances from model abstractions [15].

∗ The author acknowledges the support of the doctoral scholarship under the Postgraduate Scholarships Act of the Land of Baden-Wuerttemberg (LGFG).

The paper is organised as follows: We next introduce some preliminaries and then present the basic completion graph caching technique in Section 3. In Section 4, we present several extensions and applications to support nominals in ordinary satisfiability caching, to perform incremental reasoning for changing ABoxes, and to handle very large ABoxes by storing data in a representative way.
Finally, we present an evaluation of the presented techniques in Section 5 and conclude in Section 6. Further details and an extended evaluation are available in a technical report [19].

2 Preliminaries

For brevity, we do not introduce DLs (see, e.g., [1,10]). We use (possibly with subscripts) C, D for (possibly complex) concepts, A, B for atomic concepts, and r, s for roles. We assume that a SROIQ knowledge base K is expressed as the union of a TBox T consisting of GCIs of the form C ⊑ D, a role hierarchy R, and an ABox A, such that the effects of complex roles are implicitly encoded (e.g., based on automata [12] or regular expressions [16]). We write nnf(C) to denote the negation normal form of C. For r a role, we set inv(r) := r⁻ and inv(r⁻) := r.

Model construction calculi such as tableau [10,13] decide the consistency of a KB K by trying to construct an abstraction of a model for K, a so-called completion graph. For the purpose of this paper, we use a tuple of the form (V, E, L, ≠̇, M) for a completion graph G, where each node x ∈ V (edge ⟨x, y⟩ ∈ E) represents one or more (pairs of) individuals. Each node x (edge ⟨x, y⟩) is labelled with a set of concepts (roles), L(x) (L(⟨x, y⟩)), of which the (pairs of) individuals represented by x (⟨x, y⟩) are instances. The relation ≠̇ records inequalities that must hold between nodes, e.g., due to at-least cardinality restrictions, and the mapping M tracks merging activities, e.g., due to at-most cardinality restrictions or nominals.

The algorithm works by decomposing concepts in the completion graph with a set of expansion rules. For example, if C₁ ⊔ C₂ ∈ L(v) for some node v, but neither C₁ ∈ L(v) nor C₂ ∈ L(v), then the rule for handling disjunctions non-deterministically adds one of the disjuncts to L(v). Similarly, if ∃r.C ∈ L(v), but v does not have an r-successor with C in its label, then the algorithm expands the completion graph by adding the required successor node.
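To make the two rule descriptions above concrete, here is a minimal, hypothetical Python sketch of the ⊔-rule and the ∃-rule; the `Node` class and the function names are our own illustration and not part of any reasoner:

```python
class Node:
    """A completion graph node: a concept label and role successors."""
    def __init__(self):
        self.label = set()        # concepts the represented individuals are instances of
        self.successors = {}      # role name -> list of successor Nodes

def apply_disjunction_rule(node, disjunction, choose):
    """Non-deterministic ⊔-rule: if C1 ⊔ C2 should be satisfied but neither
    disjunct is in the label, add one disjunct (a choice that backtracking
    may later have to revise)."""
    c1, c2 = disjunction
    if c1 not in node.label and c2 not in node.label:
        node.label.add(choose(c1, c2))

def apply_exists_rule(node, role, concept):
    """∃-rule: if ∃r.C should be satisfied but no r-successor carries C,
    create the required successor node; return it (or None if unneeded)."""
    for succ in node.successors.get(role, []):
        if concept in succ.label:
            return None
    succ = Node()
    succ.label.add(concept)
    node.successors.setdefault(role, []).append(succ)
    return succ
```

The `choose` callback stands in for the search strategy that explores the non-deterministic alternatives.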
If a node v is merged into a node v′, M is extended by v ↦ v′. We use mergedTo_M(v) to return the node into which a node v has been merged, i.e., mergedTo_M(v) = mergedTo_M(w) for v ↦ w ∈ M and mergedTo_M(v) = v otherwise.

Unrestricted application of rules for existential or at-least cardinality restrictions can lead to the introduction of infinitely many new nodes. To guarantee termination, a cycle detection technique called (pairwise) blocking [11] restricts the application of these rules. The rules are repeatedly applied until either the completion graph is fully expanded (no more rules are applicable), in which case the graph can be used to construct a model that witnesses the consistency of K, or an obvious contradiction (called a clash) is discovered (e.g., both C and ¬C in a node label), proving that the completion graph does not correspond to a model. A knowledge base K is consistent if the rules can be applied such that they build a fully expanded and clash-free completion graph.

3 Completion Graph Caching and Reusing

In the following, we use superscripts to distinguish different versions of completion graphs. We denote with Gᵈ = (Vᵈ, Eᵈ, Lᵈ, ≠̇ᵈ, Mᵈ) the last completion graph of the initial consistency test that is obtained with only deterministic rule applications and with Gⁿ = (Vⁿ, Eⁿ, Lⁿ, ≠̇ⁿ, Mⁿ) the fully expanded (and clash-free) completion graph that possibly also contains non-deterministic choices. Obviously, instead of starting with the initial completion graph G⁰ to which no rule has (yet) been applied, we can use Gᵈ to initialise a completion graph G for subsequent consistency tests, which are, for example, required to prove or refute assumptions of higher-level reasoning tasks. To be more precise, we extend Gᵈ to G by the new individuals or by additional assertions to original individuals as required for a subsequent test, and then we can apply the tableau expansion rules to G.
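The merge mapping M and the function mergedTo_M defined above can be pictured as follows; representing M as a plain dictionary of v ↦ v′ entries is our own simplification:

```python
def merged_to(m, v):
    """mergedTo_M(v): follow merge entries v -> v' in M until reaching a
    node that has not itself been merged; unmerged nodes map to themselves."""
    while v in m:
        v = m[v]
    return v

# Example: v was merged into w, and w was later merged into u.
m = {"v": "w", "w": "u"}
```

This path-following behaviour matters below: the caching criteria always compare a node of G′ with its final merge target in the cached completion graph Gⁿ.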
Note that, in order to be able to distinguish the nodes/nominals in the different completion graphs, we assume that all nodes/nominals that are newly created for G do not occur in existing completion graphs, such as Gᵈ or Gⁿ.

This re-use of Gᵈ is an obvious and straightforward optimisation and it is already used by many state-of-the-art reasoning systems to successfully reduce the work that has to be repeated in subsequent consistency tests [17]. Especially if the knowledge base is deterministic, the tableau expansion rules only have to be applied for the newly added assertions in G. In principle, Gⁿ can also be re-used instead of Gᵈ [17], but this causes problems if non-deterministically derived facts of Gⁿ are involved in new clashes. In particular, backtracking is required in such cases, i.e., we have to jump back to the last version of the initial completion graph that does not contain the consequences of the last non-deterministic decision that is involved in the clash. Then, we have to continue the processing by choosing another alternative. Obviously, if we have to jump back to a very early version of the completion graph, then potentially many non-deterministic decisions must be re-processed. Moreover, after jumping back, we also have to add and re-process the newly added individuals and/or assertions.

To improve the handling of non-deterministic knowledge bases, our approach uses criteria to check whether nodes in G (or the nodes in a completion graph G′ obtained from G by further rule applications) are "cached", i.e., there exist corresponding nodes in the cached completion graph of the initial consistency test and, therefore, it is not required to process them again. These caching criteria check whether the expansion of nodes is possible as in the cached completion graph Gⁿ without influencing modified nodes in G′; thus, only the processing of new and modified nodes is required.

Definition 1 (Caching Criteria).
Let Gᵈ = (Vᵈ, Eᵈ, Lᵈ, ≠̇ᵈ, Mᵈ) be a completion graph with only deterministically derived consequences and Gⁿ = (Vⁿ, Eⁿ, Lⁿ, ≠̇ⁿ, Mⁿ) a fully expanded and clash-free expansion of Gᵈ. Moreover, let G be an extension of Gᵈ and G′ = (V′, E′, L′, ≠̇′, M′) a completion graph that is obtained from G by applying tableau expansion rules. A node v′ ∈ V′ is cached in G′ if caching of the node is not invalid, where the caching is invalid (we then also refer to the node as non-cached) if

C1  v′ ∉ Vᵈ or mergedTo_Mⁿ(v′) ∉ Vⁿ;
C2  L′(v′) ⊈ Lⁿ(mergedTo_Mⁿ(v′));
C3  ∀r.C ∈ Lⁿ(mergedTo_Mⁿ(v′)) and there is an r-neighbour node w′ of v′ such that w′ is not cached and C ∉ L(w′);
C4  ≤ m r.C ∈ Lⁿ(mergedTo_Mⁿ(v′)) and the number of the non-cached r-neighbour nodes of v′ without nnf(¬C) in their labels together with the r-neighbours in Gⁿ of mergedTo_Mⁿ(v′) with C in their labels is greater than m;
C5  ∃r.C ∈ Lⁿ(mergedTo_Mⁿ(v′)) and every r-neighbour node w′ of v′ with C ∉ L(w′) and C ∈ Lⁿ(mergedTo_Mⁿ(w′)) is not cached;
C6  ≥ m r.C ∈ Lⁿ(mergedTo_Mⁿ(v′)) and the number of r-neighbour nodes w₁ⁿ, …, wₖⁿ of mergedTo_Mⁿ(v′) with C ∈ Lⁿ(w₁ⁿ), …, C ∈ Lⁿ(wₖⁿ), for which there is either no node wᵢ′ ∈ V′ with mergedTo_Mⁿ(wᵢ′) = wᵢⁿ or wᵢ′ with mergedTo_Mⁿ(wᵢ′) = wᵢⁿ and C ∉ L(wᵢ′) is not cached, for 1 ≤ i ≤ k, is less than m;
C7  mergedTo_Mⁿ(v′) is a nominal node with ≤ m r.C in its label and there exists a blockable and non-cached inv(r)-predecessor node w′ of v′ with nnf(¬C) ∉ L(w′);
C8  mergedTo_Mⁿ(w′) is an r-neighbour node of mergedTo_Mⁿ(v′) such that w′ is not cached and w′ is not an r-neighbour node of v′;
C9  mergedTo_Mⁿ(v′) has a successor node uⁿ such that uⁿ or a descendant of uⁿ has a successor node mergedTo_Mⁿ(w′) for which w′ ∈ V′, w′ is not cached, and there exists no node u′ ∈ V′ with mergedTo_Mⁿ(u′) = uⁿ;
C10 mergedTo_Mⁿ(v′) is blocked by mergedTo_Mⁿ(w′) and w′ ∈ V′ is non-cached;
C11 there is a non-cached node w′ ∈ V′ and mergedTo_Mⁿ(v′) = mergedTo_Mⁿ(w′); or
C12 there is a node w′ ∈ V′ such that v′ ≠̇′ w′ and mergedTo_Mⁿ(v′) = mergedTo_Mⁿ(w′).

Conditions C1 and C2 ensure that a node also exists in the cached completion graph Gⁿ and that its label is a subset of the corresponding label in Gⁿ such that the same expansion is possible. C3 checks whether the expansion of a node would add a concept of the form ∀r.C such that it could propagate C to a non-cached neighbour node. Analogously, C4 checks for potentially violated at-most cardinality restrictions by counting the new or modified neighbours in G′ and the neighbours in Gⁿ. C5 and C6 verify that existential and at-least cardinality restrictions are still satisfied if the cached nodes are expanded identically. C7 checks whether the NN-rule of the tableau algorithm would be applicable after the expansion, i.e., we check whether all potentially relevant neighbours in G′ are nominal nodes. C8 checks whether the expansion would add additional roles to edge labels between cached and non-cached nodes, which could be problematic for disjoint roles.
For C9: if a node w′, for which caching is invalid, is connected to nodes in Gⁿ that are only available in Gⁿ (e.g., non-deterministically created ones), then we have to identify caching as invalid for those ancestors of these nodes that are also in G′ such that these connections to w′ can be re-built. Otherwise, we would potentially miss new consequences that could be propagated from w′. C10 is used to reactivate the processing of nodes for which the caching of the blocker node is invalid. C11 and C12 ensure that merging is possible as in Gⁿ: C11 checks whether the node into which the node is merged is also cached, and C12 ensures that there is no additional entry for ≠̇′ that would cause a clash if the nodes were merged as in Gⁿ.

[Fig. 1. Constructed and cached completion graph for Example 1 with deterministically (coloured black) and non-deterministically (coloured grey) derived facts. Node labels: L(va) = {⊤, {a}, ∃r.{b}, B ⊔ A₁, A₁}; L(vb) = {⊤, {b}, B ⊔ ∃r.∃r.({c} ⊓ A₂), B ⊔ ∀s⁻.A₃, ∃r.∃r.({c} ⊓ A₂), ∀s⁻.A₃}; L(v1) = {⊤, ∃r.({c} ⊓ A₂)}; L(vc) = {⊤, {c}, ∃s.{b}, {c} ⊓ A₂, A₂, A₃}; edges: va −r→ vb, vb −r→ v1 −r→ vc, vc −s→ vb.]

Since only those nodes can be cached which are available in the "deterministic" completion graph Gᵈ, it is important to maximise the deterministic processing of the completion graph. This can, for example, be achieved by processing the completion graph with deterministic rules only until the generation of new nodes is subset blocked. Subset blocking is not sufficient for more expressive DLs, but it prevents the expansion of too many successor nodes in case non-deterministic rule applications merge and prune some parts of the completion graph. Absorption techniques (e.g., [14,18,21]) are also essential since they reduce the overall non-determinism in an ontology.

Example 1. Assume that the tableau algorithm builds a completion graph as depicted in Figure 1 for testing the consistency of a knowledge base K containing the axioms

{a} ⊑ B ⊔ A₁    {b} ⊑ B ⊔ ∃r.∃r.({c} ⊓ A₂)    {c} ⊑ ∃s.{b}
{a} ⊑ ∃r.{b}    {b} ⊑ B ⊔ ∀s⁻.A₃.

The deterministic version of the completion graph, Gᵈ, contains the elements coloured in black in Figure 1 and the non-deterministic version, Gⁿ, additionally contains the elements coloured in grey. If we want to determine which individuals are instances of the concept ∃r.⊤, then we have to check, for each individual i in K, the consistency of K extended by nnf(¬∃r.⊤)(i), which is equivalent to adding the axiom {i} ⊑ ∀r.⊥. The individual a is obviously an instance of this concept, which can also be observed by the fact that expanding the extended completion graph adds ∀r.⊥ to Lᵈ(va), which immediately results in a clash. In contrast, if we extend Lᵈ(vb) by ∀r.⊥, we have to process the disjunctions B ⊔ ∃r.∃r.({c} ⊓ A₂) and B ⊔ ∀s⁻.A₃. The disjunct ∃r.∃r.({c} ⊓ A₂) would result in a clash and, therefore, we choose B, which also satisfies the second disjunction. Note that va and vc do not have to be processed since va is cached, i.e., its expansion is possible in the same way as in Gⁿ, and vc does not have any concepts for which processing is required. Last but not least, we have to test whether Lᵈ(vc) extended by ∀r.⊥ is satisfiable. Now, the caching of vc is obviously invalid (due to C2) and, therefore, the caching of vb is also invalid: C3 can be applied for ∀s⁻.A₃ and C9 for the successor node v1 of vb in Gⁿ, which also has the non-cached successor node vc. Since va is still cached, we only have to process vb again, which does, however, not result in a clash. Hence, only a is an instance of ∃r.⊤.

Most conditions can be checked locally: for a non-cached node, we simply follow its edges w.r.t. G′ and Gⁿ to (potentially indirect) neighbour nodes that are also available in Gᵈ.
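As an illustration of such a local check, the purely node-local conditions C1 and C2 of Definition 1 could be tested as in the following sketch, where the completion graphs are reduced to plain sets and dictionaries and all names are hypothetical:

```python
def merged_to(m, v):
    """mergedTo_M(v): follow merge entries in M to the final target."""
    while v in m:
        v = m[v]
    return v

def caching_invalid_c1_c2(v, labels_new, nodes_d, labels_n, merged_n):
    """Return True if caching of node v is invalid according to C1 or C2.
    labels_new: labels L' of the current completion graph G'
    nodes_d:    node set V^d of the deterministic completion graph G^d
    labels_n:   labels L^n of the cached completion graph G^n
    merged_n:   merge mapping M^n of G^n"""
    target = merged_to(merged_n, v)
    # C1: the node must exist in G^d and its merge target in G^n
    if v not in nodes_d or target not in labels_n:
        return True
    # C2: the current label must be a subset of the cached label
    if not labels_new[v] <= labels_n[target]:
        return True
    return False
```

In Example 1, extending the label of vc by ∀r.⊥ makes exactly this C2 subset test fail, which is what invalidates the caching of vc.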
Exceptions are Conditions C10, C11, and C12, for which we can, however, simply trace back established blocking relations or Mⁿ to find nodes for which the caching criteria have to be checked. The implementation can also be simplified by keeping the caching of a node invalid once it has been identified as invalid (even if the caching becomes possible again after the node has been expanded identically). Instead of directly checking the caching criteria, the relevant nodes can also be stored in a set/queue to check the conditions when it is (more) convenient. Clearly, it depends on the ontology whether it is more beneficial to test the caching criteria earlier or later. If the criteria are tested early, then we could unnecessarily re-process some parts of the cached completion graph since the application of (non-deterministic) expansion rules can satisfy some conditions. A later testing could cause a repeated re-processing of the same parts if a lot of backtracking is required and consequences of re-activated nodes are involved in clashes. Alternatively, one could learn for the entire ontology or for single nodes whether the criteria should be checked earlier or later. Unfortunately, this cannot easily be realised for all reasoning systems since it requires that dependencies between derived facts are precisely tracked in order to identify nodes that are often involved in the creation of clashes and for which the criteria should be checked early.

It is also possible to non-deterministically re-use derived consequences from Gⁿ, i.e., if the caching is invalid for a node v and mergedTo_Mⁿ(v) is in Gⁿ, then we can non-deterministically add the missing concepts from Lⁿ(mergedTo_Mⁿ(v)) to L(v). Since the resulting completion graph is very similar to the cached one, caching can often be established quickly for many nodes.
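The deferred, set/queue-based testing mentioned above can be sketched as a worklist that propagates invalidation; `caching_invalid` stands for a check of the criteria of Definition 1 and `affected_neighbours` for the traversal along edges, blocking relations, and Mⁿ (both are hypothetical placeholders):

```python
from collections import deque

def invalidate(start_nodes, caching_invalid, affected_neighbours):
    """Worklist-style propagation of caching invalidation: nodes whose
    caching has to be re-examined are queued and, if found invalid,
    their potentially affected neighbours are queued as well."""
    queue = deque(start_nodes)
    non_cached = set()
    while queue:
        v = queue.popleft()
        if v in non_cached:
            continue
        # the criteria may refer to already identified non-cached nodes
        if caching_invalid(v, non_cached):
            non_cached.add(v)
            queue.extend(affected_neighbours(v))
    return non_cached
```

Testing "early" corresponds to draining the queue eagerly after every modification; testing "late" corresponds to draining it only when a node is about to be processed.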
Of course, if some of the non-deterministically added concepts are involved in clashes, then we potentially have to backtrack and process the alternative where this node is ordinarily processed. Another nice side effect of storing Gⁿ is that we can use the non-deterministic decisions from Gⁿ as an orientation in subsequent consistency tests. By prioritising the processing of the same non-deterministic alternatives as for Gⁿ, we can potentially find a solution that is very similar to Gⁿ without exploring much of the search space.

4 Caching Extensions and Applications

In this section, we sketch additional applications of the caching technique, which allow for supporting nominals for the satisfiability caching of node labels and for reducing the incremental reasoning effort for changing ABoxes. Furthermore, we describe an extension that allows for caching (parts of) the completion graph in a representative way, whereby an explicit representation of many nodes in the completion graph can be avoided and, therefore, the memory consumption can be reduced.

Satisfiability Caching with Nominals: Caching the satisfiability status of labels of (blockable) nodes in completion graphs is an important optimisation technique for tableau-based reasoning systems [3,4]. If one obtains a fully expanded and clash-free completion graph, then the contained node labels can be cached and, if identical labels occur in other completion graphs, then their expansion (i.e., the creation of required successor nodes) is not necessary since their satisfiability has already been proven. Of course, for more expressive DLs that include, for example, inverse roles and cardinality restrictions, we have to consider pairs of node labels as well as the edge labels between these nodes. Unfortunately, this kind of satisfiability caching does not work for DLs with nominals.
In particular, connections to nominal nodes can be used to propagate new concepts from one blockable node in the completion graph to any other blockable node, whereas for DLs without nominals, consequences can only be propagated from or to successor nodes. Hence, the caching of node labels is not easily possible and many reasoners deactivate this kind of caching for knowledge bases with nominals.

However, in combination with completion graph caching, we regain the possibility to cache some labels of (blockable) nodes for knowledge bases with nominals. Roughly speaking, we first identify which nodes "depend on" which nominals, i.e., which nominal nodes are existentially quantified as a successor/descendant for a node. The labels of such nodes can then be cached together with the dependent nominals, i.e., with those nominals on which the nodes depend, if all nodes for the dependent nominals are still cached w.r.t. the initial completion graph. These cache entries can be re-used for nodes with identical labels in subsequent completion graphs as long as the completion graph caching of the nodes for the dependent nominals is not invalid. Of course, the blocked processing of nodes due to matching cache entries has to be reactivated if the completion graph caching of a node for a dependent nominal becomes invalid. Moreover, we cannot cache the labels of blockable nodes that depend on newly generated nominals since possible interactions over these nodes are not detected.

Incremental Reasoning for Changing ABoxes: Many reasoning systems re-start reasoning from scratch if a few axioms in the knowledge base have been changed. However, it is not very likely that a few changes in the ABox of a knowledge base have a huge impact on reasoning.
In particular, many ABox assertions only propagate consequences to the local neighbourhood of the modified individuals and, therefore, the results of reasoning tasks such as classification are often not affected, even if nominals are used in the knowledge base. With the presented completion graph caching, we can easily track which nodes from the cached completion graph are modified in subsequent tests and for which caching is invalidated to perform higher-level reasoning tasks. Hence, if the changed ABox assertions have only a known, locally limited influence, then the reasoning effort for many (higher-level) reasoning tasks can be reduced by checking whether a tracked node is affected by the changes. To detect the influences of the changes, an incremental consistency check can be performed where all modified individuals and their neighbours are deterministically re-constructed step by step until "compatibility" with the previous deterministic completion graph is achieved, i.e., the same deterministic consequences are derived for the nodes as in the previous completion graph. The non-deterministic version of the new completion graph can then be obtained by continuing the (non-deterministic) processing of the re-constructed nodes and by using the presented completion graph caching for all remaining nodes. Hence, we can identify the influenced nodes by comparing the newly obtained completion graph with the previous one. Compared to other incremental consistency checking approaches (e.g., [9]), the re-construction of changed parts supported by the completion graph caching enables a better handling of non-determinism and does not require a memory-intensive tracking of which consequences are caused by which ABox assertions.
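The step-by-step re-construction until compatibility could be organised roughly as follows; this is only a sketch of the control flow, where `rebuild_deterministically` and `compatible` stand for the omitted tableau machinery:

```python
def incremental_consistency(modified, neighbours, rebuild_deterministically, compatible):
    """Re-construct the modified individuals deterministically and, as long
    as some re-built node derives different deterministic consequences than
    in the previous completion graph, grow the re-built region by the
    neighbours of the incompatible nodes."""
    region = set(modified)
    while True:
        rebuild_deterministically(region)
        incompatible = {v for v in region if not compatible(v)}
        grown = region | {w for v in incompatible for w in neighbours(v)}
        if not incompatible or grown == region:
            # compatible, or the whole affected part is already re-built
            return region
        region = grown
```

The returned region approximates the set of influenced nodes; everything outside it can be taken over from the cached completion graph.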
The idea of tracking those parts of a completion graph that are potentially relevant/used for the calculation of higher-level reasoning tasks and comparing them with the changed parts in new completion graphs has already been proposed for answering conjunctive queries under incremental ABox updates [8], but the completion graph caching simplifies the realisation of this technique and significantly reduces the overhead for identifying those parts of higher-level reasoning tasks that have to be re-computed. Moreover, with the completion graph caching, also very expressive DLs such as SROIQ can be supported.

Table 1. Ontology metrics for selected benchmark ontologies (A stands for Axioms, C for Classes, P for Properties, I for Individuals, CA for Class Assertions, OPA for Object Property Assertions, and DPA for Data Property Assertions)

Ontology  Expressivity  #A       #C     #P   #I      #CA     #OPA     #DPA
OGSF      SROIQ(D)      1,235    386    179  57      45      58       20
Wine      SHOIN(D)      1,546    214    31   367     409     492      2
DOLCE     SHOIN         1,667    209    317  42      101     36       0
OBI       SROIQ(D)      28,770   3,549  152  161     273     19       1
USDA-5    ALCIF(D)      1,401    30     147  1,214   1,214   12       0
COSMO     SHOIN(D)      29,655   7,790  941  7,817   8,675   3,240    665
DPC1      ALCIF(D)      55,020   1,920  94   28,023  15,445  39,453   0
UOBM-1    SHOIN(D)      260,728  69     44   25,453  46,403  143,549  70,628

Representative Caching: In order to reduce the memory consumption for caching the completion graph, the technique can be adapted such that all relevant data is stored in a representative way, which allows for building "local" completion graphs for small subsets of the entire ABox until the existence of a complete completion graph considering all individuals can be guaranteed. To be more precise, if a fully expanded and clash-free completion graph is constructed for a subset of the ABox (e.g., a subset of all individuals and their assertions), then we extract and generalise information from the processed individuals and store them in a representative cache.
If we then try to build a completion graph for another subset of the ABox that has some overlap with a previously handled subset (e.g., role assertions for which edges to previously handled individuals have to be created), then we load the available data from the cache and continue the processing of the overlapping part until it is "compatible", i.e., the expansion of the remaining individuals in the cache can be guaranteed as in the previously constructed completion graphs. Of course, this only works well for knowledge bases for which there is not too much non-deterministic interaction between the separately handled ABox parts. Moreover, compared to the ordinary completion graph caching, we are clearly trading a lower memory consumption against an increased runtime since more work potentially has to be repeated to establish compatibility.

5 Implementation and Evaluation

The completion graph caching introduced in Section 3 is integrated in our reasoning system Konclude [20] and we selected several well-known benchmark ontologies (cf. Table 1) for the evaluation of the presented techniques. The evaluation was carried out on a Dell PowerEdge R420 server with two Intel Xeon E5-2440 hexa-core processors at 2.4 GHz with Hyper-Threading and 144 GB RAM under a 64bit Ubuntu 12.04.2 LTS. In order to make the evaluation independent of the number of CPU cores, we used only one worker thread for Konclude. We ignored the time spent for parsing ontologies and writing results and we used a time limit of 5 minutes, i.e., 300 seconds.

Table 2. Reasoning times for different completion graph caching techniques (in seconds)

          Prep.+  Classification                  Realisation
Ontology  Cons.   No-C    Det-C   ET-C  LT-C     No-C    Det-C   ET-C    LT-C
OGSF      0.0     3.8     1.0     0.2   0.2      0.1     0.0     0.0     0.0
Wine      0.0     49.5    29.6    0.8   0.8      49.1    25.8    0.2     0.1
OBI       0.2     65.9    19.2    1.5   1.5      2.2     2.0     0.0     0.1
DOLCE     0.0     6.7     1.1     0.2   0.2      ≥300.0  5.2     0.1     0.1
USDA-5    4.1     0.8     1.0     1.0   0.8      ≥300.0  38.7    20.5    20.2
COSMO     0.6     ≥300.0  ≥300.0  42.2  11.2     n/a     n/a     11.2    19.9
DPC1      6.5     0.1     0.2     0.1   0.1      ≥300.0  53.3    19.4    20.5
UOBM-1    6.7     240.6   4.8     1.3   1.1      ≥300.0  ≥300.0  ≥300.0  ≥300.0

Table 2 shows the reasoning times for consistency checking (including preprocessing), classification, and realisation (in seconds) with different completion graph caching techniques integrated in Konclude. Please note that the class hierarchy is required to realise an ontology, i.e., classification is a prerequisite of realisation, and, analogously, consistency checking as well as preprocessing are prerequisites of classification. Thus, realisation cannot be performed if the time limit is already reached for classification.

If no completion graph caching is activated (No-C), then realisation and classification can often require a large amount of time since Konclude has to re-process the entire ABox for all instance and subsumption tests (if the ontology uses nominals). For several ontologies, such as Wine and DOLCE, the caching and re-use of the deterministic completion graph from the consistency check (Det-C) already leads to significant improvements. Nevertheless, with the two variants ET-C and LT-C of the presented completion graph caching technique, where ET-C uses an "early testing" and LT-C a "late testing" of the defined caching criteria, Konclude can further reduce the reasoning times. In particular, with both completion graph caching techniques, all evaluated ontologies can easily be classified and also the realisation can be performed efficiently for all but UOBM-1.
Table 2 also reveals that only for COSMO there is a remarkable difference between ET-C and LT-C, which can be explained by the fact that there is often more interaction with the individuals from the ABox for instance tests than for satisfiability and subsumption tests and, therefore, ET-C can be better for realisation due to the potentially lower effort in combination with backtracking.

The effects of the different completion graph caching techniques can also be observed for the OWL DL Realisation dataset of the ORE 2014 competition,³ which contains several ontologies with non-deterministic language features and non-trivial ABoxes. Excluding the 1,331 s spent for preprocessing and consistency checking, the accumulated classification times over the contained 200 ontologies are 3,996 s for the version No-C, 2,503 s for Det-C, 1,801 s for ET-C, and 1,606 s for LT-C. Clearly, the dataset also contains many ontologies for which completion graph caching does not seem to have a significant impact, for example, if the ontologies can be processed deterministically or the ontologies are too difficult to even perform consistency checking (which is the case for 3 ontologies). Nevertheless, our completion graph caching improves the classification time, with similar performance for LT-C and ET-C. By using the satisfiability caching extension for nominals, as presented in Section 4, the accumulated classification time can be further improved to 725 s. Similar results are also achieved for the realisation of these ontologies. Excluding the times for all prerequisites, the accumulated realisation times over all 200 ontologies are 1,740 s for No-C, 1,498 s for Det-C, 1,061 s for ET-C, 1,256 s for LT-C, and 923 s for the version where the satisfiability caching extension for nominals is additionally activated.

³ http://www.easychair.org/smart-program/VSL2014/ORE-index.html

Incremental Reasoning Experiments: To test the incremental reasoning based on the presented completion graph caching, we used those ontologies of Table 1 that have a large amount of ABox assertions and for which Konclude still has a clearly measurable reasoning time, i.e., USDA-5, COSMO, DPC1, and UOBM-1. We simulated a changed ABox for these ontologies by randomly removing a certain amount of assertions from the ontology (denoted by K) and by re-adding the removed assertions and removing new assertions (denoted by K∓Δ). For each ontology, we evaluated 10 random modifications that have 1, 2, and 4 % of the size of the ontology's ABox. For simplicity, all modifications have the same amount of removed and added assertions. The obtained results for the presented incremental reasoning approach are shown in Table 3.

Table 3. Incremental reasoning effort for different reasoning tasks on changed ABoxes

                   Consistency                Classification              Realisation
          |Δ|·100  Time [s]      changed     Time [s]     reclassified   Time [s]        recomp. indi-
Ontology  /|K|     K      K∓Δ    nodes [%]   K     K∓Δ    classes [%]    K       K∓Δ     viduals [%]
USDA-5    1        3.1    0.0    0.0         0.9   −      −              18.2    0.0     1.0
USDA-5    2        3.2    0.0    0.0         0.8   −      −              14.9    0.0     2.0
USDA-5    4        3.2    0.1    0.0         0.8   −      −              17.0    0.0     3.9
COSMO     1        0.4    0.0    3.1         24.4  17.4   73.0           0.4     0.2     3.1
COSMO     2        0.4    0.1    5.5         25.9  18.0   73.0           0.5     0.3     5.6
COSMO     4        0.4    0.1    9.9         25.2  18.6   72.8           0.6     0.4     10.0
DPC1      1        5.0    0.6    1.5         0.1   −      −              29.1    14.0    10.0
DPC1      2        4.9    1.1    2.6         0.1   −      −              27.7    19.9    16.9
DPC1      4        5.0    2.0    4.4         0.1   −      −              29.3    26.7    26.5
UOBM-1    1        3.6    2.3    10.0        1.3   0.8    5.9            ≥300.0  n/a     n/a
UOBM-1    2        3.7    2.9    14.0        1.4   1.1    7.9            ≥300.0  n/a     n/a
UOBM-1    4        3.7    3.5    18.2        1.4   1.7    11.3           ≥300.0  n/a     n/a
For consistency, the first two columns show the (incremental) consistency checking time (in seconds) for K and K∓∆, respectively, and the third column shows the percentage of the nodes in the completion graph for K that have been changed for the application of the modification. It can be observed that, especially for smaller modifications, the incremental consistency check often requires much less time than the initial consistency check. In particular, the individuals of USDA-5 are sparsely connected via object properties and, therefore, often only the modified individuals have to be re-built. For larger modifications, the incremental consistency checking time increases significantly for some ontologies, e.g., UOBM-1, and almost catches up to the consistency checking time for the initial ontology. On the one hand, our incremental consistency checking approach clearly has some additional overhead since nodes are re-built step by step until an expansion as in the initial completion graph can be guaranteed; on the other hand, our prototypical implementation still has a lot of room for improvement. For example, we currently also re-build nodes for individuals for which only new assertions have been added, although it would be sufficient to simply extend the nodes of the previous deterministic completion graph by the new consequences. For classification, the first two columns analogously show the (incremental) classification time for K and K∓∆, respectively, and the third column represents the percentage of the classes for which satisfiability and subsumption tests were re-calculated. At the moment, the used/modified nodes from the cached completion graph are tracked together for all satisfiability and subsumption tests, and we mark those classes for which nodes have been tracked.
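The coarse-grained tracking just described can be pictured with a small sketch. All class names, node identifiers, and data structures below are our own illustration, not Konclude's internals: each satisfiability or subsumption test records which nodes of the cached completion graph it used, and after an ABox change only classes whose tests touched a re-built node are re-tested.

```python
class IncrementalClassifier:
    """Sketch of class-level dependency tracking on a cached
    completion graph (hypothetical names, not Konclude's API)."""

    def __init__(self):
        # class name -> set of cached completion graph nodes its tests used
        self.used_nodes = {}

    def record_test(self, cls, nodes):
        # called whenever a satisfiability/subsumption test for `cls`
        # re-uses nodes from the cached completion graph
        self.used_nodes.setdefault(cls, set()).update(nodes)

    def classes_to_reclassify(self, modified_nodes):
        # a class must be re-tested iff one of its tests used a node
        # that was re-built by the incremental consistency check
        modified = set(modified_nodes)
        return {c for c, nodes in self.used_nodes.items() if nodes & modified}

ic = IncrementalClassifier()
ic.record_test("Person", {"node:alice", "node:bob"})
ic.record_test("Company", {"node:acme"})
# suppose the incremental consistency check re-built only node:alice
assert ic.classes_to_reclassify({"node:alice"}) == {"Person"}
```

A finer-grained variant could track nodes per test rather than per class, at the cost of more bookkeeping.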
It can be observed that for the ontologies with nominals (e.g., UOBM-1), only a few classes have to be re-classified and, in several cases, re-classification is not required at all. Also for realisation, the first two columns show the (incremental) realisation time for K and K∓∆, respectively, and the last column shows the percentage of the individuals that are potentially affected by the changes and for which the (possible) types have to be re-computed. For this, we separately track, for each individual, the nodes of the cached completion graph that are used/modified by the instance tests. Representative Caching Experiments: We integrated a first prototypical version of the presented representative caching in our reasoning system Konclude, which is, however, not yet compatible with all other integrated features and optimisations. As of now, the integrated representative caching is primarily used for "simple individuals" that do not have too much interaction with other individuals in the ABox. In cases where representative caching could easily cause performance deficits (e.g., through the intensive use of nominals), Konclude caches the relevant parts of such completion graphs with the ordinary technique. Moreover, data property assertions are, at the moment, internally transformed into class assertions and, as a consequence, nodes for individuals with data property assertions can currently not be cached representatively. The first experiments are nevertheless very encouraging. For example, Homo_sapiens is a very large SROIQ ontology from the Oxford ontology library with 244,232 classes, 255 object properties, and 289,236 individuals, for which Konclude requires 10,211 MB in total to check consistency with representative caching, whereas 19,721 MB are required for consistency checking with a fully expanded completion graph.
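The selection between representative and ordinary caching can be pictured roughly as follows. This is a speculative sketch based only on the description above; the grouping criterion (identical concept labels and role patterns), the individual representation, and all names are our own assumptions, not Konclude's implementation:

```python
def partition_for_caching(individuals):
    """Partition individuals into representatively cachable groups and a
    remainder cached with the ordinary technique (simplified sketch)."""
    representatives = {}   # signature -> representative individual
    ordinary = []          # individuals cached with the ordinary technique
    for ind in individuals:
        # exclude individuals with much ABox interaction (e.g. via
        # nominals) and, currently, individuals with data property
        # assertions (internally transformed into class assertions)
        if ind["uses_nominals"] or ind["data_assertions"]:
            ordinary.append(ind["name"])
            continue
        # "simple" individuals with the same concepts and role pattern
        # can share one representatively cached node
        sig = (frozenset(ind["concepts"]), frozenset(ind["roles"]))
        representatives.setdefault(sig, ind["name"])
    return representatives, ordinary

inds = [
    {"name": "a1", "concepts": {"Person"}, "roles": {"knows"},
     "uses_nominals": False, "data_assertions": False},
    {"name": "a2", "concepts": {"Person"}, "roles": {"knows"},
     "uses_nominals": False, "data_assertions": False},
    {"name": "a3", "concepts": {"City"}, "roles": set(),
     "uses_nominals": True, "data_assertions": False},
]
reps, ordinary = partition_for_caching(inds)
assert len(reps) == 1 and ordinary == ["a3"]  # a1 represents a2 as well
```

The memory savings reported for Homo_sapiens come from such sharing: many individuals with identical signatures need only one cached node instead of one node each.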
Note that a large amount (9,875 MB) of the required memory is used for the (unoptimised) internal representation, the data from the preprocessing, and the parsed OWL objects, which are kept in memory to facilitate the handling of added/removed axioms.

6 Conclusions

We have presented a refinement of the completion graph caching technique that improves ABox reasoning also for very expressive Description Logics through a more sophisticated handling of non-deterministic consequences. In addition, we sketched extensions and applications of the caching technique, which allow for supporting nominals in the satisfiability caching of node labels, for reducing the incremental reasoning effort for changing ABoxes, and for handling very large ABoxes by storing partially processed parts of the completion graph in a representative way.

References

1. Baader, F., Calvanese, D., McGuinness, D., Nardi, D., Patel-Schneider, P. (eds.): The Description Logic Handbook: Theory, Implementation, and Applications. Cambridge University Press, second edn. (2007)
2. Dolby, J., Fokoue, A., Kalyanpur, A., Schonberg, E., Srinivas, K.: Scalable highly expressive reasoner (SHER). J. of Web Semantics 7(4), 357–361 (2009)
3. Donini, F.M., Massacci, F.: EXPTIME tableaux for ALC. J. of Artificial Intelligence 124(1), 87–138 (2000)
4. Glimm, B., Horrocks, I., Motik, B., Stoilos, G., Wang, Z.: HermiT: An OWL 2 reasoner. J. of Automated Reasoning 53(3), 1–25 (2014)
5. Glimm, B., Kazakov, Y., Liebig, T., Tran, T.K., Vialard, V.: Abstraction refinement for ontology materialization. In: Proc. 13th Int. Semantic Web Conf. (ISWC'14). LNCS, vol. 8797, pp. 180–195. Springer (2014)
6. Haarslev, V., Möller, R.: On the scalability of description logic instance retrieval. J. of Automated Reasoning 41(2), 99–142 (2008)
7. Haarslev, V., Möller, R., Turhan, A.Y.: Exploiting pseudo models for TBox and ABox reasoning in expressive description logics. In: Proc. 1st Int. Joint Conf. on Automated Reasoning (IJCAR'01). LNCS, vol. 2083, pp. 61–75. Springer (2001)
8. Halaschek-Wiener, C., Hendler, J.: Toward expressive syndication on the web. In: Proc. 16th Int. Conf. on World Wide Web (WWW'07). ACM (2007)
9. Halaschek-Wiener, C., Parsia, B., Sirin, E.: Description logic reasoning with syntactic updates. In: Proc. 4th Confederated Int. Conf. On the Move to Meaningful Internet Systems 2006: CoopIS, DOA, GADA, and ODBASE (OTM'06). LNCS, vol. 4275, pp. 722–737. Springer (2006)
10. Horrocks, I., Kutz, O., Sattler, U.: The even more irresistible SROIQ. In: Proc. 10th Int. Conf. on Principles of Knowledge Representation and Reasoning (KR'06). pp. 57–67. AAAI Press (2006)
11. Horrocks, I., Sattler, U.: A description logic with transitive and inverse roles and role hierarchies. J. of Logic and Computation 9(3), 385–410 (1999)
12. Horrocks, I., Sattler, U.: Decidability of SHIQ with complex role inclusion axioms. Artificial Intelligence 160(1), 79–104 (2004)
13. Horrocks, I., Sattler, U.: A tableau decision procedure for SHOIQ. J. of Automated Reasoning 39(3), 249–276 (2007)
14. Hudek, A.K., Weddell, G.E.: Binary absorption in tableaux-based reasoning for description logics. In: Proc. 19th Int. Workshop on Description Logics (DL'06). vol. 189. CEUR (2006)
15. Kollia, I., Glimm, B.: Optimizing SPARQL query answering over OWL ontologies. J. of Artificial Intelligence Research 48, 253–303 (2013)
16. Simančík, F.: Elimination of complex RIAs without automata. In: Proc. 25th Int. Workshop on Description Logics (DL'12). CEUR Workshop Proceedings, vol. 846. CEUR-WS.org (2012)
17. Sirin, E., Cuenca Grau, B., Parsia, B.: From wine to water: Optimizing description logic reasoning for nominals. In: Proc. 10th Int. Conf. on Principles of Knowledge Representation and Reasoning (KR'06). pp. 90–99. AAAI Press (2006)
18. Steigmiller, A., Glimm, B., Liebig, T.: Optimised absorption for expressive description logics. In: Proc. 27th Int. Workshop on Description Logics (DL'14). vol. 1193. CEUR (2014)
19. Steigmiller, A., Glimm, B., Liebig, T.: Completion graph caching extensions and applications for expressive description logics. Tech. rep., Ulm University, Ulm, Germany (2015), available online at https://www.uni-ulm.de/fileadmin/website_uni_ulm/iui.inst.090/Publikationen/2015/DL2015CGCTR.pdf
20. Steigmiller, A., Liebig, T., Glimm, B.: Konclude: System description. J. of Web Semantics 27(1) (2014)
21. Tsarkov, D., Horrocks, I.: Efficient reasoning with range and domain constraints. In: Proc. 17th Int. Workshop on Description Logics (DL'04). vol. 104. CEUR (2004)
22. Wandelt, S., Möller, R.: Towards ABox modularization of semi-expressive description logics. Applied Ontology 7(2), 133–167 (2012)
23. Wu, J., Hudek, A.K., Toman, D., Weddell, G.E.: Absorption for ABoxes. J. of Automated Reasoning 53(3), 215–243 (2014)