Introduction

Optimised Absorption for Expressive Description Logics

Andreas Steigmiller

Birte Glimm

Thorsten Liebig

liebig@derivo.de 1 0 University of Ulm , Ulm, Germany, <first name>.<last 1 derivo GmbH , Ulm , Germany

The Web Ontology Language (OWL 2) [19] is based on the very expressive Description Logic (DL) SROIQ [3] for which sophisticated algorithms are required to handle (standard) reasoning tasks. In practice, variants of tableau algorithms are often used since they are easily extensible and adaptable. Since such algorithms have a very high worst-case complexity, developing optimisations to nevertheless allow for highly e cient implementations is a long-standing research area in DLs (see, e.g., [4,18]). A very e ective and widely implemented optimisation is absorption, which is a preprocessing step that aims at rewriting general concept inclusion (GCI) axioms such that non-determinism in the tableau algorithm is avoided as much as possible. In this paper, we present an improved variant of a recursive binary absorption algorithm that generalises the well-known techniques of binary absorption [7] and role absorption [17]. The algorithm also allows for absorbing parts of concepts and, as a result, more expressive concept constructors can be handled and non-determinism can often be delayed further. The algorithm has already been introduced in our previous work [12] as an essential pre-requisite for handling nominal schemas [8]. Here we simplify its presentation, prove the correctness, present several extensions for the algorithm, and provide a comparison with other absorption techniques. The algorithm is implemented in the tableau-based reasoner Konclude [15] and is essential for Konclude's e ciency. In fact, many other optimisation techniques in Konclude are based on the presented absorption and they significantly benefit from the reduced or eliminated non-determinism [13,14]. We next introduce the basics of tableau algorithms and absorption. In Section 3, we introduce our recursive absorption algorithm for which we present several extension in Section 4. We discuss related work in Section 5 before we conclude in Section 6.

Introduction

Model construction calculi, such as tableau [ 3,6 ], decide the consistency of a knowledge base K by trying to construct an abstraction of a model for K , a so-called completion graph. A completion graph G is a tuple (V; E; L; ,˙ ), where each node x 2 V (edge hx; yi 2 E) represents one or more (pairs of) individuals. Each node x (edge hx; yi) is labelled with a set of concepts (roles), L(x) (L(hx; yi)), which the (pairs of) individuals represented by x (hx; yi) are instances of. The relation ,˙ records inequalities, which must hold between nodes, e.g., due to at-least cardinality restrictions. The algorithm works by decomposing concepts in the completion graph with a set of expansion rules. For example, if C1 t C2 2 L(v) for some node v, but neither C1 2 L(v) nor C2 2 L(v), then the rule for handling disjunctions non-deterministically adds one of the disjuncts to L(v). Similarly, if 9r:C 2 L(v), but v does not have an r-successor with C in its label, then the algorithm expands the completion graph by adding the required successor node. Unrestricted application of this rule or the rule that handles at-least cardinality restrictions can lead to the introduction of infinitely many new tableau nodes. To nevertheless guarantee termination, a cycle detection technique called (pairwise) blocking [ 5 ] restricts the application of these rules. The rules are repeatedly applied until either the completion graph is fully expanded (no more rules are applicable), in which case the graph can be used to construct a model that witnesses the consistency of K , or an obvious contradiction (called a clash) is discovered (e.g., both C and :C in a node label), proving that the completion graph does not correspond to a model. The input knowledge base K is consistent if the rules can be applied such that they build a fully expanded and clash-free completion graph.

In order to guarantee that each node of the completion graph indeed satisfies all axioms of the TBox, one can “internalise” the TBox into a concept that is added to each node label. For example, if the TBox contains the axioms

A v 9r:(B1 u B2) (1)

A u B1 u 9r:(B1 t B2) v 9s:A (2) the internalisation contains one conjunct for each axiom that is a disjunction with the negated left-hand side of the axiom and the right-hand side: nnf( (:A t 9r:(B1 u B2)) u (:(A u B1 u 9r:(B1 t B2)) t 9s:A) ). Given that > is satisfied at each node, a tableau rule can be specified that adds such an internalised concept, say CI , with an auxiliary axiom of the form > v CI to each node label. Clearly, internalisation introduces a large number of disjunctions in each node label, which possibly require several nondeterministic choices and backtracking if the choice resulted in a clash. 2.1

Absorption

Instead of internalising the TBox axioms, it is more e cient to integrate a rule into the tableau calculus that checks, for each GCI C v D, whether C is satisfied for a node and if this is the case, then D is also added to the node label. For Axiom (1), for example, it is easy to check whether the atomic concept A is satisfied at a node. This lazy unfolding for atomic concepts can be realised by additionally using Rv1 of Table 1 in the tableau algorithm. With this rule, we do not have to internalise axioms of the form A v C.

Checking whether a complex left-hand side of an axiom is satisfied can, however, be non-trivial. For example, it can often not be verified syntactically, whether a node satisfies 9r:(B1 t B2), which would be required for Axiom (2). For example, with Axioms (1) and (2), any instance of A has an r-successor that satisfies B1 u B2 and, therefore, this A instance (semantically) also satisfies 9r:(B1 t B2) (but possibly not B1). In order to nevertheless avoid the internalisation of such GCIs into axioms of the form > v nnf(:C t D) (and the resulting processing of disjunctions), practical reasoning systems use elaborate transformations in a preprocessing step called absorption.

An absorption algorithm rewrites axioms in such a way that internalisation and the processing of disjunctions in the tableau algorithm are avoided as much as possible. To achieve this, the absorption algorithm extracts conditions for an axiom that can easily be checked, and only if the conditions hold for a node in a completion graph, then the remaining (complex) part of the axiom has to be added to the node’s label, while otherwise the axiom is trivially satisfied. For example, consider the internalisation > v :A t :B1 t 8r:(:B1 u :B2) t 9s:A of Axiom (2). We can observe that any node in a tableau that does not have an r-neighbour trivially satisfies 8r:(:B1u:B2) and, hence, the overall disjunction. Furthermore, even if there is an r-successor, this is only problematic, if the successor satisfies B1 or B2. To check this, we can introduce axioms that make sure that a fresh marker concept T1 is added to each node label that contains B1 or B2, which then triggers the propagation of another marker T2 to the r-predecessor to indicate that one of the other disjuncts :A, :B1, or 9s:A has to be satisfied. Since it can also easily be checked whether A or B1 is satisfied, we can use the axioms:

B1 v T1 B2 v T1 T1 v 8r :T2 A u B1 u T2 v 9s:A to express Axiom (2) without any disjunction. We say that :A, :B1, and 8r:(:B1 u:B2) are completely absorbable since they can be moved out of the disjunction, whereas 9s:A is non-absorbable, i.e., it cannot easily be checked when the concept is satisfied and, hence, the concept has to remain on the right-hand side of the axiom.

Note that the last axiom cannot be handled by the lazy unfolding rule Rv1 since its left-hand side consists of a conjunction of atomic concepts. Instead of adding a rule that processes axioms of the form A1 u: : :uAn v C with Ai atomic concepts, an e cient way of handling such axioms is binary absorption [ 7 ]. Binary absorption splits such axioms such that A1 and A2 imply a fresh concept T1 (A1 u A2 v T1), then T1 is combined with the next condition A3 and so on, until Tn 2 u An v Tn 1. The concept Tn 1 then implies the concept C. Hence, rule Rv2 in Table 1 is su cient for handling such binary axioms. 3

A Recursive Algorithm for Partial Absorption

We next present an improved variant of a recursive binary absorption algorithm. In comparison to other absorption techniques, the presented approach is often able to further delay non-determinism. The recursion also facilitates further optimisations such as backward chaining for the handling of nominal schemas [ 12 ].

Algorithm 1 absorbTBox

Input: T : a TBox, i.e., a set of GCIs C v D Output: T 0: a new TBox with absorbed axioms of T 1: T 0 ; 2: for all C v D 2 T do 3: S fA j A = absorb(C0; T 0); C0 2 PA(nnf(:C t D))g 4: A join(S ; T 0) 5: fD1; : : : ; Dmg notCA(nnf(:C t D)) 6: T 0 T 0 [ fA v D1 t : : : t Dmg 7: end for

Due to the recursion, the presented algorithm works quite di erently compared to the original binary absorption [ 7 ]. We exploit this for the integration of further improvements. The presented algorithm completely handles an axiom C v D in one step, i.e., the axiom is first internalised into > v :C tD and then the absorbable (sub-)concepts of :C t D are recursively handled. Each recursion step creates (simple) inclusion axioms for the handled (sub-)concepts such that corresponding marker concepts are implied, and the final marker concept (Tn 1) is used to imply the non-absorbable part of the axiom. In contrast, the original binary absorption algorithm introduces new GCIs during the absorption of an axiom, which are then (possibly) further absorbed in separate steps.

This complete handling of axioms allows for partially absorbing parts of axioms without creating additional disjunctions. Roughly speaking, this partial absorption also uses parts of non-absorbable concepts as conditions to delay the processing of the nonabsorbable part of the axiom. For example, the axiom > 5 r:A v D is, without absorption, handled as > v 6 4 r:A t D. None of the disjuncts can be absorbed completely, but it is nevertheless possible to delay the processing of the disjunction until there is an r-neighbour with the concept A in its label. Partial absorption is able to extract these conditions, and rewrites the axiom such that the disjunction is propagated from a node with A in its label to all r -neighbours (if there are any), which results in A v 8r :T1 and T1 v 6 4 r:A t D. Hence, partial absorption can significantly improve the original binary absorption for more expressive Description Logics.

We now make important terms for our partial absorption algorithm clearer. Definition 1. A concept C is completely absorbable if (i) C = :fag for a a nominal, (ii) C = :A for A an atomic concept, (iii) C = C1 t : : : tCn or C = C1 u : : : uCn and, for 1 i n, Ci is completely absorbable, or (iv) C = 8r:D and D is completely absorbable. Otherwise C is not completely absorbable. A concept C is partially absorbable if C is completely absorbable or (i) C = 8r:D, (ii) C = 6 n r:D, (iii) C = C1 t : : : t Cn and, for some i, 1 i n, Ci is partially absorbable, or (iv) C = C1 u : : : u Cn and, for each i, 1 i n, Ci is partially absorbable. For a disjunction C = C1 t : : : t Cn, we set notCA(C) :=fCi j 1 i n; Ci is not completely absorbableg

PA(C) :=fCi j 1 i

n; Ci is partially absorbableg: Note that if a concept C is not partially absorbable, then it is also not completely absorbable. In the following, we describe the absorption algorithm.

Algorithm 1 (absorbTBox) takes as input a TBox T and produces a TBox T 0 such that each axiom in T 0 has the form A v C, A1 u A2 v C, > v C, or fag v C, i.e., T 0 can be e ciently handled by the rules of Table 1 or as concept assertion. The method absorbTBox considers each axiom C v D 2 T as disjunction in negation normal form. Each partially absorbable disjunct from PA(nnf(:C t D)) is passed to the function absorb (Algorithm 2), which (recursively) performs the (binary) absorption and returns a single atomic concept that represents the disjunct. The concepts are then joined and used to imply the non-absorbable part of the axiom in Line 6. Please note that if all disjuncts of the axiom can be completely absorbed, then an empty disjunction is created in Line 6, which we consider as ?.

The method absorb is recursive with the base case of negated concept names (C = :A) or nominals (C = :fag). For the former, the algorithm directly returns the atomic concept A. For the latter, the algorithm returns the fresh concept T , which is axiomatised by fag v T in the TBox constructed during the absorption. For the recursion, partially absorbable subconcepts of C are first absorbed, which results in an atomic concept for each such subconcept. The resulting atomic concepts are then joined by Algorithm 3 (join), which implements the binary absorption introduced in Section 3 and which returns just the final concept (Tn 1).

Partial absorption also significantly delays non-determinism for axioms that use more expressive concept constructors. Consider, for example, a TBox T with 9r:(A u 8r:B1) v B2 (3) as the one and only axiom of the form C v D. Without absorption, it would be necessary to internalise this axiom into the form > v nnf(:C t D), which would result in > v 8r:(:A t 9r::B1) t B2. Obviously, this would produce a significant amount of non-determinism. Clearly, none of the parts of (3) can be absorbed completely, i.e., notCA(nnf(:C t D)) = f8r:(:At9r::B1); B2g. The concept 8r:(:At9r::B1), however, is partially absorbable, i.e., it is in PA(nnf(:C t D)). This allows for delaying the processing of the disjunctions until there is an r-neighbour with the concept A in its label. In order to capture this, the presented absorption algorithm rewrites the axiom such that the disjunction is propagated from a node with A in its label to all r -neighbours as follows: In Line 3 of absorbTBox, we call absorb(8r:(:At9r::B1); T 0) with T 0 = ;. The algorithm proceeds with Lines 7–10 and recursively calls itself for :A and T 0, which returns A, while 9r::B1 is not absorbable. The following call of join(fAg; T 0) directly returns A (no binarisation is needed). Finally, in Line 10, T 0 is extended with A v 8r :T for the fresh concept T and T is returned by the algorithm. Back in Algorithm 1, the call of join(fT g; T 0) again directly returns T . After extending T 0 in Line 6 of Algorithm 1, we have T 0 = fA v 8r :T; T v 8r:(:A t 9r::B1) t B2g.

It is clear that an existential restriction of the form 9r:C cannot be absorbed directly since it does not provide conditions that could be used to trigger other concepts. In contrast, the absorption can be extended to several concept constructors of more expressive DLs (see also Section 4). For example, the :9r:Self concept of SROIQ can be partially absorbed with > v 8r :A and A v :9r:Self.

Note that, in principle, it is not necessary to always create new axioms with fresh atomic concepts for the absorption of identical concepts. In practice, the binary absorption axioms as well as the axioms for absorbing specific concepts can be reused. Lemma 1. Let T be a TBox and T 0 the TBox produced by Algorithm 1 when given T as input, then T 0 is a model-conservative extension of T , i.e., (a) every model of T can be extended to a model of T 0 by adding suitable interpretations for fresh symbols in T 0 and (b) every model of T 0 is a model of T (T 0 j= T ).

A complete proof is available in an online technical report [ 16 ].

Proof (sketch). Note that by showing T 0 j= T , we do not need to require T T 0 as usual for (model) conservative extensions.

Let I be some model of T , C v D 2 T , E a completely (partially) absorbable disjunction that is an absorbed subconcept of nnf(:C t D), and TE the concept returned by absorb(E; T 0). For (a), we construct an interpretation J for T 0 that coincides with I apart from the interpretations of concepts that are fresh in T 0. To show J j= T 0, we establish (1) (:TE)J = EJ ((:TE)J EJ ). For showing (b), we first show (2) T 0 j= :TE v E.

The base case, where each disjunct of a subconcept E = E1 t: : :t En of nnf(:C t D) is either non-absorbable or of the form :A or :fag, is straightforward. For the induction, E might also contain other absorbable disjuncts. We just sketch the case of Ei = 6 n r:E0, 1 i n, where absorb(Ei; T 0) returns TEi and T:E0 v 8inv(r):TEi 2 T 0, which is equivalent to 9r:T:E0 v TEi . We set TEJi = (9r:T:E0 )J , so J j= T:E0 v 8inv(r):TEi . To show (1): we have (:TEi )J = (8r:(:T:E0 ))J , and, by induction, (8r:(:T:E0 ))J (6 n r:E0)J = EiJ . For (2): since T 0 j= 9r:T:E0 v TEi , T 0 j= :TEi v 8r:(:T:E0 ). By induction, T j 0 = :TEi v 6 n r:E0 = Ei as required.

0 = :T:E0 v E0, hence, T j

In absorbTBox, we eventually have a concept TE for the disjuncts E1; : : : ; En 2 PA(nnf(:C t D)) and TE v D1 t : : : t Dm 2 T 0. By induction, T 0 j= TE1 u : : : u TEn v TE and T 0 j= :TEi v Ei for 1 i n. Hence, T 0 j= > v E1 t : : : t En t D1 t : : : t Dm and, hence, T 0 j= > v nnf(:C t D). To show that J j= TE v D1 t : : : t Dm we show J j= > v :TE1 t : : : t :TEn t D1 t : : : t Dm since J j= TE1 u : : : u TEn v TE. W.l.o.g., let E1; : : : ; E` be the completely and E`+1; : : : ; En the partially, but not completely absorbable disjuncts of E. By induction, (:TEi )J = EiJ ((:TEi )J EiJ ), for each 1 i ` (` < i n). Since I j= T , I j= > v E1 t : : : t E` t D1 t : : : t Dm and, hence, J j= > v :TE1 t : : : t :TE` t D1 t : : : t Dm. Since, for ` < i n, Ei = D j for some 1 j m, J j= > v :TE1 t : : : t :TEn t D1 t : : : t Dm as required. 4

Extensions to the Absorption Algorithm

In this section, we sketch several extensions of the above presented absorption algorithm, which improve the handling of completely defined concepts, optimise the processing of disjunctions, and allow for absorbing axioms that contain data ranges. 4.1

Handling of Completely Defined Concepts

In the previous sections, we considered a TBox to be a set of GCIs of the form C v D. This is w.l.o.g. since C D can be rewritten to C v D and D v C. For definitions of the form A C with A an atomic concept (also called a completely defined concept), it is often ine cient to decompose the axiom A C into A v C and C v A, because nnf(:C) might not be completely absorbable and then the disjunction :C t A has to be processed for all nodes in the completion graph. In order to determine the satisfiability of a concept, it is for many nodes not relevant whether A or :A is in their label as long as it can be ensured that one of both alternatives is not causing a clash. Obviously, if the atomic concept A is only defined once in the knowledge base and A is acyclic,3 then C or :C and, therefore, also A or :A must be satisfiable and, only in this case, 3 Acyclicity is defined as follows: A1 directly uses A2 w.r.t. a TBox T if A1 C 2 T or A1 v C 2 T and A2 occurs in C; uses is the transitive closure of “directly uses”. Then, a concept D is acyclic w.r.t. a TBox T if it contains no concept A that uses itself. 1: for all A C 2 T do 2: TA absorb(PA(nnf(:C)); T 0) 3: if nnf(:C) is completely absorbable then 4: T 0 T 0 [ fA v C; TA v A; TA v A+g 5: else if jfC0 j A v C0 2 T or A C0 2 T gj > 1 then 6: T 0 T 0 [ fA v C; TA v nnf(:C t A); TA v A+g 7: else 8: T 0 T 0 [ fA C; TA v A+g 9: end if 10: end for the axiom A C can directly be handled by an additional rule, which unfolds A to C and :A to :C. To capture this, we extend the algorithm absorbTBox (cf. Figure 1) such that axioms of the from A C are only eliminated if there are several definitions of A. In such cases, A v C and C v A must be explicitly represented in T 0 such that possible interactions between these several definitions are handled. In addition, a concept of the form :A is now only completely (partially) absorbable if A is acyclic and, for each A C 2 T , nnf(:C) is completely (partially) absorbable. Note that the definition of completely and partially absorbable concepts now refers to the TBox. To correctly handle the completely defined concepts in the recursive absorption algorithm, absorb also has to be adapted for concepts of the form :A. The algorithm still returns A if there is no axiom of the form A C 2 T . Otherwise, for each A C, all absorbable disjuncts of nnf(:C) are recursively absorbed, the resulting atomic concepts are joined into a fresh concept TC, and TC v A+ is added to T 0 for a so-called candidate concept A+ for A. After processing all definitions for A, the candidate concept A+ is returned. Note that if nnf(:C) is not absorbable, then the absorption returns > and > v A+ is added to T 0. An occurrence of A+ in the label of a node signalises that the node might be an instance of the concept A, i.e., if A+ is not in the node label of a clash-free and fully expanded completion graph, then this node is also not an instance of A. In particular, if A+ is not in the label of a node, then we know that :A can be safely added since one disjunct of nnf(:C) is trivially satisfied. Hence, the generated candidate concept A+ can be used as condition for further absorption if :A occurs and, therefore, it can be used to further delay branching.

Such candidate concepts are also very useful for identifying completely defined concepts as possible subsumers for classification (cf. [ 2 ]) without forcing the decision between A and :A for the nodes in the completion graph. As a result, these candidate concepts can significantly improve the classification performance, especially in combination with model merging techniques [ 13,14 ], and, therefore, it is almost always useful to generate these candidate concepts even if they are not required for further absorption. The presented absorption algorithm can also be used to optimise the handling of ordinary disjunctions. For example, the axiom A v 9r:(:B t A) cannot be absorbed and contains the disjunction :B t A. Obviously, we can replace the disjunction with a fresh marker concept T by adding the additional axiom T v :B t A to the TBox, which can be absorbed to T u B v A. Consequently, we have the axioms A v 9r:T and T u B v A, which can be processed by the tableau algorithm without causing any non-determinism. Another advantage of the presented absorption algorithm is the straightforward extensibility to handle datatypes such as strings, integers, or decimals, which represent sets of concrete data values. In OWL, one can also use datatype restrictions to build custom datatypes, e.g., the datatype restriction real[int ^ >5] restricts OWL’s datatype owl:real, which we abbreviate to real, with the facets int and >5 to real numbers that are integers and greater than 5. We call int ^ >5 a facet expression. One can further use Boolean constructors to build data ranges, e.g., string _ real[int ^ >5] is a data range that contains all data values that are strings or reals as described above. Concepts refer to data ranges via concrete roles, e.g., the formula 9q:real[int ^ >5], for q a concrete role, describes individuals that have some value of the data range real[int ^ >5] as q-successor.

Elements that represent data values are represented in the tableau algorithm by concrete nodes. For example, for a node v with 9q:real[int ^ >5] 2 L(v), the tableau algorithm adds a new concrete node z and an edge hv; zi with q 2 L(hv; zi) and real[int ^ >5] 2 L(z). Typically, the tableau algorithm is then combined with a datatype checker that checks the consistency of datatype constraints [ 10 ]. Such constraints are represented as a set of assertions of the form d(z), :d(z), and z1 0 z2, where z, z1, and z2 are concrete nodes and d is an explicit enumeration of data values fc1; : : : ; cng or an expression of the form dt['] for dt a datatype and ' a facet expression. Given a set of such assertions, the datatype checker determines whether each concrete node can be assigned a data value in a way that satisfies all of the given assertions.

It is well-known that facets can be used to encode disjunctions [ 9 ]. For example, A v 8r:B1 t B2 can be expressed by the axioms

A v 9q:real[int ^ >5] (4) 9q:real[ 10 ] v 8r:B1 (5) 9q:real[>10] v B2; (6) where q is a concrete role. For the data range real[int ^ >5], it cannot directly be determined whether real[ 10 ] or its complement is satisfied. This has to be clarified in a case-by-case analysis, where once the integer interval int(5; 10] and once the interval int(10; 1] is considered. Hence, the datatype checker is forced to make nondeterministic decisions. This also means that it is usually not possible to (completely) absorb data ranges since it is not easily possible to determine if a certain data range is satisfied or not. Of course, it is possible to absorb other parts of the axioms, e.g., Axiom (5) can be absorbed to

> v 8r :T T v 8q:real[ 10 ] t 8r:B1; where real[ 10 ] expresses the negation of real[ 10 ]. However, without any further optimisations, Axiom (6) introduces non-determinism for every node as it is handled as > v 8q:real[>10] t B2:

For partial absorption it is only required that the tableau algorithm or the datatype checker can identify those concepts and data ranges that might be satisfied. This is obviously possible for many data ranges and, therefore, they can be absorbed partially. For this, the datatype checker has to be extended such that it identifies data range candidates, i.e., data ranges that are possibly satisfied. Let us assume that real[ 10 ]+ is the data range candidate for real[ 10 ], i.e., if a concrete node in the completion graph possibly represents numerical data values that are less or equal to 10, then real[ 10 ]+ is satisfied and this can be identified e ciently by the datatype checker. Consequently, we can use such data range candidates in lazy unfolding rules, e.g., real[ 10 ]+ v T is used to trigger the marker concept T if real[ 10 ] is possibly satisfied. Note that this adds ordinary concepts to the labels of concrete nodes, which is, however, not problematic since these concepts are only used internally as auxiliary constructs. With the data range candidates, we can improve the absorption of Axiom (5) to

real[ 10 ]+ v T1 T1 v 8q :T2 > v 8r :T3 T2 u T3 v 8q:real[ 10 ] t 8r:B1 and we can absorb Axiom (6) to

real[>10]+ v T4 T4 v 8q :T5 T5 v 8q:real[>10] t B2: Now, the processing of the disjunctions is delayed until there are concrete nodes that possibly represent the absorbed data ranges.

The required extensions for the absorption algorithm and the datatype checker are straightforward. For the absorption, the definition of partial absorbability has to be extended to the supported data ranges and the concepts related to datatypes. Moreover, the absorb function has to be adjusted such that the corresponding axioms are created. The datatype checker has to be extended to identify those data range candidates that might be satisfied. Usually, the datatype checker manages the data intervals that are still possible for a concrete node in a sorted array [ 10 ]. By counting and iterating through the possible values, it is easily possible to identify those data range candidates that are satisfied. For example, the possible values of a concrete node z are restricted to the integer interval int(5; 1] if the data range real[int ^ >5] is asserted to z. If z is not distinct to other concrete nodes (i.e., there is no assertion z 0 z0), then the datatype checker can pick the first possible data value of 6 to satisfy the constraints for z. Hence, the datatype checker has to trigger only those data range candidates that are satisfied for this first data value (i.e., real[ 10 ]+). If z is distinct to 5 other concrete nodes, then real[>10]+ is also triggered (although it is not necessarily the case that indeed 6 data values are required to assign all (distinct) concrete nodes a data value). 5

Comparison with Related Absorption Techniques

In comparison to other absorption techniques, partial absorption often delays nondeterminism further. Consider again Axiom (3) from Section 3, which is rewritten by the original binary absorption algorithm into the following axioms:4

A v 9r::B1 t 8r :T

T v B2: In contrast, partial absorption creates the following axioms (as shown in Section 3): A v 8r :T

T v 8r:(:A t 9r::B1) t B2: 4 The binary absorption algorithm presented in [ 7 ] has a looping bug for concepts of the form 8r:C, which can, however, easily be repaired. We simply ignored the repeated absorption of concepts that are introduced by the algorithm.

Hence, with the original binary absorption technique, the disjunction has already to be processed when A is added to the label of a node, whereas partial absorption delays the processing until there is also an r-neighbour. It is also well-known that clausification for the hypertableau algorithm reduces many cases of non-determinism [ 11 ]. For Axiom (3), however, clausification creates the following DL-clauses:

r(x; y) ! B2(x) _ T (y) T (x) ^ A(x) ! 9r::B1(x) Hence, the hypertableau algorithm has to choose between B2 and T for every node with an r-neighbour, whereas the partial absorption technique allows for further delaying the processing of disjunctions until the node also has the concept A in its label. Besides that, the presented partial absorption technique shares other interesting features with clausification of the hypertableau algorithm. In particular, both use conditions of more expressive concept constructors to trigger the processing of disjunctions. For instance, the axiom > 2 r:A v B is automatically satisfied if there is no r-successor with the concept A in its label. This is utilised by partial absorption, which generates

A v 8r :T T v 6 1 r:A t B and also by clausification of the hypertableau calculus, which generates: r(x; y1) ^ A(y1) ^ r(x; y2) ^ A(y2) ! B(x) _ y1 y2 Note that clausification eliminates at-most cardinality restrictions, whereas partial absorption has to use the concept 6 1 r:A since > 2 r:A is not completely absorbable. Consequently, a choose-rule is not required in the hypertableau algorithm, which possibly further reduces non-determinism in comparison to a standard tableau algorithm that has to choose between :A and A for all r-neighbours of nodes with T in their label.

By using a recursive algorithm that absorbs GCIs completely in one step, we gain a further advantage in comparison to traditional absorption algorithms. Traditional algorithms absorb parts of axioms in separate steps, e.g., by introducing new GCIs that are handled separately, which often creates additional disjunctions, especially if the axioms have more complex (sub-)concepts. For instance, by simplifying the axiom > 5 r:(> 3 s:A) v D into > 5 r:T v D and T > 3 s:A, for T a fresh atomic concept that is used to split the original axiom, the GCI T > 3 s:A cannot further be absorbed without creating additional disjunctions. In contrast, our recursive absorption algorithm creates the axioms A v 8s :T1, T1 v 8r :T2, and T2 v 6 4 r:(> 3 s:A) t D, which delays the processing of the disjunction significantly (or avoids it completely if there is no node in the completion graph for which T2 is added to its label). 6

Conclusions

In this paper, we have presented the partial absorption algorithm used in the reasoning system Konclude together with several extensions. The candidate concepts can, for example, be combined with model merging techniques to reduce Konclude’s classification time for several thousand ontologies by 60 % on average [ 13,14 ], which allows Konclude to often outperform other state-of-the-art reasoners. The presented absorption techniques are also essential for further optimisations such as an e cient handling of nominal schemas [ 12 ] or the combination of tableau with saturation-based procedures [ 13,14 ], where the reduced non-determinism is crucial.

1. Baader , F. , Calvanese , D. , McGuinness , D. , Nardi , D. , Patel-Schneider , P . (eds.): The Description Logic Handbook: Theory, Implementation, and Applications . Cambridge University Press, second edn. ( 2007 )

2. Glimm , B. , Horrocks , I. , Motik , B. , Shearer , R. , Stoilos , G.: A novel approach to ontology classification . J. of Web Semantics: Science, Services and Agents on the World Wide Web 14 , 84 - 101 ( July 2012 )

3. Horrocks , I. , Kutz , O. , Sattler , U. : The even more irresistible SROIQ . In: Proc. 10th Int. Conf. on Principles of Knowledge Representation and Reasoning (KR'06) . pp. 57 - 67 . AAAI Press ( 2006 )

4. Horrocks , I. , Patel-Schneider , P.F. : Optimizing description logic subsumption . J. of Logic and Computation 9 ( 3 ), 267 - 293 ( 1999 )

5. Horrocks , I. , Sattler , U. : A description logic with transitive and inverse roles and role hierarchies . J. of of Logic and Computation 9 ( 3 ), 385 - 410 ( 1999 )

6. Horrocks , I. , Sattler , U. : A tableau decision procedure for SH OIQ . J. of Automated Resoning 39 ( 3 ), 249 - 276 ( 2007 )

7. Hudek , A.K. , Weddell , G.E. : Binary absorption in tableaux-based reasoning for description logics . In: Proc. 19th Int. Workshop on Description Logics (DL'06) . vol. 189 . CEUR ( 2006 )

8. Krötzsch , M. , Maier , F. , Krisnadhi , A. , Hitzler , P.: A better uncle for OWL: nominal schemas for integrating rules and ontologies . In: Proc. 20th Int. Conf. on World Wide Web (WWW'11) . pp. 645 - 654 . ACM ( 2011 )

9. Magka , D. , Kazakov , Y. , Horrocks , I. : Tractable extensions of the description logic EL with numerical datatypes . J. of Automated Reasoning 47 ( 4 ), 427 - 450 ( 2011 )

10. Motik , B. , Horrocks , I. : OWL datatypes: Design and implementation . In: Proc. 7th Int. Semantic Web Conf. (ISWC'08) . LNCS, vol. 5318 , pp. 307 - 322 . Springer ( 2008 )

11. Motik , B. , Shearer , R. , Horrocks , I. : Hypertableau reasoning for description logics . J. of Artificial Intelligence Research 36 , 165 - 228 ( 2009 )

12. Steigmiller , A. , Glimm , B. , Liebig , T. : Nominal schema absorption . In: Proc. 23rd Int. Joint Conf. on Artificial Intelligence (IJCAI'13) . pp. 1104 - 1110 . AAAI Press/The MIT Press ( 2013 )

13. Steigmiller , A. , Glimm , B. , Liebig , T. : Coupling tableau algorithms for expressive description logics with completion-based saturation procedures . In: Proc. 7th Int. Joint Conf. on Automated Reasoning (IJCAR'14) . LNCS, Springer ( 2014 ), accepted

14. Steigmiller , A. , Glimm , B. , Liebig , T. : Coupling tableau algorithms for the DL SROIQ with completion-based saturation procedures . Tech. Rep. UIB-2014-02 , University of Ulm, Ulm, Germany ( 2014 ), available online at http://www.uni-ulm.de/fileadmin/website_ uni_ulm/iui/Ulmer_Informatik_Berichte/2014/UIB-2014-02.pdf

15. Steigmiller , A. , Liebig , T. , Glimm , B. : Konclude: system description . J. of Web Semantics ( 2014 ), accepted

16. Steigmiller , A. , Liebig , T. , Glimm , B. : Optimised absorption for expressive description logics . Tech. rep. , Ulm University, Ulm, Germany ( 2014 ), available online at https://www.uni-ulm.de/fileadmin/website_uni_ulm/iui.inst.090/ Publikationen/2014/StGL14c.pdf

17. Tsarkov , D. , Horrocks , I.: E cient reasoning with range and domain constraints . In: Proc. 17th Int. Workshop on Description Logics (DL'04) . vol. 104 . CEUR ( 2004 )

18. Tsarkov , D. , Horrocks , I. , Patel-Schneider , P.F. : Optimizing terminological reasoning for expressive description logics . J. of Automated Reasoning 39 , 277 - 316 ( 2007 )

19. W3C OWL Working Group: OWL 2 Web Ontology Language: Document Overview . W3C Recommendation (27 October 2009 ), available at http://www.w3.org/TR/ owl2-overview/