=Paper=
{{Paper
|id=Vol-1193/paper_49
|storemode=property
|title=Optimised Absorption for Expressive Description Logics
|pdfUrl=https://ceur-ws.org/Vol-1193/paper_49.pdf
|volume=Vol-1193
|dblpUrl=https://dblp.org/rec/conf/dlog/SteigmillerGL14
}}
==Optimised Absorption for Expressive Description Logics==
<pdf width="1500px">https://ceur-ws.org/Vol-1193/paper_49.pdf</pdf>
<pre>
Optimised Absorption for Expressive Description Logics

               Andreas Steigmiller1 , Birte Glimm1 , and Thorsten Liebig2
        1
            University of Ulm, Ulm, Germany, <first name>.<last name>@uni-ulm.de
                       2
                         derivo GmbH, Ulm, Germany, liebig@derivo.de


1   Introduction

The Web Ontology Language (OWL 2) [19] is based on the very expressive Descrip-
tion Logic (DL) SROIQ [3] for which sophisticated algorithms are required to handle
(standard) reasoning tasks. In practice, variants of tableau algorithms are often used
since they are easily extensible and adaptable. Since such algorithms have a very high
worst-case complexity, developing optimisations to nevertheless allow for highly effi-
cient implementations is a long-standing research area in DLs (see, e.g., [4,18]).
    A very effective and widely implemented optimisation is absorption, which is a pre-
processing step that aims at rewriting general concept inclusion (GCI) axioms such that
non-determinism in the tableau algorithm is avoided as much as possible. In this paper,
we present an improved variant of a recursive binary absorption algorithm that gen-
eralises the well-known techniques of binary absorption [7] and role absorption [17].
The algorithm also allows for absorbing parts of concepts and, as a result, more expres-
sive concept constructors can be handled and non-determinism can often be delayed
further. The algorithm has already been introduced in our previous work [12] as an
essential pre-requisite for handling nominal schemas [8]. Here we simplify its presen-
tation, prove the correctness, present several extensions for the algorithm, and provide
a comparison with other absorption techniques. The algorithm is implemented in the
tableau-based reasoner Konclude [15] and is essential for Konclude’s efficiency. In fact,
many other optimisation techniques in Konclude are based on the presented absorption
and they significantly benefit from the reduced or eliminated non-determinism [13,14].
    We next introduce the basics of tableau algorithms and absorption. In Section 3, we
introduce our recursive absorption algorithm for which we present several extension in
Section 4. We discuss related work in Section 5 before we conclude in Section 6.


2   Preliminaries

For brevity, we do not introduce DLs (see, e.g., [1,3]) and we only give a short intro-
duction to tableau algorithms (for details see, e.g., [3]). Throughout the paper, we use
(possibly with subscripts) C, D for (possibly complex) concepts, A, B for atomic con-
cepts, T for a fresh atomic concept, r for a role, and S for a set of atomic concepts. For
simplicity, we also consider > as an atomic concept. A SROIQ knowledge base K is
the union of a TBox T consisting of GCIs of the form C v D, a role hierarchy R, and
an ABox A. We write nnf(C) to denote the negation normal form of C. For r a role, we
set inv(r) := r− and inv(r− ) := r.
                     Table 1. Lazy unfolding rules for tableau algorithms

Rv1 if A ∈ L(v), A v C ∈ K, C < L(v), and v is not indirectly blocked
    then L(v) = L(v) ∪ {C}
Rv2 if {A1 , A2 } ⊆ L(v), (A1 u A2 ) v C ∈ K, C < L(v), and v is not indirectly blocked
    then L(v) = L(v) ∪ {C}


     Model construction calculi, such as tableau [3,6], decide the consistency of a knowl-
edge base K by trying to construct an abstraction of a model for K, a so-called comple-
tion graph. A completion graph G is a tuple (V, E, L, ,̇), where each node x ∈ V (edge
hx, yi ∈ E) represents one or more (pairs of) individuals. Each node x (edge hx, yi) is
labelled with a set of concepts (roles), L(x) (L(hx, yi)), which the (pairs of) individu-
als represented by x (hx, yi) are instances of. The relation ,̇ records inequalities, which
must hold between nodes, e.g., due to at-least cardinality restrictions. The algorithm
works by decomposing concepts in the completion graph with a set of expansion rules.
For example, if C1 t C2 ∈ L(v) for some node v, but neither C1 ∈ L(v) nor C2 ∈ L(v),
then the rule for handling disjunctions non-deterministically adds one of the disjuncts
to L(v). Similarly, if ∃r.C ∈ L(v), but v does not have an r-successor with C in its la-
bel, then the algorithm expands the completion graph by adding the required successor
node. Unrestricted application of this rule or the rule that handles at-least cardinality
restrictions can lead to the introduction of infinitely many new tableau nodes. To nev-
ertheless guarantee termination, a cycle detection technique called (pairwise) blocking
[5] restricts the application of these rules. The rules are repeatedly applied until either
the completion graph is fully expanded (no more rules are applicable), in which case
the graph can be used to construct a model that witnesses the consistency of K, or an
obvious contradiction (called a clash) is discovered (e.g., both C and ¬C in a node
label), proving that the completion graph does not correspond to a model. The input
knowledge base K is consistent if the rules can be applied such that they build a fully
expanded and clash-free completion graph.
     In order to guarantee that each node of the completion graph indeed satisfies all
axioms of the TBox, one can “internalise” the TBox into a concept that is added to each
node label. For example, if the TBox contains the axioms
              A v ∃r.(B1 u B2 )             (1)        A u B1 u ∃r.(B1 t B2 ) v ∃s.A      (2)
the internalisation contains one conjunct for each axiom that is a disjunction with the
negated left-hand side of the axiom and the right-hand side: nnf( (¬A t ∃r.(B1 u B2 )) u
(¬(A u B1 u ∃r.(B1 t B2 )) t ∃s.A) ). Given that > is satisfied at each node, a tableau
rule can be specified that adds such an internalised concept, say C I , with an auxiliary
axiom of the form > v C I to each node label. Clearly, internalisation introduces a
large number of disjunctions in each node label, which possibly require several non-
deterministic choices and backtracking if the choice resulted in a clash.

2.1   Absorption
Instead of internalising the TBox axioms, it is more efficient to integrate a rule into the
tableau calculus that checks, for each GCI C v D, whether C is satisfied for a node and
if this is the case, then D is also added to the node label. For Axiom (1), for example, it
is easy to check whether the atomic concept A is satisfied at a node. This lazy unfolding
for atomic concepts can be realised by additionally using Rv1 of Table 1 in the tableau
algorithm. With this rule, we do not have to internalise axioms of the form A v C.
     Checking whether a complex left-hand side of an axiom is satisfied can, however,
be non-trivial. For example, it can often not be verified syntactically, whether a node
satisfies ∃r.(B1 t B2 ), which would be required for Axiom (2). For example, with Ax-
ioms (1) and (2), any instance of A has an r-successor that satisfies B1 u B2 and, there-
fore, this A instance (semantically) also satisfies ∃r.(B1 t B2 ) (but possibly not B1 ). In
order to nevertheless avoid the internalisation of such GCIs into axioms of the form
> v nnf(¬C t D) (and the resulting processing of disjunctions), practical reasoning
systems use elaborate transformations in a preprocessing step called absorption.
     An absorption algorithm rewrites axioms in such a way that internalisation and the
processing of disjunctions in the tableau algorithm are avoided as much as possible.
To achieve this, the absorption algorithm extracts conditions for an axiom that can
easily be checked, and only if the conditions hold for a node in a completion graph,
then the remaining (complex) part of the axiom has to be added to the node’s label,
while otherwise the axiom is trivially satisfied. For example, consider the internalisa-
tion > v ¬A t ¬B1 t ∀r.(¬B1 u ¬B2 ) t ∃s.A of Axiom (2). We can observe that any
node in a tableau that does not have an r-neighbour trivially satisfies ∀r.(¬B1 u¬B2 ) and,
hence, the overall disjunction. Furthermore, even if there is an r-successor, this is only
problematic, if the successor satisfies B1 or B2 . To check this, we can introduce axioms
that make sure that a fresh marker concept T 1 is added to each node label that contains
B1 or B2 , which then triggers the propagation of another marker T 2 to the r-predecessor
to indicate that one of the other disjuncts ¬A, ¬B1 , or ∃s.A has to be satisfied. Since it
can also easily be checked whether A or B1 is satisfied, we can use the axioms:
          B1 v T 1        B2 v T 1         T 1 v ∀r− .T 2      A u B1 u T 2 v ∃s.A
to express Axiom (2) without any disjunction. We say that ¬A, ¬B1 , and ∀r.(¬B1 u¬B2 )
are completely absorbable since they can be moved out of the disjunction, whereas ∃s.A
is non-absorbable, i.e., it cannot easily be checked when the concept is satisfied and,
hence, the concept has to remain on the right-hand side of the axiom.
     Note that the last axiom cannot be handled by the lazy unfolding rule Rv1 since its
left-hand side consists of a conjunction of atomic concepts. Instead of adding a rule that
processes axioms of the form A1 u. . .uAn v C with Ai atomic concepts, an efficient way
of handling such axioms is binary absorption [7]. Binary absorption splits such axioms
such that A1 and A2 imply a fresh concept T 1 (A1 u A2 v T 1 ), then T 1 is combined with
the next condition A3 and so on, until T n−2 u An v T n−1 . The concept T n−1 then implies
the concept C. Hence, rule Rv2 in Table 1 is sufficient for handling such binary axioms.


3   A Recursive Algorithm for Partial Absorption
We next present an improved variant of a recursive binary absorption algorithm. In
comparison to other absorption techniques, the presented approach is often able to fur-
ther delay non-determinism. The recursion also facilitates further optimisations such as
backward chaining for the handling of nominal schemas [12].
Algorithm 1 absorbTBox
Input: T : a TBox, i.e., a set of GCIs C v D
Output: T 0 : a new TBox with absorbed axioms of T
 1: T 0 ← ∅
 2: for all C v D ∈ T do
 3:     S ← {A | A = absorb(C 0 , T 0 ), C 0 ∈ PA(nnf(¬C t D))}
 4:     A ← join(S , T 0 )
 5:     {D1 , . . . , Dm } ← notCA(nnf(¬C t D))
 6:     T 0 ← T 0 ∪ {A v D1 t . . . t Dm }
 7: end for


    Due to the recursion, the presented algorithm works quite differently compared to
the original binary absorption [7]. We exploit this for the integration of further improve-
ments. The presented algorithm completely handles an axiom C v D in one step, i.e.,
the axiom is first internalised into > v ¬CtD and then the absorbable (sub-)concepts of
¬C t D are recursively handled. Each recursion step creates (simple) inclusion axioms
for the handled (sub-)concepts such that corresponding marker concepts are implied,
and the final marker concept (T n−1 ) is used to imply the non-absorbable part of the ax-
iom. In contrast, the original binary absorption algorithm introduces new GCIs during
the absorption of an axiom, which are then (possibly) further absorbed in separate steps.
    This complete handling of axioms allows for partially absorbing parts of axioms
without creating additional disjunctions. Roughly speaking, this partial absorption also
uses parts of non-absorbable concepts as conditions to delay the processing of the non-
absorbable part of the axiom. For example, the axiom > 5 r.A v D is, without absorp-
tion, handled as > v 6 4 r.A t D. None of the disjuncts can be absorbed completely,
but it is nevertheless possible to delay the processing of the disjunction until there is an
r-neighbour with the concept A in its label. Partial absorption is able to extract these
conditions, and rewrites the axiom such that the disjunction is propagated from a node
with A in its label to all r− -neighbours (if there are any), which results in A v ∀r− .T 1
and T 1 v 6 4 r.A t D. Hence, partial absorption can significantly improve the original
binary absorption for more expressive Description Logics.
    We now make important terms for our partial absorption algorithm clearer.

Definition 1. A concept C is completely absorbable if (i) C = ¬{a} for a a nominal, (ii)
C = ¬A for A an atomic concept, (iii) C = C1 t. . .tCn or C = C1 u. . .uCn and, for 1 ≤
i ≤ n, Ci is completely absorbable, or (iv) C = ∀r.D and D is completely absorbable.
Otherwise C is not completely absorbable. A concept C is partially absorbable if C is
completely absorbable or (i) C = ∀r.D, (ii) C = 6 n r.D, (iii) C = C1 t . . . t Cn and,
for some i, 1 ≤ i ≤ n, Ci is partially absorbable, or (iv) C = C1 u . . . u Cn and, for each
i, 1 ≤ i ≤ n, Ci is partially absorbable. For a disjunction C = C1 t . . . t Cn , we set
              notCA(C) :={Ci | 1 ≤ i ≤ n, Ci is not completely absorbable}
                  PA(C) :={Ci | 1 ≤ i ≤ n, Ci is partially absorbable}.

Note that if a concept C is not partially absorbable, then it is also not completely ab-
sorbable. In the following, we describe the absorption algorithm.
Algorithm 2 absorb(C, T 0 )
Input: A partially absorbable concept C and a TBox T 0 , which is modified via side-effects
Output: Returns the atomic concept for the absorption of C
 1: if C = ¬A then
 2:     return A
 3: end if
 4: T ← fresh atomic concept
 5: if C = ¬{a} then
 6:     T 0 ← T 0 ∪ {{a} v T }
 7: else if C = ∀r.D then
 8:     S ← {A | A = absorb(D0 , T 0 ), D0 ∈ PA(D)}
 9:     A ← join(S , T 0 )
10:     T 0 ← T 0 ∪ {A v ∀ inv(r).T }
11: else if C = 6 n r.D then
12:     S ← {A | A = absorb(D0 , T 0 ), D0 ∈ PA(nnf(¬D))}
13:     A ← join(S , T 0 )
14:     T 0 ← T 0 ∪ {A v ∀ inv(r).T }
15: else if C = C1 u . . . u Cn then
16:     S C ← { {A | A = absorb(D, T 0 ), D ∈ PA(Ci )} | 1 ≤ i ≤ n}
17:     S ← {A | A = join(S i , T 0 ), S i ∈ S C }
18:     T 0 ← T 0 ∪ {A v T | A ∈ S }
19: end if
20: return T


Algorithm 3 join(S , T 0 )
Input: A set S of atomic concepts and a TBox T 0 , which is modified via side-effects
Output: Returns an atomic concept that is implied by the join of the concepts in S
 1: while A1 , A2 ∈ S and A1 , A2 do
 2:     T ← fresh atomic concept
 3:     T 0 ← T 0 ∪ {A1 u A2 v T }
 4:     S ← (S ∪ {T }) \ {A1 , A2 }
 5: end while
 6: if S = ∅ then return >
 7: else return the element A ∈ S                                             . S is a singleton
 8: end if


    Algorithm 1 (absorbTBox) takes as input a TBox T and produces a TBox T 0 such
that each axiom in T 0 has the form A v C, A1 u A2 v C, > v C, or {a} v C, i.e., T 0
can be efficiently handled by the rules of Table 1 or as concept assertion. The method
absorbTBox considers each axiom C v D ∈ T as disjunction in negation normal
form. Each partially absorbable disjunct from PA(nnf(¬C t D)) is passed to the function
absorb (Algorithm 2), which (recursively) performs the (binary) absorption and returns
a single atomic concept that represents the disjunct. The concepts are then joined and
used to imply the non-absorbable part of the axiom in Line 6. Please note that if all
disjuncts of the axiom can be completely absorbed, then an empty disjunction is created
in Line 6, which we consider as ⊥.
    The method absorb is recursive with the base case of negated concept names (C =
¬A) or nominals (C = ¬{a}). For the former, the algorithm directly returns the atomic
concept A. For the latter, the algorithm returns the fresh concept T , which is axiomatised
by {a} v T in the TBox constructed during the absorption. For the recursion, partially
absorbable subconcepts of C are first absorbed, which results in an atomic concept for
each such subconcept. The resulting atomic concepts are then joined by Algorithm 3
(join), which implements the binary absorption introduced in Section 3 and which re-
turns just the final concept (T n−1 ).
    Partial absorption also significantly delays non-determinism for axioms that use
more expressive concept constructors. Consider, for example, a TBox T with
                                      ∃r.(A u ∀r.B1 ) v B2                               (3)
as the one and only axiom of the form C v D. Without absorption, it would be nec-
essary to internalise this axiom into the form > v nnf(¬C t D), which would result
in > v ∀r.(¬A t ∃r.¬B1 ) t B2 . Obviously, this would produce a significant amount
of non-determinism. Clearly, none of the parts of (3) can be absorbed completely, i.e.,
notCA(nnf(¬C t D)) = {∀r.(¬At∃r.¬B1 ), B2 }. The concept ∀r.(¬At∃r.¬B1 ), however,
is partially absorbable, i.e., it is in PA(nnf(¬C t D)). This allows for delaying the pro-
cessing of the disjunctions until there is an r-neighbour with the concept A in its label.
In order to capture this, the presented absorption algorithm rewrites the axiom such that
the disjunction is propagated from a node with A in its label to all r− -neighbours as fol-
lows: In Line 3 of absorbTBox, we call absorb(∀r.(¬At∃r.¬B1 ), T 0 ) with T 0 = ∅. The
algorithm proceeds with Lines 7–10 and recursively calls itself for ¬A and T 0 , which
returns A, while ∃r.¬B1 is not absorbable. The following call of join({A}, T 0 ) directly
returns A (no binarisation is needed). Finally, in Line 10, T 0 is extended with A v ∀r− .T
for the fresh concept T and T is returned by the algorithm. Back in Algorithm 1, the call
of join({T }, T 0 ) again directly returns T . After extending T 0 in Line 6 of Algorithm 1,
we have T 0 = {A v ∀r− .T, T v ∀r.(¬A t ∃r.¬B1 ) t B2 }.
    It is clear that an existential restriction of the form ∃r.C cannot be absorbed directly
since it does not provide conditions that could be used to trigger other concepts. In con-
trast, the absorption can be extended to several concept constructors of more expressive
DLs (see also Section 4). For example, the ¬∃r.Self concept of SROIQ can be partially
absorbed with > v ∀r− .A and A v ¬∃r.Self.
    Note that, in principle, it is not necessary to always create new axioms with fresh
atomic concepts for the absorption of identical concepts. In practice, the binary absorp-
tion axioms as well as the axioms for absorbing specific concepts can be reused.
Lemma 1. Let T be a TBox and T 0 the TBox produced by Algorithm 1 when given T
as input, then T 0 is a model-conservative extension of T , i.e., (a) every model of T
can be extended to a model of T 0 by adding suitable interpretations for fresh symbols
in T 0 and (b) every model of T 0 is a model of T (T 0 |= T ).
   A complete proof is available in an online technical report [16].
Proof (sketch). Note that by showing T 0 |= T , we do not need to require T ⊆ T 0 as
usual for (model) conservative extensions.
    Let I be some model of T , C v D ∈ T , E a completely (partially) absorbable
disjunction that is an absorbed subconcept of nnf(¬C t D), and T E the concept returned
by absorb(E, T 0 ). For (a), we construct an interpretation J for T 0 that coincides with
I apart from the interpretations of concepts that are fresh in T 0 . To show J |= T 0 ,
we establish (1) (¬T E )J = E J ((¬T E )J ⊆ E J ). For showing (b), we first show (2)
T 0 |= ¬T E v E.
     The base case, where each disjunct of a subconcept E = E1 t. . .tEn of nnf(¬C tD)
is either non-absorbable or of the form ¬A or ¬{a}, is straightforward. For the induction,
E might also contain other absorbable disjuncts. We just sketch the case of Ei = 6
n r.E 0 , 1 ≤ i ≤ n, where absorb(Ei , T 0 ) returns T Ei and T ¬E 0 v ∀inv(r).T Ei ∈ T 0 , which
is equivalent to ∃r.T ¬E 0 v T Ei . We set T EJi = (∃r.T ¬E 0 )J , so J |= T ¬E 0 v ∀inv(r).T Ei .
To show (1): we have (¬T Ei )J = (∀r.(¬T ¬E 0 ))J , and, by induction, (∀r.(¬T ¬E 0 ))J ⊆
(6 n r.E 0 )J = EiJ . For (2): since T 0 |= ∃r.T ¬E 0 v T Ei , T 0 |= ¬T Ei v ∀r.(¬T ¬E 0 ). By
induction, T 0 |= ¬T ¬E 0 v E 0 , hence, T 0 |= ¬T Ei v 6 n r.E 0 = Ei as required.
     In absorbTBox, we eventually have a concept T E for the disjuncts E1 , . . . , En ∈
PA(nnf(¬C t D)) and T E v D1 t . . . t Dm ∈ T 0. By induction, T 0 |= T E1 u . . . u T En v T E
and T 0 |= ¬T Ei v Ei for 1 ≤ i ≤ n. Hence, T 0 |= > v E1 t . . . t En t D1 t . . . t Dm
and, hence, T 0 |= > v nnf(¬C t D). To show that J |= T E v D1 t . . . t Dm we
show J |= > v ¬T E1 t . . . t ¬T En t D1 t . . . t Dm since J |= T E1 u . . . u T En v
T E . W.l.o.g., let E1 , . . . , E` be the completely and E`+1 , . . . , En the partially, but not
completely absorbable disjuncts of E. By induction, (¬T Ei )J = EiJ ((¬T Ei )J ⊆ EiJ ),
for each 1 ≤ i ≤ ` (` < i ≤ n). Since I |= T , I |= > v E1 t . . . t E` t D1 t . . . t Dm
and, hence, J |= > v ¬T E1 t . . . t ¬T E` t D1 t . . . t Dm . Since, for ` < i ≤ n, Ei = D j
for some 1 ≤ j ≤ m, J |= > v ¬T E1 t . . . t ¬T En t D1 t . . . t Dm as required.


4      Extensions to the Absorption Algorithm

In this section, we sketch several extensions of the above presented absorption algo-
rithm, which improve the handling of completely defined concepts, optimise the pro-
cessing of disjunctions, and allow for absorbing axioms that contain data ranges.


4.1     Handling of Completely Defined Concepts

In the previous sections, we considered a TBox to be a set of GCIs of the form C v D.
This is w.l.o.g. since C ≡ D can be rewritten to C v D and D v C. For definitions of
the form A ≡ C with A an atomic concept (also called a completely defined concept),
it is often inefficient to decompose the axiom A ≡ C into A v C and C v A, because
nnf(¬C) might not be completely absorbable and then the disjunction ¬C t A has to be
processed for all nodes in the completion graph. In order to determine the satisfiability
of a concept, it is for many nodes not relevant whether A or ¬A is in their label as long
as it can be ensured that one of both alternatives is not causing a clash. Obviously, if
the atomic concept A is only defined once in the knowledge base and A is acyclic,3
then C or ¬C and, therefore, also A or ¬A must be satisfiable and, only in this case,
 3
     Acyclicity is defined as follows: A1 directly uses A2 w.r.t. a TBox T if A1 ≡ C ∈ T or
     A1 v C ∈ T and A2 occurs in C; uses is the transitive closure of “directly uses”. Then, a
     concept D is acyclic w.r.t. a TBox T if it contains no concept A that uses itself.
 1: for all A ≡ C ∈ T do
 2:     T A ← absorb(PA(nnf(¬C)), T 0 )
 3:     if nnf(¬C) is completely absorbable then
 4:         T 0 ← T 0 ∪ {A v C, T A v A, T A v A+ }
 5:     else if |{C 0 | A v C 0 ∈ T or A ≡ C 0 ∈ T }| > 1 then
 6:         T 0 ← T 0 ∪ {A v C, T A v nnf(¬C t A), T A v A+ }
 7:     else
 8:         T 0 ← T 0 ∪ {A ≡ C, T A v A+ }
 9:     end if
10: end for

 Fig. 1. Extension for Algorithm 1 for completely defined concepts (to be inserted after Line 1)


the axiom A ≡ C can directly be handled by an additional rule, which unfolds A to
C and ¬A to ¬C. To capture this, we extend the algorithm absorbTBox (cf. Figure 1)
such that axioms of the from A ≡ C are only eliminated if there are several definitions
of A. In such cases, A v C and C v A must be explicitly represented in T 0 such
that possible interactions between these several definitions are handled. In addition, a
concept of the form ¬A is now only completely (partially) absorbable if A is acyclic
and, for each A ≡ C ∈ T , nnf(¬C) is completely (partially) absorbable. Note that the
definition of completely and partially absorbable concepts now refers to the TBox. To
correctly handle the completely defined concepts in the recursive absorption algorithm,
absorb also has to be adapted for concepts of the form ¬A. The algorithm still returns A
if there is no axiom of the form A ≡ C ∈ T . Otherwise, for each A ≡ C, all absorbable
disjuncts of nnf(¬C) are recursively absorbed, the resulting atomic concepts are joined
into a fresh concept TC , and TC v A+ is added to T 0 for a so-called candidate concept
A+ for A. After processing all definitions for A, the candidate concept A+ is returned.
Note that if nnf(¬C) is not absorbable, then the absorption returns > and > v A+ is
added to T 0 . An occurrence of A+ in the label of a node signalises that the node might
be an instance of the concept A, i.e., if A+ is not in the node label of a clash-free and fully
expanded completion graph, then this node is also not an instance of A. In particular, if
A+ is not in the label of a node, then we know that ¬A can be safely added since one
disjunct of nnf(¬C) is trivially satisfied. Hence, the generated candidate concept A+ can
be used as condition for further absorption if ¬A occurs and, therefore, it can be used
to further delay branching.
     Such candidate concepts are also very useful for identifying completely defined
concepts as possible subsumers for classification (cf. [2]) without forcing the decision
between A and ¬A for the nodes in the completion graph. As a result, these candidate
concepts can significantly improve the classification performance, especially in combi-
nation with model merging techniques [13,14], and, therefore, it is almost always useful
to generate these candidate concepts even if they are not required for further absorption.


4.2   Handling of Disjunctions

The presented absorption algorithm can also be used to optimise the handling of ordi-
nary disjunctions. For example, the axiom A v ∃r.(¬B t A) cannot be absorbed and
contains the disjunction ¬B t A. Obviously, we can replace the disjunction with a fresh
marker concept T by adding the additional axiom T v ¬B t A to the TBox, which can
be absorbed to T u B v A. Consequently, we have the axioms A v ∃r.T and T u B v A,
which can be processed by the tableau algorithm without causing any non-determinism.

4.3   Handling of Datatypes
Another advantage of the presented absorption algorithm is the straightforward exten-
sibility to handle datatypes such as strings, integers, or decimals, which represent sets
of concrete data values. In OWL, one can also use datatype restrictions to build custom
datatypes, e.g., the datatype restriction real[int ∧ >5 ] restricts OWL’s datatype owl:real,
which we abbreviate to real, with the facets int and >5 to real numbers that are integers
and greater than 5. We call int ∧ >5 a facet expression. One can further use Boolean con-
structors to build data ranges, e.g., string ∨ real[int ∧ >5 ] is a data range that contains
all data values that are strings or reals as described above. Concepts refer to data ranges
via concrete roles, e.g., the formula ∃q.real[int ∧ >5 ], for q a concrete role, describes
individuals that have some value of the data range real[int ∧ >5 ] as q-successor.
     Elements that represent data values are represented in the tableau algorithm by
concrete nodes. For example, for a node v with ∃q.real[int ∧ >5 ] ∈ L(v), the tab-
leau algorithm adds a new concrete node z and an edge hv, zi with q ∈ L(hv, zi) and
real[int ∧ >5 ] ∈ L(z). Typically, the tableau algorithm is then combined with a datatype
checker that checks the consistency of datatype constraints [10]. Such constraints are
represented as a set of assertions of the form d(z), ¬d(z), and z1 0 z2 , where z, z1 , and
z2 are concrete nodes and d is an explicit enumeration of data values {c1 , . . . , cn } or an
expression of the form dt[ϕ] for dt a datatype and ϕ a facet expression. Given a set of
such assertions, the datatype checker determines whether each concrete node can be
assigned a data value in a way that satisfies all of the given assertions.
     It is well-known that facets can be used to encode disjunctions [9]. For example,
A v ∀r.B1 t B2 can be expressed by the axioms
   A v ∃q.real[int ∧ >5 ] (4) ∃q.real[≤10 ] v ∀r.B1 (5)         ∃q.real[>10 ] v B2 , (6)
where q is a concrete role. For the data range real[int ∧ >5 ], it cannot directly be de-
termined whether real[≤10 ] or its complement is satisfied. This has to be clarified in
a case-by-case analysis, where once the integer interval int(5, 10] and once the in-
terval int(10, ∞] is considered. Hence, the datatype checker is forced to make non-
deterministic decisions. This also means that it is usually not possible to (completely)
absorb data ranges since it is not easily possible to determine if a certain data range
is satisfied or not. Of course, it is possible to absorb other parts of the axioms, e.g.,
Axiom (5) can be absorbed to
                  > v ∀r− .T                  T v ∀q.real[≤10 ] t ∀r.B1 ,
where real[≤10 ] expresses the negation of real[≤10 ]. However, without any further op-
timisations, Axiom (6) introduces non-determinism for every node as it is handled as
                                  > v ∀q.real[>10 ] t B2 .
    For partial absorption it is only required that the tableau algorithm or the datatype
checker can identify those concepts and data ranges that might be satisfied. This is ob-
viously possible for many data ranges and, therefore, they can be absorbed partially.
For this, the datatype checker has to be extended such that it identifies data range can-
didates, i.e., data ranges that are possibly satisfied. Let us assume that real[≤10 ]+ is the
data range candidate for real[≤10 ], i.e., if a concrete node in the completion graph pos-
sibly represents numerical data values that are less or equal to 10, then real[≤10 ]+ is
satisfied and this can be identified efficiently by the datatype checker. Consequently, we
can use such data range candidates in lazy unfolding rules, e.g., real[≤10 ]+ v T is used
to trigger the marker concept T if real[≤10 ] is possibly satisfied. Note that this adds
ordinary concepts to the labels of concrete nodes, which is, however, not problematic
since these concepts are only used internally as auxiliary constructs. With the data range
candidates, we can improve the absorption of Axiom (5) to
    real[≤10 ]+ v T 1 T 1 v ∀q− .T 2 > v ∀r− .T 3 T 2 u T 3 v ∀q.real[≤10 ] t ∀r.B1
and we can absorb Axiom (6) to
          real[>10 ]+ v T 4           T 4 v ∀q− .T 5        T 5 v ∀q.real[>10 ] t B2 .
Now, the processing of the disjunctions is delayed until there are concrete nodes that
possibly represent the absorbed data ranges.
     The required extensions for the absorption algorithm and the datatype checker are
straightforward. For the absorption, the definition of partial absorbability has to be ex-
tended to the supported data ranges and the concepts related to datatypes. Moreover,
the absorb function has to be adjusted such that the corresponding axioms are created.
The datatype checker has to be extended to identify those data range candidates that
might be satisfied. Usually, the datatype checker manages the data intervals that are still
possible for a concrete node in a sorted array [10]. By counting and iterating through
the possible values, it is easily possible to identify those data range candidates that are
satisfied. For example, the possible values of a concrete node z are restricted to the in-
teger interval int(5, ∞] if the data range real[int ∧ >5 ] is asserted to z. If z is not distinct
to other concrete nodes (i.e., there is no assertion z 0 z0 ), then the datatype checker can
pick the first possible data value of 6 to satisfy the constraints for z. Hence, the datatype
checker has to trigger only those data range candidates that are satisfied for this first
data value (i.e., real[≤10 ]+ ). If z is distinct to 5 other concrete nodes, then real[>10 ]+
is also triggered (although it is not necessarily the case that indeed 6 data values are
required to assign all (distinct) concrete nodes a data value).


5      Comparison with Related Absorption Techniques

In comparison to other absorption techniques, partial absorption often delays non-
determinism further. Consider again Axiom (3) from Section 3, which is rewritten by
the original binary absorption algorithm into the following axioms:4
             A v ∃r.¬B1 t ∀r− .T               T v B2 .
In contrast, partial absorption creates the following axioms (as shown in Section 3):
               A v ∀r− .T                          T v ∀r.(¬A t ∃r.¬B1 ) t B2 .
 4
     The binary absorption algorithm presented in [7] has a looping bug for concepts of the form
     ∀r.C, which can, however, easily be repaired. We simply ignored the repeated absorption of
     concepts that are introduced by the algorithm.
Hence, with the original binary absorption technique, the disjunction has already to be
processed when A is added to the label of a node, whereas partial absorption delays
the processing until there is also an r-neighbour. It is also well-known that clausifica-
tion for the hypertableau algorithm reduces many cases of non-determinism [11]. For
Axiom (3), however, clausification creates the following DL-clauses:
              r(x, y) → B2 (x) ∨ T (y)                 T (x) ∧ A(x) → ∃r.¬B1 (x)
Hence, the hypertableau algorithm has to choose between B2 and T for every node with
an r-neighbour, whereas the partial absorption technique allows for further delaying the
processing of disjunctions until the node also has the concept A in its label. Besides
that, the presented partial absorption technique shares other interesting features with
clausification of the hypertableau algorithm. In particular, both use conditions of more
expressive concept constructors to trigger the processing of disjunctions. For instance,
the axiom > 2 r.A v B is automatically satisfied if there is no r-successor with the
concept A in its label. This is utilised by partial absorption, which generates
                       A v ∀r− .T                          T v 6 1 r.A t B
and also by clausification of the hypertableau calculus, which generates:
                    r(x, y1 ) ∧ A(y1 ) ∧ r(x, y2 ) ∧ A(y2 ) → B(x) ∨ y1 ≈ y2
Note that clausification eliminates at-most cardinality restrictions, whereas partial ab-
sorption has to use the concept 6 1 r.A since > 2 r.A is not completely absorbable. Con-
sequently, a choose-rule is not required in the hypertableau algorithm, which possibly
further reduces non-determinism in comparison to a standard tableau algorithm that has
to choose between ¬A and A for all r-neighbours of nodes with T in their label.
    By using a recursive algorithm that absorbs GCIs completely in one step, we gain
a further advantage in comparison to traditional absorption algorithms. Traditional al-
gorithms absorb parts of axioms in separate steps, e.g., by introducing new GCIs that
are handled separately, which often creates additional disjunctions, especially if the
axioms have more complex (sub-)concepts. For instance, by simplifying the axiom
> 5 r.(> 3 s.A) v D into > 5 r.T v D and T ≡ > 3 s.A, for T a fresh atomic concept
that is used to split the original axiom, the GCI T ≡ > 3 s.A cannot further be absorbed
without creating additional disjunctions. In contrast, our recursive absorption algorithm
creates the axioms A v ∀s− .T 1 , T 1 v ∀r− .T 2 , and T 2 v 6 4 r.(> 3 s.A) t D, which de-
lays the processing of the disjunction significantly (or avoids it completely if there is no
node in the completion graph for which T 2 is added to its label).


6   Conclusions

In this paper, we have presented the partial absorption algorithm used in the reason-
ing system Konclude together with several extensions. The candidate concepts can, for
example, be combined with model merging techniques to reduce Konclude’s classifi-
cation time for several thousand ontologies by 60 % on average [13,14], which allows
Konclude to often outperform other state-of-the-art reasoners. The presented absorption
techniques are also essential for further optimisations such as an efficient handling of
nominal schemas [12] or the combination of tableau with saturation-based procedures
[13,14], where the reduced non-determinism is crucial.
References
 1. Baader, F., Calvanese, D., McGuinness, D., Nardi, D., Patel-Schneider, P. (eds.): The De-
    scription Logic Handbook: Theory, Implementation, and Applications. Cambridge Univer-
    sity Press, second edn. (2007)
 2. Glimm, B., Horrocks, I., Motik, B., Shearer, R., Stoilos, G.: A novel approach to ontology
    classification. J. of Web Semantics: Science, Services and Agents on the World Wide Web
    14, 84–101 (July 2012)
 3. Horrocks, I., Kutz, O., Sattler, U.: The even more irresistible SROIQ. In: Proc. 10th Int.
    Conf. on Principles of Knowledge Representation and Reasoning (KR’06). pp. 57–67. AAAI
    Press (2006)
 4. Horrocks, I., Patel-Schneider, P.F.: Optimizing description logic subsumption. J. of Logic
    and Computation 9(3), 267–293 (1999)
 5. Horrocks, I., Sattler, U.: A description logic with transitive and inverse roles and role hierar-
    chies. J. of of Logic and Computation 9(3), 385–410 (1999)
 6. Horrocks, I., Sattler, U.: A tableau decision procedure for SHOIQ. J. of Automated Reson-
    ing 39(3), 249–276 (2007)
 7. Hudek, A.K., Weddell, G.E.: Binary absorption in tableaux-based reasoning for description
    logics. In: Proc. 19th Int. Workshop on Description Logics (DL’06). vol. 189. CEUR (2006)
 8. Krötzsch, M., Maier, F., Krisnadhi, A., Hitzler, P.: A better uncle for OWL: nominal
    schemas for integrating rules and ontologies. In: Proc. 20th Int. Conf. on World Wide Web
    (WWW’11). pp. 645–654. ACM (2011)
 9. Magka, D., Kazakov, Y., Horrocks, I.: Tractable extensions of the description logic EL with
    numerical datatypes. J. of Automated Reasoning 47(4), 427–450 (2011)
10. Motik, B., Horrocks, I.: OWL datatypes: Design and implementation. In: Proc. 7th Int. Se-
    mantic Web Conf. (ISWC’08). LNCS, vol. 5318, pp. 307–322. Springer (2008)
11. Motik, B., Shearer, R., Horrocks, I.: Hypertableau reasoning for description logics. J. of
    Artificial Intelligence Research 36, 165–228 (2009)
12. Steigmiller, A., Glimm, B., Liebig, T.: Nominal schema absorption. In: Proc. 23rd Int. Joint
    Conf. on Artificial Intelligence (IJCAI’13). pp. 1104–1110. AAAI Press/The MIT Press
    (2013)
13. Steigmiller, A., Glimm, B., Liebig, T.: Coupling tableau algorithms for expressive descrip-
    tion logics with completion-based saturation procedures. In: Proc. 7th Int. Joint Conf. on
    Automated Reasoning (IJCAR’14). LNCS, Springer (2014), accepted
14. Steigmiller, A., Glimm, B., Liebig, T.: Coupling tableau algorithms for the DL SROIQ with
    completion-based saturation procedures. Tech. Rep. UIB-2014-02, University of Ulm, Ulm,
    Germany (2014), available online at http://www.uni-ulm.de/fileadmin/website_
    uni_ulm/iui/Ulmer_Informatik_Berichte/2014/UIB-2014-02.pdf
15. Steigmiller, A., Liebig, T., Glimm, B.: Konclude: system description. J. of Web Semantics
    (2014), accepted
16. Steigmiller, A., Liebig, T., Glimm, B.: Optimised absorption for expressive descrip-
    tion logics. Tech. rep., Ulm University, Ulm, Germany (2014), available online
    at          https://www.uni-ulm.de/fileadmin/website_uni_ulm/iui.inst.090/
    Publikationen/2014/StGL14c.pdf
17. Tsarkov, D., Horrocks, I.: Efficient reasoning with range and domain constraints. In: Proc.
    17th Int. Workshop on Description Logics (DL’04). vol. 104. CEUR (2004)
18. Tsarkov, D., Horrocks, I., Patel-Schneider, P.F.: Optimizing terminological reasoning for ex-
    pressive description logics. J. of Automated Reasoning 39, 277–316 (2007)
19. W3C OWL Working Group: OWL 2 Web Ontology Language: Document Overview.
    W3C Recommendation (27 October 2009), available at http://www.w3.org/TR/
    owl2-overview/

</pre>