Optimizing Reasoning with Qualified Number Restrictions in SHQ N. Farsiniamarj and V. Haarslev Concordia University, Montreal, Canada 1 Introduction Using SHQ one can express number restrictions on role fillers of individuals. A typi- cal question for a DL reasoner would be whether the concept ∀hasCredit.(Science t Engineering t Business) u (≥ 120 hasCredit.(Science t Engineering)) u (≥ 140 hasCredit)u(≤ 32 hasCredit.(Science tBusiness))u(≤ 91 hasCredit.Engineering) is satisfiable. Most DL tableau algorithms [9, 1, 12] test the satisfiability of such a con- cept by first satisfying all at-least restrictions, e.g., by creating 260 hasCredit-fillers, of which 120 are instances of (Science t Engineering). Eventually, a nondetermin- istic choose-rule assigns to each of these 260 individuals (Science t Business) or ¬(Science t Business), and Engineering or ¬Engineering. In case an at-most re- striction is violated, e.g., a student has more than 91 hasCredit-fillers of Engineering, a nondeterministic merge-rule tries to reduce the number of these individuals by merg- ing a pair of individuals until the upper bound specified in this at-most restriction is satisfied. Searching for a model in such an arithmetically uninformed or blind way is usually very inefficient. Our hybrid calculus is based on a standard tableau for SH [1] modified and ex- tended to deal with qualified number restrictions and works with an inequation solver based on integer linear programming. The tableau rules encode number restrictions into a set of inequations using the so-called atomic decomposition technique [16]. The set of inequations is processed by the inequation solver which finds, if possible, a minimal non-negative integer solution (distribution of role fillers constrained by number restric- tions) satisfying the inequations. The tableau rules ensure that such a distribution of role fillers also satisfies the logical restrictions. Since this hybrid algorithm collects all the information about arithmetic expressions before creating any role filler, it will not satisfy any at-least restriction by violating an at-most restriction and there is no need for a mechanism of merging role fillers and its performance is not affected by the values of numbers occurring in number restric- tions. Since the solution from the inequation solver satisfies all numerical restrictions imposed by at-least and at-most restrictions, our calculus needs to create only one so- called proxy individual (inspired by [6]) representing a set of role fillers. Considering all these features the proposed hybrid algorithm is well suited to improve average case performance. Furthermore, in [2, Chapter 6] it has been shown that a tableau procedure extended by global caching and an algebraic method similar to the one presented in this paper and in [16, 8] is worst-case optimal for SHIQ. Although the calculus presented in our paper is different from the one in [2] we conjecture that the result presented in [2] can be transferred to our work because the use of integer linear programming was essential to proving worst-case optimality. This calculus extends our work presented in [3] by (i) covering Abox consistency for SHQ with general Tboxes, and (ii) introducing the use of proxy individuals rep- resenting sets of role fillers. It complements the work in [4] (which extends the work Table 1. Syntax and semantics of SHQ Syntax Semantics Syntax Semantics A A I ⊆ ∆ I , A ∈ NC R ∈ NR R I ⊆ ∆ I × ∆ I ¬C ∆I \C I R ∈ NRT RI = (RI )+ C u D C I ∩ DI R v S RI ⊆ S I I I C tD C ˘ ∪D C v D C I ⊆ DI ∀R.C ˘x | ∀y : (x, y) ∈ RI¯⇒ y ∈ C I ¯ a : C aI ∈ C I ≥ n R.C ˘x | #RI (x, C) ≥ n ¯ (a, b) : R (aI , bI ) ∈ RI ≤ m R.C x | #RI (x, C) ≤ m . a=6 b aI 6= bI in [3] to ALCOQ) by additionally dealing with role hierarchies and transitive roles. This paper also discusses practical aspects of using the hybrid method and presents evaluation results based on an implemented prototype. It also significantly differs from the work presented in [15, 16] where no formal calculus was proposed and the covered logic did not support Abox consistency for SHQ with general Tboxes. Our early work presented in [8] did not deal with Aboxes and was based on a recursive calculus without using proxy individuals and did not provide any proof for soundness, completeness, or termination. 2 Preliminaries 2.1 Description Logic SHQ In the following, three disjoint sets are defined; NC is the set of concepts names; NR = NRT ∪ NRS is the set of all role names which consists of transitive (NRT ) and non-transitive (NRS ) roles; I is the set of all individuals, while IA ⊆ I is the set of individuals occurring in an Abox A. A role is called simple if it is neither transitive nor has any transitive sub-role. In order to remain decidable qualified number restric- tions are only allowed for simple roles (except for concepts of the form ≥ 1 R.C). In Table 1, we use a standard Tarski-style semantics I and assume that C, D are arbitrary concept descriptions, R, S ∈ NR , a, b ∈ IA , and #RI (x, C) denotes the cardinality of {y | (x, y) ∈ RI , y ∈ C I }. Let v∗ be the transitive, reflexive closure of v over NR . A concept description C is said to be satisfiable by an interpretation I iff C I 6= ∅. A role hierarchy R is a set of assertions of the form R v S where R, S ∈ NR . A SHQ Tbox T is a finite set of axioms of the form C v D or C ≡ D where C, D are concept expressions and C ≡ D is the placeholder for {C v D, D v C}. A Tbox T is satisfiable by an interpretation I iff I satisfies all axioms in T and R. A SHQ Abox A w.r.t. a Tbox T is a finite set of assertions of the form a : C, (a, b) : R, . and a = 6 b. An Abox A is satisfiable by an interpretation I iff I satisfies T and R and all assertions in A. Let a, b ∈ I be (possibly unnamed) individuals, given an assertion (a, b) : R, a is called a predecessor of b and b a successor or role filler of a. The R-fillers of a are defined as Fil (a, R) = {b | (aI , bI ) ∈ RI }. 2.2 Reducing SHQ to SHN \ Inspired by [16] we transform given qualified number restrictions to unqualified number restrictions and add a new role-set difference operator ∀(R\R0 ).C with (∀(R\R0 ).C)I = {x | ∀y : (x, y) ∈ (RI \ R0I ) ⇒ y ∈ C I }. The resulting language is called SHN \ . ˙ denote the standard negation normal form (NNF) of ¬C. We define a recursive Let ¬C function unQ which rewrites a SHQ concept description into SHN \ . It is important to note that this rewriting process always introduces for each transformed qualified num- ber restriction a unique new role. The function unQ transforms the input description into its NNF and replaces qual- ified number restrictions. In the following each R0 is a new role in NR with R := R ∪ {R0 v R}: unQ(C) := C if C ∈ NC ; unQ(¬C) := ¬C if C ∈ NC , unQ(¬C) ˙ otherwise; unQ(∀R.C) := ∀R.unQ(C); unQ(C u D) := unQ(C) u unQ(D); unQ(C t D) := unQ(C) t unQ(D); unQ(≥ nR.C) := (≥ nR0 ) u ∀R0 .unQ(C); unQ(≤ nR.C) := (≤ nR0 ) u ∀(R\R0 ).unQ(¬C); ˙ unQ(a : C) := a : unQ(C). It is easy to show that the concept (≤ n R.C) is equisatisfiable with the expression (∀(R\R0 ).¬C u ≤ n R0 ) with R0 fresh in R and R0 v R (see [5] for the proof). Since unQ introduces a negation itself, the negated description needs to be converted to NNF before further applying unQ. Our language is not closed under negation w.r.t. the con- cept descriptions created by unQ. However, our calculus ensures that these concept descriptions will never be negated. 3 A Hybrid Abox Tableau Calculus for SHQ 3.1 Preprocessing As a preprocessing step, the input Abox and Tbox are transformed into SHN \ as defined in Section 2.2. Similar to [13], indorder to propagate Tbox axioms to all in- dividuals we define the concept CT := CvD∈T unQ(¬C ˙ t D) and U as a new transitive role in the role hierarchy R extended by {R v U | R ∈ NR }. In order to ex- amine the consistency of A, the algorithm first extends A by {x0 : (CT u ∀U.CT )} ∪ {(x0 , x) : U | x occurs in A}, where x0 is new in A. By this means, we impose the ax- ioms in the Tbox on all individuals in A. Furthermore, we rewrite the role assertions in A as follows. If x, y ∈ IA then there exists a unique role name Rxy ∈ NR only used to represent that y is an R-filler of x with Rxy v R ∈ R. In other words whenever (x, z) : Rxy we have y I = z I . We replace role assertions of the form (b, c) : R by b : (≥ 1 Rbc u ≤ 1 Rbc ) because they impose a numerical restriction on individuals. For example, an assertion (b, c) : R implies there exists exactly one Rbc -filler for b which is c. 3.2 Completion Forest The closure clos(E) of a concept expression E is the smallest set of concepts such that: E ∈ clos(E), (¬D) ∈ clos(E) ⇒ D ∈ clos(E), (C t D) ∈ clos(E) or (C u D) ∈ clos(E) ⇒ C ∈ clos(E) and D ∈ clos(E), (∀R.C) ∈ clos(E) ⇒ C ∈ clos(E), (≥ n R.C) ∈ clos(E) or (≤ m R.C) ∈ clos(E) ⇒ C ∈ clos(E). For a Tbox T we define clos(T ) such that if (C v D) ∈ T or (C ≡ D) ∈ T then clos(C) ⊆ clos(T ) and clos(D) ⊆ T . Similarly for an Abox A we define clos(A) such that if (a : C) ∈ A then clos(C) ⊆ clos(A). A completion forest F = (V, E, L, LE ) for a SHQ Abox A is composed of a set of arbitrarily connected nodes as the roots of completion trees (if one ignores the connections between root nodes). Every node Px ∈ V is labeled by L(x) ⊆ clos(A) and LE (x) as a set of inequations of the form ( i∈1..k vi ) ./ n with ./ ∈ {≤, ≥}, n ∈ N, and vi ∈ V where V is a set of variables (see also below); each edge hx, yi ∈ E is labeled by the set L(hx, yi) ⊆ NR , and x (y) is called a predecessor (successor or role filler) of y (x), and the transitively closed set of successors (predecessors) is called p1 = {R}, p2 = {T }, p4 = {S}, p1 p5 p4 p3 = {R, T }, p5 = {R, S}, p6 = {S, T }, RS p7 = {R, S, T } R S p7 α(v001 ) = p1 , α(v010 ) = p2 , α(v100 ) = p4 , p3 p6 α(v011 ) = p3 , α(v101 ) = p5 , α(v110 ) = p6 , RST α(v111 ) = p7 RT ST 8 9 > > v001 + v011 + v101 + v111 ≥ 3, > > v010 + v011 + v110 + v111 ≤ 2, < = p2 LE (x) = T > > v100 + v101 + v110 + v111 ≤ 1, > > v100 + v101 + v110 + v111 ≥ 1 : ; Fig. 1. Atomic Decomposition Example descendants (ancestors) respectively. We maintain the distinction between nodes of a . forest by the relation =. 6 The set Rx of related roles for a node x is defined as Rx = {S | {≤ n S, ≥ m S} ∩ L(x) 6= ∅}. We define a partitioning RS x = ( RS ⊆Rx {RS }) \ ∅ and for a partition S RS x ∈ RS x we define RS Ix = ( S∈RS x Fil I (x, S)) \ ( S∈Rx \RS x Fil I (x, S)) such T S that Fil I (x, S) = {y I | y ∈ Fil (x, S)}. RS Ix represents the interpretation of fillers of x that are fillers for the roles in RS but not fillers for the roles in Rx \ RS . Therefore, by definition the fillers of x associated with the partitions in RS x are mutually disjoint w.r.t. the interpretation I. We assume a set V of variables and by using a mapping α : V ↔ RS x we associate a unique variable v ∈ V with each partition RS x in RS x such that α(v) = RS x . Let V R be defined as the set of all variables related to a role R such that V R = {v ∈ V | R ∈ α(v)}. A function ξ is used to map number restrictions to inequations of the form ξ(R, ./, n) := ( vi ∈V R vi ) ./ n where ./ ∈ {≤, ≥} and n ∈ N. P Since all concept restrictions for node successors that are not root nodes are prop- agated through universal restrictions for roles and a new role is created for each role filler occurring in a role assertion, we can conclude that all nodes in a certain node par- tition share the same restrictions and can be dealt with as a unit. We call this unit proxy node which is a representative of possibly more than one node. We define the mapping card : V → N to indicate the cardinality associated with proxy nodes. A node x in a forest F is blocked by a node y iff x, y are not root nodes and y is an ancestor of x such that L(x) ⊆ L(y). A node x contains a clash iff there exists a concept name A ∈ NC such that {A, ¬A} ⊆ L(x) (logical clash) or LE (x) does not have a non-negative integer solution (arithmetic clash). An arithmetic solution is represented using the function σ : V → N which assigns a non-negative integer value to each variable in V. Let Vx be the set of variables assigned to an individual x, Vx = {v ∈ V | v occurs in LE (x)}. We define a set of solutions Ω for x as Ω(x) := {σ(v) = n | n ∈ N, v ∈ Vx }. Notice that the goal function of the inequation-solver is to minimize the sum of the variables occurring in the input inequations. The algorithm starts with the forest FA with the individuals mentioned in A as root nodes. For each a ∈ IA a root node xa will be created with L(xa ) = {C | (a : C) ∈ A}. Additionally, for every root node x we set card(x) = 1. We illustrate these definitions with a simple example shown in Figure 1. Assume for a node x that we have L(x) = {≥ 3 R, ≤ 2 T , ≥ 1 S, ≤ 1 S}. Therefore, we have Rx = {R, T, S} and RS x consists of 7 different partitions named p1 , . . . , p7 (see Figure 1). Assuming a binary coding of the indices of variables, where the first digit from the right represents R, the second T , and the last S, we define the variables such that each vi is associated with its corresponding partition (see α(v) in Figure 1). Hence, the number restrictions in L(x) can be translated to the set of inequations in LE (x) as shown in the bottom-right part of Figure 1. 3.3 Completion Rules The completion rules are shown in Figure 2. The completion rules are always ap- plied with the following priorities (given in decreasing order). A rule of lower priority can never be applied if another one with a higher priority is applicable. 1. reset-Rule and merge-Rule. 2. u-Rule, t-Rule, ∀-Rule, ∀\ -Rule, ∀+ -Rule, ch-Rule, disjoint-Rule, equal-Rule, and hierarchy-Rule 3. ≤-Rule and ≥-Rule. 4. fil -Rule. There are two limi- tations on the application of the rules: (i) priority of the rules, (ii) rules are only appli- cable to nodes that are not blocked. A completion forest F is called complete if no completion rule is applicable to any node of F. A completion forest F is called clash-free if none of the clash triggers is applicable to any node of F. The u-Rule, t-Rule, ∀-Rule, and the ∀+ -Rule are similar to the ones in standard tableau algorithms. The ∀\ -Rule implements the universal restriction based on role set difference. ≤-Rule, ≥-Rule: The ≤-Rule and the ≥-Rule translate the number restric- tions, based on the atomic decomposition technique, into inequations and add them to LE (x). ch-Rule: The intuition behind the ch-Rule is due to the partitioning consequences. When we partition all successors for a node, we actually consider all possible cases for the role successors. If a partition px for a node x is logically unsatisfiable, the corre- sponding variable v with α(v) = px should be zero. But if it is logically satisfiable, nothing but the current set of inequations can restrict the number of nodes being fillers for the roles in the partition px . . The disjoint-Rule preserves the semantics of Abox assertions of the form a = 6 b. The equal-Rule ensures the equivalence of the Rxy - and Sxy -fillers. The hierarchy- Rule enforces the semantics of role hierarchies on partition variables. The merge-Rule merges two successors (and root nodes) za , zb of a root node zx into one node by re- placing every occurrence of za in the completion graph by zb . The labels L and LE of zb are unified with the corresponding labels of za . All incoming (outgoing) edges of za become now incoming (outgoing) edges of zb . fil-Rule: The fil-Rule, which has the lowest priority, is the only generating rule. It creates one successor (proxy node) for a node zx that is not blocked and sets the cardinality of the proxy node to σ(v). If zx is a root node and has an already existing successor xb , which is a also a root node, and the role set for hzx , zb i is empty, i.e., it has not yet been initialized or has been reset by the reset-rule, the role set is set to α(v). Due to possible cycles between root nodes, one has to consider incremental parti- tioning for root nodes because a previously unknown number restriction could be prop- agated to a root node. Whenever (≤ n R) or (≥ m R) is added to L(x), the following reset-Rule if x is a root node, {(≤ nR), (≥ nR)} ∩ L(x) 6= ∅, and ∀v ∈ Vx : R ∈ / α(v) then set LE (x) := ∅ and for every successor y of x set L(hx, yi) := ∅ merge-Rule if there exists root nodes zx , za , zb for x, a, b ∈ IA such that Rxa ∈ L(hzx , zb i) then merge the nodes za , zb and their labels and replace every occurrence of za in the completion graph by zb u-Rule if (C1 u C2 ) ∈ L(x) and {C1 , C2 } 6⊆ L(x) then set L(x) = L(x) ∪ {C1 , C2 } t-Rule if (C1 t C2 ) ∈ L(x) and {C1 , C2 } ∩ L(x) = ∅ then set L(x) = L(x) ∪ {X} with X ∈ {C1 , C2 } ∀-Rule if ∀R.C ∈ L(x) and there exists a y and R0 with R0 ∈ L(hx, yi), C ∈ / L(y), R 0 v∗ R then set L(y) = L(y) ∪ {C} ∀\ -Rule if ∀R\S.C ∈ L(x) and there exists a y and R0 with R0 ∈ L(hx, yi), R0 v∗ R, C∈ / L(y), and there exists no S 0 such that S 0 v∗ S and S 0 ∈ L(hx, yi) then set L(y) = L(y) ∪ {C} ∀+ -Rule if ∀R.C ∈ L(x) and there exists a y and R0 , S with R0 ∈ L(hx, yi), R0 v∗ S, S v∗ R, S ∈ NRT , and ∀S.C ∈ / L(y) then set L(y) = L(y) ∪ {∀S.C} ch-Rule If there occurs v in LE (x) with {v ≥ 1, v ≤ 0} ∩ LE (x) = ∅ then set LE (x) = LE (x) ∪ {X} with X ∈ {v ≥ 1, v ≤ 0} disjoint- if there occurs v in LE (zx ) with {R0 , S 0 } ⊆ α(v), R0 v∗ Rxa , S 0 v∗ Rxb , Rule x, a, b ∈ IA , v ≤ 0 ∈ / LE (zx ) then set LE (zx ) := LE (zx ) ∪ {v ≤ 0} equal-Rule if there occurs v in LE (zx ) with R0 ∈ α(v), S 0 ∈ / α(v), R0 v∗ Rxa , S 0 v∗ Sxa , x, a, b ∈ IA , v ≤ 0 ∈ / LE (zx ) then set LE (zx ) := LE (zx ) ∪ {v ≤ 0} hierarchy- if there occurs v in LE (x) with R ∈ α(v), S ∈ / α(v), R v∗ S, v ≤ 0 ∈/ LE (x) Rule then set LE (x) := LE (x) ∪ {v ≤ 0} ≥-Rule If (≥ nR) ∈ L(x) and ξ(R, ≥, n) ∈ / LE (x) then set LE (x) = LE (x) ∪ {ξ(R, ≥, n)} ≤-Rule If (≤ nR) ∈ L(x) and ξ(R, ≤, n) ∈ / LE (x) then set LE (x) = LE (x) ∪ {ξ(R, ≤, n)} fil-Rule If there exists v occurring in LE (zx ) such that σ(v) = n with n > 0: if n = 1, zx , zb root nodes, and Rxb ∈ α(v) with x, b ∈ IA then if L(hzx , zb i) = ∅ then set L(hzx , zb i) := α(v) end if else if zx is not blocked and ¬∃y : L(hzx , yi) = α(v) then create a new node y and set L(hzx , yi) := α(v) and card(y) = n Fig. 2. Completion rules for SHQ Abox consistency (listed in decreasing priority) takes place: (i) The reset-Rule becomes applicable for x, which sets LE (x) := ∅ and removes all outgoing edges of x. (ii) Now that LE (x) is empty, the ≤-Rule and ≥- Rule will be invoked again to recompute the partitions and variables. Afterwards, the set of inequations based on the new partitioning is added. (iii) If (σ(vi ) = n) ∈ Ω(x), P vi ∈ Vx corresponds where to the previous partitioning, then set LE (x) := LE (x) ∪ { v0 ∈Vx,i vj0 ≥ n, v0 ∈Vx,i vj0 ≤ n} where Vx,i := {vj0 ∈ Vx0 | α(vi ) ⊆ α(vj0 )} and P j j vj0 ∈ Vx0 are based on the new partitioning. The third task in fact maintains the previous solution in x and records it by means of inequalities in LE (x). Therefore, the solution based on the new partitioning will be recreated by the arithmetic reasoner. An interpretation I satisfies a SHQ Abox A iff the hybrid tableau calculus can derive a complete and clash-free completion forest for A. Due to lack of space we refer the reader to [5] for the proofs on termination, soundness and completeness of the presented calculus. 4 Practical reasoning 4.1 Complexity Analysis To analyze the complexity of the concept satisfiability test with respect to qualified number restrictions, we count the number of branches that the algorithm creates in the search space. In the following we assume a node x ∈ V in the completion forest that contains p at-least restrictions and q at-most restrictions in its label (Ri , Rj0 ∈ NR and Ci , Cj0 ∈ clos(T )): {≥ n1 R1 .C1 , . . . , ≥ np Rp .Cp } ⊆ L(x), {≤ m1 R10 .C10 , . . . , ≤ mq Rq0 .Cq0 } ⊆ L(x) Standard tableau: The complexity of a standard tableau (e.g., see [9]) can be char- acterized by (2q )N × f (N, m), where N depends on the biggest number occurring in at-least restrictions and f (N, m) describes the number of possible ways to merge N in- dividuals into m individuals with m the smallest non-zero number occurring in at-most restrictions (see [5] for more details). Hybrid tableau: During the preprocessing step, the hybrid algorithm converts all qualified number restrictions into unqualified ones which introduces p + q new role names. According to the atomic decomposition presented in Section 3.2, the hybrid algorithm defines 2p+q − 1 partitions and consequently variables for x; i.e. |Vx | = 2p+q −1. The ch-Rule opens two branches for each variable in Vx . Therefore, there will be totally 2|Vx | cases to be examined by the arithmetic reasoner. The ch-Rule will be p+q invoked |Vx | = 2p+q − 1 times and creates 22 branches in the search space. Hence, the complexity of the algorithm is characterized by a double-exponential function of p+q. In [14] a polynomial-time algorithm for integer programming with a fixed number of variables is given. However, our prototype implementation employed the Simplex method which is known to be NP in the worst case but usually behaves very well on average. The complexity of the standard algorithm is a function of N and therefore the num- bers occurring in the at-most restrictions can affect the standard algorithm exponen- tially. Whereas in the hybrid algorithm, the complexity is independent from N due to its arithmetic approach to the problem. 4.2 Partition Optimization Techniques Although it seems that the hybrid algorithm is double-exponential in the worst case and the large number of variables seems to be hopelessly inefficient, there are some effective heuristics and optimization techniques which make it feasible to use. Default Value for Variables: In the semantic branching based on the concept choose- Rule (see [9]), in one branch we have C and in the other branch we have ¬C in the label of the nodes. However, due to the ch-Rule (for variables) in one branch we have v ≥ 1 whereas in the other branch v ≤ 0. In contrast to concept branching based on the choose-Rule, in variable branching we can ignore the existence of variables that are less or equal to zero. In other words, the arithmetic reasoner only considers variables that are greater or equal to one. Therefore, by setting the default value to v ≤ 0 for every variable, the algorithm does not need to invoke the ch-Rule |Vx | times before starting to find a solution for the inequations. More precisely, the algorithm starts with the default value of v ≤ 0 for all variables in |Vx |. Obviously, the solution for this set of inequations, which is ∀ vi ∈ Vx : σ(vi ) = 0, cannot satisfy any at-least restriction. Therefore, the algorithm must choose some variables in Vx to make them greater or equal to one. Although in the worst case the algorithm still needs to try 2|Vx | cases, by setting this default value it does not need to invoke the ch-Rule when it is not necessary. We define don’t care variables as the set of variables that appear in an at-least re- striction but in no at-most restriction. These variables have only logical restrictions which later on will be processed by the algorithm. Therefore, any non-negative integer value satisfying all known arithmetic restrictions can be assigned to these variables and we can leave them unchanged in all inequations unless a logical clash is triggered. We define satisfying variables as the set of variables which occur in an at-least re- striction and are not don’t care variables. Since these are the variables that occur in an at-least restriction, by assigning them to be greater or equal to one, the algorithm can lead the arithmetic reasoner to a solution. Whenever a node that is created based on v causes a clash, by means of dependency-directed backtracking we will set v ≤ 0 and therefore remove v from the satisfying variables set. When the satisfying variables set becomes empty the algorithm can conclude that the set of qualified number restric- tions in L(x) is unsatisfiable. Notice that the number of variables that can be decided to be greater than zero in an inequation is bounded by the number occurring in their corresponding number restriction. Variable Encoding: One can encode indices of variables in binary format to easily retrieve the role names related to them. We do not need to assign any memory space for them unless they have a value greater than zero based on an arithmetic solution. 4.3 Dependency-Directed Backtracking Dependency-directed backtracking or backjumping is a backtracking method which de- tects the sources of a clash and tries to bypass branching points that are not related to the sources of the clash [12, 11]. In the hybrid algorithm, whenever we encounter a log- ical clash for a successor y of x, we can conclude that the corresponding variable vy for the partition in which y resides must be zero. Therefore, we can prune all branches for which vy ≥ 1 ∈ LE (x). This method which is called simple backtracking can decrease the size of the search space by pruning half of the branches each time the algorithm detects a clash. For example, for an arbitrary L(x), by pruning all the branches where p+q vy ≥ 1, we will in fact prune 2|Vx |−1 = 22 −1 branches w.r.t. the ch-Rule, which amounts to half of the branches. We can improve this by a better version in which we prune all branches that have the same reason for the clash caused by vy . This methods is called complex backtracking. For instance, assume the node y that is created based on σ(vy ) = k where k ≥ 1 ends up with a clash. Since we have only one type of clash other than the arithmetic clash, assume the clash is because of {A, ¬A} ⊆ L(y) for some A ∈ NC . Moreover, assume we know that A is caused by a ∀Ri .A restriction in its predecessor x and ¬A by ∀S\Tj .(¬A) ∈ L(x). It is possible to conclude that all the variables v for which Ri ∈ α(v) ∧ Tj ∈ / α(v) will end up with the same clash. Considering the binary coding for the indices of the variables in which the ith digit represents Ri and the jth digit represents Tj , all the variables, where the ith digit is equal to 1 and the jth digit equal to zero, must be zero. Since the binary coding of the variable indices requires a total of p + q digits, the number of variables that must be zero will be 2p+q−2 . The 2p+q − 2p+q−2 other variables are free to be constrained and will open two branches in the search space. Therefore, the number of branches will be reduced from 2|Vx | to 23/4|Vx | which is a significant improvement. 5 Evaluation Our prototype reasoner can process ALCHQ concept expressions (see also [5]). The system is implemented in Java using the OWL-API 2.1.1 [10]. Qualified number restric- tions are expressive constructs added to the forthcoming OWL 2 [17]. Therefore, current OWL benchmarks do not contain qualified number restrictions. In fact, to the best of our knowledge there is no real-world SHQ benchmark available which contains non-trivial qualified number restrictions. Furthermore, our prototype reasoner does not implement any optimization techniques except the ones introduced above, which target only rea- soning with the Q-component of SHQ. Accordingly, we built a set of synthetic test cases to empirically evaluate the hybrid reasoner. We focus our evaluation on concept expressions only containing qualified number restrictions. We identified the following parameters that may affect the complexity of reasoning with number restrictions: (i) The size of numbers occurring in number restrictions. (ii) The number of qualified number restrictions. (iii) The ratio of the number of at-least to the number of at-most restric- tions. (iv) Satisfiability versus unsatisfiability of the input concept expression. Due to lack of space we focus in this paper only on the cases (i) and (iv). Well-known reasoners that support qualified number restrictions such as Racer [7], FaCT++ [19] or Pellet [18] implement numerous optimization techniques. However, to the best of our knowledge, Pellet as well as FaCT++ have no specific optimization techniques for dealing with qualified number restrictions. We decided to select Pellet (version 2.0 RC7 from June 11, 2009) as the representative for a state-of-the-art DL rea- soner. We based our evaluations only partially on a comparison with one of the existing reasoners and focused on the study of the behavior of the hybrid reasoner. The experi- ments were performed under Windows 32 on a standard PC with a dual-core (2.1 GHz) processor and 3 GB of physical memory. Every test was executed in five independent runs (with a timeout of 1000 seconds) and the average of these runs is presented. In order to assess the scalability of the hybrid algorithm with respect to the size of the numbers, we compared its performance with Pellet. We used the following concept expressions (assuming a role hierarchy {R v T, S v T, RS v R, RS v S}; i is a number incremented for each benchmark): (≥ 2i RS.(A t B)) u (≤ i S.A) u (≤ i R.B) u (≤ (i−1) T.(¬A)) t (≤ i T.(¬B)) (CSAT ) (≥ 2i RS.(AtB))u(≤ i S.A)u(≤ i R.B) u(≤ (i−1) T.(¬A))t(≤ (i−1) T.(¬B))(CUNSAT ) CSAT is a satisfiable concept while CUNSAT is unsatisfiable. In fact, CSAT is not trivially satisfiable. In the hybrid algorithm, the set of inequations is only satisfied in the case where all variables are zero except v, v 0 ≥ 1 and α(v) = {RS 0 , S 0 , T 0 } and α(v 0 ) = {RS 0 , R0 }.1 The standard algorithm in order to examine the satisfiability of 1 Assume R0 , S 0 , RS 0 , and T 0 are new sub-roles of R, S, RS, and T respectively. 10 6 .... Effect of the Size of Numbers: Hybrid vs. Pellet 100 Effect of Exponential Growth of i . . . 5 90 Hybrid-Sat 2 Hybrid-Unsat Hybrid-Sat 80 . 5 10 5 Hybrid-Unsat 70 Pellet-Sat Runtime (ms) 2 Pellet-Unsat Runtime (ms) 60 4 . . . . . . 10 5 50 ... 2 40 3 10 5 30 2 20 2 10 5 10 0 2 3 4 5 6 7 8 9 10 1 2 3 4 5 6 i in concept expression C log10 i in concept expression C Fig. 3. Increasing the value of numbers Fig. 4. Exponential growth of i . Effect of Backtracking 10 5 6 16000 . . . . ..... Satisfiable vs. Unsatisfiable: Standard and Hybrid Algorithms . Hybrid . 2 14000 Pellet . 5 10 5 12000 . 2 Runtime (ms) 10000 Runtime (ms) 4 . 10 5 8000 . 2 . 3 6000 10 . 5 . 4000 .. 2 ... ............. 2 10 No-Backtracking 2000 5 Simple Complex 0 2 0 20 40 60 80 100 120 140 160 180 200 220 240 2 3 4 5 6 7 8 9 10 11 i in the concept expression Ei Number of Restrictions Fig. 5. Different backtracking strategies Fig. 6. (Un)Satisfiability of Ei CSAT creates (2 × i) RS-successors for x in (A t B) and according to the three at- most restrictions it opens 8 new branches for each successor. However, since (2 × i) is much larger than i or i − 1, one must start merging the extra successors when an at-most restriction is violated which happens in all branches in this test case. As shown in Figure 3 (logarithmic scale), the growth of i from 2 to 10 has almost no effect on the hybrid reasoner while it dramatically increases the runtime of the standard algorithm for numbers as small as 6 and 7. In contrast to the standard reasoner, the performance of the hybrid reasoner is unaffected by the value of the numbers. Furthermore, to assure that this independence will be preserved also with respect to an exponential growth of i, in Figure 4 we present the performance of the hybrid reasoner for i = 10n , n ∈ 1..6. In the next experiment we observed the effect of backtracking on the performance of the hybrid reasoner. In three settings, we first turn off backtracking, secondly enable simple backtracking, and finally utilize complex backtracking (see Section 4.3). We tested an unsatisfiable concept DUNSAT which follows the pattern where Dj u Dk = ⊥ for 1 ≤ j, k ≤ i, j 6= k with respect to a role hierarchy R v T : (≥ 3 R.D1 ) u . . . u (≥ 3 R.Di ) u (≤ (3i−1) T ). As shown in Figure 5 (logarithmic scale), the high non- determinism of the ch-Rule makes it inevitable to utilize backtracking. Moreover, we can conclude that by using a more comprehensive and informed method of backtracking we can improve the performance significantly. In the last experiment the test cases are concepts containing four qualified at-least restrictions and one unqualified at-most restriction according to the following pattern. We abbreviate these concepts with Ei where R v T for i = 1, 20, 40, . . . , 220, 240: ≥ 30 R.(B u C)u≥ 30 R.(B u ¬C) u≥ 30 R.(¬B u C)u≥ 30 R.(¬B u ¬C)u≤ i T . The concept Ei is satisfiable for i ≥ 120 and unsatisfiable for i < 120. As illustrated in Figure 6, the standard algorithm can easily infer for i ∈ 1..29 that Ei is unsatisfiable since x has at least 30 distinguished successors. However, from E30 to E120 it becomes very time-consuming for the standard algorithm to merge all 120 successors into i individuals. Moreover, Figure 6 shows that no matter which value i takes between 30 and 119, the standard algorithm performs similarly. In other words, we can conclude that it tries the same number of possible ways of merging which is all the possibilities to merge 4 sets of mutually distinguished individuals. As soon as i becomes greater or equal 120, since the at-most restriction is not violated, the standard algorithm simply ignores it and reasoning becomes trivial for the standard algorithm. Furthermore, we can conclude from Figure 6 that for the hybrid algorithm i = 1 is a trivial case since not more than one variable can have the type of v ≥ 1 which is the case that easily leads to unsatisfiability for E1 . However, it becomes more difficult as i grows and reaches its maximum for i = 30 and starts to decrease gradually until i = 70 and remains unchanged until i = 120. In fact, this unexpected behavior did not correspond to the formal analysis of the hybrid algorithm. Therefore, we measured the time spent on arithmetic reasoning and logical reasoning as well as the number of different clashes. The reason that i = 30 is a breaking point is due to the fact that for i < 30 no arithmetic solution exists for the set of inequations. It seems that the increase in runtime is due to this fact and a straight-forward and unoptimized implementation of the arithmetic reasoner. When i grows from 30 to 120, the arithmetic reasoner finds more solutions for the set of inequations which will fail due to logical clashes. 6 Discussion and Future Work Based on the previous sections we identify the following advantages of the hybrid al- gorithm: (i) Insensitivity to the value of numbers in number restrictions. (ii) Since all number restrictions are collected before expanding the completion graph, its solution is more probable to survive. (iii) Due to the atomic decomposition the search for a model is semantically structured. When encountering a clash, one can efficiently backtrack to the source of the clash and optimally prune the branches which would lead to the same clash (see Section 4.3). The following disadvantages of the hybrid algorithm can be observed: (i) According to the nature of the atomic decomposition, the hybrid algorithm introduces an exponen- tial number of variables. (ii) The implemented prototype still requires a preprocessing phase affected by the number of partitions. This initialization time could be saved for trivially satisfiable or unsatisfiable concepts. We plan to address the following topics in our future work. (i) Since the minimal number of successor property is unnecessary in many cases, one could reduce the arith- metic reasoning time by modifying the goal function and allowing a solution with more successors. (ii) Optimizing the arithmetic reasoner by implementing techniques such as incremental arithmetic reasoning, caching, and heuristics for controlling the order of applying the ch-rule. Acknowledgements We express our gratitude to our colleagues Jocelyne Faddoul and Ralf Möller for their valuable advice during this investigation. References 1. F. Baader and U. Sattler. An overview of tableau algorithms for description logics. Studia Logica, 69:5–40, 2001. 2. Y. Ding. Tableau-based Reasoning for Description Logics with Inverse Roles and Number Restrictions. PhD thesis, Department of Computer Science and Software Engineering, Con- cordia University, April 2008. Available at http://users.encs.concordia.ca/ ~haarslev/students/Yu_Ding.pdf. 3. J. Faddoul, N. Farsinia, V. Haarslev, and R. Möller. A hybrid tableau algorithm for ALCQ. In Proceedings of the 18th European Conference on Artificial Intelligence (ECAI 2008), Patras, Greece, July 21-25, pages 725–726, 2008. An extended version appeared in Proceedings of the 2008 International Workshop on Description Logics (DL-2008), Dresden, Germany, May 13-16, 2008. 4. J. Faddoul, V. Haarslev, and R. Möller. Hybrid reasoning for description logics with nom- inals and qualified number restrictions. Technical report, Institute for Software Systems (STS), Hamburg University of Technology, 29 pages, 2008. See also http://www.sts.tu- harburg.de/tech-reports/papers.html. 5. N. Farsiniamarj. Combining integer programming and tableau-based reasoning: A hybrid calculus for the description logic SHQ. Master’s thesis, Concordia University, Depart- ment of Computer Science and Software Engineering, 2008. Available at http://users. encs.concordia.ca/~haarslev/students/Nasim_Farsinia.pdf. 6. V. Haarslev and R. Möller. Optimizing reasoning in description logics with qualified number restriction. In Proceedings of the International Workshop on Description Logics (DL’2001), Aug. 1-3, 2001, Stanford, CA, USA, pages 142–151, Aug. 2001. 7. V. Haarslev and R. Möller. RACER system description. In R. Goré, A. Leitsch, and T. Nip- kow, editors, Proceedings of the International Joint Conference on Automated Reasoning, IJCAR’2001, June 18-23, 2001, Siena, Italy, Lecture Notes in Computer Science, pages 701– 705. Springer-Verlag, June 2001. 8. V. Haarslev, M. Timmann, and R. Möller. Combining tableaux and algebraic methods for reasoning with qualified number restrictions. In Proceedings of the International Workshop on Description Logics (DL’2001), Aug. 1-3, Stanford, USA, pages 152–161, 2001. 9. B. Hollunder and F. Baader. Qualifying number restrictions in concept languages. In J. Allen, R. Fikes, and E. Sandewall, editors, Second International Conference on Princi- ples of Knowledge Representation, Cambridge, Mass., April 22-25, 1991, pages 335–346, Apr. 1991. 10. M. Horridge, S. Bechhofer, and O. Noppens. Igniting the OWL 1.1 touch paper: The OWL API. In Proceedings of the OWLED 2007 Workshop on OWL: Experiences and Directions, volume 258, Innsbruck, Austria, June 2007. 11. I. Horrocks. Backtracking and qualified number restrictions: Some preliminary results. In Proc. of the 2002 Description Logic Workshop (DL 2002), pages 99–106, 2002. 12. I. Horrocks, U. Sattler, and S. Tobies. Practical reasoning for very expressive description logics. Logic Journal of the IGPL, 8(3):239–264, 2000. 13. I. Horrocks, U. Sattler, and S. Tobies. Reasoning with individuals for the description logic SHIQ. In D. MacAllester, editor, Proceedings of the 17th International Conference on Au- tomated Deduction (CADE-17), Lecture Notes in Computer Science, pages 482–496, Ger- many, 2000. Springer Verlag. 14. H. Lenstra. Integer programming with a fixed number of variables. Mathematics of Opera- tions Research, 8(4):538–548, November 1983. 15. H. Ohlbach and J. Koehler. Role hierarchies and number restrictions. In Proceedings of the International Workshop on Description Logics (DL-97), Sept. 27 - 29, Gif sur Yvette (Paris), France, 1997. 16. H. Ohlbach and J. Köhler. Modal logics, description logics and arithmetic reasoning. Artifi- cial Intelligence, 109(1-2):1–31, 1999. 17. OWL 2 Web Ontology Language: Structural Specification and Functional-Style Syntax, W3C Working Draft, 02 December 2008, October 2008. http://www.w3.org/TR/ owl2-syntax/ (last visited June 2009). 18. E. Sirin, B. Parsia, B. C. Grau, A. Kalyanpur, and Y. Katz. Pellet: a practical OWL-DL reasoner. Journal of Web Semantics, 5(2):51–53, 2005. 19. D. Tsarkov and I. Horrocks. FaCT++ description logic reasoner: System description. In Proc. of the Int. Joint Conf. on Automated Reasoning (IJCAR 2006), volume 4130 of Lecture Notes in Artificial Intelligence, pages 292–297. Springer, 2006.