Sufficient Conditions for First-Order and Datalog Rewritability in ELU Mark Kaminski and Bernardo Cuenca Grau Department of Computer Science University of Oxford, UK. Abstract. We study the problem of answering instance queries over non-Horn ontologies by rewriting them into Datalog programs or First- Order queries (FOQs). We consider the basic non-Horn language ELU, which extends EL with disjunctions. Both Datalog-rewritability and FO- rewritability of instance queries have been recently shown to be decidable for ALC ontologies (and hence also for ELU); however, existing decision methods are mainly of theoretical interest. We identify two fragments of ELU for which we can compute Datalog-rewritings and FO-rewritings of instance queries, respectively, by means of a resolution-based algorithm. 1 Introduction We study the problem of answering queries over DL ontologies by rewriting them into a First-Order query (FOQ) or a Datalog program. On the theoretical side, rewriting queries into an FOQ or a Datalog program ensures tractability of query answering in terms of data complexity. On the practical side, it allows one to reuse optimised data management systems, namely RDBMSs in the case of FOQs and rule-based systems such as OWLim or Oracle’s Semantic Data Store in the case of Datalog. The problem of ontology-based query answering via query rewriting has so far been mainly studied for Horn DLs. FO-rewritability is ensured for logics of the DL-Lite family [5] and query rewriting algorithms in these DLs have been implemented in systems such as QuOnto [1], Presto [17], Quest [16], Rapid [6], and Owlgres [20]. Datalog rewritability is ensured for logics of the EL family, as well as more expressive languages such as Horn-SHIQ [11]; optimised algorithms have been implemented in systems such as Requiem [15] and Clipper [10]. FO- rewritability has also been studied as a decision problem for logics of the EL- family [3], where tight complexity bounds have been shown. Horn DLs, however, cannot capture disjunctive knowledge, such as ‘every student is either a graduate or an undergraduate’. As a consequence, ontologies capturing disjunctive knowledge cannot be processed using existing rewriting techniques. Little is known about how to compute FO and Datalog rewritings for non-Horn ontologies, and first results have been obtained only recently. Instance queries (i.e., queries of the form A(x) with A a concept name) are known to be FO-rewritable w.r.t. ontologies expressed in certain logics of the DL-Litebool fam- ily —the extension of DL-Lite logics with disjunction [2]—, and a goal-oriented resolution algorithm for computing rewritings w.r.t. such logics has been pro- posed in [8]. In contrast, instance queries w.r.t. ontologies expressed in the basic non-Horn DL ELU are not generally rewritable into Datalog: a result that holds regardless of any complexity-theoretic assumptions [8]. While FO and Datalog rewritability of instance queries w.r.t. ALC ontologies have been proved to be decidable [4], the decision methods in [4] are problematic in practice as their first step, translation to CSP, has exponential best-case complexity. Finally, for certain classes of conjunctive queries, query answering over non-Horn ontologies can be reduced to knowledge base consistency [13]. This approach, however, does not generally ensure tractability in terms of data complexity. In this paper, we are interested in ontologies formulated in the basic non-Horn DL ELU. For simplicity, we focus on instance queries of the form A(x). In gen- eral, however, our results apply to ground queries, where all variables are required to be mapped to constants; such queries form the basis of the standard query language SPARQL. Our main result is the identification of two sufficient condi- tions on ELU-ontologies that ensure FO-rewritability and Datalog rewritability of instance queries, respectively. Furthermore, we provide resolution-based algo- rithms that can be used for computing the corresponding rewritings in case our sufficient conditions are fulfilled. Our algorithms build on the generic resolution- based technique proposed in [8]. This paper is accompanied with an appendix containing the proofs of our technical results. 2 Preliminaries We consider First-Order logic without equality and function symbols. Variables, terms, (ground) atoms, literals, formulae, sentences, intepretations, models and entailment are defined as usual. An ABox is a finite set of ground atoms (called facts). We also adopt standard notions of (Horn) clauses, (variable) substitutions, and most general unifiers (MGUs). Positive factoring (PF) and binary resolution (BR) are as follows, where σ is the MGU of atoms A and B: C ∨A∨B C ∨ A D ∨ ¬B PF: BR: Cσ ∨ Aσ (C ∨ D)σ A clause C is a tautology if it contains literals A and ¬A. A clause C subsumes a clause D if a substitution σ exists such that each literal in Cσ occurs in D. Furthermore, C θ-subsumes D if C subsumes D and C has no more literals than D. Clause C is redundant in a set of clauses if C is a tautology or if C is θ-subsumed by another clause in the set. The condensation of a clause C is the clause D with the least number of literals such that D ⊆ C and C subsumes D. The Description Logic ELU. We assume familiarity with standard DLs; here, we briefly recapitulate the syntax of ELU. An ELU-concept is an expression of the form >, A, C1 u C2 , C1 t C2 , or ∃R.C, where A is a concept name, C(i) are concepts, and R is a role name. An ELU-TBox T is a finite set of GCIs of the form C1 v C2 where C1 and C2 are ELU-concepts. An ELU-TBox is normalised Fit it contains only axioms of the form ∃R.A v B, A v ∃R.B, and d n m i=1 Ai v j=1 Bj , where A(i) and B(j) are either named or >. Each ELU-TBox T can be transformed in polynomial time into a normalised ELU-TBox that is a conservative extension of T . An ELU-TBox T is linear if conjunction u does not occur on the left-hand side of a GCI in T . Rules. A rule is a First-Order sentence of the form ∀x.∀z.[ϕ(x, z) → ψ(x)], where tuples of variables x and z are disjoint, ϕ(x, z) is a conjunction of atoms, and ψ(x) is a disjunction of atoms. Formula ϕ is the body of r, formula ψ is the head of r, and quantifiers in a rule are omitted for brevity. Furthermore, we often abuse notation and treat a rule and its equivalent clause as synonyms. A rule is Datalog if ψ(x) is a single atom, and it is disjunctive otherwise. A (Datalog) program is a finite set of (Datalog) rules. A program is monadic if every predicate occurring in the head of a rule is unary. Vn WmAn ELU-program consists of the following kinds of rules: (i) i=1 Ai (x) → j=1 Bj (x) and (ii) R(x, y)∧A(y) → B(x), where the atom A(y) can be omitted. For convenience, we use the unary atom >(x) to denote an empty rule body. An EL-program is an ELU-program that is also Datalog. Finally, an ELU-program is linear if conjunction does not occur on the left-hand side of a rule of type (i). Queries. An (instance) query is a unary atom Q(x). A constant c is an answer to Q(x) w.r.t. a set of FO sentences F and an ABox A if F ∪ A |= Q(c). The set of answers to Q relative to F and A is denoted as cert(Q(x), F, A). FO and Datalog Rewritings. For an ABox A, let IA the interpretation cor- responding to A in the obvious way. An FO formula ϕ(x) with one free variable is an FO-rewriting of a query Q(x) w.r.t. an ELU-TBox (or, equivalently, an ELU-program) T if the following condition holds for every constant c and ABox A: c ∈ cert(Q, T , A) iff IA |= ϕ(c). W Furthermore, we say that ϕ(x) is a UCQ- n rewriting if it is of the form ϕ(x) = i=1 ϕi (x) where each ϕi (x) is constructed using only conjuction and existential quantification. We say that Q(x) is FO- rewritable w.r.t. T if an FO-rewriting of Q(x) w.r.t. T exists. Finally, T is FO-rewritable if for each concept name A in T the query A(x) is FO-rewritable w.r.t. T . A Datalog program P is a rewriting of a query Q(x) relative to a TBox (or, equivalently, an ELU-program) T if cert(Q(x), T , A) = cert(Q(x), P, A) for every ABox A. We say that Q(x) is Datalog-rewritable w.r.t. T if a Datalog rewriting of Q(x) w.r.t. T exists. Finally, P is a rewriting of T if it is a rewriting for each query A(x) w.r.t. T , with A a concept name in T ; TBox T is Datalog- rewritable if a Datalog rewriting of T exists. As observed in [4], it follows from the homomorphism preservation theorem for finite structures [18] that each FO- rewritable query is also UCQ rewritable and hence Datalog rewritable as well. 3 Computing Rewritings via Resolution We next recapitulate the generic resolution-based technique proposed in [8], which takes a TBox T and attempts to rewrite it into a Datalog program. If T is restricted to be in ELU, this technique consists of the following two steps. Procedure 1 Compile-Horn Input: S: set of clauses Output: SH : set of Horn clauses 1: SH := {C ∈ S | C is a Horn clause and not a tautology} 2: SH := {C ∈ S | C is a non-Horn clause and not a tautology} 3: repeat 4: F:= factors of each C1 ∈ SH non-redundant in SH ∪ SH 5: R: = resolvents of each C1 ∈ SH and C2 ∈ SH ∪ SH not redundant in SH ∪ SH 6: for each C ∈ F ∪ R do 7: C 0 := the condensation of C 8: Delete from SH and SH all clauses θ-subsumed by C 0 9: if C 0 is Horn then SH := SH ∪ {C 0 } 10: else SH := SH ∪ {C 0 } 11: until F ∪ R = ∅ 12: return SH Step 1: From DLs to Disjunctive Datalog. First, the algorithm in [12] is applied to transform T into an ELU-program D that entails the same facts as T w.r.t. every ABox. By eliminating the positive occurrences of existential quantifiers, one thus reduces the problem of rewriting the DL TBox T to the problem of rewriting the disjunctive program D. Step 2: From Disjunctive Datalog to Datalog. Second, the program D is transformed into a Datalog program P using a variant of the mainstream knowledge compilation algorithms proposed in [19] for propositional clauses, and extended in [9] to First-Order clauses (but without termination guarantees). Procedure 1 summarises (a slight variation of) the technique from [9]. Roughly speaking, the procedure applies binary resolution and factoring and keeps only the consequences that are not redundant. Unlike unrestricted resolution, the procedure never resolves two Horn clauses. The main property of Procedure 1 shown in [9] is that, even if it never terminates, each Horn consequence of the input S will also eventually become entailed by SH . In [8], it was shown that Procedure 1 also enjoys the following key property: if the program D is given as input and the procedure terminates, then the output P is a Datalog rewriting of D. In essence, this result implies that compilation into Horn clauses can be done in an ABox independent way: D and P can be combined with an arbitrary ABox and still entail the same facts. Example 3.1 We use the ELU-programs D1 and D2 as running examples: D1 = { C(x) → A(x) ∨ B(x) D2 = { >(x) → A(x) ∨ B(x) R(x, y) ∧ A(y) → H(x) R(x, y) ∧ A(y) → B(x) R(x, y) ∧ B(y) → D(x) R(x, y) ∧ B(y) → A(x) } R(x, y) ∧ D(y) → H(x) } When applied to program D1 , Procedure 1 terminates with the following sets DH 1 1 and DH of disjunctive and Datalog rules, respectively: 1 1 DH ={ C(x) → A(x) ∨ B(x) DH = { R(x, y) ∧ A(y) → H(x) R(x, y) ∧ C(y) → H(x) ∨ B(y) R(x, y) ∧ B(y) → D(x) R(x, y) ∧ C(y) → D(x) ∨ A(y) R(x, y) ∧ D(y) → H(x) R(x, y) ∧ R(z, y) ∧ C(y) → H(x) ∨ D(z) } R(x, z) ∧ R(x, y) ∧ R(z, y) ∧ C(y) → H(x) } 1 The results proved in [8] imply that DH is a Datalog rewriting of D1 . In contrast, Procedure 1 does not terminate when given D2 as input. Although the mutually recursive Datalog rules in D2 are never resolved directly with each other, they can interact via the program’s (only) disjunctive rule. As a result, Procedure 1 will eventually derive each of the following (infinitely many) rules, for each even number n and for each Z ∈ {A, B}: n ^ R(xn , x0 ) ∧ R(xi , xi−1 ) → Z(xn ) i=1 4 Sufficient Conditions for Rewritability In what follows, we present two conditions on ELU-programs that ensure FO and Datalog-rewritability, respectively. Both conditions come with resolution-based algorithms for computing rewritings. To ensure termination of resolution, both our conditions apply to linear ELU-programs only. As we discuss later on in §5, termination in the non-linear case becomes much more problematic, and it is left for future work. 4.1 FO Rewritability As shown by Bienvenu et al. [3], if an instance query A(x) is FO-rewritable w.r.t. an EL-TBox, then there exists a tree-shaped UCQ that is an FO-rewriting for the given query and TBox. This property was critical to the study of FO-rewritability in the context of EL as it implies a characterisation of FO-rewritability in terms of tree-shaped ABoxes. Since only tree-shaped ABoxes are relevant, one can exploit standard tree automata machinery to study the problem. Once disjunctions come into play, however, this key property seems to break. As demonstrated by the following example, there are FO-rewritable ELU-TBoxes that do not seem to be rewritable into tree-shaped UCQs. Thus, the machinery developed in [3] for EL does not seem to extend to ELU. Example 4.1 The query H(x) is FO-rewritable w.r.t. our example program D1 . The following UCQ ϕ(x) = ϕ1 (x) ∨ ϕ2 (x) ∨ ϕ3 (x) ∨ ϕ4 (x) ∨ ϕ5 (x) can be obtained by unfolding the Datalog rewriting obtained by Procedure 1 on D1 : ϕ1 (x) = H(x) ϕ4 (x) = ∃y.∃z. R(x, y) ∧ R(x, z) ∧ R(z, y) ∧ C(y) ϕ2 (x) = ∃y. R(x, y) ∧ A(y) ϕ5 (x) = ∃y.∃z. R(x, y) ∧ R(y, z) ∧ B(z) ϕ3 (x) = ∃y. R(x, y) ∧ D(y) Clearly, ϕ4 (x) is not tree-shaped. Moreover, H(x) does not seem to have a tree- shaped UCQ-rewriting as defined in [3]. Although a linear ELU-program D that is FO-rewritable may not have a tree-shaped rewriting, linearity allows us to restrict resolution proofs in a critical way. We can show that each derivation from D via resolution and condensation only (i.e., no factoring) involves only tree-shaped rules. This implies that each non-tree rule derived by resolution and factoring is a factor of a tree-shaped rule. In the following, we define a condition on linear ELU programs that guar- antees FO-rewritability. When applied to a program D satisfying the condition (such as D1 in our running example), Procedure 1 will terminate: the size of de- rived tree-shaped rules will be limited by the size of D. Furthermore, the Datalog program obtained as output will be bounded for every predicate in the standard sense [14]. Our condition is defined as given next. Definition 4.2. The dependency graph of a linear ELU-program D is the least directed edge-labeled graph GD = (V, E, µ) satisfying the following conditions: 1. each unary predicate occurring in D is a node in V ; 2. (A, B) ∈ E ∩ µ for each rule in D of the form R(x, y) ∧ A(y) →W B(x); n 3. (A, Bi ) ∈ E with i ∈ [1, n] for each rule in D of the form A(x) → i=1 Bi (x). Edges contained in the labelling µ are called transfer edges; all remaining edges are called propositional edges. A transfer cycle is a simple cycle that contains at least one transfer edge. The program D is acyclic if GD contains no transfer cycle. Finally, the transfer depth of a unary predicate A in an acyclic program D is the maximal number of transfer edges on a path in GD ending in A. Intuitively, the maximal transfer depth of a predicate in an acyclic program es- tablishes a bound on the role depth of the tree-shaped rules that can be generated by Procedure 1. Termination of Procedure 1 is then ensured by condensation, which bounds the branching factor of rules in terms of their depth. Example 4.3 Consider the program D1 from Example 3.1. The dependency graph for D1 is given next, where transfer edges are dashed. Clearly, D1 is acyclic and predicate H has transfer depth 2. C A H B D Procedure 2 Rewrite-UCQ Input: D: a linear, acyclic ELU-program; A: a unary predicate occurring in D Output: ϕ(x): a UCQ 1: P := Compile-Horn(D) 2: P 0 := {A(x) → Q(x)} where Q is a fresh unary predicate 3: P 00 := ∅ 4: repeat 5: Select some r ∈ P 0 6: N := the resolvents of r with clauses in P that are not subsumed in P 0 ∪ P 00 7: P 0 := (P 0 \ {r}) ∪ N 8: P 00 := P 00 ∪ {r} 9: until P 0W =∅ 10: ϕ(x) := { ∃y. ψ(x, y) | ∀x. ∀y. ψ(x, y) → Q(x) ∈ P 00 } 11: return ϕ(x) Rules derived by Procedure 1 on D1 have depth at most 2 (see Example 3.1). Procedure 2 computes a UCQ rewriting of a query A(x) relative to an acyclic program D. On input D and A, we first apply Procedure 1 to compute a Datalog rewriting P of D; as already seen, termination of this first step is guaranteed. Predicate A is then unfolded in P using standard techniques, as shown in Lines (2-10); unfolding terminates iff the predicate A is bounded in P [14, 7]. To show termination of unfolding, we observe that the depth of the rules in P still cannot exceed the transfer depth of the predicates in the dependency graph of D; hence, the loop in Lines (4)-(9) only derives finitely many rules modulo subsumption. The properties of the procedure are thus as follows. Theorem 4.4. Given an acyclic ELU-program D and a unary predicate A in D, Procedure 2 terminates and returns a UCQ rewriting of A(x) w.r.t. D. Example 4.5 When applied to the predicate H and the program D1 from Ex- 1 ample 3.1, Procedure 2 first calls Procedure 1, which returns the program DH 1 1 from Example 3.1 as a Datalog rewriting of D . So, P is initialised with DH and P 0 is set to {H(x) → Q(x)}. On these values, unfolding terminates after five iterations of the main loop with the following rules in P 00 : H(x) → Q(x) R(x, z) ∧ R(x, y) ∧ R(z, y) ∧ C(y) → Q(x) R(x, y) ∧ A(y) → Q(x) R(x, y) ∧ R(y, z) ∧ B(z) → Q(x) R(x, y) ∧ D(y) → Q(x) This yields precisely the UCQ ϕ(x) given in Example 4.1. 4.2 Datalog Rewritability Our Datalog-rewritability condition relaxes the acyclicity condition from §4.1 by allowing certain kinds of cyclic programs. Intuitively, we allow programs that can be separated into two parts: one that may contain disjunctive rules but no transfer cycles, and another one that contains only Datalog rules, but may contain transfer cycles. We call such programs separable. Definition 4.6. Let D be a linear ELU-program and let D∨ be the smallest subset of D such that 1. each disjunctive rule in D is contained in D∨ ; and 2. no unary predicate in D \ D∨ occurs in the body of a rule in D∨ . Program D is separable if D∨ is acyclic. Example 4.7 Program D2 from our running example 3.1 is not FO-rewritable; 2 however, it is separable: D∨ = {>(x) → A(x) ∨ B(x)} and it is therefore acyclic as in Definition 4.2. We deal with separable programs by eliminating cycles from their Datalog part. Cycle elimination is based on the observation that every linear EL-program P can be seen as a labeled transition system TP with unary predicates as states and rules as transitions labeled by binary predicates (or > if the rule contains no binary predicate). Given a unary predicate B in P, an ABox A, and an individual b, every derivation of B(b) from P ∪ A corresponds to a path ending in B in TP ; hence, paths in TP encode resolution proofs. The set of all possible derivations of an assertion B(b) from an assertion A(a) by the rules in P corresponds to the regular language accepted by the non- deterministic finite automaton constructed from TP by taking A as the unique initial state, and B as the unique accepting state. Definition 4.8. Let D be a linear ELU-program and let hA, Bi be a pair of unary predicates occurring in D. The automaton for D relative to hA, Bi is the non-deterministic finite word automaton AutD (A, B) = hS, Γ, →, {A}, {B}i defined as follows: the set of states S consists of the unary predicates in D; the alphabet Γ is the set of binary predicates in D plus the special symbol >; the (unique) initial and final states are A and B respectively; finally, the transition relation → consists of the following transitions: – C →> D for each EL-rule C(x) → D(x) in D; – C →R D for each EL-rule R(x, y) ∧ C(y) → D(x) in D. Finally, we denote with L(AutD (A, B)) the language accepted by AutD (A, B). Example 4.9 The automaton for D2 relative to hA, Bi is given below: R start A B R The automata for hA, Ai, hB, Ai, and hB, Bi contain exactly the same states and transitions and only differ on the particular choice of initial and final state. The language L(AutD (A, B)) is regular, and hence can be described by means of some regular expression over the alphabet Γ of the automaton. We adopt the following definition of regular expressions over Γ , where α ∈ Γ and the atomic expression ∅ denotes the empty language. e ::= α | ∅ | ee | e + e | e∗ With each regular expression e over Γ we uniquely associate a (fresh) binary predicate Ne and a Datalog program Pe that defines Ne as shown next. Definition 4.10. Let D be a linear ELU-program and let e be a regular expres- sion over the binary predicates in D and the symbol >. The Datalog program Pe corresponding to e is defined inductively as follows: P∅ = ∅ P> = {>(x) → N> (x, x)} PR = {R(x, y) → NR (y, x)} Pe1 e2 = Pe1 ∪ Pe2 ∪ {Ne1 (x, y) ∧ Ne2 (y, z) → Ne1 e2 (x, z)} Pe1 +e2 = Pe1 ∪ Pe2 ∪ {Ne1 (x, y) → Ne1 +e2 (x, y), Ne2 (x, y) → Ne1 +e2 (x, y)} Pe∗ = Pe ∪ {>(x) → Ne∗ (x, x), Ne (x, y) ∧ Ne∗ (y, z) → Ne∗ (x, z)} Example 4.11 The language of the automaton AutD2 (A, B) in Example 4.9 can be captured by the regular expression ehA,Bi = R(RR)∗ . The corresponding program PehA,Bi consists of the following Datalog rules: R(x, y) → NR (y, x) (1) NR (x, y) ∧ NR (y, z) → NRR (x, z) (2) >(x) → N(RR)∗ (x, x) (3) NRR (x, y) ∧ N(RR)∗ (y, z) → N(RR)∗ (x, z) (4) NR (x, y) ∧ N(RR)∗ (y, z) → NR(RR)∗ (x, z) (5) The language for AutD2 (A, A) is captured by ehA,Ai = (RR)∗ , which is a subex- pression of ehA,Bi ; the program PehA,Ai thus consists of rules (1)-(4). Symmetry in the automaton’s transitions implies PehB,Ai = PehA,Bi and PehB,Bi = PehA,Ai . Let D be a separable program and D∨ be the disjunctive part of D. Since for each pair A, B in D \ D∨ we have that L(AutD (A, B)) can be described by some regular expression ehA,Bi , all ways in which an assertion A(a) can imply B(b) can be summarised by the single rule NehA,Bi (x, y) ∧ A(x) → B(y). Thus, the disjunctive program D0 defined as the union of D∨ , all programs PhA,Bi , and all rules NehA,Bi (x, y) ∧ A(x) → B(y) has the same Horn consequences as D. To obtain such Horn consequences, we could thus apply Procedure 1 to D0 ; however, to ensure termination we do the following instead: Procedure 3 Rewrite-Datalog Input: D: a separable ELU-program; Output: P: a Datalog program 1: Ξ(D), Ω(D) := ∅ 2: for each pair of unary predicates hA, Bi do 3: AutD (A, B) := finite word automaton for D relative to hA, Bi 4: Construct a regular expression ehA,Bi equivalent to AutD (A, B) 5: Construct Datalog program PehA,Bi and 6: Ξ(D) := Ξ(D) ∪ PehA,Bi 7: Ω(D) := Ω(D) ∪ {RehA,Bi ∧ A(x) → QB (y)} 8: P := Compile-Horn(D∨ ∪ Ω(D)) 9: P := P ∪ Ξ(D) 10: for each unary predicate A in D, replace QA with A in P 11: return P – in each rule NehA,Bi (x, y) ∧ A(x) → B(y) in D0 , we replace predicate B with a fresh predicate QB , which ensures acyclicity of the resulting program; and – we apply Procedure 1 only to the monadic rules in D0 , thus ignoring programs PhA,Bi for the purpose of resolution. The algorithm based on these ideas is given by Procedure 3. In Lines (1)-(7), the procedure computes the following intermediate programs, where A, B range over the unary predicates in the input: [ [ Ω(D) = {NehA,Bi (x, y) ∧ A(x) → QB (y)} Ξ(D) = PehA,Bi hA,Bi hA,Bi In Line (8), our algorithm invokes Procedure 1 to compute the Horn consequences of D∨ ∪ Ω(D). The following lemma establishes termination of this step. Lemma 4.12. Let D be a linear ELU-program that is separable. Then, Proce- dure 1 terminates on D∨ ∪ Ω(D). The following lemma shows that no Horn consequence is lost by applying Procedure 1 to D∨ ∪ Ω(D) only, and appending the non-monadic program Ξ(D) only afterwards. Lemma 4.13. Let D be a separable ELU-program. Let P be the union of Ξ(D) and the output of Procedure 1 on D∨ ∪ Ω(D). Then, for every query A(x) and every ABox A, we have cert(A(x), D, A) = cert(QA (x), P, A). Finally, in order to obtain a proper rewriting, in Line (10) we eliminate the auxiliary unary predicates QA before returning the output. Theorem 4.14. On input a separable ELU-program D, Procedure 3 returns a Datalog rewriting of D. Example 4.15 For our example program D2 , we have the following: [ Ξ(D2 ) = PehA,Bi = PR(RR)∗ = {(1), (2), (3), (4), (5)} hA,Bi 2 Ω(D ) = {NR(RR)∗ (x, y) ∧ A(x) → QB (y); NR(RR)∗ (x, y) ∧ B(x) → QA (y); N (RR)∗ (x, y) ∧ A(x) → QA (y); N(RR)∗ (x, y) ∧ B(x) → QB (y)} 2 When applied to D∨ ∪ Ω(D2 ), Procedure 1 computes the following additional Datalog rules: NR(RR)∗ (x, y) ∧ N(RR)∗ (x, y) → QA (y) (6) NR(RR)∗ (x, y) ∧ N(RR)∗ (x, y) → QB (y) (7) Procedure 3 finally returns a Datalog program P consisting of rules Ξ(D2 ), Ω(D2 ), as well as rules (6)-(7) with predicates QA and QB replaced with A and B, respectively. Program P is a rewriting of D2 . 5 Discussion and Future Work In this paper, we have presented two sufficient conditions for FO and Datalog rewritability of instance queries w.r.t. ELU-ontologies. Our results are still rather preliminary, and we are currently working on extending them in several ways. First, we are confident that our sufficient conditions can be extended to cover also non-linear ELU-programs, as well as more expressive DLs such as ALCHI. Devising a resolution-based rewriting algorithm in such extended setting be- comes, however, a much more challenging task. Example 5.1 Consider the nonlinear ELU-program D3 with the following rules: >(x) → A(x) ∨ B(x) R(x, y) ∧ A(y) → D(x) B(x) → A(x) D(x) → H1 (x) ∨ H2 (x) D(x) ∧ E(x) → H(x) D(x) → E(x) This program is FO-rewritable: the non-linear rule D(x) ∧ E(x) → H(x) can be replaced with D(x) → H(x) to obtain an equivalent linear program that satisfies our sufficient condition in Section 4.1. Procedure 1, however, does not terminate when given D3 as input; in particular, Procedure 1 will compute the following infinite family of disjunctive rules for each n > 1: " n # " n # ^ _ R(yk , xk ) ∧ R(yk−1 , xk ) ∧ D(yn ) → H(yk ) ∨ H1 (y0 ) ∨ H2 (y0 ) k=1 k=1 Second, it would be interesting to devise a resolution-based decision procedure for FO and Datalog rewritability for ALC ontologies; this is possible, since de- cidability has been recently shown. Finally, we are planning to implement our resolution algorithms and test them in practice. References 1. Acciarri, A., Calvanese, D., Giacomo, G.D., Lembo, D., Lenzerini, M., Palmieri, M., Rosati, R.: QuOnto: Querying ontologies. In: AAAI. pp. 1670–1671 (2005) 2. Artale, A., Calvanese, D., Kontchakov, R., Zakharyaschev, M.: The DL-Lite family and relations. J. Artif. Intell. Res. 36, 1–69 (2009) 3. Bienvenu, M., Lutz, C., Wolter, F.: Deciding FO-rewritability in EL. In: DL. CEUR Workshop Proceedings, vol. 846 (2012) 4. Bienvenu, M., ten Cate, B., Lutz, C., Wolter, F.: Ontology-based data access: A study through disjunctive Datalog, CSP, and MMSNP. In: ACM PODS (2013), arXiv:1301.6479 5. Calvanese, D., De Giacomo, G., Lembo, D., Lenzerini, M., Rosati, R.: Tractable reasoning and efficient query answering in description logics: The DL-Lite family. J. Autom. Reasoning 39(3), 385–429 (2007) 6. Chortaras, A., Trivela, D., Stamou, G.: Optimized query rewriting in OWL 2 QL. In: CADE. pp. 192–206 (2011) 7. Cosmadakis, S.S., Gaifman, H., Kanellakis, P.C., Vardi, M.Y.: Decidable optimiza- tion problems for database logic programs: Preliminary report. In: Simon, J. (ed.) STOC ’88. pp. 477–490. ACM (1988) 8. Cuenca Grau, B., Motik, B., Stoilos, G., Horrocks, I.: Computing datalog rewritings beyond Horn ontologies. In: IJCAI (2013), arXiv:1304.1402 9. Del Val, A.: First order LUB approximations: Characterization and algorithms. Artif. Intell. 162(1-2), 7–48 (2005) 10. Eiter, T., Ortiz, M., Šimkus, M., Tran, T.K., Xiao, G.: Query rewriting for Horn- SHIQ plus rules. In: AAAI. pp. 726–733 (2012) 11. Hustadt, U., Motik, B., Sattler, U.: Data complexity of reasoning in very expressive description logics. In: IJCAI. pp. 466–471 (2005) 12. Hustadt, U., Motik, B., Sattler, U.: Reasoning in Description Logics by a Reduction to Disjunctive Datalog. J. Autom. Reasoning 39(3), 351–384 (2007) 13. Kikot, S., Zolin, E.: Modal definability of first-order formulas with free variables and query answering. J. Appl. Log. 11(2), 190–216 (2013) 14. Naughton, J.F.: Data independent recursion in deductive databases. In: Silber- schatz, A. (ed.) PODS ’86. pp. 267–279. ACM (1986) 15. Pérez-Urbina, H., Motik, B., Horrocks, I.: Tractable query answering and rewriting under description logic constraints. J. Appl. Log. 8(2), 186–209 (2010) 16. Rodriguez-Muro, M., Calvanese, D.: High performance query answering over DL- Lite ontologies. In: KR. pp. 308–318 (2012) 17. Rosati, R., Almatelli, A.: Improving query answering over DL-Lite ontologies. In: KR. pp. 290–300 (2010) 18. Rossman, B.: Homomorphism preservation theorems. J. ACM 55(3) (2008) 19. Selman, B., Kautz, H.: Knowledge compilation and theory approximation. J. ACM 43(2), 193–224 (1996) 20. Stocker, M., Smith, M.: Owlgres: A scalable OWL reasoner. In: OWLED. CEUR Workshop Proceedings, vol. 432 (2008) A Proofs for Section 4.1 (FO Rewritability) Given a disjunctive program D, we write Σ(D) for the set of all predicate symbols in D. We call Σ(D) the signature of D. If D is an ELU-program, we partition Σ(D) into Σc (D), the set of all unary predicates (concept names) in D, and Σr (D), the set of all binary predicates (role names) in D. We write CloR (D) for the set of all rules derivable from D by binary reso- lution, CloF (D) for the set of all rules derivable from D by positive factoring, and CloRF (D) for the set of all rules derivable from D by binary resolution and positive factoring. Let x be a sequence of variables and let ϕ be a formula. We write x|yz and ϕ|yz for the result of substituting y for z in x and ϕ, respectively. We write FV(ϕ) for the set of variables that occur free in ϕ. Since we only consider formulas that are conjunctions or disjunctions of atoms, we can also view them as sets of atoms, writing A ∈ ϕ for “A occurs in ϕ”. The notation A ∈ r for a rule r is defined analogously. Proposition A.1. Let W D be a linear ELU-program. Every rule in CloR (D) has n the form A(x0 ) ∧ ϕ → i=1 Bi (xi ) where ϕ is a conjunction of binary atoms such that {x0 , . . . , xn } ⊆ FV(ϕ). Wn Given a rule r of the form A(x0 ) ∧ ϕ → i=1 Bi (xi ), where ϕ is a conjunction of binary atoms such that {x0 , . . . , xn } ⊆ FV(ϕ), we define: Gr := (FV(ϕ), { (x, y) | ∃R : R(y, x) ∈ ϕ }) We call r tree-shaped if the following conditions are satisfied: 1. Gr is a tree with x0 as its root and x1 , . . . , xn as its leaves. 2. If {R1 (x, y), R2 (x, y)} ⊆ ϕ, then R1 = R2 . The depth of r is defined as the depth of Gr . Lemma A.2. Let D be a linear ELU-program. Every rule in CloR (D) is tree- shaped. Proof. By induction on the derivation of r ∈ CloR (D). Clearly, all rules Wmin D are tree-shaped. So, w.l.o.g.,Wlet r be a resolvent or r1 = A(x) ∧ ϕ1 → i=1 Bi (xi ) n and r2 = B1 (y) ∧ ϕ2 → i=1 Ci (yi ) on B1 (x1 ) and B1 (y), respectively. By the inductive hypothesis, r1 and r2 are tree-shaped. Since FV(ϕ1 ) and FV(ϕ2 ) are assumed to be disjoint and the tree corresponding to r2 is rooted at y, the graph (FV(ϕ1 ∧ (ϕ2 |xy 1 )), { (x, y) | ∃R : R(y, x) ∈ ϕ1 ∧ (ϕ2 |xy 1 ) }) is a tree rooted at x that has x2 , . . . , xm , y1 , . . . , yn , as its leaves. Moreover, since y has no incoming edge in Gr2 , we have R(x1 , z) ∈ ϕ1 ∧ (ϕ2 |xy 1 ) if and only if R(x1 , z) ∈ ϕ1 (for every variable z). The claim follows. t u Lemma A.3. Let D be a linear, acyclic ELU-program and let n be the maximal transfer depth of a unary predicate in D. Then each rule in CloR (D) has depth at most n. Wn Proof. Let r = A(x) ∧ ϕ → i=1 Bi (xi ) ∈ CloR (D) By Lemma A.2, it suffices to show that, for every i ∈ [1, n], the length of the (unique) path from x to xi in Gr is equal to the difference between the transfer depth of A and the transfer depth of Bi . This follows by a straightforward induction on the derivation of r from D. t u Definition A.4 (n-Simulation). Let r = A(x0 )∧ϕ → ψ be a tree-shaped rule. We define an inductive family of preorders (nr )n∈N between variables in r as follows: x 0r y :⇔ { B | B(x) ∈ ψ } ⊆ { B | B(y) ∈ ψ } x n+1 r y :⇔ x 0r y and for every R and x0 such that R(x0 , x) ∈ ϕ there is some y 0 such that R(y 0 , y) ∈ ϕ and x0 nr y 0 We say x is n-simulated by y in r if x nr y. We define [x]nr := { y | x nr y and y nr x }. Lemma A.5. Let r be a tree-shaped rule of depth m and let x, y, z be variables such that, for some R, R(x, z), R(y, z) ∈ r and x nr y for some n ≥ m. There is a substitution σ such that xσ = y, rσ is tree-shaped, and rσ ⊆ r. Proof. The claim follows if we can show that for every tree-shaped rule r of depth m and every pair x, y such that x m r y, there is a substitution σ such that xσ = y and for every pair u, v of variables in the subtree of Gr rooted at x we have: 1. uσ, vσ are variables in the subtree of Gr rooted at y. 2. For every variable u not in the subtree of Gr rooted at x: uσ = u. 3. For every unary predicate B: if B(u) ∈ r, then B(u)σ ∈ r. 4. For every binary predicate R0 : if R0 (u, v) ∈ r, then R0 (u, v)σ ∈ r. We prove the claim by induction on m. If m = 0, x and y have no successors in Gr . Therefore, it suffices to show (1-3). Let σ map x to y and every other variable in r to itself. Thus, (1) and (2) are trivial. For (3), observe that, since x 0r y, we have B(x)σ = B(y) ∈ r whenever B(x) ∈ r. Now suppose m > 0. Let R1 , . . . , Rn , x1 , . . . , xn be all the predicate symbols and variables such that Ri (xi , x) ∈ r (i ∈ [1, n]). Since x m r y, there are y1 , . . . , yn such that Ri (yi , y) ∈ r and x rm−1 y. By the inductive hypothesis, there are substitutions σ1 , . . . , σn such that, for every i ∈ [1, n]: xi σi = yi and σi satisfies (1-4) for the subtree of Gr rooted at xi . Let σ be defined such that: – xσ = y. – For every i ∈ [1, n] and every variable u in the subtree of xi : uσ = uσi . – For every other variable u: uσ = u. Note that σ is well-defined since the subtrees rooted at x1 , . . . , xn are pairwise disjoint and do not contain x (which, in turn, holds since r is tree-shaped). Clearly, σ satisfies (1) and (2). For the subtrees of Gr rooted at x1 , . . . , xn , (3) and (4) hold since σ extends σ1 , . . . , σn . For x, (3) holds since x 0r y, and (4) follows since we have Ri (x, xi )σ = Ri (y, yi ) for every i ∈ [1, n]. t u Proposition A.6. Let D be a monadic disjunctive program and let f : N → N be a function defined as follows: f (0) := 2|Σc (D)| f (n + 1) := 2|Σc (D)| · |Σr (D)| · 2f (n) Then [x]nr ≤ f (n) for every rule r over Σ(D) and every variable x in r. A rule r is condensed if r has no condensation that is smaller than r. By Proposition A.6 and Lemma A.5 we obtain: Lemma A.7. Let D be an ELU-program. The size of condensed tree-shaped rules of depth n over Σ(D) is bounded in n and |Σ(D)|. Proof. Let D be an ELU-program and let f be defined as in Proposition A.6. It suffices to show that the number of variables in every condensed tree-shaped rule of depth n is bounded by n · (|Σr (D)| · f (n))n . Note that every tree T of depth n contains at most n · mn nodes, where m is the maximal outdegree of a node in T . Therefore, every tree pof depth n that has k nodes must contain a node that has outdegree at least n k/n. Suppose, for contradiction, r is a condensed tree-shaped rule of depth n that mentions more than n · (|Σr (D)| · f (n))n variables. By the above observation, Gr must have a node z that has outdegree greater |Σr (D)| · f (n). Consequently, by Proposition A.6, there is some predicate R and two distinct variables x, y such that R(x, z), R(y, z) ∈ r and [x]nr = [y]nr . By Lemma A.5, there is a substitution σ such that xσ = y and rσ ⊆ r. Since x and y are distinct, it follows that rσ ( r, contradicting the assumption that r is condensed. t u Lemma A.8. Every condensation of a tree-shaped rule is tree-shaped. Proof. Let r be a tree-shaped rule and rσ a condensation of r. Since rσ is a subclause of r, Grσ must be a forest, the root of Gr must be a source in Grσ , and all leaves of Gr must be sinks in Grσ . Since rσ is a substitution instance of r and Gr is connected, so must be Grσ . Hence, Grσ is a tree. The claim follows. t u Lemma A.9. Unrestricted binary resolution with condensation terminates on linear, acyclic ELU-programs. Moreover, the size of every rule derivable by binary resolution with condensation from an acyclic ELU-program D is bounded in the size of D. Proof. Let D be a linear, acyclic ELU-program. By Lemma A.2, Lemma A.5, and Lemma A.8, every rule derivable by resolution with condensation is tree-shaped. By Lemma A.3 and Lemma A.7, the size and hence the number of condensed tree-shaped rules is bounded in |Σ(D)|. The claims follow. t u Lemma A.10 ([8] Theorem 12, Lemma 19). Let D be a disjunctive pro- gram, let A be an ABox, and let C be a Horn clause such that D ∪ A |= C. i Let DH be the Datalog program computed by Procedure 1 after i iterations of the n main loop. Then there is some n such that DH ∪ A |= C. Theorem A.11. Let D be a linear, acyclic ELU-program. Procedure 1 termi- nates on D and returns a Datalog rewriting of D. Proof. To show termination of Procedure 1, it suffices to bound the size of every rule derived from D by resolution with condensation and factoring. By Lemma A.9, the size of every rule derived from D by resolution with conden- sation but no factoring is bounded in |Σ(D)|. Thus, it suffices to show that for every rule r obtained from D by resolution with condensation and factoring there is a (not necessarily strictly) larger rule r0 obtained from D by resolution with condensation but without factoring such that r is subsumed by r0 (where clauses are viewed as sets of atoms). This follows by a straightforward induction on the derivation of r. By Lemma A.10, the output of Procedure 1 is a Datalog rewriting of D. t u Wn Let D be a linear, acyclic ELU-program and let r = A(x) ∧ ϕ → i=1 Bi (xi ) ∈ CloR (D). The proof of Lemma A.3 exhibits that the length of the path from x to xi in Gr is equal to the difference between the transfer depth of A and the transfer depth of Bi . In particular, if Bi = Bj for some 1 ≤ i < j ≤ n, then xi and j must have the same distance from x in Gr . Thus, while factoring can destroy the tree structure of Gr , it does not change the distance between x and xi in Gr . The same is true for condensation. Thus, we have: Proposition A.12. Let D be a linear, acyclic ELU-program, let PWbe a Datalog n rewriting of D computed by Procedure 1, and let r = A(x) ∧ ϕ → i=1 Bi (xi ) ∈ P. Then for every i ∈ [1, n], the distance between x and xi in Gr is equal to the difference between the transfer depth of A and the transfer depth of Bi . Thus, given an ELU-program D, its Datalog rewriting P, and a rule r ∈ P, the depth of Gr is still bounded by the maximal transfer depth of a predicate in D and cannot be further increased by resolution. Theorem 4.4. Given an acyclic ELU-program D and a unary predicate A in D, Procedure 2 terminates and returns a UCQ rewriting of A(x) w.r.t. D. Proof. By Theorem A.11, the call to Procedure 1 (Compile-Horn) terminates and returns a Datalog rewriting of D. It is easily seen that the main loop of the procedure computes the A-expansions of D as defined in [7] (similarly to the algorithm in [14]). Therefore, the procedure terminates and returns a UCQ- rewriting of A(x) w.r.t. D provided the resolution strategy implemented in the main loop terminates. Suppose, for contradiction, this is not the case. Then for every n ∈ N, P 00 must eventually contain a rule r such that Gr contains more than n nodes and r is not subsumed by any clause previously added to P 00 . By Proposition A.12, the depth of every rule in CloR (P ∪ {A(x) → Q(x)}) is bounded in D. So, let d be the corresponding p bound. It follows that Gr must contain a variable x with outdegree at least d n/d. For large enough n, this means there is a rule r ∈ P and rules r1 = B(y) ∧ ϕ1 → Q(x), r2 = B(y) ∧ ϕ2 → Q(x) ∈ P 0 , r10 , r20 such that r10 is a resolvent of r with r1 , r2 is derived from r10 (and hence ϕ1 ⊆ ϕ2 ), and r20 is a resolvent of r with r2 . It is easily seen that r10 subsumes r20 . Moreover, since r20 is derived from r10 , r10 is added to P 00 before r20 , in contradiction to our assumption. t u B Proofs for Section 4.2 (Datalog Rewritability) Given a separable ELU-program D, we write D∨ for the fragment of D as used in Definition 4.6 and D∨¯ for D \ D∨ . Lemma 4.12. Let D be a linear ELU-program that is separable. Then, Proce- dure 1 terminates on D∨ ∪ Ω(D). Proof. The claim holds since D∨ ∪ Ω(D) is acyclic, which is the case since D∨ is acyclic and every rule added by Ω(D) leads to a sink. t u For the proof of Lemma 4.13, we need to generalise the notion of annotations from how they are defined in §4.2. Let x, y, z be finite sequences of variables. Annotations are now triples of the form (x, S, y), where S is a set of binary atoms whose free variables are contained in x, y, z. Let α = (x, S, y) be an annotation and let A = A1 , . . . , A|x| , and B = B1 , . . . , B|y| be finite sequences of unary predicates. We define the following notation for disjunctive rules:   |x| |y| ! ^ ^ _ AαB := ∀xyz.  Ai (xi ) ∧  C → Bj (yj ) i=1 C∈S j=1 For linear rules, the notation reduces to AαB, while for Datalog rules, we have AαB, where A, B are single predicate symbols. If AαB is an ELU-rule, α must . . x}) for some n ≥ 1. be of the form (x, {R(y, x)}, y) or (x, ∅, |x .{z n Let α = (x, S, y), β = (x0 , S 0 , y 0 ) be annotations such that FV(α) ∩ FV(β) = ∅, and let i ∈ [1, |y|], j ∈ [1, |x0 |]. We define: α ◦ij β := (xx01 . . . x0j−1 x0j+1 . . . x0|x0 | , S ∪ (S 0 |yxi0 ), y1 . . . yi−1 yi+1 . . . y|y| (y 0 |yxi0 )) j j We call α◦ij β the composition of α at position i to β at position j. We abbreviate α ◦1j β as α ◦j β, α ◦i1 β as α ◦i β, and α ◦11 β as α ◦ β. Proposition B.1. Let α = (xα , Sα , y α ), β = (xβ , Sβ , y β ), γ = (xγ , Sγ , y γ ) be annotations and let i, i0 ∈ [1, |y α |], j ∈ [1, |xβ |], k ∈ [1, |y β |], l ∈ [1, |xγ |]. Then: k+|y |−1 1. α ◦ij (β ◦kl γ) = (α ◦ij β) ◦l α γ. 0 0 0 i i −1 2. If i < i , then (α ◦j β) ◦l γ ≡a (α ◦il γ) ◦ij β. Let r = AαB, r0 = A0 α0 B 0 be two rules such that Bi = A0j and FV(α) ∩ FV(β) = ∅. We write r +ij r0 for the resolvent of r and r0 on Bi (yi ) and A0j (x0j ), respectively. We abbreviate r +1j r0 as r +j r0 , r +i1 r0 as r +i r0 , and r +11 r0 as r + r0 . Resolution naturally corresponds to composition of annotations: Proposition B.2. Let r = AαB, r0 = A0 α0 B 0 be linear rules such that Bi = A0 and FV(α) ∩ FV(α0 ) = ∅. Then: 1. r +i r0 = A(α ◦i α0 )B1 . . . Bi−1 Bi+1 . . . B|B| B 0 . 2. If r is Datalog, then r + r0 = A(α ◦ α0 )B 0 . We consider two rules r, r0 to be the same (and write r = r0 ) if they are identical up to renaming of bound variables and reordering of literals. The cor- responding equivalence on annotations can be defined as the least equivalence relation ≡a such that: 1. (x1 . . . xm , S, y1 . . . yn ) ≡a (x1 . . . xi−1 xj xi+1 . . . xj−1 xi xj+1 . . . xm , S, y1 . . . yk−1 yl yk+1 . . . yl−1 yk yl+1 . . . yn ) for all 1 ≤ i ≤ j ≤ m and 1 ≤ k ≤ l ≤ n, 2. (x, S, y) ≡a (x|uv , S|uv , y|uv ) for every u ∈ / x ∪ y ∪ FV(S). Proposition B.3. Two rules of the form AαB, AβB are identical up to re- naming of bound variables and reordering of literals if and only if α ≡a β. We assume ◦ij and +ij to be left-associative, writing r1 +ij11 r2 +ij22 r3 for (r1 +ij11 r2 ) +ij22 r3 and α1 ◦ij11 α2 ◦ij22 α3 for (α1 ◦ij11 α2 ) ◦ij22 α3 . As corollaries of Proposition B.1, we obtain: Corollary B.4. Let r = AαB, r0 , r00 be linear rules and i, j indices such that r +i (r0 +j r00 ) is a resolvent of r, r0 and r00 . Then: 1. r +i (r0 +j r00 ) = r +i r0 +j+|B|−1 r00 . 2. If r is Datalog, then r + (r0 +j r00 ) = r + r0 +j r00 . Corollary B.5. Let r = AαB, r0 , r00 be linear rules and i, j indices such that i < j and r +j r00 +i r0 is a resolvent of r, r0 , and r00 . Then r +i r0 +j−1 r00 = r +j r00 +i r0 . Lemma B.6. Let D be a separable ELU-program, let P be a Datalog program such that Σ(P) ∩ Σc (D) = ∅ and no rule in P resolves with a rule in D∨ , and let r ∈ CloR (D ∪ P) be linear. Then either r ∈ CloR (D∨¯ ∪ P) or, for some n ≥ 0, we have r = r∨ + r1 + . . . + rn where: – r∨ = AαB ∈ CloR (D∨ ) where |B| ≥ n. – {r1 , . . . , rn } ⊆ CloR (D∨¯ ∪ P). Proof. Let r ∈ CloR (D ∪ P) \ CloR (D∨¯ ∪ P) we show that r can be decomposed as claimed by induction on the derivation of r ∈ CloR (D ∪ P). If r ∈ D, the claim is true for n = 0. Otherwise, let r = r0 +ij r00 . Since r ∈/ CloR (D∨¯ ∪ P), we have {r0 , r00 } 6⊆ CloR (D∨¯ ∪ P). Therefore, since no rule in P resolves with D∨ and no unary predicate in D∨¯ occurs in the body of a rule in D∨ , we must have r0 ∈ CloR (D ∪ P) \ CloR (D∨¯ ∪ P). Hence, by the inductive hypothesis, we have r0 = r∨0 + r10 + . . . + rm0 0 where r∨ = A0 α0 B 0 ∈ CloR (D∨ ) and {r10 , . . . , rm 0 } ⊆ 00 CloR (D∨¯ ∪ P). Since r is linear and Σ(P) ∩ Σ(D∨ ) = ∅, r must be linear, and hence j = 1. We distinguish two cases: 1. r00 ∈ CloR (D∨¯ ∪P). Then either i ∈ [m+1, n] or i ∈ [1, m]. In the former case, the claim is immediate. In the latter case, the claim follows by Corollary B.4. 2. r00 ∈ CloR (D ∪ P) \ CloR (D∨¯ ∪ P). Then r00 = r∨ 00 + r100 + . . . + rk00 where 00 00 00 00 00 00 r∨ = A α B ∈ CloR (D∨ ) and {r1 , . . . , rk } ⊆ CloR (D∨¯ ∪ P). Moreover, n = m + k an |B| = |B 0 | + |B 00 |. Since no unary predicate in D∨¯ occurs in the body of a rule in D∨ , we have i ∈ [m + 1, n]. W.l.o.g., let i = 1 (we can always achieve this by reordering the literals in r00 ). Then 0 r = r∨ + r10 + . . . + rm 0 00 + (r∨ + r100 + . . . + rk00 ) 0 0 0 0 = r∨ + r10 + . . . + rm 0 00 + r∨ +|B | r100 +|B | . . . +|B | rk00 0 0 0 0 00 = (r∨ +m+1 r∨ ) + r10 + . . . + rm 0 +|B | r100 +|B | . . . +|B | rk00 by Corollary B.4 and Corollary B.5. Therefore, we have r = r∨ + r10 + . . . + 0 rm + r100 + . . . + rk00 where r∨ is obtained from r∨ 0 00 +m+1 r∨ by moving the last 00 |B | literals right after the first m literals. t u As shown in §4.2, every linear EL-program P can be seen as a labeled transition system (i.e., automaton without initial and final states). Every rule AαB ∈ P is seen as a transition from A to B labeled with α. This view yields a natural notion of a path between two unary predicates in P as a sequence of “connected” rules. Proposition B.7. Let P be a linear EL-program and let r be a rule. Then r ∈ CloR (P) iff there is a path r1 , . . . , rn (n ≥ 1) in P such that r = r1 + . . .+ rn . Corollary B.8. Let P be a linear EL-program and let r = AαB be a rule where A 6= B. Then r ∈ CloR (P) if and only if there is a word α1 . . . αn ∈ L(AutP (A, B)) such that α = α1 ◦ . . . ◦ αn . Given an annotation V α = (u, S, v) and variables x, y ∈ / FV(S), we write α(x, y) for the formula C∈S (C|xyuv ). If S is empty, the notation is only defined if x = y and stands for >(x). With this, we can generalise the definition of the Datalog program Pe for a regular expression e over annotations (see §4.2) as follows: P∅ = ∅ Pα = {α(x, y) → Nα (x, y)} Pe1 e2 = Pe1 ∪ Pe2 ∪ {Ne1 (x, y) ∧ Ne2 (y, z) → Ne1 e2 (x, z)} Pe1 +e2 = Pe1 ∪ Pe2 ∪ {Ne1 (x, y) → Ne1 +e2 (x, y), Ne2 (x, y) → Ne1 +e2 (x, y)} Pe∗ = Pe ∪ {>(x) → Ne∗ (x, x), Ne (x, y) ∧ Ne∗ (y, z) → Ne∗ (x, z)} Proposition B.9. Let P be a linear EL-program, let e be a regular expression over annotations in P, and let L(e) be the language represented by e. Then for every sequence α1 , . . . , αn (n ≥ 1) of annotations in P: α1 . . . αn ∈ L(e) iff (α1 ◦ . . . ◦ αn )(x, y) → Ne (x, y) ∈ CloR (Pe ) Proposition B.10. Let P be a linear EL-program and let e1 , . . . , en be regular expressions over rule annotations in P. Then Pe1 ∪ · · · ∪ Pen is a conservative extension of Pe1 in the sense that for every conjunction of atoms ϕ: ϕ → Ne1 (x, y) ∈ CloR (Pe1 ) iff ϕ → Ne1 (x, y) ∈ CloR (Pe1 ∪ · · · ∪ Pen ) Given a disjunctive program D, we write CloSR (D), CloSF (D), and CloSRF (D) for the set of all rules subsumed in CloR (D), CloF (D), and CloRF (D), respectively. Lemma B.11. Let D be an ELU-program such that CloR (D) = D and let R be a Datalog program such that Σc (D) ∩ Σ(R) = ∅. Then: ψ ∧ ϕ0 → χ is subsumed in D,   S S CloR (D ∪ R) = CloR (R) ∪ ψ ∧ ϕ → χ ϕ → ϕ0 ∈ CloSR (R) where ψ denotes conjunctions of unary atoms, χ denotes disjunctions of unary atoms, and ϕ, ϕ0 denote conjunctions of binary atoms. Proof. The inclusion from left to right is immediate. For the other inclusion, suppose r = ψ ∧ ϕ → χ ∈ CloR (D ∪ R) \ CloR (R). We show the existence of a rule ψ ∧ ϕ0 → χ ∈ D such that ϕ → ϕ0 ∈ CloSR (R) by induction on the derivation of r ∈ CloR (D ∪ R). If r ∈ D, the claim is immediate, so suppose r is obtained by resolution from r1 , r2 ∈ CloR (D ∪ R). Since r ∈ / CloR (R), {r1 , r2 } 6⊆ CloR (R). We prove the claim for {r1 , r2 } ∩ CloR (R) = ∅ (the other case is similar but simpler). Let r1 = ψ1 ∧ ϕ1 → χ1 and r2 = ψ2 ∧ ϕ2 → χ2 . Since r is a resolvent of r1 and r2 , we have ϕ = ϕ1 ∧ ϕ2 . By the inductive hypothesis, there are rules r10 = ψ1 ∧ ϕ01 → χ1 and r20 = ψ2 ∧ ϕ02 → χ2 such that ϕ1 → ϕ01 ∈ CloSR (R) and ϕ2 → ϕ02 ∈ CloSR (R). Then r0 = ψ ∧ ϕ01 ∧ ϕ02 → χ is a resolvent of r10 and r20 and hence r0 is subsumed by a clause in D. Since ϕ1 → ϕ01 ∈ CloSR (R) and ϕ2 → ϕ02 ∈ CloSR (R), we have ϕ1 ∧ ϕ2 → ϕ01 ∧ ϕ02 ∈ CloSR (R). The claim follows. t u Similarly, we have: Lemma B.12. Let D be an ELU-program such that CloRF (D) = D and let R be a Datalog program such that Σc (D) ∩ Σ(R) = ∅. Then: ψ ∧ ϕ0 → χ is subsumed in D,   CloSRF (D ∪ R) = CloSR (R) ∪ ψ ∧ ϕ → χ ϕ → ϕ0 ∈ CloSR (R) Proof. Proceeds analogously to the proof of Lemma B.11 with the additional observation that CloR (R) = CloRF (R) since R is a Datalog program. t u Given a disjunctive program D, we write DH for the set of all Datalog (or Horn) rules in D. Lemma B.13. Let D be an ELU-program, let R be a Datalog program such that Σc (D) ∩ Σ(R) = ∅, and let r be a Datalog rule. Then: r ∈ CloSRF (D ∪ R) iff r ∈ CloSR (CloRF (D)H ∪ R) Proof. The direction from left to right holds since CloR (CloRF (D)H ∪ R) ⊆ CloRF (D ∪ R). For the other direction, suppose r ∈ CloRF (D ∪ R). Clearly, CloRF (D∪R) ⊆ CloRF (CloRF (D)∪R). Therefore, by Lemma B.12, r ∈ CloSR (R)∪ { ψ ∧ ϕ → χ | ψ ∧ ϕ0 → χ ∈ CloSRF (D), ϕ → ϕ0 ∈ CloSR (R) }. On the other hand, since CloRF (CloRF (D)H ) = CloRF (D)H , we have CloSR (CloRF (D)H ∪ R) = CloSR (R) ∪ { ψ ∧ ϕ → χ | ψ ∧ ϕ0 → χ ∈ CloSRF (D)H , ϕ → ϕ0 ∈ CloSR (R) }. The claim follows since r is Datalog. t u Lemma B.14. Let P be a linear EL-program, let r = AαB be a rule where A 6= B, and let A ∈ S ⊆ Σc (P). Then: r ∈ CloSR (P) iff r ∈ CloSR ({A(x) ∧ NehA,Bi (x, y) → B(y)} ∪ PehA,Bi ) Proof. Let r0 = A(x) ∧ NehA,Bi (x, y) → B(y). For the direction from left to right, suppose r ∈ CloR (P). By Corollary B.8 and Proposition B.9, r00 = α(x, y) → NehA,Bi (x, y) ∈ CloR (PehA,Bi ). The claim follows since r is a resolvent of r0 and r00 . For the direction from right to left, suppose r ∈ CloR ({r0 } ∪ PehA,Bi ). Since CloRF ({r0 }) = {r0 } and Σ(PehA,Bi ) ∩ {A, B} = ∅, we have r ∈ { A(x) ∧ ϕ → B(y) | ϕ → NehA,Bi (x, y) ∈ CloSR (PehA,Bi ) } (by Lemma B.12), and thus r00 ∈ CloSR (PehA,Bi ). The claim follows by Proposition B.9 and Corollary B.8 (it is easily seen that the equivalence AαB ∈ CloR (P) ⇔ r00 ∈ CloR (PehA,Bi ) implies AαB ∈ CloSR (P) ⇔ r00 ∈ CloSR (PehA,Bi )). t u Proposition B.15. Let r1 , r2 be tree-shaped rules, r10 a positive factor of r1 , and r30 a resolvent of r10 and r2 . Then there is a rule r3 such that: 1. r3 can be obtained by resolution from r1 and r2 in at most two steps. 2. r30 is equivalent, up to mutual subsumption, to a positive factor of r3 . Lemma B.16. For every linear ELU-program D: CloSRF (D) = CloSF (CloR (D)). Proof. Clearly, CloF (CloR (D)) ⊆ CloRF (D). The other inclusion follows with Proposition B.15 by induction on the derivation of r ∈ CloRF (D). t u Lemma B.17. Let D be a linear ELU-program and let R be a Datalog program such that Σc (D) ∩ Σ(R) = ∅. Then CloSRF (D ∪ R) = CloSF (CloR (D ∪ R)). Proof. Clearly, CloF (CloR (D ∪ R)) ⊆ CloRF (D ∪ R). So, let r ∈ CloRF (D ∪ R) = CloRF (CloRF (D) ∪ R) ⊆ CloSR (R) ∪ { ψ ∧ ϕ → χ | ψ ∧ ϕ0 → χ ∈ CloSRF (D), ϕ → ϕ0 ∈ CloSR (R) } (by Lemma B.12). By Lemma B.16, r ∈ CloSR (R) ∪ { ψ ∧ ϕ → χ | ψ ∧ ϕ0 → χ ∈ CloSF (CloR (D)), ϕ → ϕ0 ∈ CloSR (R) } = CloSF (CloR (R) ∪ { ψ ∧ ϕ → χ | ψ ∧ ϕ0 → χ ∈ CloR (D), ϕ → ϕ0 ∈ CloR (R) }). The claim, r ∈ CloSF (CloR (D ∪ R)), follows by Lemma B.11. t u Lemma B.18. Let D be a separable ELU-program. Let, for S every A ∈ Σc (D), QA be the predicate introduced for A in Ω(D). Let D0 = D ∪ A0 ∈Σc (D) {A0 (x) → QA0 (x)} and let r = AαQB be a rule where A, B ∈ Σc (D). Then: r ∈ CloSRF (D0 ) iff r ∈ CloSR (CloRF (D∨ ∪ Ω(D))H ∪ Ξ(D)) Proof. Suppose r ∈ CloRF (D0 ). By Lemma B.16, there is some r0 ∈ CloR (D0 ) such that r ∈ CloSF ({r0 }). By Lemma B.6, either r0 ∈ CloR (D∨ 0 0 ¯ ) or r = r∨ + 0 0 r1 + . . . + rn where r∨ ∈ CloR (D∨ ) and r = Aa αi QB ∈ CloR (D∨¯ ) (i ∈ [1, n]). We show the claim for the second case as the first case is similar but simpler. By Lemma B.14, we have ri ∈ CloSR ({Ai (x)∧NehAi ,QB i (x, y) → B(y)}∪PehAi ,QB i ) ⊆ CloSR (D∨ ∪Ω(D)∪Ξ(D)) (i ∈ [1, n]). Therefore, r0 ∈ CloSR (D∨ ∪Ω(D)∪Ξ(D)) and, consequently, r ∈ CloSRF (D∨ ∪ Ω(D) ∪ Ξ(D)). The claim follows by Lemma B.13. Now suppose r ∈ CloR (CloRF (D∨ ∪ Ω(D))H ∪ Ξ(D)). By Lemma B.13, we then have r ∈ CloSRF (D∨ ∪ Ω(D) ∪ Ξ(D)). By Lemma B.17, there is some r0 ∈ CloR (D∨ ∪ Ω(D) ∪ Ξ(D)) such that r ∈ CloSF ({r0 }). By Lemma B.6, either r0 ∈ CloR (Ω(D) ∪ Ξ(D)) or r0 = r∨ + r1 + . . . + rn such that r∨ ∈ CloR (D∨ ) and {r1 , . . . , rn } ⊆S CloR (Ω(D) ∪ Ξ(D)). Let P 0 = A0 ∈Σc (D) {A0 (x) → QA0 (x)}. It suffices to show that for every i ∈ [1, n]: ri ∈ CloSR (D∨¯ ∪ P 0 ) (since then r ∈ CloSF (CloSR (D∨ ∪ D∨¯ ∪ P 0 )) = CloSRF (D0 )). So, let i ∈ [1, n] and let ri = Ai αi QB . By definition, no two rules in Ω(D) resolve with each other. Moreover, the rules in Ω(D) have pairwise distinct unary predicates in their bodies. Therefore, with Lemma B.11, we obtain ri ∈ CloSR ({Ai (x) ∧ NehAi ,QB i (x, y) → QB (y)} ∪ Ξ(D)). The claim follows by Lemma B.14 and Proposition B.10. t u Proposition B.19. Let F be a set of FO-sentences and let r be a Datalog rule of the form ϕ → H. Then F |= r if and only if for every substitution σ from the variables in r to individuals: F ∪ {ϕσ} |= Hσ. Lemma B.20. Let D be an ELU-program, let P be an EL-program, and let Q be a unary predicate such that Σ(D) ⊆ Σ(P) and for every Datalog rule r = ϕ → Q(x) over Σ(D): r ∈ CloSRF (D) iff r ∈ CloSR (P) Then for every ABox A over Σ(D) and every individual a: D ∪ A |= Q(a) iff P ∪ A |= Q(a) Proof. Suppose D ∪ A |= Q(a). Let θ be aVsubstitution mapping each individual in A to a distinct variable and let r = ( A)θ → Q(aθ). By Proposition B.19, we then have D |= r. Then, by completeness of binary resolution with positive factoring, r ∈ CloSRF (D), and hence r ∈ CloSR (P) by the assumption. Since binary resolution with factoring is a sound inference calculus, we have P |= r, and hence D ∪ A |= Q(a) by Proposition B.19 (as the inverse of θ is a substitution from variables in r to individuals). The other direction follows analogously since binary resolution without fac- toring is a sound and complete inference calculus for Horn clauses. t u Lemma 4.13. Let D be a separable ELU-program. Let P be the union of Ξ(D) and the output of Procedure 1 on D∨ ∪ Ω(D). Then, for every query A(x) and every ABox A, we have cert(A(x), D, A) = cert(QA (x), P, A). Proof. Let D0 = D ∪ A∈Σc (D) {A(x) → QA (x)} where QA is the predicate S introduced in Ω(D) for A. Clearly, we have D ∪ A |= A(a) iff D0 ∪ A |= QA (a) for every query A(x), individual a, and ABox A over Σ(D). Hence, it suffices to show that for every A(x), a, A: D0 ∪ A |= QA (a) iff P ∪ A |= QA (a). By Lemma B.20, this is the case if CloSRF (D0 ) = CloSR (P) restricted to Datalog rules over Σ(D0 ) of the form ϕ → QA (x) (for every A(x)). The equation holds by Lemma B.18 as every Datalog rule over Σ(D0 ) in CloSRF (D0 ) ∪ CloSR (P) is linear, and CloRF (D∨ ∪ Ω(D))H is equal to the output of Procedure 1 on D∨ ∪ Ω(D) up to subsumption. t u Theorem 4.14. On input a separable ELU-program D, Procedure 3 returns a Datalog rewriting of D. Proof. Let P be set computed in Line (9) of Procedure 3 on input D and let P 0 be the output of the procedure. By Lemma 4.13, it suffices to show that cert(QA (x), P, A) = cert(A(x), P 0 , A) for every ABox A and every unary predi- cate A in D. So, let A and A be as required. Clearly, cert(QA (x), P, A) ⊆ cert(A(x), P 0 , A). For the other inclusion, ob- serve that, by Lemma 4.13, D ∪ A |= A(a) whenever P ∪ A |= QA (a). So, replacing QA by A in P does not allow to derive anything that is not entailed by the original program. t u