Virtual OBDA over Expressive Ontologies: Rewritings and Approximations? E. Botoeva1 , D. Calvanese1 , V. Santarelli2 , D. F. Savo2 , A. Solimando3 , and G. Xiao1 1 Free University of Bozen-Bolzano, Italy, lastname@inf.unibz.it 2 Sapienza Università di Roma, Italy, lastname@dis.uniroma1.it 3 DIBRIS, University of Genova, Italy, alessandro.solimando@unige.it Abstract. In this paper we extend virtual OBDA to more expressive ontology languages than that of DL-LiteR , which is the logic underpinning the W3C on- tology language OWL 2. We achieve this by relying on two well-known mecha- nisms, namely conservative rewriting and approximation, and compiling some of the domain semantics into OBDA mappings. 1 Introduction Ontology-Based Data Access (OBDA) is a popular paradigm that enables end users to access data sources through an ontology, abstracting away low-level details of the data sources themselves. The ontology provides a high-level description of the domain of interest, and is semantically linked to the data sources by means of a set of mapping assertions [7, 16]. Typically, the data sources are represented as relational data, the on- tology is a set of logical axioms over concepts and roles, and each mapping assertion relates an SQL query over the database to a concept or role of the ontology. Making OBDA work efficiently over large amounts of data, requires that query an- swering over the ontology is first-order (FO)-rewritable1 [8, 1], which in turn limits the expressiveness of the ontology language, and the degree of detail with which the domain of interest can be captured. The current language of choice for OBDA is DL-LiteR , the logic underlying OWL 2 QL [24], which has been specifically designed to ensure FO- rewritability of query answering. Hence, it does not allow one to express disjunctive information, or any form of recursion on the data (e.g., as resulting from qualified ex- istentials on the left-hand side of concept inclusions), since using such constructs in general causes the loss of FO-rewritability [9]. For this reason, in many situations the expressive power of DL-LiteR is too restricted to capture real-world scenarios. The aim of this work is to overcome these limitations of DL-LiteR by allowing the use of additional constructs in the ontology. To be able to exploit the added value coming from OBDA in real-world settings, an important requirement is the efficiency of query answering, achieved through a rewriting-based approach. This is only possi- ble for ontology languages that are FO-rewritable. Two general mechanisms that have ? This paper is an abridged version of [5]. 1 Recall that FO queries constitute the core of SQL. been proposed to cope with computational complexity coming from high expressive- ness of ontology languages, and that allow one to regain FO-rewritability, are conserva- tive rewriting [20] and approximation [26, 10]. In the former, an ontology in a powerful language is rewritten, when possible, into an equivalent one in a restricted language, while in the latter it is approximated, thus losing part of its semantics. In this work, we significantly extend the practical impact of both approaches by bringing into the picture the mapping, an essential component of OBDA that has been ignored so far. Indeed, it is a fairly expressive component of an OBDA system, since it allows one to make use of arbitrary SQL (hence FO) queries to relate the content of the data source to the elements of the ontology. Hence, a natural question is how one can use the mapping component to capture as much as possible additional domain seman- tics, resulting in better approximations or more cases where conservative rewritings are possible, while maintaining a DL-LiteR ontology. We illustrate how this can be done on an example. Consider a bank domain, where we can specify that a checking account in the name of a person is a simple account by means of the axiom CAcc u ∃inNameOf.Person v SAcc. Further, assume that the information about the accounts and their owners is stored in a database D, and that the predicates CAcc, inNameOf, and Person are connected to D respectively via the map- ping assertions sql 1 (x) CAcc(x), sql 2 (x, y) inNameOf(x, y) and sql 3 (x) Person(x), where each sql i is an SQL query over D. Then this non-DL-LiteR axiom can be encoded by adding the assertion sql 1 (x) ./ sql 2 (x, y) ./ sql 3 (y) SAcc(x) to the mapping. This assertion connects D directly to the ontology term SAcc by making use of a join of the SQL queries in the original mapping. We observe that the result- ing mapping, together with the ontology in which the non-DL-LiteR axiom has been removed, constitutes a conservative rewriting of the original OBDA specification. In this paper, we elaborate on this idea, by introducing a novel framework for rewrit- ing and approximation of OBDA specifications. Specifically, we provide a notion of rewriting based on query inseparability of OBDA specifications [3]. To deal with those cases where it is not possible to rewrite the OBDA specification into a query inseparable one whose ontology is in DL-LiteR , we give a notion of approximation that is sound for query answering. We develop techniques for rewriting and approximation of OBDA specifications based on compiling the extra expressiveness into the mappings. We tar- get rather expressive ontology languages, and for Horn-ALCHIQ, a Horn fragment of OWL 2, we study decidability of existence of OBDA rewritings, and techniques to compute them when they exist, and to approximate them, otherwise. We have implemented our techniques in a prototype system called O NTO P ROX, which exploits functionalities provided by the O NTOP [27] and C LIPPER systems [14] to rewrite or approximate an OBDA specification expressed in Horn-SHIQ to one that can be directly processed by any OBDA system. We have evaluated O NTO P ROX over synthetic and real OBDA instances against (i) the default O NTOP behavior, (ii) local se- mantic approximation (LSA), (iii) global semantic approximation (GSA), and (iv) C LIP - PER over materialized ABoxes. Using O NTO P ROX, for a few queries we have been able to obtain more answers (in fact, complete answers, as confirmed by C LIPPER). However, for many queries O NTO P ROX showed no difference with respect to the default O NTOP behavior. One reason for this is that in the considered real-world scenario, the mapping designers put significant effort to manually create complex mappings that overcome the limitations of DL-LiteR . Essentially they followed the principle of the technique pre- sented here, producing an OBDA specification that was already “complete” by design. The observations above immediately suggest a significant practical value of our ap- proach, which can be used to facilitate the design of new OBDA specifications for ex- isting expressive ontologies: instead of a manual compilation, which is cumbersome, error-prone, and difficult to maintain, mapping designers can write straightforward mappings, and the resulting OBDA specification can then be automatically transformed into a DL-LiteR OBDA specification with rich mappings. Omitted proofs can be found in the extended version of this paper [4]. 2 Preliminaries 2.1 Ontologies We assume to have the following pairwise disjoint countably infinite alphabets: NC of concept names, NR of role names, and NI of constants (also called individuals). We consider ontologies expressed in Description Logics (DLs). Here we present the logics Horn-ALCHIQ, the Horn fragment of SHIQ without role transitivity, and DL-LiteR , for which we develop some of the technical results in the paper. However, the general approximation framework is applicable to any OWL 2 fragment. A Horn-ALCHIQ TBox in normal form d [18, 14] is a finite set of axioms of the following forms: concept inclusions (CIs) i Ai v C, role inclusions (RIs) R1 v R2 , and role disjointness axioms R1 u R2 v ⊥, where A, Ai denote concept names, R, R1 , R2 denote role names P or their inverses P − , and C denotes a concept of the form ⊥, A, ∃R.A, ∀R.A, or ≤ 1 R.A. For an inverse role R = P − , we use R− to denote P . ⊥ denotes the empty concept/role. A DL-LiteR TBox is a finite set of axioms of the form B1 v B2 , B1 u B2 v ⊥, R1 v R2 , and R1 u R2 v ⊥, where Bi denotes a concept of the form A or ∃R.>. In what follows, for simplicity we write ∃R instead of ∃R.>, we use N to denote either a concept or a role name, and assume that all TBoxes are in normal form. An ABox is a finite set of membership assertions of the form A(c) or P (c, c0 ), where 0 c, c ∈ NI . For a DL L, an L-ontology is a pair O = hT , Ai, where T is an L-TBox and A is an ABox. A signature Σ is a finite set of concept and role names. An ontology O is said to be defined over (or simply, over) Σ if all the concept and role names occurring in it belong to Σ (and likewise for TBoxes, ABoxes, concept inclusions, etc.). When T is over Σ, we denote by sig(T ) the subset of Σ actually occurring in T . Moreover we denote with Ind(A), the set of individuals appearing in A. The semantics, models, and the notions of satisfaction and consistency of ontologies are defined in the standard way. We only point out that we adopt the Unique Name Assumption (UNA), and for simplicity we also assume to have standard names, i.e., for every interpretation I and every constant c ∈ NI interpreted by I, we have that cI = c. 2.2 OBDA and Mappings Let S be a relational schema over a countably infinite set NS of database predicates. For simplicity, we assume to deal with plain relational schemas without constraints, and with database instances that directly store abstract objects (as opposed to values). So, a database instance D of S is a set of ground atoms over the predicates in NS and the constants in NI .2 Queries over S are expressed in SQL. We use ϕ(x) to denote that query ϕ has x = x1 , . . . , xn as free (i.e., answer) variables, where n is the arity of ϕ. Given a database instance D of S and a query ϕ over S, ans(ϕ, D) denotes the set of tuples of constants in NI computed by evaluating ϕ over D. In OBDA, one provides access to an (external) database through an ontology TBox, which is connected to the database by means of a mapping. Given a source schema S and a TBox T , a (GAV) mapping assertion between S and T has the form ϕ(x) A(x) or ϕ0 (x, x0 ) P (x, x0 ), where A and P are respectively concept and role names, 0 0 and ϕ(x), ϕ (x, x ) are arbitrary (SQL) queries expressed over S, called source queries. Intuitively, given a database instance D of S and a mapping assertion m = ϕ(x) A(x), the instances of the concept A generated by m from D is the set ans(ϕ, D); similarly for a mapping assertion ϕ(x, x0 ) P (x, x0 ). An OBDA specification is a triple P = hT , M, Si, where T is a DL TBox, S is a relational schema, and M is a finite set of mapping assertions. Without loss of gen- erality, we assume that all concept and role names appearing in M are contained in sig(T ). An OBDA instance is a pair hP, Di, where P is an OBDA specification, and D is a database instance of S. The semantics of the OBDA instance hP, Di is specified in terms of interpretations of the concepts and roles in T . We define it by relying on the (virtual3 ) ABox AM,D = {N (o) | o ∈ ans(ϕ, D) and ϕ(x) N (x) in M} gener- ated by M from D, where N is a concept or role name in T . Then, a model of hP, Di is simply a model of the ontology hT , AM,D i. To abstract away the low-level SQL details in source queries, following [13], we split each mapping assertion m = ϕ(x) N (x) in M into two parts. By introducing an intermediate view name Vm for the SQL query ϕ(x), we obtain a low-level mapping assertion of the form ϕ(x) Vm (x), and a high-level mapping assertion of the form Vm (x) N (x). Hence, in our technical development we deal only with the high-level mappings, and directly consider the intermediate views as our data sources. 2.3 Query Answering We consider conjunctive queries, which are the basic and most important query- ing mechanism in relational database systems and ontologies. A conjunctive query (CQ) q(x) over a signature Σ is a formula ∃y. ϕ(x, y), where ϕ is a conjunction of atoms N (z), such that N is a concept or role name in Σ, and z are variables from x and y. The set of certain answers to a CQ q(x) over an ontology hT , Ai, denoted cert(q, hT , Ai), is the set of tuples c of elements from Ind(A) of the same length as x, such that q(c) (considered as a FO sentence) holds in every model of hT , Ai. We mention two more query classes. An atomic query (AQ) is a CQ consisting of exactly one atom whose variables are all free. A CQ with inequalities (CQ6= ) is a CQ that may contain inequality atoms between the variables of the predicate atoms. 2 All our results easily extend to the case where objects are constructed from retrieved database values [7]. 3 We call such an ABox ‘virtual’, because we are not interested in materializing its facts. Given a CQ q, an OBDA specification P = hT , M, Si and a database instance D of S, the answer to q over the OBDA instance hP, Di, denoted cert(q, P, D), is defined as cert(q, hT , AM,D i). Observe that, when D is inconsistent with P (i.e., hP, Di does not have a model), then cert(q, P, D) is the set of all possible tuples of constants in AM,D (of the same arity as q). 3 An OBDA Rewriting Framework We extend the notion of query inseparability of ontologies [6] to OBDA specifications. We adopt the proposal by [3], but we do not enforce preservation of inconsistency. Definition 1. Let Σ be a signature. Two OBDA specifications P1 = hT1 , M1 , Si and P2 = hT2 , M2 , Si are Σ-CQ inseparable if cert(q, P1 , D) = cert(q, P2 , D), for every CQ q over Σ and every database instance D of S. In OBDA, one must deal with the trade-off between the computational complexity of query answering and the expressiveness of the ontology language. Suppose that in an OBDA specification P = hT , M, Si, T is expressed in an ontology language L that does not allow efficient query answering. A possible solution is to exploit the expressive power of the mapping to compute a new OBDA specification P 0 = hT 0 , M0 , Si in which T 0 is expressed in a language Lt more suitable for query answering than L. The aim is to encode in M0 not only M but also part of the semantics of T , so that P 0 is query-inseparable from P. This leads to the notion of rewriting of OBDA specifications. Definition 2. Let Lt be an ontology language. The OBDA specification P 0 = hT 0 , M0 , Si is a CQ-rewriting in Lt of the OBDA specification P = hT , M, Si if (i) sig(T ) ⊆ sig(T 0 ), (ii) T 0 is an Lt -TBox, and (iii) P and P 0 are Σ-CQ inseparable, for Σ = sig(T ). If such P 0 exists, we say that P is CQ-rewritable into Lt . We observe that the new OBDA specification can be defined over a signature that is an extension of that of the original TBox. This is specified by condition (i). In condi- tion (ii), we impose that the new ontology is specified in the target language Lt . Finally, condition (iii) imposes that the OBDA specifications cannot be distinguished by CQs over the original TBox. Note that the definition allows for changing the ontology and the mappings, but not the source schema, accounting for the fact that the data sources might not be under the control of the designer of the OBDA specification. As expected, it is not always possible to obtain a CQ-rewriting of P in an ontology language Lt that allows for efficient query answering. Indeed, the combined expressive- ness of Lt with the new mappings might not be sufficient to simulate query answering over P without loss. In these cases, we can resort to approximating query answers over P in a sound way, which means that the answers to queries posed over the new speci- fication are contained in those produced by querying P. Hence, we say that the OBDA specification P 0 = hT 0 , M0 , Si is a sound CQ-approximation in Lt of the OBDA spec- ification P = hT , M, Si if P 0 satisfies (i), (ii), and cert(q, P 0 , D) ⊆ cert(q, P, D), for each CQ q over sig(T ) and for each instance D of S. Next, we study CQ-rewritability of OBDA specifications into DL-LiteR , developing suitable techniques. 4 Rewriting OBDA Specifications In this section, we develop our OBDA rewriting technique, which relies on Datalog rewritings of the TBox (and mappings). Recall that a Datalog program (with inequali- ties) is a finite set of definite Horn clauses without functions symbols, i.e., rules of the form head ← ϕ, where ϕ is a finite non-empty list of predicate atoms and guarded inequalities called the body of the rule, and head is an atom, called the head of the rule, all of whose variables occur in the body. The predicates that occur in rule heads are called intensional (IDB), the other predicates are called extensional (EDB). 4.1 ET-mappings Now, we extend the notion of T-mappings introduced by [27], and define the notion of an ET-mapping that results from compiling into the mapping the expressiveness of ontology languages that are Datalog rewritable, as introduced below. We first introduce notation we need. Let Π be a Datalog program and N an IDB i predicate. For a database D over the EDB predicates of Π, let NΠ (D) denote the set of facts about N that can be deduced from D by at most i ≥ 1 applications of the rules in ∞ i ∞ S Π, and let NΠ (D) = i≥1 NΠ (D). It is known that the predicate NΠ (·) defined by 6= N in Π can be characterized by a possibly S infinite union of CQ s N[11], i.e., there exist 6= N N ∞ CQ s ϕ0 , ϕ1 , . . . such that NΠ (D) = i≥0 {N (a) | a ∈ ans(ϕi , D)}, for every D. The ϕN i ’s are called the expansions of N and can be described in terms of expansion trees; cf. [4, Appendix A]. We denote by ΦΠ (N ) the set of expansion trees for N in Π, and abusing notation also the (possibly infinite) union of CQ6= s corresponding to it. Note that ΦΠ (N ) might be infinite due to the presence of IDB predicates that are recursive, i.e., either directly or indirectly refer to themselves. We call a TBox T Datalog rewritable if it admits a translation ΠT to Datalog that preserves consistency and answers to AQs (see, e.g., the translations by [17], [14], and [28] for Horn-SHIQ, and by [12] for SHI). We assume that ΠT makes use of a special nullary predicate ⊥ that encodes inconsistency, i.e., for an ABox A, hT , Ai is consistent iff ⊥∞ 4 ΠT (A) is empty. We also assume that ΠT includes the following auxiliary rules, which ensure that ΠT derives all possible facts constructed over sig(T ) and Ind(A) whenever hT , Ai is inconsistent: >∆ (x) ← A(x); >∆ (x) ← P (x, y); >∆ (y) ← P (x, y); A(x) ← ⊥, >∆ (x); P (x, y) ← ⊥, >∆ (x), >∆ (y); where A and P respectively range over concept and role names in sig(T ), and >∆ is a fresh unary predicate denoting the set of all the individuals appearing in A. In the following, we denote with ΠM the (high-level) mapping M viewed as a Datalog program, and with ΠT ,M the Datalog program ΠT ∪ ΠM associated to a Datalog rewritable TBox T and a mapping M. From the properties of the translation ΠT (and the simple structure of ΠM ), we obtain that ΠT ,M satisfies the following: 4 Here we simply consider A as a database. Input: Horn-ALCHIQ TBox T and mapping M. Output: DL-LiteR TBox Tr and ET-mapping Mc . Step 1: T1 is obtained from T by adding all CIs of the form Ai v ∃R.( A0j ) entailed by d d 0 T , for concept names Ai , Aj ∈ sig(T ). Step 2: T2 = norm∃ (T1 ). Step 3: T3 = normu (T2 ). Step 4: Mc is etmT3 (M), and Tr is the DL-LiteR TBox consisting of all DL-LiteR axioms over sig(T3 ) entailed by T3 (including the trivial ones N v N ). Fig. 1. OBDA specification rewriting procedure RewObda. Lemma 1. Let hT , M, Si be an OBDA specification where T is Datalog rewritable. Then, for every database instance D of S, concept or role name N of T , and a in ∞ Ind(AM,D ), we have that hT , AM,D i |= N (a) iff N (a) ∈ NΠ T ,M (D). For a predicate N , we say that an expansion ϕN ∈ ΦΠT ,M (N ) is DB-defined if ϕN is defined over database predicates. Now we are ready to define ET-mappings. Definition 3. Let hT , M, Si be an OBDA specification where T is Datalog rewritable. The ET-mapping for M and T , denoted etmT (M), is defined as the set of assertions of the form ϕN (x) N (x) such that N is a concept or role name in T , and ϕN ∈ ΦΠT ,M (N ) is DB-defined. It is easy to show that, for M0 = etmT (M) and each database instance D, the vir- tual ABox AM0 ,D (which can be defined for ET-mappings as for ordinary mappings) contains all facts entailed by hT , AM,D i. In this sense, the ET-mapping etmT (M) plays for a Datalog rewritable TBox T the same role as T-mappings play for (the sim- pler) DL-LiteR TBoxes. Note that, in general, an ET-mapping is not a mapping, as it may contain infinitely many assertions. However, AM0 ,D is still finite, given that it is constructed over the finite number of constants appearing in D. 4.2 Rewriting Horn-ALCHIQ OBDA Specifications to DL-LiteR Let hT , M, Si be an OBDA specification, where T is a Horn-ALCHIQ TBox over a signature Σ. Procedure RewObda(T , M), in Figure 1, constructs a DL-LiteR TBox Tr and an ET-mapping Mc such that hTr , Mc , Si is Σ-CQ inseparable from hT , M, Si. applies to T1 the normalization step norm∃ , which gets rid In Step 2, the procedure d of concepts of the form ∃R.( A0j ) in the right-hand side of CIs. This is achieved by the dm dn following well-known substitution [1]: every CI i=1 Ai v ∃R.( j=1 A0j ) in T1 is re- dm placed with i=1 Ai v ∃Pnew , Pnew v R, and > v ∀Pnew .A0j , for 1 ≤ j ≤ n, where Pnew is a fresh role name. Notice that the latter two forms of inclusions introduced by norm∃ are actually in DL-LiteR , as > v ∀Pnew .A0j is equivalent to ∃Pnew − v A0j . In Step 3, the procedure applies to T2 a further normalization step, normu , which intro- duces a fresh concept name AA1 u···uAn for each concept conjunction A1 u · · · u An ap- pearing in T2 , and adds A1 u· · ·uAn ≡ AA1 u···uAn 5 to the TBox. Note that norm∃ (T1 ) 5 We use ‘≡’ to abbreviate inclusion in both directions. and normu (T2 ) are model-conservative extensions of T1 and T2 , respectively [21], as one can easily show. We denote by rew(T ) the resulting TBox Tr , which in general is exponential in the size of T , and by comp(T , M) the resulting ET-mapping Mc , which in general is infinite. Example 1 Assume that the domain knowledge is represented by the TBox T = { CAccu∃inNameOf.Person v SAcc } about bank accounts. Its normalization is T b = {Person v ∀inNameOf − .A1 , CAcc u A1 v SAcc}. Assume that the database schema S b consists of the two relations ENT(ID, TYPE, EMPID), PROD(NUM, TYPE, CUSTID), whose data are mapped to the ontology terms by means of the following mapping M: mP : SELECT ID AS X FROM ENT WHERE ENT.TYPE=’P’ Person(X) mN : SELECT NUM AS X, CUSTID AS Y FROM PROD inNameOf(X, Y) mC : SELECT NUM AS X FROM PROD P WHERE P.TYPE=’B’ CAcc(X) The corresponding high-level mapping Mb consists of the assertions: hP : {x | VPerson (x)} Person(x) hN : {x, y | VinNameOf (x, y)} inNameOf(x, y) hC : {x | VCAcc (x)} CAcc(x) Now, consider the OBDA specification P b = hT b , Mb , S b i. The RewObda procedure invoked on (T b , Mb ) produces: – The intermediate TBoxes T1b and T2b coinciding with T b , and T3b extending T b with ACAccuA1 ≡ CAcc u A1 . – The ET-mapping Mbc = etmT3b (Mb ), which extends Mb with the following three assertions: {x | VinNameOf (x, y), VPerson (y)} A1 (x); {x | VCAcc (x), VinNameOf (x, y), VPerson (y)} SAcc(x); {x | VCAcc (x), VinNameOf (x, y), VPerson (y)} ACAccuA1 (x). The procedure returns the DL-LiteR TBox Trb = {ACAccuA1 v CAcc, ACAccuA1 v A1 , b ACAccuA1 v SAcc} and the mapping Mbc . It is possible to show that PDL-LiteR = b b b b hTr , Mc , S i is a CQ-rewriting of P into DL-LiteR . The TBox T3 obtained as an intermediate result in Step 3 of RewObda(T , M), is a model-conservative extension of T , tailored towards capturing in DL-LiteR the answers to tree-shaped CQs. This is obtained by introducing in Step 2 sufficiently many new role names, and in Step 3 new concept names, so as to capture entailed axioms that generate the tree-shaped parts of models. Note that at-most number restrictions do not participate in the generation of the tree-shaped parts of models, hence are not considered explicitly in Steps 1 to 3. On the other hand, the ET-mapping Mc = comp(T , M) generates from a database instance a virtual ABox Av that is complete with respect to all ABox facts that might be involved in the generation of the tree-shaped parts of models of Tr and Av . This allows us to prove the main result of this section. Theorem 2. Let hT , M, Si be an OBDA specification such that T is a Horn-ALCHIQ TBox, and let hTr , Mc i = RewObda(T , M). Then hT , M, Si and hTr , Mc , Si are Σ-CQ inseparable, for Σ = sig(T ). Clearly, hTr , Mc , Si is a candidate for being a CQ-rewriting of hT , M, Si into DL- LiteR . However, since Mc might be an infinite set, hTr , Mc , Si might not be an OBDA specification and hence might not be effectively usable for query answering. Next we address this issue, and show that in some cases we obtain proper CQ-rewritings, while in others we have to resort to approximations. 5 Approximating OBDA Specifications To obtain from an ET-mapping a proper mapping, we exploit the notion of predicate boundedness in Datalog, and use a bound on the depth of Datalog expansion trees. An IDB predicate N is said to be bounded in a Datalog program Π, if there exists k a constant k depending only on Π such that, for every database D, we have NΠ (D) = ∞ NΠ (D) [11]. If N is bounded in Π, then there exists an equivalent Datalog program Π 0 such that ΦΠ 0 (N ) is finite, and thus represents a finite union of CQ6= s. It is well known that predicate boundedness for Datalog is undecidable in general [15]. We say that Ω is a boundedness oracle if for a Datalog program Π and a predicate N it returns one of the three answers: N is bounded in Π, N is not bounded in Π, or unknown. When N is bounded, Ω returns also a finite union of CQ6= s, denoted ΩΠ (N ), defining N . Given a constant k, ΦkΠ (N ) denotes the set of trees (and the corresponding union of CQ6= s) in ΦΠ (N ) of depth at most k, hence ΦkΠ (N ) is always finite. We introduce a cutting operator cutΩ k , which is parametric with respect to the cut- ting depth k > 0 and the boundedness oracle Ω, which, when applied to a predicate N and a Datalog program Π, returns a finite union of CQ6= s as follows: ( Ω  ΩΠ (N ), if N is bounded in Π w.r.t. Ω cutk N, Π = ΦkΠ (N ), otherwise. We apply cutting also to ET-mappings: given an ET-mapping etmT (M), the mapping cutΩ N k (etmT (M)) is the (finite) set of mapping assertions ϕ (x) N (x) s.t. N is a N Ω concept or role name in T , and ϕ ∈ cutk (N, ΠT ,M ) is DB-defined. The following theorem provides a sufficient condition for CQ-rewritability into DL- LiteR in terms of the well-known notion of first-order (FO)-rewritability, which we recall here: a query q is FO-rewritable with respect to a TBox T , if there exists a FO query q 0 such that cert(q, hT , Ai) = ans(q 0 , A), for every ABox A over sig(T ) (viewed as a database). It uses the fact that if an AQ is FO-rewritable with respect to a Horn-ALCHIQ TBox T , then it is actually rewritable into a union of CQ6= s, and the fact that if T is FO-rewritable for AQs (i.e., every AQ is FO-rewritable with respect to T ), then each concept and role name is bounded in ΠT [22, 2]. Theorem 3. Let hT , M, Si be an OBDA specification such that T is a Horn-ALCHIQ TBox. Further, let Tr = rew(T ) and M0 = cutΩ k (comp(T , M)), for a boundedness oracle Ω and some k > 0. If T is FO-rewritable for AQs, then hT , M, Si is CQ-rewritable into DL-LiteR , and hTr , M0 , Si is its CQ-rewriting. Oth- erwise, hTr , M0 , Si is a sound CQ-approximation of hT , M, Si in DL-LiteR . This gives us decidable conditions for rewritability of OBDA specifications in sev- eral significant cases. It is shown by [2] and [22] that FO-rewritability of AQs relative to Horn-SHI-TBoxes, Horn-ALCF-TBoxes, and Horn-ALCIF-TBoxes of depth two is decidable. In fact, these FO-rewritability algorithms provide us with a boundedness oracle Ω: for each concept and role name N in T , they return a FO-rewriting of the AQ N (x) that combined with the mapping M results in ΩΠT ,M (N ). Unfortunately, a complete characterization of CQ-rewritability into DL-LiteR is not possible if arbitrary FO-queries are allowed in the (low-level) mapping. Theorem 4. The problem of checking whether an OBDA specification with an EL on- tology and FO source queries in the mapping is CQ-rewritable into DL-LiteR is unde- cidable. However, if we admit only unions of CQs in the (low-level) mapping, it follows from decidability of boundedness of monadic Datalog programs [11] that we can fully characterize CQ-rewritability. Theorem 5. The problem of checking whether an OBDA specification with a Horn- ALCHI ontology of depth one and unions of CQs as source queries in the mapping is CQ-rewritable into DL-LiteR is decidable. 6 Implementation To demonstrate the feasibility of our OBDA specification rewriting technique, we implemented the O NTO P ROX6 prototype system and evaluated it over synthetic and real OBDA instances. It relies on the OBDA reasoner O NTOP7 and the complete Horn-SHIQ CQ-answering system C LIPPER8 , used as Java libraries, on a standard Pro- log engine (SWI -P ROLOG9 ), and on an OWL 2 reasoner (H ERMI T10 ). Essentially, O NTO P ROX implements the rewriting and compiling procedure de- scribed in Figure 1, but instead of computing the (possibly infinite) ET-mapping comp(T , M), it computes its finite part cutk (comp(T, M)). So, it gets as input an OWL 2 OBDA specification hTOWL2 , M, Si and a positive integer k, and produces a DL-LiteR OBDA specification that can be used with any OBDA system. Below we describe some of the implementation details: (1) TOWL2 is first approximated to the Horn-SHIQ TBox T by dropping the axioms outside this fragment. (2) T is translated intoda (possibly recursive) Datalog program Π and saturated with all CIs of the form Ai v ∃R.( A0j ), using functionalities provided by C LIPPER. d (3) The expansions cutk (ΦΠ (X)) are computed by an auxiliary Prolog program using Prolog meta-programming. (4) To produce actual mappings that can be used by an OBDA reasoner, the views in the high-level mapping cutk (comp(T , M)) are replaced with their original SQL definitions using functionalities of O NTOP. (5) The DL-LiteR closure is computed by relying on the OWL 2 reasoner for Horn-- SHIQ TBox classification. 6 https://github.com/ontop/ontoprox/ 7 http://ontop.inf.unibz.it/ 8 http://www.kr.tuwien.ac.at/research/systems/clipper/ 9 http://www.swi-prolog.org/ 10 http://hermit-reasoner.com/ For the experiments, we have considered two scenarios: UOBM. The university ontology benchmark (UOBM) [23] comes with a SHOIN ontology (with 69 concepts, 35 roles, 9 attributes, and 204 TBox axioms), and an ABox generator. We have designed a suitable database schema for the generated ABox, con- verted the ABox to a 10MB database instance for the schema, and manually created the mapping, consisting of 96 assertions11 . Telecom benchmark. The telecommunications ontology models a portion of the net- work of a leading telecommunications company, namely the portion connecting sub- scribers to the operating centers of their service providers. The current specification consists of an OWL 2 ontology with 152 concepts, 53 roles, 73 attributes, 458 TBox axioms, and of a mapping with 264 mapping assertions. The database instance contains 32GB of real-world data. For each OBDA instance hhT , M, Si, Di, we have evaluated the number of query answers and the query answering time with respect to five different setups: (1) The default behavior of O NTOP v1.15, which simply ignores all non-DL-LiteR ax- ioms in T , i.e., using hT 1 , M, Si where T 1 are all the DL-LiteR axioms in T . (2) The local semantic approximation (LSA) of T in DL-LiteR , i.e., using hT 2 , M, Si where T 2 is obtained as the union, for each axiom α ∈ T , of the set of DL-LiteR axioms Γ (α) entailed by α [10]. (3) The global semantic approximation (GSA) of T in DL-LiteR , i.e., using hT 3 , M, Si where T 3 is the DL-LiteR closure of T [25]. (4) Result of O NTO P ROX, i.e., hrew(T ), cut5 (comp(T, M)), Si. (5) C LIPPER over the materialization of the virtual ABox. The details of the evaluation can be found in [4]. 7 Conclusions We proposed a novel framework for rewriting and approximating OBDA specifications in an expressive ontology language to specifications in a weaker language, in which the core idea is to exploit the mapping layer to encode part of the semantics of the original specification, and we developed techniques for DL-LiteR as the target language. We plan to continue our work along the following directions: (i) extend our technique to Horn-SHIQ, and, more generally, to Datalog rewritable TBoxes [12]; (ii) deepen our understanding of the computational complexity of deciding CQ- rewritability of OBDA specifications into DL-LiteR ; (iii) extend our technique to SPARQL queries under different OWL entailment regimes [19]; (iv) carry out more extensive experiments, considering queries that contain existentially quantified vari- ables. This will allow us to verify the effectiveness of RewObda, which was designed specifically to deal with existentially implied objects. Acknowledgement. This work is partially supported by the EU under the large-scale integrating project (IP) Optique (Scalable End-user Access to Big Data), grant agree- ment n. FP7-318338. We thank Martin Rezk for insightful discussions, and Benjamin Cogrel and Elem Güzel for help with the experimentation. 11 https://github.com/ontop/ontop-examples/tree/master/aaai-2016-ontoprox/uobm References 1. Alessandro Artale, Diego Calvanese, Roman Kontchakov, and Michael Zakharyaschev. The DL-Lite family and relations. J. of Artificial Intelligence Research, 36:1–69, 2009. 2. Meghyn Bienvenu, Carsten Lutz, and Frank Wolter. First-order rewritability of atomic queries in Horn description logics. In Proc. of the 23rd Int. Joint Conf. on Artificial In- telligence (IJCAI), pages 754–760, 2013. 3. Meghyn Bienvenu and Riccardo Rosati. Query-based comparison of OBDA specifications. In Proc. of the 28th Int. Workshop on Description Logic (DL), volume 1350 of CEUR Elec- tronic Workshop Proceedings, 2015. 4. Elena Botoeva, Diego Calvanese, Valerio Santarelli, Domenico Fabio Savo, Alessandro Soli- mando, and Guohui Xiao. Beyond OWL 2 QL in OBDA: Rewritings and approximations (Extended version). CoRR Technical Report abs/1511.08412, arXiv.org e-Print archive, 2015. Available at http://arxiv.org/abs/1511.08412. 5. Elena Botoeva, Diego Calvanese, Valerio Santarelli, Domenico Fabio Savo, Alessandro Soli- mando, and Guohui Xiao. Beyond OWL 2 QL in OBDA: Rewritings and approximations. In Proc. of the 30th AAAI Conf. on Artificial Intelligence (AAAI), 2016. 6. Elena Botoeva, Roman Kontchakov, Vladislav Ryzhikov, Frank Wolter, and Michael Za- kharyaschev. Query inseparability for description logic knowledge bases. In Proc. of the 14th Int. Conf. on the Principles of Knowledge Representation and Reasoning (KR), pages 238–247. AAAI Press, 2014. 7. Diego Calvanese, Giuseppe De Giacomo, Domenico Lembo, Maurizio Lenzerini, Antonella Poggi, Mariano Rodriguez-Muro, and Riccardo Rosati. Ontologies and databases: The DL- Lite approach. In 5th Reasoning Web Int. Summer School Tutorial Lectures (RW), volume 5689 of LNCS, pages 255–356. Springer, 2009. 8. Diego Calvanese, Giuseppe De Giacomo, Domenico Lembo, Maurizio Lenzerini, and Ric- cardo Rosati. Tractable reasoning and efficient query answering in description logics: The DL-Lite family. J. of Automated Reasoning, 39(3):385–429, 2007. 9. Diego Calvanese, Giuseppe De Giacomo, Domenico Lembo, Maurizio Lenzerini, and Ric- cardo Rosati. Data complexity of query answering in description logics. Artificial Intelli- gence, 195:335–360, 2013. 10. Marco Console, José Mora, Riccardo Rosati, Valerio Santarelli, and Domenico Fabio Savo. Effective computation of maximal sound approximations of description logic ontologies. In Proc. of the 13th Int. Semantic Web Conf. (ISWC), volume 8797 of LNCS, pages 164–179. Springer, 2014. 11. Stavros S. Cosmadakis, Haim Gaifman, Paris C. Kanellakis, and Moshe Y. Vardi. Decidable optimization problems for database logic programs. In Proc. of the 20th ACM SIGACT Symp. on Theory of Computing (STOC), pages 477–490, 1988. 12. Bernardo Cuenca Grau, Boris Motik, Giorgos Stoilos, and Ian Horrocks. Computing dat- alog rewritings beyond Horn ontologies. In Proc. of the 23rd Int. Joint Conf. on Artificial Intelligence (IJCAI), pages 832–838, 2013. 13. Floriana Di Pinto, Domenico Lembo, Maurizio Lenzerini, Riccardo Mancini, Antonella Poggi, Riccardo Rosati, Marco Ruzzi, and Domenico Fabio Savo. Optimizing query rewrit- ing in ontology-based data access. In Proc. of the 16th Int. Conf. on Extending Database Technology (EDBT), pages 561–572. ACM Press, 2013. 14. Thomas Eiter, Magdalena Ortiz, Mantas Simkus, Trung-Kien Tran, and Guohui Xiao. Query rewriting for Horn-SHIQ plus rules. In Proc. of the 26th AAAI Conf. on Artificial Intelligence (AAAI), pages 726–733. AAAI Press, 2012. 15. Haim Gaifman, Harry G. Mairson, Yehoshua Sagiv, and Moshe Y. Vardi. Undecidable opti- mization problems for database logic programs. In Proc. of the 2nd IEEE Symp. on Logic in Computer Science (LICS), pages 106–115, 1987. 16. Martin Giese, Ahmet Soylu, Guillermo Vega-Gorgojo, Arild Waaler, Peter Haase, Ernesto Jiménez-Ruiz, Davide Lanti, Martin Rezk, Guohui Xiao, Özgür L. Özçep, and Riccardo Rosati. Optique: Zooming in on Big Data. IEEE Computer, 48(3):60–67, 2015. 17. Ullrich Hustadt, Boris Motik, and Ulrike Sattler. Data complexity of reasoning in very ex- pressive description logics. In Proc. of the 19th Int. Joint Conf. on Artificial Intelligence (IJCAI), pages 466–471, 2005. 18. Yevgeny Kazakov. Consequence-driven reasoning for Horn-SHIQ ontologies. In Proc. of the 21st Int. Joint Conf. on Artificial Intelligence (IJCAI), pages 2040–2045, 2009. 19. Roman Kontchakov, Martin Rezk, Mariano Rodriguez-Muro, Guohui Xiao, and Michael Zakharyaschev. Answering SPARQL queries over databases under OWL 2 QL entailment regime. In Proc. of the 13th Int. Semantic Web Conf. (ISWC), LNCS. Springer, 2014. 20. Carsten Lutz, Robert Piro, and Frank Wolter. Description logic TBoxes: Model-theoretic characterizations and rewritability. In Proc. of the 22nd Int. Joint Conf. on Artificial Intelli- gence (IJCAI), pages 983–988, 2011. 21. Carsten Lutz, Dirk Walther, and Frank Wolter. Conservative extensions in expressive de- scription logics. In Proc. of the 20th Int. Joint Conf. on Artificial Intelligence (IJCAI), pages 453–458, 2007. 22. Carsten Lutz and Frank Wolter. Non-uniform data complexity of query answering in de- scription logics. In Proc. of the 24th Int. Workshop on Description Logic (DL), volume 745 of CEUR Electronic Workshop Proceedings, 2011. 23. Li Ma, Yang Yang, Zhaoming Qiu, GuoTong Xie, Yue Pan, and Shengping Liu. Towards a complete OWL ontology benchmark. In Proc. of the 3rd European Semantic Web Conf. (ESWC), volume 4011 of LNCS, pages 125–139. Springer, 2006. 24. Boris Motik, Achille Fokoue, Ian Horrocks, Zhe Wu, Carsten Lutz, and Bernardo Cuenca Grau. OWL Web Ontology Language profiles. W3C Recommendation, World Wide Web Consortium, October 2009. Available at http://www.w3.org/TR/ owl-profiles/. 25. Jeff Z. Pan and Edward Thomas. Approximating OWL-DL ontologies. In Proc. of the 21st AAAI Conf. on Artificial Intelligence (AAAI), pages 1434–1439, 2007. 26. Yuan Ren, Jeff Z. Pan, and Yuting Zhao. Soundness preserving approximation for TBox reasoning. In Proc. of the 24th AAAI Conf. on Artificial Intelligence (AAAI), 2010. 27. Mariano Rodriguez-Muro, Roman Kontchakov, and Michael Zakharyaschev. Ontology- based data access: Ontop of databases. In Proc. of the 12th Int. Semantic Web Conf. (ISWC), volume 8218 of LNCS, pages 558–573. Springer, 2013. 28. Despoina Trivela, Giorgos Stoilos, Alexandros Chortaras, and Giorgos Stamou. Optimising resolution-based rewriting algorithms for OWL ontologies. J. of Web Semantics, pages 30– 49, 2015.