Query-based comparison of OBDA specifications Meghyn Bienvenu1 and Riccardo Rosati2 1 Laboratoire de Recherche en Informatique CNRS & Université Paris-Sud, France 2 Dipartimento di Ingegneria informatica, automatica e gestionale Sapienza Università di Roma, Italy Abstract. An ontology-based data access (OBDA) system is composed of one or more data sources, an ontology that provides a conceptual view of the data, and declarative mappings that relate the data and ontology schemas. In order to de- bug and optimize such systems, it is important to be able to analyze and compare OBDA specifications. Recent work in this direction compared specifications us- ing classical notions of equivalence and entailment, but an interesting alternative is to consider query-based notions, in which two specifications are deemed equiv- alent if they give the same answers to the considered query or class of queries for all possible data sources. In this paper, we define such query-based notions of en- tailment and equivalence of OBDA specifications and investigate the complexity of the resulting analysis tasks when the ontology is formulated in DL-LiteR . 1 Introduction Ontology-based data access (OBDA) [13] is a recent paradigm that proposes the use of an ontology as a conceptual, reconciled view of the information stored in a set of existing data sources. The connection between the ontology and the data sources is provided by declarative mappings, that relate the elements of the ontology with the elements of the data sources. The ontology layer is the virtual interface used to access data, through queries over the elements of the ontology. Due to the recent availability of techniques and systems for query processing in this setting [5, 14], the OBDA approach has recently started to be experimented in real ap- plications (see e.g. [1, 7, 10]). In these projects, the construction, debugging and main- tenance of the OBDA specification, consisting of the ontology, the schemas of the data sources, and the mapping, is a non-trivial task. Actually, the size and the complexity of the ontology and, especially, the mappings makes the management of such specifica- tions a practical issue in these projects. Providing formal tools for supporting the above activities is therefore very important for the successful deployment of OBDA solutions. In addition, the OBDA specification plays a major role in query answering, since the form of the specification may affect the system performance in answering queries: different, yet semantically equivalent specifications may give rise to very different ex- ecution times for the same query. So, the study of notions of equivalence and formal comparison of OBDA specifications is also important for optimizing query process- ing in OBDA systems. Indeed, some systems already implement forms of optimization based on transformations of the OBDA specification (an example is [14]). So far, most of the work in OBDA has focused on query answering, often in a simplified setting without any mappings. Very little attention has been devoted to the formal analysis of OBDA specifications. The first approach that explicitly focuses on the formal analysis of OBDA specifications is [12], whose aim is the identification of semantic anomalies in mappings. Such an approach is based on a classical notion of logical equivalence and entailment between OBDA specifications. While it is very natural to resort to such classical notions, a significant alternative in many cases may be the adoption of query-based notions of equivalence and comparison, in which two specifications are compared with respect to a given query or a given class of queries, and are deemed equivalent if they give the same answers to the considered queries for all possible extensions of the data sources. This idea has been already explored in the data exchange and schema mapping literature (see, e.g., [9]) and for description logics for comparing TBoxes and knowledge bases [11, 4]. To the best of our knowledge, it has never been explicitly considered for OBDA specifications. The majority of work on on OBDA has considered conjunctive queries (CQs) as the query language. Therefore, a first natural choice would be to compare OBDA specifica- tions with respect to the whole class of CQs. We thus define and study a notion of CQ- entailment between OBDA specifications that formalizes this case. We also consider the important subclass of instance queries (IQs), i.e., queries that ask for the instances of a single concept or role, and analyze the notion of IQ-entailment between specifications. Moreover, in many application contexts only a (small) set of predefined conjunctive queries are of interest for the OBDA user(s): in such cases, it may be more appropriate to tailor the comparison of specifications to a specific set of queries. For this reason, we also study in this paper the notions of single CQ-entailment and single IQ-entailment, which compare specifications with respect to a single CQ or IQ, respectively. We present a first investigation of the computational complexity of deciding the above forms of entailment for a pair of OBDA specifications. We study ontologies spec- ified in DL-LiteR and three different mapping languages (linear, GAV and GLAV). In all cases, we provide exact complexity bounds for the entailment problem. Our results are summarized in Figure 1. As shown in the table, the complexity of the entailment check ranges from NL (non-deterministic logarithmic space) for linear mappings and IQ-entailment to EXPTIME for CQ-entailment. To obtain these results, we show that instead of considering all possible data instances, it is sufficient to consider a small num- ber of databases of a particular form. We also exploit connections to query containment in the presence of signature restrictions [3] and KB query inseparability [4]. 2 Preliminaries We start from four pairwise disjoint countably infinite set of names: the set of concept names NC , the set of role names NR , the set of relation names Nrel , the set of constant names NI (also called individuals). To introduce OBDA specifications, we first recall the notion of knowledge base (KB) in Description Logics (DLs). A DL KB is a pair hT , Ai, where: T , called the TBox, is the intensional component of the KB, and is constituted by a finite set of axioms expressing intensional knowledge; and A, called the ABox, is a finite set of atomic concept and role assertions (set of ground facts). We assume that the concept, role and constant names occurring in every TBox and ABox belong to NC , NR and NI , respectively. We denote by sig(T ) and sig(A) the set of concept and role names occurring in T and A, respectively. Although the definitions of Section 3 are general, in Section 4 we will focus on the DL DL-LiteR [6]. A DL-LiteR TBox consists of a finite set of concept inclusions B v C and role inclusions R v S, where B, C, R, and S are defined according to the following syntax (where A is a concept name and P is a role name): B → A | ∃R C → B | ¬B R → P | P− S → R | ¬R We now introduce OBDA specifications. As already explained, a mapping asser- tion specifies the semantic relationship between elements of a DL ontology, specified through a TBox, to elements of a database. Such a relationship is specified through a pair of queries, one over the TBox signature, and the other one over the database signature. In this paper, we focus on the case where both queries involved in the map- ping assertion are conjunctive queries: such mapping assertions are called GLAV (for ‘global-as-view’) mappings in the literature [8]. Mappings are formally defined as follows. An atom is an expression r(t) where r is a predicate and t is a tuple of variables and constants. Then, a (GLAV) mapping as- sertion m is an expression of the form qs (x) → qo (x), where qs (x) (called the body of m, body(m)) is a conjunction of atoms over predicates from Nrel and constants from NI , qo (x) (called the head of m, head (m)) is a conjunction of atoms using predicates from NC ∪ NR and constants from NI , and x, called the frontier variables of m, are the variables that appear both in qo and in qs . The arity of m is the number of its frontier variables. When qo (x) has the form p(x) (i.e., qo (x) is a single atom whose arguments are x), we call m a GAV mapping assertion. A linear mapping assertion is a GAV asser- tion whose body consists of a single atom. A (GLAV) mapping M is a set of mapping assertions. A GAV mapping is a mapping constituted of GAV mapping assertions. A linear mapping is a set of linear mapping assertions. Without loss of generality, we as- sume that in every mapping M, every pair of distinct mapping assertions uses pairwise disjoint sets of variables. An OBDA specification is a pair Γ = hT , Mi, where T is a TBox and M is a mapping. Given a mapping assertion m of arity n and an n-tuple of constants a, we denote by m(a) the assertion obtained from m by replacing the frontier variables with the constants in a. Given a set of atoms AT , gr (AT ) is the function that returns the set of ground atoms obtained from AT by replacing every variable symbol x with a fresh constant symbol cx . We assume that such constant symbols do not occur elsewhere in the application context of the function gr (i.e., in the TBoxes, mappings and databases involved). In this paper, a database (instance) is a set of ground atoms using relation names from Nrel and constant names from NI . Given a mapping M and a database instance D, we define the ABox for D and M, denoted as AM,D , as the following ABox: { β ∈ gr (head (m(a))) | m ∈ M and and D |= ∃y.body(m(a)) } where we assume that y are the variables occurring in body(m(a)). Given an OBDA specification Γ = hT , Mi and a database instance D, we define the models of Γ and D, denoted as M ods(Γ, D) as the set of models of the KB hT , AM,D i. When such a set is empty, we write hT , M, Di |= ⊥ (analogously, when a KB hT , Ai has no models, we write hT , Ai |= ⊥). We are interested in the problem of answering instance queries and conjunctive queries over a pair composed of an OBDA specification and a database. A Boolean conjunctive query (CQ) is an expression of the form ∃x(α1 ∧ . . . ∧ αn ) where every αi is an atom whose arguments are either constants or variables from x. For a non-Boolean CQ q with answer variables v1 , . . . , vk , a tuple of constants a = ha1 , . . . , ak i occurring in A is said to be a certain answer for q w.r.t. K just in the case that K |= q(a), where q(a) is the Boolean query obtained from q by replacing each vi by ai . We call instance query (IQ) a CQ consisting of a single atom of the form A(x) or R(x, y), with A concept name, R role name, and x, y distinct free variables. We denote by sig(q) the set of concept and role names occurring in a query q. We use CQ (resp. IQ ) to refer the set of all CQs (resp. IQs) over the DL signature NC ∪ NR . Given an OBDA specification Γ = hT , Mi, a database instance D, and a conjunc- tive query q, we define the certain answers for q w.r.t. (Γ, D) as the tuples of constants from D that are certain answers for q w.r.t. hT , AM,D i. In particular, for Boolean CQs, we say that q is entailed by (Γ, D), denoted by (Γ, D) |= q (or hT , M, Di |= q), if I |= q for every I ∈ M ods(Γ, D). Note that for non-Boolean queries, we only consider tuples of constants from D, in order to avoid including those fresh constants introducing in AM,D by grounding existential variables in mapping heads. 3 Query-based Entailment for OBDA Specifications We start by recalling the classical notion of entailment between OBDA specifications. Definition 1 (Logical entailment). An OBDA specification hT1 , M1 i logically entails hT2 , M2 i, written hT1 , M1 i |=log hT2 , M2 i if and only the first-order theory T1 ∪ M1 logically entails the first-order theory T2 ∪ M2 . We now define the formal notions of query-based entailment between OBDA spec- ifications considered in this paper. First, we introduce a notion of entailment that com- pares specifications based upon the constraints they impose regarding consistency. Definition 2 (⊥-entailment). Let q be a query. An OBDA specification hT1 , M1 i ⊥- entails hT2 , M2 i, written hT1 , M1 i |=⊥ hT2 , M2 i, iff, for every database D, hT2 , M2 , Di |= ⊥ ⇒ hT1 , M1 , Di |= ⊥ Next, we define a notion of query entailment between OBDA specifications with respect to a single query. Definition 3 (Single query entailment). Let q be a query. An OBDA specifica- tion hT1 , M1 i q-entails hT2 , M2 i, written hT1 , M1 i |=q hT2 , M2 i, if and only if hT1 , M1 i |=⊥ hT2 , M2 i and for every database D, hT2 , M2 , Di |= q(a) ⇒ hT1 , M1 , Di |= q(a) When q is an IQ, we call the entailment relation in the preceding definition single IQ- entailment, while we call it single CQ-entailment if q is an arbitrary CQ. We can generalize the previous definition to classes of queries as follows. Definition 4 (Query entailment). Let L be a (possibly infinite) set of queries. An OBDA specification hT1 , M1 i L-entails hT2 , M2 i, written hT1 , M1 i |=L hT2 , M2 i iff hT1 , M1 i |=⊥ hT2 , M2 i and hT1 , M1 i |=q hT2 , M2 i for every query q ∈ L. When L = IQ, we call the preceding entailment relation IQ-entailment, and for L = CQ, we use the term CQ-entailment. Note that, for each of the above notions of entailment, a notion of equivalence be- tween OBDA specifications can be immediately derived, corresponding to entailment in both directions (we omit the formal definitions due to space limitations). The following property immediately follows from the above definitions. Proposition 1. Let hT1 , M1 i, hT2 , M2 i be two OBDA specifications, and let L1 be a set of queries. Then, hT1 , M1 i |=log hT2 , M2 i implies hT1 , M1 i |=L1 hT2 , M2 i. More- over, if L2 ⊆ L1 , then hT1 , M1 i |=L1 hT2 , M2 i implies hT1 , M1 i |=L2 hT2 , M2 i. As a consequence of the above property, we have that logical entailment implies CQ-entailment, and CQ-entailment implies IQ-entailment. The converse implications do not hold, as the following examples demonstrate. Example 1. We start by illustrating the difference between logical entailment and CQ-entailment. Consider a database containing instances for the relation EXAM(studentName,courseName,grade,date). Then, let Γ1 = hT1 , M1 i, where T1 = {Student v Person, PhDStudent v Student} M1 = {EXAM(x, y, z, w) → Student(x)} and let Γ2 = hT2 , M2 i, where T2 = {Student v Person} and M2 = M1 . It is immediate to verify that Γ2 6|=log Γ1 . However, we have that Γ2 |=CQ Γ1 . Indeed, Γ2 |=CQ Γ1 can be intuitively explained by the fact that the mapping M1 does not retrieve any instances of the concept PhDStudent (and there are no subclasses that can indirectly populate it), so the presence of the inclusion PhDStudent v Student in T1 does not have any effect on query answering; in particular, every CQ that mentions the concept PhDStudent cannot be entailed both under Γ1 and under Γ2 . Notice also that, if we modify the mapping M1 to map PhDStudent instead of Student (i.e., if M1 were {EXAM(x, y, z, w) → PhDStudent(x)}), then CQ-entailment between Γ2 and Γ1 would no longer hold. Next, consider Γ3 = hT3 , M3 i, where T3 = ∅ and M3 = {EXAM(x, y, z, w) → Student(x), EXAM(x, y, z, w) → Person(x)} Again, it it immediate to see that Γ3 6|=log Γ2 , while we have that Γ3 |=CQ Γ2 . Indeed, Γ3 |=CQ Γ2 follows informally from the fact that the mapping M3 is able to “exten- sionally” simulate the inclusion Student v Person of T2 , which is sufficient for Γ3 to entail every CQ in the same way as Γ2 . Example 2. We slightly modify the previous example to show the difference between CQ-entailment and IQ-entailment. Consider Γ1 = hT1 , Mi and Γ2 = hT2 , Mi where T1 = {Student v Person, Student v ∃takesCourse} T2 = {Student v Person} M = {EXAM(x, y, z, w) → Student(x)} Type of entailment Type of mapping Complexity logical GAV / GLAV NP-complete linear NL-complete ⊥ GAV / GLAV NP-complete linear NL-complete CQ linear / GAV / GLAV EXPTIME-complete IQ linear NL-complete GAV / GLAV NP-complete single CQ linear / GAV / GLAV Π2p -complete single IQ linear NL-complete GAV / GLAV NP-complete Fig. 1. Complexity results for entailment between OBDA specifications in DL-LiteR Then, it can be easily verified that Γ2 6|=CQ Γ1 . Indeed, consider the Boolean CQ ∃x, y takesCourse(x, y): for every database D, this query is not entailed by the pair (Γ2 , D), while this is not the case when the specification is Γ1 . On the other hand, we have that Γ2 |=IQ Γ1 : in particular, for every database D and for every pair of individ- uals a, b, neither (Γ1 , D) nor (Γ2 , D) entails the IQ takesCourse(a, b). Finally, let q be the non-Boolean CQ ∃x takesCourse(x, y): then, it can be easily verified that the single CQ-entailment Γ2 |=q Γ1 holds; while for the CQ q 0 of the form ∃y takesCourse(x, y), the single CQ-entailment Γ2 |=q0 Γ1 does not hold. 4 Complexity Results for DL-LiteR In this section, we investigate the computational properties of the different notions of entailment between OBDA specifications defined in the previous section. For this first study, we focus on the case in which the TBox is formulated in DL-LiteR [6], as it is the basis for the OWL 2 QL profile and one of the most commonly considered DLs for OBDA. The results of our complexity analysis are displayed in Figure 1. In what follows, we formally state the different complexity results and provide some ideas about the proofs. We begin by considering the complexity of deciding classical entailment between OBDA specifications. Theorem 1. Classical logical entailment for OBDA specifications based upon DL-LiteR TBoxes is NP-complete for GAV or GLAV mappings, and NL-complete for linear mappings. Proof. Let Γ1 = hT1 , M1 i, Γ2 = hT2 , M2 i. First, it is easy to see that Γ1 |=log Γ2 iff (i) T1 |= T2 ; and (ii) Γ1 |=log M2 . Property (i) can be decided in NL [2]. Property (ii) can be decided by an algorithm that, for every assertion m ∈ M2 , first builds a database D corresponding to gr (body(m)) (i.e., obtained by “freezing” the body of m), and then checks whether hΓ1 , Di entails the CQ corresponding to the head of m whose frontier variables have been replaced by the corresponding constants. This algorithm runs in NP in the case of GAV and GLAV mappings, and in NL in the case of linear mappings, which implies the overall upper bounds in the theorem statement. The lower bound for GAV mappings can be obtained through an easy reduction of conjunctive query containment to logical entailment, while the one for linear mappings follows from a reduction of the entailment of a concept inclusion axiom in a DL-LiteR TBox. t u We next consider ⊥-entailment. Our upper bounds rely on the following result that shows it is sufficient to consider a small number of small databases. Theorem 2. Let q be a CQ, and let Γ1 = hT1 , M1 i and Γ2 = hT2 , M2 i be OBDA specifications such that T1 , T2 are formulated in DL-LiteR , and M1 , M2 are GLAV mappings. Then Γ1 |=⊥ Γ2 if and only if hT1 , M1 , Di |= ⊥ for every database D satisfying the following condition:1 – Condition 1: D is obtained by (i) taking two mapping assertions m1 , m2 from M2 , (ii) selecting atoms α1 and α2 from head (m1 ) and head (m2 ) respectively, (iii) identifying in m1 and m2 some variables from α1 and α2 in such a way that hT , gr ({α1 , α2 })i |= ⊥, (iv) setting D equal to gr (body(m1 ) ∪ body(m2 )). Proof. The one direction is immediate from the definitions. For the interesting direc- tion, let us suppose that hT1 , M1 , Di |= ⊥ for every database D satisfying Condition 1. Let us further suppose that we have hT2 , M2 , D0 i |= ⊥, where D0 may be any database. We thus have hT2 , AM2 ,D0 i |= ⊥. It is well known that every minimal in- consistent subset of a DL-LiteR KB contains at most two ABox assertions, so there must exist a subset A0 ⊆ AM2 ,D with |A0 | ≤ 2 such that hT2 , A0 i |= ⊥. Let γ be the conjunction of atoms obtained by taking for each ABox assertion in A0 , a mapping assertion that produced it, identifying those variables (and only those variables) needed to produce the ABox assertion(s), and then taking the conjunction of the atoms in the bodies. We observe that by construction Dγ = gr (γ) satisfies Condition 1 and is such that hT1 , M1 , Dγ i |= ⊥. By construction, there is a homomorphism of γ into the origi- nal database D0 . It follows that hT1 , M1 , D0 i |= ⊥. t u Using the preceding result, we can pinpoint the complexity of ⊥-entailment. Theorem 3. The ⊥-entailment problem is NP-complete for OBDA specifications based upon DL-LiteR TBoxes and GAV / GLAV mappings, and NL-complete in the case of linear mappings. Proof. We know from Theorem 2 that Γ1 |=⊥ Γ2 iff hT1 , M1 , Di |= ⊥ for every database D satisfying Condition 1. For the GAV / GLAV case, we guess one such database D and a polynomial-size proof that hT1 , M1 , Di |= ⊥. For the linear case, we note that the databases satisfying Condition 1 contain at most 2 tuples each and can be enumerated in logarithmic space. For every such database, we can check using an NL oracle whether hT1 , M1 , Di |= ⊥. Since LNL = NL, we obtain an NL procedure. t u Next we consider entailment with respect to a specific query. We again start by showing it is sufficient to consider a finite number of databases of a particular form. 1 Recall that distinct mapping assertions in a mapping have no common variables. Theorem 4. Let q be a CQ, and let Γ1 = hT1 , M1 i and Γ2 = hT2 , M2 i be OBDA specifications such that T1 , T2 are formulated in DL-LiteR , and M1 , M2 are GLAV mappings. Then Γ1 |=q Γ2 if and only if Γ1 |=⊥ Γ2 and hT2 , M2 , Di |= q(a) implies hT1 , M1 , Di |= q(a) for every database D satisfying the following condition: – Condition 2: D is obtained by (i) taking k ≤ |q| mapping assertions m1 , m2 , . . . , mk from M2 , (ii) identifying some of the frontier variables in m1 , m2 , . . . , mk , (iii) letting D = gr (body(m1 ) ∪ body(m2 ) ∪ . . . ∪ body(mk )). If q is an IQ, then the latter condition can be replaced by: – Condition 3: D is obtained by (i) taking a mapping assertion m from M2 and choosing an atom α ∈ head (m), (ii) possibly identifying in m the (at most two) frontier variables appearing in α, and (iii) letting D = gr (body(m)). Proof. Again the one direction is immediate. To show the non-trivial direction, let us suppose that Γ1 |=⊥ Γ2 and that hT2 , M2 , Di |= q(c) implies hT1 , M1 , Di |= q(c) for every tuple c and database D satisfying Condition 2 (we return later to the case of IQs). Let us further suppose that we have hT2 , M2 , D0 i |= q(a). The first pos- sibility is that hT2 , M2 , D0 i |= ⊥, in which case we have hT1 , M1 , D0 i |= ⊥ be- cause of Γ1 |=⊥ Γ2 . We thus obtain hT1 , M1 , D0 i |= q0 (a). The other possibility is that hT2 , M2 , D0 i |= q0 (a) and hT2 , M2 , D0 i 6|= ⊥. If hT1 , M1 , D0 i |= ⊥, we immediately obtain hT1 , M1 , D0 i |= q0 (a). Otherwise, let AM2 ,D0 be the ABox for M2 and D0 . Since hT2 , M2 , D0 i |= q0 (a), we have hT2 , AM2 ,D0 i |= q0 (a). It is a well-known property of DL-LiteR that there exists a subset A0 ⊆ AM2 ,D0 with |A0 | ≤ |q0 | such that hT2 , A0 i |= q0 (a). Let |A0 | = k, and let β1 , . . . , βk be the ABox assertions in A0 . For each βi , we choose a mapping assertion mi ∈ M2 and a homomorphism hi of body(mi ) into D0 such that gr (hi (head (m))) contains βi . We also select an atom αi ∈ head (m) such that gr (h(α)) = β. Let m0i be ob- tained from mi by identifying frontier variables y and z if hi (y) = hi (z), and set D0 = gr (body(m01 ) ∪ . . . ∪ body(m0k )). It is easy to see that D0 satisfies Condition 2. Moreover, by construction, the ABox AM2 ,D0 contains a subset A00 that is isomorphic to A0 , and so hT2 , M2 , D0 i |= q0 (a0 ) where a0 is tuple corresponding to a according to this isomorphism. Applying our assumption, we obtain hT1 , M1 , D0 i |= q0 (a0 ). Using the fact that there is a homomorphism of body(m01 ) ∪ . . . ∪ body(m0k ) into D0 that is an isomorphism on the frontier variables, we obtain hT1 , M1 , Di |= q0 (a). Finally, for the case of instance queries, we simply note that we have k = 1, and it is only necessary to identify those variables in the head atom of the mapping that leads to introducing the single ABox assertion of interest. This yields Condition 3. t u We pinpoint the complexity of single CQ-entailment, showing it to be Π2p -complete. Theorem 5. The single CQ-entailment problem is Π2p -complete for OBDA specifica- tions based upon DL-LiteR TBoxes and GLAV mappings. The lower bound holds even for linear mapping assertions and when both TBoxes are empty. Proof. For the upper bound, consider two OBDA specifications Γ1 = hT1 , M1 i and Γ2 = hT2 , M2 i. From Theorems 2 and 4, we know that Γ1 6|=q Γ2 if and only if one of the following holds: – there is a database D satisfying Condition 2 such that hT1 , M1 , Di 6|= ⊥; – there is a database D satisfying Condition 2 such that hT2 , M2 , Di |= q(a), hT2 , M2 , Di 6|= ⊥, and hT1 , M1 , Di 6|= q(a). The first item can be checked using an NP oracle (by Theorem 3). To check the sec- ond item, we remark that the size of databases satisfying Condition 2 cannot exceed max(2, |q|) · maxbody, where maxbody is the maximum number of atoms appearing in the body of a mapping assertion in M2 . It follows that to show that the second item above is violated, we can guess a database D of size at most max(2, |q|) · maxbody together with a tuple of constants a and a polynomial-size proof that hT2 , M2 , Di |= q(a), and then we can verify using an NP oracle that hT1 , M1 , Di 6|= q(a). We there- fore obtain a Σ2p procedure for deciding the complement of our problem. For the lower bound, we utilize a result from [3] on query containment over signature-restricted ABoxes. In that paper, it is shown how, given a 2QBF ∀u∃vϕ(u, v), one can construct a TBox T , Boolean CQs q1 and q2 , and a signature Σ such that ∀u∃vϕ(u, v) is valid iff T , A |= q1 ⇒ T , A |= q2 for all ABoxes A with sig(A) ⊆ Σ. We will not detail the construction but simply remark that the same TBox T = {T v V, F v V } is used for all QBFs, the signature Σ is given by (sig(T ) ∪ sig(q1 ) ∪ sig(q2 )) \ {V }, and the query q2 is such that V 6∈ sig(q2 ). In what follows, we will show how given T , q1 , q2 , and Σ as above, we can reduce the problem of testing whether T , A |= q1 implies T , A |= q2 for all Σ-ABoxes to the problem of single CQ entailment. We will use Σ for our database instances, and we create two copies Σ1 = {P 1 | P ∈ Σ} and Σ2 = {P 2 | P ∈ Σ} of the signature Σ to be used in the head of mapping assertions. Next, we define sets of mapping as- sertions copy1 (Σ) and copy2 (Σ) that simply copies all of the predicates in Σ into the corresponding symbol in Σ1 (resp. Σ2 ). Formally, for j ∈ {1, 2}, copyj (Σ) = {A(x) → Aj (x) | A ∈ Σ ∩ NC } ∪ {R(x, y) → Rj (x, y) | R ∈ Σ ∩ NR } We further define, given a data signature Λ1 and DL signature Λ2 , a set populate(Λ1 , Λ2 ) of mapping assertions that populates the relations in Λ2 using all possible combinations of the constants appearing in tuples over Λ1 : populate(Λ1 , Λ2 ) ={P1 (x1 , . . . , xk ) → P2 (x01 , . . . , x0` ) | P1 ∈ Λ1 , arity(P ) = k, P2 ∈ Λ2 , arity(P ) = `, {x01 , . . . , x0` } ⊆ {x1 , . . . , xk )} Using copy1 (Σ), copy2 (Σ), populate(Σ, Σ 1 ), and populate(Σ, Σ 2 ), we construct the following mappings: M1 = populate(Σ, Σ 1 ) ∪ copy2 (Σ) M2 = copy1 (Σ) ∪ populate(Σ, Σ 2 ) ∪ {T (x) → V (x), F (x) → V (x)} Observe that both mappings are linear. For the query, we let q10 (resp. q20 ) be obtained from q1 (resp. q2 ) by replacing every predicate P by P 1 ( resp. P 2 ). We also rename variables so that q10 and q20 do not share any variables. We then let q be the CQ obtained by taking the conjunction of q10 and q20 and existentially quantifying all variables. In the appendix, we show that h∅, M1 i |=q h∅, M2 i iff T , A |= q1 ⇒ T , A |= q2 for all Σ-ABoxes. By combining this with the reduction from [3], we obtain a reduction from universal 2QBF to the q-entailment problem, establishing Π2p -hardness of the latter. t u If we consider IQs instead, the complexity drops to either NP- or NL-complete. Theorem 6. The single IQ-entailment problem is NP-complete for OBDA specifica- tions based upon DL-LiteR TBoxes and either GAV or GLAV mappings. It is NL- complete if linear mappings are considered. Proof. We give the arguments for GAV and GLAV mappings (for linear case, see the appendix). For the NP upper bound, consider two OBDA specifications Γ1 = hT1 , M1 i and Γ2 = hT2 , M2 i, and let q be an IQ. By Theorem 4, Γ1 |=q Γ2 if and only if Γ1 |=⊥ Γ2 and hT2 , M2 , Di |= α implies hT1 , M1 , Di |= α for all databases D satisfying Condition 3 and for all Boolean IQs α obtained by instantiating the variable(s) in q with constant(s) from D. We already know that it is in NP to test whether Γ1 |=⊥ Γ2 . For the second property, observe that there are only polynomially many databases satisfying Condition 2, since each corresponds to choosing a mapping assertion m in M2 , an atom α ∈ head (m), and deciding whether or not to identify variables in α. For every such database D, we compute (in polynomial time) the set of Boolean IQs β obtained by instantiating the IQ q with constants from D for which hT2 , M2 , gr (head (m))i |= β. For every such β, we guess a polynomial-size proof that hT1 , M1 , Di |= β. If all of our polynomially many guesses succeed, then the procedure returns yes, and otherwise no. By grouping all of the guesses together, we obtain an NP decision procedure. The NP lower bound is by reduction from the NP-complete CQ containment prob- lem: given two CQs q1 , q2 both having a single answer variable x, we have q1 ⊆ q2 iff h∅, {q2 → A(x)}i |=A(x) h∅, {q1 → A(x)}i, where A is a concept name that does not appear in either of q1 and q2 . t u Finally, we consider entailment with respect to entire classes of queries. Again, we can show it is sufficient to consider a small number of databases of a particular form. Theorem 7. Let Γ1 = hT1 , M1 i and Γ2 = hT2 , M2 i be as in Theorem 4. For L ∈ {CQ, IQ}, Γ1 |=L Γ2 if and only if Γ1 |=⊥ Γ2 and hT2 , M2 , Di |= q(a) implies hT1 , M1 , Di |= q(a) for every q ∈ L and every database D that satisfies Condition 3. We show that testing CQ-entailment is much more difficult than for single CQs. Both the upper and lower bounds use recent results on KB query inseparability [4]. Theorem 8. CQ-entailment is EXPTIME-complete for OBDA specifications based upon DL-LiteR TBoxes and either GLAV, GAV, or linear mappings. Proof. We start with the proof of membership in EXPTIME. Consider OBDA spec- ifications Γ1 = hT1 , M1 i and Γ2 = hT2 , M2 i. By Theorem 7, Γ1 |=CQ Γ2 if and only if Γ1 |=⊥ Γ2 and hT2 , M2 , Di |= q(a) implies hT1 , M1 , Di |= q(a) for ev- ery choice of q(a) and every database D satisfying Condition 3. We know that testing Γ1 |=⊥ Γ2 can be done in NP (Theorem 3). To decide whether the second property holds, we consider each of the (polynomially many) databases satisfying Condition 3. For every such database D, we generate the two ABoxes AM1 ,D and AM2 ,D and the corresponding KBs K1 = hT1 , AM1 ,D i and K2 = hT2 , AM2 ,D i. We then test whether it is the case that for every CQ q over sig(K2 ), K2 |= q(a) implies K1 |= q(a), and we return no if this is not the case. The preceding check corresponds to the Σ-query en- tailment problem for DL-LiteR KBs, which has been recently studied in [4] and shown to be EXPTIME-complete. We therefore obtain an EXPTIME procedure for deciding CQ-entailment between OBDA specifications. Our lower bound also makes use of the recent work on query inseparability of DL-LiteR knowledge bases. In [4], the following problem is shown to be EXPTIME- complete: given DL-LiteR TBoxes T1 and T2 that are consistent with the ABox {A(c)}, decide whether the certain answers for q w.r.t. hT2 , {A(c)}i are contained in those for hT1 , {A(c)}i for every CQ q with sig(q) ⊆ sig(T2 ). To reduce this problem to the CQ- entailment problem for OBDA specifications, we consider the following linear map- ping that populates a fresh concept A0 with all constants of a Σ-instance (refer to the proof of Theorem 5 for the definition of populate): M1 = M2 = populate(Σ, {A0 }). To complete the proof, we show in the appendix that hT1 , M1 i |=CQ hT2 , M2 i iff hT2 , {A(c)}i |= q(a) implies hT1 , {A(c)}i |= q(a) for every CQ q, where T10 and T20 are obtained from T1 and T2 by replacing A with A0 . t u Our final result shows that IQ-entailment has the same complexity as single IQ- entailment. The proof proceeds similarly to the proof of Theorem 6. Theorem 9. IQ-entailment is NP-complete for OBDA specifications based upon DL-LiteR TBoxes and either GAV or GLAV mappings. It is NL-complete if linear map- pings are considered. 5 Conclusion and Future Work In this paper, we have introduced notions of query-based entailment of OBDA spec- ifications and have analyzed the complexity of checking query-based entailment for different classes of queries and mappings and for TBoxes formulated in DL-LiteR . The present work constitutes only a first step towards a full analysis of query-based forms of comparing OBDA specifications, and can be extended in several directions: – First, it would be interesting to extend the computational analysis of query entail- ment to other DLs beyond DL-LiteR . For instance, one interesting question for DLs with functional or cardinality restrictions concerns the impact of the Unique Name Assumption on the complexity of (and the techniques for) query entailment. – Second, other forms of mapping beyond GAV and GLAV could be analyzed. In par- ticular, we would like to see whether decidability of query entailment is preserved if we add some restricted form of inequality or negation to the mapping bodies. – Third, we could introduce a query signature and only test entailment for queries formulated in the given signature, as has been done for TBox and KB query insep- arability [4]. In fact, all of the complexity upper bounds in this paper hold also if we introduce a query signature, but this may not be the case for other DLs. – Finally, to explore the impact of restricting the set of possible databases, we could extend the computational analysis to database schemas with integrity constraints. Acknowledgments. This research has been partially supported by the EU under FP7 project Optique (grant n. FP7-318338) and by the French National Research Agency under ANR project PAGODA (grant n. ANR-12-JS02-007-01). References 1. N. Antonioli, F. Castanò, C. Civili, S. Coletta, S. Grossi, D. Lembo, M. Lenzerini, A. Poggi, D. F. Savo, and E. Virardi. Ontology-based data access: the experience at the Italian De- partment of Treasury. In Proc. of the Industrial Track of the 25th Int. Conf. on Advanced Information Systems Engineering (CAiSE), 2013. 2. A. Artale, D. Calvanese, R. Kontchakov, and M. Zakharyaschev. The DL-Lite family and relations. J. of Artificial Intelligence Research, 36:1–69, 2009. 3. M. Bienvenu, C. Lutz, and F. Wolter. Query containment in description logics reconsidered. In Proc. of the 13th Int. Conf. on the Principles of Knowledge Representation and Reasoning (KR), 2012. 4. E. Botoeva, R. Kontchakov, V. Ryzhikov, F. Wolter, and M. Zakharyaschev. Query insepara- bility for description logic knowledge bases. In Proc. of the 14th Int. Conf. on the Principles of Knowledge Representation and Reasoning (KR), 2014. 5. D. Calvanese, G. De Giacomo, D. Lembo, M. Lenzerini, A. Poggi, M. Rodriguez-Muro, R. Rosati, M. Ruzzi, and D. F. Savo. The Mastro system for ontology-based data access. Semantic Web J., 2(1):43–53, 2011. 6. D. Calvanese, G. De Giacomo, D. Lembo, M. Lenzerini, and R. Rosati. Tractable reasoning and efficient query answering in description logics: The DL-Lite family. J. of Automated Reasoning, 39(3):385–429, 2007. 7. D. Calvanese, M. Giese, P. Haase, I. Horrocks, T. Hubauer, Y. Ioannidis, E. Jiménez-Ruiz, E. Kharlamov, H. Kllapi, J. Klüwer, M. Koubarakis, S. Lamparter, R. Möller, C. Neuenstadt, T. Nordtveit, Ö. Özcep, M. Rodriguez-Muro, M. Roshchin, F. Savo, M. Schmidt, A. Soylu, A. Waaler, and D. Zheleznyakov. Optique: OBDA solution for big data. In Revised Selected Papers of ESWC 2013 Satellite Events, volume 7955 of Lecture Notes in Computer Science, pages 293–295, 2013. 8. A. Doan, A. Y. Halevy, and Z. G. Ives. Principles of Data Integration. Morgan Kaufmann, 2012. 9. G. Gottlob, R. Pichler, and V. Savenkov. Normalization and optimization of schema map- pings. Very Large Database J., 20(2):277–302, 2011. 10. E. Kharlamov, M. Giese, E. Jiménez-Ruiz, M. G. Skjæveland, A. Soylu, D. Zheleznyakov, T. Bagosi, M. Console, P. Haase, I. Horrocks, S. Marciuska, C. Pinkel, M. Rodriguez-Muro, M. Ruzzi, V. Santarelli, D. F. Savo, K. Sengupta, M. Schmidt, E. Thorstensen, J. Trame, and A. Waaler. Optique 1.0: Semantic access to big data: The case of Norwegian Petroleum Directorate’s FactPages. In Proc. of the ISWC Posters & Demos Track, pages 65–68, 2013. 11. B. Konev, R. Kontchakov, M. Ludwig, T. Schneider, F. Wolter, and M. Zakharyaschev. Con- junctive query inseparability of OWL 2 QL TBoxes. In Proc. of the 25th AAAI Conf. on Artificial Intelligence (AAAI), 2011. 12. D. Lembo, J. Mora, R. Rosati, D. F. Savo, and E. Thorstensen. Towards mapping analysis in ontology-based data access. In Proc. of the 8th Int. Conf. on Web Reasoning and Rule Systems (RR), pages 108–123, 2014. 13. A. Poggi, D. Lembo, D. Calvanese, G. De Giacomo, M. Lenzerini, and R. Rosati. Linking data to ontologies. J. on Data Semantics, X:133–173, 2008. 14. M. Rodriguez-Muro, R. Kontchakov, and M. Zakharyaschev. Ontology-based data access: Ontop of databases. In Proc. of the 12th Int. Semantic Web Conf. (ISWC), 2013.