=Paper=
{{Paper
|id=Vol-1378/paper9
|storemode=property
|title=From Classical to Consistent Query Answering under Existential Rules
|pdfUrl=https://ceur-ws.org/Vol-1378/AMW_2015_paper_9.pdf
|volume=Vol-1378
|dblpUrl=https://dblp.org/rec/conf/amw/LukasiewiczMPS15
}}
==From Classical to Consistent Query Answering under Existential Rules==
From Classical to Consistent Query Answering under Existential Rules Thomas Lukasiewicz1 , Maria Vanina Martinez2 , Andreas Pieris3 , and Gerardo I. Simari2 1 Department of Computer Science, University of Oxford, UK thomas.lukasiewicz@cs.ox.ac.uk 2 Departamento de Ciencias e Ingenierı́a de la Computación, Universidad Nacional del Sur and CONICET, Argentina {mvm,gis}@cs.uns.edu.ar 3 Institute of Information Systems, Vienna University of Technology, Austria pieris@dbai.tuwien.ac.at Abstract. Querying inconsistent ontologies is an intriguing new problem that gave rise to a flourishing research activity in the description logic (DL) com- munity. The computational complexity of consistent query answering under the main DLs is rather well understood; however, little is known about existential rules. The goal of the current work is to perform an in-depth analysis of the com- plexity of consistent query answering under the main decidable classes of exis- tential rules enriched with negative constraints. Our investigation focuses on the standard inconsistency-tolerant semantics, namely, the AR semantics. We estab- lish generic complexity results, which demonstrate the tight connection between classical and consistent query answering. These results allow us to obtain in a uniform way a relatively complete picture of the complexity of our problem. 1 Introduction An ontology is an explicit specification of a conceptualization of an area of interest. One of the main applications of ontologies is in ontology-based data access (OBDA), where they are used to enrich the extensional data with intensional knowledge. In this setting, description logics (DLs) and rule-based formalisms such as existential rules are popular ontology languages, while conjunctive queries (CQs) form the central querying tool. In real-life applications, involving large amounts of data, it is possible that the data are inconsistent with the ontology. Since standard ontology languages adhere to the classi- cal FOL semantics, inconsistencies are nothing else than logical contradictions. Thus, the classical inference semantics fails terribly when faced with an inconsistency, since everything follows from a contradiction. This demonstrates the need for developing inconsistency-tolerant semantics. There has been a recent and increasing focus on the development of such semantics for query answering purposes. Consistent query answering, first developed for relational databases [1] and then generalized as the AR semantics for several DLs [9], is the most widely accepted semantics for querying inconsistent ontologies. The AR semantics is based on the idea that an answer is considered to be valid if it can be inferred from each of the repairs of the extensional data set D, i.e., the ⊆-maximal consistent subsets of D. The complexity of query answering under the AR semantics when the ontology is described using one of the central DLs is rather well understood. The data and com- bined complexity were studied in [11] for a wide spectrum of DLs, while the work [2] identifies cases for simple ontologies (within the DL-Lite family) for which tractable data complexity results can be obtained. On the other hand, little is known when the on- tology is described using existential rules (a.k.a. tuple-generating dependencies (TGDs) and Datalog± rules), that is, formulas of the form ∀X∀Y(φ(X, Y) → ∃Z(ψ(X, Z))), and negative constraints (NCs) of the form ∀X(φ(X) → ⊥), where ⊥ denotes the truth constant false. Our main goal in this work is to perform an in-depth analysis of the data and com- bined complexity of consistent query answering under the main decidable classes of existential rules, enriched with negative constraints. Let us recall that the main (syntac- tic) conditions on existential rules that guarantee the decidability of query answering are guardedness [3], stickiness [4] and acyclicity. Interestingly, our complexity analy- sis shows that a systematic and uniform way for transferring complexity results from classical to consistent query answering can be formally established. To briefly summarize the main contributions: – We present generic complexity results, which demonstrate the tight connection be- tween classical and consistent query answering (Theorems 1 and 2). – By exploiting our generic theorems, we obtain a (nearly) complete picture of the combined and data complexity of consistent query answering (Table 2). For more details we refer the reader to [10]. 2 Consistent Query Answering In the classical setting of CQ answering, given a database D and a set Σ of TGDs and NCs, if the models of D and Σ, denoted mods(D, Σ), is empty, then every query is entailed since everything is inferred from a contradiction. Example 1. Consider the database D = {professor (John), fellow (John)}, asserting that John is both a professor and a fellow, and the set Σ of TGDs and NCs consisting of ∀X(professor (X) → ∃Y (faculty(X) ∧ teaches(X, Y ))) ∀X(fellow (X) → faculty(X)) ∀X(professor (X) ∧ fellow (X) → ⊥), expressing that each professor is a faculty member who teaches a course, each fellow is a faculty member, and professors and fellows form disjoint sets. It is easy to see that mods(D, Σ) = ∅, since John violates the disjointness constraint; thus, for every (Boolean) CQ q, (D ∧ Σ) |= q. As said above, the AR semantics is the standard semantics for querying inconsistent ontologies. A key notion, which is necessary for defining the AR semantics, is that of repair, which is a ⊆-maximal consistent subset of the given database. Definition 1. Consider a database D, and a set Σ of TGDs and NCs. A repair of D and Σ is some D′ ⊆ D such that (i) mods(D′ , Σ) ̸= ∅; and (ii) there is no a ∈ (D \ D′ ) for which mods(D′ ∪ {a}, Σ) ̸= ∅. Let rep(D, Σ) be the set of repairs of D and Σ. Example 2. Consider the database D and the set Σ of TGDs and NCs given in Exam- ple 1. The set of repairs of D and Σ consists of the following subsets of D: D1 = {professor (John)} D2 = {fellow (John)}. Clearly, we simply need to remove one of the database atoms in order to satisfy the single negative constraint occurring in Σ. The AR semantics [9] is based on the idea that a query can be considered to hold if it can be inferred from each of the repairs. Definition 2. Consider a database D, a set Σ of TGDs and NCs, and a Boolean CQ q. We say that q is entailed by D and Σ under the AR semantics, written (D ∧ Σ) |=AR q, if (D′ ∧ Σ) |= q, for every D′ ∈ rep(D, Σ). Example 3. Consider the database D and the set Σ of TGDs and NCs given in Exam- ple 1, and also the Boolean CQs q1 = faculty(John) q2 = ∃X(teaches(John, X)), where q1 asks whether John is a faculty member, while q2 asks whether John teaches a course. Recall that rep(D, Σ) consists of the databases D1 and D2 given in Example 2. Clearly, (Di ∧ Σ) |= q1 , for each i ∈ {1, 2}, and thus (D ∧ Σ) |=AR q1 . However, even if (D1 ∧ Σ) |= q2 , (D2 ∧ Σ) ̸|= q2 , and therefore (D ∧ Σ) ̸|=AR q2 . In the sequel, we refer to the problem of consistent (Boolean) CQ answering under the AR semantics as AR-CQ answering. 3 Generic Complexity Results We present two generic complexity results that demonstrate the tight connection be- tween classical and consistent CQ answering. These results will automatically provide us with a (nearly) complete picture of the combined and data complexity of AR-CQ an- swering under the main classes of TGDs, enriched with NCs. Given a class C of TGDs, let C⊥ be the formalism obtained by combining C with arbitrary negative constraints. 3.1 Combined Complexity We first focus on the combined complexity. Since we would like to understand how the complexity of our problem is affected when some key parameters are fixed, we also consider the following two variants of the combined complexity: (1) the bounded-arity combined complexity (ba-combined complexity), which is calculated by assuming that the arity of the underlying schema is bounded; and (2) the fixed-program combined complexity (fp-combined complexity), which is calculated by considering the set of TGDs and negative constraints as fixed. We show the following: Theorem 1. Assume that CQ answering under a class C of TGDs is C-complete in (x-)combined complexity, where x ∈ {ba, fp}. Then, the (x-)combined complexity of AR-CQ answering under C⊥ is (1) Π2p -complete, if C = NP; and (2) C-complete, if C ⊇ PSPACE is a deterministic class. Proof (sketch). Fix a database D, a set Σ ∈ C⊥ of TGDs and NCs, and a CQ q. The problem of deciding whether (D ∧ Σ) ̸|=AR q can be easily solved via a guess-and- check algorithm. We simply need to apply the following steps: 1. Guess an instance D′ ⊆ D; 2. Verify that D′ ∈ rep(D, Σ); and 3. Verify that (D′ ∧ Σ) ̸|= q. We can show that steps 2 and 3 are not harder than classical query answering, which implies that AR-CQ answering under C⊥ is in coNPC . Therefore, (1) If C = NP, then we get a Π2p upper bound since NP NP = Σ2p and coΣ2p = Π2p ; and (2) If C ⊇ PSPACE is a deterministic class, then we get a C upper bound since NPC = C and coC = C. Regarding the lower bounds, the C-hardness result, when C is deterministic class above PSPACE, follows immediately since CQ answering is a special case of AR-CQ answering. For the Π2p -hardness, we show, by a reduction from the validity problem of 2QBF formulas, that AR-CQ answering under a single negative constraint ∀X(φ(X) → ⊥), where φ consists of two atoms and it uses a single ternary predicate, while the database and the query use only binary and ternary predicates, is already Π2p -hard. 3.2 Data Complexity By providing a similar analysis as above, we can establish the following generic data complexity result: Theorem 2. Assume that CQ answering under a class C of TGDs is C-complete in data complexity. Then, the data complexity of AR-CQ answering under C⊥ is (1) coNP- complete, if C ⊆ PTIME; and (2) C-complete, if C ⊇ PSPACE is a deterministic class. Let us say that AR-CQ answering under a single negative constraint of the form ∀X(p(X) ∧ s(X) → ⊥) and a fixed query is already coNP-hard, which in turn implies the coNP-hardness result in Theorem 2. Actually, the latter is implicit [2, Example 5], and it can be shown by a reduction from a variant of UNSAT, called 2+2UNSAT, where each clause has two positive and two negative literals, where the literals involve either regular variables or the truth constant true or false. 4 From Classical to AR-CQ Answering We now focus on the main decidable classes of TGDs, enriched with NCs, and we show that the complexity of AR-CQ answering can be obtained in a uniform way by exploiting our generic complexity theorems. Recall that the main (syntactic) conditions on TGDs that guarantee the decidability of CQ answering are the following: (1) guard- edness [3], which guarantees the treelikeness of the underlying canonical models; (2) Combined ba-combined fp-combined Data Guarded 2EXPTIME EXPTIME NP PTIME Weakly-Guarded 2EXPTIME EXPTIME EXPTIME EXPTIME Sticky EXPTIME NP NP in AC0 Weakly-Sticky 2EXPTIME 2EXPTIME NP PTIME Acyclic NEXPTIME NEXPTIME NP in AC0 Weakly-Acyclic 2EXPTIME 2EXPTIME NP PTIME Table 1. CQ answering. All results are completeness results, unless stated otherwise. Combined ba-combined fp-combined Data Guarded 2EXPTIME EXPTIME Π2p coNP Weakly-Guarded 2EXPTIME EXPTIME EXPTIME EXPTIME Sticky EXPTIME Π2p Π2p coNP Weakly-Sticky 2EXPTIME 2EXPTIME Π2p coNP Acyclic NEXP - P NE NEXP - P NE Π2p coNP Weakly-Acyclic 2EXPTIME 2EXPTIME Π2p coNP Table 2. AR-CQ answering. A single complexity class in a cell refers to a completeness result, while two classes C1 -C2 refer to C1 -hardness and C2 -membership. stickiness [4], which ensures the termination of backward resolution; and (3) acyclicity, which guarantees the finiteness of the underlying canonical models. Interestingly, each one of the above conditions has its “weakly” counterpart: weak-guardedness [3], weak- stickiness [4] and weak-acyclicity [6], respectively. The complexity of CQ answering under the above classes of TGDs is summarized in Table 1. Clearly, Table 1 and Theo- rems 1 and 2 imply Table 2, apart from the (ba-)combined complexity for acyclic TGDs and NCs; let us briefly comment on this. The (ba-)combined complexity of CQ answering under acyclic TGDs has to our knowledge never been explicitly studied; we show that is NEXPTIME-complete: the upper bound is obtained by a reduction to nonrecursive logic programming [5], while the lower bound by a reduction from a TILING problem [7]. Notice that Theorem 1 does not cover the cases where classical CQ answering is in a nondeterministic class above PSPACE. Nevertheless, by exploiting the guess-and-check algorithm discussed in the proof of Theorem 1, we obtain coNP NEXPTIME upper bound. It is implicit in [8] that NP NEXPTIME = P NE , and since P NE is a deterministic class, co P NE = P NE . Consequently, AR-CQ answering under acyclic TGDs and NCs is in P NE in (ba-)combined complexity; the NEXPTIME-hardness is inherited from classical query answering. 5 Conclusions In this work, which is a short version of [10], we performed an in-depth complexity analysis of the problem of consistent query answering under the main decidable classes of TGDs, focussing on the AR semantics. Notably, generic complexity results have been established, which allowed us to obtain a (nearly) complete picture of the com- plexity of our problem in a systematic and uniform way. Regarding future work, apart from bridging the complexity gap for acyclic TGDs, we intend to perform a similar complexity analysis for other important semantics such as the IAR semantics, that is, a sound approximation of the AR semantics [9]. Acknowledgements. This work has been funded by the EPSRC grant EP/J008346/1. M.V. Martinez and G.I. Simari are partially supported by Proyecto PIP-CONICET 112-201101-01000. A. Pieris is also supported by the Austrian Science Fund (FWF): P25207-N23 and Y698. References 1. Arenas, M., Bertossi, L.E., Chomicki, J.: Consistent query answers in inconsistent databases. In: PODS. pp. 68–79 (1999) 2. Bienvenu, M.: On the complexity of consistent query answering in the presence of simple ontologies. In: AAAI (2012) 3. Calı̀, A., Gottlob, G., Kifer, M.: Taming the infinite chase: Query answering under expressive relational constraints. J. Artif. Intell. Res. 48, 115–174 (2013) 4. Calı̀, A., Gottlob, G., Pieris, A.: Towards more expressive ontology languages: The query answering problem. Artif. Intell. 193, 87–128 (2012) 5. Dantsin, E., Voronkov, A.: Complexity of query answering in logic databases with complex values. In: LFCS. pp. 56–66 (1997) 6. Fagin, R., Kolaitis, P.G., Miller, R.J., Popa, L.: Data exchange: Semantics and query answer- ing. Theor. Comput. Sci. 336(1), 89–124 (2005) 7. Fürer, M.: The computational complexity of the unconstrained limited domino problem (with implications for logical decision problems). In: Logic and Machines. pp. 312–319 (1983) 8. Hemachandra, L.A.: The strong exponential hierarchy collapses. J. Comput. Syst. Sci. 39(3), 299–322 (1989) 9. Lembo, D., Lenzerini, M., Rosati, R., Ruzzi, M., Savo, D.F.: Inconsistency-tolerant seman- tics for description logics. In: RR. pp. 103–117 (2010) 10. Lukasiewicz, T., Martinez, M.V., Pieris, A., Simari, G.I.: From classical to consistent query answering under existential rules. In: AAAI (2015) 11. Rosati, R.: On the complexity of dealing with inconsistency in description logic ontologies. In: IJCAI. pp. 1057–1062 (2011)