=Paper= {{Paper |id=Vol-1378/paper9 |storemode=property |title=From Classical to Consistent Query Answering under Existential Rules |pdfUrl=https://ceur-ws.org/Vol-1378/AMW_2015_paper_9.pdf |volume=Vol-1378 |dblpUrl=https://dblp.org/rec/conf/amw/LukasiewiczMPS15 }} ==From Classical to Consistent Query Answering under Existential Rules== https://ceur-ws.org/Vol-1378/AMW_2015_paper_9.pdf
    From Classical to Consistent Query Answering under
                      Existential Rules

        Thomas Lukasiewicz1 , Maria Vanina Martinez2 , Andreas Pieris3 , and
                               Gerardo I. Simari2
               1
                  Department of Computer Science, University of Oxford, UK
                         thomas.lukasiewicz@cs.ox.ac.uk
2
  Departamento de Ciencias e Ingenierı́a de la Computación, Universidad Nacional del Sur and
                   CONICET, Argentina {mvm,gis}@cs.uns.edu.ar
        3
          Institute of Information Systems, Vienna University of Technology, Austria
                             pieris@dbai.tuwien.ac.at



       Abstract. Querying inconsistent ontologies is an intriguing new problem that
       gave rise to a flourishing research activity in the description logic (DL) com-
       munity. The computational complexity of consistent query answering under the
       main DLs is rather well understood; however, little is known about existential
       rules. The goal of the current work is to perform an in-depth analysis of the com-
       plexity of consistent query answering under the main decidable classes of exis-
       tential rules enriched with negative constraints. Our investigation focuses on the
       standard inconsistency-tolerant semantics, namely, the AR semantics. We estab-
       lish generic complexity results, which demonstrate the tight connection between
       classical and consistent query answering. These results allow us to obtain in a
       uniform way a relatively complete picture of the complexity of our problem.




1    Introduction

An ontology is an explicit specification of a conceptualization of an area of interest. One
of the main applications of ontologies is in ontology-based data access (OBDA), where
they are used to enrich the extensional data with intensional knowledge. In this setting,
description logics (DLs) and rule-based formalisms such as existential rules are popular
ontology languages, while conjunctive queries (CQs) form the central querying tool. In
real-life applications, involving large amounts of data, it is possible that the data are
inconsistent with the ontology. Since standard ontology languages adhere to the classi-
cal FOL semantics, inconsistencies are nothing else than logical contradictions. Thus,
the classical inference semantics fails terribly when faced with an inconsistency, since
everything follows from a contradiction. This demonstrates the need for developing
inconsistency-tolerant semantics.
    There has been a recent and increasing focus on the development of such semantics
for query answering purposes. Consistent query answering, first developed for relational
databases [1] and then generalized as the AR semantics for several DLs [9], is the most
widely accepted semantics for querying inconsistent ontologies. The AR semantics is
based on the idea that an answer is considered to be valid if it can be inferred from each
of the repairs of the extensional data set D, i.e., the ⊆-maximal consistent subsets of
D. The complexity of query answering under the AR semantics when the ontology is
described using one of the central DLs is rather well understood. The data and com-
bined complexity were studied in [11] for a wide spectrum of DLs, while the work [2]
identifies cases for simple ontologies (within the DL-Lite family) for which tractable
data complexity results can be obtained. On the other hand, little is known when the on-
tology is described using existential rules (a.k.a. tuple-generating dependencies (TGDs)
and Datalog± rules), that is, formulas of the form ∀X∀Y(φ(X, Y) → ∃Z(ψ(X, Z))),
and negative constraints (NCs) of the form ∀X(φ(X) → ⊥), where ⊥ denotes the truth
constant false.
     Our main goal in this work is to perform an in-depth analysis of the data and com-
bined complexity of consistent query answering under the main decidable classes of
existential rules, enriched with negative constraints. Let us recall that the main (syntac-
tic) conditions on existential rules that guarantee the decidability of query answering
are guardedness [3], stickiness [4] and acyclicity. Interestingly, our complexity analy-
sis shows that a systematic and uniform way for transferring complexity results from
classical to consistent query answering can be formally established.
     To briefly summarize the main contributions:
    – We present generic complexity results, which demonstrate the tight connection be-
      tween classical and consistent query answering (Theorems 1 and 2).
    – By exploiting our generic theorems, we obtain a (nearly) complete picture of the
      combined and data complexity of consistent query answering (Table 2).
     For more details we refer the reader to [10].

2     Consistent Query Answering
In the classical setting of CQ answering, given a database D and a set Σ of TGDs and
NCs, if the models of D and Σ, denoted mods(D, Σ), is empty, then every query is
entailed since everything is inferred from a contradiction.
Example 1. Consider the database D = {professor (John), fellow (John)}, asserting
that John is both a professor and a fellow, and the set Σ of TGDs and NCs consisting of
                    ∀X(professor (X) → ∃Y (faculty(X) ∧ teaches(X, Y )))
                        ∀X(fellow (X) → faculty(X))
         ∀X(professor (X) ∧ fellow (X) → ⊥),
expressing that each professor is a faculty member who teaches a course, each fellow
is a faculty member, and professors and fellows form disjoint sets. It is easy to see
that mods(D, Σ) = ∅, since John violates the disjointness constraint; thus, for every
(Boolean) CQ q, (D ∧ Σ) |= q.
    As said above, the AR semantics is the standard semantics for querying inconsistent
ontologies. A key notion, which is necessary for defining the AR semantics, is that of
repair, which is a ⊆-maximal consistent subset of the given database.
Definition 1. Consider a database D, and a set Σ of TGDs and NCs. A repair of D and
Σ is some D′ ⊆ D such that (i) mods(D′ , Σ) ̸= ∅; and (ii) there is no a ∈ (D \ D′ )
for which mods(D′ ∪ {a}, Σ) ̸= ∅. Let rep(D, Σ) be the set of repairs of D and Σ.
Example 2. Consider the database D and the set Σ of TGDs and NCs given in Exam-
ple 1. The set of repairs of D and Σ consists of the following subsets of D:

      D1 = {professor (John)}       D2 = {fellow (John)}.

Clearly, we simply need to remove one of the database atoms in order to satisfy the
single negative constraint occurring in Σ.
     The AR semantics [9] is based on the idea that a query can be considered to hold if
it can be inferred from each of the repairs.
Definition 2. Consider a database D, a set Σ of TGDs and NCs, and a Boolean CQ q.
We say that q is entailed by D and Σ under the AR semantics, written (D ∧ Σ) |=AR q,
if (D′ ∧ Σ) |= q, for every D′ ∈ rep(D, Σ).
Example 3. Consider the database D and the set Σ of TGDs and NCs given in Exam-
ple 1, and also the Boolean CQs

      q1 = faculty(John)      q2 = ∃X(teaches(John, X)),

where q1 asks whether John is a faculty member, while q2 asks whether John teaches a
course. Recall that rep(D, Σ) consists of the databases D1 and D2 given in Example 2.
Clearly, (Di ∧ Σ) |= q1 , for each i ∈ {1, 2}, and thus (D ∧ Σ) |=AR q1 . However, even
if (D1 ∧ Σ) |= q2 , (D2 ∧ Σ) ̸|= q2 , and therefore (D ∧ Σ) ̸|=AR q2 .
    In the sequel, we refer to the problem of consistent (Boolean) CQ answering under
the AR semantics as AR-CQ answering.


3     Generic Complexity Results
We present two generic complexity results that demonstrate the tight connection be-
tween classical and consistent CQ answering. These results will automatically provide
us with a (nearly) complete picture of the combined and data complexity of AR-CQ an-
swering under the main classes of TGDs, enriched with NCs. Given a class C of TGDs,
let C⊥ be the formalism obtained by combining C with arbitrary negative constraints.

3.1    Combined Complexity
We first focus on the combined complexity. Since we would like to understand how the
complexity of our problem is affected when some key parameters are fixed, we also
consider the following two variants of the combined complexity: (1) the bounded-arity
combined complexity (ba-combined complexity), which is calculated by assuming that
the arity of the underlying schema is bounded; and (2) the fixed-program combined
complexity (fp-combined complexity), which is calculated by considering the set of
TGDs and negative constraints as fixed. We show the following:
Theorem 1. Assume that CQ answering under a class C of TGDs is C-complete in
(x-)combined complexity, where x ∈ {ba, fp}. Then, the (x-)combined complexity of
AR-CQ answering under C⊥ is (1) Π2p -complete, if C = NP; and (2) C-complete, if
C ⊇ PSPACE is a deterministic class.
Proof (sketch). Fix a database D, a set Σ ∈ C⊥ of TGDs and NCs, and a CQ q. The
problem of deciding whether (D ∧ Σ) ̸|=AR q can be easily solved via a guess-and-
check algorithm. We simply need to apply the following steps:
 1. Guess an instance D′ ⊆ D;
 2. Verify that D′ ∈ rep(D, Σ); and
 3. Verify that (D′ ∧ Σ) ̸|= q.
We can show that steps 2 and 3 are not harder than classical query answering, which
implies that AR-CQ answering under C⊥ is in coNPC . Therefore, (1) If C = NP, then
we get a Π2p upper bound since NP NP = Σ2p and coΣ2p = Π2p ; and (2) If C ⊇ PSPACE is
a deterministic class, then we get a C upper bound since NPC = C and coC = C.
    Regarding the lower bounds, the C-hardness result, when C is deterministic class
above PSPACE, follows immediately since CQ answering is a special case of AR-CQ
answering. For the Π2p -hardness, we show, by a reduction from the validity problem of
2QBF formulas, that AR-CQ answering under a single negative constraint ∀X(φ(X) →
⊥), where φ consists of two atoms and it uses a single ternary predicate, while the
database and the query use only binary and ternary predicates, is already Π2p -hard.


3.2 Data Complexity
By providing a similar analysis as above, we can establish the following generic data
complexity result:
Theorem 2. Assume that CQ answering under a class C of TGDs is C-complete in
data complexity. Then, the data complexity of AR-CQ answering under C⊥ is (1) coNP-
complete, if C ⊆ PTIME; and (2) C-complete, if C ⊇ PSPACE is a deterministic class.
    Let us say that AR-CQ answering under a single negative constraint of the form
∀X(p(X) ∧ s(X) → ⊥) and a fixed query is already coNP-hard, which in turn implies
the coNP-hardness result in Theorem 2. Actually, the latter is implicit [2, Example 5],
and it can be shown by a reduction from a variant of UNSAT, called 2+2UNSAT, where
each clause has two positive and two negative literals, where the literals involve either
regular variables or the truth constant true or false.


4   From Classical to AR-CQ Answering
We now focus on the main decidable classes of TGDs, enriched with NCs, and we
show that the complexity of AR-CQ answering can be obtained in a uniform way by
exploiting our generic complexity theorems. Recall that the main (syntactic) conditions
on TGDs that guarantee the decidability of CQ answering are the following: (1) guard-
edness [3], which guarantees the treelikeness of the underlying canonical models; (2)
                         Combined       ba-combined        fp-combined         Data
        Guarded          2EXPTIME          EXPTIME              NP             PTIME
     Weakly-Guarded      2EXPTIME          EXPTIME           EXPTIME         EXPTIME
         Sticky           EXPTIME             NP                NP             in AC0
      Weakly-Sticky      2EXPTIME         2EXPTIME              NP             PTIME
        Acyclic          NEXPTIME         NEXPTIME              NP             in AC0
     Weakly-Acyclic      2EXPTIME         2EXPTIME              NP             PTIME

     Table 1. CQ answering. All results are completeness results, unless stated otherwise.

                         Combined       ba-combined        fp-combined         Data
        Guarded          2EXPTIME          EXPTIME              Π2p            coNP
     Weakly-Guarded      2EXPTIME          EXPTIME           EXPTIME         EXPTIME
         Sticky           EXPTIME             Π2p               Π2p            coNP
      Weakly-Sticky      2EXPTIME         2EXPTIME              Π2p            coNP
        Acyclic          NEXP - P
                                  NE
                                          NEXP - P
                                                   NE
                                                                Π2p            coNP
     Weakly-Acyclic      2EXPTIME         2EXPTIME              Π2p            coNP
Table 2. AR-CQ answering. A single complexity class in a cell refers to a completeness result,
while two classes C1 -C2 refer to C1 -hardness and C2 -membership.

stickiness [4], which ensures the termination of backward resolution; and (3) acyclicity,
which guarantees the finiteness of the underlying canonical models. Interestingly, each
one of the above conditions has its “weakly” counterpart: weak-guardedness [3], weak-
stickiness [4] and weak-acyclicity [6], respectively. The complexity of CQ answering
under the above classes of TGDs is summarized in Table 1. Clearly, Table 1 and Theo-
rems 1 and 2 imply Table 2, apart from the (ba-)combined complexity for acyclic TGDs
and NCs; let us briefly comment on this.
    The (ba-)combined complexity of CQ answering under acyclic TGDs has to our
knowledge never been explicitly studied; we show that is NEXPTIME-complete: the
upper bound is obtained by a reduction to nonrecursive logic programming [5], while
the lower bound by a reduction from a TILING problem [7]. Notice that Theorem 1
does not cover the cases where classical CQ answering is in a nondeterministic class
above PSPACE. Nevertheless, by exploiting the guess-and-check algorithm discussed in
the proof of Theorem 1, we obtain coNP NEXPTIME upper bound. It is implicit in [8] that
NP NEXPTIME = P NE , and since P NE is a deterministic class, co P NE = P NE . Consequently,
AR-CQ answering under acyclic TGDs and NCs is in P NE in (ba-)combined complexity;
the NEXPTIME-hardness is inherited from classical query answering.


5   Conclusions

In this work, which is a short version of [10], we performed an in-depth complexity
analysis of the problem of consistent query answering under the main decidable classes
of TGDs, focussing on the AR semantics. Notably, generic complexity results have
been established, which allowed us to obtain a (nearly) complete picture of the com-
plexity of our problem in a systematic and uniform way. Regarding future work, apart
from bridging the complexity gap for acyclic TGDs, we intend to perform a similar
complexity analysis for other important semantics such as the IAR semantics, that is, a
sound approximation of the AR semantics [9].
Acknowledgements. This work has been funded by the EPSRC grant EP/J008346/1.
M.V. Martinez and G.I. Simari are partially supported by Proyecto PIP-CONICET
112-201101-01000. A. Pieris is also supported by the Austrian Science Fund (FWF):
P25207-N23 and Y698.


References
 1. Arenas, M., Bertossi, L.E., Chomicki, J.: Consistent query answers in inconsistent databases.
    In: PODS. pp. 68–79 (1999)
 2. Bienvenu, M.: On the complexity of consistent query answering in the presence of simple
    ontologies. In: AAAI (2012)
 3. Calı̀, A., Gottlob, G., Kifer, M.: Taming the infinite chase: Query answering under expressive
    relational constraints. J. Artif. Intell. Res. 48, 115–174 (2013)
 4. Calı̀, A., Gottlob, G., Pieris, A.: Towards more expressive ontology languages: The query
    answering problem. Artif. Intell. 193, 87–128 (2012)
 5. Dantsin, E., Voronkov, A.: Complexity of query answering in logic databases with complex
    values. In: LFCS. pp. 56–66 (1997)
 6. Fagin, R., Kolaitis, P.G., Miller, R.J., Popa, L.: Data exchange: Semantics and query answer-
    ing. Theor. Comput. Sci. 336(1), 89–124 (2005)
 7. Fürer, M.: The computational complexity of the unconstrained limited domino problem (with
    implications for logical decision problems). In: Logic and Machines. pp. 312–319 (1983)
 8. Hemachandra, L.A.: The strong exponential hierarchy collapses. J. Comput. Syst. Sci. 39(3),
    299–322 (1989)
 9. Lembo, D., Lenzerini, M., Rosati, R., Ruzzi, M., Savo, D.F.: Inconsistency-tolerant seman-
    tics for description logics. In: RR. pp. 103–117 (2010)
10. Lukasiewicz, T., Martinez, M.V., Pieris, A., Simari, G.I.: From classical to consistent query
    answering under existential rules. In: AAAI (2015)
11. Rosati, R.: On the complexity of dealing with inconsistency in description logic ontologies.
    In: IJCAI. pp. 1057–1062 (2011)