=Paper= {{Paper |id=None |storemode=property |title=On the Decidability of Consistent Query Answering |pdfUrl=https://ceur-ws.org/Vol-619/paper10.pdf |volume=Vol-619 |dblpUrl=https://dblp.org/rec/conf/amw/ArenasB10 }} ==On the Decidability of Consistent Query Answering== https://ceur-ws.org/Vol-619/paper10.pdf
           On the Decidability of Consistent Query
                        Answering

                      Marcelo Arenas1,⋆ and Leopoldo Bertossi2,⋆⋆
      1
          Pontificia Universidad Católica de Chile, Department of Computer Science,
                         Santiago, Chile. Email: marenas@ing.puc.cl
           2
             Carleton University, School of Computer Science, Ottawa, Canada.
                               Email: bertossi@scs.carleton.ca



           Abstract. Consistent query answering (CQA) is about formally char-
           acterizing and computing semantically correct answers to queries posed
           to a database that may fail to satisfy certain integrity constraints. In
           this paper, we revisit the decidability status of consistent query answer-
           ing by considering different parameters of the problem as input to the
           decision problem. More specifically, we obtain some new results about
           the undecidability and combined complexity of CQA.


1         Introduction
Consistent query answering (CQA) is about formally characterizing and com-
puting semantically correct answers to queries posed to a database that may fail
to satisfy certain integrity constraints (ICs). This problem was brought into the
main stream research on foundations of databases by [3]. Informally, a consistent
answer to a query Q posed to a relational database instance D that may not
satisfy a set IC of integrity constraints is as a usual answer to Q that can be
obtained from every minimal repair of D. A minimal repair D′ of D satisfies
IC and minimally departs from D (after enforcing the ICs). Different repair se-
mantics can be obtained by playing with different notions of distance between
database instances. In this paper, we stick to the original distance introduced
and studied in [3]: A repair of D makes minimal under set inclusion the set of
tuples in (D r D′ ) ∪ (D′ r D), i.e., the symmetric set difference. In particular,
this assumes that IC violations are solved by insertions or deletions of whole
database tuples. For recent surveys of CQA we refer to [7, 13].
    The complexity of CQA problem, i.e., of deciding if a tuple is a consistent
answer to a query under a given set of ICs, has been investigated in several
papers and under several repairs semantics [11, 12, 27, 18, 8, 21, 28, 2, 23]. As
expected most all the complexity results have been about data complexity, i.e.,
in terms of the size of the original instance D. Few results have been presented
on combined complexity, where other parameters are also taken into account,
⋆
     M. Arenas was supported by FONDECYT grant 1090565.
⋆⋆
     Faculty Fellow of the IBM Center for Advanced Studies. Also affiliated to the Uni-
     versity of Concepcion, Chile.
e.g. size of the query or of the set of ICs. There are also few results about
undecidability of CQA. Exceptions in this direction are [11, 8]; and both refer to
a form of combined undecidability, i.e., without a fixed query and a fixed set of
ICs. We do not elaborate more on [8], because the repair semantics is different
from the one investigated in this paper. Furthermore, the results in [8] heavily
rely on having denial constraints that involve numerical attributes.
    However, some results in [11] are highly relevant in our context. The frame-
work there considers a combination of key constraints (KCs) and inclusion de-
pendencies (IDs). In case of inconsistency, KCs are repaired via tuple deletions;
but IDs are repaired only by insertion of tuples. If the original instance is con-
sistent wrt the KCs, then inconsistencies are solved by insertion of whole tuples,
which can happen in possibly many different ways, and only constrained by the
KCs. The new tuples can take values in the database domain associated to the
schema. In this case, the repairs minimize the set of inserted tuples wrt set
inclusion.
    It is proved in [11] that CQA is undecidable. This result is obtained by
reduction from the problem of deciding entailment of dependencies from a set
of functional dependencies (FDs) and inclusion dependencies: F ∪ I |= δ? Here,
F and I are sets of FDs and IDs, respectively, and δ is a dependency. This
problem is undecidable [1, theorem 9.2.4]. A careful examination of the proof
in [11, theorems 3.2, 3.4], shows that the reduction in it requires, in essence,
generating for the combination of F , I and δ an ad hoc combination of KCs,
IDs, a query and a database instance. In consequence, the undecidability result
involves as input to the problem all the three natural parameters of CQA, namely
the ICs, the query and the instance. The set I is actually a cyclic set of IDs.
    In this paper, we revisit the decidability status of consistent query answer-
ing by considering different parameters of the problem as input to the decision
problem. We obtain some new results about the undecidability and combined
complexity of CQA. More specifically, we prove that CQA is undecidable for
all the combinations of the possible inputs (instance, ICs, query) to the prob-
lem depending upon the parameters that are left fixed. We also establish the
decidability of CQA for a broad class of universal ICs, with the combination
of instance, ICs, and query as the input. Furthermore, we study the combined
complexity on data and ICs (fixed query) for this class of dependencies.


2   Preliminaries

If we start from a database schema S, including a countably infinite database
domain U , we may consider the first-order language L(S) based on the predicates
in S and the constants for the elements of U . Database instances for schema S are
Herbrand structures with domain U and finite extensions for the interpretations
of the predicates in S. In consequence, a database instance D can be identified
with a set of ground atoms R(t̄), with R ∈ S and t̄ a finite sequence of elements
of U of the same length as the arity of R. The active domain of D, denoted by
dom(D), is the finite set that contains all the elements of U that appear in D.
We also allow built-in predicates in L(S), like =, 6=, <, etc., that have fixed and
predefined extensions in a database instance.
    An integrity constraint (IC) over a database schema S is a first-order sentence
in L(S). In particular, a universal IC is a sentence of the form ∀x̄ ψ, where ψ does
not contain any quantifiers. We use IC to denote a set of integrity constraints,
and we assume that IC is consistent in the sense that there is some database
instance that makes it true, but not necessarily by the one at hand. It is always
the case that in CQA, and also in this paper, we consider database instances
that may not satisfy the given set of ICs.
Definition 1. A database instance D is consistent with respect to IC if D
satisfies IC , denoted by D  IC; and inconsistent, otherwise.          2
Definition 2 ([3]).
(a) The distance ∆(D1 , D2 ) between database instances D1 , D2 is the symmetric
    difference (D1 r D2 ) ∪ (D2 r D1 ).
(b) For a fixed instance D and instances D′ , D′′ : D′ ≤D D′′ iff ∆(D, D′ ) ⊆
    ∆(D, D′′ ).
(c) For a given database instances D, we say that an instance D′ is a repair of D
    wrt IC iff D′  IC and D′ is ≤D -minimal in the class of database instances
    that satisfy IC .                                                           2
Notice that the extensions of the built-in predicates do not contribute to ∆,
because they have fixed extensions, identical in every database instance.
     Queries are formulas of L(S). If a query is a sentence, i.e., it has not free
variables, it is called a Boolean query. We assume that queries and ICs satisfy the
usual syntactic safety conditions [25] that make them domain independent [17,
24]. This means that they can be evaluated or checked against the active domain
of the database instance.
     For a query Q(x̄), with free variables in x̄, and a database instance D, a tuple
t̄ of constants of U is an answer to Q in D iff D |= Q[t̄], i.e., Q is true in D when
the variables in x̄ are replaced by the corresponding values in t̄.
Definition 3 ([3]). Given a set of integrity constraints, a (ground) tuple t̄ is
a consistent answer to a query Q(x̄) in a database instance D wrt a set of
integrity constraints IC , denoted D |=IC Q[t̄], if for every repair D′ of D wrt
IC , D′  Q[t̄]. If Q is Boolean, then yes is the consistent answer if for every
repair D′ of D, D′  Q; otherwise the consistent answer is no.                2


3    The Decision Problem of CQA
The computational problem of retrieving consistent answers to a query from a
possibly inconsistent database can be studied by concentrating on the decision
problem of consistent query answering. Actually, this gives rise to several decision
problems depending on which of the natural parameters involved are considered
to be fixed and which are considered to be a part of the input. The most basic
decision problems of CQA are the following.
Definition 4.
(a) For a Boolean query Q and a consistent set IC of integrity constraints:

         d -CQA(Q, IC ) := {D | D is a database instance and D |=IC Q}.

(b) For a database instance D and a Boolean query Q:

       ic-CQA(D, Q) := {IC | IC is a consistent set of ICs and D |=IC Q}.

(b) For a database instance D and a consistent set IC of integrity constraints:

           q-CQA(D, IC ) := {Q | Q is a Boolean query and D |=IC Q}.

                                                                                 2

Problem d -CQA(Q, IC ) is the problem of CQA in data, where the ICs and the
query are fixed -the parameters of the problem- and the database is variable,
i.e., the input. The data complexity [26] of CQA is precisely the complexity of
this problem, for different classes of parameters IC, Q. As common in database
research, this is the problem that has been investigated the most. This problem
is decidable for all common classes of ICs and queries, and tight complexity
bounds have been established [11, 12, 27, 18, 8, 21, 28, 2, 23].
     We may consider combined versions of the CQA decision problem. For ex-
ample, for a given Boolean query Q:

  {d, ic}-CQA(Q) := {(D, IC ) | D is a database instance,
                                       IC is a set of ICs and D |=IC Q}, (1)

and, more generally,

  {d, q, ic}-CQA := {(D, Q, IC ) | D is a database instance,
                  Q is a Boolean query, IC is a set of ICs and D |=IC Q}. (2)

In what follows, when we refer to {d}-CQA(Q, IC ), we mean d-CQA(Q, IC ), as in
Definition 4; the same for ic and q. The complexity of {d , q}-CQA(IC ) is usually
called combined complexity of CQA (with fixed ICs). It is measured in terms of
the size of both the database and the query. In the case of {q, ic, d}-CQA, the
instance, the query and the set of ICs become part of the input. All this decision
problems can also be stated for queries with free variables, say Q(x̄), and tuples
t̄ of constants in U , asking if t̄ is a consistent answer. Since the schema has
the constants in U , we may restrict ourselves to Boolean queries. As already
indicated above, it holds:
Proposition 1 ([11]). {d, q, ic}-CQA is undecidable.                             2
The proof of this proposition given in [11] uses key constraints and cyclic sets
of referential integrity constraints (RICs) [1]. The existential quantifiers of the
RICs are satisfied by picking up values from the underlying database domain
U . We should mention that, in [10], {q, ic, d}-CQA was made decidable for this
type of ICs by considering repairs based on introduction of null values as used
in SQL databases. Those null values are used for the existential variables in
the RICs. Actually, the decidability follows in [10] from the use of disjunctive
Datalog programs with stable model semantics [19, 15] to specify the repairs of
the original instance wrt universal ICs and RICs, including those considered in
[11]. These repair programs have been used before [5, 20, 6], and they provide an
upper bound of Π2P for the time complexity of d-CQA(Q, IC ) [14]. This is also
a lower bound since CQA can be Π2P -complete in data [12].


4   Undecidability of CQA
Now we turn to some of the other decision problems for CQA. Clearly, for Y ⊆
X ⊆ {d, ic, q}, if X-CQA is decidable, then also Y-CQA is decidable, but not
necessarily the other way around. We first consider the problem of CQA in
integrity constrains.

Proposition 2. There exist a database instance D and a Boolean query Q such
that ic-CQA(D, Q) is undecidable.

Proof. Let S be a database schema that contains a binary predicate E and
a unary predicate P . Consider the instance D of S whose extension for P is
{P (a)}, where a is an element of U , and whose extension for E is ∅. Further-
more, consider the Boolean query Q : ¬P (a); and the problem SAT := {ψ ∈
L({E}) | ψ is satisfiable}. This problem is undecidable over finite graphs [16,
sec. 7.2.1], and can be reduced to the complement of ic-CQA(D, Q) as follows:
Given a first-order sentence ψ ∈ L({E}), define a set of integrity constraints IC
by {ψ ↔ ∃xP (x)}. It is easy to check that ψ ∈ SAT iff D 6|=IC Q.               2
Now we address the problem of CQA in queries, that is, when queries are the
input of the problem. The complexity of this decision problem is the query com-
plexity of CQA [26].

Proposition 3. There exists a database instance D and a consistent set IC of
integrity constraints such that q-CQA(D, IC ) is undecidable.

Proof. Let S be a database schema containing a unary predicate P , a binary
predicate E, a ternary predicate F and built-in predicate ≤ that is interpreted
in such a way that (U, ≤) is isomorphic to (N, ≤). Then let D be the empty
instance of S, and IC be a set consisting of the following integrity constraints:

                     ∃xP (x),
                     ∀u∀v(E(u, v) ↔ ∃yF (u, v, y)),
                     ∀x∀y(P (x) ∧ y ≤ x → ∃u∃vF (u, v, y)).

Consider SAT := {ψ ∈ L({E}) | ψ is satisfiable}. As mentioned in the proof
of Proposition 2, this problem is undecidable over finite graphs. Next we show
that the restriction of SAT to finite graphs can be reduced to the complement
of q-CQA(D, IC ). In fact, we prove that ψ ∈ SAT iff ¬ψ 6∈ q-CQA(D, IC ).
    Assume that ψ ∈ SAT , and let D′ be a database instance over {E} satisfying
ψ. Define a database instance D⋆ over S as follows. The interpretation of E in
D⋆ coincides with the interpretation of E in D′ . Let t̄1 , . . ., t̄m be the tuples
in the interpretation of E in D′ , and assume that a1 , . . ., am are the first m
elements of U according to linear order ≤. Then the interpretations of P and F
in D⋆ are {am } and {(t̄i , ai ) | 1 ≤ i ≤ m}, respectively. It is straightforward to
prove that D⋆ |= IC . Furthermore, D⋆ is a repair of D since no proper subset
of D⋆ satisfies IC . Thus, given that D⋆ |= ψ, we conclude that D 6|=IC ¬ψ
and, therefore, ¬ψ ∈/ q-CQA(D, IC ). The other implication is immediate, which
concludes the proof of the proposition.                                            2
Finally, we also consider the case when integrity constraints and queries are
considered to be fixed.

Proposition 4. There exist a Boolean query Q and a consistent set IC of in-
tegrity constraints such that d -CQA(Q, IC ) is undecidable.

Proof. We will reduce to the complement of our problem the halting problem
for deterministic Turing machines with empty string on an infinite unidirectional
tape with alphabet 0, 1, B (blank). The integrity constraints will codify the dy-
namics of the machine, and the database instance will contain the transition
function δ of the machine. Repairing the instance wrt the dynamics of the ma-
chine corresponds to making the machine compute. The initial and final states
are q0 , qf , respectively. We need the following predicates:

 1. Head (t, p): At time t the head is at position with number p of the tape.
 2. State(t, q): At time t the state is q.
 3. Conf a (t, p): At time t and position p the symbol read is a. Here, a 6= B; and
    a blank is represented by the absence of 0 or 1.
 4. δa,a′ ,R (q, q ′ ): Being at state q, reading a causes, according to the transition
    function δ, to write a′ , move to the right (R) and jump to state q ′ . There
    are similar predicates, L, for left.
 5. Succ(n, n′ ): For natural numbers n, n′ , it holds that n′ is the successor of
    n. This predicate can be defined using the built-in predicate <, and its
    definition can be part of IC . Thus, we also assume that the database schema
    contains a built-in predicate < that is interpreted in such a way that (U, <)
    is isomorphic to (N, <).

Assume that a0 is the first element of U according to linear order <. Then IC
contains the following sentences:

(a) Head (a0 , a0 ) and State(a0 , q0 ).
(b) ∀p(¬Conf 0 (a0 , p) ∧ ¬Conf 1 (a0 , p)). This formula states that that we start
    with a blank tape. In general, we represent with ¬Conf 0 (t, p) ∧ ¬Conf 1 (t, p)
    the fact that at time t in position p there is a blank.
(c) For a 6= B and a′ 6= B:
                        
      ∀t∀t′ ∀q∀q ′ ∀p∀p′ State(t, q) ∧ Head (t, p) ∧ Conf a (t, p) ∧

                        δa,a′ ,L (q, q ′ ) ∧ Succ(t, t′ ) ∧ Succ(p′ , p) →
                                                                                             
                                        Conf a′ (t′ , p) ∧ Head (t′ , p′ ) ∧ State(t′ , q ′ ) .

    There is a similar sentence for δa,a′ ,R . If a is B, in the previous formula
    Conf a (t, p) is replaced by ¬Conf 0 (t, p) ∧ ¬Conf 1 (t, p), obtaining:
                          
           ′      ′     ′
      ∀t∀t ∀q∀q ∀p∀p State(t, q) ∧ Head (t, p) ∧ ¬Conf 0 (t, p) ∧

               ¬Conf 1 (t, p) ∧ δB,a′ ,L (q, q ′ ) ∧ Succ(t, t′ ) ∧ Succ(p′ , p) →
                                                                                       
                                                   ′                ′   ′             ′   ′
                                        Conf a′ (t , p) ∧ Head (t , p ) ∧ State(t , q ) .

    Similar formulas are obtained when a′ = B.
(d) For every a 6= B:
    ∀t∀t′ ∀p∀p′ (Head (t, p) ∧ p 6= p′ ∧ Succ(t, t′ ) → (Conf a (t, p′ ) ↔ Conf a (t′ , p′ ))).
The query Q is ¬∃t State(t, qf ). All these elements determine a particular deci-
sion problem of the form q-CQA(IC , Q), parameterized by IC and Q. Now, we
show that this problem is undecidable by reduction from the halting problem.
    For a machine M , the database instance D(M ) corresponds to the transition
function δ by means of the δ-predicates above and their contents. The other
relations are empty. It holds that M halts at state qf iff D(M ) 6|=IC Q. In fact:
(⇒) In this case, there is a repair D(M )′ corresponding to the finite computation
of M , i.e., it contains all the configuration tuples that lead to the final state. For
this repair, it holds that D(M )′ 6|= Q. Thus, D(M ) 6|=IC Q.
(⇐) Assume M does not halt at state qf . We have two cases: First, assume that
M does not halt (at any state). The infinite computation does not generate a
repair, because repairs have finite relations. In this case, repairs can be obtained
by deletion of tuples from D(M ), and then they correspond to sub-function
of the original δ. None of the computations related to such a repair can reach
state qf , because they represent subcomputation of the original machine (the
determinism of the machine is crucial here). In this case, all the repairs satisfy
Q and, therefore D(M ) |=IC Q. In the second case, M halts at a state different
from qf . A similar argument as in the previous case can be applied.               2
It is important to notice that the proof of Proposition 4 relies on the availabil-
ity of a built-in linear order. Also, in this proof we use universal ICs except
for the definition of the successor function. However, this definition, namely
∀m∀n(Succ(m, n) ↔ m < n ∧ ¬∃j(m < j ∧ j < n)), as an IC does not con-
tain “repairable” predicates, i.e., database predicates. Thus, one could also rely
on the availability of a built-in successor predicate to prove this proposition,
avoiding the use of non-universal constraints.
    The propositions above have been established for the single input cases in
Definition 4. However, several possible inputs can be combined, as in (1) and
(2). From Propositions 2, 3 and 4, we obtain:

Theorem 1 (Undecidability of consistent query answering). For every
nonempty subset X of {d, q, ic}, there are cases where X-CQA is undecidable.
The cases depend upon the parameters of the problem, i.e., on {d, q, ic} r X. 2


5    Combined Decidability and Complexity of CQA
In this section, we concentrate on a particular but common class of ICs.

Definition 5. An integrity constraint is a safe universal IC if both:
(a) It is logically equivalent to a sentence of the form
                                    m
                                    _                  n
                                                       _
                               ∀(        Pi (x̄i ) ∨         ¬Qj (ȳj ) ∨ ψ),             (3)
                                   i=1                 j=1

    where ∀ represents the universal closure of the formula, x̄i , ȳj are tuples
    of variables, the Pi , Qj are database predicates, and ψ is a formula that
    mentions only the built-in predicates.
(b) Each variable in an x̄i or ψ in (3) appears in some ȳj .                  2

The second condition is the safety condition that guarantees domain indepen-
dence [17, 24]. It is easy to see that, according to this definition, the ICs in
Proposition 4 are not safe, nor domain independent.

Theorem 2. For finite sets IC of safe universal ICs, and Boolean queries Q,
the problem of CQA, i.e., {(D, Q, IC ) | D |=IC Q}, is decidable.

Proof. We may assume that the sentences in IC are in the clausal form (3). This
representation for ICs allows us to determine all the repairs of a database that
does not satisfy them: From the domain independence condition, the finitely
many tuples that participate in a violation and whose modification may lead to
a restoration of the consistency of the database, can be obtained by posing to D,
for each IC (3), the domain independent query about the set of violating tuples:
                                             n
                                             ^                        m
                                                                      ^
                 V (ȳ1 , . . . , ȳn ) :-         Qj (ȳj ) ∧ ¬ψ ∧         ¬Pi (x̄i ).
                                             j=1                      i=1

Repairs are obtained by deleting Qj -ground tuples and/or inserting Pi -ground
tuples. In this way, the finitely many repairs can be computed. Then, the query
Q can be evaluated in every single repair. If for all of them it becomes true, the
answer is yes, otherwise, is no.                                                2
Notice that the decision algorithm in the proof of Theorem 2 requires the com-
putation all of possible repairs; and we know that there may be exponentially
many of them in the size of the database [4].
    This decidability result extends similar implicit results [5, 20, 6, 10] that
guarantee decidability for certain syntactic classes of universal ICs. Their syntax
makes them safe, and then, also domain independent. This is obtained from the
representation of repairs as stable models of disjunctive logic programs. A direct
proof of decidability of {d, ic, q}-CQA for this class of universal constraints plus
RICs can be found in [10] (existential variables are satisfied with an SQL null).
Decidability of {d, ic, q}-CQA is proved in [11] for sets of non-conflicting key
constrains and inclusion dependencies.
    In the following, we study a form of combined complexity of CQA, where the
input consists of the database and the ICs.3 This is a special case of the general
{d, ic}-CQA problem in (1) where only universal, domain independent ICs are
considered in the input.
Theorem 3.
(a) For every Boolean query Q,
            CQA(Q) := {(D, ϕ) | ϕ is a safe universal IC and D |={ϕ} Q}

     is in co-2-NEXP.

(b) There are Boolean queries for which the same problem is co-NEXP-hard.
Proof. (a) We prove that the complement of CQA(Q) belongs to 2-NEXP, that
is, we prove that there exists a non-deterministic Turing machine M that works
in double exponential time and accepts the complement of CQA(Q). Let D
be a database instance and ϕ a safe universal IC. In order to verify whether
(D, ϕ) 6∈ CQA(Q), Turing machine M first transforms ϕ into an equivalent
IC ψ of the form (3), then it guesses a repair D′ of D wrt ψ, and finally it
checks whether D′ 6|= Q (which implies that D 6|={ϕ} Q). Sentence ψ can be of
exponential size in the size of ϕ, and it takes exponential time to generate this
formula. Checking that D′ is a repair of D can be done in exponential time in the
size of ψ and D, because it might be necessary to compare D′ for inclusion with
exponentially many candidates (see the proof of Theorem 2). Finally, verifying
that D′ 6|= Q can be done in polynomial time in the size of D′ since Q is assumed
to be a fixed query. Thus, we conclude that M works in double exponential time
in the size of ϕ and D.
(b) Let P be a unary predicate and a ∈ U . Next we prove that the complement
of CQA(Q) is NEXP -hard, where Q is the Boolean query P (a). More precisely,
next we provide a reduction from SAT for the Bernays-Schoenfinkel’s syntactic
class of first-order formulas (BSC ) to the complement of CQA(Q). The former
problem is decidable and NEXP -complete [22, chapter 20].
3
    Other cases of combined complexity for CQA around key constraints and inclusion
    dependencies can be found in [11].
     Consider a schema S, and let χ: ∃x1 · · · ∃xn ∀ȳ ψ(x1 , . . . , xn , ȳ) ∈ L(S), with
ψ quantifier free, be an instance for BSC . Without loss of generality, assume that
S does not contain unary predicate P . If χ is satisfiable, then it is satisfiable in
the universe U = {1, . . . , n}. In consequence, a sentence in BSC is satisfiable iff
it is satisfiable in a finite structure (actually with finite universe), which allows
us to use this result in our context.
     Let a1 , . . ., an be pairwise distinct elements from U , which are all distinct
from a, U1 , . . ., Un be unary predicates not contained in S ∪ {P }, and D be the
empty database instance over S ∪ {U1 , . . . , Un , P }. Furthermore, let ϕ be the
conjunction of:

                     n
                     ^                                 
                            ∀x∀y Ui (x) ∧ Ui (y) → x = y ,                                 (4)
                     i=1
                     ^n _n
                                Ui (aj ),                                                  (5)
                     i=1 j=1
                                                                              
        ∀x1 · · · ∀xn U1 (x1 ) ∧ · · · ∧ Un (xn ) → ∀ȳ ψ(x1 , . . . , xn , ȳ) ∨ P (a).   (6)


The first two sentences make sure that each Ui contains exactly one element of
U . If χ is satisfiable, then there will be a repair where each of the U1 , ..., Un will
contain exactly one element of U and nothing else, and P will be empty. In that
repair, the query P (a) will be false and, thus, D 6|={ϕ} P (a). On the other hand,
if χ is not satisfiable, then no matter how U1 , . . ., Un and the relations in S are
populated, sentence
                                                                                     
            ∀x1 · · · ∀xn    U1 (x1 ) ∧ · · · ∧ Un (xn ) → ∀ȳ ψ(x1 , . . . , xn , ȳ)


will not be satisfied. Therefore, all the repairs of D will include element a into
P in order to satisfy (6), which implies that D |={ϕ} P (a). Hence, we conclude
that χ is satisfiable iff D 6|={ϕ} P (a).
    The IC ϕ is clearly universal, but we have to check safety. The first two
conjuncts obviously are. For the third conjunct, due to the universal quantifi-
cation on x1 , ..., xn and their occurrence in the predicates U1 , ..., Un , the only
source of non-safety could be formula ψ. There is no reason for ψ to be safe.
However, for the purpose of establishing our complexity lower-bound, we may
restrict ourselves to a subclass of sentences in BSC that still is NEXP -hard,
namely sentences that encode bounded tiling problems. Those sentences turn out
to be safe (cf. [9, theorem 6.2.21] for details).4                                 2

4
    In particular, due to the bounded domain to be tiled, for each instance we do not
    need to go beyond a finite numerical domain U . The same would happen if we were
    encoding computations of Turing machines that are bounded in time.
References
 [1] Abiteboul, S., Hull, R. and Vianu, V. Foundations of Databases. Addison-Wesley,
     1995.
 [2] Afrati, F. and Kolaitis, Ph. Repair Checking in Inconsistent Databases: Algo-
     rithms and Complexity. Proc. of the International Conference on Database Theory
     (ICDT 09), 2009, pp. 31-41.
 [3] Arenas, M., Bertossi, L. and Chomicki, J. Consistent Query Answers in Incon-
     sistent Databases. Proc. ACM Symposium on Principles of Database Systems
     (PODS 99), 1999, pp. 68-79.
 [4] Arenas, M., Bertossi, L., Chomicki, J., He, X., Raghavan, V. and Spinrad, J.
     Scalar Aggregation in Inconsistent Databases. Theoretical Computer Science,
     2003, 296(3): 405-434.
 [5] Arenas, M., Bertossi, L. and Chomicki, L. Answer Sets for Consistent Query
     Answering in Inconsistent Databases. Theory and Practice of Logic Programming,
     2003, 3(4-5):393-424.
 [6] Barcelo, P., Bertossi, L. and Bravo, L. Characterizing and Computing Semanti-
     cally Correct Answers from Databases with Annotated Logic and Answer Sets.
     Chapter in Semantics of Databases, Springer LNCS 2582, 2003, pp. 1-27.
 [7] Bertossi, L. Consistent Query Answering in Databases. In ACM Sigmod Record,
     2006, 35(2):68-76.
 [8] Bertossi, L., Bravo,L., Franconi, E. and Lopatenko, A. The Complexity and Ap-
     proximation of Fixing Numerical Attributes in Databases Under Integrity Con-
     straints. Information Systems, 2008, 33(4):407-434.
 [9] Börger, E., Grädel, E. and Gurevich, Y. The Classical Decision Problem. Springer,
     1997.
[10] Bravo, L. and Bertossi, L. Semantically Correct Query Answers in the Pres-
     ence of Null Values. Proc. EDBT International Workshop on Inconsistency and
     Incompleteness in Databases (IIDB 06). Springer LNCS 4254, 2006, pp. 336-357.
[11] Cali, A., Lembo, D. and Rosati, R. On the Decidability and Complexity of Query
     Answering over Inconsistent and Incomplete Databases. Proc. ACM Symposium
     on Principles of Database Systems (PODS 03), 2003, pp. 260-271.
[12] Chomicki, J. and Marcinkowski, J. Minimal-Change Integrity Maintenance Using
     Tuple Deletions. Information and Computation, 2005, 197(1-2):90-121.
[13] Chomicki, J. Consistent Query Answering: Five Easy Pieces. Proc. International
     Conference on Database Theory (ICDT 07), Springer LNCS 4353, 2007, pp. 1-17.
[14] Dantsin, E., Eiter, T., Gottlob, G. and Voronkov, A. Complexity And Expressive
     Power Of Logic Programming. ACM Computer Surveys, 2001, 33(3):374-425.
[15] Eiter, T., Gottlob, G. and Mannila, H. Disjunctive Datalog. ACM Transactions
     on Database Systems, 1997, 22(3):364-418.
[16] Ebbinghaus, H.-D. and Flum, J. Finite Model Theory. 2nd edition, Springer, 1999.
[17] Fagin, R. Horn Clauses and Database Dependencies. Journal of the ACM, 1982,
     29(4):952–985.
[18] Fuxman, A. and Miller, R. First-Order Query Rewriting for Inconsistent
     Databases. Journal of Computer and System Sciences, 2007, 73(4):610-635.
[19] Gelfond, M. and Lifschitz, V. Classical Negation in Logic Programs and Disjunc-
     tive Databases. New Generation Computing, 1991, 9:365-385.
[20] Greco, G., Greco, S. and Zumpano, E. A Logical Framework for Querying and
     Repairing Inconsistent Databases. IEEE Transactions on Knowledge and Data
     Engineering, 2003, 15(6):1389-1408.
[21] Lopatenko, A. and Bertossi, L. Complexity of Consistent Query Answering in
     Databases under Cardinality-Based and Incremental Repair Semantics. Proceed-
     ings of the International Conference of Database Theory (ICDT 07), Springer
     LNCS 4353, 2007, pp. 179-193.
[22] Papadimitriou, Ch. Computational Complexity. Addison-Wesley, 1994.
[23] Staworko, S. and Chomicki, J. Consistent Query Answering in the Presence of
     Universal Constraints. Information Systems, 2010, 35(1):1-22.
[24] Ullman, J. Principles of Database and Knowledge-Base Systems, Vol. I. Computer
     Science Press, 1988.
[25] Van Gelder, A. and Topor, R. Safety and Correct Translation of Relational
     Calculus Formulas. Proceedings of the Sixth ACM Symposium on Principles of
     Database Systems (PODS 87). ACM Press, 1987, pp. 313-327.
[26] Vardi, M. The Complexity of Relational Query Languages. Proc. ACM Symposium
     on Theory of Computing (STOC 82), 1982, pp. 137-146.
[27] Wijsen, J. Database Repairing Using Updates. ACM Transactions on Database
     Systems, 2005, 30(3):722-768.
[28] Wijsen, J. On the Consistent Rewriting of Conjunctive Queries Under Primary
     Key Constraints. Information Systems, 2009, 34(7):578-601.