=Paper=
{{Paper
|id=Vol-1378/paper35
|storemode=property
|title=Intuitionistic Data Exchange
|pdfUrl=https://ceur-ws.org/Vol-1378/AMW_2015_paper_35.pdf
|volume=Vol-1378
|dblpUrl=https://dblp.org/rec/conf/amw/GrahneMO15
}}
==Intuitionistic Data Exchange==
<pdf width="1500px">https://ceur-ws.org/Vol-1378/AMW_2015_paper_35.pdf</pdf>
<pre>
                   Intuitionistic Data Exchange

                   Gösta Grahne, Ali Moallemi, and Adrian Onet

                      Concordia University, Montreal, Canada
    grahne@cs.concordia.ca,moa_ali@encs.concordia.ca,adrian_onet@yahoo.com


        Abstract. The field of Data Exchange has overcome many obstacles,
        but when it comes to negation in rule bodies combined with existentially
        quantified variables in rule heads, to unequalities ≠, as well as to incon-
        sistency management, the same intractability barriers that plague the
        area of incomplete information arise.
        In this paper we develop an intuitionistic relevance-logic based semantics
        that allows us to extend the polynomial time “naive evaluation” tech-
        nique to Data Exchange dependencies (TGDs and EGDs) with negated
        atoms and unequalities, and to Data Exchange Target Queries consist-
        ing of unions of conjunctive queries with negation and unequalities. The
        semantics is also paraconsistent, and avoids the intractability barriers
        encountered in inconsistency management as well.


1     Introduction

The problem of data exchange poses one of the major challenges in distributed
information processing environments. A connection in such an environment can
be viewed as a labeled directed edge from a node representing a source database to
a node representing a target database. The edge-label denotes a schema mapping
that guides the middle-ware in restructuring the data from the source database
to fit the requirements of the target database. Since its inception in 2003 by
Fagin et al. in [8], the field of data exchange has been intensely investigated, and
many functionalities are mature for technology transfer. In this paper we focus
on the problems of

 – computing and materializing target instances when the schema mapping is
   expressed as a set of embedded dependencies (TGDs and EGDs), and
 – computing certain answers to unions of conjunctive queries expressed on the
   target schema (target UCQs).

    These problems have been shown to admit efficient implementation, based
on a property colloquially called the naive evaluation property. The property
roughly says that the incompleteness of some domain values can be ignored, as
long as these values are distinguished from each other, and from the “ordinary”
domain values that denote named and known objects. The price to pay is that we
have to restrict the schema-mappings and target queries to be monotone. Most
attempts to include non-monotone features, such as negation (¬) and unequality
(≠), soon run into intractability barriers, due to the underlying issues of incom-
plete information. For a comparison between different closed world semantics
the reader should consult [22].
    We recall (see [1]) that a tuple generating dependency (TGD) is a first order
formula of the form ∀XY (Φ(XY) → ∃Z Ψ (XZ)), where Φ and Ψ are conjunc-
tions of relational atoms, and X, Y, and Z are sequences of variables. We refer to
Φ as the body of the dependency, and to Ψ as the head. In an equality generating
dependency (EGD), there is no existential quantification, and Ψ is an equality
X1 = X2 , where X1 and X2 are variables from XY.
    When it comes to dependencies with negation in the body, the main pro-
posals are the stratified model the stable model and the well-founded model (for
definitions, see [1]). We argue that these semantics are not well suited for data
exchange, mainly because they “favor” negative information. Sometimes this is
desirable, as for example in the classical Tweety-example:

               ∀X (BIRD(X), ¬P EN GU IN (X) → F LY (X)).

If the source database is {BIRD(tweety)} it might be desirable to “defeasibly”
conclude F LY (tweety). However, if we use the same semantics for

           ∀X (COUN T RY (X), ¬N U KES(X) → F RIEN D(X)),

and the source database is {COU N T RY (sylvania)}, it might not be prudent
to conclude F RIEN D(sylvania).
    Intuitionistic logic rejects the law of the excluded middle and non-constructive
existence proofs. The motivation is usually epistemological, but it can also be
computational, as in Kleene’s intuitionistic logic based on recursive predicates
[20]. In an intuitionistic approach we would assign both F LY (tweety) and
F RIEN D(sylvania) the value unknown or undetermined. This does not pre-
clude the truth-value to later change, as was for example the case with (the
knowledge of) the truth-value of the statement expressing Fermat’s Last Theo-
rem, that changed from unknown to true in 1994.
    If non-constructive existence proofs are epistemologically susceptible, what
can one say about defeasible reasoning? We contend that negative facts should
be explicitly derived, just as the positive ones are. In other words, only if we
have explicitly decided the fact ¬P EN GU IN (tweety) can we draw the conclu-
sion F LY (tweety). In this spirit we propose to use Nuel Belnap’s four-valued
relevance logic R [5] as foundation for negation in data exchange. This allows
us to derive efficient algorithms for:

 – computing and materializing target instances when the schema mapping is
   expressed as a set of TGDs with negated atoms in the body and in the head,
   and EGDs with negation in the body, and
 – computing certain answers to target unions of conjunctive queries with nega-
   tion and unequality (≠).
     Belnap’s logic also is paraconsistent, meaning that “if we have inconsistent
information about ducks, it is possible that our information about decimals can
still be trusted” (see Fitting [10], page 10). This feature allows efficient incon-
sistency management, which is not in general possible in the repair-approach
of [3].
     The rest of this paper is organized as follows. The next section introduces the
notions used throughout the paper. This is followed by the section dedicated to
Belnap’s Logic F OU R that plays a central role in this paper. Section 4 describes
the “naive evaluation” of conjunctive queries with negation, and Section 5 adds
negation to the bodies of tuple generating dependencies, in a way that allows us
to extend the “naive chase” (i.e. the standard chase [8]). In Section 6 we study
how our techniques affect the termination of the chase procedure. We end with
conclusions in Section 7.


2    Preliminaries
For basic definitions and concepts we refer to [1]. Let a = a1 , a2 , . . . , an and
b = b1 , b2 , . . . , bm be sequences of elements from a semigroup (a set with a
binary associative operation ⋅ ). Then ab denotes the concatenation sequence
a1 , a2 , . . . , an , b1 , b2 , . . . , bm , and if n = m we can form the product sequence
a ⋅ b = a1 ⋅ b1 , a2 ⋅ b2 , . . . , an ⋅ bn . The length of a sequence a = a1 , a2 , . . . , an ,
denoted ∣a∣, is n. The empty sequence is denoted . Let X = X1 , X2 , . . . , Xn be a
sequence and A a set. Then a mapping f ∶ X → An , where f (Xi ) = ai , is listed
as f = {X1 /a1 , X2 /a2 , . . . , Xn /an }.
Signatures, schemas, languages etc. A signature σ consists of a finite set
R = {R, S, . . .} of relation symbols (called the schema), a countably infinite set
Vars = {X, Y, Z, . . .} of variables, a countably infinite set Nulls = {x, y, z, . . .} of
nulls, and a countably infinite set Cons = {a, b, c, . . .} of constants. We assume
that Nulls and Cons are semigroups under the ⋅ operation. Each relational symbol
R ∈ R has an associated natural number ar(R), called the arity of R.
    Let σ be a signature. The set of well-formed formulas (wff ’s) of the language
Lσ is defined inductively by stating that each atom R(X), where R ∈ R and
X ∈ (Vars ∪ Cons)ar(R) , is a wff, and then closing the set under ∨, ∧, ¬, ∃, and
∀. If Φ(X) is a Lσ -formula, then v ars(Φ) denotes its variables, the sequence X
denotes the free variables, and cons(Φ) the constants in Φ. If Φ(X) is a formula
and v is the mapping X ↦ a, then Φ(a) denotes the sentence Φ(v(X)). If a wff
Φ does not have any free variables, it is called a sentence.
Boolean valued instances. In algebraic semantics (see e.g. [23]) an interpreta-
tion of sentences in the object language is obtained through a homomorphism h
into a structure (A, ϕ), where A is an algebra, and ϕ is a subset of the carrier set
A of A. The elements in ϕ represent “truth.” The homomorphism maps atomic
sentences to elements in A, and the connectives and quantifiers are mapped into
operators in the algebra. Under interpretation h, a sentence Φ is then “true”
in A, if h(Φ) ∈ ϕ.
    The well known Boolean valued models [19] are obtained by taking the algebra
to be a Boolean algebra B = (B, ∨, ∧, ¬, 0, 1), and ϕ to be an ultrafilter of B.
Recall that an ultrafilter is a subset ϕ of B, such that 1 ∈ ϕ; for all b1 , b2 ∈ B,
b1 ∧ b2 ∈ ϕ iff b1 ∈ ϕ and b2 ∈ ϕ; and for all b ∈ B either b ∈ ϕ or ¬ b ∈ ϕ, but
{b, ¬ b} ⊆/ ϕ. Adapting this approach to database instances (models), we get the
following definitions.

Definition 1. Let σ = (R, Vars, Nulls, Cons) be a database signature, and let
B = (B, ∨, ∧, ¬, 0, 1) be a Boolean algebra. A B-instance I consists of the follow-
ing parts:
1. A finite set of elements d om(I) ⊆ Cons ∪ Nulls, the domain of I,
2. cons(I) = d om(I) ∩ Cons, the constants of I,
3. The set nulls(I) = d om(I) ∩ Nulls, the nulls of I, and
4. For each R ∈ R and a ∈ d om(I)ar(R) , an interpretation R(a)I ∈ B.

Definition 2. Let Φ be an L-sentence, B a Boolean algebra, ϕ ⊂ B an ultrafilter
in B, and I a B-interpretation. Then the satisfaction relation I ⊧B,ϕ Φ is defined
inductively as follows.
1. I ⊧B,ϕ R(a), if R(a)I ∈ ϕ,
2. I ⊧B,ϕ Φ ∧ Ψ , if ΦI ∧ Ψ I ∈ ϕ,
3. I ⊧B,ϕ ¬Φ, if ¬(ΦI ) ∈ ϕ,
4. I ⊧B,ϕ ∀XΦ(X), if ⋀ a∈dom(I)ar(R) (Φ(a)I ) ∈ ϕ,
5. I ⊧B,ϕ Φ → Ψ , if ΦI ∈ ϕ entails that Ψ I ∈ ϕ.

    The natural partial order ≤ in B is defined as α ≤ β iff α ∨ β = β. The partial
order can be lifted to (B, ϕ)-instances I and J by stipulating that I ≤ J if
cons(I) ⊆ cons(J ), and there is a mapping h ∶ d om(I) → d om(J ), identity on
the constants, such that R(a)I ≤ R(h(a))J , for all R ∈ R and a ∈ d om(I)ar(R) .
Such a mapping h is said to be a homomorphism from I to J . It is easily shown
that ≤ is a partial order on the equivalence classes generated by the relation
I ≃ J , defined to hold iff I ≤ J and J ≤ I.
    Going back to the satisfaction relation ⊧B,ϕ , note that if B = {0, 1}, then
B = 2 (the two-element Boolean algebra), and the unique ultrafilter is {1}. Thus
we can write simply ⊧2 , and I ⊧2 Φ becomes the standard two-valued semantics
in which → behaves as material implication. Thus I ⊧2 Φ → Ψ if and only if
ΦI ≤ Ψ I . Here ≤ is the natural order, i.e. 0 ≤ 1, and I ≤ J means that there is a
classical database homomorphism [1] from I to J , and I ≃ J means that I and
J are homomorphically equivalent (in the classical database sense).
Conjunctive queries and certain answers. A conjunctive query (CQ) is a
first order formula of the form q = ∃YΦ(XY), where Φ is a conjunction of
atoms of Lσ . For an atom R of arity k in Φ, for notational convenience we write
R(XY) instead of R(xi1 , . . . , xik ), where xi1 , . . . , xik is a subsequence of XY.
The free variables of Φ are those in X. If X = , the expression denotes a Boolean
CQ. A query q is called a union of conjunctive queries (UCQ) if it is of the form
q = ∃Y(Φ1 (XY)∨. . .∨Φm (XY)), where each Φi is a conjunction of atoms. It is
a well known fact that UCQs are monotonic in the 0 ≤ 1 partial order, meaning
that if I ≤ J , then q(I) ≤ q(J ), for all UCQs q. It has further been shown in [14]
that I ≃ J if and only if q(I) ≃ q(J ), for all UCQs q. Thus, if a user only has
UCQs as query language, then the user cannot distinguish between ≃-equivalent
(homomorphically equivalent) instances. This leads to the important notion of
incomplete instances and certain answers, instrumental in data exchange [8].
Conceptually, an incomplete database instance is a set X of ordinary instances I.
The information that is certain is the information that holds in all possible
instances (possible worlds). This leads to the definition of the certain answer to
a UCQ q on an incomplete instance X as ⋀q(X ) = ⋀{q(I) ∶ I ∈ X }. Here ⋀
denotes the ≤-greatest lower-bound (modulo ≃) of a set of ordinary instances.
Likewise, ⋁ denotes the ≤-least upper-bound (modulo ≃). The least upper-bound
is clearly given by the disjoint (rename nulls apart) union. The lower bound ⋀
has been shown to exist by Hell and Nesetril [16], whose results were generalized
by Libkin in [21]. In particular, Libkin showed that if X is a finite set of instances,
then ⋀(X ) can always be represented as a finite instance (up to ≃-equivalence).
    Usually the certain answer is further restricted to only the named domain
values (the constants). For an instance I, define I↾ as function I restricted to
sequences of objects in cons(I). Thus, the usual certain answer is

                        C ert(q, X ) = (⋀{q(I) ∶ I ∈ X })↾ .                       (1)

Note that q(X ) is what Libkin [21] calls answers as knowledge, while (1) is
his answers as objects. Since an instance I can contain nulls, it can be viewed
as a “naive table” [18], that represents the set of complete (ground) instances
XI = {J ∶ I ≤ J and d om(J ) ⊆ Cons}. Consider now an instance I as a naive
table representing the set XI of possible worlds. Then the “naive evaluation
theorem” says that ⋀q(XI ) ≃ q(I). Consequently for a UCQ q, we have [18, 8]:

                              C ert(q, XI ) = (q(I))↾ .                            (2)


3    Belnap’s logic F OU R

We are interested in Belnap’s intuitionistic logic F OU R. This logic has a sound
and complete axiomatization in terms of a system R of relevance logic, and
Φ → Ψ becomes relevant entailment of first degree (see [2, 5]).
    The logic F OU R is a generalization of Kleene’s strong three-valued logic.
F OU R has four truth-values ∅, {f }, {t}, and {f , t}. Intuitively, assigning a sen-
tence Φ the value {t} means that “the computer has been told that Φ is true,”
(see [5], page 11) and assigning the value {f } means that “the computer has been
told that Φ is false.” Furthermore, the value {f , t} means that “the computer
has been told that Φ is false, and the computer has been told that Φ is true.” In
other words, Φ is inconsistent. Finally, assigning Φ the value ∅ means that “the
computer has not been told anything about the truth-value of Φ,” that is, the
truth-value of Φ is unknown.
     Since truth-values are sets, they are naturally ordered by set-inclusion. The
more elements the set contains, the more information the computer has about
the truth. This gives rise to the information ordering, which we in the sequel
will denote t. We can still retain the ≤-order, which is called the truth order, by
stipulating that {f } ≤{t}, and that ∅ and {f , t} are incomparable and both lie
strictly between {f } and {t}. We shall use the following symbols: 0 for {f }, 1
for {t},  for ∅, and ⊺ for {f , t}. Furthermore, ⊺ and  are incomparable wrt
≤, and 0 and 1 are incomparable in t. All of this can be put together into a
structure 4 = (4, ∨, ∧, /, ., ¬, 0, 1, , ⊺), where 4 = {, ⊺, 0, 1}. Here ∨ is the least
upper bound in the ≤-order, and ∧ the greatest lower bound (i.e. ⊺ ∨  = 1, and
⊺ ∧  = 0, etc). The least upper bound in the t-order is /, and the greatest lower
bound is . (e.g. 0 / 1 = ⊺, and 0 . 1 = , etc). For negation, ¬ 1 = 0, ¬ 0 = 1, ¬ ⊺ = ⊺,
and ¬  = . Consequently, (4, ∨, ∧, ¬, 0, 1) is a Boolean algebra.
     Belnap’s structure 4 has subsequently been generalized by Ginsberg and
Fitting to so called bilattices [12, 9]. The simplest non-trivial bilattice is 4, just
as 2 is the simplest non-trivial Boolean algebra. Bilattices have been thoroughly
investigated by Fitting and Avron, among others (see e.g. [9, ?,?,?]). The algebra
4 has been shown to provide a principled and uniform account of various non-
monotonic semantics of logic programming with negation, including the stable
models and the well-founded model.
     Since 4 is an algebra, it can be used to give meaning to Lσ sentences, by
choosing the designated values ϕ. Arieli and Avron [4] have argued that ϕ should
be a ultrabifilter1 and they have shown that ϕ = {⊺, 1} is the unique ultrabifilter
in 4. We shall thus work with the algebraic semantics (4, {⊺, 1}).
     A 4-instance is then a B-interpretation of Definition 1, where B = 4, and
by noticing that (4, ∨, ∧, ¬, 0, 1) is a Boolean algebra, I ⊧4 Ψ can be read from
Definition 2 of ⊧(B,ϕ) , by choosing B = 4, and ϕ = {⊺, 1}.
     Let I and J be 4-instances, such that cons(I) ⊆ cons(J ) and let h be a
mapping from d om(I) to d om(J ), such that h is identity on the constants.
The mapping h is said to be a homomorphism from I to J , provided that
R(a)I t R(h(a))J , for all R ∈ R and a ∈ d om(I)ar(R) . The partial order t of 4
is lifted to 4-instances in the following definition.

Definition 3. Let I and J be 4-instances such that cons(I) ⊆ cons(J ). We
say that J is more informative than I, denoted I t J , if there is a homomor-
phism from I to J . Furthermore, if both I t J and J t I, we say that I and
J are information equivalent, and denote it I ≐ J . Let I denote the set of all
instances over σ. Then I /≐ denotes the set of equivalence classes of instances
induced by ≐.

    Clearly t is a partial order on I /≐ . Due to space limitation, the definitions
for relations . and / are omitted, they can be found in the full version [13]. We
need to verify that . and / indeed are the meet and join in the lattice induced
by t.
1
    For a technical definition, see [4]
Lemma 1. Let I and J be 4-instances. Then I t J if and only if I . J ≐ I.
and I t J if and only if I / J ≐ J .

Reducing F OU R to T WO. We now losslessy represent a 4-instance as a two-
valued instance as follows.

Definition 4. 1. Let I be a 4-instance over schema R. The 2 -projection of I,
   denoted I ↓2 , is a two-valued Boolean instance over schema ⋃R∈R {R+ , R− },
   where
   (a) cons(I ↓2 ) = cons(I), nulls(I ↓2 ) = nulls(I), and d om(I ↓2 ) = d om(I).
   (b) For all relations R ∈ R and a = a1 , . . . , aα(R) , where ai ∈ d om(I) for all
       i ∈ {1, . . . , α(R)}, R+ (a)I = 1 if R(a)I u 1, and 0 otherwise; and
                                     ↓2


       R− (a)I = 1 if R(a)I u 0, and 0 otherwise.
                 ↓2


2. Let I be a 2-instance. Then the 4-instance corresponding to I is I ↑4 , where
   (a) cons(I ↑4 ) = cons(I), nulls(I ↑4 ) = nulls(I), and d om(I ↑4 ) = d om(I).
   (b) For all R ∈ R and a = a1 , . . . , aα(R) , for all i ∈ {1, . . . , α(R)},

                                     ⎧
                                     ⎪ ⊺ if R+ (a)I = 1 and R− (a)I = 1
                                     ⎪
                                     ⎪
                                     ⎪
                                     ⎪  if R+ (a)I = 0 and R− (a)I = 0
                                    =⎨
                               ↑4
                       R(a)I
                                     ⎪
                                     ⎪
                                     ⎪ 1 if R+ (a)I = 1 and R− (a)I = 0
                                     ⎪
                                     ⎪
                                     ⎩ 0 if R (a) = 0 and R (a) = 1
                                             +    I          −    I


4    UCQs with negation

The intuitionistic semantics (4, {⊺, 1}) allows us to extend the (polynomial time)
“naive evaluation” from unions of conjunctive queries to unions of conjunctive
queries with negation. We next define CQs with negation. UCQs with negation
is defined similarly.
    A literal is an expression of the form R(X) or ¬R(X), where R(X) is an
atom in Lσ . A conjunctive query with negation (NCQ) is a first order formula of
the form q = ∃YΦ(XY), where Φ is a conjunction of literals. Consider now an
NCQ
                                    n            n+m
                    q = ∃Y ( ⋀ Ri (XY) ∧ ⋀ ¬Ri (XY)).                             (3)
                                    i=1          i=n+1

The NCQ q, when applied on an instance I, extends the interpretation I to also
include the relation Q, defined as follows.

Definition 5. Let I be a 4-instance and q = ∃YΦ(XY) an NCQ of the form
(3). Let v range over all mappings from XY to d om(I). Then

                                     n                   n+m
     Q(a)I = ⋁{v ∶ v(X)=a} (⋀i=1 Ri (v(XY))I ∧ ⋀i=n+1 ¬ (Ri (v(XY))I )),

for each a ∈ d om(I)∣X∣ .
    We next show that NCQs are monotonic in the information order t. By a
recent result of Gheerbrant et al. [11], such monotonicity is a necessary and
sufficient condition for a class of queries to admit naive evaluation. Indeed, the
following holds.

Theorem 1. Let q be an NCQ, and let I and J be 4-instances, such that I t J
by homomorphism h. Then Q(a)I t Q(h(a))J , for all a ∈ d om(I)ar(Q) .

    Furthermore, the naive evaluation property of CQs on 2-instances can now be
extended to NCQs on 4-instances. It turns out that an NCQ q can be evaluated
naively on I ↓2 by turning q into a UCQs, denoted q ↓2 . Assume q is of the form
(3). Then
             n              n+m                n               m
 q ↓2 = ∃Y(( ⋀ Ri+ (XY) ∧ ⋀ Ri− (XY)) ∨ ⋁ Ri− (XY) ∨ ⋁ Rj+ (XY)). (4)
            i=1            i=n+1              i=1            j=n+1

   In the next theorem we view an NCQ query q as mapping an instance
I to the instance q(I), where q(I) is a 4-instance over schema {Q}, with
Q(a)q(I) = Q(a)I .

Theorem 2. Let q be an NCQ and I a 4-instance. Then q(I) ≐ (q↓2 (I ↓2 ))↑4 .

Similarly to the two-valued case, we can regard a 4-instance I as a naive table
defining the set XI = {J ∶ I t J and d om(J ) ⊆ Cons} of possible worlds.
We can now extended the naive evaluation property to NCQs on 4-instances, by
using the result (proved in the full version) that  q(XI ) ≐ q(I). In analogue
with (1) we define the certain answer as C ert4 (q, X ) = ({q(I) ∶ I ∈ X })↾ . To
wrap this section up formally, we conclude
Corollary 1. Let q be an NCQ and I a 4-instance. Then C ert4 (q, XI )           =
((q↓2 (I ↓2 )↑4 )↾ .

    Note that all the previous results hold as well for the larger class NUCQ of
unions of conjunctive queries with negation. We have only presented the trans-
formation of NCQs q into UCQs q ↓2 . In a very similar fashion we can transform
any NUCQ query into a UCQ, while preserving all theorems of this section, and
in particular Corollary 1.


5   Data Exchange in T WO and F OU R
A tuple generating dependency (TGD) is a first order formula having the form
∀XY(Φ(XY) → ∃Z R(XZ)). An equality generating dependency (EGD) is a
first order formula of the form ∀X(Φ(X) → X1 = X2 ), where X1 and X2 are
variables in X. A TGD with negation and unequality (NTGD≈/ ) is a TDG that
can have atomic negations in the body and in the head, and that also includes
a special equality symbol ≈, possibly negated (denoted ≈/ ). Note that NTGD≈ s
include the EGDs. Also, the notion of a 4-instance has to be extended to account
for the explicit equality. Details appear in the full paper [13]. The salient point
is that an atom a ≈ b can have truth-values in 4, meaning that (a ≈ b)I can be
inconsistent or unknown, in addition to true or false. Let n ∈ {2, 4}. A depen-
dency ξ is said to be satisfied in an n-instance I if I ⊧n ξ. A set of dependencies
is denoted Σ, and I ⊧n Σ, if I ⊧n ξ for all ξ ∈ Σ. The following is a central
concept in data exchange.
Definition 6. Let I be a 2-instance (a 4-instance) and Σ a set of TGDs and
EGDs (a set of NTGD≈ s). A 2-universal model (a 4-universal model) of (I, Σ)
is a 2-instance J (a 4-instance J ), such that
1. I ≤ J (I t J ),
2. J ⊧2 Σ (J ⊧4 Σ), and
3. for all instances K, if I ≤ K and K ⊧2 Σ, then J ≤ K
                       (if I t K and K ⊧4 Σ, then J t K).
   Let I be a 2-instance and Σ a set of dependencies. The definition of the
procedure C hase(I, Σ) can be found in [8, 7]. The chase procedure is said to
successfully terminate (or simply terminate) [8], if it does not fail due to some
EGD and it stops after a finite number of steps. In this case C hase(I, Σ) de-
notes the instance returned by the chase procedure. It is known (see [8, 7]) that
C hase(I, Σ) is a 2-universal model of (I, Σ).

Data Exchange. A data exchange schema is a schema of the form S ∪ T, where
S ∩ T = ∅. The schema S is called the source schema, and T is called the target
schema. Let S∪T be a data exchange schema. A source-to-target TGD (st-TGD)
is a TGD where the relation symbols in the body come from S and the relation
symbols in the head come from T. Source-to-target NTGD≈ s (s-t NTGD≈ s) are
defined similarly. A target EGD (t-EGD) is an EGD where all relation symbols
come from T. A data exchange setting is a pair (S ∪ T, Σ), where S ∪ T is a data
exchange schema, and Σ is a set of s-t TGDs (or s-t NTGD≈ s) and t-EGDs.
    A 2-instance of a data exchange setting is a pair (I, Σ), such that nulls(I) =
∅, R(a)I = 0, for all R ∈ T and a ∈ d om(I)ar(R) , and Σ = Σst ∪ Σt , where Σst
is a set of s-t TGDs and Σt is a set of t-EGDs.
    A 4-instance of a data exchange setting is like a 2-instance, except that Σst
can contain NTGD≈ ’s, and that R(a)I = , for all R ∈ T and a ∈ d om(I)ar(R) .
    Let n ∈ {2, 4} and ≼ ∈ {≤, t}. An n-solution for an n-instance (I, Σ) of a data
exchange setting (S ∪ T, Σ), is an n-instance J , such that I ≼ J , J ⊧n Σ, and
J ↾S = I↾S . The notation J ↾S stands for the instance J restricted to the relation
symbols in S, and similarly for I↾S . The set of all n-solutions to an n-instance
I in setting (S ∪ T, Σ) is denoted S oln (I, Σ). A n-solution J ∈ S oln (I, Σ) is
said to be an n-universal solution if J ≼ K, for all K ∈ S oln (I, Σ).
    Let I be a 2 instance of a data exchange setting (S ∪ T, Σ), and let q =
∃YΦ(XY) be a target UCQ, i.e. such that all relation symbols in Φ come from T.
Then the certain answer [8] to q on (I, Σ) is defined as

                C ert2 (q, I, Σ) = (⋀{q(J ) ∶ J ∈ S ol2 (I, Σ)})↾ .            (5)
In other words, C ert2 (q, I, Σ) is the greatest lower bound in the partial order
≤ of the set {q(J ) ∶ J ∈ S ol2 (I, Σ)}, restricted to the constants. The following
result is at the foundation of data exchange.

Theorem 3. [8] Let (I, Σ) be a data exchange setting, such that the chase with
Σ of I terminates, and let q be a target UCQ. Then
1. J = C hase(I, Σ) is a 2-universal solution for (I, Σ).
2. C ert(q, I, Σ) = (q(J ))↾.

    In the full paper [13] we show that by a suitable reduction, any set Σ of
NTGD≈ s can be transformed into a set Σ ↓2 of TGDs, such that for all 4-instances
I, I ⊧4 Σ if and only if I ↓2 ⊧2 Σ ↓2 . The following theorem makes the connection
between the data exchange solutions, as defined in [8], and solutions for the
intuitionistic data exchange problem.

Theorem 4. Let (S ∪ T, Σ) be a data exchange setting where Σ consists of
st-NTGD≈ s and t-EGDs, and let I be a 4-instance. Then

                          S ol4 (I, Σ) = S ol2 (I ↓2 , Σ ↓2 )↑4 .

    Armed with this result we now show that we can use the chase process defined
in [8] to find a 4-universal model. As the next theorem shows, a 4-universal
solution is a good candidate to be materialized and used to compute the certain
answer for any NUCQ-query.

Theorem 5. Let (I, Σ) be a 4-instance of a data exchange setting, where Σ
consists of st-NTGD≈ s and t-EGDs, Then (C hase(I ↓2 , Σ ↓2 ))↑4 is a 4-universal
solution for (I, Σ). Moreover, for every NUCQ query q we have

                 C ert4 (q, I, Σ) = (q ↓2 ((C hase(I ↓2 , Σ ↓2 ))))↾↑4 .


6   On Universal Solution Existence
In the previous sections we have shown that for a given 4-instance I and a
set of NTGDs Σ, (or NTGD≈/ s), the 4-universal solution is the best candidate
for target materialization in data exchange, and that as such it can be used to
efficiently compute certain answers to target UNCQ queries (or UNCQ≈ queries).
The question then arises whether a 4-universal solution exists. Since our results
include the lossless decomposition S ol4 (I, Σ) = S ol2 (I ↓2 , Σ ↓2 )↑4 , the answer
follows directly from from [7] and [15]. Formally, we have

Theorem 6. 1. The problem of testing if there exists a 4-universal solution
   for a given 4-instance and a set of NTGD≈ s is RE-complete.
2. The problem of testing if a set of NTGD≈ s has a 4-universal solution for
   every 4-instance is coRE-complete.
The definition of the core-chase procedure can be found in [7], from which the
next theorem follows.
Theorem 7. The core chase procedure is complete for finding 4-universal solu-
tions for 4-instances I and NTGD≈ s
   One of the important classes for which universal solutions are guaranteed
to exist, is the guarded dependencies of [6]. Applying the definition from [6],
we say that an NTGD≈ is guarded if the body has an atom that contains all
the universally quantified variables. It can easily be shown that if Σ is a set of
guarded NTGD≈ s, then Σ ↓2 is a set of guarded TGDs as well. From this and
from the elegant result of [17] it follows that

Theorem 8. A 4-universal solution always exists for a set of guarded NTGD≈ s.

    In the literature one can find many classes of TGDs that are known to en-
sure uniform chase termination, and thus guaranteeing the existence of universal
solutions for all instances (for an overview, see e.g. [22]). Let C be such a class.
We then have
Corollary 2. Let Σ be a set of NTGD ≈ s. If Σ ↓2 ∈ C, then a 4-universal solution
for (I, Σ) exists, for all 4-instances I.
We note that most of the known classes guaranteeing uniform termination have
a tractable membership problem.


7   Conclusions

In this paper we have put forth a new approach to Data Exchange based on
Belnap’s logic F OU R. We showed that moving to a four-valued logic comes with
the benefit of naturally extending the mapping language to NTGD≈ s and the
query language to NUCQ≈ s, thus allowing negation over atoms, including equal-
ity atoms a ≈ b. We also showed that well-known techniques from Data exchange
(chase, universal solution, naive evaluation, etc) can be used in F OU R as well.
Furthermore, the core of a 4-instance I (minimal instance in size that is ≃ equiv-
alent with I) is the same as core(I ↓2 )↑4 . Thus the core chase [7] is complete in
finding 4-universal solutions as well. In the special case where the instances
are two-valued, the mappings are TGDs and EGDs, and the target queries are
UCQs, the semantics reduces to the classical one [8].
    There is still more work to be done as we haven’t touched on for instance
meta-data management problems, such as composition and inverse. Another
aspect that needs to be further investigated is the chase termination problem in
T WO for a set of TGDs corresponding to a set of NTGD s in F OU R. Note that
                                                            ≈

the set of TGDs in this case is special in the sense that it always contain the set
of full TGDs expressed by Σ.
    We believe that ⊧4 is a semantics with natural appeal, and one that can be
practically implemented over existing DBMSs.
References
 1. Serge Abiteboul, Richard Hull, and Victor Vianu. Foundations of Databases.
    Addison-Wesley, 1995.
 2. Alan Anderson, Belnap R., D. Nuel, and J. Michael Dunn. Entailment: The Logic
    of Relevance and Necessity, Vol. I. Princeton University Press, 1992.
 3. Marcelo Arenas, Leopoldo E. Bertossi, and Jan Chomicki. Consistent query an-
    swers in inconsistent databases. In Victor Vianu and Christos H. Papadimitriou,
    editors, PODS, pages 68–79. ACM Press, 1999.
 4. Ofer Arieli and Arnon Avron. The value of the four values. Artif. Intell., 102(1):97–
    141, 1998.
 5. Jr. Belnap, Nuel D. A useful four-valued logic. In J.M̃ichael Dunn and George
    Epstein, editors, Modern Uses of Multiple-Valued Logic, volume 2 of Episteme,
    pages 5–37. Springer Netherlands, 1977.
 6. Andrea Calı̀, Georg Gottlob, and Michael Kifer. Taming the infinite chase: Query
    answering under expressive relational constraints. In KR, pages 70–80, 2008.
 7. Alin Deutsch, Alan Nash, and Jeffrey B. Remmel. The chase revisited. In PODS,
    pages 149–158, 2008.
 8. Ronald Fagin, Phokion G. Kolaitis, Renée J. Miller, and Lucian Popa. Data ex-
    change: Semantics and query answering. In ICDT, pages 207–224, 2003.
 9. Melvin Fitting. Bilattices and the semantics of logic programming. J. Log. Pro-
    gram., 11(1&2):91–116, 1991.
10. Melvin Fitting. The family of stable models. J. Log. Program., 17(2/3&4):197–225,
    1993.
11. Amélie Gheerbrant, Leonid Libkin, and Cristina Sirangelo. When is naive evalua-
    tion possible? In PODS, pages 75–86, 2013.
12. Matthew L. Ginsberg. Bilattices and modal operators. In Rohit Parikh, editor,
    TARK, pages 273–287. Morgan Kaufmann, 1990.
13. Gösta Grahne, Ali Moallemi, and Adrian Onet. Intuitionistic data exchange. Tech-
    nical report, http://arxiv.org/.
14. Gösta Grahne and Adrian Onet. Representation systems for data exchange. In
    ICDT, pages 208–221, 2012.
15. Gösta Grahne and Adrian Onet. The data-exchange chase under the microscope.
    CoRR, abs/1407.2279, 2014.
16. Pavol Hell and Jaroslav Neŝetr̈il. Graphs And Homomorphisms. Oxford University
    Press, 2004.
17. André Hernich. Computing universal models under guarded tgds. In ICDT, pages
    222–235, 2012.
18. Tomasz Imielinski and Witold Lipski Jr. Incomplete information in relational
    databases. J. ACM, 31(4):761–791, 1984.
19. Thomas Jech. Boolean valued models. In Handbook of Boolean Algebras, pages
    1197–1211, 1989.
20. Stephen Cole Kleene. Introduction to metamathematics. D. Van Norstrand, 1952.
21. Leonid Libkin. Incomplete data: what went wrong, and how to fix it. In PODS,
    pages 1–13, 2014.
22. Adrian Onet. The chase procedure and its applications. PhD thesis, Concordia
    University, 2012.
23. Alasdair Urquhart. Basic many-valued logic. In D.M. Gabbay and F. Guenth-
    ner, editors, Handbook of philosophical logic, volume 2, pages 249–295. Springer
    Netherlands, 2001.

</pre>