=Paper= {{Paper |id=Vol-1644/paper7 |storemode=property |title=Inference-Based Semantics in Data Exchange |pdfUrl=https://ceur-ws.org/Vol-1644/paper7.pdf |volume=Vol-1644 |authors=Adrian Onet |dblpUrl=https://dblp.org/rec/conf/amw/Onet16 }} ==Inference-Based Semantics in Data Exchange== https://ceur-ws.org/Vol-1644/paper7.pdf
    Inference-based semantics in Data Exchange

                                   Adrian Onet

                        Morgan Stanley, Montreal, Canada
                        adrian.onet@morganstanley.com




      Abstract. Data Exchange is an old problem that was firstly studied
      from a theoretical point of view only in 2003. Since then many approaches
      were considered when it came to the language describing the relationship
      between the source and the target schema. These approaches focus on
      what it makes a target instance a “good” solution for data-exchange. In
      this paper we propose the inference-based semantics that solves many
      certain-answer anomalies existing in current data-exchange semantics.
      We show that in case of mappings represented by source-to-target tgds
      and safe target egds we may compute in polynomial time a target table
      (universal representative) able to exactly represent the inference-based
      semantics. We show that the new semantics agrees with the other se-
      mantics when it comes to UCQ queries. Finally we show that one may
      use the universal representative to compute certain-answers in tractable
      time for large class of non-UCQ queries even for a subclass of UCQ¬ .



1   Introduction

The data-exchange problem is that of transforming a database existing under
a source schema into another database under a different target schema. This
database transformation is based on mappings that describe the relationship
between the source and the target database. A mapping M can be viewed as
a, possibly infinite, set of pairs (I, J), where I is a source instance and J a
target instance. In this case, J is called a data-exchange solution for I and M .
The mapping between the source and the target database is usually specified in
some logic formalism. The most widely accepted mapping language is the one
based on sets of tuple-generating dependencies (tgds) and equality-generating
dependencies (egds).
    It is very common to query the target solutions. This takes us to the prob-
lem of querying incomplete databases [2]. The exact answers semantics defines
the answer to a query over an incomplete database as the set of all answers to
the query for each instance from its semantics. This approach is rarely feasible
in practice, therefore two approximations are commonly accepted: certain an-
swer (answers occurring for all solutions) and maybe answer semantics (answers
occurring for some solutions).
    Based on the interpretation of the mapping language there may be sev-
eral semantics in data exchange. The first semantics was introduced in [7] and
it is here referred to as the OWA-semantics (open world assumption). In or-
der to improve the certain answer behavior for non-UCQ queries under OWA-
semantics, Libkin [15] introduced the first closed-world semantics in data ex-
change, known as CWA-semantics. This semantics was later on extended by
Hernich and Schweikardt [14] to include target tgds. In CWA-semantics a so-
lution incorporates only tuples that are “justified” by the source and the de-
pendencies. Later on Hernich [11] introduced the GCWA∗ -semantics for data
exchange. Unfortunately most of these semantics have a rather strange behavior
when looking for certain answers over the set of solutions. More recently Arenas
et al. [4, ?] introduced a new semantics for data-exchange based on bidirectional
constraints. This new semantics solves most of the query anomalies present in
the other semantics but it comes with price of non-tractable data complexity
even for the simplest data-exchange problems.
    To better understand the type of anomalies we may encounter under these se-
mantics, consider the next simple example. A company has in the source schema
a binary relation P representing the relationship between projects and employ-
ees. After some reorganization it was realized that each projects financing should
be provided by one or more cost-centers, each employee belonging to one of these
cost-centers. With this the company creates the target schema with two binary
relations P C and CE representing the project to cost-center and cost-center to
employee relationship respectively. In the case of the unidirectional semantics
the process can be specified by tgd

                  ξ1 : ∀p ∀e P (p, e) → ∃cc P C(p, cc) ∧ CE(cc, e).


Let source I be P I = {(p1 , e1 ), (p1 , e2 ), (p2 , e3 )}, stating that there are two
employees e1 and e2 working on project p1 and only employee e3 working on
project p2 . Consider query: Q:=∀p ∀cc P C(p, cc) ∧ CE(cc, “e3 “) → p = “p2 “
(Is e3 involved only in project p2 ?). Let instance J be P C J = {(p1 , cc1 ), (p2 , cc1 )}
and CE J = {(cc1 , e1 ), (cc1 , e2 ), (cc1 , e3 )}. Because J is part of all aforemen-
tioned unidirectional semantics, the certain answer for the given query will be
un-intuitively false. Naturally one may expect this answer to be true, as there
is no indication from the source that employee e3 is involved in project p1 .
    Motivated by certain answer anomalies in the current data-exchange seman-
tics, in this paper we propose a new data-exchange semantics based on logical
inference for mappings represented by sets of s-t tgds and safe target egds. This
semantics eliminates most anomalies related to certain-answers and keeps the
same certain answers with the other semantics for union of conjunctive queries.
We will also show that under this settings for any source instance one may com-
pute a target table representation that may be queries to obtain certain-answers
for any FO query. Finally we will review some complexity results regarding
the certain-answers under this new semantics and present large classes of FO
queries for which certain-answers can be computed in polynomial time. More
details, examples and proofs can be found in the full version of this paper.
2    Preliminaries
This section reviews the basic technical preliminaries and definitions. More infor-
mation on relational database theory can be obtained from [1]. We will consider
the complexity classes P, NP and coNP. For the definition of these classes we
refer to [17].
    A finite mapping f , where f (ai ) = bi , for i ∈ {1, . . . , n}, will be represented
as {a1 /b1 , a2 /b2 , . . . , an /bn }. When it is clear from the context f , it will be also
viewed as the following formula a1 = b1 ∧ a2 = b2 ∧ . . . ∧ an = bn . For a mapping
f and set A, with f |A will denote the mapping f restricted to values from A.
By abusing the notation, a vector x̄ = (x1 , x2 , . . . , xn ) will be often viewed as
the set {x1 , x2 , . . . , xn }, thus we may have set operations like xi ∈ x̄ or x̄ ∩ ȳ.
Databases. A schema S is a finite set {S1 , S2 , . . . , Sn } of relational symbols,
each symbol Si having a fixed arity arity(Si ). Let Cons, Nulls and Vars be three
countably infinite sets of constants, nulls and variables such that there are no
common elements between any two of these sets. Elements from Cons are sym-
bolized by lower case (possibly subscripted) characters from the beginning of
the alphabet (e.g. a, b1 ). Elements from Vars are represented by lower case (pos-
sibly subscripted) characters from the end of the alphabet (e.g. z, x2 ). Each
element from the countable set Nulls is represented with, possible subscripted,
symbol ⊥ (e.g. ⊥i ). A naı̈ve table T of S is an interpretation that assigns to
each relational symbol Si a finite set SiT ⊂ (Cons ∪ Nulls)arity(Si ) , sometimes
we also view Si as a relation between elements of Cons ∪ Nulls. The set dom(T )
means all elements that occur in T , clearly dom(T ) ⊆ Cons ∪ Nulls. A naı̈ve
table T is called an instance if dom(T ) ⊂ Cons. In contrast to general naı̈ve
tables, which are identified by capitalized characters from the end of the al-
phabet (e.g. T , V ), instances are represented by capitalized characters from the
middle of the alphabet (e.g. I, J). The set of all instances over schema S is
denoted Inst(S). A valuation v is a mapping over the set Cons ∪ Nulls such that
v(a) = a, for all a ∈ Cons, and v(⊥) ∈ Cons, for all ⊥ ∈ Nulls. Valuations are
extended to tuples and naı̈ve tables as follows. For each tuple t̄ = (t1 , t2 , . . . , tn ),
let v(t̄):=(v(t1 ), v(t2 ), . . . , v(tn )); and for a naı̈ve table T over schema S, define
v(T ) as Rv(T ) :={v(t̄) : t̄ ∈ RT }, for all R ∈ S. The interpretation of a naı̈ve
table T is given by Rep(T ):={v(T ) : v valuation}.
Schema mappings. A data-exchange schema mapping is a triple M = (S, T, Σ),
where S and T are two disjoint schemas named the source and target schema re-
spectively; Σ is a set of formulae expressing the relationship between the source
and the target database. Most commonly, Σ is represented by a set of source-
to-target tuple-generating dependencies (s-t tgds) and target equality-generating
dependencies (egds). Where a source-to-target tuple-generating dependency is a
FO sentence ξ of the form: ∀x̄ (∀ȳ α(x̄, ȳ) → ∃z̄ β(x̄, z̄)), where x̄, ȳ and z̄ are
vectors of variables from Vars; α(x̄, ȳ) (often referred to as the body of the tgd) is
a conjunction of atoms over the source schema; and β(x̄, z̄) (often referred to as
the head of the tgd) is a conjunction of atoms over the target schema. An equality-
generating dependency is a FO sentence ξ of the form: ∀x̄ (α(x̄) → x = y), where
α(x̄) is a conjunction of atoms over the target schema and x,y are variables from
the vector x̄. A source instance I ∈ Inst(S) and a target instance J ∈ Inst(T)
are said to satisfy a s-t tgd ξ, denoted (I, J) |= ξ; if I ∪ J is a model of ξ in the
model-theoretic sense. Similarly, a target instance J satisfies an egd ξ, denoted
J |= ξ, if J is a model for ξ in the model-theoretic sense. This is extended to a
set of s-t tgds and egds Σ by stipulating that (I, J) |= ξ, for all s-t tgd ξ ∈ Σ
and J |= ξ, for all egd ξ ∈ Σ. A position in Σ is a pair (R, i), where R is a
relational symbol and 1 ≤ i ≤ arity(R). A position (R, i) is said to be affected in
Σ if it holds an existentially quantified variable somewhere in Σ. With aff(Σ) is
denoted the set of all affected positions in Σ. When the schemata are known or
not relevant in the context, we usually interchange the notion of schema mapping
and the set of dependencies that defines it.
    A data-exchange semantics O associates for a schema mapping M = (S, T, Σ)
and a source instance I a possible infinite set of target instances J(I, M )KO . We
refer to each element of J(I, M )KO as a solution for I and M under semantics O.
Queries. CQ, UCQ, UCQ¬ and UCQ6= denote the classes of conjunctive queries,
union of conjunctive queries, union of conjunctive queries with negation and
union of conjunctive queries with unequalities, respectively. For complete defini-
tions of these classes, please refer to [1]. For a given data-exchange semantics O,
schema mapping M and source instance     T I, the certain answers for a given query
Q is defined as: certO (Q, (I, M )) := J∈J(I,M )KO Q(J).


3     Data-exchange semantics
As mentioned in the introduction, there are many semantics considered in data-
exchange. In this section we briefly review the most prominent of these semantics.
In Section 3.4 we introduce a new semantics for mappings specified by sets of
s-t tgds and egds that addresses many of certain-answers anomalies occurring
under the current semantics. In the final part of this section we will review the
semantics based on bidirectional constraints.

3.1   OWA Data Exchange
The OWA-Semantics is the first semantics considered in data exchange [7]. This
is, by far, the most studied [7, 3, 8, 6, 5, 10, 13]. Under this semantics, given a
data-exchange mapping specified by Σ a set of tgds and egds and given a source
instance I, the OWA-semantics for I under Σ is defined as:

                   J(I, Σ)Kowa := {J ∈ Inst(T) : I ∪ J |= Σ}.                    (1)


3.2   CWA Data Exchange
To outcome the counter-intuitive behavior under the OWA-semantics Libkin [15]
introduced the CWA-semantics for mappings specified by a set of s-t tgds. Her-
nich and Schweikardt [14] extended the semantics by adding target tgds. Given a
set Σ of s-t tgds and a source instance I a CWA-solution for I and Σ is defined
as any naı̈ve table T over the target schema that satisfies the following three
requirements: 1) each null from dom(T ) is justified by some tuples from I and a
s-t tgd from Σ; 2) each justification for nulls is used only once; and 3) each fact
in T is justified by I and Σ. The CWA-semantics for certain-answers is defined
as: J(I, Σ)Kcwa :={J ∈ Rep(T ) : T is a CWA-solution for I under Σ}. Examples
of certain-answer anomalies under CWA-semantics can be found in [12].


3.3   GCWA∗ Data Exchange

The GCWA∗ semantics was inspired from Minker’s [16] GCWA- semantics and
nicely adapted by Hernich [12] for data exchange. For a source instance I and Σ
a set of s-t tgds and egds with Solmin (I, Σ) we denote all the subset-minimal
target instances J such that I ∪ J |= Σ. The CGWA∗ -semantics is defined as:
                         n
                         [
                
J(I, Σ)Kgcwa∗ := J : J =    Jn for some n, Ji ∈ Solmin (I, Σ) and I ∪ J |= Σ
                           i←1

Even if the GCWA∗ semantics solves all aforementioned anomalies, it introduces
new ones exemplified hereafter.
Example 1. Consider source instance with binary relation DeptC for depart-
ments and names of consultant employees working in that department and
ternary relation DeptF T E for departments and full-time employees (name and
id) from the given department. Suppose the company hires all the consultants as
full-time employees, thus the target schema will be the ternary relation DeptEmp
with the same structure as DeptF T E. The exchange mapping is represented as:

                DeptC(did, name) → ∃eid DeptEmp(did, name, eid);
             DeptF T E(did, name, eid) → DeptEmp(did, name, eid).

Consider source instance with consultants “john” and “adam” part of the “hr”
department and full-time employee “adam” with employee id 1 part of “hr”. Let
target query be: Is there exactly one employee named adam in hr department?.
Under the CGWA∗ -semantics the query will return the counter-intuitive answer
true, even if based on the source instance and given mapping one would expect
the answer to be false, because beside full-time employee “adam” there maybe
a consultant named “adam” in the “hr” department.


3.4   Inference-based semantics

In order to avoid the certain-answer anomalies presented in this section and in
the introduction, we present a new closed-world semantics for mappings specified
by a set of s-t tgds and egds.
     Let ξ : α(x̄, ȳ) → ∃z̄ β(x̄, z̄) be a s-t tgd and I a source instance. A set
of facts J 0 is said to be inferred from I and ξ with function f denoted with
   ξ
I−→f J 0 if f (α(x̄, ȳ)) ⊆ I and f (β(x̄, z̄)) = J 0 . Tuple t ∈ J 0 is said to be strongly
inferred from I and ξ with function f if t ∈ f |x̄ (β(x̄, z̄)), otherwise we say that t
is weakly inferred. Intuitively a tuple t is strongly inferred from I and ξ with f
if t does not depend on the assignment given by f to existential variables from
ξ. Let Σ be a set of s-t tgds, I a source instance and J a target instance. A
function κ that assigns for each ξ ∈ Σ a set of functions {f1 , f2 , . . . , fn } such
        ξ
that I −→fi Ji ⊆ J is called an inference strategy for I and J with Σ. Note that
the function that allocates the empty set for each ξ ∈ Σ is an inference strategy
for any I and Σ. Given an inference strategy κ and ξ ∈ Σ, a tuple t ∈ J is said
to be: strongly inferred with strategy κ for ξ if there exist f ∈ κ(ξ) such that t is
strongly inferred from I and σ with f ; not inferred with strategy κ for ξ if there
                        ξ
is no f ∈ κ(ξ) with I −→f J 0 3 t; weakly inferred with strategy κ for ξ otherwise.
A tuple t is said to be inferred with strategy κ for I, Σ and J if there exists
ξ ∈ Σ such that t is strongly or weakly inferred with κ for ξ. With this we are
now ready to introduce the inferred-based semantics.
Definition 1. Given a source instance I and Σ a set of s-t tgds and egds the
inference-based semantics for I and Σ, denoted with J(I, Σ)Kinf , is the set of all
target instances J for which there exists an inference strategy κ such that:
1. Every tuple t ∈ J is inferred with κ for I, Σ and J;
2. For every ξ : α → β ∈ Σ and every function f with f (α) ⊆ I there exists
   f 0 ∈ κ(ξ) such that f 0 is an extensionSof f ;
3. For every ξ : α → β ∈ Σ and Jκ,ξ =                      J , there is no function
                                              →g Jg ,g∈κ(ξ) g
                                               ξ
                                             I−
   f with f (β) ∈ Jκ,ξ and f (α) 6⊆ I, where f (β) contains at least one weakly
   inferred tuple with strategy κ for ξ.
4. (I, J) |= Σ in the model-theoretic sense.
    Intuitively the first rule from the definition states that all tuples in the target
instance needs to be inferred from the source instance and the mapping, this is
taking care of the tuples not inferred present under OWA-semantics. This also
allows the same nulls to be matched to different constant as long as there exists
an inference for each of these. This takes care of the query anomalies present
under CWA-semantics. The second condition makes sure that all possible source
triggers are fired. With this all tuples that can be inferred will be present in at
least in one instance in the semantics, this solves the anomaly presented in Ex-
ample 1 for GCWA∗ . The third condition ensures that the assignment of nulls
does not contradict with the inference strategy used, thus taking care of the
query anomaly from the introduction. The last condition is needed in order to
guarantee that the instances from the semantics are models for the egds. Need
to mention here that by renouncing to Condition 3 from the previous defini-
tion we obtain the semantics defined in [9] in the context of exchange recovery.
The following theorem reveals the complexities for the solution existence (Is
J(I, Σ)Kinf = ∅? for a fixed set Σ) and solution check (Is J ∈ J(I, Σ)Kinf = ∅ for
a fixed set Σ) problems.
Theorem 1. For a fixed set of s-t tgds the solution-existence problem can be
solved in polynomial time and the solution-check is an NP-complete problem
under inference-based semantics.
3.5   Bidirectional constraints
Arenas et al. in [4] considered another approach to the certain answer anomaly
problem by changing the language used to express the schema mapping. For
this, the authors proposed the language of bidirectional constraints. Where a
bidirectional constraints is a FO sentence of the form: ∀x̄ α(x̄) ↔ β(x̄);, where α
and β are FO formulae over atoms from the source and target schema respectively
with free variables x̄. If the language of α is LS and of β is LT , then we are talking
about a hLS , LT i-dependency. With this, given a source instance I and Σ ↔ a
set of hLS , LT i-dependencies, the bidirectional semantics is defined as:

                   J(I, Σ ↔ )K↔ :={J ∈ Inst(T) : I ∪ J |= Σ ↔ }.                   (2)

This approach did solve most of the anomalies related to the other semantics.
Unfortunately, this is achieved at a high cost, as even the most common data-
exchange problems became non-tractable. For example, testing if the semantics
is empty is an NP-hard problem [4] even for a set of hCQ, CQi-dependencies.
    Another issue with bidirectional semantics is that there are simple unidi-
rectional mappings for which neither of the presented closed-world semantics
are not expressible using bidirectional constraints without changing the target
schema, as shown in the example below.
Example 2. Consider source schema with two binary relations P F T E, for projects
and the full-time employees assigned on the project, and P T , that contains the
tasks associated with each project. Let target schema consist of a binary rela-
tion P E, for projects and employee assigned to that project, and binary relation
T M , for employee and the task they manage. Consider source instance I, with
P F T E I = {(hr, adam)} and P T I = {(hr, comp)}, stating that full-time em-
ployee ‘adam‘ works on the ‘hr‘ project and the ’hr’ project consists of one task
‘comp’. Consider the following mapping Σ stating that each full-time employee
working on a project is also an employee working on the project (ξ1 ) and that for
each project task there exists an employee working on that project that manages
that task (ξ2 ).
               ξ1 : P F T E(pid, eid) → P E(pid, eid)
              ξ2 : P T (pid, tid) → ∃eid P E(pid, eid), T M (eid, tid).
    It is easy to observe that for any set Σ ↔ of bidirectional constraints there
is no possibility to differentiate between the tuples from relation P E as being
inferred from P F T E or P T source relation. Thus, for J1 = {P E(hr, adam)},
J2 = J1 ∪ {T M (adam, comp)} and J3 = J1 ∪ {P E(hr, sal), T M (sal, comp)}, we
have that either J1 ∈ J(I, Σ ↔ )K↔ or J2 ∈  / J(I, Σ ↔ )K↔ or J3 ∈
                                                                 / J(I, Σ ↔ )K↔ , where
“sal” is a consultant not a full-time employee. On the other hand, we have that
J1 ∈/ J(I, Σ)K∗ and J2 , J3 ∈ J(I, Σ)K∗ , for any semantics ∗ ∈ {cwa, gcwa∗ , inf}.

4     Universal Representatives
As mentioned in the introduction, data exchange transforms a database existing
under a source schema into another database under the target schema. This
means that for a given semantics it would be preferable to be able to materialize
one or more table representations on a target. The materialized table(s) could
later be used to obtain answers for different queries over the target database.
In this section we will introduce a new type of table capable of representing all
solutions for the inference-based semantics. Thus this new table can be used to
obtain the certain answers for any FO query.
Definition 2. Let Nulls be partitioned in two infinite sets Nullso and Nullsc . A
semi-naı̈ve table is a naı̈ve table T for which each null is identified as being either
from Nullso or Nulls[ c
                     n . The semi-naı̈ve table T has the following interpretation:
                         vi (T )) : v valuation over Nullsc , vi valuation over Nullso
           
 Rep(T ):= J = v(
                       i←1

   The nulls from Nullso are called open and denoted ⊥o (possibly subscripted).
The ones from Nullsc are called closed and denoted ⊥c (possibly subscripted).
Example 3. Let T be the semi-naı̈ve table with RT = {(a, ⊥o1 , ⊥c1 , ⊥c2 )}. We have
I1 , I2 , I3 ∈ Rep(T ), where RI1 = {(a, a, b, c)}, RI2 = {(a, a, b, c), (a, b, b, c)} and
RI3 = {(a, a, b, a), (a, b, b, a), (a, c, b, a)}. For RI4 = {(a, a, b, c), (a, a, b, d)}, we
have that I4 ∈  / Rep(T ), because closed null ⊥c2 was valued to both c and d.
    To a semi-naı̈ve table T we add a global condition ϕ∗ , denoted (T, ϕ∗ ), as a
conjunction δ1 ∧ δ2 ∧ . . . ∧ δn where each conjunct δi , 1 ≤ i ≤ n, is a disjunc-
tion of unequalities over the elements from dom(T ). Given v a valuation over
Nullsc and v1 , v2 , . . . , vn valuations over Nullso , for some integer n, we say that
(v, {v1 , v2 , . . . , vn }) satisfies x 6= y, denoted (v, {v1 , v2 , . . . , vn }) |= (x 6= y), iff:
 – v(x) 6= a, when x ∈ Nullsc and y = a ∈ Cons;
 – vi (x) 6= a, for all i ≤ n, when x ∈ Nullso and y = a ∈ Cons;
 – v(x) 6= vi (y), for all i ≤ n, when x ∈ Nullsc and y ∈ Nullso ;
 – v(x) 6= v(y), when x, y ∈ Nullsc ; and
 – vi (x) 6= vj (y), for all i, j ≤ n, when x, y ∈ Nullso .
    The previous notion is naturally extended to a disjunction of unequalities and
to the conjunctive formula ϕ∗ where each conjunct represents a disjunction of
unequalities. This is denoted (v, {v1 , v2 , . . . , vn }) |= ϕ∗ . With this we can define:
                       [n
Rep(T, ϕ∗ ):= J = v(      vi (T )) : n an integer, v valuation over Nullsc ,
              
                           i←1
                                  vi valuation over Nullso and (v, {v1 , . . . , vn }) |= ϕ∗
Example 4. Consider (T, ϕ∗ ), where T is the same as in Example 3 and global
condition ϕ∗ :=(⊥o1 6= a ∨ ⊥c2 6= ⊥o1 ). It can be verified that I1 , I2 ∈ Rep(T, ϕ∗ )
and I3 ∈/ Rep(T, ϕ∗ ) because the tuple R(a, a, b, a) in I3 was obtained from
valuations v = {⊥c1 /b, ⊥c2 /a} and v1 = {⊥o1 /a} and (v, {v1 }) 6|= ϕ∗ .
    For the next result, let’s define the notion of safe egds. For a set Σ of s-t tgds
and egds we say that it contains safe egds if each variable that occurs more
than once in the body of an egd it occurs only in positions different than aff(Σ).
    The following result shows that for any set Σ of s-t tgds and safe egds there
exists an exact representation for the inference-based semantics.
Theorem 2. Let Σ be a set of s-t tgds and safe egds. Then one may compute
in polynomial time (for a fixed Σ) (T, ϕ∗ ) such that J(I, Σ)Kinf = Rep(T, ϕ∗ ).
    The pair (T, ϕ∗ ) from the previous theorem is called universal representative
for I and Σ. In Theorem 2 the universal representative is computed using a
3-step chase algorithm (for a detailed description please check the full version of
the paper).
Example 5. Consider Σ = {ξ1 , ξ2 , ξ3 }, where:
                        ξ1 :   S(x, y) → K(x, z), V (z, y);
                        ξ2 :     R(x) → U (x, y);
                        ξ3 :   U (x, y), K(x, z) → y = z.

Let instance I be S I = {(a, b), (c, d)} and RI = {(a)}. In this case a universal
representative for I and Σ is the pair (T, ϕ∗ ), where K T = {(a, ⊥c1 ), (c, ⊥o1 )},
V T = {(⊥c1 , b), (⊥o1 , d)}, U T = {(a, ⊥c1 )} and ϕ∗ :=(⊥c1 6= ⊥o1 ).


5   Query Answering
If in the previous section we showed how we can compute universal represen-
tatives for inference-based semantics. in this section we will focus on when and
how these representatives can be used to compute certain answers for different
query classes and check the complexities of such evaluations. Let us first start
by defining the certain-answer evaluation problem.

                Problem Evalinf (Σ,Q)
                Input:    I ∈ Inst(S) and t̄ ∈ (dom(I))arity(t̄) .
                Question: Is t̄ ∈ certinf (Q, (I, Σ))?

    The following theorem ensures that for any mapping Σ of s-t tgds and safe
egds the OWA and inference-based semantics agrees on UCQ certain-answers
for any source instance.
Theorem 3. Let Σ be a set of s-t tgds and safe egds . Then for any source
instance I and q ∈ UCQ we have certowa (Q, (I, Σ)) = certinf (Q, (I, Σ)) and one
may use the universal representative for I and Σ to compute certinf (Q, (I, Σ))
in polynomial time (for a fixed Σ).
    The following negative result shows that not all queries are tractable under
inference-based semantics.
Theorem 4. There exists a set Σ of s-t tgds and safe egds and there exists a
query Q ∈ CQ¬ such that the problem Evalinf (Σ,Q) is coNP-complete.
   From this it follows that for tractable query evaluation under inference-based
semantics we need to restrict either the set Σ or the query class used or both.
In the last part of this section we will present such restrictions that ensure
tractability for certain answers evaluation under inference-based semantics.
Proposition 1. Let Σ be a set of full s-t tgds and safe egds. Then for any FO
query Q the Evalinf (Σ,Q) is tractable and one may use the universal represen-
tative to compute it.

   With UCQ6=,n is denoted the set of UCQ6= queries with at most n unequalities
per disjunct.

Theorem 5. Let Σ be a set of s-t tgds and safe egds. Then for any UCQ6=,1
query Q the Evalinf (Σ,Q) problem is tractable and one may use only the univer-
sal representative to evaluate the query. And the Evalinf (Σ,Q) for q ∈ UCQ6=,2
is coNP-complete.

    In [12] Hernich showed that if the mapping is given by a restricted set of
s-t tgds (packed s-t tgds), then the certain answers evaluation problem may be
answered in polynomial time for universal queries under the GCWA∗ -semantics.
Where a universal query is one of the form Q(x̄):=∀ȳ β(x̄, ȳ), with β a quantifier-
free FO formula over the target schema. In our next result we show that similar
polynomial time can be achieved under the inference-based semantics even with-
out any restriction on the s-t tgds and also by adding safe target egds.

Theorem 6. Let I be a source instance and Σ a set of s-t tgds and safe egds.
Then for any universal query Q, Evalinf (Σ,Q) can be solved in PTIME.

   Next we will present a subclass of CQ¬ that has tractable query evaluation
properties for a restricted class of s-t tgds and safe egds. Let CQ¬,1 denote the
subclass of CQ¬ such that each query of this class has exactly one positive atom.

Theorem 7. Let Σ be a set of s-t tgds and safe egds, such that for all s-t
tgds each existentally quantified variable occurs in only one atom and each egd
does not equate two variables both occurring in affected positions. Then for any
CQ¬,1 query the Evalinf (Σ, Q) problem is polynomial and can be decided using
a universal representative.

   Intuitively, the restrictions on the mapping language from the previous theo-
rem ensure that the universal representative does not have any global condition
and it contains only open nulls.


6   Conclusions

In this paper we introduced the inference-based semantics for data exchange.
We showed that the inference-based semantics solves most of the certain-answers
anomalies existing in the existing semantics and that one may compute a univer-
sal representative that exactly represents the semantics. For the certain answer
semantics it remained an open problem if one can evaluate certain-answers for
any CQ¬,1 queries in polynomial time for any Σ and not only for the restricted
class of dependencies presented here. As further work we intend to increase the
language for this semantics to included target tgds too.
References
 1. Serge Abiteboul, Richard Hull, and Victor Vianu. Foundations of Databases.
    Addison-Wesley, 1995.
 2. Serge Abiteboul, Paris C. Kanellakis, and Gösta Grahne. On the representation
    and querying of sets of possible worlds. In SIGMOD Conference, pages 34–48,
    1987.
 3. Marcelo Arenas, Pablo Barceló, Ronald Fagin, and Leonid Libkin. Locally con-
    sistent transformations and query answering in data exchange. In Proceedings of
    the Twenty-third ACM SIGACT-SIGMOD-SIGART Symposium on Principles of
    Database Systems, June 14-16, 2004, Paris, France, pages 229–240, 2004.
 4. Marcelo Arenas, Gabriel Diéguez, and Jorge Pérez. Expressiveness and complexity
    of bidirectional constraints for data exchange. In Proceedings of the 8th Alberto
    Mendelzon Workshop on Foundations of Data Management, Cartagena de Indias,
    Colombia, June 4-6, 2014., 2014.
 5. Marcelo Arenas, Jorge Pérez, and Juan L. Reutter. Data exchange beyond complete
    data. In PODS, pages 83–94, 2011.
 6. Alin Deutsch, Alan Nash, and Jeffrey B. Remmel. The chase revisited. In PODS,
    pages 149–158, 2008.
 7. Ronald Fagin, Phokion G. Kolaitis, Renée J. Miller, and Lucian Popa. Data ex-
    change: semantics and query answering. Theor. Comput. Sci., 336(1):89–124, 2005.
 8. Ronald Fagin, Phokion G. Kolaitis, and Lucian Popa. Data exchange: getting to
    the core. ACM Trans. Database Syst., 30(1):174–210, 2005.
 9. Gösta Grahne, Ali Moallemi, and Adrian Onet. Recovering exchanged data. In
    Proceedings of the 34th ACM Symposium on Principles of Database Systems, PODS
    2015, Melbourne, Victoria, Australia, May 31 - June 4, 2015, pages 105–116, 2015.
10. Gösta Grahne and Adrian Onet. Representation systems for data exchange. In
    ICDT, pages 208–221, 2012.
11. André Hernich. Foundations of query answering in relational data exchange. PhD
    thesis, 2010.
12. André Hernich. Answering non-monotonic queries in relational data exchange.
    Logical Methods in Computer Science, 7(3), 2011.
13. André Hernich. Computing universal models under guarded tgds. In ICDT, pages
    222–235, 2012.
14. André Hernich and Nicole Schweikardt. Cwa-solutions for data exchange settings
    with target dependencies. In PODS, pages 113–122, 2007.
15. Leonid Libkin. Data exchange and incomplete information. In PODS, pages 60–69,
    2006.
16. Jack Minker. On indefinite databases and the closed world assumption. In CADE,
    pages 292–308, 1982.
17. Christos H. Papadimitriou. Computational complexity. Addison-Wesley, 1994.