=Paper=
{{Paper
|id=Vol-1644/paper7
|storemode=property
|title=Inference-Based Semantics in Data Exchange
|pdfUrl=https://ceur-ws.org/Vol-1644/paper7.pdf
|volume=Vol-1644
|authors=Adrian Onet
|dblpUrl=https://dblp.org/rec/conf/amw/Onet16
}}
==Inference-Based Semantics in Data Exchange==
Inference-based semantics in Data Exchange Adrian Onet Morgan Stanley, Montreal, Canada adrian.onet@morganstanley.com Abstract. Data Exchange is an old problem that was firstly studied from a theoretical point of view only in 2003. Since then many approaches were considered when it came to the language describing the relationship between the source and the target schema. These approaches focus on what it makes a target instance a “good” solution for data-exchange. In this paper we propose the inference-based semantics that solves many certain-answer anomalies existing in current data-exchange semantics. We show that in case of mappings represented by source-to-target tgds and safe target egds we may compute in polynomial time a target table (universal representative) able to exactly represent the inference-based semantics. We show that the new semantics agrees with the other se- mantics when it comes to UCQ queries. Finally we show that one may use the universal representative to compute certain-answers in tractable time for large class of non-UCQ queries even for a subclass of UCQ¬ . 1 Introduction The data-exchange problem is that of transforming a database existing under a source schema into another database under a different target schema. This database transformation is based on mappings that describe the relationship between the source and the target database. A mapping M can be viewed as a, possibly infinite, set of pairs (I, J), where I is a source instance and J a target instance. In this case, J is called a data-exchange solution for I and M . The mapping between the source and the target database is usually specified in some logic formalism. The most widely accepted mapping language is the one based on sets of tuple-generating dependencies (tgds) and equality-generating dependencies (egds). It is very common to query the target solutions. This takes us to the prob- lem of querying incomplete databases [2]. The exact answers semantics defines the answer to a query over an incomplete database as the set of all answers to the query for each instance from its semantics. This approach is rarely feasible in practice, therefore two approximations are commonly accepted: certain an- swer (answers occurring for all solutions) and maybe answer semantics (answers occurring for some solutions). Based on the interpretation of the mapping language there may be sev- eral semantics in data exchange. The first semantics was introduced in [7] and it is here referred to as the OWA-semantics (open world assumption). In or- der to improve the certain answer behavior for non-UCQ queries under OWA- semantics, Libkin [15] introduced the first closed-world semantics in data ex- change, known as CWA-semantics. This semantics was later on extended by Hernich and Schweikardt [14] to include target tgds. In CWA-semantics a so- lution incorporates only tuples that are “justified” by the source and the de- pendencies. Later on Hernich [11] introduced the GCWA∗ -semantics for data exchange. Unfortunately most of these semantics have a rather strange behavior when looking for certain answers over the set of solutions. More recently Arenas et al. [4, ?] introduced a new semantics for data-exchange based on bidirectional constraints. This new semantics solves most of the query anomalies present in the other semantics but it comes with price of non-tractable data complexity even for the simplest data-exchange problems. To better understand the type of anomalies we may encounter under these se- mantics, consider the next simple example. A company has in the source schema a binary relation P representing the relationship between projects and employ- ees. After some reorganization it was realized that each projects financing should be provided by one or more cost-centers, each employee belonging to one of these cost-centers. With this the company creates the target schema with two binary relations P C and CE representing the project to cost-center and cost-center to employee relationship respectively. In the case of the unidirectional semantics the process can be specified by tgd ξ1 : ∀p ∀e P (p, e) → ∃cc P C(p, cc) ∧ CE(cc, e). Let source I be P I = {(p1 , e1 ), (p1 , e2 ), (p2 , e3 )}, stating that there are two employees e1 and e2 working on project p1 and only employee e3 working on project p2 . Consider query: Q:=∀p ∀cc P C(p, cc) ∧ CE(cc, “e3 “) → p = “p2 “ (Is e3 involved only in project p2 ?). Let instance J be P C J = {(p1 , cc1 ), (p2 , cc1 )} and CE J = {(cc1 , e1 ), (cc1 , e2 ), (cc1 , e3 )}. Because J is part of all aforemen- tioned unidirectional semantics, the certain answer for the given query will be un-intuitively false. Naturally one may expect this answer to be true, as there is no indication from the source that employee e3 is involved in project p1 . Motivated by certain answer anomalies in the current data-exchange seman- tics, in this paper we propose a new data-exchange semantics based on logical inference for mappings represented by sets of s-t tgds and safe target egds. This semantics eliminates most anomalies related to certain-answers and keeps the same certain answers with the other semantics for union of conjunctive queries. We will also show that under this settings for any source instance one may com- pute a target table representation that may be queries to obtain certain-answers for any FO query. Finally we will review some complexity results regarding the certain-answers under this new semantics and present large classes of FO queries for which certain-answers can be computed in polynomial time. More details, examples and proofs can be found in the full version of this paper. 2 Preliminaries This section reviews the basic technical preliminaries and definitions. More infor- mation on relational database theory can be obtained from [1]. We will consider the complexity classes P, NP and coNP. For the definition of these classes we refer to [17]. A finite mapping f , where f (ai ) = bi , for i ∈ {1, . . . , n}, will be represented as {a1 /b1 , a2 /b2 , . . . , an /bn }. When it is clear from the context f , it will be also viewed as the following formula a1 = b1 ∧ a2 = b2 ∧ . . . ∧ an = bn . For a mapping f and set A, with f |A will denote the mapping f restricted to values from A. By abusing the notation, a vector x̄ = (x1 , x2 , . . . , xn ) will be often viewed as the set {x1 , x2 , . . . , xn }, thus we may have set operations like xi ∈ x̄ or x̄ ∩ ȳ. Databases. A schema S is a finite set {S1 , S2 , . . . , Sn } of relational symbols, each symbol Si having a fixed arity arity(Si ). Let Cons, Nulls and Vars be three countably infinite sets of constants, nulls and variables such that there are no common elements between any two of these sets. Elements from Cons are sym- bolized by lower case (possibly subscripted) characters from the beginning of the alphabet (e.g. a, b1 ). Elements from Vars are represented by lower case (pos- sibly subscripted) characters from the end of the alphabet (e.g. z, x2 ). Each element from the countable set Nulls is represented with, possible subscripted, symbol ⊥ (e.g. ⊥i ). A naı̈ve table T of S is an interpretation that assigns to each relational symbol Si a finite set SiT ⊂ (Cons ∪ Nulls)arity(Si ) , sometimes we also view Si as a relation between elements of Cons ∪ Nulls. The set dom(T ) means all elements that occur in T , clearly dom(T ) ⊆ Cons ∪ Nulls. A naı̈ve table T is called an instance if dom(T ) ⊂ Cons. In contrast to general naı̈ve tables, which are identified by capitalized characters from the end of the al- phabet (e.g. T , V ), instances are represented by capitalized characters from the middle of the alphabet (e.g. I, J). The set of all instances over schema S is denoted Inst(S). A valuation v is a mapping over the set Cons ∪ Nulls such that v(a) = a, for all a ∈ Cons, and v(⊥) ∈ Cons, for all ⊥ ∈ Nulls. Valuations are extended to tuples and naı̈ve tables as follows. For each tuple t̄ = (t1 , t2 , . . . , tn ), let v(t̄):=(v(t1 ), v(t2 ), . . . , v(tn )); and for a naı̈ve table T over schema S, define v(T ) as Rv(T ) :={v(t̄) : t̄ ∈ RT }, for all R ∈ S. The interpretation of a naı̈ve table T is given by Rep(T ):={v(T ) : v valuation}. Schema mappings. A data-exchange schema mapping is a triple M = (S, T, Σ), where S and T are two disjoint schemas named the source and target schema re- spectively; Σ is a set of formulae expressing the relationship between the source and the target database. Most commonly, Σ is represented by a set of source- to-target tuple-generating dependencies (s-t tgds) and target equality-generating dependencies (egds). Where a source-to-target tuple-generating dependency is a FO sentence ξ of the form: ∀x̄ (∀ȳ α(x̄, ȳ) → ∃z̄ β(x̄, z̄)), where x̄, ȳ and z̄ are vectors of variables from Vars; α(x̄, ȳ) (often referred to as the body of the tgd) is a conjunction of atoms over the source schema; and β(x̄, z̄) (often referred to as the head of the tgd) is a conjunction of atoms over the target schema. An equality- generating dependency is a FO sentence ξ of the form: ∀x̄ (α(x̄) → x = y), where α(x̄) is a conjunction of atoms over the target schema and x,y are variables from the vector x̄. A source instance I ∈ Inst(S) and a target instance J ∈ Inst(T) are said to satisfy a s-t tgd ξ, denoted (I, J) |= ξ; if I ∪ J is a model of ξ in the model-theoretic sense. Similarly, a target instance J satisfies an egd ξ, denoted J |= ξ, if J is a model for ξ in the model-theoretic sense. This is extended to a set of s-t tgds and egds Σ by stipulating that (I, J) |= ξ, for all s-t tgd ξ ∈ Σ and J |= ξ, for all egd ξ ∈ Σ. A position in Σ is a pair (R, i), where R is a relational symbol and 1 ≤ i ≤ arity(R). A position (R, i) is said to be affected in Σ if it holds an existentially quantified variable somewhere in Σ. With aff(Σ) is denoted the set of all affected positions in Σ. When the schemata are known or not relevant in the context, we usually interchange the notion of schema mapping and the set of dependencies that defines it. A data-exchange semantics O associates for a schema mapping M = (S, T, Σ) and a source instance I a possible infinite set of target instances J(I, M )KO . We refer to each element of J(I, M )KO as a solution for I and M under semantics O. Queries. CQ, UCQ, UCQ¬ and UCQ6= denote the classes of conjunctive queries, union of conjunctive queries, union of conjunctive queries with negation and union of conjunctive queries with unequalities, respectively. For complete defini- tions of these classes, please refer to [1]. For a given data-exchange semantics O, schema mapping M and source instance T I, the certain answers for a given query Q is defined as: certO (Q, (I, M )) := J∈J(I,M )KO Q(J). 3 Data-exchange semantics As mentioned in the introduction, there are many semantics considered in data- exchange. In this section we briefly review the most prominent of these semantics. In Section 3.4 we introduce a new semantics for mappings specified by sets of s-t tgds and egds that addresses many of certain-answers anomalies occurring under the current semantics. In the final part of this section we will review the semantics based on bidirectional constraints. 3.1 OWA Data Exchange The OWA-Semantics is the first semantics considered in data exchange [7]. This is, by far, the most studied [7, 3, 8, 6, 5, 10, 13]. Under this semantics, given a data-exchange mapping specified by Σ a set of tgds and egds and given a source instance I, the OWA-semantics for I under Σ is defined as: J(I, Σ)Kowa := {J ∈ Inst(T) : I ∪ J |= Σ}. (1) 3.2 CWA Data Exchange To outcome the counter-intuitive behavior under the OWA-semantics Libkin [15] introduced the CWA-semantics for mappings specified by a set of s-t tgds. Her- nich and Schweikardt [14] extended the semantics by adding target tgds. Given a set Σ of s-t tgds and a source instance I a CWA-solution for I and Σ is defined as any naı̈ve table T over the target schema that satisfies the following three requirements: 1) each null from dom(T ) is justified by some tuples from I and a s-t tgd from Σ; 2) each justification for nulls is used only once; and 3) each fact in T is justified by I and Σ. The CWA-semantics for certain-answers is defined as: J(I, Σ)Kcwa :={J ∈ Rep(T ) : T is a CWA-solution for I under Σ}. Examples of certain-answer anomalies under CWA-semantics can be found in [12]. 3.3 GCWA∗ Data Exchange The GCWA∗ semantics was inspired from Minker’s [16] GCWA- semantics and nicely adapted by Hernich [12] for data exchange. For a source instance I and Σ a set of s-t tgds and egds with Solmin (I, Σ) we denote all the subset-minimal target instances J such that I ∪ J |= Σ. The CGWA∗ -semantics is defined as: n [ J(I, Σ)Kgcwa∗ := J : J = Jn for some n, Ji ∈ Solmin (I, Σ) and I ∪ J |= Σ i←1 Even if the GCWA∗ semantics solves all aforementioned anomalies, it introduces new ones exemplified hereafter. Example 1. Consider source instance with binary relation DeptC for depart- ments and names of consultant employees working in that department and ternary relation DeptF T E for departments and full-time employees (name and id) from the given department. Suppose the company hires all the consultants as full-time employees, thus the target schema will be the ternary relation DeptEmp with the same structure as DeptF T E. The exchange mapping is represented as: DeptC(did, name) → ∃eid DeptEmp(did, name, eid); DeptF T E(did, name, eid) → DeptEmp(did, name, eid). Consider source instance with consultants “john” and “adam” part of the “hr” department and full-time employee “adam” with employee id 1 part of “hr”. Let target query be: Is there exactly one employee named adam in hr department?. Under the CGWA∗ -semantics the query will return the counter-intuitive answer true, even if based on the source instance and given mapping one would expect the answer to be false, because beside full-time employee “adam” there maybe a consultant named “adam” in the “hr” department. 3.4 Inference-based semantics In order to avoid the certain-answer anomalies presented in this section and in the introduction, we present a new closed-world semantics for mappings specified by a set of s-t tgds and egds. Let ξ : α(x̄, ȳ) → ∃z̄ β(x̄, z̄) be a s-t tgd and I a source instance. A set of facts J 0 is said to be inferred from I and ξ with function f denoted with ξ I−→f J 0 if f (α(x̄, ȳ)) ⊆ I and f (β(x̄, z̄)) = J 0 . Tuple t ∈ J 0 is said to be strongly inferred from I and ξ with function f if t ∈ f |x̄ (β(x̄, z̄)), otherwise we say that t is weakly inferred. Intuitively a tuple t is strongly inferred from I and ξ with f if t does not depend on the assignment given by f to existential variables from ξ. Let Σ be a set of s-t tgds, I a source instance and J a target instance. A function κ that assigns for each ξ ∈ Σ a set of functions {f1 , f2 , . . . , fn } such ξ that I −→fi Ji ⊆ J is called an inference strategy for I and J with Σ. Note that the function that allocates the empty set for each ξ ∈ Σ is an inference strategy for any I and Σ. Given an inference strategy κ and ξ ∈ Σ, a tuple t ∈ J is said to be: strongly inferred with strategy κ for ξ if there exist f ∈ κ(ξ) such that t is strongly inferred from I and σ with f ; not inferred with strategy κ for ξ if there ξ is no f ∈ κ(ξ) with I −→f J 0 3 t; weakly inferred with strategy κ for ξ otherwise. A tuple t is said to be inferred with strategy κ for I, Σ and J if there exists ξ ∈ Σ such that t is strongly or weakly inferred with κ for ξ. With this we are now ready to introduce the inferred-based semantics. Definition 1. Given a source instance I and Σ a set of s-t tgds and egds the inference-based semantics for I and Σ, denoted with J(I, Σ)Kinf , is the set of all target instances J for which there exists an inference strategy κ such that: 1. Every tuple t ∈ J is inferred with κ for I, Σ and J; 2. For every ξ : α → β ∈ Σ and every function f with f (α) ⊆ I there exists f 0 ∈ κ(ξ) such that f 0 is an extensionSof f ; 3. For every ξ : α → β ∈ Σ and Jκ,ξ = J , there is no function →g Jg ,g∈κ(ξ) g ξ I− f with f (β) ∈ Jκ,ξ and f (α) 6⊆ I, where f (β) contains at least one weakly inferred tuple with strategy κ for ξ. 4. (I, J) |= Σ in the model-theoretic sense. Intuitively the first rule from the definition states that all tuples in the target instance needs to be inferred from the source instance and the mapping, this is taking care of the tuples not inferred present under OWA-semantics. This also allows the same nulls to be matched to different constant as long as there exists an inference for each of these. This takes care of the query anomalies present under CWA-semantics. The second condition makes sure that all possible source triggers are fired. With this all tuples that can be inferred will be present in at least in one instance in the semantics, this solves the anomaly presented in Ex- ample 1 for GCWA∗ . The third condition ensures that the assignment of nulls does not contradict with the inference strategy used, thus taking care of the query anomaly from the introduction. The last condition is needed in order to guarantee that the instances from the semantics are models for the egds. Need to mention here that by renouncing to Condition 3 from the previous defini- tion we obtain the semantics defined in [9] in the context of exchange recovery. The following theorem reveals the complexities for the solution existence (Is J(I, Σ)Kinf = ∅? for a fixed set Σ) and solution check (Is J ∈ J(I, Σ)Kinf = ∅ for a fixed set Σ) problems. Theorem 1. For a fixed set of s-t tgds the solution-existence problem can be solved in polynomial time and the solution-check is an NP-complete problem under inference-based semantics. 3.5 Bidirectional constraints Arenas et al. in [4] considered another approach to the certain answer anomaly problem by changing the language used to express the schema mapping. For this, the authors proposed the language of bidirectional constraints. Where a bidirectional constraints is a FO sentence of the form: ∀x̄ α(x̄) ↔ β(x̄);, where α and β are FO formulae over atoms from the source and target schema respectively with free variables x̄. If the language of α is LS and of β is LT , then we are talking about a hLS , LT i-dependency. With this, given a source instance I and Σ ↔ a set of hLS , LT i-dependencies, the bidirectional semantics is defined as: J(I, Σ ↔ )K↔ :={J ∈ Inst(T) : I ∪ J |= Σ ↔ }. (2) This approach did solve most of the anomalies related to the other semantics. Unfortunately, this is achieved at a high cost, as even the most common data- exchange problems became non-tractable. For example, testing if the semantics is empty is an NP-hard problem [4] even for a set of hCQ, CQi-dependencies. Another issue with bidirectional semantics is that there are simple unidi- rectional mappings for which neither of the presented closed-world semantics are not expressible using bidirectional constraints without changing the target schema, as shown in the example below. Example 2. Consider source schema with two binary relations P F T E, for projects and the full-time employees assigned on the project, and P T , that contains the tasks associated with each project. Let target schema consist of a binary rela- tion P E, for projects and employee assigned to that project, and binary relation T M , for employee and the task they manage. Consider source instance I, with P F T E I = {(hr, adam)} and P T I = {(hr, comp)}, stating that full-time em- ployee ‘adam‘ works on the ‘hr‘ project and the ’hr’ project consists of one task ‘comp’. Consider the following mapping Σ stating that each full-time employee working on a project is also an employee working on the project (ξ1 ) and that for each project task there exists an employee working on that project that manages that task (ξ2 ). ξ1 : P F T E(pid, eid) → P E(pid, eid) ξ2 : P T (pid, tid) → ∃eid P E(pid, eid), T M (eid, tid). It is easy to observe that for any set Σ ↔ of bidirectional constraints there is no possibility to differentiate between the tuples from relation P E as being inferred from P F T E or P T source relation. Thus, for J1 = {P E(hr, adam)}, J2 = J1 ∪ {T M (adam, comp)} and J3 = J1 ∪ {P E(hr, sal), T M (sal, comp)}, we have that either J1 ∈ J(I, Σ ↔ )K↔ or J2 ∈ / J(I, Σ ↔ )K↔ or J3 ∈ / J(I, Σ ↔ )K↔ , where “sal” is a consultant not a full-time employee. On the other hand, we have that J1 ∈/ J(I, Σ)K∗ and J2 , J3 ∈ J(I, Σ)K∗ , for any semantics ∗ ∈ {cwa, gcwa∗ , inf}. 4 Universal Representatives As mentioned in the introduction, data exchange transforms a database existing under a source schema into another database under the target schema. This means that for a given semantics it would be preferable to be able to materialize one or more table representations on a target. The materialized table(s) could later be used to obtain answers for different queries over the target database. In this section we will introduce a new type of table capable of representing all solutions for the inference-based semantics. Thus this new table can be used to obtain the certain answers for any FO query. Definition 2. Let Nulls be partitioned in two infinite sets Nullso and Nullsc . A semi-naı̈ve table is a naı̈ve table T for which each null is identified as being either from Nullso or Nulls[ c n . The semi-naı̈ve table T has the following interpretation: vi (T )) : v valuation over Nullsc , vi valuation over Nullso Rep(T ):= J = v( i←1 The nulls from Nullso are called open and denoted ⊥o (possibly subscripted). The ones from Nullsc are called closed and denoted ⊥c (possibly subscripted). Example 3. Let T be the semi-naı̈ve table with RT = {(a, ⊥o1 , ⊥c1 , ⊥c2 )}. We have I1 , I2 , I3 ∈ Rep(T ), where RI1 = {(a, a, b, c)}, RI2 = {(a, a, b, c), (a, b, b, c)} and RI3 = {(a, a, b, a), (a, b, b, a), (a, c, b, a)}. For RI4 = {(a, a, b, c), (a, a, b, d)}, we have that I4 ∈ / Rep(T ), because closed null ⊥c2 was valued to both c and d. To a semi-naı̈ve table T we add a global condition ϕ∗ , denoted (T, ϕ∗ ), as a conjunction δ1 ∧ δ2 ∧ . . . ∧ δn where each conjunct δi , 1 ≤ i ≤ n, is a disjunc- tion of unequalities over the elements from dom(T ). Given v a valuation over Nullsc and v1 , v2 , . . . , vn valuations over Nullso , for some integer n, we say that (v, {v1 , v2 , . . . , vn }) satisfies x 6= y, denoted (v, {v1 , v2 , . . . , vn }) |= (x 6= y), iff: – v(x) 6= a, when x ∈ Nullsc and y = a ∈ Cons; – vi (x) 6= a, for all i ≤ n, when x ∈ Nullso and y = a ∈ Cons; – v(x) 6= vi (y), for all i ≤ n, when x ∈ Nullsc and y ∈ Nullso ; – v(x) 6= v(y), when x, y ∈ Nullsc ; and – vi (x) 6= vj (y), for all i, j ≤ n, when x, y ∈ Nullso . The previous notion is naturally extended to a disjunction of unequalities and to the conjunctive formula ϕ∗ where each conjunct represents a disjunction of unequalities. This is denoted (v, {v1 , v2 , . . . , vn }) |= ϕ∗ . With this we can define: [n Rep(T, ϕ∗ ):= J = v( vi (T )) : n an integer, v valuation over Nullsc , i←1 vi valuation over Nullso and (v, {v1 , . . . , vn }) |= ϕ∗ Example 4. Consider (T, ϕ∗ ), where T is the same as in Example 3 and global condition ϕ∗ :=(⊥o1 6= a ∨ ⊥c2 6= ⊥o1 ). It can be verified that I1 , I2 ∈ Rep(T, ϕ∗ ) and I3 ∈/ Rep(T, ϕ∗ ) because the tuple R(a, a, b, a) in I3 was obtained from valuations v = {⊥c1 /b, ⊥c2 /a} and v1 = {⊥o1 /a} and (v, {v1 }) 6|= ϕ∗ . For the next result, let’s define the notion of safe egds. For a set Σ of s-t tgds and egds we say that it contains safe egds if each variable that occurs more than once in the body of an egd it occurs only in positions different than aff(Σ). The following result shows that for any set Σ of s-t tgds and safe egds there exists an exact representation for the inference-based semantics. Theorem 2. Let Σ be a set of s-t tgds and safe egds. Then one may compute in polynomial time (for a fixed Σ) (T, ϕ∗ ) such that J(I, Σ)Kinf = Rep(T, ϕ∗ ). The pair (T, ϕ∗ ) from the previous theorem is called universal representative for I and Σ. In Theorem 2 the universal representative is computed using a 3-step chase algorithm (for a detailed description please check the full version of the paper). Example 5. Consider Σ = {ξ1 , ξ2 , ξ3 }, where: ξ1 : S(x, y) → K(x, z), V (z, y); ξ2 : R(x) → U (x, y); ξ3 : U (x, y), K(x, z) → y = z. Let instance I be S I = {(a, b), (c, d)} and RI = {(a)}. In this case a universal representative for I and Σ is the pair (T, ϕ∗ ), where K T = {(a, ⊥c1 ), (c, ⊥o1 )}, V T = {(⊥c1 , b), (⊥o1 , d)}, U T = {(a, ⊥c1 )} and ϕ∗ :=(⊥c1 6= ⊥o1 ). 5 Query Answering If in the previous section we showed how we can compute universal represen- tatives for inference-based semantics. in this section we will focus on when and how these representatives can be used to compute certain answers for different query classes and check the complexities of such evaluations. Let us first start by defining the certain-answer evaluation problem. Problem Evalinf (Σ,Q) Input: I ∈ Inst(S) and t̄ ∈ (dom(I))arity(t̄) . Question: Is t̄ ∈ certinf (Q, (I, Σ))? The following theorem ensures that for any mapping Σ of s-t tgds and safe egds the OWA and inference-based semantics agrees on UCQ certain-answers for any source instance. Theorem 3. Let Σ be a set of s-t tgds and safe egds . Then for any source instance I and q ∈ UCQ we have certowa (Q, (I, Σ)) = certinf (Q, (I, Σ)) and one may use the universal representative for I and Σ to compute certinf (Q, (I, Σ)) in polynomial time (for a fixed Σ). The following negative result shows that not all queries are tractable under inference-based semantics. Theorem 4. There exists a set Σ of s-t tgds and safe egds and there exists a query Q ∈ CQ¬ such that the problem Evalinf (Σ,Q) is coNP-complete. From this it follows that for tractable query evaluation under inference-based semantics we need to restrict either the set Σ or the query class used or both. In the last part of this section we will present such restrictions that ensure tractability for certain answers evaluation under inference-based semantics. Proposition 1. Let Σ be a set of full s-t tgds and safe egds. Then for any FO query Q the Evalinf (Σ,Q) is tractable and one may use the universal represen- tative to compute it. With UCQ6=,n is denoted the set of UCQ6= queries with at most n unequalities per disjunct. Theorem 5. Let Σ be a set of s-t tgds and safe egds. Then for any UCQ6=,1 query Q the Evalinf (Σ,Q) problem is tractable and one may use only the univer- sal representative to evaluate the query. And the Evalinf (Σ,Q) for q ∈ UCQ6=,2 is coNP-complete. In [12] Hernich showed that if the mapping is given by a restricted set of s-t tgds (packed s-t tgds), then the certain answers evaluation problem may be answered in polynomial time for universal queries under the GCWA∗ -semantics. Where a universal query is one of the form Q(x̄):=∀ȳ β(x̄, ȳ), with β a quantifier- free FO formula over the target schema. In our next result we show that similar polynomial time can be achieved under the inference-based semantics even with- out any restriction on the s-t tgds and also by adding safe target egds. Theorem 6. Let I be a source instance and Σ a set of s-t tgds and safe egds. Then for any universal query Q, Evalinf (Σ,Q) can be solved in PTIME. Next we will present a subclass of CQ¬ that has tractable query evaluation properties for a restricted class of s-t tgds and safe egds. Let CQ¬,1 denote the subclass of CQ¬ such that each query of this class has exactly one positive atom. Theorem 7. Let Σ be a set of s-t tgds and safe egds, such that for all s-t tgds each existentally quantified variable occurs in only one atom and each egd does not equate two variables both occurring in affected positions. Then for any CQ¬,1 query the Evalinf (Σ, Q) problem is polynomial and can be decided using a universal representative. Intuitively, the restrictions on the mapping language from the previous theo- rem ensure that the universal representative does not have any global condition and it contains only open nulls. 6 Conclusions In this paper we introduced the inference-based semantics for data exchange. We showed that the inference-based semantics solves most of the certain-answers anomalies existing in the existing semantics and that one may compute a univer- sal representative that exactly represents the semantics. For the certain answer semantics it remained an open problem if one can evaluate certain-answers for any CQ¬,1 queries in polynomial time for any Σ and not only for the restricted class of dependencies presented here. As further work we intend to increase the language for this semantics to included target tgds too. References 1. Serge Abiteboul, Richard Hull, and Victor Vianu. Foundations of Databases. Addison-Wesley, 1995. 2. Serge Abiteboul, Paris C. Kanellakis, and Gösta Grahne. On the representation and querying of sets of possible worlds. In SIGMOD Conference, pages 34–48, 1987. 3. Marcelo Arenas, Pablo Barceló, Ronald Fagin, and Leonid Libkin. Locally con- sistent transformations and query answering in data exchange. In Proceedings of the Twenty-third ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, June 14-16, 2004, Paris, France, pages 229–240, 2004. 4. Marcelo Arenas, Gabriel Diéguez, and Jorge Pérez. Expressiveness and complexity of bidirectional constraints for data exchange. In Proceedings of the 8th Alberto Mendelzon Workshop on Foundations of Data Management, Cartagena de Indias, Colombia, June 4-6, 2014., 2014. 5. Marcelo Arenas, Jorge Pérez, and Juan L. Reutter. Data exchange beyond complete data. In PODS, pages 83–94, 2011. 6. Alin Deutsch, Alan Nash, and Jeffrey B. Remmel. The chase revisited. In PODS, pages 149–158, 2008. 7. Ronald Fagin, Phokion G. Kolaitis, Renée J. Miller, and Lucian Popa. Data ex- change: semantics and query answering. Theor. Comput. Sci., 336(1):89–124, 2005. 8. Ronald Fagin, Phokion G. Kolaitis, and Lucian Popa. Data exchange: getting to the core. ACM Trans. Database Syst., 30(1):174–210, 2005. 9. Gösta Grahne, Ali Moallemi, and Adrian Onet. Recovering exchanged data. In Proceedings of the 34th ACM Symposium on Principles of Database Systems, PODS 2015, Melbourne, Victoria, Australia, May 31 - June 4, 2015, pages 105–116, 2015. 10. Gösta Grahne and Adrian Onet. Representation systems for data exchange. In ICDT, pages 208–221, 2012. 11. André Hernich. Foundations of query answering in relational data exchange. PhD thesis, 2010. 12. André Hernich. Answering non-monotonic queries in relational data exchange. Logical Methods in Computer Science, 7(3), 2011. 13. André Hernich. Computing universal models under guarded tgds. In ICDT, pages 222–235, 2012. 14. André Hernich and Nicole Schweikardt. Cwa-solutions for data exchange settings with target dependencies. In PODS, pages 113–122, 2007. 15. Leonid Libkin. Data exchange and incomplete information. In PODS, pages 60–69, 2006. 16. Jack Minker. On indefinite databases and the closed world assumption. In CADE, pages 292–308, 1982. 17. Christos H. Papadimitriou. Computational complexity. Addison-Wesley, 1994.