Union and Intersection of Schema Mappings Jorge Pérez1, Reinhard Pichler2 , Emanuel Sallinger2 , and Vadim Savenkov2 1 Department of Computer Science, Universidad de Chile 2 Faculty of Informatics, Vienna University of Technology Abstract. Schema mappings have been extensively studied in database research over the past decade – notably in the areas of data exchange and data integration. Recently, the notion of an information transfer order on schema mappings has been introduced to compare the amount of source information that is actually transferred by two mappings. In this paper, we present two new operators: the union and intersection of mappings. The union of two mappings allows us to describe the sum of all information transferred by several mappings. The intersection refers to the common part of information transferred by several mappings. As one of our main results we prove that there exists a large class of mappings (containing the class of source-to-target tuple-generating dependencies) that forms a complete lattice with respect to these two operators. 1 Introduction Schema mappings allow us to describe the relationship between two schemas. As such, schema mappings have been extensively studied in Data Exchange [9] and Data Integration [12]. Bernstein and Melnik [5, 6, 15] have proposed several fundamental operators on schema mappings, with composition [14, 7, 16, 2] and inverse [8, 10, 4, 3] being the most prominent ones. Recently, new concepts have been introduced [1, 10] to compare schema mappings in terms of the amount of source information transferred by the mappings. In this work, we present two new operators on mappings, which we consider as fundamental for studying the information transferred by several mappings. We thus introduce the union and intersection of mappings. The union of two mappings allows us to describe the sum of all information transferred by the mappings. The intersection of two mappings refers to the common part of information transferred by the mappings. Before providing more details on our new schema mapping operators, we re- call the notion of information transfer introduced by Arenas et al. [1]. Intuitively, the authors define a criterion to compare the amount of information that two mappings transfer from source to target. As an example consider the following two mappings given by source-to-target tuple-generating dependencies (st-tgds) between the source schema with one ternary relation A and the target schema with binary relations S and T and a ternary relation R: M1 : A(x, y, z) → ∃u S(x, u) M2 : A(x, y, z) → T (x, y) 129 Intuitively, M2 transfers more information than M1 since the first and the sec- ond component of tuples in A are being transferred to the target under M2 , while only the first component is being transferred under M1 [1]. This intuition was formalized in [1] and several algorithmic properties were studied. In particu- lar the authors show that given two mappings M1 and M2 specified by st-tgds, it can be decided whether M2 transfers more information than M1 [1]. A similar notion of information transfer was proposed by Fagin et al. [10], and it has been shown that both notions coincide for the important case of mappings specified by st-tgds [1, 10]. A possible application of the information transfer notion is in automatic mapping-generation tools [11]. As described in [10], if two possible mappings are automatically generated by different tools, then a plausible criterion to decide which mapping is the better to be used, is to choose the mapping that transfers more information from source to target. But what happens if both tools gener- ate incomparable mappings in terms of information transfer? Then the criterion presented in [1, 10] can no longer be used to decide which mapping to choose. This is one of the questions that motivate our research. To illustrate our new schema mapping operators of intersection and union, consider the following mapping between the same schemas as M1 and M2 : M3 : A(x, y, z) → S(x, z) It can be shown that M2 and M3 are incomparable with respect to the informa- tion that both mappings transfer from source to target. Assume now that M2 and M3 are mappings that have been generated independently by two different tools. Since M2 and M3 are incomparable in terms of the information trans- fer from the source, a conservative approach would be to synthesize from both mappings a new mapping M′ that only transfers the shared source information that is being mapped by both M2 and M3 . Since M2 is transferring the first and the second components of relation A, and M3 is transferring the first and the third components of relation A, then M′ can be defined as a mapping that transfers only the first component: M′ : A(x, y, z) → ∃u T (x, u). In our framework, M′ is the intersection of M2 and M3 . Formally, the inter- section of M2 and M3 is a new mapping N that transfers less information than M2 and M3 and such that any other mapping that transfer less information than M2 and M3 transfers no more information than N . That is, the intersec- tion is the greatest lower bound with respect to the information transfer order. Note that the actual target schema is not important for the information transfer ordering, and thus the intersection is not unique. For instance, one can show that the mapping M1 transfers the same information as M′ and therefore also expresses the intersection of M2 and M3 . We formalize the notion of intersection and study several of its properties. In the above example, computing the intersection was an easy task, but we show that in general, intersecting mappings is not trivial. In fact, we prove that even 2 130 for mappings specified by st-tgds, the intersection may not be expressible in First-Order logic (FO). On the other hand, we prove that Existential Second- Order logic (ESO) suffices to express the intersection of such mappings. The dual operator is the union of schema mappings. Intuitively, the union of two mappings is a new mapping that transfers the sum of all the information transferred by both initial mappings. In our example above, the union of M2 and M3 is the mapping M′ : A(x, y, z) → T (x, y) ∧ S(x, z) Formally, the union of M2 and M3 is a new mapping N that transfers more information than M2 and M3 and such that any other mapping that transfers more information than M2 and M3 also transfers more information than N . That is, the union is the least upper bound with respect to the information transfer order. As for the case of the intersection, dealing with union is not always trivial. For example, one might be tempted to state that the following mapping M′′ is also a union of M2 and M3 : M′′ : A(x, y, z) → R(x, y, z) but it can be shown that M′′ is strictly more informative than M′ and thus does not define the union for M2 and M3 . However, if we are given the func- tional dependencies {A[1] −→ A[2], A[1] −→ A[3]} over the source schema, then M′′ becomes the union of M2 and M3 . We show that, in the absence of source constraints, the union is considerably easier to handle compared with the inter- section. In particular, it can be shown that given mappings specified by a set of st-tgds their union can also be specified by a set of st-tgds. Organisation of the paper and summary of results. In Section 2, we recall some basic notions and results. A conclusion is given in Section 6. The main results of the paper are detailed in Sections 3 – 5: • New operators. In Section 3, we introduce the union and intersection oper- ators of schema mappings and state our main results, namely: for a large class of mappings (containing the class of st-tgds) the union and intersection always exist. More precisely, this class of mappings is the class Rec of all mappings that have a maximum recovery [4] (we recall the definition of maximum recov- ery in Section 2). Our new operators allow us to define a lattice of the mappings in Rec w.r.t. to the information transfer order, s.t. the union (resp. intersec- tion) corresponds to the least upper bound (resp. greatest lower bound) of two mappings. • Existence of the union. In Section 4, we show for the class Rec that the union of two mappings always exists. The proof is constructive in that we describe how to obtain the union. For mappings defined by a set of st-tgds we show that the union can also be expressed by st-tgds. This allows us to prove an NEXPTIME upper bound for checking if some mapping is the union of two other mappings for the case of st-tgds. 3 131 • Existence of the intersection. In Section 5, we show several fundamental results on the existence of the intersection. First, for two mappings from the class Rec, the intersection always exists. However, in general, even for the restricted case of st-tgds, this intersection is not expressible in First-Order logic (FO). On the other hand, in Existential Second-Order logic (ESO) it is always possible to express the intersection of mappings defined by st-tgds. 2 Preliminaries 2.1 Schemas and schema mappings A schema S is a finite set {R1 , . . . , Rk } of relation symbols, with each Ri having a fixed arity ni ≥ 0. Let D be a countably infinite domain. An instance I of S assigns to each relation symbol Ri of S a finite relation RiI ⊆ Dni . Inst(S) denotes the set of all instances of S. We denote by dom(I) the set of all elements that occur in any of the relations RiI . We say that Ri (t) is a fact of I if t ∈ RiI . We sometimes denote an instance by its set of facts. Given schemas S and T, a schema mapping (or just mapping) from S to T is a subset of Inst(S) × Inst(T). We say that J is a solution for I under M whenever (I, J) ∈ M. The set of all solutions for I under M is denoted by SolM (I). For a mapping M from S to T, we denote by dom(M) the set of all instances I ∈ Inst(S) such that SolM (I) 6= ∅. Moreover, M is said to be total if dom(M) = Inst(S). Notice that mappings are binary relations, and thus one can define the com- position of mappings as for the composition of binary relations [15, 7]. Let M12 be a mapping from schema S1 to schema S2 and M23 a mapping from S2 to schema S3 . Then M12 ◦ M23 is a mapping from S1 to S3 given by the set {(I, J) ∈ Inst(S1 ) × Inst(S3 ) | there exists K such that (I, K) ∈ M12 and (K, J) ∈ M23 } [15, 7]. 2.2 Dependencies and definability of mappings Given disjoint schemas S and T, a source-to-target tuple-generating dependency (st-tgd) from S to T is a sentence of the form ∀x̄∀ȳ(ϕ(x̄, ȳ) → ∃z̄ ψ(x̄, z̄)), where ϕ(x̄, ȳ) is a conjunctive query (CQ) over S, and ψ(x̄, z̄) is a CQ over T. The left-hand side of the implication in an st-tgd is called the premise, and the right-hand side the conclusion. A full st-tgd is an st-tgd with no existentially quantified variables in its conclusion. We usually omit the universal quantifiers when writing st-tgds. Suppose that we are given a set Σ of logical formulas over the schemas S and T, e.g., a set of st-tgds from S and T, a set of First-Order formulas or of Existential Second-Order formulas over the schemas S and T. We say that a mapping M is specified by Σ, denoted by M = (S, T, Σ), if for every (I, J) ∈ Inst(S) × Inst(T), we have (I, J) ∈ M if and only if (I, J) satisfies Σ. As is customary in the data exchange literature, we assume the existence of two disjoint sets of elements: constant values C and null values N. Thus, for a mapping defined by st-tgds, we assume that source instances are constructed by using only elements from C, while target instances are constructed by using elements from C ∪ N. 4 132 2.3 Maximum recovery The notion of maximum recovery proposed in [4] is fundamental to our study. It provides a natural notion for inverting schema mappings. In [4] the authors first define recoveries of schema mappings and then restrict them to maximum recoveries. Given a mapping M from S1 to S2 , we say that M′ from S2 to S1 is a recovery of M if for every instance I in S1 , it holds that (I, I) ∈ M ◦ M′ . In symbols, M′ is a recovery of M if Id ⊆ M ◦ M′ , where Id is the identity mapping {(I, I) | I ∈ Inst(S1 )}. Moreover, M′ is said to be a maximum recovery of M if for every other recovery M′′ of M, it holds that M ◦ M′ ⊆ M ◦ M′′ . Intuitively, M′ is a maximum recovery of M if M ◦ M′ is as close as possible to the identity mapping Id. We write Rec to denote the class of all total mappings that admit a maximum recovery. 2.4 Information Transfer In [1] a notion of information transfer for schema mappings was defined to compare the amount of information that two mappings transfer from source to target. Formally, given mappings M1 and M2 with the same source schema S, mapping M1 transfers at least as much source information as M2 , denoted by M2 inf M1 if there exists a mapping N such that M1 ◦ N = M2 [1]. That is, M2 inf M1 if M2 can be constructed from M1 via mapping com- position. Notice that inf is a pre-order, i.e., inf is a reflexive and transitive, but not antisymmetric relation. Thus, we say that M1 and M2 transfer the same information from the source, denoted by M1 ≡inf M2 if M1 inf M2 and M2 inf M1 . By slight abuse of notation we consider inf as an order (rather than a pre-order) by identifying a mapping M with the equivalence class of all mappings ≡inf -equivalent with M. 3 The union and intersection operators Below, we make use of inf to define the union and the intersection of two mappings. Intuitively the union is a mapping that transfers the sum of all the information transferred by the two initial mappings. Analogously, we define the intersection of two mappings as a mapping that transfers only information which is transferred by each of the initial mappings. Definition 1. Let C be a class of mappings and M1 and M2 two mappings in C with the same source schema. The union of M1 and M2 w.r.t. C, is a mapping M ∈ C such that: 1. M1 inf M, 2. M2 inf M, and 3. if N is a mapping in C with M1 inf N and M2 inf N , then M inf N . The union of M1 and M2 w.r.t. C is denoted by M1 ⊔C M2 . Definition 2. Let C be a class of mappings and M1 and M2 two mappings in C with the same source schema. The intersection of M1 and M2 w.r.t. C, is a mapping M ∈ C such that: 5 133 1. M inf M1 , 2. M inf M2 , and 3. if N is a mapping in C with N inf M1 and N inf M2 , then N inf M. The intersection of M1 and M2 is denoted by M1 ⊓C M2 . When the class C is clear from the context, we just write M1 ⊔ M2 (resp. M1 ⊓M2 ) for the union (resp. for the intersection) of two mappings. Notice that the definition of the union of mappings is just the least upper bound (supremum) of M1 and M2 w.r.t. inf (inside the class of mappings C). Analogously, the intersection of mappings is just the greatest lower bound (infimum) of M1 and M2 w.r.t. inf (inside the class C). Also notice that the union and the intersection as defined above are unique up to the equivalence relation ≡inf . This is why we speak of the union resp. intersection of two mappings. Notice that with the definition of union and intersection based on the order inf it is by no means evident that for any two mappings the union or intersection always exists. Thus, a first important question that needs to be answered for these operators is for which classes of mappings the existence of the intersection or the union is guaranteed. As we will show next, the class Rec of mappings having a maximum recovery will play a fundamental role in determining the existence of the union and intersection. Beside existence, there are two other important questions regarding these operators that need to be addressed. One is the question of expressiveness: what is the mapping language needed to express the union/intersection when it exists? Another main question is about computing these operators: is there an algorithm to compute the union/intersection? One of our main results is the following general result that gives a positive answer to the existence question. Theorem 1. There exists a class R of mappings (that contains the class of mappings specified by st-tgds), such that for every pair of mappings M1 and M2 in R having the same source schema, the union M1 ⊔R M2 and the intersection M1 ⊓R M2 always exist. We will show that Rec is such a class R that satisfies the statement of Theorem 1. By using notions of lattice theory, Theorem 1 can be restated as follows. Recall that given an order relation ≤, a lattice is a structure hA, ≤i such that every two elements X, Y ∈ A have a least upper bound (supremum) and a greatest lower bound (infimum) in A. Then Theorem 1 can be formulated as follows. Theorem 2. Let S be a relational schema. There exists a class RS that contains the class of all mappings specified by st-tgds having S as source schema, such that hRS , inf i is a lattice (up to ≡inf -equivalence). Theorem 1 is proved by combining the results in the following sections for union and intersection. Theorem 2 is an immediate consequence of Theorem 1 given the following proposition: 6 134 Proposition 1. The union and intersection of mappings are invariant under ≡inf -equivalence. Formally, let M1 , M′1 , M2 , and M′2 be mappings from some class C with M1 ≡inf M′1 and M2 ≡inf M′2 . Then the following relations hold: (1) If M1 ⊔C M2 exists, then M′1 ⊔C M′2 exists as well and the equivalence M1 ⊔C M2 ≡inf M′1 ⊔C M′2 holds. (2) If M1 ⊓C M2 exists, then M′1 ⊓C M′2 exists as well and the equivalence M1 ⊓C M2 ≡inf M′1 ⊓C M′2 holds. Intuitively, Proposition 1 states that union and intersection of mappings are preserved under ≡inf -equivalence. The proposition follows immediately from the definition of ⊔C and ⊓C . 4 Existence of the union In this section we propose a straightforward method to compute the union of mappings specified by st-tgds (w.r.t. the class of mappings specified by st-tgds). This method will allow us to provide a positive answer to all of the questions concerning the existence, expressiveness, and computation of the union for this class of mappings. In contrast, we will show in Section 5 that dealing with the intersection operator is considerably more difficult. The procedure to compute the union is very simple. Let M1 = (S, T1 , Σ1 ) and M2 = (S, T2 , Σ2 ) be two mappings specified by st-tgds. Let T b 2 be a copy b b of T2 such that T2 is disjoint with T1 , and let Σ2 be the set of dependencies that results from Σ2 by replacing every relation name in T2 by its copy in T b 2. ′ b b ′ Consider the mapping M = (S, T1 ∪ T2 , Σ1 ∪ Σ2 ). Then M is the union of M1 and M2 . From this we obtain the following result. Proposition 2. Let S be the class of mappings specified by st-tgds, and M1 = (S, T1 , Σ1 ) and M2 = (S, T2 , Σ2 ) be mappings such that Σ1 and Σ2 are sets of st-tgds. Then the union M1 ⊔S M2 always exists. Moreover, there exists an algorithm which, given M1 and M2 , computes M1 ⊔S M2 in polynomial time. Proposition 2 follows from a more general result on the existence of the union for a class of mappings that properly contains the class of mappings specified by st-tgds. Recall from Section 2.3 that we write Rec to denote the class of all total mappings that admit a maximum recovery. It was shown in [4] that every mapping specified by st-tgds is total and has a maximum recovery, and thus Rec contains the class of all mappings specified by st-tgds. The following is the general result for the union operator w.r.t. the class Rec. Proposition 3. For every pair of mappings M1 and M2 in Rec having the same source schema, the union M1 ⊔Rec M2 exists. Proof (sketch). Let T1 and T2 be disjoint schemas, M1 a mapping in Rec from S to T1 , and M2 a mapping in Rec from S to T2 . Consider the mapping M1 ⊕ M2 from S to T1 ∪ T2 defined as follows: M1 ⊕ M2 = {(I, J1 ∪ J2 ) | (I, J1 ) ∈ M1 and (I, J2 ) ∈ M2 }. 7 135 It can be shown that M1 ⊕M2 is the union of M1 and M2 w.r.t. Rec. If T1 and b 2 of T2 that is disjoint T2 are not disjoint, one can always construct a copy T c b with T1 , and a mapping M2 from S to T2 such that M2 ≡inf M c2 , and then M1 ⊕ M c2 is the desired union. ⊓ ⊔ The proof of Proposition 2 follows from the proof of Proposition 3 plus the fact that if M1 and M2 are specified by st-tgds, then M1 ⊕ M2 can also be specified by a set of st-tgds. The following example shows that computing the union is extremely easy for the case of mappings specified by st-tgds. Example 1. Let mappings M1 and M2 be defined by the following sets of st- tgds: M1 = {S(x, y) → T (x, y)}; M2 = {S(x, y) → T (x, y), Q(x) → T (x, x)}. The union M1 ⊔Rec M2 is simply the mapping M containing all three depen- dencies, with appropriately renamed target relation symbols: M = {S(x, y) → T (x, y), S(x, y) → T ′ (x, y), Q(x) → T ′ (x, x)} Proposition 2 also allows us to prove positive algorithmic results regarding the union of schema mappings. The following result follows directly from Propo- sition 2 and the results in [1] regarding the order inf . Proposition 4. Given mappings M1 , M2 , and M3 specified by st-tgds, it is decidable (in NEXPTIME) whether M3 is the union of M1 and M2 . A legitimate question, of course, is if the characterization in the proof of Proposition 3 also works outside Rec. We have to leave this as an open question for future research. The following proposition shows that this is a tricky problem which may even require some adaptation of the inf relation. Proposition 5. There exists a mapping M such that M ≺inf M ⊕ M (that is M inf M ⊕ M and M ⊕ M inf M). In other words, the inf order displays an unexpected behaviour for mappings outside Rec: intuitively, one would expect that the amount of source information transferred remains unchanged if, for every source instance I, we combine all pairs of solutions of I. For mappings in Rec, this is of course the case. In contrast, by Proposition 5, there are mappings outside Rec such that the amount of source information transferred strictly increases by this simple syntactic trick. 5 Existence of the intersection In this section we study the existence of the intersection. The main result is stated in the following theorem. Again, we use Rec to denote the class of total mappings that have a maximum recovery. Theorem 3. For every pair of mappings M1 and M2 in Rec having the same source schema, the intersection M1 ⊓Rec M2 exists. 8 136 Proof (sketch). To describe the proof of the theorem we need to introduce some technical notions. Let S be a schema and consider a mapping M from S to S (that is M ⊆ Inst(S) × Inst(S)). For a positive integer k, we define Mk recursively as follows: M1 = M, Mk+1 = M ◦ Mk . We shall also define M+ as the following mapping from S to S: ∞ [ M =+ Mi . i=1 Notice that M+ is the transitive closure of M when it is viewed as a binary relation over Inst(S). Now consider mappings M1 and M2 in the statement of the theorem. Given that M1 and M2 are mappings in Rec, we know that there exist mappings M′1 and M′2 such that M′1 is a maximum recovery of M1 , and M′2 is a maximum recovery of M2 . Now consider the mapping M given by  + M = (M1 ◦ M′1 ) ∪ (M2 ◦ M′2 ) . It can be shown that M is the intersection M1 ⊓Rec M2 . ⊓ ⊔ Since every mapping given by st-tgds is total and has a maximum recovery [4], from Theorem 3 we obtain that for mappings specified by st-tgds the intersection (w.r.t. the class Rec) always exists. Notice that Theorem 3 is only about existence and says nothing about the language needed to express the intersection of mappings specified by st-tgds. The following result shows that as opposed to the case of the union operator, the intersection of mappings specified by st-tgds may not be expressible in First- Order logic (FO). Theorem 4. There exist mappings M1 and M2 specified by st-tgds such that M1 ⊓Rec M2 cannot be specified by a set of FO sentences. Proof (sketch). Consider the schemas S = {A(·, ·), B(·, ·)}, T1 = {T1 (·, ·)}, and T2 = {T2 (·, ·)}. In the proof we use mappings M1 and M2 from S to T1 and from S to T2 , respectively, specified by the following st-tgds: M1 : ∃u (A(x, u) ∧ B(u, y)) → T1 (x, y) M2 : ∃u (B(x, u) ∧ A(u, y)) → T2 (x, y) It can be shown by an argument based on Ehrenfeucht-Fraı̈ssé games that M1 ⊓Rec M2 cannot be expressed by an FO sentence. ⊓ ⊔ 9 137 Notice that the above theorem states that M1 ⊓Rec M2 is not expressible in FO. In principle, it might be the case that if we restrict ourselves to the intersection with respect to a smaller class of mappings, for example the class of mappings specified by st-tgds, then we could obtain better expressibility re- sults. The following proposition shows a negative result in this respect. This is a corollary of the proof of Theorem 4. Proposition 6. Let S be the class of mappings specified by st-tgds. Then there are mappings M1 and M2 in S such that M1 ⊓S M2 does not exist. This raises the question as to which language is expressive enough to specify the intersection of two mappings specified by st-tgds. Below we provide an answer to this question. Theorem 5. Given two schema mappings M1 and M2 given by st-tgds, the intersection M1 ⊓Rec M2 is expressible by an Existential Second-Order logic (ESO) formula. b a copy of Proof (sketch). Let S be the source schema of M1 and M2 , and S ′ ′ schema S. Moreover, let M1 and M2 be respective maximum recoveries of M1 and M2 . One can show that the composition M1 ◦ M′1 can be expressed as an FO formula: ∀x1 (ϕ1 (x1 ) → ψ1 (x1 )) ∧ . . . ∧ ∀xn (ϕn (xn ) → ψn (xn )) where ϕi (xi ) and ψi (xi ) are FO formulas over S and S,b respectively. Similarly, M2 ◦ M′2 can be expressed as ∀y 1 (α1 (y 1 ) → β1 (y 1 )) ∧ . . . ∧ ∀y m (αm (y m ) → βm (y m )). The formula representing the intersection is based on the construction in the proof of Theorem 3, thus we need to show how to express the transitive closure of M1 ◦ M′1 ∪ M2 ◦ M′2 . For this we use an intermediate schema S e constructed as follows: for every n-ary relation R of S, we include an (n + 1)-ary relation Re in S. e The idea is that an atom R(a, e g) will represent the atom R(a) in the generation g of the computation of the transitive closure. Now, to define the intersection, we use an ESO formula of the form e ∃s ∃ zero (Ωs ∧ Ω e ∧ Ω E ) ∃S R Re where ∃S e denotes an existential quantification over all relation symbols in S, e s is a function symbol, and zero a first order variable. The rest of the formulas is constructed as follows: Ωs is the formula ∀x∀y (s(x) = s(y) → x = y) ∧ ¬(s(x) = x) ∧ ¬(s(x) = zero) that defines a successor function, with zero as the first element; ΩRe corresponds to the following FO formula (we assume S = {R1 , . . . , Rk } and z i is a tuple of variables of the same arity as Ri ):  ∀z 1 (R1 (z 1 ) → R e1 (z 1 , zero)) ∧ · · · ∧ ∀z k (Rk (zk ) → R ek (z k , zero)) ∧ ^ n m ^      ∀g ∀xi ϕ ei (xi , g) → ψei (xi , s(g)) ∨ ∀y i αei (y i , g) → βei (yi , s(g)) i=1 i=1 10 138 where ϕ ei (xi , g) is obtained from ϕi (xi ) by replacing every relational symbol R(z) e g), and ψei (xi , s(g)) is obtained from ψi (xi ) by replacing every relational by R(z, b symbol R(z) e s(g)), and similarly for α by R(z, ei and βei . The intuition is that the e first line initializes the relations Ri at generation 0, and the second line mimics a formula representing (M1 ◦ M′1 ∪ M2 ◦ M′2 )+ over schema S. e Finally, Ω E just e R extracts the target relations at some generation g of the transitive closure:  ∃g ∀z 1 (R e1 (z 1 , g) → R b1 (z 1 )) ∧ · · · ∧ ∀z k (R ek (z k , g) → R bk (z k )) . ⊓ ⊔ Example 2. Recall the mappings M1 and M2 from Example 1. Composed with their maximum recoveries, they have the following form (see [4]): b 1 , x2 )} and M1 ◦ M′1 = {S(x1 , x2 ) → S(x b 1 , x2 ) ∨ (x1 = x2 ∧ Q(x M2 ◦ M′2 = {S(x1 , x2 ) → S(x b 1 )), b b Q(x1 ) → S(x1 , x1 ) ∨ Q(x1 )}. The intersection M1 ⊓Rec M2 is expressed by the following ESO formula:   e e ∃S∃Q∃s∃ zero ∀x∀y (s(x) = s(y) → x = y) ∧ s(x) 6= x ∧ s(x) 6= zero ∧   e e ∀x1 ∀x2 S(x1 , x2 ) → S(x1 , x2 , zero) ∧ ∀x Q(x) → Q(x, zero) ∧  ∀g ∀x1 ∀x2 S(x e 1 , x2 , g) → S(xe 1 , x2 , s(g)) ∨   e 1 , x2 , g) → S(x ∀x1 ∀x2 S(x e 1 , x2 , s(g)) ∨ (x1 = x2 ∧ Q(xe 1 , s(g))) ∧   e 1 , g) → S(x ∀x1 ∀x2 Q(x e 1 , x1 , s(g)) ∨ Q(x e 1 , s(g)) ∧   e 1 , x2 , g ′ ) → S(x ∃g ′ ∀x1 ∀x2 (S(x b 1 , x2 )) ∧ ∀x(Q(x, e g ′ ) → Q(x)) b 6 Conclusion In this work, we have introduced two new operators union and intersection on schema mappings. We have proved that these operators allow us to define a lattice w.r.t. the order inf (up to ≡inf -equivalence) for the mappings in Rec (i.e., mappings having a maximum recovery). In particular, we have shown that the union and intersection always exist for mappings in Rec. When restricting us to the simple case of mappings specified by st-tgds it has turned out that the intersection operator is considerably more difficult to handle than the union operator. More specifically, the union of two mappings specified by st-tgds can again be specified by a set of st-tgds. In contrast, First-Order logic (FO) does in general not suffice to express the intersection of such mappings. A lot of interesting research questions have been left for future work. First of all, while our definitions of ⊔ and ⊓ are applicable to arbitrary mappings, we have restricted ourselves to the mappings in Rec for investigating the questions of existence, expressiveness, and computability of ⊔ and ⊓. We would like to extend 11 139 this study to arbitrary mappings. As has been illustrated in Proposition 5, such an extension may even require an adaptation of the inf -relation. Recall that in Theorem 5 we have shown that ESO is expressive enough to specify the intersection of two mappings given by sets of st-tgds. Further analysis is required to determine if a smaller fragment of ESO would also suffice. In addition it would be interesting to identify restrictions on the st-tgds so that the intersection of mappings specified by such st-tgds is FO-expressible. We would also like to extend our study to further set operators on schema mappings. Above all, we would like to study the complement of a mapping M (i.e., a mapping that transfers all the source information not transferred by M) and, more generally, the set difference of two mappings M and N (i.e., a mapping that transfers all the source information that is transferred by M but not by N ). Overall, we think that the union and intersection operators can be crucial in not only defining operators such as difference and complement, but in laying the foundation to a framework of similar operators on schema mappings. Given mappings from a source schema S to a target schema T, an interesting question is if the result of the union or intersection can be also specified as a mapping from S to T. Our definitions do not force the same target schema to be used when constructing a union or intersection, but we consider this setting to be the most common in practice. We leave the investigation of this case as a yet another item in the future research agenda. Acknowledgements This work has been funded in part by Marie Curie action IRSES under Grant No. 24761 (Net2), and by the Vienna Science and Technology Fund (WWTF) through project ICT08-032. Jorge Pérez has been supported by Fondecyt grant 11110404 and by VID grant U-Inicia 11/04 Universidad de Chile. References 1. M. Arenas, J. Pérez, J. L. Reutter, and C. Riveros. Foundations of schema mapping management. In Proc. PODS, pages 227–238, 2010. 2. Marcelo Arenas, Ronald Fagin, and Alan Nash. Composition with target con- straints. In Proc. ICDT’10, ACM International Conference Proceeding Series, pages 129–142. ACM, 2010. 3. Marcelo Arenas, Jorge Pérez, Juan L. Reutter, and Cristian Riveros. Composition and inversion of schema mappings. SIGMOD Record, 38(3):17–28, 2009. 4. Marcelo Arenas, Jorge Pérez, and Cristian Riveros. The recovery of a schema mapping: Bringing exchanged data back. ACM Trans. Database Syst., 34(4), 2009. 5. P. Bernstein. Applying model management to classical meta data problems. In Proc. CIDR, 2003. 6. P. Bernstein and S. Melnik. Model management 2.0: manipulating richer mappings. In Proc. SIGMOD, pages 1–12, 2007. 7. R. Fagin, P. G. Kolaitis, L. Popa, and W.-C. Tan. Composing schema mappings: Second-order dependencies to the rescue. TODS, 30(4):994–1055, 2005. 8. Ronald Fagin. Inverting schema mappings. ACM Trans. Database Syst., 32(4), 2007. 9. Ronald Fagin, Phokion G. Kolaitis, Renée J. Miller, and Lucian Popa. Data ex- change: semantics and query answering. Theor. Comput. Sci., 336(1):89–124, 2005. 12 140 10. Ronald Fagin, Phokion G. Kolaitis, Lucian Popa, and Wang Chiew Tan. Reverse data exchange: Coping with nulls. ACM Trans. Database Syst., 36(2):11, 2011. 11. L. M. Haas, M. A. Hernández, H. Ho, L. Popa, and M. Roth. Clio grows up: from research prototype to industrial tool. In Proc. SIGMOD, pages 805–810, 2005. 12. Maurizio Lenzerini. Data integration: A theoretical perspective. In Proc. PODS, pages 233–246. ACM, 2002. 13. Leonid Libkin. Elements of Finite Model Theory. Springer, 2004. 14. Jayant Madhavan and Alon Y. Halevy. Composing mappings among data sources. In Proc. VLDB’03, pages 572–583, 2003. 15. Sergey Melnik. Generic Model Management: concepts and Algorithms, volume 2967 of LNCS. Springer, 2004. 16. Alan Nash, Philip A. Bernstein, and Sergey Melnik. Composition of mappings given by embedded dependencies. ACM Trans. Database Syst., 32(1):4, 2007. 17. Jorge Pérez. Schema Mapping Management in Data Exchange Systems. PhD thesis, Escuela de Ingenierı́a, Pontificia Universidad Católica de Chile, 2011. 13 141