Union and Intersection of Schema Mappings

    Jorge Pérez1, Reinhard Pichler2 , Emanuel Sallinger2 , and Vadim Savenkov2
               1
                   Department of Computer Science, Universidad de Chile
              2
                   Faculty of Informatics, Vienna University of Technology


        Abstract. Schema mappings have been extensively studied in database
        research over the past decade – notably in the areas of data exchange
        and data integration. Recently, the notion of an information transfer
        order on schema mappings has been introduced to compare the amount
        of source information that is actually transferred by two mappings. In
        this paper, we present two new operators: the union and intersection of
        mappings. The union of two mappings allows us to describe the sum of all
        information transferred by several mappings. The intersection refers to
        the common part of information transferred by several mappings. As one
        of our main results we prove that there exists a large class of mappings
        (containing the class of source-to-target tuple-generating dependencies)
        that forms a complete lattice with respect to these two operators.


1     Introduction

Schema mappings allow us to describe the relationship between two schemas.
As such, schema mappings have been extensively studied in Data Exchange [9]
and Data Integration [12]. Bernstein and Melnik [5, 6, 15] have proposed several
fundamental operators on schema mappings, with composition [14, 7, 16, 2] and
inverse [8, 10, 4, 3] being the most prominent ones. Recently, new concepts have
been introduced [1, 10] to compare schema mappings in terms of the amount
of source information transferred by the mappings. In this work, we present
two new operators on mappings, which we consider as fundamental for studying
the information transferred by several mappings. We thus introduce the union
and intersection of mappings. The union of two mappings allows us to describe
the sum of all information transferred by the mappings. The intersection of two
mappings refers to the common part of information transferred by the mappings.
    Before providing more details on our new schema mapping operators, we re-
call the notion of information transfer introduced by Arenas et al. [1]. Intuitively,
the authors define a criterion to compare the amount of information that two
mappings transfer from source to target. As an example consider the following
two mappings given by source-to-target tuple-generating dependencies (st-tgds)
between the source schema with one ternary relation A and the target schema
with binary relations S and T and a ternary relation R:

                              M1 : A(x, y, z) → ∃u S(x, u)
                              M2 : A(x, y, z) → T (x, y)


                                          129
Intuitively, M2 transfers more information than M1 since the first and the sec-
ond component of tuples in A are being transferred to the target under M2 ,
while only the first component is being transferred under M1 [1]. This intuition
was formalized in [1] and several algorithmic properties were studied. In particu-
lar the authors show that given two mappings M1 and M2 specified by st-tgds,
it can be decided whether M2 transfers more information than M1 [1]. A similar
notion of information transfer was proposed by Fagin et al. [10], and it has been
shown that both notions coincide for the important case of mappings specified
by st-tgds [1, 10].
    A possible application of the information transfer notion is in automatic
mapping-generation tools [11]. As described in [10], if two possible mappings are
automatically generated by different tools, then a plausible criterion to decide
which mapping is the better to be used, is to choose the mapping that transfers
more information from source to target. But what happens if both tools gener-
ate incomparable mappings in terms of information transfer? Then the criterion
presented in [1, 10] can no longer be used to decide which mapping to choose.
This is one of the questions that motivate our research.
    To illustrate our new schema mapping operators of intersection and union,
consider the following mapping between the same schemas as M1 and M2 :

                            M3 : A(x, y, z) → S(x, z)

It can be shown that M2 and M3 are incomparable with respect to the informa-
tion that both mappings transfer from source to target. Assume now that M2
and M3 are mappings that have been generated independently by two different
tools. Since M2 and M3 are incomparable in terms of the information trans-
fer from the source, a conservative approach would be to synthesize from both
mappings a new mapping M′ that only transfers the shared source information
that is being mapped by both M2 and M3 . Since M2 is transferring the first
and the second components of relation A, and M3 is transferring the first and
the third components of relation A, then M′ can be defined as a mapping that
transfers only the first component:

                          M′ : A(x, y, z) → ∃u T (x, u).

In our framework, M′ is the intersection of M2 and M3 . Formally, the inter-
section of M2 and M3 is a new mapping N that transfers less information than
M2 and M3 and such that any other mapping that transfer less information
than M2 and M3 transfers no more information than N . That is, the intersec-
tion is the greatest lower bound with respect to the information transfer order.
Note that the actual target schema is not important for the information transfer
ordering, and thus the intersection is not unique. For instance, one can show
that the mapping M1 transfers the same information as M′ and therefore also
expresses the intersection of M2 and M3 .
    We formalize the notion of intersection and study several of its properties. In
the above example, computing the intersection was an easy task, but we show
that in general, intersecting mappings is not trivial. In fact, we prove that even

                                        2


                                      130
for mappings specified by st-tgds, the intersection may not be expressible in
First-Order logic (FO). On the other hand, we prove that Existential Second-
Order logic (ESO) suffices to express the intersection of such mappings.
    The dual operator is the union of schema mappings. Intuitively, the union
of two mappings is a new mapping that transfers the sum of all the information
transferred by both initial mappings. In our example above, the union of M2
and M3 is the mapping

                      M′ : A(x, y, z) → T (x, y) ∧ S(x, z)

Formally, the union of M2 and M3 is a new mapping N that transfers more
information than M2 and M3 and such that any other mapping that transfers
more information than M2 and M3 also transfers more information than N .
That is, the union is the least upper bound with respect to the information
transfer order. As for the case of the intersection, dealing with union is not
always trivial. For example, one might be tempted to state that the following
mapping M′′ is also a union of M2 and M3 :

                          M′′ : A(x, y, z) → R(x, y, z)

but it can be shown that M′′ is strictly more informative than M′ and thus
does not define the union for M2 and M3 . However, if we are given the func-
tional dependencies {A[1] −→ A[2], A[1] −→ A[3]} over the source schema, then
M′′ becomes the union of M2 and M3 . We show that, in the absence of source
constraints, the union is considerably easier to handle compared with the inter-
section. In particular, it can be shown that given mappings specified by a set of
st-tgds their union can also be specified by a set of st-tgds.
Organisation of the paper and summary of results. In Section 2, we
recall some basic notions and results. A conclusion is given in Section 6. The
main results of the paper are detailed in Sections 3 – 5:
• New operators. In Section 3, we introduce the union and intersection oper-
ators of schema mappings and state our main results, namely: for a large class
of mappings (containing the class of st-tgds) the union and intersection always
exist. More precisely, this class of mappings is the class Rec of all mappings
that have a maximum recovery [4] (we recall the definition of maximum recov-
ery in Section 2). Our new operators allow us to define a lattice of the mappings
in Rec w.r.t. to the information transfer order, s.t. the union (resp. intersec-
tion) corresponds to the least upper bound (resp. greatest lower bound) of two
mappings.
• Existence of the union. In Section 4, we show for the class Rec that the union
of two mappings always exists. The proof is constructive in that we describe how
to obtain the union. For mappings defined by a set of st-tgds we show that the
union can also be expressed by st-tgds. This allows us to prove an NEXPTIME
upper bound for checking if some mapping is the union of two other mappings
for the case of st-tgds.

                                       3


                                     131
• Existence of the intersection. In Section 5, we show several fundamental results
on the existence of the intersection. First, for two mappings from the class Rec,
the intersection always exists. However, in general, even for the restricted case
of st-tgds, this intersection is not expressible in First-Order logic (FO). On the
other hand, in Existential Second-Order logic (ESO) it is always possible to
express the intersection of mappings defined by st-tgds.


2     Preliminaries
2.1    Schemas and schema mappings

A schema S is a finite set {R1 , . . . , Rk } of relation symbols, with each Ri having
a fixed arity ni ≥ 0. Let D be a countably infinite domain. An instance I of
S assigns to each relation symbol Ri of S a finite relation RiI ⊆ Dni . Inst(S)
denotes the set of all instances of S. We denote by dom(I) the set of all elements
that occur in any of the relations RiI . We say that Ri (t) is a fact of I if t ∈ RiI .
We sometimes denote an instance by its set of facts.
    Given schemas S and T, a schema mapping (or just mapping) from S to T
is a subset of Inst(S) × Inst(T). We say that J is a solution for I under M
whenever (I, J) ∈ M. The set of all solutions for I under M is denoted by
SolM (I). For a mapping M from S to T, we denote by dom(M) the set of all
instances I ∈ Inst(S) such that SolM (I) 6= ∅. Moreover, M is said to be total if
dom(M) = Inst(S).
    Notice that mappings are binary relations, and thus one can define the com-
position of mappings as for the composition of binary relations [15, 7]. Let M12
be a mapping from schema S1 to schema S2 and M23 a mapping from S2 to
schema S3 . Then M12 ◦ M23 is a mapping from S1 to S3 given by the set
{(I, J) ∈ Inst(S1 ) × Inst(S3 ) | there exists K such that (I, K) ∈ M12 and
(K, J) ∈ M23 } [15, 7].
2.2    Dependencies and definability of mappings
Given disjoint schemas S and T, a source-to-target tuple-generating dependency
(st-tgd) from S to T is a sentence of the form ∀x̄∀ȳ(ϕ(x̄, ȳ) → ∃z̄ ψ(x̄, z̄)),
where ϕ(x̄, ȳ) is a conjunctive query (CQ) over S, and ψ(x̄, z̄) is a CQ over T.
The left-hand side of the implication in an st-tgd is called the premise, and the
right-hand side the conclusion. A full st-tgd is an st-tgd with no existentially
quantified variables in its conclusion. We usually omit the universal quantifiers
when writing st-tgds. Suppose that we are given a set Σ of logical formulas over
the schemas S and T, e.g., a set of st-tgds from S and T, a set of First-Order
formulas or of Existential Second-Order formulas over the schemas S and T. We
say that a mapping M is specified by Σ, denoted by M = (S, T, Σ), if for every
(I, J) ∈ Inst(S) × Inst(T), we have (I, J) ∈ M if and only if (I, J) satisfies Σ.
    As is customary in the data exchange literature, we assume the existence of
two disjoint sets of elements: constant values C and null values N. Thus, for
a mapping defined by st-tgds, we assume that source instances are constructed
by using only elements from C, while target instances are constructed by using
elements from C ∪ N.
                                          4


                                        132
2.3 Maximum recovery
The notion of maximum recovery proposed in [4] is fundamental to our study.
It provides a natural notion for inverting schema mappings. In [4] the authors
first define recoveries of schema mappings and then restrict them to maximum
recoveries. Given a mapping M from S1 to S2 , we say that M′ from S2 to S1
is a recovery of M if for every instance I in S1 , it holds that (I, I) ∈ M ◦ M′ .
In symbols, M′ is a recovery of M if Id ⊆ M ◦ M′ , where Id is the identity
mapping {(I, I) | I ∈ Inst(S1 )}. Moreover, M′ is said to be a maximum recovery
of M if for every other recovery M′′ of M, it holds that M ◦ M′ ⊆ M ◦ M′′ .
Intuitively, M′ is a maximum recovery of M if M ◦ M′ is as close as possible to
the identity mapping Id. We write Rec to denote the class of all total mappings
that admit a maximum recovery.
2.4 Information Transfer
In [1] a notion of information transfer for schema mappings was defined to
compare the amount of information that two mappings transfer from source to
target. Formally, given mappings M1 and M2 with the same source schema
S, mapping M1 transfers at least as much source information as M2 , denoted
by M2 inf M1 if there exists a mapping N such that M1 ◦ N = M2 [1].
That is, M2 inf M1 if M2 can be constructed from M1 via mapping com-
position. Notice that inf is a pre-order, i.e., inf is a reflexive and transitive,
but not antisymmetric relation. Thus, we say that M1 and M2 transfer the
same information from the source, denoted by M1 ≡inf M2 if M1 inf M2 and
M2 inf M1 . By slight abuse of notation we consider inf as an order (rather
than a pre-order) by identifying a mapping M with the equivalence class of all
mappings ≡inf -equivalent with M.


3   The union and intersection operators
Below, we make use of inf to define the union and the intersection of two
mappings. Intuitively the union is a mapping that transfers the sum of all the
information transferred by the two initial mappings. Analogously, we define the
intersection of two mappings as a mapping that transfers only information which
is transferred by each of the initial mappings.
Definition 1. Let C be a class of mappings and M1 and M2 two mappings in C
with the same source schema. The union of M1 and M2 w.r.t. C, is a mapping
M ∈ C such that:
1. M1 inf M,
2. M2 inf M, and
3. if N is a mapping in C with M1 inf N and M2 inf N , then M inf N .
The union of M1 and M2 w.r.t. C is denoted by M1 ⊔C M2 .
Definition 2. Let C be a class of mappings and M1 and M2 two mappings in
C with the same source schema. The intersection of M1 and M2 w.r.t. C, is a
mapping M ∈ C such that:

                                         5


                                       133
1. M inf M1 ,
2. M inf M2 , and
3. if N is a mapping in C with N inf M1 and N inf M2 , then N inf M.

The intersection of M1 and M2 is denoted by M1 ⊓C M2 .

    When the class C is clear from the context, we just write M1 ⊔ M2 (resp.
M1 ⊓M2 ) for the union (resp. for the intersection) of two mappings. Notice that
the definition of the union of mappings is just the least upper bound (supremum)
of M1 and M2 w.r.t. inf (inside the class of mappings C). Analogously, the
intersection of mappings is just the greatest lower bound (infimum) of M1 and
M2 w.r.t. inf (inside the class C). Also notice that the union and the intersection
as defined above are unique up to the equivalence relation ≡inf . This is why we
speak of the union resp. intersection of two mappings.
    Notice that with the definition of union and intersection based on the order
inf it is by no means evident that for any two mappings the union or intersection
always exists. Thus, a first important question that needs to be answered for
these operators is for which classes of mappings the existence of the intersection
or the union is guaranteed. As we will show next, the class Rec of mappings
having a maximum recovery will play a fundamental role in determining the
existence of the union and intersection.
    Beside existence, there are two other important questions regarding these
operators that need to be addressed. One is the question of expressiveness: what
is the mapping language needed to express the union/intersection when it exists?
Another main question is about computing these operators: is there an algorithm
to compute the union/intersection? One of our main results is the following
general result that gives a positive answer to the existence question.

Theorem 1. There exists a class R of mappings (that contains the class of
mappings specified by st-tgds), such that for every pair of mappings M1 and M2
in R having the same source schema, the union M1 ⊔R M2 and the intersection
M1 ⊓R M2 always exist.

    We will show that Rec is such a class R that satisfies the statement of
Theorem 1. By using notions of lattice theory, Theorem 1 can be restated as
follows. Recall that given an order relation ≤, a lattice is a structure hA, ≤i such
that every two elements X, Y ∈ A have a least upper bound (supremum) and
a greatest lower bound (infimum) in A. Then Theorem 1 can be formulated as
follows.

Theorem 2. Let S be a relational schema. There exists a class RS that contains
the class of all mappings specified by st-tgds having S as source schema, such
that hRS , inf i is a lattice (up to ≡inf -equivalence).

   Theorem 1 is proved by combining the results in the following sections for
union and intersection. Theorem 2 is an immediate consequence of Theorem 1
given the following proposition:

                                         6


                                       134
Proposition 1. The union and intersection of mappings are invariant under
≡inf -equivalence. Formally, let M1 , M′1 , M2 , and M′2 be mappings from some
class C with M1 ≡inf M′1 and M2 ≡inf M′2 . Then the following relations hold:
(1) If M1 ⊔C M2 exists, then M′1 ⊔C M′2 exists as well and the equivalence
M1 ⊔C M2 ≡inf M′1 ⊔C M′2 holds.
(2) If M1 ⊓C M2 exists, then M′1 ⊓C M′2 exists as well and the equivalence
M1 ⊓C M2 ≡inf M′1 ⊓C M′2 holds.
    Intuitively, Proposition 1 states that union and intersection of mappings are
preserved under ≡inf -equivalence. The proposition follows immediately from the
definition of ⊔C and ⊓C .

4   Existence of the union
In this section we propose a straightforward method to compute the union of
mappings specified by st-tgds (w.r.t. the class of mappings specified by st-tgds).
This method will allow us to provide a positive answer to all of the questions
concerning the existence, expressiveness, and computation of the union for this
class of mappings. In contrast, we will show in Section 5 that dealing with the
intersection operator is considerably more difficult.
    The procedure to compute the union is very simple. Let M1 = (S, T1 , Σ1 )
and M2 = (S, T2 , Σ2 ) be two mappings specified by st-tgds. Let T  b 2 be a copy
                 b                                 b
of T2 such that T2 is disjoint with T1 , and let Σ2 be the set of dependencies
that results from Σ2 by replacing every relation name in T2 by its copy in T  b 2.
                           ′             b         b            ′
Consider the mapping M = (S, T1 ∪ T2 , Σ1 ∪ Σ2 ). Then M is the union of
M1 and M2 . From this we obtain the following result.
Proposition 2. Let S be the class of mappings specified by st-tgds, and M1 =
(S, T1 , Σ1 ) and M2 = (S, T2 , Σ2 ) be mappings such that Σ1 and Σ2 are sets
of st-tgds. Then the union M1 ⊔S M2 always exists. Moreover, there exists an
algorithm which, given M1 and M2 , computes M1 ⊔S M2 in polynomial time.
    Proposition 2 follows from a more general result on the existence of the union
for a class of mappings that properly contains the class of mappings specified
by st-tgds. Recall from Section 2.3 that we write Rec to denote the class of all
total mappings that admit a maximum recovery. It was shown in [4] that every
mapping specified by st-tgds is total and has a maximum recovery, and thus
Rec contains the class of all mappings specified by st-tgds. The following is the
general result for the union operator w.r.t. the class Rec.
Proposition 3. For every pair of mappings M1 and M2 in Rec having the
same source schema, the union M1 ⊔Rec M2 exists.
Proof (sketch). Let T1 and T2 be disjoint schemas, M1 a mapping in Rec
from S to T1 , and M2 a mapping in Rec from S to T2 . Consider the mapping
M1 ⊕ M2 from S to T1 ∪ T2 defined as follows:
         M1 ⊕ M2 = {(I, J1 ∪ J2 ) | (I, J1 ) ∈ M1 and (I, J2 ) ∈ M2 }.

                                        7


                                      135
It can be shown that M1 ⊕M2 is the union of M1 and M2 w.r.t. Rec. If T1 and
                                                     b 2 of T2 that is disjoint
T2 are not disjoint, one can always construct a copy T
                           c             b
with T1 , and a mapping M2 from S to T2 such that M2 ≡inf M     c2 , and then
M1 ⊕ M  c2 is the desired union.                                             ⊓
                                                                             ⊔

    The proof of Proposition 2 follows from the proof of Proposition 3 plus the
fact that if M1 and M2 are specified by st-tgds, then M1 ⊕ M2 can also be
specified by a set of st-tgds. The following example shows that computing the
union is extremely easy for the case of mappings specified by st-tgds.
Example 1. Let mappings M1 and M2 be defined by the following sets of st-
tgds: M1 = {S(x, y) → T (x, y)}; M2 = {S(x, y) → T (x, y), Q(x) → T (x, x)}.
The union M1 ⊔Rec M2 is simply the mapping M containing all three depen-
dencies, with appropriately renamed target relation symbols:

        M = {S(x, y) → T (x, y), S(x, y) → T ′ (x, y), Q(x) → T ′ (x, x)}

    Proposition 2 also allows us to prove positive algorithmic results regarding
the union of schema mappings. The following result follows directly from Propo-
sition 2 and the results in [1] regarding the order inf .

Proposition 4. Given mappings M1 , M2 , and M3 specified by st-tgds, it is
decidable (in NEXPTIME) whether M3 is the union of M1 and M2 .

    A legitimate question, of course, is if the characterization in the proof of
Proposition 3 also works outside Rec. We have to leave this as an open question
for future research. The following proposition shows that this is a tricky problem
which may even require some adaptation of the inf relation.

Proposition 5. There exists a mapping M such that M ≺inf M ⊕ M (that is
M inf M ⊕ M and M ⊕ M inf M).

In other words, the inf order displays an unexpected behaviour for mappings
outside Rec: intuitively, one would expect that the amount of source information
transferred remains unchanged if, for every source instance I, we combine all
pairs of solutions of I. For mappings in Rec, this is of course the case. In contrast,
by Proposition 5, there are mappings outside Rec such that the amount of source
information transferred strictly increases by this simple syntactic trick.


5    Existence of the intersection

In this section we study the existence of the intersection. The main result is
stated in the following theorem. Again, we use Rec to denote the class of total
mappings that have a maximum recovery.

Theorem 3. For every pair of mappings M1 and M2 in Rec having the same
source schema, the intersection M1 ⊓Rec M2 exists.


                                          8


                                        136
Proof (sketch). To describe the proof of the theorem we need to introduce some
technical notions. Let S be a schema and consider a mapping M from S to
S (that is M ⊆ Inst(S) × Inst(S)). For a positive integer k, we define Mk
recursively as follows:

                                 M1 = M,
                               Mk+1 = M ◦ Mk .

We shall also define M+ as the following mapping from S to S:
                                         ∞
                                         [
                                 M =+
                                               Mi .
                                         i=1

Notice that M+ is the transitive closure of M when it is viewed as a binary
relation over Inst(S).
    Now consider mappings M1 and M2 in the statement of the theorem. Given
that M1 and M2 are mappings in Rec, we know that there exist mappings M′1
and M′2 such that M′1 is a maximum recovery of M1 , and M′2 is a maximum
recovery of M2 . Now consider the mapping M given by
                                                     +
                     M =     (M1 ◦ M′1 ) ∪ (M2 ◦ M′2 ) .

It can be shown that M is the intersection M1 ⊓Rec M2 .                         ⊓
                                                                                ⊔

   Since every mapping given by st-tgds is total and has a maximum recovery [4],
from Theorem 3 we obtain that for mappings specified by st-tgds the intersection
(w.r.t. the class Rec) always exists.
   Notice that Theorem 3 is only about existence and says nothing about the
language needed to express the intersection of mappings specified by st-tgds.
The following result shows that as opposed to the case of the union operator,
the intersection of mappings specified by st-tgds may not be expressible in First-
Order logic (FO).

Theorem 4. There exist mappings M1 and M2 specified by st-tgds such that
M1 ⊓Rec M2 cannot be specified by a set of FO sentences.

Proof (sketch). Consider the schemas S = {A(·, ·), B(·, ·)}, T1 = {T1 (·, ·)}, and
T2 = {T2 (·, ·)}. In the proof we use mappings M1 and M2 from S to T1 and
from S to T2 , respectively, specified by the following st-tgds:

                    M1 : ∃u (A(x, u) ∧ B(u, y)) → T1 (x, y)
                    M2 : ∃u (B(x, u) ∧ A(u, y)) → T2 (x, y)

It can be shown by an argument based on Ehrenfeucht-Fraı̈ssé games that
M1 ⊓Rec M2 cannot be expressed by an FO sentence.                     ⊓
                                                                      ⊔

                                         9


                                        137
    Notice that the above theorem states that M1 ⊓Rec M2 is not expressible
in FO. In principle, it might be the case that if we restrict ourselves to the
intersection with respect to a smaller class of mappings, for example the class
of mappings specified by st-tgds, then we could obtain better expressibility re-
sults. The following proposition shows a negative result in this respect. This is
a corollary of the proof of Theorem 4.

Proposition 6. Let S be the class of mappings specified by st-tgds. Then there
are mappings M1 and M2 in S such that M1 ⊓S M2 does not exist.

This raises the question as to which language is expressive enough to specify the
intersection of two mappings specified by st-tgds. Below we provide an answer
to this question.

Theorem 5. Given two schema mappings M1 and M2 given by st-tgds, the
intersection M1 ⊓Rec M2 is expressible by an Existential Second-Order logic
(ESO) formula.

                                                              b a copy of
Proof (sketch). Let S be the source schema of M1 and M2 , and S
                          ′        ′
schema S. Moreover, let M1 and M2 be respective maximum recoveries of M1
and M2 . One can show that the composition M1 ◦ M′1 can be expressed as an
FO formula:
               ∀x1 (ϕ1 (x1 ) → ψ1 (x1 )) ∧ . . . ∧ ∀xn (ϕn (xn ) → ψn (xn ))
where ϕi (xi ) and ψi (xi ) are FO formulas over S and S,b respectively. Similarly,
M2 ◦ M′2 can be expressed as ∀y 1 (α1 (y 1 ) → β1 (y 1 )) ∧ . . . ∧ ∀y m (αm (y m ) →
βm (y m )). The formula representing the intersection is based on the construction
in the proof of Theorem 3, thus we need to show how to express the transitive
closure of M1 ◦ M′1 ∪ M2 ◦ M′2 . For this we use an intermediate schema S           e
constructed as follows: for every n-ary relation R of S, we include an (n + 1)-ary
relation Re in S.
               e The idea is that an atom R(a,
                                             e g) will represent the atom R(a)
in the generation g of the computation of the transitive closure. Now, to define
the intersection, we use an ESO formula of the form
                                e ∃s ∃ zero (Ωs ∧ Ω e ∧ Ω E )
                               ∃S                  R     Re

where ∃S e denotes an existential quantification over all relation symbols in S,  e s
is a function symbol, and zero a first order variable. The rest of the formulas
is constructed as follows: Ωs is the formula ∀x∀y (s(x) = s(y) → x = y) ∧
¬(s(x) = x) ∧ ¬(s(x) = zero) that defines a successor function, with zero
as the first element; ΩRe corresponds to the following FO formula (we assume
S = {R1 , . . . , Rk } and z i is a tuple of variables of the same arity as Ri ):
                                                                                       
   ∀z 1 (R1 (z 1 ) → R e1 (z 1 , zero)) ∧ · · · ∧ ∀z k (Rk (zk ) → R  ek (z k , zero)) ∧
      ^ n                                              m
                                                        ^                                         
                                                                                             
   ∀g       ∀xi ϕ  ei (xi , g) → ψei (xi , s(g)) ∨          ∀y i αei (y i , g) → βei (yi , s(g))
        i=1                                            i=1


                                                 10


                                               138
where ϕ  ei (xi , g) is obtained from ϕi (xi ) by replacing every relational symbol R(z)
    e g), and ψei (xi , s(g)) is obtained from ψi (xi ) by replacing every relational
by R(z,
          b
symbol R(z)            e s(g)), and similarly for α
                  by R(z,                                ei and βei . The intuition is that the
                                          e
first line initializes the relations Ri at generation 0, and the second line mimics
a formula representing (M1 ◦ M′1 ∪ M2 ◦ M′2 )+ over schema S.               e Finally, Ω E just
                                                                                           e
                                                                                           R
extracts the target relations at some generation g of the transitive closure:
                                                                                        
         ∃g ∀z 1 (R  e1 (z 1 , g) → R
                                    b1 (z 1 )) ∧ · · · ∧ ∀z k (R
                                                               ek (z k , g) → R
                                                                              bk (z k )) .    ⊓
                                                                                              ⊔

Example 2. Recall the mappings M1 and M2 from Example 1. Composed with
their maximum recoveries, they have the following form (see [4]):
                                  b 1 , x2 )} and
        M1 ◦ M′1 = {S(x1 , x2 ) → S(x
                                  b 1 , x2 ) ∨ (x1 = x2 ∧ Q(x
        M2 ◦ M′2 = {S(x1 , x2 ) → S(x                     b 1 )),
                                b            b
                      Q(x1 ) → S(x1 , x1 ) ∨ Q(x1 )}.
The intersection M1 ⊓Rec M2 is expressed by the following ESO formula:
                
                                                                                  
 e  e
∃S∃Q∃s∃ zero ∀x∀y (s(x) = s(y) → x = y) ∧ s(x) 6= x ∧ s(x) 6= zero ∧
                                                                                 
                                     e                                 e
         ∀x1 ∀x2 S(x1 , x2 ) → S(x1 , x2 , zero) ∧ ∀x Q(x) → Q(x, zero) ∧
                                                        
     ∀g ∀x1 ∀x2 S(x e 1 , x2 , g) → S(xe 1 , x2 , s(g)) ∨
           
                                                                                      
                     e 1 , x2 , g) → S(x
             ∀x1 ∀x2 S(x                 e 1 , x2 , s(g)) ∨ (x1 = x2 ∧ Q(xe 1 , s(g))) ∧
                                                                      
                                                                     
                     e 1 , g) → S(x
            ∀x1 ∀x2 Q(x               e 1 , x1 , s(g)) ∨ Q(x
                                                          e 1 , s(g))      ∧
                                                                               
                                                                             
                    e 1 , x2 , g ′ ) → S(x
     ∃g ′ ∀x1 ∀x2 (S(x                  b 1 , x2 )) ∧ ∀x(Q(x,
                                                           e g ′ ) → Q(x))
                                                                        b


6    Conclusion
In this work, we have introduced two new operators union and intersection on
schema mappings. We have proved that these operators allow us to define a
lattice w.r.t. the order inf (up to ≡inf -equivalence) for the mappings in Rec
(i.e., mappings having a maximum recovery). In particular, we have shown that
the union and intersection always exist for mappings in Rec. When restricting
us to the simple case of mappings specified by st-tgds it has turned out that
the intersection operator is considerably more difficult to handle than the union
operator. More specifically, the union of two mappings specified by st-tgds can
again be specified by a set of st-tgds. In contrast, First-Order logic (FO) does in
general not suffice to express the intersection of such mappings.
     A lot of interesting research questions have been left for future work. First of
all, while our definitions of ⊔ and ⊓ are applicable to arbitrary mappings, we have
restricted ourselves to the mappings in Rec for investigating the questions of
existence, expressiveness, and computability of ⊔ and ⊓. We would like to extend

                                              11


                                            139
this study to arbitrary mappings. As has been illustrated in Proposition 5, such
an extension may even require an adaptation of the inf -relation.
    Recall that in Theorem 5 we have shown that ESO is expressive enough
to specify the intersection of two mappings given by sets of st-tgds. Further
analysis is required to determine if a smaller fragment of ESO would also suffice.
In addition it would be interesting to identify restrictions on the st-tgds so that
the intersection of mappings specified by such st-tgds is FO-expressible.
    We would also like to extend our study to further set operators on schema
mappings. Above all, we would like to study the complement of a mapping M
(i.e., a mapping that transfers all the source information not transferred by
M) and, more generally, the set difference of two mappings M and N (i.e., a
mapping that transfers all the source information that is transferred by M but
not by N ). Overall, we think that the union and intersection operators can be
crucial in not only defining operators such as difference and complement, but in
laying the foundation to a framework of similar operators on schema mappings.
    Given mappings from a source schema S to a target schema T, an interesting
question is if the result of the union or intersection can be also specified as a
mapping from S to T. Our definitions do not force the same target schema to
be used when constructing a union or intersection, but we consider this setting
to be the most common in practice. We leave the investigation of this case as a
yet another item in the future research agenda.
Acknowledgements This work has been funded in part by Marie Curie action
IRSES under Grant No. 24761 (Net2), and by the Vienna Science and Technology
Fund (WWTF) through project ICT08-032. Jorge Pérez has been supported by
Fondecyt grant 11110404 and by VID grant U-Inicia 11/04 Universidad de Chile.


References
 1. M. Arenas, J. Pérez, J. L. Reutter, and C. Riveros. Foundations of schema mapping
    management. In Proc. PODS, pages 227–238, 2010.
 2. Marcelo Arenas, Ronald Fagin, and Alan Nash. Composition with target con-
    straints. In Proc. ICDT’10, ACM International Conference Proceeding Series,
    pages 129–142. ACM, 2010.
 3. Marcelo Arenas, Jorge Pérez, Juan L. Reutter, and Cristian Riveros. Composition
    and inversion of schema mappings. SIGMOD Record, 38(3):17–28, 2009.
 4. Marcelo Arenas, Jorge Pérez, and Cristian Riveros. The recovery of a schema
    mapping: Bringing exchanged data back. ACM Trans. Database Syst., 34(4), 2009.
 5. P. Bernstein. Applying model management to classical meta data problems. In
    Proc. CIDR, 2003.
 6. P. Bernstein and S. Melnik. Model management 2.0: manipulating richer mappings.
    In Proc. SIGMOD, pages 1–12, 2007.
 7. R. Fagin, P. G. Kolaitis, L. Popa, and W.-C. Tan. Composing schema mappings:
    Second-order dependencies to the rescue. TODS, 30(4):994–1055, 2005.
 8. Ronald Fagin. Inverting schema mappings. ACM Trans. Database Syst., 32(4),
    2007.
 9. Ronald Fagin, Phokion G. Kolaitis, Renée J. Miller, and Lucian Popa. Data ex-
    change: semantics and query answering. Theor. Comput. Sci., 336(1):89–124, 2005.


                                         12


                                        140
10. Ronald Fagin, Phokion G. Kolaitis, Lucian Popa, and Wang Chiew Tan. Reverse
    data exchange: Coping with nulls. ACM Trans. Database Syst., 36(2):11, 2011.
11. L. M. Haas, M. A. Hernández, H. Ho, L. Popa, and M. Roth. Clio grows up: from
    research prototype to industrial tool. In Proc. SIGMOD, pages 805–810, 2005.
12. Maurizio Lenzerini. Data integration: A theoretical perspective. In Proc. PODS,
    pages 233–246. ACM, 2002.
13. Leonid Libkin. Elements of Finite Model Theory. Springer, 2004.
14. Jayant Madhavan and Alon Y. Halevy. Composing mappings among data sources.
    In Proc. VLDB’03, pages 572–583, 2003.
15. Sergey Melnik. Generic Model Management: concepts and Algorithms, volume 2967
    of LNCS. Springer, 2004.
16. Alan Nash, Philip A. Bernstein, and Sergey Melnik. Composition of mappings
    given by embedded dependencies. ACM Trans. Database Syst., 32(1):4, 2007.
17. Jorge Pérez. Schema Mapping Management in Data Exchange Systems. PhD
    thesis, Escuela de Ingenierı́a, Pontificia Universidad Católica de Chile, 2011.


                                        13


                                       141