<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Fagin, R., Kolaitis, P. G., Popa, L., Tan, W. C.: Composing schema mappings:
Second-order dependencies to the rescue., ACM Trans. Database Syst.</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>XML Schema Mappings in the Presence of Key Constraints and Value Dependencies ?</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Institute of Control</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Information Engineering</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Faculty of Mathematics and Computer Science, Adam Mickiewicz University</institution>
          ,
          <addr-line>Poznan</addr-line>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2006</year>
      </pub-date>
      <volume>30</volume>
      <issue>4</issue>
      <abstract>
        <p>Schema mappings play a central role in both data integration and data exchange, and are understood as high-level speci¯cations describing the relationships between data schemas. Based on these speci¯cations, data structured under a source schema can be transformed into data structured under a target schema. During the transformation some structural constraints, both context-free (the structure) and contextual (e.g. keys and value dependencies) should be taken into account. In this work, we present a new formalism for the schema mapping speci¯cation. We propose a new class of tree-pattern formulas in order to extend semantics of XML schema mappings by speci¯cation of key constraints and value dependencies. We discuss foundations of the method and propose a key-preserving transformation algorithm.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Many problems in information integration and data exchange need schema
mappings, which are high-level speci¯cations describing the relationships between
source and target data schemas. A schema mapping over a pair of schemas
expresses a relation on the set of instances of these two schemas. The mapping
should produce an instance of the target schema for a given instance of the
source schema. The target instance should correctly represent the information
contained in the source instance under the constraints imposed by the target
schema [
        <xref ref-type="bibr" rid="ref11 ref3">3, 11, 19, 29</xref>
        ]. The problem of schema mapping arises in many areas of
data management systems. Recently, this problem has received considerable
attention in the context of data exchange [
        <xref ref-type="bibr" rid="ref10 ref3">3, 10</xref>
        ], data integration [16], schema
evolution [18], P2P databases [
        <xref ref-type="bibr" rid="ref5">5, 20</xref>
        ], life science databases [21] or e-commerce
[
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], where data comes from many di®erent sources with di®erent schemas. In
data management [18] such operations on mappings as composition [
        <xref ref-type="bibr" rid="ref11">17, 11, 19</xref>
        ]
and inversion [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] has been considered.
      </p>
      <p>
        Schema mappings between relational schemas are usually de¯ned by means of
the source-to-target dependencies [
        <xref ref-type="bibr" rid="ref1">1, 12, 19</xref>
        ]. Recently, this method was adapted
? The work was supported in part by the Polish Ministry of Science and Higher
Education under Grant N516 015 31/1553.
to XML data and tree-patterns are used to capture the context-free structure of
XML trees [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. However, the approach does not take (contextual) key constraints
into account.
      </p>
      <p>
        In this paper we assume that the de¯nition of a schema is speci¯ed in XSD
(XML Schema De¯nition [27]) language, and except for context-free constraints
the following three classes of contextual constraints are taken into account: keys,
key references, and value dependencies.
1. Key constraints state that a subtree is uniquely identi¯ed by a tuple of values
of key paths [
        <xref ref-type="bibr" rid="ref4">4, 27</xref>
        ]. They are speci¯ed within the &lt;xs:key&gt; elements.
2. Keyref constraints assert a correspondence of the tuples of values resulting
from evaluating a referencing key (a foreign key) with those of the referenced
key [27]. They are speci¯ed within the &lt;xs:keyref&gt; elements.
3. Value dependency constraints impose that a text value of a text node
depends on a tuple of values of other nodes (resembling functional
dependencies in relational databases [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]). The value dependencies are declared in the
&lt;xs:valdep&gt; section of XSD (this section can be included in &lt;xs:annotation&gt;
element of XSD).
      </p>
      <p>We introduce key-pattern formulas which are intended to represent keys
de¯ned within XSD, and are used to control mapping execution. It makes the
process of XML data mappings and transformations semantically richer. We propose
algorithms that construct an intended key-preserving target instance, we also
formulate a theorem that is the basis for the presented method. In our approach,
we distinguish among three kinds of mappings: automappings which are
automatically generated from XSD speci¯cations and capture both context-free and
contextual properties; c-mappings that express correspondences between source
and target schemas; t-mappings which are derived from auto- and c-mappings
and de¯ne key-preserving transformations. The advantage of this approach is
that the management of these mappings and the reasoning over them can be
carried out independently.</p>
      <p>The structure of the paper is as follows. In Section 2 formal de¯nitions
concerning XML trees and XML schemas are given and a running example is
presented. The fundamentals of schema mappings are reviewed in Section 3. In this
section we also characterize the existing approaches. In Section 4 we propose
a formalism for de¯ning key-preserving XML schema mappings. The theorem
and the algorithm concerning the mapping execution are presented in Section 5.
Section 6 concludes the work and formulates directions of the future research.
2</p>
    </sec>
    <sec id="sec-2">
      <title>XML trees and XML schemas</title>
      <p>We consider XML data as a node-labeled unranked and ordered tree (XML tree)
[28]. For simplicity, attribute nodes are represented by text nodes.</p>
      <p>Let L ½ Lab be a ¯nite set of labels, Str be a set of string values (with the
distinguished null values ?; ?1; ?2; ::: in Str), and DId be a set of document
identi¯ers (e.g. URI addresses).</p>
      <p>De¯nition 1. An XML tree I is a tuple (r; N e; N t; ·; child; ¸; º; ¾), where:
{ r is a distinguished root node, N e is a ¯nite set of element nodes, and N t is
a ¯nite set of text nodes;
{ · { a total ordering relation on the set of nodes;
{ child µ (frg [ N e) £ (N e [ N t) { an acyclic binary relation introducing tree
structure into (r; N e; N t) such that the root has only one child (this child is
the top element) and each element node must have a child;
{ ¸ : N e ! L { a function labeling element nodes (the label is the type of the
node);
{ º : N t ! Str { a function labeling text nodes with their text values;
{ ¾ : frg ! DId { a function that assigns a document identi¯er to the root.
¤</p>
      <p>The XML trees will be considered as instances of XML schemas. We will
restrict ourselves to context-free and contextual (key) constraints speci¯ed by
an XML schema. The context-free constraints are given by regular expressions
describing types of children for any element node type. Keys specify how subtrees
are identi¯ed within an XML tree by means of text values of key paths.</p>
      <p>A path P is de¯ned by the grammar P ::= =l j ==l j l j P=l j P==l; where: =l
denotes a set of children of type l of the root, ==l denotes all descendent nodes
of type l of the root, P=l denotes a set of nodes of type l which are children of
the nodes denoted by P , P==l denotes all "descendent-or-self" nodes of type l
of the nodes denoted by P .</p>
      <p>
        We assume that there is a key for any node type. Following [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], we assume
the following de¯nition of a key for XML:
De¯nition 2. A key is an expression of the form (P; (P 0; (P1; :::; Pk))); where
every P=P 0=Pi is a valid path. The path P is the context path, P 0 is the target
path, and P1; :::; Pk are key paths of the key. ¤
De¯nition 3. An XML schema S over a set L of labels is a tuple
      </p>
      <p>S = (top; ContF ree; KeyDep; Keyref Dep; V alDep);
where:
{ top is the the distinguished label of the top (outermost) element;
{ ConF ree is a function from L to regular expressions over L ¡ ftopg de¯ned
by the grammar e ::= ² j l j eje j e; e j e? j e+ j e¤;
{ KeyDep assigns a key (P; (l; (P1; :::; Pk))) to any label l 2 L ¡ ftopg.
{ KeyRef Dep assigns referential constraints, i.e. expressions of the form,
(P; l; (P1; :::; Pk)) ref KeyDep(l0), to some labels in L; l; l0 2 L ¡ ftopg. The
expression de¯nes a foreign key (P; l; (P1; :::; Pk)) that refers to a primary
key KeyDep(l0).
{ V alDep assigns a set of value dependencies to some labels in L. A value
dependency is an expression of the form (P=l; P 0) = f (P=l=P1; :::; P=l=Pm).
De¯nition 4. An XML tree I = (r; N e; N t; ·; child; ¸; º; ¾) conforms to the
XML schema S = (top; ConF ree; KeyDep; KeyRef Dep; V alDep), if:
1. ¾(r) is de¯ned, and for any text node n 2 N t, º(n) is de¯ned.
2. If (r; n) 2 child, then ¸(n) = top.
3. For any node n in N e with children (n1; :::; nm) such that n1 &lt; ::: &lt; nm, if
¸(n) = l, then the sequence ¸(n1):::¸(nm) is a word of the language de¯ned
by the regular expression ConF ree(l).
4. If KeyDep(l) = (P; (l; (P1; :::; Pk))), then each subtree rooted in a node n of
type l in an context denoted by the context path P , is uniquely identi¯ed by a
tuple of text values (v1; :::; vk) of the key paths (P1; :::; Pk); k ¸ 0. For k = 0,
(P; (l; ())) indicates that n is unconditionally unique in the context of P .
5. If KeyRef Dep(l) = (P; l; (P1; :::; Pk)) ref (P 0; (l0; (P10; :::; Pk0 ))), then for any
tuple (v1; :::; vk) of values such that P [l[P1 = v1 ^ ¢ ¢ ¢ ^ Pk = vk]] there exists
exactly one node n 2 [[P 0=l0]] such that the tuple (P10; :::; Pk0 ) of key paths on
n has the value equal to (v1; :::; vk).
6. If (P=l; P 0) = f (P=l=P1; :::; P=l=Pn) 2 V alDep(l), then text values of the
path P=l=P 0 are functionally dependent on the set of tuples of text values
determined by (P=l=P1; :::; P=l=Pn). We use the name f to distinguish two
di®erent dependencies having the same set of determining paths. ¤
An XML tree I is called an instance of a schema S, denoted I j= S, i®
conforms to S, i.e. satis¯es all the constraints imposed by S.</p>
      <p>Example 1. Sample XML trees and XML schema trees are given in Fig. 1. The
XML trees in Fig. 1 represent the bibliographical data. The node labels are as
follows: paper (P ) and title (T ) of the paper; author (A), name (N ) and the
a±liation (U ) of the author; year (Y ) of publication and the conference (C)
where the paper was presented; R and K are used to join authors with their
papers.</p>
      <p>If we restrict ourselves to the context-free structure only, then de¯nitions of
S1, S2 and S3 can be given as DTDs in Fig. 2. In order to specify contextual
constraints, we will use the schema de¯nition language XSD. In Fig. 4 the
complete XSD de¯nitions for S1 and fragments for S2 and S3 are given. It can be
seen that the XML tree I1 is an instance of S1, whereas I2 and I20 are instances
of S2 with respect to the DTDs D1 and D2, respectively. Moreover, I1 and I2 are
instances of schemas speci¯ed by XSDs SD1 and SD2, since they satisfy keys,
respectively:
(1)
(2)
(S2; (A; (N; P=T ))); (S2=A; (N; ())); (S2=A; (U; ()));
(S2=A; (P; (T ))); (S2=A=P; (T; ())); (S2=A=P; (Y; ())):
For example, each subtree in I 20 denoted by =S2=A is uniquely identi¯ed by a pair
of text values of the pair of key paths (N; P =T ). In Figure 3 there are constraints
assigned to elements of the schema S3.
¤</p>
      <p>I1 : S1
T
t1</p>
      <p>P</p>
      <p>A
N U N U
a1 u1 a2 u2</p>
      <p>S1: S1</p>
      <p>A</p>
      <p>T
t2
T</p>
      <p>A+
P*
N</p>
      <p>P
A
N
a1
U?</p>
      <p>A
N U P
a1 u1</p>
      <p>I2: S2
T Y T Y
t1 ⊥1 t2 ⊥2</p>
      <p>S2: S2
N</p>
      <p>A*
U?</p>
      <p>P+</p>
      <p>T Y?
P</p>
      <p>N U
a2 u2</p>
      <p>A</p>
      <p>P
T Y
t1 ⊥1</p>
      <p>A
N U P
a1 u1</p>
      <p>I2’: S2</p>
      <p>A
N U P
a1 u1</p>
      <p>S3: S3
T Y
t1 ⊥1</p>
      <p>T Y
t2 ⊥2
A*</p>
      <p>P+
N</p>
      <p>R* K</p>
      <p>T</p>
      <p>Y? C?</p>
      <p>A
N U P
a2 u2</p>
      <p>T Y
t1 ⊥1
source instances: let S be a source schema, T a target schema and M a schema
mapping, i.e. a formula expressing a relationship between S and T . For a given
instance I of S, ¯nd an instance J of T such that hI ; J i j= M. Such an instance
J is called a solution for I under M.</p>
      <p>
        In the relational data exchange setting, source-to-target dependencies (STDs)
[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] are usually used to express schema mappings [12, 19]. The STD is a ¯rst-order
formula:
STD):
      </p>
      <p>8x(©(x) ) 9yª (x; y));
that after skolemization has the form of the following second-order STD (SO
9f 8x(©(x) ^ Â(x; y) ) ª (x; y));
where: (1) f is a vector of function symbols, x, y are vectors of variables; (2) ©
is a conjunction of atoms over source schemas; (3) Â(x; y) is a conjunction of
equalities of the form t = t0 where t and t0 are terms over f , x and y; (4) ª is a
conjunction of atoms over target schema; (5) each variable is safe.</p>
      <p>
        In [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] it was shown that SO STDs are strictly more expressive then STDs
and are closed under composition. It means that the composition of two SO
STD mappings, M13 = M12 ± M23, is also a SO STD mapping (this is not
true for ¯rst-order STD mappings). The semantics of the composition of schema
mappings is de¯ned by means of the composition of two binary relations.
      </p>
      <p>
        For schemas containing nested data (such as XML schemas) extensions of the
above mentioned formalism were proposed [22, 29, 13]. In [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], tree-pattern
formulas [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] are used in STDs instead of the relational atoms. Further on, such SO
STDs with tree-pattern formulas will be referred to as SO XSTDs (second-order
XML source-to-target dependencies). In this way correspondences between XML
data can be expressed. However, such mappings do not take key constraints
into consideration and they are not capable of ensuring that the target instance
conforms to a target schema with key speci¯cation. Speci¯cation in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] is
restricted to DTD schemas only. To illustrate this problem suppose that we want
to restructure the instance I1 (Fig. 1) of D1 under D2 (Fig. 2).
      </p>
      <p>The following XSTD formula speci¯es a mapping from D1 to D2:
M := 8xT ; xN ; xU (=S1[P [T = xT ^ A[N = xN ^ U = xU ]]]</p>
      <p>) 9xY =S2[A[N = xN ^ U = xU ^ P [T = xT ^ Y = xY ]]]):</p>
      <p>
        Using the speci¯cation, two possible target instances, I2 and I20 (Fig. 1) can
be produced. These instances conform to D2. However, I2 and I20 conforms to
di±cult sets of keys { given in (2) and (3), respectively. In [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], a repair process
is also proposed which allows for inventing null values in the ¯nal XML tree and
to state that some null values should be equal.
      </p>
      <p>In the next sections we will show how both key constraints and value
dependencies can be captured by mappings.
4</p>
      <p>Key-preserving XML schema mappings
In this paper we propose key-pattern formulas, which are a kind of tree-pattern
formulas intended to re°ect keys in the target schema. A key-pattern formula
&lt;schema xmlns="http://www.w3.org/2001/XMLSchema"&lt;&gt;schema xmlns="http://www.w3.org/2001/XMLSchema"&gt;
&lt;element name="S1"&gt; &lt;element name="S2"&gt;
&lt;complexType&gt; &lt;complexType&gt;
&lt;sequence&gt; &lt;sequence&gt;</p>
      <p>&lt;element ref="P" minOccurs="0" &lt;element ref="A" ... /&gt;
maxOccurs="unbounded" /&gt; &lt;/sequence&gt;
&lt;/sequence&gt; &lt;/complexType&gt;
&lt;/complexType&gt; &lt;/element&gt;
&lt;/element&gt; &lt;element name="A"&gt;
&lt;element name="P"&gt; ...</p>
      <p>&lt;complexType&gt; &lt;/element&gt;
&lt;sequence&gt; ...</p>
      <p>&lt;element name="T" type="string" /&gt; &lt;/schema&gt;
&lt;element ref="A" minOccurs="1"</p>
      <p>maxOccurs="unbounded" /&gt; SD3:
&lt;/sequence&gt;
&lt;/complexType&gt;
&lt;key name="PKey"&gt;
&lt;selector xpath="." /&gt;
&lt;field xpath="T" /&gt;
&lt;/key&gt;
&lt;/element&gt;
&lt;element name="A"&gt;
&lt;complexType&gt;
&lt;sequence&gt;
&lt;element name="N" type="string" /&gt;
&lt;element name="U" type="string"</p>
      <p>minOccurs="0" /&gt;
&lt;/sequence&gt;
&lt;/complexType&gt;
&lt;key name="AKey"&gt;
&lt;selector xpath="." /&gt;
&lt;field xpath="N" /&gt;
&lt;/key&gt;
&lt;valdep&gt;
&lt;target name="U" /&gt;
&lt;function name="fu" /&gt;
&lt;source xpath="N" /&gt;
&lt;/valdep&gt;
&lt;/element&gt;
&lt;/schema&gt;
&lt;schema xmlns="www.w3.org/2001/XMLSchema"&gt;
&lt;element name="S3"&gt;
&lt;complexType&gt;
&lt;sequence&gt;
&lt;element ref="A" ..." /&gt;
&lt;element ref="P" ..." /&gt;
&lt;/sequence&gt;
&lt;/complexType&gt;
&lt;/element&gt;
&lt;element name="A"&gt;
...
&lt;keyref name="AKeyref" refer="PKey"&gt;
&lt;selector xpath="." /&gt;
&lt;field xpath="R" /&gt;
&lt;/keyref&gt;
&lt;/element&gt;
&lt;element name="P"&gt;
...
&lt;key name="PKey"&gt;
&lt;selector xpath="." /&gt;
&lt;field xpath="K" /&gt;
&lt;/key&gt;
...</p>
      <p>&lt;/element&gt;
&lt;/schema&gt;
E ::= x j C
¼ ::= P [E] j ¼=P [E]</p>
      <p>C ::= P = x j ¼ j C ^ C
valuation !, where ! is a total function from the set of all variables occurring
¤
in ¼ to Str. We denote by !(x) the value of ! on x (we assume !(x) = ?, if
there is not any text value for x in I { this is possible because of semistructured
nature of XML).</p>
      <p>Further on, by [[¼]] and n[[¼]] we will denote a value of ¼ according to semantics
of XPath expressions [25, 14], i.e. a set of nodes reachable from the root or from
the context node n, respectively, via ¼.</p>
      <p>De¯nition 6. The satisfaction of a tree-pattern formula under a valuation ! in
a node n of an XML tree I, is de¯ned as follows (val(n) is the text value of the
text node n):
1. (I; n) j= (x)(!) i® val(n) = !(x).
2. (I; n) j= (P = x)(!) i® there is a node n0 2 n[[P ]] such that (I; n0) j= (x)(!).
3. (I; n) j= (C1 ^ C2)(!) i® (I; n) j= (C1)(!) and (I; n) j= (C2)(!).
4. (I; n) j= (P [E])(!) i® there is a node n0 2 n[[P ]] such that (I; n0) j= (E)(!).
5. (I; n) j= (¼=P [E])(!) i® (I; n) j= (¼)(!) and there is a node n0 2 n[[¼=P ]]
such that (I; n0) j= (E)(!).</p>
      <p>We say that a formula Á is satis¯ed in (I; n), or that (I; n) satis¯es Á, denoted
(I; n) j= Á, i® (I; n) j= Á(!) for some valuation ! over Á into I. I j= Á denotes
that Á is satis¯ed in the root of I. ¤
De¯nition 7. A key-pattern formula ± is a tree-pattern formula restricted to
the syntax:
± ::= =top=l[E] j ±=l[E]
E ::= x j C</p>
      <p>C ::= P = x j C ^ C ¤
Lemma 1. Let ±=l[E] be a key-pattern formula, and I be an XML tree. Then:
I j= ±=l[E] , I j= ± ^ 9n 2 [[±]]((I; n) j= l[E]):
(5)
Proof. The lemma follows from (4) and (5) in De¯nition 6.</p>
      <p>The above lemma will be used in Section 5 as the basis of an algorithm
creating key-preserving target instances. Key-pattern formulas can be automatically
generated from keys de¯ned in a schema S = (top; ConF ree; KeyDep;
KeyRef Dep; V alDep) over a set
L of labels, by means of the following recursive algorithm:
Algorithm 1 key-patt(key), generation of a key-pattern formula representing
the given key.</p>
      <p>Input : Schema S = (top; ConF ree; KeyDep; KeyRef Dep; V alDep)
over a set L of labels,
label l 2 L ¡ ftopg,
key · = KeyDep(l) = (P; (l; (P1; :::; Pk))), k ¸ 0.</p>
      <p>Output : Key-pattern formula key-patt(·).
key-patt(·) = case of (·)
(=top; (l; ())) : =top=l[x]
(=top; (l; (P1; :::; Pk))) : =top=l[P1 = x1 ^ ¢ ¢ ¢ ^ Pk = xk]
(P=l0; (l; ())) : key-patt(KeyDep(l0))=l[x]
(P=l0; (l; (P1; :::; Pk))) : key-patt(KeyDep(l0))=l[P1 = x1 ^ ¢ ¢ ¢ ^ Pk = xk]
endcase
The automappings are identity mappings over schemas that describe how
instances of the schema are transformed onto themselves. An automapping
captures constraints speci¯ed within a schema by means of SO XSTD of the form:
¼(x) ^ Ã(x) ) ¢(x);
(6)
where: ¼(x) is a tree-pattern formula capturing the structure of the schema;
Ã(x) is a conjunction of atoms of the form x = x0 and x = f (x1; :::; xn), where
the former captures a key reference and the latter { a value dependence; ¢(x) is
a conjunction of key-pattern formulas, where every formula represents a key in
the schema. We assume that function symbols are quanti¯ed existentially, while
variables { universally.</p>
      <p>Example 2. Automappings of S1, S2 and S3, referring to XSDs in Fig. 4, have
the following speci¯cations:
1. A1 := ¼1(xT ; xN ; xU ) ^ Ã1(xT ; xN ; xU ) ) ¢1(xT ; xN ; xU ), where
{ ¼1 := =S1[P [T = xT ^ A[N = xN ^ U = xU ]]]
{ Ã1 := xU = fU (xN )
{ ¢1 := =S1=P [T = xT ]
^ =S1=P [T = xT ]=T [xT ]
^ =S1=P [T = xT ]=A[N = xN ]
^ =S1=P [T = xT ]=A[N = xN ]=N [xN ]
^ =S1=P [T = xT ]=A[N = xN ]=U [xU ]
2. A2 := ¼2(yT ; yN ; yU ; yY ) ^ Ã2(yT ; yN ; yU ; yY ) ) ¢2(yT ; yN ; yU ; yY ), where
{ ¼2 := =S2[A[N = yN ^ U = yU ^ P [T = yT ^ Y = yY ]]]
{ Ã2 := yU = fU (yN ) ^ yY = fY (yT )
{ ¢2 := =S2=A[N = yN ]
^ =S2=A[N = yN ]=N [yN ]
^ =S2=A[N = yN ]=U [yU ]
^ =S2=A[N = yN ]=P [T = yT ]
^ =S2=A[N = yN ]=P [T = yT ]=T [yT ]
^ =S2=A[N = yN ]=P [T = yT ]=Y [yY ]
3. A3 :=
=S3[A[N = zN ^ R = zR] ^ P [K = zK ^ T = zT ^ Y = zY ^ C = zC ]]
^zR = zK ^ zY = fY (zT ) ^ zC = fC (zT ) ) =S3=A[N = zN ]^
=S3=A[N = zN ]=N [zN ] ^ =S3=A[N = zN ]=R[zR] ^ =S3=P [K = zK ]^
=S3=P [K = zK ]=T [zT ] ^ =S3=P [K = zK ]=Y [zY ] ^ =S3=P [K = zK ]=C[zC ]
4.2</p>
      <p>C-mappings
A c-mapping speci¯es the correspondence between source and target schemas,
and states how patterns in the source tree correspond to patterns in the target
tree. They are SO XSTDs of the form:
¼S(x) ^ ÁS;T (x; y) ) ¼T (x; y);
(7)
where: x and y are source and target variables, respectively; ¼S(x) is a
treepattern formula over a source schema S and de¯nes source variables; ÁS;T (x; y)
is a conjunction of atoms of the form x = x0, y = y0 and y = f (x1; :::; xn), where
x = x0 and y = y0 are restrictions over variables, and y = f (x1; :::; xn) de¯nes a
target variable by means of the source ones; ¼T (x; y) is a tree-pattern formula
over a target schema T .</p>
      <p>
        Discovering c-mappings or correspondences between schemas is a crucial
problem in schema mappings and may involve variety of methods and tools [
        <xref ref-type="bibr" rid="ref7 ref8">23,
7, 8</xref>
        ]. However, formal methods for representing mappings based on well
established formal languages enable inferring new mappings in a repository of accepted
mappings [
        <xref ref-type="bibr" rid="ref11 ref9">17, 11, 19, 9, 20</xref>
        ]. In this way user attention [15] can be utilized.
Example 3. A c-mapping from S1 into S2 has the following speci¯cation:
      </p>
      <p>M12 := ¼1(zT ; zN ; zU ) ^ Á12(zT ; zN ; zU ; zY ) ) ¼2(zT ; zN ; zU ; zY );
where ¼1(zT ; zN ; zU ) and ¼2(zT ; zN ; zU ; zY ) are tree-pattern formulas speci¯ed
in automappings of S1 and S2, respectively, and Á12(zT ; zY ) is the atom formula
zY = fY (zT ) de¯ning target variable zY . ¤
A t-mapping or a transformation is a speci¯cation that describes how data
structured under the source schema is to be transformed into data structured under
the target schema preserving keys in the target. A t-mapping TST from a source
schema S into a target schema T is a SO XSTD formula of the form
¼S(x) ^ ÃS;T (x; y) ) ¢T (x; y);
(8)
and can be automatically derived from the c-mapping MST from S to T and
the automappings AT over T . Then the following rule is used:</p>
      <p>MST = ¼S(x1) ^ ÁST (x1; y1) ) ¼T (x1; y1)</p>
      <p>AT = ¼T (x2) ^ ÃT (x2; y2) ) ¢T (x2; y2)</p>
      <p>TST = (¼S(x1) ^ ÁST (x1; y1))[(x1; y1) 7! x2] ^ ÃT (x2; y2) ) ¢T (x2; y2)
The formula (¼S(x1)^ÁST (x1; y1))[(x1; y1) 7! x2] arises from ¼S(x1)^ÁST (x1; y1)
by replacing variables in (x1; y1) with corresponding variables in x2. The
replacement is made according to occurrences of variables within ¼T (x1; y1) and
¼T (x2).</p>
      <p>Example 4. From the c-mapping M12 (Example 3) and the automapping A2
(Example 2), the following t-mapping T12 from S1 into S2 can be obtained:
T12 := ¼1(yT ; yN ; yU ) ^ Á12(yT ; yN ; yU ; yY ) ^ Ã2(yT ; yN ; yU ; yY )</p>
      <p>) ¢2(yT ; yN ; yU ; yY )
5</p>
      <p>Creation of key-preserving target instances
In this section we propose an algorithm generating a target instance for a given
t-mapping. The left-hand side of the t-mapping provides a set ­ of variable
valuations, and the right-hand side consists of a set of key-pattern formulas.
Each key-pattern formula is used to create a Skolem term that for valuations in
­ delivers nodes of the expected target XML tree. We will show that the created
instance satis¯es keys represented by key-pattern formulas.</p>
      <p>Let ± be a key-pattern formula with variables in x. If for each valuation !
over ± into I the set [[±(!)]] has exactly one element, then a key represented by ±
is satis¯ed in I. If ±0 is satis¯ed in I and ± is a key-pattern formula of the form
±0=l[E] then, using Lemma 1, we can recursively analyze whether ± represents a
key satis¯ed in I or not.</p>
      <p>The following theorem formulates a necessary condition for satisfaction of a
key in an XML tree I. We will show that if the condition is satis¯ed in I by
a key-pattern formula ±, then the key represented by ± is satis¯ed in I. That
observation will be the base of our implementation (Algorithm 2).
Theorem 1. Let ± = ±0=l[E] be a key-pattern formula and I be an XML tree:
{ ±0 = =top=l1[E1]=:::=lm[Em];
{ E = (P1 = x1 ^ ¢ ¢ ¢ ^ Pk = xk);
{ ­ { a set of all variable valuations over ± into text values in I.
If for any valuation ! 2 ­
count([[±0=l(!)]]) = countf! 2 ­ j I j= ±(!)g
(9)
then the key represented by ± is satis¯ed in I, i.e.</p>
      <p>I j= (=top=l1=:::=lm; (l; (P1; :::; Pk))):
Proof. (Sketch) The key represented by ±0 is satis¯ed in I if for each valuation
! 2 ­, a set [[±0(!)]] has exactly one element. By structural induction we will
proof that for any ! 2 ­ the equality (9) implies count([[±(!)]]) = 1. Indeed:
1. The thesis holds for ± = =top=l[P1 = x1 ^ ¢ ¢ ¢ ^ Pk = xk], since:
{ count([[=top(!)]]) = 1, for any ! 2 ­,
{ if the number of nodes of type l determined by =top=l is equal to the
number of di®erent valuations of variables (x1; :::; xk), then the key
(=top; (l; (P1; :::; Pk))) represented by ± is satis¯ed in I.
2. Let ± = ±0=l[P1 = x1 ^ ¢ ¢ ¢ ^ Pk = xk]. Then from the induction hypothesis
count([[±0(!)]]) = 1, for each ! 2 ­. Now it is easy to see that the equality (9)
implies that count([[±(!)]]) = 1, for each ! 2 ­; hence the key represented
by ± is satis¯ed in I. ¤
The theorem states that if the number of nodes in [[±0=l]] is equal to the
number of di®erent valuations of variables in ± satisfying I, then the key represented
by ± is satis¯ed in I.</p>
      <p>This observation is the basis of our algorithm for generation of key-preserving
instances in data exchange. The one-to-one (bijective) correspondence between
sets mentioned in the thesis of the theorem, equality (9), can be achieved by
means of node-valued Skolem functions. The list of arguments of a Skolem
function Fl consists of all variables x occurring in ±=key-patt(KeyDep(l)). Then
every valuation ! 2 ­ creates a unique element node Fl(!(x)) of type l.
Similarly for text nodes.</p>
      <p>Now, we shall de¯ne the semantics for t-mappings. The left-hand side:
¼S (x) ^ ÃS;T (x; y)
of a t-mapping (8), determines a totally ordered set ­ of variable valuations.</p>
      <p>The ordering of valuations in ­ is induced by the lexicographic ordering
of tuples of text nodes whose values are assigned to the vector x of variables.
By ord(!) we denote the ordering position of ! in ­. To order nodes in a
result target tree, we will use the Dewey order encoding [24], where each node
n is assigned a sequence P os(n) that represents position of n in the tree. For
example, P os(n) = 1:3:2 says that n is the second child of the third child of the
top element.</p>
      <p>We assume:
{ P os(r) = 0, if r is the root; P os(n) = 1, if n is the top element;
{ P os(n) = P os(n0):ord(!), where n0 is the parent of n and n0 is obtained
under a valuation ! 2 ­.</p>
      <p>Let ­ be a set of valuations, and ¢ = f±1; :::; ±pg be a set of key-pattern
formulas occurring in the right-hand side of a t-mapping (8).</p>
      <p>Then an XML tree J = semT (¢)(­); where J = (r; N e; N t; ·; child; ¸; º; ¾);
is created as follows:</p>
      <p>semT (¢)(­) = fsemT (±)(!) j ± 2 ¢; ! 2 ­g;</p>
    </sec>
    <sec id="sec-3">
      <title>Conclusions</title>
      <p>
        We proposed a new approach to use XML schema mappings in data exchange.
To this aim we introduced a new class of tree-pattern formulas called
keypattern formulas that provide a natural way to express contextual constraints
in XML schema mappings and are used to perform key-preserving mappings.
We distinguished among three classes of mappings: automappings
(transformations of a schema onto itself), c-mappings (correspondences) and t-mappings
(key-preserving transformations). The distinction is justi¯ed by di®erent ways
in which they are obtained: automatically from a schema speci¯cation
(automappings), quasi-manually (c-mappings), or by deriving from another mappings
(tmappings). In our formalism, mappings capture three kinds of contextual
constraints: keys, key references and value dependencies. We demonstrated bene¯ts
of this formalism including increased speci¯cation accuracy, and the ability to
manage mappings and reasoning on them. In the future we plan to consider
operation on key-preserving mappings, as well as on mappings capturing another
constraints. For this purpose we need reasoning procedures on tree-patterns and
ontological knowledge describing the semantics of XML schemas which may be
applied for ¯nding correspondences between di®erent schemas [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Abiteboul</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hull</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vianu</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          : Foundations of Databases , Addison-Wesley, Reading, Massachusetts,
          <year>1995</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Amer-Yahia</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cho</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lakshmanan</surname>
            ,
            <given-names>L. V. S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Srivastava</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Tree pattern query minimization</article-title>
          .,
          <source>VLDB Journal</source>
          ,
          <volume>11</volume>
          (
          <issue>4</issue>
          ),
          <year>2002</year>
          ,
          <volume>315</volume>
          {
          <fpage>331</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Arenas</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Libkin</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>XML Data Exchange: Consistency and Query Answering</article-title>
          , PODS Conference,
          <year>2005</year>
          ,
          <volume>13</volume>
          {
          <fpage>24</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Buneman</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Davidson</surname>
            ,
            <given-names>S. B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fan</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hara</surname>
            ,
            <given-names>C. S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tan</surname>
            ,
            <given-names>W. C.</given-names>
          </string-name>
          :
          <article-title>Reasoning about keys for XML, Information Systems</article-title>
          ,
          <volume>28</volume>
          (
          <issue>8</issue>
          ),
          <year>2003</year>
          ,
          <volume>1037</volume>
          {
          <fpage>1063</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Calvanese</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Giacomo</surname>
            ,
            <given-names>G. D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lenzerini</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosati</surname>
          </string-name>
          , R.:
          <article-title>Logical Foundations of Peer-To-Peer Data Integration.</article-title>
          ,
          <source>Proc. of the 23rd ACM SIGMOD Symposium on Principles of Database Systems (PODS</source>
          <year>2004</year>
          ),
          <year>2004</year>
          ,
          <volume>241</volume>
          {
          <fpage>251</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Cybulka</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Meissner</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pankowski</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Schema-</article-title>
          and
          <string-name>
            <surname>Ontology-Based XML Data Exchange in Semantic E-Business</surname>
            <given-names>Applications</given-names>
          </string-name>
          ,
          <source>Business Information Systems, BIS 2006, Lecture Notes in Informatics</source>
          , Vol.
          <volume>85</volume>
          ,
          <year>2006</year>
          ,
          <volume>429</volume>
          {
          <fpage>441</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Doan</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Domingos</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Halevy</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Reconciling Schemas of Disparate Data Sources: A Machine-Learning Approach</article-title>
          ,
          <source>ACM SIGMOD</source>
          <year>2001</year>
          , ACM,
          <year>2001</year>
          ,
          <volume>509</volume>
          {
          <fpage>520</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Doan</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Noy</surname>
            ,
            <given-names>N. F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Halevy</surname>
            ,
            <given-names>A. Y.</given-names>
          </string-name>
          :
          <article-title>Introduction to the Special Issue on Semantic Integration</article-title>
          .,
          <source>SIGMOD Record</source>
          ,
          <volume>33</volume>
          (
          <issue>4</issue>
          ),
          <year>2004</year>
          ,
          <volume>11</volume>
          {
          <fpage>13</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Fagin</surname>
          </string-name>
          , R.:
          <article-title>Inverting schema mappings</article-title>
          ., PODS Conference,
          <year>2006</year>
          ,
          <volume>50</volume>
          {
          <fpage>59</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Fagin</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kolaitis</surname>
            ,
            <given-names>P. G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Miller</surname>
            ,
            <given-names>R. J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Popa</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Data Exchange: Semantics and Query Answering</article-title>
          .,
          <source>ICDT 2003 , Lecture Notes in Computer Science 2572</source>
          , Springer,
          <year>2002</year>
          ,
          <volume>207</volume>
          {
          <fpage>224</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Fagin</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kolaitis</surname>
            ,
            <given-names>P. G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Popa</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Data exchange: getting to the core</article-title>
          .,
          <source>ACM Trans. Database Syst</source>
          .,
          <volume>30</volume>
          (
          <issue>1</issue>
          ),
          <year>2005</year>
          ,
          <volume>174</volume>
          {
          <fpage>210</fpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>