<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Three Layer Evolution Model for XML Stored in Relational Databases</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>St Petersburg State University</institution>
        </aff>
      </contrib-group>
      <fpage>66</fpage>
      <lpage>79</lpage>
      <abstract>
        <p>XML-relational systems with well de ned XML and relational schemas are widely used in industry. In the presence of rapidly changing requirements both schemas of such a model need continuous evolution, which can be performed by a formal framework that describes the evolution of the mutually dependent schema pairs of the system as sequences of elementary schema transformations. There are a number of common tasks, like relational schema performance tuning, XML syntax clean-up, etc that stand apart from changes of semantics of the schemas. We discuss a framework with three layers of operations: the changes in XML schema only, the semantic changes that affect both schemas, and the changes that affect relational schema only. We show how the layered framework allows to formalize and address the above mentioned tasks. We consider the formal quality of the solutions of the tasks that may be achieved inside the framework.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>A practice of storing XML data in relational databases is widely employed because
of the capabilities of the highly-developed RDBMS technologies. Provided that XML
documents comply with a particular schema designed for the application domain, a
more ef cient utilization of these technologies can be achieved. In that case a
relational database can use the metadata to organize effective storage, which includes
selection of appropriate relational schema. On the other hand, if a choice between possible
XML document schemas exists, the schema that is best t for storing data in relational
database should be chosen. Thereby these XML and relational schemas become
mutually dependent.</p>
      <p>If the product that uses the schemas exists in the environment of rapidly changing
requirements, both schemas need to exhibit agile evolution in time. A major part of
software products storing XML in relational databases are actually in this situation because
they often need to change schemas with every new version of the product.
Continuous redesign requires repetitive evolution of data access and data storage algorithms. A
formal framework that presents schema changes as a sequence of elementary
transformations and ensures invariance of selected properties of the schema and data complying
with the schema is an alternative to ad-hoc solutions. Such a model is referred to as a
schema evolution model.</p>
      <p>Schema changes may be caused by changes in functionality of the product, by
system performance requirements, compatibility issues, and other reasons. While it is
preferable that performance problems are resolved on the relational schema level,
functional requirements in the presence of compatibility issues are better expressed in terms
of changes in XML schema only.</p>
      <p>
        General XML document evolution models (for example, those found in [
        <xref ref-type="bibr" rid="ref2 ref5">5,2</xref>
        ], etc)
work with XML schema only and cannot express the above mentioned problems.
Figure 11 demonstrates the application of a general XML document evolution model. DTD
denotes an original XML schema; DTD' is a new XML schema; RS is an original
relational schema; RS' is a new relational schema. The relational model changes can
be expressed only as a result of XML schema evolution; the effects of XML schema
changes on relational schema are not incorporated into the model. The following tasks
cannot be solved by a general evolution model:
1. Apply changes to the XML schema (e.g. to incorporate XML syntax re nement, or
use of schema derivation like [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]) that would not require changing the underlying
relational schema (compatibility).
2. Ensure that semantic constraints are present in relational schema (to allow XML to
SQL queries conversion rather than obtaining XML document) and can be queried
directly through SQL rather than XML (functionality).
3. Apply changes to the relational schema (e.g., to tune performance) that would not
require XML schema changes (performance).
      </p>
      <p>We also add another task, richness: the changes available in the model should allow
to transform a given XML schema complying with a given set of semantic constraints
to any XML schema complying with the same constraints. This task is usually solved
by general XML evolution models. Resolving it in a speci c XML-relational evolution
model ensures that it is not less expressive about XML schema, than general XML
evolution model.</p>
      <p>We present an evolution model that explicitly contains the knowledge of the mutual
dependency of XML and relational schemas. The outline of the model is given on
Figure 2. S and S' are original and new states of intermediate layer that represents common
1 We use DTD as XML document schema description throughout the rest of the paper, though
most of the reasoning applies to other schema descriptions as well
part of the schemas; intermediate layer contains semantic constraints and is represented
in both schemas. Elementary operations of type 2 affect the intermediate layer;
operations of type 1 affect only DTD; operations of type 3 affect only relational schema.
In this paper we show how the above tasks can be solved in the model by considering
the sequences of operations of the 1st, 2nd and 3d types respectively. We demonstrate
that operations of the 1st and 2nd types are rich enough to solve the 4th task. We also
discuss the quality of the answers to these issues.</p>
      <p>The rest of the paper is organized as follows. Section 2 gives an overview of related
work. Section 3 gives a brief description of the XML-relational evolution model we
employ. The subsequent sections contain operations de nitions and discuss how the
operations address the listed above tasks.
2</p>
    </sec>
    <sec id="sec-2">
      <title>Related work</title>
      <p>
        The issue of storing XML documents in relational databases is a well explored area
of research. The main trends are structure-driven approach, for example, [
        <xref ref-type="bibr" rid="ref10 ref11">10,11</xref>
        ], and
model-driven approach, for example, edge approach from [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] or [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]. The
structuredriven algorithms from [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] generate a xed relational schema from a given XML DTD
using either Shared or Hybrid techniques. The generation algorithms of these
techniques construct a DTD graph similar to the one discussed in the paper. The XML
document order and regular expressions information is lost in the conversion. [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]
considers the addition of order information to the schemas generated using the Hybrid
algorithm from [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], while [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] explores possibilities of utilizing functional
dependencies information in nal schema generation. However, all these works do not consider
the effects of schema evolution over the algorithms.
      </p>
      <p>
        Extensive research is done recently in the eld of effective querying XML stored in
relational database. [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] suggests a general solution for generating queries for arbitrary
XML schemas. [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] discusses the use of the knowledge of the XML to SQL mapping
algorithm to generate ef cient SQL queries from XML queries.
DTD:
&lt;!ELEMENT book (booktitle, author)&gt;
&lt;!ELEMENT article(title,author*,contactauthor)&gt;
&lt;!ELEMENT contactauthor authorID IDREF IMPLIED&gt;
&lt;!ELEMENT monograph (title,author,editor?)&gt;
&lt;!ELEMENT editor(monograph*)&gt;
&lt;!ELEMENT editor name CDATA #REQUIRED&gt;
&lt;!ELEMENT author (name,address)&gt;
&lt;!ELEMENT name( rstname?,lastname)&gt;
&lt;!ELEMENT title ANY&gt;
Sample document:
&lt;monograph&gt;
&lt;title&gt;Temporal data &amp; relational model&lt;/title&gt;
&lt;author&gt;
&lt;name&gt;
      </p>
      <p>&lt;lastname&gt;Date&lt;/lastname&gt;
&lt;/name&gt;
&lt;address&gt;NA&lt;/address&gt;
&lt;/author&gt;
&lt;editor name=Homet&gt;
&lt;/editor&gt;
&lt;/monograph&gt;</p>
      <p>
        Various database schema evolution models were proposed for object-oriented [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]
and relational databases [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Recently several works concerning XML appeared ([
        <xref ref-type="bibr" rid="ref2 ref5">5,2</xref>
        ]).
[
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] concentrates on versioning and schema derivations approach rather than schema
modi cation. It deals with arbitrary XML schemas. [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] introduces an invariant-based
approach to de ning XML schema evolution. While these approaches may work well
for XML data they cannot be mapped directly to a XML-relational system due to
inability to express issues named compatibility, functionality, and performance as was shown
in Section 1. On the other hand, [
        <xref ref-type="bibr" rid="ref2 ref7 ref8">7,8,2</xref>
        ] and [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] satisfy richness requirement, where the
latter two do that for XML. [
        <xref ref-type="bibr" rid="ref7 ref8">7,8</xref>
        ] can express semantic constraints in a way similar to
proposed model. [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] expresses semantic constraints for cycle-free XML schemas.
3
      </p>
    </sec>
    <sec id="sec-3">
      <title>Evolution model overview</title>
      <p>
        In consequent discussion we use as a sample a XML document schema taken from [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ].
It is presented on Figure 3.
3.1
      </p>
      <sec id="sec-3-1">
        <title>Schema S</title>
        <p>The essential part of the evolution model is the intermediate schema S. S has a xed
mapping from DTD. Figure 4 shows the schema S for the sample DTD. Dashed
rectangles are used to outline the factorized vertices. Filled circles mark the vertices that are
NB = fbooktitleg
ybooktitle
yauthorID
iarticle
ibook</p>
        <p>HHHjH
?
icontactauthor</p>
        <sec id="sec-3-1-1">
          <title>XHXHXHjXHXXXzX</title>
        </sec>
        <sec id="sec-3-1-2">
          <title>XXXXXXXzX</title>
          <p>yid
yaddress</p>
          <p>NN = ff irstname; lastnameg
ylastname
imonograph</p>
        </sec>
        <sec id="sec-3-1-3">
          <title>XH*XHXHjXH9XXXzX9 *</title>
          <p>?ieditor ytitle iauthor</p>
          <p>NM = fg
9?
iname</p>
        </sec>
        <sec id="sec-3-1-4">
          <title>XXXXXXXzX</title>
          <p>?
y rstname
included into attribute sets of the factorized vertices. Factorized vertices not shown on
the gure: A (NA = fauthorI Dg), U (NU = fid; addressg), T (NT = ftitleg).</p>
          <p>
            First, the DTD graph is built. The DTD is simpli ed by replacing attributes with
leaf tags, inlining entities, replacing regular expressions in tag de nitions. After that the
tags become the vertices of the DTD graph. If one tag contains another an edge exists
between corresponding DTD graph vertices. Each vertex is labeled with the name of
the tag. An edge may be optionally marked by an asterisk (*). In that case it is a
staredge. Figure 4 contains a sample DTD graph. A formal de nition of the DTD graph
and the process of obtaining it from the DTD can be found in [
            <xref ref-type="bibr" rid="ref12">12</xref>
            ].
          </p>
          <p>Figure 4 also shows a factorization of the DTD graph. Each factorized vertex, i.e., a
vertex of the factorized DTD graph2 has its own start-vertex. Each vertex in a factorized
vertex can be reached by a single path from its start-vertex. The path does not contain
star-edges.</p>
          <p>Some of the leaf vertices of the DTD graph are attributes of the factorized vertex,
which contains them. The choice of the attributes is up to the schema designer with
a condition that it does not violate the semantic invariants of the schema S (semantic
constraints are explained below). Each factorized vertex can contain another auxiliary
attribute. Its value denotes element types of the beginnings of the edges that start from
the vertices of the given factorized vertex and end in the start-vertices of other factorized
vertices. For example, book, monograph or article are the values of the auxiliary
attribute for the factorized vertex U. The auxiliary attribute is needed for add/remove
upper edge operation (see Xml-only operations) only and can be omitted if the operation
is not used. Thereinafter we omit explicit mention of this attribute.</p>
          <p>
            The factorized DTD graph maps original DTD into intermediate schema S. [
            <xref ref-type="bibr" rid="ref12">12</xref>
            ]
contains a formal description of the schema S construction process.
          </p>
          <p>The mapping of the relational schema to S is represented by a set of views. Each
view is a projection of a composition of equi-joins: Vi = Fi(X1; X2; ::; XN ), where
Xi is the relation of the schema. In the naive case Vi = V i(Xi). Figure 5 represents
a fragment of the script that generates a relational schema for the sample DTD in the
2 The same term is used to refer to the fragment of the original graph that includes vertices
comprising the factorized vertex and edges between them
CREATE TABLE U ( id INT NOT NULL, address VARCHAR(50) );
CREATE TABLE N ( id INT NOT NULL, rstname VARCHAR(50), lastname VARCHAR(50)
);
CREATE TABLE T ( id INT NOT NULL, title VARCHAR(50) );
CREATE TABLE B ( id INT NOT NULL, booktitle VARCHAR(50), id U INT NOT NULL );
CREATE TABLE A ( id INT NOT NULL, authorId VARCHAR(50), id N INT NOT NULL );
CREATE TABLE M ( id INT NOT NULL, id U INT NOT NULL, id N INT NOT NULL, id T
INT NOT NULL, id M INT NOT NULL);
ALTER TABLE B ADD CONTRAINT fk U FOREIGN KEY ( id U ) REFERENCES U (id);
...
1. Closure: 8t 2 V 0 Pe(t); De(t) V 0
2. Rootedness: 9T 2 V 0 8t 2 V 0 T 2 P L(t) ^ Pe(T ) = ;
3. Pointedness: 9 ? V 0 8t 2 V 0 t 2 P L(?)
4. Immediate predecessors:
8t 2 V 0 P (t) = Pe(t) [ x(P L(x) \ Pe(t) \ (P L(t) DL(t))
[ x(P L(x)\Pe(t)\(P L(t)\DL(t)) fxg; Pe(t)) [ x((P L(x)</p>
          <p>DL(t)) \ Pe(t); P L(x) \ DL(x) fxg)
5. Predecessors graph: 8t 2 V 0 P L(t) = [ x(P L(x); P (t)) [ ftg
6. Immediate descendants:
8t 2 V 0 D(t) = De(t) [ x(DL(x) \ De(t) \ (DL(t) P L(t))
[ x(DL(x)\De(t)\(DL(t)\P L(t)) fxg; De(t)) [ x((DL(x)</p>
          <p>P L(t)) \ De(t); DL(x) \ P L(x) fxg)
7. Descendants graph: 8t 2 V 0 DL(t) = [ x(DL(x); D(t)) [ ftg
8. Interface: 8t 2 V 0 I(t) = N (t) [ H(t)
9. Nativeness: 8t 2 V 0 N (t) = Ne(t) H(t)
10. Inheritance: 8t 2 V 0 H(t) = [ x(I(x); P (t))
fxg; Pe(t))
P (x))\(P L(t)</p>
          <p>fxg; De(t))</p>
          <p>D(x))\(DL(t)
naive case. Edges are represented by integrity constraints on the view representing the
end vertex.
3.2</p>
        </sec>
      </sec>
      <sec id="sec-3-2">
        <title>Semantic constraints</title>
        <p>
          The semantic invariants for the evolution of S are de ned as a set of axioms similar
to the axiom sets used for the type lattices in OO databases in [
          <xref ref-type="bibr" rid="ref7 ref8">7,8</xref>
          ]. The axioms are
expressed as equations between the evolution model sets. Figure 6 contains the axioms.
The principal difference as compared to the evolution model sets used for type lattices
is that the absence of cyclic dependencies between sets is not required. Instead, a more
complicated lattice over a product of vertices and information about the calculated sets
of the model is used. An iterative algorithm ([
          <xref ref-type="bibr" rid="ref16">16</xref>
          ]) that calculates the evolution model
sets on the complicated lattice is guaranteed to stop and return always the same results
[
          <xref ref-type="bibr" rid="ref12">12</xref>
          ].
        </p>
        <p>The evolution model sets are de ned as follows. The hierarchy on the factorized
vertices is de ned by the sets of immediate predecessors P (t) for each vertex t.
Immediate predecessors are vertices that explicitly include vertex t. Essential predecessors
Pe(t) are explicitly speci ed by the schema designer sets of vertices. It is required that
P (t) Pe(t). Vertex subhierarchy P L(t) is a sub-graph of a factorized DTD graph,
which vertex set consists of vertices, from which t is reachable. Native attributes N (t)
are a set of attributes de ned in vertex t. Inherited attributes H (t) are the union of
attributes of all its predecessors. Essential attributes Ne(t) are explicitly speci ed. It
is required that N (t) Ne(t). Interface I (t) is the union of its inherited and native
attributes. The reverse hierarchy is de ned by the sets of immediate descendants D(t)
for each vertex t. Immediate descendants are vertices that explicitly are included into
vertex t. Vertex reverse subhierarchy DL(t) is a sub-graph of a factorized DTD graph,
which vertex set consists of vertices reachable from t. Essential descendants De(t) are
explicitly speci ed sets of vertices. It is required that D(t) De(t).
4</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Elementary operations taxonomy</title>
      <p>In the following sections the operations of the evolution model are discussed. Each
operation over the schemas belongs to one of the following classes:
 Xml-only operations (represented by the arrow marked with 1 on Figure 2)  the
operations that are applied to and affect only XML schema.
 S-operations (represented by the arrow marked with 2 on Figure 2)  the
operations over the common part.
 Relational-only operations (represented by the arrow marked with 3 on Figure 2)
 the operations that are applied to and affect only relational schema.
4.1</p>
      <sec id="sec-4-1">
        <title>Xml-only operations de nition</title>
        <p>Xml-only operations are divided into two groups: a) affecting DTD graph and b) not
affecting DTD graph.</p>
        <p>The operations of type 1a, i.e., those affecting DTD graph, are the following:
 move the vertex up/down; replaces a pair of elements &lt;!ELEM EN T A(B; ::) &gt;
and &lt;!ELEM EN T B(C; ::) &gt; from one factorized vertex with a pair
&lt;!ELEM EN T A(B; C; ::) &gt; and &lt;!ELEM EN T B(::) &gt; where B is not an
attribute3, and vice versa (see Figure 7a);
 add/remove leaf vertex; replaces an element &lt;!ELEM EN T A(::) &gt; where A is
not an attribute with a pair &lt;!ELEM EN T A(B) &gt; and &lt;!ELEM EN T B &gt;
where new element B is not an attribute, and vice versa (see Figure 7b);
3 of a factorized vertex of the model S; the same applies to other uses of attribute in the
operations de nitions
gA
gB</p>
        <p>gC
gA
?
(*)gB
(*)
gC
gC
gB
(*)
gA
?
gB
gB
gA
gA
?
gB
(*)
gC
b)
d)</p>
        <p>gA
gB</p>
        <p>gA
(*) g?B
gC</p>
        <p> add/remove upper edge; replaces a pair of elements &lt;!ELEM EN T A(B; C ; ::) &gt;
and &lt;!ELEM EN T B(C ; ::) &gt; where C is a start vertex and both stars are
optional with a pair &lt;!ELEM EN T A(B; ::) &gt; and &lt;!ELEM EN T B(C ; ::) &gt;,
and vice versa (see Figure 7c)4;
 move edge; replaces a pair of elements &lt;!ELEM EN T A(B; ::) &gt; and
&lt;!ELEM EN T B(C ; ::) &gt; where C is a start vertex and a star after C is optional
with a pair &lt;!ELEM EN T A(B; C ; ::) &gt; and &lt;!ELEM EN T B(::) &gt; where B
is not an attribute, and vice versa (see Figure 7d);
 mark/unmark edge; replaces a an element &lt;!ELEM EN T A(C ; ::) &gt; with an
element &lt;!ELEM EN T A(C; ::) &gt; if exists element &lt;!ELEM EN T B(C ; ::) &gt;
from another factorized vertex with an optional star after C, and vice versa (see
Figure 7e).
In this section it will be shown that it is possible to convert a given DTD that maps to
a schema S into any arbitrary DTD that maps into the same schema S by applying a
sequence of Xml-only operations. This would mean that the solution provided by the
model is a complete solution: the model allows to obtain any XML with the given
intermediate schema S, any other changes would necessarily lead to changes in relational
schema5.</p>
        <p>A fragment of the DTD graph contained in one factorized vertex is a tree, e.g. name,
firstname and lastname and edges between them from the sample DTD (see
4 Note that semantics are not violated if the auxiliary attribute that was mentioned in Section 3
is present
5 Consequent changes in relational schema may compensate each other, but the model allows to
obtain the same resulting schema without changing relational schema at all.
+
gN</p>
        <p>- g
PPPqP
g
...</p>
        <p>g
+ AQsQ
gN A g</p>
        <p>A
UA
g
...
a)
b)</p>
        <p>g
+ AQsQ
gN A g</p>
        <p>A
UA
g
...</p>
        <p>c)
gN
g
Q
A sQ
A g</p>
        <p>A
UA
g
...</p>
        <p>
          Statement 4.1. The move vertex up/down operation allows that any given set of vertices
of a factorized vertex with a xed start-vertex can be organized into any tree.
The proof can be done by induction by the number of vertices in a factorized vertex.
Provided that vertices are organized into some tree we try to evolve to the target tree.
Figure 8 shows the operations sequence. The full proof of this and subsequent
statements of this subsection can be found in [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ].
        </p>
        <p>Statement 4.2. The move vertex up/down and add/remove leaf vertex allow to organize
vertices of a factorized vertex with the given start-vertex and attribute set into any tree6.
Statement 4.3. Add/remove upper edge, move edge, and mark/unmark edge allow to
obtain any set of outgoing edges that connect the vertices of a given factorized vertex with
a xed set of start-vertices.</p>
        <p>The above three statements have the following obvious corollary, which forms the
next statement.</p>
        <p>Statement 4.4. The operations of the type 1a enable to transform a given DTD graph to
any other DTD graph that maps to the same schema S.</p>
        <p>Statement 4 implies that, as soon as there is a list of operations of type 1b that allow
to transform a DTD having a given DTD graph to any other DTD with the same DTD
graph, are available the compatibility task can be completely solved within the model.
The following operations compose the necessary operations of the type 1b list:
 inline/de-inline entity;
 convert leaf tag to attribute and vice versa;
 simplify/de-simplify regular expression;
 change processing instruction node;
 add/remove comment or namespace node.
6 Note that add/remove leaf vertex operation alone is not enough because of the presence of
attributes
4.3</p>
      </sec>
      <sec id="sec-4-2">
        <title>S-operations</title>
        <p>
          There are eight operations de ned for the S-model. They are similar to operations used
in OO databases evolution model in [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. Two additional operations, merge and split
vertex, are added; they are not found in OO prototypes. Here these operations are expressed
in terms of the schema S only due to the lack of space. An example of their application
in the sample is given. More details on S-operations can be found in [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ].
        </p>
        <p>AddAttribute. Attribute a (the one being added) of vertex t is included into Ne(t).
The sets Ne, N , H are recalculated for the vertex t and vertices reachable from t.
Schema designer (or an external algorithm) may include this attribute into Ne sets of
some successors of vertex t.</p>
        <p>RemoveAttribute. Attribute a (the one being deleted) of vertex t is excluded from
Ne(t), the sets Ne , N , H are recalculated for the vertex t and vertices reachable from t.
Note, if t is in P (s) or a is in Ne(s), according to the axiom set, attribute a is included
into N (s).</p>
        <p>AddEdge. Let edge (t; s) is added. s is added into Pe(t) and t is added into De(s).
Expressions dependent on Pe(t) and De(s) are recalculated. Note, that s is added into
P (t) if there is no other path from s to t (same for D(s)).</p>
        <p>RemoveEdge. This operation is complex, it may cause generation of new edges in
the schema in order to keep from violating the axiom set. New edges will start in
predecessors of the end vertex of a deleted edge and will end in its successors. Provided that
we remove edge (t; s), s is deleted from Pe(t). All expressions dependent on Pe(t) are
recalculated. If axioms are violated then the operation is rejected by the system.
(Alternatively, the vertex t may be included into root vertex T , the change can be achieved in
two stages: adding T into Pe(t), RemoveEdge for the edge (s; t), and if s is in Pe(t)
s is added into the graph. The same may apply to vertex s and the stop vertex of the
hierarchy). Analogous operations are performed with De(s).</p>
        <p>AddVertex. Provided vertex t is added. t is included into hierarchy. The sets Pe(t)
and De(s) are to be de ned by schema designer (or an external algorithm) to
generate incoming edges of t. t is added into Pe(s) to satisfy axioms. In the simplest case,
Pe(t) = T .</p>
        <p>RemoveVertex. It is a complex modi cation. First, the vertex t is to be removed with
the edges starting and ending in it. Second, a number of edges from its predecessors to
its successors may be added. Third, attributes of vertex t, that are essential for its
successors should migrate properly. The operation is implemented in three stages: applying
RemoveAttribute to all attributes of t, applying RemoveEdge to all outgoing edges, and
applying RemoveEdge to all incoming edges.</p>
        <p>MergeVertex. Vertices being merged must be connected by an edge. Merged
vertex Pe , De and Ne sets are unions of Pe , De and Ne sets of vertices being merged
with exclusion of themselves. In Pe sets of reachable and De set of reaching vertices
occurrences of the vertices being merged are replaced with occurrence of the merged
one.</p>
        <p>SplitVertex. Pe , De and Ne sets of split vertex are separated into two disjoint sets
each. In Pe and Ne sets of reachable vertices occurrences of split vertex are replaced
with one or both of the created without violating inclusion of P (t) in Pe(t) and D(t) in
De(t) properties.
wid
create table U(id, address)
&lt;element author(id,address,.)&gt;
gauthor
?
waddress
alter table add(postalCode,town)
&lt;element authorAddress(address,
postalCode,town,.)&gt;
wid
b)
waddress
&lt;element author(id,authorAddress,.)&gt;
&lt;element authorAddress(address,.)&gt;
gauthor
@
R@gauthorAddress
alter table drop address
&lt;element authorAddress(postalCode,town,.)&gt;
wid
d)</p>
        <p>?
w wtown</p>
        <p>postalCode</p>
        <p>In the sample we consider a revision of author address information. First, the
authoraddress tag is added through the add leaf vertex Xml-only operation. Next, the move
vertex down Xml-only operation is used. Then attributes with postal code, town and
local address are added, and the old attribute with address information is removed.
Changes in schemas in the process of transformation are shown on Figure 9.
The functionality task obviously has a complete solution, which is enforced by the
design of the model. That is, since the schema S is expressed in both, relational and XML,
terms and only schema S contains semantics, necessarily, all semantics is contained in
well de ned way in relational schema and can be queried directly through SQL.</p>
        <p>The complete solution for the richness task means that any XML schema satisfying
given semantic requirements may be obtained in the model. That enables solutions of
schema transformation and other common problems inside the model.
Statement 4.5. The operations of the 2nd type allow to transform any given schema S to
any other schema S having the same Pe, De and Ne sets.</p>
        <p>
          It is trivial corollary of completeness of the schema S model. The proof of the latter can
be found in [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ].
        </p>
        <p>Statement 5 enables us to provide a complete solution for the richness task. Indeed,
an arbitrary XML schema complying with a given set of semantic constraints can be
obtained from the given one complying with the constraints by consequent application
of operations of the 2nd and 1st types. Statement 5 ensures that the schema S of the new
a) CREATE TABLE U(id INT NOT NULL, postalCode NUMBER, town VARCHAR(50));
Vu := U
b) CREATE TABLE U1(postalCode NUMBER);
INSERT INTO U1 (SELECT postalCode from Vu GROUP BY postalCode);
Vu := SELECT id, postalCode, town, localAddress FROM U JOIN U1 ON U.postalCode =
U1.postalCode
c) ALTER TABLE U1 ADD(town VARCHAR(50));
UPDATE U1 SET U1.town = U.town FROM Vu WHERE U1.postalCode = U.postalCode;
Vu := SELECT id, postalCode, town, localAddress FROM U JOIN U1 ON U.postalCode =
U1.postalCode AND U.town = U1.town
d) ALTER TABLE U DROP town;
Vu := SELECT id, postalCode, town, localAddress FROM U JOIN U1 ON U.postalCode =
U1.postalCode</p>
        <p>XML schema is obtained. The solution of compatibility problem enables to transform
the intermediate result into the target XML schema.
The relational-only operations consist of one operation: remove/add duplicate attribute
in another relation. It replaces a pair of relations X1(A; ::) and X2(A; ::) with a pair
of relations X1(::) (X1 may be dropped if empty; in the reverse operation X1 may be
created) and X2(A; ::) where A is an attribute of one of the factorized vertices, and vice
versa. The operation also affects the formulae of the mapping to schema S. Namely, the
equi-joins of X1 and X2 (if any) used in the de nitions of Vi receive additional attribute
A to the joined attributes set. Thus, the operation changes but does not break the form
of the formulae that de ne the views.</p>
        <p>In the sample we normalize the relations supposing that a town can be derived from
a postal code. First, the postal code is duplicated in a new relation U1. Then the town
information is moved from the relation U, which represents author information, to U1.
Changes in S and relational schema in the process of transformation are shown on
Figure 10.
4.6</p>
      </sec>
      <sec id="sec-4-3">
        <title>Performance task solution</title>
        <p>The complete solution of the performance task ensures that we are able to perform
arbitrary relational schema transformations as far as we do not need to change the XML
schema. This task is similar to compatibility task with the schemas exchanging places.
The following statement ensures that performance task is completely solved.
Statement 4.6. Operations of the 3d type allow to transform a set of relations with the
given sets of attributes and views values to any other set of relations with the same set
of attributes and views values, if each attribute is used in the de nition of at least one
view.</p>
        <p>
          Proof. Provided we have an initial set of relations we try to evolve this set to a target
set of relations. Each relation includes a set of attributes as its header (which is part of
its relational type [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]). When an operation is applied one of the relation headers loses
or obtains an attribute (namely, the header of X1 in the notation of the de nition). It
is obvious that starting from the original set of relations we can obtain a relations set
which relations will have the same headers as the headers of the target set. We will show
that in case that the headers sets are identical the relation sets will be identical as well.
We apply induction by the number of attributes. Note that the proof will disregard
operations  we only show that constraints in the form of view values and relation headers
de ne the values of the relations.
        </p>
        <p>If we have one attribute (the same works for none attributes but the case of none
attributes can be omitted) there is at least one view with the header that includes all (that
is, one) attributes. All relations are necessarily projections of this view and since the
view value does not change, the relations have the same values.</p>
        <p>Now suppose we have N attributes in relations. Consider the selections on a xed value
of Nth attribute of all relations and views. In the formulae that de ne the views selection
commutes7 with other operations and the formulae may be transformed to make them
have the form as required by the statement. The induction proposition can be applied to
these selections. Since the selections on every value of the Nth attribute are equal, the
original relations are equal (we use the fact that the real attribute domains that are really
used are all nite).</p>
        <p>We have shown that relation headers set determines the relations values, on the other
hand, target schema headers set can be obtained by the operations of the 3rd type. That
means that the target relations set can be obtained, QED.
5</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Conclusion</title>
      <p>We considered evolution model as a way to design XML-relational systems in the
environment with rapidly changing requirements. We de ned compatibility, functionality,
performance, and richness tasks that an XML-relational model should be able to solve.
XML-only schema evolution models are not expressive enough to address all these
tasks. The tasks need simultaneous consideration of both mutually dependent schemas.</p>
      <p>We have suggested an XML-relational evolution model that allows to perform
XMLor relational-only changes. We have shown that it ensures that semantic constraints can
be obtained from relational schema, consequently, the model does not require obtaining
an original XML document to evaluate an XML query. The model is rich enough to
evolve a schema to any given XML schema. Thus, the employed model, contrary to
XML-only models, is able to answer the stated requirements.</p>
      <p>
        Several directions for future research exist. Intermediate schemas of the employed
XML-relational model may be enriched with entities that will represent regular
expressions and namespaces of the XML schema, or indices of the relational schema. Another
7 The proof could avoid using this fact as well as the niteness of domains, but it would require
more space.
interesting direction of research is to modify the model to allow a larger range of
mappings from XML to S-schemas. For example, the model can be enlarged to allow an
edge-table approach [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] to be employed for storing part of XML data.
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Chien-Tsai</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          et al:
          <article-title>Database schema evolution through the speci cation and maintenance of changes on entities and relationships</article-title>
          .
          <source>Entity-Relationship Approach - ER'94</source>
          ,
          <string-name>
            <given-names>Business</given-names>
            <surname>Modelling</surname>
          </string-name>
          and Re-Engineering, 13th International Conference on the
          <string-name>
            <surname>Entity-Relationship</surname>
            <given-names>Approach</given-names>
          </string-name>
          , Manchester,
          <string-name>
            <surname>U.K.</surname>
          </string-name>
          ,
          <source>December 13-16</source>
          ,
          <year>1994</year>
          ,
          <string-name>
            <surname>Proceedings</surname>
          </string-name>
          (
          <year>1994</year>
          )
          <volume>881</volume>
          :
          <fpage>132</fpage>
          
          <fpage>151</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Coox</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Axiomatization of the evolution of xml database schema</article-title>
          .
          <source>Programming</source>
          (
          <year>2003</year>
          )
          <volume>29</volume>
          (
          <issue>3</issue>
          ):
          <volume>140</volume>
          
          <fpage>146</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Date</surname>
            ,
            <given-names>C. J.</given-names>
          </string-name>
          et al:
          <article-title>Temporal data and relational model (</article-title>
          <year>2002</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Florescu</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kossman</surname>
            ,
            <given-names>D.:</given-names>
          </string-name>
          <article-title>A performance evaluation of alternative mapping schemes for storing xml data in a relational database</article-title>
          .
          <source>technical report 3684</source>
          (
          <year>1999</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Hong</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kramer</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rundensteiner</surname>
            ,
            <given-names>E. A.</given-names>
          </string-name>
          : Xem: Xml evolution management http://www.citeseer.ist.psu.edu/su02xem.html
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Krishnamurthy</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kaushik</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Naughton</surname>
          </string-name>
          , J.:
          <article-title>Ef cient xml-to-sql query translation: Where to add the intelligence</article-title>
          ? http://www.vldb04.org/protected/ eProceedings/contents/pdf/RS4P3.PDF
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Peters</surname>
            ,
            <given-names>R. J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ozsu</surname>
          </string-name>
          , M. T.:
          <article-title>Tigukat: a uniform behavioral objectbase management system</article-title>
          .
          <source>The VLDB Journal</source>
          (
          <year>1995</year>
          )
          <volume>4</volume>
          (
          <issue>3</issue>
          ):
          <volume>445</volume>
          
          <fpage>492</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Peters</surname>
            ,
            <given-names>R. J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ozsu</surname>
            ,
            <given-names>M. T.</given-names>
          </string-name>
          :
          <article-title>An axiomatic model of dynamic schema evolution in objectbase systems</article-title>
          .
          <source>ACM Transactions on Database Systems</source>
          (
          <year>1997</year>
          )
          <volume>22</volume>
          (
          <issue>1</issue>
          ):
          <volume>75</volume>
          
          <fpage>114</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Qi</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ling</surname>
          </string-name>
          , T. W.:
          <article-title>Extending and inferring functional dependencies in schema transformation</article-title>
          .
          <source>In CIKM '04: Proceedings of the Thirteenth ACM conference on Information and knowledge management</source>
          , ACM Press (
          <year>2004</year>
          )
          <volume>12</volume>
          
          <fpage>21</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Shanmugasundaram</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          et al:
          <article-title>Relational databases for querying xml documents: Limitations and opportunities</article-title>
          .
          <source>In Proc. VLDB Edinburgh</source>
          ,
          <string-name>
            <surname>Scotland</surname>
          </string-name>
          (
          <year>1999</year>
          )
          <volume>302</volume>
          
          <fpage>314</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Shanmugasundaram</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          et al:
          <article-title>A general technique for querying xml documents using a relational database system</article-title>
          .
          <source>SIGMOD Record</source>
          (
          <year>2001</year>
          )
          <article-title>(3</article-title>
          ):
          <volume>20</volume>
          
          <fpage>26</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Simanovsky</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Evolution of schema of xml-documents stored in a relational database</article-title>
          .
          <source>In proceedings of 6th Baltic DBIS Conf</source>
          .
          <article-title>(</article-title>
          <year>2004</year>
          )
          <volume>192</volume>
          
          <fpage>204</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Simanovsky</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Simultaneous evolution of mutually dependent xml and relational schemas</article-title>
          .
          <source>In Proc. SYRCoDIS St Petersburg</source>
          ,
          <string-name>
            <surname>Russia</surname>
          </string-name>
          (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Tatarinov</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          et al:
          <article-title>Storing and querying ordered XML using a relational database system (</article-title>
          <year>2002</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>Q.</given-names>
          </string-name>
          et al:
          <article-title>Mapping xml documents to relations in the presence of functional dependencies</article-title>
          .
          <source>Journal of Software</source>
          (
          <year>2003</year>
          )
          <article-title>(7</article-title>
          ):
          <volume>1275</volume>
          
          <fpage>1281</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Yamamoto</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          et al:
          <article-title>Formalization of graph search algorithms and its applications</article-title>
          .
          <source>In proceedings of Theorem Proving in Higher Order Logics (TPHOLs'98)</source>
          , LNCS, SpringerVerlag (
          <year>1998</year>
          )
          <volume>1479</volume>
          :
          <fpage>479</fpage>
          
          <fpage>496</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Yoshikawa</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Amagasa</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Xrel: a path-based approach to storage and retrieval of xml documents using relational databases</article-title>
          .
          <source>ACM Transactions on Internet Technology</source>
          (
          <year>2001</year>
          )
          <volume>1</volume>
          (
          <issue>1</issue>
          ):
          <volume>110</volume>
          
          <fpage>141</fpage>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>