<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>XXMMLL DDooccuummeenntt VVeerrssiioonniinngg aanndd RReevvaalliiddaattiioonn??</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jakub Maly´</string-name>
          <email>l@icksi.mff.cuni.cz</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jakub Kl´ımek</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Irena Mly´nkov´a</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Martin Neˇcasky´ Jakub Maly</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jakub Kl mek</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Irena Mlynkova</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Martin Necasky</string-name>
          <email>necaskyg@ksi.mff.cuni.cz</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>ItemTester @ tester</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Product @ code</institution>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>XML Research Group, Department of Software Engineering FacuXlMtyLofRMesaetahrechmaGtricosupa,ndDePphayrstimcse,nCthoafrSleosftUwnariveeErsnitgyinienerPinraggue Faculty of MMatahleomstaratincska ́ennda ́Pmhˇeysts ́ıic2s5,</institution>
          ,
          <addr-line>C1h1a8rl0e0s PUrnaihvears1ity in Prague MalostransTkheenCamzeecsht R2e5p,u1b1l8ic00 Praha 1</addr-line>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2011</year>
      </pub-date>
      <fpage>49</fpage>
      <lpage>60</lpage>
      <abstract>
        <p>One of the prominent characteristics of XML applications is their dynamic nature. When a system grows and evolves, old user requirements change and/or new requirements accumulate. Apart from changes in the interface, it is also necessary to modify the existing documents with each new version, so they are valid against the new specification. The approach presented in this work extends an existing conceptual model with the support for multiple versions of the model. Thanks to this extension, it is possible to define a set of changes between two versions of a schema. This work contains an outline of an algorithm that compares two versions of a schema and produces a revalidation script in XSL.</p>
      </abstract>
      <kwd-group>
        <kwd>XML</kwd>
        <kwd>schema</kwd>
        <kwd>schema evolution</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Recently, XML [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] has become a corner stone of many information systems. It
is a de facto standard for data exchange (i.e. Web services [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]) and it is also a
popular data model in databases [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. XML schemas are utilized in two scenarios:
1) to describe structure and check validity of internal documents, and 2) to
define the interface of a component of the system for other components or of the
system to the outer world.
      </p>
      <p>Requirements change during the life cycle of the system and so do the
XML schemas. Without any tools to help, the old and new schema need to
be examined by a domain expert. Each change must be identified, analyzed and
all the relevant parts of the system modified and the existing documents
updated. This can be a time-consuming and error-prone process, but, in fact, a
significant portion of the operations could be performed automatically.</p>
      <p>The existing approaches work directly at the level of XML schemas, which
leads to problems with recognizing the semantics of each change. Our approach
utilizes the conceptual model for understanding the semantics. Also, most of
the existing approaches require the user to either revalidate the documents after
? This work was supported in part by the Czech Science Foundation (GA CˇR), grant
number P202/10/0573, and by the grant SVV-2011-263312.
every performed evolution operation, other require at least all the operations
to be performed in one session and directly modify the old version (effectively
replacing it with the new version). Our approach imposes no limits on the amount
of operations performed between the versions and the amount of versions present
in the systems.</p>
      <p>The main aim of this paper is to simplify the whole process of system
evolution and make the transition to the new version semi-automatic using our
conceptual XML modeling framework with the support for multiple versions. We
carry on by formally defining possible changes between versions. When changes
are detected, we use this information to decide, whether the revalidation is
necessary and if it is the case, we generate a script that revalidates existing XML
documents against the new version.
1.1</p>
      <sec id="sec-1-1">
        <title>Outline</title>
        <p>The rest of this paper is organized as follows: in Section 2, we analyze the existing
approaches. In Section 3, we introduce the XSEM conceptual model. In Section
4, we extend the model with versioning support. In Section 5, we formally define
changes between versions. In Section 6, we outline how a revalidation script is
generated. Section 7 concludes and lists the open problems and areas of future
research.
2</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>
        Approaches can be categorized on the basis of the level where the changes are
made by the designer and detected by the algorithm as suggested in [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]:
– in XML schemas (in one of the XML schema languages), so-called logical
level. At the logical level, changes are detected directly in the evolved schema.
– in diagrams visualizing the XML schema (diagrams directly visualizing
constructs of a specific XML schema language).
– in a UML diagram used to model an XML format (usually annotated using
comments or stereotypes to specify the translation to an XML schema).
      </p>
      <p>
        X-Evolution, proposed in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], is an example of a system built upon a graphical
editor for creating schemas in the XML Schema language [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. At first, the
authors defined a set of evolution primitives, i.e. basic operations that modify the
evolved schema (e.g. insert local element, remove type). Impact of each primitive
on the validity of the existing XML documents is examined. The presented Adapt
algorithm takes as an input an evolution primitive, an XML schema and an
XML document valid against the schema and returns an evolved document valid
against the schema on which the evolution primitive is applied.
      </p>
      <p>X-Evolution system provides the user with a narrow set of evolution
primitives that can be performed upon XML schemas of a certain format. There
are no links to the conceptual model which could be utilized when new content
needs to be added during document revalidation; thus, it is up to the user to
write the adaptation clauses or update the documents after revalidation. Some
very common real-world evolution operations are not considered in the
evolution primitives, e.g. moving content, adding a wrapping element for elements
or transforming attributes to subelements and vice versa. All algorithms and
operations expect a single evolution primitive as an input. And only a single
change is supported in each evolution cycle, evolution with multiple changes is
not elaborated.</p>
      <p>
        XML Evolution Management, XEM [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] is an approach to manage schema
evolution where DTD is used as a schema language. It deals both with changes
in DTD and XML documents (instance documents ).
      </p>
      <p>The proposed primitives are divided into two categories: 1) changes in DTD
2) changes in instance documents. It is proven that the proposed set of primitives
is complete, in the sense that any possible change in DTD can be expressed as a
sequence of application of the primitives. However, the method used in the proof
in many common cases degenerates in removing a large part of the tree and
creating it from scratch (e.g. when an element is renamed, it must be removed
and then added under new name; if the root element is removed, the whole XML
document is first deleted and its structure recreated again). In this process, the
structure is created properly by the algorithm, but the data is lost. Again, as in
the case of X-Evolution, the conceptual model is not considered and thus cannot
be utilized when new content is added in the instance documents.</p>
      <p>
        Unlike the previous approaches, CoDEX, Conceptual Design and Evolution
of XML schemas [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], allows the user to perform a sequence of evolution operation
before the documents are revalidated. Each user’s single action is recorded by
the logging component. When the user is satisfied with the new version of the
model, the recorded design steps are minimized and normalized. The proposed
conceptual model is closer to a visualization of the XML Schema language than
to a platform-independent model and the lack of a conceptual model is directly
responsible for problems declared as in-built (e.g. the algorithm can not detect
such a change where the same structural element acquires different semantics).
3
      </p>
    </sec>
    <sec id="sec-3">
      <title>XSEM</title>
    </sec>
    <sec id="sec-4">
      <title>Conceptual Model</title>
      <p>Conceptual modeling is a top-down approach to system and data design, which
concentrates on correct depicting the problem domain – defining the particular
concepts comprising the system and relationships between these concepts.</p>
      <p>
        XSEM is an approach to modeling XML data using Model Driven
Architecture (MDA) [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. A prototype implementation of XSEM in a CASE1 tool called
XCase [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] is available. The model has two interconnected layers –
platformindependent model (PIM) which utilizes UML [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] class diagrams and
platformspecific model (PSM) which again utilizes UML class diagrams but extended
with so-called XSEM profile. In the rest of this paper, we will use the formal
definitions of PIM (Definition 1) and PSM (Definition 2) levels.
1 Computer-aided software engineering
3.1
      </p>
      <sec id="sec-4-1">
        <title>PIM Level</title>
        <p>
          A schema in a platform-independent model (PIM) models real-world concepts
and relationships between them without any details of their representation in a
specific data model (XML in our case). As a PIM, we use the classical model
of UML class diagrams in our work. This model is widely supported by the
majority of tools for data engineering and there exists the XMI [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] standard for
exchanging diagrams between them. Therefore, it is natural to use UML in our
approach as well. We introduce PIM schemas formally in Definition 1.
Definition 1. A platform-independent schema (PIM schema) is a triple S =
(Sc, Sa, Sr) of disjoint sets of classes, attributes, and associations, respectively.
In addition, PIM schema defines functions name, card, type, class and
participant:
– Class C ∈ Sc has a name name(C).
– Attribute A ∈ Sa has a name, data type and cardinality name(A), type(A)
and card(A), respectively. A is associated with a class from Sc class(A).
– Association R ∈ Sr is a set R = {E1, E2} where E1 and E2 are called
association ends of R. R has a name name(R). E ∈ R has a cardinality card(E)
and is associated with a class from Sc participant(E). For R, participant(E1)
and participant(E2) are called participants of R.
        </p>
        <p>For simplicity, we will also use the notation attributes (C ) to denote the set
{A ∈ Sa : class(A) = C} and associations (C ) to denote the set {R ∈ Sr : (∃E ∈
R)(participant(E) = C)} for each class C ∈ Sc.</p>
        <p>PIM schema components have a usual semantics. A class models a real-world
concept. An attribute of that class models a property of the concept. And, an
association models a kind of relationships between two concepts modeled by the
connected classes. We display each PIM schema as a UML class diagram.
Example 1. Figure 1 shows a sample PIM in XSEM modeling a component
accepting purchases. Classes (Address, Purchase, . . .) are depicted as boxes with
the list of attributes (street, city, . . .), associations as labeled lines connecting
classes (bill-to, ordered, . . .).
3.2</p>
      </sec>
      <sec id="sec-4-2">
        <title>PSM Level</title>
        <p>A schema in a platform-specific model (PSM) models how a part of the reality
modeled by the PIM schema is represented in a particular data model. In our
case, we consider XML as the data model and our PSM schema specifies
representation in a particular XML format. Our PSM allows for specifying more
different XML formats on the base of the same PIM schema. As a PSM we also
use UML class diagrams extended for the purposes of XML data modeling. We
introduce PSM formally in Definition 2.
Purchase 1..*ordered
- code
- date
- status
1..* contains
- amIteomunt 0..*
- price
- tester</p>
        <p>Customer
- login
- name
- phone
- email [1,*]</p>
        <p>Product 1..*delivers0..*
- code
- price
- color</p>
        <p>Supplier
- name
- phone
- email
provides 1..*</p>
        <p>Supply
- date
- price
- amount
Definition 2. A platform-specific schema (PSM schema) is a 5-tuple S0 =
(Sc0 , Sa0, Sr0 , Sm0, CS00 ) of disjoint sets of classes, attributes, associations, and
content models, respectively, and one specific class CS00 ∈ Sc0 called schema class. S0
must be a forest with one of its trees rooted in CS0
0 . In addition, PSM schema
defines functions name’, card’, type’, xform’, cmtype’ class’ and participant’:
– Class C0 ∈ Sc0 has a name denoted name0(C0).
– Attribute A0 ∈ Sa0 has a name, data type, cardinality and XML form denoted
name0(A0), type0(A0), card0(A0) and xform’0(A0) ∈ {e, a}, respectively.
Moreover, it is associated with a class from Sc0 denoted class0(A0) and has a
position denoted indexOf(A0) within the all attributes associated with class0(A0).
– Association R0 ∈ Sr0 is a pair R0 = (E10, E20) where E10 and E20 are called
association ends of R0. E0 ∈ R0 has a cardinality denoted card0(E0) and is
associated with a class from Sc0 denoted participant0(E0). participant0(E10) and
participant0(E20) are called parent and child of R and denoted parent’0(R0)
and child’0(R0), respectively. R0 has a name denoted name0(R0) and has a
position denoted indexOf(R0) among associations with the same parent.
– Content model M 0 ∈ Sm0 has a content model type cmtype’0(M 0) ∈ {sequence,
choice, set}.</p>
        <p>Sc ∪ Sr ∪ Sa ∪ (SS0∈S∗0 (Sc0 ∪ Sr0 ∪ Sa0)).</p>
        <p>For simplicity, we will also use the notation attributes (C0 ) to denote the
sequence (A0i ∈ Sa0 : class(A0) = C0 ∧ i = indexOf(A0)) for each class C ∈ Sc.</p>
        <p>Let us denote N 0 = Sc0 ∪ Sm0. The graph Sb0 = (N 0, Sr0 ) with classes and
content models as nodes and associations as ordered edges must be a directed
forest with one of its trees rooted in the schema class CS00 . Members of Sc0 , Sa0,
Sr0 , and Sm0 are called components of S0. The set of all PSM schemas will be
denoted S∗0. Finally we define the set of all constructs (PIM and PSM): M =</p>
        <p>The PSM constructs in a concrete PSM schema are linked to the PIM schema
constructs (we will say that the PSM schema is derived from the PIM schema
and the PSM constructs have an interpretation in the PIM). We will not include
formal definition of interpretation in this paper and use these terms intuitively
instead.</p>
        <p>Example 2. Figure 2(a) shows a sample PSM schema modeling an XML
document with a purchase. Again, classes, associations and attributes are depicted
0..1
shipto
billto
cdetail</p>
        <p>PurchRQSchema
purchaseRQ
Purchase
@ code
@ date
@ version
items
Items
item
Item
1..*</p>
        <p>|</p>
        <p>ItemAmount
- amount
- price
(a) PSM schema</p>
        <p>Address
- street
- city
- country
&lt;purchaseRQ code="#"
date="#" version="#"&gt;
&lt;shipto gps="#"&gt;
&lt;street&gt;#&lt;/street&gt;
&lt;city&gt;#&lt;/city&gt;
&lt;country&gt;#&lt;/country&gt;
&lt;/shipto&gt;
&lt;billto&gt;
&lt;street&gt;#&lt;/street&gt;
&lt;city&gt;#&lt;/city&gt;
&lt;country&gt;#&lt;/country&gt;
&lt;/billto&gt;
&lt;cdetail login="#"&gt;</p>
        <p>
          &lt;name&gt;#&lt;/name&gt;
&lt;/cdetail&gt;
&lt;items&gt;
&lt;item code="#" tester="#"/&gt;
&lt;item code="#"&gt;
&lt;amount&gt;#&lt;/amount&gt;
&lt;price&gt;#&lt;/price&gt;
&lt;/item&gt;
&lt;/items&gt;
&lt;/purchaseRQ&gt;
(b) XML document
similarly as in PIM schemas, the difference is in associations, which are now
directed. The diagram also contains one content model of type choice (parent
node of classes ItemAmount and ItemTester). The XML document depicted in
Figure 2(b) illustrates how PSM schema models XML documents. Briefly, named
associations model element-subelement nesting, unnamed association inline the
content to the element above (see association Item-Product), associations
starting at the schema class model allowed root elements. Attributes model XML
attributes or elements with simple content (depending on the value of xform’ for
the attribute, value a is depicted by @ before the name of the attribute).
Structural representatives are used for content reuse (see ShipAddr and BillAddr
reusing content of Address). The in-depth description of the relation of PSM
schemas and modeled documents can be found in [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ].
4
        </p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Evolution</title>
      <p>For the goal of determining whether XML documents were invalidated due to
schema evolution, the system must recognize and analyze the differences
between the old (S0) and new (Se0) version of the schema. There are two possible
approaches: 1) recording the changes as they are conducted during the design
process or 2) comparing the two versions of the diagram. Table 1 compares both
approaches, for the stated reasons we chose the second approach for our work.</p>
      <p>As a motivation for our further progress, take a look at Figure 3. It contains
two versions of a fragment of a PSM schema. In the old version, class Item has
an attribute price, in the new version, class Purchase has an attribute price.
recording changes schema comparison
the recorded set should be normalized to eliminate re- No redundancies; the set of
dundancies (repeated changes in the same place etc.). changes is always minimal.
Once the evolution process is started, the old version Both versions and new can be
can not be easily changed. edited without limitations.
Possibility to interrupt the work and continue in an- The process of evolution can
other session later. The sequence of recorded changes be arbitrarily stopped and
rewould have to be stored and recording resumed later. sumed.</p>
      <p>To retrieve the sequence for reverse process, the user The reverse operation can be
will have to either start with the new version and record easily handled by the same
the operations needed to go back to the old version algorithm, only with the two
again, or the system will have to be able to create in- schemas on the input swapped.
verse sequence for each sequence of operations.</p>
      <p>For a schema from an outer source, the sequence of A schema from an outer source
operation changes can not be retrieved directly; the can be imported2 and serve as
user must start with the old version of the schema and an input to the change
detecmanually adjust it to match the new schema. tion algorithm.</p>
      <p>Table 1. Approaches to Evolution: Recording Changes vs. Schema Comparison
Without any further information, there can be
two interpretations of the change: 1) attribute
price was moved from Item to Purchase or 2)
the attribute was removed from Item and a new
attribute was added to Purchase, coincidentally
having the same name. Revalidation of the
documents is even more difficult to decide. In the
second case, a correct value must be assigned to
the new attribute3. In the first case, the value
of attribute price should be set to the sum of
the values of price in all the items in the first
version.</p>
      <p>PurchaseSchema</p>
      <p>purchase
Purchase
item 1..*</p>
      <p>Item
- name
- price
version v</p>
      <p>PurchaseSchema</p>
      <p>purchase
Purchase
- price
item 1..*</p>
      <p>Item
- name</p>
      <p>version ~v
Fig. 3. Evolved PSM Schema</p>
      <p>To achieve correct identification of “addition/removal” from “move” (as
opposed to existing approaches), we need to extend the XSEM framework with
a support for having multiple version in the model and possibility to link
constructs in different versions.</p>
      <p>Definition 3 (ver, version link, getInVer ). Let V be the set of versions
for the model M. Function ver : M ∪ S∗0 → V returns the version a given
construct belongs to. We require a natural condition of all the constructs of a
PSM schema S0 to belong to the same version, i.e. (∀S0 ∈ S∗0)(x ∈ Sa0ll ↔
ver(x) = ver(S0)). An equivalence relation of version links VL ⊂ M × M
3 This is an open problem for our future work, see Section 7.
change description
classAdded new class was added to the schema
classRemoved class was removed from the schema
classRenamed existing class was renamed
classMoved class is moved to the PSM tree
attributeAdded/Removed/Renamed attribute was added/removed/renamed
attributeTypeChanged attribute type changed
attributeIndexChanged attributes were reordered in the class
attributeCardinalityChanged cardinality of an attribute was changed
attributeMoved attribute was moved from one class to another
associationAdded/Removed/Renamed association was added/removed/renamed
associationIndexChanged associations were reordered in the class
associationCardinalityChanged cardinality of an attribute was changed
associationMoved association is moved in the PSM tree
Table 2. Changes Identified in XSEM-Evo between two Versions of the same Schema
contains pairs of constructs that are different versions of the same construct.
Partial function getInVer : (M × V) → M , getInVer(e, v) = e˜ ↔ ∃ e˜ ∈ M :
(e, e˜) ∈ VL ∧ ver(e˜) = v returns a construct in a desired version (if e does not
exist in version v, getInVer(e, v) is undefined). If (x, x˜) ∈ VL, x and x˜ must both
be constructs of the same kind (e.g. both classes, attributes, PSM associations...).</p>
      <p>In the following text, we will assume |V| = 2, unless explicitly stated
otherwise (i.e. we expect that there are two versions in the system - an old version
(v ∈ V) and a new version (v˜ ∈ V)). Values of ver function form the input of
the change detection algorithm (and so does the relation VL). In XCase, values
of ver are assigned automatically by the system during and maintained As the
user edits the schema (either the old or the new version).
5</p>
    </sec>
    <sec id="sec-6">
      <title>Changes</title>
      <p>Based on Definitions 1 and 2, we can identify all possible changes that can occur
between two versions. Each change is formally defined as a predicate with
certain amount of parameters characterizing the change. Detecting changes means
finding the constructs that match each respective predicate. Table 2 lists all
recognized changes.</p>
      <p>Describing each change and it’s revalidation is beyond the scope of this paper.
Therefore we will describe only attributeMoved change in detail (an example of
this change is depictend in Figure 3), other changes are treated in a similar
manner. Change attributeMoved means that the value of class’(A0) is changed
i.e. an attribute is moved from class Co0 to another PSM class Cfn0 (and the new
index of A˜0 is i˜0). Formally attributeMoved is defined as a ternary predicate:
attributeMoved(A˜0, Cfn0, i˜0) ↔ A˜0 ∈ Sfa0 ∧ Cfn0 ∈ Sfc0 ∧ i˜0 ∈ N0 ∧ getInVer(A˜0, v) 6= null∧
class’(A˜0) = Cfn0 ∧ (∃Co0 ∈ Sc0 )(class’(getInVer(A˜0, v)) = Co0 ∧ getInVer(Cfn0, v) 6= Co0)</p>
      <p>
        Moving an attribute is an evolution operation that requires more in-depth
enquiry. The aim of our approach is to keep the semantics of the revalidated
document and not to lose the existing data during revalidation. The trivial solution
– deleting the attribute from its former location in the document and creating
a new attribute in the new location (as used in [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] and [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]) – is not suitable,
because the value of the attribute is lost. The most general approach is to couple
each instance (A˜0, Cfn0, i˜0) of attributeMoved change with a revalidation function
attMoveA˜0 (oldLocations, newLocation): XSEMPath × XSEMPath → type’(A˜0)4
where oldLocations is an expression returning all existing instances of A0 and
newLocation is an expression containing the path to one instance in the new
document. The result of the function is the new value for the instance of A˜0 in
the new document. In the general case the function attMoveA˜0 is defined by the
user, but the system can give the user a suggestion in certain cases – several
types of the most common scenarios can be distinguished.
      </p>
      <p>e
b
h
a
c
f
i</p>
      <p>The figure on the left shows an example of tree function,
d the result of tree(f, g, i) is the set {a, b, d, f, g, i, a−b, b−
g f, f −i, a−d, d−g}, a is the root of the common subtree
and a − b the edge from a to b.</p>
      <p>In the following text we expect that the attribute was moved between classes
Co0 and Cfn0, A0 ∈ attributes’(Co0), A˜0 ∈ attributes’(Cfn0). Let Cfo0 = getInVer(Co0, v˜)
and Cn0 = getInVer(Cfn0, v), be the new version of class Co0 and the old version
of class Cfn0 respectively, both can be null. In the situation depicted in Figure
3 A0 = price1, A˜0 = price2, Co0 = Item1, Cn0 = Purchase1, Cfo0 = Item2, Cfn0 =
Purchase2 (we use subscripts to distinguish constructs in version v and v˜).</p>
      <p>Let tree(X 0) return the subgraph of the smallest common subtree for a set
X 0 ⊆ N 0 containing the root B0 of the common subtree, members of X 0 and for
each X0 ∈ X 0 path between X0 and the root B0 (see Figure 4 for an example).
We define predicate stable(X 0) ↔ X 0 ⊆ {Sa0ll \ Sa0} ∧ {∀X0 ∈ X 0 : X0 is present
in the new version, it was not moved, added or deleted and cardinality was not
changed (if X0 is an association)}. The intuitive meaning of predicate stable(X 0)
is that there were no radical changes in the structure for the members of X 0. If
Cn0 6= null, T 0 = tree({Co0, Cn0}) and stable(T 0) holds, than if:
4 The formal definition of XSEMPath expressions is beyond the scope of this paper.</p>
      <p>
        Intuitively, the notation and usage is similar to XPath [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], only the XSEMPath
expressions refer to the constructs from a PSM schema instead of the names of
XML elements and attributes in the document. When executed upon a document
T ∈ T (S0), XSEMPath query is translated to XPath query and the result of this
query upon T is returned.
1. ∀ association R0 ∈ T 0 : card’(R0) = mR0 ..1 (only cardinalities 0..1 and 1..1
are allowed in the affected part of the schema). In this simplest case the
attribute A˜0 will have 0 or 1 instance in the old schema. Then this instance can
be copied to the only one new location (attMoveA˜0 will be identity function).
2. Co0 is a descendant of Cn0 in the PSM tree (the attribute is moved upwards,
but the associations between Co0 and Cn0 can have arbitrary cardinalities).
Then all instances of A0 under each instance of Cn0 should be “aggregated”
to one instance of A˜0 (this is the case depicted in Figure 3). Several
aggregation functions can be offered (e.g. sum, count, avg, max, min known from
relational databases or concat inlining the respective values).
3. Co0 is a descendant of Cn0 in the PSM tree, card’(A0) = m0..1 and card’(A˜0) =
mf0..∗. Then this case is similar as the case above, but the cardinality of
attribute A˜0 is adjusted, so all the values from existing instances can be used
as values of A˜0, no aggregation is needed (attMoveA˜0 can be called toList ).
4. Co0 is an ancestor of Cn0 in the PSM tree, card’(A0) = m0..∗ and card’(A˜0) =
mf0..1. Then this is an inverse case to the one above. The respective values
of A0 can be distributed to the new locations. Nonetheless, a user may have
to specify the distribution precisely (attMoveA˜0 can be called distribute).
When none of the conditions above is satisfied, a possible general approach is
to use the function attMoveA˜0 = identityN which returns the value of the n-th
instance of A0 when required on the n-th location in the revalidated document.
Other attMoveA˜0 functions must be entered by the user.
6
      </p>
    </sec>
    <sec id="sec-7">
      <title>Revalidation</title>
      <p>In the first step, the revalidation algorithm detects all changes between two
versions of the model. If changes that require revalidation occur, a revalidation
script is generated. Applying the script on a document valid against the old
version produces a document valid against the new version. Since changes are
defined as changes in the XSEM model, similarly as XSEM model can be
translated to an XML schema in any XML schema language, the set of changes can be
used to generate a revalidation script in any kind of implementation language.
Example 3. The in-depth description of the process of generating the
revalidation script is beyond the scope of this paper. As an example, we will show the
revalidation script in XSL for the schema depicted in Figure 3, concretely if
the first interpretation from Example 4. In that case, we have to revalidate the
attributeMoved change. In this example, we will use the subscript 1 for constructs
from the old version and 2 for constructs from the new version. We are
revalidating attributeMoved(price2, Purchase2, 0). Since there are no other changes,
stable({Purchase1, Item1}) also holds and since the association Purchase-Item
has cardinality 1..∗ and Purchase is a parent of Item, this case belongs to the
second group mentioned on page 9 during the description of the revalidation
of attributeMoved change (aggregation upwards). Before generating the script,
The XSL revalidation script consists
of four templates. The first template
processes purchase element – it
creates the wrapping tag and calls a
template for the moved PSM attribute
pur-price and for item elements.</p>
      <p>The second template processes item
elements (the template is necessary,
because we need to remove the price
subelement from item elements).</p>
      <p>The fourth template copies the content
of each name element to the output.</p>
      <p>The third, named, template
purprice calls the attMoveprice1
function (the XSEMPath expressions are
converted to regular XPath
expressions – oldLocations to ‘item/price’
and newLocation to ‘price’). The
user selected the function sum as
attMoveprice1 , a generic function built
in the system. If none of the built-in
function is suitable for the situation,
the user can extend the system with
a custom set of functions (having the
same header as sum).
&lt;xsl : stylesheet&gt;
&lt;xsl : template match=‘/ purchase ’&gt;
&lt;purchase&gt;
&lt;xsl : call −template</p>
      <p>name=”pur−p r i c e ’ /&gt;
&lt;xsl : apply−templates</p>
      <p>s e l e c t =‘item ’ /&gt;
&lt;/purchase&gt;
&lt;/xsl : template&gt;
&lt;xsl : template
match=‘/ purchase / item ’&gt;
&lt;item&gt;</p>
      <p>&lt;xsl : copy−of s e l e c t =‘name’ /&gt;
&lt;/item&gt;
&lt;/xsl : template&gt;
&lt;xsl : template name=‘pur−p r i c e ’&gt;
&lt;p r i c e &gt;
&lt;xsl : value−of s e l e c t=
‘ xc : sum( item / p r i c e ,</p>
      <p>p r i c e ) ’/ &gt;
&lt;/p r i c e &gt;
&lt;/xsl : template&gt;
&lt;xsl : template
match=‘/ purchase / item /name’&gt;
&lt;xsl : copy−of s e l e c t = ‘. ’ /&gt;
&lt;/xsl : template&gt;
&lt;xsl : function name=‘xc : sum ’ &gt;
&lt;xsl : param</p>
      <p>name=‘ ol d Lo c at i o ns ’ /&gt;
&lt;xsl : param</p>
      <p>name=‘ newLocation ’ /&gt;
&lt;xsl : value−of s e l e c t=</p>
      <p>‘sum($ o l d L o c a t i o n s ) ’ /&gt;
&lt;/xsl : function&gt;
&lt;/xsl : stylesheet&gt;
the user selects the function sum to be used as attMoveprice1 . The resulting
revalidation script in XSL is depicted and explained in Figure 5.
7</p>
    </sec>
    <sec id="sec-8">
      <title>Conclusion and Future Work</title>
      <p>In this paper we proposed a formal model for schema evolution. We formally
defined changes that can be detected between two versions of a schema. We
sketched an implementation of an algorithm that produces revalidation script
on the basis of the detected set of changes.</p>
      <p>The set of the detected changes is useful not only for the revalidation
algorithm, but it is also possible to immediately decide, whether revalidation is
needed or not and help the user to locate the changes in the schema.</p>
      <p>The revalidation script can deal with structural modifications and with a
significant part of the content modifications automatically, user input is required
only where necessary (e.g. when a new content must be added during
revalidation). We plan to extend its capabilities in adding content in our future work.</p>
      <p>
        The algorithm in its current version deals mainly with revalidation of 1)
structure and 2) data already present in the document. Because new data are
often required for new versions, we will focus our future work on obtaining this
data for the revalidated documents. For this purpose, we will utilize the existing
connection between PIM and PSM and and a new similar connection between
PIM and the model of a data storage (e.g. an ER schema [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]).
      </p>
      <p>The change detection algorithm requires semantic links between the two
compared versions of the schema, which are not available for schemas were imported
and not created with an XSEM-enabled tool. Existing approaches to schema
comparison and matching can be utilized as heuristics for creating these links.</p>
      <p>Type inheritance is a fundamental part of large specifications, but support
for inheritance is limited in the current version of XSEM. We intend to extend
both PIM and PSM levels with inheritance support.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <article-title>XCase - tool for XML data modeling</article-title>
          . http://xcase.codeplex.com/.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <source>2. MOF 2</source>
          .0 / XMI Mapping Specification,
          <year>v2</year>
          .
          <fpage>1</fpage>
          .1. OMG,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>R.</given-names>
            <surname>Bourret</surname>
          </string-name>
          .
          <article-title>XML and Databases</article-title>
          .
          <source>September</source>
          <year>2005</year>
          . http://www.rpbourret.com/ xml/XMLAndDatabases.htm.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>D.</given-names>
            <surname>Booth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. K.</given-names>
            <surname>Liu. Web Services Description</surname>
          </string-name>
          <article-title>Language (WSDL) Version 2.0 Part 0: Primer</article-title>
          . W3C,
          <year>June 2007</year>
          . http://www.w3.org/TR/wsdl20-primer/.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>G.</given-names>
            <surname>Guerrini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mesiti</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Sorrenti</surname>
          </string-name>
          .
          <article-title>Xml schema evolution: Incremental validation and efficient document adaptation</article-title>
          .
          <source>In XSym</source>
          , volume
          <volume>4704</volume>
          of Lecture Notes in Computer Science, pages
          <fpage>92</fpage>
          -
          <lpage>106</lpage>
          . Springer,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>Hong</given-names>
            <surname>Su</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. K.</given-names>
            <surname>Kramer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. A.</given-names>
            <surname>Rundensteiner</surname>
          </string-name>
          .
          <article-title>XEM: XML Evolution Management</article-title>
          ,
          <source>Technical Report WPI-CS-TR-02-09</source>
          .
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>M.</given-names>
            <surname>Klettke</surname>
          </string-name>
          .
          <article-title>Conceptual xml schema evolution - the codex approach for design and redesign</article-title>
          . In Workshop Proceedings Datenbanksysteme in Business,
          <source>Technologie und Web (BTW</source>
          <year>2007</year>
          ), pages
          <fpage>53</fpage>
          -
          <lpage>63</lpage>
          , Aachen, Germany,
          <year>March 2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>J.</given-names>
            <surname>Miller</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Mukerji</surname>
          </string-name>
          .
          <source>MDA Guide Version 1.0</source>
          .1. Object Management Group,
          <year>2003</year>
          . http://www.omg.org/docs/omg/03-06-01.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>M.</given-names>
            <surname>Necasky</surname>
          </string-name>
          .
          <article-title>Conceptual Modeling for XML, volume 99 of Dissertations in Database and Information Systems Series</article-title>
          . IOS Press/AKA Verlag,
          <year>January 2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10. M.
          <article-title>Necasky´ and I. Mly´nkova´. Five-level multi-application schema evolution</article-title>
          .
          <source>In DATESO</source>
          , pages
          <fpage>90</fpage>
          -
          <lpage>104</lpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11. Object Management Group.
          <source>UML 2.1.2 Specification</source>
          , nov
          <year>2007</year>
          . http://www.omg. org/spec/UML/2.1.2/.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <given-names>T.</given-names>
            <surname>Bray</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Paoli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. M.</given-names>
            <surname>Sperberg-McQueen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Maler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Yergeau. Extensible Markup</surname>
          </string-name>
          <article-title>Language (XML) 1.0 (Fifth Edition)</article-title>
          .
          <source>W3C</source>
          ,
          <year>November 2008</year>
          . http:// www.w3.org/TR/2008/REC-xml-
          <volume>20081126</volume>
          /.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <given-names>B.</given-names>
            <surname>Thalheim</surname>
          </string-name>
          .
          <string-name>
            <surname>Entity-Relationship</surname>
            <given-names>Modeling</given-names>
          </string-name>
          : Foundations of Database Technology. Springer Verlag, Berlin, Germany,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <given-names>H. S.</given-names>
            <surname>Thompson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Beech</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Maloney</surname>
          </string-name>
          , and
          <string-name>
            <given-names>N.</given-names>
            <surname>Mendelsohn</surname>
          </string-name>
          .
          <source>XML Schema Part</source>
          <volume>1</volume>
          :
          <string-name>
            <surname>Structures (Second Edition). W3C</surname>
          </string-name>
          ,
          <year>October 2004</year>
          . http://www.w3.org/TR/ xmlschema-1/.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>W3C. XML Path</surname>
          </string-name>
          <article-title>Language (XPath) 2.0</article-title>
          . http://www.w3.org/TR/xpath20/.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>