<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Deductive Approach to Semistructured Schema Evolution</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>D. Luciv</string-name>
          <email>dluciv@lanit-tercom.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>'ref'</institution>
          ,
          <addr-line>addAttribute(ResName, RefAttrName), useOrCreateKey(Tag, PathWithoutLast, Child, KeyAttr, ResName), concat(PathWithoutLast</addr-line>
          ,
          <institution>[KeyTag]</institution>
          ,
          <addr-line>Path), onePathFromTo(ResName, Tag, ResPath),!, %gives one, cuts off. addReference( Tag, ResPath, RefAttrName, Tag, Path, KeyAttr )</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Proceedings of the Spring Young Researcher's Colloquium on Database and Information Systems SYRCoDIS</institution>
          ,
          <addr-line>St.-Petersburg, Russia, 2005</addr-line>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Saint-Petersburg State University</institution>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>useOrCreateKey(Tag</institution>
          ,
          <addr-line>PathWithoutLast, KeyTag, KeyAttr, MustReferTag):- concat(PathWithoutLast</addr-line>
          ,
          <institution>[KeyTag]</institution>
          ,
          <addr-line>Path), inNK(Tag, Path, KeyAttr), inPL(MustReferTag, Tag), !. %one is enough useOrCreateKey(Tag, PathWithoutLast, KeyTag, KeyAttr, MustReferTag):- leastCommonEPL(KeyTag,MustReferTag, Tag,Path,RPath), KeyAttr = Keytag</addr-line>
        </aff>
      </contrib-group>
      <abstract>
        <p>Schema evolution for different methodologies is an interesting area of research for now and future. Some results of research of semistructured database schema evolution are presented here. Semistructured database schema model based on [11, 4] and its evolution techniques are introduced. Extensions to data model are also described. Some refactoring problems are described. Declarative approach to schema evolution and refactoring is presented. ∗ This research was partially supported by Microsoft Research grant No. 2004-459A and RFBR grant No. 04-01-00173 lational database has powerful interface for schema altering, and relational databases are mostly used to store large amounts of data. Other kinds of databases often use relational engines behind their semistructured and/or object-oriented interfaces to increase productivity. Usually people think about real-world concepts like entities, relations and attributes when modeling the data. The same can be said about data schema evolution controlled by human operator. Existing semistructured database schema evolution techniques provide “lowlevel” evolution operations over particular schema items like tags, attributes, associations, etc. However sometimes operator thinks about more high-level operations like splitting entities, moving attributes from one entity to other, etc - which can be called “refactoring”. This paper describes some investigations in area of schema evolution and refactoring and possible approaches to it. The remaining paper is organized as follows: 2. basics for semistructured hierarchical data modeling are defined; 3. restrictions on XPath expressions are defined and applied when modeling associations; 5. declarative approach to invariants implementation is presented; 6. declarative approach to refactoring is introduced, refactoring operations are described;</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>
        As one of the most popular universal data interchange
formats, XML [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] needs to be structured and constraints
applied to XML data are to be formalized.
      </p>
      <p>XML is now often used for short message interchange
and storing small pieces of data for particular
applications. Number of such applications is relatively big
and in this case data structure of XML can be restricted
with only constraints applied by XML itself: the data
should have hierarchical form with tags and attributed
and should obey XML syntax.</p>
      <p>
        For more serious applications XML data schema
constraints are practically defined and then checked using
such means as DTD [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], XML Schema [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ],
RelaxNG [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], etc. Although for productivity reasons
genuineXML (unparsed textual) data packages are still relatively
small for such applications, they can have a complicated
structure.
      </p>
      <p>
        For all methodologies maintaining data structure is
interesting, but difficult. One of research areas is database
schema evolution for relational, object-oriented [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ],
semistructured hierarchical [
        <xref ref-type="bibr" rid="ref10 ref14 ref15 ref4 ref5">4, 5, 10, 14, 15</xref>
        ] databases,
and a number of related areas like XML-relational
mapping [
        <xref ref-type="bibr" rid="ref14 ref3 ref7 ref8">3, 7, 8, 14</xref>
        ].
      </p>
      <p>Traditionally, data schema evolution is not a part of
the both object-oriented, relational and semistructured
data definition methodologies, however data structure
often requires being alterable. This comes from
practice. As an illustration we can say that almost any
re</p>
      <p>
        Another interesting research area of research is
onthology evolution [
        <xref ref-type="bibr" rid="ref17 ref18">18, 17</xref>
        ]. Some onthology
models, like conceptual graphs [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ], can be considered a
conservative extension to first-order logic and used to
express XML data model too. Example Prolog
interpreter with conceptual graph support can be obtained
at http://prologpluscg.sourceforge.net/
as a part of Amine Platform
(http://amine-platform.sourceforge.net/).
3
3.1
      </p>
    </sec>
    <sec id="sec-2">
      <title>Evolution of Data Schema</title>
      <sec id="sec-2-1">
        <title>Data Modeling Basics</title>
        <p>Let us consider sample data model to be managed (fig. 1).</p>
        <p>Document</p>
        <p>Street
name : String</p>
        <p>0..*
0..*</p>
        <p>City
name : String</p>
        <p>Region
name : String</p>
        <p>Square
name : String</p>
        <p>Building
number : int
0..*
0..*</p>
        <p>Clinic</p>
        <p>
          Given data structure mostly consists of entities and
aggregations between them. Some associations are not
supported by XML syntax: in fact XML supports only
strong aggregation. Other kind of associations that is
described in [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] and in this paper are associations based on
XML references. Those associations are one-way
navigable.
        </p>
        <p>The model needs some change to become realistic.
Modified model is shown on fig. 2. Here we have
resolving entities for Street and Building. Two resolving
tags for entities those are to be referenced using “? → ∗”
associations are here added to the model. All
associations are now references with single direction (as
navigability shows). By the way, resulting second model is
poorer than first one: for example, we can’t list Regions
containing given Street using revised data model.</p>
      </sec>
      <sec id="sec-2-2">
        <title>Notations and Definitions Basic Notations</title>
        <p>
          notations are defined
tag graph
tags
attribute
tag or attribute
“native” attributes of t
essential “native” attributes of t
by
[
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]:
3.2
3.2.1
Here we constrain XPath expression class to be simple
enough to analyze. Some of described notions were
already defined and used in [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ], here they are described
broader.
        </p>
        <p>Let us consider association to be reference from tag t
towards tag s. Tag s is identified by b attribute, tag t uses
a attribute to refer tag s. The reference is implemented
using XML key and keyref constructs.</p>
        <p>Example XPath expression looks like
“ max(/Document/City/Street/Building
[parent::Street/@name="Smiths’"]/@number)”.</p>
        <p>
          It contains attributes (@name, @number), axes
(implicit child:: and explicit parent::), predicate
(...="Smiths’") and term (max(...)). See[
          <xref ref-type="bibr" rid="ref19">19</xref>
          ] for
more information.
        </p>
        <p>Although above XPath expression does not employ
every possible XPath functionality (e.g. “|”, other
operations and advanced features), it is complicated enough.
Here we do not need to formalize every kind of XPath
expression.</p>
        <p>
          The biggest own subclass of XPath expressions
defined in this paper is a class of Simplified expressions.
Such expressions do not use axes (except child::),
predicates, terms and operations. The most powerful
element of this class is “//” axis. This axis can be easily
simulated by substitution of “//” with all possible path
sections and then uniting search results. This operation
is finite because [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] does not use cycles in document
schemas.
        </p>
        <p>Simplified expressions, like other XPath expressions,
are in real XML documents, not schemas. However
Simplified expression can be described using document
model terminology. It can be considered as a function
p : τ → 2τ because if we know what tag is an argument
instance of, we always know all possible tags which are
resulting values instances of.</p>
        <p>We can constrain result set of Simplified expression
this way: ♯{p(t)} = 1, ∀p ∈ SP, t ∈ τ .</p>
        <p>Members of such expressions class act as SP ∋ sp :
τ → τ . SP class is called Simple expressions here. We
here denote the set of simple XPath expressions as SP ,
so SP (t) is the set of tags s which can be reached from
t using simple XPath expressions. So αt(SP (t), υ) ⊂ τ
for υ ⊂ τ . We are also to define A as St∈τ N (t). So
notation now includes:</p>
        <p>SP (t) subgraph of τ reached by simple XPath
expressions from t</p>
        <p>A union of all tags’ native attributes</p>
        <p>Above explanations’ goal is to make nature of SP
easy to understand. Formally we can define SP as
{s|t ∈ P L(s)} ∪ t.</p>
        <p>
          Associations are represented with keys and references
[
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]:
keydecl : τ × SP × (A ∪ τ ) → K, so
K = {fk : {val(m)} → {inst(s ∈ τ )}},
        </p>
        <p>fk is the key function
and
keyrefdecl : τ × SP × (A ∪ τ ) × K → R, so</p>
        <p>R = {fr : {inst(t)} → {val(m)}},</p>
        <p>fr is the key reference function.</p>
        <p>Finally, for instances of tags we require
fk = keydecl(t1, p1, m1), fr = keyrefdecl(t2, p2, m2, fk)
∀e2 ∈ {inst(p2(t2))}</p>
        <p>∃e1 ∈ {inst(p1(t1))} :
t1 ∈ P L(t2);
(fk · fr)(e2) = e1.
3.2.3</p>
      </sec>
      <sec id="sec-2-3">
        <title>Sets</title>
        <p>
          Following notations for key and reference modeling are
introduced by [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]:
        </p>
        <p>KC(fk) Coverage of key with function fk =
keydecl(t, p1, a1)
RC(fr) Coverage of reference with function
fr = keyrefdecl(s, p2, a2, fk′ )
CK(t) Key declarations covering t ∈ τ
CR(t) Reference declarations covering t ∈ τ
N K(t) “Native” key declarations of t ∈ τ
N R(t) “Native” reference declarations of t ∈ τ
N Re(t)
Axiom
(KR1)
(KR2)
(KR3)
(KR4)
(KR5)
“Essential” reference declarations of
t ∈ τ . Here we mean reference
declarations which base on essential attributes
of referencing tag and it’s essential
parents
“Essential native” reference
declarations of t ∈ τ . Here we mean reference
declarations which base on essential
attributes of referencing tag
Formal representation
N K(t) = alphas(keydecl(
s, p : p(s) = t, a ∈ N (t)), P L(t))
N R(t) = αs(keyrefdecl(
s, p : p(s) = t, a ∈ N (t), ∗), P L(t))
∀fr ∈ R ∃fk ∈ K :
fr = keyrefdecl(t ∈ τ, p ∈ SP,
m ∈ (A ∪ τ ), fk)
∀t ∈ τ CR(t) ∈ αs(
keyrefdecl(∗, ∗, ∗, s), CK(t))
N Re(t) = αs(keyrefdecl(
s, p : p(s) = t, a ∈ Ne(t)), P L(t), ∗)</p>
      </sec>
      <sec id="sec-2-4">
        <title>3.2.4 Invariants</title>
        <p>
          The set of axioms (denoted as (1)–(9) here) given in [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]
is extended by following ones in [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]:
        </p>
        <p>They are named as axioms of -native keys, -native
references, -declaration, -coverage, -“essential” references
respectively.</p>
        <p>
          Consistency and completeness of invariants (1)–(9)
are proved in [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] by following theorem:
Theorem 1 Schema axioms are sound and complete,
so if we have Pe(t) and Ne(t) defined for each tag
in schema our schema satisfies axiom set and making
changes to this schema is govered by the sets.
        </p>
      </sec>
      <sec id="sec-2-5">
        <title>Proof</title>
        <p>
          is given in[
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]
        </p>
      </sec>
      <sec id="sec-2-6">
        <title>End of Proof</title>
        <p>
          Key and reference invariants and formulated in [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]
and proof idea can be expressed as follows:
Theorem 2 Schema axioms are sound and complete, so
if we have Pe(t), Ne(t), N K(t), N Re(t) defined for
each tag in schema our schema satisfies axiom set and
making changes to this schema is governed by the sets.
        </p>
      </sec>
      <sec id="sec-2-7">
        <title>Proof</title>
        <p>Theorem 1 says that schema without key and
reference declarations satisfies axioms (1)–(9) when Pe(t)
and Ne(t) are defined for each tag. But axioms (1)–(9)
say nothing about keys and references so new model
satisfies those axioms. The same fact relates to model
without keys and references and axioms (KR1)–(KR5): no
proof needed.</p>
        <p>Hence we are now to prove that keys and references
of document schema satisfy new axioms (KR1)–(KR5).
(KR1),(KR2) Those axioms are satisfied obviously
due to key and reference semantics
presented here.
(KR3) Every reference should base on
corresponding key.
(KR4) We have noted above that reference
coverage is always contained by key
coverage. Thus if we use reference
covering our tag, we are also to have
key declaration covering our tag.
(KR5) Like (KR1),(KR2) is satisfied due to
reference semantics. It is always able
to build N R(t) set using N Re(t) and</p>
        <p>N (t) sets.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>4 Declarative Approach</title>
      <sec id="sec-3-1">
        <title>4.1 Description of Schema</title>
        <p>
          All schema invariants are already defined using
firstorder logic. Evolution is done by modifying governing
sets [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] and recalculating other ones.
        </p>
        <p>Possible way to make the schema practically
declarative is to assume XML document schema stored in
deductive database.</p>
        <p>
          Description below is equivalent to above schema
invariants but can be programmed using Prolog [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] in
nearly unchanged form. However description is
simplified here to make it more illustrative yet functional.
        </p>
        <p>Note that next section describes data schema facts, but
does not describe any method to ensure schema is
consistent. It is not a vulnerability of the approach: evolution
operation set is full, any operation leaves correct schema
in correct state, so any correct schema can be “unrolled”
from empty one.</p>
        <p>Example set of facts below can be interpreted as state
of schema at particular moment. Loading data schema
state “as-is” can also be useful, like restoring relational
databases from dumps with constraints disabled
temporarily for productivity.</p>
      </sec>
      <sec id="sec-3-2">
        <title>4.1.1 Facts of Schema</title>
        <p>Facts of deductive databases are used to store
information that can not be deduced. This is primary information
about data schema.</p>
        <p>Illustration below defines subset of entities from data
model shown on fig. 2. Schema can be described as
follows:
%--- EntityModel
tag(’:root’). tag(’:term’).
tag(’Document’). tag(’City’).
tag(’Square’). tag(’Street’).
tag(’Region’). tag(’res_Street’).
inPe(’City’, ’Document’).
inPe(’Square’, ’City’). inPe(’Square’, ’Region’).
inPe(’Street’, ’City’). inPe(’Region’, ’City’).
inPe(’res_Street’, ’Region’). % - non-essential
inPe(’:term’, ’Square’). inPe(’:term’, ’Street’).</p>
        <p>Notes (here and below):
• Prolog syntax is somewhat changed to optimize
paper usage and to group some data;
• attributes are named uniquely for simplicity to
distinguish them by name.</p>
        <p>In case we can use model from fig. 1 directly (our
methodology allowed them), we should write a fact like
inPe(’Street’,’Region’). Moreover, Schema
defined is not complete in area of Region→Street
reference because res_Street references Street via
reference based on attribute streetref.</p>
        <p>Attributes can be described as follows:
%--- AttribModel
inNe(’City’, ’City@name’).
inNe(’Street’, ’Street@name’).
inNe(’Region’, ’Region@name’).
inNe(’Square’, ’Square@name’).
inNe(’res_Street’, ’res_Street@streetref’).</p>
        <p>
          Two sections of Prolog facts above define almost
everything needed to describe model as [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] proposes, but
references defined in [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] are also to be defined in case
when we consider them as evolution object.
        </p>
        <p>
          keydecl and keyrefdecl constructs require a bit more
explanation before defining them. Reference axiomatic
in [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] is based on key and reference functions which
allow us, when superposed, identify referred tag instance
for referring one including all references defined for it.
        </p>
        <p>Those functions are “generated” by keydecl and
keyrefdecl which are, formally, functionals. Such a
description with functionals and functions can be left
almost unchanged when implementing XML DBMS using
a kind of functional language for it, but we use logical
language to describe data schema here.</p>
        <p>keydecl construct can be used to identify resulting key
function keydecl(t, p, a) as:
%--- RefersModel
%---- Keys
inNK(’City’, [’Street’], ’Street@name’).</p>
        <p>or, with other context:
inNK(’Document’, [’City’, ’Street’],
’Street@name’).</p>
        <p>in case we want document-unique street name.</p>
        <p>It is possible to reach key-covered tag by different
ways so we need a list (or other ordered container) to
represent path from SP here.</p>
        <p>keyrefdecl functional gets keydecl result as an
argument, but it is correct to say that keydecl can be identified
by it’s argument, so practically keyrefdecl gets 6
parameters in deductive database, e.g.:
%--- RefersModel
%---- Refs
inNRe(
’City’, [’Region’, ’res_Street’],
’res_Street@streetref’, ’City’,
[’Street’], ’Street@name’ ).</p>
      </sec>
      <sec id="sec-3-3">
        <title>4.1.2 Goals of Schema</title>
        <p>
          Sets which are not yet defined using Prolog are
calculable and can be omitted theoretically, however [
          <xref ref-type="bibr" rid="ref10 ref11 ref4">11, 4, 10</xref>
          ]
defined them for simplicity, and we will also use those
definitions below.
        </p>
        <p>In deductive database those sets should not be
asserted during schema evolution, they can be considered
as “read-only” and calculated using schema invariants.
Thus they are defined using goals, not facts.
%--- Calculable Sets
% Pred. Graph
inPL(S, T)
:inP(S, T).
inPL(S, T)
:inP(S, U),
U=\=T,
inPL(U, T).
inPL(S,S).
% Inh. attributes
inH(T, A)
:inP(T, S),
inI(S, A).
% Interface
inI(T, A)
:inN(T, A).
inI(T, A)
:inH(T, A).
% Immediate Pred.
inP(T, S)
:inPe(T, S),
inPe(T, X),
X=\=S,
not(inPL(X,S)).
inP(T, S)
:inPe(T, S),
tag(X),
X=\=S, %single Pe
not(inPe(T,X)).
% Native Attributes
inN(T, A)
:inNe(T, A),
not(inH(T, A)).
% :root &amp; term
inPL(_,’:root’).
inPL(’:term’,_).</p>
        <p>
          Above Prolog goals can be used almost without
changes except recursive goals for P L sets. In
experiments they were optimized and allowed to check if any of
arguments are not free term and then decide how to recur.
For example, when we try to solve inPL(X, ’City’)
it is better to recur down from City tag, however
axioms and goals above recur upwards only. Calculation
of P sets was also changed in practice: in [
          <xref ref-type="bibr" rid="ref11 ref4">11, 4</xref>
          ]
invariants were defined to handle sets of unique elements,
however Prolog gives a number of non-unique solutions
for inP and inPL above goals (all variants) for any
unbound arguments. Practically, some tricks with Prolog
cut-off were used. All those tricks and optimizations are
beyond this paper.
        </p>
        <p>
          Detailed descriptions of sets above and formulae for
schema invariants are placed in [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]. Those formulae are
used “as-is” their calculation require no imperative
description at all.
        </p>
        <p>
          N K(τ ) and N Re(τ ) sets are governing sets for keys
and references, other are calculated. Invariants from [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]
can be described like this:
last(l, e)
:eq(l, (h)), eq(e, h).
last(l,e)
:eq(l, (h|t)), last(t,e).
inNR(Rtag, Rpath, Rattr, Ktag, Kpath,
Kattr):inNRe(Rtag, Rpath, Rattr, Ktag, Kpath, Kattr),
last(Rpath, RRtag),
inN(RRtag, Rattr).
        </p>
      </sec>
      <sec id="sec-3-4">
        <title>4.2 Declarative Approach to Evolution</title>
        <p>
          Basic operations which were initially defined in [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] for
the data model without associations are modified to
handle references. Evolution operations from [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] are here
defined considering schema as deductive database
described above.
        </p>
        <p>
          Note that we use some ISO Prolog goals like
findall(Var, Term, Result) [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. However Prolog
code below can be used as-is, this code is dedicated to
illustrate main ideas of possible schema evolution engine
implementation. Some checks are here omitted and
evolution operations are defined as simply as possible.
        </p>
        <p>Evolution operations are labeled «evo.M.N» below.
evo.1.1. adding new attribute</p>
        <p>
          Rule described in [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]. No additional requirements and
activities needed.
addAttribute(Tag,Attr)
:tag(Tag),
assert(inNe(Tag,Attr)).
evo.1.2. deleting attribute
        </p>
        <p>When deleting attribute a of tag t, all key declarations
in N K(t) and references in N R(t) are deleted. See
operation of key removal (3.2) below.</p>
        <p>
          As noted in [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ], an ability to add deleted attribute to
Ne of tags in SP (t) is useful. Implementation of this
functionality should also redeclare keys in N K(t) and
references in N R(t). Sample implementation is
cumbersome and omitted here.
evo.2.1. adding aggregation between two tags
        </p>
        <p>
          Rule described in [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]. No additional requirements and
activities needed.
addAggregation(Tag,Parent)
:tag(Tag), tag(Parent),
assert(inPe(Tag,Parent)).
evo.2.2. removing aggregation between two tags
        </p>
        <p>
          This operation is relatively complicated in both [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]
and here.
        </p>
        <p>Let us call aggregated tag t.</p>
        <p>All key declarations in N K(t) are modified to satisfy
(KR4) axiom. This means that declarations move upward
unless they cover all references again. Note that if t is
no more aggregated by other tags it implicitly becomes
child of root tag ⊤, and all corresponding key
declarations move to ⊤.</p>
        <p>All reference declarations in N R(t) are deleted if they
become invalid. All reference declarations in t move up
to save references in αs(N R(s), SP (t)).</p>
        <p>A sample code (qualitatively simplified) here executes
key redeclaration according to (KR4).
l.e.a.st.CommonEPL(Tag1,Tag2,Pred,Path1,Path2)
:% finds least common predecessor in both
% PL(Tag1)\{Tag1} and PL(Tag2)\{Tag2} and
% pathes to it. Trivial but cumbersome.
retractDependentRefs(Child, Parent, DKP)
:findall((RT,RP,RA,KT,KP,KA),
(inNRe(RT,RP,RA,KT,KP,KA),
member(Child, KP),member(Parent, [KT | KP]),
retract(inNRe(RT,RP,RA,KT,KP,KA)) ), DKP).
correctKeyRefs(Child, Parent, DK)
:findall(inNRe(RT,RP,RA,KT,KP,KA),
(inNRe(RT,RP,RA,KT,KP,KA),
member(Child, KP),
member(Parent, [KT | KP]),
retract(inNRe(RT,RP,RA,KT,KP,KA)),
retract(inNR(KT,KP,KA)),
leastCommonEPL(Parent,Child,LKE,PPath,CPath),
addKey(LKE,CPath,KA),
addReference(RT,RP,RA,LKE,PPath,KA)
), DK).</p>
        <p>Then we can invoke them:
removeAggregation(Child,Parent)
:retractDependentRefs(Child, Parent, _),
correctKeyRefs(Child, Parent, _),
retract(inPe(Child,Parent)).
evo.2.3. adding new tag</p>
        <p>
          The rule is described in [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]. No additional
requirements and activities needed. Prolog goal below also
illustrates some checks of bound variables. Those checks
are useful live applications.
addTag(Tag, PeList)
:findall(X, (member(X, PeList),
not(tag(X))), BadList), %only sample check
length(BadList, 0), % they all are tags
assert(tag(Tag)),
findall(X, (member(X, PeList),
addAggregation(Tag, X)), Successfull),
addAggregation(’:term’, Tag).
evo.2.4. removing a tag
        </p>
        <p>
          The rule is described in [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] for basic structure without
keys and references. Removed tag is called t. Like in
operation (1.2) of attribute removal, the reference
declarations in N Re(t) can be modified to cover all children
of t. For all such reference declarations we are to create
corresponding reference attributes in children of t.
Sample Prolog implementation is not placed here because of
its awkwardness.
evo.3.1. declaring new key
        </p>
        <p>Key declaration is canceled if it does not satisfy one or
more of (KR1)-(KR5). Sample Prolog code omits checks
because they are cumbersome and can be considered as
unnecessary technical details here. It requires almost no
checks in theory, either.
addKey(ContextTag, Path, KeyAttr)
:% no checks
assert(inNK(ContextTag, Path, KeyAttr)).
evo.3.2. removing a key</p>
        <p>During key declaration fk = keydecl(t, p, a) removal,
all reference declarations fr = keyrefdecl(t′, p′, a′, fk)
are also removed. Sample Prolog implementation is not
interesting enough to place here.</p>
        <p>With current key-reference model is is the only way
to perform such an operation. However if we support
Simplified XPath expressions instead of Simple ones, we
can identify instances of many tags with one key. In this
case we potentially can redeclare the key by adding it to
N K sets of t tag’s children. This will keep references to
fk unchanged. But this case is not considered here.
evo.4.1. declaring new reference</p>
        <p>Reference declaration is canceled if it does not satisfy
one or more of (KR1)-(KR5). Sample Prolog
implementation is not interesting enough to place it here.
evo.4.2. removing reference</p>
        <p>This operation does not require any additional
operations. Like operation (1.2), it can be compensated with
declaration of references belonging to N R sets of some
child tags. Sample Prolog implementation is not
interesting enough to place here.</p>
        <p>In case when we support simplified XPath
expressions, we can cover some child tags with one reference
declaration instead of declaring new references, but we
do not consider this way here as it was noted for (3.2)
operation.
4.3</p>
      </sec>
      <sec id="sec-3-5">
        <title>Declarative Approach to Refactoring</title>
        <p>Human-language description of evolution operations is
cumbersome and complicated. However this is only a
set of possible low-level schema modification operations.
Refactoring operations are rather more complicated than
ones above. Imperative description of refactoring
operations should be even more complicated and unclear so
defining them in declarative way seems to be a way to
simplify them.</p>
        <p>Here some possible refactoring operations are
listed and sample implementation ideas
provided. Wider operation list can be found at
http://www.agiledata.org/ in Database
Refactoring Catalog section.</p>
        <p>Refactoring operations listed below are mostly
projected into sequences of schema evolution operations.
Some of refactoring operations require analysis of real
data corresponding to given schema. Some of them are
unapplicable yet because XML Schema support in data
model above is not yet powerful enough.
4.3.1</p>
      </sec>
      <sec id="sec-3-6">
        <title>Refactoring Operations over Existing Data</title>
      </sec>
      <sec id="sec-3-7">
        <title>Model</title>
        <p>Refactoring operations defined and described below are
labeled «rft.N» below.
rft.1. consolidating attributes representing single
concept</p>
        <p>For attributes, which represent single concept,
consolidation can be executed. For example, address is
sometimes represented by two strings and can be consolidated
into one string.
rft.2. consolidating tags</p>
        <p>Sometimes two model tags can be consolidated
because they can be interpreted as the same entity.
rft.3. splitting an attribute</p>
        <p>Splitting an attribute is useful when the model is
growing from more simple one or from a prototype.
Example above can be reversed: if we know particular
addressing system, we can divide attribute into smaller ones
representing more atomic values.
rft.4. splitting a tag</p>
        <p>Splitting a tag can be useful, for example, when we
would like to detach a child tag from parent (instead of
adding attributes to all instances of existing tag).
rft.5. adding a key to data schema</p>
        <p>Adding new key operation is basic evolution
operation and it is already defined. Although in real databases
adding a key to schema can be also combined with
integrity checking and index creation. Automating of those
tasks can lead to complicated refactoring operation.
rft.6. adding new attribute to avoid referencing</p>
        <p>Instead of dereferencing other entity to obtain the
value of single attribute, we can store this attribute in the
referencing entity. Although productivity can be
essentially improved, this approach can lead us to inconsistent
data.
rft.7. moving an attribute</p>
        <p>Moving from one tag to another lets us to:
• avoid possible referencing when accessing
attributes of different entities;
• separate seldomly-used attributes to improve
productivity;
• separate read-only attributes if DBMS has no
permission support below entity level or when it helps
DBMS to optimize its work.
rft.8. moving a tag within τ</p>
        <p>This operation is very common and affects the whole
schema. It should be decomposed into sequence of
aggregation removal and creation.
rft.9. safe-deleting entity</p>
        <p>This refactoring operation is already supported as
elementary evolution operation.
rft.10. replacing existing natural key with surrogate one</p>
        <p>This operation declares new key and probably
removes existing one (existing key becomes simply an
attribute). It can also try to maintain references to initial
key and switch them to new one.</p>
        <p>The purpose of this operation is to change key
semantics.
rft.11. replacing “1 → ∗” aggregative association with
“∗ → ∗” one</p>
        <p>As an example we can imagine that we are going
to let the streets to pass across several cities (fig. 2).
In this case Street becomes a child of Document and
new resolver between City and Street is constructed
as shown at fig. 3.</p>
        <p>For this refactoring operation sample implementation
is provided. It is quite simple (some supplementary goals
are also defined, obvious goals are used without
definition):</p>
      </sec>
      <sec id="sec-3-8">
        <title>4.3.2 Refactoring Operations over Advanced XML</title>
      </sec>
      <sec id="sec-3-9">
        <title>Data Structures and Models</title>
        <p>rft.12. renaming anything</p>
        <p>This refactoring operation is placed in “advanced”
section because current data model does not handle such
advanced XML naming facilities as namespaces, etc.
If these facilities are appropriately supported, renaming
tag, attribute, key or any other entity in XML data model
becomes complicated task.
rft.13. restore standard types for attributes</p>
        <p>Theoretically, this operation is very interesting.
Refactoring engine should analyze user types defined in
data schema, perform some calculations of
corresponding domains and decide if some of types can be replaced
by built-in (like integer) types. Moreover, merging
similar types (and type fragments of compound types) and
simplifying type description is very interesting computer
science problem. This interesting operation is beyond
current work.
rft.14. adding cascade delete for weak entities</p>
        <p>This operation creates implicit triggers in underlying
DBMS in case one exists to delete weak entities when
their parents are destroyed.
rft.15. discover groups and turn them into entities</p>
        <p>Existing data can be analyzed to discover repeating
values of attributes and tags. Some of them can be
merged and then replaced with references to newly
created merged groups.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>5 Conclusion</title>
      <p>Given association evolution model makes a
contribution to existing XML schema evolution models.
Although given model does not provide full support of
XML Schema or even DTD constructions, it seems to be
enough powerful for usage with real XML documents.</p>
      <p>
        Model can be modified if needed. For example,
support of simplified XPath expressions can make model
much more flexible. New association model can be also
integrated into document model described in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] instead
of [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Both solutions increase model flexibility and DTD
or XML schema support, but resulting model will be still
not enough powerful to support complete DTD and XML
schema. Both solutions lead to complicated model that
is not so illustrative as given one, so they are to be used
in cases when their techniques are useful for particular
applications.
      </p>
      <p>
        It can be sensible to combine some other set
calculation models together with Prolog goal approvals. Sample
sample system is term rewriting engine [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] available at
http://www.gradsoft.com.ua/.
      </p>
      <p>
        Although above paper describes schema evolutions,
one of possible applications of given evolution rules is
automatic transformation of XML documents content.
For example, two versions of database can have different
XML representation of their data. On-the-fly
transformations of XML content can help legacy applications to
access new database and can also make legacy database
available for new applications. Transformation itself can
be composed using operations described in this paper.
For real world application this transformation can be
executed by XSLT [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] generated from evolution operations
list.
      </p>
      <p>
        For XML document with complicated structure it is
easier for human operator to think about more
“highlevel” transformations, than most of primitive evolution
operations defined in [
        <xref ref-type="bibr" rid="ref10 ref4 ref5">4, 5, 10</xref>
        ] and here. Such
operations, like splitting entity or moving attribute from one
entity to another, etc. should be considered schema
refactoring, because in their case simple local changes often
induce global changes to whole schema. Some of
possible schema refactoring operations are described here and
are also objects of future investigations.
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <article-title>[1] Relax ng compact syntax</article-title>
          ,
          <year>November 2002</year>
          . Specification.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>T.</given-names>
            <surname>Bray</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Paoli</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C. M.</given-names>
            <surname>Sperberg-McQueen</surname>
          </string-name>
          (
          <article-title>Eds). Extensible markup language (XML) 1.0 (2nd edition)</article-title>
          .
          <source>W3C Recommendation</source>
          ,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Surajit</given-names>
            <surname>Chaudhuri</surname>
          </string-name>
          , Raghav Kaushik,
          <string-name>
            <given-names>and Jeffrey F.</given-names>
            <surname>Naughton</surname>
          </string-name>
          .
          <article-title>On relational support for XML publishing: Beyond sorting and tagging</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>S.</given-names>
            <surname>Coox</surname>
          </string-name>
          .
          <article-title>Xml database schema evolution axiomatization</article-title>
          .
          <source>Programming and Computer Software</source>
          ,
          <volume>29</volume>
          (
          <issue>3</issue>
          ):
          <fpage>140</fpage>
          -
          <lpage>146</lpage>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Sergey</surname>
            <given-names>V.</given-names>
          </string-name>
          <string-name>
            <surname>Coox</surname>
            and
            <given-names>Andrey A.</given-names>
          </string-name>
          <string-name>
            <surname>Simanovsky</surname>
          </string-name>
          .
          <article-title>Regular expressions in xml schema evolution</article-title>
          .
          <source>Kharkiv</source>
          , Ukraine,
          <year>June 2003</year>
          . ISTA.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>DEIS</given-names>
            ,
            <surname>Universit</surname>
          </string-name>
          <string-name>
            <surname>'</surname>
          </string-name>
          <article-title>a di Bologna a Cesena, Italy. tuProlog User's Guide, 1st edition</article-title>
          ,
          <year>Sep 2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>D.</given-names>
            <surname>Florescu</surname>
          </string-name>
          and
          <string-name>
            <given-names>D.</given-names>
            <surname>Kossmann</surname>
          </string-name>
          .
          <article-title>A performance evaluation of alternative mapping schemes for storing XML data in a relational database</article-title>
          .
          <source>Technical Report 3684, INRIA</source>
          ,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>D.</given-names>
            <surname>Florescu</surname>
          </string-name>
          and
          <string-name>
            <given-names>D.</given-names>
            <surname>Kossmann</surname>
          </string-name>
          .
          <article-title>Storing and querying xml data using an rdbms</article-title>
          .
          <source>In Data Engineering Bulletin</source>
          , volume
          <volume>22</volume>
          , pages
          <fpage>27</fpage>
          -
          <lpage>34</lpage>
          . IEEE,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>J.</given-names>
            <surname>Doores</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.R.</given-names>
            <surname>Reiblein</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Vadera</surname>
          </string-name>
          .
          <article-title>Prolog - Programming for Tomorrow</article-title>
          . Sigma Press, Wilmslow, UK,
          <year>1987</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Dmitry</surname>
            <given-names>V.</given-names>
          </string-name>
          <string-name>
            <surname>Luciv</surname>
          </string-name>
          .
          <article-title>Semistructured database scheme evolution and refactoring</article-title>
          .
          <source>In SYRCoDIS</source>
          , pages
          <fpage>71</fpage>
          -
          <lpage>74</lpage>
          , May
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>M.</given-names>
            <surname>Tamer Oszu Randal J. Peters</surname>
          </string-name>
          .
          <article-title>Axiomatization of dynamic schema evolution in objectbases</article-title>
          .
          <year>1997</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <surname>Erik</surname>
            <given-names>T.</given-names>
          </string-name>
          <string-name>
            <surname>Ray</surname>
          </string-name>
          .
          <source>Learning XML. O'Reilly</source>
          ,
          <year>2001</year>
          . pp.
          <fpage>143</fpage>
          -
          <lpage>189</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>R.</given-names>
            <surname>Shevchenko</surname>
          </string-name>
          and
          <string-name>
            <surname>A. Doroshenko.</surname>
          </string-name>
          <article-title>The system of symbolic computing for programming the dynamic applications</article-title>
          .
          <source>Interntet publication</source>
          ,
          <year>2003</year>
          . (in russian).
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>A.</given-names>
            <surname>Simanovsky</surname>
          </string-name>
          .
          <article-title>Evolution of schema of xmldocuments stored in a relational database</article-title>
          .
          <source>In proceedings of Baltic DB&amp;IS</source>
          , Riga, Latvia,
          <year>2004</year>
          .
          <article-title>Association for Computing Machinery</article-title>
          . To appear.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <surname>Andrew</surname>
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Simanovsky</surname>
          </string-name>
          .
          <article-title>Applying the reconfiguration-design formalism to xml stored in a relational database</article-title>
          .
          <source>In SYRCoDIS</source>
          , pages
          <fpage>75</fpage>
          -
          <lpage>77</lpage>
          , May
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <surname>John</surname>
            <given-names>F.</given-names>
          </string-name>
          <string-name>
            <surname>Sowa</surname>
          </string-name>
          .
          <article-title>Conceptual graphs</article-title>
          .
          <source>ISO/JTC1/SC 32/WG2</source>
          , April
          <year>2001</year>
          . Standard Draft.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>L.</given-names>
            <surname>Stojanovic</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Maedche</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Stojanovic</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Studer</surname>
          </string-name>
          .
          <article-title>Ontology evolution as reconfigurationdesign problem solving</article-title>
          .
          <source>In proceedings of International Conference on Knowledge Management</source>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>M.</given-names>
            <surname>Stumptner</surname>
          </string-name>
          and
          <string-name>
            <given-names>F.</given-names>
            <surname>Wotawa</surname>
          </string-name>
          .
          <article-title>Model-based reconfiguration</article-title>
          .
          <source>In proceedings Artificial Intelligence in Design</source>
          , Lisbon, Portugal.,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <surname>W3C. XML Path</surname>
          </string-name>
          <article-title>Language (XPath), 1.0 edition</article-title>
          , November
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <surname>W3C. XSL</surname>
          </string-name>
          <article-title>Transformations (XSLT), 1.0 edition, November 1999</article-title>
          . Recommendation.
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          <source>[21] W3C. XML Schema Part</source>
          <volume>0</volume>
          : Primer, May
          <year>2001</year>
          . Recommendation.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>