=Paper= {{Paper |id=Vol-154/paper-4 |storemode=property |title=Deductive Approach to Semistructured Schema Evolution |pdfUrl=https://ceur-ws.org/Vol-154/dluciv.pdf |volume=Vol-154 }} ==Deductive Approach to Semistructured Schema Evolution== https://ceur-ws.org/Vol-154/dluciv.pdf
                                                                                                                         ∗
   Deductive Approach to Semistructured Schema Evolution

                                   c D. Luciv                   Ph.D. adviser: B. Novikov
                                                Saint-Petersburg State University
                                                    dluciv@lanit-tercom.com



                          Abstract                                  lational database has powerful interface for schema al-
                                                                    tering, and relational databases are mostly used to store
     Schema evolution for different methodologies                   large amounts of data. Other kinds of databases often
     is an interesting area of research for now and                 use relational engines behind their semistructured and/or
     future.                                                        object-oriented interfaces to increase productivity.
     Some results of research of semistructured                         Usually people think about real-world concepts like
     database schema evolution are presented here.                  entities, relations and attributes when modeling the data.
     Semistructured database schema model based                     The same can be said about data schema evolution
     on [11, 4] and its evolution techniques are in-                controlled by human operator. Existing semistructured
     troduced. Extensions to data model are also de-                database schema evolution techniques provide “low-
     scribed.                                                       level” evolution operations over particular schema items
                                                                    like tags, attributes, associations, etc. However some-
     Some refactoring problems are described.                       times operator thinks about more high-level operations
     Declarative approach to schema evolution and                   like splitting entities, moving attributes from one entity
     refactoring is presented.                                      to other, etc - which can be called “refactoring”.
                                                                        This paper describes some investigations in area
1 Introduction                                                      of schema evolution and refactoring and possible ap-
                                                                    proaches to it.
As one of the most popular universal data interchange
                                                                        The remaining paper is organized as follows:
formats, XML [2] needs to be structured and constraints
applied to XML data are to be formalized.                            1. related work is analyzed;
   XML is now often used for short message interchange
and storing small pieces of data for particular appli-               2. basics for semistructured hierarchical data model-
cations. Number of such applications is relatively big                  ing are defined;
and in this case data structure of XML can be restricted             3. restrictions on XPath expressions are defined and
with only constraints applied by XML itself: the data                   applied when modeling associations;
should have hierarchical form with tags and attributed               4. schema invariants are described;
and should obey XML syntax.
                                                                     5. declarative approach to invariants implementation is
   For more serious applications XML data schema con-
                                                                        presented;
straints are practically defined and then checked using
such means as DTD [12], XML Schema [21], Relax-                      6. declarative approach to refactoring is introduced,
NG [1], etc. Although for productivity reasons genuine-                 refactoring operations are described;
XML (unparsed textual) data packages are still relatively            7. some conclusions are presented.
small for such applications, they can have a complicated
structure.
   For all methodologies maintaining data structure is in-
                                                                    2 Related Work
teresting, but difficult. One of research areas is database         Schema evolution is an interesting problem for almost all
schema evolution for relational, object-oriented [11],              data modeling methodologies.
semistructured hierarchical [4, 5, 10, 14, 15] databases,               Object-oriented database schema evolution technique
and a number of related areas like XML-relational map-              for TIGUKAT ODBMS was proposed in [11]. This
ping [3, 7, 8, 14].                                                 model was selected as basic one to describe schema evo-
   Traditionally, data schema evolution is not a part of            lution for hierarchical semistructured databases, particu-
the both object-oriented, relational and semistructured             larly, XML in [4].
data definition methodologies, however data structure                   Both [11] and [4] models are simple and illustrative.
often requires being alterable. This comes from prac-               To make them more flexible, they were modified to han-
tice. As an illustration we can say that almost any re-             dle regular expressions [5] and reference-based associa-
    ∗ This research was partially supported by Microsoft Research
                                                                    tions [10].
grant No. 2004-459A and RFBR grant No. 04-01-00173                      We already have results in extending existing
Proceedings of the Spring Young Researcher’s Colloquium
                                                                    semistructured database model by adding associations to
on Database and Information Systems SYRCoDIS, St.-Petersburg,       it. Like [5], we base on [4] model for document tag struc-
Russia, 2005                                                        ture. New rules for association evolution are also added.
   Another interesting research area of research is                                                                          Document

onthology evolution [18, 17]. Some onthology mod-
els, like conceptual graphs [16], can be considered a
conservative extension to first-order logic and used to                                           res_Street
                                                                                                                                 City
express XML data model too. Example Prolog inter-                                           streetref : Street
                                                                                                                            name : String

preter with conceptual graph support can be obtained
                                                                                                  0..*
at        http://prologpluscg.sourceforge.net/
as       a     part      of       Amine        Platform
(http://amine-platform.sourceforge.net/).                                                Street                   Region                       Square            Clinic

                                                                                     name : String          name : String                   name : String


3 Evolution of Data Schema
3.1 Data Modeling Basics
Let us consider sample data model to be managed (fig. 1).                                                        Building                    res_Building

                                                                                                            number : int                buildingref : Building
                                                                                                                                 0..*
              Document

                                        City

                                  name : String
                                                                                         Figure 2: Real (physical) Data Structure


                                      Region
                                                                                 P (t)         set of immediate predecessors of
                                  name : String                                                tag t
                                                                                  Pe (t)       set of essential immediate prede-
                               0..*                             Clinic                         cessors of tag t
                            0..*
               Street
                                      Square
                                                                                  P L(t)       t predecessors graph
            name : String
                                  name : String                                   H(t)         t inherited attributes
                                                         0..*                     I(t)         t interface - inherited and native
                                                                                               attributes
                                      Building
                                                  0..*
                                                                                  inst(s), inst(a)     instances of tags s or a
                                  number : int                                    val(inst(s)),        values of inst(s), inst(a)
                                                                                  val(inst(a))         respectively
                                                                                  A                    union of all tags’ native
                    Figure 1: Data Structure                                                           attributes
                                                                                αx (f, τ ) = {f (x)|x ∈ τ } – the set of values of f
                                                                                where x passes τ
   Given data structure mostly consists of entities and                             Other definitions, dedicated for association modeling,
aggregations between them. Some associations are not                            are introduced below.
supported by XML syntax: in fact XML supports only                                  All desired invariants (schema axioms) are defined in
strong aggregation. Other kind of associations that is de-                      [4, 10].
scribed in [10] and in this paper are associations based on
XML references. Those associations are one-way navi-
gable.                                                                          3.2.2 Restrictions To XPath, Association Modeling
   The model needs some change to become realistic.                             Here we constrain XPath expression class to be simple
Modified model is shown on fig. 2. Here we have re-                             enough to analyze. Some of described notions were al-
solving entities for Street and Building. Two resolving                         ready defined and used in [10], here they are described
tags for entities those are to be referenced using “? → ∗”                      broader.
associations are here added to the model. All associa-
                                                                                   Let us consider association to be reference from tag t
tions are now references with single direction (as navi-
                                                                                towards tag s. Tag s is identified by b attribute, tag t uses
gability shows). By the way, resulting second model is
                                                                                a attribute to refer tag s. The reference is implemented
poorer than first one: for example, we can’t list Regions
                                                                                using XML key and keyref constructs.
containing given Street using revised data model.
                                                                                   Example XPath expression looks like
                                                                                “ max(/Document/City/Street/Building
3.2 Notations and Definitions                                                   [parent::Street/@name="Smiths’"]/@number)”.
                                                                                   It contains attributes (@name, @number), axes (im-
3.2.1 Basic Notations                                                           plicit child:: and explicit parent::), predicate
Following        notations     are     defined     by                    [4]:   (...="Smiths’") and term (max(...)). See[19] for
 τ              tag graph                                                       more information.
 s, t           tags                                                               Although above XPath expression does not employ
 a              attribute                                                       every possible XPath functionality (e.g. “|”, other oper-
 m              tag or attribute                                                ations and advanced features), it is complicated enough.
 N (t)          “native” attributes of t                                        Here we do not need to formalize every kind of XPath
 Ne (t)         essential “native” attributes of t                              expression.
   The biggest own subclass of XPath expressions de-                   CRe (t)      “Essential” reference declarations of
fined in this paper is a class of Simplified expressions.                           t ∈ τ . Here we mean reference declara-
Such expressions do not use axes (except child::),                                  tions which base on essential attributes
predicates, terms and operations. The most powerful el-                             of referencing tag and it’s essential par-
ement of this class is “//” axis. This axis can be easily                           ents
simulated by substitution of “//” with all possible path               N Re (t)     “Essential native” reference declara-
sections and then uniting search results. This operation                            tions of t ∈ τ . Here we mean reference
is finite because [4] does not use cycles in document                               declarations which base on essential at-
schemas.                                                                            tributes of referencing tag
   Simplified expressions, like other XPath expressions,
are in real XML documents, not schemas. However
Simplified expression can be described using document              3.2.4 Invariants
model terminology. It can be considered as a function
p : τ → 2τ because if we know what tag is an argument              The set of axioms (denoted as (1)–(9) here) given in [4]
instance of, we always know all possible tags which are            is extended by following ones in [10]:
resulting values instances of.                                          Axiom        Formal representation
   We can constrain result set of Simplified expression                 (KR1)         N K(t) = alphas (keydecl(
this way: ♯{p(t)} = 1, ∀p ∈ SP, t ∈ τ .                                              s, p : p(s) = t, a ∈ N (t)), P L(t))
   Members of such expressions class act as SP ∋ sp :                   (KR2)         N R(t) = αs (keyrefdecl(
τ → τ . SP class is called Simple expressions here. We                               s, p : p(s) = t, a ∈ N (t), ∗), P L(t))
here denote the set of simple XPath expressions as SP ,                 (KR3)         ∀fr ∈ R ∃fk ∈ K :
so SP (t) is the set of tags s which can be reached from                             fr = keyrefdecl(t ∈ τ, p ∈ SP,
t using simple XPath expressions. So αt (SP
                                          S (t), υ) ⊂ τ                              m ∈ (A ∪ τ ), fk )
for υ ⊂ τ . We are also to define A as t∈τ N (t). So                    (KR4)         ∀t ∈ τ CR(t) ∈ αs (
notation now includes:                                                               keyrefdecl(∗, ∗, ∗, s), CK(t))
     SP (t) subgraph of τ reached by simple XPath                       (KR5)         N Re (t) = αs (keyrefdecl(
               expressions from t                                                     s, p : p(s) = t, a ∈ Ne (t)), P L(t), ∗)
          A union of all tags’ native attributes
   Above explanations’ goal is to make nature of SP                   They are named as axioms of -native keys, -native ref-
easy to understand. Formally we can define SP as                   erences, -declaration, -coverage, -“essential” references
{s|t ∈ P L(s)} ∪ t.                                                respectively.
   Associations are represented with keys and references
                                                                      Consistency and completeness of invariants (1)–(9)
[10]:
                                                                   are proved in [4] by following theorem:
          keydecl : τ × SP × (A ∪ τ ) → K, so
         K = {fk : {val(m)} → {inst(s ∈ τ )}},                     Theorem 1 Schema axioms are sound and complete,
                   fk is the key function                          so if we have Pe (t) and Ne (t) defined for each tag
                                                                   in schema our schema satisfies axiom set and making
and                                                                changes to this schema is govered by the sets.
       keyrefdecl : τ × SP × (A ∪ τ ) × K → R, so                  Proof
            R = {fr : {inst(t)} → {val(m)}},                          is given in[4]
              fr is the key reference function.                    End of Proof

   Finally, for instances of tags we require
                                                                      Key and reference invariants and formulated in [10]
fk = keydecl(t1 , p1 , m1 ), fr = keyrefdecl(t2 , p2 , m2 , fk )   and proof idea can be expressed as follows:
                       t1 ∈ P L(t2 );
      ∀e2 ∈ {inst(p2 (t2 ))} ∃e1 ∈ {inst(p1 (t1 ))} :              Theorem 2 Schema axioms are sound and complete, so
                   (fk · fr )(e2 ) = e1 .                          if we have Pe (t), Ne (t), N K(t), N Re (t) defined for
                                                                   each tag in schema our schema satisfies axiom set and
3.2.3 Sets                                                         making changes to this schema is governed by the sets.
                                                                   Proof
Following notations for key and reference modeling are
introduced by [10]:                                                    Theorem 1 says that schema without key and refer-
     KC(fk ) Coverage of key with function fk =                    ence declarations satisfies axioms (1)–(9) when Pe (t)
                keydecl(t, p1 , a1 )                               and Ne (t) are defined for each tag. But axioms (1)–(9)
     RC(fr )    Coverage of reference with function                say nothing about keys and references so new model sat-
                fr = keyrefdecl(s, p2 , a2 , fk′ )                 isfies those axioms. The same fact relates to model with-
     CK(t)      Key declarations covering t ∈ τ                    out keys and references and axioms (KR1)–(KR5): no
     CR(t)      Reference declarations covering t ∈ τ              proof needed.
     N K(t)     “Native” key declarations of t ∈ τ                     Hence we are now to prove that keys and references
     N R(t)     “Native” reference declarations of t ∈ τ           of document schema satisfy new axioms (KR1)–(KR5).
 (KR1),(KR2)      Those axioms are satisfied obviously          • attributes are named uniquely for simplicity to dis-
                  due to key and reference semantics              tinguish them by name.
                  presented here.
        (KR3)     Every reference should base on corre-          In case we can use model from fig. 1 directly (our
                  sponding key.                               methodology allowed them), we should write a fact like
        (KR4)     We have noted above that reference          inPe(’Street’,’Region’). Moreover, Schema de-
                  coverage is always contained by key         fined is not complete in area of Region→Street ref-
                  coverage. Thus if we use reference          erence because res_Street references Street via ref-
                  covering our tag, we are also to have       erence based on attribute streetref.
                  key declaration covering our tag.              Attributes can be described as follows:
        (KR5)     Like (KR1),(KR2) is satisfied due to        %--- AttribModel
                  reference semantics. It is always able      inNe(’City’, ’City@name’).
                                                              inNe(’Street’, ’Street@name’).
                  to build N R(t) set using N Re (t) and      inNe(’Region’, ’Region@name’).
                  N (t) sets.                                 inNe(’Square’, ’Square@name’).
End of Proof                                                  inNe(’res_Street’, ’res_Street@streetref’).

4 Declarative Approach                                           Two sections of Prolog facts above define almost ev-
                                                              erything needed to describe model as [4] proposes, but
4.1 Description of Schema                                     references defined in [10] are also to be defined in case
All schema invariants are already defined using first-        when we consider them as evolution object.
order logic. Evolution is done by modifying governing            keydecl and keyrefdecl constructs require a bit more
sets [4] and recalculating other ones.                        explanation before defining them. Reference axiomatic
   Possible way to make the schema practically declar-        in [10] is based on key and reference functions which
ative is to assume XML document schema stored in de-          allow us, when superposed, identify referred tag instance
ductive database.                                             for referring one including all references defined for it.
                                                                 Those functions are “generated” by keydecl and
   Description below is equivalent to above schema in-
                                                              keyrefdecl which are, formally, functionals. Such a de-
variants but can be programmed using Prolog [9] in
                                                              scription with functionals and functions can be left al-
nearly unchanged form. However description is simpli-
fied here to make it more illustrative yet functional.        most unchanged when implementing XML DBMS using
                                                              a kind of functional language for it, but we use logical
   Note that next section describes data schema facts, but
                                                              language to describe data schema here.
does not describe any method to ensure schema is consis-
                                                                 keydecl construct can be used to identify resulting key
tent. It is not a vulnerability of the approach: evolution
                                                              function keydecl(t, p, a) as:
operation set is full, any operation leaves correct schema
in correct state, so any correct schema can be “unrolled”     %--- RefersModel
from empty one.                                               %---- Keys
                                                              inNK(’City’, [’Street’], ’Street@name’).
   Example set of facts below can be interpreted as state
of schema at particular moment. Loading data schema              or, with other context:
state “as-is” can also be useful, like restoring relational
                                                              inNK(’Document’, [’City’, ’Street’],
databases from dumps with constraints disabled tem-           ’Street@name’).
porarily for productivity.
                                                                 in case we want document-unique street name.
4.1.1 Facts of Schema                                            It is possible to reach key-covered tag by different
                                                              ways so we need a list (or other ordered container) to
Facts of deductive databases are used to store informa-       represent path from SP here.
tion that can not be deduced. This is primary information        keyrefdecl functional gets keydecl result as an argu-
about data schema.                                            ment, but it is correct to say that keydecl can be identified
   Illustration below defines subset of entities from data    by it’s argument, so practically keyrefdecl gets 6 param-
model shown on fig. 2. Schema can be described as fol-        eters in deductive database, e.g.:
lows:
                                                              %--- RefersModel
%--- EntityModel                                              %---- Refs
tag(’:root’). tag(’:term’).                                   inNRe(
                                                               ’City’, [’Region’, ’res_Street’],
tag(’Document’). tag(’City’).                                  ’res_Street@streetref’, ’City’,
tag(’Square’). tag(’Street’).                                  [’Street’], ’Street@name’ ).
tag(’Region’). tag(’res_Street’).
inPe(’City’, ’Document’).                                     4.1.2 Goals of Schema
inPe(’Square’, ’City’). inPe(’Square’, ’Region’).
inPe(’Street’, ’City’). inPe(’Region’, ’City’).               Sets which are not yet defined using Prolog are calcula-
inPe(’res_Street’, ’Region’). % - non-essential               ble and can be omitted theoretically, however [11, 4, 10]
inPe(’:term’, ’Square’). inPe(’:term’, ’Street’).             defined them for simplicity, and we will also use those
                                                              definitions below.
   Notes (here and below):                                       In deductive database those sets should not be as-
                                                              serted during schema evolution, they can be considered
  • Prolog syntax is somewhat changed to optimize pa-         as “read-only” and calculated using schema invariants.
    per usage and to group some data;                         Thus they are defined using goals, not facts.
%--- Calculable Sets                                             Evolution operations are labeled «evo.M.N» below.
                                                              evo.1.1. adding new attribute
 % Pred. Graph                    % Immediate Pred.              Rule described in [4]. No additional requirements and
 inPL(S, T) :-                    inP(T, S) :-
                                   inPe(T, S),                activities needed.
  inP(S, T).
 inPL(S, T) :-                     inPe(T, X),
                                   X=\=S,                     addAttribute(Tag,Attr) :-
  inP(S, U),                                                      tag(Tag),
  U=\=T,                           not(inPL(X,S)).
  inPL(U, T).                                                     assert(inNe(Tag,Attr)).
                                  inP(T, S) :-
 inPL(S,S).                        inPe(T, S),
                                   tag(X),                    evo.1.2. deleting attribute
                                   X=\=S, %single Pe             When deleting attribute a of tag t, all key declarations
                                   not(inPe(T,X)).            in N K(t) and references in N R(t) are deleted. See op-
                                                              eration of key removal (3.2) below.
 % Inh. attributes                % Native Attributes            As noted in [4], an ability to add deleted attribute to
 inH(T, A) :-                     inN(T, A) :-                Ne of tags in SP (t) is useful. Implementation of this
  inP(T, S),                       inNe(T, A),
  inI(S, A).                       not(inH(T, A)).            functionality should also redeclare keys in N K(t) and
                                                              references in N R(t). Sample implementation is cum-
                                                              bersome and omitted here.
 % Interface                      % :root & term
 inI(T, A) :-                     inPL(_,’:root’).            evo.2.1. adding aggregation between two tags
  inN(T, A).                      inPL(’:term’,_).               Rule described in [4]. No additional requirements and
 inI(T, A) :-
  inH(T, A).                                                  activities needed.
                                                              addAggregation(Tag,Parent) :-
    Above Prolog goals can be used almost without                 tag(Tag), tag(Parent),
changes except recursive goals for P L sets. In experi-           assert(inPe(Tag,Parent)).
ments they were optimized and allowed to check if any of
arguments are not free term and then decide how to recur.     evo.2.2. removing aggregation between two tags
For example, when we try to solve inPL(X, ’City’)                This operation is relatively complicated in both [4]
it is better to recur down from City tag, however ax-         and here.
ioms and goals above recur upwards only. Calculation             Let us call aggregated tag t.
of P sets was also changed in practice: in [11, 4] in-           All key declarations in N K(t) are modified to satisfy
variants were defined to handle sets of unique elements,      (KR4) axiom. This means that declarations move upward
however Prolog gives a number of non-unique solutions         unless they cover all references again. Note that if t is
for inP and inPL above goals (all variants) for any un-       no more aggregated by other tags it implicitly becomes
bound arguments. Practically, some tricks with Prolog         child of root tag ⊤, and all corresponding key declara-
cut-off were used. All those tricks and optimizations are     tions move to ⊤.
beyond this paper.                                               All reference declarations in N R(t) are deleted if they
    Detailed descriptions of sets above and formulae for      become invalid. All reference declarations in t move up
schema invariants are placed in [4]. Those formulae are       to save references in αs (N R(s), SP (t)).
used “as-is” their calculation require no imperative de-         A sample code (qualitatively simplified) here executes
scription at all.                                             key redeclaration according to (KR4).
    N K(τ ) and N Re (τ ) sets are governing sets for keys
and references, other are calculated. Invariants from [10]    leastCommonEPL(Tag1,Tag2,Pred,Path1,Path2)
                                                              ... .                                      :-
can be described like this:                                   % finds least common predecessor in both
                                                              % PL(Tag1)\{Tag1} and PL(Tag2)\{Tag2} and
last(l, e) :-                                                 % pathes to it. Trivial but cumbersome.
 eq(l, (h)), eq(e, h).
last(l,e) :-                                                  retractDependentRefs(Child, Parent, DKP) :-
 eq(l, (h|t)), last(t,e).                                      findall((RT,RP,RA,KT,KP,KA),
                                                               (inNRe(RT,RP,RA,KT,KP,KA),
inNR(Rtag, Rpath, Rattr, Ktag, Kpath, Kattr):-                  member(Child, KP),member(Parent, [KT | KP]),
 inNRe(Rtag, Rpath, Rattr, Ktag, Kpath, Kattr),                 retract(inNRe(RT,RP,RA,KT,KP,KA)) ), DKP).
 last(Rpath, RRtag),
 inN(RRtag, Rattr).                                           correctKeyRefs(Child, Parent, DK) :-
                                                               findall(inNRe(RT,RP,RA,KT,KP,KA),
                                                               (inNRe(RT,RP,RA,KT,KP,KA),
4.2 Declarative Approach to Evolution                           member(Child, KP),
                                                                member(Parent, [KT | KP]),
Basic operations which were initially defined in [4] for        retract(inNRe(RT,RP,RA,KT,KP,KA)),
the data model without associations are modified to han-        retract(inNR(KT,KP,KA)),
dle references. Evolution operations from [10] are here         leastCommonEPL(Parent,Child,LKE,PPath,CPath),
                                                                addKey(LKE,CPath,KA),
defined considering schema as deductive database de-            addReference(RT,RP,RA,LKE,PPath,KA)
scribed above.                                                 ), DK).
    Note that we use some ISO Prolog goals like
findall(Var, Term, Result) [6]. However Prolog                   Then we can invoke them:
code below can be used as-is, this code is dedicated to il-
                                                              removeAggregation(Child,Parent) :-
lustrate main ideas of possible schema evolution engine        retractDependentRefs(Child, Parent, _),
implementation. Some checks are here omitted and evo-          correctKeyRefs(Child, Parent, _),
lution operations are defined as simply as possible.           retract(inPe(Child,Parent)).
evo.2.3. adding new tag                                          set of possible low-level schema modification operations.
   The rule is described in [4]. No additional require-          Refactoring operations are rather more complicated than
ments and activities needed. Prolog goal below also il-          ones above. Imperative description of refactoring oper-
lustrates some checks of bound variables. Those checks           ations should be even more complicated and unclear so
are useful live applications.                                    defining them in declarative way seems to be a way to
                                                                 simplify them.
addTag(Tag, PeList) :-                                               Here some possible refactoring operations are
 findall(X, (member(X, PeList),
  not(tag(X))), BadList), %only sample check                     listed and sample implementation ideas pro-
 length(BadList, 0), % they all are tags                         vided.      Wider operation list can be found at
 assert(tag(Tag)),                                               http://www.agiledata.org/ in Database Refactor-
 findall(X, (member(X, PeList),                                  ing Catalog section.
  addAggregation(Tag, X)), Successfull),
 addAggregation(’:term’, Tag).                                       Refactoring operations listed below are mostly pro-
                                                                 jected into sequences of schema evolution operations.
evo.2.4. removing a tag                                          Some of refactoring operations require analysis of real
    The rule is described in [4] for basic structure without     data corresponding to given schema. Some of them are
keys and references. Removed tag is called t. Like in            unapplicable yet because XML Schema support in data
operation (1.2) of attribute removal, the reference decla-       model above is not yet powerful enough.
rations in N Re (t) can be modified to cover all children
of t. For all such reference declarations we are to create       4.3.1 Refactoring Operations over Existing Data
corresponding reference attributes in children of t. Sam-              Model
ple Prolog implementation is not placed here because of
its awkwardness.                                                 Refactoring operations defined and described below are
evo.3.1. declaring new key                                       labeled «rft.N» below.
    Key declaration is canceled if it does not satisfy one or    rft.1. consolidating attributes representing single con-
more of (KR1)-(KR5). Sample Prolog code omits checks             cept
because they are cumbersome and can be considered as                 For attributes, which represent single concept, consol-
unnecessary technical details here. It requires almost no        idation can be executed. For example, address is some-
checks in theory, either.                                        times represented by two strings and can be consolidated
                                                                 into one string.
addKey(ContextTag, Path, KeyAttr) :-                             rft.2. consolidating tags
 % no checks                                                         Sometimes two model tags can be consolidated be-
 assert(inNK(ContextTag, Path, KeyAttr)).
                                                                 cause they can be interpreted as the same entity.
evo.3.2. removing a key                                          rft.3. splitting an attribute
    During key declaration fk = keydecl(t, p, a) removal,            Splitting an attribute is useful when the model is
all reference declarations fr = keyrefdecl(t′ , p′ , a′ , fk )   growing from more simple one or from a prototype. Ex-
are also removed. Sample Prolog implementation is not            ample above can be reversed: if we know particular ad-
interesting enough to place here.                                dressing system, we can divide attribute into smaller ones
    With current key-reference model is is the only way          representing more atomic values.
to perform such an operation. However if we support              rft.4. splitting a tag
Simplified XPath expressions instead of Simple ones, we              Splitting a tag can be useful, for example, when we
can identify instances of many tags with one key. In this        would like to detach a child tag from parent (instead of
case we potentially can redeclare the key by adding it to        adding attributes to all instances of existing tag).
N K sets of t tag’s children. This will keep references to       rft.5. adding a key to data schema
fk unchanged. But this case is not considered here.                  Adding new key operation is basic evolution opera-
evo.4.1. declaring new reference                                 tion and it is already defined. Although in real databases
    Reference declaration is canceled if it does not satisfy     adding a key to schema can be also combined with in-
one or more of (KR1)-(KR5). Sample Prolog implemen-              tegrity checking and index creation. Automating of those
tation is not interesting enough to place it here.               tasks can lead to complicated refactoring operation.
evo.4.2. removing reference                                      rft.6. adding new attribute to avoid referencing
    This operation does not require any additional opera-            Instead of dereferencing other entity to obtain the
tions. Like operation (1.2), it can be compensated with          value of single attribute, we can store this attribute in the
declaration of references belonging to N R sets of some          referencing entity. Although productivity can be essen-
child tags. Sample Prolog implementation is not inter-           tially improved, this approach can lead us to inconsistent
esting enough to place here.                                     data.
    In case when we support simplified XPath expres-             rft.7. moving an attribute
sions, we can cover some child tags with one reference               Moving from one tag to another lets us to:
declaration instead of declaring new references, but we            • avoid possible referencing when accessing at-
do not consider this way here as it was noted for (3.2)              tributes of different entities;
operation.                                                         • separate seldomly-used attributes to improve pro-
                                                                     ductivity;
4.3 Declarative Approach to Refactoring
                                                                   • separate read-only attributes if DBMS has no per-
Human-language description of evolution operations is                mission support below entity level or when it helps
cumbersome and complicated. However this is only a                   DBMS to optimize its work.
                                                              4.3.2 Refactoring Operations over Advanced XML
                                                                    Data Structures and Models
                                                              rft.12. renaming anything
                                                                  This refactoring operation is placed in “advanced”
                                                              section because current data model does not handle such
                                                              advanced XML naming facilities as namespaces, etc.
                                                              If these facilities are appropriately supported, renaming
         Figure 3: Refactoring Model fragment                 tag, attribute, key or any other entity in XML data model
                                                              becomes complicated task.
                                                              rft.13. restore standard types for attributes
rft.8. moving a tag within τ                                      Theoretically, this operation is very interesting.
                                                              Refactoring engine should analyze user types defined in
    This operation is very common and affects the whole       data schema, perform some calculations of correspond-
schema. It should be decomposed into sequence of ag-          ing domains and decide if some of types can be replaced
gregation removal and creation.                               by built-in (like integer) types. Moreover, merging sim-
rft.9. safe-deleting entity                                   ilar types (and type fragments of compound types) and
    This refactoring operation is already supported as el-    simplifying type description is very interesting computer
ementary evolution operation.                                 science problem. This interesting operation is beyond
rft.10. replacing existing natural key with surrogate one     current work.
    This operation declares new key and probably re-          rft.14. adding cascade delete for weak entities
moves existing one (existing key becomes simply an at-            This operation creates implicit triggers in underlying
tribute). It can also try to maintain references to initial   DBMS in case one exists to delete weak entities when
key and switch them to new one.                               their parents are destroyed.
                                                              rft.15. discover groups and turn them into entities
    The purpose of this operation is to change key seman-
                                                                  Existing data can be analyzed to discover repeating
tics.
                                                              values of attributes and tags. Some of them can be
rft.11. replacing “1 → ∗” aggregative association with        merged and then replaced with references to newly cre-
“∗ → ∗” one                                                   ated merged groups.
    As an example we can imagine that we are going
to let the streets to pass across several cities (fig. 2).    5 Conclusion
In this case Street becomes a child of Document and
new resolver between City and Street is constructed           Given association evolution model makes a contribu-
as shown at fig. 3.                                           tion to existing XML schema evolution models. Al-
    For this refactoring operation sample implementation      though given model does not provide full support of
is provided. It is quite simple (some supplementary goals     XML Schema or even DTD constructions, it seems to be
are also defined, obvious goals are used without defini-      enough powerful for usage with real XML documents.
tion):                                                            Model can be modified if needed. For example, sup-
                                                              port of simplified XPath expressions can make model
                                                              much more flexible. New association model can be also
useOrCreateKey(Tag, PathWithoutLast,                          integrated into document model described in [5] instead
 KeyTag, KeyAttr, MustReferTag):-                             of [4]. Both solutions increase model flexibility and DTD
 concat(PathWithoutLast, [KeyTag], Path),
 inNK(Tag, Path, KeyAttr),                                    or XML schema support, but resulting model will be still
 inPL(MustReferTag, Tag),                                     not enough powerful to support complete DTD and XML
 !. %one is enough                                            schema. Both solutions lead to complicated model that
useOrCreateKey(Tag, PathWithoutLast,                          is not so illustrative as given one, so they are to be used
 KeyTag, KeyAttr, MustReferTag):-                             in cases when their techniques are useful for particular
 leastCommonEPL(KeyTag,MustReferTag,
  Tag,Path,RPath),                                            applications.
 KeyAttr = Keytag + ’@idFor_’ + MustReferTag,                     It can be sensible to combine some other set calcula-
 addAttribute(KeyTag, KeyAttr),                               tion models together with Prolog goal approvals. Sample
 addKey(Tag, Path, KeyAttr),                                  sample system is term rewriting engine [13] available at
 removeLast(Path, PathWithoutLast).
replaceAggregationWithResolver
                                                              http://www.gradsoft.com.ua/.
 (Child, Parent, ResName) :-                                      Although above paper describes schema evolutions,
 ResName = ’res_’ + Parent + ’_’ + Child,                     one of possible applications of given evolution rules is
 addTag(ResName, [Parent]),                                   automatic transformation of XML documents content.
 removeAggergation(Child, Parent),
 RefAttrName = ResName + ’@’ + Child + ’ref’,                 For example, two versions of database can have different
 addAttribute(ResName, RefAttrName),                          XML representation of their data. On-the-fly transfor-
 useOrCreateKey(Tag, PathWithoutLast,                         mations of XML content can help legacy applications to
  Child, KeyAttr, ResName),
 concat(PathWithoutLast, [KeyTag], Path),                     access new database and can also make legacy database
 onePathFromTo(ResName, Tag, ResPath),!,                      available for new applications. Transformation itself can
 %gives one, cuts off.                                        be composed using operations described in this paper.
 addReference(                                                For real world application this transformation can be exe-
  Tag, ResPath, RefAttrName,
  Tag, Path, KeyAttr                                          cuted by XSLT [20] generated from evolution operations
 ).                                                           list.
   For XML document with complicated structure it is         [15] Andrew A. Simanovsky.             Applying the
easier for human operator to think about more “high-              reconfiguration-design formalism to xml stored in
level” transformations, than most of primitive evolution          a relational database. In SYRCoDIS, pages 75–77,
operations defined in [4, 5, 10] and here. Such opera-            May 2004.
tions, like splitting entity or moving attribute from one
entity to another, etc. should be considered schema refac-   [16] John F. Sowa. Conceptual graphs. ISO/JTC1/SC
toring, because in their case simple local changes often          32/WG2, April 2001. Standard Draft.
induce global changes to whole schema. Some of possi-        [17] L. Stojanovic, A. Maedche, N. Stojanovic, and
ble schema refactoring operations are described here and          R. Studer. Ontology evolution as reconfiguration-
are also objects of future investigations.                        design problem solving. In proceedings of Inter-
                                                                  national Conference on Knowledge Management,
References                                                        2003.
 [1] Relax ng compact syntax, November 2002. Speci-          [18] M. Stumptner and F. Wotawa. Model-based recon-
     fication.                                                    figuration. In proceedings Artificial Intelligence in
 [2] T. Bray, J. Paoli, and C. M. Sperberg-McQueen                Design, Lisbon, Portugal., 1998.
     (Eds). Extensible markup language (XML) 1.0             [19] W3C. XML Path Language (XPath), 1.0 edition,
     (2nd edition). W3C Recommendation, 2000.                     November 1999.
 [3] Surajit Chaudhuri, Raghav Kaushik, and Jeffrey F.       [20] W3C. XSL Transformations (XSLT), 1.0 edition,
     Naughton. On relational support for XML publish-             November 1999. Recommendation.
     ing: Beyond sorting and tagging.
                                                             [21] W3C. XML Schema Part 0: Primer, May 2001.
 [4] S. Coox. Xml database schema evolution axiom-                Recommendation.
     atization. Programming and Computer Software,
     29(3):140–146, 2003.
 [5] Sergey V. Coox and Andrey A. Simanovsky. Regu-
     lar expressions in xml schema evolution. Kharkiv,
     Ukraine, June 2003. ISTA.
 [6] DEIS, Universit‘a di Bologna a Cesena, Italy.
     tuProlog User’s Guide, 1st edition, Sep 2002.
 [7] D. Florescu and D. Kossmann. A performance eval-
     uation of alternative mapping schemes for storing
     XML data in a relational database. Technical Re-
     port 3684, INRIA, 1999.
 [8] D. Florescu and D. Kossmann. Storing and query-
     ing xml data using an rdbms. In Data Engineering
     Bulletin, volume 22, pages 27–34. IEEE, 1999.
 [9] J.Doores, A.R.Reiblein, and S.Vadera. Prolog –
     Programming for Tomorrow. Sigma Press, Wilm-
     slow, UK, 1987.
[10] Dmitry V. Luciv. Semistructured database scheme
     evolution and refactoring. In SYRCoDIS, pages 71–
     74, May 2004.
[11] M. Tamer Oszu Randal J. Peters. Axiomatization
     of dynamic schema evolution in objectbases. 1997.
[12] Erik T. Ray. Learning XML. O’Reilly, 2001. pp.
     143-189.
[13] R. Shevchenko and A. Doroshenko. The system of
     symbolic computing for programming the dynamic
     applications. Interntet publication, 2003. (in rus-
     sian).
[14] A. Simanovsky. Evolution of schema of xml-
     documents stored in a relational database. In pro-
     ceedings of Baltic DB&IS, Riga, Latvia, 2004. As-
     sociation for Computing Machinery. To appear.