=Paper=
{{Paper
|id=Vol-154/paper-4
|storemode=property
|title=Deductive Approach to Semistructured Schema Evolution
|pdfUrl=https://ceur-ws.org/Vol-154/dluciv.pdf
|volume=Vol-154
}}
==Deductive Approach to Semistructured Schema Evolution==
∗
Deductive Approach to Semistructured Schema Evolution
c D. Luciv Ph.D. adviser: B. Novikov
Saint-Petersburg State University
dluciv@lanit-tercom.com
Abstract lational database has powerful interface for schema al-
tering, and relational databases are mostly used to store
Schema evolution for different methodologies large amounts of data. Other kinds of databases often
is an interesting area of research for now and use relational engines behind their semistructured and/or
future. object-oriented interfaces to increase productivity.
Some results of research of semistructured Usually people think about real-world concepts like
database schema evolution are presented here. entities, relations and attributes when modeling the data.
Semistructured database schema model based The same can be said about data schema evolution
on [11, 4] and its evolution techniques are in- controlled by human operator. Existing semistructured
troduced. Extensions to data model are also de- database schema evolution techniques provide “low-
scribed. level” evolution operations over particular schema items
like tags, attributes, associations, etc. However some-
Some refactoring problems are described. times operator thinks about more high-level operations
Declarative approach to schema evolution and like splitting entities, moving attributes from one entity
refactoring is presented. to other, etc - which can be called “refactoring”.
This paper describes some investigations in area
1 Introduction of schema evolution and refactoring and possible ap-
proaches to it.
As one of the most popular universal data interchange
The remaining paper is organized as follows:
formats, XML [2] needs to be structured and constraints
applied to XML data are to be formalized. 1. related work is analyzed;
XML is now often used for short message interchange
and storing small pieces of data for particular appli- 2. basics for semistructured hierarchical data model-
cations. Number of such applications is relatively big ing are defined;
and in this case data structure of XML can be restricted 3. restrictions on XPath expressions are defined and
with only constraints applied by XML itself: the data applied when modeling associations;
should have hierarchical form with tags and attributed 4. schema invariants are described;
and should obey XML syntax.
5. declarative approach to invariants implementation is
For more serious applications XML data schema con-
presented;
straints are practically defined and then checked using
such means as DTD [12], XML Schema [21], Relax- 6. declarative approach to refactoring is introduced,
NG [1], etc. Although for productivity reasons genuine- refactoring operations are described;
XML (unparsed textual) data packages are still relatively 7. some conclusions are presented.
small for such applications, they can have a complicated
structure.
For all methodologies maintaining data structure is in-
2 Related Work
teresting, but difficult. One of research areas is database Schema evolution is an interesting problem for almost all
schema evolution for relational, object-oriented [11], data modeling methodologies.
semistructured hierarchical [4, 5, 10, 14, 15] databases, Object-oriented database schema evolution technique
and a number of related areas like XML-relational map- for TIGUKAT ODBMS was proposed in [11]. This
ping [3, 7, 8, 14]. model was selected as basic one to describe schema evo-
Traditionally, data schema evolution is not a part of lution for hierarchical semistructured databases, particu-
the both object-oriented, relational and semistructured larly, XML in [4].
data definition methodologies, however data structure Both [11] and [4] models are simple and illustrative.
often requires being alterable. This comes from prac- To make them more flexible, they were modified to han-
tice. As an illustration we can say that almost any re- dle regular expressions [5] and reference-based associa-
∗ This research was partially supported by Microsoft Research
tions [10].
grant No. 2004-459A and RFBR grant No. 04-01-00173 We already have results in extending existing
Proceedings of the Spring Young Researcher’s Colloquium
semistructured database model by adding associations to
on Database and Information Systems SYRCoDIS, St.-Petersburg, it. Like [5], we base on [4] model for document tag struc-
Russia, 2005 ture. New rules for association evolution are also added.
Another interesting research area of research is Document
onthology evolution [18, 17]. Some onthology mod-
els, like conceptual graphs [16], can be considered a
conservative extension to first-order logic and used to res_Street
City
express XML data model too. Example Prolog inter- streetref : Street
name : String
preter with conceptual graph support can be obtained
0..*
at http://prologpluscg.sourceforge.net/
as a part of Amine Platform
(http://amine-platform.sourceforge.net/). Street Region Square Clinic
name : String name : String name : String
3 Evolution of Data Schema
3.1 Data Modeling Basics
Let us consider sample data model to be managed (fig. 1). Building res_Building
number : int buildingref : Building
0..*
Document
City
name : String
Figure 2: Real (physical) Data Structure
Region
P (t) set of immediate predecessors of
name : String tag t
Pe (t) set of essential immediate prede-
0..* Clinic cessors of tag t
0..*
Street
Square
P L(t) t predecessors graph
name : String
name : String H(t) t inherited attributes
0..* I(t) t interface - inherited and native
attributes
Building
0..*
inst(s), inst(a) instances of tags s or a
number : int val(inst(s)), values of inst(s), inst(a)
val(inst(a)) respectively
A union of all tags’ native
Figure 1: Data Structure attributes
αx (f, τ ) = {f (x)|x ∈ τ } – the set of values of f
where x passes τ
Given data structure mostly consists of entities and Other definitions, dedicated for association modeling,
aggregations between them. Some associations are not are introduced below.
supported by XML syntax: in fact XML supports only All desired invariants (schema axioms) are defined in
strong aggregation. Other kind of associations that is de- [4, 10].
scribed in [10] and in this paper are associations based on
XML references. Those associations are one-way navi-
gable. 3.2.2 Restrictions To XPath, Association Modeling
The model needs some change to become realistic. Here we constrain XPath expression class to be simple
Modified model is shown on fig. 2. Here we have re- enough to analyze. Some of described notions were al-
solving entities for Street and Building. Two resolving ready defined and used in [10], here they are described
tags for entities those are to be referenced using “? → ∗” broader.
associations are here added to the model. All associa-
Let us consider association to be reference from tag t
tions are now references with single direction (as navi-
towards tag s. Tag s is identified by b attribute, tag t uses
gability shows). By the way, resulting second model is
a attribute to refer tag s. The reference is implemented
poorer than first one: for example, we can’t list Regions
using XML key and keyref constructs.
containing given Street using revised data model.
Example XPath expression looks like
“ max(/Document/City/Street/Building
3.2 Notations and Definitions [parent::Street/@name="Smiths’"]/@number)”.
It contains attributes (@name, @number), axes (im-
3.2.1 Basic Notations plicit child:: and explicit parent::), predicate
Following notations are defined by [4]: (...="Smiths’") and term (max(...)). See[19] for
τ tag graph more information.
s, t tags Although above XPath expression does not employ
a attribute every possible XPath functionality (e.g. “|”, other oper-
m tag or attribute ations and advanced features), it is complicated enough.
N (t) “native” attributes of t Here we do not need to formalize every kind of XPath
Ne (t) essential “native” attributes of t expression.
The biggest own subclass of XPath expressions de- CRe (t) “Essential” reference declarations of
fined in this paper is a class of Simplified expressions. t ∈ τ . Here we mean reference declara-
Such expressions do not use axes (except child::), tions which base on essential attributes
predicates, terms and operations. The most powerful el- of referencing tag and it’s essential par-
ement of this class is “//” axis. This axis can be easily ents
simulated by substitution of “//” with all possible path N Re (t) “Essential native” reference declara-
sections and then uniting search results. This operation tions of t ∈ τ . Here we mean reference
is finite because [4] does not use cycles in document declarations which base on essential at-
schemas. tributes of referencing tag
Simplified expressions, like other XPath expressions,
are in real XML documents, not schemas. However
Simplified expression can be described using document 3.2.4 Invariants
model terminology. It can be considered as a function
p : τ → 2τ because if we know what tag is an argument The set of axioms (denoted as (1)–(9) here) given in [4]
instance of, we always know all possible tags which are is extended by following ones in [10]:
resulting values instances of. Axiom Formal representation
We can constrain result set of Simplified expression (KR1) N K(t) = alphas (keydecl(
this way: ♯{p(t)} = 1, ∀p ∈ SP, t ∈ τ . s, p : p(s) = t, a ∈ N (t)), P L(t))
Members of such expressions class act as SP ∋ sp : (KR2) N R(t) = αs (keyrefdecl(
τ → τ . SP class is called Simple expressions here. We s, p : p(s) = t, a ∈ N (t), ∗), P L(t))
here denote the set of simple XPath expressions as SP , (KR3) ∀fr ∈ R ∃fk ∈ K :
so SP (t) is the set of tags s which can be reached from fr = keyrefdecl(t ∈ τ, p ∈ SP,
t using simple XPath expressions. So αt (SP
S (t), υ) ⊂ τ m ∈ (A ∪ τ ), fk )
for υ ⊂ τ . We are also to define A as t∈τ N (t). So (KR4) ∀t ∈ τ CR(t) ∈ αs (
notation now includes: keyrefdecl(∗, ∗, ∗, s), CK(t))
SP (t) subgraph of τ reached by simple XPath (KR5) N Re (t) = αs (keyrefdecl(
expressions from t s, p : p(s) = t, a ∈ Ne (t)), P L(t), ∗)
A union of all tags’ native attributes
Above explanations’ goal is to make nature of SP They are named as axioms of -native keys, -native ref-
easy to understand. Formally we can define SP as erences, -declaration, -coverage, -“essential” references
{s|t ∈ P L(s)} ∪ t. respectively.
Associations are represented with keys and references
Consistency and completeness of invariants (1)–(9)
[10]:
are proved in [4] by following theorem:
keydecl : τ × SP × (A ∪ τ ) → K, so
K = {fk : {val(m)} → {inst(s ∈ τ )}}, Theorem 1 Schema axioms are sound and complete,
fk is the key function so if we have Pe (t) and Ne (t) defined for each tag
in schema our schema satisfies axiom set and making
and changes to this schema is govered by the sets.
keyrefdecl : τ × SP × (A ∪ τ ) × K → R, so Proof
R = {fr : {inst(t)} → {val(m)}}, is given in[4]
fr is the key reference function. End of Proof
Finally, for instances of tags we require
Key and reference invariants and formulated in [10]
fk = keydecl(t1 , p1 , m1 ), fr = keyrefdecl(t2 , p2 , m2 , fk ) and proof idea can be expressed as follows:
t1 ∈ P L(t2 );
∀e2 ∈ {inst(p2 (t2 ))} ∃e1 ∈ {inst(p1 (t1 ))} : Theorem 2 Schema axioms are sound and complete, so
(fk · fr )(e2 ) = e1 . if we have Pe (t), Ne (t), N K(t), N Re (t) defined for
each tag in schema our schema satisfies axiom set and
3.2.3 Sets making changes to this schema is governed by the sets.
Proof
Following notations for key and reference modeling are
introduced by [10]: Theorem 1 says that schema without key and refer-
KC(fk ) Coverage of key with function fk = ence declarations satisfies axioms (1)–(9) when Pe (t)
keydecl(t, p1 , a1 ) and Ne (t) are defined for each tag. But axioms (1)–(9)
RC(fr ) Coverage of reference with function say nothing about keys and references so new model sat-
fr = keyrefdecl(s, p2 , a2 , fk′ ) isfies those axioms. The same fact relates to model with-
CK(t) Key declarations covering t ∈ τ out keys and references and axioms (KR1)–(KR5): no
CR(t) Reference declarations covering t ∈ τ proof needed.
N K(t) “Native” key declarations of t ∈ τ Hence we are now to prove that keys and references
N R(t) “Native” reference declarations of t ∈ τ of document schema satisfy new axioms (KR1)–(KR5).
(KR1),(KR2) Those axioms are satisfied obviously • attributes are named uniquely for simplicity to dis-
due to key and reference semantics tinguish them by name.
presented here.
(KR3) Every reference should base on corre- In case we can use model from fig. 1 directly (our
sponding key. methodology allowed them), we should write a fact like
(KR4) We have noted above that reference inPe(’Street’,’Region’). Moreover, Schema de-
coverage is always contained by key fined is not complete in area of Region→Street ref-
coverage. Thus if we use reference erence because res_Street references Street via ref-
covering our tag, we are also to have erence based on attribute streetref.
key declaration covering our tag. Attributes can be described as follows:
(KR5) Like (KR1),(KR2) is satisfied due to %--- AttribModel
reference semantics. It is always able inNe(’City’, ’City@name’).
inNe(’Street’, ’Street@name’).
to build N R(t) set using N Re (t) and inNe(’Region’, ’Region@name’).
N (t) sets. inNe(’Square’, ’Square@name’).
End of Proof inNe(’res_Street’, ’res_Street@streetref’).
4 Declarative Approach Two sections of Prolog facts above define almost ev-
erything needed to describe model as [4] proposes, but
4.1 Description of Schema references defined in [10] are also to be defined in case
All schema invariants are already defined using first- when we consider them as evolution object.
order logic. Evolution is done by modifying governing keydecl and keyrefdecl constructs require a bit more
sets [4] and recalculating other ones. explanation before defining them. Reference axiomatic
Possible way to make the schema practically declar- in [10] is based on key and reference functions which
ative is to assume XML document schema stored in de- allow us, when superposed, identify referred tag instance
ductive database. for referring one including all references defined for it.
Those functions are “generated” by keydecl and
Description below is equivalent to above schema in-
keyrefdecl which are, formally, functionals. Such a de-
variants but can be programmed using Prolog [9] in
scription with functionals and functions can be left al-
nearly unchanged form. However description is simpli-
fied here to make it more illustrative yet functional. most unchanged when implementing XML DBMS using
a kind of functional language for it, but we use logical
Note that next section describes data schema facts, but
language to describe data schema here.
does not describe any method to ensure schema is consis-
keydecl construct can be used to identify resulting key
tent. It is not a vulnerability of the approach: evolution
function keydecl(t, p, a) as:
operation set is full, any operation leaves correct schema
in correct state, so any correct schema can be “unrolled” %--- RefersModel
from empty one. %---- Keys
inNK(’City’, [’Street’], ’Street@name’).
Example set of facts below can be interpreted as state
of schema at particular moment. Loading data schema or, with other context:
state “as-is” can also be useful, like restoring relational
inNK(’Document’, [’City’, ’Street’],
databases from dumps with constraints disabled tem- ’Street@name’).
porarily for productivity.
in case we want document-unique street name.
4.1.1 Facts of Schema It is possible to reach key-covered tag by different
ways so we need a list (or other ordered container) to
Facts of deductive databases are used to store informa- represent path from SP here.
tion that can not be deduced. This is primary information keyrefdecl functional gets keydecl result as an argu-
about data schema. ment, but it is correct to say that keydecl can be identified
Illustration below defines subset of entities from data by it’s argument, so practically keyrefdecl gets 6 param-
model shown on fig. 2. Schema can be described as fol- eters in deductive database, e.g.:
lows:
%--- RefersModel
%--- EntityModel %---- Refs
tag(’:root’). tag(’:term’). inNRe(
’City’, [’Region’, ’res_Street’],
tag(’Document’). tag(’City’). ’res_Street@streetref’, ’City’,
tag(’Square’). tag(’Street’). [’Street’], ’Street@name’ ).
tag(’Region’). tag(’res_Street’).
inPe(’City’, ’Document’). 4.1.2 Goals of Schema
inPe(’Square’, ’City’). inPe(’Square’, ’Region’).
inPe(’Street’, ’City’). inPe(’Region’, ’City’). Sets which are not yet defined using Prolog are calcula-
inPe(’res_Street’, ’Region’). % - non-essential ble and can be omitted theoretically, however [11, 4, 10]
inPe(’:term’, ’Square’). inPe(’:term’, ’Street’). defined them for simplicity, and we will also use those
definitions below.
Notes (here and below): In deductive database those sets should not be as-
serted during schema evolution, they can be considered
• Prolog syntax is somewhat changed to optimize pa- as “read-only” and calculated using schema invariants.
per usage and to group some data; Thus they are defined using goals, not facts.
%--- Calculable Sets Evolution operations are labeled «evo.M.N» below.
evo.1.1. adding new attribute
% Pred. Graph % Immediate Pred. Rule described in [4]. No additional requirements and
inPL(S, T) :- inP(T, S) :-
inPe(T, S), activities needed.
inP(S, T).
inPL(S, T) :- inPe(T, X),
X=\=S, addAttribute(Tag,Attr) :-
inP(S, U), tag(Tag),
U=\=T, not(inPL(X,S)).
inPL(U, T). assert(inNe(Tag,Attr)).
inP(T, S) :-
inPL(S,S). inPe(T, S),
tag(X), evo.1.2. deleting attribute
X=\=S, %single Pe When deleting attribute a of tag t, all key declarations
not(inPe(T,X)). in N K(t) and references in N R(t) are deleted. See op-
eration of key removal (3.2) below.
% Inh. attributes % Native Attributes As noted in [4], an ability to add deleted attribute to
inH(T, A) :- inN(T, A) :- Ne of tags in SP (t) is useful. Implementation of this
inP(T, S), inNe(T, A),
inI(S, A). not(inH(T, A)). functionality should also redeclare keys in N K(t) and
references in N R(t). Sample implementation is cum-
bersome and omitted here.
% Interface % :root & term
inI(T, A) :- inPL(_,’:root’). evo.2.1. adding aggregation between two tags
inN(T, A). inPL(’:term’,_). Rule described in [4]. No additional requirements and
inI(T, A) :-
inH(T, A). activities needed.
addAggregation(Tag,Parent) :-
Above Prolog goals can be used almost without tag(Tag), tag(Parent),
changes except recursive goals for P L sets. In experi- assert(inPe(Tag,Parent)).
ments they were optimized and allowed to check if any of
arguments are not free term and then decide how to recur. evo.2.2. removing aggregation between two tags
For example, when we try to solve inPL(X, ’City’) This operation is relatively complicated in both [4]
it is better to recur down from City tag, however ax- and here.
ioms and goals above recur upwards only. Calculation Let us call aggregated tag t.
of P sets was also changed in practice: in [11, 4] in- All key declarations in N K(t) are modified to satisfy
variants were defined to handle sets of unique elements, (KR4) axiom. This means that declarations move upward
however Prolog gives a number of non-unique solutions unless they cover all references again. Note that if t is
for inP and inPL above goals (all variants) for any un- no more aggregated by other tags it implicitly becomes
bound arguments. Practically, some tricks with Prolog child of root tag ⊤, and all corresponding key declara-
cut-off were used. All those tricks and optimizations are tions move to ⊤.
beyond this paper. All reference declarations in N R(t) are deleted if they
Detailed descriptions of sets above and formulae for become invalid. All reference declarations in t move up
schema invariants are placed in [4]. Those formulae are to save references in αs (N R(s), SP (t)).
used “as-is” their calculation require no imperative de- A sample code (qualitatively simplified) here executes
scription at all. key redeclaration according to (KR4).
N K(τ ) and N Re (τ ) sets are governing sets for keys
and references, other are calculated. Invariants from [10] leastCommonEPL(Tag1,Tag2,Pred,Path1,Path2)
... . :-
can be described like this: % finds least common predecessor in both
% PL(Tag1)\{Tag1} and PL(Tag2)\{Tag2} and
last(l, e) :- % pathes to it. Trivial but cumbersome.
eq(l, (h)), eq(e, h).
last(l,e) :- retractDependentRefs(Child, Parent, DKP) :-
eq(l, (h|t)), last(t,e). findall((RT,RP,RA,KT,KP,KA),
(inNRe(RT,RP,RA,KT,KP,KA),
inNR(Rtag, Rpath, Rattr, Ktag, Kpath, Kattr):- member(Child, KP),member(Parent, [KT | KP]),
inNRe(Rtag, Rpath, Rattr, Ktag, Kpath, Kattr), retract(inNRe(RT,RP,RA,KT,KP,KA)) ), DKP).
last(Rpath, RRtag),
inN(RRtag, Rattr). correctKeyRefs(Child, Parent, DK) :-
findall(inNRe(RT,RP,RA,KT,KP,KA),
(inNRe(RT,RP,RA,KT,KP,KA),
4.2 Declarative Approach to Evolution member(Child, KP),
member(Parent, [KT | KP]),
Basic operations which were initially defined in [4] for retract(inNRe(RT,RP,RA,KT,KP,KA)),
the data model without associations are modified to han- retract(inNR(KT,KP,KA)),
dle references. Evolution operations from [10] are here leastCommonEPL(Parent,Child,LKE,PPath,CPath),
addKey(LKE,CPath,KA),
defined considering schema as deductive database de- addReference(RT,RP,RA,LKE,PPath,KA)
scribed above. ), DK).
Note that we use some ISO Prolog goals like
findall(Var, Term, Result) [6]. However Prolog Then we can invoke them:
code below can be used as-is, this code is dedicated to il-
removeAggregation(Child,Parent) :-
lustrate main ideas of possible schema evolution engine retractDependentRefs(Child, Parent, _),
implementation. Some checks are here omitted and evo- correctKeyRefs(Child, Parent, _),
lution operations are defined as simply as possible. retract(inPe(Child,Parent)).
evo.2.3. adding new tag set of possible low-level schema modification operations.
The rule is described in [4]. No additional require- Refactoring operations are rather more complicated than
ments and activities needed. Prolog goal below also il- ones above. Imperative description of refactoring oper-
lustrates some checks of bound variables. Those checks ations should be even more complicated and unclear so
are useful live applications. defining them in declarative way seems to be a way to
simplify them.
addTag(Tag, PeList) :- Here some possible refactoring operations are
findall(X, (member(X, PeList),
not(tag(X))), BadList), %only sample check listed and sample implementation ideas pro-
length(BadList, 0), % they all are tags vided. Wider operation list can be found at
assert(tag(Tag)), http://www.agiledata.org/ in Database Refactor-
findall(X, (member(X, PeList), ing Catalog section.
addAggregation(Tag, X)), Successfull),
addAggregation(’:term’, Tag). Refactoring operations listed below are mostly pro-
jected into sequences of schema evolution operations.
evo.2.4. removing a tag Some of refactoring operations require analysis of real
The rule is described in [4] for basic structure without data corresponding to given schema. Some of them are
keys and references. Removed tag is called t. Like in unapplicable yet because XML Schema support in data
operation (1.2) of attribute removal, the reference decla- model above is not yet powerful enough.
rations in N Re (t) can be modified to cover all children
of t. For all such reference declarations we are to create 4.3.1 Refactoring Operations over Existing Data
corresponding reference attributes in children of t. Sam- Model
ple Prolog implementation is not placed here because of
its awkwardness. Refactoring operations defined and described below are
evo.3.1. declaring new key labeled «rft.N» below.
Key declaration is canceled if it does not satisfy one or rft.1. consolidating attributes representing single con-
more of (KR1)-(KR5). Sample Prolog code omits checks cept
because they are cumbersome and can be considered as For attributes, which represent single concept, consol-
unnecessary technical details here. It requires almost no idation can be executed. For example, address is some-
checks in theory, either. times represented by two strings and can be consolidated
into one string.
addKey(ContextTag, Path, KeyAttr) :- rft.2. consolidating tags
% no checks Sometimes two model tags can be consolidated be-
assert(inNK(ContextTag, Path, KeyAttr)).
cause they can be interpreted as the same entity.
evo.3.2. removing a key rft.3. splitting an attribute
During key declaration fk = keydecl(t, p, a) removal, Splitting an attribute is useful when the model is
all reference declarations fr = keyrefdecl(t′ , p′ , a′ , fk ) growing from more simple one or from a prototype. Ex-
are also removed. Sample Prolog implementation is not ample above can be reversed: if we know particular ad-
interesting enough to place here. dressing system, we can divide attribute into smaller ones
With current key-reference model is is the only way representing more atomic values.
to perform such an operation. However if we support rft.4. splitting a tag
Simplified XPath expressions instead of Simple ones, we Splitting a tag can be useful, for example, when we
can identify instances of many tags with one key. In this would like to detach a child tag from parent (instead of
case we potentially can redeclare the key by adding it to adding attributes to all instances of existing tag).
N K sets of t tag’s children. This will keep references to rft.5. adding a key to data schema
fk unchanged. But this case is not considered here. Adding new key operation is basic evolution opera-
evo.4.1. declaring new reference tion and it is already defined. Although in real databases
Reference declaration is canceled if it does not satisfy adding a key to schema can be also combined with in-
one or more of (KR1)-(KR5). Sample Prolog implemen- tegrity checking and index creation. Automating of those
tation is not interesting enough to place it here. tasks can lead to complicated refactoring operation.
evo.4.2. removing reference rft.6. adding new attribute to avoid referencing
This operation does not require any additional opera- Instead of dereferencing other entity to obtain the
tions. Like operation (1.2), it can be compensated with value of single attribute, we can store this attribute in the
declaration of references belonging to N R sets of some referencing entity. Although productivity can be essen-
child tags. Sample Prolog implementation is not inter- tially improved, this approach can lead us to inconsistent
esting enough to place here. data.
In case when we support simplified XPath expres- rft.7. moving an attribute
sions, we can cover some child tags with one reference Moving from one tag to another lets us to:
declaration instead of declaring new references, but we • avoid possible referencing when accessing at-
do not consider this way here as it was noted for (3.2) tributes of different entities;
operation. • separate seldomly-used attributes to improve pro-
ductivity;
4.3 Declarative Approach to Refactoring
• separate read-only attributes if DBMS has no per-
Human-language description of evolution operations is mission support below entity level or when it helps
cumbersome and complicated. However this is only a DBMS to optimize its work.
4.3.2 Refactoring Operations over Advanced XML
Data Structures and Models
rft.12. renaming anything
This refactoring operation is placed in “advanced”
section because current data model does not handle such
advanced XML naming facilities as namespaces, etc.
If these facilities are appropriately supported, renaming
Figure 3: Refactoring Model fragment tag, attribute, key or any other entity in XML data model
becomes complicated task.
rft.13. restore standard types for attributes
rft.8. moving a tag within τ Theoretically, this operation is very interesting.
Refactoring engine should analyze user types defined in
This operation is very common and affects the whole data schema, perform some calculations of correspond-
schema. It should be decomposed into sequence of ag- ing domains and decide if some of types can be replaced
gregation removal and creation. by built-in (like integer) types. Moreover, merging sim-
rft.9. safe-deleting entity ilar types (and type fragments of compound types) and
This refactoring operation is already supported as el- simplifying type description is very interesting computer
ementary evolution operation. science problem. This interesting operation is beyond
rft.10. replacing existing natural key with surrogate one current work.
This operation declares new key and probably re- rft.14. adding cascade delete for weak entities
moves existing one (existing key becomes simply an at- This operation creates implicit triggers in underlying
tribute). It can also try to maintain references to initial DBMS in case one exists to delete weak entities when
key and switch them to new one. their parents are destroyed.
rft.15. discover groups and turn them into entities
The purpose of this operation is to change key seman-
Existing data can be analyzed to discover repeating
tics.
values of attributes and tags. Some of them can be
rft.11. replacing “1 → ∗” aggregative association with merged and then replaced with references to newly cre-
“∗ → ∗” one ated merged groups.
As an example we can imagine that we are going
to let the streets to pass across several cities (fig. 2). 5 Conclusion
In this case Street becomes a child of Document and
new resolver between City and Street is constructed Given association evolution model makes a contribu-
as shown at fig. 3. tion to existing XML schema evolution models. Al-
For this refactoring operation sample implementation though given model does not provide full support of
is provided. It is quite simple (some supplementary goals XML Schema or even DTD constructions, it seems to be
are also defined, obvious goals are used without defini- enough powerful for usage with real XML documents.
tion): Model can be modified if needed. For example, sup-
port of simplified XPath expressions can make model
much more flexible. New association model can be also
useOrCreateKey(Tag, PathWithoutLast, integrated into document model described in [5] instead
KeyTag, KeyAttr, MustReferTag):- of [4]. Both solutions increase model flexibility and DTD
concat(PathWithoutLast, [KeyTag], Path),
inNK(Tag, Path, KeyAttr), or XML schema support, but resulting model will be still
inPL(MustReferTag, Tag), not enough powerful to support complete DTD and XML
!. %one is enough schema. Both solutions lead to complicated model that
useOrCreateKey(Tag, PathWithoutLast, is not so illustrative as given one, so they are to be used
KeyTag, KeyAttr, MustReferTag):- in cases when their techniques are useful for particular
leastCommonEPL(KeyTag,MustReferTag,
Tag,Path,RPath), applications.
KeyAttr = Keytag + ’@idFor_’ + MustReferTag, It can be sensible to combine some other set calcula-
addAttribute(KeyTag, KeyAttr), tion models together with Prolog goal approvals. Sample
addKey(Tag, Path, KeyAttr), sample system is term rewriting engine [13] available at
removeLast(Path, PathWithoutLast).
replaceAggregationWithResolver
http://www.gradsoft.com.ua/.
(Child, Parent, ResName) :- Although above paper describes schema evolutions,
ResName = ’res_’ + Parent + ’_’ + Child, one of possible applications of given evolution rules is
addTag(ResName, [Parent]), automatic transformation of XML documents content.
removeAggergation(Child, Parent),
RefAttrName = ResName + ’@’ + Child + ’ref’, For example, two versions of database can have different
addAttribute(ResName, RefAttrName), XML representation of their data. On-the-fly transfor-
useOrCreateKey(Tag, PathWithoutLast, mations of XML content can help legacy applications to
Child, KeyAttr, ResName),
concat(PathWithoutLast, [KeyTag], Path), access new database and can also make legacy database
onePathFromTo(ResName, Tag, ResPath),!, available for new applications. Transformation itself can
%gives one, cuts off. be composed using operations described in this paper.
addReference( For real world application this transformation can be exe-
Tag, ResPath, RefAttrName,
Tag, Path, KeyAttr cuted by XSLT [20] generated from evolution operations
). list.
For XML document with complicated structure it is [15] Andrew A. Simanovsky. Applying the
easier for human operator to think about more “high- reconfiguration-design formalism to xml stored in
level” transformations, than most of primitive evolution a relational database. In SYRCoDIS, pages 75–77,
operations defined in [4, 5, 10] and here. Such opera- May 2004.
tions, like splitting entity or moving attribute from one
entity to another, etc. should be considered schema refac- [16] John F. Sowa. Conceptual graphs. ISO/JTC1/SC
toring, because in their case simple local changes often 32/WG2, April 2001. Standard Draft.
induce global changes to whole schema. Some of possi- [17] L. Stojanovic, A. Maedche, N. Stojanovic, and
ble schema refactoring operations are described here and R. Studer. Ontology evolution as reconfiguration-
are also objects of future investigations. design problem solving. In proceedings of Inter-
national Conference on Knowledge Management,
References 2003.
[1] Relax ng compact syntax, November 2002. Speci- [18] M. Stumptner and F. Wotawa. Model-based recon-
fication. figuration. In proceedings Artificial Intelligence in
[2] T. Bray, J. Paoli, and C. M. Sperberg-McQueen Design, Lisbon, Portugal., 1998.
(Eds). Extensible markup language (XML) 1.0 [19] W3C. XML Path Language (XPath), 1.0 edition,
(2nd edition). W3C Recommendation, 2000. November 1999.
[3] Surajit Chaudhuri, Raghav Kaushik, and Jeffrey F. [20] W3C. XSL Transformations (XSLT), 1.0 edition,
Naughton. On relational support for XML publish- November 1999. Recommendation.
ing: Beyond sorting and tagging.
[21] W3C. XML Schema Part 0: Primer, May 2001.
[4] S. Coox. Xml database schema evolution axiom- Recommendation.
atization. Programming and Computer Software,
29(3):140–146, 2003.
[5] Sergey V. Coox and Andrey A. Simanovsky. Regu-
lar expressions in xml schema evolution. Kharkiv,
Ukraine, June 2003. ISTA.
[6] DEIS, Universit‘a di Bologna a Cesena, Italy.
tuProlog User’s Guide, 1st edition, Sep 2002.
[7] D. Florescu and D. Kossmann. A performance eval-
uation of alternative mapping schemes for storing
XML data in a relational database. Technical Re-
port 3684, INRIA, 1999.
[8] D. Florescu and D. Kossmann. Storing and query-
ing xml data using an rdbms. In Data Engineering
Bulletin, volume 22, pages 27–34. IEEE, 1999.
[9] J.Doores, A.R.Reiblein, and S.Vadera. Prolog –
Programming for Tomorrow. Sigma Press, Wilm-
slow, UK, 1987.
[10] Dmitry V. Luciv. Semistructured database scheme
evolution and refactoring. In SYRCoDIS, pages 71–
74, May 2004.
[11] M. Tamer Oszu Randal J. Peters. Axiomatization
of dynamic schema evolution in objectbases. 1997.
[12] Erik T. Ray. Learning XML. O’Reilly, 2001. pp.
143-189.
[13] R. Shevchenko and A. Doroshenko. The system of
symbolic computing for programming the dynamic
applications. Interntet publication, 2003. (in rus-
sian).
[14] A. Simanovsky. Evolution of schema of xml-
documents stored in a relational database. In pro-
ceedings of Baltic DB&IS, Riga, Latvia, 2004. As-
sociation for Computing Machinery. To appear.