∗ Deductive Approach to Semistructured Schema Evolution c D. Luciv Ph.D. adviser: B. Novikov Saint-Petersburg State University dluciv@lanit-tercom.com Abstract lational database has powerful interface for schema al- tering, and relational databases are mostly used to store Schema evolution for different methodologies large amounts of data. Other kinds of databases often is an interesting area of research for now and use relational engines behind their semistructured and/or future. object-oriented interfaces to increase productivity. Some results of research of semistructured Usually people think about real-world concepts like database schema evolution are presented here. entities, relations and attributes when modeling the data. Semistructured database schema model based The same can be said about data schema evolution on [11, 4] and its evolution techniques are in- controlled by human operator. Existing semistructured troduced. Extensions to data model are also de- database schema evolution techniques provide “low- scribed. level” evolution operations over particular schema items like tags, attributes, associations, etc. However some- Some refactoring problems are described. times operator thinks about more high-level operations Declarative approach to schema evolution and like splitting entities, moving attributes from one entity refactoring is presented. to other, etc - which can be called “refactoring”. This paper describes some investigations in area 1 Introduction of schema evolution and refactoring and possible ap- proaches to it. As one of the most popular universal data interchange The remaining paper is organized as follows: formats, XML [2] needs to be structured and constraints applied to XML data are to be formalized. 1. related work is analyzed; XML is now often used for short message interchange and storing small pieces of data for particular appli- 2. basics for semistructured hierarchical data model- cations. Number of such applications is relatively big ing are defined; and in this case data structure of XML can be restricted 3. restrictions on XPath expressions are defined and with only constraints applied by XML itself: the data applied when modeling associations; should have hierarchical form with tags and attributed 4. schema invariants are described; and should obey XML syntax. 5. declarative approach to invariants implementation is For more serious applications XML data schema con- presented; straints are practically defined and then checked using such means as DTD [12], XML Schema [21], Relax- 6. declarative approach to refactoring is introduced, NG [1], etc. Although for productivity reasons genuine- refactoring operations are described; XML (unparsed textual) data packages are still relatively 7. some conclusions are presented. small for such applications, they can have a complicated structure. For all methodologies maintaining data structure is in- 2 Related Work teresting, but difficult. One of research areas is database Schema evolution is an interesting problem for almost all schema evolution for relational, object-oriented [11], data modeling methodologies. semistructured hierarchical [4, 5, 10, 14, 15] databases, Object-oriented database schema evolution technique and a number of related areas like XML-relational map- for TIGUKAT ODBMS was proposed in [11]. This ping [3, 7, 8, 14]. model was selected as basic one to describe schema evo- Traditionally, data schema evolution is not a part of lution for hierarchical semistructured databases, particu- the both object-oriented, relational and semistructured larly, XML in [4]. data definition methodologies, however data structure Both [11] and [4] models are simple and illustrative. often requires being alterable. This comes from prac- To make them more flexible, they were modified to han- tice. As an illustration we can say that almost any re- dle regular expressions [5] and reference-based associa- ∗ This research was partially supported by Microsoft Research tions [10]. grant No. 2004-459A and RFBR grant No. 04-01-00173 We already have results in extending existing Proceedings of the Spring Young Researcher’s Colloquium semistructured database model by adding associations to on Database and Information Systems SYRCoDIS, St.-Petersburg, it. Like [5], we base on [4] model for document tag struc- Russia, 2005 ture. New rules for association evolution are also added. Another interesting research area of research is Document onthology evolution [18, 17]. Some onthology mod- els, like conceptual graphs [16], can be considered a conservative extension to first-order logic and used to res_Street City express XML data model too. Example Prolog inter- streetref : Street name : String preter with conceptual graph support can be obtained 0..* at http://prologpluscg.sourceforge.net/ as a part of Amine Platform (http://amine-platform.sourceforge.net/). Street Region Square Clinic name : String name : String name : String 3 Evolution of Data Schema 3.1 Data Modeling Basics Let us consider sample data model to be managed (fig. 1). Building res_Building number : int buildingref : Building 0..* Document City name : String Figure 2: Real (physical) Data Structure Region P (t) set of immediate predecessors of name : String tag t Pe (t) set of essential immediate prede- 0..* Clinic cessors of tag t 0..* Street Square P L(t) t predecessors graph name : String name : String H(t) t inherited attributes 0..* I(t) t interface - inherited and native attributes Building 0..* inst(s), inst(a) instances of tags s or a number : int val(inst(s)), values of inst(s), inst(a) val(inst(a)) respectively A union of all tags’ native Figure 1: Data Structure attributes αx (f, τ ) = {f (x)|x ∈ τ } – the set of values of f where x passes τ Given data structure mostly consists of entities and Other definitions, dedicated for association modeling, aggregations between them. Some associations are not are introduced below. supported by XML syntax: in fact XML supports only All desired invariants (schema axioms) are defined in strong aggregation. Other kind of associations that is de- [4, 10]. scribed in [10] and in this paper are associations based on XML references. Those associations are one-way navi- gable. 3.2.2 Restrictions To XPath, Association Modeling The model needs some change to become realistic. Here we constrain XPath expression class to be simple Modified model is shown on fig. 2. Here we have re- enough to analyze. Some of described notions were al- solving entities for Street and Building. Two resolving ready defined and used in [10], here they are described tags for entities those are to be referenced using “? → ∗” broader. associations are here added to the model. All associa- Let us consider association to be reference from tag t tions are now references with single direction (as navi- towards tag s. Tag s is identified by b attribute, tag t uses gability shows). By the way, resulting second model is a attribute to refer tag s. The reference is implemented poorer than first one: for example, we can’t list Regions using XML key and keyref constructs. containing given Street using revised data model. Example XPath expression looks like “ max(/Document/City/Street/Building 3.2 Notations and Definitions [parent::Street/@name="Smiths’"]/@number)”. It contains attributes (@name, @number), axes (im- 3.2.1 Basic Notations plicit child:: and explicit parent::), predicate Following notations are defined by [4]: (...="Smiths’") and term (max(...)). See[19] for τ tag graph more information. s, t tags Although above XPath expression does not employ a attribute every possible XPath functionality (e.g. “|”, other oper- m tag or attribute ations and advanced features), it is complicated enough. N (t) “native” attributes of t Here we do not need to formalize every kind of XPath Ne (t) essential “native” attributes of t expression. The biggest own subclass of XPath expressions de- CRe (t) “Essential” reference declarations of fined in this paper is a class of Simplified expressions. t ∈ τ . Here we mean reference declara- Such expressions do not use axes (except child::), tions which base on essential attributes predicates, terms and operations. The most powerful el- of referencing tag and it’s essential par- ement of this class is “//” axis. This axis can be easily ents simulated by substitution of “//” with all possible path N Re (t) “Essential native” reference declara- sections and then uniting search results. This operation tions of t ∈ τ . Here we mean reference is finite because [4] does not use cycles in document declarations which base on essential at- schemas. tributes of referencing tag Simplified expressions, like other XPath expressions, are in real XML documents, not schemas. However Simplified expression can be described using document 3.2.4 Invariants model terminology. It can be considered as a function p : τ → 2τ because if we know what tag is an argument The set of axioms (denoted as (1)–(9) here) given in [4] instance of, we always know all possible tags which are is extended by following ones in [10]: resulting values instances of. Axiom Formal representation We can constrain result set of Simplified expression (KR1) N K(t) = alphas (keydecl( this way: ♯{p(t)} = 1, ∀p ∈ SP, t ∈ τ . s, p : p(s) = t, a ∈ N (t)), P L(t)) Members of such expressions class act as SP ∋ sp : (KR2) N R(t) = αs (keyrefdecl( τ → τ . SP class is called Simple expressions here. We s, p : p(s) = t, a ∈ N (t), ∗), P L(t)) here denote the set of simple XPath expressions as SP , (KR3) ∀fr ∈ R ∃fk ∈ K : so SP (t) is the set of tags s which can be reached from fr = keyrefdecl(t ∈ τ, p ∈ SP, t using simple XPath expressions. So αt (SP S (t), υ) ⊂ τ m ∈ (A ∪ τ ), fk ) for υ ⊂ τ . We are also to define A as t∈τ N (t). So (KR4) ∀t ∈ τ CR(t) ∈ αs ( notation now includes: keyrefdecl(∗, ∗, ∗, s), CK(t)) SP (t) subgraph of τ reached by simple XPath (KR5) N Re (t) = αs (keyrefdecl( expressions from t s, p : p(s) = t, a ∈ Ne (t)), P L(t), ∗) A union of all tags’ native attributes Above explanations’ goal is to make nature of SP They are named as axioms of -native keys, -native ref- easy to understand. Formally we can define SP as erences, -declaration, -coverage, -“essential” references {s|t ∈ P L(s)} ∪ t. respectively. Associations are represented with keys and references Consistency and completeness of invariants (1)–(9) [10]: are proved in [4] by following theorem: keydecl : τ × SP × (A ∪ τ ) → K, so K = {fk : {val(m)} → {inst(s ∈ τ )}}, Theorem 1 Schema axioms are sound and complete, fk is the key function so if we have Pe (t) and Ne (t) defined for each tag in schema our schema satisfies axiom set and making and changes to this schema is govered by the sets. keyrefdecl : τ × SP × (A ∪ τ ) × K → R, so Proof R = {fr : {inst(t)} → {val(m)}}, is given in[4] fr is the key reference function. End of Proof Finally, for instances of tags we require Key and reference invariants and formulated in [10] fk = keydecl(t1 , p1 , m1 ), fr = keyrefdecl(t2 , p2 , m2 , fk ) and proof idea can be expressed as follows: t1 ∈ P L(t2 ); ∀e2 ∈ {inst(p2 (t2 ))} ∃e1 ∈ {inst(p1 (t1 ))} : Theorem 2 Schema axioms are sound and complete, so (fk · fr )(e2 ) = e1 . if we have Pe (t), Ne (t), N K(t), N Re (t) defined for each tag in schema our schema satisfies axiom set and 3.2.3 Sets making changes to this schema is governed by the sets. Proof Following notations for key and reference modeling are introduced by [10]: Theorem 1 says that schema without key and refer- KC(fk ) Coverage of key with function fk = ence declarations satisfies axioms (1)–(9) when Pe (t) keydecl(t, p1 , a1 ) and Ne (t) are defined for each tag. But axioms (1)–(9) RC(fr ) Coverage of reference with function say nothing about keys and references so new model sat- fr = keyrefdecl(s, p2 , a2 , fk′ ) isfies those axioms. The same fact relates to model with- CK(t) Key declarations covering t ∈ τ out keys and references and axioms (KR1)–(KR5): no CR(t) Reference declarations covering t ∈ τ proof needed. N K(t) “Native” key declarations of t ∈ τ Hence we are now to prove that keys and references N R(t) “Native” reference declarations of t ∈ τ of document schema satisfy new axioms (KR1)–(KR5). (KR1),(KR2) Those axioms are satisfied obviously • attributes are named uniquely for simplicity to dis- due to key and reference semantics tinguish them by name. presented here. (KR3) Every reference should base on corre- In case we can use model from fig. 1 directly (our sponding key. methodology allowed them), we should write a fact like (KR4) We have noted above that reference inPe(’Street’,’Region’). Moreover, Schema de- coverage is always contained by key fined is not complete in area of Region→Street ref- coverage. Thus if we use reference erence because res_Street references Street via ref- covering our tag, we are also to have erence based on attribute streetref. key declaration covering our tag. Attributes can be described as follows: (KR5) Like (KR1),(KR2) is satisfied due to %--- AttribModel reference semantics. It is always able inNe(’City’, ’City@name’). inNe(’Street’, ’Street@name’). to build N R(t) set using N Re (t) and inNe(’Region’, ’Region@name’). N (t) sets. inNe(’Square’, ’Square@name’). End of Proof inNe(’res_Street’, ’res_Street@streetref’). 4 Declarative Approach Two sections of Prolog facts above define almost ev- erything needed to describe model as [4] proposes, but 4.1 Description of Schema references defined in [10] are also to be defined in case All schema invariants are already defined using first- when we consider them as evolution object. order logic. Evolution is done by modifying governing keydecl and keyrefdecl constructs require a bit more sets [4] and recalculating other ones. explanation before defining them. Reference axiomatic Possible way to make the schema practically declar- in [10] is based on key and reference functions which ative is to assume XML document schema stored in de- allow us, when superposed, identify referred tag instance ductive database. for referring one including all references defined for it. Those functions are “generated” by keydecl and Description below is equivalent to above schema in- keyrefdecl which are, formally, functionals. Such a de- variants but can be programmed using Prolog [9] in scription with functionals and functions can be left al- nearly unchanged form. However description is simpli- fied here to make it more illustrative yet functional. most unchanged when implementing XML DBMS using a kind of functional language for it, but we use logical Note that next section describes data schema facts, but language to describe data schema here. does not describe any method to ensure schema is consis- keydecl construct can be used to identify resulting key tent. It is not a vulnerability of the approach: evolution function keydecl(t, p, a) as: operation set is full, any operation leaves correct schema in correct state, so any correct schema can be “unrolled” %--- RefersModel from empty one. %---- Keys inNK(’City’, [’Street’], ’Street@name’). Example set of facts below can be interpreted as state of schema at particular moment. Loading data schema or, with other context: state “as-is” can also be useful, like restoring relational inNK(’Document’, [’City’, ’Street’], databases from dumps with constraints disabled tem- ’Street@name’). porarily for productivity. in case we want document-unique street name. 4.1.1 Facts of Schema It is possible to reach key-covered tag by different ways so we need a list (or other ordered container) to Facts of deductive databases are used to store informa- represent path from SP here. tion that can not be deduced. This is primary information keyrefdecl functional gets keydecl result as an argu- about data schema. ment, but it is correct to say that keydecl can be identified Illustration below defines subset of entities from data by it’s argument, so practically keyrefdecl gets 6 param- model shown on fig. 2. Schema can be described as fol- eters in deductive database, e.g.: lows: %--- RefersModel %--- EntityModel %---- Refs tag(’:root’). tag(’:term’). inNRe( ’City’, [’Region’, ’res_Street’], tag(’Document’). tag(’City’). ’res_Street@streetref’, ’City’, tag(’Square’). tag(’Street’). [’Street’], ’Street@name’ ). tag(’Region’). tag(’res_Street’). inPe(’City’, ’Document’). 4.1.2 Goals of Schema inPe(’Square’, ’City’). inPe(’Square’, ’Region’). inPe(’Street’, ’City’). inPe(’Region’, ’City’). Sets which are not yet defined using Prolog are calcula- inPe(’res_Street’, ’Region’). % - non-essential ble and can be omitted theoretically, however [11, 4, 10] inPe(’:term’, ’Square’). inPe(’:term’, ’Street’). defined them for simplicity, and we will also use those definitions below. Notes (here and below): In deductive database those sets should not be as- serted during schema evolution, they can be considered • Prolog syntax is somewhat changed to optimize pa- as “read-only” and calculated using schema invariants. per usage and to group some data; Thus they are defined using goals, not facts. %--- Calculable Sets Evolution operations are labeled «evo.M.N» below. evo.1.1. adding new attribute % Pred. Graph % Immediate Pred. Rule described in [4]. No additional requirements and inPL(S, T) :- inP(T, S) :- inPe(T, S), activities needed. inP(S, T). inPL(S, T) :- inPe(T, X), X=\=S, addAttribute(Tag,Attr) :- inP(S, U), tag(Tag), U=\=T, not(inPL(X,S)). inPL(U, T). assert(inNe(Tag,Attr)). inP(T, S) :- inPL(S,S). inPe(T, S), tag(X), evo.1.2. deleting attribute X=\=S, %single Pe When deleting attribute a of tag t, all key declarations not(inPe(T,X)). in N K(t) and references in N R(t) are deleted. See op- eration of key removal (3.2) below. % Inh. attributes % Native Attributes As noted in [4], an ability to add deleted attribute to inH(T, A) :- inN(T, A) :- Ne of tags in SP (t) is useful. Implementation of this inP(T, S), inNe(T, A), inI(S, A). not(inH(T, A)). functionality should also redeclare keys in N K(t) and references in N R(t). Sample implementation is cum- bersome and omitted here. % Interface % :root & term inI(T, A) :- inPL(_,’:root’). evo.2.1. adding aggregation between two tags inN(T, A). inPL(’:term’,_). Rule described in [4]. No additional requirements and inI(T, A) :- inH(T, A). activities needed. addAggregation(Tag,Parent) :- Above Prolog goals can be used almost without tag(Tag), tag(Parent), changes except recursive goals for P L sets. In experi- assert(inPe(Tag,Parent)). ments they were optimized and allowed to check if any of arguments are not free term and then decide how to recur. evo.2.2. removing aggregation between two tags For example, when we try to solve inPL(X, ’City’) This operation is relatively complicated in both [4] it is better to recur down from City tag, however ax- and here. ioms and goals above recur upwards only. Calculation Let us call aggregated tag t. of P sets was also changed in practice: in [11, 4] in- All key declarations in N K(t) are modified to satisfy variants were defined to handle sets of unique elements, (KR4) axiom. This means that declarations move upward however Prolog gives a number of non-unique solutions unless they cover all references again. Note that if t is for inP and inPL above goals (all variants) for any un- no more aggregated by other tags it implicitly becomes bound arguments. Practically, some tricks with Prolog child of root tag ⊤, and all corresponding key declara- cut-off were used. All those tricks and optimizations are tions move to ⊤. beyond this paper. All reference declarations in N R(t) are deleted if they Detailed descriptions of sets above and formulae for become invalid. All reference declarations in t move up schema invariants are placed in [4]. Those formulae are to save references in αs (N R(s), SP (t)). used “as-is” their calculation require no imperative de- A sample code (qualitatively simplified) here executes scription at all. key redeclaration according to (KR4). N K(τ ) and N Re (τ ) sets are governing sets for keys and references, other are calculated. Invariants from [10] leastCommonEPL(Tag1,Tag2,Pred,Path1,Path2) ... . :- can be described like this: % finds least common predecessor in both % PL(Tag1)\{Tag1} and PL(Tag2)\{Tag2} and last(l, e) :- % pathes to it. Trivial but cumbersome. eq(l, (h)), eq(e, h). last(l,e) :- retractDependentRefs(Child, Parent, DKP) :- eq(l, (h|t)), last(t,e). findall((RT,RP,RA,KT,KP,KA), (inNRe(RT,RP,RA,KT,KP,KA), inNR(Rtag, Rpath, Rattr, Ktag, Kpath, Kattr):- member(Child, KP),member(Parent, [KT | KP]), inNRe(Rtag, Rpath, Rattr, Ktag, Kpath, Kattr), retract(inNRe(RT,RP,RA,KT,KP,KA)) ), DKP). last(Rpath, RRtag), inN(RRtag, Rattr). correctKeyRefs(Child, Parent, DK) :- findall(inNRe(RT,RP,RA,KT,KP,KA), (inNRe(RT,RP,RA,KT,KP,KA), 4.2 Declarative Approach to Evolution member(Child, KP), member(Parent, [KT | KP]), Basic operations which were initially defined in [4] for retract(inNRe(RT,RP,RA,KT,KP,KA)), the data model without associations are modified to han- retract(inNR(KT,KP,KA)), dle references. Evolution operations from [10] are here leastCommonEPL(Parent,Child,LKE,PPath,CPath), addKey(LKE,CPath,KA), defined considering schema as deductive database de- addReference(RT,RP,RA,LKE,PPath,KA) scribed above. ), DK). Note that we use some ISO Prolog goals like findall(Var, Term, Result) [6]. However Prolog Then we can invoke them: code below can be used as-is, this code is dedicated to il- removeAggregation(Child,Parent) :- lustrate main ideas of possible schema evolution engine retractDependentRefs(Child, Parent, _), implementation. Some checks are here omitted and evo- correctKeyRefs(Child, Parent, _), lution operations are defined as simply as possible. retract(inPe(Child,Parent)). evo.2.3. adding new tag set of possible low-level schema modification operations. The rule is described in [4]. No additional require- Refactoring operations are rather more complicated than ments and activities needed. Prolog goal below also il- ones above. Imperative description of refactoring oper- lustrates some checks of bound variables. Those checks ations should be even more complicated and unclear so are useful live applications. defining them in declarative way seems to be a way to simplify them. addTag(Tag, PeList) :- Here some possible refactoring operations are findall(X, (member(X, PeList), not(tag(X))), BadList), %only sample check listed and sample implementation ideas pro- length(BadList, 0), % they all are tags vided. Wider operation list can be found at assert(tag(Tag)), http://www.agiledata.org/ in Database Refactor- findall(X, (member(X, PeList), ing Catalog section. addAggregation(Tag, X)), Successfull), addAggregation(’:term’, Tag). Refactoring operations listed below are mostly pro- jected into sequences of schema evolution operations. evo.2.4. removing a tag Some of refactoring operations require analysis of real The rule is described in [4] for basic structure without data corresponding to given schema. Some of them are keys and references. Removed tag is called t. Like in unapplicable yet because XML Schema support in data operation (1.2) of attribute removal, the reference decla- model above is not yet powerful enough. rations in N Re (t) can be modified to cover all children of t. For all such reference declarations we are to create 4.3.1 Refactoring Operations over Existing Data corresponding reference attributes in children of t. Sam- Model ple Prolog implementation is not placed here because of its awkwardness. Refactoring operations defined and described below are evo.3.1. declaring new key labeled «rft.N» below. Key declaration is canceled if it does not satisfy one or rft.1. consolidating attributes representing single con- more of (KR1)-(KR5). Sample Prolog code omits checks cept because they are cumbersome and can be considered as For attributes, which represent single concept, consol- unnecessary technical details here. It requires almost no idation can be executed. For example, address is some- checks in theory, either. times represented by two strings and can be consolidated into one string. addKey(ContextTag, Path, KeyAttr) :- rft.2. consolidating tags % no checks Sometimes two model tags can be consolidated be- assert(inNK(ContextTag, Path, KeyAttr)). cause they can be interpreted as the same entity. evo.3.2. removing a key rft.3. splitting an attribute During key declaration fk = keydecl(t, p, a) removal, Splitting an attribute is useful when the model is all reference declarations fr = keyrefdecl(t′ , p′ , a′ , fk ) growing from more simple one or from a prototype. Ex- are also removed. Sample Prolog implementation is not ample above can be reversed: if we know particular ad- interesting enough to place here. dressing system, we can divide attribute into smaller ones With current key-reference model is is the only way representing more atomic values. to perform such an operation. However if we support rft.4. splitting a tag Simplified XPath expressions instead of Simple ones, we Splitting a tag can be useful, for example, when we can identify instances of many tags with one key. In this would like to detach a child tag from parent (instead of case we potentially can redeclare the key by adding it to adding attributes to all instances of existing tag). N K sets of t tag’s children. This will keep references to rft.5. adding a key to data schema fk unchanged. But this case is not considered here. Adding new key operation is basic evolution opera- evo.4.1. declaring new reference tion and it is already defined. Although in real databases Reference declaration is canceled if it does not satisfy adding a key to schema can be also combined with in- one or more of (KR1)-(KR5). Sample Prolog implemen- tegrity checking and index creation. Automating of those tation is not interesting enough to place it here. tasks can lead to complicated refactoring operation. evo.4.2. removing reference rft.6. adding new attribute to avoid referencing This operation does not require any additional opera- Instead of dereferencing other entity to obtain the tions. Like operation (1.2), it can be compensated with value of single attribute, we can store this attribute in the declaration of references belonging to N R sets of some referencing entity. Although productivity can be essen- child tags. Sample Prolog implementation is not inter- tially improved, this approach can lead us to inconsistent esting enough to place here. data. In case when we support simplified XPath expres- rft.7. moving an attribute sions, we can cover some child tags with one reference Moving from one tag to another lets us to: declaration instead of declaring new references, but we • avoid possible referencing when accessing at- do not consider this way here as it was noted for (3.2) tributes of different entities; operation. • separate seldomly-used attributes to improve pro- ductivity; 4.3 Declarative Approach to Refactoring • separate read-only attributes if DBMS has no per- Human-language description of evolution operations is mission support below entity level or when it helps cumbersome and complicated. However this is only a DBMS to optimize its work. 4.3.2 Refactoring Operations over Advanced XML Data Structures and Models rft.12. renaming anything This refactoring operation is placed in “advanced” section because current data model does not handle such advanced XML naming facilities as namespaces, etc. If these facilities are appropriately supported, renaming Figure 3: Refactoring Model fragment tag, attribute, key or any other entity in XML data model becomes complicated task. rft.13. restore standard types for attributes rft.8. moving a tag within τ Theoretically, this operation is very interesting. Refactoring engine should analyze user types defined in This operation is very common and affects the whole data schema, perform some calculations of correspond- schema. It should be decomposed into sequence of ag- ing domains and decide if some of types can be replaced gregation removal and creation. by built-in (like integer) types. Moreover, merging sim- rft.9. safe-deleting entity ilar types (and type fragments of compound types) and This refactoring operation is already supported as el- simplifying type description is very interesting computer ementary evolution operation. science problem. This interesting operation is beyond rft.10. replacing existing natural key with surrogate one current work. This operation declares new key and probably re- rft.14. adding cascade delete for weak entities moves existing one (existing key becomes simply an at- This operation creates implicit triggers in underlying tribute). It can also try to maintain references to initial DBMS in case one exists to delete weak entities when key and switch them to new one. their parents are destroyed. rft.15. discover groups and turn them into entities The purpose of this operation is to change key seman- Existing data can be analyzed to discover repeating tics. values of attributes and tags. Some of them can be rft.11. replacing “1 → ∗” aggregative association with merged and then replaced with references to newly cre- “∗ → ∗” one ated merged groups. As an example we can imagine that we are going to let the streets to pass across several cities (fig. 2). 5 Conclusion In this case Street becomes a child of Document and new resolver between City and Street is constructed Given association evolution model makes a contribu- as shown at fig. 3. tion to existing XML schema evolution models. Al- For this refactoring operation sample implementation though given model does not provide full support of is provided. It is quite simple (some supplementary goals XML Schema or even DTD constructions, it seems to be are also defined, obvious goals are used without defini- enough powerful for usage with real XML documents. tion): Model can be modified if needed. For example, sup- port of simplified XPath expressions can make model much more flexible. New association model can be also useOrCreateKey(Tag, PathWithoutLast, integrated into document model described in [5] instead KeyTag, KeyAttr, MustReferTag):- of [4]. Both solutions increase model flexibility and DTD concat(PathWithoutLast, [KeyTag], Path), inNK(Tag, Path, KeyAttr), or XML schema support, but resulting model will be still inPL(MustReferTag, Tag), not enough powerful to support complete DTD and XML !. %one is enough schema. Both solutions lead to complicated model that useOrCreateKey(Tag, PathWithoutLast, is not so illustrative as given one, so they are to be used KeyTag, KeyAttr, MustReferTag):- in cases when their techniques are useful for particular leastCommonEPL(KeyTag,MustReferTag, Tag,Path,RPath), applications. KeyAttr = Keytag + ’@idFor_’ + MustReferTag, It can be sensible to combine some other set calcula- addAttribute(KeyTag, KeyAttr), tion models together with Prolog goal approvals. Sample addKey(Tag, Path, KeyAttr), sample system is term rewriting engine [13] available at removeLast(Path, PathWithoutLast). replaceAggregationWithResolver http://www.gradsoft.com.ua/. (Child, Parent, ResName) :- Although above paper describes schema evolutions, ResName = ’res_’ + Parent + ’_’ + Child, one of possible applications of given evolution rules is addTag(ResName, [Parent]), automatic transformation of XML documents content. removeAggergation(Child, Parent), RefAttrName = ResName + ’@’ + Child + ’ref’, For example, two versions of database can have different addAttribute(ResName, RefAttrName), XML representation of their data. On-the-fly transfor- useOrCreateKey(Tag, PathWithoutLast, mations of XML content can help legacy applications to Child, KeyAttr, ResName), concat(PathWithoutLast, [KeyTag], Path), access new database and can also make legacy database onePathFromTo(ResName, Tag, ResPath),!, available for new applications. Transformation itself can %gives one, cuts off. be composed using operations described in this paper. addReference( For real world application this transformation can be exe- Tag, ResPath, RefAttrName, Tag, Path, KeyAttr cuted by XSLT [20] generated from evolution operations ). list. For XML document with complicated structure it is [15] Andrew A. Simanovsky. Applying the easier for human operator to think about more “high- reconfiguration-design formalism to xml stored in level” transformations, than most of primitive evolution a relational database. In SYRCoDIS, pages 75–77, operations defined in [4, 5, 10] and here. Such opera- May 2004. tions, like splitting entity or moving attribute from one entity to another, etc. should be considered schema refac- [16] John F. Sowa. Conceptual graphs. ISO/JTC1/SC toring, because in their case simple local changes often 32/WG2, April 2001. Standard Draft. induce global changes to whole schema. Some of possi- [17] L. Stojanovic, A. Maedche, N. Stojanovic, and ble schema refactoring operations are described here and R. Studer. Ontology evolution as reconfiguration- are also objects of future investigations. design problem solving. In proceedings of Inter- national Conference on Knowledge Management, References 2003. [1] Relax ng compact syntax, November 2002. Speci- [18] M. Stumptner and F. Wotawa. Model-based recon- fication. figuration. In proceedings Artificial Intelligence in [2] T. Bray, J. Paoli, and C. M. Sperberg-McQueen Design, Lisbon, Portugal., 1998. (Eds). Extensible markup language (XML) 1.0 [19] W3C. XML Path Language (XPath), 1.0 edition, (2nd edition). W3C Recommendation, 2000. November 1999. [3] Surajit Chaudhuri, Raghav Kaushik, and Jeffrey F. [20] W3C. XSL Transformations (XSLT), 1.0 edition, Naughton. On relational support for XML publish- November 1999. Recommendation. ing: Beyond sorting and tagging. [21] W3C. XML Schema Part 0: Primer, May 2001. [4] S. Coox. Xml database schema evolution axiom- Recommendation. atization. Programming and Computer Software, 29(3):140–146, 2003. [5] Sergey V. Coox and Andrey A. Simanovsky. Regu- lar expressions in xml schema evolution. Kharkiv, Ukraine, June 2003. ISTA. [6] DEIS, Universit‘a di Bologna a Cesena, Italy. tuProlog User’s Guide, 1st edition, Sep 2002. [7] D. Florescu and D. Kossmann. A performance eval- uation of alternative mapping schemes for storing XML data in a relational database. Technical Re- port 3684, INRIA, 1999. [8] D. Florescu and D. Kossmann. Storing and query- ing xml data using an rdbms. In Data Engineering Bulletin, volume 22, pages 27–34. IEEE, 1999. [9] J.Doores, A.R.Reiblein, and S.Vadera. Prolog – Programming for Tomorrow. Sigma Press, Wilm- slow, UK, 1987. [10] Dmitry V. Luciv. Semistructured database scheme evolution and refactoring. In SYRCoDIS, pages 71– 74, May 2004. [11] M. Tamer Oszu Randal J. Peters. Axiomatization of dynamic schema evolution in objectbases. 1997. [12] Erik T. Ray. Learning XML. O’Reilly, 2001. pp. 143-189. [13] R. Shevchenko and A. Doroshenko. The system of symbolic computing for programming the dynamic applications. Interntet publication, 2003. (in rus- sian). [14] A. Simanovsky. Evolution of schema of xml- documents stored in a relational database. In pro- ceedings of Baltic DB&IS, Riga, Latvia, 2004. As- sociation for Computing Machinery. To appear.