<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>On classification of XML document On classification of XML document ttrraannssffoorrmmaattiioonnss∗</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jana Dvoˇra´kova´? Jana Dvoˇr´akov´a</string-name>
          <email>Jana.Dvorakova@dcs.fmph.uniba.sk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science, Faculty of Mathematics</institution>
          ,
          <addr-line>Physics and Informatics</addr-line>
          ,
          <institution>Department of Computer CScoimenecnei</institution>
          ,
          <addr-line>uFsaUcunlitvyerosfitMy, aBtrhaetmisalatvicas, Physics and Informatics</addr-line>
        </aff>
      </contrib-group>
      <fpage>69</fpage>
      <lpage>83</lpage>
      <abstract>
        <p>As XML has become a very popular standard for data in many fields, the domain of XML documents transformations is becoming more and more important. In this paper we propose classification hierarchy for XML document transformations. We assign implemented transformation systems into defined groups according to the type of possible transformations. This enables user to choose appropriate transformation system according to the requirements for transformation. Secondly, we are concerned with the group of transformation systems, which enable type transformation. We define underlying formal models in common framework and discuss their applicability for these transformations.</p>
      </abstract>
      <kwd-group>
        <kwd>XML</kwd>
        <kwd>Structured document</kwd>
        <kwd>Document type</kwd>
        <kwd>Document transformation</kwd>
        <kwd>Transformation classification</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        ? This work was supported in part by the grant VEGA 1/0131/03
in [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] introduces also some basic categorization, however in none of these works
a complete classification system has been defined.
      </p>
      <p>The rest of this paper is organized as follows: After defining some notions
used through-out the paper we introduce defined classification system in Section
3. We discuss reasons for choosing particular criteria and we assign implemented
transformations systems for structured documents into defined groups. In
section 4, we are dealing with one specific group of transformations, namely type
transformations. We are examining formal models on which these
transformations are based and we define them in a common framework. At last we present
several results obtained by comparing these formal models. Section 5 contains a
brief conclusion and outlines our future research.
2</p>
    </sec>
    <sec id="sec-2">
      <title>Notions and notations</title>
      <p>2.1</p>
      <sec id="sec-2-1">
        <title>XML document</title>
        <p>Basic logical unit of an XML document is an element. The content of an element
is delimited by a start-tag and an end-tag. It can contain text and other nested
elements as well. Variables called attributes can be assigned to an element.
Obviously, an XML document has a hierarchical structure and it can be represented
by a tree, where internal nodes are elements and leaves contain textual content.</p>
        <p>
          The framework considered in this paper is restricted in two ways. Firstly, we
do not consider element attributes of XML documents since we are concerned
mainly with transformations at the level of elements. Secondly, we assume that
element names of XML documents are from a finite and known set denoted by E.
This enables us to define XML documents as trees over finite alphabets, which
is the most natural way as far as we consider XML documents transformation.
Furthermore we will use symbol Char for alphabet of characters allowed to
appear in the textual content of XML document. Char is specified in W3C
recommendation [
          <xref ref-type="bibr" rid="ref28">28</xref>
          ].
        </p>
        <p>Definition 1. Let Σ, Γ be alphabets. The set of trees over (Γ, Σ) denoted by
TΓ (Σ) is define d as follows:
– a ∈ Σ, then t = a ∈ TΓ (Σ) and root(t) = a.
– A ∈ Γ , t1, . . . , tk ∈ TΓ (Σ), k &gt; 0, then t = A(t1, . . . , tk) ∈ TΓ (Σ) and
root(t) = A.
– nothing else belongs to TΓ (Σ).</p>
        <p>We call Σ a leaf alphabet and Γ an internal node alphabet. Alphabets Σ, Γ
do not need to be disjoint.</p>
        <p>Obviously the set of XML documents denoted by D equals to the set of trees
over (E, Char), i.e., D = TE(Char).</p>
        <p>DTD
An XML document type is a class, which contains XML documents with similar
structure. It is specified by a type definition , which describes the set of elements,
that should be contained in the document as well as relationships among the
elements. W3C has defined two ways of notation - DTD and XML schema.
DTD is simpler and describes particularly syntax of the type while XML schema
introduces more aspects like namespaces, restrictions for attribute values, etc.
However, syntactic features of both models can be easily captured by
contextfree grammar. We will use this concept in the rest of this paper. Now we will
define context-free grammars and their subset - type grammars, which describe
XML document types.</p>
        <p>Definition 2. Context-free grammar (CFG) is a 4-tuple G = (N, T, P, S), where
N is an alphabet of nonterminal symbols, T an alphabet of terminal symbols,
P ⊆ N × (N ∪ T )∗ is a finite set of production rules and S ∈ N is a starting
symbol.</p>
        <p>Definition 3. Let G = (N, T, P, S) to be a CFG, then a set of derivation
subtrees of G denoted by SG is define d as follows:
– a ∈ T , then t = a ∈ SG.
– A ∈ N , t1, . . . , tk ∈ SG and A → root(t1), . . . , root(tk) ∈ P , then t =</p>
        <p>A(t1, . . . , tk) ∈ SG.</p>
        <p>– nothing else belongs to SG.</p>
        <p>A set of derivation trees of G - TG is a subset of SG such that t ∈ TG ⇔ t ∈ SG
and root(t) = S.</p>
        <p>A type grammar is a context free grammar G = (N, T, P, S), such that
N = E, T = Char. We denote its set of derivation trees DG since we consider
documents rather then general trees. Obviously it holds DG ⊆ D. Nonterminals
of a type grammar represent element names and the set of terminals equals to
the text alphabet.</p>
        <p>If we have a given type grammar, it generates a class of XML documents,
which equals to the set of its derivation trees. Validation of XML document
against a grammar is a process, which gives us as a result an answer, whether
given XML document belongs to the class of documents generated by given
grammar. If the answer is yes, we say that the XML document is valid for a
given type grammar (or correct ).
2.3</p>
      </sec>
      <sec id="sec-2-2">
        <title>XML transformations</title>
        <p>A transformation of an XML document takes some source documents as an
input, it processes them according to the transformation specification and it
gives target documents as an output.</p>
        <p>It is reasonable to define a transformation of XML documents as a relation
on the set of XML documents rather then a function 1. As usual we begin with
more general definition, i.e., tree transformation.</p>
        <p>Definition 4. Let Σ, Σ0, Γ , Γ 0 be alphabets. A tree transformation from (Γ, Σ)
to (Γ 0, Σ0) is a relation τ ⊆ TΓ (Σ) × TΓ 0 (Σ0).</p>
        <p>If we apply previous definition on XML documents, we obtain restricted
case again. Thus, an XML documents transformation is a relation τ ⊆ D × D.
Obviously if we consider any XML document transformation, the source leaf
alphabet equals to the target leaf alphabet.
2.4</p>
      </sec>
      <sec id="sec-2-3">
        <title>Tree properties</title>
        <p>Now we introduce several definitions related to trees that we will need later in
this paper.</p>
        <p>Let Σ, Σ0, Γ, Γ 0 be alphabets.</p>
        <p>Let X = { x1, x2, . . .} be a set of variable symbols. For each k &gt; 0, we denote
by Xk a set of first k variables, thus Xk = { x1, . . . , xk} . We denote by TΓ (Σ ∪X),
TΓ (Σ ∪ Xk) sets of trees with these variables, where it must hold Σ ∩ X = ∅.</p>
        <p>Definition of a tree substitution follows. Let t ∈ TΓ (Σ ∪ Xk), k &gt; 0 and
t1, . . . , tk ∈ TΓ 0 (Σ0). Then t[x1 ← t1, . . . , xk ← tk] is a tree obtained from t by
replacing each occurrence of xi by a tree ti, 1 ≤ i ≤ k2.</p>
        <p>Let Z = { z1, z2, . . .} be a set of variables disjoint to X and k &gt; 0. A set of
(Γ, Σ, k)-contexts denoted CΓ (Σ, k) is the set of trees t from TΓ (Σ ∪ Zk), such
that for each 1 ≤ i ≤ k the symbol zi appears exactly once in t.</p>
        <p>We will use a simpler notation for context substitution. Let t to be a (Γ, Σ,
k)context and t1, . . . , tk trees. Then we use t[t1, . . . , tk] instead of t[z1 ← t1, . . . , zk ←
tk].</p>
        <p>Let t, t0 ∈ TΓ (Σ). Then t0 is a subtree of t if there exists such a context
β ∈ CΓ (Σ, 1) that t = β[t0].</p>
        <p>For each tree t ∈ TΓ (Σ) path(t) ⊆ { 1, 2, . . .} ∗ is a set of all paths of the tree t,
so that each of them unambiguously identifies a node of the tree t. Then symbol
ε represents root(t). For each path w ∈ path(t), we denote by labelt(w) a label
of the node identified by w and by t| w a subtree under this node.</p>
        <p>Let t ∈ TΓ (Σ), t0 ∈ TΓ 0 (Σ0). We obtain tree t[w ←p t0] ∈ TΓ ∪Γ 0 (Σ ∪ Σ0)
from t by replacing subtree in w ∈ path(t) by tree t0.</p>
        <p>
          Let t ∈ TΓ (Σ), patt ∈ TΓ (Γ ∪ Σ). We say, that t and patt match, if there is
such a context γ ∈ CΓ (Σ, k) that t = γ[t1, . . . , tk] for some t1, . . . , tk ∈ TΓ (Σ)
and it holds patt = γ[root(t1), . . . , root(tk )].
1 For example. in the system SynDoc [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ], several target documents can be generated
as the result of transformation and user can choose the most appropriate one.
2 In the definition of tree substitution variables are not typed, however, later we will
require newly connected subtrees to satisfy some specific conditions.
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Classi cation hierarchy</title>
      <p>
        There are several possibilities, how to divide XML document transformations
into categories. According to the driving element of the transformation we
obtain three basic categories as follows. In each of them different techniques for
implementing particular transformation systems are used, thus, it is reasonable
to start with this criterion.
1. Source grammar transformations - Transformation process is driven by the
source structure. First, the source document is parsed and then according
to recognized syntactic elements parts of the output documents are
constructed. Usually it is possible to check correctness of input documents, but
target correctness is not ensured in this case. If necessary, user must perform
validation explicitly. Next we can divide this category into two more specific
groups according to the way how the source document is parsed:
• Event-driven transformations - The source document is read as a data
stream. As the result, we obtain a list of recognized events. The event
can be an occurrence of a start-tag, an end-tag, an attribute, etc. The
transformation system reads the list o events and performs corresponding
actions. If we allow side-effect functions, an output can be generated
already at parsing time. As a formal model, an attribute grammar is
often used. Event-driven models require little memory, and provide fast
and effective way of accessing XML data. On the other hand, only simple
transformations can be performed since it is not possible to return to
an earlier part of the document. In some document transformers (e.g.
CoST [
        <xref ref-type="bibr" rid="ref12 ref7">12, 7</xref>
        ], OmniMark [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]) the list of recognized events is represented
by ESIS (Element Structure Information Set), which is the part of ISO
SGML standard [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. However, currently especially SAX (Simple API
for XML) [
        <xref ref-type="bibr" rid="ref26">26</xref>
        ] is more and more in use as event-driven mechanism for
parsing XML documents. It generates specific SAX events and sends
them to the application.
• Tree-based transformations - First, an internal syntactic tree of the source
document is constructed and a target document is generated via queries
on this tree. In most of the cases transformation systems are based on
tree pattern matching and replacement. We can assign widely used XSLT
[
        <xref ref-type="bibr" rid="ref29">29</xref>
        ] and its predecessor DSSSL [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] as well as systems Scrimshaw [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ],
Metamorphosis [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ], Balise [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] and TranSID [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] to this group.
      </p>
      <p>
        XSLT, a recommendation of W3 Consortium, has become very popular
recently. It is a language, itself written in XML, with powerful
capabilities for specifying transformations of XML documents. XSLT program
(called stylesheet) consists of a set of template rules. A template rule
associates a pattern, which matches nodes in the source document with
corresponding template that can be instantiated to form the target tree.
Although XSLT is a powerful transformation language, it has several
drawbacks. Firstly, it appears to be a complex language and therefore
transformation specification must be written by an expert. Secondly,
XSLT processor cannot guarantee the correctness of target documents
as well as other systems in this group. However, an XSLT stylesheet
can be produced as an output of some two-grammars systems (see next
section). Then it is ensured, that generated stylesheet specifies a type
transformation and XSLT processor can be used to perform
transformation.
2. Target grammar transformations - Target document is created according
to the rules of a given target grammar while relevant data from a source
document are extracted via queries. This ensures target correctnes. However,
we do not need to have any information about source document type. We
found only one transformation system, which belongs to this group - TREX
[
        <xref ref-type="bibr" rid="ref30">30</xref>
        ].
3. Two grammar transformations (type transformations) - Documents of a
given source type are translated into documents of a given target type.
Transformation systems are mostly based on one of these formal models - syntax
directed translation schema, tree transformation grammar, descending tree
transducer and higher order attribute grammar. We discuss these models as
well as two grammar transformation systems in detail in the next section.
However, there are few systems, which performed two-grammars
transformations, but underlying formal model cannot be clearly recognized. These
are Grif [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ] and Thot [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
      <p>From a different point of view, we distinguish static type transformations,
which translate whole documents, and dynamic type transformations, which
are used to perform dynamic operations on XML documents. An example is
cut and paste operation, which is a standard operation in XML document
editors. Unlike in common editors, in this case it is necessary to preserve
document structure. Thus, a local transformation must be performed in the
place, where a part of another document has been pasted.
4</p>
    </sec>
    <sec id="sec-4">
      <title>Type transformations</title>
      <p>The group of type transformations (or two-grammar transformations) is a
specific one. In this case there must be only correct documents taken as an input
and correct documents must be generated on the output side. To satisfy these
conditions, it is usually inevitable to implement transformation system
performing this kind of transformations on some underlying formal model. In this section
we introduce four formal models on which some of the implemented
transformation systems are based. We developed common framework in which we define
these models. Our intention was to create a formal base, so that the models
can be studied and mutually compared later. We also present current results
obtained by comparing some of them according to their transformational power.
Obviously there is a direct proportion between the complexity of transformation
specification and the set of possible transformations. Thus, the more powerful
particular model is, the more complex specification the user is supposed to write.</p>
      <sec id="sec-4-1">
        <title>Syntax-directed translation schema</title>
        <p>Definition 6. A configur ation of SDTS (SSDTS, ESDTS) Ω is a tree from the
set TN∪{ } (Σ ∪ Δ).</p>
        <p>Definition 7. We say, that a pair of trees (A(t1, . . . , tn), A(s1, . . . , sm)), A ∈
N , t1, . . . , tn, s1, . . . , sm ∈ TN∪{ } (Σ∪Δ) realizes a rule A → u1 . . . un, v1 . . . vm ∈
R if the following holds:
1. u1 . . . un = root(t1) . . . root(tn) and v1 . . . vm = root(s1) . . . root(sm),
2. hN (u1 . . . un) = ui1 . . . uir and hN (v1 . . . vm) = vj1 . . . vjp = uik1 . . . uikp ,
i1, . . . , ir ∈ { 1, . . . , n} (different), j1, . . . , jp ∈ { 1, . . . , m} (different),
k1, . . . , kp ∈ { 1, . . . , r} (different),
3. sjl = tikl for l ∈ { 1, . . . , p} , sl = vl, vl ∈ Δ ∪ { ε} for l ∈/ { j1, . . . , jp} and
tl = ul, ul ∈ Σ ∪ { ε} for l ∈/ { i1, . . . , ir} .</p>
        <p>Definition 8. A translation step of SDTS (SSDTS, ESDTS) Ω is a relation
⇒Ω over the set of configur ations define d as follows:
1. β[ (a)] ⇒Ω β[a], a ∈ Δ ∪ { ε} .
2. β[ (A(t1, . . . , tn))] ⇒Ω β[A( (s1), . . . , (sm))] and a pair of trees
(A(t1, . . . , tn), A(s1, . . . , sm)) realizes some rule r ∈ R.</p>
        <p>In both cases β is a context from CN∪{ } (Σ ∪ Δ, 1).</p>
        <p>A</p>
        <p>u1 ... un , v1 ... vm
u1
t1
β
z1
A</p>
        <p>An input tree is processed from the root to the leaves, while unprocessed
subtrees are marked by symbol . Initially, the whole source tree is marked
as unproccessed. If the root of a currently processed subtree has a label from
output alphabet or ε, then it is a leaf and we stop translation in this branch by
removing the symbol . If we are processing a subtree, whose root is an internal
node, first we reorder and delete children subtrees according to corresponding
rule. Subsequently, if there are some leaf childrens, we reorganize them in such
a way, that the result adheres to the output side of the rule.</p>
        <p>Definition 9. A transformation generated by a SDTS (SSDTS, ESDTS) Ω is
a set τ (Ω) = { (ts, tt) | ts ∈ TN (Σ), root(ts) = S, tt ∈ TN (Δ), (ts) ⇒∗Ω tt} .</p>
        <p>A SSDTS enables very simple transformations of trees. It does not change
structure of the source tree, it is only possible to delete, add or reorder leaves. A
SDTS enables reordering of subtrees connected to the same node moreover. An
ESDTS is the most powerful modification, unlike SDTS it enables also deleting
of arbitrary subtree in the source tree.</p>
        <p>A transformations performed by SSDTS as well as SDTS satisfy the definition
of the type transformation. In the case of ESDTS a problem arises, because if
we delete a subtree in the source tree, it is not processed anymore. Thus, it is
not possible to check, whether this part of the source tree was correct.</p>
        <p>
          There are several implemented transformation systems that are using SDTS
and its modifications as underlying model. System SynDoc [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ] is based on
ESDTS, and SDTT [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ], ICA [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ], HST [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ] are based on SDTS.
4.2
        </p>
      </sec>
      <sec id="sec-4-2">
        <title>Tree transformation grammar</title>
        <p>A tree transformation grammar is similar to SDTS, however in this case we
associate two groups of production rules. The first group - a source subgrammar
contains some of the source grammar production rules and the second one - a
target subgrammar conatins some of the target grammar production rules. Unlike
in SDTS, nonterminals in the source and target subgrammar must be associated
explicitely. This means, that it is possible to associate also nonterminals with
different names. Additionally, tree-transformation grammars work with as many
levels in the source tree as necessary.</p>
        <p>Definition 10. A tree transformation grammar - TTG is a 6-tuple G = (Gs, Gt,
Subs, Subt, P A, SA), where
– Gs = (Ns, Σs, Ps, Ss) and Gt = (Nt, Σt, Pt, St) are the source and target
grammars,
– Subs ⊆ 2Ps and Subt ⊆ 2Pt are sets of source and target subgrammars,
– P A ⊆ Subs × Subt is a set of production group associations,
– SA ⊆ Ns × Nt is a set of symbol associations.</p>
        <p>There are some restrictions put on the source subgrammar. It must contain a
single start nonterminal, and every other nonterminal in the source subgrammar
must be derivable from this start nonterminal. Source subgrammars represent
subtree patterns to be matched against in the source tree; target subgrammars
represent subtrees, that are to be generated as part of the target tree. A target
subgrammar can contain several start nonterminals, thus we can obtain a forest
of target subtrees as a result. Transformation via TTG is performed by
processing source tree nodes one by one. For a particular node we look for matching
source subgrammars. If there is some, corresponding target subtrees are
constructed and links among nodes in source subgrammar and target subgrammar
are marked according to symbol associations. After the whole source tree has
been processed, target subtrees can join to bigger trees according to the marked
symbol associations.</p>
        <p>
          Tree transformation grammars are described in [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ]. Authors’ intention was
to define a powerful two-grammar transformations model and, at the same time,
provide simple and natural way to write transformation specification.
        </p>
        <p>
          Several modifications of tree transformation grammars have been
implemented:
• Dual grammar translation scheme (DGTS) - system SSAGS [
          <xref ref-type="bibr" rid="ref24">24</xref>
          ].
• Single input production-explicitely qualifie d (SIPEQ) - system SIPEQ [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ].
• TT grammar - system Alchemist [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ].
        </p>
        <p>
          However, semantics of the model remains unclear as no formal definitions
have been introduced in any of the resources mentioned. Now we present
definitions, that we created according to the alghorithm of tree transformation via
TT grammars mentioned in [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ].
        </p>
        <p>Definition 11. A configur ation of TTG Ω is a pair (ts, Tt), ts ∈ TNs∪{ } (Σs)
and Tt ⊆ { (t, W ) | t ∈ TNt (Nt ∪ Σt), W ⊆ 2(path(t)×path(t))} .</p>
        <p>A tree ts is the source tree with marked unprocessed nodes. Tt is a set of
target linked trees, i.e., it contains pairs of the form (t, W ), where t is a particular
target tree and W is a set of path pairs that we call a set of links of a tree t. We
use the notation Wt as well.</p>
        <p>Definition 12. A translation step of TTG Ω is a relation ⇒Ω over configur
ations define d by (ts, Tt) ⇒Ω (t0s, Tt0) ⇐⇒
(1) generating step
– there is such a path ws in a tree ts that ts| ws = (A(t1, . . . , tn)),
– t0s = ts[ws ←p A( (t1), . . . , (tn))],
– there is such a source subgrammar sg ∈ Subs that tsg (a derivation tree
corresponding to the rules sg) and A(t1, . . . , tn) match,
– there is such a target subgrammar tg ∈ Subt that (sg, tg) ∈ P A and
Ttg = { r1, . . . rm} (derivation trees corresponding to the rules tg). Let us
denote Ri = (ri, { (w1, wsw2) | (labelri(w1), labeltsg (w2)) ∈ SA} ), i ∈ { 1, . . . , m} .</p>
        <p>Then Tt0 = Tt ∪ S1≤i≤m Ri.
(2) connecting step
– t0s = ts,
– there are such R1, R2 ∈ Tt, R1 = (r1, W1), R2 = (r2, W2) that
– root(r2) = A ∈ Nt,
– there is such a path w ∈ path(r1) that r1| w = A,
– there is such a path ws ∈ path(ts) that (w, ws) ∈ W1 and (ε, ws) ∈ W2.
Let r3 = r1[w ←p r2], W3 = W1 ∪ { (ww1, w2) | (w1, w2) ∈ W2} a R3 =
(r3, W3). Then Tt = Tt0 − R1 − R2 ∪ R3.</p>
        <p>Transformation works in similar way as with SDTS. The source tree is passed
in the top-down manner, while unprocessed nodes are marked by symbol ,
initially the whole source tree is marked.</p>
        <p>During the generation step, we first choose such a source subgrammar
(pattern), that matches a subtree under currently processed node (Fig. 2). The model
is nondeterministic, i.e., there can be more suitable source subrammars. Then
we add target subtrees (corresponding to the associated target subgrammar)
together with links to the set of linked target trees. Links are created according to
symbol associations (Fig. 3). At last we remove symbol from current subtree
and again we mark all connected subtrees to indicate they need to be processed.</p>
        <p>If we apply connecting step (Fig. 4), we first choose two subtrees r1 and r2
from the set of target linked trees. These trees must satisfy some conditions;
...
patt
A
ws
patt
...</p>
        <p>B
the root of the first one and a leaf of the second one must have the same label
and must be linked to the same node in the source tree. Consequently, the set
of target linked trees is updated so that subtrees r1, r2 are removed and a
new subtree created by connection of r1, r2 is added. Links between this new
subtree and the source tree will be updated as well according to the links in
old subtrees. However, we want to obtain one tree as a result of transformation.
Therefore in following definition of tree transformation via TTG we require the
last configuration to consist of a single element.</p>
        <p>Definition 13. A transformation generated by TTG Ω is a set τ (Ω) = { (ts, tt) ⊆
TNs(Σs) × TNt(Σt) | ( (ts), ∅) ⇒∗Ω (ts, { (tt, W )} )} .
4.3</p>
      </sec>
      <sec id="sec-4-3">
        <title>Descending tree transducer</title>
        <p>A descending tree transducer is an automaton, which passes a source tree from
the root to the leaves and performs changes according to the current state, node</p>
        <p>A</p>
        <p>
          w
b)
c)
and lookeahead. Unlike previous formal models, it works with trees, where labels
of internal nodes are from a ranked alphabet. Basically, ranked alphabet is a pair
(Γ, rank), where Γ is an alphabet and rank : Γ → N is a ranking function, which
assigns positive integer to each symbol in Γ . We usually omit the function rank
in the notation and we say that Γ is a ranked alphabet. If Σ is an alphabet
and Γ a ranked alphabet, then each internal node in a tree over (Γ, Σ) must
have exactly as many children as is the value of the rank of its label. We will
not introduce formal definitions here, all of them can be found in [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] and the
notation is very similar to that one used in previous cases.
        </p>
        <p>A DTT does not represent a model for two-grammars transformations since
it works on higher level of abstraction. However, it is easy to restrict definitions
in such a way, that we obtain DTT, which transforms derivation trees of a given
source grammar into derivation trees of a given target grammar.</p>
        <p>
          A DTT with finite and regular lookahead has been used in the transformation
system SynDoc [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ]. This system generates also an XSLT stylesheet as an output
and it is possible to use existing tools for XSLT if further processing is needed.
However, in that case the target correctness is not guaranteed anymore.
4.4
        </p>
      </sec>
      <sec id="sec-4-4">
        <title>Higher order attribute grammar</title>
        <p>Basically, an attribute grammar is a context free grammar, such that each rule
is associated with a set of semantics rules. These rules have a form X.b :=
f (Y1.c1, . . . , Yn.cn), where X, Yi are nonterminals, b, ci are attributes and f is
an n-ary function.</p>
        <p>
          All definitions presented in this section results from [
          <xref ref-type="bibr" rid="ref27">27</xref>
          ].
        </p>
      </sec>
      <sec id="sec-4-5">
        <title>Definition</title>
        <p>14. An attribute grammar - AG is a triple AG = (G, A, R), where
• G = (N, Σ, P, S) is a context-free grammar,
• A = SX∈N∪Σ AIS(X) is a finite set of attributes, where AIS(X) is a set
of attributes associated with nonterminal X,
• R = Sp∈P R(p) is a finite set of attribute rules. If AIS(X) ∩ AIS(Y ) 6= 0,
then X = Y . For every occurrence of a nonterminal in a derivation tree of
G there must be exactly one attribute rule applicable for computation of a
value for an attribute a ∈ A. Rules in R(p) have a form α = f (. . . , γ, . . .),
where f is a name of the function, α and γ are attributes of a form X.a.
Definition 15. For each p : X0 → X1 . . . Xn ∈ P we define a set of attribute
evaluation occurrences as AF (p) = { Xi.a | Xi.a = f (. . .) ∈ R(p)} . An attribute
X.a is called a synthetized attribute if there is a rule r : X → χ and X.a ∈ AF (r);
an inherited attribute, if there is a rule q : Y → μXν and X.a ∈ AF (q).</p>
        <p>We denote by AS(X) a set of synthetized attributes of a nonterminal X and
by AI(X) a set of inherited attributes of a nonterminal X. After evaluation
process these attributes contain a subtree as a computed value. Thus, unlike in
AG, in HAG the domain of the derivation tree and the domain of attributes
overlap.</p>
        <p>A higher-order attribute grammar (HAG) is an extended attribute grammar.
A special type of attribute - nonterminal attribute is introduced.
Definition 16. For each p : X0 → X1 . . . Xn ∈ P a set of nonterminal
attributes is define d: N T A(p) = { Xj| Xj := f (. . .) ∈ R(p)} .</p>
        <p>An unevaluated nonterminal attribute represents a leaf node of the actual
tree. At this point we call it virtual nonterminal, since there is no subtree
connected to it. After the evaluation is performed, a subtree is computed as a value
for nonterminal attribute and consequently it is connected to this attribute (leaf
node). Evaluated nonterminals and common nonterminals we also call
instatiated nonterminals since they have got an instance, i.e. connected subtree. The
set of instantiated nonterminals represents internal nodes of the actual tree.</p>
        <p>
          A HAG has been implemented in transformation system SIMON [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ], which
takes advantage of the power of this model and enables several complex
transformations. However, there is one major drawback. System requires a user to
write a complete HAG as a transformation specification and this is usually a
nontrivial task.
        </p>
        <p>
          For HAG there has been defined formal model - a higher-order attribute
tree transducer [
          <xref ref-type="bibr" rid="ref23">23</xref>
          ], which represents some abstraction and therefore it is more
suitable for studying its properties.
4.5
        </p>
      </sec>
      <sec id="sec-4-6">
        <title>Comparison results</title>
        <p>
          Finally, we briefly introduce results obtained by comparing formal models for
type transformations according to their transformational power. Since we defined
transformations as relations over sets of trees, we can consider them as sets
containing pairs of trees as elements. Thus, we can also compare them as sets.
See Table 1 for comparison results. Symbol N indicates that transformations
of two corresponding formal models are not comparable. More details and all
formal proofs can be found in [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ].
        </p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Conclusion and future work</title>
      <p>In this paper we introduced a system for classification of XML documents
transformations. Our intention was to simplify choosing of appropriate transformation
system according to given requirements for transformation. We defined
categories in bottom-up way, i.e. first we were examining implemented
transformation systems and consequently we abstracted from them to higher-level groups
of transformations sharing similar features.</p>
      <p>We were examining in detail formal models used for type transformation.
We defined common framework for these models and we introduced current
results of their mutual comparison. In our future work, we plan to make more
comparisons according to transformational power and complexity as well. We
intend to include also other models, which might appear to be suitable for XML
document transformations.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>A. V.</given-names>
            <surname>Aho</surname>
          </string-name>
          and
          <string-name>
            <given-names>J. D.</given-names>
            <surname>Ullman</surname>
          </string-name>
          .
          <article-title>The theory of parsing, translation and compiling</article-title>
          , Vol. I: Parsing. Prentice-Hall, Inc.,
          <string-name>
            <surname>Englewood Cliffs</surname>
            ,
            <given-names>N.J.</given-names>
          </string-name>
          ,
          <source>USA</source>
          ,
          <year>1972</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>D. S.</given-names>
            <surname>Arnon</surname>
          </string-name>
          .
          <article-title>Scrimshaw: A language for document queries and transformations</article-title>
          .
          <source>Electronic Publishing</source>
          ,
          <volume>6</volume>
          (
          <issue>4</issue>
          ),
          <year>1993</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3. Berger-Levrault/AIS: Balise Reference Manual, Release 3,
          <year>1996</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>S.</given-names>
            <surname>Bonhomme</surname>
          </string-name>
          .
          <article-title>Transformation de documents structur´es, une combinaison des approches explicite et automatique</article-title>
          .
          <source>PhD thesis</source>
          , University Joseph Fourier - Grenoble,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>K.</given-names>
            <surname>Chiba</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Kyojima</surname>
          </string-name>
          .
          <article-title>Document transformation based on syntax-directed tree translation</article-title>
          .
          <source>Electronic Publishing</source>
          ,
          <volume>8</volume>
          (
          <issue>1</issue>
          ),
          <year>1995</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6. J. Dvoˇr´akova´.
          <article-title>Transforma´cie XML dokumentov</article-title>
          .
          <source>Master thesis</source>
          , FMFI UK Bratislava,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>Joe</given-names>
            <surname>English</surname>
          </string-name>
          .
          <source>CoST 2 Reference Manual. Release 3</source>
          ,
          <year>1996</year>
          . http://www.art.com/cost/manual.html.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>Exoterica</given-names>
            <surname>Corporation. OmniMark Programmer's Guide</surname>
          </string-name>
          ,
          <year>1993</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>A.</given-names>
            <surname>Feng</surname>
          </string-name>
          and
          <string-name>
            <given-names>T.</given-names>
            <surname>Wakayama</surname>
          </string-name>
          . SIMON:
          <article-title>A grammar-based transformation system for structured documents</article-title>
          .
          <source>Electronic Publishing</source>
          ,
          <volume>6</volume>
          (
          <issue>4</issue>
          ),
          <year>1993</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>Z.</given-names>
            <surname>Fu</surname>
          </string-name>
          <article-title>¨ll¨op and</article-title>
          <string-name>
            <given-names>H.</given-names>
            <surname>Vogler</surname>
          </string-name>
          .
          <article-title>Syntax-directed semantics: Formal models based on tree transducers</article-title>
          . Springer-Verlag,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11. V.
          <article-title>Hamb´alkov´a. Transforma´cie ˇstruktu´rovan´ych dokumentov</article-title>
          .
          <source>FMFI UK Bratislava</source>
          ,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <given-names>K.</given-names>
            <surname>Harbo</surname>
          </string-name>
          .
          <source>CoST Version 0</source>
          .
          <fpage>2</fpage>
          - Copenhagen SGML Tool. Department of Computer Science and Euromath Center, University of Copenhagen,
          <year>1993</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13. ISO - International
          <source>Organization for Standardization. Information Processing - Text and Office Systems - Standard Generalized Markup Langugage (SGML)</source>
          ,
          <source>ISO 8879</source>
          ,
          <year>1986</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14. ISO - International Organization for Standardization.
          <source>Information Technology - Text and Office Systems - Document Style Semantics and Specific ation Language (DSSSL)</source>
          ,
          <source>ISO/IEC DIS 10179</source>
          :
          <year>1996</year>
          ,
          <year>1996</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15. S. Keller,
          <string-name>
            <given-names>J. A.</given-names>
            <surname>Perkins</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. F.</given-names>
            <surname>Payton</surname>
          </string-name>
          and
          <string-name>
            <given-names>S. P.</given-names>
            <surname>Mardinly</surname>
          </string-name>
          .
          <article-title>Tree transformation techniques and experience</article-title>
          .
          <source>Proceedings of the ACM SIGPLAN '84 Symposium on Compiler Construction, SIGPLAN Notices</source>
          ,
          <volume>19</volume>
          (
          <issue>6</issue>
          ), Montreal, Canada,
          <year>1984</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16. P. Kilelinen, Greger Lind´en,
          <string-name>
            <given-names>H.</given-names>
            <surname>Manilla</surname>
          </string-name>
          and
          <string-name>
            <given-names>E.</given-names>
            <surname>Nikunen</surname>
          </string-name>
          .
          <article-title>A structured document database system</article-title>
          .
          <source>EP90 - Proceedings of the International Conference on Electronic Publishing, Document Manipulation and Typography</source>
          , Gaithersburg, Maryland, Cambridge University Press,
          <year>1990</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17. E. Kuikka,
          <string-name>
            <given-names>P.</given-names>
            <surname>Leinonen</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Penttonen</surname>
          </string-name>
          .
          <article-title>Towards automating of document structure transformation</article-title>
          .
          <source>DocEng '02</source>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <given-names>E.</given-names>
            <surname>Kuikka</surname>
          </string-name>
          and
          <string-name>
            <given-names>E.</given-names>
            <surname>Nikunen</surname>
          </string-name>
          .
          <article-title>Survey of software for structured text</article-title>
          .
          <source>Technical report</source>
          , Department of Computer Science and Applied Mathematics, University of Kuopio,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19. E. Kuikka and
          <string-name>
            <given-names>M.</given-names>
            <surname>Penttonen</surname>
          </string-name>
          .
          <article-title>Transformation of structured documents</article-title>
          .
          <source>Electronic Publishing</source>
          ,
          <volume>8</volume>
          (
          <issue>4</issue>
          ),
          <year>1995</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20. G.
          <article-title>Lind´en. Structured document transformations</article-title>
          .
          <source>PhD thesis</source>
          , University of Helsinki,
          <year>1997</year>
          . ISBN 951-45-7766-3.
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <given-names>S. A.</given-names>
            <surname>Mamrak</surname>
          </string-name>
          ,
          <string-name>
            <surname>C. S. O'Connel</surname>
            and
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Barnes</surname>
          </string-name>
          .
          <article-title>Integrated Chameleon Architecture</article-title>
          . Prentice Hall, Englewood Cliffs, USA,
          <year>1994</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22. MID/Information Logistics Group GmbH.
          <source>MetaMorphosis Reference Manual</source>
          ,
          <year>1995</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <given-names>T.</given-names>
            <surname>Noll</surname>
          </string-name>
          and
          <string-name>
            <given-names>H.</given-names>
            <surname>Vogler</surname>
          </string-name>
          .
          <article-title>The universality of higher-order tree transducers</article-title>
          .
          <source>Theory of computing systems</source>
          ,
          <volume>34</volume>
          (
          <issue>1</issue>
          ),
          <fpage>2001</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          24.
          <string-name>
            <given-names>T. F.</given-names>
            <surname>Payton</surname>
          </string-name>
          . SSAGS:
          <article-title>A syntax and semantics analysis and generation system</article-title>
          .
          <source>Attribute Grammars. Definitions, Systems and Bibliography. Lecture Notes in Computer Science 323</source>
          , Springer-Verlag, Berlin,
          <year>1988</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          25.
          <string-name>
            <given-names>V.</given-names>
            <surname>Quint</surname>
          </string-name>
          and
          <string-name>
            <surname>I. Vatton. GRIF:</surname>
          </string-name>
          <article-title>An interactive system for structured document manipulation</article-title>
          .
          <source>EP 86 - Proceedings of the International Conference on Text Processingand Document Manipulation</source>
          , Nottingham,
          <string-name>
            <surname>UK</surname>
          </string-name>
          , Cambridge University Press,
          <year>1986</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          26.
          <string-name>
            <surname>SAX - Simple</surname>
            <given-names>API</given-names>
          </string-name>
          <source>for XML, version 2.0.2</source>
          ,
          <year>2004</year>
          . http://www.saxproject.org/.
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          27.
          <string-name>
            <given-names>D.</given-names>
            <surname>Swierstra</surname>
          </string-name>
          and
          <string-name>
            <given-names>H.</given-names>
            <surname>Vogt</surname>
          </string-name>
          .
          <article-title>Higher Order Attribute Grammars</article-title>
          .
          <source>Technical report</source>
          . Utrecht University,
          <year>1991</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          28. W3C.
          <article-title>Extensible Markup Language (XML) 1.0 (Third edition</article-title>
          ),
          <source>W3C Recommendation</source>
          ,
          <year>2004</year>
          . http://www.w3.org/TR/REC-xml.
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          29. W3C.
          <source>XSL Transformations XSLT Version 2</source>
          .0,
          <string-name>
            <given-names>W3C</given-names>
            <surname>Recommendation</surname>
          </string-name>
          ,
          <year>1999</year>
          . http://www.w3.org/TR/xslt20/.
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          30.
          <string-name>
            <surname>A. Zhou</surname>
            ,
            <given-names>Q.</given-names>
          </string-name>
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          <string-name>
            <surname>Guo</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          <string-name>
            <surname>Gong</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhengand H. Wu</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Xiao</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          <string-name>
            <surname>Yue</surname>
            and
            <given-names>W.</given-names>
          </string-name>
          <string-name>
            <surname>Fan</surname>
          </string-name>
          . TREX:
          <article-title>DTD-conforming XML to XML transformations</article-title>
          .
          <source>SIGMOD</source>
          <year>2003</year>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>