=Paper=
{{Paper
|id=None
|storemode=property
|title=An Algorithm for Transforming XPath Expressions According to Schema Evolution
|pdfUrl=https://ceur-ws.org/Vol-1008/paper4.pdf
|volume=Vol-1008
|dblpUrl=https://dblp.org/rec/conf/doceng/HasegawaIS13
}}
==An Algorithm for Transforming XPath Expressions According to Schema Evolution==
An Algorithm for Transforming XPath Expressions
According to Schema Evolution
Kazuma Hasegawa Kosetsu Ikeda Nobutaka Suzuki
Graduate School of Library, Graduate School of Library, Faculty of Library, Information
Information and Media Studies Information and Media Studies and Media Science
University of Tsukuba University of Tsukuba University of Tsukuba
s1221595@u.tsukuba.ac.jp lumely@slis.tsukuba.ac.jp nsuzuki@slis.tsukuba.ac.jp
ABSTRACT
XML is a de-fact standard format on the Web. In general,
schemas of XML documents are continuously updated ac-
cording to changes in real world. If a schema is updated,
then query expressions have to be transformed so that they
are “valid” under the updated schema, since the expressions
are no longer valid under the updated schema due to the
schema update. However, this is not an easy task since
many of recent schemas are large and complex and thus it is
becoming difficult to know how to update the query expres-
sions correctly. In this paper, we propose an algorithm for
transforming XPath expressions according to schema evolu-
tion. For an XPath expression p and a schema S, our algo-
rithm treats both p and S as tree automata T Ap and T AS ,
respectively. Our algorithm first takes the product automa-
ton T AR of T Ap and T AS , then analyze T AR to find the
correspondence between the states of T Ap and T AS . Based
on this correspondence, the algorithm transforms T Ap ac-
cording to an update operation applied to T AS . We also
show some preliminary experimental results.
Categories and Subject Descriptors
H.2.1 [Database Management]: Logical Design—Data
models
General Terms Figure 1: An example of XPath transformations
Algorithms
Keywords real world, which is called schema evolution. If a schema is
updated, then query expressions have to be transformed so
XML, XPath, schema evolution, tree automaton that they are “valid” under the updated schema, since the
expressions are no longer valid against the updated schema
1. INTRODUCTION due to the schema update. However, this is not an easy task
XML[5] is a de-fact standard format on the Web. An XML since many of recent schemas are large and complex, and
document is usually stored with its schema so that the struc- thus it is becoming very difficult to know how to update the
tural consistency of the document is ensured. In general, query expressions correctly.
schemas are continuously updated according to changes in In this paper, we propose an algorithm for transforming
XPath expressions according to schema evolution. Here,
XPath[4] is the most popular query language for XML. For
a given schema S, an edit operation op to S, and an XPath
expression p under S, our algorithm transforms p into an
XPath expression p′ “equivalent” to p whenever possible,
This work is licensed under the Creative Commons Attribution- that is, the result of p′ under op(S) coincides with that of
ShareAlike 3.0 Unported License (CC BY-SA 3.0). To view a copy p under S, where op(S) is the updated schema obtained by
of the license, visit http://creativecommons.org/licenses/by-sa/3.0/. applying op to S.
DChanges 2013, September 10th, 2013, Florence, Italy. To illustrate our algorithm, let us show a simple example.
ceur-ws.org Volume 1008, http://ceur-ws.org/Vol-1008/paper4.pdf . Let D be the DTD in Fig. 1(a), and suppose that element
students is inserted as the child of school (Fig. 1(b)). Ac- An XML document is modeled as a labeled ordered tree,
cording to this schema update, our algorithm transforms the and a schema is modeled as a tree automaton. Formally, a
following XPath expression tree automaton is a quadruple T A = (N, Σ, s, P ), where
/school/student[supervisor]/name • N is a set of states (element types),
• Σ is a finite set of element names,
into the following:
• s ∈ N is the start state,
/school/students/student[supervisor]/name.
• P is a set of transition rules of the form X → a(reg)
Then, suppose that element supervisor is deleted from the or X → Y , where X, Y ∈ N and reg is a regular
DTD in Fig. 1(b). In this case, it is impossible to transform expression over N .
the above XPath expression into an equivalent one. Thus,
For a transition rule X → a(reg), we say that X is the left-
as an alternative answer our algorithm deletes supervisor
hand side, a(reg) is the right-hand side, a is the label, and
from the above expression and returns the following:
reg is the content model of the rule. For example, consider
/school/students/student/name. the tree automaton T Aa shown in Fig. 2. This is equiva-
lent to the DTD in Fig. 1(a). “School” in N is the type of
Although these DTDs are very small, schemas used in prac- element “school” , and so on. We assume that element “pc-
tice are much larger and complex [6]. Therefore, a user tends data” represents an arbitrary string. By L(T A) we mean the
not to understand the entire structure of a schema exactly, language of tree automaton T A, i.e., the set of trees “valid”
and thus our algorithm is helpful to transform XPath ex- against T A.
pressions appropriately according to schema evolutions. Following [15], we define the product of tree automata. Let
In this paper, schema is modeled as (unranked) tree au- TA1 = (N1 , Σ1 , s1 , P1 ) and TA2 = (N2 , Σ2 , s2 , P2 ) be tree
tomaton, which is equivalent to regular tree grammar. Tree automata. Without loss of generality, we assume that Σ1 =
automaton is the formal model of RELAX NG, and XML Σ2 = Σ. First, we define the product reg 1 ⊕ reg 2 , where reg 1
Schema and DTD can also be modeled by tree automa- is a regular expression over N1 and reg 2 is a regular expres-
ton [14]. Moreover, tree automaton is equivalent to special- sion over N2 . Then reg 1 ⊕ reg 2 is a regular expression over
ized DTD [17]. Thus, our algorithm is applicable to most of N1 × N2 such that n11 n21 · · · ni1 matches reg 1 and n12 n22 · · · ni2
formal and practical XML schema languages. As for XPath, matches reg 2 if and only if [n11 , n12 ][n21 , n22 ] · · · [ni1 , ni2 ] matches
we focus on an XPath fragment using child and descendant- reg 1 ⊕ reg 2 , where n11 n21 · · · ni1 is a sequence of states in N1
or-self axes with predicates. Although our XPath fragment and n12 n22 · · · ni2 is a sequence of states in N2 . reg 1 ⊕ reg 2 is
supports no upward axes, this gives usually little problem constructed as follows.
since the majority of XPath queries uses only downward 1. reg ′1 is obtained from reg 1 by replacing each state n
axes[10]. Thus, we believe that our algorithm is useful to by [n1 , n12 ]|[n1 , n22 ]| · · · |[n1 , nk2 ], where n12 , n22 , · · · , nk2
correct a large number of XPath queries. is an enumeration of N2 . Similarly, reg ′2 is obtained
from reg 2 .
Related Work
The study most related to this paper is [13]. This study 2. Construct two automata A1 and A2 from reg ′1 and
proposes an algorithm for transforming XPath expressions reg ′2 , respectively.
according to schema evolution, assuming that a schema 3. Construct the product automaton of A1 and A2 .
monotonically increases (no element can be deleted from a
schema). On the other hand, this paper has no such an as- 4. Construct a regular expression, reg 1 ⊕ reg 2 , from the
sumption and allows more general schema updates. Since product automaton.
actual schema evolutions usually involve element deletions The product automaton of TA1 and TA2 is TA3 = (N1 ×
or some similar updates, our algorithm is more practical in N2 , Σ, s1 × s2 , P3 ), where
real world situations. To the best of our knowledge, there
is no study on transforming XPath expressions according to P3 = {[n1 , n2 ] → a(reg1 ⊕ reg2 ) | (n1 → a(reg1 ) ∈ P1 ,
schema evolution, except [13]. However, several studies deal n2 → a(reg2 ) ∈ P2 )
with update operations to schemas. For example, Ref. [18] ∪{[n1 , n2 ] → [n1 , n′2 ] | (n1 ∈ N1 , n2 → n′2 ∈ P2 )}
proposes a “complete” set of update operations to DTDs.
Refs. [12, 9] propose update operations schemas and algo- ∪{[n1 , n2 ] → [n′1 , n2 ] | (n1 → n′1 ∈ P1 , n2 ∈ N2 )}}.
rithms for extracting “diff” between two schemas. Refs. [7, By definition, it is immediate that for any tree t, t ∈ L(T A3 )
8, 19] propose update operations that assures any updated if and only if t ∈ L(T A1 ) and t ∈ L(T A2 ).
schema contains its original schema. Recently, Ref. [16] Let Σ be a set of element names. We define XPath ex-
introduces a taxonomy of possible problems induced by a pression p as follows.
schema change, and gives an algorithm to detect such prob-
lems. Ref. [11] studies query-update independence analysis, • p ::= /p′
and shows that the performance of [3] can be drastically • p′ ::= χ :: l | p′ /p′ | p′ [q], where l ∈ Σ
enhanced in the use of µ-calculus.
• χ ::=child | descendant-or-self
2. DEFINITIONS • q ::= p′
In this section, we define tree automaton and the product We call each χ ∈ {child, descendant-or-self} axis, and q pred-
of tree automata. icate, respectively.
N = {School, Student, ID, Name, Address, Supervisor, Pcdata},
Σ = {school, student, id, name, address, supervisor, pcdata},
P = {School → school(Student∗ ), Student → student(ID Name Address Supervisor?), ID → id(Pcdata),
Name → name(Pcdata), Address → address(Pcdata), Supervisor → supervisor(Pcdata), Pcdata → pcdata(ϵ)}.
Figure 2: Tree automaton T Aa = (N, Σ, School, P )
• del opr(A, a, i): Deletes the operator at position i in
the content model of r, where r is the transition rule
in P such that the left-hand side of r is A and that the
label of r is a.
• nest state(A, a, B, b, i): Replaces the subexpression E
at position i (i.e., subtree rooted at position i) in con-
tent model of r by state B, where r is the transition
rule in P such that the left-hand side of r is A and
Figure 3: Tree representation of (A|B)∗ that the label of r is a. Moreover, a transition rule
B → b(E) is added to P .
• unnest state(A, a, b, i): This is the inverse operation
3. UPDATE OPERATION TO TREE AU- of nest state, and replaces the state B at position i
TOMATON in the content model of r by reg, where (1) r is the
In this section, we define update operations to tree au- transition rule in P such that the left-hand side of r is
tomaton. A and that the label of r is a and (2) reg is the content
Let reg be a regular expression. To define update oper- model of the transition rule whose left-hand side is B
ations to a tree automaton, we need to identify the posi- and label is b.
tions of states and operators in reg. Thus we define the set
• replace state(A, a, B, i): Replaces the state at position
pos(reg) of positions in a reg, as follows.
i in the content model of r by B. In this paper, this
• If reg = ϵ or reg = a (a ∈ Σ), then pos(reg) = {λ}. operation is treated as a pair of unnest state(A, a, i)
and nest state(A, a, B, b, i). This operation is used in
• Otherwise, pos(reg) = {λ} ∪ {u | u = vw, 1 ≤ v ≤ order to “rename” element names.
n, w ∈ pos(reg v )}, where reg v is a subexpression con- For example, consider the tree automaton T Aa in Fig. 2.
sisting of the descendants of v in reg. Applying nest state(School, school, Students, students, λ)
to T Aa , we obtain the tree automaton T Ab in
For example, let reg = (A|B)∗ . Then pos(reg) = Fig. 4. Then applying del opr(Student, student, 4) and
{λ, 1, 11, 12}. As shown in Fig. 3, λ is the position of ∗, del state(Student, student, 4) to Tb , we obtain the tree
1 is the position of |, 11 is the position of A, and 12 is the automaton T Ac in Fig. 5.
position of B.
In the following, without loss of generality we assume
that for any state A and any element name a, there is at 4. TRANSFORMATION FROM XPATH
most one transition rule whose left-hand side is A and la- EXPRESSION INTO TREE AUTOMA-
bel is a. If there are transition rules A → a(reg1 ) and TON
A → a(reg2 ), then these can be merged into one transition
Our algorithm treats an XPath expression as a tree au-
rule A → a(reg1 |reg2 ).
tomaton. Thus, in this section we define a transformation
We now define update operations to a tree automaton
from an XPath expression into a tree automaton. This
T A = (N, Σ, s, P ) as follows.
transformation is based on [15].
• ins state(A, a, B, i): Inserts new state B at position i An XPath expression is transformed into a tree automaton
in the content model of r, where r is the transition rule as follows. First, we transform a location step χ :: l into a
in P such that the left-hand side of r is A and that the tree automaton. Each transformed tree automaton has input
label of r is a. and output states, corresponding to the “context node” for
χ :: l and the “result node” selected by χ :: l, respectively.
• ins opr(A, a, opr, i): Inserts an operator opr (∗, +, ?) For an XPath expression p of the form p1 /p2 , p′ [q], or /p′ , we
at position i in the content model of r, where r is the transform p into a tree automaton in a bottom-up manner,
transition rule in P such that the left-hand side of r is according to the structure of p.
A and that the label of r is a. We first define a tree automaton (N, Σ, s, P ) correspond-
ing to location step χ :: l. In the following, A → σ(· · · )
• del state(A, a, i): Deletes the state at position i in the denotes {A → σ(· · · ) | σ ∈ Σ}.
content model of r, where r is the transition rule in • If χ = child, then
P such that the left-hand side of r is A and that the
label of r is a. – The initial state s is B.
N = {School, Students, Student, ID, Name, Address, Supervisor, Pcdata},
Σ = {school, students, student, id, name, address, supervisor, pcdata},
P = {School → school(Students), Students → students(Student∗ ), Student → student(ID Name Address Supervisor?),
ID → id(Pcdata), Name → name(Pcdata), Address → address(Pcdata), Supervisor → supervisor(Pcdata), Pcdata → pcdata(ϵ)}.
Figure 4: Tree automaton T Ab = (N, Σ, School, P )
N = {School, Students, Student, ID, Name, Address, Pcdata},
Σ = {school, students, student, id, name, address, pcdata},
P = {School → school(Students), Students → students(Student∗ ), Student → student(ID Name Address), ID → id(Pcdata),
Name → name(Pcdata), Address → address(Pcdata), Pcdata → pcdata(ϵ)}.
Figure 5: Tree automaton T Ac = (N, Σ, School, P )
– P consists of the following transition rules. The case of p[q]
∗ A → σ(A∗ ) Let T Ap = (Np , Σ, sp , Pp ) be the tree automaton of p and
∗ B → σ(A∗ (B|C)A∗ ) T Aq = (Nq , Σ, sq , Pq ) be the tree automaton of q. Moreover,
∗ B→C let T A = (N, Σ, s, P ) be the product automaton T Ap and
∗ C → σ(A∗ DA∗ ) T Aq . Then the tree automaton T Ap[q] of p[q] is defined as
∗ D → l(A∗ ) T Ap[q] = (Np[q] , Σ, s, P ), where
• If χ = descendant-or-self, then Np[q] = {[np , nq ] | (np ∈ N Op , nq ∈ N Iq ) ∨
(np ∈ (Np − N Op ), nq ∈ (Nq − N Iq ))}.
– Initial state s is B1 .
– P consists of the following transition rules. The set N Ip[q] of input states and the set N Op[q] of output
∗ A → σ(A∗ ) states of T Ap[q] are defined as follows.
∗ B1 → σ(A∗ (B1 |C)A∗ ) N Ip[q] = {[np , nq ] | np ∈ N Ip , nq ∈ (Nq − N Iq )},
∗ B1 → C N Op[q] = {[np , nq ] | (np ∈ N Op , nq ∈ N Iq )}.
∗ C → σ(A∗ (B2 |D)A∗ )
∗ C→D The case of /p
∗ B2 → σ(A∗ (B2 |D)A∗ ) Let χ :: l be the first location step in p and let T Ap =
∗ D → l(A∗ ) (Np , Σ, sp , Pp ) be the tree automaton of p. Then the tree
automaton T A/p of /p is defined as T A/p = (N/p , Σ, R, P/p ),
The input state and the finite set of output states of a finite where
tree automata are {C} and {D}, respectively. C represents
the context node and D represents the node selected by the N/p = Np ∪ {R},
location step χ :: l. P/p = Pp ∪ {R → root(D)},
We next define the transformation from an XPath expres-
sion p into a tree automaton. We have to consider the case where R is the initial state of T A/p and root is the element
where p = p1 /p2 , p = p[q], and p = /p. In the following, for name corresponding to the root node.
a tree automaton T Ap of p, Np denotes the set of states of The set N I/p of input states and the set N O/p of output
T Ap , N Ip ⊂ Np denotes the set of input states of T Ap , and states of T A/p are defined as follows.
N Op ⊂ Np denotes the set of output states of T Ap . N I/p = {R},
The case of p1 /p2 N O/p = N Op ,
Let T Ap1 = (Np1 , Σ, sp1 , Pp1 ) be the tree automaton of p1 where N Op is the set of output states of T Ap .
and T Ap2 = (Np2 , Σ, sp2 , Pp2 ) be the tree automaton of p2 .
Moreover, let T A = (N, Σ, s, P ) be the product automaton
of T Ap1 and T Ap2 . Then the automaton T Ap1 /p2 of p1 /p2
5. ALGORITHM FOR TRANSFORMING
is defined as T Ap1 /p2 = (Np1 /p2 , Σ, s, P ), where XPATH EXPRESSION ACCORDING TO
Np1 /p2 = {[np1 , np2 ] | (np1 ∈ N Op1 , np2 ∈ N Ip2 ) ∨ SCHEMA EVOLUTION
(np1 ∈ (Np1 − N Op1 ), np2 ∈ (Np2 − N Ip2 ))}. In this section, we show an algorithm for transforming a
given XPath expression according to schema evolution.
The set N Ip1 /p2 of input states and the set N Op1 /p2 of out- To describe our algorithm, we need some definitions. For
put states of T Ap1 /p2 are defined as follows. an XPath expression p, the selection path of p is the XPath
expression obtained by dropping every predicate from p.
N Ip1 /p2 = {[np1 , np2 ] | np1 ∈ N Ip1 , np2 ∈ (Np2 − N Ip2 )},
Let p be an XPath expression of the form /p1 [q]/p2 . Then
N Op1 /p2 = {[np1 , np2 ] | np1 ∈ (Np1 −N Op1 ), np2 ∈ N Op2 }. /p1 /q is called a predicate path of p (the predicate path for
a predicate in q can be defined similarly). For example, Let – If all the result elements retrieved by p disappear
p = /a[f ]/b[c/e]/d. Then /a/b/d is the selection path of p, from T A due to op, the result of p becomes empty.
and /a/f and /a/b/c/e are the predicate paths of p. Then our algorithm returns nil.
Let us first show the “main” algorithm (Fig. 6). Let op be
– Otherwise, some element of op(T A) can still be
the update operation applied to tree automaton T A. More-
retrieved by p. Thus our algorithm transforms p
over, let p be the input XPath expression, p0 be the selection
so that such elements are retrieved.
path of p and p1 , · · · , pk be the predicate paths of p. We
transform pi to p′i according to op, for each i = 0, 1, · · · , k,
and merge the resulting k expressions p′0 , p′1 , · · · , p′k as the Now we present function Transform (Fig.7).
result. More concretely, the algorithm first partition p into state(A, a, i, P ) in steps 16, 23, and 29 denotes the
p0 , p1 , · · · , pk (step 1). Then p0 is transformed into p′0 ac- state at position i in the content model of r, where r is
cording to op by function Transform (shown later). If the the transition rule in P such that its left-hand side is A
result of p0 becomes empty due to del state, then p′0 = nil and its label is a. If op is ins state, ins opr, or del opr, p
and the transformation of p is terminated (steps 3 and 4). is unchanged (steps 4 to 7). If op is del state, we examine
Otherwise, we transform pi for each i = 1, · · · , k (steps whether the result of p becomes empty under op(T A) (steps
6 to 8), by Transform. Finally, p0 , p1 , · · · , pk are merged 8 to 11). If the result of p does not become empty, p is
and returned as the result. If a predicate path p′i is nil, unchanged (steps 11 to 12). If the result of p becomes
then p′i is just ignored when merged. For example, let empty and p is the selection path, the algorithm returns nil
p = /a[f ]/b[c/e]/d. Then p0 = /a/b/c, p1 = /a/f , and (steps 13 to 14). Otherwise (i.e., p is a predicate path), we
p2 = /a/b/c/e. Suppose that c is deleted by unnest state. delete the suffix of p that becomes invalid due to op (steps
Then we obtain p′0 = /a/b/d, p′1 = /a/f , and p′2 = /a/b/e 15 to 20). In step 17, P ′′ is obtained in step 2, and we
due to Transform, and merging these three expressions we say that r is a transition rule from A to C if the left-hand
have p′ = /a[f ]/b[e]/d, which is the result of the algorithm. side of r contains A and a state in the content model of
Let us next consider function Transform. Let T A = r contains C. In step 18, we say that lsj corresponds to
(N, Σ, s, P ) be a tree automaton, op be an edit operation r ∈ P ′′ if r is the intersection of (i) the transition rule
to T A, and p be an XPath expression having no predicate. corresponding to lsj in Pp and (ii) some transition rule in
Our objective is to transform p into an XPath expression p′ P . If op is nest state(A, a, B, b, i), we examine whether we
so that the result of p′ under op(T A) coincides with that of p need to transform p (steps 23 to 24). If we need to do, we
under T A. However, this is sometimes impossible if a state insert a new location step child :: b to p (steps 25 and 26). If
(and its corresponding element) is deleted from T A. Thus, op is unnest state(A, a, b, i), then we also examine whether
Transform is constructed as follows. we need to transform p (steps 30 to 31). Moreover, if the
location step lsj+1 to be deleted has a predicate q and the
• If op is ins state, ins opr, or del opr, then p is un- axis of lsj+1 is descendant-or-self, we delete q (steps 32 to
changed. 33). Otherwise, we delete lsj+1 from p (steps 34 to 38).
Finally, let us present the time complexity of the al-
• If op is nest state(A, a, B, b, i), then location step
gorithm. Let T A = (N, T, s, P ) be a tree automaton
child :: b is inserted to p at the position that nest-
(schema) p be an XPath expression that is partitioned into
ing is occurred, if necessary (i.e., unless p contains a
p0 , p1 , p2 , · · · , pk . Then the algorithm runs in O(k · |p|2 ·
descendant-or-self axis that “masks” the nest state op-
|T A|), where |p| is the number of location steps in p and
eration).
|T A| is the size of T A.
• If op is unnest state(A, a, b, i), then a location step
whose node test is b is deleted from p, if it is not
“masked” by a descendant-or-self axis. 6. EXPERIMENTAL RESULTS
To verify if our algorithm transforms XPath expressions
• If op is del state, then p is modified as follows. appropriately under real world schemas, we implemented
our algorithm (in Ruby) and made a few experimentations.
Input : XPath expression p, tree automaton T A, update opera- We use two pairs of schemas, MSRMEDOC DTDs (version
tion op to T A. 2.1.1 and 2.2.2)[2] and the NLM Journal Publishing Tag Set
Output : XPath expression or nil
Tag Library DTDs (version 2.3 and 3.0)[1].
1. Partition p into the selection path p0 and the predicate path First, we give the evaluation of our algorithm on MSRME-
p 1 , p2 , · · · , p k .
DOC DTDs. Let D211 be the version 2.1.1 MSRMEDOC
2. p′0 ← Transform(p0 , T A, op); DTD and D222 be the version 2.2.2 MSRMEDOC DTD. The
3. if p′0 = nil then number of elements of D211 is 185 and that of D222 is 205.
4. return nil; Table 1 shows the number of update operations between
D211 and D222 , where “others” are attributes insertions (not
5. else
supported by our algorithm).
6. for i ← 1 to k do We generate 90 XPath expressions under D211 by XQ-
7. p′i ← Transform(pi , T A, op); gen[20]. The average size (i.e., number of location steps) of
8. end the XPath expressions is 5, where the minimum size is 4 and
9. Merge p′0 , p′1 , · · · , p′k into p′ . the maximum size is 7. Each XPath expression uses child
axes and at most one descendant-or-self axes (78 XPath ex-
10. return p′ ;
pressions use a descendant-or-self axis). There are 5 XPath
expressions whose result becomes empty under D222 . Since
Figure 6: Main algorithm there is no predicate in these 90 XPath expressions, we
Table 1: Update operations between D211 and D222 and between D23 and D30
ins state del state nest state unnest state replace state ins opr/del opr others total
D211 → D222 61 11 27 0 0 60 32 191
D23 → D30 504 82 3 0 24 0 120 733
choose 4 expressions and add a predicate to each of 4 ex-
pressions (given later).
Function Transform(p, T A, op) We transform the above 90 XPath expressions by our al-
Input : XPath expression p = /ls1 / · · · /lsm , tree automaton gorithm. For 85 expressions, the elements retrieved under
T A = (N, Σ, s, P ), update operation op to T A. D222 coincides with the elements retrieved under D211 . For
Output : XPath expression or nil the rest 5 XPath expressions, our algorithm returns nil since
1. Construct the tree automaton T Ap = (Np , Σ, sp , Pp ) of p the results of these expressions become empty under D222 .
2. Construct the product automaton T A′′ = (ND × These 5 XPath expressions are shown in Table 2 (deleted
Np , Σ, sD × sp , P ′′ ) of T A and T Ap elements are italicized). Since elements LIST, NOTE, FIG-
3. switch op URE, and FORMULA (child elements of REMARK) and an
4. case ins state(A, a, B, i) element PRIVATE-CODES (child element of COMPANY-
5. case ins opr(A, a, opr, i) DOC-INFO) are deleted, the results of these XPath expres-
6. case del opr(A, a, i) sions become empty (note that if an element is deleted from
7. return p; a content model, then its descendants cannot be retrieved
8. case del state(A, a, i) either).
9. Construct the tree automaton T A′ = (N ′ , Σ, s′ , P ′ ) of The four XPath expressions with a predicate, denoted
op(T A) p1 , p2 , p3 , p4 , are transformed as follows (Table 3).
10. Construct the product automaton T A′p = (N ′ ×Np , Σ, s′ ×
sp , Pp′ ) of T A′ and T Ap • Since element REMARK (child element of
11. if L(T A′p ) ̸= ∅ then COMPANY-REVISION-INFO) is deleted, the
12. return p; predicate of p1 is deleted.
13. else if p is the selection path then
• Since element REMARK is deleted, the last location
14. return nil;
step in the predicate of p2 is deleted.
15. else
16. C ← state(A, a, i, P ) // C is deleted • Since element REMARK is deleted, the location
17. if P ′′ contains a transition rule r from A to C then step corresponding to an element REMARK in p3 is
18. Let lsj be the location step in p corresponding to r; deleted, therefore our algorithm return nil.
19. p ← /ls1 / · · · /lsj−1 ;
20. end • Since element L-1 is inserted between P and FT and
21. end between P and STD, two location steps L-1 are in-
22. case nest state(A, a, B, b, i) serted to p4 .
23. C ← state(A, a, i, P ) // C is nested
24. if P ′′ contains a transition rule r from A to C then As shown above, we can say that every XPath expres-
25. Let lsj be the location step in p corresponding to r; sion is transformed appropriately by our algorithm, even if
26. p ← /ls1 / · · · /lsj /child :: b/lsj+1 / · · · /lsm ; del state’s are involved in the schema evolution. The total
27. end execution time of the algorithm for the 90 XPath expres-
28. case unnest state(A, a, b, i)
sion and 191 update operations is 9.58 sec, thus it takes an
29. C ← state(A, a, i, P ) C is unnested
average 0.111 sec per one XPath expression.
30. if P ′′ contains a transition rule r from A to C then
We also made a similar experimentation using the NLM
31. Let lsj be the location step in p corresponding to r;
Journal Publishing Tag Set Tag Library. Let D23 be the
32. if the axis of lsj+1 is descendant-or-self and lsj+2 is the
version 2.3 The NLM Journal Publishing Tag Set Tag Li-
first location step of a predicate then brary DTD and D30 be the version 3.0 DTD. The number
33. p ← /ls1 / · · · /lsj−1 ; of elements of D23 is 211 and that of D30 is 233. Table 1
34. else shows the number of update operations between D23 and
35. axis ← the axis of lsj+1 ; D30 .
36. l ← the node test of lsj+2 ; We generate 97 XPath expressions under D23 by XQgen.
37. p ← /ls1 / · · · /lsj /axis :: l/lsj+3 / · · · /lsm ; The average size of the XPath expressions is 6. Each XPath
38. end expression uses child axes and at most once descendant-or-
39. end self axes, where 95 XPath expressions include descendant-
40. end or-self axes. There are 10 XPath expressions whose result
41. return p; becomes empty under D30 . There is no XPath expression
with a predicate, thus we choose 3 XPath expressions and
added a predicate to each expression.
Figure 7: Function Transform
We transform the 97 XPath expressions by our algorithm.
For 87 expressions, the elements retrieved under D30 coin-
cides with the elements retrieved under D23 . For the rest
10 XPath expressions, our algorithm returns nil since the [7] G. Guerrini, M. Mesiti, and D. Rossi. Impact of XML
results of these expressions become empty under D30 , which schema evolution on valid documents. In Proc.
are listed in Table 4. Since elements citation, contract-num, WIDM, pages 39–44, 2005.
contract-sponsor in element p are deleted, the results of [8] K. Hashimoto, Y. Ishihara, and T. Fujiwara. A
these XPath expressions become empty. The three XPath proposal of update operations for schema evolution in
expressions with a predicate, denoted p5 , p6 , p7 , are trans- XML databases and their properties on preservation
formed as follows (Table 5). of schema’s expressive power. IEICE Transactions on
Information and Systems (Japanese Edition),
• Since element chem-struct (child element of ack) is J90-D(4):990–1004, 2007.
deleted, the predicate of p5 is deleted.
[9] K. Horie and N. Suzuki. Extracting differences
• Since element chem-struct-wrapper is renamed to between regular tree grammars. In Proc. ACM SAC,
chem-struct-wrap, the predicate of p6 is renamed ac- pages 859–864, 2013.
cordingly. [10] Z. G. Ives, A. Y. Halevy, and D. S. Weld. An XML
query engine for network-bound data. The VLDB
• Since element custom-meta-wrap is renamed to Journal, 11(4):380–402, 2002.
custom-meta-group, the corresponding location step in [11] M. Junedi, P. Genevès, and N. Layaı̈da. Xml
p7 is renamed to custom-meta-group. query-update independence analysis revisited. In
Proceedings of the 2012 ACM symposium on
Again our algorithm seems to work well despite of del state Document engineering, DocEng ’12, pages 95–98, 2012.
operations. These results suggest that our algorithm can be
[12] E. Leonardi, T. T. Hoai, S. S. Bhowmick, and
applied to XPath expressions under real world schema evo-
S. Madria. DTD-Diff: a change detection algorithm
lutions. The total execution time of the algorithm for the 97
for DTDs. In Proc. DASFAA, pages 817–827, 2006.
XPath expression and 733 update operations is 77.561 sec,
[13] T. Morimoto, K. Hashimoto, Y. Ishinara, and
thus it takes an average 0.800 sec per one XPath expression.
T. Fujiwara. Translating XPath queries according to
XML schema evolution based on the tree-embedding
7. CONCLUSION relation. In IEICE Technical Report, vol.107, no.131,
In this paper, we proposed an algorithm for transforming DE2007-40, pages 109–114, 2007 (in Japanese).
an XPath expression according to schema evolution allow- [14] M. Murata, D. Lee, M. Mani, and K. Kawaguchi.
ing element deletions. However, this is just an ongoing work Taxonomy of XML schema languages using formal
and we have a lot to do. First, we would like to extend language theory. ACM Transactions on Internet
our algorithm so that it can handle more general XPath ex- Technology, 5:660–704, 2005.
pressions, e.g., supporting sibling and parent axes. Second, [15] A. Ohno, Y. Ishihara, and T. Fujiwara. Method of
we use only two schema evolutions in our experimentation. analyzing the correspondence between types defined
Thus we would like to evaluate our algorithm under more by an XML schema and XPath subexpressions. In
real world schemas. Third, our algorithm supports neither IPSJ SIG-Report, 2009-MPS-75(20), pages 1–7, 2009
attribute nor entity declaration in schemas. These should (in Japanese).
be incorporated into our algorithm. [16] R. Oliveira, P. Genevès, and N. Layaı̈da. Toward
automated schema-directed code revision. In
8. ACKNOWLEDGEMENT Proceedings of the 2012 ACM symposium on Document
This work is partly supported by Grants-in-aid for Scien- engineering, DocEng ’12, pages 103–106, 2012.
tific Research (23500110). [17] Y. Papakonstantinou and V. Vianu. DTD inference for
views of XML data. In Proc. ACM PODS, pages
35–46, 2000.
9. REFERENCES [18] H. Su, D. Kramer, L. Chen, K. Claypool, and E. A.
[1] “Journal Publishing Tag Set” NLM Journal Archiving Rundensteiner. XEM: managing the evolution of XML
and Interchange Tag Suite. documents. In Proc. IEEE RIDE, pages 103–110,
http://dtd.nlm.nih.gov/publishing/. 2001.
[2] “MSR Download” MSR Home. [19] N. Suzuki. An edit operation-based approach to the
http://www.msr-wg.de/medoc/downlo.html. inclusion problem for DTDs. In Proc. ACM SAC,
[3] M. Benedikt and J. Cheney. Destabilizers and pages 482–488, 2007.
independence of xml updates. Proc. VLDB Endow., [20] Y. Wu, N. Lele, R. Aroskar, S. Chinnusamy, and
3(1-2):906–917, 2010. S. Brenes. XQGen: an algebra-based xpath query
[4] A. Berglund, S. Boag, D. Chamberlin, M. F. generator for micro-benchmarking. In Proc. CIKM,
Fernandez, M. Kay, J. Robie, and J. Simeon, editors. CIKM ’09, pages 2109–2110, 2009.
XML Path Language (XPath) 2.0 (Second Edition).
http://www.w3.org/TR/xpath20/.
[5] T. Bray, J. Paoli, C. M. Sperberg-McQueen, E. Maler,
and F. Yergeau, editors. Extensible Markup Language
(XML) 1.0 (Fifth Edition).
http://www.w3.org/TR/xml/.
[6] B. Choi. What are real DTDs like? In Proc. WebDB,
pages 43–48, 2002.
Table 2: XPath expressions whose results become empty under D222
/MSRREP/MATCHING-DCIS/MATCHING-DCI/REMARK/LIST/ITEM
/MSRREP/MATCHING-DCIS/MATCHING-DCI/REMARK/NOTE
/MSRREP/MATCHING-DCIS/MATCHING-DCI/REMARK/FIGURE/DESC
/MSRREP/MATCHING-DCIS/MATCHING-DCI/REMARK/FORMULA/FORMULA-CAPTION/LONG-NAME
//ADMIN-DATA/COMPANY-DOC-INFOS/COMPANY-DOC-INFO/PRIVATE-CODES/PRIVATE-CODE
Table 3: XPath expressions with predicate under D211 and D222
XPath expression p1 : //DOC-REVISION/COMPANY-REVISION-INFOS/COMPANY-REVISION-
INFO[REMARK]/COMPANY-REF
Transformed result: //DOC-REVISION/COMPANY-REVISION-INFOS/COMPANY-REVISION-INFO/COMPANY-REF
XPath expression p2 : //DOC-REVISION[COMPANY-REVISION-INFOS/COMPANY-REVISION-
INFO/REMARK]/MODIFICATIONS/MODEFICATION
Transformed result: //DOC-REVISION[COMPANY-REVISION-INFOS/COMPANY-REVISION-
INFO]/MODIFICATIONS/MODEFICATION
XPath expression p3 : //DOC-REVISION[COMPANY-REVISION-INFOS/COMPANY-DOC-INFO/PRIVATE-
CODES/PRIVATE-CODE]/DOC-REVISIONS/DOC-REVISION/COMPANY-REVISION-
INFOS/COMPANY-REVISION-INFO/REMARK
Transformed result: nil
XPath expression p4 : //P[FT]/STD
Transformed result: //P[L-1/FT]/L-1/STD
Table 4: XPath expression whose results become empty under D30
//table-wrap-group/table-wrap/speech/p/citation/conf-name
//table-wrap-group/table-wrap/speech/p/citation/conf-loc
//table-wrap-group/table-wrap/speech/p/citation/comment
//table-wrap-group/table-wrap/speech/p/citation/article-title
//table-wrap-group/table-wrap/speech/p/citation/annotation
//list-item/p/contract-num/named-content/ack/p
//p/contract-num/named-content/ack/boxed-text/sec
//p/contract-sponsor/named-content/verse-group/verse-group/verse-group
//statement/p/contract-sponsor/named-content/verse-group/verse-group
//statement/p/contract-sponsor/named-content/verse-group/verse-line
Table 5: XPath expressions with predicate under D23 and D30
XPath expression p5 : //list-item/p/contract-num/named-content/ack[chem-struct]/p
Transformed result: //list-item/p/contract-num/named-content/ack/p
XPath expression p6 : //license/p/preformat/named-content/supplementary-material[chem-struct-wrapper]/disp-
quote
Transformed result: //license/p/preformat/named-content/supplementary-material[chem-struct-wrap]/disp-quote
XPath expression p7 : /article/front/journal-meta/custom-meta-wrap[custom-meta/meta-name]/custom-meta/meta-
value
Transformed result: /article/front/journal-meta/custom-meta-group[custom-meta/meta-name]/custom-meta/meta-
value