<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Coupling Fragments of XPath with XML Indexing and Query Decomposition</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>George H.L. Fletcher</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dirk Van Gucht</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yuqing Wu</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marc Gyssens</string-name>
          <email>marc.gyssens@uhasselt.be</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jan Paredaens</string-name>
          <email>jan.paredaens@ua.ac.be</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Hasselt University &amp; Transnational University Limburg</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Indiana University</institution>
          ,
          <addr-line>Bloomington</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>University of Antwerp</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>Recent studies have proposed structural summary techniques for pathquery evaluation on semi-structured data sources. One major line of this research has been the introduction of the DataGuide, 1-index, 2-index, and A(k) indices, and subsequent investigations and generalizations. Another recent study has considered structural characterizations of fragments of XPath, the standard path navigation language for XML documents. In this paper we provide a methodology on XPath query processing that couples these two areas of research on structural indices and query languages. To illustrate this methodology, we apply it to couple an upward-only XPath fragment with the A(k) and P (k) structural indices. With an eye towards applying this result to XPath query processing, we (1) show how upward-only XPath expressions can be evaluated directly on the corresponding indices and (2) leverage these results to develop generic techniques for making effective use of A(k) and P (k) indices for more general, frequently occurring XPath expressions.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>XML (eXtensible Markup Language) [2] provides a standard data format that is flexible
enough to be customized for various domains. An increasing number of applications
use XML as their data model or as the format for exchanging data among
heterogeneous data repositories for the purpose of collaboration, data integration, and
information sharing, and querying XML document databases.</p>
      <p>XQuery is currently the most popular XML query language [4]. The fundamental
building block of XQuery is XPath [5]. XPath expressions are used to specify small
node-labeled trees which match portions of the hierarchical XML data. How to support
efficient access to XML data using XPath continues to be a critical research problem in
this domain.</p>
      <p>In XPath query evaluation, indices similar to those used in relational database
systems – namely, value indices on tags and text values – are first used, together with
structural join algorithms [3, 1, 21]. This approach turns out to be simple and efficient.</p>
      <p>Index</p>
      <p>Description</p>
      <p>Labeling
DataGuide [7] Every unique Can not use in- Target node set
reachlabel path of a coming path for able by an XPath
exsource appear node labeling. pression is the superset
exactly once in of the result.</p>
      <p>the index graph.</p>
      <p>Strong Distinguishes Nodes that share Labeled by the Can be used to directly
DataGuide [7] nodes that are the same incom- unique incoming answer linear XPath
1-index [14] reachable by ing path up to the path up to the query of any length.</p>
      <p>different paths. root root
2-index[14] Distinguishes Pairs (vectors) of Can be used to
diT-index[14] pairs (vectors) of nodes that share rectly answer queries
nodes that shares the same paths that match a given
temthe same set of plate.</p>
      <p>paths.</p>
      <p>A(k)-index Distinguishes Nodes that share Can be used to
evalu[12] nodes with in- the same incom- ate linear XPath
exprescoming path up ing path, up to sion directly when local
to length k. length k similarity is sufficient.</p>
      <p>D(k)- Uses local index Nodes that shares Need verification when
index[17] to adapt to query the same incom- the XPath query
expresM (k)- workload ing path, accord- sion is longer than k.
index [9] ing to local
simi</p>
      <p>larity.</p>
      <p>However, the structural containment relationships native to XML data are not directly
captured by value indices.</p>
      <p>To directly capture the structural information of XML data, a family of XML
indices has been introduced. DataGuide [7] was the first to be proposed, followed by
the 1-index [14], which is based on a notion of bisimulation among nodes in an XML
document. These indices can be used to evaluate path expression accurately without
accessing the original data graph. Milo and Suciu [14] also introduced the 2-index and
T-index, based on similarity of pairs (vectors) of nodes. Unfortunately, these and other
early structural indices tend to be too large for practical use because they typically
maintain too fine-grained structural information about the document [10, 18].</p>
      <p>To remedy this, Kaushik et al. introduced the A(k)-index [12] which uses a notion of
bisimilarity on nodes relativized to paths of length k from nodes. This captures localized
structural information of a document, and can support path expressions of length up to
k. Focusing just on local similarity, the A(k)-index can be substantially smaller than
the 1-index and others.</p>
      <p>Several works have investigated maintenance and tuning of the A(k) indices. The
D(k)-index [17] and M (k)-index [9] extend the A(k)-index to adapt to query
workload. Yi et al. [20] developed update techniques for the A(k)-index and 1-index. Finally,
the integrated use of structural and values indices has been explored [11], and there have
also been investigations on covering indices [10, 18] and index selection [16, 19]. The
characteristics of many of these structural indices are summarized in Table 1.</p>
      <p>The introduction of structural indices for XML data has lead to significant
improvements in the performance of XPath expression evaluation. Furthermore, great advances
have been made on the theoretical and practical underpinnings of expression
processing [13]. However, to date there lacks a general framework for investigating some of the
most fundamental questions about using indices in expression evaluation. Such
questions include:
1. For which classes of XPath expressions are structural indices ideally suited?
2. Then, for these classes, how are its expressions optimally evaluated with the index?
3. Can the answers to these questions be bootstrapped to provide general techniques
for evaluation of arbitrary XPath expressions?</p>
      <p>
        To answer question (
        <xref ref-type="bibr" rid="ref1">1</xref>
        ), we follow a general, theoretical approach to providing
structural characterizations of fragments of XPath recently proposed by Gyssens et al. [8]. In
that paper, the well-known concept of semantic indistinguishability of objects and
relationships in a query language (in our case, nodes and paths in the tree-representations
of the XML documents, and XPath) is related to the purely structural, topological,
indistinguishability of nodes and paths in these tree-representations. For example, in that
paper it is shown that for a downward fragment of XPath, semantic indistinguishability
of nodes is equivalent to downward-bisimilarity indistinguishability of these nodes.
      </p>
      <p>
        Recaputilating, recent studies have (
        <xref ref-type="bibr" rid="ref1">1</xref>
        ) introduced structural indexes for efficient
XPath query processing and (
        <xref ref-type="bibr" rid="ref2">2</xref>
        ) related semantic-indistinguishability with structural
indistinguishabilty of nodes and paths for certain fragments of XPath. In this paper we
introduce a new methodology that couples these two areas in a principled way, and that
is aimed for the benefit of efficiently evaluating classes of XPath expressions.
      </p>
      <p>
        To illustrate this methodology, we apply it to couple the important concept of A(k)
and P (k) structural indices with a certain upward XPath fragment. With an eye towards
applying this result to XPath query processing, we (
        <xref ref-type="bibr" rid="ref1">1</xref>
        ) show how upward-only XPath
expressions can be evaluated directly on the corresponding A(k) and P (k) indices and
(
        <xref ref-type="bibr" rid="ref2">2</xref>
        ) leverage these results to develop XPath query-decomposition techniques for making
effective use of A(k) and P (k) indices for more general, but, in practice, frequently
occurring XPath expressions.
      </p>
      <p>
        In particular, we give a precise characterization of the family of A(k)-indices in
terms of a downward fragment of XPath (Section 3). With this result in hand, we
answer question (
        <xref ref-type="bibr" rid="ref2">2</xref>
        ) by showing how expressions in these fragments can be evaluated
directly in terms of the A(k) and P (k) indices (Section 3). Finally, to address question
(
        <xref ref-type="bibr" rid="ref3">3</xref>
        ), we leverage our results on (
        <xref ref-type="bibr" rid="ref1">1</xref>
        ) and (
        <xref ref-type="bibr" rid="ref2">2</xref>
        ) to develop techniques for making
effective use of A(k) and P (k) indices for important practical classes of XPath (Section
4). These results identify a new methodology for the study of the interaction between
structural indices and the evaluation of XPath expressions. Furthermore, the nature of
this methodology is such that it sheds light on techniques with immediate practical
application. As such, our methodology is a good example of how the theory and practice
for XPath query processing can be nicely linked.
      </p>
      <p>n0
n4 Project
n5 Project</p>
      <p>Project n6
n2
2.1 The XPath-Algebra
We first introduce the XML data model and algebraization of the navigational core
of XPath, following the presentation of Gyssens et al. [8]. In this paper, documents
are finite unordered node-labeled trees. More formally, a document D is a 4-tuple
(V; Ed; r; ¸), with V the finite set of nodes, Ed µ V £ V the set of edges, r 2 V
the root, and ¸ : V ! L a node-labeling function into a countably infinite set of labels
L. A visualization of a fragment of a document is given in Figure 1. Useful notions of
height and distance in a document are defined as follows.</p>
      <p>Definition 1. Let D = (V; Ed; r; ¸) be a document and n; m 2 V . The distance from
n to m in D, distance(n; m), is the length of the path from n to m in D. The height of
D, height(D), is the length of the longest path in D starting from the root r. In other
words, height(D) = maxfdistance(r; n) j n 2 V g.</p>
      <p>We next present an algebraization of the logical navigational core of XPath.
Definition 2. The XPath-algebra consists of the primitives "; ;; #; ", and ` together with
the operations on expressions E1 ¦ E2; E1[E2]; E1 [ E2; E1 \ E2, and E1 ¡ E2.4
Given a document D = (V; Ed; r; ¸), the semantics of an XPath-algebra expression
E on D, denoted E(D), is always a binary relation over V . The semantics for each
operation is given in Table 2.</p>
      <p>The XPath-algebra semantics reflects a “global” perspective of expressions being
evaluated on an entire document. There is also a “local” semantic perspective, in which
expressions are viewed as working on a particular node, as follows.</p>
      <p>Definition 3. Let E be an XPath-algebra expression and let D = (V; Ed; r; ¸) be a
document. For m 2 V , E(D)(m) = fn 2 V j (m; n) 2 E(D)g.</p>
      <p>To illustrate the XPath-algebra, the XPath query for retrieving all departments with
websites on the document of Figure 1, =Projects=Department[:=Web], can be
formulated in the XPath-algebra with the expression Projects ¦ # ¦ Department[# ¦ Web]
4 Since we are concerned in this study with reasoning on a fixed document of which the height
is known, it is not necessary to introduce ancestor or descendant axes.</p>
      <p>"(D) = f(m; m) j m 2 V g
;(D) = ;
# (D) = Ed
" (D) = Ed¡1
`(D) = f(m; m) j m 2 V and ¸(m) = `g
E1 [ E2(D) = E1(D) [ E2(D)
E1 \ E2(D) = E1(D) \ E2(D)
E1 ¡ E2(D) = E1(D) ¡ E2(D)
E1 ¦ E2(D) = f(m; n) j 9w : (m; w) 2 E1(D)</p>
      <p>&amp; (w; n) 2 E2(D)g
E1[E2](D) = f(m; n) 2 E1(D)</p>
      <p>j 9w : (n; w) 2 E2(D)g:
evaluated “locally” at the root node n0. As further illustrations of XPath-algebra
expressions, consider:
E1. “Retrieve all department names in projects.”</p>
      <p>Projects ¦ # ¦ Department ¦ # ¦ Name</p>
    </sec>
    <sec id="sec-2">
      <title>E2. “Retrieve the leaders of projects that have websites.”</title>
      <p>Department ¦ # ¦ Project[# ¦ Web] ¦ # ¦ Lead</p>
    </sec>
    <sec id="sec-3">
      <title>E3. “Retrieve the leaders of projects that do not have websites.”</title>
      <p>Department ¦ # ¦ Project ¦ # ¦ Lead ¡ E2
E4. “Retrieve the names of all departments that have a website and a project with a
website.”</p>
      <p>Projects ¦ # ¦ Department[# ¦ Web][# ¦ Project ¦ # ¦ Web] ¦ # ¦ Name
E5. “Retrieve all projects which are sub-projects of projects with a website.”</p>
      <p>Project[" ¦ Project ¦ # ¦ Web]</p>
      <p>Recall that in the “global” semantics of the XPath-algebra, expressions evaluate to
pairs of satisfying nodes. For example, E1(D) = f(n0; n3); (n0; n7)g and E5(D) =
f(n17; n17); (n18; n18)g.
2.2</p>
      <p>A(k) Indexes
Kaushik et al. have proposed the family of A(k)-indexes for graph-structured data,
based on a notion of node bisimilarity [12]. An A(k)-index is built on a partitioning
of the nodes of a semi-structured document. The definition of the equivalence relation
´A(k), specialized to XML documents, which induces this partition is as follows.
Definition 4. Let D = (V; Ed; r; ¸) be a document, n1; n2 2 V , and k 2 N. We say
n1 and n2 are A(k)-equivalent in D, denoted n1 ´A(k) n2, if
1. ¸(n1) = ¸(n2); and
2. if k ¸ 1 then
(a) n1 has a parent in D if and only if n2 has a parent in D; and
(b) if n1 has parent p1 and n2 has parent p2, then p1 ´A(k¡1) p2.
n1 ´A(k+1) n2, then n1 ´A(k) n2.</p>
      <p>A(k) equivalence defines partition blocks on the nodes n 2 V of a document D =
(V; Ed; r; ¸),</p>
      <p>[n]k = fn0 j n0 2 V &amp; n ´A(k) n0g
and the full A(k)-index is built directly on the collection of these partition blocks,
D=A(k) = f[n]k j n 2 V g: We refer to this partition as simply the A(k)-index.</p>
      <p>Following Kaushik et al. [12], a full A(k)-index can be visualized as a graph wherein
each node represents a partition block and an edge exists from a node A to a node B
if an edge existed in the original document from a data node in A to a data node in
B. This construction is illustrated in Figure 2 on the Design Department subtree of the
document of Figure 1.</p>
      <p>We observe the following basic property of A(k) equivalence, which follows by a
simple induction on k.</p>
      <p>Proposition 1. Let D = (V; Ed; r; ¸) be a document, let k 2 N, and let n1, n2 2 V . If</p>
      <p>At first sight, we have defined an infinite class of equivalence relations on D.
However, there is no point to consider A(k) equivalence beyond k = height(D), as can
also be seen by an inductive argument.</p>
      <p>Proposition 2. Let D = (V; Ed; r; ¸) be a document, and let k ¸ height(D). Then
A(k) equivalence coincides with A(height(D)) equivalence.</p>
      <p>We now extend A(k) equivalence as follows to capture upward paths to the root in
a document.</p>
      <p>
        Definition 5. Let D = (V; Ed; r; ¸) be a document and n1; n2 2 V . We say n1 and n2
are A(
        <xref ref-type="bibr" rid="ref1">1</xref>
        )-equivalent in D, denoted n1 ´A(
        <xref ref-type="bibr" rid="ref1">1</xref>
        ) n2, if
1. ¸(n1) = ¸(n2); and
2. n1 has a parent in D if and only if n2 has a parent in D; and
3. if n1 has parent p1 and n2 has parent p2, then p1 ´A(
        <xref ref-type="bibr" rid="ref1">1</xref>
        ) p2.
      </p>
      <p>Not surprisingly, we have the following.</p>
      <p>
        Proposition 3. Let D = (V; Ed; r; ¸) be a document. Then A(
        <xref ref-type="bibr" rid="ref1">1</xref>
        ) equivalence
coincides with A(height(D)) equivalence.
      </p>
      <p>
        Note that A(
        <xref ref-type="bibr" rid="ref1">1</xref>
        ) equivalence on nodes induces the 1-index proposed by Milo and
Suciu [14] and the strong DataGuide of Goldman and Widom [7].
2.3 P (k) Indexes
We next generalize A(k) equivalence (Definition 4) to pairs of nodes.
      </p>
      <p>Definition 6. Let D = (V; Ed; r; ¸) be a document, k 2 N, let m1, n1, m2, n2 2 V
such that m1 is an ancestor of n1 and m2 is an ancestor of n2. Then, (n1; m1) and
(n2; m2) are P (k) equivalent, denoted (n1; m1) ´P (k) (n2; m2), if
1. distance(n1; m1) = distance(n2; m2); and
2. n1 ´A(k) n2.</p>
      <p>Note that P (k) equivalence on pairs of nodes induces a “localized” version of the
2index proposed by Milo and Suciu [14], just as A(k) equivalence on nodes induces a
“localized” version of the 1-index [12].</p>
      <p>Consider the subtree D0 in the document of Figure 1 rooted at n4. As an illustration
of the P (k)-index (i.e., the partition set D=P (k) induced by P (k) equivalence on a
document D) the P (0)-index of D0 is:</p>
      <p>D0=P (0) = f[(n10; n10); (n19; n19)];
[(n10; n4); (n19; n9)]; [(n19; n4)];
[(n4; n4); (n9; n9)]; [(n9; n4)];
[(n11; n11); (n20; n20)];
[(n11; n4); (n20; n9)]; [(n20; n4)]g</p>
      <p>Properties 1 and 2 of A(k) equivalence can easily be bootstrapped to properties of
P (k) equivalence.</p>
      <p>Proposition 4. Let D = (V; Ed; r; ¸) be a document, let k 2 N and let m1, n1,
m2, n2 2 V such that m1 is an ancestor of n1 and m2 is an ancestor of n2. If
(n1; m1) ´P (k+1) (n2; m2) then (n1; m1) ´P (k) (n2; m2)
Proposition 5. Let D = (V; Ed; r; ¸) be a document, and let k ¸ height(D). Then
P (k) equivalence coincides with P (height(D)) equivalence.</p>
      <p>As with A(k) equivalence, we extend P (k) as follows to capture upward paths to
the root in a document.</p>
      <p>
        Definition 7. Let D = (V; Ed; r; ¸) be a document and let m1, n1, m2, n2 2 V such
that m1 is an ancestor of n1 and m2 is an ancestor of n2. Then, (n1; m1) and (n2; m2)
are P (
        <xref ref-type="bibr" rid="ref1">1</xref>
        ) equivalent, denoted (n1; m1) ´P(
        <xref ref-type="bibr" rid="ref1">1</xref>
        ) (n2; m2), if
1. distance(n1; m1) = distance(n2; m2); and
2. n1 ´A(
        <xref ref-type="bibr" rid="ref1">1</xref>
        ) n2.
      </p>
      <p>
        Proposition 3 for A(
        <xref ref-type="bibr" rid="ref1">1</xref>
        ) equivalence can be bootstrapped to P (
        <xref ref-type="bibr" rid="ref1">1</xref>
        ) equivalence in a
straightforward manner.
      </p>
      <p>
        Proposition 6. Let D = (V; Ed; r; ¸) be a document. Then P (
        <xref ref-type="bibr" rid="ref1">1</xref>
        ) equivalence
coincides with P (height(D)) equivalence.
      </p>
      <p>
        Note that P (
        <xref ref-type="bibr" rid="ref1">1</xref>
        ) equivalence induces the 2-index proposed by Milo and Suciu [14].
      </p>
      <p>As a closing remark, we observe from the definitions of A(k) and P (k) indices that,
given an A(k) index on a document D, it is straightforward to derive the corresponding
P (k) index for D, and vice versa.</p>
      <p>Algebraic Characterizations of the A(k) and P (k) Indices
In this section, we give characterizations of the A(k) and P (k) family of indices based
on notions of indistinguishability of nodes and pairs of nodes in fragments of the
XPathalgebra. The notion of language indistinguishability on nodes is made precise in the
following definition, following Gyssens et al. [8].</p>
      <p>Definition 8. Let D = (V; Ed; r; ¸) be a document, m1; m2 2 V , and F a fragment
of the XPath-algebra. We say m1 and m2 are indistinguishable by F , denoted m1 ´F
m2, if for any expression E in F , it is the case that E(D)(m1) = ; if and only if
E(D)(m2) = ;.</p>
      <p>Language indistinguishability on pairs of nodes is defined as follows.</p>
      <p>Definition 9. Let D = (V; Ed; r; ¸) be a document, n1; m1; n2; m2 2 V , and F a
fragment of the XPath-algebra. We say (n1; m1) and (n2; m2) are indistinguishable by
F , denoted (n1; m1) ´F (n2; m2), if for any expression E in F , it is the case that
(n1; m1) 2 E(D) if and only if (n2; m2) 2 E(D).</p>
      <p>Next, we introduce a family of algebras which we use to precisely characterize the
A(k) and P (k) indices.</p>
      <p>Definition 10. We recursively define the upward-k XPath algebras, U (k) for each k 2
N, as follows.
1. U (0) = the set of XPath-algebra expressions without occurrences of the “#” and
“"” operators.
2. For k ¸ 1,
(a) if E 2 U (k ¡ 1), then E 2 U (k);
(b) " 2 U (k);
(c) if E1 2 U (k) and E2 2 U (k), then</p>
      <p>– E1 ? E2 2 U (k), for ? = [; \; ¡;
(d) if E1 2 U (k1) and E2 2 U (k2), and k1 + k2 · k, then</p>
      <p>– E1 ¦ E2 2 U (k) and E1[E2] 2 U (k); and
3. No other expressions are in U (k).</p>
      <p>
        As a very simple example of U (k) expressions, note that " ¦ " ¦ " is in U (
        <xref ref-type="bibr" rid="ref3">3</xref>
        ) but not
in U (
        <xref ref-type="bibr" rid="ref2">2</xref>
        ).
      </p>
      <p>The following useful observation about the U (k) algebras can be shown by a simple
inductive argument.</p>
      <p>Proposition 7. Let D = (V; Ed; r; ¸) be a document, k 2 N, m; n 2 V , and E 2
U (k). If (n; m) 2 E(D), then m is an ancestor of n such that distance(n; m) · k.</p>
      <p>The U (k) algebras are extended as follows to an algebra with unrestricted use of
the “"” primitive.</p>
      <p>
        Definition 11. The upward XPath algebra U (
        <xref ref-type="bibr" rid="ref1">1</xref>
        ) is the set of all XPath-algebra
expressions without occurrences of the # primitive.
      </p>
      <p>
        In other words, U (
        <xref ref-type="bibr" rid="ref1">1</xref>
        ) = Sk2N U (k).
      </p>
      <p>Compositions in which the number of nested “"” primitives is strictly greater than
height(D) must return the empty set, and are therefore not very meaningful. The
following result should therefore not come as a surprise.</p>
      <p>
        Proposition 8. The U (
        <xref ref-type="bibr" rid="ref1">1</xref>
        ) and U (height(D)) algebras are equivalent in expressive
power.
      </p>
      <p>
        In particular, indistinguishability by U (height(D)) and indistinguishability by
U (
        <xref ref-type="bibr" rid="ref1">1</xref>
        ) coincide.
3.1
      </p>
      <p>
        Structural Characterizations of U (k) and U (
        <xref ref-type="bibr" rid="ref1">1</xref>
        ) Indistinguishability
In this section, we provide precise structural characterizations of indistinguishability
in the upward algebras. In particular, for each k &gt; 0, we show that on a fixed
document, indistinguishability of nodes by the U (k) algebra corresponds precisely with
A(k) equivalence of nodes, and indistinguishability of node pairs by the U (k) algebra
corresponds precisely with P (k) equivalence. We also extend these results to the full
upward algebra U (
        <xref ref-type="bibr" rid="ref1">1</xref>
        ). We follow the methodology of Gyssens et al. [8] to establish
these correspondences.
      </p>
      <p>The following facts are crucial in the sequel.</p>
      <p>Lemma 1. Let D = (V; Ed; r; ¸) be a document, k 2 N, and n1, m1, n2 2 V such
that m1 is an ancestor of n1 and distance(n1; m1) · k. If n1 ´A(k) n2, then there
exists m2 2 V such that m2 is an ancestor of n2 and (n1; m1) ´P (distance(n1;m1))
(n2; m2).5 Furthermore, m1 ´A(k¡distance(n1;m1)) m2.</p>
      <p>Proof. By induction on k. For the base case, k = 0, clearly m1 = n1 and ¸(n1) =
¸(n2). The statement holds for m2 = n2.</p>
      <p>
        For k ¸ 1, we can assume that the statement holds for k¡1. If n1 ´A(k) n2, then
either (
        <xref ref-type="bibr" rid="ref1">1</xref>
        ) both n1 and n2 have no parents, or (
        <xref ref-type="bibr" rid="ref2">2</xref>
        ) they both have parents p1 and p2,
respectively, such that p1 ´A(k¡1) p2 (by definition of A(k) equivalence). In case (
        <xref ref-type="bibr" rid="ref1">1</xref>
        ), clearly
m1 = n1 and the statement holds for m2 = n2. In case (
        <xref ref-type="bibr" rid="ref2">2</xref>
        ), distance(p1; m1) · k ¡1,
and by the definition of A(k) equivalence, p1 ´A(k¡1) p2. By the induction hypothesis,
there exists an ancestor m2 of p2 such that (p1; m1) ´P (distance(p1;m2)) (p2; m2) and
m1 ´A(k¡1¡distance(p1;m1)) m2. It readily follows that (n1; m1) ´P (distance(n1;m1))
(n2; m2) and m1 ´A(k¡distance(n1;m1)) m2.
      </p>
      <p>Proposition 9. Let D = (V; Ed; r; ¸) be a document, k 2 N, E 2 U (k), and n1; m1;
n2; m2 2 V such that m1 is an ancestor of n1 and m2 is an ancestor of n2, and
(n1; m1) ´P (k) (n2; m2). If (n1; m1) 2 E(D), then (n2; m2) 2 E(D), and vice
versa.</p>
      <p>Proof. First observe that it follows from E 2 U (k) and (n1; m1) 2 E(D) that
distance(n1; m1) · k, by Proposition 7.
5 And even stronger, (n1; m1) ´P (k) (n2; m2).</p>
      <p>The proof is by induction on k. The base case, k = 0, follows straightforwardly
from the definition of P (0) equivalence and a simple structural induction on expressions
in U (0). Now assume that k ¸ 1, and that the statement holds for 0; 1; 2; : : : ; k ¡ 1.
The proof goes by structural induction on expressions in U (k). Thus, let E 2 U (k).
– E 2 U (k ¡ 1). The statement holds by the induction hypothesis.
– E =". If (n1; m1) 2" (D), then m1 is the parent of n1. Since (n1; m1) ´P (k)
(n2; m2), it follows in particular that m2 is the parent of n2. We conclude that
(n2; m2) 2" (D).
– E = E1 [ E2, for E1 and E2 2 U (k). Suppose (n1; m1) 2 E(D). Then
(n1; m1) 2 E1(D) or (n1; m1) 2 E2(D). Without loss of generality, assume
(n1; m1) 2 E1(D). Then by structural induction, (n2; m2) 2 E1(D), and we
conclude (n2; m2) 2 E(D).
– E = E1 \ E2 or E = E1 ¡ E2, for E1 and E2 2 U (k). Similar to the previous
case.
– E = E1 ¦ E2, for E1 2 U (k1) and E2 2 U (k2), such that k1 + k2 · k.
Suppose (n1; m1) 2 E(D). Then there exists a node w1 2 V such that (n1; w1) 2
E1(D) and (w1; m1) 2 E2(D). By Lemma 7, distance(n1; w1) · k1 and
distance(w1; m1) · k2. By Lemma 1, there exists a node w2 2 V such that
(n1; w1) ´P (distance(n1;w1)) (n2; w2), and w1 ´A(k¡distance(n1;w1)) w2. Since,
k2 · k ¡ distance(n1; w1), by Lemma 1, a node m0 2 V exists with (w1; m1)
´P (distance(w1;m1)) (w2; m0). Since (n1; w1) ´P (distance(n1;w1)) (n2; w2), and
(w1; m1) ´P (distance(w1;m1)) (w2; m0), we know, by the definition of ´P (k1) and
´P (k2), that distance(n2; w2) = distance(n1; w1) and distance(w2; m0) =
distance(w1; m1). Consequently, distance(n2; m0) = distance(n1; m1), and
since m0 is the unique ancestor at this distance, we conclude that m0 = m2. Thus
(w1; m1) ´P (k2) (w2; m2). By the induction hypothesis, we can conclude that
(n2; w2) 2 E1(D) and (w2; m) 2 E2(D) and thus (n2; m2) 2 E(D).
– E = E1[E2], for E1 2 U (k1) and E2 2 U (k2), such that k1 + k2 · k. Similar to
the previous case.</p>
      <p>An important corollary of Proposition 9 is the following observation.</p>
      <p>Proposition 10. Let E 2 U (k), k 2 N, and let D = (V; Ed; r; ¸) be a document.
Then E(D) is the union of some partition blocks of the P (k)-index.</p>
      <p>We are now prepared to establish the main results of this section.</p>
      <p>Theorem 1. Let D = (V; Ed; r; ¸) be a document, n1; n2 2 V , and k 2 N. Then
n1 ´U (k) n2 if and only if n1 ´A(k) n2.</p>
      <p>Proof. (If) Suppose n1 ´A(k) n2 and let E 2 U (k) such that E(D)(n1) 6= ;. Hence,
there exists m1 2 V such that (n1; m1) 2 E(D). Then, by Lemma 1 and Proposition 9,
there exists m2 2 V such that (n2; m2) 2 E(D), and therefore E(D)(n2) 6= ;. It
follows symmetrically that if E(D)(n2) 6= ;, then E(D)(n1) 6= ;. We conclude that
n1 ´U (k) n2.</p>
      <p>(Only if) For the converse, assume that n1 ´U (k) n2. We first establish two facts.</p>
      <p>Following a similar argument, we have the following characterization of U (k)
indistinguishability on node pairs.</p>
      <p>Theorem 2. Let D = (V; Ed; r; ¸) be a document, n1; m1; n2; m2 2 V , and k 2 N.
Then (n1; m1) ´U (k) (n2; m2) if and only if (n1; m1) ´P (k) (n2; m2).</p>
      <p>
        We remind the reader that indistinguishability by U (
        <xref ref-type="bibr" rid="ref1">1</xref>
        ) and indistinguishability
by U (height(D)) coincide (Proposition 8). Furthermore, by Proposition 3, A(
        <xref ref-type="bibr" rid="ref1">1</xref>
        )
equivalence coincides with A(height(D)) equivalence. Therefore, the result below
immediately follows.
      </p>
      <p>
        Theorem 3. Let D = (V; Ed; r; ¸) be a document and n1; n2 2 V . Then n1 ´U (
        <xref ref-type="bibr" rid="ref1">1</xref>
        )
n2 if and only if n1 ´A(
        <xref ref-type="bibr" rid="ref1">1</xref>
        ) n2.
      </p>
      <p>
        Similarly, by Proposition 6, P (
        <xref ref-type="bibr" rid="ref1">1</xref>
        ) equivalence coincides with P (height(D))
equivalence. Therefore, the result below immediately follows.
      </p>
      <p>
        Theorem 4. Let D = (V; Ed; r; ¸) be a document and n1, m1, n1, n2 2 V . Then
(n1; m1) ´U (
        <xref ref-type="bibr" rid="ref1">1</xref>
        ) (n2; m2) if and only if (n1; m1) ´P(
        <xref ref-type="bibr" rid="ref1">1</xref>
        ) (n2; m2).
4
      </p>
      <p>
        XPath Query Evaluation with A(k) and P (k) Indices
In this section, we consider strategies to evaluate XPath expressions, leveraging the
results and characterizations of the A(k) and P (k) indices (Sections 3). First, we discuss
how A(k) and P (k) indices can be used in the evaluation of upward XPath
expressions, as expressed in the U (
        <xref ref-type="bibr" rid="ref1">1</xref>
        ) and U (k) algebras. Then, we focus on the evaluation
of XPath algebra query expressions that can be expressed in certain downward XPath
algebras – namely, the downward algebra D (
        <xref ref-type="bibr" rid="ref1">1</xref>
        ) and the downward-k-algebras D (k)
(for each k 2 N). These algebras are defined just like the upward algebras except that
one is restricted to using # operations instead of " operations.
Our analysis of the A(k) and P (k) indices shows that they are the ideal index
structures for evaluation of U (k) expressions. Indeed, consider an expression E 2 U (k), a
document D, and its P (k)-index. Proposition 10 states that the evaluation of E on D,
E(D), can be evaluated as the union of certain P (k) partition blocks. Now, since A(k)
and P (k) indices are inter-derivable, it is clear that an A(k)-index on D can also be
used effectively in the evaluation of E(D).
      </p>
      <p>
        Actually, any U (
        <xref ref-type="bibr" rid="ref1">1</xref>
        ) expression can also make effective use of an existing A(k)
or P (k) index. For example, consider the expression E =" ¦ " ¦ " ¦ " and suppose
that we have only the P (
        <xref ref-type="bibr" rid="ref2">2</xref>
        )-index available. Since E 2 U (
        <xref ref-type="bibr" rid="ref4">4</xref>
        ), in general E(D) will
not necessarily be a union of some of the partition blocks of P (
        <xref ref-type="bibr" rid="ref2">2</xref>
        ). However, if we
decompose E as E1 ¦ E2, where E1 =" ¦ " and E2 =" ¦ ", then E(D) = E1(D) 1
E2(D), and since E1 and E2 are in U (
        <xref ref-type="bibr" rid="ref2">2</xref>
        ), they can obviously be evaluated directly with
the existing index as discussed above.
As shown in XPath use cases and various XML benchmarks, over ninety percent of
XPath queries used in real world applications use navigation just along the parent-child
axis, rather than mixing the parent-child and child-parent axes. These queries can be
naturally expressed in the D (
        <xref ref-type="bibr" rid="ref1">1</xref>
        ) and D (k) algebras.
      </p>
      <p>
        A natural way to evaluate D (
        <xref ref-type="bibr" rid="ref1">1</xref>
        ) expressions is to perform top-down navigation.
We demonstrate that such expressions can also be evaluated using path decomposition
and A(k) and P (k) indices.
      </p>
      <p>The main idea behind turning top-down evaluation into bottom-up evaluation is
to convert a downward expression, or certain of its sub-expressions, into an upward
expression using a technique that we will refer to as “inverting expressions.” We first
illustrate this technique on predicate-free downward expressions, and then consider general
downward expressions.</p>
      <p>Downward Expressions without Predicates In general, predicate-free expressions
in the downward algebras can be “inverted” into predication-free expressions in the
corresponding upward algebras using the rewrite rules shown in Table 3.</p>
      <p>
        As discussed above, an upward algebraic expression in U (k) can be evaluated by
P (k)-index lookup and set unions. Therefore, a predicate-free downward algebraic
expression in D (k) can be evaluated by inversion, P (k)-index lookup, and set unions.
Downward Expressions with Predicates Now consider the evaluation of downward
algebra expressions wherein predicate operations occur. A simple example is the
expression E2 =# [#]: Applied to a document, E2 evaluates to the document’s
parentchild pairs for children that have at least one child itself. As in Section 4.2, to evaluate
E2 on a document D, we could consider the concept of inverting E2 into an
expression E2¡1 2 U (
        <xref ref-type="bibr" rid="ref1">1</xref>
        ) such that E2(D) = (E2¡1(D))¡1. Unfortunately, this approach
does not work here. In fact, we can construct a document D2 such for each expression
F 2 U (
        <xref ref-type="bibr" rid="ref1">1</xref>
        ), E2(D2) 6= F (D)¡1.
      </p>
      <p>
        Clearly E2 is equivalent to the XPath-algebra expression # ¦ # ¦ ". Notice that
this expression is neither a downward nor an upward expression. However its
subexpression G1 =# ¦ # is in D (
        <xref ref-type="bibr" rid="ref2">2</xref>
        ) and its sub-expression G2 =" is in U (
        <xref ref-type="bibr" rid="ref1">1</xref>
        ). Applying
the inversion technique described in Section 4.2 to G1, the evaluation E2(D) on a
document D, can be accomplished by computing the relation (G1¡1(D))¡1 ./ G2(D). And,
as indicated in this section, the evaluations of G1¡1(D) =" ¦ " (D) and G2(D) =" (D)
can be done efficiently using the P (
        <xref ref-type="bibr" rid="ref2">2</xref>
        ) and P (
        <xref ref-type="bibr" rid="ref1">1</xref>
        ) indices of D, respectively.
      </p>
      <p>Given that the selectivity of a longer path is no larger than that of short sub-paths of
the path, evaluating G1 reduces the search space to the minimum that can be obtained
on such a chain expression. Starting from any given node, upward navigation in an
XML data tree, unlike downward ones, has one and only one route to follow, which
is to reach its parent. Therefore, it is reasonable to claim that the result of G1¡1(D) is
substantially smaller than that of G2(D), and the ./ operation can be further optimized
by G1¡1(D) followed by an upward navigation.</p>
      <p>
        We will now consider a slightly more complicated downward expression E3 2
D (
        <xref ref-type="bibr" rid="ref3">3</xref>
        ) which retrieves information about leaders of projects that have a web site:
      </p>
      <p>E3 = Department ¦ # ¦ Project[# ¦ Web] ¦ # ¦ Lead:
E3 can be represented as an expression pattern tree, as illustrated in Figure 3(a). The
shaded node can be interpreted as the “answer” of E3.</p>
      <p>Lead
(a)</p>
      <p>Department</p>
      <p>
        First, assume there exists a P (
        <xref ref-type="bibr" rid="ref2">2</xref>
        )-index (i.e., where k = 2, the height of the
expression). To take full advantage of the P (
        <xref ref-type="bibr" rid="ref2">2</xref>
        )-index is to find the longest sub-expressions
that can be inverted and evaluated directly on it. As shown in Figure 3(b), there are two
natural chains of length 2, G1 and G2, in the pattern tree of E3. There are also natural
chains of length 1, G3, G4, and G5, as shown in Figure 3(c).
      </p>
      <p>Using G1, G2, and G4, the expression E3 is equivalent to the expression H1 defined
as follows:</p>
      <p>
        H1 = ((G1 ¦ ") \ (G2 ¦ ")) ¦ G4;
and therefore, for a document D, E3(D) can be computed as follows:
All sub-expressions in this transformed expression of E3 are in U (
        <xref ref-type="bibr" rid="ref2">2</xref>
        ), and hence can
be be directly evaluated using the P (
        <xref ref-type="bibr" rid="ref2">2</xref>
        ) partition blocks without any navigation.
      </p>
      <p>
        However, it may not always be the case that P (k)-partitions exist for k larger than
or equal to the height of the query expression. Again, for E3, consider the case when
only a P (
        <xref ref-type="bibr" rid="ref1">1</xref>
        )-index is available. In this case, the longest path expressions that can take
advantage of the index are of length 1. Such expressions are G3, G4 and G5. Using
these subexpressions, E3 is equivalent with the expression H2 defined as follows:
      </p>
      <p>H2 = (((G3 ¦ G4) ¦ ") \ ((G3 ¦ G5) ¦ ")) ¦ G4:</p>
      <p>
        Similar to the generalization from E2 to E3, the decomposition + inversion
techniques can be used to transform any arbitrary expression in the downward algebra
D (
        <xref ref-type="bibr" rid="ref1">1</xref>
        ) to a set of predicate-free sub-expressions in D (
        <xref ref-type="bibr" rid="ref1">1</xref>
        ) that can be inverted to
corresponding expressions in the upward algebra U (
        <xref ref-type="bibr" rid="ref1">1</xref>
        ), which in turn can be evaluated
directly by P (k)-index lookups and set unions. The decomposition is partially
determined by k, the degree of local similarity in the P (k)-indices which are available.
Furthermore, this algebraic transformation only provides guidelines for evaluating each
such expression. The optimal physical plan to evaluate an XPath query depends on other
factors, such as the cardinality of the intermediate results of each sub-expression, and
the cost model of the join operations.
5
      </p>
      <p>
        Future Directions
It is clear that many aspects of the query evaluation guidelines presented in the previous
section warrant further investigation. We close by listing a few of the more interesting
directions for research. First, a more thorough study of query decomposition and
inversion algorithms is crucial. Second, it is interesting to develop approaches to (
        <xref ref-type="bibr" rid="ref1">1</xref>
        ) quickly
identify which partition blocks correspond to an arbitrary given input expression and
document, and (
        <xref ref-type="bibr" rid="ref2">2</xref>
        ) to use this identification to efficiently obtain the full query result. We
have already taken steps in this direction, by obtaining “labeling” expressions which
have the property that each of these evaluates to a unique partition block. Due to space
limitations, we were unable to incorporate these results in the paper. However, in [6]
(Section 4), we have identified these labeling expressions for U (()k) and U (
        <xref ref-type="bibr" rid="ref1">1</xref>
        ).
Finally, the results of this paper are applicable in workload driven scenarios [9, 15, 17],
such as determining which indices to materialize and which to derive, as a function of
the characteristics of an observed workload.
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Shurug</given-names>
            <surname>Al-Khalifa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. V.</given-names>
            <surname>Jagadish</surname>
          </string-name>
          ,
          <string-name>
            <surname>Jignesh M. Patel</surname>
            , Yuqing Wu, Nick Koudas, and
            <given-names>Divesh</given-names>
          </string-name>
          <string-name>
            <surname>Srivastava</surname>
          </string-name>
          .
          <article-title>Structural joins: A primitive for efficient XML query pattern matching</article-title>
          .
          <source>In ICDE</source>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>T.</given-names>
            <surname>Bray</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Paoli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.M.</given-names>
            <surname>Sperberg-McQueen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>and Francois</given-names>
            <surname>Yergeau</surname>
          </string-name>
          .
          <article-title>Etensible Markup Language (XML) 1.0 (third edition) - W3C recommendation</article-title>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Nicolas</given-names>
            <surname>Bruno</surname>
          </string-name>
          , Nick Koudas, and Divesh Srivastava:.
          <article-title>Holistic twig joins: optimal XML pattern matching</article-title>
          .
          <source>In SIGMOD</source>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>D.</given-names>
            <surname>Chamberlin</surname>
          </string-name>
          et al.
          <source>XQuery 1</source>
          .0:
          <string-name>
            <surname>An</surname>
            <given-names>XML</given-names>
          </string-name>
          <article-title>query language</article-title>
          ,
          <source>W3C</source>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>J.</given-names>
            <surname>Clark</surname>
          </string-name>
          and
          <string-name>
            <surname>S.</surname>
          </string-name>
          <article-title>DeRose (eds</article-title>
          .).
          <article-title>XML path language (XPath) version 1.0</article-title>
          . http://www.w3.org/TR/XPATH.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>George</surname>
            <given-names>H. L.</given-names>
          </string-name>
          <string-name>
            <surname>Fletcher</surname>
          </string-name>
          , Dirk Van Gucht,
          <string-name>
            <surname>Yuqing Wu</surname>
            ,
            <given-names>Marc</given-names>
          </string-name>
          <string-name>
            <surname>Gyssens</surname>
            , and
            <given-names>Jan</given-names>
          </string-name>
          <string-name>
            <surname>Paredaens</surname>
          </string-name>
          .
          <article-title>An XPath Algebraic Characterization of A(k) and P(k) Indices with Applications to Query Processing</article-title>
          .
          <source>Indiana University Technical Report No. 639</source>
          ,
          <year>October 2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Roy</given-names>
            <surname>Goldman</surname>
          </string-name>
          and
          <string-name>
            <given-names>Jennifer</given-names>
            <surname>Widom</surname>
          </string-name>
          . Dataguides:
          <article-title>Enabling query formulation and optimization in semistructured databases</article-title>
          .
          <source>In VLDB</source>
          , pages
          <fpage>436</fpage>
          -
          <lpage>445</lpage>
          ,
          <year>1997</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Marc</given-names>
            <surname>Gyssens</surname>
          </string-name>
          , Jan Paredaens, Dirk Van Gucht, and
          <string-name>
            <surname>George</surname>
            <given-names>H. L.</given-names>
          </string-name>
          <string-name>
            <surname>Fletcher</surname>
          </string-name>
          .
          <article-title>Structural Characterizations of the Semantics of XPath as Navigation Tool on a Document</article-title>
          .
          <source>In ACM PODS</source>
          , pages
          <fpage>318</fpage>
          -
          <lpage>327</lpage>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Hao</given-names>
            <surname>He</surname>
          </string-name>
          and
          <string-name>
            <given-names>Jun</given-names>
            <surname>Yang</surname>
          </string-name>
          .
          <article-title>Multiresolution indexing of XML for frequent queries</article-title>
          .
          <source>In IEEE ICDE</source>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Raghav</surname>
            <given-names>Kaushik</given-names>
          </string-name>
          , Philip Bohannon,
          <string-name>
            <given-names>Jeffrey F.</given-names>
            <surname>Naughton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>and Henry F.</given-names>
            <surname>Korth</surname>
          </string-name>
          .
          <article-title>Covering indexes for branching path queries</article-title>
          .
          <source>In SIGMOD</source>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <surname>Raghav</surname>
            <given-names>Kaushik</given-names>
          </string-name>
          , Rajasekar Krishnamurthy,
          <string-name>
            <given-names>Jeffrey F.</given-names>
            <surname>Naughton</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Raghu</given-names>
            <surname>Ramakrishnan</surname>
          </string-name>
          .
          <article-title>On the integration of structure indexes and inverted lists</article-title>
          .
          <source>In ACM SIGMOD</source>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <surname>Raghav</surname>
            <given-names>Kaushik</given-names>
          </string-name>
          , Pradeep Shenoy, Philip Bohannon, and
          <string-name>
            <given-names>Ehud</given-names>
            <surname>Gudes</surname>
          </string-name>
          .
          <article-title>Exploiting local similarity for efficient indexing of paths in graph structured data</article-title>
          .
          <source>In IEEE ICDE</source>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Christoph</given-names>
            <surname>Koch</surname>
          </string-name>
          .
          <article-title>Processing queries on tree-structured data efficiently</article-title>
          .
          <source>In ACM PODS</source>
          , pages
          <fpage>213</fpage>
          -
          <lpage>224</lpage>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>Tova</given-names>
            <surname>Milo</surname>
          </string-name>
          and
          <string-name>
            <given-names>Dan</given-names>
            <surname>Suciu</surname>
          </string-name>
          .
          <article-title>Index structures for path expressions</article-title>
          .
          <source>In ICDT</source>
          , pages
          <fpage>277</fpage>
          -
          <lpage>295</lpage>
          ,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <surname>Jun-Ki</surname>
            <given-names>Min</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chin-Wan Chung</surname>
            , and
            <given-names>Kyuseok</given-names>
          </string-name>
          <string-name>
            <surname>Shim</surname>
          </string-name>
          .
          <article-title>An Adaptive Path Index for XML Data using the Query Workload</article-title>
          .
          <source>Inf. Systems</source>
          ,
          <volume>30</volume>
          (
          <issue>6</issue>
          ):
          <fpage>467</fpage>
          -
          <lpage>487</lpage>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>Mirella</given-names>
            <surname>Moura</surname>
          </string-name>
          <string-name>
            <surname>Moro</surname>
          </string-name>
          , Zografoula Vagena, and
          <string-name>
            <given-names>Vassilis J.</given-names>
            <surname>Tsotras</surname>
          </string-name>
          .
          <article-title>Tree-pattern queries on a lightweight XML processor</article-title>
          .
          <source>In VLDB</source>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <surname>Chen</surname>
            <given-names>Qun</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>Andrew</given-names>
            <surname>Lim</surname>
          </string-name>
          , and
          <article-title>Kian Win Ong. D(k)-index: An adaptive structural summary for graph-structured data</article-title>
          .
          <source>In SIGMOD</source>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>Prakash</given-names>
            <surname>Ramanan</surname>
          </string-name>
          .
          <article-title>Covering indexes for XML queries: Bisimulation - simulation = negation</article-title>
          .
          <source>In VLDB</source>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <surname>Kanda</surname>
            <given-names>Runapongsa</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jignesh M. Patel</surname>
            , Rajesh Bordawekar, and
            <given-names>Sriram</given-names>
          </string-name>
          <string-name>
            <surname>Padmanabhan</surname>
          </string-name>
          .
          <article-title>XIST: An XML index selection tool</article-title>
          . In XSym, pages
          <fpage>219</fpage>
          -
          <lpage>234</lpage>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <surname>Ke</surname>
            <given-names>Yi</given-names>
          </string-name>
          , Hao He,
          <string-name>
            <surname>Ioana Stanoi</surname>
            , and
            <given-names>Jun</given-names>
          </string-name>
          <string-name>
            <surname>Yang</surname>
          </string-name>
          .
          <article-title>Incremental maintenence of XML structural indexes</article-title>
          .
          <source>In ACM SIGMOD</source>
          , pages
          <fpage>491</fpage>
          -
          <lpage>502</lpage>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <surname>Chun</surname>
            <given-names>Zhang</given-names>
          </string-name>
          , Jeffrey F. Naughton,
          <string-name>
            <given-names>David J. DeWitt</given-names>
            , Qiong Luo, and
            <surname>Guy</surname>
          </string-name>
          <string-name>
            <given-names>M.</given-names>
            <surname>Lohman</surname>
          </string-name>
          .
          <article-title>On supporting containment queries in relational database management systems</article-title>
          .
          <source>In SIGMOD</source>
          ,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>