<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Towards Cost-based Query Optimization in Native XML Database Management Systems ∗</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>°c Andreas M. Weiner</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Theo Ha¨rder</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Databases and Information System Group Department of Computer Science University of Kaiserslautern P.</institution>
          <addr-line>O. Box 3049 D-67653 Kaiserslautern</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Proceedings of the Spring Young Researcher's Colloquium on Database and Information Systems</institution>
          ,
          <addr-line>Saint-Petersburg, Russia, 2008</addr-line>
        </aff>
      </contrib-group>
      <abstract>
        <p>∗ This work has been supported by the Rheinland-Pfalz cluster of excellence “Dependable adaptive systems and mathematical modelling” (see http://www.dasmod.de). documents. In recent years, the database community suggested many Path Processing Operators (PPOs) for efficiently evaluating structural relationships-such as parent/child or ancestor/descendant as defined by twig query patterns: Definition 1 (Twig query pattern) A twig query pattern (TQP) is a tree QT = (V, E, λ, r) with a set of vertices V , a set of edges E ⊆ V × V , a mapping λ : E → {child , descendant }, and the root of the tree r. Every TQP contains at least one output node, i. e., one or more vertices are part of the query result.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>In the last few years, XML became a de-facto
standard for the exchange of structured and
semi-structured data. The database research
community took this development into account
by proposing native XML database
management systems for efficient and transactional
management of XML documents. One of the
most important factors for success of such
systems is a powerful query optimizer. Many
researchers proposed sophisticated Structural
Join and Holistic Twig Join algorithms as well
as several index structures supporting the
evaluation of twig query patterns. Even though
almost all XML query evaluation approaches
proposed so far use some of these methods, we
believe that they provide no sufficient input for
real-world cost-based query optimization
scenarios, because they only cover a small part of
the overall query evaluation process. To provide
adequate input for a cost-based XML query
optimizer, we propose the XML Query Graph
Model as a new internal representation enabling
a smooth transition between XQuery language
level and physical algebra operators.
Furthermore, we introduce a set of rewrite rules for
improving the execution of twig queries, e. g.,
by fusing two adjacent binary join operators to
a complex n-way join operator. By presenting
further rewrite rules, we make the most of
existing joins and indexes—even before query
transformation. Using these concepts, we are ready
to sketch its integration into our upcoming
costbased XML query optimizer.</p>
      <p>
        The algorithms for PPOs can be further partitioned into
two classes: Structural Joins (SJs) [
        <xref ref-type="bibr" rid="ref1 ref23 ref33">1, 32, 22</xref>
        ] and
Holistic Twig Joins (HTJs) [
        <xref ref-type="bibr" rid="ref15 ref5">5, 15</xref>
        ]. SJ algorithms decompose a
TQP into binary structural relationships, evaluate each of
those relationships separately, and finally “stitch” the
results together. In contrast, HTJ algorithms evaluate TQPs
as a whole.
      </p>
      <p>
        PPOs were optimized by various kinds of index
structures enabling fast access to element, attribute, and text
nodes. They even allow to obtain answers for complete
TQPs. Basically, the XML indexing algorithms
proposed so far can be partitioned into four classes: path
indexes, element indexes, content indexes, and hybrid
indexes. Path indexes [
        <xref ref-type="bibr" rid="ref10 ref17 ref26 ref9">25, 10, 9, 17</xref>
        ] are using
structural summaries such as Dataguides [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] for providing
efficient access to nodes satisfying structural
relationships like parent/child or ancestor/descendant.
Element indexes [
        <xref ref-type="bibr" rid="ref15 ref5 ref8">5, 8, 15</xref>
        ], which are indexing element
nodes, serve for efficient input to SJ and HTJ operators.
Content indexes [
        <xref ref-type="bibr" rid="ref20 ref24">23, 19</xref>
        ] provide efficient access to text
or attribute value nodes. They can be implemented very
efficiently using B⋆-trees or inverted lists. Finally, hybrid
indexes [
        <xref ref-type="bibr" rid="ref16 ref18 ref31 ref34">30, 33, 16, 18</xref>
        ], which are also called
contentand-structure (CAS) indexes, are a promising approach
for indexing content and structure at a time. CAS indexes
can contribute in a cost-effective way to the evaluation of
components of—or even complete—TQPs. Therefore,
they are a challenging competitor for SJ/HTJ algorithms.
      </p>
      <p>
        Besides the various indexing approaches, there exist
three different classes of XML algebras that allow for an
algebraic optimization of XML queries: tree-based
algebras such as TAX [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] or TLC [
        <xref ref-type="bibr" rid="ref28">27</xref>
        ], tuple-based algebras
such as the Natix Algebra (NAL) [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] or NALSTJ [
        <xref ref-type="bibr" rid="ref22">21</xref>
        ], and
finally hybrid approaches like the proposal of Re et al.
[
        <xref ref-type="bibr" rid="ref30">29</xref>
        ].
      </p>
      <p>Whereas the database community has been doing
research on query optimization of relational queries for 30
years, the field of XML query optimization is still in its
infancy. We have learned from the experience on
relational query optimization, that cost-based approaches
outperform heuristic-based approaches in most cases.
We believe that cost-based query optimization is also the
method of choice for XML query languages like XQuery.
To reach this goal, we have to define an overall process
for query evaluation that considers all stages of the query
evaluation lifecycle beginning with the translation of the
query and ending with the provision of the final query
result.
1.1</p>
    </sec>
    <sec id="sec-2">
      <title>The Query Evaluation Process</title>
      <p>
        Figure 1 shows the three stages of the overall query
evaluation process: analysis, optimization, and code
generation [
        <xref ref-type="bibr" rid="ref27">26</xref>
        ]. During the analysis stage, the query is checked
for syntactical and semantical correctness. These checks
are followed by a normalization phase, where
semantically equivalent queries are mapped to a common normal
form expression. The last step of this stage is formed by
a simplification process that removes redundant parts of
the query. The result of the first stage is delivered as a
Query Graph (QG) which is equivalent to a logical
algebra expression. During query rewrite, an algebraic
optimization of the QG is performed by transforming it into
a semantically equivalent structure which can be
evaluated more efficiently than the initial expression. In the
query transformation step, a rewritten QG is mapped to
a Query Execution Plan (QEP) (physical algebra
expression) using a heuristics-based or a cost-based plan
generation process. The third stage, which is responsible for
providing the query result, is executing the plan, either
by direct interpretation or by translating it first to an
executable module.
1.2
      </p>
    </sec>
    <sec id="sec-3">
      <title>Evaluation Strategies for XML Queries</title>
      <p>In general, there are two strategies for evaluating
XQuery/XPath expressions: node-at-a-time and
set-ata-time processing.</p>
      <p>Node-at-a-time evaluation is inherent to the XQuery
Core Language and follows a nested-loops-style
evaluation approach which is similar to sub-selects in SQL.
Even though it is not very efficient in most cases, it can
be beneficial in low-selectivity scenarios. Figure 2(a)
shows how a simple XPath expression a//b/c is
evaluated using this strategy: For every a node, the
evaluation context for the evaluation of //b is provided (dashed
line). By iterating over all qualified b nodes the
evaluation context for /c is furnished. Finally, every qualified
c node is output.</p>
      <p>On the other hand, set-at-a-time query evaluation is
similar to relational merge joins. It is employed by
almost all SJ and HTJ algorithms and is in most cases very
efficient. Unfortunately, its employment is not always
possible, because SJ and HTJ operators provide only
limited support for XPath axes (in most cases only child
and descendant axes are supported). Figure 2(b) shows
how the XPath expression a//b/c is evaluated using this
approach: First, the structural predicate // is evaluated
between all a and b nodes. Afterwards, the result of
the first join operator serves together with all c nodes as
input for the second one which evaluates the structural
predicate /.</p>
      <p>
        (a) Node-at-a-time
(b) Set-at-a-time
Today, there is—to the best of our knowledge—no
query processing proposal that completely covers the
query evaluation process described in Section 1.1. At
an abstract level, all proposals can be partitioned into
two classes: advances in algebraic optimization of
XML query languages and sophisticated algorithms for
twig query processing. Algebraic optimization leads
to sophisticated ideas on normalizing and simplifying
XQuery expressions [
        <xref ref-type="bibr" rid="ref13 ref25 ref30 ref4 ref7">7, 29, 13, 24, 4</xref>
        ], which are mostly
driven by the formal semantics of XQuery. On the
other hand, query evaluation primarily focuses on
efficient evaluation algorithms for TQPs such as PPOs and
indexes. As a consequence, even the analysis stage of
the overall query evaluation process is not covered
completely. To provide adequate input for the optimization
stage, this divergence has to be closed by an
appropriate internal representation that is flexible enough to
handle both the node-at-a-time query processing inherent to
XQuery Core expressions and set-at-a-time query
processing performed by most PPOs and index operators.
Additionally, it must support an effective mapping of
logical operators to their physical counterparts.
      </p>
      <p>Using indexes for query evaluation as much as
possible is not only a good heuristics for the relational world.
We believe that an XML query evaluation engine should
also follow this approach. Nevertheless, there exists—to
the best of our knowledge—no approach which reduces
the usage of PPOs for twig query evaluation to a
minimum, provided there exist indexes that can deliver the
results for components of—or even complete—TQPs.</p>
      <p>The effective solution of these problems is
preconditional for providing appropriate input for query
optimization in general and for cost-based optimization scenarios
in particular.
1.4</p>
    </sec>
    <sec id="sec-4">
      <title>Our Contribution</title>
      <p>The contribution of this paper can be outlined as follows:
We introduce the XML Query Graph Model (XQGM) as
a new internal representation which is tailor-made for
XML query languages such as XQuery. By introducing a
new logical n-way join operator which permits the
evaluation of structural and positional predicates, the XQGM
is flexible enough to mediate between node-at-a-time and
set-at-a-time processing approaches as well as for finding
TQPs as early as possible. Using the XQGM, we provide
adequate input for the optimization stage.</p>
      <p>To allow for a simpler detection of TQPs and their
subsequent mapping to QEPs, we propose a set of rewrite
rules which allow to fuse as much as possible adjacent
binary SJ operators to a single complex n-way join
operator.</p>
      <p>To make the most of available indexes, we define
additional rules which permit to decompose n-way join
operators during query transformation, if the query
optimizer recognizes matching index structures. Using these
rules, we can exploit existing indexes as far as possible
and employ join operators only where it is absolutely
necessary.</p>
      <p>Finally, using the concepts proposed before, we
sketch the architecture of our upcoming query optimizer.
1.5</p>
    </sec>
    <sec id="sec-5">
      <title>Related Work</title>
      <p>
        Pirahesh et al. [
        <xref ref-type="bibr" rid="ref29">28</xref>
        ] introduced the Query Graph Model
(QGM) in the context of Starburst as an extensible
internal representation for relational queries. The data model
of the QGM is strongly related to the relational data
model which has to be adjusted to satisfy the needs of
XML query languages. Additionally, their QGM does
not offer any support for the evaluation of structural
relationships expressed as XPath axis steps.
      </p>
      <p>
        Carey et al. [
        <xref ref-type="bibr" rid="ref19">6</xref>
        ] as well as Beyer et al. [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] provide
an XML query interface for object-relational database
systems. Compared to our approach, which relies on a
native XML storage rather than an object-relational one,
using their systems makes it very hard to exploit
stateof-the-art evaluation algorithms for TQPs as well as the
plethora of different indexing approaches described in
Section 1.
      </p>
      <p>
        The classic work of McHugh and Widom [
        <xref ref-type="bibr" rid="ref24">23</xref>
        ] on
optimization of XML queries only focuses on
optimizing path expressions using navigational access methods
and lacks support for sophisticated indexing methods like
CAS indexes.
      </p>
      <p>
        Chen et al. [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] proposed General Tree Patterns
(GTPs) as a generalization of TQPs enabling the
representation of a large XQuery subset. Compared to our
proposal, they perform a direct mapping of GTPs to
execution plans. We believe that this approach is not very
flexible, because some XQuery expressions consist of
parts which can be mapped to GTPs and others that
cannot be mapped to such structures. Our approach enables
the evaluation of XQuery expressions having subtrees
corresponding to TQPs and subtrees that do not belong
to this class of query patterns.
      </p>
      <p>
        Hidders et al. [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] are focusing on finding tree
patterns as early as possible during query optimization by
concentrating on XPath expressions having only one
output node. Their rewrite rules work on XQuery Core. The
tree patterns found are directly mapped to the so-called
Tree Pattern Normal Form which is their internal
representation. By using XQuery Core as a baseline, they
loose immediate access to the where clause of a FLWOR
expression which makes the evaluation of value-based
join conditions very hard [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. In contrast, our internal
representation can deal with structural, value-based, or
positional predicates.
      </p>
      <p>
        Re et al. [
        <xref ref-type="bibr" rid="ref30">29</xref>
        ] introduce a set of rules for mapping
XQuery expressions to their logical algebra.
Additionally, they introduce a set of rewrite rules which enable
an algebraic optimization. Unfortunately, they lack
support for the evaluation of n-way joins and do not consider
promising indexing approaches like CAS indexes.
      </p>
      <p>
        Michiels et al. [
        <xref ref-type="bibr" rid="ref25">24</xref>
        ] propose a set of rewrite rules for
XQuery expressions which correspond to single XPath
expressions having only one output node. Their rewrites
are based on the formal semantics of XQuery and consist
of a normalization and a simplification step. During
normalization, they map semantically equivalent XQuery
expressions to a common XQuery Core expression.
Simplification helps to remove unnecessary parts of XQuery
Core expressions.
      </p>
      <p>
        Mathis [
        <xref ref-type="bibr" rid="ref22">21</xref>
        ] presents NALSTJ as an extension of the
Natix Algebra [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], which introduces an SJ operator as a
logical algebra operator. Additional rules for unnesting
algebra expressions permit set-at-a-time query
processing along with the node-at-a-time processing approach
inherent to the nested version. Our work extends these
ideas by introducing rules for fusing adjacent logical SJ
operators to complex n-way join operators. This idea is
driven by the evidence given in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] that, in most cases,
TQPs can be evaluated more efficiently using HTJ rather
than SJ operators.
      </p>
      <p>
        Brantner et al. [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] introduce a set of rewrite rules
for XQuery expressions. Their approach consists of
two stages: During the normalization stage they
prepare XQuery expressions as input for the second stage.
During the second stage, they try to merge inner and
outer XQuery FLWOR expressions into a single XQuery
FLWOR expression.
2
      </p>
      <sec id="sec-5-1">
        <title>The XML Query Graph Model</title>
        <p>
          The XML Query Graph Model (XQGM) is inspired by
Starburst’s Query Graph Model [
          <xref ref-type="bibr" rid="ref29">28</xref>
          ]. All XQGM graphs
are so-called operator graphs, where the nodes
represent Tuple Sequence Operators (TOs). The data flow of
a query is described by the edges between the nodes.
        </p>
        <p>
          We define a tuple as a mapping from a set of attributes
to a set of values. Every attribute has an assigned type
which corresponds to an XQuery node type or atomic
type, e. g., element or double. A value can be formed
by any atomic value in the value space of atomic types.
All tuples, produced by a TO as output, form a so-called
tuple sequence which is an ordered sequence of tuples. In
contrast to the relational model, we additionally approve
that an attribute value can be a sequence of tuples, too.
We say a tuple is nested, if at least one tuple contains a
tuple sequence as an attribute value. A TO is a generic
object that consumes tuple sequences, transforms them
according to its inherent evaluation strategy, and finally
delivers a tuple sequence as output. The following list
shows the operators we use in the context of the XQGM:
• SELECT: Selects tuples by means of value-based
and positional predicates. Additionally, it supports
the evaluation of aggregate functions.
• JOIN: Evaluates n-way structural joins. It is
introduced after applying query unnesting as described
by Mathis [
          <xref ref-type="bibr" rid="ref22">21</xref>
          ]. There exist various subtypes for the
evaluation of semi-, anti-, and left-outer joins.
• ACCESS: The Node Access (NA) operator accesses
a sequence of element nodes satisfying a given
predicate. In contrast, the Document Access (DA)
operator provides access to the root node of a
document to supply an initial context for query
evaluation.
• GROUP BY: Groups tuple sequences according to a
specific group predicate. It is also accountable for
calculating group-wise aggregate functions.
• UNION, INTSCT, DIFF: These operators calculate
from ordered tuple sequences order-preserving and
duplicate-eliminating union, intersection, and
difference.
• SORT: Performs additional sorting and duplicate
elimination on tuple sequences.
        </p>
        <p>
          All TOs inherit from a Generic Tuple Operator (GTO)
that provides basic functionality needed to create query
graphs. They are permitted to add additional features
such as new predicate types, e. g., for evaluating
structural relationships. Figure 4 shows how the constituents
of the GTO relate to each other [
          <xref ref-type="bibr" rid="ref27">26</xref>
          ].
        </p>
        <p>All TOs except the access operators consist of a Head
and a Body (Figure 5). The Head forms a projection
specification encompassing every attribute (Head Attr.) that
is sent to a consuming operator1.</p>
        <p>An attribute calculation (Attr. Calculation) may call
the ddo function. It has the same semantics as the
fn:distinct-doc-order function introduced by the
formal semantics of XQuery during query normalization.
It sorts the outgoing tuple sequence in document order
and performs duplicate elimination.</p>
        <p>The Body provides the procedural description of how
output tuples can be derived from input tuples using an
inner graph whose nodes are quantified tuple variables
(Tuple Var.). The edges of this inner graph can be
partitioned into three classes: Definition edges connect TOs
with the head of the TO that provides input tuples for the
current TO. Predicate edges describe predicate
expressions defined for the connected tuple variables. Output
edges connect tuple variables, which have to be made
1In our graphical query representation, we hide the head of a TO, if
it contains only one head attribute.
accessible to other TOs, with the head of the TO.</p>
        <p>A correlated edge connects tuple variables with the
body of other TOs, for which the current evaluation
context has to be provided, e. g., for evaluating existential
predicates. Correlated predicate edges are drawn as
dotted lines in the graphical representation. In contrast, all
other edges are illustrated as solid lines.</p>
        <p>
          Throughout the rest of this paper, we use the
following example—Q5 of the XMark benchmark queries
[
          <xref ref-type="bibr" rid="ref32">31</xref>
          ]—to show the application of our rewrite rules:
let $a := doc("auction.xml") return
count(for $i in $a/site/closed_auctions/
closed_auction
where $i/price/text() &gt;= 40
return $i/price)
Figure 3 shows the corresponding query graph built
during the analysis step described in Figure 1. To enable
a set-at-a-time processing rather than a node-at-a-time
processing of the query, the unnesting rules presented by
Mathis [
          <xref ref-type="bibr" rid="ref22">21</xref>
          ]—which we omit due to space restrictions—
are used to introduce a logical structural join operator
to the internal representation. Figure 6 shows the query
graph after unnesting. For the discussion of our rewrite
rules, it will be used as a baseline and serves as input for
the query optimization stage.
After discussing the constituents of a TO, we are now
ready to take an in-detail look at the semantics. A body
of a TO can contain tuple variables with two different
quantifiers: An F -quantified tuple variable works
similar to the for expression of XQuery: For every tuple
provided by an incoming output edge, it sends the tuple to a
connected predicate. In parallel, it can provide the
evaluation context for a nested sub-expression (dotted line).
On the other hand, an L-quantified tuple variable
corresponds to the let expression of XQuery. It provides the
complete tuple sequence associated with it and does not
iterate over it.
        </p>
        <p>As mentioned in Section 2, we provide two
different access operators: Node Access (NA) and Document
Access (DA). A DA operator provides the root node of
a document as a context node that serves as a starting
point for query evaluation. An NA operator supplies a
sequence of all nodes fulfilling the selection predicate,
e. g., a node name. For example, the left NA operator in
Figure 5 provides all price element nodes found in the
document.</p>
        <p>
          The join operator is responsible for the evaluation of
structural joins and is introduced during the unnesting
process as described by Mathis [
          <xref ref-type="bibr" rid="ref22">21</xref>
          ]. A join operator
contains two or more F -quantified tuple variables. A
binary join operator contains exactly two tuple variables
that are connected to a structural predicate (XPath axis).
The operator produces an output for each combination
of incoming tuples (associated with the tuple variables),
if they fulfill the structural predicate. We call a join
operator complex if it contains three or more F -quantified
tuple variables connected to structural predicates which
may be linked to logical operators. We call a join
operator a semi-join operator if at least one tuple variable is
not connected to the head of the operator.
        </p>
        <p>The select operator is accountable for the evaluation
of value-based and positional predicates which can be
assigned to F - or L-quantified tuple variables.
Additionally, a select operator can evaluate functions, e. g.,
fn:count.</p>
        <p>Let us have a look at our running example which is
shown in Figure 6: The left subtree provides a sequence
of closed auction tuples that satisfy the structural
relationships evaluated by the semi-join operators. For
each tuple, the lower select operator provides the
evaluation context for the right subtree which establishes
a sequence of tuples fulfilling the structural predicate
price/text() w. r. t. to the evaluation context. Each
text() tuple is sent to the top-most select operator if it
fulfills the predicate fn:data &gt;= 40. The top-most
select operator applies the fn:count aggregate function to
the tuple sequence and outputs the number of tuples in it.
3</p>
      </sec>
      <sec id="sec-5-2">
        <title>Rules for Query Rewrite</title>
        <p>Compared to query rewrite in relational database
systems, our approach considers the characteristics of
cost-effective PPOs (evaluation of TQPs as a whole)
and indexes (evaluation of components of—or even
complete—TQPs). For example, join operators can be
fused to a complex join operator that can be efficiently
evaluated using an HTJ or a CAS index access.
Being semantics-preserving transformations, the rules for
query rewrite are described according to the following
textual representation: IF (Condition) THEN (Action).
Condition describes the preconditions for applying the
rule. In contrast, Action is a sequence of operations
which have to be performed on the GTO and its subtypes.
Each operation is described by a function call whose
semantics is self-explanatory. The overall rewrite
philosophy that drives the rewrite process can be condensed into
the following statement:</p>
        <p>Whenever possible, a query should be
converted to a single join operator.</p>
        <p>
          We believe that this strategy is helpful for the evaluation
of TQPs, because they can often be evaluated more
efficiently using HTJ operators rather than SJ operators [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ].
3.1
        </p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Fusion of Join Operators</title>
      <p>Rewrite rule 1 allows to fuse two adjacent join
operators to a single—but probably more complex—join
operator. It is only applicable for two adjacent join
operators that evaluate structural predicates with child or
descendant axes and which are connected over an F
quantified definition edge.</p>
      <p>The correctness of this rule is obvious: Fusing two
adjacent join operators along the definition edge—by
copying the body of the upper operator into the lower
operator’s one—does not change the semantics of the
structural relationships, because we still perform a
left-toright evaluation of the predicates.</p>
      <p>Rewrite rule 1 IF ((t is Join with F-quantified
tuple variable v) ∧ (v references b) ∧ (b
is Join) ∧ (b and t only contain structural
predicates evaluating child or descendant axes))
THEN ({CopyBody(b,t), CopyHead(b,t),
CopyPredicateEdge(b,t), UpdatePredicateEdge(v),
UpdateDefinitionEdge(v), Delete(b), UpdateHeadAttributes(t),
UpdateAttributeCalculation(t)});
Example 1 Figure 7 shows the query graph presented in
Figure 6 after applying rewrite rule 1 which results in a
complex 4-way join operator.
3.2</p>
    </sec>
    <sec id="sec-7">
      <title>Fusion of Select and Join Operators</title>
      <p>Rewrite rule 2 permits to fuse a join operator and an
adjacent select operator. The correctness of the rule is
obvious: Consider a join operator that receives its inputs
from a select operator. The structural predicate is
evaluated on a tuple sequence that has already been filtered by
the select operator. After rewrite, the join operator
evaluates the predicates added during the fusion, in addition to
its own ones. Those additional predicates filter the tuple
sequence in the same way, as it is done before rewrite.
Rewrite rule 2 IF ((t is Join with F-quantified
tuple variable v) ∧ (v references b) ∧ (b is
Select) ∧ (b contains no L-quantified tuple variable))
THEN ({CopyBody(b,t), CopyHead(b,t),
CopyPredicateEdge(b,t), UpdatePredicateEdge(v),
UpdateDefinitionEdge(v), Delete(b)});</p>
      <p>The correctness of this rewrite is preserved by the fact
that the early evaluation of the selection predicate by the
NA operator does not change the query semantics
compared to filtering the output of an NA operator by the
select operator.</p>
      <p>Rewrite rule 3 IF ((t is Select with F-quantified tuple
variable v) ∧ (v references b) ∧ (b is Node Access))
THEN ({CopyBody(t,b), CopyHead(t,b),
CopyPredicateEdge(t,b), UpdatePredicateEdge(v),
CopyOutputEdge(t,b), Delete(t)});
3.4</p>
    </sec>
    <sec id="sec-8">
      <title>Commutativity Rule</title>
      <p>
        In the relational world, commutativity is an important
property of binary join operators which approves to
exchange the left and the right join partner. Rewrite rule 4
defines how we can partially make use of this property to
provide an import operation for query transformation to
extend the search space for join reordering. This rule is
beneficial, e. g., for a hash-based structural join operator
as described by Mathis et al. [
        <xref ref-type="bibr" rid="ref23">22</xref>
        ], where an exchange of
the left and right join partner may lead to better
performance, because the hash table might be created for the
smaller input sequence rather than the larger one. The
commutativity rule holds for almost all XPath axes,
except for the attribute axis, because it has no reverse
axis. For all other axes, there exists a corresponding
reverse resp. forward axis. Exchanging join partners for
a join operator that evaluates a self axis is even trivial,
because it is reflexive.
      </p>
      <p>Rewrite rule 4 IF ((t0 is binary Join) ∧ (t0 contains
tuple variables v1 and v2) ∧ (v1 and v2 are connected
to a structural predicate p) ∧ (p does not evaluate
the attribute axis) ∧ (v1 references TO t1) ∧ (v2
references TO v2)) THEN ({ExchangeRef(v1:t1, v2:t1),
ExchangeRef(v2:t2, v1:t2), ReversePredicate(p),
UpdateOutputEdge(v1), UpdateOutputEdge(v2)});
3.3</p>
    </sec>
    <sec id="sec-9">
      <title>Fusion of Select and Node Access Operators 3.5</title>
    </sec>
    <sec id="sec-10">
      <title>Associativity Rules</title>
      <p>Rewrite rule 3 permits to merge a select and an adjacent
NA operator by moving the predicates from the select
operator to the body of the NA operator. This rewrite is
beneficial, because an NA operator can be directly mapped
to a physical element or CAS index access operator.
An associativity rule empowers a relational plan
generator to traverse the search space of semantically
equivalent queries by changing the join order. However, in
the world of XML query languages, a single
associativity rule is not sufficient, due to the dualism of content
and structure. Instead, we will need a rule for
reordering content-based joins and a set of associativity rules
for structural joins that take combinations of several axes
as well as early duplicate elimination and sorting into
account.</p>
      <p>Every TO that forms the root node of a tree-structured
query graph has to perform duplicate elimination and
sorting, independent of its operator type. On the other
hand, TOs that have incoming and outgoing edges
potentially need to perform duplicate elimination.
Fortunately, not every join operator needs additional
duplicate elimination operations. For example, a full-join TO
will not create any duplicates, independent of the
structural predicate it evaluates. On the other hand, a
semijoin TO can create duplicates on its output. We can
partition binary semi-join operators into two different
equivalence classes depending on the emergence of
duplicates: (1) semi-join operators where only tuples of one
incoming tuple sequence can contain duplicates after join
evaluation (join operators that evaluate parent/child
or previous-/following-sibling axes), and (2)
semi-join operators where both incoming tuple
sequences can contain duplicates after join evaluation
(join operators that evaluate ancestor/descendant or
previous/following axes).</p>
      <p>Example 2 Let a denote the left join partner, b denote
the right join partner of a binary structural semi-join j
that evaluates the child axis. If j only produces a tuples
satisfying the structural predicate, then duplicate
elimination has to be performed, because every node can have
multiple child nodes. In contrast, if j only delivers b
tuples to consuming operators, then duplicate elimination
is not needed, because every node has at most one
parent node. If j would evaluate a descendant axis, then
duplicate elimination could be necessary in both cases,
because every node can have multiple descendants and
multiple ancestors.</p>
      <p>As mentioned before, to provide a complete set of
associativity rules, all combinations of axes have to be
considered. Additionally, different output nodes need to be
taken into account. A node is called an output node if
its tuple sequence contributes to the query result or is
processed in a subsequent TO. Due to space constraints,
we cannot discuss all associativity rules provided in our
framework, except for one rule2.</p>
      <p>Figure 8 shows the associativity rule for one output
node and two adjacent semi-join operators that evaluate
the descendant axis3. To support a more fine-granular
treatment of sorting and duplicate elimination, we
replace a call to the ddo function, which only eliminates
duplicates, by D.</p>
      <p>We assume that the output of each join operator is
implicitly sorted by the node that is used by a subsequent
TO or that contributes to the final result. On the left hand
side of Figure 8, a structural full-join is performed
between tuples of TO A and B which needs no additional
sorting or duplicate elimination. The following
semijoin operator requires duplicate elimination and sorting
for two reasons: (1) it has only incoming edges, (2) each
tuple of the incoming tuple sequence can have multiple
descendant c nodes. On the right side, a structural join
is performed first between TO B and TO C. Because
this structural relationship is evaluated using a semi-join,
we need additional duplicate elimination, because every
b node can have multiple c descendants. The following
semi-join operator requires duplicate elimination for the
same reason, but it needs no additional sorting because
of our implicit sorting assumption.
The main idea that justifies our rewrite philosophy of
Section 3 is used for physical HTJ operators that can
efficiently evaluate n-way joins. If the plan generator
recognizes the availability of path or CAS indexes that can
evaluate a branch of—or even a complete—TQP, this
index should be used instead of evaluating the twig using
a join operator4. To enable a decomposition of complex
(n-way) join operators, we have to consider two cases,
depending on how many twig paths can be answered
using indexes. If there are several indexes available for a
given path, then the one is chosen that can answer the
largest fraction of the path. To make use of this index
for query evaluation, we apply rewrite rule 5 to split the
complex join operator into two parts: one part that is
evaluated using the index and the other part that will be
evaluated using an HTJ or an SJ operator.</p>
      <p>Rewrite rule 5 IF ((t is n-way Join) ∧ (t contains one
path that can be answered using an available index) ∧
(t contains tuple variables v0, v1, and v2) ∧ (v0 is the
twig’s root node) ∧ (all paths from v0 over v1 can be
answered by no index) ∧ (at least one index can answer
a path from v0 over v2)) THEN({CreateJoinOp(tb),
MovePath(v0, v2, t, tb), UpdateHead(tb),
InsertDefEdge(v0, tb});
If we can answer more than one twig path using
available indexes, then the path of those indexes overlaps at
the twig query’s root node. To get rid of this overlap, we
use rewrite rule 6 to split the operator into two new join
operators and connect them using the old join operator
now evaluating a structural self-join to perform an
intersection on the outgoing tuples of the newly created join
operators.</p>
      <p>Rewrite rule 6 IF (t is n-way Join) ∧ (t describes TQP
that can make use of at least two indexes) ∧ (t
contains tuple variables v0, v1, and v2) ∧ (v0 forms the
twig pattern’s root node) ∧ (v1 and v2 are child nodes
of v0) ∧ (v1 and v2 are root nodes of subtrees) ∧
4Using HTJ or SJ operators for twig query evaluation always serves
as a fallback strategy, if no index matches the twig.
(for each path from v0 over v1 resp. v2 at least
one index matches) THEN({CreateJoinOp(tl),
CreateJoinOp(tr), MoveSubtree(t, tl, v1), MoveSubtree(t,
tr, v2), InsertTupleVar(t, v3), InsertTupleVar(t, v4),
InsertPredicate(t, self, v3, v4), UpdateHead(t),
UpdateHead(tl), UpdateHead(tr), InsertDefEdge(v3, tl),
InsertDefEdge(t4, tr)});
By applying rewrite rules 5 and 6 recursively on the
query graph, we obtain a resulting query graph that uses
as much as possible existing path indexes or CAS indexes
for query evaluation.</p>
      <p>Example 3 To show the application of rewrite
rules 5, we assume that there exist two different
indexes which we can use for query evaluation:
I1(//closed auctions/closed auction) and
I2(//price[String]). The right subtree of the query
graph shown in Figure 7 can be directly mapped to an
index access operator for index I2. On the other hand,
index I1 cannot be used for query evaluation, yet. By
applying rewrite rule 5, we get the query graph shown in
Figure 9. As a consequence, index I1 can now be used
for query evaluation.</p>
      <p>Considering the transformations illustrated in Figures 6
to 9, we can see how the rewritings push the QG
towards a better starting point for query transformation by
finding TQPs and exploiting existing indexes as early as
possible. After query unnesting shown in Figure 6, the
sequence of three semi-join operators was transformed
into a single 4-way join operator (see Figure 7). Figure 8
sketched how associativity rules can be defined that help
to enhance the search space for a cost-based query
optimizer. Finally, Figure 9 exemplified how the
knowledge on existing indexes helps to rewrite the QG w. r. t. a
simpler mapping to index access operators during query
transformation.
4</p>
      <sec id="sec-10-1">
        <title>Setting-up an Infrastructure for Costbased XML Query Optimization</title>
        <p>After having introduced the XQGM as our internal
representation for XQuery expressions and defined various
rules for query rewrite, we are now ready to discuss the
infrastructure for cost-based XML query optimization
which takes a rewritten QG as input and transforms it
into a—hopefully—near-optimal QEP.
4.1</p>
      </sec>
    </sec>
    <sec id="sec-11">
      <title>The Big Picture</title>
      <p>Figure 10 shows the overall architecture of an XML
query optimizer which will be integrated into our
prototype of a native XML database management
system called the XML Transaction Coordinator (XTC)5.
The optimizer consists of five major components: The
Generic Pattern Matcher (GPM) provides means for
extensible rule-based pattern matching on arbitrary
graphstructured trees. Using the GPM, a developer just
describes the pattern to be matched in a declarative manner.
As a consequence, the developer has only to implement
the transformation code manually.</p>
      <p>The Rewriter applies the rewrite rules described in
Section 3 to the QG and uses the GPM for pattern
matching.</p>
      <p>The Plan Generator is responsible for the
transformation of QGs to QEPs. Depending on the plan
generation strategy employed, a heuristics-based or a
costbased optimization approach is chosen. If no cost model
is present, a heuristics-based approach has to be followed
by using a static set of transformation rules. On the other
hand, using a cost model and statistical information, a
cost-based query optimization can be applied. No matter
which plan generation approach is used, we also use the
GPM for pattern matching to allow for a generation of
alternative plans.</p>
      <p>
        One of the main ingredients of a cost-based query
optimizer is the Cost Model. It uses the statistical
information provided by the Statistics component for cost
estimation. In contrast to classic cost-based query
optimization, the cost model is not solely used during query
transformation. If we step back and have a look at the overall
query evaluation process, the following question arises
even during the analysis stage: Is it beneficial to perform
a query unnesting or not?—in other words, shall we
follow a set-at-a-time rather than a node-at-a-time
processing approach. This is only one example where costing
information is needed even before query transformation.
We have learned from classic relational cost-based
optimization, that the search space—consisting of all
se5Project website: http://www.xtc-project.de
mantically equivalent QEPs—for a plan generator gets
tremendously large. If we consider cost-based
optimization of XML queries, the situation gets even worse due
to many different alternative evaluation methods for SJs
and HTJs as well as different index-based access
methods. To enable a query optimizer to keep as much as
possible promising QEPs in main-memory, we use a kind of
mesh data structure as proposed by Graefe and DeWitt
[
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. Figure 11 shows how a new graph is stored in the
mesh. On the left-hand side, a graph Graph and its
corresponding mesh representation Mesh is shown. After
applying query rewrite by fusion two adjacent operators
to a new one, Graph is transformed into Graph’. The
transformation causes an update of the mesh. To allow
for space-efficient management of a large amount of
different plans, only the newly added nodes are physically
stored in the mesh. All other nodes are reused and
referenced by virtual edges (dotted lines). The filled
circles are associated with an equivalence class that
encompasses all semantically equivalent subtrees. For example,
the top-most operator in Mesh’ contains two equivalence
classes, where the first class has one representative and
the second class contains two alternatives.
To reduce the time for finding a near-optimal query plan,
we employ a two-phase query optimization approach:
First we choose a good initial QEP using a
heuristicsbased approach. This QEP can either be immediately
passed on to the code generation component or sent to a
cost-based query optimizer.
4.3.1
      </p>
    </sec>
    <sec id="sec-12">
      <title>Mapping of access operators</title>
      <p>
        The XQGM access operators are providing input for
consuming operators. In general, we can distinguish
between two access methods: document scan and index
access. A document access operator—e. g., the left-most
access operator in Figure 6—is always mapped onto a
document scan, because it only accesses the document’s
root node. For all other access operators, an element
index scan—if present—is the access method of choice.
As outlined in Section 1, there exists a huge variety of
different SJ and HTJ algorithms. Using the rewrite rules
of Section 3, we can reduce the number of binary
structural join operators to a minimum. If there are still
binary structural joins, we use the decision tree shown in
Figure 12 to choose the evaluation method. If positional
predicates are missing, the join operator has three options
depending on the structural relationship it has to
evaluate: The StackTree algorithm by Al-Khalifa et al. [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] is
used, when the join operator evaluates a parent/child
or ancestor/descendant axis. For the evaluation
of the previous-sibling/following-sibling axis,
we use one of the structural hash-join algorithms
proposed by Mathis and Ha¨rder [
        <xref ref-type="bibr" rid="ref23">22</xref>
        ]. In all other cases,
we use a classic nested-loops join approach. On the
other hand, if the binary structural join operator
evaluates a positional predicate, we have two alternative
evaluation methods: Vagena et al. [
        <xref ref-type="bibr" rid="ref33">32</xref>
        ] proposed EPPP
which will be used for the evaluation of parent/child
and previous-sibling/following-sibling
structural relationships. For all other combinations, we also
choose a nested-loops join algorithm.
4.3.3
      </p>
      <sec id="sec-12-1">
        <title>Mapping of complex n-way join operators</title>
        <p>
          Complex n-way join operators are created during query
rewrite by fusing two adjacent binary structural joins.
If there is no path or CAS index that can be used for
the evaluation of a complex n-way join, we use the
TwigStack algorithm proposed by Bruno et al. [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ].
4.3.4
        </p>
      </sec>
    </sec>
    <sec id="sec-13">
      <title>Making the most of available indexes</title>
      <p>In Section 3.6 we showed how to prepare QGs to make
the most of available indexes. During query
transformation, we can benefit from join decomposition. Whenever
a join operator—no matter whether binary or n-nary—
and its subtree(s) can be evaluated using an available
CAS or path index, we immediately map it to an index
access operator.
4.4</p>
    </sec>
    <sec id="sec-14">
      <title>Handling of Statistical Information by Profiles</title>
      <p>
        During query transformation, we use a profile-based
approach for managing statistical information as proposed
by Mannino et al. [
        <xref ref-type="bibr" rid="ref21">20</xref>
        ]. Every QG is associated with a
corresponding profile hierarchy which is formed by two
different node types: base profile (BP) and intermediate
profile (IP). Figure 13 shows a QG and its associated
profile hierarchy. Every QG leaf node (access operator) has
its own BP which contains statistical information about
the chosen physical access method. For example, an
index scan contains information about the height of the
search tree and the number of its leaf pages. Base
profiles are periodically updated by the database
management system and provide the foundation for cost
estimation.
      </p>
      <p>On the other hand, IPs are associated with QG inner
nodes and do not provide accurate statistical information.
Instead, they use the statistical information provided by
their child nodes (BPs or BIs) as input for cost
estimation. This is not done automatically by the database
management system. Instead, a database administrator has to
run a statistics collection tool similar to DB2’s runstats
manually to update the intermediate profiles. Using cost
formulas which strongly depend on the assigned
physical operator, they allow for an estimation of I/O and CPU
costs.
In this paper, we smoothed the way for cost-based XML
query optimization. First, we introduced the XML Query
Graph Model that serves as our internal representation
for XML queries and permits an easier transition
between QPs and QEPs. Second, we introduced a set of
rewrite rules which allow to fuse as much as possible
adjacent binary structural join operators to a single n-way
join operator. Using these rules, a query optimizer can
evaluate TQPs using HTJ or appropriate CAS or path
indexes more easily. To make the most out of all existing
indexes—especially those that can only answer parts of
a TQP—we introduced further rules for join
decomposition. Using these rules in combination with our
knowledge on existing indexes allows for a separation of paths
in TQPs, that can be answered using indexes, from those,
that have to be evaluated using PPOs. Third, we sketched
the architecture of our query optimizer that will be
integrated in our prototype of a native XML database
management system. Using the concepts described in this
paper, we are already able to provide a heuristics-based
query optimizer.</p>
      <p>In the future, we will focus on the definition of an
appropriate cost model. Choosing the right formulas for
cost estimation is far from trivial. Compared to relational
join algorithms—e. g., nested-loops join or merge join—
SJ and HTJ algorithms are very complex in terms of how
they process data. We expect that this leads to
complex cost formulas whose quality can only be verified
by extensive empirical experiments. Currently, we do
not know which plan generation strategy, e. g., dynamic
programming or simulated annealing, works best. For
that reason, our plan generator will be flexible enough to
switch between different strategies to allow for a
comparison w. r. t. execution time, quality of the chosen plan
and resource consumption.</p>
      <sec id="sec-14-1">
        <title>Acknowledgments</title>
        <p>We thank the anonymous reviewers for their valuable
comments on this paper.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Shurug</given-names>
            <surname>Al-Khalifa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. V.</given-names>
            <surname>Jagadish</surname>
          </string-name>
          ,
          <string-name>
            <surname>Jignesh M. Patel</surname>
            , Yuqing Wu, Nick Koudas, and
            <given-names>Divesh</given-names>
          </string-name>
          <string-name>
            <surname>Srivastava</surname>
          </string-name>
          .
          <article-title>Structural Joins: A Primitive for Efficient XML Query Pattern Matching</article-title>
          .
          <source>In Proc. ICDE</source>
          , pages
          <fpage>141</fpage>
          -
          <lpage>154</lpage>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Kevin</surname>
            <given-names>S.</given-names>
          </string-name>
          <string-name>
            <surname>Beyer</surname>
          </string-name>
          , Roberta Cochrane, Vanja Josifovski, Jim Kleewein, George Lapis,
          <string-name>
            <surname>Guy M. Lohman</surname>
            , Robert Lyle, Fatma O¨ zcan, Hamid Pirahesh, Norman Seemann, Tuong C. Truong, Bert Van der Linden, Brian Vickery,
            <given-names>and Chun Zhang. System RX</given-names>
          </string-name>
          :
          <article-title>One Part Relational, One Part XML</article-title>
          .
          <source>In Proc. SIGMOD Conference</source>
          , pages
          <fpage>347</fpage>
          -
          <lpage>358</lpage>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Matthias</given-names>
            <surname>Brantner</surname>
          </string-name>
          , Sven Helmer,
          <string-name>
            <surname>Carl-Christian Kanne</surname>
            , and
            <given-names>Guido</given-names>
          </string-name>
          <string-name>
            <surname>Moerkotte</surname>
          </string-name>
          .
          <article-title>Full-fledged Algebraic XPath Processing in Natix</article-title>
          .
          <source>In Proc. ICDE</source>
          , pages
          <fpage>705</fpage>
          -
          <lpage>716</lpage>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Matthias</given-names>
            <surname>Brantner</surname>
          </string-name>
          ,
          <string-name>
            <surname>Carl-Christian Kanne</surname>
            , and
            <given-names>Guido</given-names>
          </string-name>
          <string-name>
            <surname>Moerkotte</surname>
          </string-name>
          .
          <article-title>Let a Single FLWOR Bloom</article-title>
          .
          <source>In Proc. XSym</source>
          , LNCS
          <volume>4704</volume>
          , pages
          <fpage>46</fpage>
          -
          <lpage>61</lpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Nicolas</given-names>
            <surname>Bruno</surname>
          </string-name>
          , Nick Koudas, and
          <string-name>
            <given-names>Divesh</given-names>
            <surname>Srivastava</surname>
          </string-name>
          .
          <article-title>Holistic Twig Joins: Optimal XML Pattern Matching</article-title>
          .
          <source>In Proc. SIGMOD Conference</source>
          , pages
          <fpage>310</fpage>
          -
          <lpage>321</lpage>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <article-title>XPERANTO: Publishing Object-Relational Data as XML</article-title>
          .
          <source>In Proc. WebDB</source>
          , pages
          <fpage>105</fpage>
          -
          <lpage>110</lpage>
          ,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Zhimin</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. V.</given-names>
            <surname>Jagadish</surname>
          </string-name>
          ,
          <string-name>
            <surname>Laks</surname>
            <given-names>V. S.</given-names>
          </string-name>
          <string-name>
            <surname>Lakshmanan</surname>
            , and
            <given-names>Stelios</given-names>
          </string-name>
          <string-name>
            <surname>Paparizos</surname>
          </string-name>
          .
          <article-title>From Tree Patterns to Generalized Tree Patterns: On Efficient Evaluation of XQuery</article-title>
          .
          <source>In Proc. VLDB Conference</source>
          , pages
          <fpage>237</fpage>
          -
          <lpage>248</lpage>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>Shu-Yao</surname>
            <given-names>Chien</given-names>
          </string-name>
          , Zografoula Vagena, Donghui Zhang, Vassilis J.
          <string-name>
            <surname>Tsotras</surname>
            , and
            <given-names>Carlo</given-names>
          </string-name>
          <string-name>
            <surname>Zaniolo</surname>
          </string-name>
          .
          <article-title>Efficient Structural Joins on Indexed XML Documents</article-title>
          .
          <source>In Proc. VLDB Conference</source>
          , pages
          <fpage>263</fpage>
          -
          <lpage>274</lpage>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <surname>Chin-Wan</surname>
            <given-names>Chung</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jun-Ki Min</surname>
            , and
            <given-names>Kyuseok</given-names>
          </string-name>
          <string-name>
            <surname>Shim</surname>
          </string-name>
          .
          <article-title>APEX: An Adaptive Path Index for XML Data</article-title>
          .
          <source>In Proc. SIGMOD Conference</source>
          , pages
          <fpage>121</fpage>
          -
          <lpage>132</lpage>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Brian</surname>
            <given-names>F.</given-names>
          </string-name>
          <string-name>
            <surname>Cooper</surname>
            , Neal Sample,
            <given-names>Michael J.</given-names>
          </string-name>
          <string-name>
            <surname>Franklin</surname>
            , G´ısli
            <given-names>R.</given-names>
          </string-name>
          <string-name>
            <surname>Hjaltason</surname>
            , and
            <given-names>Moshe</given-names>
          </string-name>
          <string-name>
            <surname>Shadmon</surname>
          </string-name>
          .
          <article-title>A Fast Index for Semistructured Data</article-title>
          .
          <source>In Proc. VLDB Conference</source>
          , pages
          <fpage>341</fpage>
          -
          <lpage>350</lpage>
          ,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Roy</given-names>
            <surname>Goldman</surname>
          </string-name>
          and
          <string-name>
            <given-names>Jennifer</given-names>
            <surname>Widom</surname>
          </string-name>
          .
          <article-title>Dataguides: Enabling Query Formulation and Optimization in Semistructured Databases</article-title>
          .
          <source>In Proc. VLDB Conference</source>
          , pages
          <fpage>436</fpage>
          -
          <lpage>445</lpage>
          ,
          <year>1997</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Goetz</given-names>
            <surname>Graefe</surname>
          </string-name>
          and
          <string-name>
            <surname>David J. DeWitt.</surname>
          </string-name>
          <article-title>The Exodus Optimizer Generator</article-title>
          . SIGMOD Rec.,
          <volume>16</volume>
          (
          <issue>3</issue>
          ):
          <fpage>160</fpage>
          -
          <lpage>172</lpage>
          ,
          <year>1987</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <surname>Jan</surname>
            <given-names>Hidders</given-names>
          </string-name>
          , Philippe Michiels, Je´roˆme Sime´on, and Roel Vercammen.
          <article-title>How To Recognize Different Kinds of Tree Patterns from Quite a Long Way Away</article-title>
          .
          <source>In Proc. Plan-X</source>
          , pages
          <fpage>14</fpage>
          -
          <lpage>24</lpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>H. V.</given-names>
            <surname>Jagadish</surname>
          </string-name>
          ,
          <string-name>
            <surname>Laks</surname>
            <given-names>V. S.</given-names>
          </string-name>
          <string-name>
            <surname>Lakshmanan</surname>
            , Divesh Srivastava, and
            <given-names>Keith</given-names>
          </string-name>
          <string-name>
            <surname>Thompson</surname>
          </string-name>
          .
          <article-title>TAX: A Tree Algebra for XML</article-title>
          .
          <source>In Proc. DBPL</source>
          , LNCS
          <volume>2397</volume>
          , pages
          <fpage>149</fpage>
          -
          <lpage>164</lpage>
          ,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <surname>Haifeng</surname>
            <given-names>Jiang</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wei</surname>
            <given-names>Wang</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hongjun Lu</surname>
          </string-name>
          , and Jeffrey Xu Yu.
          <article-title>Holistic Twig Joins on Indexed XML Documents</article-title>
          .
          <source>In Proc. VLDB Conference</source>
          , pages
          <fpage>273</fpage>
          -
          <lpage>284</lpage>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <surname>Raghav</surname>
            <given-names>Kaushik</given-names>
          </string-name>
          , Rajasekar Krishnamurthy,
          <string-name>
            <given-names>Jeffrey F.</given-names>
            <surname>Naughton</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Raghu</given-names>
            <surname>Ramakrishnan</surname>
          </string-name>
          .
          <article-title>On the Integration of Structure Indexes and Inverted Lists</article-title>
          .
          <source>In Proc. SIGMOD Conference</source>
          , pages
          <fpage>779</fpage>
          -
          <lpage>790</lpage>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <surname>Raghav</surname>
            <given-names>Kaushik</given-names>
          </string-name>
          , Pradeep Shenoy, Philip Bohannon, and
          <string-name>
            <given-names>Ehud</given-names>
            <surname>Gudes</surname>
          </string-name>
          .
          <article-title>Exploiting Local Similarity for Indexing Paths in Graph-Structured Data</article-title>
          .
          <source>In Proc. ICDE</source>
          , pages
          <fpage>129</fpage>
          -
          <lpage>140</lpage>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <surname>Hua-Gang Li</surname>
            ,
            <given-names>S. Alireza</given-names>
          </string-name>
          <string-name>
            <surname>Aghili</surname>
          </string-name>
          , Divyakant Agrawal, and
          <article-title>Amr El Abbadi</article-title>
          . FLUX:
          <article-title>Content and Structure Matching of XPath Queries with Range Predicates</article-title>
          .
          <source>In Proc. XSym</source>
          , pages
          <fpage>61</fpage>
          -
          <lpage>76</lpage>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Michael</surname>
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Carey</surname>
            , Daniela Florescu, Zachary G. Ives, Ying Lu, Jayavel Shanmugasundaram,
            <given-names>Eugene J.</given-names>
          </string-name>
          <string-name>
            <surname>Shekita</surname>
          </string-name>
          , and
          <string-name>
            <surname>Subbu</surname>
            <given-names>N.</given-names>
          </string-name>
          <string-name>
            <surname>Subramanian</surname>
          </string-name>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>Quanzhong</given-names>
            <surname>Li</surname>
          </string-name>
          and
          <string-name>
            <given-names>Bongki</given-names>
            <surname>Moon</surname>
          </string-name>
          .
          <article-title>Indexing and Querying XML Data for Regular Path Expressions</article-title>
          .
          <source>In Proc. VLDB Conference</source>
          , pages
          <fpage>361</fpage>
          -
          <lpage>370</lpage>
          ,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [20]
          <string-name>
            <surname>Michael</surname>
            <given-names>V.</given-names>
          </string-name>
          <string-name>
            <surname>Mannino</surname>
          </string-name>
          , Paicheng Chu, and Thomas Sager.
          <source>Statistical Profile Estimation in Database Systems. ACM Comput. Surv.</source>
          ,
          <volume>20</volume>
          (
          <issue>3</issue>
          ):
          <fpage>191</fpage>
          -
          <lpage>221</lpage>
          ,
          <year>1988</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>Christian</given-names>
            <surname>Mathis</surname>
          </string-name>
          .
          <article-title>Extending a Tuple-Based XPath Algebra to Enhance Evaluation Flexibility</article-title>
          . Informatik - Forschung
          <source>und Entwicklung</source>
          ,
          <volume>21</volume>
          (
          <issue>3-4</issue>
          ):
          <fpage>147</fpage>
          -
          <lpage>164</lpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>Christian</given-names>
            <surname>Mathis</surname>
          </string-name>
          and
          <article-title>Theo Ha¨rder. Hash-Based Structural Join Algorithms</article-title>
          .
          <source>In Proc. EDBT DataX Workshop</source>
          , LNCS
          <volume>4254</volume>
          , pages
          <fpage>136</fpage>
          -
          <lpage>149</lpage>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [23]
          <string-name>
            <surname>Jason</surname>
            <given-names>McHugh</given-names>
          </string-name>
          and
          <string-name>
            <given-names>Jennifer</given-names>
            <surname>Widom</surname>
          </string-name>
          .
          <article-title>Query Optimization for XML</article-title>
          .
          <source>In Proc. VLDB Conference</source>
          , pages
          <fpage>315</fpage>
          -
          <lpage>326</lpage>
          ,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [24]
          <string-name>
            <surname>Philippe</surname>
            <given-names>Michiels</given-names>
          </string-name>
          , George A.
          <string-name>
            <surname>Mihaila</surname>
          </string-name>
          , and Je´roˆme Sime´
          <article-title>on. Put a Tree Pattern in Your Algebra</article-title>
          .
          <source>In Proc. ICDE</source>
          , pages
          <fpage>246</fpage>
          -
          <lpage>255</lpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>Tova</given-names>
            <surname>Milo</surname>
          </string-name>
          and
          <string-name>
            <given-names>Dan</given-names>
            <surname>Suciu</surname>
          </string-name>
          .
          <article-title>Index Structures for Path Expressions</article-title>
          .
          <source>In Proc. ICDT</source>
          , pages
          <fpage>277</fpage>
          -
          <lpage>295</lpage>
          ,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>Bernhard</given-names>
            <surname>Mitschang</surname>
          </string-name>
          .
          <article-title>Anfrageverarbeitung in Datenbanksystemen (German only)</article-title>
          .
          <source>Vieweg</source>
          ,
          <year>1995</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [27]
          <string-name>
            <surname>Stelios</surname>
            <given-names>Paparizos</given-names>
          </string-name>
          , Yuqing Wu,
          <string-name>
            <surname>Laks</surname>
            <given-names>V. S.</given-names>
          </string-name>
          <string-name>
            <surname>Lakshmanan</surname>
            , and
            <given-names>H. V.</given-names>
          </string-name>
          <string-name>
            <surname>Jagadish</surname>
          </string-name>
          .
          <article-title>Tree Logical Classes for Efficient Evaluation of XQuery</article-title>
          .
          <source>In Proc. SIGMOD Conference</source>
          , pages
          <fpage>71</fpage>
          -
          <lpage>82</lpage>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [28]
          <string-name>
            <surname>Hamid</surname>
            <given-names>Pirahesh</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Joseph M. Hellerstein</surname>
            , and
            <given-names>Waqar</given-names>
          </string-name>
          <string-name>
            <surname>Hasan</surname>
          </string-name>
          .
          <article-title>Extensible/Rule Based Query Rewrite Optimization in Starburst</article-title>
          .
          <source>In Proc. SIGMOD Conference</source>
          , pages
          <fpage>39</fpage>
          -
          <lpage>48</lpage>
          ,
          <year>1992</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [29]
          <string-name>
            <surname>Christopher</surname>
            <given-names>Re</given-names>
          </string-name>
          , Je´roˆme Sime´on, and Mary F.
          <article-title>Ferna´ndez. A Complete and Efficient Algebraic Compiler for XQuery</article-title>
          .
          <source>In Proc. ICDE</source>
          , page
          <volume>14</volume>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>Flavio</given-names>
            <surname>Rizzolo</surname>
          </string-name>
          and
          <string-name>
            <given-names>Alberto O.</given-names>
            <surname>Mendelzon</surname>
          </string-name>
          .
          <article-title>Indexing XML Data with ToXin</article-title>
          .
          <source>In Proc. WebDB</source>
          , pages
          <fpage>49</fpage>
          -
          <lpage>54</lpage>
          ,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [31]
          <string-name>
            <surname>Albrecht</surname>
            <given-names>Schmidt</given-names>
          </string-name>
          , Florian Waas, Martin L.
          <string-name>
            <surname>Kersten</surname>
            ,
            <given-names>Michael J.</given-names>
          </string-name>
          <string-name>
            <surname>Carey</surname>
          </string-name>
          , Ioana Manolescu, and Ralph Busse.
          <article-title>XMark: A Benchmark for XML Data Management</article-title>
          .
          <source>In Proc. VLDB Conference</source>
          , pages
          <fpage>974</fpage>
          -
          <lpage>985</lpage>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          [32]
          <string-name>
            <surname>Zografoula</surname>
            <given-names>Vagena</given-names>
          </string-name>
          , Nick Koudas, Divesh Srivastava, and
          <string-name>
            <given-names>Vassilis J.</given-names>
            <surname>Tsotras</surname>
          </string-name>
          .
          <article-title>Efficient Handling of Positional Predicates Within XML Query Processing</article-title>
          .
          <source>In Proc. XSym</source>
          , LNCS
          <volume>3671</volume>
          , pages
          <fpage>68</fpage>
          -
          <lpage>83</lpage>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          [33]
          <string-name>
            <surname>Haixun</surname>
            <given-names>Wang</given-names>
          </string-name>
          , Sanghyun Park, Wei Fan, and
          <string-name>
            <surname>Philip</surname>
            <given-names>S.</given-names>
          </string-name>
          <string-name>
            <surname>Yu. ViST: A Dynamic Index</surname>
          </string-name>
          <article-title>Method for Querying XML Data by Tree Structures</article-title>
          .
          <source>In Proc. SIGMOD Conference</source>
          , pages
          <fpage>110</fpage>
          -
          <lpage>121</lpage>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          [34]
          <string-name>
            <surname>Andreas</surname>
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Weiner</surname>
            ,
            <given-names>Christian</given-names>
          </string-name>
          <string-name>
            <surname>Mathis</surname>
          </string-name>
          , and
          <article-title>Theo Ha¨rder. Associativity Rules for Native XML Databases</article-title>
          .
          <source>Internal Report</source>
          , AG DBIS, http://wwwlgis.informatik.uni-kl.de/,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>