<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>iiXXUUPPTT:: IInnddeexxiinngg XXMMLL UUssiinngg PPaatthh TTeemmppllaatteess</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Tom´aˇs Bartoˇs</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>J´an Kasarda Tomas Bartos</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jan Kasarda</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Software Engineering, Faculty of Mathematics and Physics</institution>
          ,
          <addr-line>DepartmenCthoafrSleosftUwnariveeErsnitgyinienerPinragg,uFea,cMulatyloostfrManastkh ́eemn ́aamti.cs25a,nd Physics, Charles Un1iv1e8rs0i0tyPirnagPurea,gCuez,ecMhaRloespturabnliscke nam. 25</addr-line>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2010</year>
      </pub-date>
      <fpage>84</fpage>
      <lpage>95</lpage>
      <abstract>
        <p>The XML format has become the standard for data exchange because it is self-describing and it stores not only information but al so the relationships between data. Therefore it is used in very different areas. To find the right information in an XML file, we need to have a fast and an effective access to data. Similar to relational databases, we can create an index in order to speed up the querying for the information. There are several ways of indexing XML data but previous research showed that one of the most effective approaches is to index root-to-leaf paths in the input file. So we took the inspiration from existing path-based index ing concepts, enhanced those ideas, and created a new native XML indexing method derived from the combination of existing approaches in order to improve the evaluation time of regular path expressions in XPath queries.</p>
      </abstract>
      <kwd-group>
        <kwd>Indexing XML</kwd>
        <kwd>path-based indexing</kwd>
        <kwd>path templates</kwd>
        <kwd>XPath queries</kwd>
        <kwd>regular path expressions</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>
        Introduction
In past few years there has been an expansion of semi-structured data mostly
stored as XML files and used for saving and exchanging information over the
Internet. The simplicity is only one of the factors why the format became so
popular. As more and more files occurred in this format, we wanted to access
the stored data and search for the specific information according to prior criteria.
For this purpose languages such as XPath [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] or XQuery [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] have been created.
They allow searching for elements, attributes, or text values based on either
specific values or regular expressions. If there are multiple conditions in an XPath
query, we can combine them, and we get the regular path expression pattern.
      </p>
      <p>The path expression usually matches several elements in the input XML file
(the result set). The challenge is to find these elements quickly and efficiently,
especially in large files with a high number of elements and with different
structures. One came with an idea of indexing the XML data in order to quickly get
the the results for any query.</p>
      <p>No matter which indexing technique we use, if we had an XPath expression,
the most problematic queries would be those with /’/’ (relative paths) or ’*’
(wildcards). These queries match numerous distinct elements and are difficult to
handle compared to expressions with absolute paths only.</p>
      <p>In this paper, our goal is to combine the best concepts of existing indexing
methods and enhance them in order to improve the evaluation time of XPath
queries. To achieve this, we make contributions to these areas:
– The previous research showed that indexing paths is one of the effective ways
of indexing XML documents, so we use this approach and we create a new
indexing method based on indexing paths (Section 2).
– Additionally, we combine it with one of the numbering schemes in order to
accelerate the evaluation of XPath regular path expressions (see Section 3).
– Finally we compare the new concept with existing solutions in terms of time
complexity while evaluating sample XPath queries (Section 4).
2</p>
      <p>
        XML Indexing
There are various distinct approaches of indexing XML files. The main difference
between them is that each method focuses on the specific topic such as decreasing
the number of I/O operations [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], converting the XML format into tables in a
relational DBMS to leverage the database engine ([
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ], [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] or GRIPP
[
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]), or relying on the simplicity of numbering schemes and joining elements
(XISS [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] or twig joins [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]). The indexes might be based on a known data
structure, e.g. the Patricia tries [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] are used in [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] or [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], but more often they
use custom structures.
      </p>
      <p>
        The method of labeling nodes in the tree with numbers might help with
discovering ancestor-descendant (A-D) relationships. Dietzs’ numbering scheme
[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] inspired us to design our numbering scheme based on intervals that evaluates
A-D relationships in constant time. We found also motivation in XISS for t he
decomposition of XPath expressions, producing the intermediate results (called
candidates in our approach), and element joins.
      </p>
      <p>
        Specializing on paths, the DataGuide [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] handles raw paths and provided the
basis for future path indexing methods such as XDG (Extended DataGuide [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ])
or Index Fabric [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
      </p>
      <p>
        The interesting work of mapping all root-to-leaf paths into multi-dimensional
points (MDX [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] or UB-trees [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]) influenced us on designing our indexing
method. These concepts try to avoid using structural joins because of their
time complexity compared to indexing paths. We understand the inefficiency of
element-based approaches but we also see the potential slowdown for the
multidimensional mapping methods when evaluating queries with multiple wildcards
and relative paths. In this case, the domain used for finding matching tuples
might grow faster.
      </p>
      <p>The aim of the proposed indexing scheme is to follow the multi-dimensional
techniques and leverage the path-based indexing with focus on grouping paths
according to common characteristics, path labels (see Section 2.2). We expect
this idea to eliminate many unsuitable paths and, as the result, to speed up the
evaluation of any query (especially those problematic ones; see Section 3).
faculty (0)
If we take XML files, they can be easily transformed into oriented graphs.
The elements and attributes correspond to graph nodes and the edges are
derived from the parent-child (or element-subelement) relationships. Generally, the
graph might contain loops but if we discard the IDREF attributes, we will get a
tree (see Figure 1). Our approach considers only tree structures as the input. We
also exclude attributes and text values from the indexing and querying methods
and focus primarily on the elements and their relationships. The support for
indexing and evaluating attributes is straightforward (any attribute of an element
might be considered as a specific subelement with a specific edge).</p>
      <p>Although several various indexing methods exist, the main purpose remains
the same - preprocess the source file and store information that will give fas t
response to almost any query. We suggest indexing paths, so we evaluate queries
only on such paths that are common for as many elements from the query as
possible. This method eliminates a lot of paths and elements that will not be in
the result and therefore it improves the querying time.
2.2</p>
      <p>Path-based Indexing
First of all, we assign elements in the source file the unique identification numbers
(NodeIDs) and convert the element (tag) names into integers (TagIDs). We prefer
numbers over strings because the comparison of integers is faster than comparing
strings with variable lengths although it brings some memory overhead. We use
the numbering method that assigns a node the NodeID when the corresponding
element is visited for the first time (using the SAX parsing method). The root
has the number 0, while the number of following nodes is always incremented
by 1 (see the sample XML file with element names and NodeIDs in Figure 1).</p>
      <p>Next, we store all root-to-leaf paths according to their path labels.
Whenever we reach a leaf node, a new root-to-leaf path occurs, and we store the
Path reference according to its path label. While the Path reference is
indicated by the sequence of NodeIDs, the path label is determined by the
sequence of TagIDs of all nodes on the path from the root to the current (leaf)
node. The path label is stored only once but one path label can contain several
Path references. Moreover, we convert the path label into Path Template ID
(PTID) that groups similar Path references together (for the detailed
specification of used terms see Table 1).</p>
      <p>Example 1. If we take the sample XML file from the Figure 1 and we visit a leaf
node fax (13), the Path reference (sequence of NodeIDs) will be (0,7,8,13).
Converting element names ” /faculty/department/contact/fax ” to TagIDs, we get
the path label ’ /0/7/1/9.’ For details about the conversion, see the Table 2(a)
(NodeTags table) and the Section 2.3.
2.3</p>
      <p>
        Structures
While we parse the input XML file, we maintain several structures that store the
root-to-leaf paths and other necessary information that we will later use when
we evaluate queries. There are three major data structures: the NodeTags table,
the Paths table, and the PrefixTree. While the first two are more like database
tables, the last one is definitely a tree structure (inspired by a Patricia trie [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]).
NodeTags table. This table provides conversion between the TagName and
the TagID. Furthermore it saves all path template IDs (PTIDs) where the
given TagID appear; see the Table 2(a).
      </p>
      <p>Paths table. This table contains the conversion between the Path label and
the PTID. As mentioned before, a single Path label provides grouping for
several Path references that are also stored here; see the Table 2(b).
Prefix tree. The PrefixTree covers all distinct prefixes of path labels from
the source file. Each node in the tree has the TagID and the path from the
root to a given node represents the path prefix. Nodes also contain all PTIDs
which do have the selected prefix. We use this structure to find all PTIDs to
which a specific NodeID might belong (for details see Section 3).</p>
      <p>faculty (0)
Example 2. If we take the node contact (1) from the sample XML (see Figure
1), it belongs to almost all PTIDs (according to the NodeTags table). And using
the PrefixTree, we find the prefix of ’/0/1’ that matches the PTIDs { 0,1,2,3} .
So the chosen node occurs only on paths with these path labels.
3</p>
      <p>Evaluating XPath queries
Before we start evaluating an XPath query, we need to parse and prepare the
query. We describe these steps in the following sections in more detail.
3.1</p>
      <p>Parsing XPath expressions
In the beginning, we convert the string query, that describes a regular path
expression, into the structure that is appropriate for the evaluation. We selected
the graph structure (XPathTree) created by two types of nodes (XPathNodes):
steps and predicates (see the sample queries in Figure 3).</p>
      <p>Steps If we divide the path expression into sequence of step expressions, they
will be represented by these nodes.</p>
      <p>Predicates We express the further filter expressions by the Predicate nodes.</p>
      <p>While the Step nodes cannot be added as subnodes, each node in the tree
might contain one or more Predicate subnodes.</p>
      <p>faculty
department
faculty
department
contact
(a)</p>
      <p>*
email
(b)
department</p>
      <p>fax
(c)
contact
After we parse the XPath expression and create the corresponding XPathTree,
we need to convert the stored element names into TagIDs. While converting, we
check element names whether they exist in the index. If a name does not occur
in the NodeTags table, the result is instantly available because no nodes will be
in the result set and no further evaluation is needed (we suppose all predicates
to co-exist at the same time).</p>
      <p>We also assign XPathNodes the integer Order, so we know the order in which
they are visited and evaluated.
Algorithm 1 Visit procedure for evaluating a node in the XPathTree
When we build and validate the XPathTree, we can start the evaluation. We
use slightly different evaluating methods according to the type of the current
XPathNode. When traversing the XPathTree, the Step nodes are evaluated with
the top-down approach, while the Predicate nodes are processed in the
bottomup style (from the lowest level upwards using depth-first search). The reason for
distinct methods is that all predicates must be resolved before we can continue
with the next XPathNode.</p>
      <p>The Algorithm 14 describes the procedure that we apply to all XPathNodes.
There are several steps that we need to explain in more detail but the main
principle is that we save candidate nodes (NodeIDs) for each XPathNode we
visit. So the candidate nodes of the last Step node represent the result set.
1. Save candidates</p>
      <p>The first step is to save candidate nodes for the current node. We store
them in the table called candidates (line 1). It contains PTIDs, where the
XPathNode occur, with corresponding NodeIDs. Each PTID is stored only
once but one NodeID might be saved for more PTIDs. It is because we focus
later on merging candidates according to PTIDs rather than NodeIDs.
To identify the PTIDs, we try to find the smallest PTID set that is common
for as many XPathNode nodes as possible (using the NodeTags table). For a
predicate node, this means to take PTIDs that are common for XPathNodes
on the path from the last Step node. Because we evaluate predicates
bottomup, we create the set of PTIDs on the way ”down.” For a Step node we take
the path from the first Step node. This holds only if all axis directions on
the path are the same. If we have an alternating1 XPathNode, it divides the
1 Alternating XPathNode is a node that changes the axis direction (the incoming edge
has different direction than the outgoing edge).
XPathNodes on the path into two groups for which the smallest PTID set
must be computed separately. We pre-compute the minimal PTID set for all
corresponding nodes when the first node in a specific group is visited.
When we obtain the minimal PTID set, we use the Paths table to find the
candidate nodes (NodeIDs) according to the Path references for a PTID. If
the TagID does not represent ’*’ , we find the positions of the TagID in the
Path label identified by the current PTID. We search only for positions that
are either after (forward axes) or before (reverse axes) the position of the last
node and we take all NodeIDs on those positions from the Path references.
If the TagID reflects ’*’ and the XPathNode does not have any predicates,
we skip it and save the minimal and the maximal number of positions to be
skipped when searching for positions in the next XPathNode. The numbers
are determined by the current axis: (1, 1) for the direct relative (parent,
child), and (1, ∞) for other axes (ancestor, descendant).</p>
      <p>Example 3. If we take the Query 3 in Figure 3(c), the alternating XPathNode
is the fax node. Therefore the minimal PTID subset can be computed only
for set of nodes { faculty,fax} and { fax,contact} . The candidates for this
XPathNode are shown in Table 3.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Evaluate predicates and voting</title>
      <p>If there are any predicates for the current XPathNode, we need to handle them
before we continue with the next XPathNode. Because predicates give
additional filtering criteria, not all candidate nodes from the current XPathNode’s
candidates will meet the new criteria. Therefore we use voting for candidate
nodes (line 4). Every predicate gives a vote for all candidate nodes in the
current XPathNode’s candidates table (no matter of their PTIDs) that are
reachable from a NodeID stored in the predicates’ candidates. The
reachability is dedicated from the axis type and the NodeIDs. Because we consider
only tree structures, we can use the interval (NodeID,LastNodeID) that is
available to any NodeID to test whether a path between two NodeIDs exists.
The path from a node n1 to a node n2 exists only if</p>
      <p>(n2.N odeID &gt; n1.N odeID) and (n2.N odeID ≤ n1.LastN odeID). (1)</p>
    </sec>
    <sec id="sec-3">
      <title>3. Filter candidates</title>
      <p>Whenever we use the voting mechanism, we need to finalize the candidates.</p>
      <p>We call this action filtering candidates (line 7). The candidate nodes that
have not received enough votes will be removed. Number of votes needed for
being kept equals the number of predicates that has been included in voting.
After we eliminate unsuitable candidate nodes, we need to update the PTIDs
for the next step. By updating PTIDs we mean find all PTIDs where a NodeID
might occur. If the axis of the next XPathNode has the same direction, we
take the PTIDs only from the smallest PTID set for the current XPathNode.
Otherwise, we need to consider potentially all PTIDs. To take only correct
PTIDs, we use the PrefixTree that determines only such PTIDs in which
the given NodeID might occur. We cannot use only the Paths table because
we will find PTIDs for any NodeID with the same TagName and that produces
bigger set than we need for a specific NodeID. So we use the PrefixTree
instead. The path prefix that we need for navigation in the PrefixTree is
defined by the current NodeID.
4. Merge candidates</p>
      <p>If we have two Step XPathNodes, we need to merge their candidate nodes
(line 10). Usually we take the candidates of the last Step XPathNode and
apply the same voting mechanism on the current XPathNode as with
predicates. After voting, we automatically filter candidates. The result for the
current XPathNode contains previous candidate nodes that received votes.
4</p>
      <p>Experimental results
We implemented the prototype of the iXUPT Index in .NET Framework 2.0
and use the laptop machine with Intel Core Duo processor (1.66GHz), 3GB
main memory, and Windows XP SP2 installed to execute the experiments.</p>
      <p>
        For our experiments, we use XMark [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] (the XML benchmark database)
as the data set. XMark is designed to generate XML documents with multiple
parts meeting various XML approaches (data-centric and document-centric).
Although the generated documents are valid to a specific DTD, we do not use
this schema as the hint for indexing or evaluating.
      </p>
      <p>
        To acquire the data set, we use the tool xmlgen [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ] which is the
implementation of the XMark. We generated several documents with different characteristics
(see the Table 4 for details).
      </p>
      <p>We provide also a comparison between a number of all root-to-leaf paths
and a number of different Path labels (see Figure 4). The reason is that the
number of different Path labels is smaller than the number of all paths (and
might have an upper bound). So searching for candidate nodes on common PTIDs
(that uniquely identify Path labels) enhance the speed of the evaluation.
430
410
sID390
T
#P370
350
330
10000
] 1000
s
[em 100
m
iT 10
1</p>
      <p>We choose two sample queries according to the given DTD of the generated
documents that will cover as many features and functionality as possible.
1. site/regions/*/item/location</p>
      <p>The first query (Q1) is simple, there are no predicates and it uses mostly
direct relatives (the child axis).
2. //regions[europe]/ancestor::*/people//person</p>
      <p>This query (Q2) is more complex, there is a predicate, a branching node,
and an alternating node. Furthermore, the second Step node matches several
elements and different axis directions are used.</p>
      <p>iXUPT eXist XPN 2.0 Qizx MSSQL
iXUPT eXist XPN 2.0 Qizx MSSQL
1.12</p>
      <p>
        To eliminate any negative effects of the managed code, we present the average
time of 10 subsequent executions for each query. We compare the iXUPT
prototype with several other products such as eXist [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], Qizx [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ], MS SQL Server
2005 [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ] (does not provide support for ancestor axis, so the query Q2 was not
executed) or the built-in XPathNavigator in .NET Framework 2.0 (marked as
XPN 2.0). The Figure 5 shows the evaluation times for both queries Q1 and Q2.
      </p>
      <p>We can see the improvements of the query evaluation especially for the first
query in the Figure 5(a). But we understand that the reason might be that
we do not consider the time needed to create the index or that our index is
fundamentally memory-based.
5</p>
      <sec id="sec-3-1">
        <title>Conclusion and future work</title>
        <p>In this paper, we have proposed a new XML indexing method based on storing
root-to-leaf paths and grouping them according to common Path labels in
order to enhance the evaluation time of regular path expressions in XPath queries.
The experimental results showed that there is an improvement of the evaluation
time in the category of small and medium-sized files. Although handling medium
files is still comparable to other approaches, evaluating large files and complex
queries did not bring the anticipated results.</p>
        <p>
          For the future, there are still several issues in the current prototype that
we would like to improve, such as speeding up the branch queries or optimizing
the tree structure that stores the XPath query before evaluating. Next, we want
to design an optimal structure for saving the iXUPT index to a hard drive.
Furthermore, we would like to provide support for graph-oriented XML files
(not only trees) which means to replace the interval-based path testing with a
more general structure (such as Rho-index [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ]). Finally, we aspire to use our
indexing method in the environment of distributed XML processing.
6
        </p>
      </sec>
      <sec id="sec-3-2">
        <title>Acknowledgments</title>
        <p>This research has been partly supported by Czech Science Foundation (GACˇ R)
project Nr. 201/09/0683, by institutional research plan number MSM0021620838,
and by Czech Science Foundation (GACˇ R), grant Nr. P202/10/0573.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1. Paul F. Dietz:
          <article-title>Maintaining a Order in a linked list</article-title>
          .
          <source>Proceedings of the 14th Annual ACM Symposium on Theory of Computing</source>
          . San Francisco, California (
          <year>1982</year>
          )
          <fpage>122</fpage>
          -
          <lpage>127</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>R.</given-names>
            <surname>Goldman</surname>
          </string-name>
          , J. Widom:
          <article-title>DataGuides: Enable query formulation and optimization in semistructured databases</article-title>
          .
          <source>Proceedings of 23rd International Conference on Very Large Data Bases</source>
          . Athens, Greece (
          <year>1997</year>
          )
          <fpage>436</fpage>
          -
          <lpage>445</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3. M. Kr´atky´, R. Baˇca, V.
          <source>Sn´aˇsel: On the Efficient Processing Regular Path Expressions of an Enormous Volumne of XML Data. Lecture Notes in Computer Science</source>
          , Vol.
          <volume>4653</volume>
          . Springer-Verlag, Germany (
          <year>2007</year>
          )
          <fpage>1</fpage>
          -
          <lpage>12</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>J.M. Bremer</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <article-title>Gertz: An Efficient XML Node Identification</article-title>
          and
          <string-name>
            <given-names>Indexing</given-names>
            <surname>Scheme</surname>
          </string-name>
          .
          <source>Technical Report CSE-2003-04</source>
          . Dept. of Computer Science, University of California at Davis, (
          <year>2003</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>G.</given-names>
            <surname>Marks</surname>
          </string-name>
          , M.
          <source>Roantree: Pattern Based Processing of XPath Queries. IDEAS 2008 - International Symposium on Database Engineering and Applications</source>
          . Coim bra,
          <source>Portugal</source>
          (
          <year>2008</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>S.</given-names>
            <surname>Trißl</surname>
          </string-name>
          ,
          <string-name>
            <surname>U.</surname>
          </string-name>
          <article-title>Leser: Fast and Practical Indexing and Querying of Very Large Graphs</article-title>
          .
          <source>Proceedings of the 2007 ACM SIGMOD international conference on Management of data. Beijing</source>
          , China (
          <year>2007</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>B.f.</given-names>
            <surname>Cooper</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Sample</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.J.</given-names>
            <surname>Franklin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.R.</given-names>
            <surname>Hjaltason</surname>
          </string-name>
          ,
          <string-name>
            <surname>M.</surname>
          </string-name>
          <article-title>Shadmon: A Fast Index for Semistructured Data</article-title>
          .
          <source>Proceedings of the 27th International Conference on Very Large Data Bases</source>
          . San Francisco, USA (
          <year>2001</year>
          )
          <fpage>341</fpage>
          -
          <lpage>350</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>Torsten</given-names>
            <surname>Grust</surname>
          </string-name>
          :
          <article-title>Accelerating XPath Location Steps</article-title>
          .
          <source>Proceedings of the 2002 ACM SIGMOD international conference on Management of data</source>
          . New York, USA (
          <year>2002</year>
          )
          <fpage>109</fpage>
          -
          <lpage>120</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>D.</given-names>
            <surname>Barashev</surname>
          </string-name>
          ,
          <string-name>
            <surname>B.</surname>
          </string-name>
          <article-title>Novikov: Indexing XML to Support Path Expressions</article-title>
          .
          <source>Proc. of the 6th East European Conf. on Advances in Databases and Information Systems (ADBIS 2002)</source>
          , Vol.
          <volume>2</volume>
          :
          <string-name>
            <given-names>Research</given-names>
            <surname>Communications</surname>
          </string-name>
          . Bratislava, Slovakia (
          <year>2002</year>
          )
          <fpage>1</fpage>
          -
          <lpage>10</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10. M. Kr´atky´, J. Pokorny´,
          <string-name>
            <surname>V.</surname>
          </string-name>
          <article-title>Sn´aˇsel: Indexing XML Data with UB-trees</article-title>
          .
          <source>Proc. of the 6th East European Conf. on Advances in Databases and Information Systems (ADBIS 2002)</source>
          , Vol.
          <volume>2</volume>
          :
          <string-name>
            <given-names>Research</given-names>
            <surname>Communications</surname>
          </string-name>
          . Bratislava, Slovakia (
          <year>2002</year>
          )
          <fpage>155</fpage>
          -
          <lpage>164</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>Quanzhong</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <surname>B.</surname>
          </string-name>
          <article-title>Moon: Indexing and Querying XML Data for Regular Path Expressions</article-title>
          .
          <source>Proceedings of the 27th VLDB Conference</source>
          . Roma, Italy (
          <year>2001</year>
          )
          <fpage>361</fpage>
          -
          <lpage>370</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Donald</surname>
          </string-name>
          <article-title>Knuth: The Art of Computer Programming</article-title>
          . Volume III,
          <article-title>Sorting and Searching, Third Edition</article-title>
          . Addison Wesley, Reading, MA (
          <year>1998</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13. Wolfgang Meier:
          <source>eXist: An Open Source Native XML Database. Lecture Notes in Computer Science</source>
          , Vol.
          <volume>2593</volume>
          /
          <year>2009</year>
          . Springer Berlin, Heidelberg (
          <year>2003</year>
          )
          <fpage>169</fpage>
          -
          <lpage>183</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <given-names>A.</given-names>
            <surname>Schmidt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Waas</surname>
          </string-name>
          , I. Manolescu,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kersten</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Busse</surname>
          </string-name>
          ,
          <string-name>
            <surname>M.J. Carey:</surname>
          </string-name>
          <article-title>XMark: A Benchmark for XML Data Management</article-title>
          .
          <source>Proceedings of the 28th VLDB Conference. Hong Kong</source>
          ,
          <string-name>
            <surname>China</surname>
          </string-name>
          (
          <year>2002</year>
          )
          <fpage>974</fpage>
          -
          <lpage>985</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15. S. Bartonˇ, P. Zezula:
          <article-title>Rho-index - An Index for Graph Structured Data</article-title>
          .
          <source>8th International Workshop of the DELOS Network of Excellence on Digital Libraries</source>
          . Schloss Dagstuhl,
          <string-name>
            <surname>Germany</surname>
          </string-name>
          (
          <year>2005</year>
          )
          <fpage>57</fpage>
          -
          <lpage>64</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>M. Yoshikawa</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          <string-name>
            <surname>Amagasa</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          <string-name>
            <surname>Shimura</surname>
            and
            <given-names>S.</given-names>
          </string-name>
          <article-title>Uemura: XRel: a Path-based Approach to Storage and Retrieval of XML Documents Using Relational Databases</article-title>
          .
          <source>ACM Transactions on Internet Technology (TOIT)</source>
          . New York, NY, USA (
          <year>2001</year>
          )
          <fpage>110</fpage>
          -
          <lpage>141</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <given-names>Zhiyuan</given-names>
            <surname>Chen</surname>
          </string-name>
          and
          <string-name>
            <given-names>G.J.</given-names>
            <surname>Korn</surname>
          </string-name>
          and
          <string-name>
            <given-names>F.</given-names>
            <surname>Koudas</surname>
          </string-name>
          and
          <string-name>
            <given-names>N.</given-names>
            <surname>Shanmugasundaram</surname>
          </string-name>
          and
          <string-name>
            <surname>J.D.</surname>
          </string-name>
          <article-title>Srivastava: Index Structures for Matching XML Twigs Using Relational Query Processors, Data &amp; Knowledge Engineering</article-title>
          . Amsterdam, The Netherlands (
          <year>2007</year>
          )
          <fpage>283</fpage>
          -
          <lpage>302</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Lu</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Ling</surname>
          </string-name>
          , T.W.:
          <article-title>On Boosting Holism in XML Twig Pattern Matching Using Structural Indexing Techniques</article-title>
          ,
          <source>Proceedings of the 2005 ACM SIGMOD international conference on Management of data</source>
          . New York, NY, USA (
          <year>2005</year>
          )
          <fpage>455</fpage>
          -
          <lpage>466</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>XML Path</surname>
          </string-name>
          <article-title>Language (XPath) 2.0</article-title>
          . http://www.w3.org/TR/xpath20.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <article-title>XQuery 1.0: An XML Query Language</article-title>
          . http://www.w3.org/TR/xquery.
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <article-title>eXistd-b Open Source Native XML Database</article-title>
          . http://exist.sourceforge .net.
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Albrecht</surname>
          </string-name>
          <article-title>Schmidt: xmlgen - The Benchmark Data Generator</article-title>
          . http://www.xmlbenchmark.org.
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <surname>Qizx</surname>
          </string-name>
          ,
          <article-title>a fast XML repository and search engine fully supporting XQuery</article-title>
          . http://www.xmlmind.com/qizx.
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          24.
          <string-name>
            <surname>Microsoft</surname>
            <given-names>SQL Server</given-names>
          </string-name>
          <year>2005</year>
          . http://www.microsoft.com/sqlserver/
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>