<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Efficient Implementation of XPath Processor on Multi-Core CPUs</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Martin Kruliš</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jakub Yaghob</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Software Engineering, Faculty of Mathematics and Physics</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2010</year>
      </pub-date>
      <fpage>60</fpage>
      <lpage>71</lpage>
      <abstract>
        <p>Current XPath processors use a direct approach to query evaluation, which is quite inefficient in some cases and usually implemented serially. This may be a problem when processing complex queries on large documents. We propose algorithms and XML indexing techniques which are more efficient and which can utilize standard parallel templates. Our implementation is highly scalable and outperforms common XML libraries.</p>
      </abstract>
      <kwd-group>
        <kwd>XML</kwd>
        <kwd>XPath</kwd>
        <kwd>parallel</kwd>
        <kwd>multi-threaded</kwd>
        <kwd>multi-core</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Most XML documents have the properties described above, so our
implementation will process them just fine. Documents that do not conform to our
assumptions will not be considered in this paper for the sake of scope.</p>
      <p>
        XPath specification [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] contains a lot of details, many of which are
uninteresting from our perspective. We will focus on processing location paths, since they
present the most challenging part of XPath. Common implementations [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ][
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]
take a direct approach, which can lead to inefficient algorithms. We demonstrate
the problem on an example. Consider the simple location path:
      </p>
      <p>/descendant::a/following::b[P]</p>
      <p>Let us assume that all elements b are in the ‘following’ relation with every element
a. The following::b step then yields the same set of nodes for every element
a produced by the previous step. Therefore, predicate P will be evaluated for each
node b O(Na) times, where Na denotes the total number of a elements.</p>
      <p>
        A straightforward implementation leads to an algorithm whose time
complexity depends exponentially on the level of predicate nesting.
Fortunately, this problem can be avoided by introducing vector operations and
min-context rules [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. When each predicate is processed for a whole vector of contexts
instead of by traditional recursive evaluation, the exponential blow-up is
avoided and the evaluation time remains polynomial in both the size of the document
and the query nesting depth.
      </p>
      <p>The paper is organized as follows. Section 2 reviews related work. Section 3
describes the XML representation and indexing techniques. Section 4 examines the
proposed algorithms in detail. Section 5 suggests possible exploitation of parallelism.
Section 6 presents experimental results and Section 7 concludes.</p>
    </sec>
    <sec id="sec-related">
      <title>Related Work</title>
      <p>
        Efficient implementation of XPath has been addressed in several papers. One
of the most significant works was presented by Gottlob et al. [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. It proposes
algorithms for processing XPath queries in polynomial time. XPath also relies
on efficient evaluation of location axes which was addressed in [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]. The topic of
XML processing parallelization is relatively new. First studies focused on XML
parsing [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ][
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] and processing in general. A successful approach is to employ
work stealing [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] in order to reduce load balancing problems.
      </p>
      <p>
        To the best of our knowledge, the only work directly related to parallel processing
of XPath queries was presented by IBM [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. The paper proposes data, query, and
hybrid partitioning strategies for dividing the workload among multiple threads. The
partitioned queries are then processed by Xalan. This technique can be improved
[
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] by introducing metrics for processing costs. These metrics determine the best
points for partitioning.
      </p>
      <p>
        We take a different approach. Our strategy is to design algorithms which
utilize the parallel templates and ideas presented by Intel TBB [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. These templates
implement solutions for common patterns in parallelism such as parallel-for or
parallel-scan algorithms.
      </p>
    </sec>
    <sec id="sec-xml-index">
      <title>XML Index</title>
      <p>An XPath processor expects the XML document to be loaded in main memory as an
abstract tree structure (for example DOM). This structure provides basic
operations for traversing the document, such as finding the children, parent, or attributes
of a given node. Even though these operations are sufficient to traverse the whole
tree and process an XPath query, they are quite inefficient for complex operations.</p>
      <p>When processing an XPath query, we are often required to retrieve all nodes
with a specific name or of a specific type that are in a relation (defined by the used axis)
with the initial node-set. This task can be accelerated significantly when a proper
index is built. We have used the following two indices in our implementation:
the Left-right-depth index and the element index.</p>
      <p>
        The Left-right-depth index (also denoted LRD) is often used by many XML
algorithms and data structures [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ][
        <xref ref-type="bibr" rid="ref16">16</xref>
        ][
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] to determine the relation of two nodes
in constant time. The index is based on tagging the nodes of the DOM structure with
three integers: the left, right, and depth values. We have tagged only element nodes
and attributes, as other nodes are irrelevant for XPath evaluation. We denote by l(v),
r(v), and d(v) the left-value, right-value, and depth of the node v, respectively.
      </p>
      <p>The element index is formed by a hash table where element names are the keys.
Each name holds a list of references to the DOM structure pointing to all
elements with this name, as depicted in Figure 1.</p>
      <p>a
b
x</p>
      <p>Links in the reference list are ordered by the position (i.e. the left-value) of
the nodes to which they point. This ordering is very useful, as shown later.</p>
    </sec>
    <sec id="sec-axes">
      <title>XPath Axes Implementation</title>
      <p>Let us have a look at each axis and how it can benefit from these indices.
In the following descriptions, we expect to have a context node c and a location step
axis::x where the axis will vary. In every case, we extract the reference list
from the element index using x as the key, so we have a list of all x elements at our disposal.</p>
      <p>Implementation of the descendant axis is quite simple. The index reference
list is ordered by the left-values of the nodes. Furthermore, we know that every
descendant of context element c has its left-value in the range (l(c), r(c)). Such nodes
are necessarily stored in a continuous subrange of the reference list. A binary-search
algorithm can be applied to find both the beginning and the end of the subrange.</p>
      <p>The following axis uses the same approach. If we have an element c, all
following nodes have their left-values greater than r(c). We simply find the first
one and then take all nodes from its position to the end of the list.</p>
      <p>The preceding axis is a little more complicated than the previous two. In
order to determine whether node u precedes context c, we need to compare r(u)
with l(c). Unfortunately, the reference list is not ordered by right-values. As the
right-value is always greater than or equal to the left-value (of the same node), we take
all nodes with left-value less than l(c) as candidates. These candidates contain
two types of nodes: preceding nodes and ancestors of c. We can test the right-value of
each candidate and filter out the ancestors.</p>
      <p>We might create a second element index where the references are ordered by
right-values and implement the evaluation of the preceding axis analogously to the
following axis. However, we did not use it in our implementation.</p>
      <p>The ancestor axis cannot be effectively accelerated by this index. On the
other hand, we expect that XML documents are shallow, therefore the best way
to find all ancestors of context node is to follow parental links in XML tree
structure and filter nodes one by one.</p>
      <p>The techniques described for the descendant, following, and preceding axes can be
extended also to the child, following-sibling, and preceding-sibling relations. In order
to do that, an element index must also be built for every node with children. Such
an index contains only the child elements of the associated node. The implementation
would be analogous.</p>
    </sec>
    <sec id="sec-algorithms">
      <title>Algorithms</title>
      <p>Before we describe the optimization and implementation details of our XPath
processor, we first revise the recursive algorithm for location path evaluation. A location
path consists of a sequence of location steps. Each step takes the node-set produced
by the previous step (called the initial node-set) and generates another node-set which
is used by the successive step. The first step uses the context node as a singleton and
the last step produces the node-set which is also the result of the whole location path.</p>
      <p>The recursive algorithm evaluates location steps as follows. First, an
intermediate node-set Si is generated for every node vi in the initial set by applying
the location axis to vi and a node test which filters out nodes of a specific type or
name (in our case, we focus only on name filters). Each set Si is filtered by the
predicates, producing Si′ (Si′ ⊆ Si). Finally, all Si′ sets are united and the union is
yielded as the result of the location step.</p>
      <p>When a node-set is filtered by a predicate, the predicate is executed
recursively for every node in the set, using the node, its position, and the size of the
set as context values. The result of the predicate is converted into a Boolean value,
and if true, the tested node is included in the result. A location step may have more
than one predicate. In that case, the predicates are applied one by one, each
producing another node-set which is handed over to the successive predicate.</p>
      <p>As mentioned before, the recursive algorithm is not very efficient. Now, we
describe the most important optimizations used in our implementation.</p>
    </sec>
    <sec id="sec-min-context">
      <title>Minimal Context Optimizations</title>
      <p>First, we define context flags for each XPath expression (and subexpression):
• flag n for context node,
• flag p for position,
• and flag s for size.</p>
      <p>These flags indicate which parts of the evaluation context are required by the
expression. An expression E is tagged with a set of flags denoted cf(E) ⊆ {n, p, s}.
When a flag is present in the set cf(E), the corresponding part of the context is required
for the evaluation of E. Formally, we define cf as shown in Table 1.</p>
      <p>Table 1 defines cf for each expression type: arithmetic operators
(+, −, ∗, div, mod), comparisons (&lt;, ≤, &gt;, ≥, =, ≠),
logical operators (and, or), absolute location paths, relative location paths,
and location steps.</p>
      <p>Even though there are eight (2³) types of expressions depending on which of the
n, p, s flags they have, we will streamline the situation by recognizing only three
context classes:
• The context-free class represents expressions with an empty cf set. These expressions
are either constant (like literals) or contain absolute location paths.
• Expressions in the context-node class require only the context node, thus their
cf(E) = {n}. Typical representatives are relative location paths and location
steps.</p>
      <p>• Finally, we place all remaining expressions in full-context class.</p>
      <p>We call the context-free class the weakest, because the empty context flag set
is always a subset of any other. Analogously, the full-context class is
the strongest. An expression whose cf contains p or s is also considered
stronger than a context-node expression, as the context position and size are always
generated together with the nodes; hence the node flag is in fact implicitly included.</p>
      <p>When evaluating a location step, many nodes can be filtered by the predicates
multiple times if they appear in more than one intermediate (Si) set. This may
be unavoidable, since a node can be at different positions or the Si sets can have
different sizes, thus creating a different context for the predicate evaluation.
However, if there are no predicates which require the position or size context value,
the filtering procedure can be optimized.</p>
      <p>Formally, if all predicates of a location step are from the context-node or
context-free class, we may unite all Si sets before they are filtered by predicates.
The union is then filtered instead, so each predicate is executed for each node at most
once. Furthermore, if there are some predicates from the context-free class, we need
not evaluate them for every node, as it is sufficient to execute them just once.
If they resolve to true, they can be removed from the predicate list, since they do not
restrict the result set in any way. Otherwise (if they resolve to false), we need not
evaluate the location step nor the remaining part of the location path at all,
as it will never yield any nodes.</p>
    </sec>
    <sec id="sec-vectorization">
      <title>Vectorization of Axes</title>
      <p>The previous optimization can be used to improve the processing of axes in location steps.
When an axis is processed, a single node is taken as input and a set of all conforming
nodes is returned. If there are no full-context predicates in the location step, the
intermediate node-sets (Si) are united right after the axis part of the step (with
the name filter) is resolved. Knowing this, we can optimize axis processing so
that it does not retrieve an intermediate set for each node, but rather uses the entire initial
node-set as input and yields the union of the node-sets Si. In the following algorithms
we expect that a node-set is represented by an array of nodes where the nodes are
ordered by their left-values (i.e. in document order).</p>
      <p>The descendant axis utilizes the following observations: descendants of a single node
are stored in a continuous subrange of the element index (as described in Section 3), and
the descendant sets of two nodes in the ‘following’ relation have an empty intersection. Furthermore, if
we have initial nodes u and v where v is a descendant of u, the descendants of v are
also descendants of u, therefore we need to process only node u when retrieving
descendants. The algorithm works as follows.</p>
      <preformat>for i = 1 to |S| do
  if i = 1 or not (S[i] is a descendant of last) then
    last ← S[i]
    append descendants of S[i] to the result
  end if
end for</preformat>
      <p>The following and preceding axes can be accelerated greatly by this optimization.
First, we have to find the initial node with the lowest right-value (for the following
axis) or the greatest left-value (for the preceding axis). Then we use this node as
the input for the simple version of the axis resolution algorithm described in Section 3.
Finding the initial node is trivial in the case of the preceding axis, since the node
with the greatest left-value is always at the end of the set. In the case of the following
axis, the situation is slightly more complicated. The node with the smallest right-value
is either the first node in the initial set or one of its descendants:</p>
      <preformat>i ← 1
while i &lt; |S| and r(S[i]) &gt; r(S[i + 1]) do
  i ← i + 1
end while
return following nodes of S[i]</preformat>
      <p>
        Almost every other axis may be accelerated as well using a similar approach.
We omit the description of these algorithms for the sake of scope. A complete
overview can be found in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
    </sec>
    <sec id="sec-pred-caches">
      <title>Predicate Caches</title>
      <p>The previous optimizations reduce the necessary amount of work in many situations.
However, the recursive algorithm still suffers from exponential time complexity.
We avoid this problem by using caches for predicate expressions. Problematic
expressions (which are likely to be evaluated multiple times with the same
context) are covered by caches. If a cache covers an expression, all results of
the expression are stored in the cache (when first computed). When the expression
is evaluated, the result is first looked up in the cache; thus each expression is
executed for each context at most once.</p>
      <p>The cache reflects the context class of the covered expression (e.g. an expression
from the context-node class must be covered by a context-node cache). The class of
the cache defines which part of the context is used for indexing. Therefore, a
context-free cache stores a single value, a context-node cache is indexed by nodes,
and a full-context cache utilizes the context node, position, and size. A full-context cache
may consume up to O(N⁴) memory (where N is the size of the document); however,
we may limit the cache size and make it forget records. A forgetful cache will
not save us from exponential time complexity, but it still improves performance
significantly.</p>
      <p>The following rules are applied to caches:
• Literal values must not be covered by a cache directly.
• When an expression is covered by a cache, all its subexpressions from the
same or a stronger class are also considered to be covered. This rule is transitive
and does not apply to predicates (a predicate cannot be covered transitively).
All expressions from weaker classes must be covered by another cache of their
own.
• If a location step has a full-context predicate, all predicates of that step must
be covered by caches. Otherwise, no predicate of that step needs to be
covered, but these rules are applied recursively to nested predicates.
Furthermore, when a predicate must be covered by a cache, a cache with Boolean
values is always used, since the predicates are implicitly wrapped by the
boolean() function.</p>
      <p>
        The cache is implemented by concurrent hash-map from TBB library [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ].
This hash-map works as a hash table with concurrent access and fine-grained
locking. Locking is implemented by atomic operations and each position in the table
can be locked individually; therefore, collisions on locks are rare.
Unfortunately, the results indicate that locking on the caches has a significant impact on
efficiency.
      </p>
    </sec>
    <sec id="sec-parallelization">
      <title>Parallelization</title>
      <p>
        We have described the data structures and algorithms which should allow us to
exploit parallelism using standard templates. These templates (parallel-for,
parallel-reduce, parallel-scan, etc.) and data structures (concurrent vector, concurrent
hash-map, etc.) are described in literature [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. We omit their description for
the sake of scope.
      </p>
      <p>
        The implementation presents many opportunities to apply these templates
and concurrent data structures. We describe only the most interesting ones. More
detailed description is provided in related work [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
    </sec>
    <sec id="sec-pred-filtering">
      <title>Predicate Filtering</title>
      <p>We expect to have a list of predicates Pi (i = 1 … π) and a node-set S being
filtered. The filtering is processed as follows.</p>
      <p>• First, all context-free predicates are resolved. Predicates resolved as true are
immediately removed. When a single predicate resolves to false, the whole
filtering operation yields an empty node-set, since every node is rejected
by such a predicate.
• The remaining predicates are grouped into segments. A full-context predicate
must be positioned as the first predicate of its segment and there can be only one
full-context predicate in each segment. Other predicates (from the context-node
class) are grouped together to form segments as large as possible. For
example, the predicates nnfnfnn (where n denotes a context-node predicate and
f a full-context predicate) are grouped as (nn)(fn)(fnn).
• Node-set S is filtered by the first segment producing S′, then S′ is filtered by
the second segment and so on. The last segment produces the result of the whole
filtering operation.</p>
      <p>The node-set is filtered by a segment in two phases. In the first phase, all nodes
from the initial set are filtered by all predicates in the segment (the predicate order
is respected). When a predicate resolves to false, the corresponding node in the initial
set is replaced by a null value. In the second phase, the non-null values are copied
into the filtered set.</p>
      <p>The first phase utilizes the parallel-for pattern. All nodes from the set are
processed concurrently. Each task affects only its own nodes, so no explicit
synchronization is required. The second phase copies only the valid nodes and must respect the
ordering of the nodes. We use parallel-scan, where the first pass (the pre-scan) only
calculates new offsets for the nodes (after the nulls are removed) and the second pass
(the final scan) copies the nodes.</p>
    </sec>
    <sec id="sec-2">
      <title>Node-sets Merging</title>
      <p>
        Some situations of XPath processing require merging node-sets (namely location
steps with full-context predicates and the union operator). In both situations, a
concurrent hash-map is used. All nodes are inserted into the hash-map (which
ensures uniqueness of each node), then retrieved into an array and sorted. All
three steps are performed in a parallel fashion. Nodes are inserted into the
hash-map by a parallel-for. A parallel-scan then copies the nodes into an array,
where the first pass computes the offsets and the second pass copies the nodes. Finally, the
array is sorted by parallel sort [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ].
      </p>
      <p>In the case of the union operator, we employ another parallel-for which processes
the operands of the union, so we have a top-level loop which operates over the operands
(and the resulting node-sets) and nested loops which insert nodes into the hash-map.</p>
    </sec>
    <sec id="sec-3">
      <title>Processing Axes</title>
      <p>There are hardly any opportunities for parallelism when an axis is processed
for a single node. The only thing which can be done concurrently is copying
sequences of nodes (using parallel-for). However, even when a location step is
processed without optimizations, we may still process each node from the initial set
(i.e. resolve the axis and filter it by predicates) concurrently.</p>
      <p>Vectorized processing of axes presents more opportunities for parallelism. In
many cases (e.g. descendant axis, child axis, ...), node-sets produced by initial
nodes are disjoint. Therefore, we need not use concurrent hash-map (which
requires locking) to merge them. Instead, we use parallel-scan to assemble merged
sets in correct order. The parallel scan is used as usual – first pass computes
offsets for nodes and the second pass copies the nodes.
</p>
    </sec>
    <sec id="sec-tests">
      <title>Practical Tests</title>
      <p>The practical experiments were designed with two goals in mind. First, we would like to
determine the scalability of the solution and measure the speedup on multiple cores.
Second, we compare our solution with existing libraries for XPath processing.
The comparison is relevant only for our implementation restricted to a single core,
as both libxml and Xalan are single-threaded.</p>
    </sec>
    <sec id="sec-methodology">
      <title>Methodology, Testing Data, Hardware Specifications</title>
      <p>The performed tests focus solely on execution speed. We measure the time
required to evaluate a query using the system real-time clock. Other operations, such
as loading the XML data or processing the results, will not concern us. The real-time clock
better reflects the practical characteristics of the implementation and covers
both application and kernel time (thus including context switching, thread
synchronization, etc.).</p>
      <p>Each query is executed 10×. A raw average is computed as the arithmetic average
of all 10 times. Then, each measured time which is greater than the raw average
multiplied by 1.25 (i.e. with 25% tolerance) is excluded. The final time is computed
as the arithmetic average of the remaining values.</p>
      <p>It is still possible that all ten measured values are distorted by some long-lasting
activity running on the system at the same time as the tests. Therefore,
each test-set was repeated three times at different times of day. These three results
were closely compared and if one of the values was obviously tainted, the test
was repeated. A result is considered tainted if it deviates from the other two by
more than 25%.</p>
      <p>
        The document used for testing was generated by xmlgen, an XML document
generator developed under XMark [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] project. This document simulates an auction
website (a real e-commerce application) and contains over 3 million elements.
      </p>
      <p>
        Queries evaluated on the document are taken from XPathMark performance
tests [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. These queries are especially designed to determine the speed of the tested
XPath implementation. Queries that were not compatible with our subset of
XPath were omitted. Unfortunately, some of the results are not presented here
due to lack of space. Complete data and results may be found in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>All tests were performed on a Dell M905 server with four six-core AMD Opteron
8431 processors (i.e. 24 cores) clocked at 2.4 GHz. The server was equipped with 96 GB of RAM
organized as 4-node NUMA. RedHat Enterprise Linux (version 5.4) was used
as the operating system.</p>
    </sec>
    <sec id="sec-results">
      <title>Results</title>
      <p>Table 2 summarizes the times (in ms) measured for our implementation and the libxml and
Xalan libraries. Columns Ct show times required by our implementation on t
threads. The symbol ∞ represents times that were completely out of scale (greater
than 2 million ms).</p>
      <p>The results suggest that our implementation is highly scalable. The best
speedup was observed for query B6, which runs 18.8× faster on 24 cores than on a
single core. Unfortunately, some queries (A2, A3, C2, D2, and E4) have shown
poor speedup or even a slow-down. This is mostly caused by the structure of
these queries. When the intermediate results between location steps are too small,
the query cannot benefit from data parallelism very much. Almost all of these
queries (except for C2 and E4) are resolved very quickly on a single core, therefore
we need not parallelize them at all. Queries A2, A3, and D2 are even slower
on two cores than on a single core. This is caused by the locking overhead of the caches.
Finally, we have observed some fluctuations between times on 8, 16, and 24 cores
which are most likely caused by effects of memory access on NUMA systems and
thread distribution among real processors.</p>
      <p>
        In comparison with known XPath processors [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ][
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], our implementation
outperforms them in almost every test. The most significant difference is in
queries B5, B6, B9, B10, and E5 which all contain following or preceding axis.
However, queries C1 and C2 are resolved slightly slower in our implementation,
which is most likely caused by better optimizations of comparisons in libxml and
Xalan.
      </p>
    </sec>
    <sec id="sec-conclusions">
      <title>Conclusions</title>
      <p>This paper presents algorithms and indexing data structures for XPath
processing which can be easily adapted to standard parallelization templates. Our
experimental results demonstrate that the proposed algorithms scale very well for
long-lasting queries. These algorithms are beneficial also for sequential
processing, as they outperform the libxml and Xalan libraries in almost every test, even on a
single core.</p>
      <p>Our implementation uses some data structures which require locking. These
structures are most likely responsible for poor speedup of some tested queries.
Furthermore, our implementation analyzes neither the executed query nor the data to
determine whether a particular part of the query should be parallelized or not.
We use a simple greedy method to partition the workload and hope for the best.
In our future work, we will focus on removing locking completely and on designing
metrics for query evaluation cost.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>M.</given-names>
            <surname>Kruliš</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Yaghob</surname>
          </string-name>
          .
          <source>Algorithms for Parallel Searching in XML Datasets</source>
          ,
          <year>2009</year>
          . http://www.ksi.mff.cuni.cz/~krulis/?page=study/master_thesis
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>G.</given-names>
            <surname>Gottlob</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Koch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Pichler</surname>
          </string-name>
          .
          <article-title>Efficient algorithms for processing XPath queries</article-title>
          .
          <source>ACM Trans. Database Syst</source>
          .,
          <volume>30</volume>
          (
          <issue>2</issue>
          ):
          <fpage>444</fpage>
          -
          <lpage>491</lpage>
          .
          ACM
          , New York, NY, USA,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Lu</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Gannon</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          <article-title>Parallel XML Processing by Work Stealing</article-title>
          .
          <source>SOCP '07: Proceedings of the 2007 workshop on Service-oriented computing performance: aspects, issues, and approaches</source>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Bordawekar</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Lim</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Shmueli</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          <article-title>Parallelization of XPath queries using multi-core processors: challenges and experiences</article-title>
          ,
          <source>EDBT '09: Proceedings of the 12th International Conference on Extending Database Technology</source>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Bordawekar</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Lim</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Kementsietsidis</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Kok</surname>
            ,
            <given-names>B.W.L.</given-names>
          </string-name>
          <article-title>To Parallelize or Not to Parallelize: XPath Queries on Multi-core Systems</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Clark</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          and
          <string-name>
            <surname>DeRose</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          , et al.
          <source>XML Path Language (XPath) Version 1.0. W3C Recommendation</source>
          , 16 November
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <article-title>XML Document Object Model</article-title>
          . http://www.w3.org/DOM/
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <article-title>XMark - benchmark for various XML technologies</article-title>
          . http://www.xml-benchmark.org/
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <article-title>XPathMark - benchmark for XPath 1.0</article-title>
          . http://sole.dimi.uniud.it/~massimo.franceschet/xpathmark/
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <article-title>Libxml2 - The XML Library for GNOME</article-title>
          . http://xmlsoft.org/
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <article-title>Xalan for C++ - XSLT and XPath library developed by Apache</article-title>
          . http://xml.apache.org/xalan-c/
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <given-names>J.</given-names>
            <surname>Reinders</surname>
          </string-name>
          .
          <article-title>Intel threading building blocks</article-title>
          . O'Reilly &amp; Associates, Inc., Sebastopol, CA, USA,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <article-title>Intel Threading Building Blocks - an open source library for parallel programming</article-title>
          . http://www.threadingbuildingblocks.org/.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Pan</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Lu</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Chiu</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          <article-title>A static load-balancing scheme for parallel XML parsing on multicore CPUs</article-title>
          .
          <source>IEEE International Symposium on Cluster Computing and the Grid</source>
          , Rio de Janeiro,
          <year>2007</year>
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Wu</surname>
            ,
            <given-names>Yu</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>Qi</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yu</surname>
            ,
            <given-names>Zhiqiang</given-names>
          </string-name>
          and
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>Jianhui</given-names>
          </string-name>
          .
          <article-title>A Hybrid Parallel Processing for XML Parsing and Schema Validation</article-title>
          . Presented at Balisage:
          <source>The Markup Conference</source>
          <year>2008</year>
          , Montréal, Canada, August 12-15,
          <year>2008</year>
          .
          <source>In Proceedings of Balisage: The Markup Conference 2008. Balisage Series on Markup Technologies</source>
          , vol.
          <volume>1</volume>
          (
          <year>2008</year>
          ). doi:10.4242/BalisageVol1.Wu01.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <given-names>T.</given-names>
            <surname>Grust</surname>
          </string-name>
          .
          <article-title>Accelerating XPath location steps</article-title>
          . pages
          <fpage>109</fpage>-<lpage>120</lpage>
          . ACM, New York, NY, USA,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <given-names>M.</given-names>
            <surname>Kratky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Pokorny</surname>
          </string-name>
          , and
          <string-name>
            <given-names>V.</given-names>
            <surname>Snasel</surname>
          </string-name>
          .
          <article-title>Implementation of XPath axes in the multi-dimensional approach to indexing XML data</article-title>
          .
          <source>Current Trends in Database Technology - EDBT 2004 Workshops: EDBT 2004 Workshops, PhD, DataX, PIM, P2P&amp;DB, and ClustWeb, Heraklion, Crete, Greece, March 14-18, 2004: Revised Selected Papers</source>
          , page 219. Springer,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <given-names>I.</given-names>
            <surname>Mlýnková</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Toman</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Pokorný</surname>
          </string-name>
          .
          <article-title>Statistical Analysis of Real XML Data Collections</article-title>
          .
          <source>In COMAD06: Proc. of the 13th Int. Conf. on Management of Data</source>
          , pages
          <fpage>20</fpage>-<lpage>31</lpage>
          , New Delhi, India,
          <year>2006</year>
          . Tata McGraw-Hill Publishing Company Limited.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>