<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>AA Mmoodduullaarr XXQQuueerryy iImmpplleemmeennttaattiioonn</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jan Vrany´</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jan Zˇa´k Jan Vrany´</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jan Zˇa´k</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science and Engineering, FEE, Czech Technical University Department of Comp</institution>
        </aff>
      </contrib-group>
      <fpage>47</fpage>
      <lpage>54</lpage>
      <abstract>
        <p>CellStore/XQuery is modular a implementation of the XML Query Language interpreter. Primary goal of the implementation is to provide open experimental platform for implementing and testing new query optimization techniques for querying XML data. This paper describes architecture and design features of the implementation as well as possible approaches to extending the interpreter.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
    </sec>
    <sec id="sec-2">
      <title>XQuery interpreter architecture</title>
      <p>Because our implementation is primarily intended as an experimental
framework for testing a new approaches in XML data querying, it consists of several,
well-defined and well-separated components. At figure 1 there is an architecture
overview. In our implementation, each component is represented by one class
and all other components communicate just through calling methods of this
class. In many cases, implementation of component’s functionality cannot be
done within one class. In such cases, there is only one class which act as a facade
for the rest. These facades together with true polymorphism makes changing of
the implementations very easy.</p>
      <p>When a query is to be executed, it’s sent to a parser, which converts the
query from it’s textual representation to internal form. Then it the query in the
internal form returned to the interpreter. The interpreter will then process the
query. During query processing, various data sources might (or might not) be
accessed (using the fn:doc or fn:collection functions). In such case, a
document provider component is asked for particular document. For data accessing
and navigating through a document, a data access component is used. There are
one data access component instance per document. After processing the query,
result is left in interpreter’s data context.</p>
      <p>Now we will describe each component in little bit more detail.</p>
      <sec id="sec-2-1">
        <title>Parsing XML query</title>
        <p>
          The XQuery parser is implemented using SmaCC tool [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ], which is able to
generate LR(1) or LALR(1) parser from a grammar in the EBNF form. The grammar
was taken from W3C specification. Because XQuery syntax is pretty
complicated, some of its syntactic constructions cannot be easily handled by
SmaCCgenerated parser. This leads to some limitations of the parser which will be
described in little bit more detail in the section 3.2.
2.2
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>Internal query representation</title>
        <p>A parsed query is internally represented as a parse tree. For example the parse
tree for the query at figure 2 is depicted at figure 3.</p>
        <p>let $ a := 1
return $ a + 1</p>
        <p>XQuery specifications defines so-called XQuery Core, which is a subset of
XQuery. Every XQuery expression could be converted to XQuery Core
expression, which has same semantics as the original query. Our implementation doesn’t
perform such conversion. A query is represented in the same form as it was
written by user.
2.3</p>
      </sec>
      <sec id="sec-2-3">
        <title>Accessing data sources</title>
        <p>XQuery allows user to access more than one XML document per query using
fn:doc and fn:collection built-in functions. These two functions loads
document with given uri at current context1, so the user can access and query data
in th document. These functions could be used as many times as user wish.</p>
        <p>Whenever a document is to be loaded into the current context, a document
provider is asked to return the document in form in which it could be used by
interpreter for further processing.
2.4</p>
      </sec>
      <sec id="sec-2-4">
        <title>Accessing and navigating through XML data</title>
        <p>Once a XML document is loaded in interpreter’s context, data in the document
are (not necessarily) accessed. Because we want to make our implementation as
general as possible, we should have the possibility of different data source type
and data models in mind. For example, documents could be stored as DOM
nodes, as DOM-like nodes, as XML infoset or in data structures as they are
used by various XPath accelerators.</p>
        <p>It’s important for any practical XQuery interpreter to access data stored in
various data models, because user could access data in a XML database together
with data stored on disk in one query.</p>
        <p>To solve this problem, we developed a concept of document adaptor. It
provides methods for data accessing (xpathLocalNameOf:, xpathValueOf: for
example) and for navigating through the document structure in XPath axis manner
(xpathChildOf:, xpathParentOf: for example). Those methods get so-called
nodeid, which identifies a node within a document, and return a set of node-ids or
value represented as a string.</p>
        <p>Full list of methods provided by the adaptor is in table 1. It’s obvious
that presented set of methods is not the minimal one. For example, method
xpathAncestorOf: could be implemented using xpathParentOf:. It is also
obvious, that such implementations might not be the most efficient ones for given
storage model. Our implementation defines several so-called primitive methods
that must be supported by the adaptor and every other method could be
assembled as applications of primitive ones. However, user can override any of such
methods and use the most efficient algorithm for given storage model.</p>
        <p>
          Node-id could be any kind of object, it has no meaning for the rest of the
system. It’s used just for identifying XML nodes. For example in our
implementation node-ids for DOM document tree adaptor are DOM nodes itself, whereas
1 In fact, these function load just the root node. The rest of the document is loaded
lazily.
xpathDocument
xpathIsAttribute: node-id
xpathIsDocument: node-id
xpathIsElement: node-id
xpathIsText: node-id
xpathLocalNameOf: node-id
xpathNameOf: node-id
xpathNameSpaceOf: node-id
xpathValueOf: node-id
xpathAncestorOf: node-id
xpathAncestorOrSelfOf: node-id
xpathAttributeOf: node-id
xpathChildOf: node-id
xpathDescendantOf: node-id
xpathDescendantOrSelfOf: node-id
xpathFollowingOf: node-id
xpathFollowingSiblingOf: node-id
xpathPrecedingOf: node-id
xpathPrecedingSiblingOf: node-id
xpathParentOf: node-id
The XQuery interpreter is implemented as interpreter design pattern [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]. It
traverses the parse tree and evaluates nodes. In order to a evaluate node, all its
subnodes have to be evaluated. A node is evaluated within a context (which
stores a data set, variable bindings and some auxiliary informations like focus)
and as result of evaluation new context with new data is created and stored
in the interpreter. There is one method per one parse tree node type in the
Interpreter class. This has significant impact on implementation flexibility –
see section 4.
        </p>
        <p>An outline of method which interprets additive node at figure 3 is at figure 4.
In fact, the real method is little bit more complicated, because XQuery defines
process called atomization, which should be applied to every operand before
addition is performed. Context handling is also a little bit tricky and should be
cleaned-up.
3</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Limitations</title>
      <p>Our implementation has also some limitations, which will be described in this
section.
visitAdditiveNode( additiveNode ) {
var myContext, leftNodeContext, rightNodeContext;
/* we have to save current context because it’s overriden by sub-node */
myContext = this.context;
/* evaluate left node */
this.evaluate( additiveNode.getLeftNode() );
/* save its context */
leftNodeContext = this.context;
/* restore original context before right node’s evaluation */
this.context = myContext;
/* evaluate right node */
this.evaluate( additiveNode.getRightNode() );
/* save its context */
rightNodeContext = this.context;
/* store result context */
this.context = new Context( leftNodeContext.getData()</p>
      <p>
        + rightNodeContext.getData() )
}
XQuery language contains set of language constructs for typing and validating
expressions against W3C XML Schema [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Our implementation does not support
types. Unsupported constructs are not included in the grammar and thus using
such constructs will result in a parse error.
      </p>
      <p>Although no typing is supported in the query itself, our implementation
internally uses six data types:
– node
– xs:boolean
– xs:number
– xs:string
– xs:NCName
– xs:QName</p>
      <p>This is because XQuery contains constructs for arithmetic, conditional
(ifthen-else), logical and comparison expressions, XPath expressions with
predicates and function calls.
3.2</p>
      <sec id="sec-3-1">
        <title>Parser limitations</title>
        <p>XML query language syntax contains some problematic parts, which cannot be
handled by SmaCC-generated parsers.</p>
        <p>First sort of problems is related to fact, that keywords (if, div, return,
etc.) are not reserved words. That means, that whether some token is treated
element element {}
as keyword token or NCName token depends on parsing context. Consider the
query at figure 5.</p>
        <p>Although it’s perfectly legal according to the XQuery specification, our parser
(generated by SmaCC) will raise a parse error because after “element” keyword
(first “element”) an QName is expected.</p>
        <p>Another sort of problems is related to direct constructors. Direct
constructors allows user to construct new nodes during the query processing using
standard XML syntax. In fact, this means that every well-formed XML document
is also valid XQuery expression and thus XQuery parser should parse any XML
document. XML grammar itself is too complex to be handled by SmaCC. Our
implementation currently supports direct constructors in a very limited way.
4</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Extending the intepreter</title>
      <p>In this section we will outline how to extend the XQueryinterpreter and add and
test new optimization techniques. Basically, there are two levels at which our
implementation could be extended and/or improved.</p>
      <p>First one is the level of data access layer. To add a new type of data source,
a new document adaptor has to be implemented. A complete set of tests is
provided, so it’s easy to find out whether new adaptor fulfils interpreters
requirements. Because document adaptor implements axis-based navigation, any
method that improves axis accesses can be used.</p>
      <p>However, there are also several approaches that speeds-up not only per-axis
access, but whole XPath expressions. Such techniques can be easily implemented
and tested as well. Because whole query is represented as a parse tree, embedded
XPath expression is represented as one parse tree node (whichh contains nodes
for every part of expression) and because there is one method for each parse tree
node type, one can simply override method responsible for evaluating XPath
expression node and use any kind of optimization method. One can use either
structural indices,value-based indices or both, if applicable. It’s also possible to
speed-up just some parts of XQuery expressions (for example, just parts without
predicates). In fact, one can optimize any XQuery construct by this way.
5</p>
    </sec>
    <sec id="sec-5">
      <title>Conclusion and future work</title>
      <p>A prototype of the XQuery interpreter has been implemented. This
implementation is currently very limited, but it supports all basic constructs: FLWOR
expressions, full XPath expressions with predicates, conditionals, arithmetic,
logical, comparison and quantified expressions, XML nodes construction and
function calls (both built-in and user-defined ones). It can be seen as a platform
for further experiments with XML query optimization techniques using both
indices (value-based and structural ones) and special techniques like structural
joins.</p>
      <p>Further development will be focused on the following topics:
– improvement of the XQuery parser
– implementing full set of built-in functions
– simplifying the context machinery in the interpreter
– improving document adaptor for CellStore/XML database by using some</p>
      <p>
        XML-specific optimization techniques
– creating interface to the XQuery Test Suite [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>S.</given-names>
            <surname>Boag</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Chamberlin</surname>
          </string-name>
          , M. Fern´andez,
          <string-name>
            <given-names>D.</given-names>
            <surname>Florescu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Robie</surname>
          </string-name>
          , and J.
          <source>Sim´eon. XQuery 1</source>
          .0:
          <string-name>
            <surname>An</surname>
            <given-names>XML</given-names>
          </string-name>
          <string-name>
            <surname>Query</surname>
          </string-name>
          <article-title>Language</article-title>
          .
          <source>W3C, 1st edition</source>
          ,
          <year>2006</year>
          . http://www.w3.org/TR/xquery/.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>J.</given-names>
            <surname>Clark</surname>
          </string-name>
          and
          <string-name>
            <given-names>S.</given-names>
            <surname>DeRose. XML Path</surname>
          </string-name>
          <article-title>Language (XPath) Version 1</article-title>
          .0.
          <issue>W3C</issue>
          ,
          <year>1999</year>
          . http://www.w3.org/TR/xpath.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>K. B. Sherman R. Alpert</surname>
            and
            <given-names>B.</given-names>
          </string-name>
          <string-name>
            <surname>Woolf</surname>
          </string-name>
          .
          <article-title>The Design Patterns Smalltalk Companion</article-title>
          .
          <source>Addison Wesley, 1st edition</source>
          ,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>C. M. Sperberg-McQueen</surname>
            and
            <given-names>H.</given-names>
          </string-name>
          <string-name>
            <surname>Thompson</surname>
          </string-name>
          .
          <source>XML Schema 1.1. W3C</source>
          ,
          <year>2006</year>
          . http://www.w3.org/XML/Schema.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <article-title>5. The Refactory Inc. SmaCC compiler-compiler. The Refactory Inc</article-title>
          .,
          <source>1st edition</source>
          ,
          <year>2000</year>
          . http://www.refactory.com/Software/SmaCC.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>J.</given-names>
            <surname>Vrany</surname>
          </string-name>
          <article-title>´. Cellstore - the vision of pure object database</article-title>
          .
          <source>In Processings of DATESO</source>
          <year>2006</year>
          , pages
          <fpage>32</fpage>
          -
          <lpage>39</lpage>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>W3C XML Query Working Group. XML Query</surname>
          </string-name>
          <article-title>Test Suite</article-title>
          .
          <source>W3C, 1st edition</source>
          ,
          <year>2006</year>
          . http://www.w3.org/XML/Query/test-suite/.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>