<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Interpreting XPath by Iterative Pattern Matching with Paisley</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Baltasar Trancon y Widemann</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Markus Lepper</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>&lt;semantics/&gt; GmbH</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Ilmenau University of Technology</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>The Paisley architecture is a light-weight EDSL for non-deterministic pattern matching. It automates the querying of arbitrary objectoriented data models in a general-purpose programming language, using API, libraries and simple programming patterns in a portable and noninvasive way. The core of Paisley has been applied to real-world applications. Here we discuss the extension of Paisley by pattern iteration, which adds a Kleene algebra of pattern function composition to the unrestricted use of the imperative host language, thus forming a hybrid object-oriented{functional{logic framework. We subject it to a classical practical problem and established benchmarks: the node-set fragment of the XPath language for querying W3C XML document object models.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        The Paisley embedded domain-speci c language (EDSL) and library adds the
more declarative style of pattern matching to the object-oriented context of
Java programming [
        <xref ref-type="bibr" rid="ref10 ref9">9,10</xref>
        ]. Paisley o ers a combination of features that is, by
our knowledge, not o ered by any other existing pattern matching solution for a
main-stream language: it is strictly typed, fully compositional, non-invasive and
supports non-determinism, full rei cation and persistence.
      </p>
      <p>
        The non-deterministic aspects of Paisley have been demonstrated to provide
a fairly powerful basis for embedded logic programming in the object-oriented
host environment. In [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] we have discussed how to solve cryptarithmetic puzzles
with the backtracking facilities of Paisley, and how to obtain non-trivial e cient
search plans by object-oriented meta-programming with Paisley pattern objects.
      </p>
      <p>
        Here we describe and evaluate recent extensions to Paisley that greatly
increase its capabilities as an embedded functional-logic programming system,
supporting more complex control and data ow than the combinatorial
generate&amp;test tactics of [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. In particular, the novel contributions are:
1. in the core layer, a calculus of functions on patterns and its Kleene algebra,
thus providing regular expressions of nested patterns;
2. in the application layer, a library extension that demonstrates their use by
matching W3C XML Document Object Models with XPath [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] expressions;
3. a practical case study that evaluates the approach in comparison with
standard on-board facilities of the host language Java.
void collect (int variant, Node node, CollectionhElementi results,
      </p>
      <p>boolean sideways) f
if (node instanceof Element) f</p>
      <p>Element elem = (Element)node;
if (elem.getTagName( ).equals(variant == 1 ? ”a” : ”p”) ) f
if (variant &gt; 1) f</p>
      <p>CollectionhElementi sub = variant == 2 ? results : new ArrayListh i( );
collect(1, elem, sub, false);
g
if (variant == 1 &amp;&amp; !elem.getAttribute(”href”).equals(””) jj</p>
      <p>variant == 3 &amp;&amp; !sub.isEmpty( ) )
results.add(elem);
g</p>
      <p>g
g
if (node.getFirstChild( ) != null)</p>
      <p>collect(variant, node.getFirstChild( ), results, true);
if (sideways &amp;&amp; node.getNextSibling( ) != null)
collect(variant, node.getNextSibling( ), results, true);
// solution
// depth
// breadth
The basic motivation for a pattern matching EDSL is that patterns as language
constructs make data-gathering code more declarative, and thus more readable,
more writable and more maintainable. If done properly, this additional
expressive power can be used to factor code compositionally in ways that are not
straightforwardly possible in a conventional object-oriented approach, where the
logic and data ow of a complex query often appear as cross-cutting concerns.</p>
      <p>As an example that highlights the issues of logical compositionality, consider
a family of three related XPath expressions, depicted in Fig. 1, that select nodes
from XHTML documents. The rst selects, from the context node and its
descendants, all &lt;a&gt; elements (hyperlink anchors) that have a href (target) attribute
speci ed. The second selects only those &lt;a&gt; elements with href attributes that
are nested within &lt;p&gt; elements (paragraphs). The third selects all &lt;p&gt; elements
that have some &lt;a&gt; element with href attribute nested within.</p>
      <p>The task is to implement the speci ed search procedures, as object-oriented
queries of the standard W3C X(HT)ML Document Object Model. Evidently,
one wishes to implement most of the operational code generically, with reuse
and minimal adaptation for each of the three variants. Additionally, the second
and third variant should be able to use the rst recursively. To emphasize the
role of reuse in this example, Fig. 1 gives a uni ed implementation, where the
identi er variant is grayed out to emphasize that all its occurrences serve only the
static choice between the variants. The common algorithm is to traverse nodes
by depth- rst search, and add all valid matches, possibly repeatedly, to a results
collection supplied by the caller. Two further improvements of the solution are
suggested as exercises to the reader:
1. Of course, code that works for these three, but no other XPath expressions
is hardly generic. Abstract to a large, preferrably unbounded, set of useful
XPath expressions. Refactor the code to separate commonalities and
individual degrees of freedom, as cleanly as possible.
2. The use of a Collection for matches implies eager evaluation. This is not
e cient for the recursive calls from variant 3 to variant 1, where searching
can be aborted after the rst successful match (which falsi es sub.isEmpty( )).</p>
      <p>Refactor the code to allow for lazy evaluation, as transparently as possible.3
See section 5 and Fig. 4 below for our proposed solution. We do not claim that
these refactoring steps are infeasible in any particular previous framework. But
we shall demonstrate that the Paisley approach naturally gives both pragmatic
design guidelines, and a concrete notation for the adequate, abstract, modular
and e cient expression of the underlying algorithm and its instantiations.
3</p>
    </sec>
    <sec id="sec-2">
      <title>Basic Matching with Paisley</title>
      <p>
        The design principles, semantics and APIs of Paisley patterns have been discussed
in detail in [
        <xref ref-type="bibr" rid="ref10 ref11 ref9">9,10,11</xref>
        ]. The EDSL is extremely lightweight: It requires no language
or compiler extension; API calls and programming idioms are su cient for its
full expressive power. The core API is summarized in Fig. 2.
      </p>
      <p>The root of the pattern hierarchy is the abstract base class PatternhAi of
patterns that can process objects of some arbitrary type A. A pattern PatternhAi p
is applied to some target data x of type A by calling p.match(x), returning a
boolean value indicating whether the match was successful.</p>
      <p>Complex patterns are composed from application-layer building blocks that
implement classi cations and projections, the rei cation of instance tests and
getter methods, respectively. These can be de ned independently for arbitrary
(public interfaces of) data models, requiring no intrusion into legacy code. Paisley
comes with prede ned bindings for common Java data models, such as the
Collection framework; more can be added by the user. Each projection x from
type A to B induces by contravariant lifting a construction of type PatternhAi from
PatternhBi, conveniently implemented as a method PatternhAi getX(PatternhBi p).</p>
      <p>
        Information is extracted from a successfully matched pattern via embedded
pattern variables. A pattern VariablehAi v matches any value A x, and stores a
reference to x that can be retrieved by invoking v.getValue( ). Variables behave like
imperative rather than logical variables; subsequent matches merely overwrite
the stored value, no uni cation is implied.4
3 This task is inherently much more di cult in conventional programming languages
than in logic and lazy functional approaches; cf. [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
4 This design choice enables the use of Paisley for general, object-oriented data models,
where stable notions of equality, let alone an induction principle, cannot be taken
for granted. See section 4 for the handy implications of the imperative perspective.
g
g
abstract class PatternhAi f
public abstract boolean match(A target);
public boolean matchAgain( );
public static hAi PatternhAi
      </p>
      <p>both(Patternh? super Ai first, Patternh? super Ai second);
public static hAi PatternhAi</p>
      <p>either(Patternh? super Ai first, Patternh? super Ai second);
public PatternhAi someMatch( );
class VariablehAi extends PatternhAi f
public A getValue( );
public hBi ListhAi eagerBindings(Patternh? super Bi root, B target);
public hBi IterablehAi lazyBindings(Patternh? super Bi root, B target);
public hBi PatternhBi bind(Patternh? super Bi root, Patternh? super Ai sub);
public PatternhAi star(Patternh? super Ai root);
public PatternhAi plus(Patternh? super Ai root);</p>
      <p>Patterns can be composed conjunctively and disjunctively by the binary
combinators both and either, or their n-ary variants all and some (not shown),
respectively. As in traditional logic programming, disjunction is realized as
backtracking: After each successful match for a pattern p, p.matchAgain( ) can be invoked
to backtrack and nd another solution. Solutions can be exhausted, in an
imperative style of encapsulated search, by the following programming idiom:
if (p.match(x) ) do</p>
      <p>doSomething( );
while (p.matchAgain( ) );
When not using an exhaustive loop, the computation of alternative solutions can
be deferred inde nitely; the required state for choice points is stored residually
in the instances of the logical combinators themselves.</p>
      <p>The search construct can be abbreviated further to a functional style if the
desired result of each match is the value stored in a single pattern variable v.
Invoking v.eagerBindings(p, x) or v.lazyBindings(p, x) returns objects that give the
value of v for all successive matches, collected beforehands in an implicit loop,
or computed on iterator-driven demand, respectively. The latter can also deal
with in nite sequences of solutions.</p>
      <p>For cases where only the satis ability of a pattern p, but not the actual
sequence of solutions, is of concern, one can invoke p.someMatch( ), yielding a
wrapper that cuts all alternatives after the rst solution. Thus, laziness is
exploited for e ciency and, when used in conjunction with other patterns with
multiple relevant solutions, spurious multiplication of solution sets is avoided.</p>
    </sec>
    <sec id="sec-3">
      <title>Advanced Pattern Calculus</title>
      <sec id="sec-3-1">
        <title>Pattern Substitution and Iteration</title>
        <p>
          As discussed in [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ], Paisley patterns and their variables obey an algebraic
calculus where most of the expected mathematical laws hold, despite the low-level
imperative nature of their implementation.
        </p>
        <p>A variable v known to occur5 in a pattern p can be substituted by a
subpattern q, by invoking v.bind(p, q), mnemonically read as ( v: p)q. The
implementation delegates to p and q sequentially, and hence does not require access to the
internals of either operand.</p>
        <p>Substitution can also be given a recursive twist; a pattern p with a \hole"
v can be nested. The patterns v.star(p) and v.plus(p) correspond to the and
+-closures, respectively, of the search path relation between p and v, in the sense
that the usual recursive relations hold,
v.star(p)
v.plus(p)
either(v, v.plus(p) )
v.bind(p, v.star(p) )
which is already almost the complete, e ective implementation, up to lazy
duplication of v.plus(p) for each iteration level that is actually reached. Note that
iteration depth is conceptually unbounded, and solutions are explored in
depthrst pre-order, due to the way either is used. Thus patterns form a Kleene algebra
up to equivalence of solution sets but not sequences.</p>
        <p>Using these iteration operators, complex nondeterministic pattern
constructors can be de ned concisely. For instance, the contravariant lifting of the
multivalued, transitive descendant projection in XML document trees, applied to a
pattern p, becomes simply</p>
        <p>v.bind(v.plus(child(v) ), p)
where VariablehNodei v is fresh, and the multi-valued projection PatternhNodei
child(PatternhNodei ) is implemented in terms of the org.w3c.dom.Node getter
methods getFirstChild( ) and getNextSibling( ). Note how the two sources of
nondeterminism, regarding horizontal (child) and vertical (plus) position in the document
tree, respectively (cf. the distinct recursive calls in Fig. 1), combine completely
transparently.
4.2</p>
      </sec>
      <sec id="sec-3-2">
        <title>Pattern Function Abstraction</title>
        <p>Functions taking patterns to patterns have so far featured in two important
roles. In operational form, implemented as lifting methods, they are the basis
for contravariant representation of projection patterns. In algebraic form, as
bind-based abstractions from a variable, they enable pattern composition and
iteration. Being so ubiquitous in the Paisley approach, they deserve their own
5 The actual condition is de nite assignment rather than occurrence, for technical
reasons.
public static hAi MotifhA, Ai id( );
public hCi MotifhC, Bi on(MotifhC, ? super Ai other);
public static hAi MotifhA, Ai star(MotifhA, ? super Ai f);
public static hAi MotifhA, Ai plus(MotifhA, ? super Ai f);
public IterablehAi lazyBindings (B target);
public ListhAi eagerBindings(B target);
g
g
class VariablehAi extends PatternhAi f
// : : :
public hBi MotifhA, Bi lambda(PatternhBi root);
rei cation. Fig. 3 shows the relevant API. An instance of class MotifhA, Bi is
the rei ed form of a function taking patterns over A to patterns over B, or by
contravariance, the pattern representation of an access operation that takes data
of type B to data of type A, by its apply method.</p>
        <p>The Motif class provides the algebra of the category of patterns, namely the
identical motif id( ) and the type-safe composition of motifs by the on(Motif)
instance method. The variable-related operations star/plus and lazy-/eagerBindings
each have a point-free counterpart in Motif, which create anonymous variables to
hold intermediate results on the y. For instance, the XML descendant pattern
de ned above can be expressed less redundantly in point-free form as
plus(child).apply(p)
given a child access rei cation MotifhNode,Nodei child.</p>
        <p>Conversely, motif abstraction is de ned for pattern variables, which obeys
the beta reduction rule:
v.lambda(p).apply(q)
v.bind(p, q)</p>
        <p>There is often a trade-o in convenience between functions in operational
and rei ed form, that is as either pattern lifting methods or motifs. Since there
is no automatic conversion between methods and function objects in Java prior
to version 8, our strategy in the Paisley library is to provide both redundantly.
5</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Motivation Revisited</title>
      <p>Figure 4 repeats the XPath examples expressions, now contrasting the
domainspeci c XPath code with the general-purpose Java/Paisley code. The latter has
been underlined for easy reference. The solution makes full use of the shared
PatternhNodei dslash(String name, PatternhElementi: : : constraints) f</p>
      <p>return descendantOrSelf(element(both(tagName(name), all(constraints) ) ) );
VariablehElementi outerVar = new Variableh i ( );
PatternhNodei outerForm = dslash(”p”, outerVar);
VariablehElementi innerVar = new Variableh i ( );
PatternhNodei innerForm = dslash(”a”, innerVar, attr(”href”, neq(””) ) );
IterablehElementi collect1(Document root) f</p>
      <p>return innerVar.lazyBindings(innerForm, root);
g
IterablehElementi collect2(Document root) f</p>
      <p>return innerVar.lazyBindings(outerVar.bind(outerForm, innerForm), root);
g
IterablehElementi collect3(Document root) f</p>
      <p>return outerVar.lazyBindings(outerVar.bind(outerForm, innerForm.someMatch( ) ), root);
common structure of the three variants, and achieves complete separation of
concerns:
{ The search procedure implied by the operator // is not spread out as
recursive control ow, but rei ed and encapsulated as the descendantOrSelf pattern
lifting, prede ned in the XPathPatterns static factory class of Paisley, discussed
in detail in the following section.
{ XPath subexpressions are denoted concisely and orthogonally.</p>
      <p>The generic form .//tag. . . that reoccurs in all variants is abstracted as
the extensible pattern construction dslash.</p>
      <p>The particular inner and outer forms, .//a[@href] and .//p,
respectively, are de ned once and for all, independently of each other.
{ The semantics of XPath queries are e ected concisely and orthogonally.</p>
      <p>The general principle of encapsulated search is expressed naturally by a
call to lazyBindings. The third variant concisely prunes the recursively
encapsulated search after the rst solution, via the pattern modi er
someMatch( ).</p>
      <p>The particular con guration of subexpressions for all variants boils down
to a simple choice of the bound variable and pattern-algebraic
composition. Note that both are rei ed and hence rst-class citizens of Java.
Thus, each variant boils down to a one-line expression, where the points
of variability are ordinary subexpressions that can be chosen statically
(as shown) or dynamically, with arbitrarily complex meta-programming.
g
g
g
g
g
enum Axis f</p>
      <p>public Motifh? extends Node, Nodei getMotif( );
abstract class Predicate f
public abstract boolean accepts(NodeSet context, int position, Node candidate);
public Motifh? extends Node, Nodei apply(Motifh? extends Node, Nodei base);
class NodeSet f
private Collectionh? extends Nodei elems;
public NodeSet(Iterableh? extends Nodei elems);
public int size( );
public IterablehNodei filter(Predicate pred);
class Path f</p>
      <p>
        Path(Motifh? extends Node, Nodei motif);
public Motifh? extends Node, Nodei getMotif( );
public Path step(Axis axis, Test test, Predicate: : : predicates);
As a case study of real-world data models and queries, we have implemented
the navigational (proper path) fragment of the XPath 1.0 language as a Paisley
pattern combinator library. The complete implementation consists of the factory
classes XMLPatterns for generic DOM access and XPathPatterns for XPath speci cs,
with currently 433 and 282 lines of code, respectively. Given an XPath parser,
it can be extended to a full- edged interpreter, and hence a non-embedded DSL,
by a straightforward mapping of abstract syntax nodes to pattern operations.
The XPath 1.0 language comes with many datatypes and functions that do not
contribute directly to its main goal, namely the addressing of nodes in an XML
document. For simplicity, we restrict our treatment of the language to a fragment
streamlined for that purpose. Typical uses of XPath within XSLT or XQuery [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]
conform to this fragment. We conjecture that the missing features can be added
without worsening essential operational complexity and runtime performance.
We omit external variables and functions, and all operations on the datatypes
of strings, numbers and booleans, as well as intangible document nodes such as
comments and processing instructions. The supported sublanguage is re ected
one-to-one by API operations based in the class hierarchy depicted in Fig. 5.
(Operation signatures are given in Fig. 11 in the appendix.) Its focus is on
socalled path expressions of the general syntactic form
path ::= abs path j rel path
step ::= axis :: test ([ predicate ])
abs path ::= /rel path?
rel path ::= (rel path / )?step
      </p>
      <p>Here axis is one of the twelve XPath document axes, omitting the deprecated
namespace declaration axis. The nonterminal test is the tautological test node(),
the text node test text() or an explicit node name test. For predicate, used to
lter selected nodes, we accept not the full XPath expression language, but only
predicate ::= path j integer j not predicate j ( predicate (and j or) predicate )
where a path predicate holds if and only if it selects at least one node
(existential quanti cation). Positive and nonpositive integer predicates [ i] select the
i-th node from the candidate sequence, and the (n i)-th node, respectively,
from the sequence of n candidates in total. The former is a valid abbreviation
for [position()==i] in standard XPath; we add the latter as an analogous
abbreviation for [position()==last()-i]. Logical connectives on predicates are
de ned as usual.</p>
      <p>Various abbreviation rules apply; for instance, the shorthand .//a[@href]
expands to the verbose form self::node()/descendantOrSelf::node()/child
::a[attribute::href]. This translates to the following semantic object:
relative( ).step(Axis.self, node( ) )
.step(Axis.descendantOrSelf, node( ) )
.step(Axis.child, name(”a”),</p>
      <p>exists(relative( ).step(Axis.attribute, ”href”) ) )</p>
      <p>The relation between XPath predicates and candidate sequences,
misleadingly called \node-sets" in the standard, is rather idiosyncratic (they can not be
modeled adequately as plain sets) and the major challenge in this case study.
A node-set is implicitly endowed with either of two prede ned orders, namely
forward or reverse document order. These orders are loosely speci ed by a
preorder traversal of nodes, up to attribute permutation. A predicate lters the
nodes in a node-set not purely by point-wise application, but may depend on
some context, namely the position of the node in the node-set, starting from one,
and the total number of members. This information is available in XPath via the
\functions" position() and last(), respectively. It is realized in the API by
the parameter position of method Predicate.accepts and the method size( ) of class
NodeSet, respectively, to be supplied from the method filter of class NodeSet.
6.2</p>
      <sec id="sec-4-1">
        <title>Pattern-Based Interpreter Design</title>
        <p>The node-extracting semantics of the XPath language can be rendered naturally
in the Paisley framework by contravariant lifting. An XPath expression can be
ancestor
ancestorOrSelf
plus(parent)
star(parent)</p>
        <p>descendant
descendantOrSelf
plus(child)
star(child)
followingSibling
plus(nextSibling)
precedingSibling
plus(previousSibling)
following</p>
        <p>ancestorOrSelf.on(followingSibling).on(descendantOrSelf)
preceding
ancestorOrSelf.on(precedingSibling).on(descendantOrSelf)</p>
        <p>self id
applied to any node of an XML document, here implemented as DOM, and
extracts some other nodes, possibly of a more special type, such as elements or
attributes. This gives rise to a semantic type Motifh? extends Node, Nodei.</p>
        <p>Except for the context-sensitivity of predicates, all language constructs could
simply been given motif semantics and lumped together by composition.</p>
        <p>XPath axes are all de ned concisely in terms of motifs. Primitive DOM access
operations from factory class XMLPatterns directly de ne the attribute, child and
parent axes. Given the additional primitives next-/ previousSibling, which are only
implicit in the XPath standard, all other axes are de nable in terms of elementary
motif algebra; see Figure 6. Node tests are straightforward applications of test
pattern lifting.</p>
        <p>Atomic path expressions are realized by the document root access motif and
the identity motif, for absolute and relative paths, respectively. Composite path
expressions conceptually compose their constituents left to right. The exception
are predicates, which are implemented as motif transforms in order to deal with
context sensitivity. These are applied in order to the step basis, that is the
composition of axis and test, for local ltering, and the result is then composed
with the front of the path expression; see Fig. 7.</p>
        <p>The real challenge is the implementation of node-set predicates: On the one
hand, context information about relative element position and total node-set
size must be provided, which transcends the context-free realm of pattern and
motif composition. On the other hand, elements should be enumerated lazily, in
class Predicate f
// : : :
gg; gg; g
g
class NodeSet f
// : : :
public MotifhNode, Nodei apply(final Motifh? extends Node, Nodei base) f
return new Motifh? extends Node, Nodei ( ) f
public PatternhNodei apply(Pattern h? super Nodei p) f
return new MultiTransformhNode, Nodei (p) f
protected IterablehNodei apply(Node n) f</p>
        <p>return new NodeSet(base.lazyBindings (n) ).filter(Predicate.this); // [ ]
public NodeSet(Iterable h? extends Nodei elems) f</p>
        <p>this.elems = cache(elems) ;
g
public IterablehNodei filter(final Predicate pred) f
return new IterablehNodei ( ) f
public IteratorhNodei iterator( ) f
return new FilterIteratorhNodei (elems.iterator( ) ) f
int i = 0;
protected boolean accepts(Node candidate) f</p>
        <p>return pred.accepts(NodeSet.this, ++i, candidate);
gg; gg; g
g
public static Predicate exists(final Path cond) f
return new Predicate( ) f
public boolean accepts(NodeSet context, int position, Node node) f</p>
        <p>return cond.getMotif( ).apply(any ( ) ).match (node);
g
g;
g</p>
        <p>g
g;
g
public static Predicate index(final int i) f
return new Predicate( ) f
public boolean accepts(NodeSet context, int position, Node node) f
return position == (i &gt; 0 ? i : context.size( ) i);
order to make tests for non-emptiness e cient and avoid scanning for unneeded
solutions. Obviously, evaluation must switch transparently from lazy to eager
strategy if the size of the node-set is observed. And lastly, for elegance reasons,
we strive for an implementation that is as declarative as possible, with very
limited amounts of speci c imperative coding.</p>
        <p>The solution is depicted in Figure 8. Generic Paisley API method calls are
underlined for easy reference. The action of a predicate on a base motif requires
control over the solution node-set. Hence it needs to intercept both the pattern
parameter at the motif level and the target node parameter at the pattern level,
by means of two nested anonymous classes. Then a lazy disjunction is spliced in,
which enumerates the solutions of the base motif, wraps them in a local node-set
and lters them context-sensitively.</p>
        <p>The counterpart on the node-set side of the implementation works as follows:
It wraps the lazy enumeration of candidate nodes is a collection, via the auxiliary
method cache (de nition not shown). This collection caches enumerated items
for repeated access, and forces eager evaluation if its size( ) method is called.</p>
        <p>The actual ltering operation yields a lazy enumeration that intercepts
iterator creation, again by means of two nested anonymous classes. The iterator
of the cached candidate collection is overwritten by an instance of the auxiliary
abstract class FilterIteratorhAi (de nition not shown) that in-/excludes elements
ad-hoc, determined by the result of its method boolean accepts(A). This
acceptance test is then routed back to the context-sensitive acceptance test of the
given predicate, by addition of a single minuscule piece of explicit imperative
(stateful) programming, namely a counter i for the relative position. Note that,
Java formal noise apart, the actual problem-speci c code consists of ve
singleline statements, marked with [ ].</p>
        <p>Existential predicates forget their results, hence the invocation of the
catchall pattern any( ). They prune the search after the rst hit, as witnessed by the
absence of a call to matchAgain( ) on the freshly created pattern. Eager evaluation
is only ever triggered by index predicates via a call to context.size( ). Logical
predicate connectives are de ned point-wise (not shown).
7</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Experimental Evaluation</title>
      <p>
        We have tested the performance and scalability of our implementation using
material from XMark [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], a benchmark suite for XML-based large databases.
The homepage of XMark [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] o ers a downloadable tool to generate
pseudorandom well-typed XML les according to a published DTD, with a linear size
parameter. We have used the tools to produce test data les for size parameter
values from 0:04 to 0:40 in increments of 0:04, where 0:01 corresponds roughly to
1 MiB of canonical XML. From the 20 published XQuery benchmark queries we
have chosen as test cases the four where the entire logic is expressed in XPath
rather than XQuery operations; see Fig. 9 (and Table 1 in the appendix).
      </p>
      <p>
        For each pair of data le and query expression, we processed the document
with lazyBindings of the XPath motif, and computed running times for extracting
Query XPath Expression
Q01 /site/open auctions/open auction/bidder[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]/increase/text()
Q06 //site/regions//item
Q15 /site/closed auctions/closed auction/annotation/description/
parlist/listitem/parlist/listitem/text/emph/keyword/text()
Q16 /site/closed auctions/closed auction[annotation/description/
parlist/listitem/parlist/listitem/text/emph/keyword/text()]
the rst solution and all solutions (in a loop), as well as the e ective time (total
time divided by number of solutions).
      </p>
      <p>Timing values were obtained as real time with System.nanoTime( ), median
value of ten repetitions, with interspersed calls to System.gc( ). In order to
compensate deferred computation costs in the DOM implementation, we reused the
same document instances for all successive queries. All experiments were
performed on an Intel Core i5-3317U quad-core running at 1.70 GHz, with 8 GiB of
physical memory, under Ubuntu 14.04 LTS (64bit), and Oracle Java SE 1.8.0
05b13 with HotSpot 64bit Server VM 25.5-b02 and 800 MiB heap limit.</p>
      <p>Our rst quantitative goal, beyond highlighting the elegance and e
ectiveness of Paisley-style pattern speci cation, is to demonstrate the e ciency payo
of lazy pattern execution. Our second quantitative goal comes back to
methodological arguments from the motivation section of this paper. Pattern matching
has been hailed as a declarative tool that brings expressiveness for data
querying extremely close to the hosting programming language environment. Tools
for non-embedded domain-speci c languages may be more powerful in many
respects, but the burden of proof is on them that this power is worth the trouble
incurred by the impedance mismatch with ordinary host code. Nevertheless, we
should verify that the bene ts of tight embedding produced by our
methodological approach are not squandered by the implementation.</p>
      <p>To that end, we repeated our experiments with the next-closest tool at hand,
namely the Java on-board XPath implementation accessible as javax.xml.xpath. .6
We compared both preparation and running times of Paisley XPath patterns
with XPathExpression objects obtained via XPath.compile(String). The rst solution
and all solutions were obtained by calling XPathExpression.evaluate with the type
parameter values NODE and NODESET, respectively.</p>
      <p>The Paisley approach fares well, even as a non-embedded DSL. The Paisley
preparation process (a simple recursive descent parser for full XPath generated
with ANTLR7, followed by direct translation of abstract syntax nodes to pattern
operations) is consistently faster than on-board compilation to XPathExpression
objects, although the task may have been sped up marginally by considering
6 In our Java environment, com.sun.org.apache.xpath.internal.jaxp.XPathImpl.
7 http://www.antlr.org/</p>
      <p>Q06</p>
      <p>Q15</p>
      <p>Q16</p>
      <p>Q01</p>
      <p>Q06</p>
      <p>Q15</p>
      <p>Q16
only a subset of the language after parsing. (See Table 2 in the appendix for
details.) When patterns are constructed in embedded DSL style, using the API
rather than textual input, static safety is improved, and even the small overhead
eliminated, at the same time.</p>
      <p>The di erences in running times are so drastic that they can only be
visualized meaningfully in logarithmic scale. Proportions range from on-board facilities
being 13 % faster for all solutions of Q06 at size 0.04, to being over 26 000 times
slower for the rst solution of Q06 at size 0.40. With the exception of the
exhaustion of \brute-force" case Q06, Paisley is 1{2 (all solutions) or 2{4 ( rst
solution) orders of magnitude faster, respectively; see Fig. 10. Accessing only the
rst solution with the on-board tools is particularly disappointing, being only
marginally faster than exhausting all solutions. It appears that our motivation is
con rmed, and conventionally developed tools are ill-suited to scalable lazy
evaluation, where external demand governs internal control. (More detailed results
are given in Fig. 12 the appendix.)
8</p>
    </sec>
    <sec id="sec-6">
      <title>Conclusion</title>
      <p>
        The experimental gures for the on-board tools have been obtained without any
tweaking of features; hence there is large uncertainty in the amount of possible
improvements. Therefore the comparison should not be taken too literally. But
we feel that it is fair in a certain sense nevertheless: Our Paisley implementation
has been obtained in the straightest possible manner, also without any tweaking.
Its advantage lies thus chie y in the fact that we have chosen the application
domain, namely the speci ed XPath fragment, and tailored the tool design to
avoid any complexity inessential to the task at hand. It is this light-weight
exibility by e ective manual programming we wish to leverage with the Paisley
approach { the capabilities of existing, more heavy-weight tools with respect to
automated, adaptive specialization are generally no match.
Related work with regard to language design and implementation, and to
objectoriented-logic programming, has been discussed in [
        <xref ref-type="bibr" rid="ref10 ref9">9,10</xref>
        ] and [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], respectively.
      </p>
      <p>
        Purely declarative accounts of XPath, although theoretically interesting, have
little technical impact on our main concern, the concrete embedding in a
mainstream programming language. A few random examples: In [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], XPathLog is
presented, adding variable binding capabilities and sound Herbrandt semantics
to XPath, for a full- edged logic programming language. In [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], an algorithmic
analysis of XPath in terms of modal logic is given. The language fragment and
experimental approach used there is a model predecessor for our own work,
including the particular benchmark. In the recent paper [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], the implementation
of XPath in the Haskell-like functional-logic programming language TOY is
discussed. A combinatorial approach to XPath constructs very similar to Paisley is
taken, which appears to corroborate our claims of a natural design.
      </p>
    </sec>
    <sec id="sec-7">
      <title>Appendix: Supplementary Material</title>
      <p>6
6
5
5
4
4
3
3
0
0</p>
      <p>Q06
on−board eff
on−board first
on−board all
0.04
Fig. 12. Running times for Paisley and Java on-board XPath implementations.
Logarithmic scale; lower end of scale arbitrary.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Almendros-Jimenez</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Caballero</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Garc</surname>
            a-Ruiz,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Saenz-Perez</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>XPath query processing in a functional-logic language</article-title>
          .
          <source>ENTCS</source>
          <volume>282</volume>
          ,
          <issue>19</issue>
          {
          <fpage>34</fpage>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Boag</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chamberlin</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fernandez</surname>
            ,
            <given-names>M.F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Florescu</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Robie</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Simeon</surname>
          </string-name>
          , J.:
          <source>XQuery 1</source>
          .0:
          <string-name>
            <surname>An</surname>
            <given-names>XML</given-names>
          </string-name>
          <string-name>
            <surname>Query Language (Second Edition). W3C</surname>
          </string-name>
          , http://www.w3. org/TR/2010/REC-xquery-
          <volume>20101214</volume>
          / (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Clark</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          , DeRose,
          <string-name>
            <given-names>S.: XML</given-names>
            <surname>Path</surname>
          </string-name>
          <article-title>Language (XPath) Version 1</article-title>
          .0. W3C, http: //www.w3.org/TR/1999/REC-xpath-
          <volume>19991116</volume>
          / (
          <year>1999</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Franceschet</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zimuel</surname>
          </string-name>
          , E.:
          <article-title>Modal logic and navigational XPath: an experimental comparison</article-title>
          .
          <source>In: Proc. Workshop Methods for Modalities</source>
          . pp.
          <volume>156</volume>
          {
          <issue>172</issue>
          (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Hughes</surname>
          </string-name>
          , J.:
          <article-title>Why Functional Programming Matters</article-title>
          .
          <source>Computer Journal</source>
          <volume>32</volume>
          (
          <issue>2</issue>
          ),
          <volume>98</volume>
          {
          <fpage>107</fpage>
          (
          <year>1989</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>May</surname>
          </string-name>
          , W.:
          <article-title>XPath-Logic and XPathLog: A logic-based approach for declarative XML data manipulation</article-title>
          .
          <source>Tech. Rep</source>
          .
          <volume>149</volume>
          ,
          <string-name>
            <surname>Institut</surname>
            <given-names>fu</given-names>
          </string-name>
          r Informatik,
          <source>Albert-LudwigsUniversitat Freiburg</source>
          (
          <year>2001</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Schmidt</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>XMark { an XML benchmark project (</article-title>
          <year>2009</year>
          ), http://www. xml-benchmark.org/
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Schmidt</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Waas</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kersten</surname>
            ,
            <given-names>M.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Carey</surname>
            ,
            <given-names>M.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Manolescu</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Busse</surname>
          </string-name>
          , R.:
          <article-title>XMark: A benchmark for XML data management</article-title>
          .
          <source>In: Proc. 28th VLDB</source>
          . pp.
          <volume>974</volume>
          {
          <fpage>985</fpage>
          . Morgan Kaufmann (
          <year>2002</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9. Trancon y Widemann,
          <string-name>
            <given-names>B.</given-names>
            ,
            <surname>Lepper</surname>
          </string-name>
          ,
          <string-name>
            <surname>M.</surname>
          </string-name>
          :
          <article-title>Paisley: pattern matching a la carte</article-title>
          .
          <source>In: Proc. 5th ICMT. LNCS</source>
          , vol.
          <volume>7307</volume>
          , pp.
          <volume>240</volume>
          {
          <fpage>247</fpage>
          . Springer (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Trancon</surname>
            y Widemann,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lepper</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Paisley: A pattern matching library for arbitrary object models</article-title>
          .
          <source>In: Proc. 6th ATPS. LNI</source>
          , vol.
          <volume>215</volume>
          , pp.
          <volume>171</volume>
          {
          <fpage>186</fpage>
          . Gesellschaft fur Informatik (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Trancon</surname>
            y Widemann,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lepper</surname>
            ,
            <given-names>M.:</given-names>
          </string-name>
          <article-title>Some experiments on light-weight objectfunctional-logic programming in Java with Paisley</article-title>
          .
          <source>In: Declarative Programming and Knowledge Management. LNCS</source>
          , vol.
          <volume>8439</volume>
          . Springer (
          <year>2014</year>
          ),
          <source>in press 0.16 0.28 0.40 0.04 0.16 0.28 0.40 0.16 0.28 0.40 0.04 0.16 0.28 0</source>
          .
          <fpage>40</fpage>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>