Interpreting XPath by Iterative Pattern
                Matching with Paisley

             Baltasar Trancón y Widemann12 and Markus Lepper2
                        1
                            Ilmenau University of Technology
                                2
                                  <semantics/> GmbH


      Abstract. The Paisley architecture is a light-weight EDSL for non-deter-
      ministic pattern matching. It automates the querying of arbitrary object-
      oriented data models in a general-purpose programming language, using
      API, libraries and simple programming patterns in a portable and non-
      invasive way. The core of Paisley has been applied to real-world appli-
      cations. Here we discuss the extension of Paisley by pattern iteration,
      which adds a Kleene algebra of pattern function composition to the un-
      restricted use of the imperative host language, thus forming a hybrid
      object-oriented–functional–logic framework. We subject it to a classical
      practical problem and established benchmarks: the node-set fragment of
      the XPath language for querying W3C XML document object models.


1   Introduction
The Paisley embedded domain-specific language (EDSL) and library adds the
more declarative style of pattern matching to the object-oriented context of
Java programming [9,10]. Paisley offers a combination of features that is, by
our knowledge, not offered by any other existing pattern matching solution for a
main-stream language: it is strictly typed, fully compositional, non-invasive and
supports non-determinism, full reification and persistence.
    The non-deterministic aspects of Paisley have been demonstrated to provide
a fairly powerful basis for embedded logic programming in the object-oriented
host environment. In [11] we have discussed how to solve cryptarithmetic puzzles
with the backtracking facilities of Paisley, and how to obtain non-trivial efficient
search plans by object-oriented meta-programming with Paisley pattern objects.
    Here we describe and evaluate recent extensions to Paisley that greatly in-
crease its capabilities as an embedded functional-logic programming system,
supporting more complex control and data flow than the combinatorial gen-
erate&test tactics of [11]. In particular, the novel contributions are:
 1. in the core layer, a calculus of functions on patterns and its Kleene algebra,
    thus providing regular expressions of nested patterns;
 2. in the application layer, a library extension that demonstrates their use by
    matching W3C XML Document Object Models with XPath [3] expressions;
 3. a practical case study that evaluates the approach in comparison with stan-
    dard on-board facilities of the host language Java.
Variants:              1 .//a[@href]            2 .//p//a[@href]               3 .//p[.//a[@href]]


    void collect (int variant, Node node, Collection h Element i results,
                  boolean sideways) {
      if (node instanceof Element) {
        Element elem = (Element)node;
        if (elem.getTagName( ).equals(variant == 1 ? ”a” : ”p”) ) {
          if (variant > 1) {
            Collection h Element i sub = variant == 2 ? results : new ArrayListh i ( );
            collect(1, elem, sub, false);
                }
                if (variant == 1 && !elem.getAttribute(”href”).equals(””) ||
                    variant == 3 && !sub.isEmpty( ) )
                  results.add(elem);                                                // solution
            }
        }
        if (node.getFirstChild( ) != null)
          collect(variant, node.getFirstChild( ), results, true);                   // depth
        if (sideways && node.getNextSibling( ) != null)
          collect(variant, node.getNextSibling( ), results, true);                  // breadth
    }

     Fig. 1. Three related XML queries. Top: XPath expressions; bottom: Java code.


2           Motivation
The basic motivation for a pattern matching EDSL is that patterns as language
constructs make data-gathering code more declarative, and thus more readable,
more writable and more maintainable. If done properly, this additional expres-
sive power can be used to factor code compositionally in ways that are not
straightforwardly possible in a conventional object-oriented approach, where the
logic and data flow of a complex query often appear as cross-cutting concerns.
    As an example that highlights the issues of logical compositionality, consider
a family of three related XPath expressions, depicted in Fig. 1, that select nodes
from XHTML documents. The first selects, from the context node and its descen-
dants, all <a> elements (hyperlink anchors) that have a href (target) attribute
specified. The second selects only those <a> elements with href attributes that
are nested within <p> elements (paragraphs). The third selects all <p> elements
that have some <a> element with href attribute nested within.
    The task is to implement the specified search procedures, as object-oriented
queries of the standard W3C X(HT)ML Document Object Model. Evidently,
one wishes to implement most of the operational code generically, with reuse
and minimal adaptation for each of the three variants. Additionally, the second
and third variant should be able to use the first recursively. To emphasize the
role of reuse in this example, Fig. 1 gives a unified implementation, where the
identifier variant is grayed out to emphasize that all its occurrences serve only the
static choice between the variants. The common algorithm is to traverse nodes
by depth-first search, and add all valid matches, possibly repeatedly, to a results
collection supplied by the caller. Two further improvements of the solution are
suggested as exercises to the reader:
 1. Of course, code that works for these three, but no other XPath expressions
    is hardly generic. Abstract to a large, preferrably unbounded, set of useful
    XPath expressions. Refactor the code to separate commonalities and indi-
    vidual degrees of freedom, as cleanly as possible.
 2. The use of a Collection for matches implies eager evaluation. This is not
    efficient for the recursive calls from variant 3 to variant 1, where searching
    can be aborted after the first successful match (which falsifies sub.isEmpty( ) ).
    Refactor the code to allow for lazy evaluation, as transparently as possible.3
See section 5 and Fig. 4 below for our proposed solution. We do not claim that
these refactoring steps are infeasible in any particular previous framework. But
we shall demonstrate that the Paisley approach naturally gives both pragmatic
design guidelines, and a concrete notation for the adequate, abstract, modular
and efficient expression of the underlying algorithm and its instantiations.

3    Basic Matching with Paisley
The design principles, semantics and APIs of Paisley patterns have been discussed
in detail in [9,10,11]. The EDSL is extremely lightweight: It requires no language
or compiler extension; API calls and programming idioms are sufficient for its
full expressive power. The core API is summarized in Fig. 2.
    The root of the pattern hierarchy is the abstract base class Pattern h A i of
patterns that can process objects of some arbitrary type A. A pattern Pattern h A i p
is applied to some target data x of type A by calling p.match(x) , returning a
boolean value indicating whether the match was successful.
    Complex patterns are composed from application-layer building blocks that
implement classifications and projections, the reification of instance tests and
getter methods, respectively. These can be defined independently for arbitrary
(public interfaces of) data models, requiring no intrusion into legacy code. Paisley
comes with predefined bindings for common Java data models, such as the
Collection framework; more can be added by the user. Each projection x from
type A to B induces by contravariant lifting a construction of type Pattern h A i from
Pattern h B i, conveniently implemented as a method Pattern h A i getX(Pattern h B i p) .
    Information is extracted from a successfully matched pattern via embedded
pattern variables. A pattern Variable h A i v matches any value A x, and stores a
reference to x that can be retrieved by invoking v.getValue( ) . Variables behave like
imperative rather than logical variables; subsequent matches merely overwrite
the stored value, no unification is implied.4
3
   This task is inherently much more difficult in conventional programming languages
  than in logic and lazy functional approaches; cf. [5].
4
   This design choice enables the use of Paisley for general, object-oriented data models,
  where stable notions of equality, let alone an induction principle, cannot be taken
  for granted. See section 4 for the handy implications of the imperative perspective.
   abstract class Pattern h A i {
     public abstract boolean match(A target);
     public boolean matchAgain( );
       public static h A i Pattern h A i
        both(Pattern h ? super A i first, Pattern h ? super A i second);
       public static h A i Pattern h A i
        either(Pattern h ? super A i first, Pattern h ? super A i second);
       public Pattern h A i someMatch( );
   }

   class Variable h A i extends Pattern h A i {
     public A getValue( );
       public h B i List h A i eagerBindings(Pattern h ? super B i root, B target);
       public h B i Iterable h A i lazyBindings(Pattern h ? super B i root, B target);
       public h B i Pattern h B i bind(Pattern h ? super B i root, Pattern h ? super A i sub);
       public Pattern h A i star(Pattern h ? super A i root);
       public Pattern h A i plus(Pattern h ? super A i root);
   }

                                 Fig. 2. Interface synopsis (core).


    Patterns can be composed conjunctively and disjunctively by the binary com-
binators both and either, or their n-ary variants all and some (not shown), respec-
tively. As in traditional logic programming, disjunction is realized as backtrack-
ing: After each successful match for a pattern p, p.matchAgain( ) can be invoked
to backtrack and find another solution. Solutions can be exhausted, in an im-
perative style of encapsulated search, by the following programming idiom:
    if (p.match(x) ) do
      doSomething( );
    while (p.matchAgain( ) );

When not using an exhaustive loop, the computation of alternative solutions can
be deferred indefinitely; the required state for choice points is stored residually
in the instances of the logical combinators themselves.
    The search construct can be abbreviated further to a functional style if the
desired result of each match is the value stored in a single pattern variable v.
Invoking v.eagerBindings(p, x) or v.lazyBindings(p, x) returns objects that give the
value of v for all successive matches, collected beforehands in an implicit loop,
or computed on iterator-driven demand, respectively. The latter can also deal
with infinite sequences of solutions.
    For cases where only the satisfiability of a pattern p, but not the actual
sequence of solutions, is of concern, one can invoke p.someMatch( ) , yielding a
wrapper that cuts all alternatives after the first solution. Thus, laziness is ex-
ploited for efficiency and, when used in conjunction with other patterns with
multiple relevant solutions, spurious multiplication of solution sets is avoided.
4     Advanced Pattern Calculus

4.1     Pattern Substitution and Iteration

As discussed in [10], Paisley patterns and their variables obey an algebraic cal-
culus where most of the expected mathematical laws hold, despite the low-level
imperative nature of their implementation.
    A variable v known to occur5 in a pattern p can be substituted by a subpat-
tern q, by invoking v.bind(p, q) , mnemonically read as (λv. p)q. The implementa-
tion delegates to p and q sequentially, and hence does not require access to the
internals of either operand.
    Substitution can also be given a recursive twist; a pattern p with a “hole”
v can be nested. The patterns v.star(p) and v.plus(p) correspond to the ∗ and
+
  -closures, respectively, of the search path relation between p and v, in the sense
that the usual recursive relations hold,

                               v.star(p) ≡ either(v, v.plus(p) )
                              v.plus(p) ≡ v.bind(p, v.star(p) )

which is already almost the complete, effective implementation, up to lazy du-
plication of v.plus(p) for each iteration level that is actually reached. Note that
iteration depth is conceptually unbounded, and solutions are explored in depth-
first pre-order, due to the way either is used. Thus patterns form a Kleene algebra
up to equivalence of solution sets but not sequences.
    Using these iteration operators, complex nondeterministic pattern construc-
tors can be defined concisely. For instance, the contravariant lifting of the multi-
valued, transitive descendant projection in XML document trees, applied to a
pattern p, becomes simply
      v.bind(v.plus(child(v) ), p)
where Variable h Node i v is fresh, and the multi-valued projection Pattern h Node i
child(Pattern h Node i ) is implemented in terms of the org.w3c.dom.Node getter meth-
ods getFirstChild( ) and getNextSibling( ) . Note how the two sources of nondetermin-
ism, regarding horizontal (child) and vertical (plus) position in the document
tree, respectively (cf. the distinct recursive calls in Fig. 1), combine completely
transparently.


4.2     Pattern Function Abstraction

Functions taking patterns to patterns have so far featured in two important
roles. In operational form, implemented as lifting methods, they are the basis
for contravariant representation of projection patterns. In algebraic form, as
bind-based abstractions from a variable, they enable pattern composition and
iteration. Being so ubiquitous in the Paisley approach, they deserve their own
5
    The actual condition is definite assignment rather than occurrence, for technical
    reasons.
    abstract class Motif h A, B i {
      public abstract Pattern h B i apply(Pattern h ? super A i );
        public static h A i Motif h A, A i id( );
        public        h C i Motif h C, B i on(Motif h C, ? super A i other);
        public static h A i Motif h A, A i star(Motif h A, ? super A i f);
        public static h A i Motif h A, A i plus(Motif h A, ? super A i f);
        public Iterable h A i lazyBindings (B target);
        public List h A i eagerBindings(B target);
    }

    class Variable h A i extends Pattern h A i {
      // . . .
      public h B i Motif h A, B i lambda(Pattern h B i root);
    }

                      Fig. 3. First-class pattern function (motif) interface.


reification. Fig. 3 shows the relevant API. An instance of class Motif h A, B i is
the reified form of a function taking patterns over A to patterns over B, or by
contravariance, the pattern representation of an access operation that takes data
of type B to data of type A, by its apply method.
    The Motif class provides the algebra of the category of patterns, namely the
identical motif id( ) and the type-safe composition of motifs by the on(Motif) in-
stance method. The variable-related operations star/plus and lazy-/eagerBindings
each have a point-free counterpart in Motif, which create anonymous variables to
hold intermediate results on the fly. For instance, the XML descendant pattern
defined above can be expressed less redundantly in point-free form as
    plus(child).apply(p)
given a child access reification Motif h Node,Node i child.
    Conversely, motif abstraction is defined for pattern variables, which obeys
the beta reduction rule:

                                 v.lambda(p).apply(q) ≡ v.bind(p, q)

    There is often a trade-off in convenience between functions in operational
and reified form, that is as either pattern lifting methods or motifs. Since there
is no automatic conversion between methods and function objects in Java prior
to version 8, our strategy in the Paisley library is to provide both redundantly.


5       Motivation Revisited

Figure 4 repeats the XPath examples expressions, now contrasting the domain-
specific XPath code with the general-purpose Java/Paisley code. The latter has
been underlined for easy reference. The solution makes full use of the shared
Variants:       1 .//a[@href]             2 .//p//a[@href]              3 .//p[.//a[@href]]


    Pattern h Node i dslash(String name, Pattern h Element i. . . constraints) {
     return descendantOrSelf(element(both(tagName(name), all(constraints) ) ) );
    }

    Variable h Element i outerVar = new Variable h i ( );
    Pattern h Node i outerForm = dslash(”p”, outerVar);
    Variable h Element i innerVar = new Variable h i ( );
    Pattern h Node i innerForm = dslash(”a”, innerVar, attr(”href”, neq(””) ) );


    Iterable h Element i collect1(Document root) {
       return innerVar.lazyBindings(innerForm, root);
    }
    Iterable h Element i collect2(Document root) {
       return innerVar.lazyBindings(outerVar.bind(outerForm, innerForm), root);
    }
    Iterable h Element i collect3(Document root) {
       return outerVar.lazyBindings(outerVar.bind(outerForm, innerForm.someMatch( ) ), root);
    }

Fig. 4. Three related XML queries revisited (from Figure 1). Top: XPath expressions;
middle: Java code template; bottom: template instantiations.


common structure of the three variants, and achieves complete separation of
concerns:
 – The search procedure implied by the operator // is not spread out as recur-
   sive control flow, but reified and encapsulated as the descendantOrSelf pattern
   lifting, predefined in the XPathPatterns static factory class of Paisley, discussed
   in detail in the following section.
 – XPath subexpressions are denoted concisely and orthogonally.
     • The generic form .//tag. . . that reoccurs in all variants is abstracted as
        the extensible pattern construction dslash.
     • The particular inner and outer forms, .//a[@href] and .//p, respec-
        tively, are defined once and for all, independently of each other.
 – The semantics of XPath queries are effected concisely and orthogonally.
     • The general principle of encapsulated search is expressed naturally by a
        call to lazyBindings. The third variant concisely prunes the recursively
        encapsulated search after the first solution, via the pattern modifier
        someMatch( ) .
     • The particular configuration of subexpressions for all variants boils down
        to a simple choice of the bound variable and pattern-algebraic compo-
        sition. Note that both are reified and hence first-class citizens of Java.
        Thus, each variant boils down to a one-line expression, where the points
        of variability are ordinary subexpressions that can be chosen statically
        (as shown) or dynamically, with arbitrarily complex meta-programming.
      enum Axis {
        public Motif h ? extends Node, Node i getMotif( );
      }

      abstract class Test {
        public abstract Motif h ? extends Node, Node i getMotif( );
      }

      abstract class Predicate {
        public abstract boolean accepts(NodeSet context, int position, Node candidate);
          public Motif h ? extends Node, Node i apply(Motif h ? extends Node, Node i base);
      }

      class NodeSet {
        private Collection h ? extends Node i elems;
        public NodeSet(Iterable h ? extends Node i elems);
          public int size( );
          public Iterable h Node i filter(Predicate pred);
      }

      class Path {
        Path(Motif h ? extends Node, Node i motif);
          public Motif h ? extends Node, Node i getMotif( );
          public Path step(Axis axis, Test test, Predicate. . . predicates);
      }

                                  Fig. 5. XPath base classes in Paisley.


6         The Paisley XPath Interpreter
As a case study of real-world data models and queries, we have implemented
the navigational (proper path) fragment of the XPath 1.0 language as a Paisley
pattern combinator library. The complete implementation consists of the factory
classes XMLPatterns for generic DOM access and XPathPatterns for XPath specifics,
with currently 433 and 282 lines of code, respectively. Given an XPath parser,
it can be extended to a full-fledged interpreter, and hence a non-embedded DSL,
by a straightforward mapping of abstract syntax nodes to pattern operations.

6.1        Language Fragment
The XPath 1.0 language comes with many datatypes and functions that do not
contribute directly to its main goal, namely the addressing of nodes in an XML
document. For simplicity, we restrict our treatment of the language to a fragment
streamlined for that purpose. Typical uses of XPath within XSLT or XQuery [2]
conform to this fragment. We conjecture that the missing features can be added
without worsening essential operational complexity and runtime performance.
We omit external variables and functions, and all operations on the datatypes
of strings, numbers and booleans, as well as intangible document nodes such as
comments and processing instructions. The supported sublanguage is reflected
one-to-one by API operations based in the class hierarchy depicted in Fig. 5.
(Operation signatures are given in Fig. 11 in the appendix.) Its focus is on so-
called path expressions of the general syntactic form
        path ::= abs path | rel path                              abs path ::= /rel path?
        step ::= axis :: test ([ predicate ])∗                     rel path ::= (rel path / )? step
     Here axis is one of the twelve XPath document axes, omitting the deprecated
namespace declaration axis. The nonterminal test is the tautological test node(),
the text node test text() or an explicit node name test. For predicate, used to
filter selected nodes, we accept not the full XPath expression language, but only
  predicate ::= path | integer | not predicate | ( predicate (and | or) predicate )
where a path predicate holds if and only if it selects at least one node (existen-
tial quantification). Positive and nonpositive integer predicates [±i] select the
i-th node from the candidate sequence, and the (n − i)-th node, respectively,
from the sequence of n candidates in total. The former is a valid abbreviation
for [position()==i] in standard XPath; we add the latter as an analogous ab-
breviation for [position()==last()-i]. Logical connectives on predicates are
defined as usual.
    Various abbreviation rules apply; for instance, the shorthand .//a[@href]
expands to the verbose form self::node()/descendantOrSelf::node()/child
::a[attribute::href]. This translates to the following semantic object:
      relative( ).step(Axis.self, node( ) )
                  .step(Axis.descendantOrSelf, node( ) )
                  .step(Axis.child, name(”a”),
                        exists(relative( ).step(Axis.attribute, ”href”) ) )
    The relation between XPath predicates and candidate sequences, mislead-
ingly called “node-sets” in the standard, is rather idiosyncratic (they can not be
modeled adequately as plain sets) and the major challenge in this case study.
A node-set is implicitly endowed with either of two predefined orders, namely
forward or reverse document order. These orders are loosely specified by a pre-
order traversal of nodes, up to attribute permutation. A predicate filters the
nodes in a node-set not purely by point-wise application, but may depend on
some context, namely the position of the node in the node-set, starting from one,
and the total number of members. This information is available in XPath via the
“functions” position() and last(), respectively. It is realized in the API by
the parameter position of method Predicate.accepts and the method size( ) of class
NodeSet, respectively, to be supplied from the method filter of class NodeSet.


6.2      Pattern-Based Interpreter Design
The node-extracting semantics of the XPath language can be rendered naturally
in the Paisley framework by contravariant lifting. An XPath expression can be
                 ancestor ≡ plus(parent)                      descendant ≡ plus(child)
           ancestorOrSelf ≡ star(parent)            descendantOrSelf ≡ star(child)
           followingSibling ≡ plus(nextSibling)       precedingSibling ≡ plus(previousSibling)
                following ≡ ancestorOrSelf.on(followingSibling).on(descendantOrSelf)
               preceding ≡ ancestorOrSelf.on(precedingSibling).on(descendantOrSelf)
                                                  self ≡ id


                               Fig. 6. Non-primitive XPath axes.

   class Path {
     // . . .
     public Path step(Axis axis, Test test, Predicate. . . predicates) {
       Motif h ? extends Node, Node i r = axis.getMotif( ).on(test.getMotif( ) );
       for (Predicate p : predicates)
          r = p.apply(r);
       return new Path(getMotif( ).on(r) );
       }
   }

                   Fig. 7. Implementation of composite path expressions.


applied to any node of an XML document, here implemented as DOM, and
extracts some other nodes, possibly of a more special type, such as elements or
attributes. This gives rise to a semantic type Motif h ? extends Node, Node i.
    Except for the context-sensitivity of predicates, all language constructs could
simply been given motif semantics and lumped together by composition.
    XPath axes are all defined concisely in terms of motifs. Primitive DOM access
operations from factory class XMLPatterns directly define the attribute, child and
parent axes. Given the additional primitives next-/ previousSibling, which are only
implicit in the XPath standard, all other axes are definable in terms of elementary
motif algebra; see Figure 6. Node tests are straightforward applications of test
pattern lifting.
    Atomic path expressions are realized by the document root access motif and
the identity motif, for absolute and relative paths, respectively. Composite path
expressions conceptually compose their constituents left to right. The exception
are predicates, which are implemented as motif transforms in order to deal with
context sensitivity. These are applied in order to the step basis, that is the
composition of axis and test, for local filtering, and the result is then composed
with the front of the path expression; see Fig. 7.
    The real challenge is the implementation of node-set predicates: On the one
hand, context information about relative element position and total node-set
size must be provided, which transcends the context-free realm of pattern and
motif composition. On the other hand, elements should be enumerated lazily, in
   class Predicate {
     // . . .
       public Motif h Node, Node i apply(final Motif h ? extends Node, Node i base) {
        return new Motif h ? extends Node, Node i ( ) {
           public Pattern h Node i apply(Pattern h ? super Node i p) {
             return new MultiTransform h Node, Node i (p) {
               protected Iterable h Node i apply(Node n) {
                 return new NodeSet(base.lazyBindings(n) ).filter(Predicate.this); // [ ∗ ]
       }}; }}; }
   }
   class NodeSet {
     // . . .
       public NodeSet(Iterable h ? extends Node i elems) {
        this.elems = cache(elems) ;
       }
       public Iterable h Node i filter(final Predicate pred) {
        return new Iterable h Node i ( ) {
           public Iterator h Node i iterator( ) {
             return new FilterIterator h Node i (elems.iterator( ) ) {
               int i = 0;                                                           // [ ∗ ]
               protected boolean accepts(Node candidate) {
                  return pred.accepts(NodeSet.this, ++i, candidate);                // [ ∗ ]
       }}; }}; }
   }
   public static Predicate exists(final Path cond) {
    return new Predicate( ) {
      public boolean accepts(NodeSet context, int position, Node node) {
         return cond.getMotif( ).apply(any( ) ).match(node);                        // [ ∗ ]
            }
       };
   }
   public static Predicate index(final int i) {
    return new Predicate( ) {
      public boolean accepts(NodeSet context, int position, Node node) {
         return position == (i > 0 ? i : context.size( ) − i);                     // [ ∗ ]
            }
       };
   }

Fig. 8. Implementation of context-sensitive predicate filtering. See text for underlining
and [∗].
order to make tests for non-emptiness efficient and avoid scanning for unneeded
solutions. Obviously, evaluation must switch transparently from lazy to eager
strategy if the size of the node-set is observed. And lastly, for elegance reasons,
we strive for an implementation that is as declarative as possible, with very
limited amounts of specific imperative coding.
    The solution is depicted in Figure 8. Generic Paisley API method calls are
underlined for easy reference. The action of a predicate on a base motif requires
control over the solution node-set. Hence it needs to intercept both the pattern
parameter at the motif level and the target node parameter at the pattern level,
by means of two nested anonymous classes. Then a lazy disjunction is spliced in,
which enumerates the solutions of the base motif, wraps them in a local node-set
and filters them context-sensitively.
    The counterpart on the node-set side of the implementation works as follows:
It wraps the lazy enumeration of candidate nodes is a collection, via the auxiliary
method cache (definition not shown). This collection caches enumerated items
for repeated access, and forces eager evaluation if its size( ) method is called.
    The actual filtering operation yields a lazy enumeration that intercepts it-
erator creation, again by means of two nested anonymous classes. The iterator
of the cached candidate collection is overwritten by an instance of the auxiliary
abstract class FilterIterator h A i (definition not shown) that in-/excludes elements
ad-hoc, determined by the result of its method boolean accepts(A) . This accep-
tance test is then routed back to the context-sensitive acceptance test of the
given predicate, by addition of a single minuscule piece of explicit imperative
(stateful) programming, namely a counter i for the relative position. Note that,
Java formal noise apart, the actual problem-specific code consists of five single-
line statements, marked with [∗].
    Existential predicates forget their results, hence the invocation of the catch-
all pattern any( ) . They prune the search after the first hit, as witnessed by the
absence of a call to matchAgain( ) on the freshly created pattern. Eager evaluation
is only ever triggered by index predicates via a call to context.size( ) . Logical
predicate connectives are defined point-wise (not shown).


7   Experimental Evaluation

We have tested the performance and scalability of our implementation using
material from XMark [8], a benchmark suite for XML-based large databases.
The homepage of XMark [7] offers a downloadable tool to generate pseudo-
random well-typed XML files according to a published DTD, with a linear size
parameter. We have used the tools to produce test data files for size parameter
values from 0.04 to 0.40 in increments of 0.04, where 0.01 corresponds roughly to
1 MiB of canonical XML. From the 20 published XQuery benchmark queries we
have chosen as test cases the four where the entire logic is expressed in XPath
rather than XQuery operations; see Fig. 9 (and Table 1 in the appendix).
   For each pair of data file and query expression, we processed the document
with lazyBindings of the XPath motif, and computed running times for extracting
    Query XPath Expression

     Q01 /site/open auctions/open auction/bidder[1]/increase/text()
     Q06 //site/regions//item
     Q15 /site/closed auctions/closed auction/annotation/description/
           parlist/listitem/parlist/listitem/text/emph/keyword/text()
     Q16 /site/closed auctions/closed auction[annotation/description/
           parlist/listitem/parlist/listitem/text/emph/keyword/text()]

            Fig. 9. XPath expression test cases from the XMark benchmark.


the first solution and all solutions (in a loop), as well as the effective time (total
time divided by number of solutions).
    Timing values were obtained as real time with System.nanoTime( ) , median
value of ten repetitions, with interspersed calls to System.gc( ) . In order to com-
pensate deferred computation costs in the DOM implementation, we reused the
same document instances for all successive queries. All experiments were per-
formed on an Intel Core i5-3317U quad-core running at 1.70 GHz, with 8 GiB of
physical memory, under Ubuntu 14.04 LTS (64bit), and Oracle Java SE 1.8.0 05-
b13 with HotSpot 64bit Server VM 25.5-b02 and 800 MiB heap limit.
    Our first quantitative goal, beyond highlighting the elegance and effective-
ness of Paisley-style pattern specification, is to demonstrate the efficiency payoff
of lazy pattern execution. Our second quantitative goal comes back to method-
ological arguments from the motivation section of this paper. Pattern matching
has been hailed as a declarative tool that brings expressiveness for data query-
ing extremely close to the hosting programming language environment. Tools
for non-embedded domain-specific languages may be more powerful in many re-
spects, but the burden of proof is on them that this power is worth the trouble
incurred by the impedance mismatch with ordinary host code. Nevertheless, we
should verify that the benefits of tight embedding produced by our methodolog-
ical approach are not squandered by the implementation.
    To that end, we repeated our experiments with the next-closest tool at hand,
namely the Java on-board XPath implementation accessible as javax.xml.xpath.∗.6
We compared both preparation and running times of Paisley XPath patterns
with XPathExpression objects obtained via XPath.compile(String) . The first solution
and all solutions were obtained by calling XPathExpression.evaluate with the type
parameter values NODE and NODESET, respectively.
    The Paisley approach fares well, even as a non-embedded DSL. The Paisley
preparation process (a simple recursive descent parser for full XPath generated
with ANTLR7 , followed by direct translation of abstract syntax nodes to pattern
operations) is consistently faster than on-board compilation to XPathExpression
objects, although the task may have been sped up marginally by considering
6
    In our Java environment, com.sun.org.apache.xpath.internal.jaxp.XPathImpl.
7
    http://www.antlr.org/
              6
              5
              4
log10(t µs)

              3
              2
              1
              0


                   Q01     Q06   Q15      Q16        Q01      Q06      Q15      Q16


  Fig. 10. Running times for Paisley and Java on-board XPath implementations. Left –
  size parameter 0.04; right – size parameter 0.40. Colors: dark to light – Paisley eff/-
  first/all; on-board eff/first/all. Logarithmic scale; lower end of scale arbitrary.


  only a subset of the language after parsing. (See Table 2 in the appendix for
  details.) When patterns are constructed in embedded DSL style, using the API
  rather than textual input, static safety is improved, and even the small overhead
  eliminated, at the same time.
      The differences in running times are so drastic that they can only be visual-
  ized meaningfully in logarithmic scale. Proportions range from on-board facilities
  being 13 % faster for all solutions of Q06 at size 0.04, to being over 26 000 times
  slower for the first solution of Q06 at size 0.40. With the exception of the ex-
  haustion of “brute-force” case Q06, Paisley is 1–2 (all solutions) or 2–4 (first
  solution) orders of magnitude faster, respectively; see Fig. 10. Accessing only the
  first solution with the on-board tools is particularly disappointing, being only
  marginally faster than exhausting all solutions. It appears that our motivation is
  confirmed, and conventionally developed tools are ill-suited to scalable lazy eval-
  uation, where external demand governs internal control. (More detailed results
  are given in Fig. 12 the appendix.)


  8           Conclusion
  The experimental figures for the on-board tools have been obtained without any
  tweaking of features; hence there is large uncertainty in the amount of possible
  improvements. Therefore the comparison should not be taken too literally. But
  we feel that it is fair in a certain sense nevertheless: Our Paisley implementation
  has been obtained in the straightest possible manner, also without any tweaking.
  Its advantage lies thus chiefly in the fact that we have chosen the application
  domain, namely the specified XPath fragment, and tailored the tool design to
  avoid any complexity inessential to the task at hand. It is this light-weight flex-
  ibility by effective manual programming we wish to leverage with the Paisley
approach – the capabilities of existing, more heavy-weight tools with respect to
automated, adaptive specialization are generally no match.

8.1   Related Work
Related work with regard to language design and implementation, and to object-
oriented-logic programming, has been discussed in [9,10] and [11], respectively.
     Purely declarative accounts of XPath, although theoretically interesting, have
little technical impact on our main concern, the concrete embedding in a main-
stream programming language. A few random examples: In [6], XPathLog is
presented, adding variable binding capabilities and sound Herbrandt semantics
to XPath, for a full-fledged logic programming language. In [4], an algorithmic
analysis of XPath in terms of modal logic is given. The language fragment and
experimental approach used there is a model predecessor for our own work, in-
cluding the particular benchmark. In the recent paper [1], the implementation
of XPath in the Haskell-like functional-logic programming language TOY is dis-
cussed. A combinatorial approach to XPath constructs very similar to Paisley is
taken, which appears to corroborate our claims of a natural design.

References
 1. Almendros-Jiménez, J., Caballero, R., Garcı́a-Ruiz, Y., Sáenz-Pérez, F.: XPath
    query processing in a functional-logic language. ENTCS 282, 19–34 (2012)
 2. Boag, S., Chamberlin, D., Fernández, M.F., Florescu, D., Robie, J., Siméon, J.:
    XQuery 1.0: An XML Query Language (Second Edition). W3C, http://www.w3.
    org/TR/2010/REC-xquery-20101214/ (2010)
 3. Clark, J., DeRose, S.: XML Path Language (XPath) Version 1.0. W3C, http:
    //www.w3.org/TR/1999/REC-xpath-19991116/ (1999)
 4. Franceschet, M., Zimuel, E.: Modal logic and navigational XPath: an experimental
    comparison. In: Proc. Workshop Methods for Modalities. pp. 156–172 (2005)
 5. Hughes, J.: Why Functional Programming Matters. Computer Journal 32(2), 98–
    107 (1989)
 6. May, W.: XPath-Logic and XPathLog: A logic-based approach for declarative
    XML data manipulation. Tech. Rep. 149, Institut für Informatik, Albert-Ludwigs-
    Universität Freiburg (2001)
 7. Schmidt, A.: XMark – an XML benchmark project (2009), http://www.
    xml-benchmark.org/
 8. Schmidt, A., Waas, F., Kersten, M.L., Carey, M.J., Manolescu, I., Busse, R.:
    XMark: A benchmark for XML data management. In: Proc. 28th VLDB. pp. 974–
    985. Morgan Kaufmann (2002)
 9. Trancón y Widemann, B., Lepper, M.: Paisley: pattern matching à la carte. In:
    Proc. 5th ICMT. LNCS, vol. 7307, pp. 240–247. Springer (2012)
10. Trancón y Widemann, B., Lepper, M.: Paisley: A pattern matching library for ar-
    bitrary object models. In: Proc. 6th ATPS. LNI, vol. 215, pp. 171–186. Gesellschaft
    für Informatik (2013)
11. Trancón y Widemann, B., Lepper, M.: Some experiments on light-weight object-
    functional-logic programming in Java with Paisley. In: Declarative Programming
    and Knowledge Management. LNCS, vol. 8439. Springer (2014), in press
A    Appendix: Supplementary Material


Table 1. Size of input files and solution spaces for XPath expression test cases from
the XMark benchmark.

                             Size                             # Solutions
             Parameter     File (kiB)    # Nodes       Q01     Q06     Q15      Q16

                  0.04         4 648       191 083      439      870      6       5
                  0.08         9 212       379 719      884    1 740     17      15
                  0.12        13 696       569 434    1 301    2 604     29      28
                  0.16        18 100       748 766    1 732    3 480     32      29
                  0.20        22 964       946 554    2 142    4 350     37      34
                  0.24        27 396     1 129 553    2 610    5 214     44      40
                  0.28        31 976     1 315 902    3 045    6 090     62      58
                  0.32        36 616     1 502 611    3 453    6 960     60      53
                  0.36        41 076     1 690 597    3 857    7 824     61      54
                  0.40        45 600     1 877 979    4 310    8 700     68      59


 Table 2. Preparation times for Paisley and Java on-board XPath implementations.

                                         Q01       Q06        Q15       Q16

                   Paisley               56.40   101.73       89.26    154.32
                              (µs)
                 On-Board               233.44   204.73      162.61    215.92

                   Ratio                  4.14       2.01      1.82      1.40
               enum Axis {
                 ancestor, ancestorOrSelf, attribute, child, descendant, descendantOrSelf,
                 following, followingSibling, parent, preceding, precedingSibling, self
                   // . . .
               }
               public static Test node ( );
               public static Test text ( );
               public static Test name (String tagName);
               public static Predicate exists (Path cond);
               public static Predicate index (int i);
               public static Predicate not            (Predicate p);
               public static Predicate and            (Predicate p, Predicate q);
               public static Predicate or             (Predicate p, Predicate q);
               public static Path absolute ( );
               public static Path relative ( );


              Fig. 11. XPath operations as combinators in the Paisley application-layer library.

                                             Q01                                                       Q06
                6


                                                                           6
                5


                                                                           5
                4


                                                                           4
log10(t µs)

                3


                                                                           3
                2


                                                                           2


                                                                                ●     ●   ●       ●    ●     ●     ●     ●    ●    ●
                1


                                                                           1


                        ●         ●    ●     ●   ●    ●     ●   ●    ●
                              ●
                0


                                                                           0


                      0.04            0.16           0.28           0.40       0.04           0.16                0.28            0.40


                                             Q15                                                       Q16
                6


                                                                           6
                5


                                                                           5
                4


                                                                           4
log10(t µs)

                3


                                                                           3


                        ●                                   ●   ●    ●          ●                 ●    ●     ●     ●     ●    ●    ●
                                       ●     ●   ●    ●                               ●
                              ●   ●                                                       ●
                2


                                                                           2


                                                                                              ●   paisley eff          on−board eff
                1


                                                                           1


                                                                                                  paisley first        on−board first
                                                                                                  paisley all          on−board all
                0


                                                                           0


                      0.04            0.16           0.28           0.40       0.04           0.16                0.28            0.40

                                             size                                                       size


  Fig. 12. Running times for Paisley and Java on-board XPath implementations. Loga-
  rithmic scale; lower end of scale arbitrary.