Vertically acyclic conjunctive queries over trees⋆

Filip Murlak and Grzegorz Zieliński

University of Warsaw

Abstract. Seeking a manageable subclass of conjunctive queries over trees that would reach beyond tree patterns, we find that vertical acyclicity of queries is sufficient to guarantee the same complexity bounds for static analysis problems as those enjoyed by tree patterns.

1 Introduction

Conjunctive queries (CQs) are a staple of relational databases, offering a good balance between expressive power and the computational cost of typical static analysis problems, like query containment [3]. However, over tree-structured data, like XML documents, their place is taken by tree patterns [7], which correspond to downward, acyclic CQs. The arguments in their favour are threefold.

First, allowing arbitrary CQs induces an exponential increase in the complexity of query containment in the presence of schema information: for unions of tree patterns (using horizontal and vertical axes) it is ExpTime-complete in general and ExpSpace-complete under non-recursive schemas (where the depth of trees is bounded by the size of the schema) [1, 6, 9]; for unions of CQs it is 2ExpTime-complete in general (even without horizontal axes) [2] and ExpSpace-complete under non-recursive schemas (NExpTime-complete without horizontal axes) [8].

Second, each CQ (over trees) can be rewritten as a union of exponentially many polynomial-size tree patterns [5]. Hence, unions of tree patterns and unions of CQs are equiexpressive languages, but the latter is more succinct. By choosing tree patterns as the basic formalism, we put the burden of dealing with the exponential blow-up on the user's shoulders. An advantage, from the user's point of view, is that the computational cost is more directly reflected in the size of queries, making it easier to control when writing them.
Finally, over tree-structured data it is more natural to navigate in the tree than to declaratively specify properties of nodes in the fashion of first-order logic. For acyclic queries, this navigational approach can be easily supported with appropriate syntax, as illustrated palpably by the popularity of the XPath query language. Certain attempts to go beyond acyclic queries have been made with the introduction of the path intersection operator in XPath 2.0, but there seems to be no natural way of dealing with full CQs.

Still, following [2], one can ask how much of the succinctness of full conjunctive queries can be allowed without compromising the complexity bounds. A partial answer was given in [8]: without influencing the complexity bounds, one can extend tree patterns with arbitrary horizontal CQs over siblings (see Fig. 1); arbitrary joins with horizontal CQs cannot be allowed, as they may induce additional vertical relations, breaking the acyclicity condition. The argument combines the well-known automata construction for tree patterns with a new construction of an exponential-size deterministic automaton over words, equivalent to a given horizontal CQ.

⋆ Supported by Poland's National Science Center grant 2013/11/D/ST6/03075.

Fig. 1. A tree pattern with horizontal CQs (left) and a vertically acyclic CQ (right). Solid, dashed, and dotted edges represent next-sibling, following-sibling, and descendant relations; an asterisk represents a node with an unspecified label. The left query checks if both a1 and a2 occur twice among siblings, separated by 1 or 2 nodes; the right query checks if this happens along a single branch.

It is well known, however, that an exponential-size automaton can be constructed for any acyclic CQ (see e.g. [4]). Since acyclic CQs can navigate up and down the tree, the construction is more involved than for tree patterns (which only go down).
Can it be used to extend the result of [8] from tree patterns with horizontal CQs to vertically acyclic CQs (see Fig. 1)? We show that the answer is yes, but rather unexpectedly it requires substantial additional effort. The construction for the horizontal patterns provided by [8] is too weak: we need an automaton that finds all matches of the horizontal CQ in the input word, not just some match. In other words, we need a way to run multiple copies of the original automaton without drastically affecting the size of the state space.

The remainder of the paper is organized as follows: in Section 2 we recall the basic notions, in Section 3 we construct the enhanced automaton for horizontal CQs, and in Section 4 we combine it with the construction for acyclic CQs.

2 Preliminaries

XML documents and trees. We model XML documents as unranked labelled trees. Formally, a tree over a finite labelling alphabet Γ is a relational structure T = ⟨T, ↓, ↓+, →, →+, (aT)a∈Γ⟩, where

– the set T is a finite unranked tree domain, i.e., a finite prefix-closed subset of ℕ* such that v · i ∈ T implies v · j ∈ T for all j < i;
– the binary relations ↓ and → are the child relation (v ↓ v · i) and the next-sibling relation (v · i → v · (i + 1));
– ↓+ and →+ are the transitive closures of ↓ and →;
– (aT)a∈Γ is a partition of the domain T into possibly empty sets.

We write |T| to denote the number of nodes of tree T. The partition (aT)a∈Γ defines a labelling of the nodes of T with elements of Γ, denoted by ℓT.

Tree automata. We abstract schemas as tree automata. We use a variant in which the state in a node v depends on the states in the previous sibling and the last child of v. Formally, an automaton is a tuple A = (Σ, Q, q0, F, δ), where Σ is the labelling alphabet (the set of element types in our case), Q is the state space with the initial state q0 and final states F, and δ ⊆ Q × Q × Σ × Q is the transition relation.
A run of A over a tree T is a labelling ρ of the nodes of T with the states of A such that for each node v with children v · 0, v · 1, . . . , v · k and previous sibling w, (ρ(w), ρ(v · k), ℓT(v), ρ(v)) ∈ δ. If v has no previous sibling, ρ(w) in the condition above is replaced with q0. Similarly, if v has no children, ρ(v · k) is replaced with q0. The language of trees recognized by A, denoted by L(A), consists of all trees admitting an accepting run of A, i.e. a run that assigns one of the final states to the root. A schema is nonrecursive if the depth of trees it accepts is bounded by a constant dependent on the schema. For schemas defined with automata, this constant is always at most the number of states.

CQs and patterns. A conjunctive query (CQ) over alphabet Γ is a formula of first-order logic using only conjunction and existential quantification, over unary predicates a(x) for a ∈ Γ and binary predicates ↓, ↓+, →, →+ (referred to as child, descendant, next sibling, and following sibling, respectively). Since we work only with Boolean queries, to avoid unnecessary clutter we often skip the quantifiers, assuming that all variables are by default quantified existentially.

An alternative way of looking at CQs is via patterns. A pattern π over Γ can be presented as π = ⟨V, Ec, Ed, En, Ef, ℓπ⟩, where ℓπ is a partial function from V to Γ, and ⟨V, Ec ∪ Ed ∪ En ∪ Ef⟩ is a finite graph whose edges are split into child edges Ec, descendant edges Ed, next-sibling edges En, and following-sibling edges Ef. By ‖π‖ we mean the size of the underlying graph.

We say that a tree T = ⟨T, ↓, ↓+, →, →+, (aT)a∈Γ⟩ satisfies a pattern π = ⟨V, Ec, Ed, En, Ef, ℓπ⟩, denoted T |= π, if there exists a homomorphism h : π → T, i.e., a function h : V → T such that

– h : ⟨V, Ec, Ed, En, Ef⟩ → ⟨T, ↓, ↓+, →, →+⟩ is a homomorphism of relational structures; and
– ℓT(h(x)) = ℓπ(x) for all x in the domain of ℓπ.

Each pattern can be seen as a CQ, and vice versa.
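To make the homomorphism semantics concrete, here is a small brute-force sketch in Python (not part of the paper's constructions): tree nodes are encoded as tuples over ℕ, the four axes are decoded from the tree domain, and T |= π is tested by enumerating all candidate functions h : V → T. The encoding and the edge-kind names ('child', 'desc', 'next', 'foll') are illustrative choices, not notation from the text; the search is exponential, but it pins down the definition.

```python
from itertools import product

# Tree nodes are tuples over N (elements of the tree domain); labels maps
# each node to its letter. A pattern is given by its vertices, typed edges
# [(x, kind, y)], and a partial vertex labelling.

def is_child(u, v):             # u ↓ v
    return len(v) == len(u) + 1 and v[:len(u)] == u

def is_desc(u, v):              # u ↓+ v : u is a proper prefix of v
    return len(u) < len(v) and v[:len(u)] == u

def is_next_sibling(u, v):      # u → v
    return len(u) == len(v) >= 1 and u[:-1] == v[:-1] and v[-1] == u[-1] + 1

def is_following_sibling(u, v): # u →+ v
    return len(u) == len(v) >= 1 and u[:-1] == v[:-1] and v[-1] > u[-1]

REL = {'child': is_child, 'desc': is_desc,
       'next': is_next_sibling, 'foll': is_following_sibling}

def satisfies(domain, labels, vertices, edges, vlabel):
    """Test T |= pi by enumerating all candidate maps h : V -> T."""
    for image in product(domain, repeat=len(vertices)):
        h = dict(zip(vertices, image))
        if all(REL[k](h[x], h[y]) for x, k, y in edges) and \
           all(labels[h[x]] == a for x, a in vlabel.items()):
            return True
    return False
```

For instance, over the three-node tree with root () labelled r and children (0,) labelled a and (1,) labelled b, the pattern x ↓ y, y → z with ℓ(y) = a, ℓ(z) = b is satisfied, while the same pattern with the labels swapped is not.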
In what follows we use the terms "pattern" and "CQ" interchangeably. Tree patterns are patterns whose underlying graph is a directed tree with edges (of the four kinds) pointing from parents to children.

3 Finding all matches of a horizontal pattern

We write Vw for the set of positions of word w, {1, 2, . . . , |w|}, and use ≤ and +1 for the standard order and successor on the positions of words. A horizontal pattern uses only → and →+ edges. Homomorphisms into words are defined just like for trees. Whenever we write h : π → w, we implicitly assume that h is a homomorphism. By a pattern π rooted at vertex x ∈ Vπ we mean a pair (π, x); for i ∈ Vw we write w, i |= (π, x) if there is h : π → w such that h(x) = i.

The aim of this section is to prove the following theorem, which provides an economical construction of an automaton finding all matches of a rooted horizontal pattern in the input word, used crucially in the proof of the main result.

Theorem 1. For each rooted horizontal pattern (π, x) one can construct in polynomial working space an exponential-size deterministic automaton Matchπ,x recognizing the language of words (a1, α1)(a2, α2) . . . (an, αn) ∈ (Γ × {⊤, ⊥})* such that for w = a1a2 . . . an and all i ∈ Vw, αi = ⊤ if and only if w, i |= (π, x).

We shall think of Matchπ,x as a decision procedure reading letter by letter a word w ∈ Γ* and an additional labelling α ∈ {⊤, ⊥}|w|, storing some information in the working memory of size polynomial in ‖π‖ and independent of |w|.

Earliest matchings. Like in [8], Matchπ,x will look for the earliest (leftmost) matchings witnessing w, i |= (π, x). A word with infinity w∞ is a word w extended with an additional position ∞, greater than all original positions, that is its own successor, and bears all the labels in Γ. Over words with infinity we use <· to denote the composition of ≤ and the successor relation, which can be equivalently defined as: x <· y iff x < y or x = y = ∞.
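The specification in Theorem 1 can be pinned down by a naive reference procedure: for every position i, decide w, i |= (π, x) by enumerating all maps of the pattern's vertices into positions. A hedged Python sketch (1-based positions, with 'next' and 'foll' standing for → and →+; all names are illustrative, and the enumeration is of course exponential, unlike the automaton of Theorem 1):

```python
from itertools import product

def matches_at(w, vertices, edges, vlabel, root):
    """Return the set of positions i with w, i |= (pi, x = root).
    edges: [(x, 'next'|'foll', y)]; vlabel: partial map vertex -> letter.
    Match_{pi,x} accepts exactly the words over Gamma x {T, F} whose
    second component agrees with the labelling computed here."""
    n = len(w)
    result = set()
    for image in product(range(1, n + 1), repeat=len(vertices)):
        h = dict(zip(vertices, image))
        if all(w[h[x] - 1] == a for x, a in vlabel.items()) and \
           all((h[y] == h[x] + 1) if k == 'next' else (h[y] > h[x])
               for x, k, y in edges):
            result.add(h[root])
    return result
```

For example, for w = abab and the rooted pattern y → z with ℓ(y) = a, ℓ(z) = b and root y, the matched positions are 1 and 3, so Matchπ,x should accept precisely the annotation ⊤⊥⊤⊥.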
The notion of homomorphism extends naturally to words with infinity. Note that since ∞ is not the successor of any ordinary position, there can be no → edge between a vertex mapped to an ordinary position and a vertex mapped to ∞, but there can be a → edge between two vertices mapped to ∞. The source and target of each →+ edge must be mapped to positions i <· j. Note that each homomorphism into w∞ induces by restriction a partial homomorphism into w (we shall blur the distinction between them), but not every partial homomorphism into w extends to a homomorphism into w∞.

Definition 1. Let g, h : π → w∞ be two homomorphisms.
– We write g ≤ h if g(x) ≤ h(x) for each vertex x of π.
– We define min(g, h) : π → w∞ as min(g, h)(x) = min(g(x), h(x)).
– We say that h respects a relation η ⊆ Vπ × Vw if h ⊆ η ∪ Vπ × {∞}.

Note that h constantly equal to ∞ respects each η. We will be mostly interested in two special cases of η: extending a fixed partial homomorphism, and mapping the pattern's root to a fixed position (or one of several fixed positions), but for conciseness we treat them in this abstract fashion.

Lemma 1.
1. For all g, h : π → w∞, min(g, h) is a homomorphism.
2. There exists hmin : π → w∞ such that hmin ≤ h for all h : π → w∞.
3. For each relation η ⊆ Vπ × Vw∞, there exists hηmin : π → w∞ respecting η such that hηmin ≤ h′ for each h′ : π → w∞ respecting η. ⊓⊔

We call the unique hmin from Lemma 1 the earliest matching of π in w∞, and hηmin the earliest matching respecting η. Note that hmin = hηmin for η = Vπ × Vw. Of course, pattern π can be matched in word w if and only if the earliest matching hmin : π → w∞ maps no vertex to ∞, and similarly for matchings respecting η.

Algorithm. The procedure Matchπ,x works with components of π, called firm subpatterns, described in Definition 3.

Definition 2. A →-component of π is a maximal connected subgraph of the →-graph of π.
In the graph of →-components of π, denoted Gπ, there is an edge from a →-component π1 to a →-component π2 if there is a →+ edge in π from a vertex of π1 to a vertex of π2.

Definition 3. A pattern π is firm if Gπ is strongly connected. In general, each strongly connected component X of Gπ defines a firm subpattern of π: the subgraph of π induced by the vertices of →-components contained in X. The DAG of firm subpatterns of π, denoted Fπ, is the standard DAG of strongly connected components of Gπ.

All four horizontal subpatterns in Fig. 1 are firm, but each has two →-components.

Matchπ,x(w) does two independent checks, the positive and the negative; it accepts when both checks accept. The negative check should reject if it finds a matching of π that maps x to a position with ⊥, and accept otherwise. We read the input word w from left to right letter by letter, trying to build the earliest matching of π that maps x to a ⊥ position. To process position i,

– compute the earliest matching in w1..i∞ that extends the constructed partial homomorphism into w1..i−1 and maps x to a ⊥ position.

The positive check is dual: it should accept if it can find matchings for all positions with ⊤. To keep the used space bounded, instead of running a separate check for each position with ⊤, we run a single test, but we partially reset the constructed partial homomorphism each time we see ⊤. We keep a look-ahead of size ‖π‖; that is, when position i is being processed, we have already read the input word up to position i + ‖π‖. To process position i,

– if α(i) = ⊤, reset the partial homomorphism constructed so far: restrict it to components not reachable from the firm component containing x;
– compute the earliest matching in w1..i+‖π‖∞ that extends the currently stored partial homomorphism and maps x to the most recent position j ≤ i with α(j) = ⊤;
– if α(i) = ⊤, but the root component cannot be matched within w1..i+‖π‖, reject immediately.
Accept if after processing the whole w a full homomorphism π → w is found.

Invariants. Correctness of the algorithm follows from the following invariants.

Invariant 1. The partial homomorphism computed by the negative check after reading the i-th letter is the earliest matching of π in w1..i∞ mapping x to a ⊥ position.

Invariant 2. The partial homomorphism h computed by the positive check after processing the i-th letter is the earliest matching of π in w1..i+‖π‖∞ mapping x to the most recent ⊤ position in w1..i (not mapping it at all, i.e., respecting ⋃y≠x {y} × {1, 2, . . . , i + ‖π‖}, if there has been no ⊤ position yet); for each earlier ⊤ position j, the earliest matching hj : π → w1..i+‖π‖∞ mapping x to j satisfies hj ≤ h.

Invariant 1 follows immediately from Lemma 2, by induction on i.

Lemma 2. The earliest matchings h : π → u∞ and h′ : π → uu′∞ respecting η ⊆ Vπ × Vu always agree over vertices mapped to u by h. ⊓⊔

For Invariant 2, we use the following properties of the earliest matchings.

Lemma 3. Let h0 : π → u∞ be the earliest matching respecting ⋃y≠x {y} × Vu, and let hi : π → u∞ be the earliest matching that maps the root x of π to i ∈ Vu.
1. For every node y whose component cannot be reached from the component of the root x, hi(y) = hj(y) for all i, j.
2. If hi(x) = i and 1 ≤ i ≤ j, then hi ≤ hj. ⊓⊔

Slightly abusing notation, we shall apply the same symbol hj from Lemma 3 for different words; that is, hj : π → w1..i−1+‖π‖∞ is not the same as hj : π → w1..i+‖π‖∞ (although, by Lemma 2, the latter extends the former).

Before reading the first letter, Invariant 2 is satisfied: the empty partial homomorphism is the earliest matching in the empty word (with infinity), and the second part trivialises. Assume that Invariant 2 holds for i − 1 and let us see that it holds for i. Let gi−1 be the partial homomorphism constructed so far, and let gi be the one to be constructed.
By Invariant 2 for i − 1, gi−1 is the earliest matching of π in w1..i−1+‖π‖∞ that maps the root x to the most recent ⊤ position, or, using notation from Lemma 3, gi−1 = hj : π → w1..i−1+‖π‖∞ for some 0 ≤ j ≤ i − 1. If α(i) = ⊥, the algorithm computes gi as the earliest matching extending the partial homomorphism gi−1 and mapping x to the most recent ⊤ position in w1..i+‖π‖ (which is also the most recent ⊤ position in w1..i−1+‖π‖), and we conclude by Lemma 2. Assume that α(i) = ⊤. Then the algorithm computes g′i−1 by restricting gi−1 to components that are not reachable from the root component. By Lemma 2, g′i−1 ⊆ hj : π → w1..i+‖π‖∞. By Lemma 3 (1), g′i−1 ⊆ hi : π → w1..i+‖π‖∞. Hence, the earliest matching in w1..i+‖π‖∞ that extends g′i−1 and maps x to i is equal to hi : π → w1..i+‖π‖∞. Since this is exactly how gi is computed, we have gi = hi : π → w1..i+‖π‖∞. This concludes the proof of the first part of Invariant 2.

To prove the second part, assume that i is not the first ⊤ position; that is, 0 < j < i. Since we have not rejected yet, gi−1 = hj : π → w1..i−1+‖π‖∞ matches the root component of π and gi−1(x) = j. By Lemma 2, hj : π → w1..i+‖π‖∞ also maps x to j (rather than to ∞). Hence, the second part of Invariant 2 for i follows from Lemma 3 (2) and Invariant 2 for i − 1.

Correctness. Let us begin with a simple observation. It is routine to check that each homomorphic image of a firm pattern π0 is a subword of length at most ‖π0‖. Consequently, if a node y of the pattern is matched at position i, then all firm components from which the component of y is reachable must be matched before position i + ‖π‖: they can reach at most ‖π‖ positions forward.

If the algorithm accepts, Invariant 2 guarantees that all suitable homomorphisms exist: indeed, if the earliest matching for the last ⊤ position does not use ∞, neither do the ones for the earlier ⊤ positions. Assume that the algorithm rejects.
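The earliest matchings used throughout this argument can be computed by a simple monotone fixpoint, in the spirit of Lemma 1: start every vertex as early as permitted and keep pushing vertices forward until all label and edge constraints hold. The sketch below is illustrative Python, not the paper's automaton; it represents ∞ by the sentinel position |w| + 1 (its own successor, bearing every label), and the optional `lower` parameter plays the role of respecting a relation η.

```python
def earliest_matching(w, vertices, edges, vlabel, lower=None):
    """Least map vertex -> position consistent with the horizontal pattern.
    edges: [(x, 'next'|'foll', y)]; vlabel: partial map vertex -> letter;
    lower: optional map vertex -> least allowed position. INF = unmatched."""
    n, INF = len(w), len(w) + 1
    lb = {x: (lower or {}).get(x, 1) for x in vertices}
    changed = True
    while changed:
        changed = False
        for x in vertices:              # advance to the next correct label
            p = lb[x]
            while p < INF and x in vlabel and w[p - 1] != vlabel[x]:
                p += 1
            if p != lb[x]:
                lb[x], changed = p, True
        for x, k, y in edges:           # y must lie strictly after x
            fwd = INF if lb[x] >= INF else min(lb[x] + 1, INF)
            if lb[y] < fwd:
                lb[y], changed = fwd, True
            if k == 'next':             # → also bounds x from below
                bwd = INF if lb[y] >= INF else lb[y] - 1
                if lb[x] < bwd:
                    lb[x], changed = bwd, True
    return lb
```

The bounds only grow and are capped at INF, so the loop terminates; at the fixpoint the bounds themselves form a homomorphism into w∞, and by monotonicity they lie below every homomorphism, so the result is hmin (or hηmin when `lower` is supplied). For instance, for w = abab and the pattern y → z with labels a, b, the result maps y, z to 1, 2; forbidding y from positions below 2 pushes the matching to 3, 4.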
That means that hi : π → w1..i+‖π‖∞ does not match the root component (that is, matches it to ∞). Assume that hi : π → w∞ does match the root component. Then it maps x to i, so by the observation above it maps all the components from which the component of x is reachable within w1..i+‖π‖. Restricting hi : π → w∞ to these components and extending with infinity to the other vertices of π would give a matching earlier than hi : π → w1..i+‖π‖∞, a contradiction. Thus, hi : π → w∞ is not a total homomorphism into w, so w, i ̸|= (π, x) and the algorithm rejects correctly.

Memory bound. Now that we have seen that Matchπ,x(w) is correct, we need to bound the memory it uses. We claim that while processing position i the algorithm only needs access to positions between i − ‖π‖ and i + ‖π‖. That means that the algorithm only needs to store the last 2 · ‖π‖ + 1 symbols read, plus the matching constructed so far, restricted to this suffix. We shall use a dualized version of the observation made previously: if a node y of the pattern is matched at position i, then all firm components reachable from the component of y must be matched after position i − ‖π‖: they can reach at most ‖π‖ positions back.

For the negative check, the claim is almost obvious. After reading a new symbol we match the subpatterns greedily, until no more can be matched. Among matched patterns each has an ancestor (possibly itself) that touches the last symbol (otherwise, it would have been matched before). By the dualized observation none of these patterns can reach back further than ‖π‖, so it suffices to store the last ‖π‖ symbols read.

For the positive check the situation is similar, except that before we start matching we first check whether ‖π‖ positions back we had ⊤ or ⊥. If it was ⊥, we proceed essentially as in the negative check. But if it was ⊤, we rematch the firm components reachable from the component of the root x, in such a way that x is matched ‖π‖ positions back.
Again, by the dualized observation, the rematched components can reach at most ‖π‖ more positions back, so they also fit within the intended window.

4 The main result

Using the matching procedure we prove our main result: we show that extending acyclic CQs using vertical axes with arbitrary horizontal CQs over siblings does not increase the complexity of the containment problem. In fact, we shall work with the more general problem of satisfiability of Boolean combinations, or BC-SAT: given a Boolean combination of patterns ϕ and an automaton A, decide if there is a tree T ∈ L(A) such that T |= ϕ. Containment for unions of CQs corresponds to non-satisfiability for Boolean combinations of the form π ∧ ¬π1 ∧ ¬π2 ∧ · · · ∧ ¬πk.

Since we work with Boolean combinations anyway, each disconnected pattern can be replaced with a conjunction of connected patterns. Thus, we shall only consider connected patterns from now on.

Assuming that vertical edges propagate through horizontal edges (e.g., the next sibling of a child is also explicitly connected with a child edge), one can define vertically acyclic patterns as those whose ↓, ↓+-subgraph is acyclic in the undirected sense; since patterns are connected, it is then an undirected tree. Without this assumption we define the notion as follows. A horizontal component of pattern π is a connected component of the →, →+-subgraph of π. Let Hπ = ⟨Vπ, ↓, ↓+⟩ be a graph over the horizontal components of π, where an edge X ↓ Y is present if x ↓ y for some x ∈ X and y ∈ Y, and X ↓+ Y is present if x ↓+ y for some x ∈ X and y ∈ Y, but there is no edge X ↓ Y. We say that pattern π is vertically acyclic if this graph is acyclic in the undirected sense. Again, as π is connected, the graph is then an undirected tree.
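The definition of vertical acyclicity suggests a direct test: contract the horizontal components and check the contracted graph Hπ for undirected cycles, which union-find does in near-linear time. A hedged Python sketch (the encoding and edge-kind names are illustrative; parallel vertical edges between the same pair of components are collapsed to one, matching the paper's preprocessing, while a vertical edge inside a single component is treated as a cycle):

```python
def vertically_acyclic(vertices, edges):
    """edges: [(x, kind, y)] with kind in {'next', 'foll', 'child', 'desc'}."""
    parent = {x: x for x in vertices}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    # 1. Contract horizontal edges into horizontal components.
    for x, k, y in edges:
        if k in ('next', 'foll'):
            parent[find(x)] = find(y)

    # 2. Add each vertical edge as an undirected edge of H_pi over the
    #    components; a second union-find detects undirected cycles.
    comp = {r: r for r in (find(v) for v in vertices)}

    def cfind(x):
        while comp[x] != x:
            comp[x] = comp[comp[x]]
            x = comp[x]
        return x

    seen = set()
    for x, k, y in edges:
        if k in ('child', 'desc'):
            a, b = find(x), find(y)
            if a == b:
                return False                # vertical edge inside a component
            key = tuple(sorted((a, b)))
            if key in seen:                 # parallel edges merged into one
                continue
            seen.add(key)
            ra, rb = cfind(a), cfind(b)
            if ra == rb:
                return False                # undirected cycle in H_pi
            comp[ra] = rb
    return True
```

For example, a chain x ↓ y ↓ z is vertically acyclic; adding x ↓+ z closes a cycle; and a pattern x → y with x ↓ z and y ↓+ z stays acyclic, since both vertical edges join the same two horizontal components.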
Given a vertically acyclic pattern π, by simple preprocessing we ensure that there is at most one edge in π between any two horizontal components: if there are more, we can merge their starting points in the parent/ancestor horizontal component, and drop all edges except one (a child edge, if there is one).

Theorem 2. For vertically acyclic patterns, BC-SAT is ExpTime-complete, and PSpace-complete under non-recursive schemas.

Proof. We shall see that for a vertically acyclic pattern π one can construct an equivalent automaton Aπ in polynomial working space (states have polynomial-size representations and all ingredients of Aπ can be enumerated in polynomial working space). Despite its nondeterminism, we will be able to complement Aπ by simply changing the accepting states. Hence, one can easily reduce BC-SAT to nonemptiness of tree automata, by constructing in polynomial working space an automaton Aϕ equivalent to a given Boolean combination ϕ. The product B of Aϕ with a nonrecursive automaton A can be tested for nonemptiness by a nondeterministic algorithm using space O(‖A‖ · log ‖A‖ · poly(‖ϕ‖)), which guesses a tree of depth at most ‖A‖ in the depth-first order and nondeterministically evaluates B over it, processing it in the post-order fashion. By Savitch's theorem, this gives a PSpace algorithm for satisfiability under nonrecursive schemas. As Aϕ is singly exponential in ‖ϕ‖, the standard polynomial-time emptiness test gives an ExpTime algorithm for satisfiability under arbitrary schemas.

The idea is that the automaton Aπ guesses in each node which subpatterns are matched there, and then checks that these guesses are consistent: a subpattern is matched in some node if and only if its root can be matched there and all its sub-subpatterns can be matched in appropriate nodes.
To ensure that these dependencies are well founded, we root the undirected tree forming the graph Hπ by arbitrarily fixing a root component and treating all the edges as pointing outwards. This builds the structure of a directed tree over the graph Hπ; the orientation of edges in this directed tree is entirely unrelated to the character of the corresponding edges in the pattern: child and descendant edges can point both up and down the tree. We use this structure to speak precisely of subpatterns: for each horizontal component X of π we define the subpattern π.X obtained by restricting π to X and all descendants of X in the directed tree. If component X has successors Y1, Y2, . . . , Yn in the directed tree, then the subpattern π.X is composed of the nodes in the horizontal component X connected to subpatterns π.Y1, π.Y2, . . . , π.Yn with child or descendant edges, pointing to or from X. We call patterns π.Y1, π.Y2, . . . , π.Yn the immediate subpatterns of pattern π.X. Unless X is the root component of the whole directed tree, it is connected to some other horizontal component Z with a child or descendant edge, pointing to or from X. If the edge points to X, we call π.X a down-subpattern. Otherwise, we call it an up-subpattern. In either case, we call the vertex of X that is the end of this edge the root vertex of π.X; if X is the root component of the whole tree, we choose an arbitrary vertex in X for the root vertex. We say that π.X is matched at node v if its root vertex is mapped to v.

The automaton Aπ, before reading the sequence of children of a node v, non-deterministically selects the set Expected(v) of subpatterns that can be matched at v. Similarly, Aπ non-deterministically selects a set ExpectedAbove(v) of subpatterns that can be matched at some ancestor of v.
(Note that for down-subpatterns the exact place where the root vertex is matched among siblings is irrelevant; we make the distinction only for the purpose of uniform treatment by the procedure Match.)

After selecting Expected(v) and ExpectedAbove(v), the automaton proceeds to read the sequence of children of v. While doing so it computes the set Matched(v) of subpatterns of π matched in the children of v and the set MatchedBelow(v) of subpatterns that were matched in the children of some descendant of v, and performs a number of consistency checks using a slightly modified version of the procedure Match, described below.

With each subpattern π.X we associate a horizontal pattern πX obtained by restricting π to X and including in the label of each vertex x the set Φ of immediate subpatterns π.Y of π.X connected to x. In the modified procedure Match for πX, a vertex labelled with (σ, Φ) can be matched in a position labelled with (τ, Ψ) only if σ = τ and Φ ⊆ Ψ. It is straightforward to check that this does not influence the correctness of Match. Observe that the extended alphabet is exponential, but each symbol can be stored in polynomial memory. Hence, Match still works in memory polynomial in the size of the pattern.

Reading the sequence of children of v, the automaton performs simultaneously (using the product construction) the following tasks:

1. Check that Expected(v) and ExpectedAbove(v) do not contradict the choices for the children of v. The subpatterns expected to be matched in an ancestor of a child w must either be matched in the parent of w or in an ancestor of the parent of w; that is, ExpectedAbove(w) = Expected(v) ∪ ExpectedAbove(v) must hold for each child w. If for some w it fails to hold, the computation stops and no state is assigned to v, because the guesses are inconsistent.

2. Check that Expected(w) was chosen correctly for each child w.
This is done by simultaneously running the procedure Match for each subpattern π.X (with the fixed root vertex) over the children of v. For each child w, the procedure is fed with an extended symbol that consists of:
– the label of node w,
– the set of immediate subpatterns of π.X matched below w (computed from Matched(w) and MatchedBelow(w), both empty for leaves), and above w (computed from Expected(v) and ExpectedAbove(v)),
– ⊤ if π.X ∈ Expected(w), and ⊥ otherwise.
If some instance of Match rejects, the computation stops and no state is assigned to v, since the guesses made below have proved to be incorrect.

3. Compute the sets Matched(v) and MatchedBelow(v), based on Matched(w), MatchedBelow(w), and Expected(w) for each child w of v.

The automaton treats the root as any other node; that is, it performs the tasks above for the single-node sequence of siblings consisting only of the root, treating it as a child of a dummy node #. The automaton accepts if Expected(#) = ExpectedAbove(#) = ∅ and π ∈ Matched(#) ∪ MatchedBelow(#).

By structural induction over subpatterns one easily proves that in any complete run (accepting or rejecting) of the described automaton Aπ, the sets Expected(v), ExpectedAbove(v), Matched(v), and MatchedBelow(v) are guessed or computed correctly for each node of the input tree. Hence, the construction is correct. Moreover, for each node v of the input tree, there exists exactly one correct set Expected(v) and one correct set ExpectedAbove(v). Hence, for each input tree there is exactly one complete run, modulo the last guess of Expected(#) and ExpectedAbove(#). Consequently, the automaton Aπ can be complemented simply by changing the acceptance condition to: Expected(#) = ExpectedAbove(#) = ∅ and π ∉ Matched(#) ∪ MatchedBelow(#). ⊓⊔

References

1. M. Benedikt, W. Fan, F. Geerts. XPath satisfiability in the presence of DTDs. J. ACM 55(2), 2008.
2. H. Björklund, W. Martens, T. Schwentick.
Optimizing conjunctive queries over trees using schema information. Proc. MFCS 2008: 132–143.
3. A. K. Chandra, P. M. Merlin. Optimal implementation of conjunctive queries in relational data bases. Proc. STOC 1977: 77–90.
4. N. Francis, C. David, L. Libkin. A direct translation from XPath to nondeterministic automata. Proc. AMW 2011.
5. G. Gottlob, C. Koch, K. Schulz. Conjunctive queries over trees. J. ACM 53 (2006), 238–272.
6. M. Marx. XPath with conditional axis relations. Proc. EDBT 2004: 477–494.
7. G. Miklau, D. Suciu. Containment and equivalence for a fragment of XPath. J. ACM 51(1): 2–45, 2004.
8. F. Murlak, M. Ogiński, M. Przybylko. Between tree patterns and conjunctive queries: is there tractability beyond acyclicity? Proc. MFCS 2012: 705–717.
9. F. Neven, T. Schwentick. On the complexity of XPath containment in the presence of disjunction, DTDs, and variables. Logical Meth. in Comp. Sci. 2(3): 1–30, 2006.