<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Vertically acyclic conjunctive queries over trees ?</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Filip Murlak</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Grzegorz Zielinski</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Warsaw</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2013</year>
      </pub-date>
      <abstract>
        <p>Seeking a manageable subclass of conjunctive queries over trees that would reach beyond tree patterns, we nd that vertical acyclicity of queries is su cient to guarantee the same complexity bounds for static analysis problems, as those enjoyed by tree patterns. Conjunctive queries (CQs) are a staple of relational databases, o ering a good balance between expressive power and the computational cost of typical static analysis problems, like query containment [3]. However, over tree-structured data, like XML documents, their place is taken by tree patterns [7], which correspond to downward, acyclic CQs. The arguments in their favour are threefold. First, allowing arbitrary CQs induces exponential increase in the complexity of query containment in the presence of schema information: for unions of tree patterns (using horizontal and vertical axes) it is ExpTime-complete in general and ExpSpace-complete under non-recursive schemas (where the depth of trees is bounded by the size of the schema) [1, 6, 9]; for unions of CQs it is 2ExpTimecomplete in general (even without horizontal axes) [2] and ExpSpace-complete under non-recursive schemas (NExpTime-complete without horizontal axes) [8]. Second, each CQ (over trees) can be rewritten as a union of exponentially many polynomial-size tree patterns [5]. Hence, unions of tree patterns and unions of CQs are equiexpressive languages, but the latter is more succinct. By choosing tree patterns as the basic formalism, we put the burden of dealing with the exponential blow-up on the user's shoulders. An advantage|from the user's point of view|is that the computational cost is more directly re ected in the size of queries, making it easier to control when writing them. Finally, over tree-structured data it is more natural to navigate in the tree, rather than declaratively specifying properties of nodes in the fashion of rstorder logic. For acyclic queries, this navigational approach can be easily supported with appropriate syntax, as illustrated palpably by the popularity of the XPath query language. Certain attempts of going beyond acyclic queries have been made with the introduction of path intersection operator to XPath 2.0, but there seems to be no natural way of dealing with full CQs. Still, following [2], one can ask how much of the succinctness of full conjunctive queries can be allowed without compromising the complexity bounds. A partial answer was given in [8]: without in uencing the complexity bounds,</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>a1
a1
a2
a2
a1
a2
one can extend tree patterns with arbitrary horizontal CQs over siblings (see
Fig. 1); arbitrary joins with horizontal CQs cannot be allowed, as they may
induce additional vertical relations, breaking the acyclicity condition. The
argument combines the well-known automata construction for tree patterns with
a new construction of an exponential-size deterministic automaton over words,
equivalent to a given horizontal CQ.</p>
      <p>
        It is well known, however, that an exponential-size automaton can be
constructed for any acyclic CQ (see e.g. [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]). Since acyclic CQs can navigate up and
down the tree, the construction is more involved than for tree patterns (which
only go down). Can it be used to extend the result of [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] from tree patterns
with horizontal CQs to vertically acyclic CQs (see Fig. 1)? We show that the
answer is yes, but rather unexpectedly it requires substantial additional e ort.
The construction for the horizontal patterns provided by [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] is too weak: we need
an automaton that nds all matches of the horizontal CQ in the input word, not
just some match. In other words, we need a way to run multiple copies of the
original automaton without a ecting drastically the size of the state-space.
      </p>
      <p>The reminder of the paper is organized as follows: in Section 2 we recall the
basic notions, in Section 3 we construct the enhanced automaton for horizontal
CQs, and in Section 4 we combine it with the construction for acyclic CQs.
2</p>
    </sec>
    <sec id="sec-2">
      <title>Preliminaries</title>
      <p>XML documents and trees. We model XML documents as unranked labelled
trees. Formally, a tree over a nite labelling alphabet is a relational structure
T = hT; #; #+; !; !+; (aT )a2 i, where
{ the set T is a nite unranked tree domain, i.e., a nite pre x-closed subset
of N such that v i 2 T implies v j 2 T for all j &lt; i;
{ the binary relations # and ! are the child relation (v # v i) and the
nextsibling relation (v i ! v (i + 1));
{ #+ and !+ are transitive closures of # and !;
{ (aT )a2 is a partition of the domain T into possibly empty sets.
We write jT j to denote the number of nodes of tree T . The partition (aT )a2
de nes a labelling of the nodes of T with elements of , denoted by `T .
Tree automata. We abstract schemas as tree automata. We use a variant in
which the state in a node v depends on the states in the previous sibling and
the last child of v. Formally, an automaton is a tuple A = ( ; Q; q0; F; ), where
is the labelling alphabet (the set of element types in our case), Q is the state
space with the initial state q0 and nal states F , and Q Q Q is the
transition relation. A run of A over a tree T is a labelling of the nodes of T
with the states of A such that for each node v with children v 0; v 1; : : : ; v k and
previous sibling w, (w); (v k); `T (v); (v) 2 . If v has no previous sibling,
(w) in the condition above is replaced with q0. Similarly, if v has no children,
(v k) is replaced with q0. The language of trees recognized by A, denoted by
L(A), consists of all trees admitting an accepting run of A, i.e. a run that assigns
one of the nal states to the root.</p>
      <p>A schema is nonrecursive, if the depth of trees it accepts is bounded by a
constant dependent on the schema. For schemas de ned with automata, this
constant is always at most the number of states.</p>
      <p>CQs and patterns. A conjunctive query (CQ) over alphabet is a formula of
rst order logic using only conjunction and existential quanti cation, over unary
predicates a(x) for a 2 and binary predicates #; #+; !; !+ (referred to as child,
descendant, next sibling, and following sibling, respectively). Since we work only
with Boolean queries, to avoid unnecessary clutter we often skip the quanti ers,
assuming that all variables are by default quanti ed existentially.</p>
      <p>An alternative way of looking at CQs is via patterns. A pattern over can
be presented as = hV; Ec; Ed; En; Ef ; ` i where ` is a partial function from
V to , and hV; Ec [ Ed [ En [ Ef i is a nite graph whose edges are split into
child edges Ec, descendant edges Ed, next-sibling edges En, and following-sibling
edges Ef . By k k we mean the size of the underlying graph.</p>
      <p>We say that a tree T = hT; #; #+; !; !+; (aT )a2 i satis es a pattern =
hV; Ec; Ed; En; Ef ; ` i, denoted T j= , if there exists a homomorphism h : !
T , i.e., a function h : V ! T such that
+
{ h : hV; Ec; Ed; En; Ef i ! hT; #; #+; !; !i is a homomorphism of relational
structures; and
{ `T (h(x)) = ` (x) for all x in the domain of ` .</p>
      <p>Each pattern can be seen as a CQ, and vice versa. In what follows we use the
terms \pattern" and \CQ" interchangeably. Tree patterns are patterns whose
underlying graph is a directed tree with edges (of the four kinds) pointing from
parents to children.
3</p>
    </sec>
    <sec id="sec-3">
      <title>Finding all matches of a horizontal pattern</title>
      <p>We write V w for the set of positions of word w, f1; 2; : : : ; jwjg, and use and
+1 for the standard order and successor on the positions of words. A horizontal
pattern uses only ! and !+ edges. Homomorphisms into words are de ned just
like for trees. Whenever we write h : ! w, we implicitly assume that h is a
homomorphism. By a pattern rooted at vertex x 2 V we mean a pair ( ; x);
for i 2 V w we write w; i j= ( ; x) if there is h : ! w such that h(x) = i.</p>
      <p>The aim of this section is to prove the following theorem, which provides an
economic construction of an automaton nding all matches of a rooted horizontal
pattern in the input word, used crucially in the proof of the main result.
Theorem 1. For each rooted horizontal pattern ( ; x) one can construct in
polynomial working space an exponential-size deterministic automaton Match ;x
recognizing the language of words (a1; 1)(a2; 2) : : : (an; n) 2 ( f&gt;; ?g)
such that for w = a1a2 : : : an and all i 2 V w, i = &gt; if and only if w; i j= ( ; x).</p>
      <p>
        We shall think of Match ;x as a decision procedure reading letter by letter a
word w 2 and an additional labelling 2 f&gt;; ?gjwj, storing some information
in the working memory of size polynomial in k k and independent from jwj.
Earliest matchings. Like in [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], Match ;x will look for the earliest (leftmost)
matchings witnessing w; i j= ( ; x). A word with in nity w1 is a word w
extended with an additional position 1, greater then all original positions, that
is its own successor, and bears all the labels in . Over words with in nity we
use &lt; to denote the composition of and the successor relation, which can
be equivalently de ned as: x &lt; y i x &lt; y or x = y = 1. The notion of
homomorphism extends naturally to words with in nity. Note that since 1 is not
successor of any ordinary position, there can be no ! edge between a vertex
mapped to an ordinary position and a vertex mapped to 1, but there can be
a ! edge between two vertices mapped to 1. The source and target of each
!+ edge must be mapped to positions i &lt; j. Note that each homomorphism into
w1 induces by restriction a partial homomorphism into w (we shall blur the
distinction between them), but not every partial homomorphism into w extends
to a homomorphism into w1.
      </p>
      <p>De nition 1. Let g; h :</p>
      <p>! w1 be two homomorphisms.
{ We write g h if g(x) h(x) for each vertex x of .
{ We de ne min(g; h) : ! w1 as min(g; h)(x) = min(g(x); h(x)).
{ We say that h respects relation V V w, if h [ V f1g.</p>
      <p>Note that h constantly equal to 1 respects each . We will be mostly
interested in two special cases of : extending a xed partial homomorphism, and
mapping the pattern's root to a xed position (or one of several xed positions),
but for conciseness we treat them in this abstract fashion.</p>
      <p>Lemma 1. 1. For all g; h : ! w1, min(g; h) is a homomorphism.
2. There exists hmin : ! w1 such that hmin h for all h : ! w1.
3. For each relation V V w1, there exists hmin : ! w1 respecting
such that hmin h0 for each h0 : ! w1 respecting .
tu
We call the unique hmin from Lemma 1 the earliest matching of in w1, and
hmin the earliest matching respecting . Note that hmin = hmin for = V V w.
Of course, pattern can be matched in word w if and only if the earliest matching
hmin : ! w1 maps no vertex to 1, and similarly for matchings respecting .
Algorithm. The procedure Match ;x works with components of , called rm
subpatterns, described in De nition 3.</p>
      <p>De nition 2. A !-component of is a maximal connected subgraph of the
!-graph of . In the graph of !-components of , denoted G , there is an edge
from a !-component 1 to a !-component 2 if there is a !+ edge in from a
vertex of 1 to a vertex of 2.</p>
      <p>De nition 3. A pattern is rm if G is strongly connected. In general, each
strongly connected component X of G de nes a rm subpattern of : the
subgraph of induced by the vertices of !-components contained in X. The DAG
of rm subpatterns of , denoted F , is the standard DAG of strongly connected
components of G .</p>
      <p>All four horizontal subpatterns in Fig. 1 are rm, but have two !-components.</p>
      <p>Match ;x(w) does two independent checks, the positive and the negative; it
accepts when both checks accept. The negative check should reject if it nds a
matching of that maps x to a position with ?, and accept otherwise. We read
the input word w from left to right letter by letter, trying to build the earliest
matching of that maps x to a ? position. To process position i,
{ compute the earliest matching in w1::i1 that extends the constructed partial
homomorphism into w1::i 1 and maps x to a ? position.</p>
      <p>The positive check is dual: it should accept if it can nd matchings for all
positions with &gt;. To keep the used space bounded, instead of running a separate
check for each position with &gt;, we run a single test, but we partially reset the
constructed partial homomorphism each time we see &gt;. We keep a look-ahead
of size k k; that is, when position i is being processed, we already have read the
input word up to position i + k k. To process position i,
{ if (i) = &gt;, reset the partial homomorphism constructed so far: restrict it
to components not reachable from the rm component containing x;
{ compute the earliest matching in w1::i+k k1 that extends the currently
stored partial homomorphism and maps x to the most recent position j i
with (j) = &gt;;
{ if (i) = &gt;, but the root component cannot be matched within w1::i+k k,
reject immediately.</p>
      <sec id="sec-3-1">
        <title>Accept if after processing the whole w a full homomorphism</title>
        <p>! w is found.</p>
        <p>Invariants. Correctness of the algorithm follows from the following invariants:
Invariant 1 The partial homomorphism computed by the negative check after
reading the i-th letter is the earliest matching of in w1::i1 mapping x to
a ? position.</p>
        <p>Invariant 2 The partial homomorphism h computed by the positive check after
processing the i-th letter is the earliest matching of in w1::i+k k1 mapping
x to the most recent &gt; position in w1::i (not mapping it at all, i.e., respecting
Sy6=xfyg f1; 2; : : : ; i + k kg, if there has been no &gt; position yet); for each
earlier &gt; position j, the earliest matching hj : ! w1::i+k k1 mapping x
to j, satis es hj h.</p>
      </sec>
      <sec id="sec-3-2">
        <title>Invariant 1 follows immediately from Lemma 2, by induction on i.</title>
        <p>Lemma 2. The earliest matchings h : ! u1 and h0 :</p>
        <p>V V u always agree over vertices mapped to u by h.
! uu01 respecting
tu</p>
        <p>For Invariant 2, we use the following properties of the earliest matchings.
Lemma 3. Let h0 : ! u1 be the earliest matching respecting Sy6=xfyg V u,
and let hi : ! u1 be the earliest matching that maps the root x of to i 2 V u.
1. For every node y, whose component cannot be reached from the component
of the root x, hi(y) = hj (y) for all i; j.
2. If hi(x) = i and 1 i j, then hi hj .
tu</p>
        <p>Slightly abusing notation, we shall apply the same symbol hj from Lemma 3
for di erent words; that is, hj : ! w1::i 1+k k1 is not the same as hj : !
w1::i+k k1 (although, by Lemma 2, the latter extends the former).</p>
        <p>Before reading the rst letter, Invariant 2 is satis ed: the empty partial
homomorphism is the earliest matching in the empty word (with in nity), and the
second part trivialises. Assume that Invariant 2 holds for i 1 and let us see
that it holds for i. Let gi 1 be the partial homomorphism constructed so far,
and let gi be the one to be constructed. By Invariant 2 for i 1, gi 1 is the
earliest matching of in w1::i 1+k k1 that maps root x to the most recent
&gt; position, or using notation from Lemma 3, gi 1 = hj : ! w1::i 1+k k1
for some 0 j i 1. If (i) = ?, the algorithm computes gi as the
earliest matching extending the partial homomorphism gi 1 and mapping x to the
most recent &gt; position in w1::i+k k|which is also the most recent &gt; position
in w1::i 1+k k|and we conclude by Lemma 2. Assume that (i) = &gt;. Then,
the algorithm computes gi0 1, by restricting gi 1 to components that are not
reachable from the root component. By Lemma 2, gi0 1 hj : ! w1::i+k k1.
By Lemma 3 (1), gi0 1 hi : ! w1::i+k k1. Hence, the earliest matching in
w1::i+k k1 that extends gi0 1 and maps x to i is equal to hi : ! w1::i+k k1.
Since this is exactly how gi is computed, we have gi = hi : ! w1::i+k k1. This
concludes the proof of the rst part of Invariant 2. To prove the second part,
assume that i is not the rst ? position; that is, 0 &lt; j &lt; i. Since we have not
rejected yet, gi 1 = hj : ! w1::i 1+k k1 matches the root component of
and gi 1(x) = j. By Lemma 2, also hj : ! w1::i+k k1 maps x to j (rather
than to 1). Hence, the second part of Invariant 2 for i follows from Lemma 3
(2) and Invariant 2 for i 1.</p>
        <p>Correctness. Let us begin with a simple observation. It is routine to check that
each homomorphic image of a rm pattern 0 is a subword of length at most
k 0k. Consequently, if a node y of the pattern is matched at position i, then all
rm components from which the component of y is reachable, must be matched
before position i + k k: they can reach at most k k positions forward.</p>
        <p>If the algorithm accepts, Invariant 2 guarantees that all suitable
homomorphisms exist: indeed, if the earliest matching for the last &gt; position does not use
1, neither do the ones for the earlier &gt; positions.</p>
        <p>Assume that the algorithm rejects. That means that hi : ! w1::i+k k1
does not match the root component (that is, matches it to 1). Assume that
hi : ! w1 does match the root component. Then it maps x to i, so by the
observation above it maps all the components from which the component of y
is reachable, within w1::i+k k. Restricting hi : ! w1 to these components
and extending with in nity to other vertices of would give a matching earlier
then hi : ! w1::i+k k1 | a contradiction. Thus, hi : ! w1 is not a total
homomorphism into w, so w; i 6j= ( ; x) and the algorithm rejects correctly.
Memory bound. Now that we have seen that Match ;x(w) is correct, we need
to bound the memory it uses. We claim that while processing position i the
algorithm only needs to have access to positions between i k k and i + k k.
That means that the algorithm only needs to store last 2 j j + 1 symbols read,
plus the matching constructed so far, restricted to this su x.</p>
        <p>We shall use a dualized version of the observation made previously: if a node
y of the pattern is matched at position i, then all rm components reachable
from the component of y must be matched after position i k k: they can reach
at most k k positions back.</p>
        <p>For the negative check, the claim is almost obvious. After reading a new
symbol we match the subpatterns greedily, until no more can be matched. Among
matched patterns each has an ancestor (possibly itself) that touches the last
symbol (otherwise, it would have been matched before). By the dualized
observation none of these patterns can reach back further than k k, so it su ces to
store last k k symbols read.</p>
        <p>For the positive check the situation is similar, except that before we start
matching we rst check if k k positions back we had &gt; or ?. If it was ? we
proceed essentially like in the negative check. But if it was &gt;, we rematch the
rm components reachable from the component of the root x, in such a way
that x is matched k k positions back. Again, by the dualized observation, the
rematched components can reach at most k k more positions back, so they also
t within the intended window.
4</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>The main result</title>
      <p>Using the matching procedure we prove our main result: we show that extending
acyclic CQs using vertical axes with arbitrary horizontal CQs over siblings does
not increase the complexity of the containment problem. In fact, we shall work
with a more general problem of satis ability of Boolean combinations, or BC-SAT:
given a Boolean combination of patterns ' and an automaton A, decide if there is
a tree T 2 L(A) such that T j= '. Containment for unions of CQs corresponds to
non-satis ability for Boolean combinations of the form ^ : 1 ^ : 2 ^ ^ : k.
Since we work with boolean combinations anyway, each disconnected pattern
can be replaced with a conjunction of connected patterns. Thus, we shall only
consider connected patterns from now on.</p>
      <p>Assuming that vertical edges propagate through horizontal edges (e.g., next
sibling of a child is also explicitly connected with a child edge), one can
dene vertically acyclic patterns as those whose #; #+-subgraph is acyclic in the
undirected sense; since patterns are connected, it is then an undirected tree.
Without this assumption we de ne this notion as follows. A horizontal
component of pattern is a connected component of the !; !+-subgraph of . Let
H = hV ; #; #+i be a graph over horizontal components of , where edge X # Y
is present if x # y for some x 2 X and y 2 Y , and X #+ Y is present if x #+ y
for some x 2 X and y 2 Y , but there is no edge X # Y . We say that pattern
is vertically acyclic, if this graph is acyclic in the undirected sense. Again, as
is connected, the graph is an undirected tree.</p>
      <p>Given a vertically acyclic pattern , by simple preprocessing we ensure that
there is at most one edge in between each two horizontal components: if there
are more, we can merge their starting points in the parent/ancestor horizontal
component, and drop all edges except one (a child edge, if there is one).
Theorem 2. For vertically acyclic patterns, BC-SAT is ExpTime-complete, and
PSpace-complete under non-recursive schemas.</p>
      <p>Proof. We shall see that for a vertically acyclic pattern one can construct an
equivalent automaton A in polynomial working space (states have
polynomialsize representations and all ingredients of A can be enumerated in polynomial
working space). Despite its nondeterminism, we will be able to complement A
by simply changing accepting states. Hence, one can easily reduce BC-SAT to
nonemptiness of tree automata, by constructing in polynomial working space an
automaton A' equivalent to a given boolean combination '. The product B of
A' with a nonrecursive automaton A can be tested for nonemptiness by a
nondeterministic algorithm using space O(kAk log kAk poly (k'k)), which guesses
a tree of depth at most kAk in the depth- rst order and nondeterministically
evaluates B over it, processing it in the post-order fashion. By Savitch's theorem,
this gives a PSpace algorithm for satis ability under nonrecursive schemas. As
A' is singly exponential in k'k, the standard polynomial-time emptiness test
gives an ExpTime algorithm for satis ability under arbitrary schemas.</p>
      <p>The idea is that the automaton A guesses in each node which subpatterns
are matched there, and then checks that these guesses are consistent: a
subpattern is matched in some node if and only if its root can be matched there and all
its sub-subpatterns can be matched in appropriate nodes. To ensure that these
dependencies are well founded, we root the undirected tree forming the graph
H , by arbitrarily xing a root component and treating all the edges as pointing
outwards. This builds the structure of a directed tree over the graph H ; the
orientation of edges in this directed tree is entirely unrelated to the character
of the corresponding edges in the pattern: child and descendent edges can point
both up and down the tree. We use this structure to speak precisely of
subpatterns: for each horizontal component X of we de ne subpattern :X obtained
by restricting to X and all descendants of X in the directed tree. If component
X has successors Y1; Y2; : : : ; Yn in the directed tree, then the subpattern :X is
composed of the nodes in the horizontal component X connected to subpatterns
:Y1; :Y2; : : : ; :Yn with child or descendant edges, pointing to or form X. We
call patterns :Y1; :Y2; : : : ; :Yn the immediate subpatterns of pattern :X.
Unless X is the root component of the whole directed tree, it is connected to some
other horizontal component Z, with a child or descendent edge, pointing to or
from X. If the edge points to X, we call :X a down-subpattern. Otherwise, we
call it an up-subpattern. In either case, we call the vertex of X that is the end
of this edge, the root vertex of :X; if X is the root component of the whole
tree, we choose an arbitrary vertex in X for the root vertex. We say that :X is
matched at node v if its root vertex is mapped to v.</p>
      <p>The automaton A , before reading the sequence of children of a node v,
non-deterministically selects the set Expected (v) of subpatterns that can be
matched at v. Similarly, A non-deterministically selects a set ExpectedAbove(v)
of subpatterns that can be matched at some ancestor of v. (Note that for
downsubpatterns the exact place where the root vertex is matched among siblings is
irrelevant; we make the distinction only for the purpose of uniform treatment
by the procedure match.) After selecting Expected (v) and ExpectedAbove(v), the
automaton proceeds to read the sequence of children of v. While doing so it
computes the set Matched (v) of subpatterns of matched in the children of v
and the set MatchedBelow (v) of subpatterns that were matched in the children
of some descendant of v, and performs a number of consistency checks using a
slightly modi ed version of the procedure Match, described below.</p>
      <p>With each subpattern :X we associate a horizontal pattern X obtained
by restricting to X and including in the label of each vertex x the set of
immediate subpatterns :Y of :X connected to x. In the modi ed procedure
Match for X , a vertex labelled with ( ; ) can be matched in a position labelled
with ( ; ) only if = and . It is straightforward to check that this does
not in uence the correctness of Match. Observe that the extended alphabet
is exponential, but each symbol can be stored in polynomial memory. Hence,
Match still works in memory polynomial in the size of the pattern.</p>
      <p>Reading the sequence of children of v, the automaton performs
simultaneously (using the product construction) the following tasks:
1. Check that Expected (v) and ExpectedAbove(v) do not contradict the choices
for the children of v. The subpatterns expected to be matched in an ancestor
of a child w must either be matched in the parent of w or an ancestor of the
parent of w; that is, ExpectedAbove(w) = Expected (v) [ ExpectedAbove(v)
must hold for each child w. If for some w it fails to hold, the computation
stops and no state is assigned to v, because the guesses are inconsistent.</p>
      <p>The automaton treats the root as any other node; that is, it performs the
tasks above for the single-node sequence of siblings consisting only from the
root, treating it as a child of a dummy node #. The automaton accepts if:
Expected (#) = ExpectedAbove(#) = ; and 2 Matched (#)[MatchedBelow (#).</p>
      <p>By the structural induction over subpatterns one easily proves that in any
complete run (accepting or rejecting) of the described automaton A , the sets
Expected (v), ExpectedAbove(v), Matched (v), and MatchedBelow (v) are guessed
or computed correctly for each node of the input tree. Hence, the construction
is correct. Moreover, for each node v of the input tree, there exists exactly
one correct set Expected (v) and one correct set ExpectedAbove(v). Hence, for
each input tree there is exactly one complete run, modulo the last guess of
Expected (#) and ExpectedAbove(#). Consequently, the automaton A can be
complemented simply by changing the acceptance condition to: Expected (#) =
ExpectedAbove(#) = ; and 2= Matched (#) [ MatchedBelow (#). tu</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>M.</given-names>
            <surname>Benedikt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Fan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Geerts</surname>
          </string-name>
          .
          <article-title>XPath satis ability in the presence of DTDs</article-title>
          .
          <source>J. ACM</source>
          <volume>55</volume>
          (
          <issue>2</issue>
          ),
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2. H. Bjorklund, W. Martens,
          <string-name>
            <given-names>T.</given-names>
            <surname>Schwentick</surname>
          </string-name>
          .
          <article-title>Optimizing conjunctive queries over trees using schema information</article-title>
          .
          <source>Proc. MFCS</source>
          <year>2008</year>
          :
          <volume>132</volume>
          {
          <fpage>143</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>A. K. Chandra</surname>
            ,
            <given-names>P. M.</given-names>
          </string-name>
          <string-name>
            <surname>Merlin</surname>
          </string-name>
          .
          <article-title>Optimal implementation of conjunctive queries in relational data bases</article-title>
          .
          <source>Proc. STOC</source>
          <year>1977</year>
          :
          <volume>77</volume>
          {
          <fpage>90</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>N.</given-names>
            <surname>Francis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>David</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Libkin</surname>
          </string-name>
          .
          <article-title>A Direct Translation from XPath to Nondeterministic Automata</article-title>
          .
          <source>Proc. AMW</source>
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>G.</given-names>
            <surname>Gottlob</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Koch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Schulz</surname>
          </string-name>
          .
          <article-title>Conjunctive queries over trees</article-title>
          .
          <source>J. ACM</source>
          <volume>53</volume>
          (
          <year>2006</year>
          ),
          <volume>238</volume>
          {
          <fpage>272</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>M.</given-names>
            <surname>Marx</surname>
          </string-name>
          .
          <article-title>XPath with conditional axis relations</article-title>
          .
          <source>Proc. EDBT</source>
          <year>2004</year>
          :
          <volume>477</volume>
          {
          <fpage>494</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>G.</given-names>
            <surname>Miklau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Suciu</surname>
          </string-name>
          .
          <article-title>Containment and equivalence for a fragment of XPath</article-title>
          .
          <source>J. ACM</source>
          <volume>51</volume>
          (
          <issue>1</issue>
          ): 2{
          <fpage>45</fpage>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>F.</given-names>
            <surname>Murlak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Oginski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Przybylko</surname>
          </string-name>
          .
          <source>Between Tree Patterns and Conjunctive Queries: Is There Tractability beyond Acyclicity? Proc. MFCS</source>
          <year>2012</year>
          :
          <fpage>705</fpage>
          -
          <lpage>717</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>F.</given-names>
            <surname>Neven</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Schwentick</surname>
          </string-name>
          .
          <article-title>On the complexity of XPath containment in the presence of disjunction, DTDs, and variables</article-title>
          .
          <source>Logical Meth. in Comp. Sci. 2</source>
          (
          <issue>3</issue>
          ): 1{
          <fpage>30</fpage>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>