Vertically acyclic conjunctive queries over trees⋆

Filip Murlak and Grzegorz Zieliński

University of Warsaw

Abstract. Seeking a manageable subclass of conjunctive queries over trees that would reach beyond tree patterns, we find that vertical acyclicity of queries is sufficient to guarantee the same complexity bounds for static analysis problems as those enjoyed by tree patterns.

1 Introduction

Conjunctive queries (CQs) are a staple of relational databases, offering a good balance between expressive power and the computational cost of typical static analysis problems, like query containment [3]. However, over tree-structured data, like XML documents, their place is taken by tree patterns [7], which correspond to downward, acyclic CQs. The arguments in their favour are threefold.

First, allowing arbitrary CQs induces an exponential increase in the complexity of query containment in the presence of schema information: for unions of tree patterns (using horizontal and vertical axes) it is ExpTime-complete in general and ExpSpace-complete under non-recursive schemas (where the depth of trees is bounded by the size of the schema) [1, 6, 9]; for unions of CQs it is 2ExpTime-complete in general (even without horizontal axes) [2] and ExpSpace-complete under non-recursive schemas (NExpTime-complete without horizontal axes) [8].

Second, each CQ (over trees) can be rewritten as a union of exponentially many polynomial-size tree patterns [5]. Hence, unions of tree patterns and unions of CQs are equiexpressive languages, but the latter is more succinct. By choosing tree patterns as the basic formalism, we put the burden of dealing with the exponential blow-up on the user's shoulders. An advantage, from the user's point of view, is that the computational cost is more directly reflected in the size of queries, making it easier to control when writing them.
Finally, over tree-structured data it is more natural to navigate in the tree than to declaratively specify properties of nodes in the fashion of first-order logic. For acyclic queries, this navigational approach can be easily supported with appropriate syntax, as illustrated palpably by the popularity of the XPath query language. Certain attempts to go beyond acyclic queries have been made with the introduction of the path intersection operator in XPath 2.0, but there seems to be no natural way of dealing with full CQs.

Still, following [2], one can ask how much of the succinctness of full conjunctive queries can be allowed without compromising the complexity bounds. A partial answer was given in [8]: without influencing the complexity bounds, one can extend tree patterns with arbitrary horizontal CQs over siblings (see Fig. 1); arbitrary joins with horizontal CQs cannot be allowed, as they may induce additional vertical relations, breaking the acyclicity condition. The argument combines the well-known automata construction for tree patterns with a new construction of an exponential-size deterministic automaton over words, equivalent to a given horizontal CQ.

⋆ Supported by Poland's National Science Center grant 2013/11/D/ST6/03075.

Fig. 1. A tree pattern with horizontal CQs (left) and a vertically acyclic CQ (right). Solid, dashed, and dotted edges represent next-sibling, following-sibling, and descendant relations; an asterisk represents a node with an unspecified label. The left query checks if both a1 and a2 occur twice among siblings, separated by 1 or 2 nodes; the right query checks if this happens along a single branch.

It is well known, however, that an exponential-size automaton can be constructed for any acyclic CQ (see e.g. [4]). Since acyclic CQs can navigate up and down the tree, the construction is more involved than for tree patterns (which only go down).
Can it be used to extend the result of [8] from tree patterns with horizontal CQs to vertically acyclic CQs (see Fig. 1)? We show that the answer is yes, but rather unexpectedly it requires substantial additional effort. The construction for the horizontal patterns provided by [8] is too weak: we need an automaton that finds all matches of the horizontal CQ in the input word, not just some match. In other words, we need a way to run multiple copies of the original automaton without drastically affecting the size of the state space.

The remainder of the paper is organized as follows: in Section 2 we recall the basic notions, in Section 3 we construct the enhanced automaton for horizontal CQs, and in Section 4 we combine it with the construction for acyclic CQs.

2 Preliminaries

XML documents and trees. We model XML documents as unranked labelled trees. Formally, a tree over a finite labelling alphabet Γ is a relational structure T = ⟨T, ↓, ↓+, →, →+, (aT)a∈Γ⟩, where

– the set T is a finite unranked tree domain, i.e., a finite prefix-closed subset of ℕ* such that v · i ∈ T implies v · j ∈ T for all j < i;
– the binary relations ↓ and → are the child relation (v ↓ v · i) and the next-sibling relation (v · i → v · (i + 1));
– ↓+ and →+ are the transitive closures of ↓ and →;
– (aT)a∈Γ is a partition of the domain T into possibly empty sets.

We write |T| to denote the number of nodes of tree T. The partition (aT)a∈Γ defines a labelling of the nodes of T with elements of Γ, denoted by ℓT.

Tree automata. We abstract schemas as tree automata. We use a variant in which the state in a node v depends on the states in the previous sibling and the last child of v. Formally, an automaton is a tuple A = (Σ, Q, q0, F, δ), where Σ is the labelling alphabet (the set of element types in our case), Q is the state space with the initial state q0 and final states F, and δ ⊆ Q × Q × Σ × Q is the transition relation.
A run of A over a tree T is a labelling ρ of the nodes of T with the states of A such that for each node v with children v · 0, v · 1, . . . , v · k and previous sibling w, (ρ(w), ρ(v · k), ℓT(v), ρ(v)) ∈ δ. If v has no previous sibling, ρ(w) in the condition above is replaced with q0. Similarly, if v has no children, ρ(v · k) is replaced with q0. The language of trees recognized by A, denoted by L(A), consists of all trees admitting an accepting run of A, i.e. a run that assigns one of the final states to the root. A schema is nonrecursive if the depth of trees it accepts is bounded by a constant dependent on the schema. For schemas defined with automata, this constant is always at most the number of states.

CQs and patterns. A conjunctive query (CQ) over alphabet Γ is a formula of first-order logic using only conjunction and existential quantification, over unary predicates a(x) for a ∈ Γ and binary predicates ↓, ↓+, →, →+ (referred to as child, descendant, next sibling, and following sibling, respectively). Since we work only with Boolean queries, to avoid unnecessary clutter we often skip the quantifiers, assuming that all variables are by default quantified existentially.

An alternative way of looking at CQs is via patterns. A pattern π over Γ can be presented as π = ⟨V, Ec, Ed, En, Ef, ℓπ⟩, where ℓπ is a partial function from V to Γ, and ⟨V, Ec ∪ Ed ∪ En ∪ Ef⟩ is a finite graph whose edges are split into child edges Ec, descendant edges Ed, next-sibling edges En, and following-sibling edges Ef. By ‖π‖ we mean the size of the underlying graph.

We say that a tree T = ⟨T, ↓, ↓+, →, →+, (aT)a∈Γ⟩ satisfies a pattern π = ⟨V, Ec, Ed, En, Ef, ℓπ⟩, denoted T |= π, if there exists a homomorphism h : π → T, i.e., a function h : V → T such that

– h : ⟨V, Ec, Ed, En, Ef⟩ → ⟨T, ↓, ↓+, →, →+⟩ is a homomorphism of relational structures; and
– ℓT(h(x)) = ℓπ(x) for all x in the domain of ℓπ.

Each pattern can be seen as a CQ, and vice versa.
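To make the homomorphism semantics concrete, here is a small brute-force sketch in Python (not part of the paper's constructions): tree nodes are encoded as tuples over ℕ, the four axes are decoded from the tree domain, and T |= π is tested by enumerating all candidate functions h : V → T. The encoding and the edge-kind names ('child', 'desc', 'next', 'foll') are illustrative choices, not notation from the text; the search is exponential, but it pins down the definition.

```python
from itertools import product

# Tree nodes are tuples over N (elements of the tree domain); labels maps
# each node to its letter. A pattern is given by its vertices, typed edges
# [(x, kind, y)], and a partial vertex labelling.

def is_child(u, v):             # u ↓ v
    return len(v) == len(u) + 1 and v[:len(u)] == u

def is_desc(u, v):              # u ↓+ v : u is a proper prefix of v
    return len(u) < len(v) and v[:len(u)] == u

def is_next_sibling(u, v):      # u → v
    return len(u) == len(v) >= 1 and u[:-1] == v[:-1] and v[-1] == u[-1] + 1

def is_following_sibling(u, v): # u →+ v
    return len(u) == len(v) >= 1 and u[:-1] == v[:-1] and v[-1] > u[-1]

REL = {'child': is_child, 'desc': is_desc,
       'next': is_next_sibling, 'foll': is_following_sibling}

def satisfies(domain, labels, vertices, edges, vlabel):
    """Test T |= pi by enumerating all candidate maps h : V -> T."""
    for image in product(domain, repeat=len(vertices)):
        h = dict(zip(vertices, image))
        if all(REL[k](h[x], h[y]) for x, k, y in edges) and \
           all(labels[h[x]] == a for x, a in vlabel.items()):
            return True
    return False
```

For instance, over the three-node tree with root () labelled r and children (0,) labelled a and (1,) labelled b, the pattern x ↓ y, y → z with ℓ(y) = a, ℓ(z) = b is satisfied, while the same pattern with the labels swapped is not.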
In what follows we use the terms "pattern" and "CQ" interchangeably. Tree patterns are patterns whose underlying graph is a directed tree with edges (of the four kinds) pointing from parents to children.

3 Finding all matches of a horizontal pattern

We write Vw for the set of positions of word w, {1, 2, . . . , |w|}, and use ≤ and +1 for the standard order and successor on the positions of words. A horizontal pattern uses only → and →+ edges. Homomorphisms into words are defined just like for trees. Whenever we write h : π → w, we implicitly assume that h is a homomorphism. By a pattern π rooted at vertex x ∈ Vπ we mean a pair (π, x); for i ∈ Vw we write w, i |= (π, x) if there is h : π → w such that h(x) = i.

The aim of this section is to prove the following theorem, which provides an economical construction of an automaton finding all matches of a rooted horizontal pattern in the input word, used crucially in the proof of the main result.

Theorem 1. For each rooted horizontal pattern (π, x) one can construct in polynomial working space an exponential-size deterministic automaton Matchπ,x recognizing the language of words (a1, α1)(a2, α2) . . . (an, αn) ∈ (Γ × {⊤, ⊥})* such that for w = a1a2 . . . an and all i ∈ Vw, αi = ⊤ if and only if w, i |= (π, x).

We shall think of Matchπ,x as a decision procedure reading letter by letter a word w ∈ Γ* and an additional labelling α ∈ {⊤, ⊥}|w|, storing some information in the working memory of size polynomial in ‖π‖ and independent of |w|.

Earliest matchings. Like in [8], Matchπ,x will look for the earliest (leftmost) matchings witnessing w, i |= (π, x). A word with infinity w∞ is a word w extended with an additional position ∞, greater than all original positions, that is its own successor, and bears all the labels in Γ. Over words with infinity we use <· to denote the composition of ≤ and the successor relation, which can be equivalently defined as: x <· y iff x < y or x = y = ∞.
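The specification in Theorem 1 can be pinned down by a naive reference procedure: for every position i, decide w, i |= (π, x) by enumerating all maps of the pattern's vertices into positions. A hedged Python sketch (1-based positions, with 'next' and 'foll' standing for → and →+; all names are illustrative, and the enumeration is of course exponential, unlike the automaton of Theorem 1):

```python
from itertools import product

def matches_at(w, vertices, edges, vlabel, root):
    """Return the set of positions i with w, i |= (pi, x = root).
    edges: [(x, 'next'|'foll', y)]; vlabel: partial map vertex -> letter.
    Match_{pi,x} accepts exactly the words over Gamma x {T, F} whose
    second component agrees with the labelling computed here."""
    n = len(w)
    result = set()
    for image in product(range(1, n + 1), repeat=len(vertices)):
        h = dict(zip(vertices, image))
        if all(w[h[x] - 1] == a for x, a in vlabel.items()) and \
           all((h[y] == h[x] + 1) if k == 'next' else (h[y] > h[x])
               for x, k, y in edges):
            result.add(h[root])
    return result
```

For example, for w = abab and the rooted pattern y → z with ℓ(y) = a, ℓ(z) = b and root y, the matched positions are 1 and 3, so Matchπ,x should accept precisely the annotation ⊤⊥⊤⊥.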
The notion of homomorphism extends naturally to words with infinity. Note that since ∞ is not the successor of any ordinary position, there can be no → edge between a vertex mapped to an ordinary position and a vertex mapped to ∞, but there can be a → edge between two vertices mapped to ∞. The source and target of each →+ edge must be mapped to positions i <· j. Note that each homomorphism into w∞ induces by restriction a partial homomorphism into w (we shall blur the distinction between them), but not every partial homomorphism into w extends to a homomorphism into w∞.

Definition 1. Let g, h : π → w∞ be two homomorphisms.
– We write g ≤ h if g(x) ≤ h(x) for each vertex x of π.
– We define min(g, h) : π → w∞ as min(g, h)(x) = min(g(x), h(x)).
– We say that h respects a relation η ⊆ Vπ × Vw if h ⊆ η ∪ Vπ × {∞}.

Note that h constantly equal to ∞ respects each η. We will be mostly interested in two special cases of η: extending a fixed partial homomorphism, and mapping the pattern's root to a fixed position (or one of several fixed positions), but for conciseness we treat them in this abstract fashion.

Lemma 1.
1. For all g, h : π → w∞, min(g, h) is a homomorphism.
2. There exists hmin : π → w∞ such that hmin ≤ h for all h : π → w∞.
3. For each relation η ⊆ Vπ × Vw∞, there exists hηmin : π → w∞ respecting η such that hηmin ≤ h′ for each h′ : π → w∞ respecting η. ⊓⊔

We call the unique hmin from Lemma 1 the earliest matching of π in w∞, and hηmin the earliest matching respecting η. Note that hmin = hηmin for η = Vπ × Vw. Of course, pattern π can be matched in word w if and only if the earliest matching hmin : π → w∞ maps no vertex to ∞, and similarly for matchings respecting η.

Algorithm. The procedure Matchπ,x works with components of π, called firm subpatterns, described in Definition 3.

Definition 2. A →-component of π is a maximal connected subgraph of the →-graph of π.
In the graph of →-components of π, denoted Gπ, there is an edge from a →-component π1 to a →-component π2 if there is a →+ edge in π from a vertex of π1 to a vertex of π2.

Definition 3. A pattern π is firm if Gπ is strongly connected. In general, each strongly connected component X of Gπ defines a firm subpattern of π: the subgraph of π induced by the vertices of →-components contained in X. The DAG of firm subpatterns of π, denoted Fπ, is the standard DAG of strongly connected components of Gπ.

All four horizontal subpatterns in Fig. 1 are firm, but each has two →-components.

Matchπ,x(w) does two independent checks, the positive and the negative; it accepts when both checks accept. The negative check should reject if it finds a matching of π that maps x to a position with ⊥, and accept otherwise. We read the input word w from left to right letter by letter, trying to build the earliest matching of π that maps x to a ⊥ position. To process position i,

– compute the earliest matching in w1..i∞ that extends the constructed partial homomorphism into w1..i−1 and maps x to a ⊥ position.

The positive check is dual: it should accept if it can find matchings for all positions with ⊤. To keep the used space bounded, instead of running a separate check for each position with ⊤, we run a single test, but we partially reset the constructed partial homomorphism each time we see ⊤. We keep a look-ahead of size ‖π‖; that is, when position i is being processed, we have already read the input word up to position i + ‖π‖. To process position i,

– if α(i) = ⊤, reset the partial homomorphism constructed so far: restrict it to components not reachable from the firm component containing x;
– compute the earliest matching in w1..i+‖π‖∞ that extends the currently stored partial homomorphism and maps x to the most recent position j ≤ i with α(j) = ⊤;
– if α(i) = ⊤, but the root component cannot be matched within w1..i+‖π‖, reject immediately.
Accept if after processing the whole w a full homomorphism π → w is found.

Invariants. Correctness of the algorithm follows from the following invariants.

Invariant 1. The partial homomorphism computed by the negative check after reading the i-th letter is the earliest matching of π in w1..i∞ mapping x to a ⊥ position.

Invariant 2. The partial homomorphism h computed by the positive check after processing the i-th letter is the earliest matching of π in w1..i+‖π‖∞ mapping x to the most recent ⊤ position in w1..i (not mapping it at all, i.e., respecting ⋃y≠x {y} × {1, 2, . . . , i + ‖π‖}, if there has been no ⊤ position yet); for each earlier ⊤ position j, the earliest matching hj : π → w1..i+‖π‖∞ mapping x to j satisfies hj ≤ h.

Invariant 1 follows immediately from Lemma 2, by induction on i.

Lemma 2. The earliest matchings h : π → u∞ and h′ : π → uu′∞ respecting η ⊆ Vπ × Vu always agree over vertices mapped to u by h. ⊓⊔

For Invariant 2, we use the following properties of the earliest matchings.

Lemma 3. Let h0 : π → u∞ be the earliest matching respecting ⋃y≠x {y} × Vu, and let hi : π → u∞ be the earliest matching that maps the root x of π to i ∈ Vu.
1. For every node y whose component cannot be reached from the component of the root x, hi(y) = hj(y) for all i, j.
2. If hi(x) = i and 1 ≤ i ≤ j, then hi ≤ hj. ⊓⊔

Slightly abusing notation, we shall apply the same symbol hj from Lemma 3 for different words; that is, hj : π → w1..i−1+‖π‖∞ is not the same as hj : π → w1..i+‖π‖∞ (although, by Lemma 2, the latter extends the former).

Before reading the first letter, Invariant 2 is satisfied: the empty partial homomorphism is the earliest matching in the empty word (with infinity), and the second part trivialises. Assume that Invariant 2 holds for i − 1 and let us see that it holds for i. Let gi−1 be the partial homomorphism constructed so far, and let gi be the one to be constructed.
By Invariant 2 for i − 1, gi−1 is the earliest matching of π in w1..i−1+‖π‖∞ that maps the root x to the most recent ⊤ position, or, using notation from Lemma 3, gi−1 = hj : π → w1..i−1+‖π‖∞ for some 0 ≤ j ≤ i − 1. If α(i) = ⊥, the algorithm computes gi as the earliest matching extending the partial homomorphism gi−1 and mapping x to the most recent ⊤ position in w1..i+‖π‖ (which is also the most recent ⊤ position in w1..i−1+‖π‖), and we conclude by Lemma 2. Assume that α(i) = ⊤. Then the algorithm computes g′i−1 by restricting gi−1 to components that are not reachable from the root component. By Lemma 2, g′i−1 ⊆ hj : π → w1..i+‖π‖∞. By Lemma 3 (1), g′i−1 ⊆ hi : π → w1..i+‖π‖∞. Hence, the earliest matching in w1..i+‖π‖∞ that extends g′i−1 and maps x to i is equal to hi : π → w1..i+‖π‖∞. Since this is exactly how gi is computed, we have gi = hi : π → w1..i+‖π‖∞. This concludes the proof of the first part of Invariant 2.

To prove the second part, assume that i is not the first ⊤ position; that is, 0 < j < i. Since we have not rejected yet, gi−1 = hj : π → w1..i−1+‖π‖∞ matches the root component of π and gi−1(x) = j. By Lemma 2, hj : π → w1..i+‖π‖∞ also maps x to j (rather than to ∞). Hence, the second part of Invariant 2 for i follows from Lemma 3 (2) and Invariant 2 for i − 1.

Correctness. Let us begin with a simple observation. It is routine to check that each homomorphic image of a firm pattern π0 is a subword of length at most ‖π0‖. Consequently, if a node y of the pattern is matched at position i, then all firm components from which the component of y is reachable must be matched before position i + ‖π‖: they can reach at most ‖π‖ positions forward.

If the algorithm accepts, Invariant 2 guarantees that all suitable homomorphisms exist: indeed, if the earliest matching for the last ⊤ position does not use ∞, neither do the ones for the earlier ⊤ positions. Assume that the algorithm rejects.
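The earliest matchings used throughout this argument can be computed by a simple monotone fixpoint, in the spirit of Lemma 1: start every vertex as early as permitted and keep pushing vertices forward until all label and edge constraints hold. The sketch below is illustrative Python, not the paper's automaton; it represents ∞ by the sentinel position |w| + 1 (its own successor, bearing every label), and the optional `lower` parameter plays the role of respecting a relation η.

```python
def earliest_matching(w, vertices, edges, vlabel, lower=None):
    """Least map vertex -> position consistent with the horizontal pattern.
    edges: [(x, 'next'|'foll', y)]; vlabel: partial map vertex -> letter;
    lower: optional map vertex -> least allowed position. INF = unmatched."""
    n, INF = len(w), len(w) + 1
    lb = {x: (lower or {}).get(x, 1) for x in vertices}
    changed = True
    while changed:
        changed = False
        for x in vertices:              # advance to the next correct label
            p = lb[x]
            while p < INF and x in vlabel and w[p - 1] != vlabel[x]:
                p += 1
            if p != lb[x]:
                lb[x], changed = p, True
        for x, k, y in edges:           # y must lie strictly after x
            fwd = INF if lb[x] >= INF else min(lb[x] + 1, INF)
            if lb[y] < fwd:
                lb[y], changed = fwd, True
            if k == 'next':             # → also bounds x from below
                bwd = INF if lb[y] >= INF else lb[y] - 1
                if lb[x] < bwd:
                    lb[x], changed = bwd, True
    return lb
```

The bounds only grow and are capped at INF, so the loop terminates; at the fixpoint the bounds themselves form a homomorphism into w∞, and by monotonicity they lie below every homomorphism, so the result is hmin (or hηmin when `lower` is supplied). For instance, for w = abab and the pattern y → z with labels a, b, the result maps y, z to 1, 2; forbidding y from positions below 2 pushes the matching to 3, 4.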
That means that hi : π → w1..i+‖π‖∞ does not match the root component (that is, matches it to ∞). Assume that hi : π → w∞ does match the root component. Then it maps x to i, so by the observation above it maps all the components from which the component of x is reachable within w1..i+‖π‖. Restricting hi : π → w∞ to these components and extending with infinity to the other vertices of π would give a matching earlier than hi : π → w1..i+‖π‖∞, a contradiction. Thus, hi : π → w∞ is not a total homomorphism into w, so w, i ̸|= (π, x) and the algorithm rejects correctly.

Memory bound. Now that we have seen that Matchπ,x(w) is correct, we need to bound the memory it uses. We claim that while processing position i the algorithm only needs access to positions between i − ‖π‖ and i + ‖π‖. That means that the algorithm only needs to store the last 2 · ‖π‖ + 1 symbols read, plus the matching constructed so far, restricted to this suffix. We shall use a dualized version of the observation made previously: if a node y of the pattern is matched at position i, then all firm components reachable from the component of y must be matched after position i − ‖π‖: they can reach at most ‖π‖ positions back.

For the negative check, the claim is almost obvious. After reading a new symbol we match the subpatterns greedily, until no more can be matched. Among matched patterns each has an ancestor (possibly itself) that touches the last symbol (otherwise, it would have been matched before). By the dualized observation none of these patterns can reach back further than ‖π‖, so it suffices to store the last ‖π‖ symbols read.

For the positive check the situation is similar, except that before we start matching we first check whether ‖π‖ positions back we had ⊤ or ⊥. If it was ⊥, we proceed essentially as in the negative check. But if it was ⊤, we rematch the firm components reachable from the component of the root x, in such a way that x is matched ‖π‖ positions back.
Again, by the dualized observation, the rematched components can reach at most ‖π‖ more positions back, so they also fit within the intended window.

4 The main result

Using the matching procedure we prove our main result: we show that extending acyclic CQs using vertical axes with arbitrary horizontal CQs over siblings does not increase the complexity of the containment problem. In fact, we shall work with the more general problem of satisfiability of Boolean combinations, or BC-SAT: given a Boolean combination of patterns ϕ and an automaton A, decide if there is a tree T ∈ L(A) such that T |= ϕ. Containment for unions of CQs corresponds to non-satisfiability for Boolean combinations of the form π ∧ ¬π1 ∧ ¬π2 ∧ · · · ∧ ¬πk.

Since we work with Boolean combinations anyway, each disconnected pattern can be replaced with a conjunction of connected patterns. Thus, we shall only consider connected patterns from now on.

Assuming that vertical edges propagate through horizontal edges (e.g., the next sibling of a child is also explicitly connected with a child edge), one can define vertically acyclic patterns as those whose ↓, ↓+-subgraph is acyclic in the undirected sense; since patterns are connected, it is then an undirected tree. Without this assumption we define the notion as follows. A horizontal component of pattern π is a connected component of the →, →+-subgraph of π. Let Hπ = ⟨Vπ, ↓, ↓+⟩ be a graph over the horizontal components of π, where an edge X ↓ Y is present if x ↓ y for some x ∈ X and y ∈ Y, and X ↓+ Y is present if x ↓+ y for some x ∈ X and y ∈ Y, but there is no edge X ↓ Y. We say that pattern π is vertically acyclic if this graph is acyclic in the undirected sense. Again, as π is connected, the graph is then an undirected tree.
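The definition of vertical acyclicity suggests a direct test: contract the horizontal components and check the contracted graph Hπ for undirected cycles, which union-find does in near-linear time. A hedged Python sketch (the encoding and edge-kind names are illustrative; parallel vertical edges between the same pair of components are collapsed to one, matching the paper's preprocessing, while a vertical edge inside a single component is treated as a cycle):

```python
def vertically_acyclic(vertices, edges):
    """edges: [(x, kind, y)] with kind in {'next', 'foll', 'child', 'desc'}."""
    parent = {x: x for x in vertices}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    # 1. Contract horizontal edges into horizontal components.
    for x, k, y in edges:
        if k in ('next', 'foll'):
            parent[find(x)] = find(y)

    # 2. Add each vertical edge as an undirected edge of H_pi over the
    #    components; a second union-find detects undirected cycles.
    comp = {r: r for r in (find(v) for v in vertices)}

    def cfind(x):
        while comp[x] != x:
            comp[x] = comp[comp[x]]
            x = comp[x]
        return x

    seen = set()
    for x, k, y in edges:
        if k in ('child', 'desc'):
            a, b = find(x), find(y)
            if a == b:
                return False                # vertical edge inside a component
            key = tuple(sorted((a, b)))
            if key in seen:                 # parallel edges merged into one
                continue
            seen.add(key)
            ra, rb = cfind(a), cfind(b)
            if ra == rb:
                return False                # undirected cycle in H_pi
            comp[ra] = rb
    return True
```

For example, a chain x ↓ y ↓ z is vertically acyclic; adding x ↓+ z closes a cycle; and a pattern x → y with x ↓ z and y ↓+ z stays acyclic, since both vertical edges join the same two horizontal components.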
Given a vertically acyclic pattern π, by simple preprocessing we ensure that there is at most one edge in π between any two horizontal components: if there are more, we can merge their starting points in the parent/ancestor horizontal component, and drop all edges except one (a child edge, if there is one).

Theorem 2. For vertically acyclic patterns, BC-SAT is ExpTime-complete, and PSpace-complete under non-recursive schemas.

Proof. We shall see that for a vertically acyclic pattern π one can construct an equivalent automaton Aπ in polynomial working space (states have polynomial-size representations and all ingredients of Aπ can be enumerated in polynomial working space). Despite its nondeterminism, we will be able to complement Aπ by simply changing the accepting states. Hence, one can easily reduce BC-SAT to nonemptiness of tree automata, by constructing in polynomial working space an automaton Aϕ equivalent to a given Boolean combination ϕ. The product B of Aϕ with a nonrecursive automaton A can be tested for nonemptiness by a nondeterministic algorithm using space O(‖A‖ · log ‖A‖ · poly(‖ϕ‖)), which guesses a tree of depth at most ‖A‖ in the depth-first order and nondeterministically evaluates B over it, processing it in the post-order fashion. By Savitch's theorem, this gives a PSpace algorithm for satisfiability under nonrecursive schemas. As Aϕ is singly exponential in ‖ϕ‖, the standard polynomial-time emptiness test gives an ExpTime algorithm for satisfiability under arbitrary schemas.

The idea is that the automaton Aπ guesses in each node which subpatterns are matched there, and then checks that these guesses are consistent: a subpattern is matched in some node if and only if its root can be matched there and all its sub-subpatterns can be matched in appropriate nodes.
To ensure that these dependencies are well founded, we root the undirected tree forming the graph Hπ by arbitrarily fixing a root component and treating all the edges as pointing outwards. This builds the structure of a directed tree over the graph Hπ; the orientation of edges in this directed tree is entirely unrelated to the character of the corresponding edges in the pattern: child and descendant edges can point both up and down the tree. We use this structure to speak precisely of subpatterns: for each horizontal component X of π we define the subpattern π.X obtained by restricting π to X and all descendants of X in the directed tree. If component X has successors Y1, Y2, . . . , Yn in the directed tree, then the subpattern π.X is composed of the nodes in the horizontal component X connected to subpatterns π.Y1, π.Y2, . . . , π.Yn with child or descendant edges, pointing to or from X. We call patterns π.Y1, π.Y2, . . . , π.Yn the immediate subpatterns of pattern π.X. Unless X is the root component of the whole directed tree, it is connected to some other horizontal component Z with a child or descendant edge, pointing to or from X. If the edge points to X, we call π.X a down-subpattern. Otherwise, we call it an up-subpattern. In either case, we call the vertex of X that is the end of this edge the root vertex of π.X; if X is the root component of the whole tree, we choose an arbitrary vertex in X for the root vertex. We say that π.X is matched at node v if its root vertex is mapped to v.

The automaton Aπ, before reading the sequence of children of a node v, non-deterministically selects the set Expected(v) of subpatterns that can be matched at v. Similarly, Aπ non-deterministically selects a set ExpectedAbove(v) of subpatterns that can be matched at some ancestor of v.
(Note that for down-subpatterns the exact place where the root vertex is matched among siblings is irrelevant; we make the distinction only for the purpose of uniform treatment by the procedure Match.)

After selecting Expected(v) and ExpectedAbove(v), the automaton proceeds to read the sequence of children of v. While doing so it computes the set Matched(v) of subpatterns of π matched in the children of v and the set MatchedBelow(v) of subpatterns that were matched in the children of some descendant of v, and performs a number of consistency checks using a slightly modified version of the procedure Match, described below.

With each subpattern π.X we associate a horizontal pattern πX obtained by restricting π to X and including in the label of each vertex x the set Φ of immediate subpatterns π.Y of π.X connected to x. In the modified procedure Match for πX, a vertex labelled with (σ, Φ) can be matched in a position labelled with (τ, Ψ) only if σ = τ and Φ ⊆ Ψ. It is straightforward to check that this does not influence the correctness of Match. Observe that the extended alphabet is exponential, but each symbol can be stored in polynomial memory. Hence, Match still works in memory polynomial in the size of the pattern.

Reading the sequence of children of v, the automaton performs simultaneously (using the product construction) the following tasks:

1. Check that Expected(v) and ExpectedAbove(v) do not contradict the choices for the children of v. The subpatterns expected to be matched in an ancestor of a child w must either be matched in the parent of w or in an ancestor of the parent of w; that is, ExpectedAbove(w) = Expected(v) ∪ ExpectedAbove(v) must hold for each child w. If for some w it fails to hold, the computation stops and no state is assigned to v, because the guesses are inconsistent.

2. Check that Expected(w) was chosen correctly for each child w.
This is done by simultaneously running the procedure Match for each subpattern π.X (with the fixed root vertex) over the children of v. For each child w, the procedure is fed with an extended symbol that consists of:
– the label of node w,
– the set of immediate subpatterns of π.X matched below w (computed from Matched(w) and MatchedBelow(w), both empty for leaves), and above w (computed from Expected(v) and ExpectedAbove(v)),
– ⊤ if π.X ∈ Expected(w), and ⊥ otherwise.
If some instance of Match rejects, the computation stops and no state is assigned to v, since the guesses made below have proved to be incorrect.

3. Compute the sets Matched(v) and MatchedBelow(v), based on Matched(w), MatchedBelow(w), and Expected(w) for each child w of v.

The automaton treats the root as any other node; that is, it performs the tasks above for the single-node sequence of siblings consisting only of the root, treating it as a child of a dummy node #. The automaton accepts if Expected(#) = ExpectedAbove(#) = ∅ and π ∈ Matched(#) ∪ MatchedBelow(#).

By structural induction over subpatterns one easily proves that in any complete run (accepting or rejecting) of the described automaton Aπ, the sets Expected(v), ExpectedAbove(v), Matched(v), and MatchedBelow(v) are guessed or computed correctly for each node of the input tree. Hence, the construction is correct. Moreover, for each node v of the input tree, there exists exactly one correct set Expected(v) and one correct set ExpectedAbove(v). Hence, for each input tree there is exactly one complete run, modulo the last guess of Expected(#) and ExpectedAbove(#). Consequently, the automaton Aπ can be complemented simply by changing the acceptance condition to: Expected(#) = ExpectedAbove(#) = ∅ and π ∉ Matched(#) ∪ MatchedBelow(#). ⊓⊔

References

1. M. Benedikt, W. Fan, F. Geerts. XPath satisfiability in the presence of DTDs. J. ACM 55(2), 2008.
2. H. Björklund, W. Martens, T. Schwentick.
Optimizing conjunctive queries over trees using schema information. Proc. MFCS 2008: 132–143.
3. A. K. Chandra, P. M. Merlin. Optimal implementation of conjunctive queries in relational data bases. Proc. STOC 1977: 77–90.
4. N. Francis, C. David, L. Libkin. A direct translation from XPath to nondeterministic automata. Proc. AMW 2011.
5. G. Gottlob, C. Koch, K. Schulz. Conjunctive queries over trees. J. ACM 53 (2006), 238–272.
6. M. Marx. XPath with conditional axis relations. Proc. EDBT 2004: 477–494.
7. G. Miklau, D. Suciu. Containment and equivalence for a fragment of XPath. J. ACM 51(1): 2–45, 2004.
8. F. Murlak, M. Ogiński, M. Przybylko. Between tree patterns and conjunctive queries: is there tractability beyond acyclicity? Proc. MFCS 2012: 705–717.
9. F. Neven, T. Schwentick. On the complexity of XPath containment in the presence of disjunction, DTDs, and variables. Logical Meth. in Comp. Sci. 2(3): 1–30, 2006.