Monadic Datalog Containment on Trees Using the Descendant-Axis André Frochaux and Nicole Schweikardt Institut für Informatik, Humboldt-Universität zu Berlin {frochaua,schweikn}@informatik.hu-berlin.de Abstract. In their AMW’14-paper, Frochaux, Grohe, and Schweikardt showed that the query containment problem for monadic datalog on finite unranked labeled trees is Exptime-complete when (a) considering unordered trees using the child -axis, and when (b) considering ordered trees using the axes firstchild, nextsibling, and child. Furthermore, when allowing to use also the descendant-axis, the query containment problem was shown to be solvable in 2-fold exponential time, but it remained open to determine the problem’s exact complexity in presence of the descendant-axis. The present paper closes this gap by showing that, in the presence of the descendant-axis, the problem is 2Exptime-hard. 1 Introduction The query containment problem (QCP) is a fundamental problem that has been studied for various query languages. Datalog is a standard tool for expressing queries with recursion. From Cosmadakis et al. [5] and Benedikt et al. [2] it is known that the QCP for monadic datalog queries on the class of all finite relational structures is 2Exptime-complete. Restricting attention to finite unranked labeled trees, Gottlob and Koch [10] showed that on ordered trees the QCP for monadic datalog is Exptime-hard and decidable, leaving open the question for a tight bound. This gap was closed by Frochaux, Grohe, and Schweikardt in [8] by giving a matching Exptime upper bound for the QCP for monadic datalog on ordered trees using the axes firstchild, nextsibling, and child. Similar results were obtained in [8] also for unordered finite labeled trees: in this setting, the QCP is Exptime-complete for monadic datalog queries on unordered trees using the child -axis. For the case where queries are allowed to also use the descendant-axis, [8] presented a 2-fold exponential time algorithm for the QCP for monadic datalog on (ordered or unordered) trees. Determining the problem’s exact complexity in the presence of the descendant-axis, however, was left open. The present paper closes the gap by proving a matching 2Exptime lower bound (both, for ordered and for unordered trees). This gives a conclusive answer to a question posed by Abiteboul et al. in [1], asking for the complexity of the QCP on unordered trees in the presence of the descendant-axis. Our 2Exptime- hardness proof for ordered trees is by a reduction from a 2Exptime-hardness result of [3] for the validity of conjunctive queries w.r.t. schema constraints. For obtaining the 2Exptime-hardness on unordered trees, we follow the approach of [3] and construct a reduction from the 2Exptime-complete word problem for exponential-space bounded alternating Turing machines [4]. The remainder of the paper is organised as follows. Section 2 fixes the basic notation. Section 3 presents a 2Exptime lower bound for the QCP on ordered trees using the axes firstchild, nextsibling, root, leaf, lastsibling, child, descendant. Section 4 is devoted to the 2Exptime lower bound for the QCP on unordered trees using only the axes child and descendant. We conclude in Section 5. Due to space limitations, many proof details had to be deferred to the paper’s full version. 2 Trees and Monadic Datalog (mDatalog) Throughout this paper, Σ will always denote a finite non-empty alphabet. By N we denote the set of non-negative integers, and we let N>1 := N \ {0}. Relational Structures. As usual, a schema τ consists of a finite number of relation symbols R, each of a fixed arity ar(R) ∈ N>1 . A τ -structure A consists of a finite non-empty set A called the domain of A, and a relation RA ⊆ Aar(R) for each relation symbol R ∈ τ . It will often be convenient to identify A with the set of atomic facts of A, i.e., the set atoms(A) consisting of all facts R(a1 , . . . , aar(R) ) for all relation symbols R ∈ τ and all tuples (a1 , . . . , aar(R) ) ∈ RA . If τ is a schema and ` is a list of relation symbols, we write τ ` to denote the extension of the schema τ by the symbols in `. Furthermore, τΣ denotes the extension of τ by new unary relation symbols labelα , for all α ∈ Σ. Unordered Trees. An unordered Σ-labeled tree T = (V T , λT , E T ) consists of a finite non-empty set V T of nodes, a function λT : V T → Σ assigning to each node v of T a label λ(v) ∈ Σ, and a set E T ⊆ V T × V T of directed edges such that the directed graph (V T , E T ) is a rooted tree where edges are directed from the root towards the leaves. We represent such a tree T as a relational structure of domain V T with unary and binary relations: For each label α ∈ Σ, labelα (x) expresses that x is a node with label α; child(x, y) expresses that y is a child of node x; root(x) expresses that x is the tree’s root node; leaf (x) expresses that x is a leaf; and desc(x, y) expresses that y is a descendant of x (i.e., y is a child or a grandchild or . . . of x). We denote this relational structure representing T by Su (T ), but when no confusion arises we simply write T instead of Su (T ). The queries we consider for unordered trees are allowed to make use of at least the predicates labelα and child. We fix the schema τu := { child }. Ordered Trees. An ordered Σ-labeled tree T = (V T , λT , E T , orderT ) has the same components as an unordered Σ-labeled tree and, in addition, orderT fixes for each node u of T a strict linear order of all the children of u in T . To represent such a tree as a relational structure, we use the same domain and the same predicates as for unordered Σ-labeled trees, along with three further predicates fc (“first-child”), ns (“next-sibling”), and ls (“last sibling”), where fc(x, y) expresses that y is the first child of node x (w.r.t. the linear order of the children of x induced by orderT ); ns(x, y) expresses that y is the right sibling of x (i.e., x and y have the same parent p, and y is the immediate successor of x in the linear order of p’s children given by orderT ); and ls(x) expresses that x is the rightmost sibling (w.r.t. the linear order of the children of x’s parent given by orderT ). We denote this relational structure representing T by So (T ), but when no confusion arises we simply write T instead of So (T ). The queries we consider for ordered trees are allowed to make use of at least the predicates labelα , fc, and ns. We fix the schemas τo := { fc, ns } and τGK := τoroot,leaf ,ls . In [10], Gottlob and Koch used τGK,Σ -structures to represent ordered Σ-labeled trees. Datalog. We assume that the reader is familiar with the syntax and seman- tics of datalog (cf., e.g., [6, 10]). Predicates that occur in the head of some rule of a datalog program P are called intensional, whereas predicates that only occur in the body of rules of P are called extensional. By idb(P) and edb(P) we denote the sets of intensional and extensional predicates of P, resp. We say that P is of schema τ if edb(P) ⊆ τ . We write TP to denote the immediate consequence operator associated with a datalog program P. Recall that TP maps a set C of atomic facts to the set of all atomic facts that are derivable from C by at most one application of the rules of P. The monotonicity of TP implies that for each finite set C, the iterated application of TP to C leads to a fixed point, denoted by TPω (C), which is reached after a finite number of iterations. Monadic datalog queries. A datalog program belongs to monadic datalog (mDatalog, for short), if all its intensional predicates have arity 1. A unary monadic datalog query of schema τ is a tuple Q = (P, P ) where P is a monadic datalog program of schema τ and P is an intensional predicate of P. P and P are called the program and the query predicate of Q. When evaluated in a finite τ -structure A that represents a labeled tree T , the query Q results in the unary relation Q(T ) := {a ∈ A : P (a) ∈ TPω (atoms(A)) }. The Boolean monadic datalog query QBool specified by Q = (P, P ) is the Boolean query with QBool (T ) = yes iff the tree’s root node belongs to Q(T ). The size ||Q|| of a monadic datalog query Q is the length of Q = (P, P ) viewed as a string over a suitable alphabet. Expressive power of monadic datalog on trees. On ordered Σ-labeled trees represented as τGK,Σ -structures, monadic datalog can express exactly the same unary queries as monadic second-order logic [10] — for short, we will say “mDatalog(τGK ) = MSO(τGK ) on ordered trees”. Since the child and desc child,desc relations are definable in MSO(τGK ), mDatalog(τGK ) = mDatalog(τGK ) on ordered trees. Moreover, for (ordered or unordered) trees, every monadic Datalog query that uses the desc-axis can be rewritten in 1-fold exponential time into an equivalent monadic datalog query which uses the child-axis, but not the desc-axis (see the proof of Lemma 23 in the full version of [8]). Using the monotonicity of the immediate consequence operator, one obtains that removing any of the predicates root, leaf , ls from τGK strictly decreases the expressive power of mDatalog on ordered trees (see [9]). By a similar reasoning root,leaf ,desc one also obtains that on unordered trees, represented as τu,Σ -structures, monadic datalog is strictly less expressive than monadic second-order logic, and omitting any of the predicates root, leaf further reduces the expressiveness of monadic datalog on unordered trees [9]. The Query Containment Problem (QCP). Let τΣ be one of the schemas used for representing (ordered or unordered) Σ-labeled trees as relational struc- tures. For two unary queries Q1 and Q2 of schema τΣ we write Q1 ⊆ Q2 to indicate that for every Σ-labeled tree T we have Q1 (T ) ⊆ Q2 (T ). Similarly, if Q1 and Q2 are Boolean queries of schema τΣ , we write Q1 ⊆ Q2 to indicate that for every Σ-labeled tree T , if Q1 (T ) = yes then also Q2 (T ) = yes. We write Q1 6⊆ Q2 to indicate that Q1 ⊆ Q2 does not hold. For a schema τ , the query containment problem (QCP) for mDatalog(τ ) on finite labeled trees receives as input a finite alphabet Σ and two (unary or Boolean) mDatalog(τΣ )-queries Q1 and Q2 , and the task is to decide whether Q1 ⊆ Q2 . From [8] we know: Theorem 1 (Frochaux et al. [8]) The QCP for mDatalog(τuroot,leaf ,desc ) on child,desc unordered trees and for mDatalog(τGK ) on ordered trees can be solved in 2-fold exponential time. 3 2Exptime-hardness on Ordered Trees child,desc Theorem 2 The QCP for Boolean mDatalog(τGK ) on finite labeled or- dered trees is 2Exptime-hard. The proof is by a reduction based on a 2Exptime-hardness result of Björklund, Martens, and Schwentick [3]. For stating their result, we recall some nota- tion used in [3]. A nondeterministic (unranked) tree automaton (NTA) A = (Σ, S, ∆, F ) consists of an input alphabet Σ, a finite set S of states, a set F ⊆ S of accepting states, and a finite set ∆ of transition rules of the form (s, α) → L, where s ∈ S, α ∈ Σ, and L is a regular string-language over S. A run of the NTA A on a ordered Σ-labeled tree T is a mapping ρ : V T → S such that the following is true for all nodes v of T , where α denotes the label of v in T : if v has n > 0 children u1 , . . . , un (in order from the left to the right), then there exists a rule (s, α) → L in ∆ such that ρ(v) = s and wv ∈ L, for the string wv := ρ(u1 ) · · · ρ(un ). In particular, if v is a leaf, then there must be a rule (s, α) → L in ∆ such that ρ(v) = s and ε ∈ L, where ε denotes the empty string. A run ρ of A on T is accepting, if T ’s root note v is labeled with an accepting state of A, i.e., ρ(v) ∈ F . A finite ordered Σ-labeled tree T is accepted by A, if there exists an accepting run of A on T . We write L(A) to denote the language of A, i.e., the set of all finite ordered Σ-labeled trees that are accepted by A. To present an NTA A = (Σ, S, ∆, F ) as an input for an algorithm, the string- languages L that occur in the right-hand side of rules in ∆ are specified by NFAs AL = (ΣL , QL , δL , qL , FL ), whose input alphabet is ΣL := S, and where QL is a finite set of states, δL ⊆ (QL × ΣL × QL ) is a transition relation, qL ∈ QL is the initial state, and FL ⊆ QL is the set of accepting states of AL . The size of AL is ||AL || := |QL | + |δL |, and the size of A is the sum of |Σ|, |S|, |∆|, and ||AL ||, for all L ∈ strL(A), where strL(A) is the set of all string-languages L that occur in the right-hand side of a rule in ∆. In [3], NTAs are used to describe schema information. A Boolean query Q is said to be valid with respect to an NTA A if Q(T ) = yes for every ordered Σ-labeled tree T ∈ L(A). The particular queries of interest here are Boolean desc CQ(child, desc) queries, i.e., Boolean conjunctive queries of schema τu,Σ = {child, desc} ∪ {labelα : α ∈ Σ}, for a suitable alphabet Σ. The problem “validity of Boolean CQ(child, desc) w.r.t. a tree automaton” receives as input a Boolean CQ(child, desc) query Q and an NTA A, and the task is to decide whether Q is valid with respect to A. Theorem 3 (Björklund et al. [3]) Validity of Boolean CQ(child, desc) w.r.t. a tree automaton is 2Exptime-complete. Our proof of Theorem 2 is via a polynomial-time reduction from the problem validity of Boolean CQ(child, desc) w.r.t. a tree automaton to the QCP for child,desc Boolean mDatalog(τGK ) on finite labeled ordered trees. Let QCQ be a Boolean CQ(child, desc)-query, and let A be an NTA with desc input alphabet Σ. We translate QCQ into an equivalent mDatalog(τu,Σ )-query 0 QCQ = (P, P ): If QCQ is of the form Ans() ← R1 (u1 ), . . . , R` (u` ) for relational atoms R1 (u1 ), . . . , R` (u` ), we choose an arbitrary variable x that occurs in at least one of these atoms, we use a new unary idb-predicate P , and we let P be the program consisting of the two rules P (x) ← R1 (u1 ), . . . , R` (u` ) and P (x) ← child(x, y), P (y). Then, for every ordered Σ-labeled tree T we have Q0CQ,Bool (T ) = yes iff QCQ (T ) = yes. The following Lemma 4 constructs, in time polynomial in the child size of A, an mDatalog(τGK,Σ )-query QA which is equivalent to A, i.e., for every ordered Σ-labeled tree T we have QA,Bool (T ) = yes iff T ∈ L(A). Note that QCQ is valid w.r.t. A if, and only if, QA,Bool ⊆ Q0CQ,Bool . Thus, we obtain the desired polynomial-time reduction, showing that the QCP for Boolean child,desc mDatalog(τGK ) on finite ordered Σ-labeled trees inherits the 2Exptime- hardness from the problem “validity of Boolean CQ(child, desc) w.r.t. a tree automaton”. All that remains to finish the proof of Theorem 2 is to prove the following Lemma 4. child Lemma 4 For every NTA A = (Σ, S, ∆, F ) there is an mDatalog(τGK,Σ )-query Q = (P, P ), such that for every finite ordered Σ-labeled tree T we have QBool (T ) = yes iff T ∈ L(A). Furthermore, Q is constructible from A in time polynomial in the size of A. Proof. We construct a monadic datalog program P which, for every node v of T , computes information on all states that A can assume at node v, i.e., all states s ∈ S for which there is a run ρ of A on the subtree of T rooted at v, such that ρ(v) = s. To this end, for every state s ∈ S, we will use an idb-predicate s. The query QBool will accept an input tree T if there is an accepting state s ∈ F such that s(rootT ) ∈ TPω (T ), where rootT denotes the root of T . The program P is constructed in such a way that it performs a generalised version of the well-known powerset construction. Recall that the transition rules of A are of the form (s, α) → L, where s ∈ S, α ∈ Σ, and L is a regular string-language over S, specified by an NFA AL = (ΣL , QL , δL , qL , FL ) with ΣL = S and δL ⊆ (QL ×ΣL ×QL ). W.l.o.g., we assume that the state sets of all the NFAs are mutually disjoint, and disjoint with S. To emulate the standard powerset construction of the NFA AL , we use an idb-predicate q for every state q ∈ QL , and an extra idb-predicate AccL . If u1 , . . . , un are the children of a node v in an input tree T , the NFA AL processes the strings over alphabet S that are of the form s1 · · · sn , where si is a state that A can assume at node ui (for every i ∈ {1, . . . , n}). We start by letting PL := ∅ and then add to PL the following rules: For the initial state qL of AL , consider all s ∈ S and q ∈ QL such that (qL , s, q) ∈ δL , and add to PL the rule q(x) ← fc(y, x), s(x) . Afterwards, for every transition (q, s, q 0 ) ∈ δL , add to PL the rule q 0 (x0 ) ← q(x), ns(x, x0 ), s(x0 ) . Finally, for every accepting state q ∈ FL of AL , add to PL the rule AccL (x) ← ls(x), q(x) . Clearly, the program PL can be constructed in time polynomial in ||AL ||. Now, we are ready to construct the monadic datalog program P that simu- lates the NTA A. We start by letting P be the disjoint union of the programs PL , for all L ∈ strL(A). The computation of A on an input tree T starts in the leaves of T . Thus, to initiate the simulation of A, we consider every rule (s, α) → L in ∆, where ε ∈ L.1 For each such rule, we add to P the rule s(x) ← labelα (x), leaf (x) . Note that for each L ∈ strL(A), the program PL ensures that every last sibling un of a node v will be marked by AccL (un ) iff the states of A assigned to un and its siblings form a string in L. To transfer this information from the last sibling to its parent node, we add to P the rule childAccL (y) ← child(y, x), ls(x), AccL (x) , where childAccL is a new idb-predicate, for every L ∈ strL(A). Afterwards, we consider every rule (s, α) → L in ∆, and add to P the rule s(x) ← childAccL (x), labelα (x) . Finally, to test if A accepts an input tree T , we add rules to test whether T ’s root is assigned an accepting state of A. To this end, we consider every accepting state s ∈ F of A and add to P the rule P (x) ← root(x), s(x) . This finishes the construction of the program P and the query Q = (P, P ). child Clearly, P is a monadic datalog program of schema τGK,Σ , and Q can be con- structed in time polynomial in ||A||. It is not difficult, but somewhat tedious, to 1 Note that “ε ∈ L ?” can be checked by simply checking whether qL ∈ FL . verify that, as intended by the construction, indeed for every finite ordered Σ- labeled tree T we have QBool (T ) = yes if, and only if, there exists an accepting run of the NTA A on T . This completes the proof of Lemma 4. t u 4 2Exptime-hardness on Unordered Trees Our next aim is to transfer the statement of Theorem 2 to unordered trees. Precisely, we will show the following. Theorem 5 The QCP for Boolean mDatalog(τudesc ) on finite labeled unordered trees is 2Exptime-hard. For proving Theorem 5, we cannot directly build on Björklund et al.’s The- orem 3, since their NTAs explicitly refer to ordered trees. By constructing suitable reductions, we can show that proving Theorem 5 boils down to proving the following Theorem 6, which deals with the emptiness problem on trees over a ranked alphabet. For the remainder of this section, Σ 0 will denote a ranked finite alphabet. I.e., Σ 0 is a finite set of symbols, and each symbol α ∈ Σ 0 is equipped with a fixed arity ar(α) ∈ N. An unordered ranked Σ 0 -labeled tree is an unordered Σ 0 -labeled tree where each node labeled with symbol α ∈ Σ 0 has exactly ar(α) desc children. For a Boolean mDatalog(τu,Σ 0 )-query Q, we say that Q is unsatisfiable by unordered ranked trees (in symbols: Q = ∅) if for every finite unordered ranked Σ 0 -labeled tree T we have Q(T ) = ∅. The emptiness problem for Boolean desc 0 mDatalog(τu,Σ 0 ) on finite unordered ranked Σ -labeled trees receives as input a desc Boolean mDatalog(τu,Σ 0 )-query Q, and the task is to decide whether Q = ∅. The main technical step needed for proving Theorem 5 is to prove the following. Theorem 6 There is a ranked finite alphabet Σ 0 , such that the emptiness prob- desc 0 lem for Boolean mDatalog(τu,Σ 0 ) on finite unordered ranked Σ -labeled trees is 2Exptime-hard. For the proof of Theorem 6, we can build on the approach used by Björklund et al. for proving Theorem 3: As in [3], we proceed by a reduction from the word problem for exponential-space bounded alternating Turing machines, which is known to be 2Exptime-complete [4]. The remainder of this section is devoted to the proof of Theorem 6. An alternating Turing machine (ATM) is a nondeterministic Turing machine A = (Q, Σ, Γ, δ, q0 ) whose state space Q is partitioned into universal states Q∀ , existential states Q∃ , an accepting state qa , and a rejecting state qr . The ATM’s tape cells are numbered 0,1,2,. . . . A configuration of A is a finite string of the form w1 qw2 with w1 , w2 ∈ Γ ∗ and q ∈ Q, representing the situation where the ATM’s tape contains the word w1 w2 , followed by blanks, the ATM’s current state is q, and the head is positioned at the first letter of w2 . A configuration w1 qw2 is a halting (universal, existential, resp.) configuration if q ∈ {qa , qr } (q ∈ Q∀ , q ∈ Q∃ , resp.). W.l.o.g., no halting configuration has a successor configuration, and every halting configuration is of the form qw. A computation tree TA of the ATM A on input w ∈ Σ ∗ is a tree labeled with configurations of A, such that the root of TA is labeled by q0 w, and for each node v of TA labeled by w1 qw2 , – if q ∈ Q∃ , then u has exactly one child, and this child is labeled with a successor configuration of w1 qw2 , – if q ∈ Q∀ , then u has a child v for every successor configuration w10 q 0 w20 , and v is labeled by w10 q 0 w20 , – if q ∈ {qa , qr }, then u is a leaf of TA . A computation tree is accepting if all its branches are finite and all its leaves are labeled by configurations with state qa . The language L(A) of A is defined as the set of all words w ∈ Σ ∗ , for which there exists an accepting computation tree of A on w. W.l.o.g., we will assume that the ATM is normalized, i.e., every non- halting configuration has precisely two successor configurations, each universal step only affects the state of the machine, and the machine always alternates between universal and existential states. The proof of Theorem 6 proceeds by a reduction from the word problem for exponential-space bounded ATMs A. The reduction itself will be done from an ATM with empty input word. To this end, we construct, in the canonical way, for the given exponential-space bounded ATM A and the given word w ∈ Σ ∗ an ATM Aw that works in space exponential in the size of w and accepts the empty word if, and only if, A accepts w. Since A is exponential-space bounded, the non-blank portion of the ATM’s tape during a computation of Aw will never be longer that 2n , where n is polynomial in the size |w| of the original input. The crucial point of the reduction is to find an encoding of computation trees desc of Aw on empty input, which can be verified by a mDatalog(τu,Σ 0 )-query that can be constructed in time polynomial in the size of Aw . For this, it is necessary to find a smart encoding of the tape inscription of length 2n . This encoding shall allow to compare the content of every tape cell with the same tape cell of the successor configuration. To achieve this, we adapt the encoding of Björklund et al. [3]; in particular, we use their very elegant “navigation gadgets”. We choose a fixed ranked finite alphabet Σ 0 which, among other symbols, con- tains a 0-ary symbol ⊥, unary symbols r, p, m, 0, 1, binary symbols CTleft ∃ , CT∃ right , and 3-ary symbols CT∀ and s. Consider a computation tree TAw of a normalized ATM Aw = (Q, Σ, Γ, δ, q0 ), see Figure 1. We fix an arbitrary order on the children of nodes in TAw , such that every universal node has a left child and a right child. The encoding T := enc(TAw ) is the ranked Σ 0 -labeled unordered tree obtained from TAw by replacing every node v labeled w1 qw2 with a Σ 0 -labeled ranked tree enc(tv ), as follows: – if v is universal, then the root of enc(tv ) is labeled with CT∀ , – if v is existential, and v is the root of TAw or v is the left child of a universal node, then the root of enc(tv ) is labeled with CTleft∃ , – if v is existential, and v is the right child of a universal node, then the root of enc(tv ) is labeled with CTright ∃ , (a) (b) v1 w10 q1 w100 CT∀ v2 w20 q2 w200 r CTright ∃ v3 w30 q3 w300 r CT∀ w40 q4 w400 v4 v5 w50 q5 w500 r CTright ∃ CTleft ∃ Fig. 1. (a) A part of a computation tree TAw , where the node v1 labeled by w10 q1 w100 is universal, and its children are existential. The node v2 labeled by w20 q2 w200 is the right child of v1 . The node v2 has one child, the univeral node v3 . (b) The replacement of v1 is a tree with a root node labeled by CT∀ and with three children, the first is labeled by r and is the root of the subtree encoding the configuration in v1 , the second is the replacement for its left child, and the third is the replacment for its right child. The obtained tree T := enc(TAw ) is an unordered ranked Σ 0 -labeled tree. – exactly one child of the root of enc(tv ) is labeled by r (this will be the root of the subtree that encodes the configuration at v), and – for each child u of v in TAw , enc(tv ) has a subtree enc(tu ), which is the encoded subtree of TAw obtained by the replacement of u. The subtree γr rooted at the r-labeled child of the root of enc(tv ), encodes the configuration c := w1 qw2 represented by node v in TAw . Since A is exponential- space bounded, the tape inscription of c has length 6 2n . For representing c, we use a full binary ordered tree of height n. The path from the root to a leaf specifies the address of the tape cell represented by the leaf, and the leaf carries information on the tape cell’s inscription and, in case that the tape cell is the current head position, also information on the current state; all this information is encoded by a suitable tape cell gadget that is attached to the “leaf”. The number k of possible tape cell inscriptions (enriched with information on the current state) is polynomial in ||Aw ||. The nodes of the “full binary tree” are called skeleton nodes and are labeled s. To ensure that the desired query Q can be constructed in polynomial time, we attach to each skeleton node a navigation gadget [3], which is a path of length 4. To indicate that a node is a left (resp., right) child, this gadget is labeled p − 0 − 1 − ⊥ (resp., p − 1 − 0 − ⊥). See Figure 2 for an illustration of the navigation gadget and the tape cell gadget. Given an ATM A and a word w ∈ Σ ∗ , we construct in polynomial time desc an mDatalog(τu,Σ 0 )-query Q = (P, Ans) such that QBool 6= ∅ iff there is an accepting computation tree for Aw on ε, i.e., w ∈ L(A). The query Q consists of two parts, one to verify that the structure of the input tree represents an encoded computation tree, and the other to verify consistency with the ATM’s transition relation. Details can be found in the paper’s full version. The particular choice of the navigation gadgets ensures that Q can be constructed in time polynomial (a) (b) s sleaf s p s p m 1 0 0 i 0 1 k 1 ⊥ 0 ⊥ ⊥ Fig. 2. (a) A skeleton node and its navigation gadget, indicating that the node is its parent’s left child. (b) A skeleton node encoding a leaf of the configuration tree. This leaf is its parent’s right child. It has a tape cell gadget m followed by k digits, the i-th of which is labeled with 1 iff the tape cell’s inscription is represented by the number i. in the size of A and w. The only point where we make essential use of the desc- predicate is during the comparison of the cells by using the navigation gadgets. 5 Final Remarks Along with the upper bound provided by Theorem 1, and since τudesc ⊆ τochild,desc , Theorem 5 implies the following corollary, which summarizes our main results. Corollary 7 The QCP is 2Exptime-complete for Boolean mDatalog(τudesc ) on finite labeled unordered trees, and for Boolean mDatalog(τochild,desc ) on finite labeled ordered trees. By applying standard reductions, the 2Exptime-completeness results of Corol- lary 7 carry over from the QCP to the equivalence problem. When restrict- ing attention to ranked trees over a ranked finite alphabet, the 2Exptime- completeness results also carry over to the emptiness problem. For unranked labeled trees, the emptiness problem for mDatalog(τochild,desc ) is in 2Exptime, but we currently do not have a matching 2Exptime-hardness result. An overview of the currently known results is given in Table 1; for further information and detailed proofs we refer to [7]. Table 1. Complexity of monadic datalog on finite labeled trees; N ⊆ {root, leaf } and M ⊆ {root, leaf , ls, child}; “c” (“h”) means “complete” (“hard”). N ∪{desc} M ∪{child,desc} child,desc τuN τoM τu τo τGK Exptime-h & in 2Exptime unranked Emptiness Exptime-c 2Exptime-c ranked unranked Equivalence Exptime-c 2Exptime-c ranked unranked Containment Exptime-c 2Exptime-c ranked References [1] Abiteboul, S., Bourhis, P., Muscholl, A., Wu, Z.: Recursive queries on trees and data trees. In: Proc. ICDT’13. pp. 93–104 (2013) [2] Benedikt, M., Bourhis, P., Senellart, P.: Monadic datalog containment. In: Proc. ICALP’12. pp. 79–91 (2012) [3] Björklund, H., Martens, W., Schwentick, T.: Optimizing conjunctive queries over trees using schema information. In: Proc. MFCS’08. pp. 132–143 (2008), full version: http://www8.cs.umu.se/~henrikb/papers/mfcs08full.pdf (accessed: 2016-03-05) [4] Chandra, A.K., Kozen, D., Stockmeyer, L.J.: Alternation. J. ACM 28(1), 114–133 (1981), http://doi.acm.org/10.1145/322234.322243 [5] Cosmadakis, S., Gaifman, H., Kanellakis, P., Vardi, M.: Decidable optimization problems for database logic programs. In: Proc. STOC’88. pp. 477–490 (1988) [6] Dantsin, E., Eiter, T., Gottlob, G., Voronkov, A.: Complexity and expressive power of logic programming. ACM Comput. Surv. 33(3), 374–425 (2001) [7] Frochaux, A.: Static Analysis of Monadic Datalog on Finite Labeled Trees. Doc- toral Dissertation, Humboldt-Universität zu Berlin, to appear [8] Frochaux, A., Grohe, M., Schweikardt, N.: Monadic datalog containment on trees. In: Proc. AMW’14 (2014), full version: http://arxiv.org/abs/1404.0606 [9] Frochaux, A., Schweikardt, N.: A note on monadic datalog on unranked trees. Technical Report, available at http://arxiv.org/abs/1310.1316 (2013) [10] Gottlob, G., Koch, C.: Monadic datalog and the expressive power of languages for web information extraction. J. ACM 51(1), 74–113 (2004)