=Paper=
{{Paper
|id=None
|storemode=property
|title=Query reasoning on data trees with counting
|pdfUrl=https://ceur-ws.org/Vol-1659/paper5.pdf
|volume=Vol-1659
|authors=Everardo Barcenas,Edgard Benítez-Guerrero,Jesús Lavalle
|dblpUrl=https://dblp.org/rec/conf/lanmr/BarcenasBL16
}}
==Query reasoning on data trees with counting==
Query Reasoning on Data Trees with Counting Everardo Bárcenas1,2 , Edgard Benı́tez-Guerrero2 , and Jesús Lavalle3,4 1 CONACYT 2 Universidad Veracruzana 3 Benemérita Universidad Autónoma de Puebla 4 Instituto Nacional de Astrofı́sica, Óptica y Electrónica iebarcenaspa@conacyt.mx,edbenitez@uv.mx,jlavalle@cs.buap.mx Abstract. Regular path expressions represent the navigation core of the XPath query language for semi-structured data (XML), and it has been characterized as the First Order Logic with Two Variables (FO2 ). Data tests refers to (dis)equality comparisons on data tree models, which are unranked trees with two kinds of labels, propositions from a finite alphabet, and data values from a possibly infinite alphabet. Node occur- rences on tree models can be constrained by counting/arithmetic con- structors. In this paper, we identify an EXPTIME extension of regular paths with data tests and counting operators. This extension is char- acterized in terms of a closed under negation Presburger tree logic. As a consequence, the EXPTIME bound also applies for standard query reasoning (emptiness, containment and equivalence). 1 Introduction XPath is a W3C standard query language for semi-structured data (XML), and it also takes an important role in many XML technologies, such as, XProc, XSLT, and XQuery [1, 2]. The navigation core of XPath, also known as regular path queries, has been recently characterized by the First Order Logic of Two Variables (FO2 ) [1]. Models for this logic are unranked trees, where nodes are labeled by propositions from a finite alphabet. Data tests, also known as data joins in databases community, on XPath queries are expressions of the forms ρ1 ≡ ρ2 and ρ1 6≡ ρ2 . These expressions hold whenever data values (propositions from an infinite alphabet) contained in path ρ1 are (dis)equal to data values contained in path ρ2 , respectively. Another important constructors on XPath queries concerns counting: ρ1 # ρ2 , where # ∈ {≤, >, =, 6=}. These expressions hold whenever the number of nodes denoted by ρ1 and ρ2 satisfies constraint #. There are several recent works studying regular path extensions with either data tests or counting [3–6, 2]. However, as far as we know, the current work rep- resent the first study on regular path extensions concerning both constructors, data tests and counting. More precisely, we give a characterization of a regular paths with data test with respect to constants ρ ≡ k (ρ 6≡ k) and with count- ing operators on children paths. For this characterization we use a modal tree logic equipped with a fixed point operator, converse modalities and Presburger 33 arithimetic constraints [7]. Due to this characterization, the EXPTIME bound from the logic is imported to standard query reasoning (emptiness, containment, and equivalence) with counting and data tests. There are several extensions of FO2 with data tests [8–11]. In [8], FO2 (< , +1, ≡) for data trees is introduced: < stands for descendants and following sibling relations, +1 refers to child and next sibling relations, and ≡ is a bi- nary predicate for data tests. Decidability, without any complexity analysis, for FO2 (<, +1, ≡) in data trees is first shown by a reduction to the reachability prob- lem of a counter tree automata model. Previously in [10], the same result was obtained for data words (one branched tree). Even earlier in [11], FO2 (+1, ≡) for trees was introduced and shown decidable in 3NEXPTIME. In another direction, regarding regular paths (XPath navigation core), it is well know data test on full navigation regular paths is undecidable [5]. Several fragments (downward, forward, transitive) of regular path expressions with data tests are studied [12, 13, 6, 5, 3]. With their corresponding complexity ranging from EXPTIME to non elementary. Contrastingly, in this paper, instead of restricting navigation on queries, we study the full navigation (children, parents, following and previous sibling, descendants and ancestros) regular path expressions, but we restrict data tests to constants only. Regarding regular paths with counting, there are several recent studies [1, 2, 4]. In [1], it was show the extension of regular paths with counting is in general undecidable. EXPTIME fragments (counting with respect to constants) were later identified in [2, 4]. Several other logics with counting have been proposed in the setting of unranked trees [14–16, 7, 17]. In [17], the EXPTIME bound was further developed for a set of coalegebraic modal logics via a type elimination al- gorithm. Excepting [18], where the emptiness problem for ranked tree automaton with equality and counting constraints was shown decidable without a further complexity analysis, all the above works study separately data tests and count- ing. In the current work, we identify an EXPTIME extension of regular paths with both, data tests and counting. We describe a counting and data tests extension of regular paths in Section 2. In Section 3, we describe a modal tree logic with a fixed point, converse modali- ties and Presburger arithmetic constructors. The main result of this paper, which is a characterization of the regular path extension with counting and data tests in terms of the logic, is described in Section 4. We conclude with a summary of this work, together with a brief discussion of further research perspectives in Section 5. 2 Regular Path Queries with Counting and Data Tests For the languages described in the current work, unless otherwise stated, we use a fixed alphabet composed by set of propositions P ROP S and a a set of modalities M ODS = {, , , }. Intuitively, propositions are used in tree models to label nodes, and modalities are interpreted as the children, parent , right siblings , and left siblings relations. 34 We now introduce the notion of a tree, which can be seen as a tree-shaped Kripke structure (transition system). Definition 1 (Tree). A tree T is defined as a tuple (N, R, L), such that: N is a finite set of nodes; R : N × M ODS × N is a transition relation among nodes and modalities forming a tree (we often write n ∈ R(n0 , m) instead of (n0 , m, n) ∈ R); and L : N × P ROP S is a left-total labeling relation (we often write p ∈ L(n) instead of (n, p) ∈ L). The set of data values are the set of natural numbers N. Data trees can be seen as an extension of trees (Definition 1), where nodes are labeled with data values and propositions. Definition 2 (Data tree). A data tree Γ is defined as a tuple (N, R, L, D), such that: (N, R, L) is a tree; and D : N 7→ N is a total function. We now give a precise syntax of regular paths with counting and data tests. Definition 3 (Syntax). We define the RPQCD expressions (queries) by the following grammar: ρ :=> | α | p | α : p | ρ/ρ | ρ[β] β :=ρ | ρ − ρ # k | ρ ≡ k | ¬β | β ∨ β where p ∈ P ROP S, k ∈ N, # ∈ {>, ≤, =} and α ∈ {, , , , ? , ? }. In the case of ρ1 − ρ2 # k, both ρi (i = 1, 2) are restricted to be children paths, that is, they have one of the following forms: , : p, [β] or : p[β]. RPQCD expressions are interpreted over data trees: > selects the entire set of nodes; α : p navigates through α and selects the p nodes; ρ1 /ρ2 is the composi- tions of paths; and ρ[β] selects the nodes denoted by ρ satisfying condition β. In particular, when β is ρ ≡ k, it holds whenever there is a node denoted by ρ whose data value is equal to k. ρ1 − ρ2 # k is true if and only if the number of nodes selected by ρ1 minus the number of nodes selected by ρ2 , satisfies constraint #k. Notice some syntactic sugar (notation) as ρ1 #ρ2 instead of ρ1 − ρ2 #0 can also be defined. Negation and disjunction are interpreted as expected. We now give a precise description on how RPQCD expressions are interpreted over data trees. 35 Definition 4 (Semantics). Given a data tree Γ = (N, R, L, D), RPQCD ex- pressions are interpreted as follows: Γ [[>]] =N × N Γ [[p]] = {(n, n) | p ∈ L(n)} n o Γ α [[α]] = (n1 , n2 ) | n1 → n2 n o Γ Γ [[α : p]] = (n1 , n2 ) ∈ [[α]] | p ∈ L(n2 ) Γ Γ Γ [[ρ1 /ρ2 ]] = [[ρ1 ]] ◦ [[ρ2 ]] n o Γ Γ Γ [[ρ[β]]] = (n1 , n2 ) ∈ [[ρ]] | n2 ∈ [[[β]]] n o Γ Γ [[[ρ]]] = n | (n, n0 ) ∈ [[ρ]] n n o n o o Γ Γ Γ [[[ρ1 − ρ2 # k]]] = n | n1 | (n, n1 ) ∈ [[ρ1 ]] − n2 | (n, n2 ) ∈ [[ρ2 ]] #k n o Γ Γ [[[ρ ≡ k]]] = n | (n0 , n) ∈ [[[ρ]]] , D(n) = k Γ Γ [[[¬β]]] =N \ [[[β]]] Γ Γ Γ [[β1 ∨ β2 ]] = [[[β1 ]]] ∪ [[[β2 ]]] α where n1 → n2 holds, if and only if, n1 is related to n2 through α in Γ . We also interpret RPQCD expressions with respect to a context, more precisely, the interpretation of a RPQCD expression ρ on n a data tree Γ from a subset o of Γ Γ nodes N 0 (of Γ ) is defined as follows: [[ρ]]N 0 = n0 | (n, n0 ) ∈ [[ρ]] , n ∈ N 0 We now define the standard query reasoning problems for RPQCD: empti- ness, containment and equivalence. Definition 5 (Reasoning). – We say a RPQCD expression ρ is empty, if and only if, for any data tree Γ , Γ we have that [[ρ]] 6= ∅. – Given two RPQCD expressions ρ1 and ρ2 , we say ρ1 is contained in ρ2 , Γ written ρ1 ⊆ ρ2 , if and only if, for any data tree Γ , we have that [[ρ1 ]] ⊆ Γ [[ρ2 ]] . – Given two RPQCD expressions ρ1 and ρ2 , we say ρ1 is equivalent to ρ2 , if and only if, ρ1 ⊆ ρ2 and ρ2 ⊆ ρ1 . 3 A Presburger Tree Logic We now describe a modal tree logic, as originally introduced in [7], with a fixed point, converse modalities and Presburger arithmetic operators. Definition 6 (Syntax). We inductively define the set of µTLIC formulas by the following grammar: φ := p | ¬φ | φ ∨ φ | hmi φ | µx.φ | φ − φ # k, where p ∈ P ROP S, m ∈ M ODS, # ∈ {>, ≤, =}, and k ∈ N coded in binary form. 36 µTLIC expressions are interpreted as subset tree nodes: propositions are used as node labels; negation is interpreted as set complement; disjunction as set union; modal formulas hmi φ holds in nodes where there is at least one m transition to a node supporting φ; the fixed point operator µx.φ is interpreted as a recursion operator; and Presburger formulas φ − ψ # k selects nodes whose φ children minus ψ children satisfy constraint # k. Before formally introduce the interpretation of µTLIC formulas, we first de- fine a valuation function V : X 7→ N of set of variables x over a set of nodes of a given tree. Definition 7 (Semantics). Given a tree T = (N, R, L) and a valuation V , µTLIC formulas are interpreted as follows: T [[p]]V = {n | p ∈ L(n)} T T [[¬φ]]V =N \ [[φ]]V T T T [[φ ∨ ψ]]V = [[φ]]V ∪ [[ψ]]V n o T T [[hmi φ]]V = n | R(n, m) ∩ [[φ]]V T \n T o [[µx.φ]]V = M | [[φ]]V [M /x ] ⊆ M n o T T T [[φ − φ # k]]V = n | R(n, ) ∩ [[φ]]V − R(n, ) ∩ [[ψ]]V # k Without loss of generality, we assume variables can only occur bounded, and in the scope of modal or counting formulas [7]. Furthermore equivalent negated normal forms can also be achieved by traditional De Morgan’s and modal rules: ¬ hmi φ := [m] ¬φ, ¬(φ ∨ ψ) := ¬φ ∧ ¬ψ, ¬µx.φ := νx.¬φ [x /¬x ], ¬(φ−ψ > k) := φ−ψ ≤ k, ¬(φ−ψ ≤ k) := φ−ψ > k, ¬(φ−ψ = k) := φ−ψ 6= k, and ¬(φ − ψ 6= k) := φ − ψ = k. We conclude this Section recalling the complexity of µTLIC. Theorem 1 ([7]). µTLIC is in EXPTIME-complete. 4 Logic characterization In this Section we give a characterization of RPQCD expressions in terms of µTLIC formulas. First we define a non-data version of data trees. Intuitively, data values in data trees are represented by children nodes labeled by a fresh proposition δ. For instance, if a node has value k, then its non-data version has k children labeled by δ. Then, Presburger formulas can be used to test values in non-data trees. Definition 8. Provided a data tree Γ = (N, R, L, D), we define the tree T (Γ ) = (N 0 , R0 , L0 ) as follows: – let Ni be a set of ki new nodes (N ∩ Ni = ∅) induced by data values of nodes S|N | in N , that is, for each ni ∈ N , D(ni ) = ki , then N 0 = N ∪ i=1 Ni ; 37 S|N | – let Ri = {ni } × {} × Ni , then R0 = R ∪ i=1 Ri ; S|N | – and let Li : Ni × {δ} be left total, then L0 = L ∪ i=1 Li , provided δ is a proposition not occurring in L, that is, for each n ∈ N , if (n, p) ∈ L, then δ 6= p. We now give a precise translation of regular paths with counting and data tests in terms of the logic. Definition 9. We define a translation function F of RPQCD expressions in terms of the logic as follows: F (>, C) := C ∧ ¬δ F (p, C) := p ∧ ¬δ ∧ C F (, C) := ¬δ ∧ hi C F (, C) := ¬δ ∧ hi C F ( , C) := ¬δ ∧ hi C F (, C) := ¬δ ∧ h i C F (? , C) := ¬δ ∧ µx. hi (C ∨ x) F (? , C) := ¬δ ∧ µx. hi (C ∨ x) F (α : p, C) := F (α, C) ∧ F (p, >) F (ρ1 /ρ2 , C) := F (ρ2 , F (ρ1 , C)) F (ρ[β], C) := F (ρ, C) ∧ G(β, >) where δ is a fresh proposition and G is a translation of qualifiers (Definition 10). Definition 10. We define a translation of qualifiers in terms of the logic as follows: G(>, C) := C ∧ ¬δ G(α, C) := F (α, C) G(p, C) := p ∧ ¬δ ∧ C G(α : p, C) := F (α, C ∧ p ∧ ¬δ) G(ρ1 /ρ2 , C) := G(ρ1 , G(ρ2 , C)) G(ρ[β], C) := G(ρ, G(β, >) ∧ C) G(¬β, C) := ¬G(β, C) G(β1 ∨ β2 , C) := G(β1 , C) ∨ G(β2 , C) ≡ G(ρ ≡ k, C) := G (ρ ≡ k, C) G(ρ1 −ρ2 # k, C) := G #(ρ1 , C)−G #(ρ2 , C) # k G≡ (> ≡ k, C) := (φδ ∧ hi G(>, C) = k G≡ (p ≡ k, C) := (φδ ∧ hi G(p, C)) = k G≡ (α ≡ k, C) := G(α, (φδ ∧ hi C) = k) G≡ (α : p ≡ k, C) := G(α, (φδ ∧ hi (C ∧ p)) = k) G≡ (ρ1 /ρ2 ≡ k, C) := G(ρ1 , G≡ (ρ2 ≡ k, C)) G≡ (ρ[β] ≡ k, C) := G≡ (ρ ≡ k, G(β, >) ∧ C) φδ := δ ∧ ¬p0 ∧ ¬ hi > G # (, C) := C ∧ ¬δ G # (: p, C) := p ∧ ¬δ ∧ C G # ( [β], C) := G(β, >) ∧ C G # (: p[β], C) := p ∧ ¬δ ∧ G(β, >) ∧ C provided that α is the dual relation of α, more precisely, =, =, ? =? , and α = α; and where p0 represents all other propositions distinct to δ (recall the set of propositions is finite). 38 Since translation of paths consider a context represented by formulas, we now give a non-data version of formulas. Intuitively, context formulas are indis- tinguishably interpreted over data and non-data trees. Definition 11 (Context formula). Given a formula φ in negated normal form, its corresponding context formula φC is inductively defined as follows: pC := p (¬p)C := ¬δ ∧ ¬p (φ ∨ ψ)C := φC ∨ ψ C (φ ∧ ψ)C := φC ∧ ψ C (hmi φ)C := ¬δ ∧ hmi φC ([m] φ)C := ¬δ ∧ [m] φC C C (µx.φ)C := φ µx.φ /x (νx.φ)C := φ νx.φ /x (φ − ψ # k)C := φC − ψ C # k Lemma 1. Given any data tree Γ , for any formula φ and any valuation V , we Γ T (Γ ) have that φC V = φC V . We now describe the main result of this paper: a characterization of regular paths with counting and data tests in terms of Presburger formulas. Theorem 2 (Logic characterization of data queries). For any ρ RPQCD expression, data tree Γ , µTLIC context formula φC , and any valuation V , we have the following: Γ T (Γ ) – [[ρ]][[φC ]]T (Γ ) = F ρ, φC V ; and V – F ρ, φC is of polynomial size with respect to q and φC . An immediate consequence of Theorems 1 and 2 is an EXPTIME bound for RPQCD reasoning. Corollary 1. Reasoning (emptiness, containment and equivalence) on regular path queries with counting and data tests (RPQCD) is in EXPTIME. 5 Discussion We introduced an extension of regular path expressions with counting and data tests. Counting operators express occurrence restrictions on children path ex- pressions, whereas data tests express (dis)equality relations among paths with respect to their data values. We give a characterization of the extension of reg- ular paths in terms of a Presburger logic originally introduced in [7]. Since the characterization is polynomial and the logic is closed under negation, the EXP- TIME bound of the logic is then imported for the emptiness, containment and equivalence of paths with counting and data tests. As a first further research per- spective we propose the study of the model checking problem of the Presburger logic (it is known a quadratic-time model checking algorithm for the logic with- out converse modalities [19]). This would imply complexity bound for the query evaluation of paths with counting and data tests. As another future work, we propose the study of further data test extensions of regular paths, in the setting of expressive modal logics with efficient reasoning Fischer-Ladner algorithms as in [7]. 39 Acknowledgment. This work was partially developed under the support of the Mexican National Science Council (CONACYT) in the scope of the Cátedras CONACYT project “Infraestructura para Agilizar el Desarrollo de Sistemas Cen- trados en el Usuario” (Ref 3053). References 1. ten Cate, B., Litak, T., Marx, M.: Complete axiomatizations for xpath fragments. J. Applied Logic 8(2) (2010) 153–172 2. Bárcenas, E., Genevès, P., Layaı̈da, N., Schmitt, A.: Query reasoning on trees with types, interleaving, and counting. In: IJCAI, International Joint Conference on Artificial Intelligence. (2011) 3. Figueira, D., Figueira, S., Areces, C.: Model theory of xpath on data trees. part I: bisimulation and characterization. J. Artif. Intell. Res. (JAIR) 53 (2015) 271–314 4. Bárcenas, E., Lavalle, J.: Global numerical constraints on trees. Logical Methods in Computer Science 10(2) (2014) 5. Figueira, D.: On xpath with transitive axes and data tests. In: Symposium on Principles of Database Systems, PODS. (2013) 249–260 6. Figueira, D.: Decidability of downward xpath. ACM Trans. Comput. Log. 13(4) (2012) 34 7. Bárcenas, E., Lavalle, J.: Expressive reasoning on tree structures: Recursion, in- verse programs, Presburger constraints and nominals. In: Mexican International Conference on Artificial Intelligence, MICAI. (2013) 80–91 8. Jacquemard, F., Segoufin, L., Dimino, J.: Fo2(<, +1, ˜) on data trees, data tree automata and branching vector addition systems. Logical Methods in Computer Science 12(2) (2016) 9. Bojańczyk, M., Place, T.: Toward model theory with data values. In: Automata, Languages, and Programming - International Colloquium, ICALP. (2012) 10. Bojańczyk, M., David, C., Muscholl, A., Schwentick, T., Segoufin, L.: Two-variable logic on data words. ACM Trans. Comput. Log. 12(4) (2011) 27 11. Bojańczyk, M., Muscholl, A., Schwentick, T., Segoufin, L.: Two-variable logic on data trees and XML reasoning. J. ACM 56(3) (2009) 12. Figueira, D.: Forward-xpath and extended register automata on data-trees. In: Database Theory - ICDT, International Conference. (2010) 13. Figueira, D., Segoufin, L.: Bottom-up automata on data trees and vertical xpath. In: Symposium on Theoretical Aspects of Computer Science, STACS. (2011) 14. Dal-Zilio, S., Lugiez, D., Meyssonnier, C.: A logic you can count on. In: Symposium on Principles of Programming Languages, POPL. (2004) 15. Seidl, H., Schwentick, T., Muscholl, A.: Counting in trees. In: Logic and Automata. (2008) 16. Demri, S., Lugiez, D.: Complexity of modal logics with presburger constraints. J. Applied Logic 8(3) (2010) 233–252 17. Kupke, C., Pattinson, D., Schröder, L.: Reasoning with global assumptions in arithmetic modal logics. In: Fundamentals of Computation Theory, FCT. (2015) 18. Barguñó, L., Creus, C., Godoy, G., Jacquemard, F., Vacher, C.: Decidable classes of tree automata mixing local and global constraints modulo flat theories. Logical Methods in Computer Science 9(2) (2013) 19. Bárcenas, E., Benı́tez-Guerrero, E., Lavalle, J.: On the model checking of the graded µ-calculus on trees. In: Mexican International Conference on Artificial Intelligence, MICAI. (2015) 40