=Paper=
{{Paper
|id=Vol-1378/paper21
|storemode=property
|title=Saturation, Definability, and Separation for XPath on Data Trees
|pdfUrl=https://ceur-ws.org/Vol-1378/AMW_2015_paper_21.pdf
|volume=Vol-1378
|dblpUrl=https://dblp.org/rec/conf/amw/AbriolaDF15
}}
==Saturation, Definability, and Separation for XPath on Data Trees==
Saturation, Definability, and Separation for XPath on Data Trees Sergio Abriola1 , Marı́a Emilia Descotte1 , and Santiago Figueira1,2 1 University of Buenos Aires, Argentina 2 CONICET, Argentina Abstract. We study the expressive power of some fragments of XPath equipped with (in)equality tests over data trees. Our main results are the definability theorems, which give necessary and sufficient conditions under which a class of data trees can be defined by a node expression or set of node expressions, and our separation theorems, which give sufficient conditions under which two disjoint classes of data trees can be separated by a class of data trees definable in XPath. Keywords: XPath · data tree · bisimulation · definability · first-order logic · ultraproduct · saturation · separation. 1 Introduction The abstraction of an XML document is a data tree, i.e. a tree whose every node contains a tag or label (such as LastName) from a finite domain, and a data value (such as Smith) from an infinite domain. XPath is the most widely used query language for XML documents; it is an open standard and consti- tutes a World Wide Web Consortium (W3C) Recommendation [3]. XPath= has syntactic operators to navigate the tree using the ‘child’, ‘parent’, ‘sibling’, etc. accessibility relations, and can make tests on intermediate nodes. It can express properties of the underlying tree structure of the XML document, such as “the root of the tree has a child labeled a and a child labeled b”, and it can express conditions on the actual data contained in the attributes, such as “the root of the tree has two children with same tag a but different data value”. First, we provide notions of saturation and ultraproducts that are adequate for XPath= , and show that bisimulation coincides with logical equivalence over saturated data trees. Using these tools, we show definability theorems, giving necessary and sufficient conditions under which a class of data trees can be defined by a node expression or set of node expressions of XPath= . Finally we give separation results, providing sufficient conditions under which two disjoint classes of data trees can be separated by a class of data trees definable in XPath= . While on this work we will only show results for the fragment of XPath that can only navigate via the ‘child’ accessibility relation, similar results hold for the vertical fragment having both the ‘child’ and ‘parent’ navigational operators. The results on definability of this paper appeared originally in [1]. 2 Preliminaries Data trees. We say that T is a data tree if it is a tree from Trees(A × D), where A is a finite set of labels and D is an infinite set of data values. The data of a node x is denoted data(x), and its label is label (x). The set of nodes of a data tree T is denoted T . Downward XPath with data tests. We consider a fragment of XPath that corre- sponds to the navigational part of XPath 1.0 with data equality and inequality. XPath= is a two-sorted language, with path expressions (that we write α, β, γ) and node expressions (that we write ϕ, ψ, η). The downward XPath, no- tated XPath↓= is defined by mutual recursion as follows: α, β ::= o | [ϕ] | αβ | α ∪ β o ∈ {ε, ↓} ϕ, ψ ::= a | ¬ϕ | ϕ ∧ ψ | ϕ ∨ ψ | hαi | hα = βi | hα 6= βi a∈A Node expressions represent properties on nodes. They are evaluated in nodes, and, intuitively, hα = βi is true at x if there are two paths starting in x, one satisfying the property α, and the other satisfying the property β, which end in nodes with equal data value. On the other hand, path expressions represent properties on paths. They are evaluated in pairs of nodes. For instance ↓ is true at (x, y) if y is a child of x, and [ϕ] is true at x, y if x = y and x satisfies ϕ. Let T and T 0 be data trees, and let u ∈ T , u0 ∈ T 0 . We say that T , u and T , u0 are logically equivalent for XPath↓= if no XPath↓= can distinguish node 0 u from u0 . Bisimulations. Notions of bisimulation present a way to determine whether two pointed data trees can be distinguished by a series of moves in XPath. We do not reproduce them here, but it is worth mentioning that they are forms of back-and-forth conditions over two data trees. The main previous result in the literature establishing the connection between bisimulation and equivalence is the following: Theorem 1. [4] If T , u is bisimilar to T 0 , u0 , then they are logically equivalent. If T and T 0 are finitely branching, the other implication also holds. 3 Saturation and quasi-ultraproducts We introduce a notion of saturation for the downward fragment of XPath, and show that the reverse implication of Theorem 1 is true over saturated data trees. Saturation is the key ingredient to show the Definability theorems, but their use lays hidden in the proof. Saturation. Let hΣ1 , . . . , Σn i and hΓ1 , . . . , Γm i be tuples of sets of XPath↓= - formulas. Given a data tree T and u ∈ T , we say that hΣ1 , . . . , Σn i and hΓ1 , . . . , Γm i are =↓n,m -satisfiable [resp. 6=↓n,m -satisfiable] at T , u if there exist v0 → v1 → · · · → vn ∈ T and w0 → w1 → · · · → wm ∈ T such that u = v0 = w0 and 1. for all i ∈ {1, . . . , n}, T , vi |= Σi ; 2. for all j ∈ {1, . . . , m}, T , wj |= Γj ; and 3. data(vn ) = data(wm ) [resp. data(vn ) 6= data(wm )]. We say that hΣ1 , . . . , Σn i and hΓ1 , . . . , Γm i are =↓n,m -finitely satisfiable [resp. 6=↓n,m -finitely satisfiable] at T , u if for every finite Σi0 ⊆ Σi and finite Γj0 ⊆ Γj , we have that hΣ10 , . . . , Σn0 i and hΓ10 , . . . , Γm 0 i are =↓n,m -satisfiable [resp. 6=↓n,m - satisfiable] at T , u. Definition 2. We say that a data tree T is ↓-saturated if for every n, m ∈ N, every pair of tuples hΣ1 , . . . , Σn i and hΓ1 , . . . , Γm i of sets of XPath↓= -formulas, every u ∈ T , and ? ∈ {=, 6=}, the following is true: if hΣ1 , . . . , Σn i and hΓ1 , . . . , Γm i are ?↓n,m -finitely satisfiable at T , u then hΣ1 , . . . , Σn i and hΓ1 , . . . , Γm i are ?↓n,m -satisfiable at T , u. Proposition 3. For ↓-saturated data trees, bisimulation coincides with logical equivalence. Quasi-ultraproducts We introduce the notion of quasi-ultraproduct, a variant of the usual notion of first-order model theory, which will be needed for the definability theorems. Some of our results for quasi-ultraproducts make use of the fundamental theorem of ultraproducts (see e.g. [2, Thm. 4.1.9]). Definition 4. Suppose (Ti , ui )i∈I is a family of pointed data trees, U is an ul- trafilter over I, T ∗ is the ultraproduct of (Ti , ui )i∈I , and u∗ is the ultralimit of (ui )i∈I . The ↓-quasi ultraproduct of (Ti , ui )i∈I modulo U is the pointed data tree (T ∗ |u∗ , u∗ ), where T ∗ |u∗ denotes the subtree of T ∗ induced by al the descen- dants of u∗ . As a particular case one has the notion of ↓-quasi ultrapower. 4 Definability Definability theorems address the question of which properties of models can be defined via formulas of the logic. If K is a class of pointed data trees, we denote its complement by K. Theorem 5. Let K be a class of pointed data trees. Then K is definable by a set of XPath↓= -formulas iff K is closed under ↓-bisimulations and ↓-quasi ultra- products, and K is closed under ↓-quasi ultrapowers. Theorem 6. Let K be a class of pointed data trees. Then K is definable by an XPath↓= -formula iff both K and K are closed under ↓-bisimulations and ↓-quasi ultraproducts. The notion of `-bisimulation is a restricted version of ↓-bisimulations. It has been shown to coincide with the notion of `-equivalence, which informally means indistinguishable by XPath↓= formulas that cannot “see” beyond ` ‘child’-steps from the current point of evaluation. Theorem 7. Let K be a class of pointed data trees. Then K is definable by a formula of XPath↓= iff K is closed by `-bisimulations for XPath↓= for some `. 5 Separation Separation theorem provide conditions under which two disjoint classes of pointed models can be separated by a class definable in the logic. Theorem 8. Let K1 and K2 be two disjoint classes of pointed data trees such that K1 is closed under ↓-bisimulations and ↓-quasi ultraproducts and K2 is closed under ↓-bisimulations and ↓-quasi ultrapowers. Then there exists a third class K which is definable by a set of XPath↓= -formulas, contains K1 and is disjoint from K2 . Theorem 9. Let K1 and K2 be two disjoint classes of pointed data trees closed under ↓-bisimulations and ↓-quasi ultraproducts. Then there exists a third class K which is definable by an XPath↓= -formula, contains K1 and is disjoint from K2 . References 1. Sergio Abriola, Marı́a Emilia Descotte, and Santiago Figueira. Definability for down- ward and vertical Xpath on data trees. In Logic, Language, Information, and Com- putation - 21st International Workshop, WoLLIC 2014, Valparaı́so, Chile, Septem- ber 1-4, 2014. Proceedings, pages 20–35, 2014. 2. C.C. Chang and H.J. Keisler. Model theory. Studies in logic and the foundations of mathematics. North-Holland, 1990. 3. J. Clark and S. DeRose. XML path language (XPath). Website, 1999. W3C Recommendation. http://www.w3.org/TR/xpath. 4. D. Figueira, S. Figueira, and C. Areces. Basic model theory of XPath on data trees. In ICDT, pages 50–60, 2014.