Querying XML With Lambda Calculi Pavel Loupal Dept. of Computer Science, FEE CTU Prague, Karlovo nám. 13, 121 35 Prague, Czech Republic loupalp@fel.cvut.cz Abstract way. The goal of its authors is to implement some of W3C standards – XML, XPath, XSLT – in the Scheme The aim of this paper is to outline an uni- language [11]. SXML takes in account only first ver- form functional approach useful both for con- sions of these specifications but not the latest drafts. structing query languages for XML and also The main components of the project are SSAX, SXML, for formal description of their semantics. This SXPath, and SXSLT [13, 14]. framework offers an alternative way to to- The key idea is the usage of S-expressions. S- day’s mainstream query languages – XPath expressions are known from the LISP language where and XQuery. With respect to the goal of the there can be either single objects such as numbers, VLDB PhD Workshop it is focused more on LISP atoms including the special atoms or pairs. The the concept than on the complete solution. framework claims its fully conformant to XML speci- fication. Regarding to [13] there is a formal algorithm 1 XML and Query Languages how to rewrite XPath queries to SXPath expressions with identical semantics. SXSLT is apparently an im- No doubt XML [4] is nowadays moving the world and plementation of XSLT 1.0 semantics. Unfortunatelly thus not surprisingly it might be found almost every- from published papers it is not exactly obvious that where. Querying XML data is the essential operation the full XSLT standard was implemented, authors only for many algorithms and therefore many researchers mention their experiments with real XML documents. focus on this field. Mainstream query languages – XPath [5, 7], used for locating nodes in an XML in- stance, and XQuery 1.0 [3] (more or less a SQL–like 2 Motivation language) – are the results developed by the wide com- Evolution of a query language usually brings some munity of contributors. changes in its syntax or/and the semantics. For ex- The XML specification itself does not define any ample, the progress from XPath 1.0 to XPath 2.0 (i.e. data model for XML instances therefore successive sub–specification of XQuery 1.0) brought a new data query languages had to define a theoretical base for model for XML. It was a logical and necessary step their syntax and semantics. XPath 1.0 is based on the forced by requirements given for the new language. XML Infoset [8], XQuery 1.0 grounds on a newly de- Our idea is to use such formalism (formal system) as a veloped ”XQuery 1.0 and XPath 2.0 Data Model” [12]. base for the language that is universal enough to avoid Based upon this data model its semantics is formally big changes when enriching the language with new con- described in [10]. structs. Therefore the idea of a functional approach There are various different query languages for combined together with basic lambda calculi opera- XML – especially those used as predecessors of XQuery tions (lambda abstractions and lambda applications) – e.g. Lorel [1], Quilt [6] or XML-QL [9]. seems to be very simple and not restrictive, rather it From our perspective one of the most interesting allows to be easily extended in the future. languages is the SXML framework. It seems to be the Our work tries to link to the idea of the XML-λ only framework for manipulating XML in a functional Framework (see Section 3), explore it further and study its various aspects from different points of view. c 2006 for the individual paper by the paper’s authors. Copying permitted for private and scientific purposes. Re-publication Its author already identified some issues and draw- of material on this page requires permission by the copyright backs of the framework. We see one of the biggest owners. issues the missing comparison of the proposed query Proceedings of the VLDB2006 Ph.D. Workshop, language with the XQuery language. The question of Seoul, Korea, 2006 expressive power of these two languages is very chal- lenging because these approaches represent different • zero or one: T ? is a type over B. formal systems and seems to be not solved yet. To lay out more specific goals for our consecu- 4. alternative: Let T1 and T2 be types. Then (T1 |T2 ) tive work we formulate the following aims: (a) Uti- is a type over B. lize the XML-λ Framework for formal description of XQuery semantics, (b) Construct a XQuery-to-XML-λ 5. sequence: Let T1 , . . . , Tn be types. Then query compiler (and vice–versa) and compare expres- (T1 , . . . , Tn ) is a type over B. sive power of XQuery and XML-λ query languages, (c) Evaluate functional approach for various aspects 6. named type: Let T be a type given by a step from related to querying XML. 3.-5. Let tag ∈ N AM E. Then tag : T is a type of B. 3 The XML-λ Framework Note here that this type system Treg is still suitable XML-λ framework is an idea of using functional ap- to describe types not only in a concrete data structure proach together with lambda calculi [2] and utilize it (in our case tree structure) but for all data that might for querying XML data. The proposal was originally be tagged with name - for example data accordant to developed by Pokorný [16, 17]. the relational model. The following sections outline the theoretical base Having the Treg type system we have to extend it of the XML-λ Framework and the query language built to be able to work with XML data. We build the upon it. In Section 3.3 we show an example of a query type system TE induced by Treg . Key idea is to define evaluation. abstract elements that are particular XML elements with some content and also define a set containing all 3.1 Concept abstract elements within an XML instance – E . Type System TE . Let Treg over base B be a type The type system is built on base B – a set contain- system from definition 3.1 and E is the set of abstract ing a finite number of base types S1 , . . . , Sk (k ≥ 1). elements. Then type system TE induced by Treg is the Type hierarchy is then created by following inductive least set containing type given by this rule: definition: Type System T . Let B is a set of primitive types • Let tag : T ∈ Treg . Then T AG : T is a member of S1 . . . Sn , n ≥ 1. Type System T over base B is the TE . (Replacement of all tags in tag : T by upper least set containing types given by 1.-4. case version) 1. base type: each member of B is type over B Note one remark: we can see the type system TE 2. functional type: if T1 and T2 are types over B, as function container that adds functions for extract- then (T1 → T2 ) is also a type over B ing data values from elements (via abstractions and projections) with two ways 3. n-tuple type: if T1 , . . . , Tn (n ≥ 1) are types over B, then (T1 , . . . , Tn ) is type over B • simple element: if tag : String ∈ Treg , then (E → tag : String) ∈ TE 4. union type: if T1 , . . . , Tn (n ≥ 1) are types over B, then (T1 + . . . + Tn ) is type over B • compound element: if tag : T ∈ Treg , then (E → T 0 ) ∈ TE Subsequently we define a regular type system Treg that extends type system T with regular constructs: Type System Treg . Let B= {String, Bool}, let 3.2 XML-λ Query Language N AM E be a set of names. Type System Treg is the A typical query has a main (body) part – an expression least set containing types given by 1.-6. to be evaluated over data – and a constructor part that wraps query result and forms the XML output. 1. Every member of the base B is an (primitive) type XML-λ’s Query Language (XLQL) is based on λ-terms over B. defined over the type system TE . 2. named character data: Let tag ∈ N AM E. Then Main constructs of the language are variables, con- tag : String is an (elementary) type over B, stants, tuples, use of projections and λ-calculus op- tag : is an (empty elementary) type over B. erations – abstractions and applications. Tagged terms might be used for declaring functions. Syn- 3. Let T be a group type or named character data. tax of this language is similar to λ-calculus expres- Then sion thus queries are structured as λ-expresions, i.e. λ . . . (λ . . . (expression) . . .). In addition, there are • zero or more: T ∗ is a type over B. also typical constructs such as logical connectives, con- • one or more: T + is a type over B. stants or comparison predicates. XML-λ Query Language is inductively defined as the least set containing all terms created by applica- tion of following rules: Jaroslav Let T, T1 , . . . , Tn , n ≥ 1 be members of T . Then Pokorny 1. variable: each variable of type T is a term of type T Karel Richta 2. constant: each constant (member of F) of type T is a term of type T XML Technology </name> 3. application: if M is a term of type ((T1 , . . . , Tn ) → <price> 155.00 </price> T ) and N1 , . . . , Nn are (in the same order) types </book> T1 , . . . , Tn , then M (N1 , . . . , Nn ) is a term of type T Figure 2: An XML instance 4. λ-abstraction: if x1 , . . . , xn are distinct vari- Upon this base we construct type system TE of func- ables of types T1 , . . . , Tn and M is a term of tions based by given XML schema (in our case DTD). type T , then λx1 , . . . , xn (M ) is a term of type See the list of types available: ((T1 , . . . , Tn ) → T ) BOOK : (AU T HOR+, T IT LE, P RICE), 5. n-tuple: if N1 , . . . , Nn are terms of types AU T HOR : (F IRST, LAST ), T1 , . . . , Tn , then (N1 , . . . , Nn ) is a term of type F IRST : String, (T1 , . . . , Tn ) LAST : String, T IT LE : String, 6. projection: if (N1 , . . . , Nn ) is a term of type P RICE : String (T1 , . . . , Tn ), then N1 , . . . , Nn are terms of types T1 , . . . , T n Now, let us have a look at the following simple query 7. tagged term: if N is a term of type N AM E and solving our example: M is a term of type T then N : T is a term of type (E → T ). λf (/book/author(a) and a/f irst = f ) Simple implementation of this language is available Evaluation of this query takes place in following in [18]. way: this query is composed of one high-level λ-term and contains two variables called a and f . Naviga- 3.3 Example of a Query tion over the structure of elements is implemented as continuous λ-applications combined with projections Following example should give more detailed view to (performed by element name). When evaluating the an evaluation of a XML-λ query. Let us consider the value of the query first the value of a is evaluated DTD and a conforming XML instance as shown in (note that /book/author(a) is syntactically equivalent Figure 1 and Figure 2 and a query that will return to author(book(a))). This term is evaluated using se- first names of all authors mentioned in the document. quent application of book and author functions. Then <!ELEMENT book (author+, the variable a contains an abstract element of type title, price)> author. We apply to this element function author that <!ELEMENT author (first, last)> returns a n-tuple of elements from E. Last step is pro- <!ELEMENT first (#PCDATA)> jection from this n-tuple into the first component. <!ELEMENT last (#PCDATA)> The result of this operation is a particular abstract <!ELEMENT title (#PCDATA)> element. <!ELEMENT price (#PCDATA)> Hereby we receive the output (without enveloping elements): Jaroslav, Karel. 3.4 Utilizing the Functional Approach Figure 1: A Document Type Definition Example In the previous sections we describe a query language Solution First we specify a set E of all abstract based on the XML-λ Framework. It uses functions elements contained in XML instance as follows (note as a data model for XML documents and lambda cal- that subscripts are here only to distinguish abstract culi abstractions and applications for development of elements of the same type): a query language suitable for querying XML. E = {book, author1 , f irst1 , last1 , author2 , f irst2 , Beside that we can also use this data model for dif- last2 , name, price} ferent purposes. Due to universality of the λ-calculus, we can construct a language for modifying XML 5 Future Work data. Generally it means performing changes (addi- Future work should lead into a successful completion tions/removals) in the set E of abstract elements for of the dissertation thesis. Therefore we need to define a particular XML instance and eventual changes in concrete tasks and formulate consecutive experiments. the type system TE . With this approach we can also The proposed dissertation thesis investigates an uti- be able to express integrity constraints for XML data. lization of the λ-calculus based framework for descrip- This research direction however lays out of scope of tion of query language semantics. We plan to formu- this paper. late both denotational and structured operational se- mantics of the XQuery language. Having this formal- 4 Relationship Between XQuery and ism available we will compare the expressive power XML-λ of these languages and further study their properties. XQuery [3] is a mainstream language for querying Our primary aim is to define the existing semantics XML. The language uses expressions as its fundamen- of XQuery in a functional way as a base for ongoing tal constructs. Basic expressions are literals and vari- experiments and then compare their expressive power ables, for navigating in XML documents it uses XPath by finding transformations between queries in XQuery expressions. Related specification [15] contains an ex- and XML-λ. tensive list of predefined numeric, arithmetic, string or There are two concrete ongoing tasks – specification date-time functions. of XQuery semantics by using the XML-λ Framework One important type of expression are the FLWOR and consecutive construction of transformation engine expressions. FLWOR is a shortcut for For, Let, Where, for converting XQuery and XML-λ queries. Order-By, Return statements that are used in the same way as in imperative languages. This type of expres- 5.1 Specification of XQuery Semantics Using sion is sometimes compared to SQL SELECT state- λ-calculus ment known from relational databases. The functional approach applied in XML-λ might be The goal of our research is to explore the relation- used not only for constructing a query language but ship between XQuery and XML-λ. Thus, in XQuery also for a different purpose – description of seman- we can write a query returning all first names in our tics of various query languages. Within our research bibliography database in following way: we want to describe the semantics of the mainstream XML query language XQuery with help of the func- <result> { tional approach based on λ-calculus – XML-λ. Tak- for $x in doc("bib.xml")/book/author ing into account the type system used we expect the return ( semantics to be easier to express in comparison with <fname> existing W3C semantics. {$x/first/text()} With respect to the scope of the XQuery semantics </fname>) specification this topic represents a complex theoreti- } </result> cal research. A query with the same semantics but written in 5.2 Bi-directional Transformation of XQuery XML-λ has (of course) different syntax as shown be- and XML-λ Queries low: Expression of the XQuery semantics with help of xmldata("bib.xml") λ-calculus gives us an opportunity to compare expres- lambda fname f sive power of XQuery and XML-λ. Because of the fact (/book/author(a) and a/first = f) that the semantics of XML-λ query language is closely related to its syntax we can propose a method for con- Both queries however return the same result: version of queries based on language’s semantics. The result of this aim is to have a transformation engine <result> and its implementation for converting XQuery queries <fname> Jaroslav </fname> in XML-λ and vice–versa available. The conversion <fname> Karel </fname> process is generally a transformation of an input se- </result> mantic tree in one language to an output semantic tree in the other language. We want to show that all queries in XQuery can be rewritten into XML-λ and vice–versa. This claim should be backed by formal description of denotational 6 Conclusion and operational semantics of both languages and by This paper outlines a framework for querying XML finding a transformation between semantic subtrees of data based on a functional approach combined with all language constructs. lambda calculi operations. The functional approach is not common nowadays but offers a universal way [7] J. Clark and S. DeRose. XML Path that might be used as a base for constructing various Language (XPath) 1.0, November 1999. query languages for XML. Beside that it can be also http://www.w3.org/TR/xpath. used for specifying semantics of these languages. Hav- ing the semantics formalized with the same tool we [8] J. Cowan and R. Tobin. XML infor- can compare their expressive power and evaluate their mation set (second edition), April 2004. relationship. http://www.w3.org/TR/2004/REC-xml-infoset- For consecutive work to lead into a dissertation the- 20040204/. sis we plan to describe semantics of XQuery using the [9] A. Deutsch, M. F. Fernandez, D. Florescu, A. Y. functional framework. With this formalism we can Levy, and D. Suciu. XML–QL: A query language construct a XQuery-to-XML-λ compiler (and vice– for XML. The Query Language Workshop (QL), versa) by transformation of respective semantic trees. Cambridge, MA, 1998. These transformations describe relationships between various language constructs. With knowledge of these [10] D. Draper, P. Fankhauser, M. Fernández, relationships we can then compare XQuery’s expres- A. Malhotra, K. Rose, M. Rys, J. Siméon, sive power with a query language based on lambda and P. Wadler. XQuery 1.0 and XPath calculi. 2.0 formal semantics, September 2005. The main contribution of the thesis should be seen http://www.w3.org/TR/xquery-semantics/. in using functional approach for description of XQuery semantics and for evaluation of properties of lambda [11] R. K. Dybvig. The Scheme Programming Lan- calculi based framework and its suitability for querying guage. The MIT Press, 2003. Available online at XML. http://www.scheme.com/tspl3/. [12] M. Fernández, A. Malhotra, J. Marsh, 7 Acknowledgements M. Nagy, and N. Walsh. XQuery 1.0 and This research has been partially supported by MŠMT XPath 2.0 Data Model, September 2005. under research program MSM 6840770014. http://www.w3.org/TR/xpath-datamodel/. References [13] O. Kiselyov. SXML specification, 2004. http://okmij.org/ftp/Scheme/SXML.html. [1] S. Abiteboul, D. Quass, J. McHugh, J. Widom, and J. L. Wiener. The Lorel query language for [14] K. Lisovsky. SXPath: XPath made functional. semistructured data. International Journal on In International Lisp Conference ILC 2003, New Digital Libraries, pages 68–88, 1997. York, US, October 2003. [2] H. Barendregt. Lambda calculi with types. In [15] A. Malhotra, J. Melton, and N. Walsh. Handbook of Logic in Computer Science, Vol- XQuery 1.0 and XPath 2.0 Func- umes 1 (Background: Mathematical Structures) tions and Operators, September 2005. and 2 (Background: Computational Structures), http://www.w3.org/TR/xpath-functions/. Abramsky & Gabbay & Maibaum (Eds.), Claren- don, volume 2. 1992. [16] J. Pokorný. XML functionally. In B. C. Desai, Y. Kioki, and M. Toyama, editors, Proceedings [3] S. Boag, D. Chamberlin, M. F. Fernández, D. Flo- of IDEAS2000, pages 266–274. IEEE Comp. So- rescu, J. Robie, and J. Simeon. XQuery 1.0: ciety, 2000. An XML Query Language, September 2005. http://www.w3.org/TR/xquery/. [17] J. Pokorný. XML-λ: an extendible framework for manipulating XML data. In Proceedings of BIS [4] T. Bray, F. Yergeau, J. Cowan, J. Paoli, 2002, Poznan, 2002. C. M. Sperberg-McQueen, and E. Maler. Ex- tensible markup language (XML) 1.1, February [18] P. Šárek. Implementation of the XML lambda 2004. http://www.w3.org/TR/2004/REC-xml11- language. Master’s thesis, Dept. of Software En- 20040204/. gineering, Charles University, Prague, 2002. [5] D. Chamberlin, A. Berglund, and e. a. Scott Boag. XML Path Language (XPath) 2.0, September 2005. http://www.w3.org/TR/xpath20/. [6] D. Chamberlin, J. Robie, and D. Florescu. Quilt: An XML query language for heterogeneous data sources. WebDB, Lecture Notes in Computer Sci- ence, volume 1997.