Querying XML With Lambda Calculi
Pavel Loupal
Dept. of Computer Science, FEE CTU Prague,
Karlovo nám. 13, 121 35
Prague, Czech Republic
loupalp@fel.cvut.cz
Abstract way. The goal of its authors is to implement some of
W3C standards – XML, XPath, XSLT – in the Scheme
The aim of this paper is to outline an uni- language [11]. SXML takes in account only first ver-
form functional approach useful both for con- sions of these specifications but not the latest drafts.
structing query languages for XML and also The main components of the project are SSAX, SXML,
for formal description of their semantics. This SXPath, and SXSLT [13, 14].
framework offers an alternative way to to- The key idea is the usage of S-expressions. S-
day’s mainstream query languages – XPath expressions are known from the LISP language where
and XQuery. With respect to the goal of the there can be either single objects such as numbers,
VLDB PhD Workshop it is focused more on LISP atoms including the special atoms or pairs. The
the concept than on the complete solution. framework claims its fully conformant to XML speci-
fication. Regarding to [13] there is a formal algorithm
1 XML and Query Languages how to rewrite XPath queries to SXPath expressions
with identical semantics. SXSLT is apparently an im-
No doubt XML [4] is nowadays moving the world and plementation of XSLT 1.0 semantics. Unfortunatelly
thus not surprisingly it might be found almost every- from published papers it is not exactly obvious that
where. Querying XML data is the essential operation the full XSLT standard was implemented, authors only
for many algorithms and therefore many researchers mention their experiments with real XML documents.
focus on this field. Mainstream query languages –
XPath [5, 7], used for locating nodes in an XML in-
stance, and XQuery 1.0 [3] (more or less a SQL–like 2 Motivation
language) – are the results developed by the wide com- Evolution of a query language usually brings some
munity of contributors. changes in its syntax or/and the semantics. For ex-
The XML specification itself does not define any ample, the progress from XPath 1.0 to XPath 2.0 (i.e.
data model for XML instances therefore successive sub–specification of XQuery 1.0) brought a new data
query languages had to define a theoretical base for model for XML. It was a logical and necessary step
their syntax and semantics. XPath 1.0 is based on the forced by requirements given for the new language.
XML Infoset [8], XQuery 1.0 grounds on a newly de- Our idea is to use such formalism (formal system) as a
veloped ”XQuery 1.0 and XPath 2.0 Data Model” [12]. base for the language that is universal enough to avoid
Based upon this data model its semantics is formally big changes when enriching the language with new con-
described in [10]. structs. Therefore the idea of a functional approach
There are various different query languages for combined together with basic lambda calculi opera-
XML – especially those used as predecessors of XQuery tions (lambda abstractions and lambda applications)
– e.g. Lorel [1], Quilt [6] or XML-QL [9]. seems to be very simple and not restrictive, rather it
From our perspective one of the most interesting allows to be easily extended in the future.
languages is the SXML framework. It seems to be the Our work tries to link to the idea of the XML-λ
only framework for manipulating XML in a functional Framework (see Section 3), explore it further and
study its various aspects from different points of view.
c 2006 for the individual paper by the paper’s authors. Copying
permitted for private and scientific purposes. Re-publication
Its author already identified some issues and draw-
of material on this page requires permission by the copyright backs of the framework. We see one of the biggest
owners. issues the missing comparison of the proposed query
Proceedings of the VLDB2006 Ph.D. Workshop, language with the XQuery language. The question of
Seoul, Korea, 2006 expressive power of these two languages is very chal-
lenging because these approaches represent different • zero or one: T ? is a type over B.
formal systems and seems to be not solved yet.
To lay out more specific goals for our consecu- 4. alternative: Let T1 and T2 be types. Then (T1 |T2 )
tive work we formulate the following aims: (a) Uti- is a type over B.
lize the XML-λ Framework for formal description of
XQuery semantics, (b) Construct a XQuery-to-XML-λ 5. sequence: Let T1 , . . . , Tn be types. Then
query compiler (and vice–versa) and compare expres- (T1 , . . . , Tn ) is a type over B.
sive power of XQuery and XML-λ query languages,
(c) Evaluate functional approach for various aspects 6. named type: Let T be a type given by a step from
related to querying XML. 3.-5. Let tag ∈ N AM E. Then tag : T is a type
of B.
3 The XML-λ Framework Note here that this type system Treg is still suitable
XML-λ framework is an idea of using functional ap- to describe types not only in a concrete data structure
proach together with lambda calculi [2] and utilize it (in our case tree structure) but for all data that might
for querying XML data. The proposal was originally be tagged with name - for example data accordant to
developed by Pokorný [16, 17]. the relational model.
The following sections outline the theoretical base Having the Treg type system we have to extend it
of the XML-λ Framework and the query language built to be able to work with XML data. We build the
upon it. In Section 3.3 we show an example of a query type system TE induced by Treg . Key idea is to define
evaluation. abstract elements that are particular XML elements
with some content and also define a set containing all
3.1 Concept abstract elements within an XML instance – E .
Type System TE . Let Treg over base B be a type
The type system is built on base B – a set contain-
system from definition 3.1 and E is the set of abstract
ing a finite number of base types S1 , . . . , Sk (k ≥ 1).
elements. Then type system TE induced by Treg is the
Type hierarchy is then created by following inductive
least set containing type given by this rule:
definition:
Type System T . Let B is a set of primitive types
• Let tag : T ∈ Treg . Then T AG : T is a member of
S1 . . . Sn , n ≥ 1. Type System T over base B is the
TE . (Replacement of all tags in tag : T by upper
least set containing types given by 1.-4.
case version)
1. base type: each member of B is type over B
Note one remark: we can see the type system TE
2. functional type: if T1 and T2 are types over B, as function container that adds functions for extract-
then (T1 → T2 ) is also a type over B ing data values from elements (via abstractions and
projections) with two ways
3. n-tuple type: if T1 , . . . , Tn (n ≥ 1) are types over
B, then (T1 , . . . , Tn ) is type over B • simple element: if tag : String ∈ Treg , then (E →
tag : String) ∈ TE
4. union type: if T1 , . . . , Tn (n ≥ 1) are types over B,
then (T1 + . . . + Tn ) is type over B • compound element: if tag : T ∈ Treg , then (E →
T 0 ) ∈ TE
Subsequently we define a regular type system Treg that
extends type system T with regular constructs:
Type System Treg . Let B= {String, Bool}, let 3.2 XML-λ Query Language
N AM E be a set of names. Type System Treg is the A typical query has a main (body) part – an expression
least set containing types given by 1.-6. to be evaluated over data – and a constructor part
that wraps query result and forms the XML output.
1. Every member of the base B is an (primitive) type
XML-λ’s Query Language (XLQL) is based on λ-terms
over B.
defined over the type system TE .
2. named character data: Let tag ∈ N AM E. Then Main constructs of the language are variables, con-
tag : String is an (elementary) type over B, stants, tuples, use of projections and λ-calculus op-
tag : is an (empty elementary) type over B. erations – abstractions and applications. Tagged
terms might be used for declaring functions. Syn-
3. Let T be a group type or named character data. tax of this language is similar to λ-calculus expres-
Then sion thus queries are structured as λ-expresions, i.e.
λ . . . (λ . . . (expression) . . .). In addition, there are
• zero or more: T ∗ is a type over B. also typical constructs such as logical connectives, con-
• one or more: T + is a type over B. stants or comparison predicates.
XML-λ Query Language is inductively defined as
the least set containing all terms created by applica-
tion of following rules: Jaroslav
Let T, T1 , . . . , Tn , n ≥ 1 be members of T . Then Pokorny
1. variable: each variable of type T is a term of type
T Karel
Richta
2. constant: each constant (member of F) of type T
is a term of type T XML Technology
3. application: if M is a term of type ((T1 , . . . , Tn ) → 155.00
T ) and N1 , . . . , Nn are (in the same order) types
T1 , . . . , Tn , then M (N1 , . . . , Nn ) is a term of type
T Figure 2: An XML instance
4. λ-abstraction: if x1 , . . . , xn are distinct vari- Upon this base we construct type system TE of func-
ables of types T1 , . . . , Tn and M is a term of tions based by given XML schema (in our case DTD).
type T , then λx1 , . . . , xn (M ) is a term of type See the list of types available:
((T1 , . . . , Tn ) → T )
BOOK : (AU T HOR+, T IT LE, P RICE),
5. n-tuple: if N1 , . . . , Nn are terms of types AU T HOR : (F IRST, LAST ),
T1 , . . . , Tn , then (N1 , . . . , Nn ) is a term of type F IRST : String,
(T1 , . . . , Tn ) LAST : String,
T IT LE : String,
6. projection: if (N1 , . . . , Nn ) is a term of type
P RICE : String
(T1 , . . . , Tn ), then N1 , . . . , Nn are terms of types
T1 , . . . , T n
Now, let us have a look at the following simple query
7. tagged term: if N is a term of type N AM E and solving our example:
M is a term of type T then N : T is a term of
type (E → T ). λf (/book/author(a) and a/f irst = f )
Simple implementation of this language is available Evaluation of this query takes place in following
in [18]. way: this query is composed of one high-level λ-term
and contains two variables called a and f . Naviga-
3.3 Example of a Query tion over the structure of elements is implemented as
continuous λ-applications combined with projections
Following example should give more detailed view to
(performed by element name). When evaluating the
an evaluation of a XML-λ query. Let us consider the
value of the query first the value of a is evaluated
DTD and a conforming XML instance as shown in
(note that /book/author(a) is syntactically equivalent
Figure 1 and Figure 2 and a query that will return
to author(book(a))). This term is evaluated using se-
first names of all authors mentioned in the document.
quent application of book and author functions. Then
author. We apply to this element function author that
returns a n-tuple of elements from E. Last step is pro-
jection from this n-tuple into the first component.
The result of this operation is a particular abstract
element.
Hereby we receive the output (without enveloping
elements): Jaroslav, Karel.
3.4 Utilizing the Functional Approach
Figure 1: A Document Type Definition Example
In the previous sections we describe a query language
Solution First we specify a set E of all abstract based on the XML-λ Framework. It uses functions
elements contained in XML instance as follows (note as a data model for XML documents and lambda cal-
that subscripts are here only to distinguish abstract culi abstractions and applications for development of
elements of the same type): a query language suitable for querying XML.
E = {book, author1 , f irst1 , last1 , author2 , f irst2 , Beside that we can also use this data model for dif-
last2 , name, price} ferent purposes. Due to universality of the λ-calculus,
we can construct a language for modifying XML 5 Future Work
data. Generally it means performing changes (addi-
Future work should lead into a successful completion
tions/removals) in the set E of abstract elements for
of the dissertation thesis. Therefore we need to define
a particular XML instance and eventual changes in
concrete tasks and formulate consecutive experiments.
the type system TE . With this approach we can also
The proposed dissertation thesis investigates an uti-
be able to express integrity constraints for XML data.
lization of the λ-calculus based framework for descrip-
This research direction however lays out of scope of
tion of query language semantics. We plan to formu-
this paper.
late both denotational and structured operational se-
mantics of the XQuery language. Having this formal-
4 Relationship Between XQuery and ism available we will compare the expressive power
XML-λ of these languages and further study their properties.
XQuery [3] is a mainstream language for querying Our primary aim is to define the existing semantics
XML. The language uses expressions as its fundamen- of XQuery in a functional way as a base for ongoing
tal constructs. Basic expressions are literals and vari- experiments and then compare their expressive power
ables, for navigating in XML documents it uses XPath by finding transformations between queries in XQuery
expressions. Related specification [15] contains an ex- and XML-λ.
tensive list of predefined numeric, arithmetic, string or There are two concrete ongoing tasks – specification
date-time functions. of XQuery semantics by using the XML-λ Framework
One important type of expression are the FLWOR and consecutive construction of transformation engine
expressions. FLWOR is a shortcut for For, Let, Where, for converting XQuery and XML-λ queries.
Order-By, Return statements that are used in the same
way as in imperative languages. This type of expres- 5.1 Specification of XQuery Semantics Using
sion is sometimes compared to SQL SELECT state- λ-calculus
ment known from relational databases. The functional approach applied in XML-λ might be
The goal of our research is to explore the relation- used not only for constructing a query language but
ship between XQuery and XML-λ. Thus, in XQuery also for a different purpose – description of seman-
we can write a query returning all first names in our tics of various query languages. Within our research
bibliography database in following way: we want to describe the semantics of the mainstream
XML query language XQuery with help of the func-
{ tional approach based on λ-calculus – XML-λ. Tak-
for $x in doc("bib.xml")/book/author ing into account the type system used we expect the
return ( semantics to be easier to express in comparison with
existing W3C semantics.
{$x/first/text()} With respect to the scope of the XQuery semantics
) specification this topic represents a complex theoreti-
} cal research.
A query with the same semantics but written in 5.2 Bi-directional Transformation of XQuery
XML-λ has (of course) different syntax as shown be- and XML-λ Queries
low:
Expression of the XQuery semantics with help of
xmldata("bib.xml") λ-calculus gives us an opportunity to compare expres-
lambda fname f sive power of XQuery and XML-λ. Because of the fact
(/book/author(a) and a/first = f) that the semantics of XML-λ query language is closely
related to its syntax we can propose a method for con-
Both queries however return the same result: version of queries based on language’s semantics. The
result of this aim is to have a transformation engine
and its implementation for converting XQuery queries
Jaroslav in XML-λ and vice–versa available. The conversion
Karel process is generally a transformation of an input se-
mantic tree in one language to an output semantic
tree in the other language.
We want to show that all queries in XQuery can
be rewritten into XML-λ and vice–versa. This claim
should be backed by formal description of denotational
6 Conclusion
and operational semantics of both languages and by This paper outlines a framework for querying XML
finding a transformation between semantic subtrees of data based on a functional approach combined with
all language constructs. lambda calculi operations. The functional approach
is not common nowadays but offers a universal way [7] J. Clark and S. DeRose. XML Path
that might be used as a base for constructing various Language (XPath) 1.0, November 1999.
query languages for XML. Beside that it can be also http://www.w3.org/TR/xpath.
used for specifying semantics of these languages. Hav-
ing the semantics formalized with the same tool we [8] J. Cowan and R. Tobin. XML infor-
can compare their expressive power and evaluate their mation set (second edition), April 2004.
relationship. http://www.w3.org/TR/2004/REC-xml-infoset-
For consecutive work to lead into a dissertation the- 20040204/.
sis we plan to describe semantics of XQuery using the [9] A. Deutsch, M. F. Fernandez, D. Florescu, A. Y.
functional framework. With this formalism we can Levy, and D. Suciu. XML–QL: A query language
construct a XQuery-to-XML-λ compiler (and vice– for XML. The Query Language Workshop (QL),
versa) by transformation of respective semantic trees. Cambridge, MA, 1998.
These transformations describe relationships between
various language constructs. With knowledge of these [10] D. Draper, P. Fankhauser, M. Fernández,
relationships we can then compare XQuery’s expres- A. Malhotra, K. Rose, M. Rys, J. Siméon,
sive power with a query language based on lambda and P. Wadler. XQuery 1.0 and XPath
calculi. 2.0 formal semantics, September 2005.
The main contribution of the thesis should be seen http://www.w3.org/TR/xquery-semantics/.
in using functional approach for description of XQuery
semantics and for evaluation of properties of lambda [11] R. K. Dybvig. The Scheme Programming Lan-
calculi based framework and its suitability for querying guage. The MIT Press, 2003. Available online at
XML. http://www.scheme.com/tspl3/.
[12] M. Fernández, A. Malhotra, J. Marsh,
7 Acknowledgements M. Nagy, and N. Walsh. XQuery 1.0 and
This research has been partially supported by MŠMT XPath 2.0 Data Model, September 2005.
under research program MSM 6840770014. http://www.w3.org/TR/xpath-datamodel/.
References [13] O. Kiselyov. SXML specification, 2004.
http://okmij.org/ftp/Scheme/SXML.html.
[1] S. Abiteboul, D. Quass, J. McHugh, J. Widom,
and J. L. Wiener. The Lorel query language for [14] K. Lisovsky. SXPath: XPath made functional.
semistructured data. International Journal on In International Lisp Conference ILC 2003, New
Digital Libraries, pages 68–88, 1997. York, US, October 2003.
[2] H. Barendregt. Lambda calculi with types. In [15] A. Malhotra, J. Melton, and N. Walsh.
Handbook of Logic in Computer Science, Vol- XQuery 1.0 and XPath 2.0 Func-
umes 1 (Background: Mathematical Structures) tions and Operators, September 2005.
and 2 (Background: Computational Structures), http://www.w3.org/TR/xpath-functions/.
Abramsky & Gabbay & Maibaum (Eds.), Claren-
don, volume 2. 1992. [16] J. Pokorný. XML functionally. In B. C. Desai,
Y. Kioki, and M. Toyama, editors, Proceedings
[3] S. Boag, D. Chamberlin, M. F. Fernández, D. Flo- of IDEAS2000, pages 266–274. IEEE Comp. So-
rescu, J. Robie, and J. Simeon. XQuery 1.0: ciety, 2000.
An XML Query Language, September 2005.
http://www.w3.org/TR/xquery/. [17] J. Pokorný. XML-λ: an extendible framework for
manipulating XML data. In Proceedings of BIS
[4] T. Bray, F. Yergeau, J. Cowan, J. Paoli, 2002, Poznan, 2002.
C. M. Sperberg-McQueen, and E. Maler. Ex-
tensible markup language (XML) 1.1, February [18] P. Šárek. Implementation of the XML lambda
2004. http://www.w3.org/TR/2004/REC-xml11- language. Master’s thesis, Dept. of Software En-
20040204/. gineering, Charles University, Prague, 2002.
[5] D. Chamberlin, A. Berglund, and e. a. Scott Boag.
XML Path Language (XPath) 2.0, September
2005. http://www.w3.org/TR/xpath20/.
[6] D. Chamberlin, J. Robie, and D. Florescu. Quilt:
An XML query language for heterogeneous data
sources. WebDB, Lecture Notes in Computer Sci-
ence, volume 1997.