=Paper= {{Paper |id=Vol-170/paper-9 |storemode=property |title=Querying XML With Lambda Calculi |pdfUrl=https://ceur-ws.org/Vol-170/paper8.pdf |volume=Vol-170 }} ==Querying XML With Lambda Calculi== https://ceur-ws.org/Vol-170/paper8.pdf
                      Querying XML With Lambda Calculi

                                                         Pavel Loupal
                                      Dept. of Computer Science, FEE CTU Prague,
                                                Karlovo nám. 13, 121 35
                                                 Prague, Czech Republic
                                                   loupalp@fel.cvut.cz



                        Abstract                                   way. The goal of its authors is to implement some of
                                                                   W3C standards – XML, XPath, XSLT – in the Scheme
    The aim of this paper is to outline an uni-                    language [11]. SXML takes in account only first ver-
    form functional approach useful both for con-                  sions of these specifications but not the latest drafts.
    structing query languages for XML and also                     The main components of the project are SSAX, SXML,
    for formal description of their semantics. This                SXPath, and SXSLT [13, 14].
    framework offers an alternative way to to-                        The key idea is the usage of S-expressions. S-
    day’s mainstream query languages – XPath                       expressions are known from the LISP language where
    and XQuery. With respect to the goal of the                    there can be either single objects such as numbers,
    VLDB PhD Workshop it is focused more on                        LISP atoms including the special atoms or pairs. The
    the concept than on the complete solution.                     framework claims its fully conformant to XML speci-
                                                                   fication. Regarding to [13] there is a formal algorithm
1    XML and Query Languages                                       how to rewrite XPath queries to SXPath expressions
                                                                   with identical semantics. SXSLT is apparently an im-
No doubt XML [4] is nowadays moving the world and                  plementation of XSLT 1.0 semantics. Unfortunatelly
thus not surprisingly it might be found almost every-              from published papers it is not exactly obvious that
where. Querying XML data is the essential operation                the full XSLT standard was implemented, authors only
for many algorithms and therefore many researchers                 mention their experiments with real XML documents.
focus on this field. Mainstream query languages –
XPath [5, 7], used for locating nodes in an XML in-
stance, and XQuery 1.0 [3] (more or less a SQL–like                2   Motivation
language) – are the results developed by the wide com-             Evolution of a query language usually brings some
munity of contributors.                                            changes in its syntax or/and the semantics. For ex-
   The XML specification itself does not define any                ample, the progress from XPath 1.0 to XPath 2.0 (i.e.
data model for XML instances therefore successive                  sub–specification of XQuery 1.0) brought a new data
query languages had to define a theoretical base for               model for XML. It was a logical and necessary step
their syntax and semantics. XPath 1.0 is based on the              forced by requirements given for the new language.
XML Infoset [8], XQuery 1.0 grounds on a newly de-                 Our idea is to use such formalism (formal system) as a
veloped ”XQuery 1.0 and XPath 2.0 Data Model” [12].                base for the language that is universal enough to avoid
Based upon this data model its semantics is formally               big changes when enriching the language with new con-
described in [10].                                                 structs. Therefore the idea of a functional approach
   There are various different query languages for                 combined together with basic lambda calculi opera-
XML – especially those used as predecessors of XQuery              tions (lambda abstractions and lambda applications)
– e.g. Lorel [1], Quilt [6] or XML-QL [9].                         seems to be very simple and not restrictive, rather it
   From our perspective one of the most interesting                allows to be easily extended in the future.
languages is the SXML framework. It seems to be the                   Our work tries to link to the idea of the XML-λ
only framework for manipulating XML in a functional                Framework (see Section 3), explore it further and
                                                                   study its various aspects from different points of view.
 c 2006 for the individual paper by the paper’s authors. Copying
permitted for private and scientific purposes. Re-publication
                                                                   Its author already identified some issues and draw-
of material on this page requires permission by the copyright      backs of the framework. We see one of the biggest
owners.                                                            issues the missing comparison of the proposed query
Proceedings of the VLDB2006 Ph.D. Workshop,                        language with the XQuery language. The question of
Seoul, Korea, 2006                                                 expressive power of these two languages is very chal-
lenging because these approaches represent different                      • zero or one: T ? is a type over B.
formal systems and seems to be not solved yet.
   To lay out more specific goals for our consecu-                 4. alternative: Let T1 and T2 be types. Then (T1 |T2 )
tive work we formulate the following aims: (a) Uti-                   is a type over B.
lize the XML-λ Framework for formal description of
XQuery semantics, (b) Construct a XQuery-to-XML-λ                  5. sequence: Let T1 , . . . , Tn be types.           Then
query compiler (and vice–versa) and compare expres-                   (T1 , . . . , Tn ) is a type over B.
sive power of XQuery and XML-λ query languages,
(c) Evaluate functional approach for various aspects               6. named type: Let T be a type given by a step from
related to querying XML.                                              3.-5. Let tag ∈ N AM E. Then tag : T is a type
                                                                      of B.
3      The XML-λ Framework                                           Note here that this type system Treg is still suitable
XML-λ framework is an idea of using functional ap-                to describe types not only in a concrete data structure
proach together with lambda calculi [2] and utilize it            (in our case tree structure) but for all data that might
for querying XML data. The proposal was originally                be tagged with name - for example data accordant to
developed by Pokorný [16, 17].                                   the relational model.
   The following sections outline the theoretical base               Having the Treg type system we have to extend it
of the XML-λ Framework and the query language built               to be able to work with XML data. We build the
upon it. In Section 3.3 we show an example of a query             type system TE induced by Treg . Key idea is to define
evaluation.                                                       abstract elements that are particular XML elements
                                                                  with some content and also define a set containing all
3.1     Concept                                                   abstract elements within an XML instance – E .
                                                                     Type System TE . Let Treg over base B be a type
The type system is built on base B – a set contain-
                                                                  system from definition 3.1 and E is the set of abstract
ing a finite number of base types S1 , . . . , Sk (k ≥ 1).
                                                                  elements. Then type system TE induced by Treg is the
Type hierarchy is then created by following inductive
                                                                  least set containing type given by this rule:
definition:
   Type System T . Let B is a set of primitive types
                                                                    • Let tag : T ∈ Treg . Then T AG : T is a member of
S1 . . . Sn , n ≥ 1. Type System T over base B is the
                                                                      TE . (Replacement of all tags in tag : T by upper
least set containing types given by 1.-4.
                                                                      case version)
    1. base type: each member of B is type over B
                                                                     Note one remark: we can see the type system TE
    2. functional type: if T1 and T2 are types over B,            as function container that adds functions for extract-
       then (T1 → T2 ) is also a type over B                      ing data values from elements (via abstractions and
                                                                  projections) with two ways
    3. n-tuple type: if T1 , . . . , Tn (n ≥ 1) are types over
       B, then (T1 , . . . , Tn ) is type over B                    • simple element: if tag : String ∈ Treg , then (E →
                                                                      tag : String) ∈ TE
    4. union type: if T1 , . . . , Tn (n ≥ 1) are types over B,
       then (T1 + . . . + Tn ) is type over B                       • compound element: if tag : T ∈ Treg , then (E →
                                                                      T 0 ) ∈ TE
Subsequently we define a regular type system Treg that
extends type system T with regular constructs:
   Type System Treg . Let B= {String, Bool}, let                  3.2   XML-λ Query Language
N AM E be a set of names. Type System Treg is the                 A typical query has a main (body) part – an expression
least set containing types given by 1.-6.                         to be evaluated over data – and a constructor part
                                                                  that wraps query result and forms the XML output.
    1. Every member of the base B is an (primitive) type
                                                                  XML-λ’s Query Language (XLQL) is based on λ-terms
       over B.
                                                                  defined over the type system TE .
    2. named character data: Let tag ∈ N AM E. Then                   Main constructs of the language are variables, con-
       tag : String is an (elementary) type over B,               stants, tuples, use of projections and λ-calculus op-
       tag : is an (empty elementary) type over B.                erations – abstractions and applications. Tagged
                                                                  terms might be used for declaring functions. Syn-
    3. Let T be a group type or named character data.             tax of this language is similar to λ-calculus expres-
       Then                                                       sion thus queries are structured as λ-expresions, i.e.
                                                                  λ . . . (λ . . . (expression) . . .). In addition, there are
          • zero or more: T ∗ is a type over B.                   also typical constructs such as logical connectives, con-
          • one or more: T + is a type over B.                    stants or comparison predicates.
   XML-λ Query Language is inductively defined as                     
the least set containing all terms created by applica-                  
tion of following rules:                                                   Jaroslav 
   Let T, T1 , . . . , Tn , n ≥ 1 be members of T . Then                   Pokorny 
                                                                        
 1. variable: each variable of type T is a term of type                 
    T                                                                      Karel 
                                                                           Richta 
 2. constant: each constant (member of F) of type T                     
    is a term of type T                                                  XML Technology </name>
 3. application: if M is a term of type ((T1 , . . . , Tn ) →           <price> 155.00 </price>
    T ) and N1 , . . . , Nn are (in the same order) types             </book>
    T1 , . . . , Tn , then M (N1 , . . . , Nn ) is a term of type
    T                                                                            Figure 2: An XML instance
 4. λ-abstraction: if x1 , . . . , xn are distinct vari-               Upon this base we construct type system TE of func-
    ables of types T1 , . . . , Tn and M is a term of               tions based by given XML schema (in our case DTD).
    type T , then λx1 , . . . , xn (M ) is a term of type           See the list of types available:
    ((T1 , . . . , Tn ) → T )
                                                                    BOOK : (AU T HOR+, T IT LE, P RICE),
 5. n-tuple:         if N1 , . . . , Nn are terms of types          AU T HOR : (F IRST, LAST ),
    T1 , . . . , Tn , then (N1 , . . . , Nn ) is a term of type     F IRST : String,
    (T1 , . . . , Tn )                                              LAST : String,
                                                                    T IT LE : String,
 6. projection: if (N1 , . . . , Nn ) is a term of type
                                                                    P RICE : String
    (T1 , . . . , Tn ), then N1 , . . . , Nn are terms of types
    T1 , . . . , T n
                                                                       Now, let us have a look at the following simple query
 7. tagged term: if N is a term of type N AM E and                  solving our example:
    M is a term of type T then N : T is a term of
    type (E → T ).                                                        λf (/book/author(a) and       a/f irst = f )

   Simple implementation of this language is available                 Evaluation of this query takes place in following
in [18].                                                            way: this query is composed of one high-level λ-term
                                                                    and contains two variables called a and f . Naviga-
3.3   Example of a Query                                            tion over the structure of elements is implemented as
                                                                    continuous λ-applications combined with projections
Following example should give more detailed view to
                                                                    (performed by element name). When evaluating the
an evaluation of a XML-λ query. Let us consider the
                                                                    value of the query first the value of a is evaluated
DTD and a conforming XML instance as shown in
                                                                    (note that /book/author(a) is syntactically equivalent
Figure 1 and Figure 2 and a query that will return
                                                                    to author(book(a))). This term is evaluated using se-
first names of all authors mentioned in the document.
                                                                    quent application of book and author functions. Then
   <!ELEMENT book (author+,                                         the variable a contains an abstract element of type
                       title, price)>                               author. We apply to this element function author that
   <!ELEMENT author (first, last)>                                  returns a n-tuple of elements from E. Last step is pro-
   <!ELEMENT first (#PCDATA)>                                       jection from this n-tuple into the first component.
   <!ELEMENT last (#PCDATA)>                                        The result of this operation is a particular abstract
   <!ELEMENT title (#PCDATA)>                                       element.
   <!ELEMENT price (#PCDATA)>                                          Hereby we receive the output (without enveloping
                                                                    elements): Jaroslav, Karel.

                                                                    3.4   Utilizing the Functional Approach
  Figure 1: A Document Type Definition Example
                                                                    In the previous sections we describe a query language
   Solution First we specify a set E of all abstract                based on the XML-λ Framework. It uses functions
elements contained in XML instance as follows (note                 as a data model for XML documents and lambda cal-
that subscripts are here only to distinguish abstract               culi abstractions and applications for development of
elements of the same type):                                         a query language suitable for querying XML.
E = {book, author1 , f irst1 , last1 , author2 , f irst2 ,             Beside that we can also use this data model for dif-
last2 , name, price}                                                ferent purposes. Due to universality of the λ-calculus,
we can construct a language for modifying XML               5     Future Work
data. Generally it means performing changes (addi-
                                                            Future work should lead into a successful completion
tions/removals) in the set E of abstract elements for
                                                            of the dissertation thesis. Therefore we need to define
a particular XML instance and eventual changes in
                                                            concrete tasks and formulate consecutive experiments.
the type system TE . With this approach we can also
                                                               The proposed dissertation thesis investigates an uti-
be able to express integrity constraints for XML data.
                                                            lization of the λ-calculus based framework for descrip-
This research direction however lays out of scope of
                                                            tion of query language semantics. We plan to formu-
this paper.
                                                            late both denotational and structured operational se-
                                                            mantics of the XQuery language. Having this formal-
4    Relationship Between XQuery and                        ism available we will compare the expressive power
     XML-λ                                                  of these languages and further study their properties.
XQuery [3] is a mainstream language for querying            Our primary aim is to define the existing semantics
XML. The language uses expressions as its fundamen-         of XQuery in a functional way as a base for ongoing
tal constructs. Basic expressions are literals and vari-    experiments and then compare their expressive power
ables, for navigating in XML documents it uses XPath        by finding transformations between queries in XQuery
expressions. Related specification [15] contains an ex-     and XML-λ.
tensive list of predefined numeric, arithmetic, string or      There are two concrete ongoing tasks – specification
date-time functions.                                        of XQuery semantics by using the XML-λ Framework
   One important type of expression are the FLWOR           and consecutive construction of transformation engine
expressions. FLWOR is a shortcut for For, Let, Where,       for converting XQuery and XML-λ queries.
Order-By, Return statements that are used in the same
way as in imperative languages. This type of expres-        5.1   Specification of XQuery Semantics Using
sion is sometimes compared to SQL SELECT state-                   λ-calculus
ment known from relational databases.                       The functional approach applied in XML-λ might be
   The goal of our research is to explore the relation-     used not only for constructing a query language but
ship between XQuery and XML-λ. Thus, in XQuery              also for a different purpose – description of seman-
we can write a query returning all first names in our       tics of various query languages. Within our research
bibliography database in following way:                     we want to describe the semantics of the mainstream
                                                            XML query language XQuery with help of the func-
<result> {                                                  tional approach based on λ-calculus – XML-λ. Tak-
  for $x in doc("bib.xml")/book/author                      ing into account the type system used we expect the
    return (                                                semantics to be easier to express in comparison with
      <fname>                                               existing W3C semantics.
        {$x/first/text()}                                      With respect to the scope of the XQuery semantics
      </fname>)                                             specification this topic represents a complex theoreti-
} </result>                                                 cal research.

   A query with the same semantics but written in           5.2   Bi-directional Transformation of XQuery
XML-λ has (of course) different syntax as shown be-               and XML-λ Queries
low:
                                                            Expression of the XQuery semantics with help of
xmldata("bib.xml")                                          λ-calculus gives us an opportunity to compare expres-
lambda fname f                                              sive power of XQuery and XML-λ. Because of the fact
  (/book/author(a) and a/first = f)                         that the semantics of XML-λ query language is closely
                                                            related to its syntax we can propose a method for con-
    Both queries however return the same result:            version of queries based on language’s semantics. The
                                                            result of this aim is to have a transformation engine
<result>                                                    and its implementation for converting XQuery queries
  <fname> Jaroslav </fname>                                 in XML-λ and vice–versa available. The conversion
  <fname> Karel </fname>                                    process is generally a transformation of an input se-
</result>                                                   mantic tree in one language to an output semantic
                                                            tree in the other language.
   We want to show that all queries in XQuery can
be rewritten into XML-λ and vice–versa. This claim
should be backed by formal description of denotational
                                                            6     Conclusion
and operational semantics of both languages and by          This paper outlines a framework for querying XML
finding a transformation between semantic subtrees of       data based on a functional approach combined with
all language constructs.                                    lambda calculi operations. The functional approach
is not common nowadays but offers a universal way           [7] J. Clark and S. DeRose.       XML Path
that might be used as a base for constructing various           Language (XPath) 1.0,     November 1999.
query languages for XML. Beside that it can be also             http://www.w3.org/TR/xpath.
used for specifying semantics of these languages. Hav-
ing the semantics formalized with the same tool we          [8] J. Cowan and R. Tobin.        XML infor-
can compare their expressive power and evaluate their           mation set (second edition), April 2004.
relationship.                                                   http://www.w3.org/TR/2004/REC-xml-infoset-
   For consecutive work to lead into a dissertation the-        20040204/.
sis we plan to describe semantics of XQuery using the       [9] A. Deutsch, M. F. Fernandez, D. Florescu, A. Y.
functional framework. With this formalism we can                Levy, and D. Suciu. XML–QL: A query language
construct a XQuery-to-XML-λ compiler (and vice–                 for XML. The Query Language Workshop (QL),
versa) by transformation of respective semantic trees.          Cambridge, MA, 1998.
These transformations describe relationships between
various language constructs. With knowledge of these       [10] D. Draper, P. Fankhauser, M. Fernández,
relationships we can then compare XQuery’s expres-              A. Malhotra, K. Rose, M. Rys, J. Siméon,
sive power with a query language based on lambda                and P. Wadler.     XQuery 1.0 and XPath
calculi.                                                        2.0 formal semantics,    September 2005.
   The main contribution of the thesis should be seen           http://www.w3.org/TR/xquery-semantics/.
in using functional approach for description of XQuery
semantics and for evaluation of properties of lambda       [11] R. K. Dybvig. The Scheme Programming Lan-
calculi based framework and its suitability for querying        guage. The MIT Press, 2003. Available online at
XML.                                                            http://www.scheme.com/tspl3/.
                                                           [12] M. Fernández, A. Malhotra, J. Marsh,
7   Acknowledgements                                            M. Nagy, and N. Walsh.     XQuery 1.0 and
This research has been partially supported by MŠMT             XPath 2.0 Data Model, September 2005.
under research program MSM 6840770014.                          http://www.w3.org/TR/xpath-datamodel/.

References                                                 [13] O. Kiselyov.       SXML specification, 2004.
                                                                http://okmij.org/ftp/Scheme/SXML.html.
 [1] S. Abiteboul, D. Quass, J. McHugh, J. Widom,
     and J. L. Wiener. The Lorel query language for        [14] K. Lisovsky. SXPath: XPath made functional.
     semistructured data. International Journal on              In International Lisp Conference ILC 2003, New
     Digital Libraries, pages 68–88, 1997.                      York, US, October 2003.

 [2] H. Barendregt. Lambda calculi with types. In          [15] A. Malhotra, J. Melton, and N. Walsh.
     Handbook of Logic in Computer Science, Vol-                XQuery    1.0   and   XPath     2.0    Func-
     umes 1 (Background: Mathematical Structures)               tions and Operators,     September 2005.
     and 2 (Background: Computational Structures),              http://www.w3.org/TR/xpath-functions/.
     Abramsky & Gabbay & Maibaum (Eds.), Claren-
     don, volume 2. 1992.                                  [16] J. Pokorný. XML functionally. In B. C. Desai,
                                                                Y. Kioki, and M. Toyama, editors, Proceedings
 [3] S. Boag, D. Chamberlin, M. F. Fernández, D. Flo-          of IDEAS2000, pages 266–274. IEEE Comp. So-
     rescu, J. Robie, and J. Simeon. XQuery 1.0:                ciety, 2000.
     An XML Query Language, September 2005.
     http://www.w3.org/TR/xquery/.                         [17] J. Pokorný. XML-λ: an extendible framework for
                                                                manipulating XML data. In Proceedings of BIS
 [4] T. Bray, F. Yergeau, J. Cowan, J. Paoli,                   2002, Poznan, 2002.
     C. M. Sperberg-McQueen, and E. Maler. Ex-
     tensible markup language (XML) 1.1, February          [18] P. Šárek. Implementation of the XML lambda
     2004. http://www.w3.org/TR/2004/REC-xml11-                 language. Master’s thesis, Dept. of Software En-
     20040204/.                                                 gineering, Charles University, Prague, 2002.
 [5] D. Chamberlin, A. Berglund, and e. a. Scott Boag.
     XML Path Language (XPath) 2.0, September
     2005. http://www.w3.org/TR/xpath20/.
 [6] D. Chamberlin, J. Robie, and D. Florescu. Quilt:
     An XML query language for heterogeneous data
     sources. WebDB, Lecture Notes in Computer Sci-
     ence, volume 1997.

</pre>