On the denotational semantics of XML-Lambda⋆ Pavel Loupal1 and Karel Richta2 1 Department of Software Engineering, Czech Technical University, Faculty of Information Technology Prague, Czech Republic pavel.loupal@fit.cvut.cz 2 Department of Software Engineering, Charles University, Faculty of Mathematics and Physics Prague, Czech Republic richta@ksi.mff.cuni.cz Abstract. The article deals with the denotational seman- tegral part of the work, we have designed and devel- tics of a special query language called XML-Lambda oped a prototype of an XML-λ query processor for (abbreviated as XML-λ), which is based on the simply typed validating the functional approach and experimenting lambda calculus. The exact semantics allows experimenta- with it. The main advantage of this concept is the tion with a language definition, prototyping of programs, possibility of a query optimizations in the XML-λ in- and similar experiments. One of such experiment is the termediate form. It is much more easier than optimiza- implementation of the XQuery language in the XML-λ en- vironment. The main advantage of our approach is the pos- tions when we use the official W3C semantics ([3]). sibility of a query optimizations in the XML-λ intermedi- Since it is not possible to express the semantics of ate form. It is much more easier than optimizations based the whole XML-λ language in this contribution, the on the official W3C semantics. XML-λ is a part of more paper focuses chiefly on its main idea and concepts. complex XML-λ Framework which serves for experiment- ing with the tools for XML processing. 2 XML-λ Query Language 1 Introduction In this section, we briefly describe the XML-λ Query Language, a query language for XML based on the In this paper, we define formally the semantics of simply typed lambda calculus. XML-Lambda Query Language. From now on we will use abbreviation XML-λ. XML-λ employs the func- 2.1 Language of terms tional data model for XML data elaboration. The first idea for such an attitude was published in [5, 6]. This Typical query expression has a query part — an ex- research brought in the key idea of a functional query pression to be evaluated over data — and a constructor processing with a wide potential that was later proven part that wraps a query result and forms the output. by a simple prototype implementation [7]. The XML-λ Query Language is based on λ-terms de- We can imagine two scenarios for this language; fined over the type system TE as will be shown later. firstly, the language plays a role of a full-featured query λ-calculus is a formal mathematical system for inves- language for XML (it has both formal syntax and se- tigation of function definition and application. It was mantics and there is also an existing prototype that introduced by Alonzo Church and has been utilized acts as a proof-of-the-concept application). But there in many ways. In this work, we use a variant of this already exist standard approaches for XML querying formalism, the simply-typed λ-calculus, as a core for – especially XQuery, with probably more appropriate the XML-λ Query Language. We have gathered the syntax for users. So, there is no need to define any new knowledge from [8] and [1]. Our realization is enriched query language. by usage of tuples. In the second scenario, the XML-λ Query Lan- The main constructs of the language are variables, guage is utilized as an intermediate language for the constants, tuples, projections, and λ-calculus opera- description of XQuery semantics. In [4] we propose tions — applications and abstractions. The syntax is a novel method for XQuery evaluation based on the similar to λ-calculus expressions, thus the queries are transformation of XQuery queries into their XML-λ structured as nested λ-expressions, i.e.: equivalents and their subsequent evaluation. As an in- λ . . . (λ . . . (expression) . . .) ⋆ This work has been supported by the Ministry of Edu- cation, Youth and Sports under Research Program No. In addition, there are also typical constructs such as MSM 6840770014 and also by the grant project of the logical connectives, constants, comparison predicates, Czech Grant Agency No. GA201/09/0990. and a set of built-in functions. 34 Pavel Loupal, Karel Richta Language of terms is defined inductively as the 3.1 Syntactic domains least set containing all terms created by the applica- tion of the following rules. Let T, T1 , . . . , Tn , n ≥ 1 be By the term syntactic domain, we understand a logi- members of TE . Let F be a set of typed constants, and cal part of a language. In Table 1, we list all syntactic V an at most countable set of typed variables. Then: domains of the XML-λ Query Language with their in- formal meaning. Notation Q : Query stands for the 1. variable: each variable (member of V) of type T is symbol Q representing a member of the Query do- a term of type T main. 2. constant: each constant (member of F) of type T is a term of type T 3. application: if M is a term of type ((T1 ,. . ., Tn) → T ) Q : Query XML-λ queries, and N1 , . . . , Nn are terms of the types T1 , . . . , Tn , O : Options XML input attachements, then M (N1 , . . . , Nn ) is a term of the type T C : Constructor constructors of output results, 4. λ-abstraction: if x1 , . . . , xn are distinct variables E : Expression general expressions, of types T1 , . . . , Tn and M is a term of type T , SQ : SubQuery (nested) subqueries, then λx1 : T1 , . . . , xn : T1 .(M ) is a term of type T : T erm sort of expression, ((T1 , . . . , Tn ) → T ) F : F ragment sub-parts of a T erm, 5. n-tuple: if N1 ,. . ., Nn are terms of types T1 ,. . ., Tn , As : Assignment variable assignments, then (N1 , . . . , Nn ) is a term of type (T1 , . . . , Tn ) F lt : F ilter set pruning conditions, F C : F unctionCall either built-in or user-defined 6. projection: if (N1 , . . . , Nn ) is a term of type functions, (T1 , . . . , Tn ), then N1 , . . . , Nn are terms of types BinOp : BinOperator binary logical operators, T 1 , . . . , Tn RelOp : RelOperator binary relational operators, 7. tagged term: if N is a term of type N AM E and N F : N ullary identifiers of nullary functions M is a term of type T then N : M is a term of (subset of Identif ier), type (E → T ). P roj : P rojection identifiers for projections (sub- set of Identif ier), The set of abstract elements E serves as a notation B : Boolean logical values, for abstract positions in XML trees. Terms can be in- N : N umeral numbers, terpreted in a standard way by means of an interpre- D : Digits digits, tation assigning to each constant from F an object of S : String character strings, the same type, and by a semantic mapping from the Id : Identif ier strings conforming to the language of terms to all functions and Cartesian prod- N ame syntactic rule in [2]. ucts given by the type system TE . Speaking briefly, an Table 1. Syntactic domains of the XML-λ Query Lan- application is evaluated as an application of the as- guage. sociated function to given arguments, an abstraction ’constructs’ a new function of the respective type. The tuple is a member of Cartesian product of sets of typed 3.2 Abstract production rules objects. A tagged term is interpreted as a function de- fined only for one e ∈ E. It returns again a function. The abstract production rules listed in Table 2 (writ- ten using EBNF) connect the terms of syntactic do- mains from the previous section into logical parts with 3 Abstract syntax suitable level of details for further processing. On the basis of these rules, we will construct the denotational As for evaluation of a query, we do not need its com- semantics of the language. plete derivation tree; such information is too complex and superfluous. Therefore, in order to diminish the domain that needs to be described without any loss of 4 Denotational semantics precision, we employ the abstract syntax. With the ab- stract syntax, we break up the query into logical pieces We use denotational semantics for the description of that forming an abstract syntax tree carrying all orig- the meaning of each XML-λ query. The approach is inal information constitute an internal representation based on the idea that for each correct syntactic con- suitable for query evaluation. We introduce syntactic struct of the language we can define a respective mean- domains for the language, i.e., logical blocks a query ing of it as a formal expression in another, well-known, may consist of. Subsequently, we list all production notation. We can say that the program is the denota- rules. These definitions are later utilized in Section 4 tion of its meaning. The validity of the whole approach within the denotational semantics. is based on structural induction; i.e, that the meaning of more complex expressions is defined on the basis Semantics of XML-Lambda 35 Query ::= Options Constructor Expression as algebraic structures, where for each type τ ∈ T ype Constructor ::= ElemConstr + | Identif ier+ there is exactly one carrier Vτ , whose elements are the ElemConstr ::= N ame AttrConstr ∗ (Identif ier values of the respective type τ . | ElemConstr) AttrConstr ::= N ame Identif ier Expression ::= F ragment Variables. An XML-λ query can use an arbitrary F ragment ::= N ullary | Identif ier | T erm (countable) number of variables. We model variables | F ragment P rojection as pairs name : τ , where name refers to a variable | SubQuery | F unctionCall name and τ is the data type of the variable – any | N umeral | String | Boolean member of T ype. Syntactically, variable name is al- T erm ::= Boolean | F ilter | ’not’ T erm ways prepended by the dollar sign. Each expression in | T erm BinOperator T erm XML-λ has a recognizable type, otherwise both the F ilter ::= F ragment RelOperator F ragment type and the value are undefined. SubQuery ::= Identif ier + Expression BinOperator ::= ’or’ | ’and’ Query Evaluation Context. During the process RelOperator ::= ’<=’ | ’<’ | ’==’ | ’!=’ | ’>’ | ’>=’ of query evaluation we need to store variables inside N umeral ::= Digit+ | Numeral ′ .′ Digit+ a working space known as a context. Formally, we de- Digit ::= ’0’|’1’|’2’|’3’|’4’|’5’|’6’|’7’|’8’|’9’ note this context as the State. We usually understand Identif ier ::= N ame a state as the set of all active objects and their values P rojection ::= Identif ier at a given instance. We denote the semantic domain State of all states as a set of all functions from the set Table 2. Abstract production rules for the XML-λ Query of identifiers Identif ier into their values of the type Language. τ ∈ T ype. Obviously, one particular state σ : State represents an immediate snapshot of the evaluation of their simpler parts. As the notation we employ the process; i.e., values of all variables at a given time. We simply typed lambda calculus. It is a well-known and denote this particular value for the variable x as σ[[x]]. formally verified tool for such a purpose. Simply speaking, the state is the particular valuation of variables. We use the functor f [x ← v] for the defini- 4.1 Prerequisites tion of a function change in one point x to the value v. The denotational semantics utilizes a set of functions for the definition of the language meaning. For this 4.2 Signatures of semantic functions purpose, we formulate all necessary mathematical def- initions. We start with the data types and specifica- Having defined all necessary prerequisites and auxil- tion of the evaluation context followed by the outline iary functions (recalling that the SeqT ype represents of bindings to the TE type system. Then, all auxiliary any permitted type of value), we formalize semantic and denotation functions are introduced. functions over semantic domains as: Data Types. Each value computed during the process of the query evaluation is of a type from T ype. Let E SemQuery : Query → Seq(XM LDoc ) → Seq(T ype) be a type from the type system TE , we define T ype as: SemOptions : Options → (State → State) SemExpr : Expression → State → Seq(T ype) T ype ::= BaseT ype | Seq(T ype) SemT erm : T erm → (State → Boolean) Seq(T ype) ::= ⊥ | BaseT ype × Seq(T ype) SemF rag : F ragment → State → Seq(T ype) BaseT ype ::= E | P rimitiveT ype SemAssign : F ragment×Identif ier → State → State P rimitiveT ype ::= Boolean | String | N umber SemRelOper : F ragment×RelOperator×F ragment → → (State → Boolean) Primitive types, Boolean, String, and N umber, are SemBinOper : T erm × BinOperator × T erm → defined with their set of allowed values as usual. The → (State → Boolean) type constructor Seq stands for the the type of ordered SemAttrCons : AttrConstr × State → Seq(T ype) sequences of elements of values of the type T ype3 . We SemElemCons : ElemConstr × State → Seq(T ype) use it only for base types, so Seq(T ype) is the type Table 3. Semantic functions arities. of all ordered sequences of elements of base types. We do not permit sequences of sequences. The symbol ⊥ 4.3 Semantic equations stands for the empty sequence of types – represents an unknown type. More precisely, we interpret types We start with the semantic equations for the expres- 3 We suppose usual functions nil, cons, append, null, head, sions, then we will continue with the semantics and tail for sequences. of queries. 36 Pavel Loupal, Karel Richta Terms. Terms are logical expressions. It means, that SemBinOper [[T1 ’or’ T2 ]] = the meaning of any term in a given state is the Boolean = λσ.(SemT erm [[T1 ]]σ or SemT erm [[T2 ]]σ) value. SemBinOper [[T1 ’and’ T2 ]] = SemT erm : T erm → State → Boolean = λσ.(SemT erm [[T1 ]]σ and SemT erm [[T2 ]]σ) Terms are constructed with help of relational opera- tors (we call it f ilter, but we do not distinguise it Table 6. Semantic equations for binary operators. here), binary operators, negation sign, or primitive Boolean literals (denoted as B in the definition). Fragments. Fragments are logical parts of filters and have the same meaning as expressions. SemT erm [[B]] = λσ.bool[[B]] SemF rag : F ragment → State → Seq(T ype) if B is a constant of the type Boolean SemT erm [[F1 RelOp F2 ]] = λσ.SemRelOper [[F1 RelOp F2 ]]σ SemF rag [[Id]] = λσ.σ[[Id]] ′ ′ SemF rag [[f (E1 , ..., En )]] = SemT erm [[ not T ]] = λσ.not(SemT erm [[T ]]σ) = λσ.f (SemExpr [[E1 ]]σ, ..., SemExpr [[En ]]σ) SemT erm [[T1 BinOp T2 ]] = = λσ.SemBinOper [[T1 BinOp T2 ]]σ SemF rag [[F P ]] = = λσ.(SemF rag [[F ]] ◦ SemF rag [[P ]])σ Table 4. Semantic equations for terms. SemF rag [[(subquery)(arg)]] = = λσ.(SemExpr [[subquery]](σ)(SemExpr [[arg]](σ))) Relational Operators. Relational operators can be ap- SemF rag [[I1 I2 ...In E]] = plied to any two fragments and the meaning of result- = SemExpr [[I2 ...In E]](σ[SemExpr [[E]]σ ← I1 ]) ing expression is the mapping from the current state SemF rag [[N ]] = to Boolean values. They serve in filters. = λσ.num[[N ]] if N is a constant of the type N umeral SemRelOper : SemF rag [[S]] = F ragment × RelOperator × F ragment → = λσ.str[[S]] if S is a constant of the type String → State → Boolean SemF rag [[B]] = = λσ.bool[[B]] if B is a constant of the type Boolean SemRelOper [[F1 ’<’ F2 ]] = = λσ.(SemF rag [[F1 ]]σ < SemF rag [[F2 ]]σ) Table 7. Semantic equations for fragments. SemRelOper [[F1 ’==’ F2 ]] = = λσ.(SemF rag [[F1 ]]σ == SemF rag [[F2 ]]σ) Assignments. An assignment expression is a manda- SemRelOper [[F1 ’>’ F2 ]] = tory part of a query. It sets the initial context of the = λσ.(SemF rag [[F1 ]]σ > SemF rag [[F2 ]]σ) evaluation, more precisely, such expression evaluates SemRelOper [[F1 ’<=’ F2 ]] = a nullary function and stores the result into a vari- = λσ.(SemF rag [[F1 ]]σ <= SemF rag [[F2 ]]σ) able. Then, the evaluation process will filter results SemRelOper [[F1 ’>=’ F2 ]] = and than iterates over all values and computes remain- = λσ.(SemF rag [[F1 ]]σ >= SemF rag [[F2 ]]σ) ing results. SemRelOper [[F1 ’!=’ F2 ]] = SemAssign : F ragment × Identif ier = λσ.(not(SemF rag [[F1 ]]σ = SemF rag [[F2 ]]σ)) → State → State Table 5. Semantic equations for relational operators. SemAssign [[Id ’=’ F ]] = λσ.σ[Id ← SemF rag [[F ]]σ] Binary Operators. Binary operators can be applied to any two terms and the meaning of resulting expres- Table 8. Semantic equation for assignments. sion is the mapping from the current state to Boolean values. XML-λ uses clasical logical conectives – logical Expressions. Each expression e has a defined value ’or’ and logical ’and’ SemExpr [[e]](σ)(ξ) in a state σ and in an en- SemBinOper : T erm × BinOperator × T erm vironment ξ. The state represents the values of vari- → State → Boolean ables, the environment represents XML document that is elaborated. The result is a state SemExpr [[e]](σ)(ξ), Semantics of XML-Lambda 37 where all interesting values are bound into local vari- SemElemCons [[N A1 ...An I]]σ = ables. We do not need the environment in the next = element(N, σ[[I]], SemAttrCons [[A1 ]]σ, ..., computation, because all information is in the state. SemAttrCons [[An ]]σ) We can model input data as a mapping Env from identifiers to XML documents, formally: SemElemCons [[N A1 ...An E]]σ = = element(N, SemExpr [[E]]σ, SemAttrCons [[A1 ]]σ, ..., Env : Identif ier → XM Ldoc SemAttrCons [[An ]]σ) SemExpr : Expression × State × Env → SemElemCons [[N I]]σ = element(N, σ[[I]], nil) Seq(T ype) SemElemCons [[N E]]σ = = element(N, SemExpr [[E]]σ, nil) SemExpr [[A1 A2 ...An F1 F2 ...Fm ]]σξ = Table 11. Semantic equations for element constructors. = SemExpr [[A2 ...An F1 F2 ...Fm ]](SemAssign [[A1 ]]σξ) SemExpr [[F1 F2 ...Fm ]]σξ = = append(SemExpr [[F1 ]]σξ, SemExpr [[F2 ...Fm ]]σξ) SemAttrCons [[N I]]σ = attribute(N, SemExpr [[I]]σ) if SemRelOper [[F1 ]]σξ = true SemExpr [[F1 F2 ...Fm ]]σξ = Table 12. The semantic equation for attribute construc- = SemExpr [[F2 ...Fm ]]σξ tors. if SemRelOper [[F1 ]]σξ = f alse Example. Let us show an example of resulting se- SemExpr [[]]σξ = nil quence for the XML-λ constructor lambda book attlist [ title $b ] $a Table 9. The semantics of general expressions. The result is the function Constructors. Resulting values are created by con- λσ.element(′ book′ , σ[[a]], attribute(′ title′ , σ[[b]])) structors. A constructor is a list of items which can be either a variable identifier or a constructing expres- returning in the given state the string sion. element(book,"the value of a", SemCons : Constructor × State → Seq(T ype) attribute(title,"the value of b")) SemCons [[E1 E]]σ = append(SemElemCons [[E1 ]]σ, Options. The only allowed option in the language is SemCons [[E]]σ) now the specification of input XML documents. SemCons [[I1 E]]σ = cons(σ[[I1 ]], SemCons [[E]]σ) SemOptions : Options × Env → Env SemCons [[ ]]σ = nil SemOptions [[ ]](E) = E Table 10. Semantic equations for constructors. SemOptions [[ xmldata( X ) Y ]] = = λξ.SemOptions [[Y ]](ξ[Dom(X) ← X#]) Element Constructors. The most common kind of re- Table 13. Semantic equations for options. sulting value is undoubtedly the element constructor; obviously, all its alternatives are supported – either We explore a function Dom(X) that converts an in- with empty content, textual content or more complex put XML document X into its internal representation (i.e. “mixed”) content. For all cases we can also attach accessible under identification X#. attributes to elements. In the definition we use ab- stract functions element and attribute, which serves Query. A query (denoted as Q) consists of query op- to construct output XML values from arguments. tions (denoted as O), where input XML documents are bound to its formal names, the query expression to be SemElemCons : ElemConstr × State → Seq(T ype) evaluated (denoted as E), and the output construction Attribute Constructors. Elements can have attributes commands (denoted as C). At first, input files are elab- assigned by attribute constructors orated, than an initial variable assignment takes place, followed by evaluation of expression. Finally, the out- SemAttrCons : AttrConstr × State → Seq(T ype) put is constructed. This idea is inbuilt into the defini- tion of SemQuery [[O C E]] bellow. The whole meaning 38 Pavel Loupal, Karel Richta SemQuery [[Q]] of a query Q can be modeled as a map- = (λσ.⊥)[′ $v1′ ← λσ2 .plus(num[[′ 3′ ]] ping from the sequence of input XML documents into (λσ2 .⊥), num[[′ 2′ ]](λσ2 .⊥))(λσ.⊥)](′ $v1′ ) = a sequence of output values of the type of T ype. The = (λσ.⊥)[′ $v1′ ← λσ2 .plus(3, 2)(λσ.⊥)](′ $v1′ ) = definition is carried by the structural induction on the = (λσ.⊥)[′ $v1′ ← λσ2 .5(λσ.⊥)](′ $v1′ ) = possible forms of an input sequence. = (λσ.⊥)[′ $v1′ ← λσ.5](′ $v1′ ) = = (λσ.⊥)[′ $v1′ ← 5](′ $v1′ ) = SemQuery [[Q]] : Seq(XM LDoc ) → Seq(T ype) =5 SemQuery [[O C E]] : XM LDoc → Seq(T ype) 6 Conclusion SemQuery [[Q]](nil) = nil In this paper, we have presented syntax and deno- SemQuery [[Q]](cons(H, T )) = tational semantics of the XML-λ Query Language, = append(SemQuery [[Q]](H), SemQuery[[Q]](T )) a query language for XML based on simply typed SemQuery [[O C E]] = lambda calculus. We use this language within the spe- = λδ.(SemCons [[C]](SemExpr [[E]](λσ.⊥) cial XML-λ Framework as an intermediate form of (SemOptions [[O]](λξ.⊥)(δ))) XQuery expressions for description of its semantics. Nevertheless the language in its current version does not support all XML features, e.g. comments, process- Table 14. Semantic equations for queries. ing instructions, and deals only with type information available in DTD, it can be successfully utilized for fundamental scenarios both for standalone query eval- 5 The example uation or as a tool for XQuery semantics description. The following example illustrates the computations References performed in order to evaluate given XML-λ queries inside a virtual machine. It just computes a simple nu- 1. H. Barendregt: Lambda calculi with types. In: Handbook merical term. More complex examples could be found of Logic in Computer Science, Volumes 1 (Background: in [4]. Mathematical Structures) and 2 (Background: Compu- tational Structures), Abramsky & Gabbay & Maibaum (Eds.), Clarendon, volume 2, Oxford University Press, 5.1 Simple computation 1992. 2. T. Bray, J. Paoli, C. M. Sperberg-McQueen, Let us suppose the following simple query in the E. Maler, and F. Yergeau: Extensible markup XML-λ and its evaluation. language (XML) 1.0 (fifth edition), November 2008. http://www.w3.org/TR/2008/REC-xml-20081126. lambda $v1 ($v1 = plus(3, 2)) 3. D. Draper, P. Fankhauser, M. Fernández, A. Malhotra, K. Rose, M. Rys, J. Siméon, and P. Wadler: XQuery We can compute its meaning according to the XML-λ 1.0 and XPath 2.0 formal semantics, September 2005. semantics as (the result is independent on the input http://www.w3.org/TR/xquery-semantics. XML documents, so we can use the empty sequence 4. P. Loupal. XML-λ: A functional framework for XML. nil as the input): Ph.d. Thesis, Department of Computer Science and Engineering, Faculty of Electrical Engineering, Czech SemQuery [[′ lambda $v1 ($v1=plus(3, 2)′ ]](nil) = Technical University in Prague, February 2010, sub- = λδ.(SemCons [[′ lambda $v1 ($v1 = plus(3, 2))′ ]] mitted. (SemExpr [[ ]])(λσ.⊥)(δ))(nil) = 5. J. Pokorný: XML functionally. In: B. C. De- = SemCons [[′ lambda $v1 ($v1 = plus(3, 2))′ ]] sai, Y. Kioki, and M. Toyama, (Eds), Proceedings (λσ.⊥) = of IDEAS2000, IEEE Computer Society, 2000, 266–274. = SemAssign [[′ $v1 = plus(3, 2)′ ]] 6. J. Pokorný: XML-λ: an extendible framework for ma- (λσ.⊥)(′ $v1′ ) = nipulating XML data. In: Proceedings of BIS 2002, Poz- = λσ1 .σ1 [′ $v1′ ← SemF rag [[′ plus(3,2)′ ]] nan, 2002, 160–168. (σ1 )](λσ.⊥)(′ $v1′ ) = 7. P. Šárek: Implementation of the XML lambda language. = (λσ.⊥)[′ $v1′ ← SemF rag [[′ plus(3,2)′ ]] Master’s thesis, Dept. of Software Engineering, Charles University, Prague, 2002. (λσ.⊥)](′ $v1′ ) = 8. J. Zlatuška: Lambda-kalkul. Masarykova univerzita, = (λσ.⊥)[′ $v1′ ← λσ2 .plus(SemExpr [[′ 3′ ]] Brno, Česká republika, 1993. (λσ2 .⊥), SemExpr [[′ 2′ ]](λσ2 .⊥)) (λσ.⊥)](′ $v1′ ) =