Denotational Semantics of the XML-λ Query Denotational Semantics of ?the XML-λ Query Language ? Language Pavel Loupal1 and Karel Richta2 Pavel Loupal1 and Karel Richta2 1 Department of Computer Science and Engineering, FEL ČVUT 1 Department ofKarlovo Computer Science nám. and 13, 121 35Engineering, Praha 2 FEL ČVUT Karlovo nám. 13, 121 35 Praha 2 loupalp@fel.cvut.cz 2 loupalp@fel.cvut.cz Department of Software Engineering MFF UK, 2 Department of Software Malostranske nam. 25,Engineering MFF 118 00 Praha 1 UK, Malostranske nam. 25, 118 00 Praha 1 richta@ksi.mff.cuni.cz richta@ksi.mff.cuni.cz Abstract. In this paper, we define formally the XML-λ Query Lan- guage, a query language for XML, that employs the functional data model. We describe its fundamental principles including the abstract syntax and denotational semantics. The paper basically aims for outlin- ing of the language scope and capabilities. 1 Introduction In this paper, we define formally the XML-λ Query Language, a query language for XML, that employs the functional data model. The first idea for such an attitude was published in [4]. This research brought in the key idea of a func- tional query processing with a wide potential that was later proven by a simple prototype implementation [6]. We can imagine two scenarios for this language; firstly, the language plays a role of a full-featured query language for XML (it has both formal syntax and semantics and there is also an existing prototype that acts as a proof-of- the-concept application). In the second scenario, the language is utilized as an intermediate language for the description of XQuery semantics. In [3] we propose a novel method for XQuery evaluation based on the transformation of XQuery queries into their XML-λ equivalents and their subsequent evaluation. As an integral part of the work, we have designed and developed a prototype of an XML-λ query processor for validating the functional approach and experiment- ing with it. 2 XML-λ Query Language In this section, we describe the query language XML-λ, that is based on the sim- ply typed lambda calculus. As a formal tool we use the approach published in ? This work has been supported by the Ministry of Education, Youth and Sports under Research Program No. MSM 6840770014 and also by the grant project of the Czech Grant Agency (GAČR) No. GA201/09/0990. J. Pokorný, V. Snášel, K. Richta (Eds.): Dateso 2010, pp. 139–146, ISBN 978-80-7378-116-3. 2140 Pavel Loupal, Loupal Karel P., Richta K.Richta Richta’s overview of semantics [5]. For listing of language syntax, we use the Ex- tended Backus-Naur Form (EBNF) and for meaning of queries the denotational semantics [5]. 2.1 Language of Terms Typical query expression has a query part — an expression to be evaluated over data — and a constructor part that wraps a query result and forms the output. The XML-λ Query Language is based on λ-terms defined over the type system TE as shown later. Lambda calculus, written also as λ-calculus, is a formal mathematical system for investigation of function definition and application. It was introduced by Alonzo Church and has been utilized in many ways. In this work, we use a variant of this formalism, the simply-typed λ-calculus, as a core for the XML-λ Query Language. We have gathered the knowledge from [7] and [1]. Our realization is enriched by usage of tuples. The main constructs of the language are variables, constants, tuples, projec- tions, and λ-calculus operations — applications and abstractions. The syntax is similar to λ-calculus expressions, thus the queries are structured as nested λ- expressions, i.e., λ . . . (λ . . . (expression) . . .). In addition, there are also typical constructs such as logical connectives, constants, comparison predicates, and a set of built-in functions. Language of terms is defined inductively as the least set containing all terms created by the application of the following rules. Let T, T1 , . . . , Tn , n ≥ 1 be members of TE . Let F be a set of typed constants. Then: 1. variable: each variable of type T is a term of type T 2. constant: each constant (member of F ) of type T is a term of type T 3. application: if M is a term of type ((T1 , . . . , Tn ) → T ) and N1 , . . . , Nn are terms of the types T1 , . . . , Tn ,then M (N1 , . . . , Nn ) is a term of the type T 4. λ-abstraction: if x1 , . . . , xn are distinct variables of types T1 , . . . , Tn and M is a term of type T , then λx1 : T1 , . . . , xn : T1 .(M ) is a term of type ((T1 , . . . , Tn ) → T ) 5. n-tuple: if N1 , . . . , Nn are terms of types T1 , . . . , Tn , then (N1 , . . . , Nn ) is a term of type (T1 , . . . , Tn ) 6. projection: if (N1 , . . . , Nn ) is a term of type (T1 , . . . , Tn ), then N1 , . . . , Nn are terms of types T1 , . . . , Tn 7. tagged term: if N is a term of type N AM E and M is a term of type T then N : M is a term of type (E → T ). Terms can be interpreted in a standard way by means of an interpretation as- signing to each constant from F an object of the same type, and by a semantic mapping from the language of terms to all functions and Cartesian products given by the type system TE . Speaking briefly, an application is evaluated as an application of of the associated function to given arguments, an abstraction ’constructs’ a new function of the respective type. The tuple is a member of Cartesian product of sets of typed objects. A tagged term is interpreted as a function defined only for one e ∈ E. It returns again a function. Denotational Semantics Denotational Semantics of of the the XML-λ XML-λ Query Query Language Language 141 3 3 Abstract Syntax As for evaluation of a query, we do not need its complete derivation tree; such information is too complex and superfluous. Therefore, in order to diminish the domain that needs to be described without any loss of precision, we employ the abstract syntax. With the abstract syntax, we break up the query into logi- cal pieces that forming an abstract syntax tree carrying all original information constitute an internal representation suitable for query evaluation. We introduce syntactic domains for the language, i.e., logical blocks a query may consist of. Subsequently, we list all production rules. These definitions are later utilized in Section 4 within the denotational semantics. 3.1 Syntactic Domains By the term syntactic domain, we understand a logical part of a language. In Table 1, we list all syntactic domains of the XML-λ Query Language with their informal meaning. Notation Q : Query stands for the symbol Q representing a member of the Query domain. Q : Query XML-λ queries, O : Option XML-λ options – XML input attachements, C : Constructor XML-λ constructors of output results, E : Expression general expressions, yield a BaseT ype value, T : T erm sort of expression, yield a BaseT ype value, F : F ragment sub-parts of a T erm, BinOp : BinOperator binary logical operators, RelOp : RelOperator binary relational operators, N : N umeral numbers, S : String character strings, Id : Identif ier strings conforming to the N ame syntactic rule in [2], N F : N ullary identifiers of nullary functions (subset of Identif ier), P roj : P rojection identifiers for projections (subset of Identif ier). Table 1. Syntactic domains of the XML-λ Query Language 3.2 Abstract Production Rules The abstract production rules listed in Table 2 (written using EBNF) connect the terms of syntactic domains from the previous section into logical parts with suitable level of details for further processing. On the basis of these rules, we will construct the denotational semantics of the language. 4 Denotational Semantics For description the meaning of each XML-λ query, we use denotational seman- tics. The approach is based on the idea that for each correct syntactic construct of the language we can define a respective meaning of it as a formal expression in 4142 Pavel Loupal, Loupal Karel P., Richta K.Richta Query ::= Options Constructor Expression Constructor ::= ElemConstr + | Identif ier+ ElemConstr ::= N ame AttrConstr ∗ (Identif ier | ElemConstr) AttrConstr ::= N ame Identif ier Expression ::= F ragment F ragment ::= N ullary | Identif ier | F ragment P rojection | SubQuery | F unctionCall | N umeral | String | Boolean T erm ::= Boolean | F ilter | ’not’ T erm | T erm BinOper T erm F ilter ::= F ragment RelOper F ragment SubQuery ::= Identif ier + Expression BinOper ::= ’or’ | ’and’ RelOper ::= ’<=’ | ’<’ | ’=’ | ’!=’ | ’>’ | ’>=’ N umeral ::= Digit+ | Numeral 0 .0 Digit+ Digit ::= ’0’ | ’1’ | ’2’ | ’3’ | ’4’ | ’5’ | ’6’ | ’7’ | ’8’ | ’9’ Identif ier ::= N ame P rojection ::= Identif ier N ullary ::= Identif ier Table 2. Abstract production rules for the XML-λ Query Language another, well-known, notation. We can say that the program is the denotation of its meaning. The validity of the whole approach is based on structural induction; i.e, that the meaning of more complex expressions is defined on the basis of their simpler parts. As the notation we employ the simply typed lambda calculus. It is a well-known and formally verified tool for such a purpose. 4.1 Prerequisites The denotational semantics utilizes a set of functions for the definition of the language meaning. For this purpose, we formulate all necessary mathematical definitions. We start with the data types and specification of the evaluation context followed by the outline of bindings to the TE type system. Then, all auxiliary and denotation functions are introduced. Data Types. Each value computed during the process of the query evaluation is of a type from T ype. Let E be a type from the type system TE , we define T ype as: T ype ::= BaseT ype | SeqT ype SeqT ype ::= ⊥ | BaseT ype × SeqT ype BaseT ype ::= E | P rimitiveT ype P rimitiveT ype ::= Boolean | String | N umber Primitive types, Boolean, String, and N umber, are defined with their set of allowed values as usual. The type SeqT ype is the type of all ordered sequences of elements of base types3 . We do not permit sequences of sequences. The symbol ⊥ stands for the empty sequence of types – represents an unknown type. More 3 We suppose usual functions cons, append, null, head, and tail for sequences. Denotational Semantics Denotational Semantics of of the the XML-λ XML-λ Query Query Language Language 143 5 precisely, we interpret types as algebraic structures, where for each type τ ∈ T ype there is exactly one carrier Vτ , whose elements are the values of the respective type τ . Variables. An XML-λ query can use an arbitrary (countable) number of vari- ables. We model variables as pairs name : τ , where name refers to a variable name and τ is the data type of the variable – any member of T ype. Syntactically, variable name is always prepended by the dollar sign. Each expression in XML-λ has a recognizable type, otherwise both the type and the value are undefined. Query Evaluation Context. During the process of query evaluation we need to store variables inside a working space known as a context. Formally, we denote this context as the State. We usually understand a state as the set of all active objects and their values at a given instance. We denote the semantic domain State of all states as a set of all functions from the set of identifiers Identif ier into their values of the type τ ∈ T ype. Obviously, one particular state σ : State represents an immediate snapshot of the evaluation process; i.e., values of all variables at a given time. We denote this particular value for the variable x as σ[[x]]. Simply speaking, the state is the particular valuation of variables. We use the functor f [x ← v] for the definition of a function change in one point x to the value v. 4.2 Auxiliary Functions For the sake of readability improvement, we propose few semantic functions, de- noted as auxiliary, that should make the denotations more legible. We introduce functions: isIdent — returns true iff its argument denotes an variable identifier in a given State, typeOf — returns a type of given argument (one type from the T ype set), valueOf — returns a typed value of an expression, bool — evaluates its argument as a Boolean value, num — converts its argument into a numeric value, and str — converts its argument into a string value. Each expression e has a distinguished type in a state σ. The type can depend on the state because an expression can contain variables. This type is available by calling the typeOf semantic function defined in Table 3. typeOf : Expression × State → T ype 4.3 XML Schema-Specific Functions For utilization of the features offered by the XML-λ Framework we propose a number of functions working with information available in the type system. These functions help us to access an arbitrary data model instance. An applica- tion is informally used for accessing child elements of a given one. More formally, it is an evaluation of a T -object specified by its name. A projection is generally used for selecting certain items from a sequence. A nullary function. A T -nullary function returns all abstract elements from ET . Root Element Access is a short- cut for a common activity in the XML world — accessing the root element of an XML document. We type it as a constant of a given type from TE . 6144 Pavel Loupal, Loupal Karel P., Richta K.Richta    ⊥ if e is a nullary fragment   Boolean if e ∈ νBoolean     (e is a constant of the type Boolean)     N umeral if e ∈ νNumeral       (e is a constant of the type N umeral)    String if e ∈ νString typeOf [[e]](σ) = (e is a constant of the type String)   τ  if isIdent[[e]](σ) and σ[[e]] : τ     (e is a variable of the type τ )     Boolean if e is a relational fragment (filter)     e1 RelOper e2     Boolean if e is a logical expression  e1 BinOper e2 , or not e1 Table 3. Types of general expressions appXM LDoc : E → SeqT ype projXM LDoc : SeqT ype × τ → SeqT ype nullXM LDoc : E × T → SeqT ype rootXM LDoc : E 4.4 Signatures of Semantic Functions Having defined all necessary prerequisites and auxiliary functions (recalling that the SeqT ype represents any permitted type of value), we formalize semantic functions over semantic domains as SemQuery : Query → (XM LDoc → SeqT ype) SemOptions : Options → (State → State) SemExpr : Expression → (State → SeqT ype) SemT erm : T erm → (State → Boolean) SemF rag : F ragment → (State → SeqT ype) SemRelOper : F ragment × RelOper × F ragment → (State → Boolean) SemBinOper : T erm × BinOper × T erm → (State → Boolean) 4.5 Semantic Equations We start with the semantic equations for the expressions. Each expression e has a value SemExpr [[e]](σ) in a state σ. The state represents values of variables. The result is a state, where all interesting values are bound into local variables. Resulting values are created by constructors. A constructor is a list of items which can be variable identifier or constructing expression. Resulting values can be created by element constructors. Elements can have attributes assigned by attribute constructors. Options and Queries. The only allowed option in the language is now the speci- fication of input XML documents. We explore a function Dom(X) that converts input XML document X into its internal representation accessible under iden- tification X#. A query consists of query options, where input XML documents Denotational Denotational Semantics Semantics of the of the XML-λ XML-λ Query Query Language Language 7145 SemT erm [[B]] = λσ : State.bool[[B]] if B is a constant of the type Boolean SemT erm [[F1 RelOp F2 ]] = λσ : State.SemRelOper [[F1 RelOp F2 ]]σ 0 0 SemT erm [[ not T ]] = λσ : State.not(SemT erm [[T ]]σ) SemBinOper [[T1 ’or’ T2 ]] = λσ : State.(SemT erm [[T1 ]]σ or SemT erm [[T2 ]]σ) SemBinOper [[T1 ’and’ T2 ]] = λσ : State.(SemT erm [[T1 ]]σ and SemT erm [[T2 ]]σ) SemT erm [[T1 BinOper T2 ]] = λσ : State.SemBinOper [[T1 BinOper T2 ]]σ Table 4. Semantic equations for terms, relational and binary operators SemAttrConstr [[N I]]σ = attribute(N, SemExpr [[I]]σ) SemElemConstr [[N A1 ...An I]]σ = = element(N, σ[[I]], SemAttrCons [[A1 ]]σ, ..., SemAttrCons [[An ]]σ) SemElemConstr [[N A1 ...An E]]σ = = element(N, SemExpr [[E]]σ, SemAttrCons [[A1 ]]σ, ..., SemAttrCons [[An ]]σ) SemElemConstr [[N I]]σ = element(N, σ[[I]], nil) SemElemConstr [[N E]]σ = element(N, SemExpr [[E]]σ, nil) SemCons [[E1 E]]σ = append(SemElemCons[[E1 ]]σ, SemCons [[E]]σ) SemCons [[I1 E]]σ = cons(σ[[I1 ]], SemCons [[E]]σ) SemCons [[ ]]σ = nil Table 5. The semantic equation for constructors SemF rag [[N ull]] = λσ : State.nullXM LDoc [[N ull]] SemF rag [[Id]] = λσ : State.σ[[Id]] SemF rag [[f (E1 , ..., En )]] = λσ : State.f (SemExpr [[E1 ]]σ, ..., SemExpr [[En ]]σ) SemF rag [[F P ]] = λσ : State.(SemF rag [[F ]] ◦ SemF rag [[P ]])σ SemF rag [[(subquery)(arg)]] = λσ : State.(SemExpr [[subquery]](σ)(SemExpr [[arg]](σ))) SemF rag [[I1 I2 ...In E]] = SemExpr [[I2 ...In E]](σ[SemExpr [[E]]σ ← I1 ]) SemF rag [[N ]] = λσ : State.num[[N ]] if N is a constant of the type N umeral SemF rag [[S]] = λσ : State.str[[S]] if S is a constant of the type String SemF rag [[B]] = λσ : State.bool[[B]] if B is a constant of the type Boolean SemExpr [[F ]]σ = SemF rag [[F ]]σ Table 6. Semantic equations for fragments and expressions 8146 Pavel Loupal, Loupal Karel P., Richta K.Richta SemQuery [[O C E]] = = λδ : XM LDoc.(SemCons [[C]](SemExpr [[E]](SemOptions [[O]](λσ.⊥)(δ))) SemQuery [[Q]](nil) = nil SemQuery [[Q]](cons(H, T )) = append(SemQuery [[Q]](H), SemQuery[[Q]](T )) SemOptions [[ ]] = λσ : State.⊥ SemOptions [[ xmldata( X ) Y ]] = λσ : State.SemOptions [[Y ]](σ[Dom(X) ← X#]) Table 7. Semantic equations for options and queries are bound to its formal names, the query expression to be evaluated, and the output construction commands. First, input files are elaborated, than an initial variable assignment takes place, followed by evaluation of expression. Finally, the output is constructed. The whole meaning of a query can be modeled as a mapping from the sequence of input XML documents into a sequence of output values of the type of T ype. 5 Conclusions In this paper, we have presented syntax and denotational semantics of the XML-λ Query Language, a query language for XML based on simply typed lambda calculus. We use this language within the XML-λ Framework as an intermediate form of XQuery expressions for description of its semantics. Nev- ertheless the language in its current version does not support all XML features, e.g. comments, processing instructions, or deals only with type information avail- able in DTD, it can be successfully utilized for fundamental scenarios both for standalone query evaluation or as a tool for XQuery semantics description. References 1. H. Barendregt. Lambda calculi with types. In Handbook of Logic in Computer Science, Volumes 1 (Background: Mathematical Structures) and 2 (Background: Computational Structures), Abramsky & Gabbay & Maibaum (Eds.), Clarendon, volume 2. Oxford University Press, 1992. 2. T. Bray, J. Paoli, C. M. Sperberg-McQueen, E. Maler, and F. Yergeau. Extensible markup language (XML) 1.0 (fourth edition), August 2006. http://www.w3.org/ TR/2006/REC-xml-20060816. 3. P. Loupal. XML-λ : A Functional Framework for XML. Ph.D. Thesis, Department of Computer Science and Engineering, Faculty of Electrical Engineering, Czech Tech- nical University in Prague, February 2010. Submitted. 4. J. Pokorný. XML functionally. In B. C. Desai, Y. Kioki, and M. Toyama, editors, Proceedings of IDEAS2000, pages 266–274. IEEE Computer Society, 2000. 5. K. Richta and J. Velebil. Sémantika programovacı́ch jazyku. Univerzita Karlova, 1997. 6. P. Šárek. Implementation of the XML lambda language. Master’s thesis, Dept. of Software Engineering, Charles University, Prague, 2002. 7. J. Zlatuška. Lambda-kalkul. Masarykova univerzita, Brno, Česká republika, 1993.