Determining XSLT Streamability Using New Hierarchical XSD Model

                                                 Jana Dvořáková and Filip Zavoral

                                            Department Of Software Engineering
                                             Faculty of Mathematics and Physics
                                         Charles University in Prague, Czech republic
                                   {Jana.Dvorakova, Filip.Zavoral}@mff.cuni.cz

Abstract. We introduce a new compact model of XML Schema               framework for streaming XML transformations introduced
called a schema tree. Given an XSLT transformation xsl and an          in [3]. The framework is capable to process automatically a
XML schema xsd, we present a method which statically analyzes          class of top-down XSLT transformations which captures a
the schema tree constructed according to xsd and determines            significant number of practically needed XML transforma-
whether xsl can be processed in a streaming manner on a set            tions. The processing is done using a stack of the size pro-
of XML documents defined by xsd. We consider streaming pro-
                                                                       portional to the depth of the input XML document - such
cessing that uses a stack of the size proportional to the depth of
the input document - this processing is highly efficient in practice
                                                                       processing is highly efficient in practice since real XML
since real-world XML documents are shallow. The schema anal-           documents are shallow [9].
ysis is performed via stepwise application of templates of xsl on          We focus especially on the schema-based analyzer
the schema tree. We present the implementation of a schema tree        which represents a powerful tool used within the Xord frame-
and the static XSLT analyzer on .NET platform.                         work to determine the most efficient way of processing the
                                                                       given XSLT transformation. For a given XSLT stylesheet
                                                                       xsl and an XML schema xsd1 , it automatically analyzes
1    Introduction                                                      the memory usage of the streaming processing of xsl on a
                                                                       set of documents defined by xsd.
Many applications need to employ streaming approach                        The existing models for XML schemas (DOM, .NET
when processing huge data in XML format. Most typi-                    XmlSchema) appeared inconvenient for the purpose of the
cally, the languages XSLT [10] and XQuery [13] are used                streamability analysis, we therefore introduce the Xord
to specify XML transformations. Both of them enable the                Schema Model - a new compact model for schema repre-
user to write a high-level specification based on tree ma-             sentation. The model is abstract, and thus not bounded to a
nipulation. Common processors of these languages (e.g.,                particular schema language. However, in the prototype im-
Saxon, Xalan, AltovaXML) are tree-based, i.e., read the                plementation we employ W3C XSD notation [11, 12] for
whole input document into memory and then perform the                  XML schemas.
transformation itself.
                                                                       Related work. Several streaming processors for XSLT and
    The XSLT and XQuery tree-based processors are ap-                  XQuery have been implemented. However, their efficiency
parently not suitable when transforming XML streams or                 was demonstrated only by experiments on a small number
huge XML data. In this case, the transformation can be ei-             of XML transformations and input XML documents. It is
ther written by hand using an event-base parser (e.g., SAX,            thus not known how much memory is consumed on clearly
StAX) or using some streaming transformation language                  characterized transformation classes.
(STX [1], XStream [5]). In both cases, writing the speci-                  XML Streaming Machine (XSM) [8] processes a subset
fication is a non-trivial task since the user must explicitly          of XQuery on XML streams without attributes and recur-
handle storing parts of the input document in the memory               sive structures. It is based on a model called XML stream-
buffers for later processing.                                          ing transducer. The processor have been tested on XML
    In this paper we focus on the problem how to enable the            documents of various sizes against a simple query. Using
user to write a tree manipulation specification in the XSLT            XSM the processing time grows linearly with the document
language, and at the same time to process it in a streaming            size, while in the case of standard XQuery processors the
manner automatically. Such automatic streaming processor               time grows superlinearly. More complex queries have not
is supposed to apply the tree-manipulation functions over              been tested.
a continuous stream of data while the buffering is treated                 BEA/XQRL [4] is a streaming processor that implements
automatically. An important issue is to design the processor           full XQuery. The processor was compared with Xalan-J
in such a way that the size of memory buffers is minimized             XSLT processor on the set of 25 transformations and an-
for the given transformation and the input document.                   other test was carried on XMark Benchmarks. BEA pro-
    We describe the implementation of the Xord frame-
work which represents a prototype XSLT automatic stream-                1
                                                                            We use the term XML schema for a general schema for XML
ing processor. The Xord framework is based on the formal                    documents.
                                                                                  adopted to model any XML schema language based on
                                                                                  structure definition.
                          XfAnalyzer                             XfAlgorithm
                                                                                      Furthermore, the framework is complemented by a set
                                                                                  of auxiliary helper classes. The algorithmic part of the API
      XfXsdSsxtAnalyzer                XfTemplateAnalyzer                XfSsxt   supports:
         sch                                              xslt
                                                                                  SsxtAlgorithm algorithm derived from the abstract Al-
                          xslt                   xslt
                                                                                      gorithm model, and
         XfSchema                           XfXslt
       (SchemaModel)                    (TemplateModel)                           XsdSsxtAnalyzer algorithm derived from the abstract An-
                                                                                      alyzer model, and using the Schema Model and the
                                                                                      Template Model.
        SchemaModel
                       Fig. 1. TheTemplateModel
                                    Xord framework.                    Ssxt
                                                                                  The implementation of the above mentioned models are de-
                                                                                  scribed in more detail in following sections.

cessor was fast on small input documents, however, the
processing of large documents was slower since the opti-                          3    XSLT representation
mizations specially designed for XML streams are limited
                                                                                  The Xord framework is currently restricted to process sim-
in this engine.
                                                                                  ple XSLT transformations on XML documents without data
    FluXQuery [7] is a streaming XQuery processor based                           values.
on a new internal query language FluX which extends
XQuery with constructs for streaming processing. XQuery                           Simple XSLT stylesheets. Simple XSLT stylesheet con-
query is converted into FluX and the memory size is opti-                         sists of an initializing template and several transforming
mized by examining the query as well as the input DTD.                            templates. The initializing template sets the current mode
FluXQuery supports a subset of XQuery. The engine was                             to the initial mode m0 and calls processing of the root ele-
benchmarked against XQuery processors Galax and AnonX                             ment of the input document. It is of the form:
on selected queries of the XMark benchmark. The results                           <xsl:template match="/">
                                                                                      <xsl:apply-templates mode="m0"/>
show that FluXQuery consumes less memory and runtime.                             </xsl:template>
    SPM (Streaming Processing Model) [6] is a one-pass
streaming XSLT processor without an additional memory.                            The transforming templates are of the form:
                                                                                  <xsl:template match="name" mode="m">
Authors present a procedure that tries to converts a given                             ... template body ...
XSLT stylesheet into SPM. No algorithm for testing the                            </xsl:template>
streamability of XSLT is introduced, and therefore the class
of XSLT transformations captured by SPM is not clearly                            The template body contains output elements (possibly nested)
characterized.                                                                    and apply-templates calls. Output elements are of the form:
                                                                                  <name> . . . element content . . . </name>

                                                                                  The apply-templates construct has a select attribute
2   Xord Framework
           1/1
                                                                                  that contains selecting expression, and a mode attribute.
                                                                                  <xsl:apply-templates select="selexp" mode="m’"/>
The Xord framework for analyzing and transforming XML
data is implemented on .NET platform. Its application in-                         A subset of XPath expression is allowed in templates -
terface is formed by a set of interface classes for traversing                    they contain child and descendant axis, and select nodes
analyzed data structures. The core of the framework con-                          by name:
sists of these abstract models (see Fig. 1):                                       XPath := Step | Step/XPath
                                                                                   Step := child::name | descendant::name
1. Template Model for transforming templates implemen-
                                                          where name refers to an element name.
    ted by the XfXslt classes,
2. Schema Model for XML schemas implemented by the Xord Template Model. In the Xord framework, XSLT sty-
    XfSchema classes,                                     lesheets are represented by a set of classes, an Xord Tem-
3. Algorithm Model for streaming algorithms implemen- plate Model. Its simplified object structure is depicted in
    ted by the XfSsxt classes,                            Fig. 2.
4. Analyzer Model for static analyzers implemented by the     Each template from the XSLT contains a sequence of
    XfXsdSsxtAnalyzer and XfTemplateAnalyzer classes.     template calls. A template call consists of the parsed XPath
                                                          expression and the template called by the apply-templates
Since the models are abstract, the Template Model may be mechanism. The input template file is parsed into these
adopted to model templates of any template-based XML structures before the analysis. Then the analysis algorithm
transformation language and the Schema Model may be directly traverses the DAG, evaluates the expressions etc.
                                                 XfCall                                                                          XfSchema
                  Step               +   template   : XfTemplate                                                     + map : Dictionary<string,Node>
                                     +   node       : XmlNode
         + Name       : string       +   select     : XfXpath
         + Descendant : bool         +   mode       : string
                                                                                                                                                 map
                                                                   template

                                                                                                              Item               node          Node
                exp                                       calls                                        - node      : Node                  - id : string
                          select
                                                                                                       - minOccurs : int
                                                                     XfTemplate
                                                                                                       - maxOccurs : int
                XfXpath                                     +     match   : string
         + exp : List<Step>                                 +     mode    : string
                                                            +     node    : XmlNode                          children
                                                            +     calls   : List<XfCall>
                                                                                                                        ComplexElement                 SimpleElement
                                         calls                  templates                                        - children : List<Item>


                                                 XfXslt
                                   - templates : List<XfTemplate>
                                   - calls     : List<XfCall>
                                                                                                              XfRegexp              atom          Atom
                                                                                                        + regexp : List<Atom>                + id   : string
                                                                                                                                             + type :
                   Fig. 2. The Xord Template Model.

                                                                                                          Fig. 3. The Xord Schema Model.

4     Hierarchical XML schema representation

We represent an XML schema hierarchically as a schema                                      programmatically, but its application interface is not very
tree. The representation does not depend on a particular                                   useful for parsing and analyzing existing schemas.
schema notation (DTD, XSD). The schema tree consists of                                        Since the schema analysis using standard XML schema
two kinds of nodes:                                                                        DOM model would be very complicated and tangled, we
                                                                                           have designed an Xord Schema Model which is targeted to
    – element nodes: correspond to an element type defined                                 effective representation and analysis of existing schemas.
      within schema                                                                        A simplified object structure of that model is depicted in
    – constructor nodes: correspond to constructors used in                                Fig. 3.
      the schema (sequence, choice, *, +, ?)                                                   The whole schema is represented as an associative ar-
                                                                                           ray of simple or complex type nodes. 1Each
                                                                                                                                    /1    complex node
The relationships among element types and constructors                                     contains a list of references to its child nodes with their
are represented by the structure of the tree.                                              cardinality. Using this recursive structure that form a DAG
    Some subtrees of schema tree may be identical - this                                   (or a tree with one particular node selected as a root), the
situation occurs if we derive the1schema
                                   /1
                                           tree from DTD or                                parsed schema could be easily traversed and processed.
XSD containing shared element types. When designing the
analyzer, the tree representation is more convenient. How-
ever in the implementation of schema-based analyzer each                                   5   Schema-based analyzer
type is represented as a single node and the whole schema
is represented as a DAG (see Schema Object Model be-                                       The schema-based analyzer applies the given XSLT style-
low).                                                                                      sheet xsl to the schema tree xsd, starting at the root node.
    In the schema-based analysis, we consider XML sche-                                    First, let us remind the principles of the XSLT application
mas without the choice constructor and recursive defini-                                   to the XML document tree. Let tmp be the current tem-
tions. Such schema can be represented as a single regular                                  plate of the XSLT stylesheet (at the beginning of the trans-
expression. This representation is useful in the extraction                                formation, it is the template matching the root element in
part of the analyzer algorithm (see Section 5).                                            the initial mode m0 )

Xord Schema Model. Although there are well established 1. The node sequence selected by the XPath expressions
and widely used XML parsers, we have found no suitable           in the rule calls of the current template are found.
parser for XSD. To perform schema manipulation, the .NET      2. The templates called by the rule calls are applied to the
Framework provides a set of classes called the Schema Ob-        selected nodes.
ject Model, or SOM for short. The SOM is for schemas However, in case of the schema tree, a modification of the
what DOM is for XML documents: the SOM classes repre- first step of this simple algorithm is needed:
sent various parts of a schema, for example XmlSchemaSim-
pleType, XmlSchemaElement, there are many other classes 1. All possible node sequences selected by the XPath ex-
that represent attributes, facets, groups, complex types, and    pressions in the rule calls of the current template are
so on. This model is especially useful for creating schemas      found.
                                                                   bool AnalyzeNode(XfTemplate t,XfSchema.Node n) {
 2. The templates called by the rule calls are applied to the           if(t.Empty)
    selected nodes.                                                         return true;
                                                                        XfLastNames li = t.GetLastNames();
                                                                        XfRegexp re = sch.ExtractFragment(n, t);
Since the set of all possible node sequences selected by                if(re.Empty())
XPath expressions in the first step may be infinite, we rep-                return true;
                                                                        if(! sch.Compare(re, li))
resent it in the form of regular expression regexp. Such                    return false;
regular expression is basically a fragment of the schema                foreach(XfCall call in t.calls) {
                                                                            List<XfSchema.Node> ln =
tree, i.e., a set of its nodes (not necessarily connected) which                new List<XfSchema.Node>();
is a fragment of the regular expression representing the                    ln = sch.EvalExp(n, call.select);
                                                                            foreach(XfSchema.Node ni in ln) {
whole schema tree.                                                              AnalyzeNode(call.template, ni);
     The regular expression regexp may contain both ele-                    }
                                                                        }
ment nodes and constructor nodes. It is extracted as fol-               return true;
lows: First, the node sequence selected by the XPath ex-           }

pressions are found in the same way as in the XML docu-                            Fig. 4. The code of the AnalyzeNode function.
ment tree (constructor nodes are skipped). Second, all con-
structors appearing in the branch of the schema tree from
the root to the selected nodes are added to the sequence.          6      Stack-based streaming algorithm
The hierarchy of the nodes is preserved by delimiting the
nodes appearing at the same level of the schema tree by      The stack-based algorithm is based on a formal model called
parentheses.                                                 simple streaming XML transducer (SSXT), therefore we call
     We say that regexp represents possible reading orders   it the SSXT algorithm. The transducer has a single input
of the element names selected by the expressions in tmp,     head that reads the input document sequentially, and a sin-
i.e., the order in which the elements are accessed when a    gle output head that generates the output document sequen-
document defined by the schema xsd is read sequentially.     tially. The SSXT is equipped with a stack to store tempo-
Now let names be a sequence of element names in the or-      rary data.
der they are called in tmp - clearly, the sequence can be         The SSXT takes an input document din and a top-down
constructed statically by examining the last steps of the    XSLT stylesheet xsl as the input. It reads din sequentially
XPath expressions in tmp. The names sequence repre-          in one pass and apply the stylesheet xsl stepwise. First,
sents the processing order of the elements. In case one of   the template matching the root element of din in the initial
the reading orders does not conform to the processing or-    mode m0 is set to be the currently processed template (cur-
der, the order-preservation of the xsl is violated and the   rent template). The processing proceeds in cycles. During
SSXT algorithm is not applicable2 . It is thus only neces-   a single cycle, a single template call of the current template
sary to compare regexp to the names sequence in order to     is processed.
check applicability ot the stack-based algorithm.            Processing cycle. All XPath expression within a template
                                                             are evaluating concurrently. The evaluation is realized by
Implementation. The core of the schema-based analyzer deterministic finite automata (DFA)3 . A single DFA is con-
is the AnalyzeNode function which takes two arguments structed for each expression. When the processing of a tem-
- a template of xsl and a node of the schema tree xsd. plate starts, the sequence of the initial states of DFAs is
It performs the application of the template to the schema pushed on the stack. The input head of SSXT reads the
tree node as described above. Using the Template Model elements of din in document order. When a start-tag is en-
and the Schema Models allows the analyzer algorithm to countered, new sequence of DFAs is computed. Three sit-
be simple and straightforward - see Fig. 4.                  uations may occur:
    The comparison of the regexp to the names sequence
is accomplished by the Compare function. Its implemen- a) new sequence contains no final state - the input head
tation is based on inherent properties of its arguments. In-      continues in evaluation,
stead of an expensive checking of swapping for each pair      b)  new sequence contains a single final state which be-
of names, the predicate is a compound of two simple steps.        longs  to the DFA evaluating the lastly-matched expres-
First, regexp is checked for existence of two distinct names      sion or  an expression located after the lastly-matched
within any ’+’ or ’*’ sequence. Second, the last names in         expression   - the corresponding template call is pro-
names are stripped to those contained in the schema be-           cessed,
                                                               c) new sequence contains a final state which belongs to
ing used, adjacent duplicities are reduced to a single name,
                                                                  the DFA evaluating expression located before the lastly-
and the resulting list is linearly compared to names con-
                                                                  matched expression, or it contains two or more final
tained in regexp. Since each name appearing regexp must
                                                                  states - error.
be contained in names, any difference cause a fail.
                                                                    3
                                                                        We refer the reader to [2] for a more detailed description of
 2
     See [3] for further details.                                       this evaluating method.
                                                          void ProcessDfaSequence(XfXml xml)
In case b), the current cycle configuration (template id, {
matched expression id) is pushed on the stack and new       SIDfaSequence ds = stk.GetDfaSequence();
                                                            switch(xml.currType) {
cycle for processing the called template starts. The cycle    case XmlNodeType.Element:
configuration is popped after the whole called template has    SIDfaSequence new ds = ds.Transition(xml.currName);
                                                               if(!new ds.HasFinalStates()) {
been processed and the control moves back to the current         stk.Push(new ds);
template. In case a), the evaluation continues. Here if an       xml.Advance();
                                                               }
end-tag is encountered, the sequence of the DFA states lo-     else {
cated at the top of the stack is popped. Hence, the XPath ex-    XfCall myCall = new ds.GetCallWithFinalState();
                                                                 currTemplate.Generate(currCall, myCall);
pression of the current template are evaluated on “branches”     XfTemplate calledTemplate =
of din .                                                            xslt.SelectTemplate(xml.currName, myCall.mode);
                                                                 if(calledTemplate.Empty) {
                                                                   calledTemplate.Generate(null, null);
                                                                   currCall = myCall;
Implementation. The implementation of the SSXT algo-               if(xml.laType == XmlNodeType.Element)
rithm uses both Template Model and Algorithm Model classes.         stk.Push(new ds);
                                                                   xml.Advance();
Since the algorithm is stack-based, the main data struc-         } else {
ture used is a polymorphic stack stk of sequences of DFA           stk.Push(new SICycleConfig(currTemplate,
                                                                    myCall));
states (SIDfaSequence) and cycle configurations (SICycle-          currTemplate = calledTemplate;
Config), see Fig. 8.                                               currCall = null;
                                                                 }
    Until the transformation is finished the top of stack      }
                                                               break;
is checked and the stack item is processed, see the func-     case XmlNodeType.EndElement:
tion RunSsxt in Fig. 5. In case of an empty stack and          if(xml.laType == XmlNodeType.EndElement)
                                                                 stk.Pop();
nonempty remaining input new DFA sequence is pushed            xml.Advance();
on the stack.                                                  break;
                                                              default:
                                                               stk.Pop();
                                                               break;
void RunSsxt(XfXml xml)                                     }
{                                                         }
    XfTemplate currTemplate = xslt.Start();
    XfCall currCall = null;                                     Fig. 6. The code of the ProcessDfaSequence function used in the SSXT algorithm.
    bool transformed = false;
    while(!transformed) {
      if(!stk.Empty()) {
        switch(stk.Type()) {
          case XfStack.ItemType.DfaSequence:                        The cycle configuration processing (Fig. 7) depends on
           ProcessDfaSequence();                                the current XML node type. A start tag pushes a new DFA
           break;
          case XfStack.ItemType.CycleConfig:                    sequence while an end tag generates output.
           ProcessCycleConfig();
           break;
        }
                                                                void ProcessCycleConfig(XfXml xml)
      } else {
                                                                {
        switch(xml.currType) {
                                                                  SICycleConfig cc = stk.GetCycleConfig();
          case XmlNodeType.Element:
                                                                  switch(xml.currType) {
           stk.Push(new SIDfaSequence(currTemplate));
                                                                    case XmlNodeType.Element:
           xml.Advance();
                                                                     if(xml.laType == XmlNodeType.Element)
           break;
                                                                      stk.Push(new SIDfaSequence(currTemplate));
          case XmlNodeType.EndElement:
                                                                     xml.Advance();
           currTemplate.Generate(currCall, null);
                                                                     break;
           transformed = true;
                                                                    case XmlNodeType.EndElement:
           break;
                                                                     currTemplate.Generate(currCall, null);
        }
                                                                     currTemplate = cc.template;
      }
                                                                     currCall = cc.call;
    }
                                                                     stk.Pop();
}
                                                                     break;
                                                                  }
                Fig. 5. The code of the SSXT algorithm.         }

                                                                 Fig. 7. The code of the ProcessCycleConf function used in the SSXT algorithm.


    The core of the DFA sequence processing (Fig. 6) is
accomplished when start tags of elements are encountered.
A new DFA sequence is generated on the stack in case the
current DFA sequence contains no final states. Otherwise        7    Evaluation
the output is generated and a new cycle configuration is
placed on the stack. In case of a template without calls, its   The evaluation and measurements of the SSXT algorithm
output is generated immediately.                                implementation confirmed our expectation that it requires
                                                                                                 ing algorithm with such buffers and we obtain much more
              XfCall
          (TemplateModel)
                                                                            XfTemplate
                                                                         (TemplateModel)
                                                                                                 powerful automatic streaming XSLT processor.
                                              DfaItem
    +   template   : XfTemplate                                     +   match   : string
    +   node       : XmlNode       call
                                          + call : XfCall           +   mode    : string         Acknowledgments. This work was supported in part by
    +   select     : XfXpath                                        +   node    : XmlNode
    +   mode       : string                                         +   calls   : List<XfCall>   the National programme of research (Information society
                                                                  template
                                                                                                 project 1ET100300419).
                     call
                                             items


              SICycleConfig                      SIDfaSequence
        + template : XfTemplate            + template : XfTemplate                               References
        + call     : XfCall                + items    : List<DfaItem>


                                                                              1. O. Becker. Transforming XML on the Fly. In Proceedings
                              cc
                                            ds                                   of XML Europe 2003, 2003.
                         StackItem                            XfStack         2. Y. Diao, M. Altinel, M. J. Franklin, H. Zhang, and P. Fischer.
                  - cc : SICycleConfig     stack     - st : Stack<StackItem>     Path sharing and predicate evaluation for high-performance
                  - ds : SIDfaSequence
                                                                                 XML filtering. ACM Trans. Database Syst., 28(4):467–516,
                                                          stk
                                                                                 2003.
                                               XfAlgorithm
                                                                              3. J. Dvořáková. Towards Analyzing Space Complexity of
                                            # xslt : XfXslt                      Streaming XML Transformations. In The Second IEEE Inter-
                                            # stk : XfStack
                                                                                 national Conference on Research Challenges in Information
                                                                                 Science. IEEE Computer Society, 2008.
                 Fig. 8. The Xord SSXT Model.                                 4. D. Florescu, C. Hillery, D. Kossmann, P. Lucas, F. Riccardi,
                                                                                 T. Westmann, M. J. Carey, A. Sundararajan, and G. Agrawal.
                                                                                 The BEA/XQRL Streaming XQuery Processor. In Proceed-
                                                                                 ings of VLDB 2003, pages 997–1008, 2003.
a memory proportional to a depth of the input XML doc-                        5. A. Frisch and K. Nakano. Streaming XML Transformations
                                                                                 Using Term Rewriting. In Proceedings of PLAN-X 2007,
ument. Since most documents are relatively shallow, our
                                                                                 2007.
memory requirements are independent to the document size.
                                                                              6. Z. Guo, M. Li, X. Wang, and A. Zhou. Scalable XSLT Eval-
Even for huge documents like DBLP, the SSXT algorithm                            uation. In Advanced Web Technologies and Applications,
required few hundreds KB while the commonly used XSLT                            LNCS 3007/2004. Springer Berlin / Heidelberg, 2004.
processors like Saxon or Xalan crashed or hanged after al- 7. C. Koch, S. Scherzinger, N. Schweikardt, and B. Stegmaier.
locating about 1.5 GB of memory.                                                 FluXQuery: An optimizing XQuery processor for streaming
                                                                                 XML data. In VLDB’2004: Proceedings of the Thirtieth
                                                                                 International Conference on Very Large Databases, pages
8 Conclusion and future                1/1
                                              work                               1309–1312, 2004.
                                                                              8. B. Ludäscher, P. Mukhopadhyay, and Y. Papakonstantinou.
                                                                                 A Transducer-Based XML Query Processor. In Proceedings
We have presented a prototype implementation of the Xord
                                                                                 of VLDB 2002, pages 227–238, 2002.
framework which represents an automatic streaming pro- 9. I. Mlýnková, K. Toman, and J. Pokorný. Statistical Analysis
cessor for the XSLT language. It incorporates a powerful                         of Real XML Data Collections. In COMAD’06: Proc. of the
schema-based analyzer which, for a given XSLT transfor-                          13th Int. Conf. on Management of Data, pages 20–31, New
mation xsl and an XML schema xsd, analyzes memory                                Delhi, India, 2006. Tata McGraw-Hill Publishing Company
requirements of the streaming processing of xsl on a set                         Limited.
of XML documents defined by xsd. The analyzer employs 10. W3C. XSL Transformations (XSLT) Version 1.0, W3C Rec-
a special hierarchical model of XML schema called Xord                           ommendation, 1999. http://www.w3.org/TR/xslt.
Schema Model. We have implemented the Xord framework 11. W3C.                                   XML Schema Part 1: Structures Sec-
on .NET platform for a specific streaming processing using                       ond       Edition,    W3C       Recommendation,         2004.
stack of the size proportional to the depth of the input XML                     http://www.w3.org/TR/xmlschema-1.
                                                                             12. W3C.           XML Schema Part 2: Datatypes Sec-
document.
                                                                                 ond       Edition,    W3C       Recommendation,         2004.
    Our schema-based analyzer is restricted in several as-                       http://www.w3.org/TR/xmlschema-2.
pects - first, a subset of XSLT and XML schema defini- 13. W3C. XQuery 1.0: An XML Query Language, W3C Recom-
tions is considered, and second, it currently gives us only                      mendation, 2007. http://www.w3.org/TR/xquery.
true/false answer whether the stack-based processing is ap-
plicable. However, we intend to extend it in the future re-
search - if we examine particular pairs of elements for which
the comparing function returns false and the possible size
of their content, we may compute exact size of the mem-
ory buffers needed for processing such elements. Then it
is only necessary to extend the basic stack-based stream-