QuickXDB: A Prototype of a Native XML QuickXDB: A Prototype? of a Native XML DBMS ? DBMS Petr Lukáš, Radim Bača, and Michal Krátký Petr Lukáš, Radim Bača, and Michal Krátký Department of Computer Science, VŠB – Technical University of Ostrava Department of Computer Science, CzechVRepublic ŠB – Technical University of Ostrava Czech Republic {petr.lukas, radim.baca, michal.kratky}@vsb.cz {petr.lukas, radim.baca, michal.kratky}@vsb.cz Abstract. XML (extensible mark-up language) is a well established for- mat which is often used for semi-structured data modeling. XPath and XQuery [16] are de facto standards among XML query languages. There is a large number of different approaches addressing an efficient XQuery processing. The aim of this article is to introduce a prototype of an XML database called QuickXDB integrating these state-of-the-art approaches. We primarily describe main concepts of QuickXDB with stress on our XQuery processor. Furthermore, we depict main challenges such as iden- tification of patterns in an input query. Our experiments show strengths and weaknesses of our prototype. We outline our future work which will focus on a cost-based optimization of a query plan. 1 Introduction XML data model is often considered as a useful alternative to a traditional relational data model. XML data model is more flexible and maintenance of an XML structure is more simple if we work with a very dynamic structure of a database. Therefore, an efficient XML database can be a very useful tool in many real-world projects. Many different approaches have been developed in recent years [1, 5, 6, 12, 18], however, putting all these ideas into a working XML database represents a challenging task with unsolved problems. Query languages for XML databases such as XPath and XQuery [16] are considered as a de-facto standard. Our work focuses on an efficient processing of XQuery which includes semantics of XPath. There exist several query models which express a core functionality of XQuery. A twig pattern query (TPQ) is the most simple model used by many approaches [1, 5, 9, 17, 19]. A TPQ is rep- resented by a rooted labeled tree (see the TPQ in Figure 1(c)). There is also a more general query model called GTP which includes more semantics of the XQuery language, such as semantics related to the output formatting, boolean expressions or optional parts of a query. These query models represent an impor- tant abstraction which enabled designing many efficient algorithms [1, 5, 9, 17, 19] that are independent of a semantic details of an XML query language. However, correct identification of a TPQ or GTP in a XQuery is not straightforward [12]. ? Work is partially supported by Grant of SGS No. SP2013/42, VŠB-Technical Uni- versity of Ostrava, Czech Republic. V. Snášel, K. Richta, J. Pokorný (Eds.): Dateso 2013, pp. 36–47, ISBN 978-80-248-2968-5. QuickXDB: A Prototype of a Native XML DBMS 37 a1 a b1 b3 b4 b #a b2 c1 d2 c2 a4 d3 a b c d #c #b a2 d1 c3 f2 c4 a d c f c #f #a' a3 f1 x1 f3 a f x f (a) (b) (c) Fig. 1. (a) XML tree (b) DataGuide (c) TPQ We can find three main areas in the XQuery processing: (1) index data struc- tures and XML document partitioning, (2) join algorithms for a TPQ/GTP pro- cessing, and (3) query algebras and cost-based optimizations. Techniques from different areas are often closely related. For example, join algorithms usually need a specific type of index for its efficient run. In this paper, we describe a prototype of an XML database called QuickXDB from the perspective of all these three areas. We describe key concepts of our prototype which enable fast XQuery processing. Main challenges addressed by our prototype are as follows: – Implementation of an XQuery algebra enabling creation of a query plan, query plan rewritings, and transformation to a physical query plan; – correct identification of TPQs (more precisely GTP) in a query plan and incorporation of fast and optimal algorithms for the GTP processing; – cost-based optimizations of a physical query plan that are dependent on XML document statistics as well as on indices available in a database. The paper is organized as follows. In Section 2, we define basic terms. In Section 3, we summarize basic types of data structures and indices that are available in our database. Section 4 introduces main concepts included in our XQuery processor. In Section 5, we show results of our experiments where we compare efficiency of QuickXDB with other common XML databases. In Sec- tion 6, we summarize our results and outline our future work on our prototype of an XML database. 2 Preliminaries An XML document can be modeled as a rooted, ordered, labeled tree, where every tree node corresponds to an element or an attribute of the document and edges connect elements, or elements and attributes, having the parent-child relationship. We call such a representation of an XML document an XML tree. In what follows, we simply write ‘node’ instead of the correct ‘tree node’ or ‘data node’. An example of the XML tree is shown in Figure 1(a). 38 Petr Lukáš, Radim Bača, Michal Krátký The DataGuide tree [8] is a labeled tree where every labeled path from the XML tree occurs exactly once there. Figure 1(b) depicts an example of a DataGuide for the XML document in Figure 1(a). We assign a label to every node of an XML tree. Node labels allow us to resolve basic XPath relationships between two nodes during the query processing. There are two types of labeling schemes: (1) element scheme (e.g., containment scheme [19] or Dietz’s scheme [7]) and (2) path scheme (e.g., Dewey order [15]). The main feature of labels using a path labeling scheme is that we can extract labels of all ancestors and that they are more flexible during updates. A twig pattern query (TPQ) can be modeled as an ordered rooted tree. Single and double edges between two query nodes represent parent-child (PC) and ancestor-descendant (AD) relationships, respectively (see an example of a TPQ in Figure 1(c)). 3 Indices node node partition label label B-tree B-tree (a) (b) Fig. 2. (a) Document index (b) Partition index There are two basic type of indices: (1) that having a node label as a key (document index) and (2) that having a node name as a key (partition index). Nodes corresponding to one node name are sorted according to the node label in the case of a partition index. An illustration of these indices can be found in Figure 2. From the query processing perspective, a document index is very useful if we have a very small intermediate result and we want to use it to process the rest of a query. This type of the query processing can be considered as nav- igational [11]. However, many join algorithms are based on a partition index. These joins mainly focus on a merging during one sequential read of lists which removes irrelevant nodes. Selection of an appropriate index will be part of future cost-based optimizations [2]. QuickXDB contains also DataGuide, that can be used to speed up the query processing. The main idea is to preproccess the TPQ in a DataGuide and use the result to decrease a search space of a partition or a document index [3]. QuickXDB: A Prototype of a Native XML DBMS 39 4 XQuery algebra In this section, we present a brief introduction to the techniques we use to evaluate real XQuery queries, not only stand-alone TPQs. We use a modified XQuery algebra proposed in [13]. 4.1 Processor architecture There is a block schema of the processor in Figure 3. The traditional first step is to load and parse the input query. The first step yields a syntax tree1 based on the XQuery grammar standardized by W3C. The next conventional step is usually a normalization phase [12], but we adapted the compilation rules so that the query is directly compiled into the tree of operators (see Figure 5). At the current prototype state, each operator has exactly one evaluating algorithm, so the operator tree has the same meaning as a physical query plan and we call it simply a query plan. However, a modular architecture of the prototype does not give us any limitations of using cost-base optimizations to select one of more low-level algorithms evaluating an operator in the future (see Section 6). typed operator tree syntax tree operator tree XQuery Parsing Compiling Static typing Optimization optimized operator tree Static Query Evaluation Result preprocessing preprocessed unnesting preprocessed optimized optimized operator tree unnested operator tree Fig. 3. Block structure of the processor A compiled query plan is statically typed. The static typing is an important feature for some of the rewriting rules in the next optimization phase (see Section 4.5). None of static pre-computations uses the input XML document. During the optimization a set of rewriting rules is repeatedly applied on the query plan. Each rule searches for a specific pattern with some specific properties and replaces it by a single operator or a subtree of operators. This phase is also responsible for searching the largest possible TPQs in the query plan (see Section 4.6). The crucial attribute of all rewritings is that they are transparent from the result perspective. After applying a rewriting rule, output types of operators have to be reevaluated. Before the evaluation is done, all the operators are statically preprocessed. For example, we can evaluate a pointer to the algorithm of a function call ac- cording to the name of the function. Before the final step we also perform a 1 Syntax tree is an ordered, rooted, labeled tree, where the nodes stand for terminal / non-terminal symbols of a formal grammar of a language such as XQuery. 40 Petr Lukáš, Radim Bača, Michal Krátký query unnesting (see Section 4.4). Finally, the optimized and unnested query plan is evaluated. 4.2 Algebra A detailed description of an algebra which we use can be found in [13]. Here we only mention some important characteristics to make the following examples clear. The algebra operates over two different types of sets: (1) a set of sequences containing items (atomic values or XML nodes), (2) a set of tables containing tuples. Tuple components are formed of single items or sequences. Furthermore, we have three groups of operators. Operators over sequences, operators over tables, and hybrid operators over both types of sets. Each operator can have three groups of arguments: independent suboper- ators, dependent suboperators, and static attributes. Evaluation of the inde- pendent suboperators is usually the first step of evaluation algorithms of all operators. The purpose of the dependent suboperators depends on a purpose of particular operator. For example, the Selection operator has one independent suboperator computing its input table and one dependent suboperator deciding which tuples of the input table will be passed to the output. Example 1. Let us consider the example in Figure 5. There are two possible plans of the Q1 query from Figure 4. Both plans lead to the same result. P1a is obtained after the optimization phase without using TPQ rewritings. If we include TPQ rewritings, we obtain the P1b plan. If we run the query against the XML tree from Figure 1, we get the XML node c1 as the result. Q1 Q3 for $x in //b, $y in $x//c for $x in //item where $x//d and not($y/x) where $x/@id = max(//item/@id) return $y return $x/name Q2 Q4 for $x in //closed_auctions/closed_auction for $x in //europe/item[mailbox/mail/date = "08/05/1999"] for $y in //people/person [description//parlist//parlist][1] where $x/buyer/@person = $y/@id return $x return { $x/type/text() }, { $y/name/text() } Fig. 4. Example queries Now let us discuss the P1a plan to outline how the evaluation works. The instances of operators are distinguished by subscripts. Static attributes of oper- ators are given in square brackets. Solid lines represent bindings to the indepen- dent suboperators, dashed lines stand for bindings to the dependent ones. The intermediate results of key operators can be seen above them. The subtree starting with MapFromItem1 represents the first for loop of the query Q1. It computes a single-column table with all XML nodes of the name b from the input XML. The Select1 stands for the $x//d part of the where clause. QuickXDB: A Prototype of a Native XML DBMS 41 P1a P1b x y MapToItem1 MapToItem x y b1 c1 b1 c1 x y TupleAccess4 TupleAccess Select2 b1 c1 [IN#y] TupleTreePattern [IN#y] b4 c4 x b1 #b $x MapConcat1 b2 x b4 #d #c $y b1 b2 ¬ b3 Select1 Call3 #x b4 [boolean] Call MapFromItem1 Call2 MapFromItem2 Call3 [root] [boolean] [not] TreeJoin1 Tuple1 TreeJoin2 TreeJoin3 Tuple2 TreeJoin4 [descendant::b] [x: IN] [descendant::d] [descendant::c] [y: IN] [child::x] Call1 TupleAccess1 TupleAccess2 TupleAccess3 Independent suboperator [root] [IN#x] [IN#x] [IN#y] Dependent suboperator Fig. 5. Example of query plans Select1 restricts the table from the previous step to only those rows where there exists a d descendant for a particular b. The MapConcat1 operator represents the second loop of the for clause. MapConcat is a crucial operator for this algebra. It evaluates its dependent suboperator for all tuples of the independent input table and joins the evaluated semi-results to the tuples of which they have been computed. Then we can evaluate the not($y/c) part of the where expression using the Select2 operator. The final result is evaluated by the MapToItem1 operator. It selects all values of the y column of the input table since we refer to the $y variable in the return clause. 4.3 Relational rewritings We can see in the P1a plan that there is no operator evaluating the and boolean expression of the where clause. The and expression is divided into two separated selections: Select1 and Select2 . We can push the selection operators in order to evaluate them as early as possible. This is a well-known rewriting rule from the relational databases. Application of such rule has two advantages: (1) evaluation of the selection as early as possible speeds up evaluation of the entire query, (2) this rewriting extends the possibilities of locating TPQ patterns in the query plan. There are some other important relational rewritings enabling us to use tradi- tional join operators. In the Q2 query, we can see two independent for loops and a typical joining where clause. If we use some of the advanced joining algorithms such as hash-join or merge-join, we can significantly speed up the query evalu- ation. Selection of the proper joining algorithm can be ensured by a cost-based optimization (see Section 6). 4.4 Query unnesting Note that Q3 is a simple query returning a name of an item with the maximum id. The query can be run over the XMark testing documents [14]. The where 42 Petr Lukáš, Radim Bača, Michal Krátký clause of Q3 will be compiled into a Selection operator. Such operator evaluates the restricting condition over all input tuples carrying individual items in our case. There is a relatively complicated expression computing the maximum id of all items in the right hand side of the equality comparison. Query unnesting means that since an expression is not dependent on any context value (e.g., the $x variable of the for clause), it can be evaluated only once because its resulting value is always the same. 4.5 Static typing The static typing step evaluates types of output values of operators in a plan wherever it is possible. Static typing is a crucial feature making some important static rewritings possible. For example, Q4 from Figure 4 contains all three possible types of XPath predicates2 . The first predicate selects items matching a given equality boolean expression. The second one selects such items where a sequence of XML nodes in the predicate is not empty. The third one passes only the first item matching the above two predicates. Since the purpose of a predicate depends on the type of its expression, there has to be an alternative construction in the unoptimized query plan taking the type into account. However, the output type can be statically evaluated in many cases. Therefore, we are able to rewrite the query plan and use directly one branch of an alternative operator. Moreover, since there is a possibility to query the order of a node (the last predicate), we need to add (or more technically map) an extra column containing sequential numbers. Such column represents a special positional variable. If we simplify the query plan knowing we surely do not access the positional variable (the first two predicates), we are able to remove the positional mappings. There are several other rewritings using the statically pre-evaluated types. The goal is to simplify the query plan in order to be able to locate patterns that can be rewritten into a form of TPQ. 4.6 TPQ rewritings There is still a gap between XQuery algebras and algorithms evaluating TPQs. A lot of algebras have been proposed, but the tree patterns are not supported by them. Only several approaches such as [12] are oriented to search tree patterns in XQuery queries. A static rewriting of a query plan is the only method how we detect TPQs or more precisely GTPs. Unlike [12], we do not perform any rewritings of the query before the compilation phase. There is a set of rewriting rules searching for the largest possible subtree of operators in the query plan and replacing it by a single tree pattern. A special operator TupleTreePattern ensures the use of holistic join algorithms. This operator has been already proposed in [12], but our implementation uses a GTP as its static attribute instead of a linear sequence of path steps. 2 An XPath predicate is a restricting condition written in square brackets. QuickXDB: A Prototype of a Native XML DBMS 43 The rewriting rules are based on two basic principles: (1) replace single Tree- Join 3 by TupleTreeJoin with simple tree patterns, (2) merge individual Tuple- TreeJoins together. Let us consider the TPQ of the TupleTreePattern operator in the P1b plan. All output (circled) query nodes have a reference to the corresponding compo- nent of output tuples denoted by $q, where q is the name of the component. We can see that the single TupleTreePattern operator in the P1b plan is able to provide the same functionality as the whole subtree of the Select2 operator in the P1a plan. Since the TupleTreePattern uses the GTPStack [4] holistic algorithm, the evaluation of P1b is much more efficient for large XML documents than the evaluation of P1a. 5 Experiments In this section, we compare QuickXDB with some other commonly used XML databases. We choose two standard relational databases Oracle 11g4 and Mi- crosoft SQL Server 20125 , three native XML databases Monet DB46 , Saxon 9.47 and BaseX48 . Both Microsoft SQL Server and Oracle database have the possi- bility of indexing XML. Oracle database supports one type of XML index, SQL Server supports one type of primary and three types of secondary XML indices. Also BaseX provides optional XML indexing. Our QuickXDB is able to work with or without any XML index and persistent data structures. In what follows, we write indexed or non-indexed QuickXDB. For testing we choose XMark (1.1 GB) [14] and TreeBank9 (86 MB) data collections and 20 XQueries (10 queries per each collection). A complete set of chosen XQueries can be found in [10]. The queries were divided into two groups according to their purpose: (1) structure oriented queries and (2) content oriented queries. All experiments were performed on a machine with Intel Xeon E5-2690@2.9GHz processor and Microsoft Windows Server 2008 R2 Datacenter (SP1) operating system. The time of a query execution was the main factor we were focused on. Every query was run 5 times for each database and each indexing variant. A measured time includes both query execution and query preprocessing (parsing, compiling, etc.). The results are given in Table 1. Each value represents an arithmetic mean of 3 measured times (without the worst and the best case) in seconds. 3 A TreeJoin operator performs elementary path steps in an XML document. 4 http://www.oracle.com/products/database/ 5 http://www.microsoft.com/sqlserver/ 6 http://monetdb.project.cwi.nl/monetdb/XQuery/ 7 http://saxon.sourceforge.net/ 8 http://www.basex.org/ 9 XML Data Repository, http://www.cs.washington.edu/research/xmldatasets/ www/repository.html 44 Petr Lukáš, Radim Bača, Michal Krátký Oracle Oracle SQL SQL SQL Saxon Monet BaseX BaseX Quick Quick Server Server Server DB XDB XDB wo/index w/index wo/index prim/idx sec/idx wo/index w/index non-idx indexed Structure oriented XMark queries XM1 7.509 DNF DNF 201.57 DNF 0.374 0.171 1.129 1.203 - 0.066 XM2 DNF DNF DNF DNF DNF MEM E DNF DNF - 0.076 XM3 DNF DNF DNF DNF DNF MEM E DNF DNF - E XM4 29.272 DNF DNF DNF DNF 0.837 0.52 1.468 1.444 - 0.47 XM5 24.067 DNF DNF DNF DNF 0.853 0.676 1.843 1.945 - 0.477 Content oriented XMark queries XM6 DNF DNF DNF DNF DNF DNF 0.394 DNF 18.877 - 10.051 XM7 E DNF - - - 2.694 - DNF DNF - 8.86 XM8 DNF DNF DNF DNF 51.6 52.942 0.226 252.459 3.141 - 6.271 XM9 E E DNF DNF DNF 9.048 2.079 9.044 9.409 - 3.829 XM10 DNF DNF - - - DNF 0.184 DNF DNF - E Structure oriented TreeBank queries TB1 8.345 DNF DNF DNF DNF 0.385 0.317 1.799 1.808 0.93 0.043 TB2 8.233 DNF DNF DNF DNF 0.494 0.313 2.543 2.655 0.504 0.027 TB3 84.59 DNF 247.662 116.554 358.584 0.281 0.335 11.083 11.246 1.13 0.045 TB4 11.561 DNF DNF DNF 148.258 0.499 0.447 2.448 2.239 0.695 0.079 TB5 2.199 DNF 7.072 2.101 407.059 0.759 0.234 1.880 1.724 0.66 0.38 TB6 19.895 DNF E E E 0.25 0.33 7.385 7.242 0.493 0.427 TB7 10.947 DNF E E E 1.492 0.383 8.464 8.308 0.846 0.74 TB8 DNF E - - - 0.374 1.336 3.058 3.228 0.947 DNF Content oriented TreeBank queries TB9 E DNF - - - 1.347 0.358 4.897 5.104 0.479 1.746 TB10 1.08 DNF 4.233 2.303 2.189 0.26 0.332 1.236 1.264 0.725 0.52 Table 1. Execution times [s] The DNF symbol means that a query execution exceeds 10 minutes, E means that a query execution finished with an error. The MEM shortcut stands for the cases when we had to manually stop a query execution due to the protection of the server operating system because of using unacceptably high amount of an operating memory (over 50 GB). A hyphen mark means that a query could not be run because of an unsupported construction or a database was not able to load a data collection. 5.1 XMark queries The 1GB XMark collection is a representative of large XML documents. This kind of documents are a problem for memory-oriented10 processors such as Saxon and non-indexed QuickXDB. Saxon processor was able to load the XMark col- lection, but it consumed over 6 GB space of operating memory. Since the non- indexed QuickXDB (without using persistent structures) is limited to use up to 2 GB of operating memory, it was not able to load the entire XMark document. Let us see the results of the structure oriented queries (XM1 – XM5) in Table 1. Oracle was able to process 3 of the queries without using index, SQL Server processed only 1 of them using primary XML index. The most problematic query seemed to be XM3 which could not be processed by any of the processors due to the unacceptable high query result. 10 Memory-oriented processors load the entire XML document into the operating mem- ory and do not use any persistent data structures. QuickXDB: A Prototype of a Native XML DBMS 45 We can observe that indexed QuickXDB outperforms all the other databases for every structure oriented query. The key reason is a detection of a GTP in an XQuery and application of the GTPStack holistic algorithm. The XMark testing documents contain human-readable data, so querying content may be desirable. Since the current prototype of indexed QuickXDB do not use any content-based index, it performs many random disk accesses. That is the main reason, why the indexed QuickXDB is slower than Saxon and MonetDB in the most content oriented queries (XM6 – XM10). 5.2 TreeBank queries TreeBank is a relatively small XML document (86 MB) with complex recursive and irregular structure. TreeBank can be loaded into the operating memory by any of the memory oriented query processors. TreeBank does not have human-readable data so the most of the queries (TB1 – TB8) are structure oriented, where holistic algorithm ensures the fast evaluation. Only 2 of the TreeBank queries (TB9 – TB10) are content oriented, where the non-indexed QuickXDB can give a better performance when compared to indexed QuickXDB. 5.3 Comparison results We can find four graphs showing relative results of the indexed QuickXDB compared with the other processors in Figure 6. Each value is computed as 100 − 100G, where G is a geometric mean of relevant quotients txdb /tother , where txdb is an execution time of the indexed QuickXDB processor and tother is an execution time of a compared database for the same query. For instance, a value 90 means that the indexed QuickXDB runs ten times faster than the compared database. Generally we can say that our prototype implementation of indexed Quick- XDB runs evidently faster on structure oriented queries. As mentioned before, the key reason is the detection of TPQs and evaluating them by an GTPStack holistic algorithm. 6 Conclusion and future work In this article, we describe a prototype of our XML database called QuickXDB. We outline the core data structures representing our database and we focus on a description of the XQuery algebra that support a query processing. XQuery al- gebra enables integration of the state-of-the-art techniques such as holistic joins with other common techniques well known from relational databases. We per- formed an experiment, where we compare QuickXDB with two major database systems and two native XQuery processors. QuickXDB outperforms all databases for structure oriented queries. As expected, QuickXDB was less successful for 46 Petr Lukáš, Radim Bača, Michal Krátký [%] 100 [%] 100 90 90 80 80 70 70 60 60 50 50 40 40 30 30 20 20 10 10 0 0 (a) (b) 100 100 [%] [%] -100 50 -300 0 -500 -50 -700 -100 -900 -150 -200 -1100 (c) (d) Fig. 6. Comparison of indexed QuickXDB with other XML databases. (a) Structure oriented XMark queries (b) Structure oriented TreeBank queries (c) Content oriented XMark queries (d) Content oriented TreeBank queries content oriented queries, however, these queries can be accelerated by a com- mon content-based index. Our algebra contains wide set of rewritings which will support cost-based op- timizations in a future extension of the prototype. The major substance missing in our solution are statistics that could help us to automatically select appropri- ate query plan using all available indices. References 1. S. Al-Khalifa, H. V. Jagadish, N. Koudas, J. M. Patel, D. Srivastava, and Y. Wu. Structural Joins: A Primitive for Efficient XML Query Pattern Matching. In Pro- ceedings of ICDE 2002, pages 141–152. IEEE CS, 2002. 2. R. Bača and M. Krátkỳ. A Cost-based Join Selection for XML Twig Content-based Queries. In Proceedings of the 2008 EDBT workshop on Database technologies for handling XML information on the web, pages 13–20. ACM, 2008. 3. R. Bača and M. Krátký. On the Efficiency of a Prefix Path Holistic Algorithm. In Proceedings of Database and XML Technologies, XSym 2009, volume LNCS 5679, pages 25–32. Springer–Verlag, 2009. 4. R. Bača, M. Krátký, T. Ling, and J. Lu. Optimal and efficient generalized twig pattern processing: a combination of preorder and postorder filterings. The VLDB Journal, pages 1–25, 2012. QuickXDB: A Prototype of a Native XML DBMS 47 5. N. Bruno, D. Srivastava, and N. Koudas. Holistic Twig Joins: Optimal XML Pattern Matching. In Proceedings of ACM SIGMOD 2002, pages 310–321. ACM Press, 2002. 6. S. Chen, H.-G. Li, J. Tatemura, W.-P. Hsiung, D. Agrawal, and K. S. Candan. Twig2Stack: Bottom-up Processing of Generalized-tree-pattern Queries Over XML documents. In Proceedings of VLDB 2006, pages 283–294, 2006. 7. P. F. Dietz. Maintaining order in a linked list. In Proceedings of 14th annual ACM symposium on Theory of Computing (STOC 1982), pages 122–127, 1982. 8. R. Goldman and J. Widom. DataGuides: Enabling Query Formulation and Op- timization in Semistructured Databases. In Proceedings of the 23rd International Conference on Very Large Data Bases, VLDB 1997, pages 436–445, 1997. 9. J. Lu, T. Chen, and T. W. Ling. Efficient Processing of XML Twig Patterns with Parent Child Edges: a Look-ahead Approach. In Proceedings of ACM CIKM 2004, pages 533–542. ACM Press, 2004. 10. P. Lukáš, R. Bača, and M. Krátký. QuickXDB: A Prototype of a Native XML DBMS. Technical report No. CS/DBRG/2013-001, 2013, http://db.cs.vsb.cz/TechnicalReports/CS-DBRG-2013-001.pdf. 11. N. May, S. Helmer, and G. Moerkotte. Strategies for query unnesting in XML databases. ACM Transactions on Database Systems (TODS), 31:968 – 1013, September 2006. 12. P. Michiels, G. Mihaila, and J. Siméon. Put a Tree Pattern in Your Algebra. In Proceedings of the 23th International Conference on Data Engineering, ICDE 2007, pages 246–255. IEEE, 2007. 13. C. Re, J. Siméon, and M. Fernandez. A complete and efficient algebraic com- piler for XQuery. In Data Engineering, 2006. ICDE’06. Proceedings of the 22nd International Conference on, pages 14–14. IEEE, 2006. 14. A. R. Schmidt et al. The XML Benchmark. Technical Report INS-R0103, CWI, The Netherlands, April, 2001, http://monetdb.cwi.nl/xml/. 15. I. Tatarinov, S. D. Viglas, K. Beyer, J. Shanmugasundaram, E. Shekita, and C. Zhang. Storing and Querying Ordered XML Using a Relational Database Sys- tem. In Proceedings of ACM SIGMOD 2002, pages 204–215, 2002. 16. W3 Consortium. XQuery 1.0: An XML Query Language, W3C Working Draft, 12 November 2003, http://www.w3.org/TR/xquery/. 17. H. Wang, S. Park, W. Fan, and P. S. Yu. ViST: a Dynamic Index Method for Querying XML data by Tree Structures. In Proceedings of the ACM SIGMOD 2003, pages 110–121. ACM Press, 2003. 18. A. M. Weiner and T. Härder. Using Structural Joins and Holistic Twig Joins for Native XML Query Optimization. In Advances in Databases and Information Systems, volume 5739 of LNCS, pages 149–163. Springer - Berlin Heidelberg, 2009. 19. C. Zhang, J. Naughton, D. DeWitt, Q. Luo, and G. Lohman. On Supporting Containment Queries in Relational Database Management Systems. In Proceedings of ACM SIGMOD 2001, pages 425–436, 2001.