To Sort or not to Sort: The Evaluation of R-Tree and B +-Tree in Transactional Environment with Ordered Result Set Requirement ∗ c Pavel Fedotovsky George Erokhin Kirill Cherednik Kirill Smirnov George Chernishev Saint-Petersburg University {pavel.v.fedotovsky, george.erokhin, chernishev}@gmail.com, {kirill.cherednik, kirill.k.smirnov}@math.spbu.ru Abstract This work was inspired by participation in ACM SIG- MOD Contest 2012. This problem was provided by the In this paper we consider multidimensional in- contest organizers, as well as benchmarks and example dexing with the additional constraint of lexi- Berkeley DB-based implementation. Our team partici- cographical ordering. In order to deal with pated in this contest and was ranked 5th on public tests2 . this problem we discuss two well-known tree The problem is formulated as follows: given a n- data structures: R-Tree and B-Tree. We study dimensional space and queries in transactional environ- the problem in the transactional environment ment, what kind of data structure should we use for opti- with read committed isolation level. To evalu- mal performance? ate these approaches we had implemented these In order to solve this problem we implemented a pro- structures (modified GiST ensures concurrency) totype of multidimensional transactional index. This and provide extensive experiments. index works within read committed isolation level. Our prototype contains both B + -Tree and R-Tree built 1 Introduction around GiST model. In this paper we consider the problem of multidimen- The contribution of this paper is following: sional indexing with one additional constraint — the lex- icographical ordering of the resultset. Effective multidi- • The validation of our prototypes by comparison mensional indexing is rather old and well-explored topic, with industrial-strength databases: Berkeley DB however, one can’t say that the problem is solved. New and PostgreSQL. approaches continue to emerge. The addition of the or- dering requirement further drives this problem into the • Experimental study of influence of workload pa- domain of research activity. rameters on performance of these two structures. Effective solutions for the problem of multidimen- These workload parameters include query window sional indexing are needed for geospatial data, CAD sys- size and others. tems, multimedia data and also of use for OLAP data. There are two main approaches for multidimensional The rest of this paper is organized as follows. In the indexing: tree-based and hash-based. The former are R- next section we provide detailed specification of the task, Tree, KDB tree, Octree, X-Tree and many others. The describe queries and data. Then, in the section 3 we latter are mainly used for nearest neighboor and similar- describe two alternative approaches and survey related ity query evaluation. works. Section 4 contains overview of our system. In the We are mainly interested in R-Tree because of it’s section 5 we provide evaluations and comparisons with popularity in commercial DBMS systems [4]: Post- PostgreSQL and Berkeley DB. greSQL, Oracle, Informix, SQLite and MySQL use this approach. This interest proves, that despite being rather 2 The Task old (more than 25 years), R-Tree still may be called industrial-strength technology. Moreover, until recently The task offered at the contest was to build a multidimen- R-Tree was the only one method of multidimensional in- sional high-throughput in-memory indexing system. The dexing in PostgreSQL1 . index should support concurrent access by many threads ∗ This work is partially supported by Russian Foundation for Ba- and work within read committed isolation level. The sic Research grant 12-07-31050. index resides in-memory and no crash-recovery compo- Proceedings of the Ninth Spring Researcher’s Colloquium nent is required. on Database and Information Systems, Kazan, Russia, 2013 1 http://www.postgresql.org/docs/9.2/static/spgist. 2 http://wwwdb.inf.tu-dresden.de/sigmod2012contest/ html leaderboard/ 2.1 Queries • No need to sort, because keys are already stored in the right order. There are several possible types of queries: Let’s review the last item. Suppose that we have a • Point queries: insert, update, delete and select. three dimensional index and a query: (1, 2, ∗). In order to evaluate it, we have to find the first entry with pre- • Range queries — they select a subset of data and fix “1|2|” and then sequentially scan the tree until prefix the result should be sorted. This type of query is mismatch. defined by a conjunction of attribute predicates. The However one can name weak points: individual predicates may be not only be intervals or points, but also a wildcards. • We have to pack and unpack the keys with each comparison. The distribution of query types is described in the specification and it can be tuned. • Queries containing interval predicates are harder to Another important aspect to consider is the admissible process. amount of operations per transaction. It is specified, that • This tree may perform poorly with wildcard queries. there are no more than few hundred retrieved points per transaction. In particular, the original task states that no The first one is the minor drawback, its cost may be more that 200 points are touched by any transaction. This negligible. However, the second and the third are more number is justified by the fact that OLTP transactions are formidable ones. very light-weight. For examle, the heaviest transaction The intervals inside attributes can be processed in the in TPC-C reads about 200 records [12]. same manner as above, but additional checks are needed. This results in additional complexity of the implementa- 2.2 Data And Workloads tion. The task statement specifies several datatypes:INT(4), Regarding the third item, consider query (1, ∗, 3). In INT(8) and VARCHAR(512). However, in this work, we order to evaluate it, we have to find the key starting with had to drop VARCHAR (see section 5 for details). The a prefix “1”, then we have to iterate through all values key consists of several attributes of these datatypes. The which have it. It will require a lot more of comparisons, payload is represented by a sequence of bytes. and what is more important, we will be forced to discard The data may come in one of several types of distribu- a lot of value in the middle. Consider the following leaf tions: normal, uniform and zipf (each is applied to coor- level: dinate independently). In our tests we used only uniform one. 1|2|3|, 1|2|4|, 1|2|4|, ..., 1|2|4|, 1|3|3|. Duplicate keys are allowed, we refer the reader to the In this situation we will need only two values: 1|2|3| web site for the detailed handling description. and 1|3|3|. But we would be forced to iterate through In our experiments we heavily rely upon workloads all these values and discard them.The situation becomes and benchmark driver provided by organizers. These grave when we have wilcard condition in the first at- workloads are essentially synthetic datasets. We don’t tribute: (∗, 2, 3). In this case we have to scan the whole reuse workloads used during the contest, instead we use index. the provided framework to define our own. R-Tree is the specialized data structure proposed first Thorough task specification can be found here3 . by Antonin Guttman in [6]. This study prompted a wave of research papers and one can say that it gave birth to 3 Related Work And Architectural Alter- the new area of research. This research related to devel- natives opment of the new R-Tree variants [4, 8, 11], niche ap- proaches [10, 11], split techniques [2, 3, 5], concurrency In order to solve this problem two architectural ap- techniques [7, 9] etc. The study [10] states that there is proaches may be used. The first one is to use B + -Tree more than 100 variants of R-Trees. and concatenate the values of individual coordinates into R-Tree can be thought of as an extension of B-Tree the composite key. The B + -Tree [13] is the balanced for multidimensional indexing. It shares some concepts: data structure, which contains values in the leaf nodes while inner nodes contain pointers and intervals. These • Data are kept in the leaves, too. intervals define the unique path to the leaf. The strong points of this approach are: • This data structure is also balanced. • The overall simplicity of this data structure and gen- • Inner nodes keep bounding boxes, which may be eral easiness for implementation. thought as the generalization of intervals. • The abudance of concurrency control mechanisms The main differences are: for this kind of tree [13]. • There may be more than one path to the key. This is • It is possible to tune one, a lot of cache-conscious the result of bounding box intersection permission. modifications exist. • Node split is unambiguous, determining the optimal 3 http://wwwdb.inf.tu-dresden.de/sigmod2012contest/ node split is a very hard problem. • No link to sibling leaves for easy range query exe- 5.2 PostgreSQL validation and tuning cution. We also compared our implementation with PostgreSQL GiST (Generalized Search Tree) [9] is a “template” v9.1 database system. This step was needed to check the index structure which supports extensible set of queries relative level of achieved performance and general trans- and datatypes. This index can be parametrized by a vari- ferability of results. We implemented a simple wrapper ety of data structures. application which directed queries to PostgreSQL. Post- Unlike B + -Tree based one, this approach would re- greSQL uses a disk-based GiST index, while our proto- quire sorting of the results. This is a significant drawback type is an in-memory one. Also, our prototype lacks a which may negatively impact performance. The goal of logging and recovery component. Thus, in order to con- this paper is to evaluate, which of these approaches is duct fair tests we had to simulate in-memory index in better. Intuitively one can say that the outcome should PostgreSQL. depend on the query selectivity. To completely eliminate slow disk-related operations we placed database cluster on tmpfs. This way we can be sure that every operation PostgreSQL performs (log- 4 System Overview ging, committing, buffers flushing, etc) does not involve Our system follows classical design guidelines and con- interactions with a hard drive. tains several components: Other important implementation aspects included: • A tree data structure. Currently implemented as • Wrapper connection pooling. We used a pool of B + -Tree and R-Tree. R-Tree is based upon GiST connections inside our wrapper to eliminate the cost [7], a popular template index structure including of connection creation every time a transaction is concurrency control techniques. This model allows executed. to extend with the means of concurrent access al- most any tree conforming to certain requirements. • We parametrized GiST with cube data structure. This is a widespread approach and it is used, for ex- • To eliminate overheads related to durability we ample, in PostgreSQL. turned off: fsync, full page writes and synchronous • Concurrency control. We used mechanism adapted commit. Checkpoint segments setting was left in- from [9] with locks, latches and Node Sequence tact. Numbers. Also we provided deadlock resolution • We were forced to abandon string datatype due to mechanism. Eventually, we ensure the read com- PostgreSQL cube restrictions (only float parameters mitted isolation level. However currently our proto- supported). type lacks logging and recovery features. • PostgreSQL runs in read committed isolation level • Memory manager. It is a well-know fact that a by default. standard memory manager can’t provide optimal performance for the whole range of applications Unfortunately, due to several reasons, we were not able and sometimes it is desirable to find or implement to completely approach the performance of our system. a specifically-tailored one. Our memory manager First, unlike BDB, PostgreSQL needs to maintain not is essentially a wrapper which intercepts new and only the index, but also a table. Second, calls to Post- delete calls to make use a pool of free blocks. greSQL via connections are less effective than the direct • Sorting of the results. In order to solve the prob- function calls. The last issue is the security checks which lem one must present lexicographically sorted re- were also left intact. sults. While B + -Tree provides already ordered re- sults, R-Tree does not. Our R-Tree implementation 5.3 Hardware and software setup sorts the results via merge-sort (we keep sorted data For the first group of experiments (comparison with Post- inside boxes). greSQL and Berkeley DB) used the following hardware and software setup: • Deletion of records. In our implementation we don’t delete records, instead, we mark them as • Intel Core i7-2630QM, 2.00 GHz, Hyper-Threading “deleted” and take this into account during the pro- Enabled, L1 Cache 64KB, L2 Cache 256 (per core), cessing. L3 Cache 6MB, 6GB RAM 5 Validation and Experiments • x86 64 GNU/Linux, kernel 3.5.0-21, gcc 4.7.2 5.1 Validation • PostgreSQL 9.1.7 We validated our implementation in two ways. First, we The second group used the more performing one: used public unit-tests supplied by the contest organizers. These unit-tests ensured correctness of an isolation level • Hardware: 2 x Intel Xeon CPU E5-2660 0 @ (read committed) implementation and several other im- 2.20GHz, 64GB RAM, MB S2600GZ plementation issues. We also extended basic set of test cases with new ones. Then, our implementation partici- • Software: Linux Ubuntu 3.2.0-29-generic x86 64, pated in the contest [1]. GCC 4.6.3 5.4 Comparison with PostgreSQL and Berkeley DB Tree type 2 4 6 8 R-Tree (64MB) −0.43 −0.22 −0.10 −0.01 In this section we provide a comparison of our prototypes R-Tree (512MB) −0.61 −0.25 −0.11 −0.01 with industrial strength systems. The wrapper for Berke- B + -Tree (64MB) 0.50 0.69 0.62 0.55 ley DB was provided by the organizers, PostgreSQL B + -Tree (512MB) 0.49 0.70 0.60 0.40 wrapper was developed by the authors (it’s architecture was described earlier). We compare the performance var- ing the number of dimension and use single 64MB index. Table 1: Parameter values for R-Tree and B + -Tree The query type distribution is the same as in the original contest task, uniformly distributed data was used. • The considered query type affects the performance We can see: of the systems in the following way: the perfor- mance of R-Tree degrades as the value of query se- • Our prototypes are comparable to industrial ones in lectivity decreases, while at the same time B + -Tree terms of overall performance. performance increases. • The solution which uses R-Tree significantly differs • As the number of dimensions increases, the expo- from B + -Tree in terms of performance. This dif- nent b changes in the way shown in the Table 1. In- ference has prompted us into further investigation, creasing the dimensionality leads to b decrease in which resulted in this paper. case of R-Tree, i.e. having more dimensions lowers impact of query selectivity. There is no manifested trend in B + -Tree behaviour. 5.5 Experimental Evaluation • The following hypothesis can be advanced: the ex- The goal of this paper is to evaluate, what is better: to ponent in power-law does not depend on size of the use R-Tree and to sort or not to sort with B + -Tree, but index, only on dimensionality. To prove this hy- risk excess comparisons. pothesis more thoroughful investigation is needed. In order to solve this problem we had conducted a se- ries of experiments. In these experiments we evaluate • There is no simple way to determine intersection the performance of two systems, while varing the query point of R-Tree and B-Tree, it depends on number selectivity. We separately consider the following dimen- of dimensions and index size. sions: 2, 4, 6, 8. We had considered indexes of two sizes: 64 and 512 MB, uniform data distribution. We concen- trate on the most interesting query type, which present in the original contest workload: a range query without 6 Conclusions wildcard predicates. These experiments were conducted In this paper we have considered the problem of multi- using our prototypes, which we had described in the pre- dimensional point indexing and under condition of addi- vious section. The reason of this switch is the time it tional restriction: ordering the results. We have exper- takes to construct an index by PostgreSQL DBMS and imentally evaluated two data structures — R-Tree and also the query plan problem. The plans which are gen- B + -Tree on uniformly distributed data. The experiments erated by the optimizer are essentially the following: at allowed us to establish the impact of the query selectivity first, perform index scan (e.g. read all R-Tree boxes), on system performance as power function. Also we ex- then sort the results. It is impossible to push down sort- amined the dependency of power-law parameters on di- ing in PostgreSQL because it’s GiST selection method mension. As a future work we will provide more empiri- doesn’t uses merge-sort. This is a critical drawback, be- cal evidence to the hypothesis of independance of power- cause in our task we select at most 200 entries. Thus, law exponent on index size. Recommendation for B-Tree our prototype can read only a part of the data and don’t and R-Tree user: unfortunately, we were not able to find sort all the content of the touched boxes. The query plan an easy way to calculate intersection point, so workloads problem is not an issue in BDB, because of the simplicity should be evaluated ad hoc. of BDB and the fact that B + -Tree is already sorted. The results are presented on Figures 3-6. Note the double logarithmic scales, which we used in 7 Acknowledgements order to illustrate our finds. They are the following: We would like to thank organizers of ACM SIGMOD Programming Contest’12 for providing a benchmark, • The throughput of the system depends on a query data generator, unit tests and Berkeley DB wrapper im- selectivity. This dependence can be described by plementation. This work is partially supported by Rus- the power law: sian Foundation for Basic Research grant 12-07-31050. P = a ∗ Sb, References where P denotes the throughput, S — query selec- tivity, a and b are parameters. The graphs show this [1] ACM SIGMOD Programming Contest’12. kind of dependency by the straight line. This ap- http://wwwdb.inf.tu-dresden.de/ proximately linear dependency persists in all con- sigmod2012contest/. Last accessed: sidered dimension sizes. 11/09/2012. 105 R-Tree (PostgreSQL) + Throughput (transactions per second) R-Tree (prototype) × × 104 + + + × × + × 103 2 4 6 8 Dimension Figure 1: Performance of PostgreSQL and our prototype (R-Tree). 106 × B-Tree (Berkeley DB) + Throughput (transactions per second) B-Tree (prototype) × + 105 × + × 4 10 × + + 103 2 4 6 8 Dimension Figure 2: Performance of Berkeley DB and our prototype (B + -Tree). 107 107 Throughput (transactions per second) Throughput (transactions per second) R-Tree + R-Tree + B-Tree × B-Tree × × × × × × + + × + × 106 + + × 106 + × + + × + × + × × + + × + × + × + × + × + + × + × + × + × + 5 5 10 10−4 10−3 10−2 10 10−4 10−3 10−2 Query selectivity Query selectivity Figure 3: Performance of R-Tree and B-tree indexes,Figure 4: Performance of R-Tree and B-tree indexes, 64MB, 2 dimensions (first hardware setup). 512MB, 2 dimensions (second hardware setup). 106 106 Throughput (transactions per second) Throughput (transactions per second) R-Tree + R-Tree + × B-Tree × B-Tree ×× × × × × × × × × 105 × 105 × × + + × × + + + × + × × + + × + + + + + × + + + + × + + + × + + 104 10−4 10−3 10−2 104 10−4 10 −3 10−2 Query selectivity Query selectivity Figure 5: Performance of R-Tree and B-tree indexes,Figure 6: Performance of R-Tree and B-tree indexes, 64MB, 4 dimensions (first hardware setup). 512MB, 4 dimensions (second hardware setup). 105 105 × Throughput (transactions per second) Throughput (transactions per second) R-Tree + × R-Tree + B-Tree × × × × × × B-Tree × × × × × + + + + × × × + + + + × + + + × × + + 10 4 × 104 + + + + + × + + + + × 103 10−4 10−3 10−2 103 10−4 10−3 10−2 Query selectivity Query selectivity Figure 7: Performance of R-Tree and B-tree indexes,Figure 8: Performance of R-Tree and B-tree indexes, 64MB, 6 dimensions (first hardware setup). 512MB, 6 dimensions (second hardware setup). 105 105 Throughput (transactions per second) Throughput (transactions per second) R-Tree + R-Tree + B-Tree × B-Tree × × × × × × × + × × 4 + + + + + + × × + + × × 10 104 + × × + + × × × + + + + + + + × × + + + × × × 103 10−4 10−3 10−2 103 10−4 10−3 10−2 Query selectivity Query selectivity Figure 9: Performance of R-Tree and B-tree indexes,Figure 10: Performance of R-Tree and B-tree indexes, 64MB, 8 dimensions (first hardware setup). 512MB, 8 dimensions (second hardware setup). [2] Amer F. Al-Badarneh, Qussai Yaseen, and Ismail bases, VLDB ’07, pages 1150–1160. VLDB En- Hmeidi. A new enhancement to the R-tree node dowment, 2007. splitting. J. Inf. Sci., 36(1):3–18, feb 2010. [13] Gerhard Weikum and Gottfried Vossen. Transac- [3] C. Ang and T. Tan. New linear node splitting algo- tional information systems: theory, algorithms, and rithm for R-trees. In Michel Scholl and Agnès Vois- the practice of concurrency control and recovery. ard, editors, Advances in Spatial Databases, vol- Morgan Kaufmann Publishers Inc., San Francisco, ume 1262 of Lecture Notes in Computer Science, CA, USA, 2001. pages 337–349. Springer Berlin / Heidelberg, 1997. 10.1007/3-540-63238-7 38. [4] Norbert Beckmann and Bernhard Seeger. A revised R*-tree in comparison with related index struc- tures. In Proceedings of the 2009 ACM SIGMOD International Conference on Management of data, SIGMOD ’09, pages 799–812, New York, NY, USA, 2009. ACM. [5] Sotiris Brakatsoulas, Dieter Pfoser, and Yannis Theodoridis. Revisiting R-Tree Construction Prin- ciples. In Proceedings of the 6th East European Conference on Advances in Databases and Infor- mation Systems, ADBIS ’02, pages 149–162, Lon- don, UK, UK, 2002. Springer-Verlag. [6] Antonin Guttman. R-trees: a dynamic index struc- ture for spatial searching. SIGMOD Rec., 14(2):47– 57, June 1984. [7] Joseph M. Hellerstein, Jeffrey F. Naughton, and Avi Pfeffer. Generalized Search Trees for Database Systems. In Proceedings of the 21th International Conference on Very Large Data Bases, VLDB ’95, pages 562–573, San Francisco, CA, USA, 1995. Morgan Kaufmann Publishers Inc. [8] Ibrahim Kamel and Christos Faloutsos. Hilbert R- tree: An Improved R-tree using Fractals. In Pro- ceedings of the 20th International Conference on Very Large Data Bases, VLDB ’94, pages 500–509, San Francisco, CA, USA, 1994. Morgan Kaufmann Publishers Inc. [9] Marcel Kornacker, C. Mohan, and Joseph M. Hellerstein. Concurrency and recovery in gener- alized search trees. SIGMOD Rec., 26(2):62–72, June 1997. [10] Yannis Manolopoulos, Alexandros Nanopoulos, Apostolos N. Papadopoulos, and Y. Theodoridis. R-Trees: Theory and Applications (Advanced In- formation and Knowledge Processing). Springer- Verlag New York, Inc., Secaucus, NJ, USA, 2005. [11] Timos K. Sellis, Nick Roussopoulos, and Christos Faloutsos. The R+ -Tree: A Dynamic Index for Multi-Dimensional Objects. In Proceedings of the 13th International Conference on Very Large Data Bases, VLDB ’87, pages 507–518, San Francisco, CA, USA, 1987. Morgan Kaufmann Publishers Inc. [12] Michael Stonebraker, Samuel Madden, Daniel J. Abadi, Stavros Harizopoulos, Nabil Hachem, and Pat Helland. The end of an architectural era: (it’s time for a complete rewrite). In Proceedings of the 33rd international conference on Very large data