Tasting the Time: How M-Tree Broke the “27 Club” Curse Paolo Ciaccia1,∗ , Marco Patella1 , Fausto Rabitti2 and Pavel Zezula3 1 Alma Mater Studiorum Universitá di Bologna, viale del Risorgimento, 2, 40134 - Bologna, Italy 2 ISTI - CNR, Via G. Moruzzi, 1, 56124 - Pisa, Italy 3 Masaryk University, Botanickà 554/68a, 602 00 - Brno, Czech Republic Abstract The M-tree turned 27 this year, since it was first published in 1997, at SEBD [5] and VLDB [6]. Differently from the likes of Jim Morrison, Jimi Hendrix, and Amy Winehouse, it is still alive and kicking, receiving dozens of downloads and citations every year.1 In this paper, we offer a quick overview of the context in which the M-tree was created and show how it helped starting a whole family of metric trees. In the mid of 90’s we were involved in the Esprit LTR HERMES project on high-performance multimedia storage management, funded by the European Community. At that time we faced a couple of basic issues dealing with the problem of indexing multimedia data: (i) the high dimensionality of feature vectors characterizing the content of such data, and (ii) the huge variety of distance-based criteria used for assessing the similarity of two multimedia objects. Both issues ruled out the possibility of using at-the-time established solutions available for spatial data, such as the R-tree [1]. And even novel proposals, such as the X-tree [2], able to alleviate the first issue, were not able to deal at all with more sophisticated distance functions that were not based on a Cartesian coordinate space. The abstraction of metric space was the 1st ingredient underlying the design of the M-tree. A metric space is a pair (𝑈 , 𝑑), where 𝑈 is the objects’ domain, and 𝑑 is a distance function that obeys the metric postulates: (i) symmetry: ∀𝑥, 𝑦 ∈ 𝑈 ∶ 𝑑(𝑥, 𝑦) = 𝑑(𝑦, 𝑥); (ii) non-negativity: ∀𝑥, 𝑦 ∈ 𝑈: if 𝑥 ≠ 𝑦 then 𝑑(𝑥, 𝑦) > 0, 𝑑(𝑥, 𝑥) = 0; (iii) triangle inequality: ∀𝑥, 𝑦, 𝑧 ∈ 𝑈 ∶ 𝑑(𝑥, 𝑦) ≤ 𝑑(𝑥, 𝑧) + 𝑑(𝑧, 𝑦). Even without any notion of objects’ coordinates (as used by spatial indices), a metric index can effectively prune part of the search space by exploiting above properties, in particular the triangle inequality. Although at that time some proposals of metric indices were already available, such as the VP-tree [3] and the GNATree [4], they were based on a main-memory implementation, thus unsuitable for large, disk-resident datasets, the common case for multimedia data. The 2nd ingredient/requirement for the design of the M-tree was therefore to devise a paged and balanced organization, along the successful lines adopted by B+ -trees and R-trees. Since its first introduction, in 1997 at SEBD [5] and VLDB [6], M-tree has been extended with: 1 “M-tree gets a lot of citations because it is easily beaten.” Benjamin Bustos, personal communication. SEBD 2024: 32nd Symposium on Advanced Database Systems, June 23-26, 2024, Villasimius, Sardinia, Italy ∗ Corresponding author. Envelope-Open paolo.ciaccia@unibo.it (P. Ciaccia); marco.patella@unibo.it (M. Patella); fausto.rabitti@isti.cnr.it (F. Rabitti); zezula@fi.muni.cz (P. Zezula) Orcid 0000-0002-1794-6244 (P. Ciaccia); 0000-0003-2655-0759 (M. Patella); 0000-0001-5438-6760 (P. Zezula) © 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR ceur-ws.org Workshop ISSN 1613-0073 Proceedings • an efficient bulk loading technique [7]; • cost models for estimating search costs [8] and [10]; • techniques for evaluating complex queries, where multiple similarity predicates are defined on a single feature [9] or on multiple features (M2 -tree) [13]; • efficient algorithms for solving probably approximately correct (PAC) queries [11] and [12]; • a technique to correctly solve queries using a user-defined distance, different from the one used to build the actual index (QIC-M-tree) [14]. The two original papers were seminal in generating a whole family of metric access methods, sharing the general M-tree structure. Besides the already cited M2 -tree and QIC-M-tree, several techniques have been proposed to improve the search performance of M-tree, among which the slim-tree [15], the PM-tree [16], the M+ -tree [17], the BM+ -tree [18], and the M∗ -tree [19]. Finally, the NM-tree [20] is able to deal with distance functions that do not satisfy the metric postulates, in particular the triangle inequality. In our view, one of the reasons for M-tree success is the fact that, since the very beginning, its source code has been freely available for research purposes at http://www-db.disi.unibo.it/Mtree/. The code of the original M-tree implementation was written in C++ and it is based on the GiST library [21]. A parallel version was presented in [22]. M-tree is now also available, among others, in PostgreSQL, again exploiting GiST [23], as a plugin for the Secondo DBMS (https:// secondo-database.github.io/content_plugins.html) and the ELKI data mining environment (https: //elki-project.github.io/), and in the SurrealDB multi-model database (https://surrealdb.com/). It is finally interesting to note that, in a conference held exactly on the same days as VLDB 1997, a homonymous abstract data type, generalizing a quadtree for parallel adaptive computations, was proposed [24]: such data structure however did not stand the test of time, rapidly fading into oblivion. References [1] A. Guttman. R-Trees: A Dynamic Index Structure for Spatial Searching. SIGMOD 1984: 47–57. [2] S. Berchtold, D.A. Keim, and H.-P. Kriegel. The X-tree : An Index Structure for High- Dimensional Data. VLDB 1996: 28–39. [3] J.K. Uhlmann. Satisfying General Proximity/Similarity Queries with Metric Trees. Infor- mation Processing Letters 40(4): 175–179 (1991). [4] S. Brin. Near Neighbor Search in Large Metric Spaces. VLDB 1995: 574–584. [5] P. Ciaccia, M. Patella, F. Rabitti, and P. Zezula. Indexing Metric Spaces with M-Tree. SEBD 1997: 67–86. [6] P. Ciaccia, M. Patella, and P. Zezula. M-tree: An Efficient Access Method for Similarity Search in Metric Spaces. VLDB 1997: 426–435. [7] P. Ciaccia and M. Patella. Bulk Loading the M-Tree. ADC’98: 15–26. [8] P. Ciaccia, M. Patella, and P. Zezula. A Cost Model for Similarity Queries in Metric Spaces. PODS 1998: 59–68. [9] P. Ciaccia, M. Patella, and P. Zezula. Processing Complex Similarity Queries with Distance- Based Access Methods. EDBT 1998: 9–23. [10] P. Ciaccia, A. Nanni, and M. Patella. A Query-sensitive Cost Model for Similarity Queries with M-Tree. ADC’99: 65–76. [11] P. Ciaccia and M. Patella. PAC Nearest Neighbor Queries: Using the Distance Distribution for Searching in High-Dimensional Metric Spaces. SEBD 1999: 259–273. [12] P. Ciaccia and M. Patella. PAC Nearest Neighbor Queries: Approximate and Controlled Search in High-Dimensional and Metric Spaces. ICDE 2000: 244–255. [13] P. Ciaccia and M. Patella. The M2 -tree: Processing Complex Multi-Feature Queries with Just One Index. DELOS 2000. [14] P. Ciaccia and M. Patella. Searching in Metric Spaces with User-Defined and Approximate Distances. ACM TODS 27(4): 398–437 (2002). [15] C. Traina, A.J.M. Traina, B. Seeger, and C. Faloutsos. Slim-Trees: High Performance Metric Trees Minimizing Overlap Between Nodes. EDBT ’00: 51–-65. [16] T. Skopal, J. Pokorný and V. Snasel. PM-Tree: Pivoting Metric Tree for Similarity Search in Multimedia Databases. ADBIS 2004: 803–815. [17] X. Zhou, G. Wang, J.X. Yu, and G. Yu. M+ -Tree: A New Dynamical Multidimensional Index For Metric Spaces. ADC’03: 161–-168. [18] X. Zhou, G. Wang, X. Zhou, and G. Yu. BM+ -Tree: A Hyperplane-Based Index Method for High-Dimensional Metric Spaces. DASFAA’05: 398.–409. [19] T. Skopal and D. Hoksza. Improving the Performance of M-Tree Family by Nearest- Neighbor Graphs. ADBIS 2007: 172–188. [20] T. Skopal and J. Lokoč. NM-Tree: Flexible Approximate Similarity Search in Metric and Non-Metric Spaces. DEXA 2008: 312–325. [21] J.M. Hellerstein, J.F. Naughton and A. Pfeffer. Generalized Search Trees for Database Systems. VLDB 1995: 562–573. [22] P. Zezula, P. Savino, F. Rabitti, G. Amato, and P. Ciaccia, Processing M-trees with Parallel Resources, RIDE 1998: 147–154. [23] I. Donkó, J. Szalai-Gindl, G. Gombos, and A. Kiss. An Implementation of the M-Tree Index Structure for PostgreSQL Using GiST. Informatics’2019: 189–194. [24] Q. Wu, A.J. Field, and P.H.J. Kelly. M-Tree: A Parallel Abstract Data Type for Block-Irregular Adaptive Applications. Euro-Par’97: 638–649.