Colossal Trajectory Mining Semantic Co-movement Pattern Mining (Discussion Paper) Chiara Forresi1 , Matteo Francia1,˚ , Enrico Gallinucci1 , Matteo Golfarelli1 and Manuele Pasini1 1 DISI — University of Bologna, Cesena, Italy Abstract Spatio-temporal mobility patterns are at the core of strategic applications such as urban planning and monitoring. Depending on the strength of spatio-temporal constraints, different mobility patterns can be defined. While existing approaches work well in the extraction of groups of objects sharing fine-grained paths, the huge volume of large-scale data asks for coarse-grained solutions. Colossal Trajectory Mining (CTM) efficiently extracts heterogeneous mobility patterns out of a multidimensional space that, along with space and time dimensions, can consider additional trajectory features (e.g., means of transport or activity) to characterize behavioral mobility patterns. The algorithm is natively designed in a distributed fashion, and the experimental evaluation shows its scalability with respect to the involved features and the cardinality of the trajectory dataset. Keywords Trajectory mining, Mobility patterns, Frequent itemset mining 1. Introduction The spreading of Internet of Things and mobile devices [1] stimulated the rise of applications based on mobility patterns, such as urban mobility and traffic planning [2], and moving objects (MOs) profiling and linking [3] where objects moving in similar locations can share interests or relationships. Since different applications require identifying different mobility behaviors, researchers tailored a plethora of specific patterns. However, the need for a unifying analytic framework is well-understood and debated in the literature [4, 5]. Generally speaking, a mobility pattern captures behaviors that frequently occur among MOs; each MO follows a trajectory that can be partially or completely shared with others. Mobility patterns can be classified by the strength of the constraints on spatial and temporal proximity. All patterns require spatial proximity to be satisfied for a sufficient length of the trajectories. Conversely, temporal proximity is not always mandatory: a mobility pattern can be interesting even if the same path has been traveled at different times. A co-location [6] is a group of objects located at the same points at any time (e.g., customers frequenting the same shops on different days). Similarly, a flow [7] is a co-location group where the path is contiguous (e.g., individuals moving through adjacent road segments). A swarm [8] is a group of objects moving within the SEBD 2024: 32nd Symposium on Advanced Database Systems, June 23-26, 2024, Villasimius, Sardinia, Italy ˚ Corresponding author. $ chiara.forresi@unibo.it (C. Forresi); m.francia@unibo.it (M. Francia); enrico.gallinucci@unibo.it (E. Gallinucci); matteo.golfarelli@unibo.it (M. Golfarelli); manuele.pasini@unibo.it (M. Pasini)  0000-0001-5652-2455 (C. Forresi); 0000-0002-0805-1051 (M. Francia); 0000-0002-0931-4255 (E. Gallinucci); 0000-0002-0437-0725 (M. Golfarelli); 0009-0005-7023-6742 (M. Pasini) © 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR ceur-ws.org Workshop ISSN 1613-0073 Proceedings same points at the same, possibly non-consecutive, time instants (e.g., individuals attending the same sport events each week). Similarly, a convoy [9] is a swarm group sharing (at least some) consecutive timestamps (e.g., individuals sharing a means of transport). Most of the approaches in the literature focus on a specific pattern and a few unifying approaches have been introduced. [10] and [11] target small groups of objects sharing fine- grained paths, however, the huge volume of large-scale data (e.g., hundreds of thousands of trajectories spanning the entire USA) asks for coarse-grained solutions. For instance, if our goal is to extract the groups of people flying across countries (e.g., USA has 50 states and 328 million inhabitants) or moving across city neighborhoods (e.g., Milan has 88 neighborhoods and 3 million inhabitants), meaningful groups are the ones in the order of hundreds/thousands of individuals. Indeed, extracting groups of 10 people given an average domestic flight of 150 people would return 10 “ 1.2 ¨ 1015 groups. `150 ˘ In this paper, we describe Colossal Trajectory Mining (CTM), previously introduced in [12], an approach that generalizes many of the previous approaches while overcoming their limitations and integration effort. CTM (i) is a unifying framework for the extraction of heterogeneous mobility patterns; (ii) is suitable for applications working on datasets with a huge number of (long) trajectories traveling across limited spatial regions (i.e., #trajectories " #regions); (iii) characterizes MOs through a tessellation including spatial and temporal features as well as additional features that enable the comprehension of semantic mobility behaviors [13] (e.g., characterizing mobility behaviors by means of transport or activity); and (iv) does not require similarity computation between trajectories, preventing the design of burdensome metrics. The remainder of the paper is organized as follows. In Section 2, we position and compare our approach with respect to the related literature. In Section 3, we describe CTM. In Section 4, we assess the effectiveness and efficiency of CTM by leveraging both real-world and synthetic case studies. Finally, we summarize our approach and future research directions in Section 5. 2. Related work Frequent Itemset Mining uncovers co-occurrences among the items in a transaction dataset [14]. FIM has been also applied to trajectory data to find routes [15, 16] or regions [17] frequently trafficked by MOs. This is an orthogonal problem to extracting groups of objects moving together. Colossal itemset mining is a branch of FIM that computes FI in highly-dimensional datasets; for instance, a dataset with 30 transactions/rows/instances each with 107 items/columns [18]. Carpenter [19] generates frequent closed itemsets in a depth-first centralized fashion. In [18], the authros distribute the branches of the depth-first exploration over distributed executors. However, (i) this results in highly unbalanced partitions, and (ii) a centralization mechanism must be iteratively applied to discard redundant branches (i.e., patterns are redundantly generated). While other algorithms for colossal itemset mining have been recently proposed [19, 20, 21], none address the extraction of constrained co-movement patterns. Clustering groups trajectories such that intra-group similarity is maximized and inter-group similarity is minimized. Since “classic” clustering algorithms do not enforce a definition of co-movement patterns (e.g., cardinality, time span, cohesion, spatial and/or temporal contiguity), flock [22], convoy [9], and swarm [23] patterns have been formalized. As swarm [23] groups Table 1 Excerpt of the comparison with existing co-movement patterns approaches Features Co-location Flow Convoy Swarm Distributed Contributions Any ✓ ✓ ✓ ✓ ✓ CTM Space - ✓ - - - [24], [25], [26] Space ✓ - - - ✓ [27] Space, Time - - - ✓ - [28], [8] Space, Time - - - ✓ ✓ [29], [30] Space, Time - - ✓ - - [23], [31], [32] Space, Time - - ✓ - ✓ [33], [34] Space, Time - - ✓ ✓ ✓ [11] Space, Time - ✓ - - - [7], [2], [35] Raw trajectories Trajectories Transactions Co-movement Patterns Abstracting Creating Mining Trajectories Transactions Tessellation Figure 1: Overview of CTM. objects moving in spatial proximity for a given amount of possibly-non-contiguous time, classical clustering algorithms can be referred to as swarms if no minimal duration is enforced. Table 1 summarizes the comparison with existing approaches. 3. Mining semantic co-movement patterns Our approach is based on the following steps (Figure 1). 3.1. Abstracting trajectories We consider a dataset of raw trajectories, where each trajectory point is labeled with a set of features. Mandatory features are those needed to define spatial or spatio-temporal locations (e.g., latitude, longitude, and timestamp), but other semantic features might be available. Definition 1 (Raw trajectory and feature). A raw trajectory 𝑃 is a sequence of points p𝑝1 , . . . , 𝑝|𝑃 | q generated by a MO. The feature space is a set of features 𝐹 “ t𝑓1 , . . . , 𝑓|𝐹 | u that is collected for each point. Raw trajectories are mapped to a multidimensional tessellation, a composition of multidimen- sional tiles (from now on, simply tiles) of any shape — with no overlaps and no gaps — that cover a multidimensional space; the tessellation is not necessarily a regular grid. Each dimension of the tessellation corresponds to a feature describing the raw trajectory points: common features are space and time, but additional features can be added to specialize each trajectory point (e.g., whether an individual is moving by car or by bicycle). This allows grouping trajectories on semantic and behavioral concepts (e.g., to capture groups of individuals moving across city neighborhoods with different means of transport). Definition 2 (Tessellation). We call tessellation a multidimensional partitioning 𝑆 “ t𝑠1 , . . . , 𝑠𝑚 u of the feature space 𝐹 . Each tile of the tessellation is identified by an 𝑖𝑑 and is characterized by an interval/set of values for each continuous/nominal feature in 𝐹 . For each feature 𝑓 , the function 𝑑𝑖𝑠𝑡𝑓 p𝑠𝑖 , 𝑠𝑗 q computes the distance between two tiles on the tessellation. Two tiles 𝑠𝑖 , 𝑠𝑗 P 𝑆 are adjacent (𝑠𝑖 –𝑓 𝑠𝑗 ) if @𝑠𝑧 P 𝑆 it is 𝑑𝑖𝑠𝑡𝑓 p𝑠𝑖 , 𝑠𝑗 q ď 𝑑𝑖𝑠𝑡𝑓 p𝑠𝑖 , 𝑠𝑧 q`𝑑𝑖𝑠𝑡𝑓 p𝑠𝑧 , 𝑠𝑗 q. In our implementation, the distance function is " 𝑓 is ordinal 𝑔𝑒𝑜𝑑p𝑠𝑖 , 𝑠𝑗 q 𝑑𝑖𝑠𝑡𝑓 p𝑠𝑖 , 𝑠𝑗 q “ 𝑓 is nominal ^ 𝑠𝑖 “ 𝑠𝑗 0 (1) 𝑓 is nominal ^ 𝑠𝑖 ‰ 𝑠𝑗 8 where 𝑔𝑒𝑜𝑑p𝑠𝑖 , 𝑠𝑗 q is the geodesic distance [36] computed on the tessellation, which is the number of tiles along the shortest path of neighboring tiles connecting 𝑠𝑖 and 𝑠𝑗 . Definition 3 (Tile connection). Two tiles 𝑠𝑖 , 𝑠𝑗 in the tessellation 𝑆 are connected (𝑠𝑖 Ø𝑓 𝑠𝑗 ) if there exists a path of adjacent tiles in 𝑆 connecting them in 𝑓 . A trajectory is an abstraction of a raw trajectory at the grain defined by the tessellation. Definition 4 (Trajectory). Given a raw trajectory 𝑃 and a tessellation 𝑆, we define the trajectory 𝑇 corresponding to 𝑃 in 𝑆 as the sequence of tiles p𝑠1 , . . . , 𝑠|𝑇 | q such that a tile 𝑠 is added to 𝑇 if at least a point 𝑝 P 𝑃 is in 𝑠. A point 𝑝 is in the tile 𝑠 if, for each feature in 𝐹 , the values of the feature for 𝑠 contain the corresponding feature value characterizing 𝑝. Noticeably, the tessellation: (i) defines the level of the analysis (CTM transparently allows the extraction of patterns at neighborhood/city/country scales) and (ii) compresses trajectories since a trajectory moves through a tile if at least one of its points belongs to the tile (a single tile instance is added to 𝑇 if consecutive points fall in that tile, thus |𝑃 | ě |𝑇 |). 3.2. Creating transactions To formalize CTM as a colossal itemset mining problem, we introduce transactions and items. Definition 5 (Item, itemset, and transaction). Given a trajectory dataset 𝒯 and a tessellation 𝑆, each trajectory represents an item, and a set 𝐼 of trajectories is an itemset. We define the transaction for a tile 𝑠 P 𝑆 as the itemset containing all the trajectories having at least a point in 𝑠. 𝒬, the set of all transactions, is a transaction dataset. As transactions are sets of items, the data is further compressed: if a tile is traversed more than once by a trajectory, the transaction (tile) will contain the item (trajectory) only once. Since we are looking for trajectories sharing feature values, we need to identify the itemsets contained in a large number of transactions. This property is captured by the support function. Tb 1 Tg 2 p Tr 3 4 A B C D Figure 2: Trajectories moving through a tessellation. Definition 6 (Support). Given a transaction dataset 𝒬, the support 𝑠𝑢𝑝p𝐼q Ď 𝒬 of an itemset 𝐼 is the set of transactions containing 𝐼. Definition 7 (Frequent and closed itemsets). An itemset is 𝐼 frequent (FI) if |𝑠𝑢𝑝p𝐼q| ě 𝑚𝑆𝑢𝑝, where 𝑚𝑆𝑢𝑝 is the minimum number of transactions to consider the itemset as frequent. A frequent itemset is closed (FCI) if there exists no superset with the same support. FCIs provide a lossless compression of FIs [37] (i.e., the output is non-redundant and the complete set of FIs is recoverable) which are exponential in the number of trajectories/items [14]. Working with FCIs rather than FIs simplifies data analysis [38]. Example 1 (Trajectory and transaction). With reference to Figure 2, 𝑇𝑏 , 𝑇𝑔 , and 𝑇𝑟 are tra- jectories while 𝑄𝐴1 and 𝑄𝐵2 are transactions that correspond to tiles A1 and B2. 𝑇𝑏 “ p𝐴1, 𝐵1, 𝐶1, 𝐷1, 𝐶1q, 𝑇𝑔 “ p𝐴1, 𝐵1, 𝐵2, 𝐵3, 𝐵4, 𝐴4q, 𝑇𝑟 “ p𝐴1, 𝐵1, 𝐵2, 𝐵3, 𝐷3q, 𝑄𝐴1 “ t𝑇𝑏 , 𝑇𝑔 , 𝑇𝑟 u, 𝑄𝐵2 “ t𝑇𝑔 , 𝑇𝑟 u The maximum number of transactions depends on the number of tiles (i.e., |𝒬| “ |𝑆|). Model- ing trajectories as items and tiles as transactions makes our frequent itemset approach a colossal one since the number of tiles (i.e., transactions) is typically in the order of magnitudes smaller than the number of trajectories (i.e., items). This is a fair assumption to make: for instance, Milan has 88 neighborhoods with over 3 ¨ 106 inhabitants (i.e., potential MOs). Obviously, our assumption is no more true when a very fine tessellation is adopted. For example, the Milan metropolitan area spans about 1500𝑘𝑚2 , corresponding to 1.5 ¨ 105 uniform tiles with side 100𝑚 and 1.5 ¨ 107 uniform tiles with side 10𝑚. 3.3. Mining co-movement patterns We retrieve the sets of trajectories satisfying minimum cardinality (i.e., the number of trajecto- ries), minimum support (i.e., the length of the shared path), and spatio-temporal constraints. Different mobility patterns can be obtained by specializing such constraints. Definition 8 (Co-movement pattern). A co-movement pattern 𝐼 is a FCI s.t. |𝐼| ě 𝑚𝐶𝑟𝑑, where 𝑚𝐶𝑟𝑑 is the minimum number of trajectories to consider a FCI as a co-movement pattern. Table 2 Characterization of co-movement patterns Pattern Feature Shape constraints Co-loc. 𝑓 𝑠𝑝 None Flow 𝑓 𝑠𝑝 D𝑆 1 Ď 𝑠𝑢𝑝p𝐼q s.t. @𝑠𝑖 , 𝑠𝑗 P 𝑆 1 , 𝑠𝑖 Ø𝑓 𝑠𝑝 𝑠𝑗 ^ |𝑆 1 | ě 𝑚𝑆𝑢𝑝 Swarm 𝑓 𝑠𝑝 , 𝑓 𝑡𝑚 None Convoy 𝑓 𝑠𝑝 , 𝑓 𝑡𝑚 D𝑆 1 Ď 𝑠𝑢𝑝p𝐼q s.t. @𝑠𝑖 , 𝑠𝑗 P 𝑆 1 , 𝑠𝑖 Ø𝑓 𝑡𝑚 𝑠𝑗 ^ |𝑆 1 | ě 𝑚𝑆𝑢𝑝 This “basic” co-movement pattern can be specialized depending on the involved features. Co-location and flow are defined over a spatial feature (𝑓 𝑠𝑝 ; i.e., they are required to happen in the same spatial tile), while swarm and convoy require both spatial (𝑓 𝑠𝑝 ) and temporal (𝑓 𝑡𝑚 ) features (i.e., they are required to happen in the same spatio-temporal tiles). In other words, while for swarm and convoy it is necessary to be in the same spatio-temporal tile, co-location and flow only require to be in the same spatial tile (e.g., same location even if at different times). Although the simplest representation of the space and time features is a regular binning of their absolute values, CTM allows adopting abstractions richer in semantics as long as these determine a tessellation (i.e., a partitioning) of space and time. Characterizing co-movement patterns with additional features means imposing additional constraints, which we call behavioral constraints. Note that behavioral constraints are more expressive than filtering trajectories based on a specific feature value since they further characterize objects that behaves similarly while moving in space and time; for instance, behavioral features are highly important in the linkage/anonymization of mobility data [39]. In CTM, behavioral constraints are transparently enforced by simply extending the input tessellation with additional features (see Definition 4, the tile directly models the features in the tessellation). More formally, two or more trajectories share a tile 𝑠 if they have at least a point in 𝑠. Then, the support of an itemset 𝐼 (a set of trajectories), includes all and only the transactions (tiles) shared by the trajectories. Note that the behavioral constraints must be computed jointly with the spatio-temporal ones, and not before/after running CTM. When the number of items is much larger than the number of transactions, searching for frequent itemsets by enumerating all itemsets with an Apriori-like strategy [14] can be unfeasible or very inefficient. In this case, it is convenient to adopt a row-enumeration-like strategy [19]. CTM extracts co-movement patterns through a breadth-first enumeration of closed itemsets: starting from single transactions, CTM progressively intersects them with further transactions. Closed itemsets associated with a larger set of transactions are characterized by lower cardinalities and larger supports (intuitively, fewer trajectories sharing more tiles). The enumeration process can be represented as an enumeration tree, where each node corresponds to a distinct closed itemset. Naively enumerating the whole tree is inefficient, thus we exploit several pruning mechanisms to limit the enumerated portion of the tree. We conceive CTM as a parallel and big-data approach for co-movement pattern mining. Specifically, CTM: (i) adopts a breadth-first enumeration approach to fully exploit task parallelization and workload balancing; (ii) adopts local pruning criteria to avoid centralized checks that would limit parallelization; (iii) adopts spatio-temporal pruning criteria that have been specifically devised for trajectories- related patterns; (iv) broadcasts the transaction dataset to locally compute the itemset support. For the sake of space, the details of the enumeration process are omitted and are available in Table 3 KPIs by dataset and pattern type Type |𝑆| Patterns Enum. Time (s) Shape check Card. check Red. check Flow 88 9.7 ¨ 104 2.1 ¨ 107 31 53% 82% 30% Co-locaction 88 2.3 ¨ 105 2.6 ¨ 107 44 31% 82% 31% Convoy1 528 0 5.5 ¨ 108 180 1% 99% 90% Swarm 528 124 3.0 ¨ 108 127 8.2% 98% 70% the extended version [12]. Note that approaching this problem as a colossal itemset mining rather than a typical cluster- ing one (i) avoids the computation of similarities that would make computational complexity explode for large datasets; and (ii) exploits monotonicity properties (e.g., as the generation process proceeds, the cardinality of trajectory groups decreases while the length of the path shared by trajectories in the same group increases) to filter out invalid co-movement patterns without generating them all. 4. Evaluation The approach has been implemented in Spark and is available at https://github.com/big-unibo/ ctm. All tests run on a cluster of 10 nodes, each equipped with an 8-core i7 CPU@3.60GHz and 16 GB of RAM and interconnected by Gigabit Ethernet. We tested CTM on several use cases and scalability tests, but for the sake of space we only report the results on a real-world dataset. Milan is a real trajectory dataset that contains trajectories from 6 ¨ 106 MOs (i.e., individ- uals) from the Milan metropolitan area (around 200𝑘𝑚2 ). Trajectories are sparse in time since they represent inhabitants as well as travelers over three months. The dataset has been collected during the urban mobility analysis project “La città intorno” (https://lacittaintorno. fondazionecariplo.it/) that aims to understand the mobility patterns of inhabitants living in sub- urban neighborhoods and to rank neighborhoods by their attractiveness in order to understand how to allocate economic resources for requalification. The attractiveness of a neighborhood is defined as the percentage of co-movement patterns passing through that neighborhood. To fulfill the analysis, we initially define a tessellation where the spatial feature represents the 88 neighborhoods in Milan and the temporal feature represents a relative dimension that partitions absolute timestamps into six bins, such as night (from 0 to 3) and morning (from 8 to 11); overall |𝑆| “ 88 ¨ 6 “ 528 tiles. Then, together with domain experts, we set relevant values for 𝑚𝐶𝑟𝑑 (100) and 𝑚𝑆𝑢𝑝 (7). Table 3 reports the outcomes for all co-movement pattern types. Table 4 shows the results of our attractiveness analysis using swarm patterns, highlighting the need for higher requalification in “Lodi - Corvetto", “Padova", and “Adriano"; the most attractive neighborhoods are the ones closest to the city center2 . Finally, we tested CTM against SPARE [11] (the only big-data approach to the extraction of generic co-movement patterns), and PFPGrowth [40] (a big data approach to Frequent Itemset Mining) on two synthetic datasets Oldenburg and Hermoupolis. For the sake of space, the 1 Due to the sparsity in time, no convoy pattern is returned in the Milan dataset. 2 By filtering tiles on the time bin, it is possible to characterize how attractiveness changes during the day. Table 4 An excerpt of attractiveness for neighborhoods in Milan Neighborhood Attractiveness Brera 83% Duomo 83% Lodi - Corvetto 4% Padova 3% Adriano 1% Oldenburg Hermoupolis 103 104 CTM PFPGrowth SPARE 103 102 Time (s) 102 101 101 108 109 107 106 107 | | 105 105 104 103 103 102 5 10 15 20 25 30 5 10 15 20 25 30 mSup mSup Figure 3: Comparing CTM, SPARE, and PFPGrowth in terms of computational time (top) and co- movement patterns (bottom). details of the scalability testing process are omitted and are available in the extended version [12]. Overall, CTM scalability proves to be better than the others for big groups of trajectories. 5. Conclusion CTM is a big-data approach to extract spatial and spatio-temporal mobility patterns possibly enriched by additional trajectory features that characterize behavioral mobility patterns. With respect to the existing literature, CTM is general-purpose as it provides a unifying approach to extract different pattern types and is particularly suited for applications characterized by a high number of trajectories to be analyzed on a feature space with limited cardinality (e.g., to capture the daily commuting of a citizen through different neighborhoods, rather than analyzing her detailed path at the single street level). As new research directions, we plan to: (i) introduce a definition of group cohesion to further prune mobility patterns based on the shared tiles, (ii) investigate how the extracted mobility patterns can be summarized in a more succinct representation, (iii) investigate how gridding affects the stability of co-movement patterns, and (iv) consider a unifying extraction of mobility patterns from streaming trajectory data in order to apply CTM to online location-based systems. References [1] G. Vitali, M. Francia, M. Golfarelli, M. Canavari, Crop management with the IoT: An interdisciplinary survey, Agronomy 11 (2021) 181. doi:10.3390/agronomy11010181. [2] D. Kumar, H. Wu, S. Rajasegarar, C. Leckie, S. Krishnaswamy, M. Palaniswami, Fast and scalable big data trajectory clustering for understanding urban mobility, IEEE Trans. Intel- ligent Transportation Systems 19 (2018) 3709–3722. doi:10.1109/tits.2018.2854775. [3] M. Francia, E. Gallinucci, M. Golfarelli, N. Santolini, Dart: De-anonymization of personal gazetteers through social trajectories, J. of Inf. Security and Applications 55 (2020) 102634. doi:10.1016/j.jisa.2020.102634. [4] X. Ding, L. Chen, Y. Gao, C. S. Jensen, H. Bao, Ultraman: a unified platform for big trajectory data management and analytics, in: Proc. of VLDB, volume 11, 2018, pp. 787– 799. doi:10.14778/3192965.3192970. [5] M. M. Kwakye, Conceptual model and design of semantic trajectory data warehouse, Int. J. Data Warehous. Min. 16 (2020) 108–131. doi:10.4018/ijdwm.2020070106. [6] X. Bao, J. Lu, T. Gu, L. Chang, Z. Xu, L. Wang, Mining non-redundant co-location patterns, IEEE Trans. Neural Networks and Learning Systems 33 (2022) 6613–6626. doi:10.1109/ tnnls.2021.3082628. [7] B. Han, L. Liu, E. Omiecinski, Road-network aware trajectory clustering: Integrating locality, flow, and density, IEEE Trans. Mob. Comput. 14 (2015) 416–429. [8] Z. Li, B. Ding, J. Han, R. Kays, Swarm: mining relaxed temporal moving object clusters, in: Proc. of VLDB, volume 3, 2010, pp. 723–734. doi:10.14778/1920841.1920934. [9] H. Jeung, M. L. Yiu, X. Zhou, C. S. Jensen, H. T. Shen, Discovery of convoys in trajectory databases, in: Proc. of VLDB, volume 1, VLDB Endowment, 2008, pp. 1068–1080. doi:10. 14778/1453856.1453971. [10] N. Phan, P. Poncelet, M. Teisseire, All in one: Mining multiple movement patterns, Int. J. of Inf. Technology and Decision Making 15 (2016) 1115–1156. doi:10.1142/ s0219622016500280. [11] Q. Fan, D. Zhang, H. Wu, K. Tan, A general and parallel platform for mining co-movement patterns over large-scale trajectories, in: Proc. of VLDB, volume 10, 2016, pp. 313–324. doi:10.14778/3025111.3025114. [12] M. Francia, E. Gallinucci, M. Golfarelli, Colossal trajectory mining: A unifying approach to mine behavioral mobility patterns, Expert Syst. Appl. 238 (2024) 122055. doi:10.1016/ J.ESWA.2023.122055. [13] Z. Yan, D. Chakraborty, C. Parent, S. Spaccapietra, K. Aberer, Semantic trajectories: Mobility data computation and annotation, ACM Transactions on Intelligent Systems and Technology 4 (2013) 1–38. doi:10.1145/2483669.2483682. [14] R. Agrawal, R. Srikant, Fast algorithms for mining association rules in large databases, in: Proc. of VLDB, volume 25, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1994, p. 487–499. doi:10.1145/235968.233311. [15] M. Qiu, D. Pi, Mining frequent trajectory patterns in road network based on similar trajectory, in: Proc. of IDEAL, Springer, Yangzhou, China, 2016, pp. 46–57. doi:10.1007/ 978-3-319-46257-8_6. [16] Z. Fu, Z. Tian, Y. Xu, K. Zhou, Mining frequent route patterns based on personal trajectory abstraction, IEEE Access 5 (2017) 11352–11363. doi:10.1109/access.2017.2712703. [17] L. Zheng, D. Xia, X. Zhao, L. Tan, H. Li, L. Chen, W. Liu, Spatial–temporal travel pat- tern mining using massive taxi trajectory data, Physica A: Statistical Mechanics and its Applications 501 (2018) 24–41. [18] D. Apiletti, E. Baralis, T. Cerquitelli, P. Garza, F. Pulvirenti, P. Michiardi, A parallel mapreduce algorithm to efficiently support itemset mining on high dimensional data, Big Data Research 10 (2017) 53–69. doi:10.1016/j.bdr.2017.10.004. [19] F. Pan, G. Cong, A. K. H. Tung, J. Yang, M. J. Zaki, Carpenter: finding closed patterns in long biological datasets, in: Proc. of KDD, ACM, Washington, DC, USA, 2003, pp. 637–642. doi:10.1145/956750.956832. [20] F. A. M. Zaki, N. F. Zulkurnain, RARE: mining colossal closed itemset in high dimensional data, Knowl.-Based Systems 161 (2018) 1–11. doi:10.1016/j.knosys.2018.07.025. [21] M. K. Vanahalli, N. Patil, An efficient parallel row enumerated algorithm for mining frequent colossal closed itemsets from high dimensional datasets, Inf. Sci. 496 (2019) 343–362. doi:10.1016/j.ins.2018.08.009. [22] J. Gudmundsson, M. J. van Kreveld, Computing longest duration flocks in trajectory data, in: 14th ACM Int. Symposium on Geographic Inf. Systems, ACM, Arlington, Virginia, USA, 2006, pp. 35–42. doi:10.1145/1183471.1183479. [23] H. H. Aung, K.-L. Tan, Discovery of evolving convoys, in: M. Gertz, B. Ludäscher (Eds.), Scientific and Statistical Database Management, Springer Berlin Heidelberg, Berlin, Heidelberg, 2010, pp. 196–213. doi:10.1007/978-3-642-13818-8_16. [24] T. L. C. da Silva, F. Lettich, J. A. F. de Macêdo, K. Zeitouni, M. A. Casanova, Online clustering of trajectories in road networks, in: Proc. of MDM, IEEE, Versailles, France, 2020, pp. 99–108. doi:10.1109/MDM48529.2020.00031. [25] P. Yang, L. Wang, X. Wang, L. Zhou, SCPM-CR: A novel method for spatial co-location pattern mining with coupling relation consideration, in: Proc. of ICDE, IEEE, 2022, pp. 1503–1504. doi:10.1109/icde53745.2022.00128. [26] A. Tritsarolis, G.-S. Theodoropoulos, Y. Theodoridis, Online discovery of co-movement patterns in mobility data, International Journal of Geographical Information Science 35 (2021) 819–845. [27] J. C. Fonseca-Galindo, G. de Castro Surita, J. M. Neto, C. L. de Castro, A. P. Lemos, A multi-agent system for solving the dynamic capacitated vehicle routing problem with stochastic customers using trajectory data mining, Expert Systems with Applications 195 (2022) 116602. doi:10.1016/j.eswa.2022.116602. [28] J. Lee, J. Han, K. Whang, Trajectory clustering: a partition-and-group framework, in: Proc. of SIGMOD, ACM, Beijing, China, 2007, pp. 593–604. doi:10.1145/1247480.1247546. [29] C. Hu, X. Kang, N. Luo, Q. Zhao, Parallel clustering of big data of spatio-temporal trajectory, in: 11th Int. Conf. on Natural Computation, ICNC 2015, Zhangjiajie, China, August 15-17, 2015, 2015, pp. 769–774. doi:10.1109/icnc.2015.7378088. [30] P. Tampakis, N. Pelekis, C. Doulkeridis, Y. Theodoridis, Scalable distributed subtrajectory clustering, in: 2019 IEEE Int. Conf. on Big Data, IEEE, Los Angeles, CA, USA, 2019, pp. 950–959. doi:10.1109/bigdata47090.2019.9005563. [31] F. Orakzai, T. Calders, T. B. Pedersen, k/2-hop: Fast mining of convoy patterns with effective pruning, in: Proc. of VLDB, volume 12, 2019, pp. 948–960. doi:10.14778/ 3329772.3329773. [32] Y. Liu, H. Dai, B. Li, J. Li, G. Yang, J. Wang, ECMA: An efficient convoy mining algorithm for moving objects, in: Proc. of CIKM, ACM, Queensland, Australia, 2021, pp. 1089–1098. doi:10.1145/3459637.3482255. [33] F. Orakzai, T. B. Pedersen, T. Calders, Distributed mining of convoys in large scale datasets, GeoInformatica 25 (2021) 353–396. doi:10.1007/s10707-020-00431-w. [34] A. Tritsarolis, E. Chondrodima, P. Tampakis, A. Pikrakis, Y. Theodoridis, Predicting co-movement patterns in mobility data, GeoInformatica (2022) 1–23. doi:10.1007/ s10707-022-00478-x. [35] S. Wang, Z. Bao, J. S. Culpepper, T. Sellis, X. Qin, Fast large-scale trajectory clustering, in: Proc. of VLDB, volume 13, VLDB Endowment, 2019, pp. 29–42. doi:10.14778/3357377. 3357380. [36] J. Bouttier, P. Di Francesco, E. Guitter, Geodesic distance in planar graphs, Nuclear physics B 663 (2003) 535–567. doi:10.1016/s0550-3213(03)00355-9. [37] J. Pei, J. Han, R. Mao, CLOSET: an efficient algorithm for mining frequent closed itemsets, in: Proc. of SIGMOD, ACM, Dallas, Texas, USA, 2000, pp. 21–30. doi:10.1145/956750. 956779. [38] M. Francia, M. Golfarelli, S. Rizzi, Summarization and visualization of multi-level and multi- dimensional itemsets, Inf. Sciences 520 (2020) 63–85. doi:10.1016/j.ins.2020.02.006. [39] F. Jin, W. Hua, M. Francia, P. Chao, M. E. Orlowska, X. Zhou, A survey and experimental study on privacy-preserving trajectory data publishing, IEEE Trans. Knowl. Data Eng. 35 (2023) 5577–5596. doi:10.1109/TKDE.2022.3174204. [40] H. Li, Y. Wang, D. Zhang, M. Zhang, E. Y. Chang, Pfp: parallel fp-growth for query recommendation, in: Proc. of RecSys, ACM, New York, NY, USA, 2008, pp. 107–114. doi:10.1145/1454008.1454027.