Online Co-movement Pattern Prediction in Mobility Data Andreas Tritsarolis Eva Chondrodima Data Science Lab., Department of Informatics, Data Science Lab., Department of Informatics, University of Piraeus University of Piraeus Piraeus, Greece Piraeus, Greece andrewt@unipi.gr evachon@unipi.gr Panagiotis Tampakis Aggelos Pikrakis Data Science Lab., Department of Informatics, Data Science Lab., Department of Informatics, University of Piraeus University of Piraeus Piraeus, Greece Piraeus, Greece ptampak@unipi.gr pikrakis@unipi.gr ABSTRACT Co-movement Patterns, has not been addressed in the relevant Predictive analytics over mobility data are of great importance literature yet. since they can assist an analyst to predict events, such as col- Concerning the definition of co-movement patterns, there are lisions, encounters, traffic jams, etc. A typical example of such several approaches in the literature, such as [2, 5, 6, 8]. However, analytics is future location prediction, where the goal is to pre- all of the above are either offline and/or operate at predefined dict the future location of a moving object, given a look-ahead temporal snapshots that imply temporal alignment and uniform time. What is even more challenging is being able to accurately sampling, which is not realistic assumptions. For this reason, predict collective behavioural patterns of movement, such as co- we adopt the approach presented in [33], which, to the best of movement patterns. In this paper, we provide an accurate solution our knowledge, is the first online method for the discovery of to the problem of Online Prediction of Co-movement Patterns. In co-movement patterns in mobility data that does not assume more detail, we split the original problem into two sub-problems, temporal alignment and uniform sampling. The goal in [33] is namely Future Location Prediction and Evolving Cluster Detection. to discover co-movement patterns, namely Evolving Clusters, in Furthermore, in order to be able to calculate the accuracy of our an online fashion, by employing a graph-based representation. solution, we propose a co-movement pattern similarity measure, By doing so, the problem of co-movement pattern detection is which facilitates the matching of the predicted clusters with the transformed to identifying Maximal Cliques (MCs) (for spherical actual ones. Finally, the accuracy of our solution is demonstrated clusters) or Maximal Connected Subgraphs (MCSs) (for density- experimentally over a real dataset from the maritime domain. connected clusters). Figure 1 illustrates such an example, where in blue we have the historical evolving clusters and in orange KEYWORDS the predicted future ones. Several mobility-related applications could benefit from such Machine Learning, Predictive Analytics, Co-movement Patterns, an operation. In the urban traffic domain, predicting co-movement Trajectory Prediction patterns could assist in detecting future traffic jams which in turn can help the authorities take the appropriate measures (e.g. adjust- 1 INTRODUCTION ing traffic lights) in order to avoid them. In the maritime domain, The vast spread of GPS-enabled devices, such as smartphones, a typical problem is illegal transshipment, where groups of ves- tablets and GPS trackers, has led to the production of large sels move together "close" enough for some time duration and amounts of mobility related data. By nature, this kind of data with low speed. It becomes obvious that predicting co-movement are streaming and there are several application scenarios where patterns could help in predicting illegal transshipment events. the processing needs to take place in an online fashion. These Finally, in large epidemic crisis, contact tracing is one of the tools properties have posed new challenges in terms of efficient stor- to identify individuals that have been close to infected persons age, analytics, and knowledge extraction out of such data. One for some time duration. Being able to predict these groups can of these challenges is online cluster analysis, where the goal is help avoid future contacts with possibly infected individuals. to unveil hidden patterns of collective behaviour from streaming The problem of predicting the spatial properties of group pat- trajectories, such as co-movement patterns [2, 5, 6, 8, 33]. What ters has only been recently studied [12]. In more detail, the au- is even more challenging is predictive analytics over mobility thors in [12] adopt a spherical definition of groups, where each data, where the goal is to predict the future behaviour of moving group consists of moving objects that are confined within a ra- objects, which can have a wide range of applications, such as dius ๐‘‘ and their goal is to predict the centroid of the groups at the predicting collisions, future encounters, traffic jams, etc. At an next timeslice. However, this approach is offline and cannot be individual level, a typical and well-studied example of such ana- applied in an online scenario. Furthermore, the group definition lytics is future location prediction [23, 24, 27, 32], where the goal adopted in [12] is rather limited, since the identify only spherical is to predict the future location of a moving object, given a look- groups, as opposed to [33] where both spherical and density- ahead time. However, prediction of future mobility behaviour connected clusters can be identified. Finally, the authors in [12] at a collective level and more specifically Online Prediction of predict only the centroids of the clusters and not the shape and the membership of each cluster. ยฉ 2021 Copyright for this paper by its the author(s). Published in the Workshop Proceedings of the EDBT/ICDT 2021 Joint Conference (March 23-26, 2021, Nicosia, Inspired by the above, the problem that we address in this Cyprus) on CEUR-WS.org. Use permitted under the terms of the Creative Commons paper is the Online Prediction of Co-movement Patterns. Informally, License Attribution 4.0 International (CC BY 4.0). Figure 1: Predicting evolving clusters via (singular) trajectory prediction given a look-ahead time interval ฮ”๐‘ก, the goal is to predict the mined patterns, their main limitation is that they search for spe- groups, i.e. their spatial shape (spherical or density-connected), cific collective behaviours, defined by respective parameters. An temporal coverage and membership, after ฮ”๐‘ก time. In more detail, approach that defines a new generalized mobility pattern is pre- we split the original problem into two sub-problems, namely sented in [5]. In more detail, the general co-movement pattern Future Location Prediction and Evolving Cluster Detection. The (GCMP), is proposed, which includes Temporal Replication and problem of Online Prediction of Co-movement Patterns is quite Parallel Mining, a method that, as suggested by its name, splits challenging, since, apart from the inherent difficulty of predicting a data snapshot spatially and replicates data when necessary to the future, we also need to define how the error between the ensure full coverage, and Star Partitioning and ApRiori Enumer- actual and the predicted clusters will be measured. This further ator, a technique that uses graph pruning in order to avoid the implies that a predicted cluster should be correctly matched with data replication that takes place in the previous method. In [8], the corresponding actual cluster which is not a straightforward the authors propose a frequent co-movement pattern (f-CoMP) procedure. To the best of our knowledge, the problem of Online definition for discovering patterns at multiple spatial scales, also Prediction of Co-movement Patterns, has not been addressed in exploiting the overall shape of the objectsโ€™ trajectories, while the literature yet. Our main contributions are the following: at the same time it relaxes the temporal and spatial constraints of the seminal works (i.e. Flocks, Convoys, etc.) in order to dis- โ€ข We provide an accurate solution to the problem of Online cover more interesting patterns. The authors in [2, 6], propose Prediction of Co-movement Patterns. a two-phase online distributed co-movement pattern detection โ€ข We propose a co-movement pattern similarity measure, framework, which includes the clustering and the pattern enu- which helps us โ€œmatchโ€ the predicted with the actual clus- meration phase, respectively. During the clustering phase for ters. timestamp ๐‘ก๐‘  , the snapshot ๐‘†๐‘ก is clustered using Range-Join and โ€ข We perform an experimental study with a real dataset DBSCAN. from the maritime domain, which verifies the accuracy of Another line of research, tries to discover groups of either our proposed methodology. entire or portions of trajectories considering their routes. There The rest of the paper is organized as follows. Section 2 dis- are several approaches whose goal is to group whole trajecto- cusses related work. In Section 3, we formally define the problem ries, including T-OPTICS [18, 19], that incorporates a trajectory of Online Prediction of Co-movement Patterns. Subsequently, in similarity function into the OPTICS algorithm. However, discov- Section 4 we propose our two-step methodology and in Section 5, ering clusters of complete trajectories can overlook significant we introduce a co-movement pattern similarity measure along patterns that might exist only for portions of their lifespan. To with the cluster โ€œmatchingโ€ algorithm. Section 6, presents our deal with this, another line of research has emerged, that of Sub- experimental findings and, finally, in Section 7 we conclude the trajectory Clustering[20, 21, 28, 29], where the goal is to partition paper and discuss future extensions. a trajectory into subtrajectories, whenever the density or the composition and its neighbourhood changes โ€œsignificantlyโ€, then form groups of similar ones, while, at the same time, separate 2 RELATED WORK the ones that fit into no group, called outliers. The work performed in this paper is closely related to three topics, Another perspective into co-movement pattern discovery, is to (a) trajectory clustering and more specifically co-movement pat- reduce cluster types into graph properties and view them as such. tern discovery, (b) future location prediction and (c) co-movement In [31, 33], the authors propose a novel co-movement pattern pattern prediction. definition, called evolving clusters, that unifies the definitions Co-movement patterns. One of the first approaches for iden- of flocks and convoys and reduces them to Maximal Cliques tifying such collective mobility behaviour is the so-called flock (MC), and Connected Subgraphs (MCS), respectively. In addition, pattern [14], which identifies groups of at least ๐‘š objects that the authors propose an online algorithm, that discovers several move within a disk of radius ๐‘Ÿ for at least ๐‘˜ consecutive time- evolving cluster types simultaneously in real time using Apache points. Inspired by this, several related works followed, such as Kafkaยฎ , without assuming temporal alignment, in constrast to moving clusters [11], convoys [10], swarms [16], platoons [15], the seminal works (i.e. flocks, convoys). traveling companion [30] and gathering pattern [38]. Even though In the proposed predictive model, we will use the definition all of these approaches provide explicit definitions of several of evolving clusters [33] for co-movement pattern discovery. The reason why is this the most appropriate, is that we can predict 3 PROBLEM DEFINITION the course of several pattern types at the same time, without the As already mentioned, we divide the problem into two sub- need to call several other algorithms, therefore adding redundant problems, namely Future Location Prediction and Evolving Clus- computational complexity. ters Detection. Before proceeding to the actual formulation of the Future Location Prediction. The fact that the Future Lo- problem, let us provide some preliminary definitions. cation Prediction (FLP) problem has been extensivelly studied brings up its importance and applicability in a wide range of Definition 3.1. (Trajectory) A trajectory ๐‘‡ = {๐‘ 1, . . . ๐‘๐‘› } is applications. Towards tackling the FLP problem, one line of work considered as a sequence of timestamped locations, where ๐‘› is includes efforts that take advantage of historical movement pat- the latest reported position of ๐‘‡ . Further, ๐‘๐‘– = {๐‘ฅ๐‘– , ๐‘ฆ๐‘– , ๐‘ก๐‘– }, with terns in order to predict the future location. Such an approach is 1 โ‰ค ๐‘– โ‰ค ๐‘›. presented in [32], where the authors propose MyWay, a hybrid, Definition 3.2. (Future Location Prediction). Given an input pattern-based approach that utilizes individual patterns when dataset ๐ท = {๐‘‡1, . . . ,๐‘‡ |๐ท | } of trajectories and a time interval available, and when not, collective ones, in order to provide more ฮ”๐‘ก, our goal is โˆ€๐‘‡๐‘– โˆˆ ๐ท to predict ๐‘๐‘๐‘Ÿ๐‘’๐‘‘ ๐‘– ๐‘– = {๐‘ฅ๐‘๐‘Ÿ๐‘’๐‘‘ ๐‘– , ๐‘ฆ๐‘๐‘Ÿ๐‘’๐‘‘ } at accurate predictions and increase the predictive ability of the ๐‘– timestamp ๐‘ก๐‘๐‘Ÿ๐‘’๐‘‘ = ๐‘ก๐‘›๐‘– + ฮ”๐‘ก. system. In another effort, the authors in [23, 24] utilize the work done by [29] on distributed subtrajectory clustering in order to An informal definition regarding group patterns could be: โ€œa be able to extract individual subtrajectory patterns from big mo- large enough number of objects moving close enough to each bility data. These patterns are subsequently utilized in order to other, in space and time, for some time durationโ€. As already predict the future location of the moving objects in parallel. mentioned, in this paper we adopt the definition provided in [33]. A different way of addressing the FLP problem includes ma- chine learning approaches. Definition 3.3. (Evolving Cluster). Given: a set ๐ท of trajecto- Recurrent Neural Network (RNN) -based models [26] consti- ries, a minimum cardinality threshold ๐‘, a maximum distance tute a popular method for trajectory prediction due to their pow- threshold ๐œƒ , and a minimum time duration threshold ๐‘‘, an Evolv- erful ability to fit complex functions, along with their ability of ing Cluster โŸจ๐ถ, ๐‘ก๐‘ ๐‘ก๐‘Ž๐‘Ÿ๐‘ก , ๐‘ก๐‘’๐‘›๐‘‘ , ๐‘ก๐‘โŸฉ is a subset ๐ถ โˆˆ ๐ท of the moving adjusting the dynamic behaviour as well as capturing the causal- objectsโ€™ population, |๐ถ | โ‰ฅ ๐‘, which appeared at time point ๐‘ก๐‘ ๐‘ก๐‘Ž๐‘Ÿ๐‘ก ity relationships across sequences. However, research in the mar- and remained alive until time point ๐‘ก๐‘’๐‘›๐‘‘ (with ๐‘ก๐‘’๐‘›๐‘‘ โˆ’ ๐‘ก๐‘ ๐‘ก๐‘Ž๐‘Ÿ๐‘ก โ‰ฅ ๐‘‘) itime domain is limited regarding vessel trajectory prediction during the lifetime [๐‘ก๐‘ ๐‘ก๐‘Ž๐‘Ÿ๐‘ก , ๐‘ก๐‘’๐‘›๐‘‘ ] of which the participating mov- and Gated Recurrent Units (GRU) [3] models, which constitute ing objects were spatially connected with respect to distance ๐œƒ the newer generation of RNN. and cluster type ๐‘ก๐‘. Suo et.al. [27] presented a GRU model to predict vessel tra- Definition 3.4. (Group Pattern Prediction Online). Given: a jectories based on a) the Density-Based Spatial Clustering of set ๐ท of trajectories, ๐บ of co-movement patterns up to timeslice Applications with Noise (DBSCAN) algorithm to derive main tra- ๐‘‡ ๐‘† ๐‘›๐‘œ๐‘ค and a look-ahead threshold ฮ”๐‘ก, we aim to predict all the jectories and, b) a symmetric segmented-path distance approach valid co-movement patterns ๐บ โ€ฒ โˆˆ (๐‘‡ ๐‘† ๐‘›๐‘œ๐‘ค ,๐‘‡ ๐‘† ๐‘›๐‘œ๐‘ค + ฮ”๐‘ก]. to eliminate the influence of a large number of redundant data and to optimize incoming trajectories. Ground truth data from Figure 1 provides an illustration of Definition 3.4. More specif- AIS raw data in the port of Zhangzhou, China were used to train ically, we know the movement of nine objects from ๐‘‡ ๐‘† 1 until ๐‘‡ ๐‘† 3 and verify the validity of the proposed model. and via EvolvingClusters with ๐‘ = 3 and ๐‘‘ = 2 that they form four Liu et.al. [17] proposed a trajectory classifier called Spatio- evolving clusters ๐‘ƒ1 = {๐‘Ž, ๐‘, ๐‘, ๐‘‘, ๐‘’, ๐‘“ , ๐‘”, โ„Ž, ๐‘–}, ๐‘ƒ2 = {๐‘Ž, ๐‘, ๐‘, ๐‘‘, ๐‘’}, Temporal GRU to model the spatio-temporal correlations and ir- ๐‘ƒ 3 = {๐‘Ž, ๐‘, ๐‘}, ๐‘ƒ4 = {๐‘, ๐‘, ๐‘‘, ๐‘’}, ๐‘ƒ5 = {๐‘”, โ„Ž, ๐‘–}. Our goal is to predict regular temporal intervals prevalently presented in spatio-temporal their respective locations until ๐‘‡ ๐‘† 5 . Running EvolvingClusters trajectories. Particularly, a segmented convolutional weight mech- with the same parameters for the predicted timeslices, reveals us anism was proposed to capture short-term local spatial corre- (with high probability) that ๐‘ƒ2, ๐‘ƒ3, ๐‘ƒ4, ๐‘ƒ5 will continue to exist as lations in trajectories along with an additional temporal gate well as the creation of a new pattern ๐‘ƒ6 = {๐‘“ , ๐‘”, โ„Ž, ๐‘–}. to control the information flow related to the temporal interval information. 4 METHODOLOGY Wang et.al. [34] aiming at predicting the movement trend of In this section we present the proposed solution to the problem vessels in the crowded port water of Tianjin port, proposed a of Online Prediction of Co-movement Patterns, composed of two vessel berthing trajectory prediction model based on bidirectional parts: a) the FLP method, and b) the Evolving Cluster Discovery GRU (Bi-GRU) and cubic spline interpolation. algorithm. Also, an example is presented illustrating the approach Co-movement pattern prediction. The most similar work operation. to ours has only been recently presented in [12]. More specifically, the authors in [12], divide time into time slices of fixed step size 4.1 Overview and adopt a spherical definition of groups, where each group Figure 2 illustrates the architecture of our proposed methodology. consists of moving objects that are confined within a radius ๐‘‘ First we split the problem of Online Prediction of Co-movement and their goal is to predict the centroid of the groups at the Patterns into two parts, the FLP, and the Evolving Cluster Dis- next timeslice. However, this approach is offline and cannot be covery. The FLP method is, also, divided to two parts: a) the applied in an online scenario. Furthermore, the group definition FLP-offline part, where the training procedure of the model is adopted in [12] is rather limited, since the identify only spherical taking place, and b) the FLP-online part, where the trained FLP groups, as opposed to [33] where both spherical and density- model is applied to streaming GPS locations to predict the next connected clusters can be identified. Finally, the authors in [12] objectsโ€™ location. predict only the centroids of the clusters and not the shape and Thus, our proposed approach is further divided in the offline the membership of each cluster. phase and the online one. Particularly, at the offline phase, we Predicted Predicted id Trajectories Evolving Clusters yer Streaming speed e La GPS Locations longitutde EvolvingClusters Onlin latitude ... yer e La Future Location Prediction (FLP) Model Of๏ฌ‚in Historic Trajectories Figure 2: Workflow for evolving clusters prediction via (singular) trajectory prediction train our FLP model by using historic trajectories. Afterwards, at differences are computed between consecutive points of each the online phase we receive the streaming GPS locations in order vessel. to use them to create a buffer for each moving object. Then, we In this work, a GRU-based model is employed to solve the use our trained FLP model to predict the next objectsโ€™ location future location prediction problem. The proposed GRU-based and apply EvolvingClusters to each produced timeslice. network architecture is composed of the following layers: a) an input layer of four neurons, one for each input variable, b) a single 4.2 Future Location Prediction GRU hidden layer composed of 150 neurons, c) a fully-connected Trajectories can be considered as time sequence data [37] and hidden layer composed of 50 neurons, and d) an output layer thus are suited to be treated with techniques that are capable of two neurons, one for each prediction coordinate (longitude of handling sequential data and/or time series [25]. Over the and latitude). A schematic overview of the proposed network past two decades, the research interest on forecasting time series architecture is presented in Figure 3. Also, details for the Back- has been moved to RNN-based models, with the GRU architec- ward Propagation Through Time algorithm and for the Adam ture being the newer generation of RNN, which has emerged approach, which were employed for the NN learning purposes, as an effective technique for several difficult learning problems can be found in [36] and [13], respectively. (including sequential or temporal data -based applications) [4]. Although, the most popular RNN-based architecture is the well- 4.3 Evolving Clusters Discovery known Long Short-Term Memory (LSTM) [9], GRU present some After getting the predicted locations for each moving object, we interesting advantages over the LSTM. More specifically, GRU use EvolvingClusters in order to finally present the predicted are less complicated, easier to modify and faster to train. Also, co-movement patterns. Because the sampling rate may vary for GRU networks achieve better accuracy performance compared each moving object, we use linear interpolation to temporally to LSTM models on trajectory prediction problems on various align the predicted locations at a common timeslice with a stable domains, such as on maritime [27], on aviation [7] and on land sampling (alignment) rate ๐‘ ๐‘Ÿ . traffic [1]. Hence, this work follows this direction and employs a Given a timeslice ๐‘‡ ๐‘†๐‘›๐‘œ๐‘ค , EvolvingClusters works in a nutshell, GRU-based method. as follows: GRU includes internal mechanisms called gates that can regu- late the flow of information. Particularly, the GRU hidden layer โ€ข Calculates the pairwise distance for each object within include two gates, a reset gate which is used to decide how much ๐‘‡ ๐‘†๐‘›๐‘œ๐‘ค , and drop the locations with distance less than ๐œƒ ; past information to forget and an update gate which decides what โ€ข Creates a graph based on the filtered locations, and extract information to throw away and what new information to add. its Maximal Connected Subgraphs (MCS) and Cliques (MC) We briefly state the update rules for the employed GRU layer. with respect to ๐‘; For more details, the interested reader is referred to the original โ€ข Maintains the currently active (and inactive) clusters, given publications [3]. Also, details for the BPTT algorithm, which was the MCS and MC of ๐‘‡ ๐‘†๐‘›๐‘œ๐‘ค and the recent (active) pattern employed for training the model, can be found in [35]. history; and โ€ข Outputs the eligible active patterns with respect to ๐‘, ๐‘ก and z๐‘˜ = ๐œŽ (Wpฬƒ๐‘ง ยท pฬƒ๐‘˜ + Wโ„Ž๐‘ง ยท h๐‘˜โˆ’1 + b๐‘ง ) (1) ๐œƒ. r๐‘˜ = ๐œŽ (Wpฬƒ๐‘Ÿ ยท pฬƒ๐‘˜ + Wโ„Ž๐‘Ÿ ยท h๐‘˜โˆ’1 + b๐‘Ÿ ) (2) The output of EvolvingClusters, and by extension of the whole predictive model, is a tuple of four elements, the set of objects hฬƒ๐‘˜ = tanh(Wpฬƒโ„Ž ยท pฬƒ๐‘˜ + Wโ„Žโ„Ž ยท (r๐‘˜ โˆ— h๐‘˜โˆ’1 ) + bโ„Ž ) (3) ๐‘œ๐‘–๐‘‘๐‘  that form an evolving cluster, the starting time ๐‘ ๐‘ก, the ending h๐‘˜ = z๐‘˜ โŠ™ h๐‘˜โˆ’1 + (1 โˆ’ z๐‘˜ ) โŠ™ hฬƒ๐‘˜ (4) time ๐‘’๐‘ก, and the type ๐‘ก๐‘ of the group pattern, respectively. For where z and r represent the update and reset gates, respec- instance, the final output of the model at the example given at tively, hฬƒ and h represent the intermediate memory and output, Section 3 would be a set of 4-element tuples, i.e., {(๐‘ƒ2,๐‘‡ ๐‘† 1,๐‘‡ ๐‘† 5, 2), ร respectively. Also, in these equations, the Wโˆ— variables are the (๐‘ƒ3,๐‘‡ ๐‘† 1,๐‘‡ ๐‘† 5, 1), (๐‘ƒ4,๐‘‡ ๐‘† 1,๐‘‡ ๐‘† 4, 1), (๐‘ƒ 5,๐‘‡ ๐‘† 1,๐‘‡ ๐‘† 5, 1)} {(๐‘ƒ4,๐‘‡ ๐‘† 1, weight matrices and the bโˆ— variables are the biases. Moreover, ๐‘‡ ๐‘† 5, 2), (๐‘ƒ 6,๐‘‡ ๐‘† 4,๐‘‡ ๐‘† 5, 1)}, where ๐‘ก๐‘ = 1(2) corresponds to MC pฬƒ represents the input, which is composed of the differences in (respectively, MCS). We observe that, the first four evolving clus- space (longitude and latitude), the difference in time and the time ters are maintained exactly as found in the historic dataset. In horizon for which we want to predict the vesselโ€™s position; the addition to those, we predict (via the FLP model) the following: Figure 3: GRU-based neural network architecture. โ€ข ๐‘ƒ4 becomes inactive at timeslice ๐‘‡ ๐‘† 5 , but it remains active where ๐œ†1 + ๐œ†2 + ๐œ†3 = 1, ๐œ†๐‘– โˆˆ (0, 1) , ๐‘– โˆˆ {1, 2, 3}. as an MCS at timeslice ๐‘‡ ๐‘† 5 This further implies that a predicted cluster should be correctly โ€ข A new evolving cluster ๐‘ƒ6 is discovered at timeslice ๐‘‡ ๐‘† 5 matched with the corresponding actual cluster which is not a In the Sections that will follow, we define the evaluation mea- straightforward procedure. Our methdology for matching each sure we use in order to map, each discovered evolving cluster predicted co-movement pattern ๐ถ๐‘๐‘Ÿ๐‘’๐‘‘ with the corresponding from the predicted to the respective ones in the actual locations, actual one ๐ถ๐‘Ž๐‘๐‘ก is depicted in Algorithm 1. as well present our preliminary results. Algorithm 1: ClusterMatching. Matches the pre- 5 EVALUATION MEASURES dicted with the actual evolving clusters The evaluation of a co-movement pattern prediction approach Input: Evolving Clusters disovered using the predicted is not a straightforward task, since we need to define how the ๐ธ๐ถ๐‘ ; and actual ๐ธ๐ถ๐‘Ž data-points; Measuresโ€™ error between the predicted and the actual co-movement patterns weights ๐œ†๐‘– , ๐‘– โˆˆ {1, 2, 3} will be quantified. Intuitively, we try to match each predicted Output: โ€œMatchedโ€ Evolving Clusters ๐ธ๐ถ๐‘š co-movement pattern with the most similar actual one. Towards 1 ๐ธ๐ถ๐‘š โ† {} this direction, we need to define a similarity measure between 2 for predicted pattern ๐ถ๐‘๐‘Ÿ๐‘’๐‘‘ โˆˆ ๐ธ๐ถ๐‘ do co-movement patterns. In more detail, we break down this prob- 3 ๐‘ ๐‘–๐‘š๐‘–๐‘™๐‘Ž๐‘Ÿ๐‘–๐‘ก๐‘ฆ_๐‘ ๐‘๐‘œ๐‘Ÿ๐‘’๐‘  โ† {} lem into three subproblems, the spatial similarity, the temporal 4 ๐‘ก๐‘œ๐‘๐‘†๐‘–๐‘š = 0 similarity and the membership similarity. Concerning the spatial 5 for actual pattern ๐ถ๐‘Ž๐‘๐‘ก โˆˆ ๐ธ๐ถ๐‘Ž do similarity this defined as follows: 6 calculate ๐‘†๐‘–๐‘š โˆ— (๐ถ๐‘๐‘Ÿ๐‘’๐‘‘ , ๐ถ๐‘Ž๐‘๐‘ก ) ร‘ 7 if ๐‘†๐‘–๐‘š โˆ— (๐ถ๐‘๐‘Ÿ๐‘’๐‘‘ , ๐ถ๐‘Ž๐‘๐‘ก ) โ‰ฅ ๐‘ก๐‘œ๐‘๐‘†๐‘–๐‘š then ๐‘€๐ต๐‘…(๐ถ๐‘๐‘Ÿ๐‘’๐‘‘ ) ๐‘€๐ต๐‘…(๐ถ๐‘Ž๐‘๐‘ก ) ๐‘†๐‘–๐‘š๐‘ ๐‘๐‘Ž๐‘ก๐‘–๐‘Ž๐‘™ (๐ถ๐‘๐‘Ÿ๐‘’๐‘‘ , ๐ถ๐‘Ž๐‘๐‘ก ) = ร (5) 8 ๐‘ก๐‘œ๐‘๐‘†๐‘–๐‘š = ๐‘†๐‘–๐‘š โˆ— (๐ถ๐‘๐‘Ÿ๐‘’๐‘‘ , ๐ถ๐‘Ž๐‘๐‘ก ) ๐‘€๐ต๐‘…(๐ถ๐‘๐‘Ÿ๐‘’๐‘‘ ) ๐‘€๐ต๐‘…(๐ถ๐‘Ž๐‘๐‘ก ) 9 ๐‘š๐‘Ž๐‘ก๐‘โ„Ž๐‘๐‘’๐‘ ๐‘ก โ† ๐ถ๐‘Ž๐‘๐‘ก where ๐‘€๐ต๐‘…(๐ถ๐‘๐‘Ÿ๐‘’๐‘‘ ) (๐‘€๐ต๐‘…(๐ถ๐‘Ž๐‘๐‘ก )) is the Minimum Bounding Rec- 10 end tangle of the predicted co-movement pattern (actual co-movement 11 end pattern, respectively). Regarding the temporal similarity: 12 ๐ธ๐ถ๐‘š โ† ๐ธ๐ถ๐‘š โˆช ๐‘š๐‘Ž๐‘ก๐‘โ„Ž๐‘๐‘’๐‘ ๐‘ก 13 end ร‘ ๐‘ก๐‘’๐‘š๐‘ ๐ผ๐‘›๐‘ก๐‘’๐‘Ÿ๐‘ฃ๐‘Ž๐‘™ (๐ถ๐‘๐‘Ÿ๐‘’๐‘‘ ) ๐ผ๐‘›๐‘ก๐‘’๐‘Ÿ๐‘ฃ๐‘Ž๐‘™ (๐ถ๐‘Ž๐‘๐‘ก ) ๐‘†๐‘–๐‘š (๐ถ๐‘๐‘Ÿ๐‘’๐‘‘ , ๐ถ๐‘Ž๐‘๐‘ก ) = ร ๐ผ๐‘›๐‘ก๐‘’๐‘Ÿ๐‘ฃ๐‘Ž๐‘™ (๐ถ๐‘๐‘Ÿ๐‘’๐‘‘ ) ๐ผ๐‘›๐‘ก๐‘’๐‘Ÿ๐‘ฃ๐‘Ž๐‘™ (๐ถ๐‘Ž๐‘๐‘ก ) In more detail, we โ€œmatchโ€ each predicted co-movement pat- (6) tern ๐ถ๐‘๐‘Ÿ๐‘’๐‘‘ with the most similar actually detected pattern ๐ถ๐‘Ž๐‘๐‘ก . After all predicted clusters get traversed we end up with ๐ธ๐ถ๐‘š where ๐ผ๐‘›๐‘ก๐‘’๐‘Ÿ๐‘ฃ๐‘Ž๐‘™ (๐ถ๐‘๐‘Ÿ๐‘’๐‘‘ ) (๐ผ๐‘›๐‘ก๐‘’๐‘Ÿ๐‘ฃ๐‘Ž๐‘™ (๐ถ๐‘Ž๐‘๐‘ก )) is the time interval when wich holds all the โ€œmatchingsโ€, which subsequently will help us the the predicted co-movement pattern was valid (actual co- in evaluate the prediction procedure by quantifuing the error movement pattern, respectively). As for the membership sim- between the predicted and the actual co-movement patterns. ilarity, we adopt the Jaccard similarity: ร‘ |๐ถ๐‘๐‘Ÿ๐‘’๐‘‘ ๐ถ๐‘Ž๐‘๐‘ก | 6 EXPERIMENTAL STUDY ๐‘š๐‘’๐‘š๐‘๐‘’๐‘Ÿ ๐‘†๐‘–๐‘š (๐ถ๐‘๐‘Ÿ๐‘’๐‘‘ , ๐ถ๐‘Ž๐‘๐‘ก ) = ร (7) In this section, we evaluate our predictive model on a real-life |๐ถ๐‘๐‘Ÿ๐‘’๐‘‘ ๐ถ๐‘Ž๐‘๐‘ก | mobility dataset from the maritime domain, and present our Finally, we define the co-movement pattern similarity as: preliminary results. ๏ฃฑ ๏ฃด ๐œ†1 ยท ๐‘†๐‘–๐‘š๐‘ ๐‘๐‘Ž๐‘ก๐‘–๐‘Ž๐‘™ + 6.1 Experimental Setup ๏ฃด ๏ฃด ๐‘ก๐‘’๐‘š๐‘ All algorithms were implemented in Python3 (via Anaconda31 ๏ฃฒ ๐œ†2 ยท ๐‘†๐‘–๐‘š + ๐‘†๐‘–๐‘š๐‘ก๐‘’๐‘š๐‘ > 0 ๏ฃด ๏ฃด ๏ฃด โˆ— ๏ฃด virtual environments). The experiments were conducted using ๐‘†๐‘–๐‘š (๐ถ๐‘๐‘Ÿ๐‘’๐‘‘ , ๐ถ๐‘Ž๐‘๐‘ก ) = ๐œ†3 ยท ๐‘†๐‘–๐‘š๐‘š๐‘’๐‘š๐‘๐‘’๐‘Ÿ (8) ๏ฃด ๏ฃด ๏ฃด Apache Kafkaยฎ with 1 topic for the transmitted (loaded from ๏ฃด ๏ฃด a CSV file) and predicted locations, as well as 1 consumer for ๏ฃด ๏ฃด ๏ฃณ 0 ๐ธ๐‘™๐‘ ๐‘’ 1 https://www.anaconda.com/ FLP and evolving cluster discovery, respectively. The machine we used is a single node with 8 CPU cores, 16 GB of RAM and 1.0 256 GB of HDD, provided by okeanos-knossos2 , an IAAS service for the Greek Research and Academic Community. 0.8 0.6 6.2 Dataset It is a well-known fact that sensor-based information is prone to 0.4 errors due to device malfunctioning. Therefore, a necessary step before any experiment(s) is that of pre-processing. In general, 0.2 pre-processing of mobility data includes data cleansing (e.g. noise elimination) as well as data transformation (e.g. segmentation, 0.0 temporal alignment), tasks necessary for whatever analysis is going to follow [22]. simtemp simspatial simmember simโˆ— In the experiments that will follow, we use a real-life mobility dataset3 from the maritime domain. The dataset, as product of our preprocessing pipeline, consists of 148,223 records from 246 Figure 4: Distribution of Cluster Similarity Measures and fishing vessels organized in 2,089 trajectories moving within the Total Cluster Similarity Aegean Sea. The dataset ranges in time and space, as follows: โ€ข Temporal range: 2nd June, 2018 โ€“ 31st August, 2018 (approx. Min. Q25 Q50 Q75 Mean. Max. 3 months) Record Lag 0 0 0 0 0.01 1 โ€ข Spatial range: longitude in [23.006, 28.996]; latitude in Consump. Rate 0 0 0 0 2.26 76.99 [35.345, 40.999] Table 1: Timeliness of the Proposed Methodology using During the preprocessing stage, we drop erroneous records (i.e. Apache Kafka GPS locations) based on a speed threshold ๐‘ ๐‘๐‘’๐‘’๐‘‘๐‘š๐‘Ž๐‘ฅ as well as stop points (i.e. locations with speed close to zero); afterwards we organize the cleansed data into trajectories based on their pair- wise temporal difference, given a threshold ๐‘‘๐‘ก. Finally, in order to discover evolving clusters, we need a stable and temporally close to the median, we visualize the trajectory of each partici- aligned sampling rate. For the aforementioned dataset, we set pating object on the map, as well as the MBRs for each respective the following thresholds: ๐‘ ๐‘๐‘’๐‘’๐‘‘๐‘š๐‘Ž๐‘ฅ = 50๐‘˜๐‘›๐‘œ๐‘ก๐‘ , ๐‘‘๐‘ก = 30๐‘š๐‘–๐‘›., and timeslice, in order to visualize the clustersโ€™ temporal and spa- alignment rate equal to 1๐‘š๐‘–๐‘›. tial similarity. It can be observed that deviations from the actual The rationale behind these thresholds stems from the char- trajectories resulted in minor changes in the area of the pointsโ€™ acteristics of the dataset which were unveiled after a statistical MBR, and consequently to the overall similarity. analysis of the distribution of the ๐‘ ๐‘๐‘’๐‘’๐‘‘ and ๐‘‘๐‘ก between succesive points of the same trajectory. 6.3 Preliminary Results In this section, we evaluate the prediction error of the proposed model with respect to the โ€œground truthโ€. We define as โ€œground truthโ€, the discovered evolving clusters on the actual GPS loca- tions. For the pattern discovery phase, we tune ๐ธ๐‘ฃ๐‘œ๐‘™๐‘ฃ๐‘–๐‘›๐‘”๐ถ๐‘™๐‘ข๐‘ ๐‘ก๐‘’๐‘Ÿ๐‘ , using ๐‘ = 3 vessels, ๐‘‘ = 3 timeslices, and ๐œƒ = 1500 meters. For the following experimental study, we focus โ€“ without loss of general- ity โ€“ on the MCS output of EvolvingClusters (density-connected clusters). Figure 4 illustrates the distribution of the three cluster simi- larity measures, namely ๐‘ ๐‘–๐‘š๐‘ก๐‘’๐‘š๐‘ , ๐‘ ๐‘–๐‘š๐‘ ๐‘๐‘Ž๐‘ก๐‘–๐‘Ž๐‘™ , and ๐‘ ๐‘–๐‘š๐‘š๐‘’๐‘š๐‘๐‘’๐‘Ÿ , as well as the overall similarity ๐‘†๐‘–๐‘š โˆ— . We observe that the majority of the predicted clusters are very close to their โ€œground truthโ€ values, with the median overall similarity being almost 88%. This Figure 5: Trajectory of a predicted (blue) vs. an actual is expected however, as the quality of EvolvingClustersโ€™ output evolving cluster (orange) is determined by two factors; the selected parameters; and the input data. Focusing on the latter4 , we observe that the algorithm is quite insensitive to prediction errors, as deviations from the Finally, Table 1 presents the metrics on the Kafka Consumers actual trajectory has minor impact to ๐‘ ๐‘–๐‘š๐‘ ๐‘๐‘Ž๐‘ก๐‘–๐‘Ž๐‘™ . used for the online layer of our predictive model, namely, Record Figure 5 illustrates the previous discussion. More specifically, Lag and Consumption Rate. Observing the Record Lag, we deduce for the predicted and corresponding actual MCS with similarity that our algorithm can keep up with the data-stream in a timely manner, while looking at Consumption Rate (i.e., the average 2 https://okeanos-knossos.grnet.gr/home/ 3 Kindly provided to us by MarineTraffic. number of records consumed per second) we conclude that our 4 The parameter sensitivity of EvolvingClusters is out of the scope of this paper. For proposed solution can process up to almost 77 records per second, more details see [33] which is compliant with the online real-time processing scenario. 7 CONCLUSIONS AND FUTURE WORK [12] Sameera Kannangara, Hairuo Xie, Egemen Tanin, Aaron Harwood, and Shanika Karunasekera. 2020. Tracking Group Movement in Location Based In this paper, we proposed an accurate solution to the problem Social Networks. In SIGSPATIAL โ€™20: 28th International Conference on Advances of Online Prediction of Co-movement Patterns, which is divided in Geographic Information Systems. ACM, 251โ€“262. [13] P. D. Kingma and J. Ba. 2015. Adam: A Method for Stochastic Optimization. into two phases: Future Location Prediction and Evolving Cluster In International Conference on Learning Representations (ICLR). Detection. The proposed method is based on a combination of [14] Patrick Laube, Stephan Imfeld, and Robert Weibel. 2005. Discovering relative GRU models and Evolving Cluster Detection algorithm and is motion patterns in groups of moving point objects. IJGIS 19, 6 (2005), 639โ€“668. [15] Yuxuan Li, James Bailey, and Lars Kulik. 2015. Efficient mining of platoon evaluated through a real-world dataset from the maritime domain patterns in trajectory databases. Data Knowl. Eng. 100 (2015), 167โ€“187. taking into account a novel co-movement pattern similarity mea- [16] Zhenhui Li, Bolin Ding, Jiawei Han, and Roland Kays. 2010. Swarm: Mining sure, which is able to match the predicted clusters with the actual Relaxed Temporal Moving Object Clusters. PVLDB 3, 1 (2010), 723โ€“734. [17] H. Liu, H. Wu, W. Sun, and I. Lee. 2019. Spatio-Temporal GRU for Trajectory ones. Our study on a real-life maritime dataset demonstrates Classification. In 2019 IEEE International Conference on Data Mining (ICDM). the efficiency and effectiveness of the proposed methodology. 1228โ€“1233. https://doi.org/10.1109/ICDM.2019.00152 [18] Mirco Nanni and Dino Pedreschi. 2006. Time-focused clustering of trajectories Thus, based on the potential applications, as well as the qual- of moving objects. J. Intell. Inf. Syst. 27, 3 (2006), 267โ€“289. ity of the results produced, we believe that the proposed model [19] Nikos Pelekis, Stylianos Sideridis, Panagiotis Tampakis, and Yannis Theodor- can be a valuable utility for researchers and practitioners alike. idis. 2016. Simulating Our LifeSteps by Example. ACM Trans. Spatial Algo- rithms Syst. 2, 3 (2016), 11:1โ€“11:39. In the near future, we aim to develop an online co-movement [20] Nikos Pelekis, Panagiotis Tampakis, Marios Vodas, Christos Doulkeridis, and pattern prediction approach that, instead of breaking the prob- Yannis Theodoridis. 2017. On temporal-constrained sub-trajectory cluster lem at hand into two disjoint sub-problems without any specific analysis. Data Min. Knowl. Discov. 31, 5 (2017), 1294โ€“1330. [21] Nikos Pelekis, Panagiotis Tampakis, Marios Vodas, Costas Panagiotakis, and synergy (i.e. first predict the future location of objects and then Yannis Theodoridis. 2017. In-DBMS Sampling-based Sub-trajectory Clustering. detect future co-movement patterns), will combine the two steps In EDBT. 632โ€“643. [22] Nikos Pelekis and Yannis Theodoridis. 2014. Mobility Data Management and in a unified solution that will be able to directly predict the future Exploration. Springer. co-movement patterns. [23] Petros Petrou, Panagiotis Nikitopoulos, Panagiotis Tampakis, Apostolos Gle- nis, Nikolaos Koutroumanis, Georgios M. Santipantakis, Kostas Patroumpas, Akrivi Vlachou, Harris V. Georgiou, Eva Chondrodima, Christos Doulkeridis, ACKNOWLEDGMENTS Nikos Pelekis, Gennady L. Andrienko, Fabian Patterson, Georg Fuchs, Yannis Theodoridis, and George A. Vouros. 2019. ARGO: A Big Data Framework for This work was partially supported by projects i4Sea (grant T1EDK- Online Trajectory Prediction. In Proceedings of the 16th International Sympo- 03268) and Track&Know (grant agreement No 780754), which sium on Spatial and Temporal Databases, SSTD 2019, Vienna, Austria, August 19-21, 2019. 194โ€“197. have received funding by the European Regional Development [24] Petros Petrou, Panagiotis Tampakis, Harris V. Georgiou, Nikos Pelekis, and Fund of the EU and Greek national funds (through the Oper- Yannis Theodoridis. 2019. Online Long-Term Trajectory Prediction Based on ational Program Competitiveness, Entrepreneurship and Inno- Mined Route Patterns. In MASTER@ECML-PKDD 2019. 34โ€“49. [25] A. Rossi, G. Barlacchi, M. Bianchini, and B. Lepri. 2020. Modelling Taxi Driversโ€™ vation, under the call Research-Create-Innovate) and the EU Behaviour for the Next Destination Prediction. IEEE Transactions on Intelligent Horizon 2020 R&I Programme, respectively. Transportation Systems 21, 7 (2020), 2980 โ€“ 2989. [26] David E. Rumelhart, Geoffrey E. Hinton, and Ronald J. Williams. 1986. Learning representations by back-propagating errors. Nature 323 (1986), 533โ€“536. REFERENCES [27] Yongfeng Suo, Wenke Chen, Christophe Claramunt, and Shenhua Yang. 2020. A Ship Trajectory Prediction Framework Based on a Recurrent Neural Net- [1] A. Benterki, V. Judalet, M. Choubeila, and M. Boukhnifer. 2019. Long-Term work. Sensors 20, 18 (2020). https://doi.org/10.3390/s20185133 Prediction of Vehicle Trajectory Using Recurrent Neural Networks. In IECON [28] Panagiotis Tampakis, Nikos Pelekis, Natalia V. Andrienko, Gennady L. An- 2019 - 45th Annual Conference of the IEEE Industrial Electronics Society, Vol. 1. drienko, Georg Fuchs, and Yannis Theodoridis. 2018. Time-Aware Sub- 3817โ€“3822. Trajectory Clustering in Hermes@PostgreSQL. In ICDE. 1581โ€“1584. [2] Lu Chen, Yunjun Gao, Ziquan Fang, Xiaoye Miao, Christian S. Jensen, and [29] Panagiotis Tampakis, Nikos Pelekis, Christos Doulkeridis, and Yannis Theodor- Chenjuan Guo. 2019. Real-time Distributed Co-Movement Pattern Detection idis. 2019. Scalable Distributed Subtrajectory Clustering. In 2019 IEEE Interna- on Streaming Trajectories. Proc. VLDB Endow. 12, 10 (2019), 1208โ€“1220. tional Conference on Big Data (Big Data). IEEE, 950โ€“959. [3] Kyunghyun Cho, Bart van Merriรซnboer, Caglar Gulcehre, Dzmitry Bahdanau, [30] Lu An Tang, Yu Zheng, Jing Yuan, Jiawei Han, Alice Leung, Chih-Chieh Hung, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. 2014. Learning Phrase and Wen-Chih Peng. 2012. On Discovery of Traveling Companions from Representations using RNN Encoderโ€“Decoder for Statistical Machine Trans- Streaming Trajectories. In ICDE. 186โ€“197. lation. In Proceedings of the 2014 Conference on Empirical Methods in Natural [31] George S. Theodoropoulos, Andreas Tritsarolis, and Yannis Theodoridis. 2019. Language Processing (EMNLP). Association for Computational Linguistics, EvolvingClusters: Online Discovery of Group Patterns in Enriched Maritime Doha, Qatar, 1724โ€“1734. https://doi.org/10.3115/v1/D14-1179 Data. In MASTER@PKDD/ECML (Lecture Notes in Computer Science, Vol. 11889). [4] R. Dey and F. M. Salem. 2017. Gate-variants of Gated Recurrent Unit (GRU) Springer, 50โ€“65. neural networks. In 2017 IEEE 60th International Midwest Symposium on Cir- [32] Roberto Trasarti, Riccardo Guidotti, Anna Monreale, and Fosca Giannotti. cuits and Systems (MWSCAS). 1597โ€“1600. https://doi.org/10.1109/MWSCAS. 2017. MyWay: Location prediction via mobility profiling. Inf. Syst. 64 (2017), 2017.8053243 350โ€“367. [5] Qi Fan, Dongxiang Zhang, Huayu Wu, and Kian-Lee Tan. 2016. A General [33] Andreas Tritsarolis, George-Stylianos Theodoropoulos, and Yannis Theodor- and Parallel Platform for Mining Co-Movement Patterns over Large-scale idis. 2020. Online discovery of co-movement patterns in mobility data. Trajectories. Proc. VLDB Endow. 10, 4 (2016), 313โ€“324. International Journal of Geographical Information Science 0, 0 (2020), 1โ€“27. [6] Ziquan Fang, Yunjun Gao, Lu Pan, Lu Chen, Xiaoye Miao, and Christian S. https://doi.org/10.1080/13658816.2020.1834562 Jensen. 2020. CoMing: A Real-time Co-Movement Mining System for Stream- [34] C. Wang, H. Ren, and H. Li. 2020. Vessel trajectory prediction based on AIS ing Trajectories. In SIGMOD 2020, David Maier, Rachel Pottinger, AnHai Doan, data and bidirectional GRU. In 2020 International Conference on Computer Wang-Chiew Tan, Abdussalam Alawini, and Hung Q. Ngo (Eds.). ACM, 2777โ€“ Vision, Image and Deep Learning (CVIDL). 260โ€“264. https://doi.org/10.1109/ 2780. CVIDL51233.2020.00-89 [7] P. Han, W. Wang, Q. Shi, and J. Yang. 2019. Real-time Short- Term Trajectory [35] P. J. Werbos. 1990. Backpropagation through time: what it does and how to Prediction Based on GRU Neural Network. In 2019 IEEE/AIAA 38th Digital do it. Proc. IEEE 78, 10 (1990), 1550โ€“1560. https://doi.org/10.1109/5.58337 Avionics Systems Conference (DASC). 1โ€“8. https://doi.org/10.1109/DASC43569. [36] P. J. Werbos. 1990. Backpropagation through time: what it does and how to 2019.9081618 do it. Proc. IEEE 78, 10 (1990), 1550โ€“1560. [8] Shahab Helmi and Farnoush Banaei Kashani. 2020. Multiscale Frequent Co- [37] H. Xue, D. Q. Huynh, and M. Reynolds. 2018. SS-LSTM: A Hierarchical LSTM movement Pattern Mining. In 36th IEEE International Conference on Data Model for Pedestrian Trajectory Prediction. In 2018 IEEE Winter Conference Engineering, ICDE 2020. IEEE, Washington, DC, 829โ€“840. on Applications of Computer Vision. 1186โ€“1194. [9] Sepp Hochreiter and Jรผrgen Schmidhuber. 1997. Long Short-Term Memory. [38] Kai Zheng, Yu Zheng, Nicholas Jing Yuan, and Shuo Shang. 2013. On discovery Neural Computation 9, 8 (1997), 1735โ€“1780. of gathering patterns from trajectories. In ICDE. 242โ€“253. [10] Hoyoung Jeung, Man Lung Yiu, Xiaofang Zhou, Christian S. Jensen, and Heng Tao Shen. 2008. Discovery of convoys in trajectory databases. PVLDB 1, 1 (2008), 1068โ€“1080. [11] Panos Kalnis, Nikos Mamoulis, and Spiridon Bakiras. 2005. On Discovering Moving Clusters in Spatio-temporal Data. In SSTD. 364โ€“381.