Online Co-movement Pattern Prediction in Mobility Data
                          Andreas Tritsarolis                                                        Eva Chondrodima
           Data Science Lab., Department of Informatics,                              Data Science Lab., Department of Informatics,
                       University of Piraeus                                                      University of Piraeus
                         Piraeus, Greece                                                            Piraeus, Greece
                        andrewt@unipi.gr                                                           evachon@unipi.gr

                         Panagiotis Tampakis                                                         Aggelos Pikrakis
          Data Science Lab., Department of Informatics,                               Data Science Lab., Department of Informatics,
                      University of Piraeus                                                       University of Piraeus
                         Piraeus, Greece                                                            Piraeus, Greece
                       ptampak@unipi.gr                                                            pikrakis@unipi.gr

ABSTRACT                                                                          Co-movement Patterns, has not been addressed in the relevant
Predictive analytics over mobility data are of great importance                   literature yet.
since they can assist an analyst to predict events, such as col-                      Concerning the definition of co-movement patterns, there are
lisions, encounters, traffic jams, etc. A typical example of such                 several approaches in the literature, such as [2, 5, 6, 8]. However,
analytics is future location prediction, where the goal is to pre-                all of the above are either offline and/or operate at predefined
dict the future location of a moving object, given a look-ahead                   temporal snapshots that imply temporal alignment and uniform
time. What is even more challenging is being able to accurately                   sampling, which is not realistic assumptions. For this reason,
predict collective behavioural patterns of movement, such as co-                  we adopt the approach presented in [33], which, to the best of
movement patterns. In this paper, we provide an accurate solution                 our knowledge, is the first online method for the discovery of
to the problem of Online Prediction of Co-movement Patterns. In                   co-movement patterns in mobility data that does not assume
more detail, we split the original problem into two sub-problems,                 temporal alignment and uniform sampling. The goal in [33] is
namely Future Location Prediction and Evolving Cluster Detection.                 to discover co-movement patterns, namely Evolving Clusters, in
Furthermore, in order to be able to calculate the accuracy of our                 an online fashion, by employing a graph-based representation.
solution, we propose a co-movement pattern similarity measure,                    By doing so, the problem of co-movement pattern detection is
which facilitates the matching of the predicted clusters with the                 transformed to identifying Maximal Cliques (MCs) (for spherical
actual ones. Finally, the accuracy of our solution is demonstrated                clusters) or Maximal Connected Subgraphs (MCSs) (for density-
experimentally over a real dataset from the maritime domain.                      connected clusters). Figure 1 illustrates such an example, where
                                                                                  in blue we have the historical evolving clusters and in orange
KEYWORDS                                                                          the predicted future ones.
                                                                                      Several mobility-related applications could benefit from such
Machine Learning, Predictive Analytics, Co-movement Patterns,
                                                                                  an operation. In the urban traffic domain, predicting co-movement
Trajectory Prediction
                                                                                  patterns could assist in detecting future traffic jams which in turn
                                                                                  can help the authorities take the appropriate measures (e.g. adjust-
1    INTRODUCTION                                                                 ing traffic lights) in order to avoid them. In the maritime domain,
The vast spread of GPS-enabled devices, such as smartphones,                      a typical problem is illegal transshipment, where groups of ves-
tablets and GPS trackers, has led to the production of large                      sels move together "close" enough for some time duration and
amounts of mobility related data. By nature, this kind of data                    with low speed. It becomes obvious that predicting co-movement
are streaming and there are several application scenarios where                   patterns could help in predicting illegal transshipment events.
the processing needs to take place in an online fashion. These                    Finally, in large epidemic crisis, contact tracing is one of the tools
properties have posed new challenges in terms of efficient stor-                  to identify individuals that have been close to infected persons
age, analytics, and knowledge extraction out of such data. One                    for some time duration. Being able to predict these groups can
of these challenges is online cluster analysis, where the goal is                 help avoid future contacts with possibly infected individuals.
to unveil hidden patterns of collective behaviour from streaming                      The problem of predicting the spatial properties of group pat-
trajectories, such as co-movement patterns [2, 5, 6, 8, 33]. What                 ters has only been recently studied [12]. In more detail, the au-
is even more challenging is predictive analytics over mobility                    thors in [12] adopt a spherical definition of groups, where each
data, where the goal is to predict the future behaviour of moving                 group consists of moving objects that are confined within a ra-
objects, which can have a wide range of applications, such as                     dius 𝑑 and their goal is to predict the centroid of the groups at the
predicting collisions, future encounters, traffic jams, etc. At an                next timeslice. However, this approach is offline and cannot be
individual level, a typical and well-studied example of such ana-                 applied in an online scenario. Furthermore, the group definition
lytics is future location prediction [23, 24, 27, 32], where the goal             adopted in [12] is rather limited, since the identify only spherical
is to predict the future location of a moving object, given a look-               groups, as opposed to [33] where both spherical and density-
ahead time. However, prediction of future mobility behaviour                      connected clusters can be identified. Finally, the authors in [12]
at a collective level and more specifically Online Prediction of                  predict only the centroids of the clusters and not the shape and
                                                                                  the membership of each cluster.
© 2021 Copyright for this paper by its the author(s). Published in the Workshop
Proceedings of the EDBT/ICDT 2021 Joint Conference (March 23-26, 2021, Nicosia,
                                                                                      Inspired by the above, the problem that we address in this
Cyprus) on CEUR-WS.org. Use permitted under the terms of the Creative Commons     paper is the Online Prediction of Co-movement Patterns. Informally,
License Attribution 4.0 International (CC BY 4.0).
                            Figure 1: Predicting evolving clusters via (singular) trajectory prediction


given a look-ahead time interval Δ𝑡, the goal is to predict the        mined patterns, their main limitation is that they search for spe-
groups, i.e. their spatial shape (spherical or density-connected),     cific collective behaviours, defined by respective parameters. An
temporal coverage and membership, after Δ𝑡 time. In more detail,       approach that defines a new generalized mobility pattern is pre-
we split the original problem into two sub-problems, namely            sented in [5]. In more detail, the general co-movement pattern
Future Location Prediction and Evolving Cluster Detection. The         (GCMP), is proposed, which includes Temporal Replication and
problem of Online Prediction of Co-movement Patterns is quite          Parallel Mining, a method that, as suggested by its name, splits
challenging, since, apart from the inherent difficulty of predicting   a data snapshot spatially and replicates data when necessary to
the future, we also need to define how the error between the           ensure full coverage, and Star Partitioning and ApRiori Enumer-
actual and the predicted clusters will be measured. This further       ator, a technique that uses graph pruning in order to avoid the
implies that a predicted cluster should be correctly matched with      data replication that takes place in the previous method. In [8],
the corresponding actual cluster which is not a straightforward        the authors propose a frequent co-movement pattern (f-CoMP)
procedure. To the best of our knowledge, the problem of Online         definition for discovering patterns at multiple spatial scales, also
Prediction of Co-movement Patterns, has not been addressed in          exploiting the overall shape of the objects’ trajectories, while
the literature yet. Our main contributions are the following:          at the same time it relaxes the temporal and spatial constraints
                                                                       of the seminal works (i.e. Flocks, Convoys, etc.) in order to dis-
    • We provide an accurate solution to the problem of Online         cover more interesting patterns. The authors in [2, 6], propose
      Prediction of Co-movement Patterns.                              a two-phase online distributed co-movement pattern detection
    • We propose a co-movement pattern similarity measure,             framework, which includes the clustering and the pattern enu-
      which helps us “match” the predicted with the actual clus-       meration phase, respectively. During the clustering phase for
      ters.                                                            timestamp 𝑡𝑠 , the snapshot 𝑆𝑡 is clustered using Range-Join and
    • We perform an experimental study with a real dataset             DBSCAN.
      from the maritime domain, which verifies the accuracy of             Another line of research, tries to discover groups of either
      our proposed methodology.                                        entire or portions of trajectories considering their routes. There
   The rest of the paper is organized as follows. Section 2 dis-       are several approaches whose goal is to group whole trajecto-
cusses related work. In Section 3, we formally define the problem      ries, including T-OPTICS [18, 19], that incorporates a trajectory
of Online Prediction of Co-movement Patterns. Subsequently, in         similarity function into the OPTICS algorithm. However, discov-
Section 4 we propose our two-step methodology and in Section 5,        ering clusters of complete trajectories can overlook significant
we introduce a co-movement pattern similarity measure along            patterns that might exist only for portions of their lifespan. To
with the cluster “matching” algorithm. Section 6, presents our         deal with this, another line of research has emerged, that of Sub-
experimental findings and, finally, in Section 7 we conclude the       trajectory Clustering[20, 21, 28, 29], where the goal is to partition
paper and discuss future extensions.                                   a trajectory into subtrajectories, whenever the density or the
                                                                       composition and its neighbourhood changes “significantly”, then
                                                                       form groups of similar ones, while, at the same time, separate
2   RELATED WORK                                                       the ones that fit into no group, called outliers.
The work performed in this paper is closely related to three topics,       Another perspective into co-movement pattern discovery, is to
(a) trajectory clustering and more specifically co-movement pat-       reduce cluster types into graph properties and view them as such.
tern discovery, (b) future location prediction and (c) co-movement     In [31, 33], the authors propose a novel co-movement pattern
pattern prediction.                                                    definition, called evolving clusters, that unifies the definitions
   Co-movement patterns. One of the first approaches for iden-         of flocks and convoys and reduces them to Maximal Cliques
tifying such collective mobility behaviour is the so-called flock      (MC), and Connected Subgraphs (MCS), respectively. In addition,
pattern [14], which identifies groups of at least 𝑚 objects that       the authors propose an online algorithm, that discovers several
move within a disk of radius 𝑟 for at least 𝑘 consecutive time-        evolving cluster types simultaneously in real time using Apache
points. Inspired by this, several related works followed, such as      Kafka® , without assuming temporal alignment, in constrast to
moving clusters [11], convoys [10], swarms [16], platoons [15],        the seminal works (i.e. flocks, convoys).
traveling companion [30] and gathering pattern [38]. Even though           In the proposed predictive model, we will use the definition
all of these approaches provide explicit definitions of several        of evolving clusters [33] for co-movement pattern discovery. The
reason why is this the most appropriate, is that we can predict        3     PROBLEM DEFINITION
the course of several pattern types at the same time, without the      As already mentioned, we divide the problem into two sub-
need to call several other algorithms, therefore adding redundant      problems, namely Future Location Prediction and Evolving Clus-
computational complexity.                                              ters Detection. Before proceeding to the actual formulation of the
   Future Location Prediction. The fact that the Future Lo-            problem, let us provide some preliminary definitions.
cation Prediction (FLP) problem has been extensivelly studied
brings up its importance and applicability in a wide range of             Definition 3.1. (Trajectory) A trajectory 𝑇 = {𝑝 1, . . . 𝑝𝑛 } is
applications. Towards tackling the FLP problem, one line of work       considered as a sequence of timestamped locations, where 𝑛 is
includes efforts that take advantage of historical movement pat-       the latest reported position of 𝑇 . Further, 𝑝𝑖 = {𝑥𝑖 , 𝑦𝑖 , 𝑡𝑖 }, with
terns in order to predict the future location. Such an approach is     1 ≤ 𝑖 ≤ 𝑛.
presented in [32], where the authors propose MyWay, a hybrid,            Definition 3.2. (Future Location Prediction). Given an input
pattern-based approach that utilizes individual patterns when          dataset 𝐷 = {𝑇1, . . . ,𝑇 |𝐷 | } of trajectories and a time interval
available, and when not, collective ones, in order to provide more     Δ𝑡, our goal is ∀𝑇𝑖 ∈ 𝐷 to predict 𝑝𝑝𝑟𝑒𝑑   𝑖          𝑖
                                                                                                                        = {𝑥𝑝𝑟𝑒𝑑    𝑖
                                                                                                                                 , 𝑦𝑝𝑟𝑒𝑑 } at
accurate predictions and increase the predictive ability of the                   𝑖
                                                                       timestamp 𝑡𝑝𝑟𝑒𝑑 = 𝑡𝑛𝑖 + Δ𝑡.
system. In another effort, the authors in [23, 24] utilize the work
done by [29] on distributed subtrajectory clustering in order to          An informal definition regarding group patterns could be: “a
be able to extract individual subtrajectory patterns from big mo-      large enough number of objects moving close enough to each
bility data. These patterns are subsequently utilized in order to      other, in space and time, for some time duration”. As already
predict the future location of the moving objects in parallel.         mentioned, in this paper we adopt the definition provided in [33].
   A different way of addressing the FLP problem includes ma-
chine learning approaches.                                                Definition 3.3. (Evolving Cluster). Given: a set 𝐷 of trajecto-
   Recurrent Neural Network (RNN) -based models [26] consti-           ries, a minimum cardinality threshold 𝑐, a maximum distance
tute a popular method for trajectory prediction due to their pow-      threshold 𝜃 , and a minimum time duration threshold 𝑑, an Evolv-
erful ability to fit complex functions, along with their ability of    ing Cluster ⟨𝐶, 𝑡𝑠𝑡𝑎𝑟𝑡 , 𝑡𝑒𝑛𝑑 , 𝑡𝑝⟩ is a subset 𝐶 ∈ 𝐷 of the moving
adjusting the dynamic behaviour as well as capturing the causal-       objects’ population, |𝐶 | ≥ 𝑐, which appeared at time point 𝑡𝑠𝑡𝑎𝑟𝑡
ity relationships across sequences. However, research in the mar-      and remained alive until time point 𝑡𝑒𝑛𝑑 (with 𝑡𝑒𝑛𝑑 − 𝑡𝑠𝑡𝑎𝑟𝑡 ≥ 𝑑)
itime domain is limited regarding vessel trajectory prediction         during the lifetime [𝑡𝑠𝑡𝑎𝑟𝑡 , 𝑡𝑒𝑛𝑑 ] of which the participating mov-
and Gated Recurrent Units (GRU) [3] models, which constitute           ing objects were spatially connected with respect to distance 𝜃
the newer generation of RNN.                                           and cluster type 𝑡𝑝.
   Suo et.al. [27] presented a GRU model to predict vessel tra-            Definition 3.4. (Group Pattern Prediction Online). Given: a
jectories based on a) the Density-Based Spatial Clustering of          set 𝐷 of trajectories, 𝐺 of co-movement patterns up to timeslice
Applications with Noise (DBSCAN) algorithm to derive main tra-         𝑇 𝑆 𝑛𝑜𝑤 and a look-ahead threshold Δ𝑡, we aim to predict all the
jectories and, b) a symmetric segmented-path distance approach         valid co-movement patterns 𝐺 ′ ∈ (𝑇 𝑆 𝑛𝑜𝑤 ,𝑇 𝑆 𝑛𝑜𝑤 + Δ𝑡].
to eliminate the influence of a large number of redundant data
and to optimize incoming trajectories. Ground truth data from              Figure 1 provides an illustration of Definition 3.4. More specif-
AIS raw data in the port of Zhangzhou, China were used to train        ically, we know the movement of nine objects from 𝑇 𝑆 1 until 𝑇 𝑆 3
and verify the validity of the proposed model.                         and via EvolvingClusters with 𝑐 = 3 and 𝑑 = 2 that they form four
   Liu et.al. [17] proposed a trajectory classifier called Spatio-     evolving clusters 𝑃1 = {𝑎, 𝑏, 𝑐, 𝑑, 𝑒, 𝑓 , 𝑔, ℎ, 𝑖}, 𝑃2 = {𝑎, 𝑏, 𝑐, 𝑑, 𝑒},
Temporal GRU to model the spatio-temporal correlations and ir-         𝑃 3 = {𝑎, 𝑏, 𝑐}, 𝑃4 = {𝑏, 𝑐, 𝑑, 𝑒}, 𝑃5 = {𝑔, ℎ, 𝑖}. Our goal is to predict
regular temporal intervals prevalently presented in spatio-temporal    their respective locations until 𝑇 𝑆 5 . Running EvolvingClusters
trajectories. Particularly, a segmented convolutional weight mech-     with the same parameters for the predicted timeslices, reveals us
anism was proposed to capture short-term local spatial corre-          (with high probability) that 𝑃2, 𝑃3, 𝑃4, 𝑃5 will continue to exist as
lations in trajectories along with an additional temporal gate         well as the creation of a new pattern 𝑃6 = {𝑓 , 𝑔, ℎ, 𝑖}.
to control the information flow related to the temporal interval
information.
                                                                       4     METHODOLOGY
   Wang et.al. [34] aiming at predicting the movement trend of         In this section we present the proposed solution to the problem
vessels in the crowded port water of Tianjin port, proposed a          of Online Prediction of Co-movement Patterns, composed of two
vessel berthing trajectory prediction model based on bidirectional     parts: a) the FLP method, and b) the Evolving Cluster Discovery
GRU (Bi-GRU) and cubic spline interpolation.                           algorithm. Also, an example is presented illustrating the approach
   Co-movement pattern prediction. The most similar work               operation.
to ours has only been recently presented in [12]. More specifically,
the authors in [12], divide time into time slices of fixed step size   4.1    Overview
and adopt a spherical definition of groups, where each group           Figure 2 illustrates the architecture of our proposed methodology.
consists of moving objects that are confined within a radius 𝑑         First we split the problem of Online Prediction of Co-movement
and their goal is to predict the centroid of the groups at the         Patterns into two parts, the FLP, and the Evolving Cluster Dis-
next timeslice. However, this approach is offline and cannot be        covery. The FLP method is, also, divided to two parts: a) the
applied in an online scenario. Furthermore, the group definition       FLP-offline part, where the training procedure of the model is
adopted in [12] is rather limited, since the identify only spherical   taking place, and b) the FLP-online part, where the trained FLP
groups, as opposed to [33] where both spherical and density-           model is applied to streaming GPS locations to predict the next
connected clusters can be identified. Finally, the authors in [12]     objects’ location.
predict only the centroids of the clusters and not the shape and          Thus, our proposed approach is further divided in the offline
the membership of each cluster.                                        phase and the online one. Particularly, at the offline phase, we
                                                                     Predicted                                     Predicted
                                                      id
                                                                    Trajectories                                Evolving Clusters


                              yer
                                      Streaming       speed


                           e La
                                     GPS Locations
                                                      longitutde                         EvolvingClusters


                         Onlin
                                                      latitude

                                                              ...


                              yer
                              e La


                                                                                        Future Location
                                                                                     Prediction (FLP) Model
                         Ofﬂin


                                                       Historic
                                                     Trajectories


                    Figure 2: Workflow for evolving clusters prediction via (singular) trajectory prediction


train our FLP model by using historic trajectories. Afterwards, at                 differences are computed between consecutive points of each
the online phase we receive the streaming GPS locations in order                   vessel.
to use them to create a buffer for each moving object. Then, we                       In this work, a GRU-based model is employed to solve the
use our trained FLP model to predict the next objects’ location                    future location prediction problem. The proposed GRU-based
and apply EvolvingClusters to each produced timeslice.                             network architecture is composed of the following layers: a) an
                                                                                   input layer of four neurons, one for each input variable, b) a single
4.2    Future Location Prediction                                                  GRU hidden layer composed of 150 neurons, c) a fully-connected
Trajectories can be considered as time sequence data [37] and                      hidden layer composed of 50 neurons, and d) an output layer
thus are suited to be treated with techniques that are capable                     of two neurons, one for each prediction coordinate (longitude
of handling sequential data and/or time series [25]. Over the                      and latitude). A schematic overview of the proposed network
past two decades, the research interest on forecasting time series                 architecture is presented in Figure 3. Also, details for the Back-
has been moved to RNN-based models, with the GRU architec-                         ward Propagation Through Time algorithm and for the Adam
ture being the newer generation of RNN, which has emerged                          approach, which were employed for the NN learning purposes,
as an effective technique for several difficult learning problems                  can be found in [36] and [13], respectively.
(including sequential or temporal data -based applications) [4].
Although, the most popular RNN-based architecture is the well-                     4.3      Evolving Clusters Discovery
known Long Short-Term Memory (LSTM) [9], GRU present some                          After getting the predicted locations for each moving object, we
interesting advantages over the LSTM. More specifically, GRU                       use EvolvingClusters in order to finally present the predicted
are less complicated, easier to modify and faster to train. Also,                  co-movement patterns. Because the sampling rate may vary for
GRU networks achieve better accuracy performance compared                          each moving object, we use linear interpolation to temporally
to LSTM models on trajectory prediction problems on various                        align the predicted locations at a common timeslice with a stable
domains, such as on maritime [27], on aviation [7] and on land                     sampling (alignment) rate 𝑠𝑟 .
traffic [1]. Hence, this work follows this direction and employs a                    Given a timeslice 𝑇 𝑆𝑛𝑜𝑤 , EvolvingClusters works in a nutshell,
GRU-based method.                                                                  as follows:
   GRU includes internal mechanisms called gates that can regu-
late the flow of information. Particularly, the GRU hidden layer                         • Calculates the pairwise distance for each object within
include two gates, a reset gate which is used to decide how much                           𝑇 𝑆𝑛𝑜𝑤 , and drop the locations with distance less than 𝜃 ;
past information to forget and an update gate which decides what                         • Creates a graph based on the filtered locations, and extract
information to throw away and what new information to add.                                 its Maximal Connected Subgraphs (MCS) and Cliques (MC)
We briefly state the update rules for the employed GRU layer.                              with respect to 𝑐;
For more details, the interested reader is referred to the original                      • Maintains the currently active (and inactive) clusters, given
publications [3]. Also, details for the BPTT algorithm, which was                          the MCS and MC of 𝑇 𝑆𝑛𝑜𝑤 and the recent (active) pattern
employed for training the model, can be found in [35].                                     history; and
                                                                                         • Outputs the eligible active patterns with respect to 𝑐, 𝑡 and
              z𝑘 = 𝜎 (Wp̃𝑧 · p̃𝑘 + Wℎ𝑧 · h𝑘−1 + b𝑧 )                      (1)              𝜃.
              r𝑘 = 𝜎 (Wp̃𝑟 · p̃𝑘 + Wℎ𝑟 · h𝑘−1 + b𝑟 )                      (2)          The output of EvolvingClusters, and by extension of the whole
                                                                                   predictive model, is a tuple of four elements, the set of objects
        h̃𝑘 = tanh(Wp̃ℎ · p̃𝑘 + Wℎℎ · (r𝑘 ∗ h𝑘−1 ) + bℎ )                 (3)
                                                                                   𝑜𝑖𝑑𝑠 that form an evolving cluster, the starting time 𝑠𝑡, the ending
                 h𝑘 = z𝑘 ⊙ h𝑘−1 + (1 − z𝑘 ) ⊙ h̃𝑘                (4)               time 𝑒𝑡, and the type 𝑡𝑝 of the group pattern, respectively. For
    where z and r represent the update and reset gates, respec-                    instance, the final output of the model at the example given at
tively, h̃ and h represent the intermediate memory and output,                     Section 3 would be a set of 4-element tuples, i.e., {(𝑃2,𝑇 𝑆 1,𝑇 𝑆 5, 2),
                                                                                                                                                  Ð
respectively. Also, in these equations, the W∗ variables are the                   (𝑃3,𝑇 𝑆 1,𝑇 𝑆 5, 1), (𝑃4,𝑇 𝑆 1,𝑇 𝑆 4, 1), (𝑃 5,𝑇 𝑆 1,𝑇 𝑆 5, 1)} {(𝑃4,𝑇 𝑆 1,
weight matrices and the b∗ variables are the biases. Moreover,                     𝑇 𝑆 5, 2), (𝑃 6,𝑇 𝑆 4,𝑇 𝑆 5, 1)}, where 𝑡𝑝 = 1(2) corresponds to MC
p̃ represents the input, which is composed of the differences in                   (respectively, MCS). We observe that, the first four evolving clus-
space (longitude and latitude), the difference in time and the time                ters are maintained exactly as found in the historic dataset. In
horizon for which we want to predict the vessel’s position; the                    addition to those, we predict (via the FLP model) the following:
                                                 Figure 3: GRU-based neural network architecture.


     • 𝑃4 becomes inactive at timeslice 𝑇 𝑆 5 , but it remains active                 where 𝜆1 + 𝜆2 + 𝜆3 = 1, 𝜆𝑖 ∈ (0, 1) , 𝑖 ∈ {1, 2, 3}.
       as an MCS at timeslice 𝑇 𝑆 5                                                      This further implies that a predicted cluster should be correctly
     • A new evolving cluster 𝑃6 is discovered at timeslice 𝑇 𝑆 5                     matched with the corresponding actual cluster which is not a
   In the Sections that will follow, we define the evaluation mea-                    straightforward procedure. Our methdology for matching each
sure we use in order to map, each discovered evolving cluster                         predicted co-movement pattern 𝐶𝑝𝑟𝑒𝑑 with the corresponding
from the predicted to the respective ones in the actual locations,                    actual one 𝐶𝑎𝑐𝑡 is depicted in Algorithm 1.
as well present our preliminary results.
                                                                                        Algorithm 1: ClusterMatching. Matches the pre-
5    EVALUATION MEASURES                                                                dicted with the actual evolving clusters
The evaluation of a co-movement pattern prediction approach                              Input: Evolving Clusters disovered using the predicted
is not a straightforward task, since we need to define how the                                   𝐸𝐶𝑝 ; and actual 𝐸𝐶𝑎 data-points; Measures’
error between the predicted and the actual co-movement patterns                                 weights 𝜆𝑖 , 𝑖 ∈ {1, 2, 3}
will be quantified. Intuitively, we try to match each predicted                          Output: “Matched” Evolving Clusters 𝐸𝐶𝑚
co-movement pattern with the most similar actual one. Towards                          1 𝐸𝐶𝑚 ← {}
this direction, we need to define a similarity measure between                         2 for predicted pattern 𝐶𝑝𝑟𝑒𝑑 ∈ 𝐸𝐶𝑝 do
co-movement patterns. In more detail, we break down this prob-                         3     𝑠𝑖𝑚𝑖𝑙𝑎𝑟𝑖𝑡𝑦_𝑠𝑐𝑜𝑟𝑒𝑠 ← {}
lem into three subproblems, the spatial similarity, the temporal
                                                                                       4     𝑡𝑜𝑝𝑆𝑖𝑚 = 0
similarity and the membership similarity. Concerning the spatial
                                                                                       5     for actual pattern 𝐶𝑎𝑐𝑡 ∈ 𝐸𝐶𝑎 do
similarity this defined as follows:
                                                                                       6         calculate 𝑆𝑖𝑚 ∗ (𝐶𝑝𝑟𝑒𝑑 , 𝐶𝑎𝑐𝑡 )
                                                         Ñ                             7         if 𝑆𝑖𝑚 ∗ (𝐶𝑝𝑟𝑒𝑑 , 𝐶𝑎𝑐𝑡 ) ≥ 𝑡𝑜𝑝𝑆𝑖𝑚 then
                                        𝑀𝐵𝑅(𝐶𝑝𝑟𝑒𝑑 )          𝑀𝐵𝑅(𝐶𝑎𝑐𝑡 )
    𝑆𝑖𝑚𝑠𝑝𝑎𝑡𝑖𝑎𝑙 (𝐶𝑝𝑟𝑒𝑑 , 𝐶𝑎𝑐𝑡 ) =                         Ð                      (5)    8             𝑡𝑜𝑝𝑆𝑖𝑚 = 𝑆𝑖𝑚 ∗ (𝐶𝑝𝑟𝑒𝑑 , 𝐶𝑎𝑐𝑡 )
                                        𝑀𝐵𝑅(𝐶𝑝𝑟𝑒𝑑 )          𝑀𝐵𝑅(𝐶𝑎𝑐𝑡 )
                                                                                       9             𝑚𝑎𝑡𝑐ℎ𝑏𝑒𝑠𝑡 ← 𝐶𝑎𝑐𝑡
where 𝑀𝐵𝑅(𝐶𝑝𝑟𝑒𝑑 ) (𝑀𝐵𝑅(𝐶𝑎𝑐𝑡 )) is the Minimum Bounding Rec-                           10         end
tangle of the predicted co-movement pattern (actual co-movement                       11     end
pattern, respectively). Regarding the temporal similarity:                            12     𝐸𝐶𝑚 ← 𝐸𝐶𝑚 ∪ 𝑚𝑎𝑡𝑐ℎ𝑏𝑒𝑠𝑡
                                                                                      13 end
                                                         Ñ
          𝑡𝑒𝑚𝑝
                                     𝐼𝑛𝑡𝑒𝑟𝑣𝑎𝑙 (𝐶𝑝𝑟𝑒𝑑 )       𝐼𝑛𝑡𝑒𝑟𝑣𝑎𝑙 (𝐶𝑎𝑐𝑡 )
    𝑆𝑖𝑚          (𝐶𝑝𝑟𝑒𝑑 , 𝐶𝑎𝑐𝑡 ) =                       Ð
                                     𝐼𝑛𝑡𝑒𝑟𝑣𝑎𝑙 (𝐶𝑝𝑟𝑒𝑑 )       𝐼𝑛𝑡𝑒𝑟𝑣𝑎𝑙 (𝐶𝑎𝑐𝑡 )            In more detail, we “match” each predicted co-movement pat-
                                                                                (6)   tern 𝐶𝑝𝑟𝑒𝑑 with the most similar actually detected pattern 𝐶𝑎𝑐𝑡 .
                                                                                      After all predicted clusters get traversed we end up with 𝐸𝐶𝑚
where 𝐼𝑛𝑡𝑒𝑟𝑣𝑎𝑙 (𝐶𝑝𝑟𝑒𝑑 ) (𝐼𝑛𝑡𝑒𝑟𝑣𝑎𝑙 (𝐶𝑎𝑐𝑡 )) is the time interval when                  wich holds all the “matchings”, which subsequently will help us
the the predicted co-movement pattern was valid (actual co-                           in evaluate the prediction procedure by quantifuing the error
movement pattern, respectively). As for the membership sim-                           between the predicted and the actual co-movement patterns.
ilarity, we adopt the Jaccard similarity:
                                                  Ñ
                                         |𝐶𝑝𝑟𝑒𝑑 𝐶𝑎𝑐𝑡 |                                6     EXPERIMENTAL STUDY
               𝑚𝑒𝑚𝑏𝑒𝑟
            𝑆𝑖𝑚        (𝐶𝑝𝑟𝑒𝑑 , 𝐶𝑎𝑐𝑡 ) =          Ð               (7)                 In this section, we evaluate our predictive model on a real-life
                                         |𝐶𝑝𝑟𝑒𝑑 𝐶𝑎𝑐𝑡 |
                                                                                      mobility dataset from the maritime domain, and present our
    Finally, we define the co-movement pattern similarity as:                         preliminary results.

                       
                         𝜆1 · 𝑆𝑖𝑚𝑠𝑝𝑎𝑡𝑖𝑎𝑙 +                                           6.1     Experimental Setup
                       
                                  𝑡𝑒𝑚𝑝                                               All algorithms were implemented in Python3 (via Anaconda31
                        𝜆2 · 𝑆𝑖𝑚       +                𝑆𝑖𝑚𝑡𝑒𝑚𝑝 > 0
                       
                       
                       
    ∗
                                                                                     virtual environments). The experiments were conducted using
 𝑆𝑖𝑚 (𝐶𝑝𝑟𝑒𝑑 , 𝐶𝑎𝑐𝑡 ) =    𝜆3 · 𝑆𝑖𝑚𝑚𝑒𝑚𝑏𝑒𝑟                                        (8)
                       
                       
                                                                                     Apache Kafka® with 1 topic for the transmitted (loaded from
                       
                                                                                     a CSV file) and predicted locations, as well as 1 consumer for
                       
                       
                        0                               𝐸𝑙𝑠𝑒
                                                                                      1 https://www.anaconda.com/
FLP and evolving cluster discovery, respectively. The machine
we used is a single node with 8 CPU cores, 16 GB of RAM and                                1.0
256 GB of HDD, provided by okeanos-knossos2 , an IAAS service
for the Greek Research and Academic Community.
                                                                                           0.8

                                                                                           0.6
6.2      Dataset
It is a well-known fact that sensor-based information is prone to                          0.4
errors due to device malfunctioning. Therefore, a necessary step
before any experiment(s) is that of pre-processing. In general,                            0.2
pre-processing of mobility data includes data cleansing (e.g. noise
elimination) as well as data transformation (e.g. segmentation,                            0.0
temporal alignment), tasks necessary for whatever analysis is
going to follow [22].                                                                              simtemp simspatial simmember                sim∗
    In the experiments that will follow, we use a real-life mobility
dataset3 from the maritime domain. The dataset, as product of
our preprocessing pipeline, consists of 148,223 records from 246                         Figure 4: Distribution of Cluster Similarity Measures and
fishing vessels organized in 2,089 trajectories moving within the                                         Total Cluster Similarity
Aegean Sea. The dataset ranges in time and space, as follows:
     • Temporal range: 2nd June, 2018 – 31st August, 2018 (approx.                                           Min.    Q25    Q50    Q75    Mean.     Max.
       3 months)
                                                                                           Record Lag     0      0     0    0    0.01   1
     • Spatial range: longitude in [23.006, 28.996]; latitude in
                                                                                           Consump. Rate 0       0     0    0    2.26   76.99
       [35.345, 40.999]
                                                                                         Table 1: Timeliness of the Proposed Methodology using
   During the preprocessing stage, we drop erroneous records (i.e.
                                                                                         Apache Kafka
GPS locations) based on a speed threshold 𝑠𝑝𝑒𝑒𝑑𝑚𝑎𝑥 as well as
stop points (i.e. locations with speed close to zero); afterwards we
organize the cleansed data into trajectories based on their pair-
wise temporal difference, given a threshold 𝑑𝑡. Finally, in order
to discover evolving clusters, we need a stable and temporally                           close to the median, we visualize the trajectory of each partici-
aligned sampling rate. For the aforementioned dataset, we set                            pating object on the map, as well as the MBRs for each respective
the following thresholds: 𝑠𝑝𝑒𝑒𝑑𝑚𝑎𝑥 = 50𝑘𝑛𝑜𝑡𝑠, 𝑑𝑡 = 30𝑚𝑖𝑛., and                           timeslice, in order to visualize the clusters’ temporal and spa-
alignment rate equal to 1𝑚𝑖𝑛.                                                            tial similarity. It can be observed that deviations from the actual
   The rationale behind these thresholds stems from the char-                            trajectories resulted in minor changes in the area of the points’
acteristics of the dataset which were unveiled after a statistical                       MBR, and consequently to the overall similarity.
analysis of the distribution of the 𝑠𝑝𝑒𝑒𝑑 and 𝑑𝑡 between succesive
points of the same trajectory.

6.3      Preliminary Results
In this section, we evaluate the prediction error of the proposed
model with respect to the “ground truth”. We define as “ground
truth”, the discovered evolving clusters on the actual GPS loca-
tions. For the pattern discovery phase, we tune 𝐸𝑣𝑜𝑙𝑣𝑖𝑛𝑔𝐶𝑙𝑢𝑠𝑡𝑒𝑟𝑠,
using 𝑐 = 3 vessels, 𝑑 = 3 timeslices, and 𝜃 = 1500 meters. For the
following experimental study, we focus – without loss of general-
ity – on the MCS output of EvolvingClusters (density-connected
clusters).
   Figure 4 illustrates the distribution of the three cluster simi-
larity measures, namely 𝑠𝑖𝑚𝑡𝑒𝑚𝑝 , 𝑠𝑖𝑚𝑠𝑝𝑎𝑡𝑖𝑎𝑙 , and 𝑠𝑖𝑚𝑚𝑒𝑚𝑏𝑒𝑟 , as
well as the overall similarity 𝑆𝑖𝑚 ∗ . We observe that the majority
of the predicted clusters are very close to their “ground truth”
values, with the median overall similarity being almost 88%. This                          Figure 5: Trajectory of a predicted (blue) vs. an actual
is expected however, as the quality of EvolvingClusters’ output                                           evolving cluster (orange)
is determined by two factors; the selected parameters; and the
input data. Focusing on the latter4 , we observe that the algorithm
is quite insensitive to prediction errors, as deviations from the                           Finally, Table 1 presents the metrics on the Kafka Consumers
actual trajectory has minor impact to 𝑠𝑖𝑚𝑠𝑝𝑎𝑡𝑖𝑎𝑙 .                                       used for the online layer of our predictive model, namely, Record
   Figure 5 illustrates the previous discussion. More specifically,                      Lag and Consumption Rate. Observing the Record Lag, we deduce
for the predicted and corresponding actual MCS with similarity                           that our algorithm can keep up with the data-stream in a timely
                                                                                         manner, while looking at Consumption Rate (i.e., the average
2 https://okeanos-knossos.grnet.gr/home/
3 Kindly provided to us by MarineTraffic.
                                                                                         number of records consumed per second) we conclude that our
4 The parameter sensitivity of EvolvingClusters is out of the scope of this paper. For   proposed solution can process up to almost 77 records per second,
more details see [33]                                                                    which is compliant with the online real-time processing scenario.
7     CONCLUSIONS AND FUTURE WORK                                                        [12] Sameera Kannangara, Hairuo Xie, Egemen Tanin, Aaron Harwood, and
                                                                                              Shanika Karunasekera. 2020. Tracking Group Movement in Location Based
In this paper, we proposed an accurate solution to the problem                                Social Networks. In SIGSPATIAL ’20: 28th International Conference on Advances
of Online Prediction of Co-movement Patterns, which is divided                                in Geographic Information Systems. ACM, 251–262.
                                                                                         [13] P. D. Kingma and J. Ba. 2015. Adam: A Method for Stochastic Optimization.
into two phases: Future Location Prediction and Evolving Cluster                              In International Conference on Learning Representations (ICLR).
Detection. The proposed method is based on a combination of                              [14] Patrick Laube, Stephan Imfeld, and Robert Weibel. 2005. Discovering relative
GRU models and Evolving Cluster Detection algorithm and is                                    motion patterns in groups of moving point objects. IJGIS 19, 6 (2005), 639–668.
                                                                                         [15] Yuxuan Li, James Bailey, and Lars Kulik. 2015. Efficient mining of platoon
evaluated through a real-world dataset from the maritime domain                               patterns in trajectory databases. Data Knowl. Eng. 100 (2015), 167–187.
taking into account a novel co-movement pattern similarity mea-                          [16] Zhenhui Li, Bolin Ding, Jiawei Han, and Roland Kays. 2010. Swarm: Mining
sure, which is able to match the predicted clusters with the actual                           Relaxed Temporal Moving Object Clusters. PVLDB 3, 1 (2010), 723–734.
                                                                                         [17] H. Liu, H. Wu, W. Sun, and I. Lee. 2019. Spatio-Temporal GRU for Trajectory
ones. Our study on a real-life maritime dataset demonstrates                                  Classification. In 2019 IEEE International Conference on Data Mining (ICDM).
the efficiency and effectiveness of the proposed methodology.                                 1228–1233. https://doi.org/10.1109/ICDM.2019.00152
                                                                                         [18] Mirco Nanni and Dino Pedreschi. 2006. Time-focused clustering of trajectories
Thus, based on the potential applications, as well as the qual-                               of moving objects. J. Intell. Inf. Syst. 27, 3 (2006), 267–289.
ity of the results produced, we believe that the proposed model                          [19] Nikos Pelekis, Stylianos Sideridis, Panagiotis Tampakis, and Yannis Theodor-
can be a valuable utility for researchers and practitioners alike.                            idis. 2016. Simulating Our LifeSteps by Example. ACM Trans. Spatial Algo-
                                                                                              rithms Syst. 2, 3 (2016), 11:1–11:39.
In the near future, we aim to develop an online co-movement                              [20] Nikos Pelekis, Panagiotis Tampakis, Marios Vodas, Christos Doulkeridis, and
pattern prediction approach that, instead of breaking the prob-                               Yannis Theodoridis. 2017. On temporal-constrained sub-trajectory cluster
lem at hand into two disjoint sub-problems without any specific                               analysis. Data Min. Knowl. Discov. 31, 5 (2017), 1294–1330.
                                                                                         [21] Nikos Pelekis, Panagiotis Tampakis, Marios Vodas, Costas Panagiotakis, and
synergy (i.e. first predict the future location of objects and then                           Yannis Theodoridis. 2017. In-DBMS Sampling-based Sub-trajectory Clustering.
detect future co-movement patterns), will combine the two steps                               In EDBT. 632–643.
                                                                                         [22] Nikos Pelekis and Yannis Theodoridis. 2014. Mobility Data Management and
in a unified solution that will be able to directly predict the future                        Exploration. Springer.
co-movement patterns.                                                                    [23] Petros Petrou, Panagiotis Nikitopoulos, Panagiotis Tampakis, Apostolos Gle-
                                                                                              nis, Nikolaos Koutroumanis, Georgios M. Santipantakis, Kostas Patroumpas,
                                                                                              Akrivi Vlachou, Harris V. Georgiou, Eva Chondrodima, Christos Doulkeridis,
ACKNOWLEDGMENTS                                                                               Nikos Pelekis, Gennady L. Andrienko, Fabian Patterson, Georg Fuchs, Yannis
                                                                                              Theodoridis, and George A. Vouros. 2019. ARGO: A Big Data Framework for
This work was partially supported by projects i4Sea (grant T1EDK-                             Online Trajectory Prediction. In Proceedings of the 16th International Sympo-
03268) and Track&Know (grant agreement No 780754), which                                      sium on Spatial and Temporal Databases, SSTD 2019, Vienna, Austria, August
                                                                                              19-21, 2019. 194–197.
have received funding by the European Regional Development                               [24] Petros Petrou, Panagiotis Tampakis, Harris V. Georgiou, Nikos Pelekis, and
Fund of the EU and Greek national funds (through the Oper-                                    Yannis Theodoridis. 2019. Online Long-Term Trajectory Prediction Based on
ational Program Competitiveness, Entrepreneurship and Inno-                                   Mined Route Patterns. In MASTER@ECML-PKDD 2019. 34–49.
                                                                                         [25] A. Rossi, G. Barlacchi, M. Bianchini, and B. Lepri. 2020. Modelling Taxi Drivers’
vation, under the call Research-Create-Innovate) and the EU                                   Behaviour for the Next Destination Prediction. IEEE Transactions on Intelligent
Horizon 2020 R&I Programme, respectively.                                                     Transportation Systems 21, 7 (2020), 2980 – 2989.
                                                                                         [26] David E. Rumelhart, Geoffrey E. Hinton, and Ronald J. Williams. 1986. Learning
                                                                                              representations by back-propagating errors. Nature 323 (1986), 533–536.
REFERENCES                                                                               [27] Yongfeng Suo, Wenke Chen, Christophe Claramunt, and Shenhua Yang. 2020.
                                                                                              A Ship Trajectory Prediction Framework Based on a Recurrent Neural Net-
 [1] A. Benterki, V. Judalet, M. Choubeila, and M. Boukhnifer. 2019. Long-Term                work. Sensors 20, 18 (2020). https://doi.org/10.3390/s20185133
     Prediction of Vehicle Trajectory Using Recurrent Neural Networks. In IECON          [28] Panagiotis Tampakis, Nikos Pelekis, Natalia V. Andrienko, Gennady L. An-
     2019 - 45th Annual Conference of the IEEE Industrial Electronics Society, Vol. 1.        drienko, Georg Fuchs, and Yannis Theodoridis. 2018. Time-Aware Sub-
     3817–3822.                                                                               Trajectory Clustering in Hermes@PostgreSQL. In ICDE. 1581–1584.
 [2] Lu Chen, Yunjun Gao, Ziquan Fang, Xiaoye Miao, Christian S. Jensen, and             [29] Panagiotis Tampakis, Nikos Pelekis, Christos Doulkeridis, and Yannis Theodor-
     Chenjuan Guo. 2019. Real-time Distributed Co-Movement Pattern Detection                  idis. 2019. Scalable Distributed Subtrajectory Clustering. In 2019 IEEE Interna-
     on Streaming Trajectories. Proc. VLDB Endow. 12, 10 (2019), 1208–1220.                   tional Conference on Big Data (Big Data). IEEE, 950–959.
 [3] Kyunghyun Cho, Bart van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau,             [30] Lu An Tang, Yu Zheng, Jing Yuan, Jiawei Han, Alice Leung, Chih-Chieh Hung,
     Fethi Bougares, Holger Schwenk, and Yoshua Bengio. 2014. Learning Phrase                 and Wen-Chih Peng. 2012. On Discovery of Traveling Companions from
     Representations using RNN Encoder–Decoder for Statistical Machine Trans-                 Streaming Trajectories. In ICDE. 186–197.
     lation. In Proceedings of the 2014 Conference on Empirical Methods in Natural       [31] George S. Theodoropoulos, Andreas Tritsarolis, and Yannis Theodoridis. 2019.
     Language Processing (EMNLP). Association for Computational Linguistics,                  EvolvingClusters: Online Discovery of Group Patterns in Enriched Maritime
     Doha, Qatar, 1724–1734. https://doi.org/10.3115/v1/D14-1179                              Data. In MASTER@PKDD/ECML (Lecture Notes in Computer Science, Vol. 11889).
 [4] R. Dey and F. M. Salem. 2017. Gate-variants of Gated Recurrent Unit (GRU)                Springer, 50–65.
     neural networks. In 2017 IEEE 60th International Midwest Symposium on Cir-          [32] Roberto Trasarti, Riccardo Guidotti, Anna Monreale, and Fosca Giannotti.
     cuits and Systems (MWSCAS). 1597–1600. https://doi.org/10.1109/MWSCAS.                   2017. MyWay: Location prediction via mobility profiling. Inf. Syst. 64 (2017),
     2017.8053243                                                                             350–367.
 [5] Qi Fan, Dongxiang Zhang, Huayu Wu, and Kian-Lee Tan. 2016. A General                [33] Andreas Tritsarolis, George-Stylianos Theodoropoulos, and Yannis Theodor-
     and Parallel Platform for Mining Co-Movement Patterns over Large-scale                   idis. 2020. Online discovery of co-movement patterns in mobility data.
     Trajectories. Proc. VLDB Endow. 10, 4 (2016), 313–324.                                   International Journal of Geographical Information Science 0, 0 (2020), 1–27.
 [6] Ziquan Fang, Yunjun Gao, Lu Pan, Lu Chen, Xiaoye Miao, and Christian S.                  https://doi.org/10.1080/13658816.2020.1834562
     Jensen. 2020. CoMing: A Real-time Co-Movement Mining System for Stream-             [34] C. Wang, H. Ren, and H. Li. 2020. Vessel trajectory prediction based on AIS
     ing Trajectories. In SIGMOD 2020, David Maier, Rachel Pottinger, AnHai Doan,             data and bidirectional GRU. In 2020 International Conference on Computer
     Wang-Chiew Tan, Abdussalam Alawini, and Hung Q. Ngo (Eds.). ACM, 2777–                   Vision, Image and Deep Learning (CVIDL). 260–264. https://doi.org/10.1109/
     2780.                                                                                    CVIDL51233.2020.00-89
 [7] P. Han, W. Wang, Q. Shi, and J. Yang. 2019. Real-time Short- Term Trajectory        [35] P. J. Werbos. 1990. Backpropagation through time: what it does and how to
     Prediction Based on GRU Neural Network. In 2019 IEEE/AIAA 38th Digital                   do it. Proc. IEEE 78, 10 (1990), 1550–1560. https://doi.org/10.1109/5.58337
     Avionics Systems Conference (DASC). 1–8. https://doi.org/10.1109/DASC43569.         [36] P. J. Werbos. 1990. Backpropagation through time: what it does and how to
     2019.9081618                                                                             do it. Proc. IEEE 78, 10 (1990), 1550–1560.
 [8] Shahab Helmi and Farnoush Banaei Kashani. 2020. Multiscale Frequent Co-             [37] H. Xue, D. Q. Huynh, and M. Reynolds. 2018. SS-LSTM: A Hierarchical LSTM
     movement Pattern Mining. In 36th IEEE International Conference on Data                   Model for Pedestrian Trajectory Prediction. In 2018 IEEE Winter Conference
     Engineering, ICDE 2020. IEEE, Washington, DC, 829–840.                                   on Applications of Computer Vision. 1186–1194.
 [9] Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long Short-Term Memory.               [38] Kai Zheng, Yu Zheng, Nicholas Jing Yuan, and Shuo Shang. 2013. On discovery
     Neural Computation 9, 8 (1997), 1735–1780.                                               of gathering patterns from trajectories. In ICDE. 242–253.
[10] Hoyoung Jeung, Man Lung Yiu, Xiaofang Zhou, Christian S. Jensen, and
     Heng Tao Shen. 2008. Discovery of convoys in trajectory databases. PVLDB 1,
     1 (2008), 1068–1080.
[11] Panos Kalnis, Nikos Mamoulis, and Spiridon Bakiras. 2005. On Discovering
     Moving Clusters in Spatio-temporal Data. In SSTD. 364–381.