=Paper=
{{Paper
|id=Vol-2841/BMDA_4
|storemode=property
|title=A Denoising Hybrid Model for Anomaly Detection in Trajectory Sequences
|pdfUrl=https://ceur-ws.org/Vol-2841/BMDA_4.pdf
|volume=Vol-2841
|authors=Maria Liatsikou,Symeon Papadopoulos,Lazaros Apostolidis,Ioannis Kompatsiaris
|dblpUrl=https://dblp.org/rec/conf/edbt/LiatsikouPAK21
}}
==A Denoising Hybrid Model for Anomaly Detection in Trajectory Sequences==
<pdf width="1500px">https://ceur-ws.org/Vol-2841/BMDA_4.pdf</pdf>
<pre>
             A Denoising Hybrid Model for Anomaly Detection in
                           Trajectory Sequences
                               Maria Liatsikou                                                        Symeon Papadopoulos
             Information Technologies Institute, CERTH                                      Information Technologies Institute, CERTH
                        Thessaloniki, Greece                                                           Thessaloniki, Greece
                       maria_liatsikou@iti.gr                                                            papadop@iti.gr

                           Lazaros Apostolidis                                                         Ioannis Kompatsiaris
             Information Technologies Institute, CERTH                                      Information Technologies Institute, CERTH
                        Thessaloniki, Greece                                                           Thessaloniki, Greece
                          laaposto@iti.gr                                                                  ikom@iti.gr

ABSTRACT                                                                               context dependent [25]. For instance, vehicle trajectory outliers
The advances of mobile technologies have led to massive amounts                        may happen because of anomalies in traffic, caused by various
of trajectory data, i.e. data about tracked routes of moving ob-                       events, like protests, accidents, physical disasters or because of
jects. The detection of anomalies in trajectory data is an evolving                    errors in the GPS devices or of unexpected drivers’ behaviors.
research domain, which has applications in traffic management                             In this work, we propose the use of deep learning based un-
and in public safety, but also in climate research and in animal                       supervised outlier detection and its combination with a density-
habit analysis. It is a challenging task due to the existence of non-                  based outlier detection algorithm, during the detection phase.
linear spatiotemporal dependencies. In this work we propose the                        Two types of autoencoder architectures are applied, both variants
combination of deep learning techniques with a traditional out-                        of LSTM-based networks. The aim is the detection of abnormal
lier detection methodology for detecting anomalous trajectories.                       trajectories, defining them as the ones that the autoencoder fails
Two variants of denoising sequential autoencoder architectures                         to reconstruct. Our models are trained using real-world data,
are applied for unsupervised anomaly detection in vehicle trajec-                      including both normal and abnormal trajectories. Since there is
tories, an LSTM-based autoencoder and a Sequence to Sequence                           no ground truth in our task, in order to evaluate our models on
LSTM autoencoder. A weighted distance-based loss function is                           unseen data and compare their performance, we generate differ-
optimized during the training phase, taking into account the im-                       ent types of abnormal data based on our test set and incorporate
pact of trajectory length on the detection outcome. We propose                         them in it. Anomalous trajectories are detected based on their
a hybrid architecture for the detection phase, which combines                          reconstruction errors.
each of the autoencoders with the Local Outlier Factor algorithm                          The proposed methods can address the problems of mixed dis-
– a density–based anomaly detection method – in order to detect                        tribution densities in a dataset and the dependencies on suitable
anomalies. Our models are evaluated on different variants of                           similarity metrics. These are common drawbacks of traditional
synthetic anomalies generated by our dataset. The results indi-                        outlier detection approaches, in which user-defined parameters,
cate a clear performance advantage of our approach compared                            like distance or density parameters, affect the outcome and are
to competitive baselines.                                                              not easily defined for sequence data. In general, deep learning
                                                                                       models have several advantages over other methods for outlier de-
KEYWORDS                                                                               tection in sequences as they can effectively address the challenges
                                                                                       of feature extraction, high dimensionality and non-linearity [18].
Trajectory Analysis, Anomaly Detection, Denoising Autoencoder,
                                                                                       Moreover, there is no need to explicitly describe a normal pattern
Recurrent Neural Network, Long Short-Term Memory Network,
                                                                                       for a trajectory and to define the type of the anomaly (e.g. a
LSTM, Sequence to Sequence
                                                                                       trajectory with not so many neighbors or with small density).
                                                                                          Instead, the proposed method detects trajectories that differ
1    INTRODUCTION                                                                      from the normal patterns in a rather abstract manner. In spe-
                                                                                       cific, the model is trained to reconstruct most of the trajectories
The rapid advances in Global Positioning Systems (GPS), com-
                                                                                       included in a dataset, while it fails on the more irregular ones,
bined with those in computation and storage systems allow the
                                                                                       which are classified as anomalies.
large-scale collection and analysis of the digital traces obtained
                                                                                          What is more, the applied architectures are capable of captur-
from GPS devices. The knowledge extracted from vehicle or hu-
                                                                                       ing the temporal dependencies in the trajectory data as they are
man trajectory data can offer important insights in the fields of
                                                                                       both based on recurrent neural networks (RNNs).
traffic monitoring and management, public safety and surveil-
                                                                                          Several recent works have applied neural autoencoders in tra-
lance. A relevant research domain, which has lately been attract-
                                                                                       jectory outlier detection tasks [4, 9, 21, 25]. They either use feed
ing interest, is anomaly detection in trajectory data.
                                                                                       forward or sequential autoencoders in order to detect outliers in
    An anomalous trajectory is one that appears to be different
                                                                                       trajectory data. In these works a threshold is set on the errors’
compared to others with respect to some kind of similarity [3]. It
                                                                                       values for classifying a trajectory as outlier or not, or a qualita-
is really difficult to uniformly define abnormality, as it is usually
                                                                                       tive analysis is conducted on the trajectories with the highest
© 2021 Copyright for this paper by its author(s). Published in the Workshop Proceed-   reconstruction errors. Moreover, the autoencoders are trained
ings of the EDBT/ICDT 2021 Joint Conference (March 23–26, 2021, Nicosia, Cyprus)       minimizing a standard loss function, which does not account for
on CEUR-WS.org. Use permitted under Creative Commons License Attribution 4.0
International (CC BY 4.0)
                                                                                       the trajectory length.
                                                                                          In this work we make the following contributions:
    • We propose a hybrid architecture that combines a sequen-          is detected if there is a drastic change in the similarity values.
      tial denoising autoencoder with a density-based outlier de-       Though related to our task, this method analyses temporal out-
      tection model for unsupervised anomaly detection in trajec-       liers in road network links rather than outliers in trajectories.
      tories. We apply the Local Outlier Factor (LOF) algorithm             Classification-based methods refer to outlier detection with
      on the autoencoder’s reconstruction errors of unseen data.        motion-classifiers or to machine learning classification models.
      In scoring outlier detection techniques an anomaly score is       In ROAM [16] – a supervised trajectory anomaly detection algo-
      assigned to each instance and the final output is a ranked        rithm – trajectories are represented as pattern fragments, which
      list of all instances with respect to their scores: those in      are transformed to a feature space of spatiotemporal attributes.
      the highest ranks are detected as anomalies.                      A hierarchical rule-based classifier is applied for classifying the
    • We train two variants of sequential denoising autoen-             trajectories in one of the two classes (supervised setting).
      coders as a first step in this hybrid setting, by minimizing a        iBAT [29] is a lazy isolation technique for trajectory outlier
      Haversine distance-based weighted loss function, effectively      detection. A grid is used to split the studied area, and all trajecto-
      accounting for the overall length of each trajectory.             ries that share the same starting/ending grid cells are grouped
    • We test on several anomaly detection tasks with synthetic         together to detect the anomalous trajectories among them. Thus,
      data, showing important gains in performance using the            this method focuses on discovering anomalous trajectories in
      proposed models against competitive baselines.                    specific regions. Furthermore, trajectories are represented as se-
                                                                        quences of discrete variables (cell id), which results in a loss of
                                                                        spatial information. One Class SVM clustering is applied on a
2   RELATED WORK
                                                                        dataset of trajectories, extracted from video sequences in the
Most trajectory anomaly detection methods rely on distance, den-        work of Piciarelli et al.[23]. The aim is to detect the region in
sity, historical similarity or classification [3, 11]. Distance-based   the feature space that encloses the normal data and excludes the
methods detect as outliers trajectories with insufficient number        anomalous ones. To address the problem of correctly choosing
of neighbors according to a similarity measure. They detect global      the number of acceptable anomalies – unknown in unlabeled
outliers and their results highly depend on the suitability of the      data – the authors suggest a technique for removing the outliers
chosen similarity metric. Methods relying on density classify a         in the training set. They leverage geometric characteristics of
trajectory as an outlier if its density is lower than a threshold.      the feature space of the model. After clustering the trajectories,
They can detect local abnormalities in datasets with varying            the outliers are detected using geometric criteria in the model’s
densities, but are computationally expensive. Historical similar-       feature space, without taking account of temporal dependencies.
ity approaches are applied for detecting temporal outliers based            More recent works have started applying deep learning meth-
on the calculation of changes in historical similarity between          ods for detecting outliers in trajectory data. Roy & Bilodeau [25]
data points. Classification-based methods include the detection         work on abnormal event detection by analyzing fixed length tra-
of anomalous trajectories using a motion-classifier (a classifier       jectories in road intersections. A deep feed forward autoencoder
applied on trajectories represented as sequences of motion fea-         is trained on normal data, learning to reconstruct the input tra-
tures) in a supervised setting, and also machine learning models,       jectories from its latent-space representation. A threshold value
either supervised or unsupervised. The latter include Isolation         based on the reconstruction error in training/validation sets is
Forests, One-Class SVMs and deep learning architectures, like           used for detecting outlying trajectories. A feed forward autoen-
autoencoders, which have remarkable performance in sequential           coder is also used in the work of Olive et al. [21] to discover
outlier detection. In this section we present examples of each of       outlying (fixed-length) trajectory patterns in flight routes. Tra-
these approaches.                                                       jectories with high reconstruction errors are then analyzed to
   A density-based algorithm is used by Fontes et al. [10] for          find possible associations with urgent situations. Though feed
trajectory anomaly detection between regions of interest. Spa-          forward neural networks have shown good potential in anomaly
tiotemporal outliers are detected among sub-trajectories based          detection tasks, they fail at capturing temporal dependencies.
on the concepts of neighborhood, ‘standard’ sub-trajectory (the             Instead, recurrent neural networks learn from sequential data
neighborhood of which has a minimum density) and synchro-               and can be effectively applied in outlier detection tasks with a
nization. Although spatiotemporal aspects are considered, this          temporal dimension. They also have the advantage of captur-
approach refers to trajectories between specific regions, and its       ing the context of the entire trajectory. This makes them more
outcomes are strongly related to user-specified parameters.             effective than methods based on trajectory segmentation [18].
   TRAOD [15] is a hybrid (distance/density-based) partition-and-       Di Mauro & Ferilli [9] deploy a Seq2Seq architecture based on
detect algorithm for trajectory outlier detection. After a trajectory   an LSTM autoencoder in an unsupervised setting in order to
is partitioned, outliers are detected in its parts using distance and   detect anomalous vehicle routes. The routes are represented as
density metrics, so that anomalies are detected both in dense and       sequences of gate ids, which ignores the spatial information, as
sparse areas. A trajectory is considered as anomalous if its outly-     in [29]. A Sequence to Sequence real-time outlier detector in
ing parts are more than a threshold. Though TRAOD alleviates            human trajectories is proposed by Bouritsas et al. [4] aiming at
the problem of local density, its outcome depends on user-defined       discovering suspicious behavior in video surveillance footage.
parameters and it only considers spatial data, without capturing        Focusing on localization of anomaly detection, each trajectory is
the temporal sequence.                                                  partitioned in sub-trajectories, represented as sequences of coor-
   Methods using historical similarity between points detect tem-       dinates. Synthetic data are used for evaluation and the training
poral, rather than spatial, outliers. Li et al. [17] propose a frame-   is applied only on normal instances.
work for detecting temporal outliers in road segments. For every            In this work, instead of applying one of the aforementioned
segment and time step, a ‘temporal neighborhood vector’ is de-          methods for trajectory anomaly detection, we propose a hybrid
fined, consisting of aggregated historical similarities between         approach, a combination of deep learning and density-based
specific variables in relation to all other segments. An outlier        anomaly detection techniques, while taking into consideration
the length of each trajectory during the training phase. The                     loss function is:
performance of the proposed architecture is tested on synthetic                              1 ÕÕ
                                                                                  𝐿𝐻𝑉 𝑅 =             [(𝑥𝑖 (𝑡) −𝑥𝑖𝑟 (𝑡)) 2 + (𝑦𝑖 (𝑡) −𝑦𝑖𝑟 (𝑡)) 2 ] ∗𝑤𝑖 (1)
data, generated to account for different patterns of anomalies.                             2|𝐼 | 𝑖 𝑡
                                                                                 where 𝑖 is a specific instance from a total number of instances |𝐼 |, 𝑡
                                                                                 denotes a certain timestep within a trajectory route, {𝑥𝑖 (𝑡), 𝑦𝑖 (𝑡)}
3 METHODS                                                                        is the actual and {𝑥𝑖𝑟 (𝑡), 𝑦𝑖𝑟 (𝑡)} is the reconstructed location (lon-
3.1 Models                                                                       gitude, latitude) of the trajectory 𝑖 at timestep 𝑡. The weight factor
Autoencoders are used as the basis for unsupervised outlier detec-               𝑤𝑖 is given by the log Haversine distance covered in the (actual)
tion method. Given an input sequence of vectors of spatial coordi-               trajectory. For two given points (𝛼, 𝛽) it is given by:
nates (longitude, latitude) 𝑇𝑖 = {(𝑥 1, 𝑦1 ), (𝑥 2, 𝑦2 ), . . . , (𝑥𝑚 , 𝑦𝑚 )},                      s
the autoencoder attempts to reproduce this input, by minimizing                                             (𝑦𝛼 − 𝑦 𝛽 )                               (𝑥𝛼 − 𝑥 𝛽 )
a loss function. This loss function refers to the error of the recon-            𝑤 𝛼,𝛽 = 2 arcsin    sin2                 + cos(𝑦𝛼 ) cos(𝑦 𝛽 ) sin2
structed outputs compared to the original input sequences. The                                                  2                             2
                                                                                                                                              (2)
instances that the model failed to reconstruct successfully suffer
                                                                                 where 𝑥𝑘 and 𝑦𝑘 refer to the longitude and latitude of the point
from high reconstruction error and are detected as anomalies. In
                                                                                 𝑘 in radians, respectively. In our experiments, we employ both
our work, we employ a Long Short-Term Memory (LSTM) net-
                                                                                 models (LSTM, SEQ) with both loss functions (MSE, HVR).
work [13] as the basis for our models due to its ability to capture
temporal dependencies in the data.
                                                                                 3.2    Anomaly detection methods
   The autoencoder consists of two components – the encoder
and the decoder. During the encoding phase, the sequence is com-                 After training the models, typically an anomaly threshold is set
pressed into a latent vector of lower dimension, which carries                   in order to classify a trajectory as normal or not [2, 4, 20, 25]. We
all its spatiotemporal information in a compressed format (bot-                  employ two different methods to detect outliers in unseen trajec-
tleneck architecture). The decoder then reconstructs the input                   tories, both operating on the sequences of their reconstruction
using its latent representation. Both the encoder and the decoder                errors (i.e. those derived by LSTM/SEQ models, as presented in
may consist of one or more layers. Dense layers are added on top                 the previous subsection). The evaluation is conducted using a
of the last decoder layer and produce the final reconstruction for               test dataset and synthetic anomalous data.
the 𝑚 timesteps.                                                                 AVG: The average values of errors are calculated on the se-
   In our work, we use two variants of denoising LSTM autoen-                    quences in the test set and are used as a proxy for measuring
coders [28], which attempt to output the clean input data from a                 the abnormality in a sequence of locations visited in a trajectory.
partially corrupted input. As a result, the model learns represen-               The test instances are then sorted based on these values, with the
tations which are robust to noise. This is accomplished by adding                most anomalous trajectories being placed at the highest ranks.
to the input data a Gaussian noise function with zero mean and
unit variance.                                                                   LOF: Instead of working with averages, we propose a hybrid
                                                                                 architecture, combining each autoencoder with the Local Outlier
LSTM: In our first variant, the last layer of the encoder returns the            Factor algorithm [5] (LOF). LOF is a density-based outlier de-
encoder’s output, which is copied 𝑚 times (equal to the number of                tection algorithm, which performs well even in case of different
the sequence timesteps) before being fed to the decoder. Similar                 outlying densities and detects both local and global anomalies.
model architectures have been used in anomaly detection tasks                    Anomalies are detected by comparing their local densities with
in other areas of research, such as natural language processing                  the average local densities of their 𝑘 nearest neighbors. The al-
[27]. The decoder’s architecture is the same as the encoder’s and                gorithm contains three steps:
dense layers added on the top produce the final output.
                                                                                    (1) The 𝑘 nearest neighbors of each record are found, using
SEQ: Our second variant of autoencoders is a Sequence to Se-                            a distance metric (in our case, Euclidean distance on the
quence network (Seq2Seq) [6, 26], which has shown good results                          reconstruction errors of LSTM or SEQ models, trained
in the area of machine translation. LSTM layers comprise both                           using MSE or HVR).
the encoder and the decoder. The encoder sequentially reads each                    (2) The Local Reachability Density is computed for each record
element of the sequence and encodes it in a hidden state, updated                       p, using its 𝑘 nearest neighbors 𝑁 𝑁𝑘 , according to the
at every timestep. Instead of copying the encoder’s output, as                          equation:
with the LSTM autoencoder, this hidden state is used as the ini-                                             Í
                                                                                                               𝑜𝜖𝑁 𝑁𝑘 (𝑝) 𝑟𝑑𝑘 (𝑝, 𝑜)
tial state of the decoder, which is trained to generate the output                               𝐿𝑅𝐷𝑘 = 1/                                     (3)
                                                                                                                  |𝑁 𝑁𝑘 (𝑝))|
sequence step by step. We train the model using teacher forcing,
so that the actual value at each timestep is used as input to the                       where 𝑟𝑑𝑘 (𝑝, 𝑜) is the reachability distance of the object p
decoder. The final reconstructed sequence is output by dense                            with respect to object o.
layers. In our work, both architectures (LSTM, SEQ) consist of                      (3) The Local Outlier Factor score of a record is computed as
one encoder, one decoder and one dense (output) layer.                                  the ratio of the average local densities of its neighbors to
                                                                                        its local density. The equation for LOF score is:
   Traditionally the loss function minimized during training is                                               Í            𝐿𝑅𝐷𝑘 (𝑜)
the mean squared error (MSE) between predicted and actual (loca-                                                    𝑜𝜖𝑁 𝑁𝑘 (𝑝) 𝐿𝑅𝐷𝑘 (𝑝)
tion) sequences. Here, we propose a weighted MSE loss function.                                     𝐿𝑂𝐹 (𝑝) =                                              (4)
                                                                                                                          |𝑁 𝑁𝑘 (𝑝)|
The weight variable is introduced in order to penalize the long
trajectories more, as they tend to have higher reconstruction er-                Finally, LOF scores close to 1 are considered to be associated
ror, regardless of their abnormality. The equation of the weighted               with normal records, as their densities are similar to those of
                                                                                 their neighbors, while outliers have higher score values. The
LOF algorithm is applied on the sequences of the reconstruction          Table 1: Overview of our methods for generating synthetic
errors, predicted by the autoencoder. The test instances are then        anomalies in the test set. The ‘Pattern’ column refers to
sorted based on their LOF scores and outliers are detected as in         the pattern we use to alter the locations of the actual trip
the AVG method.                                                          [0, ..., 8].

4 EXPERIMENTS                                                              Method   Variant   Description                       Pattern
                                                                                    DSTRT𝑐    Complete noise.                   [0,8,1,7,2,6,3,5,4]
4.1 Data                                                                   DSTRT
                                                                                    DSTRT𝑝    Light noise.                      [0,1,2,4,3,5,6,7,8]
Dataset: We make use of the dataset released in the Prediction                                Perform the same cycle twice
                                                                                    CYCLE𝑐    and then continue with the        [0,1,2,3,0,1,2,3,4]
Challenge of ECML PKDD 20151 [19], which contains 1,710,671                CYCLE
                                                                                              rest of the trip.
taxi trips made by all 442 taxis in the city of Porto, Portugal dur-                CYCLE𝑏    Go back-and-forth all the time.   [3,4,5,4,3,4,5,4,3]
ing the time period from 01/07/2013 to 30/06/2014. This dataset
has been used in the past for related tasks, such as destination
prediction [8, 24] and travel time prediction [12].
   The location of each trip is sampled every 15 seconds; we             4.2    Model Comparison
sub-sample this signal so that we have one timestep per minute,
                                                                         Task Definition: Our goal is to detect anomalous driving pat-
in order to test the performance of our models in low sampling
                                                                         terns by analysing the trajectories. To this end, we first compare
rates. Furthermore, we also observe that there are some dupli-
                                                                         the ability of our four variants to reconstruct the trajectories
cates, which are removed from the dataset. In order to keep out
                                                                         in the test set against a non-sequential model (‘FF’, see next
trajectories with very short length, we exclude trajectories for
                                                                         paragraph). We then conduct our main evaluation task.
which the sum of distances between the first, the middle and
                                                                         Anomaly Detection Task: We form this task in a rank-based man-
the last point is less than 500 meters. We used only three points
                                                                         ner. We rank the trajectories based on their anomaly factor, as
of each trajectory instead of all of its points for computational
                                                                         derived by the models, aiming at ranking higher the (artificially
reasons. Trips with very low duration (lower than three minutes)
                                                                         generated) anomalous trajectories compared to the ‘normal’ ones.
or very high duration (higher than two hours) are also excluded.
                                                                         We compare the performance of our sequential models against
Since the autoencoders require the trajectories to have specific
                                                                         baseline methods, as detailed next.
number of points, we choose this number to be nine, so that 75%
                                                                            Finally, we compare two of our model variants in a qualitative
of the total number of trajectories has more points. As a result,
                                                                         way, to get insights on the advantages of using the Haversine-
we keep trajectories with more than nine points and keep the
                                                                         weighted loss function compared to standard MSE, as well as the
first nine of them. We note that keeping the nine first points of
                                                                         hybrid (+LOF) approach operating on the reconstruction errors
each trajectory may result in excluding some anomalies, which
                                                                         compared to considering the average error.
require more of them to be expressed. However, we choose this
percentage (75% of all trajectories), so that we take into account       Models: We contrast the two models defined in section 3.1, each
a large proportion of the total number of trajectories, while the        trained by minimising (a) the Mean Squared Error (MSE) and
number of points is large enough for the task of detecting anoma-        (b) the Haversine-weighted (HVR) loss function. We denote the
lous trajectories to be meaningful. We end up with 1,218,657             models as {LSTM𝑚 , SEQ𝑚 } when they use the MSE loss function
sequences of nine spatial coordinates (longitude, latitude). The         and as {LSTMℎ , SEQℎ } when they use the HVR loss function
features of the dataset (longitude, latitude) are standardized to        respectively. We further compare their performance by ranking
zero mean and unit variance, according to the equation:                  the trajectories based on the AVG or LOF method, as detailed in
                                     𝑥 − 𝜇 (𝑥)                           section 3.2. All versions of our models have two hidden layers
                              𝑥˜ =                                (5)    of 16 units, followed by one dense layer for the final prediction,
                                       𝜎 (𝑥)
                                                                         since using these parameters offered the best performance in
where 𝜇 (𝑥) and 𝜎 (𝑥) are the average and variance of all values         terms of reconstruction error for all cases.
of each feature, respectively.                                              To test the importance of the temporal component in our mod-
   We use 80% of the trajectories of each of the 12 months of the        els for the Anomaly Detection task, we compare them against a
time period for training. From the remaining 20%, 10% is used in         denoising autoencoder implemented via a Feed-Forward Neural
order to define the end of the training process to avoid overfitting     Network with two hidden layers (FF) and a Local Outlier Factor
(validation set) and 10% for evaluating our models (test set).           (LOF) algorithm trained on the raw trajectory data instead of the
Synthetic Examples of Anomalies: We are interested in cap-               sequences of reconstruction errors derived by one of our models.
turing different types of anomalies in the trips made by the taxis       Both of these baseline models treat the different timesteps as dif-
in our dataset. Due to the lack of labels in our test set, we generate   ferent features, ignoring their temporal dimension. FF is trained
synthetic data by altering a small portion (1%) of the trajectories      in the same data and using the same noise as our models. LOF is
in the test set. We therefore aim at capturing clear cases of anoma-     trained straight away on the test set to detect the anomalies and
lous trips, as well as anomalies that correspond to certain driving      we select the number of neighbors (trials: [10,50,100,500,1000])
patterns. Thus, we generate artificial anomalies corresponding to        on the basis of its performance on the test set, as to provide
(a) noisy (distorted) trips (DSTRT) and (b) cyclic routes (CYCLE).       it with a strong advantage against our models. Our sequential
Table 1 summarises the two different approaches, their variants          models as well as the FF baseline are trained using the Adam
and the exact pattern we have used to generate such synthetic tra-       optimizer [14], an L2 regularization norm equal to 0.1, a learning
jectories. In the experiments that are discussed next, we employ         rate equal to 0.0001 and a batch size equal to 512. Finally, we
each of these patterns independently in our test set.                    provide the evaluation metrics (see next subsection) of a naïve
                                                                         random ranking model (NRR), by averaging the performance
1 http://www.geolink.pt/ecmlpkdd2015-challenge/                          after 10K experiments with random scores on the test set. For the
case of the Trajectory Reconstruction subtask, we also compare                          Table 2: Mean Absolute and Mean Squared Error of our
our models against FF.                                                                  four models and the FF baseline. Performance is measured
   We implement neural network architectures based on the                               via the reconstruction errors in the test set.
Keras library [7] and the LOF algorithm using the Scikit-learn
library [22], both in Python.                                                                                       MAE          MSE
                                                                                                       FF          0.00200    8.726 · 10−6
Anomaly Scores. In the Anomaly Detection task, once our mod-                                           LSTM𝑚       0.00130    4.112 · 10−6
els (LSTM𝑚 , LSTMℎ , SEQ𝑚 , SEQℎ ) and the FF baseline make                                            SEQ𝑚        0.00156    4.968 · 10−6
predictions on the test set2 , we measure the MSE between the                                          LSTMℎ       0.00130    3.908 · 10−6
model predictions and the actual location of each timestep of                                          SEQℎ        0.00144    4.286 · 10−6
every trip in the test set. We then rank the trips of the test set by
employing any of the methods described in section 3.2. For LOF,
we rank trips in the test set based on their LOF scores in order to
                                                                                        5.2    Main Task – Anomaly Detection
identify (i.e. rank higher) the synthetic ones.
                                                                                        Table 3 shows the results of our models and baselines on the
Evaluation. We make use of three evaluation metrics that are                            anomaly detection task across all experiments (i.e. across all
suited for ranking tasks and widely used in related work. In spe-                       synthetic datasets, as summarised in Table 1). In the following,
cific, we use two common classification-based metrics, adjusted                         we examine the effects of the different components of our models
for the needs of ranking tasks: Recall at k (𝑟 @𝑘) measures the                         for the anomaly detection task.
portion of the anomalous trips that are present in the top-k trips
ranked by our model; Precision at k (𝑝@𝑘) measures the portion                          Anomaly Detection Methods: Our sequential models outper-
of the top-k trips ranked by our model that are indeed anomalous.                       form the FF and LOF baseline models in almost all cases. The
We finally calculate the F1 Measure at k as our main evaluation                         application of LOF on the sequences of reconstruction errors in
metric, defined as follows:                                                             all cases of synthetic datasets improves the performance of all
                                                                                        four sequential autoencoders compared to the AVG method, in
                                    2 · 𝑟 @𝑘 · 𝑝@𝑘                                      which the ranking of trajectories is determined by the average
                             𝐹1 =                                                (6)
                                     𝑟 @𝑘 + 𝑝@𝑘                                         error values. The relative improvements in F1 measure (averaged
                                                                                        over the corresponding values in the four datasets) of LSTM𝑚
   We set 𝑘 to be equal to 5% of the size of our test set. This                         and LSTMℎ models are 18.3% and 15.1% respectively. The corre-
value is larger than the percentage of the artificial anomalies we                      sponding numbers for Seq𝑚 and Seqℎ are much higher (50.2%
aim at detecting (1%), since we expect that there will be more                          and 33.2% respectively). The hybrid approach gives much higher
(non artificial) anomalies in our test set. Nevertheless, we further                    F1 score values even in the predictions made by the FF model
experiment with different values of 𝑘 (range: [0.1%-50%]).                              (i.e., comparing FF+AVG vs FF+LOF). It is worth noting that FF
   To help with the reproducibility of our experiments, we make                         with LOF outperforms the other models in the case of CYCLE𝑏 ,
available the implementation of the proposed methods and the                            while its performance in this case is much lower compared to all
conducted experiments on GitHub3 .                                                      other models (except for NRR) in the case of AVG method. These
                                                                                        findings show the clear and consistent benefit we gain via the
5     RESULTS                                                                           hybrid architecture proposed in this work.
In this section we present the results from our models’ compari-
                                                                                        Loss Functions: We examine the performance of the models
son. The reconstruction errors of our sequential models – a metric
                                                                                        with respect to the loss functions used. The Haversine-weighted
of their ability to accurately reconstruct the trajectories – is com-
                                                                                        loss function (HVR) outperforms Mean Squared Error loss (MSE)
pared to the FF model error in our test set. Then, we present the
                                                                                        with respect to F1 measure in 13 out of total 16 cases shown in
comparison between the performance of our hybrid LOF models
                                                                                        Table 3 (the cases refer to the predictions of the two Sequential
against the AVG method (described in section 3.2). The impact
                                                                                        models for the four different datasets). The relative gains in per-
of using a Haversine-weighted loss function is also examined
                                                                                        formance by using the Haversine-weighted loss are 1.32% and
next. Finally, we present our empirical insights by comparing the
                                                                                        4.14% for the LSTM and SEQ models, respectively.
anomalies detected in our test set by SEQ𝑚 with AVG against
                                                                                           To further examine the performance of the four variants (the
SEQℎ with LOF, to test the advantages of the latter approach.
                                                                                        SEQ/LSTM models (a) trained using the MSE/HVR loss function
                                                                                        and (b) using the AVG/LOF method for their final ranking), Fig-
5.1      Base Task – Trajectory Reconstruction                                          ures 1, 2 show the variation of F1 measure over different threshold
The mean absolute and mean squared error in the test set are                            values of 𝑘 for LSTM and SEQ model respectively. It becomes
computed as metrics of the reconstruction error. The results are                        clear that our hybrid approach combined with the Haversine
shown in Table 2. The two sequential models show better recon-                          weighted loss function (HVR+LOF) gives better results for LSTM
struction ability than the FF model, demonstrating the advantage                        and SEQ in most cases, providing a strong boost in performance
of models that consider the temporal dimension of the trajectories.                     compared to the common MSE+AVG approach.
We also find that the incorporation of the Haversine-weighted
(HVR) loss function further improves the results consistently                           5.3    Qualitative Analysis
across the two models (SEQ, LSTM).                                                      As a final step, we employ SEQ𝑚 with AVG method and SEQℎ
                                                                                        with LOF method to rank all the trajectories of our original test
2 Note that the 1% of the trips in the test set have been altered according to one of   set. As opposed to the quantitative analysis conducted thus far
the four patterns described in section 4.1.                                             on the artificially generated anomalies, here we are interested in
3 https://github.com/marialiatsikou/BMDA_anomaly_detection
                                                                                        gaining qualitative insights on the detected anomalies by each
Table 3: Evaluation (in %) of all models in all runs for the Anomaly Detection task for k=5%. Blue cells indicate the best-
performing anomaly detection method (AVG vs LOF) for each model, per synthetic dataset. Green cells indicate the best
performing model per synthetic dataset. Bold scores show the best-performing loss (MSE vs HVR) when comparing the
two variants of each of our sequential models.

                                                        DSTRT                          CYCLE
                                                  DSTRT𝑐      DSTRT𝑝            CYCLE𝑐      CYCLE𝑏
                                                AVG   LOF AVG LOF             AVG LOF AVG       LOF
                                        NRR      1.00  1.00  1.00  1.00        1.00  1.00  1.00  1.00
                                        LOF         – 17.23     –  2.69           –  4.43     –  4.43
                           Precision


                                        FF      10.98 16.43  1.74  3.53        0.94  3.58  1.72  9.29
                                        LSTM𝑚   16.72 18.22  3.02  5.15        3.68  4.41  4.91  5.73
                                        SEQ𝑚    15.08 18.12  2.48  4.94        2.48  5.22  3.15  6.53
                                        LSTMℎ   16.90 18.38  3.10  4.73        4.00  4.56  5.12  5.86
                                        SEQℎ    16.02 18.78  2.79  5.05        3.33  5.10  3.76  5.55
                                        NRR      5.00  5.00  5.00  5.00        5.00  5.00  5.00  5.00
                                        LOF         – 86.21     – 13.46           – 22.17     – 22.17
                                        FF      54.93 82.18  8.70 17.65        4.68 17.90  8.62 46.47
                           Recall


                                        LSTM𝑚   83.66 91.13 15.11 25.78       18.39 22.09 24.55 28.65
                                        SEQ𝑚    75.45 90.64 12.40 24.71       12.40 26.11 15.76 32.68
                                        LSTMℎ   84.56 91.95 15.52 23.65       20.03 22.82 25.62 29.31
                                        SEQℎ    80.13 93.92 13.96 25.29       16.67 25.53 18.80 27.75
                                        NRR      1.67    1.67   1.67   1.67    1.67    1.67     1.67    1.67
                                        LOF         –   28.72      –   4.49       –    7.39        –    7.39
                           F1 Measure


                                        FF      18.30   27.38   2.90   5.88    1.56    5.96     2.87   15.48
                                        LSTM𝑚   27.88   30.37   5.03   8.59    6.13    7.36     8.18    9.55
                                        SEQ𝑚    25.14   30.20   4.13   8.23    4.13    8.70     5.25   10.89
                                        LSTMℎ   28.18   30.64   5.17   7.88    6.67    7.60     8.54    9.77
                                        SEQℎ    26.70   31.30   4.65   8.43    5.55    8.51     6.26    9.25


                             Figure 1: F1 Measure for different values of 𝑘 (x-axis) for LSTM model.


                                 Figure 2: F1 Measure for different values of 𝑘 (x-axis) for SEQ model.


of these methods on our actual test set. We obtained the two             Two annotators labelled these trajectories as ‘normal’ or ‘ab-
sets of the top-0.1% (i.e., most anomalous) of the trajectories        normal’ (or ‘NA’, if they could not tell). To familiarise themselves
detected by each of the two models in the test set and focused         with the task, the annotators (one author, one PhD graduate with
on those that are predicted by one model only (i.e., we excluded       no domain knowledge) were first provided with examples of
the intersection of the two sets).                                     normal and abnormal trajectories in the test set and were then
Figure 3: The types of anomalies detected by the SEQℎ with LOF method. The patterns from left to right are: GPS device
error, unreasonably long covered distance, cyclic route, back and forth movement.


Figure 4: The types of anomalies detected by the SEQ𝑚 with AVG method. The patterns from left to right are: One back
and forth movement and three routes with cyclic parts.


                     Figure 5: Four typical trajectories classified as normal by the SEQℎ with LOF method.


asked to label the trajectories in our dataset of interest in a sim-   15 out of 29 (accuracy=51.7%), certifying the superiority of our
ilar manner. There was an agreement on 73/86 cases, with the           hybrid model.
inter-annotator agreement in Cohen’s kappa terms being 0.72               In Figure 3 four patterns of anomalies detected by the SEQℎ
(i.e. ‘substantial agreement’). For SEQ𝑚 with the AVG method, 4        with LOF method are displayed. The first corresponds to error of
trajectories out of the 42 (accuracy=9.5%) for which the annota-       the GPS device, while the rest possibly refer to driving patterns:
tions coincide (excluding ‘NA’), were labelled as ‘abnormal’; for      unreasonably long covered distance, a cyclic trip and a trip in-
SEQℎ with the LOF approach the anomalous trajectories were             cluding back and forth movement. For comparison purposes we
present the four trajectories annotated as anomalies by both an-                     [4] Giorgos Bouritsas, Stelios Daveas, Antonios Danelakis, and Stelios CA Tho-
notators for SEQ𝑚 with the AVG method in Figure 4. It is obvious                         mopoulos. 2019. Automated Real-time Anomaly Detection in Human Trajec-
                                                                                         tories using Sequence to Sequence Networks. In 2019 16th IEEE International
that the types of anomalies detected with this approach are more                         Conference on Advanced Video and Signal Based Surveillance (AVSS). IEEE, 1–8.
restricted and they refer to a trip which includes back and forth                    [5] Markus M Breunig, Hans-Peter Kriegel, Raymond T Ng, and Jörg Sander. 2000.
                                                                                         LOF: identifying density-based local outliers. In Proceedings of the 2000 ACM
movement and three trips with cyclic parts. Finally, some typi-                          SIGMOD international conference on Management of data. 93–104.
cal trajectories that our hybrid model (SEQℎ with LOF method)                        [6] Kyunghyun Cho, Bart Van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau,
                                                                                         Fethi Bougares, Holger Schwenk, and Yoshua Bengio. 2014. Learning phrase
classified as normal are displayed in Figure 5. The findings of                          representations using RNN encoder-decoder for statistical machine translation.
this qualitative analysis show that the proposed approach can                            arXiv preprint arXiv:1406.1078 (2014).
capture different types of anomalies, despite not being tailored to                  [7] François Chollet et al. 2015. Keras. https://keras.io.
                                                                                     [8] Alexandre De Brébisson, Étienne Simon, Alex Auvolat, Pascal Vincent, and
any of them explicitly. Annotating a larger dataset of trajectories                      Yoshua Bengio. 2015. Artificial neural networks applied to taxi destination
based on their pattern of anomaly could lead to further insights                         prediction. In Proceedings of the 2015th International Conference on ECML
on the performance of our models in future work.                                         PKDD Discovery Challenge-Volume 1526. 40–51.
                                                                                     [9] Nicola Di Mauro and Stefano Ferilli. 2018. Unsupervised LSTMs-based Learn-
                                                                                         ing for Anomaly Detection in Highway Traffic Data. In International Sympo-
6    CONCLUSION                                                                          sium on Methodologies for Intelligent Systems. Springer, 281–290.
                                                                                    [10] Vitor Cunha Fontes, Lucas Andre de Alencar, Chiara Renso, Vania Bogorny,
In this work we introduce a hybrid architecture, which combines                          and It Pisa. 2013. Discovering Trajectory Outliers between Regions of Interest.
a sequential denoising autoencoder with a density-based out-                             In GeoInfo. Citeseer, 49–60.
lier detection algorithm, for trajectory anomaly detection. We                      [11] Manish Gupta, Jing Gao, Charu C Aggarwal, and Jiawei Han. 2013. Outlier
                                                                                         detection for temporal data: A survey. IEEE Transactions on Knowledge and
propose two variants of sequential models, a denoising LSTM                              data Engineering 26, 9 (2013), 2250–2267.
autoencoder and a denoising Seq2Seq autoencoder. They are both                      [12] Thomas Hoch. 2015. An ensemble learning approach for the Kaggle taxi travel
                                                                                         time prediction challenge. In Proceedings of the 2015th International Conference
trained by optimizing a Haversine distance-based weighted loss                           on ECML PKDD Discovery Challenge-Volume 1526. 52–62.
function, in order to take into account the overall distance cov-                   [13] Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory.
ered in a trajectory. During the detection phase, the Local Outlier                      Neural computation 9, 8 (1997), 1735–1780.
                                                                                    [14] Diederik P Kingma and Jimmy Ba. 2014. Adam: A method for stochastic
Factor algorithm is applied on the sequences of reconstruction                           optimization. arXiv preprint arXiv:1412.6980 (2014).
errors of unseen trajectory data. The evaluation is conducted                       [15] Jae-Gil Lee, Jiawei Han, and Xiaolei Li. 2008. Trajectory outlier detection: A
on four different types of synthetic data, generated by altering                         partition-and-detect framework. In 2008 IEEE 24th International Conference on
                                                                                         Data Engineering. IEEE, 140–149.
a small portion of the trajectories of our test set. We show that                   [16] Xiaolei Li, Jiawei Han, Sangkyum Kim, and Hector Gonzalez. 2007. Roam:
the hybrid architectures outperform other anomaly detection                              Rule-and motif-based anomaly detection in massive moving object data sets.
                                                                                         In Proceedings of the 2007 SIAM International Conference on Data Mining. SIAM,
methods. Regarding the proposed loss function, results for both                          273–284.
detection methods (AVG and LOF) are in most cases improved                          [17] Xiaolei Li, Zhenhui Li, Jiawei Han, and Jae-Gil Lee. 2009. Temporal outlier
by the introduction of the weighted loss function.                                       detection in vehicle traffic data. In 2009 IEEE 25th International Conference on
                                                                                         Data Engineering. IEEE, 1319–1322.
   In the future, we intend to apply our methodology in datasets                    [18] Weining Lu, Yu Cheng, Cao Xiao, Shiyu Chang, Shuai Huang, Bin Liang, and
of different trajectory length and type, including bike sharing                          Thomas Huang. 2017. Unsupervised sequential outlier detection with deep
data from the i-CHANGE platform [1], and to collaborate with                             architectures. IEEE transactions on image processing 26, 9 (2017), 4321–4330.
                                                                                    [19] Joao Mendes-Moreira and Luıs Moreira-Matias. 2015. On learning from taxi-
domain experts to obtain annotations for anomalous trajectories.                         GPS traces. In Proceedings of the 2015th International Conference on ECML
We also intend to test other variants of loss functions during                           PKDD Discovery Challenge-Volume 1526. CEUR-WS. org, 37–39.
                                                                                    [20] Ali H Mirza and Selin Cosan. 2018. Computer network intrusion detection
training and also test transfer learning approaches across dif-                          using sequential LSTM neural networks autoencoders. In 2018 26th signal
ferent datasets and domains as well as to try out different deep                         processing and communications applications conference (SIU). IEEE, 1–4.
learning architectures.                                                             [21] Xavier Olive, Jeremy Grignard, Thomas Dubot, and Julie Saint-Lot. 2018.
                                                                                         Detecting Controllers’ Actions in Past Mode S Data by Autoencoder-Based
                                                                                         Anomaly Detection.
ACKNOWLEDGMENTS                                                                     [22] Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel,
                                                                                         Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter Prettenhofer, Ron
This research was co-financed by the European Regional De-                               Weiss, Vincent Dubourg, et al. 2011. Scikit-learn: Machine learning in Python.
velopment Fund of the European Union and Greek national                                  the Journal of machine Learning research 12 (2011), 2825–2830.
funds through the Operational Program Competitiveness, En-                          [23] Claudio Piciarelli, Christian Micheloni, and Gian Luca Foresti. 2008. Trajectory-
                                                                                         based anomalous event detection. IEEE Transactions on Circuits and Systems
trepreneurship and Innovation, under the call RESEARCH – CRE-                            for video Technology 18, 11 (2008), 1544–1554.
ATE – INNOVATE (project code:T1EDK-04582).                                          [24] Alberto Rossi, Gianni Barlacchi, Monica Bianchini, and Bruno Lepri. 2019.
                                                                                         Modelling Taxi Drivers’ Behaviour for the Next Destination Prediction. IEEE
                                                                                         Transactions on Intelligent Transportation Systems (2019).
REFERENCES                                                                          [25] Pankaj Raj Roy and Guillaume-Alexandre Bilodeau. 2018. Road user abnormal
 [1] Lazaros Apostolidis, Symeon Papadopoulos, Maria Liatsikou, Ioannis Fyroge-          trajectory detection using a deep autoencoder. In International Symposium on
     nis, Efthymis Papadopoulos, George Keikoglou, Konstantinos Alexiou, Nasos           Visual Computing. Springer, 748–757.
     Chondros, Ioannis Kompatsiaris, and Ioannis Politis. 2020. i-CHANGE: A Plat-   [26] Ilya Sutskever, Oriol Vinyals, and Quoc V Le. 2014. Sequence to sequence
     form for Managing Dockless Bike Sharing Systems. In International Conference        learning with neural networks. In Advances in neural information processing
     on Computational Science and Its Applications. Springer, 851–867.                   systems. 3104–3112.
 [2] Javed Ashraf, Asim D Bakhshi, Nour Moustafa, Hasnat Khurshid, Abdullah         [27] Adam Tsakalidis and Maria Liakata. 2020. Sequential Modelling of the Evolu-
     Javed, and Amin Beheshti. 2020. Novel Deep Learning-Enabled LSTM Au-                tion of Word Representations for Semantic Change Detection. In Proceedings
     toencoder Architecture for Discovering Anomalous Events From Intelligent            of the 2020 Conference on Empirical Methods in Natural Language Processing
     Transportation Systems. IEEE Transactions on Intelligent Transportation Sys-        (EMNLP). 8485–8497.
     tems (2020).                                                                   [28] Pascal Vincent, Hugo Larochelle, Isabelle Lajoie, Yoshua Bengio, Pierre-
 [3] Kiran Bhowmick and Meera Narvekar. 2018. Trajectory Outlier Detection               Antoine Manzagol, and Léon Bottou. 2010. Stacked denoising autoencoders:
     for Traffic Events: A Survey. In Intelligent Computing and Information and          Learning useful representations in a deep network with a local denoising
     Communication. Springer Singapore, Singapore, 37–46.                                criterion. Journal of machine learning research 11, 12 (2010).
                                                                                    [29] Daqing Zhang, Nan Li, Zhi-Hua Zhou, Chao Chen, Lin Sun, and Shijian Li. 2011.
                                                                                         iBAT: detecting anomalous taxi trajectories from GPS traces. In Proceedings of
                                                                                         the 13th international conference on Ubiquitous computing. 99–108.

</pre>