=Paper= {{Paper |id=Vol-2841/BMDA_3 |storemode=property |title=Local Anomaly Detection In Maritime Traffic Using Visual Analytics |pdfUrl=https://ceur-ws.org/Vol-2841/BMDA_3.pdf |volume=Vol-2841 |authors=Fernando Henrique Oliveira Abreu,Amílcar Soares,Fernando V. Paulovich,Stan Matwin |dblpUrl=https://dblp.org/rec/conf/edbt/AbreuSPM21 }} ==Local Anomaly Detection In Maritime Traffic Using Visual Analytics== https://ceur-ws.org/Vol-2841/BMDA_3.pdf
Local Anomaly Detection In Maritime Traffic Using Visual Analytics

Fernando H. O. Abreu, Dalhousie University, Halifax, NS, Canada, fernando.abreu@dal.ca
Amilcar Soares, Memorial University of Newfoundland, St. John’s, NL, Canada, amilcarsj@mun.ca
Fernando V. Paulovich, Dalhousie University, Halifax, NS, Canada, paulovich@dal.ca
Stan Matwin, Dalhousie University, Halifax, NS, Canada, stan@cs.dal.ca




Figure 1: Overview of the Trip Outlier Scoring Tool (TOST). The user uses the Score computation component (A) to control which spatial regions and attributes will be used in the score. The trip scores are visualized in the Trip Score component (C), where the user can filter and sort the data and select a trip trajectory to be displayed on the map (B).

ABSTRACT
With the recent increase in sea transportation usage, the importance of maritime surveillance for detecting unusual vessel behavior related to several illegal activities has also risen. Unfortunately, the data collected by the surveillance systems are often incomplete, creating a need for the data gaps to be filled using techniques such as interpolation methods. However, such approaches do not decrease the uncertainty of ship activities. Depending on the frequency of the data generated, they may even confuse operators, leading them into errors when evaluating ship activities to tag them as unusual. Using domain knowledge to classify activities as anomalous is essential in the maritime navigation environment since there is a well-known lack of labeled data in this domain. In an area where finding which trips are anomalous is a challenging task when using solely automatic approaches, we use visual analytics to bridge this gap. In this work, we propose a tool that uses spatial regions to divide trips into subtrajectories and score them. The scores are displayed in a tabular visualization where users can rank trips by segment to find local anomalies. The amount of interpolation in subtrajectories is displayed together with the scores, and the trip is displayed on the map so users can use their insight to judge whether the score is reliable.

© 2021 Copyright for this paper by its author(s). Published in the Workshop Proceedings of the EDBT/ICDT 2021 Joint Conference (March 23–26, 2021, Nicosia, Cyprus) on CEUR-WS.org. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0)

1    INTRODUCTION
Maritime transportation is essential nowadays; about 90 percent of everything traded in the world is done by sea [11]. Since 2004, vessels of 300 gross tonnage or more that travel internationally, and cargo ships of 500 gross tonnage or more, are obligated by the International Maritime Organization (IMO) to have an Automatic Identification System (AIS) onboard¹, which produces a constant high volume of data [14]. This technology transmits the vessel destination, speed, position, and many other items of static information, such as ship name and Maritime Mobile Service Identity (MMSI), which is used to identify a ship uniquely [11].

¹ http://www.imo.org/en/OurWork/Safety/Navigation/Pages/AIS.aspx
    The Department of Defense of Canada (DRDC) and surveillance authorities, such as Coastal Marine Security Operation Centres (MSOCs), which are responsible for guaranteeing coastal safety, have an interest in using this data to uncover several potential issues [5], such as illegal transport of drugs, human trafficking, fishing in illegal areas, illegal immigration, sea pollution, piracy, and even terrorism [1]. These activities have a significant impact on society, environment, and economy, and for this reason, it is essential to identify these types of events as soon as possible [16]. Vessels involved in these types of illegal activities usually follow specific patterns like unexpected stops, speeding, and deviations from standard routes [1, 11]. Ships that are operating legally commonly travel through the same route due to regulations and because it is usually the shortest path between ports, which decreases the vessel fuel consumption. For this reason, ships that navigate non-standard routes or show signs of route deviations can be potentially labeled as presenting anomalous behavior [1]. However, identifying which trips are anomalous is not an easy task for maritime operators due to the large volume of data produced by AIS systems, which creates an overload of instances to be analyzed manually. Currently, operators usually use systems that display vessels on a world map that they can use to track their movements [6]. Although this can help operators reach some awareness of what is going on in the sea, it can prove difficult to identify anomalous vessels among a large number of normal vessels [5].

    Many works focus on finding anomalies in an automated manner, such as [7], [11], and [20], which use different clustering techniques to extract a group of trajectories with similar behavior. Then other methods are used to classify the trajectories. However, the problem of automatically identifying anomalies is very complex and not well-defined [13]; additionally, it requires dynamic adaptation since humans will always try to change their modus operandi to not get caught, which in turn makes automatic systems less reliable [12]. Thus, systems that automatically detect anomalies are rarely used in the real world [12, 13]. On the other hand, visualizations make use of humans’ inherent ability to perceive patterns and filter information in combination with their creativity and background knowledge [8, 13], which allows them to analyze and understand complex, massive, and dynamic data.

    Some known works in the field, such as [13] and [5], use a combination of visualization and automated techniques to aid the user when trying to identify anomalies. However, the vast majority of algorithms proposed to identify anomalies automatically may not work for local anomalies [18], or they require labeled data to train a model [4, 15]. This means that deviations from normality that happen just in a small portion of a vessel trajectory may be left out when considering the trajectory as a whole, especially when analyzing works in the maritime domain. The only work we found that could partially address this issue is [17]. Their method chooses N equally spatially distributed sample points for trips, and then it classifies routes with low-probability-density points as anomalous. However, this work may miss local anomalies depending on the number of samples chosen, while ours uses all trajectory points. Their tool only works for positional data, while we use several attributes.

    Lastly, the raw AIS data used to analyze vessel trajectories can be faulty and incomplete for multiple reasons. First, one of the frequencies used by AIS transceivers is Very High Frequency (VHF), which makes AIS data unreliable [19]. Second, Vessel Traffic Service (VTS) stations may miss several AIS messages from vessels traveling close to the coast due to information overloading [10]. Third, even though Satellite AIS has become more common since it can capture longer ranges than shore-based AIS, it is common for the data received by it to have gaps. Finally, there are also cases where vessel crew interfere with the AIS signal or turn the transponder off to cover illegal activities [9]. For this reason, vessel trajectories often need to be interpolated, which can increase algorithm accuracy [3]. However, the anomalies found in interpolated data may be incorrect if the interpolation was not done correctly or when many consecutive data points are missing. Therefore, it would be important to present information related to interpolation if an anomaly is detected in the interpolated region of a trajectory, such as the quality of that interpolation, or to show the interpolation itself, so one can assess if the interpolation was done properly and if it is indeed an anomaly. The user could also further investigate what could have happened when there was no signal. However, to our knowledge, there is no work in this field that allows users to explore the potential impact of interpolation on anomalies.

    In this paper, we propose a tool that aims to tackle the problems mentioned above. We make very few assumptions about who the users of this tool could be. This paper contributes the proposal and development of a visual analytics tool for finding local anomalies in trip trajectories while also taking into account the trip’s interpolation. Section 2 describes the proposed tool and discusses some of the decisions that were made. In Section 3, we show a use case of our tool. Finally, in Section 4, we present a summary of this work, discuss some of our tool’s limitations, and propose some ideas for future work.

2    TRIP OUTLIER SCORING TOOL (TOST)
As mentioned previously, this work aims to develop a tool for identifying local anomalies in trip trajectories while also providing users some information about the interpolation, such as where and how it happened and how much interpolation there is on the trajectory. In this work, a trip is defined by the sequence of a vessel’s AIS messages when traveling from one port to another. A spatial region can be defined as a 2-dimensional geographic polygon. In this work, we create it automatically for the user by creating a minimal box containing all points of all trajectories that traveled between two specific ports and then dividing it into N spatial regions of the same area. Finally, a subtrajectory is a sequence of points of a trajectory contained in a spatial region.

    Figure 2 shows an overview of our framework’s steps. It is composed of a preprocessing step that combines two sources of AIS data to get trips’ information. Trips that don’t share the same origin and destination are removed. The remaining trips go through a cleaning process where invalid data, such as outlier points, are removed, and gaps are interpolated. We then create spatial regions that serve the purpose of partitioning each trip trajectory into subtrajectories. The subtrajectories’ attributes, such as average speed, are given a score based on how much they deviate from the mean over all other trips’ attribute values; the combined final score for each subtrajectory is then displayed in a tabular visualization. Each trip is represented as a row in the table where the first column may show the maximum or average score for a trip, depending on the user’s selection. The other columns show the subtrajectory scores, which are represented by a bar length, while the color of the bar shows the amount of interpolation in the subtrajectory.
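The region construction and scoring steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the tool's actual implementation: the `Point` class and all function names are hypothetical, the bounding box is cut into N equal-width longitude strips (the text only says the regions have the same area), only the average-speed attribute is scored, the population for each z-score excludes the trip being scored, and the absolute value of the z-score is used so that unusually slow and unusually fast subtrajectories both receive a positive score.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass(frozen=True)
class Point:
    """One trajectory point (hypothetical structure, not the tool's schema)."""
    lon: float
    lat: float
    speed: float                # knots
    interpolated: bool = False  # True if the point was created by interpolation


def bounding_box(trips):
    """Minimal axis-aligned box containing every point of every trip."""
    lons = [p.lon for trip in trips for p in trip]
    lats = [p.lat for trip in trips for p in trip]
    return min(lons), min(lats), max(lons), max(lats)


def region_index(point, box, n_regions):
    """Index of the longitude strip holding the point (assumed region layout)."""
    min_lon, _, max_lon, _ = box
    width = (max_lon - min_lon) / n_regions
    if width == 0:
        return 0
    # Points on the eastern edge of the box belong to the last strip.
    return min(int((point.lon - min_lon) / width), n_regions - 1)


def split_into_subtrajectories(trip, box, n_regions):
    """Group the points of one trip by the region that contains them."""
    subtrajectories = [[] for _ in range(n_regions)]
    for p in trip:
        subtrajectories[region_index(p, box, n_regions)].append(p)
    return subtrajectories


def deviation_score(value, others):
    """Absolute z-score of `value` against the other trips' values (assumption)."""
    if len(others) < 2:
        return 0.0
    spread = pstdev(others)
    return abs(value - mean(others)) / spread if spread else 0.0


def speed_scores(trips, n_regions):
    """Per trip and region: (score, fraction of interpolated points) or None."""
    box = bounding_box(trips)
    subs = [split_into_subtrajectories(t, box, n_regions) for t in trips]
    avg = [[mean(p.speed for p in s) if s else None for s in row] for row in subs]
    table = []
    for i, row in enumerate(subs):
        scored = []
        for r, sub in enumerate(row):
            if not sub:
                scored.append(None)  # the trip has no points in this region
                continue
            others = [avg[j][r] for j in range(len(trips))
                      if j != i and avg[j][r] is not None]
            interpolation = sum(p.interpolated for p in sub) / len(sub)
            scored.append((deviation_score(avg[i][r], others), interpolation))
        table.append(scored)
    return table
```

Each row of the returned table corresponds to one trip and each entry to one spatial region, mirroring the bar length (score) and bar color (interpolation) of the tabular visualization. With several attributes selected, the per-attribute values would be summed and divided by the number of attributes.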
    We first display an overview of the overall maritime situation in the table. The users can then use filters to remove uninteresting data, so it shows only trips of interest. They can hover over or select an individual row to see the scores and interpolation values of a trip. By clicking on a row, the trip trajectory will be displayed on the map. The user can then compare the trip trajectory against the mean trajectory to see if there were any deviations and if the interpolation was done correctly. The user can also choose which attributes and spatial regions should be used during the score computation, which will update the subtrajectory score.

[Figure 2 diagram: Raw Data (positional data and voyage data, .csv) → Preprocessing: (1) Integration (reads raw data and populates DB), (2) Cleaning (invalid data removal, interpolation, attributes calculation), (3) Segmentation (creates spatial regions), (4) Feature Extraction (calculates trip values for each segment: avg speed, avg heading, etc.) → trips’ interpolated data, spatial regions, subtrajectory features → Web Server (calculates median route; calculates scores for each subtrajectory for each feature) → .json → Visualization (score aggregation, route visualization, trip ranking)]

Figure 2: Overview of the framework of the Trip Outlier Scoring Tool

    Our tool has three main components: the Score computation (A), a map (B), and the Trip Score table (C), as shown in Figure 1. The Score computation allows the users to choose which spatial regions and attributes they want to use to compute the scores for each trip subtrajectory. As an aggregate final score for each trip, we may show the highest score, which is the highest value amongst all trip subtrajectories, or the average score of the trip subtrajectories. In order to calculate a subtrajectory score, we first calculate the z-score for each attribute selected by the user. Then these values are summed together and divided by the number of attributes. When calculating a subtrajectory attribute z-score, the population comprises all other subtrajectories created by the same spatial region for trips with the same origin and destination ports.

    The Map was created to display the previously created regions as well as trip trajectories. It is displayed with a zoom on the region containing the two ports. Since we want the user to differentiate the original points from the ones that were created by the interpolation, we distinguish them by color. The black portion of the trajectory was created from the original data points, while the red portion was interpolated, as can be seen in Figure 5. We also display a mean trajectory in the map, representing a path that a trip should make. This trajectory is calculated using a function of the tool created by Eerland et al. [2].

    In the Score Table, each row represents a trip. For each column, there is a bar whose length represents the subtrajectory aggregated score and whose color represents the percentage of interpolated points. The bars’ height is dynamic; it changes based on how many trips are being displayed at a given time. A longer bar may indicate a higher deviation from normality since our score is derived from the z-score. Longer bars also stand out in comparison to smaller bars. The interpolation is displayed as a gradient from blue to red. The exact scores and interpolation values for a trip, as well as the trip id, can be seen at the bottom of the table when a user hovers over a row with the mouse. At the top of the table, we show the distribution of each region’s scores as purple bars. This visualization has two purposes: first, the user can brush the region to filter out uninteresting vessels, thus decreasing the number of vessels displayed in the table, which could improve the table visibility. Second, showing the distribution may reveal a spatial region with a higher number of outliers than others or a region where the outliers have a much higher score.

3    A USE CASE
In this use case, we exemplify the use of TOST² for finding speed anomalies far from shore. The dataset used includes trips of cargo ships that traveled from Houston to New Orleans from 2009 to 2018. We first use the Score Computation (see Figure 1(A)) to select only regions 5, 6, and 7, and we select only the average speed attribute, which is the main target of this analysis. Other options for regions could have also been used by clicking on the yellow regions on the map (see Figure 1(B)). If the user clicks on those controls, these interactions recompute the scores and update the visualization to display only the regions of interest.

² https://gitlab.com/Fernando-Abreu/thesis_project

    Next, we choose whether the first column displays the highest score or the average score. Since we want to highlight trips that may have outlier behavior even in only a single region, we chose the highest score. Given that many trips are being displayed, we filter out trips with a score below 2.5 by brushing the score distribution in the Highest Score column. This could also have been accomplished by inputting this value manually after clicking "show filters", which is useful when high precision is necessary. The updated trip score table can be seen in Figure 3. By looking at the filtered trips, we can see that most subtrajectories have some degree of interpolation, especially in region 7, which may indicate that it is a region where the terrestrial tower cannot capture the AIS messages.

Figure 3: Trip scores filtered to show only trips with score above 2.5

    Afterward, we rank the trajectories by the highest score and hover the mouse over the row of the trip that has the subtrajectory with the highest score to see that trip’s scores. This score belongs to the trip with id 2187, as can be seen in Figure 4. Trip 2187 has a high score, especially in regions 6 and 7. We can also see that in region 7, all points are interpolated, which indicates that this score is not reliable since the region has a considerable size. If we click on the row to plot this trip trajectory in the map, we can see that this interpolation does not seem reliable; thus,
the score for this subtrajectory cannot be trusted. After plotting, the expert should consider whether this gap size makes sense or whether this trip needs further investigation.

Figure 4: Trip Scores with trip with highest subtrajectory score selected. Trips ranked 1 and 10 are highlighted

Figure 5: Trip 2187 trajectory

    Another example is trip 339, which is ranked 10 in our selection. When we look at the table, we can see that although the tool added some interpolated points on subtrajectories in regions 6 and 7, region 5 had an outlier behavior. When we hover over this row, we see that it had 0 percent interpolation and a score of 3.28. Therefore, this score is very reliable, and the user could frame this as an outlier behavior. If the expert decides to have a closer look at the data, they could see that this trip had an average speed of 5.93 knots in region 5, while the average speed in that particular region is 15.69 knots with a 3.24 standard deviation. Now it is the expert’s job to try to understand why the vessel navigated so slowly in that region compared to other vessels. The conclusion of the investigation could point to engine issues or unregulated or illegal activity associated with the vessel.

4    CONCLUSION
In this work, we identified local anomalies using a combination of features and used an interpolation strategy to give the user a measure of how reliable each anomaly is. We achieved this goal by proposing and developing a web tool that partitions and scores each subtrajectory regarding its attributes. Users can interact with this tool through filtering and sorting to find trips with local anomalies. They can also plot trip trajectories on the map and identify which portions of a trajectory were interpolated.

    Future work includes using a clustering algorithm to group trips with similar trajectories to compare the same class of vessels and have a more fine-grained analysis. We also intend to add a page that allows the users to choose between creating the spatial regions automatically or manually. If the user chooses to create them manually, the user should be able to draw spatial regions on a map using drawing tools. Otherwise, the tool will create regions based on trajectory patterns or using trajectory segmentation methods.

REFERENCES
 [1] Enrica d’Afflisio, Paolo Braca, Leonardo M Millefiori, and Peter Willett. 2018. Detecting anomalous deviations from standard maritime routes using the Ornstein–Uhlenbeck process. IEEE Transactions on Signal Processing 66, 24 (2018), 6474–6487.
 [2] Willem Eerland, Simon Box, Hans Fangohr, and András Sóbester. 2017. Teetool–a probabilistic trajectory analysis tool. Journal of Open Research Software 5, 1 (2017).
 [3] Dini Oktarina Dwi Handayani, Wahju Sediono, and Asadullah Shah. 2013. Anomaly detection in vessel tracking using support vector machines (SVMs). In 2013 International Conference on Advanced Computer Science Applications and Technologies. IEEE, 213–217.
 [4] Amílcar Soares Júnior, Chiara Renso, and Stan Matwin. 2017. Analytic: An active learning system for trajectory classification. IEEE Computer Graphics and Applications 37, 5 (2017), 28–39.
 [5] Valérie Lavigne. 2014. Interactive visualization applications for maritime anomaly detection and analysis. In ACM SIGKDD Workshop on Interactive Data Exploration and Analytics. 75.
 [6] Etienne Martineau and Jean Roy. 2011. Maritime anomaly detection: Domain introduction and review of selected literature. Technical Report. Defence Research and Development Canada Valcartier (Quebec).
 [7] Steven Mascaro, Ann E Nicholson, and Kevin B Korb. 2014. Anomaly detection in vessel tracks using Bayesian networks. International Journal of Approximate Reasoning 55, 1 (2014), 84–98.
 [8] Lucas May Petry, Amilcar Soares, Vania Bogorny, Bruno Brandoli, and Stan Matwin. 2020. Challenges in Vessel Behavior and Anomaly Detection: From Classical Machine Learning to Deep Learning. In Advances in Artificial Intelligence, Cyril Goutte and Xiaodan Zhu (Eds.). Springer International Publishing, Cham, 401–407.
 [9] Fabio Mazzarella, Michele Vespe, Alfredo Alessandrini, Dario Tarchi, Giuseppe Aulicino, and Antonio Vollero. 2017. A novel anomaly detection approach to identify intentional AIS on-off switching. Expert Systems with Applications 78 (2017), 110–123.
[10] Van-Suong Nguyen, Nam-kyun Im, and Sang-min Lee. 2015. The interpolation method for the missing AIS data of ship. Journal of Navigation and Port Research 39, 5 (2015), 377–384.
[11] Giuliana Pallotta, Michele Vespe, and Karna Bryan. 2013. Vessel pattern knowledge discovery from AIS data: A framework for anomaly detection and route prediction. Entropy 15, 6 (2013), 2218–2245.
[12] Maria Riveiro and Göran Falkman. 2011. The role of visualization and interaction in maritime anomaly detection. In Visualization and Data Analysis 2011, Vol. 7868. International Society for Optics and Photonics, 78680M.
[13] Maria Riveiro, Göran Falkman, Tom Ziemke, and Håkan Warston. 2009. VISAD: an interactive and visual analytical tool for the detection of behavioral anomalies in maritime traffic data. In Visual Analytics for Homeland Defense and Security, Vol. 7346. International Society for Optics and Photonics, 734607.
[14] Amílcar Soares, Renata Dividino, Fernando Abreu, Matthew Brousseau, Anthony W Isenor, Sean Webb, and Stan Matwin. 2019. CRISIS: integrating AIS and ocean data streams using semantic web standards for event detection. In 2019 International Conference on Military Communications and Information Systems (ICMCIS). IEEE, 1–7.
[15] Amílcar Soares, Jordan Rose, Mohammad Etemad, Chiara Renso, and Stan Matwin. 2019. VISTA: A visual analytics platform for semantic annotation of trajectories. In EDBT. 570–573.
[16] Iraklis Varlamis, Ioannis Kontopoulos, Konstantinos Tserpes, Mohammad Etemad, Amilcar Soares, and Stan Matwin. 2020. Building navigation networks from multi-vessel trajectory data. GeoInformatica (2020). https://doi.org/10.1007/s10707-020-00421-y
[17] Guizhen Wang, Abish Malik, Calvin Yau, Chittayong Surakitbanharn, and David S Ebert. 2017. TraSeer: A visual analytics tool for vessel movements in the coastal areas. In 2017 IEEE International Symposium on Technologies for Homeland Security (HST). IEEE, 1–6.
[18] Wanqi Yang, Yang Gao, and Longbing Cao. 2013. TRASMIL: A local anomaly detection framework based on trajectory segmentation and multi-instance learning. Computer Vision and Image Understanding 117, 10 (2013), 1273–1286.
[19] Daiyong Zhang, Jia Li, Qing Wu, Xinglong Liu, Xiumin Chu, and Wei He. 2017. Enhance the AIS data availability by screening and interpolation. In 2017 4th International Conference on Transportation Information and Safety (ICTIS). IEEE, 981–986.
[20] Rong Zhen, Yongxing Jin, Qinyou Hu, Zheping Shao, and Nikitas Nikitakos. 2017. Maritime anomaly detection within coastal waters based on vessel trajectory clustering and Naïve Bayes Classifier. The Journal of Navigation 70, 3 (2017), 648.