=Paper=
{{Paper
|id=Vol-2841/BMDA_3
|storemode=property
|title=Local Anomaly Detection In Maritime Traffic Using Visual Analytics
|pdfUrl=https://ceur-ws.org/Vol-2841/BMDA_3.pdf
|volume=Vol-2841
|authors=Fernando Henrique Oliveira Abreu,Amílcar Soares,Fernando V. Paulovich,Stan Matwin
|dblpUrl=https://dblp.org/rec/conf/edbt/AbreuSPM21
}}
==Local Anomaly Detection In Maritime Traffic Using Visual Analytics==
Fernando H. O. Abreu, Dalhousie University, Halifax, NS, Canada, fernando.abreu@dal.ca

Amilcar Soares, Memorial University of Newfoundland, St. John's, NL, Canada, amilcarsj@mun.ca

Fernando V. Paulovich, Dalhousie University, Halifax, NS, Canada, paulovich@dal.ca

Stan Matwin, Dalhousie University, Halifax, NS, Canada, stan@cs.dal.ca

© 2021 Copyright for this paper by its author(s). Published in the Workshop Proceedings of the EDBT/ICDT 2021 Joint Conference (March 23–26, 2021, Nicosia, Cyprus) on CEUR-WS.org. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Figure 1: Overview of the Trip Outlier Scoring Tool (TOST). The user uses the Score computation component (A) to control which spatial regions and attributes will be used in the score. The trip scores are visualized in the Trip Score component (C), where the user can filter and sort the data and select a trip trajectory to be displayed on the map (B).

===ABSTRACT===
With the recent increase in sea transportation usage, the importance of maritime surveillance for detecting unusual vessel behavior related to several illegal activities has also risen. Unfortunately, the data collected by surveillance systems are often incomplete, creating a need for the data gaps to be filled using techniques such as interpolation methods. However, such approaches do not decrease the uncertainty of ship activities. Depending on the frequency of the data generated, they may even confuse operators, leading them to errors when evaluating ship activities to tag them as unusual. Using domain knowledge to classify activities as anomalous is essential in the maritime navigation environment since there is a well-known lack of labeled data in this domain. In an area where finding which trips are anomalous is a challenging task when using solely automatic approaches, we use visual analytics to bridge this gap. In this work, we propose a tool that uses spatial regions to divide trips into subtrajectories and score them. The scores are displayed in a tabular visualization where users can rank trips by segment to find local anomalies. The amount of interpolation in subtrajectories is displayed together with the scores, and the trip is displayed on the map so users can use their insight to judge whether the score is reliable.

===1 INTRODUCTION===
Maritime transportation is essential nowadays; about 90 percent of everything traded in the world is carried by sea [11]. Since 2004, vessels of 300 gross tonnage or more that travel internationally, and cargo ships of 500 gross tonnage or more, have been obligated by the International Maritime Organization (IMO) to carry an Automatic Identification System (AIS) onboard (see http://www.imo.org/en/OurWork/Safety/Navigation/Pages/AIS.aspx), which produces a constant high volume of data [14]. This technology transmits the vessel destination, speed, position, and many other items of static information, such as ship name and Maritime Mobile Service Identity (MMSI), which is used to identify a ship uniquely [11].

Defence Research and Development Canada (DRDC) and surveillance authorities, such as the Coastal Marine Security Operation Centres (MSOCs), which are responsible for guaranteeing coastal safety, have an interest in using this data to uncover several potential issues [5], such as illegal transport of drugs, human trafficking, fishing in illegal areas, illegal immigration, sea pollution, piracy, and even terrorism [1]. These activities have a significant impact on society, the environment, and the economy, so it is essential to identify these types of events as soon as possible [16]. Vessels involved in these types of illegal activities usually follow specific patterns, like unexpected stops, speeding, and deviations from standard routes [1, 11]. Ships that are operating legally commonly travel through the same route due to regulations and because it is usually the shortest path between ports, which decreases the vessel's fuel consumption. For this reason, ships that navigate non-standard routes or show signs of route deviations can potentially be labeled as presenting anomalous behavior [1]. However, identifying which trips are anomalous is not an easy task for maritime operators due to the large volume of data produced by AIS systems, which creates an overload of instances to be analyzed manually. Currently, operators usually use systems that display vessels on a world map, which they can use to track vessel movements [6]. Although this can help operators reach some awareness of what is going on at sea, identifying anomalous vessels among a large number of normal vessels can prove difficult [5].

Many works focus on finding anomalies in an automated manner, such as [7], [11] and [20], which use different clustering techniques to extract a group of trajectories with similar behavior. Then other methods are used to classify the trajectories. However, the problem of automatically identifying anomalies is very complex and not well-defined [13]; additionally, it requires dynamic adaptation since humans will always try to change their modus operandi to avoid getting caught, which in turn makes automatic systems less reliable [12]. Thus, systems that automatically detect anomalies are rarely used in the real world [12, 13]. On the other hand, visualizations make use of humans' inherent ability to perceive patterns and filter information in combination with their creativity and background knowledge [8, 13], which allows them to analyze and understand complex, massive, and dynamic data.

Some known works in the field, such as [13] and [5], use a combination of visualization and automated techniques to aid the user when trying to identify anomalies. However, the vast majority of algorithms proposed to identify anomalies automatically may not work for local anomalies [18], or they require labeled data to train a model [4, 15]. This means that deviations from normality that happen in just a small portion of a vessel trajectory may be left out when considering the trajectory as a whole, especially when analyzing works in the maritime domain. The only work we found that could partially address this issue is [17]. Their method chooses N equally spatially distributed sample points for trips, and then it classifies as anomalous the routes with low probabilistic density points. However, this work may miss local anomalies depending on the number of samples chosen, while ours uses all trajectory points. Their tool only works for positional data, while we use several attributes.

Lastly, vessel trajectories built from raw AIS data can be faulty and incomplete for multiple reasons. First, one of the frequencies used by AIS transceivers is Very High Frequency (VHF), which makes AIS data unreliable [19]. Second, Vessel Traffic Service (VTS) stations may miss several AIS messages from vessels traveling close to the coast due to information overloading [10]. Third, even though Satellite AIS has become more common since it can capture longer ranges than shore-based AIS, it is common for the data it receives to have gaps. Finally, there are also cases where vessel crews interfere with the AIS signal or turn the transponder off to cover illegal activities [9]. For this reason, vessel trajectories often need to be interpolated, which can increase algorithm accuracy [3]. However, anomalies found in interpolated data may be incorrect if the interpolation was not done correctly or when many consecutive data points are missing. Therefore, if an anomaly is detected in the interpolated region of a trajectory, it is important to present information related to the interpolation, such as its quality or the interpolation itself, so one can assess whether the interpolation was done properly and whether it is indeed an anomaly. The user could also further investigate what could have happened when there was no signal. However, to our knowledge, there is no work in this field that allows users to explore the potential impact of interpolation on anomalies.

In this paper, we propose a tool that aims to tackle the problems mentioned above. We make very few assumptions about who the users of this tool could be. This paper contributes the proposal and development of a visual analytics tool for finding local anomalies in trip trajectories while also taking into account the trip's interpolation. Section 2 describes the proposed tool and discusses some of the decisions that were made. In Section 3, we show a use case of our tool. Finally, in Section 4, we present a summary of this work, discuss some of our tool's limitations, and propose some ideas for future work.

===2 TRIP OUTLIER SCORING TOOL (TOST)===
As mentioned previously, this work aims to develop a tool for identifying local anomalies in trip trajectories while also providing users with some information about the interpolation, such as where and how it happened and how much interpolation there is on the trajectory. In this work, a trip is defined by the sequence of a vessel's AIS messages when traveling from one port to another. A spatial region can be defined as a 2-dimensional geographic polygon. In this work, we create the regions automatically for the user by building a minimal box containing all points of all trajectories that traveled between two specific ports and then dividing it into N spatial regions of the same area. Finally, a subtrajectory is a sequence of points of a trajectory contained in a spatial region.

Figure 2 shows an overview of our framework's steps. It is composed of a preprocessing step that combines two sources of AIS data to get the trips' information. Trips that do not share the same origin and destination are removed. The remaining trips go through a cleaning process where invalid data, such as outlier points, are removed, and gaps are interpolated. We then create spatial regions that serve the purpose of partitioning each trip trajectory into subtrajectories. Each subtrajectory attribute, such as average speed, is given a score based on how much it deviates from the mean of that attribute over all other trips; the combined final score for each subtrajectory is then displayed in a tabular visualization. Each trip is represented as a row in the table, where the first column may show the maximum or average score for a trip, depending on the user's selection. The other columns show the subtrajectory scores, which are represented by the length of a bar, while the color of the bar shows the amount of interpolation in the subtrajectory.

We first display an overview of the overall maritime situation in the table. The users can then use filters to remove uninteresting data, so the table shows only trips of interest. They can hover over or select an individual row to see the scores and interpolation values of a trip. By clicking on a row, the trip trajectory will be displayed on the map. The user can then compare the trip trajectory against the mean trajectory to see if there were any deviations and if the interpolation was done correctly. The user can also choose which attributes and spatial regions should be used during the score computation, which will update the subtrajectory scores.

Figure 2: Overview of the framework of the Trip Outlier Scoring Tool. [Diagram: raw data (positional data and voyage data, .csv) enter a preprocessing stage made of (1) Integration (reads raw data and populates the DB), (2) Cleaning (invalid data removal, interpolation, attributes calculation), (3) Segmentation (creates spatial regions), and (4) Feature Extraction (calculates trip values for each segment, such as average speed and average heading). The resulting interpolated trip data, spatial regions, and subtrajectory features go to a web server that calculates the median route and the scores for each subtrajectory and feature, and sends them as .json to the visualization, which provides score aggregation, route visualization, and trip ranking.]

Our tool has three main components: the Score computation (A), a map (B), and the Trip Score table (C), as shown in Figure 1. The Score computation allows the users to choose which spatial regions and attributes they want to use to compute the scores for each trip subtrajectory. As an aggregate final score for each trip, we may show the highest score, which is the highest value amongst all trip subtrajectories, or the average score of the trip subtrajectories. In order to calculate a subtrajectory score, we first calculate the z-score for each attribute selected by the user. Then these values are summed together and divided by the number of attributes. When calculating a subtrajectory attribute z-score, the population comprises all other subtrajectories created by the same spatial region for trips with the same origin and destination ports.

The Map was created to display the previously created regions as well as trip trajectories. It is displayed with a zoom on the region containing the two ports. Since we want the user to differentiate the original points from the ones that were created by the interpolation, we distinguish them by color. The black portion of the trajectory was created from the original data points, while the red portion was interpolated, as can be seen in Figure 5. We also display a mean trajectory on the map, representing a path that a trip should make. This trajectory is calculated using a function of the tool created by Eerland et al. [2].

In the Score Table, each row represents a trip. For each column, there is a bar whose length represents the subtrajectory aggregated score and whose color represents the percentage of interpolated points. The bars' height is dynamic; it changes based on how many trips are being displayed at a given time. A longer bar may indicate a higher deviation from normality since our score is derived from the z-score. Longer bars also stand out in comparison to shorter bars. The interpolation is displayed as a gradient from blue to red. The exact scores and interpolation values for a trip, as well as the trip id, can be seen at the bottom of the table when a user hovers over a row with the mouse. At the top of the table, we show the distribution of each region's scores as purple bars. This visualization has two purposes. First, the user can brush the region to filter out uninteresting vessels, decreasing the number of vessels displayed in the table, which could improve the table's visibility. Second, showing the distribution may reveal a spatial region with a higher number of outliers than others or a region where the outliers have a much higher score.

===3 A USE CASE===
In this use case, we exemplify the use of TOST (https://gitlab.com/Fernando-Abreu/thesis_project) for finding speed anomalies far from shore. The dataset used includes trips of cargo ships that traveled from Houston to New Orleans from 2009 to 2018. We first use the Score Computation (see Figure 1(A)) to select only regions 5, 6, and 7, and we select only the average speed attribute, which is the main target of this analysis. Other options for regions could also have been chosen by clicking on the yellow regions on the map (see Figure 1(B)). Clicking on those controls recomputes the scores and updates the visualization to display only the regions of interest. Next, we choose whether the first column displays the highest score or the average score. Since we want to highlight trips that may show outlier behavior even in only a single region, we chose the highest score. Given that many trips are being displayed, we filter out trips with a score below 2.5 by brushing the score distribution in the Highest Score column. This could also have been accomplished by inputting this value manually after clicking "show filters", which is useful when high precision is necessary. The updated trip score table can be seen in Figure 3. By looking at the filtered trips, we can see that most subtrajectories have some degree of interpolation, especially in region 7, which may indicate that it is a region where the terrestrial tower cannot capture the AIS messages.

Figure 3: Trip scores filtered to show only trips with score above 2.5

Afterwards, we rank the trips by the highest score and hover the mouse over the top-ranked row, which holds the subtrajectory with the highest score, to see that trip's scores. This score belongs to the trip with id 2187, as can be seen in Figure 4. Trip 2187 has a high score, especially in regions 6 and 7. We can also see that in region 7 all points are interpolated, which indicates that this score is not reliable since the region has a considerable size. If we click on the row to plot this trip trajectory on the map, we can see that this interpolation does not seem reliable; thus, the score for this subtrajectory cannot be trusted. After plotting, the expert should consider whether this gap size makes sense or whether this trip needs further investigation.

Figure 4: Trip Scores with trip with highest subtrajectory score selected. Trips ranked 1 and 10 are highlighted

Figure 5: Trip 2187 trajectory

Another example is trip 339, which is ranked 10th in our selection. When we look at the table, we can see that although the tool added some interpolated points to the subtrajectories in regions 6 and 7, it is region 5 that shows outlier behavior. When we hover over this row, we see that region 5 had 0 percent interpolation and a score of 3.28. Therefore, this score is very reliable, and the user could frame this as outlier behavior. If the expert decides to take a closer look at the data, they can see that this trip had an average speed of 5.93 knots in region 5, while the average speed in that particular region is 15.69 knots with a standard deviation of 3.24. Now it is the expert's job to try to understand why the vessel navigated so slowly in that region compared to other vessels. The conclusion of the investigation could point to engine issues or to unregulated or illegal activity associated with the vessel.

===4 CONCLUSION===
In this work, we identified local anomalies using a combination of features and used an interpolation strategy to give the user a certain degree of reliability for the anomaly. We achieved this goal by proposing and developing a web tool that partitions and scores each subtrajectory regarding its attributes. Users can interact with this tool through filtering and sorting to find trips with local anomalies. They can also plot trip trajectories on the map and identify which portions of a trajectory were interpolated.

Future work includes using a clustering algorithm to group trips with similar trajectories in order to compare the same class of vessels and have a more fine-grained analysis. We also intend to add a page that allows the users to choose between creating the spatial regions automatically or manually. If the user chooses to create them manually, the user should be able to draw spatial regions on a map using the map's drawing tools. Otherwise, the tool will create regions based on trajectory patterns or using trajectory segmentation methods.

===REFERENCES===
[1] Enrica d'Afflisio, Paolo Braca, Leonardo M Millefiori, and Peter Willett. 2018. Detecting anomalous deviations from standard maritime routes using the Ornstein–Uhlenbeck process. IEEE Transactions on Signal Processing 66, 24 (2018), 6474–6487.

[2] Willem Eerland, Simon Box, Hans Fangohr, and András Sóbester. 2017. Teetool–a probabilistic trajectory analysis tool. Journal of Open Research Software 5, 1 (2017).

[3] Dini Oktarina Dwi Handayani, Wahju Sediono, and Asadullah Shah. 2013. Anomaly detection in vessel tracking using support vector machines (SVMs). In 2013 International Conference on Advanced Computer Science Applications and Technologies. IEEE, 213–217.

[4] Amílcar Soares Júnior, Chiara Renso, and Stan Matwin. 2017. Analytic: An active learning system for trajectory classification. IEEE Computer Graphics and Applications 37, 5 (2017), 28–39.

[5] Valérie Lavigne. 2014. Interactive visualization applications for maritime anomaly detection and analysis. In ACM SIGKDD Workshop on Interactive Data Exploration and Analytics. 75.

[6] Etienne Martineau and Jean Roy. 2011. Maritime anomaly detection: Domain introduction and review of selected literature. Technical Report. Defence Research and Development Canada Valcartier (Quebec).

[7] Steven Mascaro, Ann E Nicholson, and Kevin B Korb. 2014. Anomaly detection in vessel tracks using Bayesian networks. International Journal of Approximate Reasoning 55, 1 (2014), 84–98.

[8] Lucas May Petry, Amilcar Soares, Vania Bogorny, Bruno Brandoli, and Stan Matwin. 2020. Challenges in Vessel Behavior and Anomaly Detection: From Classical Machine Learning to Deep Learning. In Advances in Artificial Intelligence, Cyril Goutte and Xiaodan Zhu (Eds.). Springer International Publishing, Cham, 401–407.

[9] Fabio Mazzarella, Michele Vespe, Alfredo Alessandrini, Dario Tarchi, Giuseppe Aulicino, and Antonio Vollero. 2017. A novel anomaly detection approach to identify intentional AIS on-off switching. Expert Systems with Applications 78 (2017), 110–123.

[10] Van-Suong Nguyen, Nam-kyun Im, and Sang-min Lee. 2015. The interpolation method for the missing AIS data of ship. Journal of Navigation and Port Research 39, 5 (2015), 377–384.

[11] Giuliana Pallotta, Michele Vespe, and Karna Bryan. 2013. Vessel pattern knowledge discovery from AIS data: A framework for anomaly detection and route prediction. Entropy 15, 6 (2013), 2218–2245.

[12] Maria Riveiro and Göran Falkman. 2011. The role of visualization and interaction in maritime anomaly detection. In Visualization and Data Analysis 2011, Vol. 7868. International Society for Optics and Photonics, 78680M.

[13] Maria Riveiro, Göran Falkman, Tom Ziemke, and Håkan Warston. 2009. VISAD: an interactive and visual analytical tool for the detection of behavioral anomalies in maritime traffic data. In Visual Analytics for Homeland Defense and Security, Vol. 7346. International Society for Optics and Photonics, 734607.

[14] Amílcar Soares, Renata Dividino, Fernando Abreu, Matthew Brousseau, Anthony W Isenor, Sean Webb, and Stan Matwin. 2019. CRISIS: integrating AIS and ocean data streams using semantic web standards for event detection. In 2019 International Conference on Military Communications and Information Systems (ICMCIS). IEEE, 1–7.

[15] Amílcar Soares, Jordan Rose, Mohammad Etemad, Chiara Renso, and Stan Matwin. 2019. VISTA: A visual analytics platform for semantic annotation of trajectories. In EDBT. 570–573.

[16] Iraklis Varlamis, Ioannis Kontopoulos, Konstantinos Tserpes, Mohammad Etemad, Amilcar Soares, and Stan Matwin. 2020. Building navigation networks from multi-vessel trajectory data. GeoInformatica (2020). https://doi.org/10.1007/s10707-020-00421-y

[17] Guizhen Wang, Abish Malik, Calvin Yau, Chittayong Surakitbanharn, and David S Ebert. 2017. TraSeer: A visual analytics tool for vessel movements in the coastal areas. In 2017 IEEE International Symposium on Technologies for Homeland Security (HST). IEEE, 1–6.

[18] Wanqi Yang, Yang Gao, and Longbing Cao. 2013. TRASMIL: A local anomaly detection framework based on trajectory segmentation and multi-instance learning. Computer Vision and Image Understanding 117, 10 (2013), 1273–1286.

[19] Daiyong Zhang, Jia Li, Qing Wu, Xinglong Liu, Xiumin Chu, and Wei He. 2017. Enhance the AIS data availability by screening and interpolation. In 2017 4th International Conference on Transportation Information and Safety (ICTIS). IEEE, 981–986.

[20] Rong Zhen, Yongxing Jin, Qinyou Hu, Zheping Shao, and Nikitas Nikitakos. 2017. Maritime anomaly detection within coastal waters based on vessel trajectory clustering and Naïve Bayes Classifier. The Journal of Navigation 70, 3 (2017), 648.
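
As an illustration of the score computation described in Section 2 (a z-score per selected attribute, averaged over the attributes, then aggregated per trip as the highest or the average subtrajectory score), the following is a minimal Python sketch and not the tool's actual code. All function names are hypothetical. It also makes three assumptions that the text does not confirm: the z-score is taken in absolute value so that slow and fast outliers both yield long bars, the population standard deviation is used, and "all other subtrajectories" is read as excluding the trip being scored.

```python
from statistics import mean, pstdev


def attribute_z_score(value, others):
    """Absolute z-score of `value` against the other subtrajectories of the
    same spatial region (assumption: leave-one-out population, absolute value)."""
    if len(others) < 2:
        return 0.0  # too few other trips to estimate a spread
    sigma = pstdev(others)
    if sigma == 0:
        return 0.0  # all other trips are identical; no meaningful deviation
    return abs(value - mean(others)) / sigma


def region_scores(attributes_by_trip, selected_attributes):
    """Score every trip's subtrajectory in one spatial region.

    attributes_by_trip: {trip_id: {attribute_name: value}} for a single region
    and a single origin/destination pair.
    Returns {trip_id: score}, where the score is the sum of the per-attribute
    z-scores divided by the number of selected attributes.
    """
    scores = {}
    for trip_id, attrs in attributes_by_trip.items():
        z_values = []
        for name in selected_attributes:
            others = [a[name] for t, a in attributes_by_trip.items() if t != trip_id]
            z_values.append(attribute_z_score(attrs[name], others))
        scores[trip_id] = sum(z_values) / len(z_values)
    return scores


def trip_score(subtrajectory_scores, mode="highest"):
    """Aggregate one trip's per-region scores as the highest or average score."""
    values = list(subtrajectory_scores.values())
    return max(values) if mode == "highest" else mean(values)


def interpolation_percentage(interpolated_flags):
    """Percentage of points in a subtrajectory that were interpolated."""
    if not interpolated_flags:
        return 0.0
    return 100.0 * sum(interpolated_flags) / len(interpolated_flags)


# Usage with made-up numbers: four trips crossing the same region.
region_5 = {
    "a": {"avg_speed": 10.0},
    "b": {"avg_speed": 12.0},
    "c": {"avg_speed": 14.0},
    "d": {"avg_speed": 30.0},
}
scores_5 = region_scores(region_5, ["avg_speed"])
top_trip = max(scores_5, key=scores_5.get)
```

With these illustrative values, trip `d` stands out because its average speed is far from that of the other three trips. In the tool, such a score would be shown as a long bar, and the interpolation percentage would set the bar's color so the user can judge whether the score is trustworthy.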