Uncovering vessel movement patterns from AIS data with graph evolution analysis Emanuele Carlini Vinicius Monteiro de Lira Amilcar Soares Institute of Information Institute of Information Institute for Big Data Analytics Science and Technologies, Science and Technologies, Dalhousie University National Research Council (CNR) National Research Council (CNR) Halifax, Canada Pisa, Italy Pisa, Italy amilcar.soares@dal.ca emanuele.carlini@isti.cnr.it vinicius.monteirodelira@isti.cnr.it Mohammad Etemad Bruno Brandoli Machado Stan Matwin Institute for Big Data Analytics Institute for Big Data Analytics Institute for Big Data Analytics Dalhousie University Dalhousie University Dalhousie University Halifax, Canada Halifax, Canada Halifax, Canada etemad@dal.ca brunobrandoli@dal.ca stan@cs.dal.ca ABSTRACT Since the automatic identification system (AIS) for vessels was The availability of a large amount of Automatic Identification made mandatory in 2004 [1], there has been a surge of studies on System (AIS) data has fostered many studies on maritime vessel the GSN and other maritime networks that use such data. Many traffic during recent years, often representing vessels and ports works modeled the GSN based on graph theory [19, 22], but only relationships as graphs. Although the continuous research effort, a few of them analyzed the network in terms of its evolution over only a few works explicitly study the evolution of such graphs the years [14, 17]. Also, those works which studied the network and often consider coarse-grained time intervals. In this context, evolution used private data and performed exciting but high-level our ultimate goal is to fill this gap by providing a systematic and coarse-grained analysis, such as in [6]. study in the graph evolution by considering voyages over time. The main goal of our analysis is to provide a systematic study By mining the arrivals and departures of vessels from ports, we of the evolution aspect regarding maritime vessel routes, with the build a graph consisting of vessel voyages between ports. We purpose of identifying recurrent patterns in their evolution. The then provide a study on topological features calculated from such analysis is based on publicly available data and with a defined graphs with a strong focus on their temporal evolution. Finally, and well-documented data model. This aspect is fundamental we discuss the main limitations of our approach and the future for the reproducibility of our analysis and its expansion and perspectives that will spawn from this work. updating of results when new data arrives. Also, it considers the two necessary dimensions of time and layers (i.e., the evolution of the network can be observed for multiple types of vessels, such 1 INTRODUCTION as cargo and passengers). Such an ambitious objective has some inherent challenges Maritime transportation represents 90% of international trade that must be tackled. First, the analysis of AIS datasets typically volume and plays a paramount role in today’s economy, in terms presents a Big Data challenge: the datasets are usually vast, and of cargo shipping, passenger transportation, leisure navigation, data can arrive at a high rate. For example, ExactEarth alone and fishing operation[21]. Globalization and multiple mode trans- claims to consistently track 165,000 vessels and compiling over portation of goods in the shipping industry resulted in a massive extension of the maritime vessel route network. The study of vessel movements is a well-established source of information to understand the role of maritime routes and ports in economic, so- cial, and environmental contexts. These studies include maritime traffic control and prediction [18], human migration flows [8], bioinvasion [9] and maritime piracy [20]. However, such a role cannot be adequately unraveled by looking at ports and routes in isolation, but instead, they must be put in relation to one another. This allows the study of the interplay of all the components in the complex maritime network, and it is even more important for understanding the evolution over time of that interplay. A central concept for the analytical study of vessel routes is the Global Shipping Network (GSN), in which nodes are ports and edges are the routes between ports of cargo ships. Figure 1 illustrates the GSN resulting from the vessels’ routes of 2017 in the American coast using the MarineCadastre.gov dataset [16]. Copyright © 2020 for this paper by its author(s). Published in the Workshop Proceed- Figure 1: American coast vessels’ routes in 2017. The nodes ings of the EDBT/ICDT 2020 Joint Conference, (March 30-April 2, 2020, Copenhagen, represent ports, and the edges are voyages between two Denmark) on CEUR-WS.org. Use permitted under Creative Commons License At- ports tribution 4.0 International (CC BY 4.0). 7,000,000 AIS messages daily1 , but data still need to be processed show that there are new port regions placed in the entrance in a reasonable time. and exit of the Panama Canal, and there are several significant Second, the purpose of any network analysis is to abstract the business opportunities in that region. complexity of a system in order to extract meaningful information A study of topological changes in the maritime trade network that is not directly available when the individual components is shown in [13]. The authors propose two new measures of are examined separately. Therefore, the definition of a network network navigability called random walk discovery and escape that encompasses time information is a complex task. Suitable difficulty. Their results show that the maritime network evolves approaches need to be carefully selected to study the evolving by increasing its navigability while doubling the number of active network. ports. The authors suggest that unlike in other real-world evolv- The contributions of this work can be outlined as the follow- ing networks studied in the literature up to date, the maritime ing: network does not densify over time, and its effective diameter • We propose an approach that uses AIS data to extract con- remains constant [13]. nections between ports derived from the vessels’ move- In [6], the author investigates the degree of overlap among ments. From these connections (or voyages), we build the different layers of circulation composing global maritime graphs in which the vertices correspond to the ports, flows. His work uses several methods from complex network whereas the edges or links correspond to the vessel voyage analysis to understand the dynamics affecting the evolution of between two different ports. In addition, each edge has a ports and shipping. The results show that there is a strong and semantic defined by the vessels’ type. path-dependent influence of multiplexity on traffic volume, range • We study several topological properties of the temporal of interaction, and centrality from various perspectives (e.g., ma- graphs generated from vessels’ movements and how these trices correlations, homophily, assortativity, and single linkage features evolve over time. Specifically, we investigate fea- analysis) [6]. When growing the network and concentrating the tures relative to graphs dimension, ability to form clusters, analysis around large hubs over time, results show that the traffic and geographical spatiality. distribution is place-dependent due to the reinforced position of • We design and run an extensive analysis using a real already established nodes [6]. dataset with AIS Data collected from vessels navigating in The work of [22] builds a GSN using the 2015 AIS data of the the U.S. coastal area. In our experiments, we investigate world with multiple spatial levels. Their process mainly consists the aspect of stationarity of the time series of the topo- of five steps, where the first three generate the network nodes, logical properties of the graph and discuss the obtained and the last two create the network links. The work of [22] applies insights. the Density-Based Spatial Clustering of Applications with Noise The rest of the paper is organized as follows. Section 2 dis- (DBSCAN) to detect where ships stop and cross this information cusses related works. Section 3 introduces some concepts used with terminal candidates of ports. A directed GSN is generated through the paper. In Section 4, we describe our approach for with the trip statistics between two nodes as the edges. Their deriving time-series of topological properties from graphs based work evaluates features such as average degree and betweenness on vessels’ visits to ports. We perform some analysis in Section centrality of each node, average shortest path length between 5 and draw conclusions envisioning future works in Section 6. any two nodes, and community clusters of the GSNs. Following a similar idea of building GSNs, but with focus on 2 RELATED WORK anomaly detection, the work of [19] provides a mechanism that classifies vessel behavior in normal and abnormal, using historical The work done in [11] is one of the first to study the concept information about similar vessels that operate in a particular of GSN as a complex network. They use information about the area. In [19], the authors identify waypoints (i.e., a region of itineraries of 16363 ships of three types (bulk dry carriers, con- interest for a given application) that characterizes the operations tainer ships, and oil tankers) during the year 2007 to build a and the sort of movement patterns that they follow (i.e., the network of links between ports. The work of [11] shows that the nodes). As edges, the work of [19] uses the subtrajectories that three categories of ships differ in their mobility patterns and net- links two waypoints, also using the extracted features of those works. Their results show that container ships follow regularly subtrajectories for analysis. They identified each edge by the repeating paths, whereas bulk dry carriers and oil tankers move subtrajectory that links two ways points. Features of each edge less predictably between ports. They also show that the network are generated using a trajectory mining library introduced by of all ship movements possesses a heavy-tailed distribution for [7]. Their analysis tries to detect outliers from the subtrajectory the connectivity of ports and the loads transported on the links features (e.g., course over ground, speed over ground, etc.) and with systematic differences between ship types [11]. using transition probabilities as the edges of the network. The work of [14] also uses a sample of the Lloyds database Differently from [6, 11, 13, 14], our work use AIS data to de- with the world container ship fleet movements from Chinese termine vessel routes. Differently from [22], which uses stop ports from the years of 2008 to 2010. The objective of their work points as nodes and evaluate centrality, shortest-path, and com- is to look at changes in the maritime network prior to and after munities, our work is focused on the evolution of the network the financial crisis (2008-2010) and to analyze the extent to which instead of using waypoints as nodes and being focused on anom- large ports have seen their position within the network change. aly detection like [19], or using stop points as nodes to evaluate The authors show how the global and local importance of a centrality, shortest-path, and communities like [22], we use the port can be measured using graph theory concepts. They also ports as nodes, and we evaluate the evolution of the network as show that the goods transportation network was contracted with our primary task. respect to port throughput, but no contraction in the distribution capacity of the main hub ports was found [14]. Finally, the authors 1 https://www.exactearth.com/products/exactais 3 PROBLEM FORMULATION AND Algorithm 1: Trips Graph Snapshot Extraction DEFINITIONS Input : M: AIS Messages divided into areas Vessels report their location through AIS messages while navigat- w: Time window size ing. A vessel may send AIS messages with a frequency that varies s: Time shifting from a few seconds to a few minutes, depending on the type of r: Radius message. When they are at the underway, they may send AIS P: Set of ports messages every 2 to 10 seconds, while when they are at anchor, Output : G: set of Voyage Graph Snapshots this time window can increase to every 3 minutes [22]. Therefore, Init : G ← ∅; V ← ∅ positional information extracted from AIS messages can be seen 1 V ← Visits(Ai , P, r); as a representation of the spatial-temporal movement of a travel- 2 R1 ← Voyages(V); ing vessel. We are interested in this spatial information with the 3 R2 ← Clean(R1 ); intent of understating when a vessel is visiting a port. By merging 4 foreach b ∈ Buckets(w, s) do subsequent visits to ports of vessels, it is then possible to build 5 G ← G + Graph(R 2 , b); the sequence of vessels’ voyages. Then the idea is to construct a graph (or network) representation out of these voyages in a 6 return G given interval of time, to be able to study the evolution of the graph with complex network mechanisms. Graphs have some properties useful to unravel interesting in- formation about the dynamism between two and more entities. In 4 METHODOLOGY particular, in the context of a voyages graph, the topological prop- This section describes the procedural methodology that we use erties of the graph can help us identify relevant characteristics to transform raw AIS data to a set of time series observations on within a network that would not have meaningful information if the topological features of voyage graphs. We first describe the the individual entities were examined separately [6, 13]. Topo- algorithm to build the voyage graph from the source data, then logical properties can be applied to the network as a whole or to which features we extract from the graph, and finally how we individual nodes and edges. In particular, for our study, we are create the time series. concerned with global network properties. Below, we provide a compact formalization of an AIS message, as well as of some 4.1 Building Voyage Graphs other important concepts that covers the scope of this paper. Building the set of voyage graph snapshot directly from the origi- Definition 3.1. (AIS Message): An AIS Message me is a tuple nal data would be possible, but also very unpractical. The dataset (x, y, t, c) that represents the GPS coordinate (x, y) at a time stamp has a lot of noise, entries are not ordered in time, and much t assigned to a vessel e of type c. We define M as the set of all information is redundant. Therefore, we applied an incremen- AIS messages. tal approach to process the data. First, we exploit the fact that the original dataset is already divided into geographical areas Definition 3.2. (Port): A Sea Port p is represented as a tuple (i.e., the zones of the MarineCadastre.gov dateset, see Section5). (id, x, y), where x and y are the latitude and longitude coordinates For each area, we remove redundant information (e.g., subse- of its geographical center, and id is the code that identifies the quent entries for the same vessel in the same port, with the same port. We also define the spatial function α(p, r ) that defines a timestamp) and compute the set of visits. Second, we compute circular area of radius r centered on the coordinates of port p. the set of voyages using the visits, removing incorrect and noisy voyages. Third, from the voyages we can determine the graph Definition 3.3. (Visit): Given a radius r , we formally define a and the snapshot according to the time window considered. This visit v = (p, me , t) of the vessel e to a port p when it exists at incremental process has several advantages: (i) graphs building least one me ∈ M at time t whose coordinates x and y are in the is very fast, as the set of voyages is practically an edge list; (ii) the area defined by α(p, r ). costly cleaning process is done only once, and from the clean col- lection of voyages it is possible to build multiple graphs; and (iii) Definition 3.4. (Voyage): A voyage vj = (v 1, v 2 ) is a pair of it is interesting to study visits and voyages without transforming visits, such that v 1 (p) , v 2 (p) and v 1 (e) = v 2 (e) and v 1 (t) < v 2 (t) them into a graph. and they are consecutive (there is no visits to other ports between The pseudocode in Algorithm 1 illustrates the steps that ex- them for the same vessel). The ports of v 1 and v 2 are called tract all the voyage graph snapshots (G 0, . . . , G n ) for a given set respectively origin and destination ports. The duration of the of AIS Messages M. Besides M, the algorithm also receives as voyage vj(d) is the time of the last visit of e in the origin port parameters a Radius r , a set of ports P, a time window size w, and the time of first visit in the destination port. and a shifting s. The shifting parameter s indicates the number Definition 3.5. (Voyage Graph): The Voyage Graph (VG) is a of days to move forward the time window for each successive graph G = (V , E) built according to a set of voyages V J , in which graph. For example, with s = 10, if the first time window starts V contains all the ports in V J and E contains an edge for each on the 1st of January, the second starts at the 10th of January, unique pair of ports in V J . and so on. The algorithm starts by extracting the set of visits V (see Definition 3.6. (Voyage Graph Snapshot): The Voyage Graph definition 3.3) from M in each geographical area Ai in the original Snapshot (VGS) Gw = (V , E)w is an extension of the VG that dataset with the function V isits() (line 1). This function returns includes a temporal time window w used to create the snapshot V ⊂ Ai , such that V contains the AIS records transmitted inside of the graph. This interval is used to select the AIS messages that the area α(p, r ) (see definition 3.2) for each port p ∈ P. Depending will be used to build the snapshot Gw of VG G. on r , there could be overlapping port areas such that the same AIS record results transmitted inside multiple ports. In these cases, global scale activity (low CC number), rather than composed by we discriminate by associating the record to the closest port. a set of not connected and local activities (high CC number). The In turn, V feeds the function V oyaдes() that extracts the voy- Average Clustering coefficient (fa ), is the average of local clus- ages of the vessels (line 2). The underlying assumption is that if tering of nodes. The local clustering of each node in the graph a vessel e is seen at the port po ∈ P at time t 0 and e is also seen is the fraction of triangles (set of 3 vertices such that any two at the port pf at time t 1 and t 1 > t 0 , then we record a voyage of them are connected by an edge) that exist over all possible (definition 3.4). triangles in its neighborhood. In other words, this coefficient The set of voyages is then cleaned by the function Clean() (line represents the probability that two neighbors of a node are neigh- 3), to retain only the valid voyages. AIS data inevitably contains bors themselves, and can serve to evaluate how many voyages noise due to many reasons, including malfunctions, errors in happen around the same set of ports. transmission, and malicious use. In our context, such noise and Another essential metric for the maritime networks is the ex- mistakes translate into incorrect voyages. To remove them, we tension of their geographical spatiality, as the distance between performed two cleaning actions. First, we removed those entries ports can be directed linked with various cost aspects as fuel having null or invalid data in relevant fields (typically position consumption, maintenance rates, and insurance costs [6]. In ad- or vessel type). For example, several incorrect entries had a ves- dition, such a metric can give insights on whether certain vessel sel identifier, called Maritime Mobile Service Identity (MMSI), types are more oriented toward short or long routes. In this pa- whose value is composed only by zeroes, which may indicate a per we have considered the average (fd ) and weighted average placeholder for missing MMSI. We then removed the voyages (fw ) link distance. The former represents the geodesic distances that, depending on their duration (vjd ) and length, imply an between connected ports in the graph, expressed in kilometers. impossible speed for a vessel. We set the length of a voyage by The weighted version multiplies each distance by the number of computing the geodesic distance between starting and arrival voyages made between the two ports. ports. The geodesic distance is the minimum distance between two points on earth, which can be used as the shortest possi- ble maritime voyage. By using the length and the duration, we compute the estimated average speed based and removed those 0.61 0.6 Unique MMSI voyages whose speed exceeds 60 knots (which is still very high Routes Count speed, but we left some margin to cope with a possible degree of 0.5 approximation in the data). We did not remove the slow speed 0.44 voyages as we cannot estimate how long is the actual maritime 0.4 Percentage route between two ports with respect to the geodesic distance. Finally, the algorithm builds the set of Voyage Graphs (defi- 0.3 nition 3.5) thought the function Graph() using the clean set of 0.26 0.23 voyages R 2 (line 5). This function uses the ports as vertices and 0.2 the voyages as edges to build the graph. It considers the func- 0.15 tion Bucket() that creates all the possible time windows for the 0.1 0.09 0.03 0.05 0.030.07 parameters w and s. 0.020.01 0.0 0.0 0.0 cargo tug tow fishing sailing tanker passenger military 4.2 Topological Voyage Graph Features In order to study the evolution of the voyages graph, we employ Figure 2: Total percentage of unique MMSI and voyages by a set of graph metrics that are related to the different aspects vessel type we wish to evaluate, called Topological Voyage Graph Features (TVGs). In this paper, we studied only global network metrics (i.e., related to the whole graphs). We used the networkx library tug tow [10] to compute such metrics. 1000 cargo tanker An important metric in studying and comparing graphs is the 800 passenger sailing unique MSSI dimensions of the graph. To this end, we have considered the 600 order (i.e., amount of nodes/ports in the graph) (fn ) and the size 400 (i.e., amount of edges) (fe ). Semantically, these two measures 200 are connected. A higher order and size can indicate an increased 0 overall vessel traffic, and/or the tendency to perform less conser- 5000 vative routes, as more ports are involved. Inversely, a lower order 4000 and size can indicate a decreased overall vessel traffic, and/or the voyages count tug tow tendency to perform more conservative routes, as less ports are 3000 cargo tanker passenger involved. 2000 sailing A relevant aspect is the identification of cohesive subgroups 1000 of ports in the graph, as a way to identify those ports that share a 0 strong tie in the traffic for a particular vessel type. The number of 15 15 15 15 15 15 16 16 16 16 16 16 17 17 17 17 17 17 -20 -20 -20 -20 -20 -20 -20 -20 -20 -20 -20 -20 -20 -20 -20 -20 -20 -20 Feb Apr Jun Aug Oct Dec Feb Apr Jun Aug Oct Dec Feb Apr Jun Aug Oct Dec Connected Components (CC, fc ) is the number of subgraphs in which any node is connected to each other by edges, and which is not connected to another subgraph. The number of Connected Figure 3: Unique MMSI and voyages count per vessel type Components (fc ) indicates how much the graph represents a over time (a) February 2016 (b) August 2016 Figure 4: Graphs of North East US in winter (a) and summer (b). Larger and lighter nodes have an higher degree. 4.3 TVGs Time-Series seaports in the world. From the original datasets, we have re- A time-series is a collection of observations made sequentially tained only the ports of North America. over the time [3]. Examples include (i) sales of a particular product The experiments conducted aim to answer the following re- in successive months, (ii) electricity consumption in a particular search questions comprehensively. Our research question can be area for successive one-hour periods. summarized as the following: Are the TVG time series station- In particular, we build time-series of the TVG features. For this ary? Stationarity is an essential characteristic of a time series. purpose, we use Algorithm 1 to return the set of Voyages Graph A time series is said to be stationary if its statistical properties Snapshot G, compute the TVGs for each one of the graph дi ∈ G do not change over time. In other words, it has constant mean and store them as separated time-series T j where j indicate the and variance, and covariance is independent of time. Section 5.2 TVG feature (i.e. number of nodes (fn ), number of edges (fe ), av- address this research question. To address this research question, erage clustering (fa ), number of strongly connected components we will use a statistical test designed to comment on whether a (fc ), the average (fd ) and weighted average (fw ) link distance. time series is stationary explicitly. Let f unction j be the function that compute the topological metric of a given graph д indicated by the index j, for instance 5.1 Data Overview if j is equal to fn then f unction j computes the number of edges Figure 2 shows, in percentage, the amount of unique MMSIs and on the graph д. Also, let I be the set of existing temporal buckets route counts by vessel type and for the whole dataset. In the orig- in G and let J be the set of TVG features. inal dataset, there are several vessel categories: cargo, passenger, We then formalize the TVG time-series (T f ) with the following sailing, military, fishing, tanker, and tug tow. Interestingly, for equations: cargo and tanker, a relevant percentage of unique MMSI corre- spond to a much lesser portion of total voyages. This is expected T j B { f unction j (дi ) | дi ∈ G, i ∈ I, j ∈ J } as these types of vessels perform less but longer voyages con- where: cerning other vessel types. By comparison, tug boats with 26% unique vessels have 61% of total routes. Unlike cargo and tankers, J B { f n , f e , f a , fc , fd , fw } these routes are very short and relative to the vessels moving I B {w 1, w 2, ..., w n } between nearby ports. Among all categories, military and fishing account only for 1% 5 EXPERIMENTS of the total voyages, meaning that there are too little data to build In this section, we present the experiments conducted to assess meaningful graphs for these categories. We also did not consider the stationary behavior of the TVG features. We perform such tug tows as their routes mostly regard only two ports resulting analysis for the different spectrum of types of vessels. In the ex- in not interesting graphs. Therefore, from now on, we consider periments, to build the Voyage Graph Snapshot using Algorithm only the following categories: cargo (which also includes tanker) 1, we assign respectively the values r = 5km, s = 10days, and passenger, and sailing. w = 30days. Figure 3 top and bottom show the amount of unique MMSIs Our source of AIS data is the MarineCadastre.gov dataset [16], respectively and voyages count for each vessel type and for each which contains filtered AIS records for the US coastal waters month in the period considered. Looking at cargo and tug tows, for the years 2015-2017, for a total of about 934 GB. Records are we can notice an evident pattern of less naval traffic in the winter sampled to one minute and organized in the comma-separated periods (for example, from February to March 2016), in which value (CSV) format, for a total of about 8 billions of records. both the number of unique MMSI and voyages has a clear drop. Additionally, we use the Sea-Ports dataset [15] that contains The drops are also visible (especially for cargo) in the same period spatial information, such as latitude and longitude, of all known of 2015 and 2017. Also, it interesting to notice the similarity in Figure 5: TVG Time-Series the trend of the cargo and tug tow, which can be explained by The null hypothesis of the test is that the time series is not the fact the tug boats are mostly used to help large vessels, such stationary. The alternative hypothesis (rejecting the null hypoth- as cargoes, into ports. esis) is that the time series is stationary. When interpreting the Figure 4 shows the graph resulting from cargo vessels in the p-value from this test, values below a threshold (such as 5% or northeast US in February-2016 (Figure 4 a) and August-2016 (Fig- 1%) suggests to reject the null hypothesis, i.e., the time-series is ure 4 b). It is clear from the graph that the traffic is more abundant stationary. While, p-values above the threshold suggests to do during the summer because of the better climatic conditions. The not to reject the null hypothesis, meaning that the time-series is graphs also show the complete halt of voyages in the great lakes non-stationary. during winter due to the formation of ice in the lakes. We performed an ADF test on the TVG Time-series extracted Figure 5 shows the extracted time-series using the MarineCadas- from our dataset. The idea is that the more negative (lower) this tre.gov dataset for the features: number of nodes, number of ADF statistic, the more likely we have a stationary time-series edges, number of connected components, and average clustering, or does not have time-dependent structure. We report the ADF average distances and weighted average distance. For each TGV test results on Table 1. The table reports for each TVG feature, feature, we consider four different series: passenger, sailing, cargo, and for each serie, the ADF-Statistic, the p-value and the critical and all. values (1%, 5%, and 10%). By looking at the results of each TGV feature, it is possible to see that most of the series have statistic values lower than 5.2 Stationary behavior analysis the critical value of 1%, except the features avg_clustering, for In this section, we address our research question of investigat- the Serie sailing (ADF statistic equals to -2.851), and #cc, for the ing the (non) stationary behavior of the TVGs. A time series is Serie passenger (ADF statistic equals to -3.431). This indicates stationary if they do not have a trend or seasonal effects. This weak evidence against the null hypothesis, so for these 2 cases, means that the statistics calculated on the time series, such as considering critical values at 1% level, these two TVG time-series the mean, variance, and auto-correlation of the observations are are non-stationary. consistent over time [2]. Most statistical forecasting methods are Therefore, our analysis using the MarineCadastre.gov dataset based on the assumption that the time series can be modeled suggests that there is a stationary characteristic over the time of approximately stationary through the use of mathematical trans- the vessel’s voyages in the USA coast for the different considered formations [12]. Thus, stationary time series are easier to model. vessels’ type (i.e., all, sailing, cargo and passenger). This char- Indeed, statistical modeling methods assume or require the time acteristic has been presented in most of the investigated TGVs series to be stationary to be effective. time-series, and it implies that the voyage graphs at different There are different methods to verify whether a time series ranges of time points keep constant on average, without showing is stationary or not. Statistical tests are widely used to analyze significant trends or seasonality during the years of 2015 and if the requirements of stationary are met or have been violated. 2017. Here, we adopted the Augmented Dickey-Fuller Test [5] (ADF) that uses an auto-regressive model and optimizes an information criterion across multiple different lag values [4]. Serie ADF p-Value Crit 1% Crit 5% Crit 10% metrics we considered in this paper are global. Still, an analysis # nodes of local node metrics (such as node centrality) is an interesting all -9.355 8.104785e-16 -3.494 -2.889 -2.582 future improvement. cargo -9.831 5.036417e-17 -3.494 -2.889 -2.582 passenger -10.303 3.341735e-18 -3.494 -2.889 -2.582 ACKNOWLEDGMENT sailing -8.957 8.412623e-15 -3.494 -2.889 -2.582 The authors acknowledge the support of the H2020 EU Project # edges all -9.588 2.077671e-16 -3.494 -2.889 -2.582 MASTER (Multiple ASpects TrajEctoRy management and analy- cargo -8.962 8.154913e-15 -3.494 -2.889 -2.582 sis) funded under the Marie Skłodowska-Curie grant agreement passenger -10.476 1.249137e-18 -3.494 -2.889 -2.582 No 777695. sailing -9.668 1.296871e-16 -3.494 -2.889 -2.582 avg_clustering REFERENCES all -3.633 5.152565e-03 -3.499 -2.892 -2.583 [1] 2004. Safety of Life at Sea (SOLAS), Consolidated Edition. (2004). [2] Peter J Brockwell and Richard A Davis. 2016. Introduction to time series and cargo -8.795 2.182865e-14 -3.494 -2.889 -2.582 forecasting. springer. passenger -9.243 1.561285e-15 -3.494 -2.889 -2.582 [3] Chris Chatfield. 2000. Time-series forecasting. Chapman and Hall/CRC. sailing -2.851 5.137153e-02 -3.497 -2.891 -2.582 [4] Yin-Wong Cheung and Kon S Lai. 1995. Lag order and critical values of the augmented Dickey–Fuller test. Journal of Business & Economic Statistics 13, 3 #cc (1995), 277–280. all -8.284 4.431598e-13 -3.494 -2.889 -2.582 [5] David A Dickey and Wayne A Fuller. 1979. Distribution of the estimators for cargo -11.461 5.540629e-21 -3.494 -2.889 -2.582 autoregressive time series with a unit root. Journal of the American statistical passenger -3.431 9.952558e-03 -3.498 -2.891 -2.582 association 74, 366a (1979), 427–431. [6] César Ducruet. 2017. Multilayer dynamics of complex spatial networks: The sailing -8.160 9.189961e-13 -3.494 -2.889 -2.582 case of global maritime flows (1977–2008). Journal of Transport Geography 60 avg_distance (2017), 47–58. all -9.729 9.121595e-17 -3.494 -2.889 -2.582 [7] Mohammad Etemad, Amílcar Soares Júnior, and Stan Matwin. 2018. Predicting transportation modes of GPS trajectories using feature engineering and noise cargo -9.946 2.595040e-17 -3.494 -2.889 -2.582 removal. In Canadian Conference on Artificial Intelligence. Springer, 259–264. passenger -10.358 2.434362e-18 -3.494 -2.889 -2.582 [8] Giorgio Fagiolo and Marina Mastrorillo. 2013. International migration network: sailing -11.361 9.454762e-21 -3.494 -2.889 -2.582 Topology and modeling. Physical Review E 88, 1 (2013), 012812. [9] Stephan Gollasch, Chad L. Hewitt, Sarah Bailey, and Matej David. 2019. Intro- weighted_average_distance ductions and transfers of species by ballast water in the Adriatic Sea. Marine all -4.527 1.758368e-04 -3.495 -2.890 -2.582 Pollution Bulletin 147 (2019), 8 – 15. https://doi.org/10.1016/j.marpolbul.2018. cargo -10.204 5.860322e-18 -3.494 -2.889 -2.582 08.054 passenger -9.783 6.668246e-17 -3.494 -2.889 -2.582 [10] Aric Hagberg, Pieter Swart, and Daniel S Chult. 2008. Exploring network structure, dynamics, and function using NetworkX. Technical Report. Los sailing -12.326 6.577707e-23 -3.494 -2.889 -2.582 Alamos National Lab.(LANL), Los Alamos, NM (United States). Table 1: Augmented Dickey-Fuller (ADF) Test [11] Pablo Kaluza, Andrea Kölzsch, Michael T Gastner, and Bernd Blasius. 2010. The complex network of global cargo ship movements. Journal of the Royal Society Interface 7, 48 (2010), 1093–1103. [12] Genshiro Kitagawa and Hirotugu Akaike. 1978. A procedure for the modeling of non-stationary time series. Annals of the Institute of Statistical Mathematics 6 CONCLUSION 30, 2 (1978), 351–363. This paper presented an analysis of the evolution of networks [13] Zuzanna Kosowska-Stamirowska, César Ducruet, and Nishant Rai. 2016. Evolv- ing structure of the maritime trade network: evidence from the Lloyd’s Ship- made by voyages of vessels between ports, based on several ping Index (1890–2000). Journal of Shipping and Trade 1, 1 (2016), 10. topological features of the network so-called Topological Voyage [14] Fernando González Laxe, Maria Jesus Freire Seoane, and Carlos Pais Montes. 2012. Maritime degree, centrality and vulnerability: port hierarchies and Graph Features (TVGs). The networks were built in a bottom-up emerging areas in containerized transport (2008–2010). Journal of Transport and data-driven fashion, considering 3 years of AIS data of the Geography 24 (2012), 33–44. US coastal waters. The analysis unraveled insights about the [15] marchah. (accessed November 2019). Sea Ports Data. https://github.com/ marchah/sea-ports stationarity of several features over time, including the number [16] MarineCadastre.gov. (accessed November 2019). Vessel Traffic Data. https: of nodes and edges, clustering coefficient, and geodesic distance //marinecadastre.gov/ais/ between the ports (nodes) of the network. The analysis was [17] Carlos Pais Montes, Maria Jesus Freire Seoane, and Fernando González Laxe. 2012. General cargo and containership emergent routes: A complex networks performed on the MarineCadastre.gov dataset containing vessels’ description. Transport Policy 24 (2012), 126–140. voyages in the USA coast. In particular, for this dataset, we found [18] Lokukaluge P Perera, Paulo Oliveira, and C Guedes Soares. 2012. Maritime traffic monitoring based on vessel detection, tracking, state estimation, and that for most of the TVGs, the time-series do not present any trajectory prediction. IEEE Transactions on Intelligent Transportation Systems trend or seasonality behaviors. 13, 3 (2012), 1188–1200. We are aware of the limitations of applying our approach to a [19] Iraklis Varlamis, Konstantinos Tserpes, Mohammad Etemad, Amílcar Soares Júnior, and Stan Matwin. 2019. A Network Abstraction of Multi-vessel Tra- single dataset that only covers the coast of the United States in a jectory Data for Detecting Anomalies.. In Proceedings of the Workshops of the relatively short time period of 3 years (2015 - 2017). In the future, EDBT/ICDT 2019 Joint Conference. we plan to extend our study both in terms of the geographical [20] Michele Vespe, Harm Greidanus, and Marlene Alvarez Alvarez. 2015. The de- clining impact of piracy on maritime transport in the Indian Ocean: Statistical area covered (i.e., by including dataset from other countries and analysis of 5-year vessel tracking data. Marine Policy 59 (2015), 9–15. possibly a worldwide network) and the time period. In addition, [21] Michele Vespe, Ingrid Visentini, Karna Bryan, and Paolo Braca. 2012. Unsu- pervised learning of maritime traffic patterns for anomaly detection. (2012). the definition of the spatial area of ports can be improved to [22] Zhihuan Wang, Christophe Claramunt, and Yinhai Wang. 2019. Extracting increase the precision in voyages detection, in a way similar to the global shipping networks from massive historical automatic identification one performed in [22]. Finally, in terms of network analysis, the system sensor data: a bottom-up approach. Sensors 19, 15 (2019), 3363.