=Paper= {{Paper |id=Vol-2578/BMDA5 |storemode=property |title=Uncovering vessel movement patterns from AIS data with graph evolution analysis |pdfUrl=https://ceur-ws.org/Vol-2578/BMDA5.pdf |volume=Vol-2578 |authors=Emanuele Carlini,Vinicius Monteiro de Lira,Amilcar Soares,Mohammad Etemad,Bruno Brandoli Machado,Stan Matwin |dblpUrl=https://dblp.org/rec/conf/edbt/CarliniLSEMM20 }} ==Uncovering vessel movement patterns from AIS data with graph evolution analysis== https://ceur-ws.org/Vol-2578/BMDA5.pdf
      Uncovering vessel movement patterns from AIS data with
                     graph evolution analysis
               Emanuele Carlini                                  Vinicius Monteiro de Lira                            Amilcar Soares
          Institute of Information                                  Institute of Information                   Institute for Big Data Analytics
         Science and Technologies,                                Science and Technologies,                          Dalhousie University
      National Research Council (CNR)                         National Research Council (CNR)                          Halifax, Canada
                  Pisa, Italy                                               Pisa, Italy                             amilcar.soares@dal.ca
        emanuele.carlini@isti.cnr.it                          vinicius.monteirodelira@isti.cnr.it

             Mohammad Etemad                                     Bruno Brandoli Machado                                 Stan Matwin
       Institute for Big Data Analytics                         Institute for Big Data Analytics               Institute for Big Data Analytics
             Dalhousie University                                     Dalhousie University                           Dalhousie University
               Halifax, Canada                                          Halifax, Canada                                Halifax, Canada
                etemad@dal.ca                                        brunobrandoli@dal.ca                               stan@cs.dal.ca
ABSTRACT                                                                                  Since the automatic identification system (AIS) for vessels was
The availability of a large amount of Automatic Identification                         made mandatory in 2004 [1], there has been a surge of studies on
System (AIS) data has fostered many studies on maritime vessel                         the GSN and other maritime networks that use such data. Many
traffic during recent years, often representing vessels and ports                      works modeled the GSN based on graph theory [19, 22], but only
relationships as graphs. Although the continuous research effort,                      a few of them analyzed the network in terms of its evolution over
only a few works explicitly study the evolution of such graphs                         the years [14, 17]. Also, those works which studied the network
and often consider coarse-grained time intervals. In this context,                     evolution used private data and performed exciting but high-level
our ultimate goal is to fill this gap by providing a systematic                        and coarse-grained analysis, such as in [6].
study in the graph evolution by considering voyages over time.                            The main goal of our analysis is to provide a systematic study
By mining the arrivals and departures of vessels from ports, we                        of the evolution aspect regarding maritime vessel routes, with the
build a graph consisting of vessel voyages between ports. We                           purpose of identifying recurrent patterns in their evolution. The
then provide a study on topological features calculated from such                      analysis is based on publicly available data and with a defined
graphs with a strong focus on their temporal evolution. Finally,                       and well-documented data model. This aspect is fundamental
we discuss the main limitations of our approach and the future                         for the reproducibility of our analysis and its expansion and
perspectives that will spawn from this work.                                           updating of results when new data arrives. Also, it considers the
                                                                                       two necessary dimensions of time and layers (i.e., the evolution
                                                                                       of the network can be observed for multiple types of vessels, such
1    INTRODUCTION                                                                      as cargo and passengers).
                                                                                          Such an ambitious objective has some inherent challenges
Maritime transportation represents 90% of international trade
                                                                                       that must be tackled. First, the analysis of AIS datasets typically
volume and plays a paramount role in today’s economy, in terms
                                                                                       presents a Big Data challenge: the datasets are usually vast, and
of cargo shipping, passenger transportation, leisure navigation,
                                                                                       data can arrive at a high rate. For example, ExactEarth alone
and fishing operation[21]. Globalization and multiple mode trans-
                                                                                       claims to consistently track 165,000 vessels and compiling over
portation of goods in the shipping industry resulted in a massive
extension of the maritime vessel route network. The study of
vessel movements is a well-established source of information to
understand the role of maritime routes and ports in economic, so-
cial, and environmental contexts. These studies include maritime
traffic control and prediction [18], human migration flows [8],
bioinvasion [9] and maritime piracy [20]. However, such a role
cannot be adequately unraveled by looking at ports and routes in
isolation, but instead, they must be put in relation to one another.
    This allows the study of the interplay of all the components
in the complex maritime network, and it is even more important
for understanding the evolution over time of that interplay.
    A central concept for the analytical study of vessel routes is
the Global Shipping Network (GSN), in which nodes are ports
and edges are the routes between ports of cargo ships. Figure 1
illustrates the GSN resulting from the vessels’ routes of 2017 in
the American coast using the MarineCadastre.gov dataset [16].

Copyright © 2020 for this paper by its author(s). Published in the Workshop Proceed-
                                                                                       Figure 1: American coast vessels’ routes in 2017. The nodes
ings of the EDBT/ICDT 2020 Joint Conference, (March 30-April 2, 2020, Copenhagen,      represent ports, and the edges are voyages between two
Denmark) on CEUR-WS.org. Use permitted under Creative Commons License At-              ports
tribution 4.0 International (CC BY 4.0).
7,000,000 AIS messages daily1 , but data still need to be processed    show that there are new port regions placed in the entrance
in a reasonable time.                                                  and exit of the Panama Canal, and there are several significant
   Second, the purpose of any network analysis is to abstract the      business opportunities in that region.
complexity of a system in order to extract meaningful information         A study of topological changes in the maritime trade network
that is not directly available when the individual components          is shown in [13]. The authors propose two new measures of
are examined separately. Therefore, the definition of a network        network navigability called random walk discovery and escape
that encompasses time information is a complex task. Suitable          difficulty. Their results show that the maritime network evolves
approaches need to be carefully selected to study the evolving         by increasing its navigability while doubling the number of active
network.                                                               ports. The authors suggest that unlike in other real-world evolv-
   The contributions of this work can be outlined as the follow-       ing networks studied in the literature up to date, the maritime
ing:                                                                   network does not densify over time, and its effective diameter
     • We propose an approach that uses AIS data to extract con-       remains constant [13].
       nections between ports derived from the vessels’ move-             In [6], the author investigates the degree of overlap among
       ments. From these connections (or voyages), we build            the different layers of circulation composing global maritime
       graphs in which the vertices correspond to the ports,           flows. His work uses several methods from complex network
       whereas the edges or links correspond to the vessel voyage      analysis to understand the dynamics affecting the evolution of
       between two different ports. In addition, each edge has a       ports and shipping. The results show that there is a strong and
       semantic defined by the vessels’ type.                          path-dependent influence of multiplexity on traffic volume, range
     • We study several topological properties of the temporal         of interaction, and centrality from various perspectives (e.g., ma-
       graphs generated from vessels’ movements and how these          trices correlations, homophily, assortativity, and single linkage
       features evolve over time. Specifically, we investigate fea-    analysis) [6]. When growing the network and concentrating the
       tures relative to graphs dimension, ability to form clusters,   analysis around large hubs over time, results show that the traffic
       and geographical spatiality.                                    distribution is place-dependent due to the reinforced position of
     • We design and run an extensive analysis using a real            already established nodes [6].
       dataset with AIS Data collected from vessels navigating in         The work of [22] builds a GSN using the 2015 AIS data of the
       the U.S. coastal area. In our experiments, we investigate       world with multiple spatial levels. Their process mainly consists
       the aspect of stationarity of the time series of the topo-      of five steps, where the first three generate the network nodes,
       logical properties of the graph and discuss the obtained        and the last two create the network links. The work of [22] applies
       insights.                                                       the Density-Based Spatial Clustering of Applications with Noise
   The rest of the paper is organized as follows. Section 2 dis-       (DBSCAN) to detect where ships stop and cross this information
cusses related works. Section 3 introduces some concepts used          with terminal candidates of ports. A directed GSN is generated
through the paper. In Section 4, we describe our approach for          with the trip statistics between two nodes as the edges. Their
deriving time-series of topological properties from graphs based       work evaluates features such as average degree and betweenness
on vessels’ visits to ports. We perform some analysis in Section       centrality of each node, average shortest path length between
5 and draw conclusions envisioning future works in Section 6.          any two nodes, and community clusters of the GSNs.
                                                                          Following a similar idea of building GSNs, but with focus on
2    RELATED WORK                                                      anomaly detection, the work of [19] provides a mechanism that
                                                                       classifies vessel behavior in normal and abnormal, using historical
The work done in [11] is one of the first to study the concept
                                                                       information about similar vessels that operate in a particular
of GSN as a complex network. They use information about the
                                                                       area. In [19], the authors identify waypoints (i.e., a region of
itineraries of 16363 ships of three types (bulk dry carriers, con-
                                                                       interest for a given application) that characterizes the operations
tainer ships, and oil tankers) during the year 2007 to build a
                                                                       and the sort of movement patterns that they follow (i.e., the
network of links between ports. The work of [11] shows that the
                                                                       nodes). As edges, the work of [19] uses the subtrajectories that
three categories of ships differ in their mobility patterns and net-
                                                                       links two waypoints, also using the extracted features of those
works. Their results show that container ships follow regularly
                                                                       subtrajectories for analysis. They identified each edge by the
repeating paths, whereas bulk dry carriers and oil tankers move
                                                                       subtrajectory that links two ways points. Features of each edge
less predictably between ports. They also show that the network
                                                                       are generated using a trajectory mining library introduced by
of all ship movements possesses a heavy-tailed distribution for
                                                                       [7]. Their analysis tries to detect outliers from the subtrajectory
the connectivity of ports and the loads transported on the links
                                                                       features (e.g., course over ground, speed over ground, etc.) and
with systematic differences between ship types [11].
                                                                       using transition probabilities as the edges of the network.
    The work of [14] also uses a sample of the Lloyds database
                                                                          Differently from [6, 11, 13, 14], our work use AIS data to de-
with the world container ship fleet movements from Chinese
                                                                       termine vessel routes. Differently from [22], which uses stop
ports from the years of 2008 to 2010. The objective of their work
                                                                       points as nodes and evaluate centrality, shortest-path, and com-
is to look at changes in the maritime network prior to and after
                                                                       munities, our work is focused on the evolution of the network
the financial crisis (2008-2010) and to analyze the extent to which
                                                                       instead of using waypoints as nodes and being focused on anom-
large ports have seen their position within the network change.
                                                                       aly detection like [19], or using stop points as nodes to evaluate
The authors show how the global and local importance of a
                                                                       centrality, shortest-path, and communities like [22], we use the
port can be measured using graph theory concepts. They also
                                                                       ports as nodes, and we evaluate the evolution of the network as
show that the goods transportation network was contracted with
                                                                       our primary task.
respect to port throughput, but no contraction in the distribution
capacity of the main hub ports was found [14]. Finally, the authors
1 https://www.exactearth.com/products/exactais
3    PROBLEM FORMULATION AND                                                       Algorithm 1: Trips Graph Snapshot Extraction
     DEFINITIONS                                                                      Input : M: AIS Messages divided into areas
Vessels report their location through AIS messages while navigat-                              w: Time window size
ing. A vessel may send AIS messages with a frequency that varies                               s: Time shifting
from a few seconds to a few minutes, depending on the type of                                  r: Radius
message. When they are at the underway, they may send AIS                                      P: Set of ports
messages every 2 to 10 seconds, while when they are at anchor,                        Output : G: set of Voyage Graph Snapshots
this time window can increase to every 3 minutes [22]. Therefore,                     Init   : G ← ∅; V ← ∅
positional information extracted from AIS messages can be seen                        1 V ← Visits(Ai , P, r);
as a representation of the spatial-temporal movement of a travel-                     2 R1 ← Voyages(V);
ing vessel. We are interested in this spatial information with the                    3 R2 ← Clean(R1 );
intent of understating when a vessel is visiting a port. By merging
                                                                                      4 foreach b ∈ Buckets(w, s) do
subsequent visits to ports of vessels, it is then possible to build
                                                                                      5    G ← G + Graph(R 2 , b);
the sequence of vessels’ voyages. Then the idea is to construct
a graph (or network) representation out of these voyages in a                         6   return G
given interval of time, to be able to study the evolution of the
graph with complex network mechanisms.
   Graphs have some properties useful to unravel interesting in-
formation about the dynamism between two and more entities. In                    4       METHODOLOGY
particular, in the context of a voyages graph, the topological prop-              This section describes the procedural methodology that we use
erties of the graph can help us identify relevant characteristics                 to transform raw AIS data to a set of time series observations on
within a network that would not have meaningful information if                    the topological features of voyage graphs. We first describe the
the individual entities were examined separately [6, 13]. Topo-                   algorithm to build the voyage graph from the source data, then
logical properties can be applied to the network as a whole or to                 which features we extract from the graph, and finally how we
individual nodes and edges. In particular, for our study, we are                  create the time series.
concerned with global network properties. Below, we provide
a compact formalization of an AIS message, as well as of some                     4.1      Building Voyage Graphs
other important concepts that covers the scope of this paper.
                                                                                  Building the set of voyage graph snapshot directly from the origi-
   Definition 3.1. (AIS Message): An AIS Message me is a tuple                    nal data would be possible, but also very unpractical. The dataset
(x, y, t, c) that represents the GPS coordinate (x, y) at a time stamp            has a lot of noise, entries are not ordered in time, and much
t assigned to a vessel e of type c. We define M as the set of all                 information is redundant. Therefore, we applied an incremen-
AIS messages.                                                                     tal approach to process the data. First, we exploit the fact that
                                                                                  the original dataset is already divided into geographical areas
   Definition 3.2. (Port): A Sea Port p is represented as a tuple                 (i.e., the zones of the MarineCadastre.gov dateset, see Section5).
(id, x, y), where x and y are the latitude and longitude coordinates              For each area, we remove redundant information (e.g., subse-
of its geographical center, and id is the code that identifies the                quent entries for the same vessel in the same port, with the same
port. We also define the spatial function α(p, r ) that defines a                 timestamp) and compute the set of visits. Second, we compute
circular area of radius r centered on the coordinates of port p.                  the set of voyages using the visits, removing incorrect and noisy
                                                                                  voyages. Third, from the voyages we can determine the graph
   Definition 3.3. (Visit): Given a radius r , we formally define a               and the snapshot according to the time window considered. This
visit v = (p, me , t) of the vessel e to a port p when it exists at               incremental process has several advantages: (i) graphs building
least one me ∈ M at time t whose coordinates x and y are in the                   is very fast, as the set of voyages is practically an edge list; (ii) the
area defined by α(p, r ).                                                         costly cleaning process is done only once, and from the clean col-
                                                                                  lection of voyages it is possible to build multiple graphs; and (iii)
   Definition 3.4. (Voyage): A voyage vj = (v 1, v 2 ) is a pair of
                                                                                  it is interesting to study visits and voyages without transforming
visits, such that v 1 (p) , v 2 (p) and v 1 (e) = v 2 (e) and v 1 (t) < v 2 (t)
                                                                                  them into a graph.
and they are consecutive (there is no visits to other ports between
                                                                                      The pseudocode in Algorithm 1 illustrates the steps that ex-
them for the same vessel). The ports of v 1 and v 2 are called
                                                                                  tract all the voyage graph snapshots (G 0, . . . , G n ) for a given set
respectively origin and destination ports. The duration of the
                                                                                  of AIS Messages M. Besides M, the algorithm also receives as
voyage vj(d) is the time of the last visit of e in the origin port
                                                                                  parameters a Radius r , a set of ports P, a time window size w,
and the time of first visit in the destination port.
                                                                                  and a shifting s. The shifting parameter s indicates the number
   Definition 3.5. (Voyage Graph): The Voyage Graph (VG) is a                     of days to move forward the time window for each successive
graph G = (V , E) built according to a set of voyages V J , in which              graph. For example, with s = 10, if the first time window starts
V contains all the ports in V J and E contains an edge for each                   on the 1st of January, the second starts at the 10th of January,
unique pair of ports in V J .                                                     and so on.
                                                                                      The algorithm starts by extracting the set of visits V (see
   Definition 3.6. (Voyage Graph Snapshot): The Voyage Graph                      definition 3.3) from M in each geographical area Ai in the original
Snapshot (VGS) Gw = (V , E)w is an extension of the VG that                       dataset with the function V isits() (line 1). This function returns
includes a temporal time window w used to create the snapshot                     V ⊂ Ai , such that V contains the AIS records transmitted inside
of the graph. This interval is used to select the AIS messages that               the area α(p, r ) (see definition 3.2) for each port p ∈ P. Depending
will be used to build the snapshot Gw of VG G.                                    on r , there could be overlapping port areas such that the same AIS
record results transmitted inside multiple ports. In these cases,                   global scale activity (low CC number), rather than composed by
we discriminate by associating the record to the closest port.                      a set of not connected and local activities (high CC number). The
    In turn, V feeds the function V oyaдes() that extracts the voy-                 Average Clustering coefficient (fa ), is the average of local clus-
ages of the vessels (line 2). The underlying assumption is that if                  tering of nodes. The local clustering of each node in the graph
a vessel e is seen at the port po ∈ P at time t 0 and e is also seen                is the fraction of triangles (set of 3 vertices such that any two
at the port pf at time t 1 and t 1 > t 0 , then we record a voyage                  of them are connected by an edge) that exist over all possible
(definition 3.4).                                                                   triangles in its neighborhood. In other words, this coefficient
    The set of voyages is then cleaned by the function Clean() (line                represents the probability that two neighbors of a node are neigh-
3), to retain only the valid voyages. AIS data inevitably contains                  bors themselves, and can serve to evaluate how many voyages
noise due to many reasons, including malfunctions, errors in                        happen around the same set of ports.
transmission, and malicious use. In our context, such noise and                        Another essential metric for the maritime networks is the ex-
mistakes translate into incorrect voyages. To remove them, we                       tension of their geographical spatiality, as the distance between
performed two cleaning actions. First, we removed those entries                     ports can be directed linked with various cost aspects as fuel
having null or invalid data in relevant fields (typically position                  consumption, maintenance rates, and insurance costs [6]. In ad-
or vessel type). For example, several incorrect entries had a ves-                  dition, such a metric can give insights on whether certain vessel
sel identifier, called Maritime Mobile Service Identity (MMSI),                     types are more oriented toward short or long routes. In this pa-
whose value is composed only by zeroes, which may indicate a                        per we have considered the average (fd ) and weighted average
placeholder for missing MMSI. We then removed the voyages                           (fw ) link distance. The former represents the geodesic distances
that, depending on their duration (vjd ) and length, imply an                       between connected ports in the graph, expressed in kilometers.
impossible speed for a vessel. We set the length of a voyage by                     The weighted version multiplies each distance by the number of
computing the geodesic distance between starting and arrival                        voyages made between the two ports.
ports. The geodesic distance is the minimum distance between
two points on earth, which can be used as the shortest possi-
ble maritime voyage. By using the length and the duration, we
compute the estimated average speed based and removed those                                                                      0.61
                                                                                                     0.6                                                        Unique MMSI
voyages whose speed exceeds 60 knots (which is still very high                                                                                                  Routes Count
speed, but we left some margin to cope with a possible degree of
                                                                                                     0.5
approximation in the data). We did not remove the slow speed                                                     0.44
voyages as we cannot estimate how long is the actual maritime
                                                                                                     0.4
                                                                                        Percentage




route between two ports with respect to the geodesic distance.
    Finally, the algorithm builds the set of Voyage Graphs (defi-                                    0.3
nition 3.5) thought the function Graph() using the clean set of                                                               0.26
                                                                                                                       0.23
voyages R 2 (line 5). This function uses the ports as vertices and                                   0.2
the voyages as edges to build the graph. It considers the func-                                                                                   0.15
tion Bucket() that creates all the possible time windows for the                                     0.1                                  0.09
                                                                                                                                              0.03  0.05 0.030.07
parameters w and s.                                                                                                              0.020.01                         0.0 0.0
                                                                                                     0.0
                                                                                                                   cargo tug tow fishing sailing tanker passenger military
4.2    Topological Voyage Graph Features
In order to study the evolution of the voyages graph, we employ                     Figure 2: Total percentage of unique MMSI and voyages by
a set of graph metrics that are related to the different aspects                    vessel type
we wish to evaluate, called Topological Voyage Graph Features
(TVGs). In this paper, we studied only global network metrics
(i.e., related to the whole graphs). We used the networkx library
                                                                                                                                                                       tug tow
[10] to compute such metrics.                                                           1000                                                                           cargo
                                                                                                                                                                       tanker
    An important metric in studying and comparing graphs is the                         800                                                                            passenger
                                                                                                                                                                       sailing
                                                                        unique MSSI




dimensions of the graph. To this end, we have considered the                            600

order (i.e., amount of nodes/ports in the graph) (fn ) and the size                     400
(i.e., amount of edges) (fe ). Semantically, these two measures                         200
are connected. A higher order and size can indicate an increased                            0
overall vessel traffic, and/or the tendency to perform less conser-
                                                                                        5000
vative routes, as more ports are involved. Inversely, a lower order
                                                                                        4000
and size can indicate a decreased overall vessel traffic, and/or the
                                                                        voyages count




                                                                                                           tug tow
tendency to perform more conservative routes, as less ports are                         3000               cargo
                                                                                                           tanker
                                                                                                           passenger
involved.                                                                               2000               sailing
    A relevant aspect is the identification of cohesive subgroups                       1000

of ports in the graph, as a way to identify those ports that share a                        0
strong tie in the traffic for a particular vessel type. The number of                                 15 15 15 15 15 15 16 16 16 16 16 16 17 17 17 17 17 17
                                                                                                   -20 -20 -20 -20 -20 -20 -20 -20 -20 -20 -20 -20 -20 -20 -20 -20 -20 -20
                                                                                                Feb Apr Jun Aug Oct Dec Feb Apr Jun Aug Oct Dec Feb Apr Jun Aug Oct Dec
Connected Components (CC, fc ) is the number of subgraphs in
which any node is connected to each other by edges, and which
is not connected to another subgraph. The number of Connected                           Figure 3: Unique MMSI and voyages count per vessel type
Components (fc ) indicates how much the graph represents a                              over time
                               (a) February 2016                                                  (b) August 2016


      Figure 4: Graphs of North East US in winter (a) and summer (b). Larger and lighter nodes have an higher degree.


4.3      TVGs Time-Series                                               seaports in the world. From the original datasets, we have re-
A time-series is a collection of observations made sequentially         tained only the ports of North America.
over the time [3]. Examples include (i) sales of a particular product      The experiments conducted aim to answer the following re-
in successive months, (ii) electricity consumption in a particular      search questions comprehensively. Our research question can be
area for successive one-hour periods.                                   summarized as the following: Are the TVG time series station-
    In particular, we build time-series of the TVG features. For this   ary? Stationarity is an essential characteristic of a time series.
purpose, we use Algorithm 1 to return the set of Voyages Graph          A time series is said to be stationary if its statistical properties
Snapshot G, compute the TVGs for each one of the graph дi ∈ G           do not change over time. In other words, it has constant mean
and store them as separated time-series T j where j indicate the        and variance, and covariance is independent of time. Section 5.2
TVG feature (i.e. number of nodes (fn ), number of edges (fe ), av-     address this research question. To address this research question,
erage clustering (fa ), number of strongly connected components         we will use a statistical test designed to comment on whether a
(fc ), the average (fd ) and weighted average (fw ) link distance.      time series is stationary explicitly.
    Let f unction j be the function that compute the topological
metric of a given graph д indicated by the index j, for instance        5.1    Data Overview
if j is equal to fn then f unction j computes the number of edges       Figure 2 shows, in percentage, the amount of unique MMSIs and
on the graph д. Also, let I be the set of existing temporal buckets     route counts by vessel type and for the whole dataset. In the orig-
in G and let J be the set of TVG features.                              inal dataset, there are several vessel categories: cargo, passenger,
We then formalize the TVG time-series (T f ) with the following         sailing, military, fishing, tanker, and tug tow. Interestingly, for
equations:                                                              cargo and tanker, a relevant percentage of unique MMSI corre-
                                                                        spond to a much lesser portion of total voyages. This is expected
           T j B { f unction j (дi ) | дi ∈ G, i ∈ I, j ∈ J }
                                                                        as these types of vessels perform less but longer voyages con-
where:                                                                  cerning other vessel types. By comparison, tug boats with 26%
                                                                        unique vessels have 61% of total routes. Unlike cargo and tankers,
            J B { f n , f e , f a , fc , fd , fw }
                                                                        these routes are very short and relative to the vessels moving
            I B {w 1, w 2, ..., w n }                                   between nearby ports.
                                                                           Among all categories, military and fishing account only for 1%
5     EXPERIMENTS                                                       of the total voyages, meaning that there are too little data to build
In this section, we present the experiments conducted to assess         meaningful graphs for these categories. We also did not consider
the stationary behavior of the TVG features. We perform such            tug tows as their routes mostly regard only two ports resulting
analysis for the different spectrum of types of vessels. In the ex-     in not interesting graphs. Therefore, from now on, we consider
periments, to build the Voyage Graph Snapshot using Algorithm           only the following categories: cargo (which also includes tanker)
1, we assign respectively the values r = 5km, s = 10days, and           passenger, and sailing.
w = 30days.                                                                Figure 3 top and bottom show the amount of unique MMSIs
   Our source of AIS data is the MarineCadastre.gov dataset [16],       respectively and voyages count for each vessel type and for each
which contains filtered AIS records for the US coastal waters           month in the period considered. Looking at cargo and tug tows,
for the years 2015-2017, for a total of about 934 GB. Records are       we can notice an evident pattern of less naval traffic in the winter
sampled to one minute and organized in the comma-separated              periods (for example, from February to March 2016), in which
value (CSV) format, for a total of about 8 billions of records.         both the number of unique MMSI and voyages has a clear drop.
   Additionally, we use the Sea-Ports dataset [15] that contains        The drops are also visible (especially for cargo) in the same period
spatial information, such as latitude and longitude, of all known       of 2015 and 2017. Also, it interesting to notice the similarity in
                                                        Figure 5: TVG Time-Series


the trend of the cargo and tug tow, which can be explained by               The null hypothesis of the test is that the time series is not
the fact the tug boats are mostly used to help large vessels, such       stationary. The alternative hypothesis (rejecting the null hypoth-
as cargoes, into ports.                                                  esis) is that the time series is stationary. When interpreting the
   Figure 4 shows the graph resulting from cargo vessels in the          p-value from this test, values below a threshold (such as 5% or
northeast US in February-2016 (Figure 4 a) and August-2016 (Fig-         1%) suggests to reject the null hypothesis, i.e., the time-series is
ure 4 b). It is clear from the graph that the traffic is more abundant   stationary. While, p-values above the threshold suggests to do
during the summer because of the better climatic conditions. The         not to reject the null hypothesis, meaning that the time-series is
graphs also show the complete halt of voyages in the great lakes         non-stationary.
during winter due to the formation of ice in the lakes.                     We performed an ADF test on the TVG Time-series extracted
   Figure 5 shows the extracted time-series using the MarineCadas-       from our dataset. The idea is that the more negative (lower) this
tre.gov dataset for the features: number of nodes, number of             ADF statistic, the more likely we have a stationary time-series
edges, number of connected components, and average clustering,           or does not have time-dependent structure. We report the ADF
average distances and weighted average distance. For each TGV            test results on Table 1. The table reports for each TVG feature,
feature, we consider four different series: passenger, sailing, cargo,   and for each serie, the ADF-Statistic, the p-value and the critical
and all.                                                                 values (1%, 5%, and 10%).
                                                                            By looking at the results of each TGV feature, it is possible
                                                                         to see that most of the series have statistic values lower than
5.2    Stationary behavior analysis                                      the critical value of 1%, except the features avg_clustering, for
In this section, we address our research question of investigat-         the Serie sailing (ADF statistic equals to -2.851), and #cc, for the
ing the (non) stationary behavior of the TVGs. A time series is          Serie passenger (ADF statistic equals to -3.431). This indicates
stationary if they do not have a trend or seasonal effects. This         weak evidence against the null hypothesis, so for these 2 cases,
means that the statistics calculated on the time series, such as         considering critical values at 1% level, these two TVG time-series
the mean, variance, and auto-correlation of the observations are         are non-stationary.
consistent over time [2]. Most statistical forecasting methods are          Therefore, our analysis using the MarineCadastre.gov dataset
based on the assumption that the time series can be modeled              suggests that there is a stationary characteristic over the time of
approximately stationary through the use of mathematical trans-          the vessel’s voyages in the USA coast for the different considered
formations [12]. Thus, stationary time series are easier to model.       vessels’ type (i.e., all, sailing, cargo and passenger). This char-
Indeed, statistical modeling methods assume or require the time          acteristic has been presented in most of the investigated TGVs
series to be stationary to be effective.                                 time-series, and it implies that the voyage graphs at different
    There are different methods to verify whether a time series          ranges of time points keep constant on average, without showing
is stationary or not. Statistical tests are widely used to analyze       significant trends or seasonality during the years of 2015 and
if the requirements of stationary are met or have been violated.         2017.
Here, we adopted the Augmented Dickey-Fuller Test [5] (ADF)
that uses an auto-regressive model and optimizes an information
criterion across multiple different lag values [4].
 Serie        ADF         p-Value Crit 1% Crit 5%           Crit 10%    metrics we considered in this paper are global. Still, an analysis
                              # nodes                                   of local node metrics (such as node centrality) is an interesting
 all          -9.355  8.104785e-16      -3.494 -2.889          -2.582   future improvement.
 cargo        -9.831  5.036417e-17      -3.494 -2.889          -2.582
 passenger    -10.303 3.341735e-18      -3.494 -2.889          -2.582   ACKNOWLEDGMENT
 sailing      -8.957  8.412623e-15      -3.494 -2.889          -2.582
                                                                        The authors acknowledge the support of the H2020 EU Project
                              # edges
 all          -9.588  2.077671e-16      -3.494 -2.889          -2.582
                                                                        MASTER (Multiple ASpects TrajEctoRy management and analy-
 cargo        -8.962  8.154913e-15      -3.494 -2.889          -2.582   sis) funded under the Marie Skłodowska-Curie grant agreement
 passenger    -10.476 1.249137e-18      -3.494 -2.889          -2.582   No 777695.
 sailing      -9.668  1.296871e-16      -3.494 -2.889          -2.582
                           avg_clustering                               REFERENCES
 all          -3.633  5.152565e-03      -3.499 -2.892          -2.583    [1] 2004. Safety of Life at Sea (SOLAS), Consolidated Edition. (2004).
                                                                         [2] Peter J Brockwell and Richard A Davis. 2016. Introduction to time series and
 cargo        -8.795  2.182865e-14      -3.494 -2.889          -2.582        forecasting. springer.
 passenger    -9.243  1.561285e-15      -3.494 -2.889          -2.582    [3] Chris Chatfield. 2000. Time-series forecasting. Chapman and Hall/CRC.
 sailing      -2.851  5.137153e-02      -3.497 -2.891          -2.582    [4] Yin-Wong Cheung and Kon S Lai. 1995. Lag order and critical values of the
                                                                             augmented Dickey–Fuller test. Journal of Business & Economic Statistics 13, 3
                                #cc
                                                                             (1995), 277–280.
 all          -8.284  4.431598e-13      -3.494 -2.889          -2.582    [5] David A Dickey and Wayne A Fuller. 1979. Distribution of the estimators for
 cargo        -11.461 5.540629e-21      -3.494 -2.889          -2.582        autoregressive time series with a unit root. Journal of the American statistical
 passenger    -3.431  9.952558e-03      -3.498 -2.891          -2.582        association 74, 366a (1979), 427–431.
                                                                         [6] César Ducruet. 2017. Multilayer dynamics of complex spatial networks: The
 sailing      -8.160  9.189961e-13      -3.494 -2.889          -2.582        case of global maritime flows (1977–2008). Journal of Transport Geography 60
                            avg_distance                                     (2017), 47–58.
 all          -9.729  9.121595e-17      -3.494 -2.889          -2.582    [7] Mohammad Etemad, Amílcar Soares Júnior, and Stan Matwin. 2018. Predicting
                                                                             transportation modes of GPS trajectories using feature engineering and noise
 cargo        -9.946  2.595040e-17      -3.494 -2.889          -2.582        removal. In Canadian Conference on Artificial Intelligence. Springer, 259–264.
 passenger    -10.358 2.434362e-18      -3.494 -2.889          -2.582    [8] Giorgio Fagiolo and Marina Mastrorillo. 2013. International migration network:
 sailing      -11.361 9.454762e-21      -3.494 -2.889          -2.582        Topology and modeling. Physical Review E 88, 1 (2013), 012812.
                                                                         [9] Stephan Gollasch, Chad L. Hewitt, Sarah Bailey, and Matej David. 2019. Intro-
                     weighted_average_distance
                                                                             ductions and transfers of species by ballast water in the Adriatic Sea. Marine
 all          -4.527  1.758368e-04      -3.495 -2.890          -2.582        Pollution Bulletin 147 (2019), 8 – 15. https://doi.org/10.1016/j.marpolbul.2018.
 cargo        -10.204 5.860322e-18      -3.494 -2.889          -2.582        08.054
 passenger    -9.783  6.668246e-17      -3.494 -2.889          -2.582   [10] Aric Hagberg, Pieter Swart, and Daniel S Chult. 2008. Exploring network
                                                                             structure, dynamics, and function using NetworkX. Technical Report. Los
 sailing      -12.326 6.577707e-23      -3.494 -2.889          -2.582        Alamos National Lab.(LANL), Los Alamos, NM (United States).
         Table 1: Augmented Dickey-Fuller (ADF) Test                    [11] Pablo Kaluza, Andrea Kölzsch, Michael T Gastner, and Bernd Blasius. 2010.
                                                                             The complex network of global cargo ship movements. Journal of the Royal
                                                                             Society Interface 7, 48 (2010), 1093–1103.
                                                                        [12] Genshiro Kitagawa and Hirotugu Akaike. 1978. A procedure for the modeling
                                                                             of non-stationary time series. Annals of the Institute of Statistical Mathematics
6   CONCLUSION                                                               30, 2 (1978), 351–363.
This paper presented an analysis of the evolution of networks           [13] Zuzanna Kosowska-Stamirowska, César Ducruet, and Nishant Rai. 2016. Evolv-
                                                                             ing structure of the maritime trade network: evidence from the Lloyd’s Ship-
made by voyages of vessels between ports, based on several                   ping Index (1890–2000). Journal of Shipping and Trade 1, 1 (2016), 10.
topological features of the network so-called Topological Voyage        [14] Fernando González Laxe, Maria Jesus Freire Seoane, and Carlos Pais Montes.
                                                                             2012. Maritime degree, centrality and vulnerability: port hierarchies and
Graph Features (TVGs). The networks were built in a bottom-up                emerging areas in containerized transport (2008–2010). Journal of Transport
and data-driven fashion, considering 3 years of AIS data of the              Geography 24 (2012), 33–44.
US coastal waters. The analysis unraveled insights about the            [15] marchah. (accessed November 2019). Sea Ports Data. https://github.com/
                                                                             marchah/sea-ports
stationarity of several features over time, including the number        [16] MarineCadastre.gov. (accessed November 2019). Vessel Traffic Data. https:
of nodes and edges, clustering coefficient, and geodesic distance            //marinecadastre.gov/ais/
between the ports (nodes) of the network. The analysis was              [17] Carlos Pais Montes, Maria Jesus Freire Seoane, and Fernando González Laxe.
                                                                             2012. General cargo and containership emergent routes: A complex networks
performed on the MarineCadastre.gov dataset containing vessels’              description. Transport Policy 24 (2012), 126–140.
voyages in the USA coast. In particular, for this dataset, we found     [18] Lokukaluge P Perera, Paulo Oliveira, and C Guedes Soares. 2012. Maritime
                                                                             traffic monitoring based on vessel detection, tracking, state estimation, and
that for most of the TVGs, the time-series do not present any                trajectory prediction. IEEE Transactions on Intelligent Transportation Systems
trend or seasonality behaviors.                                              13, 3 (2012), 1188–1200.
   We are aware of the limitations of applying our approach to a        [19] Iraklis Varlamis, Konstantinos Tserpes, Mohammad Etemad, Amílcar Soares
                                                                             Júnior, and Stan Matwin. 2019. A Network Abstraction of Multi-vessel Tra-
single dataset that only covers the coast of the United States in a          jectory Data for Detecting Anomalies.. In Proceedings of the Workshops of the
relatively short time period of 3 years (2015 - 2017). In the future,        EDBT/ICDT 2019 Joint Conference.
we plan to extend our study both in terms of the geographical           [20] Michele Vespe, Harm Greidanus, and Marlene Alvarez Alvarez. 2015. The de-
                                                                             clining impact of piracy on maritime transport in the Indian Ocean: Statistical
area covered (i.e., by including dataset from other countries and            analysis of 5-year vessel tracking data. Marine Policy 59 (2015), 9–15.
possibly a worldwide network) and the time period. In addition,         [21] Michele Vespe, Ingrid Visentini, Karna Bryan, and Paolo Braca. 2012. Unsu-
                                                                             pervised learning of maritime traffic patterns for anomaly detection. (2012).
the definition of the spatial area of ports can be improved to          [22] Zhihuan Wang, Christophe Claramunt, and Yinhai Wang. 2019. Extracting
increase the precision in voyages detection, in a way similar to the         global shipping networks from massive historical automatic identification
one performed in [22]. Finally, in terms of network analysis, the            system sensor data: a bottom-up approach. Sensors 19, 15 (2019), 3363.