<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Uncovering vessel movement paterns from AIS data with graph evolution analysis</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Emanuele Carlini</string-name>
          <email>emanuele.carlini@isti.cnr.it</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mohammad Etemad</string-name>
          <email>etemad@dal.ca</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Vinicius Monteiro de Lira</string-name>
          <email>vinicius.monteirodelira@isti.cnr.it</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Bruno Brandoli Machado</string-name>
          <email>brunobrandoli@dal.ca</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Amilcar Soares</string-name>
          <email>amilcar.soares@dal.ca</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Stan Matwin</string-name>
          <email>stan@cs.dal.ca</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Institute for Big Data Analytics, Dalhousie University</institution>
          ,
          <addr-line>Halifax</addr-line>
          ,
          <country country="CA">Canada</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Institute of Information, Science and Technologies, National Research Council (CNR)</institution>
          ,
          <addr-line>Pisa</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2016</year>
      </pub-date>
      <abstract>
        <p>The availability of a large amount of Automatic Identification System (AIS) data has fostered many studies on maritime vessel trafic during recent years, often representing vessels and ports relationships as graphs. Although the continuous research efort, only a few works explicitly study the evolution of such graphs and often consider coarse-grained time intervals. In this context, our ultimate goal is to fill this gap by providing a systematic study in the graph evolution by considering voyages over time. By mining the arrivals and departures of vessels from ports, we build a graph consisting of vessel voyages between ports. We then provide a study on topological features calculated from such graphs with a strong focus on their temporal evolution. Finally, we discuss the main limitations of our approach and the future perspectives that will spawn from this work.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>INTRODUCTION</title>
      <p>
        Maritime transportation represents 90% of international trade
volume and plays a paramount role in today’s economy, in terms
of cargo shipping, passenger transportation, leisure navigation,
and fishing operation[
        <xref ref-type="bibr" rid="ref21">21</xref>
        ]. Globalization and multiple mode
transportation of goods in the shipping industry resulted in a massive
extension of the maritime vessel route network. The study of
vessel movements is a well-established source of information to
understand the role of maritime routes and ports in economic,
social, and environmental contexts. These studies include maritime
trafic control and prediction [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ], human migration flows [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ],
bioinvasion [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] and maritime piracy [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ]. However, such a role
cannot be adequately unraveled by looking at ports and routes in
isolation, but instead, they must be put in relation to one another.
      </p>
      <p>This allows the study of the interplay of all the components
in the complex maritime network, and it is even more important
for understanding the evolution over time of that interplay.</p>
      <p>
        A central concept for the analytical study of vessel routes is
the Global Shipping Network (GSN), in which nodes are ports
and edges are the routes between ports of cargo ships. Figure 1
illustrates the GSN resulting from the vessels’ routes of 2017 in
the American coast using the MarineCadastre.gov dataset [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ].
      </p>
      <p>
        Since the automatic identification system (AIS) for vessels was
made mandatory in 2004 [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], there has been a surge of studies on
the GSN and other maritime networks that use such data. Many
works modeled the GSN based on graph theory [
        <xref ref-type="bibr" rid="ref19 ref22">19, 22</xref>
        ], but only
a few of them analyzed the network in terms of its evolution over
the years [
        <xref ref-type="bibr" rid="ref14 ref17">14, 17</xref>
        ]. Also, those works which studied the network
evolution used private data and performed exciting but high-level
and coarse-grained analysis, such as in [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
      <p>The main goal of our analysis is to provide a systematic study
of the evolution aspect regarding maritime vessel routes, with the
purpose of identifying recurrent patterns in their evolution. The
analysis is based on publicly available data and with a defined
and well-documented data model. This aspect is fundamental
for the reproducibility of our analysis and its expansion and
updating of results when new data arrives. Also, it considers the
two necessary dimensions of time and layers (i.e., the evolution
of the network can be observed for multiple types of vessels, such
as cargo and passengers).</p>
      <p>Such an ambitious objective has some inherent challenges
that must be tackled. First, the analysis of AIS datasets typically
presents a Big Data challenge: the datasets are usually vast, and
data can arrive at a high rate. For example, ExactEarth alone
claims to consistently track 165,000 vessels and compiling over
7,000,000 AIS messages daily1, but data still need to be processed
in a reasonable time.</p>
      <p>Second, the purpose of any network analysis is to abstract the
complexity of a system in order to extract meaningful information
that is not directly available when the individual components
are examined separately. Therefore, the definition of a network
that encompasses time information is a complex task. Suitable
approaches need to be carefully selected to study the evolving
network.</p>
      <p>The contributions of this work can be outlined as the
following:
• We propose an approach that uses AIS data to extract
connections between ports derived from the vessels’
movements. From these connections (or voyages), we build
graphs in which the vertices correspond to the ports,
whereas the edges or links correspond to the vessel voyage
between two diferent ports. In addition, each edge has a
semantic defined by the vessels’ type.
• We study several topological properties of the temporal
graphs generated from vessels’ movements and how these
features evolve over time. Specifically, we investigate
features relative to graphs dimension, ability to form clusters,
and geographical spatiality.
• We design and run an extensive analysis using a real
dataset with AIS Data collected from vessels navigating in
the U.S. coastal area. In our experiments, we investigate
the aspect of stationarity of the time series of the
topological properties of the graph and discuss the obtained
insights.</p>
      <p>The rest of the paper is organized as follows. Section 2
discusses related works. Section 3 introduces some concepts used
through the paper. In Section 4, we describe our approach for
deriving time-series of topological properties from graphs based
on vessels’ visits to ports. We perform some analysis in Section
5 and draw conclusions envisioning future works in Section 6.
2</p>
    </sec>
    <sec id="sec-2">
      <title>RELATED WORK</title>
      <p>
        The work done in [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] is one of the first to study the concept
of GSN as a complex network. They use information about the
itineraries of 16363 ships of three types (bulk dry carriers,
container ships, and oil tankers) during the year 2007 to build a
network of links between ports. The work of [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] shows that the
three categories of ships difer in their mobility patterns and
networks. Their results show that container ships follow regularly
repeating paths, whereas bulk dry carriers and oil tankers move
less predictably between ports. They also show that the network
of all ship movements possesses a heavy-tailed distribution for
the connectivity of ports and the loads transported on the links
with systematic diferences between ship types [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ].
      </p>
      <p>
        The work of [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] also uses a sample of the Lloyds database
with the world container ship fleet movements from Chinese
ports from the years of 2008 to 2010. The objective of their work
is to look at changes in the maritime network prior to and after
the financial crisis (2008-2010) and to analyze the extent to which
large ports have seen their position within the network change.
The authors show how the global and local importance of a
port can be measured using graph theory concepts. They also
show that the goods transportation network was contracted with
respect to port throughput, but no contraction in the distribution
capacity of the main hub ports was found [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. Finally, the authors
1https://www.exactearth.com/products/exactais
show that there are new port regions placed in the entrance
and exit of the Panama Canal, and there are several significant
business opportunities in that region.
      </p>
      <p>
        A study of topological changes in the maritime trade network
is shown in [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. The authors propose two new measures of
network navigability called random walk discovery and escape
dificulty . Their results show that the maritime network evolves
by increasing its navigability while doubling the number of active
ports. The authors suggest that unlike in other real-world
evolving networks studied in the literature up to date, the maritime
network does not densify over time, and its efective diameter
remains constant [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ].
      </p>
      <p>
        In [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], the author investigates the degree of overlap among
the diferent layers of circulation composing global maritime
lfows. His work uses several methods from complex network
analysis to understand the dynamics afecting the evolution of
ports and shipping. The results show that there is a strong and
path-dependent influence of multiplexity on trafic volume, range
of interaction, and centrality from various perspectives (e.g.,
matrices correlations, homophily, assortativity, and single linkage
analysis) [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. When growing the network and concentrating the
analysis around large hubs over time, results show that the trafic
distribution is place-dependent due to the reinforced position of
already established nodes [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
      <p>
        The work of [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ] builds a GSN using the 2015 AIS data of the
world with multiple spatial levels. Their process mainly consists
of five steps, where the first three generate the network nodes,
and the last two create the network links. The work of [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ] applies
the Density-Based Spatial Clustering of Applications with Noise
(DBSCAN) to detect where ships stop and cross this information
with terminal candidates of ports. A directed GSN is generated
with the trip statistics between two nodes as the edges. Their
work evaluates features such as average degree and betweenness
centrality of each node, average shortest path length between
any two nodes, and community clusters of the GSNs.
      </p>
      <p>
        Following a similar idea of building GSNs, but with focus on
anomaly detection, the work of [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] provides a mechanism that
classifies vessel behavior in normal and abnormal, using historical
information about similar vessels that operate in a particular
area. In [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ], the authors identify waypoints (i.e., a region of
interest for a given application) that characterizes the operations
and the sort of movement patterns that they follow (i.e., the
nodes). As edges, the work of [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] uses the subtrajectories that
links two waypoints, also using the extracted features of those
subtrajectories for analysis. They identified each edge by the
subtrajectory that links two ways points. Features of each edge
are generated using a trajectory mining library introduced by
[
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. Their analysis tries to detect outliers from the subtrajectory
features (e.g., course over ground, speed over ground, etc.) and
using transition probabilities as the edges of the network.
      </p>
      <p>
        Diferently from [
        <xref ref-type="bibr" rid="ref11 ref13 ref14 ref6">6, 11, 13, 14</xref>
        ], our work use AIS data to
determine vessel routes. Diferently from [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ], which uses stop
points as nodes and evaluate centrality, shortest-path, and
communities, our work is focused on the evolution of the network
instead of using waypoints as nodes and being focused on
anomaly detection like [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ], or using stop points as nodes to evaluate
centrality, shortest-path, and communities like [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ], we use the
ports as nodes, and we evaluate the evolution of the network as
our primary task.
      </p>
    </sec>
    <sec id="sec-3">
      <title>PROBLEM FORMULATION AND</title>
    </sec>
    <sec id="sec-4">
      <title>DEFINITIONS</title>
      <p>
        Vessels report their location through AIS messages while
navigating. A vessel may send AIS messages with a frequency that varies
from a few seconds to a few minutes, depending on the type of
message. When they are at the underway, they may send AIS
messages every 2 to 10 seconds, while when they are at anchor,
this time window can increase to every 3 minutes [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ]. Therefore,
positional information extracted from AIS messages can be seen
as a representation of the spatial-temporal movement of a
traveling vessel. We are interested in this spatial information with the
intent of understating when a vessel is visiting a port. By merging
subsequent visits to ports of vessels, it is then possible to build
the sequence of vessels’ voyages. Then the idea is to construct
a graph (or network) representation out of these voyages in a
given interval of time, to be able to study the evolution of the
graph with complex network mechanisms.
      </p>
      <p>
        Graphs have some properties useful to unravel interesting
information about the dynamism between two and more entities. In
particular, in the context of a voyages graph, the topological
properties of the graph can help us identify relevant characteristics
within a network that would not have meaningful information if
the individual entities were examined separately [
        <xref ref-type="bibr" rid="ref13 ref6">6, 13</xref>
        ].
Topological properties can be applied to the network as a whole or to
individual nodes and edges. In particular, for our study, we are
concerned with global network properties. Below, we provide
a compact formalization of an AIS message, as well as of some
other important concepts that covers the scope of this paper.
      </p>
      <p>Definition 3.1. (AIS Message): An AIS Message me is a tuple
(x, y, t, c) that represents the GPS coordinate (x, y) at a time stamp
t assigned to a vessel e of type c. We define M as the set of all
AIS messages.</p>
      <p>Definition 3.2. (Port): A Sea Port p is represented as a tuple
(id, x, y), where x and y are the latitude and longitude coordinates
of its geographical center, and id is the code that identifies the
port. We also define the spatial function α (p, r ) that defines a
circular area of radius r centered on the coordinates of port p.</p>
      <p>Definition 3.3. (Visit): Given a radius r , we formally define a
visit v = (p, me , t ) of the vessel e to a port p when it exists at
least one me ∈ M at time t whose coordinates x and y are in the
area defined by α (p, r ).</p>
      <p>Definition 3.4. (Voyage): A voyage vj = (v1, v2) is a pair of
visits, such that v1(p) , v2(p) and v1(e) = v2(e) and v1(t ) &lt; v2(t )
and they are consecutive (there is no visits to other ports between
them for the same vessel). The ports of v1 and v2 are called
respectively origin and destination ports. The duration of the
voyage vj(d) is the time of the last visit of e in the origin port
and the time of first visit in the destination port.</p>
      <p>Definition 3.5. (Voyage Graph): The Voyage Graph (VG) is a
graph G = (V , E) built according to a set of voyages V J , in which
V contains all the ports in V J and E contains an edge for each
unique pair of ports in V J .</p>
      <p>Definition 3.6. (Voyage Graph Snapshot): The Voyage Graph
Snapshot (VGS) Gw = (V , E)w is an extension of the VG that
includes a temporal time window w used to create the snapshot
of the graph. This interval is used to select the AIS messages that
will be used to build the snapshot Gw of VG G.</p>
      <p>Algorithm 1: Trips Graph Snapshot Extraction</p>
      <p>Input : M: AIS Messages divided into areas
w: Time window size
s: Time shifting
r: Radius</p>
      <p>P: Set of ports
Output : G: set of Voyage Graph Snapshots</p>
      <p>Init : G ← ∅; V ← ∅
1 V ← Visits(Ai , P, r);
2 R1 ← Voyages(V);
3 R2 ← Clean(R1);
4 foreach b ∈ Buckets(w, s) do
5 G ← G + Graph(R2, b);
6 return G
4</p>
    </sec>
    <sec id="sec-5">
      <title>METHODOLOGY</title>
      <p>This section describes the procedural methodology that we use
to transform raw AIS data to a set of time series observations on
the topological features of voyage graphs. We first describe the
algorithm to build the voyage graph from the source data, then
which features we extract from the graph, and finally how we
create the time series.
4.1</p>
    </sec>
    <sec id="sec-6">
      <title>Building Voyage Graphs</title>
      <p>Building the set of voyage graph snapshot directly from the
original data would be possible, but also very unpractical. The dataset
has a lot of noise, entries are not ordered in time, and much
information is redundant. Therefore, we applied an
incremental approach to process the data. First, we exploit the fact that
the original dataset is already divided into geographical areas
(i.e., the zones of the MarineCadastre.gov dateset, see Section5).
For each area, we remove redundant information (e.g.,
subsequent entries for the same vessel in the same port, with the same
timestamp) and compute the set of visits. Second, we compute
the set of voyages using the visits, removing incorrect and noisy
voyages. Third, from the voyages we can determine the graph
and the snapshot according to the time window considered. This
incremental process has several advantages: (i) graphs building
is very fast, as the set of voyages is practically an edge list; (ii) the
costly cleaning process is done only once, and from the clean
collection of voyages it is possible to build multiple graphs; and (iii)
it is interesting to study visits and voyages without transforming
them into a graph.</p>
      <p>The pseudocode in Algorithm 1 illustrates the steps that
extract all the voyage graph snapshots (G0, . . . , Gn ) for a given set
of AIS Messages M. Besides M, the algorithm also receives as
parameters a Radius r , a set of ports P , a time window size w,
and a shifting s. The shifting parameter s indicates the number
of days to move forward the time window for each successive
graph. For example, with s = 10, if the first time window starts
on the 1st of January, the second starts at the 10th of January,
and so on.</p>
      <p>The algorithm starts by extracting the set of visits V (see
definition 3.3) from M in each geographical area Ai in the original
dataset with the function V isits() (line 1). This function returns
V ⊂ Ai , such that V contains the AIS records transmitted inside
the area α (p, r ) (see definition 3.2) for each port p ∈ P . Depending
on r , there could be overlapping port areas such that the same AIS
record results transmitted inside multiple ports. In these cases,
we discriminate by associating the record to the closest port.</p>
      <p>In turn, V feeds the function V oyaдes() that extracts the
voyages of the vessels (line 2). The underlying assumption is that if
a vessel e is seen at the port po ∈ P at time t0 and e is also seen
at the port pf at time t1 and t1 &gt; t0, then we record a voyage
(definition 3.4).</p>
      <p>The set of voyages is then cleaned by the function Clean() (line
3), to retain only the valid voyages. AIS data inevitably contains
noise due to many reasons, including malfunctions, errors in
transmission, and malicious use. In our context, such noise and
mistakes translate into incorrect voyages. To remove them, we
performed two cleaning actions. First, we removed those entries
having null or invalid data in relevant fields (typically position
or vessel type). For example, several incorrect entries had a
vessel identifier, called Maritime Mobile Service Identity (MMSI),
whose value is composed only by zeroes, which may indicate a
placeholder for missing MMSI. We then removed the voyages
that, depending on their duration (vjd ) and length, imply an
impossible speed for a vessel. We set the length of a voyage by
computing the geodesic distance between starting and arrival
ports. The geodesic distance is the minimum distance between
two points on earth, which can be used as the shortest
possible maritime voyage. By using the length and the duration, we
compute the estimated average speed based and removed those
voyages whose speed exceeds 60 knots (which is still very high
speed, but we left some margin to cope with a possible degree of
approximation in the data). We did not remove the slow speed
voyages as we cannot estimate how long is the actual maritime
route between two ports with respect to the geodesic distance.</p>
      <p>Finally, the algorithm builds the set of Voyage Graphs
(definition 3.5) thought the function Graph() using the clean set of
voyages R2 (line 5). This function uses the ports as vertices and
the voyages as edges to build the graph. It considers the
function Bucket () that creates all the possible time windows for the
parameters w and s.
4.2</p>
    </sec>
    <sec id="sec-7">
      <title>Topological Voyage Graph Features</title>
      <p>
        In order to study the evolution of the voyages graph, we employ
a set of graph metrics that are related to the diferent aspects
we wish to evaluate, called Topological Voyage Graph Features
(TVGs). In this paper, we studied only global network metrics
(i.e., related to the whole graphs). We used the networkx library
[
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] to compute such metrics.
      </p>
      <p>An important metric in studying and comparing graphs is the
dimensions of the graph. To this end, we have considered the
order (i.e., amount of nodes/ports in the graph) (fn ) and the size
(i.e., amount of edges) (fe ). Semantically, these two measures
are connected. A higher order and size can indicate an increased
overall vessel trafic, and/or the tendency to perform less
conservative routes, as more ports are involved. Inversely, a lower order
and size can indicate a decreased overall vessel trafic, and/or the
tendency to perform more conservative routes, as less ports are
involved.</p>
      <p>A relevant aspect is the identification of cohesive subgroups
of ports in the graph, as a way to identify those ports that share a
strong tie in the trafic for a particular vessel type. The number of
Connected Components (CC, fc ) is the number of subgraphs in
which any node is connected to each other by edges, and which
is not connected to another subgraph. The number of Connected
Components (fc ) indicates how much the graph represents a
1000
I 800
S
eSM600
u
iunq 400
200
0
5000
t4000
n
u
co3000
seg
ya2000
o
v
1000
0
0.6
0.5
e0.4
g
a
t
n
ce0.3
r
e
P
0.2
0.1
0.0
tug tow
cargo
tanker
passenger
sailing
global scale activity (low CC number), rather than composed by
a set of not connected and local activities (high CC number). The
Average Clustering coeficient ( fa ), is the average of local
clustering of nodes. The local clustering of each node in the graph
is the fraction of triangles (set of 3 vertices such that any two
of them are connected by an edge) that exist over all possible
triangles in its neighborhood. In other words, this coeficient
represents the probability that two neighbors of a node are
neighbors themselves, and can serve to evaluate how many voyages
happen around the same set of ports.</p>
      <p>
        Another essential metric for the maritime networks is the
extension of their geographical spatiality, as the distance between
ports can be directed linked with various cost aspects as fuel
consumption, maintenance rates, and insurance costs [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. In
addition, such a metric can give insights on whether certain vessel
types are more oriented toward short or long routes. In this
paper we have considered the average (fd ) and weighted average
(fw ) link distance. The former represents the geodesic distances
between connected ports in the graph, expressed in kilometers.
The weighted version multiplies each distance by the number of
voyages made between the two ports.
      </p>
      <p>0.61
0.44
0.23
0.26
Unique MMSI</p>
      <p>
        Routes Count
A time-series is a collection of observations made sequentially
over the time [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Examples include (i) sales of a particular product
in successive months, (ii) electricity consumption in a particular
area for successive one-hour periods.
      </p>
      <p>In particular, we build time-series of the TVG features. For this
purpose, we use Algorithm 1 to return the set of Voyages Graph
Snapshot G, compute the TVGs for each one of the graph дi ∈ G
and store them as separated time-series Tj where j indicate the
TVG feature (i.e. number of nodes (fn ), number of edges (fe ),
average clustering (fa ), number of strongly connected components
(fc ), the average (fd ) and weighted average (fw ) link distance.</p>
      <p>Let f unctionj be the function that compute the topological
metric of a given graph д indicated by the index j, for instance
if j is equal to fn then f unctionj computes the number of edges
on the graph д. Also, let I be the set of existing temporal buckets
in G and let J be the set of TVG features.</p>
      <p>We then formalize the TVG time-series (T f ) with the following
equations:</p>
      <p>Tj B { f unctionj (дi ) | дi ∈ G, i ∈ I, j ∈ J }
where:</p>
      <p>J B { fn, fe , fa, fc , fd , fw }</p>
      <p>I B {w1, w2, ..., wn }
5</p>
    </sec>
    <sec id="sec-8">
      <title>EXPERIMENTS</title>
      <p>In this section, we present the experiments conducted to assess
the stationary behavior of the TVG features. We perform such
analysis for the diferent spectrum of types of vessels. In the
experiments, to build the Voyage Graph Snapshot using Algorithm
1, we assign respectively the values r = 5km, s = 10days, and
w = 30days.</p>
      <p>
        Our source of AIS data is the MarineCadastre.gov dataset [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ],
which contains filtered AIS records for the US coastal waters
for the years 2015-2017, for a total of about 934 GB. Records are
sampled to one minute and organized in the comma-separated
value (CSV) format, for a total of about 8 billions of records.
      </p>
      <p>
        Additionally, we use the Sea-Ports dataset [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] that contains
spatial information, such as latitude and longitude, of all known
seaports in the world. From the original datasets, we have
retained only the ports of North America.
      </p>
      <p>The experiments conducted aim to answer the following
research questions comprehensively. Our research question can be
summarized as the following: Are the TVG time series
stationary? Stationarity is an essential characteristic of a time series.
A time series is said to be stationary if its statistical properties
do not change over time. In other words, it has constant mean
and variance, and covariance is independent of time. Section 5.2
address this research question. To address this research question,
we will use a statistical test designed to comment on whether a
time series is stationary explicitly.
5.1</p>
    </sec>
    <sec id="sec-9">
      <title>Data Overview</title>
      <p>Figure 2 shows, in percentage, the amount of unique MMSIs and
route counts by vessel type and for the whole dataset. In the
original dataset, there are several vessel categories: cargo, passenger,
sailing, military, fishing, tanker, and tug tow. Interestingly, for
cargo and tanker, a relevant percentage of unique MMSI
correspond to a much lesser portion of total voyages. This is expected
as these types of vessels perform less but longer voyages
concerning other vessel types. By comparison, tug boats with 26%
unique vessels have 61% of total routes. Unlike cargo and tankers,
these routes are very short and relative to the vessels moving
between nearby ports.</p>
      <p>Among all categories, military and fishing account only for 1%
of the total voyages, meaning that there are too little data to build
meaningful graphs for these categories. We also did not consider
tug tows as their routes mostly regard only two ports resulting
in not interesting graphs. Therefore, from now on, we consider
only the following categories: cargo (which also includes tanker)
passenger, and sailing.</p>
      <p>Figure 3 top and bottom show the amount of unique MMSIs
respectively and voyages count for each vessel type and for each
month in the period considered. Looking at cargo and tug tows,
we can notice an evident pattern of less naval trafic in the winter
periods (for example, from February to March 2016), in which
both the number of unique MMSI and voyages has a clear drop.
The drops are also visible (especially for cargo) in the same period
of 2015 and 2017. Also, it interesting to notice the similarity in
the trend of the cargo and tug tow, which can be explained by
the fact the tug boats are mostly used to help large vessels, such
as cargoes, into ports.</p>
      <p>Figure 4 shows the graph resulting from cargo vessels in the
northeast US in February-2016 (Figure 4 a) and August-2016
(Figure 4 b). It is clear from the graph that the trafic is more abundant
during the summer because of the better climatic conditions. The
graphs also show the complete halt of voyages in the great lakes
during winter due to the formation of ice in the lakes.</p>
      <p>Figure 5 shows the extracted time-series using the
MarineCadastre.gov dataset for the features: number of nodes, number of
edges, number of connected components, and average clustering,
average distances and weighted average distance. For each TGV
feature, we consider four diferent series: passenger, sailing, cargo,
and all.
5.2</p>
    </sec>
    <sec id="sec-10">
      <title>Stationary behavior analysis</title>
      <p>
        In this section, we address our research question of
investigating the (non) stationary behavior of the TVGs. A time series is
stationary if they do not have a trend or seasonal efects. This
means that the statistics calculated on the time series, such as
the mean, variance, and auto-correlation of the observations are
consistent over time [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Most statistical forecasting methods are
based on the assumption that the time series can be modeled
approximately stationary through the use of mathematical
transformations [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. Thus, stationary time series are easier to model.
Indeed, statistical modeling methods assume or require the time
series to be stationary to be efective.
      </p>
      <p>
        There are diferent methods to verify whether a time series
is stationary or not. Statistical tests are widely used to analyze
if the requirements of stationary are met or have been violated.
Here, we adopted the Augmented Dickey-Fuller Test [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] (ADF)
that uses an auto-regressive model and optimizes an information
criterion across multiple diferent lag values [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
      <p>The null hypothesis of the test is that the time series is not
stationary. The alternative hypothesis (rejecting the null
hypothesis) is that the time series is stationary. When interpreting the
p-value from this test, values below a threshold (such as 5% or
1%) suggests to reject the null hypothesis, i.e., the time-series is
stationary. While, p-values above the threshold suggests to do
not to reject the null hypothesis, meaning that the time-series is
non-stationary.</p>
      <p>We performed an ADF test on the TVG Time-series extracted
from our dataset. The idea is that the more negative (lower) this
ADF statistic, the more likely we have a stationary time-series
or does not have time-dependent structure. We report the ADF
test results on Table 1. The table reports for each TVG feature,
and for each serie, the ADF-Statistic, the p-value and the critical
values (1%, 5%, and 10%).</p>
      <p>By looking at the results of each TGV feature, it is possible
to see that most of the series have statistic values lower than
the critical value of 1%, except the features avg_clustering, for
the Serie sailing (ADF statistic equals to -2.851), and #cc, for the
Serie passenger (ADF statistic equals to -3.431). This indicates
weak evidence against the null hypothesis, so for these 2 cases,
considering critical values at 1% level, these two TVG time-series
are non-stationary.</p>
      <p>Therefore, our analysis using the MarineCadastre.gov dataset
suggests that there is a stationary characteristic over the time of
the vessel’s voyages in the USA coast for the diferent considered
vessels’ type (i.e., all, sailing, cargo and passenger). This
characteristic has been presented in most of the investigated TGVs
time-series, and it implies that the voyage graphs at diferent
ranges of time points keep constant on average, without showing
significant trends or seasonality during the years of 2015 and
2017.
all
cargo
passenger
sailing
all
cargo
passenger
sailing
all
cargo
passenger
sailing
all
cargo
passenger
sailing
all
cargo
passenger
sailing
all
cargo
passenger
sailing
-9.355
-9.831
# nodes
# edges
avg_distance
-3.494
-3.494
-3.494
-3.494
-3.494
-3.497
-3.494
-3.494
-3.494
-3.494
-3.494
-3.494
-2.889
-2.889
-2.889
-2.889
-2.889
-2.889
-2.889
-2.892
-2.889
-2.889
-2.891
-2.889
-2.889
-2.891
-2.889
-2.889
-2.889
-2.889
-2.889
-2.890
-2.889
-2.889
-2.889
-2.582
-2.582
-2.582
-2.582
-2.582
-2.582
-2.582
-2.582
-2.583
-2.582
-2.582
-2.582
-2.582
-2.582
-2.582
-2.582
-2.582
-2.582
-2.582
-2.582
-2.582
-2.582
-2.582
-2.582</p>
    </sec>
    <sec id="sec-11">
      <title>CONCLUSION</title>
      <p>This paper presented an analysis of the evolution of networks
made by voyages of vessels between ports, based on several
topological features of the network so-called Topological Voyage
Graph Features (TVGs). The networks were built in a bottom-up
and data-driven fashion, considering 3 years of AIS data of the
US coastal waters. The analysis unraveled insights about the
stationarity of several features over time, including the number
of nodes and edges, clustering coeficient, and geodesic distance
between the ports (nodes) of the network. The analysis was
performed on the MarineCadastre.gov dataset containing vessels’
voyages in the USA coast. In particular, for this dataset, we found
that for most of the TVGs, the time-series do not present any
trend or seasonality behaviors.</p>
      <p>
        We are aware of the limitations of applying our approach to a
single dataset that only covers the coast of the United States in a
relatively short time period of 3 years (2015 - 2017). In the future,
we plan to extend our study both in terms of the geographical
area covered (i.e., by including dataset from other countries and
possibly a worldwide network) and the time period. In addition,
the definition of the spatial area of ports can be improved to
increase the precision in voyages detection, in a way similar to the
one performed in [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ]. Finally, in terms of network analysis, the
metrics we considered in this paper are global. Still, an analysis
of local node metrics (such as node centrality) is an interesting
future improvement.
      </p>
    </sec>
    <sec id="sec-12">
      <title>ACKNOWLEDGMENT</title>
      <p>The authors acknowledge the support of the H2020 EU Project
MASTER (Multiple ASpects TrajEctoRy management and
analysis) funded under the Marie Skłodowska-Curie grant agreement</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <fpage>2004</fpage>
          .
          <article-title>Safety of Life at Sea (SOLAS), Consolidated Edition</article-title>
          . (
          <year>2004</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Peter J Brockwell</surname>
          </string-name>
          and Richard A Davis.
          <year>2016</year>
          .
          <article-title>Introduction to time series and forecasting</article-title>
          . springer.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Chris</given-names>
            <surname>Chatfield</surname>
          </string-name>
          .
          <year>2000</year>
          .
          <article-title>Time-series forecasting</article-title>
          . Chapman and Hall/CRC.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Yin-Wong Cheung</surname>
          </string-name>
          and
          <string-name>
            <surname>Kon S Lai</surname>
          </string-name>
          .
          <year>1995</year>
          .
          <article-title>Lag order and critical values of the augmented Dickey-Fuller test</article-title>
          .
          <source>Journal of Business &amp; Economic Statistics 13</source>
          ,
          <issue>3</issue>
          (
          <year>1995</year>
          ),
          <fpage>277</fpage>
          -
          <lpage>280</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>David</surname>
            <given-names>A Dickey</given-names>
          </string-name>
          <string-name>
            <surname>and Wayne A Fuller</surname>
          </string-name>
          .
          <year>1979</year>
          .
          <article-title>Distribution of the estimators for autoregressive time series with a unit root</article-title>
          .
          <source>Journal of the American statistical association 74</source>
          ,
          <year>366a</year>
          (
          <year>1979</year>
          ),
          <fpage>427</fpage>
          -
          <lpage>431</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>César</given-names>
            <surname>Ducruet</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Multilayer dynamics of complex spatial networks: The case of global maritime flows (1977-2008)</article-title>
          .
          <source>Journal of Transport Geography</source>
          <volume>60</volume>
          (
          <year>2017</year>
          ),
          <fpage>47</fpage>
          -
          <lpage>58</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Mohammad</given-names>
            <surname>Etemad</surname>
          </string-name>
          , Amílcar Soares Júnior, and
          <string-name>
            <given-names>Stan</given-names>
            <surname>Matwin</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Predicting transportation modes of GPS trajectories using feature engineering and noise removal</article-title>
          .
          <source>In Canadian Conference on Artificial Intelligence</source>
          . Springer,
          <fpage>259</fpage>
          -
          <lpage>264</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Giorgio</given-names>
            <surname>Fagiolo</surname>
          </string-name>
          and
          <string-name>
            <given-names>Marina</given-names>
            <surname>Mastrorillo</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>International migration network: Topology and modeling</article-title>
          .
          <source>Physical Review E 88</source>
          ,
          <issue>1</issue>
          (
          <year>2013</year>
          ),
          <fpage>012812</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Stephan</given-names>
            <surname>Gollasch</surname>
          </string-name>
          , Chad L. Hewitt, Sarah Bailey,
          <string-name>
            <given-names>and Matej</given-names>
            <surname>David</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Introductions and transfers of species by ballast water in the Adriatic Sea</article-title>
          .
          <source>Marine Pollution Bulletin</source>
          <volume>147</volume>
          (
          <year>2019</year>
          ),
          <fpage>8</fpage>
          -
          <lpage>15</lpage>
          . https://doi.org/10.1016/j.marpolbul.
          <year>2018</year>
          .
          <volume>08</volume>
          .054
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Aric</surname>
            <given-names>Hagberg</given-names>
          </string-name>
          , Pieter Swart, and Daniel S Chult.
          <year>2008</year>
          .
          <article-title>Exploring network structure, dynamics, and function using NetworkX</article-title>
          .
          <source>Technical Report</source>
          . Los Alamos National Lab.
          <source>(LANL)</source>
          , Los Alamos,
          <string-name>
            <surname>NM (United</surname>
            <given-names>States).</given-names>
          </string-name>
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <surname>Pablo</surname>
            <given-names>Kaluza</given-names>
          </string-name>
          , Andrea Kölzsch,
          <string-name>
            <surname>Michael T Gastner</surname>
            , and
            <given-names>Bernd</given-names>
          </string-name>
          <string-name>
            <surname>Blasius</surname>
          </string-name>
          .
          <year>2010</year>
          .
          <article-title>The complex network of global cargo ship movements</article-title>
          .
          <source>Journal of the Royal Society Interface 7</source>
          ,
          <issue>48</issue>
          (
          <year>2010</year>
          ),
          <fpage>1093</fpage>
          -
          <lpage>1103</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Genshiro</given-names>
            <surname>Kitagawa</surname>
          </string-name>
          and
          <string-name>
            <given-names>Hirotugu</given-names>
            <surname>Akaike</surname>
          </string-name>
          .
          <year>1978</year>
          .
          <article-title>A procedure for the modeling of non-stationary time series</article-title>
          .
          <source>Annals of the Institute of Statistical Mathematics 30</source>
          ,
          <issue>2</issue>
          (
          <year>1978</year>
          ),
          <fpage>351</fpage>
          -
          <lpage>363</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Zuzanna</given-names>
            <surname>Kosowska-Stamirowska</surname>
          </string-name>
          ,
          <string-name>
            <given-names>César</given-names>
            <surname>Ducruet</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Nishant</given-names>
            <surname>Rai</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Evolving structure of the maritime trade network: evidence from the Lloyd's Shipping Index (</article-title>
          <year>1890</year>
          -
          <fpage>2000</fpage>
          ).
          <source>Journal of Shipping and Trade</source>
          <volume>1</volume>
          ,
          <issue>1</issue>
          (
          <year>2016</year>
          ),
          <fpage>10</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>Fernando</given-names>
            <surname>González</surname>
          </string-name>
          <string-name>
            <surname>Laxe</surname>
          </string-name>
          ,
          <source>Maria Jesus Freire Seoane, and Carlos Pais Montes</source>
          .
          <year>2012</year>
          .
          <article-title>Maritime degree, centrality and vulnerability: port hierarchies and emerging areas in containerized transport (2008-2010)</article-title>
          .
          <source>Journal of Transport Geography</source>
          <volume>24</volume>
          (
          <year>2012</year>
          ),
          <fpage>33</fpage>
          -
          <lpage>44</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <fpage>marchah</fpage>
          .
          <source>(accessed November</source>
          <year>2019</year>
          ).
          <article-title>Sea Ports Data</article-title>
          . https://github.com/ marchah/sea-ports
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16] MarineCadastre.gov.
          <source>(accessed November</source>
          <year>2019</year>
          ).
          <article-title>Vessel Trafic Data</article-title>
          . https: //marinecadastre.gov/ais/
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>Carlos</given-names>
            <surname>Pais</surname>
          </string-name>
          <string-name>
            <surname>Montes</surname>
          </string-name>
          ,
          <source>Maria Jesus Freire Seoane, and Fernando González Laxe</source>
          .
          <year>2012</year>
          .
          <article-title>General cargo and containership emergent routes: A complex networks description</article-title>
          .
          <source>Transport Policy</source>
          <volume>24</volume>
          (
          <year>2012</year>
          ),
          <fpage>126</fpage>
          -
          <lpage>140</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <surname>Lokukaluge</surname>
            <given-names>P Perera</given-names>
          </string-name>
          , Paulo Oliveira, and
          <string-name>
            <given-names>C Guedes</given-names>
            <surname>Soares</surname>
          </string-name>
          .
          <year>2012</year>
          .
          <article-title>Maritime trafic monitoring based on vessel detection, tracking, state estimation, and trajectory prediction</article-title>
          .
          <source>IEEE Transactions on Intelligent Transportation Systems</source>
          <volume>13</volume>
          ,
          <issue>3</issue>
          (
          <year>2012</year>
          ),
          <fpage>1188</fpage>
          -
          <lpage>1200</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <surname>Iraklis</surname>
            <given-names>Varlamis</given-names>
          </string-name>
          , Konstantinos Tserpes, Mohammad Etemad, Amílcar Soares Júnior, and
          <string-name>
            <given-names>Stan</given-names>
            <surname>Matwin</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>A Network Abstraction of Multi-vessel Trajectory Data for Detecting Anomalies.</article-title>
          .
          <source>In Proceedings of the Workshops of the EDBT/ICDT 2019 Joint Conference.</source>
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <surname>Michele</surname>
            <given-names>Vespe</given-names>
          </string-name>
          , Harm Greidanus, and Marlene Alvarez Alvarez.
          <year>2015</year>
          .
          <article-title>The declining impact of piracy on maritime transport in the Indian Ocean: Statistical analysis of 5-year vessel tracking data</article-title>
          .
          <source>Marine Policy</source>
          <volume>59</volume>
          (
          <year>2015</year>
          ),
          <fpage>9</fpage>
          -
          <lpage>15</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <surname>Michele</surname>
            <given-names>Vespe</given-names>
          </string-name>
          , Ingrid Visentini, Karna Bryan, and
          <string-name>
            <given-names>Paolo</given-names>
            <surname>Braca</surname>
          </string-name>
          .
          <year>2012</year>
          .
          <article-title>Unsupervised learning of maritime trafic patterns for anomaly detection</article-title>
          . (
          <year>2012</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <surname>Zhihuan</surname>
            <given-names>Wang</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Christophe Claramunt</surname>
            , and
            <given-names>Yinhai</given-names>
          </string-name>
          <string-name>
            <surname>Wang</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Extracting global shipping networks from massive historical automatic identification system sensor data: a bottom-up approach</article-title>
          .
          <source>Sensors</source>
          <volume>19</volume>
          ,
          <issue>15</issue>
          (
          <year>2019</year>
          ),
          <fpage>3363</fpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>