<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Mining Vessel Trajectory Data for Patterns of Search and Rescue</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Product and Systems Design Engineering, University of the Aegean</institution>
          ,
          <addr-line>Syros</addr-line>
          ,
          <country country="GR">Greece</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Konstantinos Chatzikokolakis</institution>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>MarineTraffic</institution>
          ,
          <addr-line>London</addr-line>
          ,
          <country country="UK">United Kingdom</country>
        </aff>
      </contrib-group>
      <fpage>117</fpage>
      <lpage>124</lpage>
      <abstract>
        <p />
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>ABSTRACT</title>
      <p>The overall aim of this work is to explore the possibility of
automatically detecting Search and Rescue (SAR) activity,
even when a distress call has not yet been received. To this end,
we exploit a large volume of historical Automatic
Identification System (AIS) data so as to detect SAR activity
from vessel trajectories in a scalable, data-driven, supervised
way, with no reliance on external sources of information
(e.g. coast guard reports). Specifically, we present an
approach based on a parallelised, nonparametric
statistical method (Random Forests), which has proved
capable of achieving prediction accuracy rates higher than
77%.</p>
    </sec>
    <sec id="sec-2">
      <title>1 INTRODUCTION</title>
      <p>For many years, North Africa has served as the jumping-off
point for refugees and migrants hoping to cross the
Mediterranean Sea to Europe. Since the start of the Syrian war in 2011,
there has been a rapid increase in the number of people
crossing, a trend which is not expected to stop any time soon.
According to the UN Refugee Agency, this year alone, at
least 2,030 people have died or gone missing on the voyage,
with the greatest number of fatalities occurring along the
so-called Central Mediterranean Route, through Libya [23].
Under maritime law, any vessel in the vicinity of a
vessel in distress is obliged to offer assistance. In addition, numerous
national and international missions have been launched on
the EU borders and in the international waters of the
Mediterranean to assist in Search and Rescue (SAR)
operations, such as Operation Mare Nostrum led by Italy,
Operation Triton led by Frontex, NATO Operation Sea
Guardian and the EU Operation Sophia. Many of these
operations were not designed with SAR as a primary mission
goal. For this reason, numerous Non-Governmental
Organisations (NGOs) have stepped in and have been
performing SAR operations in the area; these include
Migrant Offshore Aid Station (MOAS), Doctors Without
Borders, Sea-Watch and others. According to the UNHCR,
NGOs account for 41% of all those rescued.</p>
      <p>Recently, though, concerns have been raised about
possible interactions between NGOs and smugglers. A
report published by the EU agency Frontex stated that there
were “clear indications before departure on the precise
direction to be followed in order to reach the NGOs’
boats” [4]. According to the same report, during 2015 and
the first months of 2016, smuggling groups instructed
migrants to make satellite phone calls to the Maritime
Rescue Coordination Centre (MRCC) in Rome so as to
initiate targeted rescues on the high seas. During this period,
SAR operations were mainly undertaken by Italian law
enforcement, EUNAVFOR Med or Frontex vessels, with
NGO vessels involved in less than 5% of the incidents. From
June to October 2016, however, the pattern was reversed:
“Satellite phone calls to MRCC Rome decreased sharply
(down to 10%) and NGO rescue operations rose significantly
to more than 40% of all incidents. Since June 2016, a
significant number of boats were intercepted or rescued by
NGO vessels without any prior distress call and without
official information as to the rescue location”, according to
Frontex [4].</p>
      <p>Maritime Domain Awareness (MDA) is the
effective understanding of activities, events and threats in
the maritime environment that could impact global safety,
security, economic activity or the environment [5]. Whilst in
the past MDA suffered from a lack of data, current
tracking technology has transformed the problem into one of
an overabundance of data and information. Huge
amounts of structured and unstructured data tracking vessels
during their voyages across the seas are now becoming available,
mostly due to the Automatic Identification System (AIS) that
vessels of specific categories are required to carry. The AIS
is a collaborative, self-reporting system that allows maritime
vessels to broadcast their information to nearby vessels and
coastal stations [26]. AIS transceivers allow real-time
information exchange between vessels and shore-based
stations through digital radio signals transmitted over
dedicated channels in the VHF band. The major challenge faced
today is exploiting these vast amounts of data and transforming
them into actionable information. Discovering the patterns
that emerge within these huge datasets is of great importance,
as it provides critical insights into the behaviour of vessels
during their voyages at sea.</p>
      <p>The main objective of our work is to explore the
possibility of leveraging these huge mobility datasets so as
to automatically detect vessels performing SAR operations.
To this end, we adopt a practical data mining and
machine learning approach which is capable of overcoming
the shortcomings and difficulties presented by AIS data
(highly skewed, non-uniform, prone to reception errors, etc.) [6]. In
sum, this work presents novelties on two fronts:</p>
      <sec id="sec-2-1">
        <p>• Domain Specific: The overall aim of this work is
to explore whether it is possible to automatically detect
SAR activity from open data (such as AIS), even
when a distress call has not been received. This
work has an important social impact, as it can help
improve the coordination of SAR efforts and the
understanding of related activities (e.g.
response time).</p>
        <p>• Algorithmic: We extract patterns of “rescue-like
behavior” from billions of records of
spatiotemporal (AIS) data and apply Random Forests,
a parallelised nonparametric statistical
method, which was evaluated as capable of achieving
prediction accuracy rates of more than 77%, even
when applied to large volumes of highly skewed
geospatial data. To the best of the authors'
knowledge, no previous work has considered
deriving SAR activity from AIS data.</p>
        <p>The rest of the paper is organized as follows: Section 2
briefly presents previous work in this domain, Section
3 describes our approach, Section 4 presents the
preliminary results, and Section 5 concludes the paper by
briefly outlining the main contributions of this work and
suggesting future improvements.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>2 RELATED WORK</title>
      <p>The growing availability of larger quantities of better
quality mobility data has increased the interest of
researchers in data-driven knowledge discovery. Typical
mining tasks in the spatio-temporal context include
frequent pattern discovery, trajectory pattern clustering,
trajectory classification, forecasting, and outlier detection.
Recent works on pattern discovery are based on online event
recognition systems that recognize suspicious and illegal
vessel activities from compressed routes (i.e., only critical
points of routes are preserved) [17]. Although this solution
identifies complex events, it does not assign them to
specific vessel operations (e.g. tugging, fishing, search and
rescue, etc.). This work has been extended in [14],
where vessels’ movement pattern analysis is performed through
an ontology-based system. Trajectory classification
involves constructing a model capable of predicting the class
labels of moving objects based on their trajectories and other
features [9]. Trajectory classification has been applied in
many mobility applications and numerous methods have
been proposed in the literature; however, less
attention has been paid to the maritime domain and to
classifying a vessel’s type on the basis of its trajectory. For
example, in [9], the authors propose TraClass, a feature generation
framework for trajectory data from satellite images
and trace gas measurements, which generates a hierarchy of
features by partitioning trajectories and explores two types
of clustering: (1) region-based and (2) trajectory-based. In
that work, hierarchical region-based and trajectory-based
clustering is performed after trajectory partitioning, and a
vessel classification rate as high as 84.4% is reported;
unfortunately, the number of vessel types
included in the dataset is not stated [9].</p>
      <p>Several studies have demonstrated the value of using AIS
data for data-driven knowledge discovery in this domain [12,
15, 16]. An interesting trajectory classification case that has
caught researchers' attention is that of fishing activity
detection, especially for applications such as illegal fishing,
where the task can be defined as follows: given a ship trajectory T,
predict for each data point of T a label in {Fishing,
NonFishing} [21]. In [21], the authors develop three different
models to detect potential fishing behavior according to the
type of fishing activity: for trawlers, a Hidden Markov Model
(HMM) is developed using vessel speed as the observation
variable; for longliners, a pattern recognition approach
named Lavielle’s algorithm is applied; and for purse
seiners, a multi-layered filtering strategy based on vessel
speed and operation time is implemented. Validation
against expert-labeled datasets showed average detection
accuracies of 83% for trawlers and longliners, and 97% for
purse seiners. Although these methods were designed for
wide applicability, high accuracy is only achieved
by preprocessing the AIS data so that wrong detections, noise
and faulty out-of-bounds data (e.g. observations on land) are
removed beforehand [21]. The use of AIS data poses a series
of data management and data processing challenges linked
to the treatment of large volumes of data, which may heavily
reduce the applicability of such an approach. Many traditional
data mining approaches assume that the underlying data
distribution is uniform and spatially continuous. This is not
the case for global AIS data, which often exhibit large
geographical coverage gaps, message collisions or erroneous
messages, especially when large areas are processed [18, 25].</p>
      <p>
        In [
        <xref ref-type="bibr" rid="ref32">11</xref>
        ] Mazzarella, Vespe, Damalas and Osio focus
on discovering and characterising fishing areas by exploiting
historical AIS data broadcast by fishing vessels.
Specifically, they focus on detecting the behavior of
fishing boats that are probably actively fishing. The
methodology used for the identification of fishing activity
assumes that fishing behaviour is highly dependent on,
and characterised by, speed. Detecting speed changes and
their frequency could help identify which parts of the
vessel track can be considered as fishing and which cannot
[13]. Their approach relies on DB-SMoT [20] and DBSCAN
[3], but unfortunately it is difficult to evaluate the overall
accuracy of their results due to the limited availability of
ground truth data.
      </p>
      <p>
        In [24], the authors make use of trajectory kernels in
combination with Support Vector Machines (SVMs) to
detect fishing activity from AIS data collected within
a 50 km radius of the Port of Rotterdam. For their
classification experiments they use the four most common
vessel types (cargo ship, tanker, tug and law-enforcement
vessel), with the best accuracy score being 76.25%. Jiang,
Silver, Hu, De Souza, and Matwin [8] also make use of
AIS data and compare Autoencoders with SVMs and
Random Forests. Their work suggests that
autoencoders can perform at least as well as, and sometimes
better than, SVMs and Random Forests when classifying
fishing activities, achieving up to 85% accuracy [8].
However, autoencoders by nature capture as
much information as possible rather than as much relevant
information as possible, and since that work utilised only a
small dataset, it is hard to ensure that the small part
of the input that is relevant to the considered problem is the part captured.
Furthermore, SVMs do not work well with categorical
features and often fail to handle larger datasets, as they pose
significant memory requirements and computational
complexity in such cases. Other studies indicate the
superiority of Random Forests when used for classification
tasks, compared to SVMs and back propagation neural
networks [
        <xref ref-type="bibr" rid="ref18">10</xref>
        ].
      </p>
      <p>Random Forests, which are based on decision trees
combined with aggregation and bootstrap ideas, were first
introduced by Breiman in 2001 [2]. They are a powerful
nonparametric statistical method that handles, in a
single and versatile framework, regression problems as well
as two-class and multi-class classification problems [19].
Random Forests can deal with large numbers of predictor
variables even in the presence of complex interactions, and
have been applied successfully in genetics, clinical
medicine, and bioinformatics within the past few years.
Random Forests have been shown to achieve high
prediction accuracy in such applications and to provide
descriptive variable importance measures reflecting the
impact of each variable in both main effects and interactions
[22]. They are considered capable of good accuracy,
are relatively robust to outliers and noise, can be parallelised,
and are thus considered a suitable data mining algorithm for
big data [1, 2].</p>
    </sec>
    <sec id="sec-4">
      <title>3 PROPOSED APPROACH</title>
      <p>Our aim is to explore the possibility of automatically
detecting SAR activity from open data (such as AIS), even
when a distress call has not been received. The task can be
formulated as follows: given a set of vessel trajectories, predict
for each trajectory T a label in {SAR, Non-SAR}.
A trajectory T is a set of AIS messages monitoring a vessel’s
movement from a departure port to a destination port.</p>
    </sec>
    <sec id="sec-5">
      <title>3.1 Dataset description and processing requirements</title>
      <p>According to the International Organisation for Migration, more
than 360,000 migrants arrived in the EU by sea in 2016,
mainly in Italy, Greece and Spain [7]. With respect to
spatial coverage, our analysis focuses on a
bounding box covering the Central Mediterranean Route,
where most of the refugee fatalities have been observed.
Figure 2 below illustrates the bounding box taken into
account in conjunction with the refugee fatalities in 2016. It
should be noted that our approach relies only on AIS data;
the migration fatalities dataset visualised in Figure 2 is
used only as a reference to define the bounding box area.</p>
      <p>The considered dataset includes the six AIS message types
most relevant to navigation, out of the 27 AIS message types
defined in ITU-R M.1371-4 [26]; these six types are used in
approximately 90% of AIS-based scenarios. More
specifically, the dataset includes messages of types 1, 2, 3,
5, 18, and 19. Types 1, 2, 3, 18 and 19 are position
reports, including latitude, longitude, speed-over-ground
(SOG), course-over-ground (COG), and other fields related
to ship movement, while type 5 messages carry
static and voyage-related information, including the IMO identifier,
radio call sign, name, ship dimensions, and ship and cargo types.</p>
      <p>Each vessel's type can be deduced from the
information contained in the messages that the vessel
transmits. This piece of information, typically referred to
as AIS SHIPTYPE, usually consists of two digits: the first,
ranging from 1 to 9, indicates the general category of the
vessel (e.g., Special Category, Passenger, Cargo,
etc.), while the second provides additional information
regarding the vessel’s type of cargo in certain vessel
categories (e.g., Cargo Ships, Tankers, etc.). The vessel's
crew or the accountable officer is responsible for correctly
entering this information into the AIS transponder and, although
there are explicit types for SAR vessels, vessels
participating in SAR operations are frequently not
declared as such. Furthermore, the fact that a vessel’s
type is SAR does not necessarily imply that every voyage of
the vessel is linked to SAR operations (e.g., such a vessel
could travel between ports for maintenance purposes). The data
volume included in our analysis demands substantial
computational power and a parallel processing approach,
because traditional analytics fail to handle such
volumes of data within a reasonable time frame. Consequently,
we deployed our approach on Microsoft Azure,
a cloud computing platform capable of processing large
amounts of data quickly. In particular, our system comprised two
D12v2 head nodes and six D13v2 worker nodes, forming
a Spark cluster with 56 cores and 392 GB of memory in
total. The worker nodes have 8 processing cores and 56 GB
of memory each, and the head nodes have 4 processing cores
and 28 GB of memory each.</p>
    </sec>
    <sec id="sec-6">
      <title>3.2 Data processing and analysis</title>
      <p>The dataset used for this study consists of all the voyages of
2016 that intersect with the bounding box shown in Figure
2. More specifically, it includes 275,657 voyages (SAR and
non-SAR, according to the reported AIS SHIPTYPE)
made by 12,291 vessels, corresponding to 54,766,629
AIS observations. After processing the initial data, we used
the algorithmic approach introduced in [6], which
determines the departure and destination port for each AIS
message, thus grouping the messages into specific voyages. Each
voyage includes the vessel’s trajectory as well as its static
and voyage information described in the previous
subsection. Then, a data curation process was performed to
discard voyages with an insignificant number of positions (i.e.
too few to be statistically representative). More specifically,
all voyages that included fewer than 50 positions were
removed (as the geographical area selected covers a distance
of over 1500 kilometers, trajectories with only 50 reported
positions translate to a sample rate of less than one sample
per hour). Such voyages suffer from communication gaps,
which would affect the accuracy and the effectiveness of the
proposed approach. After the curation process the dataset included
114,762 voyages, performed by 10,816 vessels, containing
52.505.718 AIS records. However, the SAR data available
in this geographic area for 2016 are more than 100 times scarcer
than the data of non-SAR voyages. More
specifically, the dataset includes 114,377 non-SAR voyages
of 10,788 vessels, comprising a total of 52,429,521 AIS
messages, while there are 385 SAR voyages made by 28
vessels with 75,797 AIS records. For evaluating the
approach, the dataset was split into training and test data. The
training set included 70% of the SAR voyages and, in order
to avoid imbalanced training data and a correspondingly
skewed evaluation metric of the classifier (e.g. true
positive rate at some false positive threshold), we
subsampled the non-SAR voyages (i.e., randomly selected
a subset) included in the training data. In particular, the
training data included 1,544 non-SAR voyages and 261 SAR
voyages made by 949 and 26 distinct vessels respectively.
The rest of the data (i.e. 30% of the SAR voyages and all the
non-SAR voyages not included in the training set)
constituted our test data.</p>
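      <p>The curation and splitting steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in this work: the Voyage record, its field names and the function names are hypothetical, and only the 50-position threshold, the 70% SAR share and the random subsampling of non-SAR voyages come from the text.</p>
      <preformat><![CDATA[
```python
import random
from collections import namedtuple

# Hypothetical voyage record; the field names are illustrative assumptions.
Voyage = namedtuple("Voyage", ["voyage_id", "is_sar", "n_positions"])

MIN_POSITIONS = 50  # voyages with fewer positions are discarded


def curate(voyages, min_positions=MIN_POSITIONS):
    """Keep only voyages with at least `min_positions` AIS positions."""
    return [v for v in voyages if v.n_positions >= min_positions]


def split_train_test(voyages, sar_fraction=0.7, non_sar_train=1544, seed=0):
    """Put `sar_fraction` of the SAR voyages plus a random subsample of
    non-SAR voyages in the training set; all remaining voyages form the
    test set."""
    rng = random.Random(seed)
    sar = [v for v in voyages if v.is_sar]
    non_sar = [v for v in voyages if not v.is_sar]
    rng.shuffle(sar)
    rng.shuffle(non_sar)
    n_sar_train = int(round(sar_fraction * len(sar)))
    n_non_train = min(non_sar_train, len(non_sar))
    train = sar[:n_sar_train] + non_sar[:n_non_train]
    test = sar[n_sar_train:] + non_sar[n_non_train:]
    return train, test
```
]]></preformat>
      <p>Because only the training set is subsampled, the test set in such a split stays heavily dominated by non-SAR voyages, which matters when reading aggregate scores.</p>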
      <p>For all the records in the dataset we retain the
following attributes, which are used in our analysis for
distinguishing SAR patterns:
a. Ship id: This is a unique identifier for each vessel
b. Ship type: This is a two-digit code that corresponds
to the general category of the vessel and the vessel’s
type of cargo in certain vessel categories
c. Latitude, Longitude: These represent the geographic
location of the vessel
d. SOG: This is the speed over ground of the vessel
measured in knots
e. COG: This is the course over ground of the vessel
measured in degrees with 0 corresponding to north
f. Heading: This attribute represents the ship's heading
in degrees with 0 corresponding to north
g. Timestamp: This is the full UTC timestamp that the
AIS message was received by MarineTraffic</p>
      <p>It should be noted that COG and Heading may
differ due to weather conditions such as wind speed and
direction, wave height and currents (e.g. when vessels are
drifting). COG is the actual direction of movement
of the vessel, while Heading simply indicates where
the ship is pointing relative to north. Based on all these
attributes, and in conjunction with other datasets that assist
in determining the boundaries of a port, the following
additional attributes were calculated:</p>
      <sec id="sec-6-1">
        <p>Departure port id: This is a unique identifier of the
port from which the vessel departed.
Departure timestamp: Full timestamp of the first AIS
message outside of the departure port geometry.
Departure port name: This is the name of the
departure port.
Departure port type: This attribute determines the
type of the port (e.g., port, anchorage, etc.).
Departure country code: This attribute indicates the
country of the departure port.</p>
      </sec>
      <sec id="sec-6-2">
        <p>Similar attributes related to the arrival of each vessel at a port have also been calculated.</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>3.3 SAR Motion analysis</title>
      <p>All these attributes have been used to transform raw
positional data into vessel voyages. However, in order to
distinguish SAR trajectories from other voyages, it was
necessary to examine in more detail the motion patterns
during SAR operations and to focus on the maneuvering of such
vessels. The methodology used for the identification of SAR
activity assumes that SAR behaviour is highly
dependent on, and characterised by, the frequency of speed changes,
the frequency of turns, departing from and arriving at the same port
or anchorage, and voyage duration. Detecting speed changes and
their frequency, as well as departures and arrivals at the
same port, helps distinguish SAR trajectories from
typical voyages (i.e., travelling from one port to another).</p>
      <p>However, there are also other types of ships that may follow
similar patterns. For instance, inland vessels tend to have
frequent changes in course over ground and heading due to
the topography of the voyage area. Another example is tugboats,
which maneuver other vessels by pushing or towing them.
Such vessels typically operate in crowded ports or narrow
canals and perform various maneuvers, leading to an increased
frequency of turns. Furthermore, tugs typically have the
same departure and arrival port, as they are called to leave a
port (i.e., depart), reach the vessel to be towed (or pushed)
and return to the same port. One of the distinguishing factors
between such vessels and SAR vessels is the voyage duration. In
many SAR operations, once vessels recover migrants from
the sea, they return to the port from which they departed
so as to disembark the rescued people, and then return to the
SAR operation area. Furthermore, SAR vessels on patrol
tend to keep a steady course, whereas when they are engaged
in a rescue operation they perform complex maneuvers to
collect migrants. In some cases, it has been observed that
vessels patrolling an area may be at open sea (i.e., outside
of port boundaries) for several days (or even weeks),
traveling within a rather small bounding box (compared to the
overall time of their voyage).</p>
      <p>Based on those characteristics, we produced some
additional attributes that were considered as possible
features for the classification process. For each voyage we
ordered the received AIS messages chronologically and
calculated COG, SOG and Heading deltas for each pair
of (chronologically) consecutive messages. Negative values
of the COG delta indicate moving to the left, while
positive values indicate moving to the right. Similarly,
negative values of the SOG delta indicate a decrease in
speed, while positive values indicate an increase.
Finally, negative values of the Heading delta imply a turn
of the ship’s heading to the left, while positive values indicate a
turn to the right. In our analysis we use the absolute values
of the COG, SOG and Heading deltas, which capture the
magnitude of change of the corresponding attributes. In
addition, two extra features have been added to the dataset.
The first is a Boolean value indicating whether the vessel
has the same departure and arrival port,
while the second is the voyage duration.</p>
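      <p>The delta computation described above can be sketched as follows. This is an illustrative sketch rather than the code used in this work: the message layout (a dictionary with timestamp, cog, sog and heading keys) and the function names are hypothetical, and wrapping angular differences into the range from -180 to 180 degrees is our assumption, since the text does not state how the 0/360 degree boundary is handled.</p>
      <preformat><![CDATA[
```python
def signed_angle_delta(prev_deg, curr_deg):
    """Signed change in degrees, wrapped to [-180, 180).

    Negative values mean a turn to the left, positive values a turn to
    the right. The wrap-around handling is an assumption of this sketch.
    """
    return (curr_deg - prev_deg + 180.0) % 360.0 - 180.0


def voyage_deltas(messages):
    """Absolute COG, SOG and Heading deltas of consecutive messages.

    `messages` is a list of dicts with the (hypothetical) keys
    'timestamp', 'cog', 'sog' and 'heading'.
    """
    ordered = sorted(messages, key=lambda m: m["timestamp"])
    cog, sog, heading = [], [], []
    for prev, curr in zip(ordered, ordered[1:]):
        cog.append(abs(signed_angle_delta(prev["cog"], curr["cog"])))
        sog.append(abs(curr["sog"] - prev["sog"]))
        heading.append(
            abs(signed_angle_delta(prev["heading"], curr["heading"]))
        )
    return {"cog": cog, "sog": sog, "heading": heading}
```
]]></preformat>
      <p>With this wrapping, a course change from 350 to 10 degrees counts as a 20 degree turn to the right rather than a 340 degree turn to the left, which keeps the absolute deltas comparable across the north boundary.</p>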
      <p>After constructing these features, we were able
to measure the quantiles of the COG, SOG and Heading
deltas, and we observed that SAR operation voyages
behave differently from other voyages. More
specifically, non-SAR voyages have low values
even for high quantiles (i.e., 75%, 80%, 85% etc.) compared
to SAR voyages, meaning that in most observations the
COG, SOG and Heading deltas are small, while for
SAR voyages those quantiles have large values. Thus, we
added to our dataset the 50%, 75%, 85% and 95% quantiles
of those deltas for each voyage.</p>
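      <p>A minimal sketch of how such per-voyage quantile features could be derived is given below. The function and feature names are hypothetical, and the linear interpolation between order statistics is an assumption, as the text does not specify which quantile estimator was used.</p>
      <preformat><![CDATA[
```python
QUANTILES = (0.50, 0.75, 0.85, 0.95)


def quantile(values, q):
    """q-th quantile (0 <= q <= 1) with linear interpolation between
    order statistics; the interpolation rule is an assumption."""
    if not values:
        raise ValueError("quantile of an empty sequence")
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    frac = pos - lower
    return ordered[lower] + frac * (ordered[upper] - ordered[lower])


def quantile_features(deltas):
    """Flatten per-voyage absolute deltas into quantile features.

    `deltas` maps a name such as 'cog', 'sog' or 'heading' to the list
    of absolute deltas of one voyage; the result looks like
    {'cog_q50': ..., 'cog_q75': ..., 'cog_q85': ..., 'cog_q95': ...}.
    """
    features = {}
    for name, values in deltas.items():
        for q in QUANTILES:
            key = "{}_q{}".format(name, int(round(q * 100)))
            features[key] = quantile(values, q)
    return features
```
]]></preformat>
      <p>Each voyage thus contributes twelve quantile features (four quantiles for each of the three deltas), alongside the same-port flag and the voyage duration.</p>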
    </sec>
    <sec id="sec-8">
      <title>4. RESULTS AND DISCUSSION</title>
      <p>The focus of this work is on exploiting large volumes of
historical AIS data so as to identify SAR operations from
trajectories in a scalable, data-driven and supervised way.
Our approach is based on a parallelised, non-parametric
statistical method, Random Forests. To evaluate the
approach's performance, we conducted a series of
experiments that showcase its effectiveness on unseen
real-world data. First, we applied a multiple-fold
cross-validation procedure and measured the F1 score. This score,
given by Equation (1) below, is the harmonic mean of
Precision and Recall and takes both false positives and false
negatives into account. Then, using the best model derived
through the cross-validation procedure, the algorithm
classified the test data.</p>
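      <disp-formula>
        <label>(1)</label>
        <tex-math><![CDATA[\text{F1 Score} = 2 \cdot \frac{\text{Recall} \cdot \text{Precision}}{\text{Recall} + \text{Precision}}]]></tex-math>
      </disp-formula>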
    </sec>
    <sec id="sec-9">
      <title>4.1 Random Forest training and validation</title>
      <p>The training dataset described in subsection 3.2 has been
used to train and validate the Random Forest model using
the features analysed in subsections 3.2 and 3.3. The dataset
has been repeatedly partitioned, following the well-known
k-fold cross-validation procedure, into training and
validation pairs. The partitioning process has been repeated
5 times (i.e. 5-fold cross-validation), each time leading to a
different training and validation pair. For each partition, the
dataset is split into five parts: four of them are used as the
training set, on which the Random Forest model is built, and
the remaining one is used as the validation set, on which the
predicted class of each observation
is compared against its actual value. Each Random
Forest model has 10,000 trees, and the F1 metric has
been measured on each fold, leading to an average score of 0.946 over
the 5 folds. The best model derived from the cross-validation
process has been retained and used for predicting the labels
of the test set. Finally, it should be noted that, although
classification has not previously been applied to SAR missions,
the Random Forest algorithm shows performance similar
to that of other classification schemes used for
identifying other types of vessels’ motion patterns such as
fishing [8][21][24].</p>
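      <p>The cross-validation loop described above can be sketched in a model-agnostic way as follows. This is an illustration only: the function names are hypothetical, and the fit and score callables stand in for training a Random Forest and computing its F1 score, which in this work were run on a Spark cluster and are not reproduced here.</p>
      <preformat><![CDATA[
```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, val_idx) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        val_idx = folds[i]
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train_idx, val_idx


def cross_validate(samples, labels, fit, score, k=5, seed=0):
    """Train on k-1 folds, score on the held-out fold, keep the best model.

    `fit(X, y)` returns a model and `score(model, X, y)` returns a number
    (e.g. an F1 score); both are caller-supplied, hypothetical callables.
    Returns the best model and the average score over the k folds.
    """
    best_model, best_score, scores = None, float("-inf"), []
    for train_idx, val_idx in k_fold_indices(len(samples), k, seed):
        model = fit([samples[i] for i in train_idx],
                    [labels[i] for i in train_idx])
        s = score(model,
                  [samples[i] for i in val_idx],
                  [labels[i] for i in val_idx])
        scores.append(s)
        if s > best_score:
            best_model, best_score = model, s
    return best_model, sum(scores) / len(scores)
```
]]></preformat>
      <p>Every observation appears in exactly one validation fold, so the averaged score uses each training voyage once for validation and four times for model fitting.</p>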
    </sec>
    <sec id="sec-10">
      <title>4.2 Random Forest prediction model evaluation</title>
      <p>The best model obtained through the 5-fold cross-validation
process has been used for predicting the labels of the test
dataset. To evaluate the performance of the model on
previously unseen data, we measured the F1 score, the Accuracy, the
weighted Recall and the weighted Precision, presented in
Table 1 below. Accuracy is the most intuitive performance
measure, giving the ratio of correctly predicted observations
to the total number of observations. Precision is the ratio of correctly
predicted positive observations to the total predicted positive
observations; high Precision corresponds to a low false positive
rate. Finally, Recall is the ratio of correctly predicted
positive observations to all observations in the actual positive class.</p>
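      <table-wrap>
        <label>Table 1</label>
        <caption>
          <p>Prediction model metrics scores</p>
        </caption>
        <table>
          <thead>
            <tr><th>Metric</th><th>Value</th></tr>
          </thead>
          <tbody>
            <tr><td>F1 score</td><td>0.986</td></tr>
            <tr><td>Accuracy</td><td>0.975</td></tr>
            <tr><td>Weighted Recall</td><td>0.975</td></tr>
            <tr><td>Weighted Precision</td><td>0.998</td></tr>
          </tbody>
        </table>
      </table-wrap>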
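      <p>The metric definitions above can be made concrete with a short sketch that derives them from confusion-matrix counts. The function names are hypothetical, and the support-weighted averaging of per-class Precision and Recall is our assumption about how the weighted variants are formed; any counts passed to these functions in an example are illustrative and are not results of this work.</p>
      <preformat><![CDATA[
```python
def binary_metrics(tp, fp, fn, tn):
    """Accuracy, Precision, Recall and F1 for the positive class."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def weighted_metrics(tp, fp, fn, tn):
    """Support-weighted Precision and Recall over the two classes.

    Each class is scored in turn as the positive class and the results
    are averaged with weights equal to the number of true members of
    each class (an assumption of this sketch).
    """
    pos = binary_metrics(tp, fp, fn, tn)   # SAR as the positive class
    neg = binary_metrics(tn, fn, fp, tp)   # non-SAR as the positive class
    n_pos, n_neg = tp + fn, tn + fp
    total = n_pos + n_neg
    return {
        "weighted_precision":
            (n_pos * pos["precision"] + n_neg * neg["precision"]) / total,
        "weighted_recall":
            (n_pos * pos["recall"] + n_neg * neg["recall"]) / total,
    }
```
]]></preformat>
      <p>Under this weighting the weighted Recall equals the Accuracy by construction, which is consistent with both being reported as 0.975 in Table 1, and the majority class dominates all weighted scores when the data are imbalanced.</p>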
      <sec id="sec-10-1">
        <p>The results show high scores in all the metrics.
This is due to the highly imbalanced test dataset: more
specifically, it shows that the model can distinguish non-SAR
voyages and classify them as such. The ROC curve and the
Area Under the ROC curve shown in Figure 3 below
also indicate the capability of the derived model to classify SAR
and non-SAR voyages, as the area under the ROC curve is equal to
0.86.</p>
        <p>However, since the test dataset is imbalanced, and
in order to further investigate how well the algorithm
identified SAR voyages, we measured the
misclassification rate for each vessel type. In particular, the
prediction accuracy of each vessel type class has been
derived, and Table 2 below lists the five vessel types with the
highest misclassification rates (i.e. false positives) together with the
misclassification of SAR voyages (i.e. false negatives). The
results show that the classification model correctly labelled
77.5% of the SAR voyages.</p>
        <p>Although the misclassification rate of the non-SAR
voyages presented above is high, these classes represent a
small portion of the overall test dataset, with only a few tens
or hundreds of voyages each. On the other hand, the classification
algorithm achieved a remarkable accuracy rate, reaching up to
99.7%, in classes with more voyages in the test set. Table 3
below lists the five vessel types with the most voyages
in the test set and the misclassification rate for those vessel
types.</p>
      </sec>
    </sec>
    <sec id="sec-11">
      <title>5. CONCLUSION AND FUTURE WORK</title>
      <p>This work focused on the task of automatically detecting
SAR vessels from maritime trajectory data. Specifically, we
leveraged a large volume of historical AIS data and
described our approach, which is based on Random Forests,
a parallelized nonparametric statistical method, and does not
rely on external sources of information (e.g. coast guard
reports), to detect vessels performing SAR operations
in the Mediterranean Sea. The task was formulated as follows:
given a set of ship trajectories T, predict a label for each
trajectory, where the label is one of {SAR, Non-SAR}. Our proposed
approach proved capable of classifying SAR trajectories with
an accuracy higher than 77%. To the best of the authors'
knowledge, no previous work has considered deriving SAR
activity from AIS data with a data-driven approach. In the
future, we will attempt to reformulate the problem as
point-based classification: given a ship trajectory T, predict
a label for each data point, where the label is one of
{SAR, NotSAR}. Based on these labeled points, SAR time
per area could then be calculated at any given scale.</p>
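      <p>As a rough illustration of how SAR time per area might be
derived from point-level labels, the sketch below sums the time
between consecutive points into grid cells. It is not part of
the presented method: the point layout (timestamp in seconds,
latitude, longitude, label), the square degree-based grid, the
rule of attributing each interval to the cell of its starting
point, and the sample track are all assumptions.</p>
      <preformat>
```python
import math
from collections import defaultdict


def sar_seconds_per_cell(points, cell_deg=1.0):
    """Sum SAR-labelled time per grid cell for one trajectory.

    `points` is a list of (timestamp_s, lat, lon, label) tuples.
    Each interval between consecutive points is credited to the cell
    of its starting point when that point is labelled "SAR".
    """
    ordered = sorted(points, key=lambda p: p[0])
    totals = defaultdict(float)
    for current, following in zip(ordered, ordered[1:]):
        timestamp, lat, lon, label = current
        if label == "SAR":
            cell = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
            totals[cell] += following[0] - timestamp
    return dict(totals)


# Hypothetical labelled track, not real AIS data.
track = [
    (0, 35.2, 14.1, "NotSAR"),
    (600, 35.4, 14.3, "SAR"),
    (1200, 35.6, 14.6, "SAR"),
    (1800, 36.1, 14.9, "SAR"),
    (2400, 36.3, 15.2, "NotSAR"),
]
per_cell = sar_seconds_per_cell(track, cell_deg=1.0)
```
      </preformat>
      <p>Changing the hypothetical cell_deg parameter changes the
spatial scale at which the time is aggregated.</p>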
    </sec>
    <sec id="sec-12">
      <title>ACKNOWLEDGEMENT</title>
      <p>This project has received funding from the European
Union’s Horizon 2020 research and innovation programme
under grant agreement No 732310 and from Microsoft
Research through a Microsoft Azure for Research Award.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <article-title>A Parallel Random Forest Algorithm for Big Data in a Spark Cloud Computing Environment</article-title>
          - IEEE Journals &amp; Magazine: http://ieeexplore.ieee.org/document/7557062/. Accessed: 2017-11-30.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <surname>Breiman</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          <year>2001</year>
          .
          <article-title>Random Forests</article-title>
          .
          <source>Machine Learning</source>
          .
          <volume>45</volume>
          ,
          <issue>1</issue>
          (Oct.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <year>2001</year>
          ),
          <fpage>5</fpage>
          -
          <lpage>32</lpage>
          . DOI:https://doi.org/10.1023/A:1010933404324.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <surname>Ester</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          et al.
          <year>1996</year>
          .
          <article-title>A Density-based Algorithm for Discovering Clusters in Large Spatial Databases with Noise</article-title>
          .
          <source>Proceedings of the Second International Conference on Knowledge Discovery and Data Mining</source>
          (Portland, Oregon,
          <year>1996</year>
          ),
          <fpage>226</fpage>
          -
          <lpage>231</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <source>FRONTEX Risk Analysis</source>
          for
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <surname>Galdorisi</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Goshorn</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          <year>2006</year>
          .
          <article-title>Maritime Domain Awareness: The Key to Maritime Security Operational Challenges and Technical Solutions</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <given-names>Giannis</given-names>
            <surname>Spiliopoulos</surname>
          </string-name>
          et al.
          <year>2017</year>
          .
          <article-title>A big data driven approach to extracting global trade patterns</article-title>
          .
          (Sep.
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <surname>Jiang</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          et al.
          <year>2016</year>
          .
          <article-title>Fishing Activity Detection from AIS Data Using Autoencoders</article-title>
          .
          <fpage>33</fpage>
          -
          <lpage>39</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>J.-G.</given-names>
          </string-name>
          et al.
          <year>2008</year>
          .
          <article-title>TraClass: Trajectory Classification Using Hierarchical Region-based and Trajectory-based Clustering</article-title>
          .
          <source>Proc.</source>
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <source>VLDB Endow. 1</source>
          ,
          <issue>1</issue>
          (Aug.
          <year>2008</year>
          ),
          <fpage>1081</fpage>
          -
          <lpage>1094</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>DOI:https://doi.org/10.14778/1453856.1453972.</mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          et al.
          <year>2013</year>
          .
          <article-title>Comparison of random forest, support vector machine and back propagation neural network for electronic tongue data classification: Application to the recognition of orange beverage and Chinese vinegar</article-title>
          .
          <source>Sensors and Actuators B: Chemical.</source>
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <volume>177</volume>
          , Supplement C (Feb.
          <year>2013</year>
          ),
          <fpage>970</fpage>
          -
          <lpage>980</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          DOI:https://doi.org/10.1016/j.snb.2012.11.071.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <string-name>
            <surname>Mazzarella</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          et al.
          <year>2014</year>
          .
          <article-title>Discovering vessel activities at sea using AIS data: Mapping of fishing footprints</article-title>
          .
          <source>17th International Conference on Information Fusion (FUSION)</source>
          (Jul.
          <year>2014</year>
          ),
          <fpage>1</fpage>
          -
          <lpage>7</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          <string-name>
            <surname>Millefiori</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          et al.
          <year>2016</year>
          .
          <article-title>A distributed approach to estimating sea port operational regions from lots of AIS data</article-title>
          . (Washington D.C., USA,
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          <string-name>
            <surname>Natale</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          et al.
          <year>2015</year>
          .
          <article-title>Mapping Fishing Effort through AIS Data</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          <source>PLOS ONE</source>
          .
          <volume>10</volume>
          ,
          <issue>6</issue>
          (
          <year>2015</year>
          ),
          <elocation-id>e0130746</elocation-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          DOI:https://doi.org/10.1371/journal.pone.0130746.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          <article-title>OBDAIR: Ontology-Based Distributed Framework for Accessing, Integrating and Reasoning with Data in Disparate Data Sources</article-title>
          (PDF Download Available): https://www.researchgate.net/publication/319280828_OBDAIR_OntologyBased_Distributed_Framework_for_Accessing_Integrating_and_Reasoning_with_Data_in_Disparate_Data_Sources. Accessed: 2018-02-02.
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          <string-name>
            <surname>Pallotta</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          et al.
          <year>2013</year>
          .
          <article-title>Traffic knowledge discovery from AIS data</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          <source>Proceedings of the 16th International Conference on Information Fusion</source>
          (Jul.
          <year>2013</year>
          ),
          <fpage>1996</fpage>
          -
          <lpage>2003</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          <string-name>
            <surname>Pallotta</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          et al.
          <year>2013</year>
          .
          <article-title>Vessel Pattern Knowledge Discovery from AIS Data: A Framework for Anomaly Detection and Route Prediction</article-title>
          .
          <source>Entropy</source>
          .
          <volume>15</volume>
          ,
          <issue>6</issue>
          (Jun.
          <year>2013</year>
          ),
          <fpage>2218</fpage>
          -
          <lpage>2245</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>DOI:https://doi.org/10.3390/e15062218.</mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          <string-name>
            <surname>Patroumpas</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          et al.
          <year>2017</year>
          .
          <article-title>Online event recognition from moving vessel trajectories</article-title>
          .
          <source>GeoInformatica</source>
          .
          <volume>21</volume>
          ,
          <issue>2</issue>
          (Apr.
          <year>2017</year>
          ),
          <fpage>389</fpage>
          -
          <lpage>427</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>DOI:https://doi.org/10.1007/s10707-016-0266-x.</mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          <string-name>
            <surname>Poļevskis</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          et al.
          <year>2012</year>
          .
          <article-title>Methods for Processing and Interpretation of AIS Signals Corrupted by Noise and Packet Collisions</article-title>
          .
          <source>Latvian Journal of Physics and Technical Sciences</source>
          .
          <volume>49</volume>
          , (Jan.
          <year>2012</year>
          ),
          <fpage>25</fpage>
          -
          <lpage>31</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>DOI:https://doi.org/10.2478/v10047-012-0015-3.</mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          <article-title>Random Forests for Big Data</article-title>
          - ScienceDirect: http://www.sciencedirect.com/science/article/pii/S2214579616301939. Accessed: 2017-11-30.
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          <string-name>
            <surname>Rocha</surname>
            ,
            <given-names>J.A.M.R.</given-names>
          </string-name>
          et al.
          <year>2010</year>
          .
          <article-title>DB-SMoT: A direction-based spatiotemporal clustering method</article-title>
          .
          <source>2010 5th IEEE International Conference Intelligent Systems</source>
          (Jul.
          <year>2010</year>
          ),
          <fpage>114</fpage>
          -
          <lpage>119</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          <string-name>
            <surname>Souza</surname>
            ,
            <given-names>E.N. de</given-names>
          </string-name>
          et al.
          <year>2016</year>
          .
          <article-title>Improving Fishing Pattern Detection from Satellite AIS Using Data Mining and Machine Learning</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          <source>PLOS ONE</source>
          .
          <volume>11</volume>
          ,
          <issue>7</issue>
          (
          <year>2016</year>
          ),
          <elocation-id>e0158248</elocation-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          DOI:https://doi.org/10.1371/journal.pone.0158248.
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          <string-name>
            <surname>Strobl</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          et al.
          <year>2009</year>
          .
          <article-title>An introduction to recursive partitioning: Rationale, application, and characteristics of classification and regression trees, bagging, and random forests</article-title>
          .
          <source>Psychological Methods</source>
          .
          <volume>14</volume>
          ,
          <issue>4</issue>
          (
          <year>2009</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          <string-name>
            <surname>de Vries</surname>
            ,
            <given-names>G.K.D.</given-names>
          </string-name>
          and
          <string-name>
            <surname>van Someren</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <year>2012</year>
          .
          <article-title>Machine learning for vessel trajectories using compression, alignments and domain knowledge</article-title>
          .
          <source>Expert Systems with Applications</source>
          .
          <volume>39</volume>
          ,
          <issue>18</issue>
          (Dec.
          <year>2012</year>
          ),
          <fpage>13426</fpage>
          -
          <lpage>13439</lpage>
          . DOI:https://doi.org/10.1016/j.eswa.2012.05.060.
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          et al.
          <year>2012</year>
          .
          <article-title>Collision and Detection Performance with Three Overlap Signal Collisions in Space-Based AIS Reception</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          <source>2012 IEEE 11th International Conference on Trust, Security and Privacy in Computing and Communications</source>
          (Jun.
          <year>2012</year>
          ),
          <fpage>1641</fpage>
          -
          <lpage>1648</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref38">
        <mixed-citation>
          <year>2001</year>
          . ITU Recommendation 1371-4, “
          <article-title>Technical characteristics for an Automatic Identification System using time-division multiple access in the VHF maritime mobile band</article-title>
          .” Tech. Rep.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>