<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Online trajectory analysis with scalable event recognition</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Emmanouil Ntoulias</string-name>
          <email>manosntoulias@iit.demokritos.gr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alexander Artikis</string-name>
          <email>a.artikis@unipi.gr</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Elias Alevizos</string-name>
          <email>ilalev@di.uoa.gr</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Athanasios Koumparos</string-name>
          <email>athanasios.koumparos@vodafoneinnovus.com</email>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>NCSR “Demokritos”</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>National and Kapodistrian University of Athens</institution>
          ,
          <addr-line>NCSR “Demokritos”</addr-line>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>University of Piraeus</institution>
          ,
          <addr-line>NCSR “Demokritos”</addr-line>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Vodafone Innovus</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>Moving object monitoring is becoming essential for companies and organizations that need to manage thousands or even millions of commercial vehicles or vessels, detect dangerous situations (e.g., collisions or malfunctions) and optimize their behavior. It is a task that must be executed in real-time, reporting any such situations or opportunities as soon as they appear. Given the growing sizes of fleets worldwide, a monitoring system must be highly eficient and scalable. It is becoming an increasingly common requirement that such monitoring systems should be able to automatically detect complex situations, possibly involving multiple moving objects and requiring extensive background knowledge. Building a monitoring system that is both expressive and scalable is a significant challenge. Typically, the more expressive a system is, the less flexible it becomes in terms of its parallelization potential. We present a system that strikes a balance between expressiveness and scalability. Our proposed system employs a formalism that allows analysts to define complex patterns in a user-friendly manner while maintaining unambiguous semantics and avoiding ad hoc constructs. At the same time, depending on the problem at hand, it can employ diferent parallelization strategies in order to address the issue of scalability. Our experimental results show that our system can detect complex patterns over moving entities with minimal latency, even when the load on our system surpasses what is to be realistically expected in real-world scenarios.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>INTRODUCTION</title>
      <p>
        Commercial vehicle fleets constitute a major part of Europe’s
economy. There were approximately 37 million commercial
vehicles in the European Union in 20151 and this number is growing
every year with an increasing rate. Devices emitting spatial and
operational information are installed on commercial vehicles.
This information helps fleet management applications improve
the management and planning of transportation services [
        <xref ref-type="bibr" rid="ref31">31</xref>
        ].
      </p>
      <p>
        Consider another case of monitoring of moving objects, equally
important from an economic and environmental point of view:
maritime monitoring systems. Such systems have been attracting
considerable attention for economic as well as environmental
reasons [
        <xref ref-type="bibr" rid="ref24 ref29 ref30">24, 29, 30</xref>
        ]. The Automatic Identification System (AIS) 2
is used to track vessels at sea in real-time through data exchange
1http://www.acea.be/statistics/article/vehicles-in-use-europe-2017
2http://www.imo.org/OurWork/Safety/Navigation/Pages/AIS.aspx
with other ships nearby, coastal stations, or even satellites. Cargo
ships of at least 300 gross tonnage and all passenger ships,
regardless of size, are nowadays required to have AIS equipment
installed and regularly emit AIS messages while sailing at sea.
Currently, there are more than 500.000 vessels worldwide that
can be tracked using AIS technology3. It is crucial, both for
authorities and for maritime companies, to be able to track the
behavior of ships at sea in order to avoid accidents and ensure
that ships adhere to international regulations.
      </p>
      <p>
        Streams of transient data emitted from vehicles or ships must
be processed with minimal latency, if a monitoring system is to
provide significant margins for action in case of critical situations.
We therefore need to detect complex patterns of interest upon
these streams in an online and highly eficient manner that can
gracefully scale as the number of monitored entities increases.
Besides kinematic data, it is also important to be able to take
into account static (or “almost” static, with respect to the rate
of the streaming data), background knowledge, such as weather
data, point of interest (POI) information (like gas stations, ports,
parking lots, police departments, etc. [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ], NATURA areas where
ships are not allowed to sail, etc. This enhanced data stream
produces valuable opportunities for the detection of complex
events. One can identify certain routes a vehicle or a ship is
taking, malfunctions in the device installed, in the GPS tracker or
the AIS transponder, cases of illegal shipping in protected areas
or possible collisions between ships moving dangerously close
to each other, to name but a few of the possible patterns which
could be of interest to analysts.
      </p>
      <p>As a solution to the problem of monitoring of moving objects,
we present a Complex Event Processing system that aims to
improve the operating eficiency of a commercial fleet. It
operates online with enriched data in a streaming environment. Our
contributions are the following:
• We present a Complex Event Processing (CEP) system
based on symbolic automata which allows analysts to
define complex patterns in a user-friendly language. Our
proposed language has formal semantics, while being able
to also take into account background knowledge.
• We define a series of realistic complex patterns that
identify routes and malfunctions of vehicles and detect critical
situations for vessels at sea.
• We present and compare various implementations of
parallel processing techniques and discuss their applicability.
• We test our approach using large, real-world,
heterogeneous data streams from diverse application domains,
showing that we can achieve real-time performance even
in cases of significantly increased load, beyond the current
demand levels.
not always obvious how an input stream should be partitioned,
while avoiding data replication.</p>
      <p>The remainder of this paper is organized as follows. Section 2
discusses related work, while Section 3 describes our CEP engine.
In Section 4 the distributed version of our engine is presented.
Section 5 summarizes the datasets of vehicle and vessel traces
and defines the recognition patterns. It also presents our
empirical evaluation. Finally, we conclude the paper in section 6 and
describe our future work.
2</p>
    </sec>
    <sec id="sec-2">
      <title>RELATED WORK</title>
      <p>
        Complex event recognition systems accept as input a stream of
time-stamped, “simple, derived events” (SDEs). These SDEs are
the result of applying a simple transformation to some other
event (e.g., a measurement from a sensor). By processing them, a
CEP engine can recognize complex events (CEs), i.e. collections
of SDEs satisfying some pattern. There are multiple CEP systems
proposed in the literature during the last 15 years, falling under
various classes [
        <xref ref-type="bibr" rid="ref16 ref19">16, 19</xref>
        ]. Automata-based systems constitute the
most common category. They compile patterns (definitions of
complex events) into finite state automata, which are then used to
consume streams of simple events and report matches whenever
an automaton reaches a final state. Examples of such systems
may be found in [
        <xref ref-type="bibr" rid="ref11 ref18 ref28 ref32 ref5">5, 11, 18, 28, 32</xref>
        ]. Another important class of
CEP systems are the logic-based ones. In this case, patterns are
defined as rules, with a head and a body defining the conditions
which, if satisfied, lead to the detection of a CE. A typical example
of a logic-based system may be found in [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. Finally, there are
some tree-based systems, such as [
        <xref ref-type="bibr" rid="ref22 ref23">22, 23</xref>
        ], which are attractive
because they are amenable to various optimization techniques.
      </p>
      <p>
        For eficient processing on big data streams, distributed
architectures need to be employed [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ]. Big data platforms, such as
Apache Spark and Storm, have been used to embed CEP engines
into their operators. Both platforms have incorporated Sidhi [
        <xref ref-type="bibr" rid="ref7 ref9">7, 9</xref>
        ]
and Esper [
        <xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>
        ] as their embedded engines. Flink [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], on the other
hand, provides support for CEP with the FlinkCEP built-in library
[
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Besides using these Big Data platforms, numerous other
parallelization techniques have been proposed in the literature that
can achieve a more fine-grained control over how the processing
load is distributed among workers. Pattern-based parallelization
is the most obvious solution, where the patterns are distributed
among the processing units [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. One disadvantage of this
parallelization scheme is that events have to be replicated to multiple
processing units, since a new input event may need to be
consumed by more than one pattern. Moreover, the parallelization
level is necessarily limited by the number of patterns (for a single
pattern, this method ofers no benefits). Operator-based
parallelization constitutes another approach, where the CEP operators
are assigned to diferent processing units [
        <xref ref-type="bibr" rid="ref13 ref28">13, 28</xref>
        ]. This allows for
multi-pattern optimizations and avoids the data replication issue
of the previous technique. On the other hand, the
parallelization level is again limited, this time by the number of operators
present in the pattern (which is closely related to the number
of automaton states in automata-based CEP systems). Finally,
in data-parallelization schemes, events are split among multiple
instances of the same pattern [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ]. For example, a pattern trying
to detect violations of speed limits must be applied to all the
monitored vehicles and thus the input stream may be partitioned
according to the id of the vehicles. The advantage of this method
is that it can scale well with the input event rate. It is, however,
3
      </p>
    </sec>
    <sec id="sec-3">
      <title>AUTOMATA-BASED EVENT</title>
    </sec>
    <sec id="sec-4">
      <title>RECOGNITION</title>
      <p>
        We begin by first presenting our framework for CEP. It is based
on Wayeb, a Complex Event Processing and Forecasting engine
which employs symbolic automata as its computational model
[
        <xref ref-type="bibr" rid="ref10 ref11">10, 11</xref>
        ]. The rationale behind our choice of Wayeb is that,
contrary to other automata-based CEP engines, it has clear,
compositional semantics due to the fact that symbolic automata have
nice closure properties [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]. At the same time, it is expressive
enough to support most of the common CEP operators [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ], while
remaining amenable to the standard parallelization solutions. In
this paper, we extend Wayeb’s language in order to support more
expressive patterns.
      </p>
      <p>
        Symbolic automata constitute a variation of classical automata,
with the main diference being that their transitions, instead of
being labeled with a symbol from an alphabet, are equipped with
formulas from Boolean algebra [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]. A symbolic automaton
consumes strings and, after every new element, applies the predicates
of its current state’s outgoing transitions to that element. If a
predicate evaluates to TRUE then the corresponding transition is
triggered and the automaton moves to that transition’s target
state. A Boolean algebra is defined as follows:
      </p>
      <p>
        Definition 3.1 (Efective Boolean algebra [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]). A Boolean
algebra is a tuple (D, Ψ, ⟦_⟧, ⊥, ⊤, ∨, ∧, ¬) where D is a set
of domain elements; Ψ is a set of predicates closed under the
Boolean connectives; ⊥, ⊤ ∈ Ψ; the component ⟦_⟧ : Ψ → 2D
is a denotation function such that ⟦⊥⟧ = ∅, ⟦⊤⟧ = D and
∀ϕ, ψ ∈ Ψ: a) ⟦ϕ ∨ ψ ⟧ = ⟦ϕ⟧ ∪ ⟦ψ ⟧; b) ⟦ϕ ∧ ψ ⟧ = ⟦ϕ⟧ ∩ ⟦ψ ⟧; and
c) ⟦¬ϕ⟧ = D \ ⟦ϕ⟧.
      </p>
      <p>Elements of D are called characters and finite sequences of
characters are called strings. A set of strings L constructed from
elements of D (L ⊆ D∗, where ∗ denotes Kleene-star) is called
a language over D.</p>
      <p>
        Wayeb uses symbolic regular expressions to define patterns
and to represent a class of languages over D. Wayeb’s standard
operators are those of the classical regular expressions, i.e.,
concatenation, disjunction and Kleene-star. We extend Wayeb to
include various extra CEP operators: that of negation and those
of diferent selection policies (see [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] for a discussion of selection
policies). Symbolic regular expressions are defined as follows:
      </p>
      <p>Definition 3.2 (Symbolic regular expression). A Wayeb symbolic
regular expression (SRE) over a Boolean algebra (D, Ψ, ⟦_⟧, ⊥,
⊤, ∨, ∧, ¬) is recursively defined as follows:
• If ψ ∈ Ψ, then R := ψ is a symbolic regular expression,
with L(ψ ) = ⟦ψ ⟧, i.e., the language of ψ is the subset of
D for which ψ evaluates to TRUE;
• Disjunction / Union: If R1 and R2 are symbolic regular
expressions, then R := R1 + R2 is also a symbolic regular
expression, with L(R) = L(R1) ∪ L(R2);
• Concatenation / Sequence: If R1 and R2 are symbolic
regular expressions, then R := R1 · R2 is also a symbolic
regular expression, with L(R) = L(R1) · L(R2), where ·
denotes concatenation. L(R) is then the set of all strings
constructed from concatenating each element of L(R1)
with each element of L(R2);
• Iteration / Kleene-star: If R is a symbolic regular
expression, then R′ := R∗ is a symbolic regular expression, with
L(R∗) = (L(R))∗, where L∗ = Ð Li and Li is the
coni ≥0
catenation of L with itself i times.
• Bounded iteration: If R is a symbolic regular expression,
then R′ := Rx + is a symbolic regular expression, with
x t imes</p>
      <p>Rx + = R · · · · · R · R∗.
• Negation / complement: If R is a symbolic regular
expression, then R′ := !R is a symbolic regular expression, with
L(R′) = (L(R))c .
• skip-till-any-match selection policy: If R1, R2, · · · , Rn are
symbolic regular expressions, then R′ := #(R1, R2, · · · , Rn ) is
a symbolic regular expression, with R′ := R1 · ⊤∗ · R2 ·
⊤∗ · · · ⊤∗ · Rn .
• skip-till-next-match selection policy: If R1, R2, · · · , Rn are
symbolic regular expressions, then R′ := @(R1, R2, · · · , Rn ) is
a symbolic regular expression, with R′ := R1·!(⊤∗ · R2 ·
⊤∗) · R2 · · ·!(⊤∗ · Rn · ⊤∗) · Rn .</p>
      <p>
        A Wayeb expression without a selection policy implicitly
follows the strict-contiguity policy, i.e., the SDEs involved in a match
of a pattern should occur contiguously in the input stream. The
other two selection policies relax the strict requirement of
contiguity (see [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] for details). Note that all these operators, even
those of selection policies, may be arbitrarily used and nested
in an expression, without any limitations. This is in contrast to
other CEP systems where nested operations may be prohibited.
      </p>
      <p>Wayeb patterns are defined as symbolic regular expressions
which are subsequently compiled into symbolic automata. The
definition for a symbolic automaton is the following:</p>
      <p>
        Definition 3.3 (Symbolic finite automaton [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]). A symbolic
ifnite automaton ( SFA) is a tuple M =(A, Q, qs , F , ∆ ), where A is
an efective Boolean algebra; Q is a finite set of states; qs ∈ Q is
the initial state; Q f ⊆ Q is the set of final states; ∆ ⊆ Q × ΨA × Q
is a finite set of transitions.
      </p>
      <p>
        A string w = a1a2 · · · ak is accepted by a SFA M if, for 1 ≤
i ≤ k, there exist transitions qi−1 →ai qi such that q0 = qs and
qk ∈ Q f . The set of strings accepted by M is the language of M,
denoted by L(M). It can be proven that every symbolic regular
expression can be translated to an equivalent (i.e., with the same
language) symbolic automaton [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ].
      </p>
      <p>We are now in a position to precisely define the meaning of
“complex events”. Input events come in the form of tuples with
both numerical and categorical values. These tuples constitute the
set of domain elements D. A stream S is an infinite sequence S =
t1, t2, · · · , where each ti is a tuple (ti ∈ D). Our goal is to report
the indices i at which a CE is detected. If S1..k = · · · , tk−1, tk is
the prefix of S up to the index k, we say that an instance of a SRE
R is detected at k if there exists a sufix Sm..k of S1..k such that
Sm..k ∈ L(R). If we attempted to detect CEs, as defined above, by
directly compiling an expression R to an automaton, we would
fail. Consider, for example, the (classical) regular expression R :=
a · b and the (classical) stream/string S = a, b, c, a, b, c. If we
compile R to a (classical) automaton and feed S to it, then the
automaton would reach its final state after reading the second
element t2 of the string. However, it would then never reach its
ifnal state again. We would like our automaton to reach its final
state every time it encounters a, b as a sufix, e.g., again after
reading t5 of S. We can achieve this with a simple trick. Instead of
using R, we first convert it to Rs = ⊤∗ · R. Using Rs we can detect
CEs of R while consuming a stream S, since a stream segment
Sm..k is recognized by R if the prefix S1..k is recognized by Rs .</p>
      <p>The prefix ⊤∗ lets us skip any number of events from the stream
and start recognition at any index m, 1 ≤ m ≤ k.</p>
      <p>As an example, consider the domain of vehicle monitoring.
An analyst could use the Wayeb language to define the pattern
R := (speed &gt; 100) · (speed &gt; 100) in order to detect speed
violations on roads where the maximum allowed speed is 100 km/h.
This pattern detects two consecutive events where the speed
exceeds the threshold in order to avoid cases where a vehicle
momentarily exceeds the threshold, possibly due to some
measurement error. This pattern would be compiled to the
(nondeterministic) automaton of Figure 1. Table 1 shows an example
stream processed by this automaton. For the first three input
events, the automaton would remain in its start state, state 0.
After the fourth event, it would move to state 1 and after the fifth
event it would reach its final state, state 2. We would thus say
that a complex event R was detected at timestamp = 5.</p>
    </sec>
    <sec id="sec-5">
      <title>4 SCALABLE EVENT RECOGNITION OVER</title>
    </sec>
    <sec id="sec-6">
      <title>MULTIPLE TRAJECTORIES</title>
      <p>
        We now discuss how various parallelization schemes may be
applied to our CEP engine. For this purpose, we leverage a popular
streaming platform, Apache Flink [
        <xref ref-type="bibr" rid="ref1 ref14">1, 14</xref>
        ]. Flink is a distributed
processing engine for stateful computations over unbounded and
bounded data streams. It is designed to run in cluster
environments and perform computations at in-memory speed. In this
paper, we focus on two parallelization techniques: pattern-based
and partition-based parallelization [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ]. We currently exclude
state-based parallelization, since, as explained above, its
parallelization level is limited by the number of automaton states,
which is typically quite low (it is often a single-digit number).
We do intend, however, to examine in the future how it could
be combined with pattern- or partition-based parallelization to
provide them with an extra performance boost.
      </p>
      <p>In pattern-based parallelization, each available CEP engine
receives a unique subset of the patterns and performs recognition
for these patterns only, with these subsets being (almost) equal
in size (see Figure 2a). On the other hand, the stream is broadcast
to all parallel instances of any downstream operators. This is a
significant (yet unavoidable) drawback of pattern-based
parallelization, since each worker has a subset of the patterns while
each pattern may need to process the whole stream. Note that
each blue rectangle in Figure 2a represents a single thread. This
means that in this architecture we have one thread for the source
plus as many threads as the parallelism of the CEP operator.</p>
      <p>In partition-based parallelization the opposite happens. Every
CEP engine is initialized with all patterns, but the stream is not
broadcast (see Figure 2b). A partitioning function is used to decide
where each new input event should be forwarded. This function
takes as input any attribute of the event (we use the id of a vehicle
or vessel) and, by performing hashing, it outputs which parallel
instance of the next operator the event will go to. As with
patternbased parallelization, we have one thread for the source plus as
many threads as the parallelism of the CEP operator.</p>
      <p>
        Besides Flink, we also use the Apache Kafka messaging
platform to connect our stream sources to Wayeb instances [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Kafka
provides various ways to consume streams. So far, we have
focused on linear streams, i.e., events are assumed to be totally
ordered and arrive at our system sequentially one after another.
With Kafka, however, there is the option of using parallel input
streams. A Kafka input topic can have multiple partitions and
each partition can be consumed in parallel by a diferent
consumer. In this case the input stream is already partitioned on
some attribute of the events.
      </p>
      <p>
        Through this Kafka functionality, a variant of partition-based
parallelization becomes possible, where both the input source
and the recognition engine work in parallel (see Figure 2c). If
the parallelism of the input source (i.e., the number of partitions
of the topic is the same as that of the recognition operator (i.e.,
number of CEP engines), then we can simply attach each source
instance to a CEP engine instance without further re-partitioning
on our end. When, however, the parallelisms are diferent,
further re-partitioning is performed by partitioning each source the
same way we partitioned the single threaded source. Unlike the
previous architectures, each pair of a source and a CEP operator
parallel instances belong in a single thread. This is attributed to
operator chaining [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], a Flink mechanism that chains operators
of the same parallelism in a single thread for better performance.
      </p>
      <p>Similarly to partition, we can have multiple sources for pattern
based parallelism as well. Messages in Kafka are still partitioned
by a desired attribute, albeit we always have to perform the
broadcasting step for each source. Hence, we don’t have a variant
of pattern parallelization, rather we use it only as a way to process
streams of greater input rate.</p>
    </sec>
    <sec id="sec-7">
      <title>EXPERIMENTAL EVALUATION</title>
      <p>We present an extensive experimental evaluation of our parallel
CEP engine on two datasets containing real-world trajectories
of moving objects. The first dataset comes from the domain of
lfeet management for vehicles moving on roads and emitting
information about their status. The second dataset consists of
vessel trajectories from ships sailing at sea. In both cases, our
goal is to simultaneously monitor thousands of moving objects
and detect interesting (or even critical) behavioral patterns in
real-time, as defined by domain experts. All experiments were
conducted on a server with 24 processors. Each processor is an
Intel(R) Xeon(R) CPU E5-2630 v2 @ 2.60GHz. The server has 252
GB of RAM. The source code for Wayeb may be found in the
following repository: https://github.com/ElAlev/Wayeb.
5.1</p>
    </sec>
    <sec id="sec-8">
      <title>Fleet Management</title>
      <p>Eficient fleet management is essential for transportation and
logistics companies. We show how our proposed solution can
efectively help in this task. With the help of experts, we define a
set of patterns to be detected on real-time streams of trajectories
and show that our engine can detect these patterns with an
eficiency that is orders of magnitude better than real-time.</p>
      <p>
        5.1.1 Dataset Description. The dataset is provided by
Vodafone Innovus4, our partner in the Track &amp; Know project, which
ofers fleet management services. It contains approximately 270M
records (243GB). It covers a period of 5 months, from June 30,
2018 11:00:00 PM to November 30, 2018 11:59:59 PM. The initial
source emitting the data is composed of GPS (Global Positioning
System) traces of moving vehicles. The data also includes speed
information provided by an installed accelerometer and
information regarding the level of fuel in a vehicle’s tank measured by a
fuel sensor. It is also enriched with weather and point-of-interest
(POI) information (e.g., if a vehicle is close to a gas station, a
university, a school etc), as described in [
        <xref ref-type="bibr" rid="ref31">31</xref>
        ]. Duration,
accelaration and distance are some extra attributes that are calculated
on the fly as they enter our system by storing information from
previous events.
      </p>
      <p>5.1.2 Patern Definitions. The first pattern we have defined
concerns vehicle routes. A route is the basic element of vehicle
management and aggregates data between the start and the end
point of a vehicle’s motion cycle. A motion cycle is based on
the engine status. Each vehicle route must start and end with
an “engine-of” message, i.e., a message whose engine status
attribute is “of”. According to Vodafone Innovus, there are 12
patterns that describe the most frequent routes. These 12 route
patterns can be expressed with a single Wayeb pattern as follows:</p>
      <p>Definition 5.1. A route pattern for a vehicle is defined as the
following sequence: emitting “engine-of” messages for at least
30 minutes, emitting at least one “moving” message and again
emitting “engine-of” messages for at least 30 minutes:
Route := (Engine = Of ∧ Duration &gt; 30)·
(Engine = Moving)+·
(Engine = Of ∧ Duration &gt; 30)</p>
      <p>Unfortunately, the expected data flow can be corrupted due to a
variety of reasons. These reasons include bad connection during
the device installation or after the vehicle has been serviced,
movement of the satellites, hardware malfunctions or, simply,
4https://www.vodafoneinnovus.com/
just an issue with the GPS. The result of these reasons is reflected
in the data. For example, coordinates may change even though
the vehicle is not moving or the vehicle may be moving but the
coordinates remain the same. It is also often the case that the
engine status is incorrect (e.g., parked messages are emitted even
though engine is on, vehicle is moving yet engine status is not
moving etc). These issues are important and need to be detected.
We have summarized those issues in a number of patterns, defined
as follows:</p>
      <p>Definition 5.2. ParkedMovingSwing . Engine status swings
between “parked” and “moving” during consecutive events.
ParkedMovingSwing := (Engine = Parked) · (Engine = Moving)·
(Engine = Parked) · (Engine = Moving)</p>
      <p>Definition 5.3. IdleParkedSwing . Engine status swings between
“idle” and “parked” during consecutive events.</p>
      <p>IdleParkedSwing := (Engine = Idle) · (Engine = Parked)·
(Engine = Idle) · (Engine = Parked)</p>
      <p>Definition 5.4. SpeedSwing . Speed swings between 0km/h and
greater than 50km/h during consecutive events.</p>
      <p>SpeedSwing := (Speed &gt; 50)·(Speed = 0)·(Speed &gt; 50)·(Speed = 0)</p>
      <p>Definition 5.5. MovingWithZeroSpeed . Engine status is
“moving”, distance traveled is greater than 30m, yet speed is 0km/h
for more than 3 consecutive messages.</p>
      <p>MWZS := (Engine = Moving ∧ Speed = 0 ∧ Distance &gt; 30)3+</p>
      <p>Definition 5.6. MovingWithBadSignal . Vehicle is accelerating
and distance traveled is greater than 30m yet there are no
satellites tracking the vehicle for more than 3 consecutive messages.
MWBS := (Accelaration ∧ Satellites = 0 ∧ Distance &gt; 30)3+</p>
      <p>In addition to the above issues, possibly related to
malfunctions, experts are also interested in the following patterns:</p>
      <p>Definition 5.7. Possible Theft . Engine status is parked, speed is
0km/h and distance traveled is greater than 30m for more than 3
consecutive messages.</p>
      <p>PossibleTheft
:= (Engine = Parked ∧</p>
      <p>Speed = 0 ∧ Distance &gt; 30)3+</p>
      <p>Definition 5.8. Dangerous Driving . There is ice on the road and
the vehicle is moving above a specific speed limit for at least 2
consecutive messages.</p>
      <p>DangerousDriving := (IceExists = TRUE ∧ Speed &gt; vlimit )2+
Definition 5.9. Refuel Opportunity . Vehicle is close to a gas
station and the fuel in the tank is less than 50% for at least 2
consecutive messages.</p>
      <p>RefuelOpp := (CloseToGasStation = TRUE ∧ FuelLevel &lt; 0.5)2+
(a) 16 patterns
(b) 48 patterns</p>
      <p>5.1.3 Recognition Results. Figure 3a showcases recognition
times for our various parallelization techniques: pattern-based,
partition-based and partition-based with one source per engine.
We also show results for the non-parallel version. We have
duplicated some of the patterns defined previously to simulate a
greater workload of 16 patterns. We have repeated the experiment
for 1, 2, 4 ,8 and 16 cores. Compared to the original single-core
version, all three parallelization techniques exhibit speed-ups.
For partition- and pattern-based parallelization, however, there
seems to be an upper limit on the number of cores it is most
eficient to use. For pattern-based parallelization there is a
significant raise in time after 4 cores, while for partition-based there is
no improvement after 2 cores. The reason is that the single source
acts as a bottleneck. For partition-based parallelization we have
one thread (the partitioner) deciding in which core each event
will be forwarded, while for pattern-based events are broadcast
to all cores, again in a single threaded manner. This explanation
is supported by the smooth decrease in time when a parallel
source is used for partition-based parallelization. While it starts
of worse than pattern- and partition-based, it exhibits the best
results for 16 cores.</p>
      <p>To further support our claim above, we conducted a second
experiment with a larger load, as shown in Figure 3b. 48
patterns were used this time (replicated in similar fashion as before)
without any other changes. Indeed, with every recognition node
having more work to do, execution time becomes less dependent
on partitioning/broadcasting and more dependent on the actual
recognition. Hence, speed-ups are now visible even for higher
number of cores. As suspected, for parallel sources the results are
not afected. Partition-based parallelization with parallel sources
is slower when few threads are used (e.g., for 2 threads). This is</p>
      <p>10
CER parallelism</p>
      <p>15
(a) Partition-based.</p>
      <p>3 4
CER parallelism
5</p>
      <p>6
(b) Pattern-based.
because the other two techniques have an extra thread handling
the whole stream source and the work is in fact split between
this one source thread and two others performing recognition.
We thus have 3 threads performing similar volumes of work.
When parallel sources are used, however, each parallel source is
chained to a Wayeb engine in a single thread and we thus have 2
threads doing more work. Each thread handles half the source
and performs recognition on half the stream.</p>
      <p>
        In order to evaluate recognition speed independently from
source speed we had to turn operator chaining of as we can’t
measure them separately when they belong in the same thread
(i.e., in the case of one source per CEP engine). In addition, we
leveraged parallel sources to achieve input streams of higher
input rate. The goal here is to determine if our system can process
input events faster than the source produces them. Flink ofers a
metric for this purpose, called backpressure [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. Backpressure is
judged on the availability of output bufers. Assuming that some
task A sends events to some task B, if there is no output bufer
available for task A, we say that task B is back pressuring task
A. In our case A, is the source operator and B is the operator
with the CEP Engine. 100 samples (each sample checks if there is
any output bufer available) are triggered every 50ms in order to
measure backpressure. The resulting ratio notifies us how many
of these samples were indicating back pressure, e.g. 0.6 indicates
that 60 in 100 were stuck requesting bufers from the network
stack. According to the documentation [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] a ratio between 0
and 0.1 is normal. 0.1 to 0.5 is considered to be low and anything
above 0.5 is high. Note that low and high pressure will slow down
the source to match the throughput of the pressuring operator.
      </p>
      <p>Results for Partition-based distribution are presented in
Figure 4a. The backpressure ratio is plotted against the number of
threads used for recognition. Generally, the more threads used
the faster Wayeb can process events and hence less pressure
is noted. Each dashed line represents a source with parallelism
varying from 1 to 6. The event rate of a parallel source is
measured by executing an experiment with 0% backpressure (i.e., it
will not slow down due to pressure) and summing the event rate
of each parallel instance presented by the flink dashboard (e.g.,
for 2 parallel instances of 120K events/second (e/s) for each one
the overall sum is 120 + 120 = 240K (e/s)). Although, the greater
the parallelism of the source the larger the event rate, it is not a
multiplier as for 1, 2, 3, 4, 5 and 6 sources the rate becomes 130K,
240K, 350K, 430K, 440K and 490K e/s respectively. The workload
in this experiment tries to emulate a real scenario and hence the
route and the 5 malfunction patterns are used (i.e., they are not
replicated). The results show that Wayeb can efectively process
streams of at least 490K e/s (black line) for this workload as 0
pressure is exhibited when 16 workers are used. As it was stated
before, the duration of the dataset is 5 months which translates
to roughly 13M seconds. Since the total number of input events is
270M, the average event input rate is 270M/13M ≈ 20 e/s.
Comparing the pattern throughput rate (490K e/s) with the event input
rate clearly exhibits a performance that is 4 orders of magnitude
better than the real-time requirements of this use case.</p>
      <p>We perform a similar experiment for pattern-based
parallelization. This time the number of threads used for recognition varies
between 1, 2, 3 and 6 as there are only 6 patterns - a handicap
of this technique discussed earlier. Figure 4b showcases the
results of our experiment. Unfortunately, even with 6 recognition
threads and a single threaded source (purple dashed line) there is
about 13% backpressure. Due to this, all sources are being slowed
down to 110K e/s regardless of their parallelism. There is a drop
in pressure the more recognition threads are used due to the
better distribution of the patterns. However, it is not significant
as the events are also multiplied as many times as the number of
these threads and add extra pressure. Eventually more space is
requested from the output network bufers of the source.
5.2</p>
    </sec>
    <sec id="sec-9">
      <title>Maritime Monitoring</title>
      <p>
        We now present experimental results on another real-world
dataset. This dataset contains trajectories of vessels sailing at
sea. We have defined a set of patterns that are similar to the
ones presented in [
        <xref ref-type="bibr" rid="ref24 ref26">24, 26</xref>
        ], which have been constructed with the
help of domain experts. We demonstrate the efectiveness of our
system which is capable of eficiently processing a dataset that
contains trajectories from ≈ 5K vessels and covers a period of 6
months in less than one hour.
      </p>
      <p>
        5.2.1 Dataset Description. A public dataset of 18M position
signals from 5K vessels sailing in the Atlantic Ocean around the
port of Brest, France, between October 1st 2015 and 31st March
2016 has been utilized [
        <xref ref-type="bibr" rid="ref27">27</xref>
        ]. A derivative dataset has been released
in [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ], containing a compressed version of the original dataset
(4.5M signals), as decribed in [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ]. Each trajectory in this dataset
contains only the so-called critical points of the original
trajectory, i.e., points that indicate a significant change in a vessel’s
behavior (e.g., a change in speed or heading) and from which the
original trajectory can be faithfully reconstructed. We processed
these compressed trajectories in order to determine the proximity
of vessels to various areas and locations of interest, such as ports,
ifshing areas, protected NATURA areas, the coastline, etc.
5.2.2 Patern Definitions. We now present a detailed
description of the maritime patterns that we implemented, assuming
that the input events contain the information described above.
      </p>
      <p>Definition 5.10. High Speed Near Coast : Vessel is within 300
meters from the coast and is sailing with speed greater than 5
knots for at least one message.</p>
      <p>HSNC := (IsNear(Coast) = TRUE ∧ Speed &gt; 5)+</p>
      <p>Definition 5.11. Anchored : Vessel is inside an anchorage area
or near a port and is sailing with speed less than 0.5 knots for at
least three messages.</p>
      <p>Anchored := ((IsNear(Port) = TRUE ∨</p>
      <p>WithinArea(Anchorage) = TRUE) ∧
speed &lt; 0.5)3+</p>
      <p>Definition 5.12. Drifting : There is a diference between heading
and actual course over ground greater than 30 degrees while the
vessel is sailing with at least 0.5 knots for at least three messages.</p>
      <p>Drifting := (|Heading − Cog| &gt; 30 ∧ Speed &gt; 0.5)3+
Definition 5.13. Trawling : A vessel is inside a fishing area
sailing with speed between 1 and 9 knots for at least three messages.
In addition, it must be a fishing vessel.</p>
      <p>Trawling := (VesselType = Fishing ∧</p>
      <p>WithinArea(Fishing) = TRUE ∧
speed &gt; 1.0 ∧ speed &lt; 9.0)3+</p>
      <p>Definition 5.14. Search and Rescue : A SAR Vessel sails with
a speed of greater than 2.7 knots and constantly changes its
heading for at least three messages.</p>
      <p>SAR := (ChangeInHeading = TRUE ∧</p>
      <p>VesselType = SAR ∧ speed &gt; 2.7)3+</p>
      <p>Definition 5.15. Loitering : Vessel is neither near port nor the
coastline while it sails with speed below 0.5 knots.</p>
      <p>Loitering := (IsNear(Port) = FALSE ∧</p>
      <p>IsNear(Coast) = FALSE ∧
speed &lt; 0.5)3+
5.2.3 Recognition Results. We used the 6 patterns defined
above as the workload for a number of experiments. Following
a similar approach to our fleet management experiments, we
avoided replicating the patterns to emulate a real scenario and
evaluated them for streams of diferent event rates.</p>
      <p>Figure 5a presents results for the partition-based
parallelization scheme. This time the backpressure remains always above
40% for a 6-threaded input source, even with 16 recognition
threads. Due to the fact that the CEP operator cannot keep up
with the initial rate of the source, the source has to adjust its
rate to match the throughput of the CER operator. According
to the Flink dashboard, this rate is at most 180K e/s (with 16
worker threads). This lower throughput of our CEP operator
(compared with the fleet management use case) can be
attributed to the fact that the patterns are now more complex, as on
average they contain more unions, disjunctions and iterations to
evaluate for every event. The duration of the dataset is 6 months
which translates to roughly 15.5M seconds. Since the total
number of input events is about 4.5M, the average event input rate is
0.8
reu0.6
s
s
e
r
ckp0.4
a
B
0.2
0
0.8
e
rsu0.6
s
e
r
p
cak0.4
B
0.2</p>
      <p>0
1
0.8
e
r
ssu0.6
e
r
ckp0.4
a
B
0.2
0
0</p>
      <p>3 4
CER parallelism
5
6
(b) Pattern-based backpressure for a workload of 6 patterns.
Each dashed line represents a diferent parallelism of the
source stream. Event rate is capped at 90K e/s due to
backpressure.</p>
      <p>5</p>
      <p>10
CER parallelism
15
partition 1 hour
pattern 1 hour
partition 0.5 hour
pattern 0.5 hour
0
5</p>
      <p>10
CER parallelism
15
(a) Partition-based backpressure for a workload of 6 patterns.
Each dashed line represents a diferent parallelism of the
source stream. Event rate is capped at 180K e/s due to
backpressure.</p>
      <p>1
(c) Partition- and pattern-based parallelism for the dataset being
replayed in one 1 and 0.5 hours respectively. Workload is 220
patterns of vessels approaching 220 diferent ports.
15.5M/4.5M ≈ 3.5 e/s. Comparing the pattern throughput rate
with the event input rate showcases again a performance that
is 4 orders of magnitude better than the real-time requirements
of this use case. The results for pattern-based parallelism are
presented in Figure 5b and follow a similar trend. The event rate
here is capped at 90K e/s due to backpressure.</p>
      <p>We conducted a second series of experiments with a setting
where the number of patterns is naturally high, in order to
determine whether pattern-based parallelism ofers an advantage
in such settings. Consider the following pattern, describing the
movement of a vessel as it a approaches a port.</p>
      <p>Definition 5.16. Approaching Port : Vessel is initially more than
7 km away from the port, then, for at least on message, its
distance from the port is between 5 and 7 km and finally it enters
the port (i.e., its distance from the port falls below 5 km).</p>
      <p>Port := (DistanceToPort(PortX ) &gt; 7 .0) ∧</p>
      <p>DistanceToPort(PortX ) &lt; 10.0)·
(DistanceToPort(PortX ) &gt; 5.0) ∧
DistanceToPort(PortX ) &lt; 7 .0)+ ·
(DistanceToPort(PortX ) &lt; 5.0)</p>
      <p>The predicate DistanceToPort calculates the distance of a vessel
from the port PortX passed as argument and is evaluated online.
If we want to monitor vessel activity around every port in a given
area, then we need to replicate this pattern N times, if there are N
distinct ports. We would thus naturally have N patterns, which
would be almost identical except for the argument passed to
DistanceToPort (Port1, Port2 , up to PortN ). For the area of Brest,
the total number of ports is 220. We run an experiment with
these 220 patterns with partition- and then with pattern-based
parallelization. Figure 5c shows the results. Contrary to previous
experiments, in this one we used a stream simulator to feed the
dataset to our CEP system. This simulator, instead of reading
input events from a file and instantly sending them to our engine,
has the ability to insert a delay between consecutive events. For
example, we can set the delay to be exactly the time diference
between two events. This would allow us to re-play the stream
as it was actually produced, which would take 6 months for
this dataset. We also have the ability to re-play the stream at
higher speeds. For these experiments, we re-played the stream
at various diferent speeds in order to determine the “breaking
point” of our system. Figure 5c shows the results for two such
speeds, where the whole stream was processed in 0.5 and 1 hour,
corresponding to a speed-up of x 8640 and x 4320 compared to
the original dataset. While the CEP operator lags behind the
source when it is re-played at half an hour, it is evident that it
can process it without any problems when it is replayed at one
hour, as both pattern- and partition-based parallelism exhibit 0%
backpressure. In fact, pattern-based parallelism performs better
in this experiment. This lends credence to our belief that
patternbased parallelism might actually be more suitable than
partitionbased parallelism when there is a high number of patterns to be
processed simultaneously.
6</p>
    </sec>
    <sec id="sec-10">
      <title>SUMMARY AND FUTURE WORK</title>
      <p>
        In this paper, we presented Wayeb, a tool for Complex Event
Processing, as a means of processing big mobility data streams. We
defined a number of detection patterns that are useful in fleet
management and maritime monitoring applications. Moreover, we
presented implementations of two distributed recognition
techniques and compared their eficiency against the single-core
version. Our results demonstrate the superiority of partition-based
over pattern-based parallelization, when the number of patterns
is relatively low. When this number is significantly high, then
pattern-based parallelization becomes a viable option. For the
future, we intend to combine various distribution techniques and
to construct more patterns for the domains presented. Another
interesting research avenue would be to compare our
automatabased method against other approaches, such as logic-based ones,
which have been applied to similar datasets [
        <xref ref-type="bibr" rid="ref24 ref31">24, 31</xref>
        ].
      </p>
    </sec>
    <sec id="sec-11">
      <title>ACKNOWLEDGMENTS</title>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1] [n.d.].
          <source>Apache Flink - Stateful Computations over Data Streams</source>
          . https://flink. apache.org/.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>[2] [n.d.]. Apache Kafka. https://kafka.apache.org/.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>[3] [n.d.]. Esper. http://www.espertech.com/esper.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>[4] [n.d.]. Esperonstorm. https://github.com/tomdz/storm-esper.</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>[5] [n.d.]. FlinkCEP - Complex event processing for Flink. https://ci.apache.org/ projects/flink/flink-docs-stable/dev/libs/cep.html.</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6] [n.d.]. Monitoring Back Pressure. https://ci.apache.org/projects/flink/ lfink-docs
          <source>-release-1</source>
          .9/monitoring/back_pressure.html.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7] [n.d.].
          <source>Siddhi CEP</source>
          . https://github.com/wso2/siddhi.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8] [n.d.].
          <article-title>Task chaining and resource groups</article-title>
          . https://ci.apache. org/projects/flink/flink-docs
          <source>-release-1</source>
          .9/dev/stream/operators/ #
          <article-title>task-chaining-and-resource-groups.</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9] [n.d.].
          <source>WSO2</source>
          .
          <article-title>Creating a Storm Based Distributed Execu-tionPlan</article-title>
          . https://docs.wso2.com/display/CEP410/Creating+a+Storm+Based+ Distributed+Execution+Plan.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>E</given-names>
            <surname>Alevizos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A</given-names>
            <surname>Artikis</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G</given-names>
            <surname>Paliouras</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Event Forecasting with Pattern Markov Chains</article-title>
          .
          <source>In DEBS.</source>
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>E</given-names>
            <surname>Alevizos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A</given-names>
            <surname>Artikis</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G</given-names>
            <surname>Paliouras</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Wayeb: a Tool for Complex Event Forecasting</article-title>
          .
          <source>In LPAR.</source>
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>A</given-names>
            <surname>Artikis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M</given-names>
            <surname>Sergot</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G</given-names>
            <surname>Paliouras</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>An Event Calculus for Event Recognition</article-title>
          .
          <source>IEEE Trans. Knowl. Data Eng</source>
          . (
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>C</given-names>
            <surname>Balkesen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N</given-names>
            <surname>Dindar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M</given-names>
            <surname>Wetter</surname>
          </string-name>
          , and
          <string-name>
            <given-names>N</given-names>
            <surname>Tatbul</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>RIP: run-based intraquery parallelism for scalable complex event processing</article-title>
          .
          <source>In DEBS.</source>
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>P</given-names>
            <surname>Carbone</surname>
          </string-name>
          , A Katsifodimos,
          <string-name>
            <given-names>S</given-names>
            <surname>Ewen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V</given-names>
            <surname>Markl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S</given-names>
            <surname>Haridi</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K</given-names>
            <surname>Tzoumas</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Apache Flink™: Stream and Batch Processing in a Single Engine</article-title>
          .
          <source>IEEE Data Eng. Bull</source>
          . (
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>G</given-names>
            <surname>Cugola</surname>
          </string-name>
          and
          <string-name>
            <given-names>A</given-names>
            <surname>Margara</surname>
          </string-name>
          .
          <year>2012</year>
          .
          <article-title>Complex event processing with T-REX</article-title>
          .
          <source>J. Syst. Softw</source>
          . (
          <year>2012</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>G</given-names>
            <surname>Cugola</surname>
          </string-name>
          and
          <string-name>
            <given-names>A</given-names>
            <surname>Margara</surname>
          </string-name>
          .
          <year>2012</year>
          .
          <article-title>Processing flows of information: From data stream to complex event processing</article-title>
          .
          <source>ACM Comput. Surv</source>
          . (
          <year>2012</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>L D</given-names>
            <surname>'Antoni</surname>
          </string-name>
          and
          <string-name>
            <given-names>M</given-names>
            <surname>Veanes</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>The Power of Symbolic Automata and Transducers</article-title>
          .
          <source>In CAV (1).</source>
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>A</given-names>
            <surname>Demers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J</given-names>
            <surname>Gehrke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B</given-names>
            <surname>Panda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M</given-names>
            <surname>Riedewald</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V</given-names>
            <surname>Sharma</surname>
          </string-name>
          , and
          <string-name>
            <given-names>W</given-names>
            <surname>White</surname>
          </string-name>
          .
          <year>2007</year>
          .
          <article-title>Cayuga: A General Purpose Event Monitoring System</article-title>
          .
          <source>In CIDR.</source>
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>N</given-names>
            <surname>Giatrakos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E</given-names>
            <surname>Alevizos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A</given-names>
            <surname>Artikis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A</given-names>
            <surname>Deligiannakis</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M</given-names>
            <surname>Garofalakis</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>Complex event recognition in the Big Data era: a survey</article-title>
          .
          <source>VLDB J</source>
          . (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>Martin</given-names>
            <surname>Hirzel</surname>
          </string-name>
          .
          <year>2012</year>
          .
          <article-title>Partition and compose: parallel complex event processing</article-title>
          .
          <source>In DEBS.</source>
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>N</given-names>
            <surname>Koutroumanis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G</given-names>
            <surname>Santipantakis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A</given-names>
            <surname>Glenis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C</given-names>
            <surname>Doulkeridis</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G</given-names>
            <surname>Vouros</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Integration of Mobility Data with Weather Information</article-title>
          . In EDBT/ICDT Workshops.
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>M</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E</given-names>
            <surname>Rundensteiner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K</given-names>
            <surname>Greenfield</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C</given-names>
            <surname>Gupta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I</given-names>
            <surname>Ari</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A</given-names>
            <surname>Mehta</surname>
          </string-name>
          .
          <year>2011</year>
          .
          <article-title>E-Cube: multi-dimensional event sequence analysis using hierarchical pattern query sharing</article-title>
          .
          <source>In SIGMOD.</source>
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>Y</given-names>
            <surname>Mei</surname>
          </string-name>
          and
          <string-name>
            <given-names>S</given-names>
            <surname>Madden</surname>
          </string-name>
          .
          <year>2009</year>
          .
          <article-title>ZStream: a cost-based query processor for adaptively detecting composite events</article-title>
          .
          <source>In SIGMOD.</source>
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>K</given-names>
            <surname>Patroumpas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E</given-names>
            <surname>Alevizos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A</given-names>
            <surname>Artikis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M</given-names>
            <surname>Vodas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N</given-names>
            <surname>Pelekis</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Y</given-names>
            <surname>Theodoridis</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Online event recognition from moving vessel trajectories</article-title>
          .
          <source>GeoInformatica</source>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>K</given-names>
            <surname>Patroumpas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D</given-names>
            <surname>Spirelis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E</given-names>
            <surname>Chondrodima</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H</given-names>
            <surname>Georgiou</surname>
          </string-name>
          ,
          <string-name>
            <surname>Petrou</surname>
            <given-names>P</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tampakis</surname>
            <given-names>P</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sideridis</surname>
            <given-names>S</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pelekis</surname>
            <given-names>N</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Theodoridis</surname>
            <given-names>Y.</given-names>
          </string-name>
          <year>2018</year>
          .
          <article-title>Final dataset of Trajectory Synopses over AIS kinematic messages in Brest area (ver. 0.8) [Data set]</article-title>
          ,
          <volume>10</volume>
          .5281/zenodo.2563256. https://doi.org/10.5281/zenodo.2563256
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>M</given-names>
            <surname>Pitsikalis</surname>
          </string-name>
          ,
          <string-name>
            <surname>A Artikis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R</given-names>
            <surname>Dreo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C</given-names>
            <surname>Ray</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E</given-names>
            <surname>Camossi</surname>
          </string-name>
          , and
          <string-name>
            <surname>A-L Jousselme</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Composite Event Recognition for Maritime Monitoring</article-title>
          .
          <source>In DEBS.</source>
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>C</given-names>
            <surname>Ray</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R</given-names>
            <surname>Dreo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E</given-names>
            <surname>Camossi</surname>
          </string-name>
          , and AL Jousselme.
          <year>2018</year>
          .
          <article-title>Heterogeneous Integrated Dataset for Maritime Intelligence</article-title>
          , Surveillance, and Reconnaissance,
          <volume>10</volume>
          .5281/zenodo.1167595. https://doi.org/10.5281/zenodo.1167595
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>N</given-names>
            <surname>Schultz-Møller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M</given-names>
            <surname>Migliavacca</surname>
          </string-name>
          ,
          <string-name>
            <given-names>and P</given-names>
            <surname>Pietzuch</surname>
          </string-name>
          .
          <year>2009</year>
          .
          <article-title>Distributed complex event processing with query rewriting</article-title>
          .
          <source>In DEBS.</source>
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>L</given-names>
            <surname>Snidaro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I</given-names>
            <surname>Visentini</surname>
          </string-name>
          ,
          <article-title>and</article-title>
          K Bryan.
          <year>2015</year>
          .
          <article-title>Fusing uncertain knowledge and evidence for maritime situational awareness via Markov Logic Networks</article-title>
          .
          <source>Inf. Fusion</source>
          (
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>F</given-names>
            <surname>Terroso-Saenz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M</given-names>
            <surname>Valdés-Vela</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A</given-names>
            <surname>Skarmeta-Gómez</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>A complex event processing approach to detect abnormal behaviours in the marine environment</article-title>
          .
          <source>Inf. Syst. Frontiers</source>
          (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [31]
          <string-name>
            <given-names>E</given-names>
            <surname>Tsilionis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N</given-names>
            <surname>Koutroumanis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P</given-names>
            <surname>Nikitopoulos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C</given-names>
            <surname>Doulkeridis</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A</given-names>
            <surname>Artikis</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Online Event Recognition from Moving Vehicles: Application Paper</article-title>
          .
          <source>TPLP</source>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [32]
          <string-name>
            <given-names>H</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y</given-names>
            <surname>Diao</surname>
          </string-name>
          , and
          <string-name>
            <given-names>N</given-names>
            <surname>Immerman</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>On complexity and optimization of expensive queries in complex event processing</article-title>
          .
          <source>In SIGMOD.</source>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>