=Paper= {{Paper |id=Vol-1882/paper12 |storemode=property |title=Processing Moving Object Data Streams with Data Stream Management Systems |pdfUrl=https://ceur-ws.org/Vol-1882/paper12.pdf |volume=Vol-1882 |authors=Tobias L. Brandt |dblpUrl=https://dblp.org/rec/conf/vldb/Brandt17 }} ==Processing Moving Object Data Streams with Data Stream Management Systems== https://ceur-ws.org/Vol-1882/paper12.pdf
                 Processing Moving Object Data Streams with
                     Data Stream Management Systems

                                                               Tobias Brandt
                                                    Supervised by Marco Grawunder
                                                    University of Oldenburg, Germany
                                                      tobias.leo.brandt@uol.de

ABSTRACT                                                                  programming language and simplify the connection to typ-
With the wide spread of cheap and mobile GPS sensors                      ical data sources. Maintenance of queries on data streams
as well as mobile data connections, live streams from mov-                is made simple due to the ease to change and update the
ing objects are becoming a huge data source. The services                 query text. Hence, queries can be adapted to new require-
based on these data streams, for example, for connected                   ments quickly. These features make them a useful tool to
cars, vessels or smartphone users, need real-time results for             easily create and change queries for many different use cases
queries based on the current or even near-future positions                and are therefore a good choice for rapid prototyping sys-
of the moving objects. Spatio-temporal data from moving                   tems in the field of data stream processing and analysis.
objects cannot just be treated as a crowd of points with                     To cope with the requirements of data streams, DSMSs
timestamps, but must be seen as points in a trajectory with               support continuous queries and use a data-driven approach.
non-measured points in between. In this paper I present                   New results are incrementally calculated when new data ar-
my work on the management of such real-time trajectories                  rives at the system. This increases the demand for quick
within a Data Stream Management System (DSMS) to en-                      calculations, wherefore data is typically kept in-memory.
able simple, flexible and efficient in-memory moving object               Unfortunately, a potentially infinite data stream cannot be
query processing.                                                         hold in-memory. A typical solution DSMSs provide are win-
                                                                          dows. These reduce the amount of data hold in-memory to
                                                                          a smaller part of the data streams, e. g., all elements from
1.    INTRODUCTION                                                        the last hour or the last 100 elements.
  Moving objects in the real world are everywhere and have                   Even though DSMSs already tackle lots of the challenges
been there for a while: pedestrians and cars on the streets,              that occur with data stream processing, they lack features
vessels on the oceans and airplanes in the sky. Comparably                necessary to work with moving objects data. Two of these
new is the huge amount of data these objects are produc-                  features are (1) continuous location interpolation and near-
ing. Many of these send their location regularly to a central             future prediction and (2) fast moving objects index struc-
server or other facility, may it be the smartphone user with a            tures for windows with high fluctuation. In this work, I
Location-based Service (LBS) or a vessel with an Automatic                concentrate on point data, hence, moving regions are not in
Identification System (AIS) sender.                                       the scope. That is because most moving object data of inter-
  This data can be used to answer questions and solve real-               est today, such as vessels and pedestrians, can be simplified
world problems. For example, vessels can be warned about                  to point objects without loosing too much precision in the
congested or currently dangerous areas based on their own                 queries. Moving or evolving regions in contrast introduce a
position as well as the positions of other vessels. As more               whole new palette of challenges.
data can be shared via live-streams and as the results of such               One important feature of moving objects data streams is
queries are required with minimal delay, traditional systems              that the objects move continuously but are only measured
that first store and then query the data streams are not                  once in a while. Therefore, the objects have unknown lo-
ideal. They typically run short-term queries on static data               cations in between the measurements, which can and some-
sets that are stored on the hard drive.                                   times need to be used for querying. Imagine a query where
  Data Stream Management Systems (DSMSs) especially                       a vessel needs to know all vessels around it. Another vessel,
target data streams. They offer solutions for many data                   which last known location is (temporal and spatial) far away
stream related challenges, provide query languages to define              but will probably be within that range on querytime, should
queries without the need to write code in a general purpose               be included in the answer, hence, its location needs to be
                                                                          automatically predicted to the future by the query. This sce-
                                                                          nario is particularly important for satellite AIS where it is
                                                                          normal to have hours between location updates [3]. Stream-
                                                                          ing data differs from static time-series data: in static data,
                                                                          all measured locations of a moving object are known. In con-
                                                                          trast, in a streaming environment, it is not possible to know
                                                                          if and when the next location update of a moving object will
Proceedings of the VLDB 2017 PhD Workshop, August 28, 2017. Munich,
Germany.
                                                                          arrive.
Copyright (c) 2017 for this paper by its authors. Copying permitted for      Challenges with this approach are that interpolation and
private and academic purposes.
prediction depends on the use case and always comes with an
uncertainty. When new data about an interpolated object is                                        12:00
                                                                                                                       12:00
available, old query results may need to be updated. When                         11:55
                                                                                                      2
and how to update results by more precise ones is a non-                            1                     11:58
trivial question within a DSMS. To my best knowledge,                                             12:00
there is no work that tackles these challenges in the field of                                    X
data streams for moving objects.
                                                                                          11:58
   The second feature mentioned above are moving objects                                           3
                                                                                                    12:05
windows and suitable index structures. As not all data can
be stored, it needs to be decided which data is still needed
for processing and which is not and how and when old data
can be wiped. Typical window concepts such as time-based
windows can be extended by windows especially for moving
objects. A possible window type could be a distance-based                                                 CC BY-SA 2.0, Editors of http://map.openseamap.org/

window. It would store all data within a certain distance of
a single moving object, e. g., the last kilometer of every ob-         Figure 1: Range query for the orange vessel.
ject. The requirements for the underlying index structures
differ both from pure temporal as well as pure spatial index
structures.
                                                                 the middle sent its last location at 12 o’clock and wants to
2.    OBJECTIVES AND CHALLENGES                                  know which vessels are in its range at that point in time.
   The overall goal of this work is to create and implement      The trajectory of the other vessels one to three are visual-
concepts to allow DSMSs to process spatio-temporal data          ized with the arrows and circles. The circles are the measure-
from moving objects. To reach this goal, I am tackling the       ments where the correct location of the vessel is known. The
challenges of including location interpolation and prediction    arrows in between visualize a simple interpolation. It is as-
as well as window concepts for moving object data streams.       sumed that the path between the measurements is straight.
                                                                 That is not necessarily the case but it is a reasonable and
2.1     Objectives                                               simple approximation. The dashed lines with the striped
                                                                 circles are predictions of the future trajectory.
     • Location Interpolation Moving objects naturally              The need to interpolate and predict locations arises from
       move continuously but are only measured once in a         the fact that the moving objects do not measure and send
       while. Within a data stream, they seem to be hopping      their location synchronized with each other, but at different
       from point to point, resulting in delayed and possi-      time intervals. In the figure, the last known location of
       bly wrong results. I aim to introduce location inter-     the orange vessel in the middle was captured at 12 o’clock.
       polation into the data stream processing to that the      For this point in time, the locations of all other vessels are
       objects move continuously in time and space. The in-      needed to answer the query correctly. Unfortunately, the
       terpolation should also be used for short-term location   known, i. e., measured, locations of the other vessels are not
       prediction.                                               at 12 o’clock, but slightly before or after this point in time.
                                                                 If only the last known location would be used to answer the
     • Result Uncertainty As prediction always comes with        query, the result would be wrong: Vessel 1 would not be
       a degree of uncertainty, the results are uncertain as     within the result but should be, Vessel 2 would be within
       well and can change when new data from a moving           the result but should not and the location of Vessel 3 would
       objects arrives. The accuracy or uncertainty of a re-     be wrong.
       sult should be transparent to the user and updates of        Hence, interpolation and prediction is necessary to an-
       results should be possible.                               swer the query approximately correct. When doing so, a
                                                                 few challenges arise. The interpolation has to work in an in-
     • Moving Object Windows Window definitions that             cremental manner with limited knowledge about the data,
       are especially useful for moving objects data should be   as not all past data can be stored in a streaming environ-
       introduced.                                               ment. The accuracy of a query result needs to be known to
     • Spatio-temporal Indexes The data within the win-          the user or further processing steps. When the prediction
       dows need to be hold in spatio-temporal index struc-      was wrong, old results for a query may need to be updated
       tures optimized for high data fluctuation.                (e. g., if Vessel 1 takes a different path than predicted). In a
                                                                 streaming scenario, the approximate result may already be
  The concepts created to solve the goals should be eval-        used for further processing or a following result, for exam-
uated by implementing them into an open source DSMS.             ple, for 12:03, has already been processed. The questions
Scenarios with AIS data from vessels should show that the        on how to integrate uncertain results and updates to them
concepts work with real-world data and queries.                  within a DSMS need to be tackled.

2.2     Challenges for Location Interpolation                    2.3     Challenges for Moving Object Windows
  Typical spatio-temporal queries include neighborhood and         Windows are necessary to reduce the infinite data stream
range queries. An example query could be “Continuously re-       to a finite set of data. The data within a window can be
port all vessels within a range of 10 km around vessel X”.       kept in-memory and never needs to be permanently written
Such a query is depicted in Figure 1. The orange vessel in       to a hard drive. Next to the performance improvements,
the concept of windows has more benefits. They are based           implement the spatio-temporal operations. While doing so,
on the assumption that in many cases, queries are only or          they have to behave like normal operators to the outside so
mostly interested in the current data and not in the data          that other operators can seamlessly use the output. Two
from the distant past. To define a window according to the         example operators that are needed are a range and a k -
use case, the domain needs to be known. A typical window           nearest neighbors (k NN) operator. Both search for other
could, for example, hold all data from the last hour.              moving objects close to a certain object. The external be-
   In the domain of moving objects, the requirements for           havior of these operators is similar to other operators. They
windows can differ depending on the scenario. The defi-            receive stream elements, process them and send their results
nition of a window only by the time and not by the space           as stream elements to the next operator.
dimension is often not enough. Imagine, for example, a win-           Internally, these operators need to use location interpola-
dow where the speed of all objects within a data stream is         tion and prediction to compute correct results and annotate
diverse and variable (e. g., slow vessels and fast planes within   these results with the level of (un)certainty in the meta data
one stream). It could be necessary to have from each ob-           of a streaming object. The interpolation should be done by
ject at least the last kilometer within the window. With a         a framework within the DSMS that allows to interchange
time-based window this would be difficult to achieve. Addi-        algorithms, as the interpolation algorithm can change from
tionally, compression could be introduced to moving objects        case to case.
windows. Patroumpas et al. [7] show that trajectory data
can be compressed without loosing much accuracy. Hence,            3.2   Current and Future Work
new window concepts for moving objects are useful.                    Currently, prototype versions for moving object range and
   Windows reduce the amount of the required (in-memory)           k NN queries are available. A prototype implementation
storage space. Nevertheless, it opens up new questions about       within a DSMS was developed. For spatial querying with
how to clean up the memory, for example, when to delete            moving object windows, an index structure based on Geo-
old data. This question gets more complicated as window            Hashes [6] was developed. It showed better performance
indexes have to be shared between multiple queries. Imag-          than an implementation based on QuadTrees. This could
ine a data stream of all vessels on the North Sea. As spatial      be due to better insertion and deletion performance with
queries can be heavily accelerated when using an index, the        the GeoHash implementation. However, these results are
data in the windows are indexed. Thereby, due to memory            very preliminary, as the test setup needs to be better de-
limitations, creating multiple nearly identical indexes must       scribed and results need to be analyzed further to find out
be avoided. Subsequently, one index is shared between mul-         the reasons for the differences.
tiple queries. Nevertheless, such a sharing makes the de-             The correct integration of those queries as well as moving
cision when to delete old data more complicated, as the            object windows is ongoing work. The results are currently
window requirements from the queries can be different.             only partly usable for other query operators. Interpolating
   Additionally, not every index structure is suitable for a       and predicting locations are in the concept phase. Develop-
spatio-temporal data stream index. In contrast to more tra-        ment and implementation of these will be a major part of
ditional Geographic Information System (GIS) applications,         the future work.
the fluctuation in the data is very high. New data needs to
be inserted and at the same time old data needs to be re-          3.3   Planned Evaluation
moved on a high frequency. It is possible that the whole              The concepts that are created in this PhD project will
set of data within a window can be swapped within minutes          be implemented into the open source DSMS Odysseus 1 [1].
or even seconds, which, for example, distinguishes the re-         Odysseus offers a rich set of operators and a query language.
quirements from static time-series data. Traditional index-        For the purpose of this work it already supports protocols
structures that require heavy reorganization when data gets        used in the maritime domain such as AIS and can talk to
changed are probably not suitable for this environment. In         common data sources such as RabbitMQ out of the box. In
this work it needs to be evaluated if index structures for         contrast to streaming frameworks such as Apache Flink2 or
this purpose are useful as it is only an improvement if the        Heron3 , it is not necessary to program in a general program-
indexing needs less time than it saves while querying the          ming language such as Java to create new queries.
data.                                                                 With that implementation, the feasibility of the concepts
   In this PhD project I aim to create suitable window con-        will be evaluated. Using an iterative approach, the concepts
cepts, implement them and choosing an efficient index struc-       can be adjusted if uncovered challenges occur while evaluat-
ture for this very dynamic environment.                            ing. The implementation will be used with scenarios in the
                                                                   maritime context, especially with AIS data. An example
3.    RESEARCH PLAN                                                query could be to continuously query if a vessel is heading
  In this section, the main approaches to overcome the chal-       to an area that will be congested during its transit.
lenges from above are described.                                      For the given scenarios with moving objects, timely query
                                                                   results are necessary, wherefore the performance of the solu-
3.1    Approach                                                    tions will be measured. The latency and throughput of the
   Query processing on data streams is typically done with         queries will be used to compare different implementations,
operator graphs. The elements of the data stream are send          e. g., different approaches for spatio-temporal indexes.
from one operator to the next, each operator doing a specific      1
                                                                     http://odysseus.offis.uni-oldenburg.de/, last ac-
task. Joins, projections and selections are typical examples       cessed on 03/21/2017
for such operators.                                                2
                                                                     https://flink.apache.org/, last accessed on 05/24/2017
   When adding support for moving objects data, this mod-          3
                                                                     https://twitter.github.io/heron/, last accessed on
ular architecture should be exploited. New operators can           05/24/2017
4.   RELATED WORK                                                  5.   CONCLUSION
   General purpose and open source streaming systems such             This paper describes the motivation, challenges and ap-
as Apache Flink and Apache Storm4 as well as commer-               proaches of processing data from moving objects in DSMSs.
cial systems such as IBM InfoSphere Streams5 (short: IBM           A major challenge are asynchronous updates of locations of
Streams) offer high performance and distributed stream pro-        multiple moving objects. To serve timely query results, e. g.,
cessing, but have only limited support for moving objects.         for a range query, locations of objects need to be interpo-
While spatio-temporal data is supported in some systems            lated and predicted. The integration of interpolated values
(e. g., with IBM Streams [2]), location interpolation and          into a DSMS provides some challenges that are tackled with
moving object windows are, as of my knowledge, not.                this PhD project. First approaches to solve these are ex-
   Zhang et al. [9] use Apache Storm to process fast data          plained. The planned evaluation uses AIS data from vessels
streams from moving objects. They focus on a distributed           for continuous queries.
spatial index which speeds up range and k NN queries as
well as spatial joins. One main difference to this work is         6.   ACKNOWLEDGMENTS
that they do not interpolate and predict locations to have           We thank the Ministry of Science and Culture of Lower
temporal correct results.                                          Saxony, Germany for supporting us with the graduate school
   The open source project GeoMesa6 works with spatio-             Safe Automation of Maritime Systems (SAMS).
temporal data, e. g., from moving objects [6]. The project
develops indexes based on space-filling curves. These allow
quick access to spatial or spatio-temporal data within sorted
                                                                   7.   REFERENCES
                                                                   [1] H.-J. Appelrath, D. Geesen, M. Grawunder,
key-value stores such as Apache Accumolo7 . GeoMesa does               T. Michelsen, and D. Nicklas. Odysseus: A highly
not specifically address streaming data, data stream man-              customizable framework for creating efficient event
agement capabilities or location interpolation.                        stream management systems. In Proceedings of the 6th
   The RxSpatial library [8] is an extension for the Microsoft         ACM International Conference on Distributed
SQL Server Spatial Library and adds support for moving                 Event-Based Systems, DEBS ’12, pages 367–368, New
objects. It adds the RUM-tree, which is an extension of the            York, NY, USA, 2012. ACM.
R-tree for frequent updates. Additionally, RxSpatial allows        [2] A. Biem, E. Bouillet, H. Feng, A. Ranganathan,
continuous spatial queries, e. g., to observe if a moving object       A. Riabov, O. Verscheure, H. Koutsopoulos, and
is close to another. As of my best knowledge, the library              C. Moran. Ibm infosphere streams for scalable,
does not take into account the time of the updates but uses            real-time, intelligent transportation services. In
the newest updates of every moving object. Interpolation               Proceedings of the 2010 ACM SIGMOD International
and prediction are not used.                                           Conference on Management of Data, SIGMOD ’10,
   Interpolation for moving objects is a difficult challenge for       pages 1093–1104, New York, NY, USA, 2010. ACM.
moving regions (e. g., described by Heinz et al. [5]). This is
                                                                   [3] M. A. Cervera, A. Ginesi, and K. Eckstein.
especially complex as these regions can change their shape
                                                                       Satellite-based vessel automatic identification system:
over time. In this work I want to concentrate on moving
                                                                       A feasibility and performance analysis. International
points, which is a way simpler version of that problem. Nev-
                                                                       Journal of Satellite Communications and Networking,
ertheless, the perfect interpolation method is not the goal of
                                                                       29(2):117–142, 2011.
this work but the integration of interpolation and prediction
                                                                   [4] R. H. Güting, T. Behr, and C. Düntgen. Secondo: A
into the stream processing of moving objects data.
                                                                       platform for moving objects database research and for
   Secondo [4] is a database system especially for moving
                                                                       publishing and integrating research implementations.
objects. It has a spatial and temporal algebra with which
                                                                       Fernuniv., Fak. für Mathematik u. Informatik, 2010.
queries for moving objects can be formulated. As it is a
database, it is not optimized for data streams, e. g., it does     [5] F. Heinz and R. H. Güting. Robust high-quality
not support windows, does not run mainly in-memory and                 interpolation of regions to moving regions.
hence does not have to solve the problem of cleaning up old            Geoinformatica, 20(3):385–413, July 2016.
data. Nevertheless, it gives useful insights on the handling       [6] H. V. Le. Distributed moving objects database based
of moving objects data.                                                on key-value stores. In Proceedings of the VLDB 2016
   Patroumpas et al. [7] use AIS data in a streaming envi-             PhD Workshop co-located with the 42nd International
ronment to detect complex events such as unexpected stops              Conference on Very Large Databases (VLDB 2016),
of vessels. They compress the data to important points in              New Delhi, India, September 9, 2016., 2016.
the trajectory of the vessels without loosing much accuracy.       [7] K. Patroumpas, E. Alevizos, A. Artikis, M. Vodas,
They also address errors in the AIS data by removing wrong             N. Pelekis, and Y. Theodoridis. Online event
measurements. In contrast to this work, they do not use lo-            recognition from moving vessel trajectories.
cation prediction for vessels that did not send an update for          GeoInformatica, 21(2):389–427, 2017.
a while, which is for example the case for satellite AIS.          [8] Y. Shi, A. M. Hendawi, H. Fattah, and M. Ali.
                                                                       Rxspatial: Reactive spatial library for real-time
                                                                       location tracking and processing. In Proceedings of the
4                                                                      2016 International Conference on Management of
  https://storm.apache.org/, last accessed on 03/21/2017               Data, pages 2165–2168. ACM, 2016.
5
  https://www.ibm.com/analytics/us/en/technology/
stream-computing/, last accessed on 03/21/2017                     [9] F. Zhang, Y. Zheng, D. Xu, Z. Du, Y. Wang, R. Liu,
6
  http://www.geomesa.org, last accessed on 03/17/2017                  and X. Ye. Real-time spatial queries for moving objects
7
  https://accumulo.apache.org/,      last accessed on                  using storm topology. ISPRS International Journal of
03/21/2017                                                             Geo-Information, 5(10), 2016.