<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>STREAMLINE - Streamlined Analysis of Data at Rest and Data in Motion</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Philipp M. Grulich</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tilmann Rabl</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Volker Markl</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Csaba Sidló</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andras Benczur</string-name>
        </contrib>
      </contrib-group>
      <abstract>
        <p>3. REFERENCES [1] Carbone et al. Apache Flink: Stream and batch processing in a single engine. IEEE Data Eng. Bull., 38(4), 2015. [2] Carbone et al. Cutty: Aggregate sharing for user-defined windows. In CIKM, pages 1201-1210, 2016. [3] Traub et al. I2: Interactive real-time visualization for streaming data. In EDBT, 2017.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>ABSTRACT</title>
      <p>STREAMLINE aims to improve the overall workflow of
big data analytics systems. To this end, it combines
research in different areas to reduce the complexity of working
with data at rest and data in motion in a unified fashion. As
a foundation, STREAMLINE offers a uniform programming
model on top of Apache Flink, for which it drives innovations
in a wide range of areas, such as interactive visualization of
data in motion and advanced window aggregation techniques.</p>
    </sec>
    <sec id="sec-2">
      <title>PROJECT SUMMARY</title>
      <p>The STREAMLINE project aims to improve the workflow
and usability of current big data analysis systems. To this end,
it provides a uniform system that can handle the
analysis of big data at rest as well as fast data in motion.
With this platform, STREAMLINE reduces
complexity, costs, and latency.</p>
      <p>Traditionally, batch and stream processing were
considered two very different types of applications, but in
recent years it has been shown that most real-world
use cases require systems for both workloads. This forces
companies to integrate different specialized systems, which not
only leads to complex system architectures and
maintenance overhead but also adds high latency to the
general data analysis workflow. This is also known as the
problem of system and human latency in big data analysis.
Even technologies that can combine data in motion
and data at rest are currently very complex and difficult to
deploy, maintain, and use. In addition, many companies
demand much more advanced analyses, which are still
hard to implement in current systems.</p>
      <p>To reduce this complexity, STREAMLINE combines
research and innovations in the areas of distributed systems,
data management, and machine learning. STREAMLINE's
key goal is to arrive at sustainable innovation through
technology transfer to an established and growing open source
project. STREAMLINE focuses on innovations in
the following four reactive and proactive applications:
customer retention, personalized recommendations, targeted
advertisement, and multilingual Web processing. To
integrate the innovations into industry, STREAMLINE
partners with multiple companies.</p>
      <p>As its system foundation, STREAMLINE relies on the
open source data processing system Apache Flink, which
handles batch and stream processing on a single
pipelined execution engine [1]. On top of this engine,
STREAMLINE offers a single uniform programming model that can
automatically be optimized, parallelized, and adapted to the
system load, data distribution, and architecture.</p>
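      <p>The idea of one programming model for data at rest and data
in motion can be illustrated with a minimal Python sketch. It does not
use the Flink API or the STREAMLINE programming model; the names
running_count_per_key and click_stream are hypothetical and chosen
only for illustration. The point is that the same operator definition
is applied unchanged to a bounded collection and to an unbounded
source.</p>
      <preformat preformat-type="code">
```python
from itertools import islice


def running_count_per_key(records):
    """Yield (key, count so far) for each incoming record; accepts any iterable."""
    counts = {}
    for key in records:
        counts[key] = counts.get(key, 0) + 1
        yield key, counts[key]


def click_stream():
    """Hypothetical unbounded source that cycles through page names forever."""
    while True:
        yield from ("home", "cart", "home")


# Data at rest: a finite list is consumed to completion.
batch_result = list(running_count_per_key(["home", "cart", "home"]))

# Data in motion: the same function consumes an unbounded generator lazily;
# islice stops after four results so the example terminates.
stream_result = list(islice(running_count_per_key(click_stream()), 4))
```
</preformat>
      <p>A real engine would additionally distribute and optimize the
operator; the sketch only shows that the user-facing logic need not
differ between the two kinds of input.</p>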
      <p>Research Highlights: Cutty [2] introduces a general
aggregation sharing framework for streaming windows, which
outperforms previous solutions by orders of magnitude. This
technique exploits the fact that window aggregations are
among the most redundancy-prone operations in current
stream processing. Cutty is also suitable for multi-query
aggregation sharing and non-periodic windows, such as
session windows, which can be used for more complex
business logic. Based on this technique, STREAMLINE achieves
higher throughput and improves the efficiency of its data
processing platform.</p>
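      <p>The redundancy in window aggregation can be made concrete with
a small sketch. The code below is not Cutty's algorithm; it shows the
simpler, well-known idea of slicing, in which overlapping sliding
windows share partial aggregates computed once per slice. The function
name sliding_sums is hypothetical, and the sketch assumes non-negative
integer timestamps, a sum aggregate, and windows that start at
multiples of the slide beginning at time 0.</p>
      <preformat preformat-type="code">
```python
from collections import defaultdict
from math import gcd


def sliding_sums(events, size, slide):
    """Sum values per sliding window [start, start + size) using shared slices.

    events: iterable of (timestamp, value) with non-negative integer timestamps.
    Returns a dict mapping each window start to the sum of its values.
    """
    # Slices of this length tile every window exactly.
    slice_len = gcd(size, slide)

    # Step 1: fold each event into its slice exactly once.
    partial = defaultdict(int)
    for ts, value in events:
        partial[ts // slice_len] += value
    if not partial:
        return {}

    # Step 2: every window reuses the shared per-slice partial sums.
    slices_per_window = size // slice_len
    step = slide // slice_len
    results = {}
    for first in range(0, max(partial) + 1, step):
        results[first * slice_len] = sum(
            partial.get(first + i, 0) for i in range(slices_per_window)
        )
    return results
```
</preformat>
      <p>For the events (1, 2), (4, 3), (6, 5), and (12, 1) with a window
size of 10 and a slide of 5, the sketch yields the sums 10, 6, and 1 for
the windows starting at 0, 5, and 10. Each event is aggregated once,
although most events belong to two windows.</p>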
      <p>I2 [3], in contrast, focuses on the visualization and
interactive aggregation of data in motion, which is a key enabler
for fast and efficient real-time data analysis. It contributes
an interactive development environment, which coordinates
the cluster application and includes interactive stream
visualization techniques. With this, I2 is able to run advanced
and adaptive aggregations directly on the cluster. As one
example, we provide an aggregation algorithm for time-series
data, which reduces the amount of data in a data-rate
independent manner and is proven to be correct and minimal
in terms of transferred data. I2 is therefore an important
part of STREAMLINE because it enhances the usability
and accessibility of its platform.</p>
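      <p>A data-rate independent reduction for time-series plots can be
sketched as follows. The text above does not specify the algorithm, so
the code assumes a pixel-column aggregation in the style of M4: for
each horizontal pixel of the chart, only the first, last, minimum, and
maximum point are kept. The function name reduce_for_plot and its
parameters are hypothetical. The output never exceeds four points per
pixel column, regardless of how many points arrive.</p>
      <preformat preformat-type="code">
```python
def reduce_for_plot(points, t_start, t_end, width):
    """Keep at most four points (first, last, min, max) per pixel column.

    points: list of (timestamp, value) sorted by timestamp, with all
    timestamps in the half-open range [t_start, t_end).
    width: number of horizontal pixel columns of the target chart.
    """
    span = t_end - t_start
    columns = {}
    for point in points:
        ts, _ = point
        # Map the timestamp to the pixel column it would be drawn in.
        col = int((ts - t_start) * width / span)
        bucket = columns.get(col)
        if bucket is None:
            columns[col] = {"first": point, "last": point,
                            "min": point, "max": point}
            continue
        bucket["last"] = point
        bucket["min"] = min(bucket["min"], point, key=lambda p: p[1])
        bucket["max"] = max(bucket["max"], point, key=lambda p: p[1])

    # Emit the distinct kept points of each column in timestamp order.
    reduced = []
    for col in sorted(columns):
        reduced.extend(sorted(set(columns[col].values())))
    return reduced
```
</preformat>
      <p>Because the bound depends only on the chart width, doubling the
input rate leaves the maximum amount of transferred data unchanged.</p>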
    </sec>
    <sec id="sec-3">
      <title>ACKNOWLEDGEMENTS</title>
      <p>This work was supported by the EU Horizon 2020 project
Streamline (688191).</p>
    </sec>
  </body>
  <back>
    <ref-list />
  </back>
</article>