<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>A. Alexandrov, R. Bergmann,
et al. The stratosphere platform for big data analytics. The
VLDB Journal</journal-title>
      </journal-title-group>
      <issn pub-type="ppub">1066-8888</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>PROTEUS: Scalable Online Machine Learning for Predictive Analytics and Real-Time Interactive Visualization</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Bonaventura Del Monte</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jeyhun Karimov</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alireza Rezaei Mahdiraji</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tilmann Rabl</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Volker Markl</string-name>
        </contrib>
      </contrib-group>
      <pub-date>
        <year>2015</year>
      </pub-date>
      <volume>23</volume>
      <issue>6</issue>
      <fpage>1</fpage>
      <lpage>1</lpage>
      <abstract>
        <p>Big data analytics is a critical process in business and industrial environments. Companies that exploit the value of their big data earn more revenue than those that do not. Once a company has settled on its big data strategy, it faces another serious problem: designing and building in-house a scalable system to run its business intelligence is difficult. The PROTEUS project aims to design, develop, and provide an open, ready-to-use big data software architecture that can handle extremely large historical data and data streams and that supports online machine learning for predictive analytics and real-time interactive visualization. The overall evaluation of PROTEUS is carried out using a real industrial scenario.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>PROJECT DESCRIPTION</title>
      <p>PROTEUS addresses three challenges: 1) handling extremely large historical data and data streams, 2) analytics on massive, high-rate, and complex data streams, and 3) real-time interactive visual analytics of massive datasets, continuous unbounded streams, and learned models.</p>
      <p>PROTEUS’s solutions to the challenges above are: 1) a real-time
hybrid processing system built on top of Apache Flink
(https://flink.apache.org/), formerly Stratosphere
(http://stratosphere.eu/) [1], with optimized support for relational
algebra and linear algebra operations through the LARA declarative
language [2, 3]; 2) a new library for scalable online machine learning
and data mining called SOLMA; and 3) the investigation and development
of incremental visual methods that allow end-users to efficiently explore
both batch and streaming data in order to make well-informed decisions
in real time. These three subsystems will be integrated into a single
platform running in a containerized environment. Once the platform
is deployed in a cluster, its life cycle is as follows: 1) the end-user
writes data analytics tasks in LARA, mixing extract-transform-load
and SOLMA algorithm pipelines, and executes them on top of the
PROTEUS hybrid processing system; 2) the system continuously trains
the deployed machine learning models in an online fashion; 3) the visual
stack queries those models and displays the requested real-time
predictions and statistics to the end-user.</p>
      <p>Project website: https://www.proteus-bigdata.com/. Horizon 2020
programme: https://ec.europa.eu/programmes/horizon2020/.</p>
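      <p>The train-continuously, query-at-any-time life cycle described above can be illustrated with a minimal sketch. It does not use LARA, SOLMA, or Apache Flink: the class OnlineLinearModel, the function train_on_stream, the learning rate, and the synthetic stream are all hypothetical names and values chosen for illustration, and plain stochastic gradient descent stands in for whatever online algorithms the platform actually provides.</p>
      <preformat><![CDATA[
```python
from dataclasses import dataclass, field
from typing import Iterable, List, Tuple


@dataclass
class OnlineLinearModel:
    """Hypothetical linear model updated one record at a time (plain SGD)."""

    n_features: int
    learning_rate: float = 0.1
    weights: List[float] = field(default_factory=list)
    bias: float = 0.0
    seen: int = 0  # number of records consumed so far

    def __post_init__(self) -> None:
        if not self.weights:
            self.weights = [0.0] * self.n_features

    def predict(self, x: List[float]) -> float:
        """Answer a query with the model as it stands right now."""
        return self.bias + sum(w * xi for w, xi in zip(self.weights, x))

    def update(self, x: List[float], y: float) -> None:
        """One SGD step on the squared error of a single labelled record."""
        error = self.predict(x) - y
        self.weights = [
            w - self.learning_rate * error * xi
            for w, xi in zip(self.weights, x)
        ]
        self.bias -= self.learning_rate * error
        self.seen += 1


def train_on_stream(
    model: OnlineLinearModel,
    stream: Iterable[Tuple[List[float], float]],
) -> None:
    """Consume the stream record by record; the model stays queryable throughout."""
    for x, y in stream:
        model.update(x, y)


if __name__ == "__main__":
    # Synthetic, noise-free stream following y = 2 * x + 1 (an assumption
    # made only so the sketch has something to learn).
    stream = (
        ([(i % 10) / 10.0], 2.0 * ((i % 10) / 10.0) + 1.0)
        for i in range(5000)
    )
    model = OnlineLinearModel(n_features=1)
    train_on_stream(model, stream)
    print(f"records seen: {model.seen}, prediction at x=0.5: {model.predict([0.5]):.4f}")
```
]]></preformat>
      <p>In this sketch the model never sees the full dataset at once, which is the property that lets a visual front end query predictions while training is still in progress.</p>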
      <p>PROTEUS faces an additional challenge: correctly integrating
machine learning solutions into big data processing systems while
taking into account the principal anti-patterns and risk factors
that affect this kind of interaction [4].</p>
      <p>In addition, PROTEUS ensures that its goals are achieved through
rigorous experimental testing and industrially validated processes. The
project is guided by the specific requirements of the hot strip
mill steel-making process, provided by an industrial partner of the
PROTEUS consortium. The hot strip mill produces coils whose quality is
affected by several parameters (e.g., temperature, vibration
intensity, tension in the rollers). Since coils are used in further
production stages, they must be free of defects. The main target of this
validation scenario is predicting anomalies by analyzing the massive
real-time data generated during the hot strip mill process.</p>
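      <p>As a rough illustration of anomaly detection on a sensor stream, the sketch below flags readings that deviate strongly from a running mean, using Welford's single-pass algorithm for mean and variance. This is a generic baseline, not the method used in PROTEUS or SOLMA; the names RunningStats and flag_anomalies, the threshold of three standard deviations, the warm-up length, and the temperature-like readings are all assumptions made for the example.</p>
      <preformat><![CDATA[
```python
import math
from typing import Iterable, Iterator, Tuple


class RunningStats:
    """Single-pass mean and sample standard deviation (Welford's algorithm)."""

    def __init__(self) -> None:
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, value: float) -> None:
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (value - self.mean)

    @property
    def std(self) -> float:
        if self.count < 2:
            return 0.0
        return math.sqrt(self._m2 / (self.count - 1))


def flag_anomalies(
    readings: Iterable[float],
    threshold: float = 3.0,
    warmup: int = 10,
) -> Iterator[Tuple[int, float]]:
    """Yield (index, value) for readings far from the running mean.

    A reading is flagged when, after `warmup` accepted readings, it lies more
    than `threshold` standard deviations from the mean. Flagged readings are
    not folded into the statistics, so one outlier does not mask the next.
    """
    stats = RunningStats()
    for index, value in enumerate(readings):
        std = stats.std
        is_anomaly = (
            stats.count >= warmup
            and std > 0.0
            and abs(value - stats.mean) > threshold * std
        )
        if is_anomaly:
            yield index, value
        else:
            stats.update(value)


if __name__ == "__main__":
    # Hypothetical temperature-like readings with one injected spike.
    readings = [900.0 + (i % 5) for i in range(50)]
    readings[30] = 980.0
    for index, value in flag_anomalies(readings):
        print(f"anomaly at reading {index}: {value}")
```
]]></preformat>
      <p>Because the statistics are updated in constant time and memory per reading, the same idea applies to an unbounded stream; a production system would likely replace the z-score rule with a learned model over several parameters at once.</p>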
      <p>Regardless of the above validation scenario, the PROTEUS platform is
also applicable to general data stream analysis in other domains.</p>
      <p>Acknowledgements. This work was supported by the EU
Horizon 2020 project PROTEUS (687691).</p>
    </sec>
  </body>
  <back>
    <ref-list />
  </back>
</article>