=Paper=
{{Paper
|id=Vol-1810/EuroPro_paper_03
|storemode=property
|title=None
|pdfUrl=https://ceur-ws.org/Vol-1810/EuroPro_paper_03.pdf
|volume=Vol-1810
|dblpUrl=https://dblp.org/rec/conf/edbt/MonteKMRM17
}}
==None==
PROTEUS: Scalable Online Machine Learning for Predictive Analytics and Real-Time Interactive Visualization

Bonaventura Del Monte (1), Jeyhun Karimov (1), Alireza Rezaei Mahdiraji (1), Tilmann Rabl (1, 2), Volker Markl (1, 2)

(1) German Research Center for Artificial Intelligence (DFKI), firstname.lastname@dfki.de

(2) TU Berlin, firstname.lastname@tu-berlin.de

===ABSTRACT===
Big data analytics is a critical and unavoidable process in any business and industrial environment. Nowadays, companies that exploit the inner value of big data earn more revenue than those that do not. Once companies have determined their big data strategy, they face another serious problem: designing and building in-house a scalable system that runs their business intelligence is difficult. The PROTEUS project aims to design, develop, and provide an open, ready-to-use big data software architecture that is able to handle extremely large historical data and data streams and that supports online machine learning, predictive analytics, and real-time interactive visualization. The overall evaluation of PROTEUS is carried out using a real industrial scenario.

===1. PROJECT DESCRIPTION===
PROTEUS (https://www.proteus-bigdata.com/) is a research project funded by EU Horizon 2020 (https://ec.europa.eu/programmes/horizon2020/). Its goal is to investigate and develop ready-to-use, scalable online machine learning algorithms and real-time interactive visual analytics, taking care of scalability, usability, and effectiveness. In particular, PROTEUS aims to solve the following big data challenges by surpassing the current state-of-the-art technologies with original contributions:

1. Handling extremely large historical data and data streams

2. Analytics on massive, high-rate, and complex data streams

3. Real-time interactive visual analytics of massive datasets, continuous unbounded streams, and learned models

PROTEUS's solutions for the challenges above are: 1) a real-time hybrid processing system built on top of Apache Flink (https://flink.apache.org/), formerly Stratosphere (http://stratosphere.eu/) [1], with optimized support for relational algebra and linear algebra operations through the LARA declarative language [2, 3], 2) a new library for scalable online machine learning and data mining called SOLMA, and 3) investigation and development of incremental visual methods that allow end-users to efficiently explore both batch and streaming data for making well-informed decisions in real time. These three subsystems will be integrated in a single platform running in a containerized environment. Once the platform is deployed in a cluster, its life-cycle is as follows: 1) the end-user writes data analytics tasks in LARA, mixing extract-transform-load pipelines with SOLMA algorithms, and executes them on top of the PROTEUS hybrid processing system, 2) the system continuously trains the deployed machine learning models in an online fashion, 3) the visual stack queries those models and displays the requested real-time predictions and statistics to the end-user.

PROTEUS faces an additional challenge: the correct integration of machine learning solutions in big data processing systems, taking into account the principal anti-patterns and risk factors that affect this kind of interaction [4].

In addition, PROTEUS ensures the achievement of its goals through rigorous experimental testing and industrially validated processes. The project is guided by the specific requirements of the hot strip mill steel-making process, provided by an industrial partner of the PROTEUS consortium. The hot strip mill produces coils, whose quality is affected by several parameters (e.g. temperature, vibration intensity, tension in the rollers). Since coils are used in further production stages, they must present no defect. Predicting anomalies through the analysis of the massive real-time data generated during the hot strip mill process is the main target in this validation scenario.

Regardless of the above validation scenario, the PROTEUS platform is also applicable to general data stream analysis in other domains.

Acknowledgements. This work was supported by the EU Horizon 2020 project PROTEUS (687691).

===2. REFERENCES===
[1] A. Alexandrov, R. Bergmann, et al. The Stratosphere platform for big data analytics. The VLDB Journal, 23(6):939–964, Dec. 2014. ISSN 1066-8888.

[2] A. Alexandrov, A. Kunft, et al. Implicit parallelism through deep language embedding. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, SIGMOD '15, pp. 47–61. ACM, New York, NY, USA, 2015. ISBN 978-1-4503-2758-9.

[3] A. Kunft, A. Alexandrov, et al. Bridging the gap: Towards optimization across linear and relational algebra. In Proceedings of the 3rd ACM SIGMOD Workshop on Algorithms and Systems for MapReduce and Beyond, BeyondMR '16, pp. 1:1–1:4. ACM, New York, NY, USA, 2016. ISBN 978-1-4503-4311-4.

[4] D. Sculley, G. Holt, et al. Machine learning: The high interest credit card of technical debt. In SE4ML: Software Engineering for Machine Learning (NIPS 2014 Workshop). 2014.

© 2017, Copyright is with the authors. Published in Proc. 20th International Conference on Extending Database Technology (EDBT), March 21-24, 2017 - Venice, Italy: ISBN 978-3-89318-073-8, on OpenProceedings.org. Distribution of this paper is permitted under the terms of the Creative Commons license CC-by-nc-nd 4.0
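
Illustrative sketch. The project description outlines a life-cycle in which models are trained continuously in an online fashion and then queried for real-time predictions, with anomaly prediction on hot strip mill data as the validation target. The following minimal Python sketch shows that predict-then-train loop on a single sensor stream. It is not SOLMA or LARA code: the class `OnlineAnomalyDetector`, the function `process_stream`, the z-score threshold, the warm-up length, and the temperature values are all assumptions made for illustration. The detector keeps a running mean and variance with Welford's algorithm, so each reading is processed in constant time and memory.

```python
import math


class OnlineAnomalyDetector:
    """Running mean/variance (Welford's algorithm) with a z-score threshold.

    Hypothetical illustration only; this is not SOLMA's API.
    """

    def __init__(self, threshold=3.0, warmup=10):
        self.threshold = threshold  # z-score above which a reading is flagged
        self.warmup = warmup        # readings to observe before flagging anything
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0               # running sum of squared deviations from the mean

    def predict(self, x):
        """Return True if x looks anomalous under the current model."""
        if self.count < self.warmup:
            return False
        std = math.sqrt(self.m2 / (self.count - 1))
        if std == 0.0:
            return x != self.mean
        return abs(x - self.mean) / std > self.threshold

    def update(self, x):
        """Fold one reading into the model in O(1) time and memory."""
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)


def process_stream(readings, detector):
    """Predict first, then train, for each reading; yields (reading, is_anomaly)."""
    for x in readings:
        flagged = detector.predict(x)
        detector.update(x)
        yield x, flagged


if __name__ == "__main__":
    # Made-up temperature readings with one obvious spike.
    temps = [900.0 + (i % 5) for i in range(20)] + [1200.0, 901.0, 902.0]
    det = OnlineAnomalyDetector()
    for value, flagged in process_stream(temps, det):
        if flagged:
            print("anomaly:", value)  # prints "anomaly: 1200.0"
```

In this sketch every reading, including a flagged one, is folded into the model. A real deployment might instead exclude or down-weight flagged readings so that a single spike does not inflate the variance estimate.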