=Paper=
{{Paper
|id=Vol-1458/E06_CRC40_Leibrandt
|storemode=property
|title=Big Data Science Architecture for Continuous Technology Transfer from Research to Industry Operations
|pdfUrl=https://ceur-ws.org/Vol-1458/E06_CRC40_Leibrandt.pdf
|volume=Vol-1458
}}
==Big Data Science Architecture for Continuous Technology Transfer from Research to Industry Operations==
Big Data Science Architecture for Continuous Technology Transfer from Research to Industry Operations

Richard A. E. Leibrandt

WidasConcepts Unternehmensberatung GmbH, Maybachstraße 2, 71299 Wimsheim, Germany
richard.leibrandt@widas.de
http://www.widas.de

Abstract. Big Data without analysis is hardly anything but dead weight. But how do we analyse it? Finding algorithms to do so is one of the data scientist's jobs. However, we would like not only to explore our data, but also to automate the process by building systems that analyse our data for us. A solution should enable research, meet industry demands and enable continuous delivery of technology transfer. For this we need a Big Data Science Architecture. Why? Because in Big Data Science (BDS) projects, Big Data (BD) and Data Science (DS), which influence each other, cannot be handled separately. Thus, their complexities (and gains) multiply: BDS ≠ BD + DS; rather, BDS = BD · DS. This complexity boost increases further through the clash of two different worlds: scientific research programming (DS) and enterprise software engineering (BD). The former thrives on explorative experiments, which are often messy, ad hoc and uncertain in their findings. The latter requires code quality and fail-safe operation, achieved by well-defined processes with access control and automated testing and deployment.

We present a blueprint for a Big Data Science Architecture. It includes data cleaning, feature derivation and machine learning, using batch and real-time engines. It spans the entire lifecycle with three environments: experimentation, close-to-live testing and live operations, enabling creativity while ensuring fail-safe operation. It takes the needs of data scientists, software engineers and operations administrators into account. Data can be creatively explored in the experimental environment; thanks to strict read governance, no critical systems are endangered. Once algorithms are developed, a technology transfer to the test environment takes place, which is built the same way as the live-operations environment. There the algorithm is adapted to run in automated operations and tested thoroughly. On acceptance, the algorithms are deployed to live operations.

Keywords: Big Data, Data Science, Architecture, Industrial Challenges, Technology Transfer, Continuous Delivery, Batch- and Real-Time-Processing

Copyright © 2015 by the paper's authors. Copying permitted only for private and academic purposes. In: R. Bergmann, S. Görg, G. Müller (Eds.): Proceedings of the LWA 2015 Workshops: KDML, FGWM, IR, and FGDB. Trier, Germany, 7.-9. October 2015, published at http://ceur-ws.org
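To illustrate the batch path sketched in the abstract (data cleaning, feature derivation, machine learning), the following is a minimal, hypothetical Python sketch; it is not taken from the paper, and the column names, imputation strategy and model choice are illustrative assumptions. The idea is that one such pipeline is developed in the experimental environment and later promoted, unchanged in structure, to the test and live-operations environments.

<pre>
# Hypothetical sketch (not from the paper): chaining cleaning, feature
# derivation and learning into one batch pipeline object.
import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

# Toy data standing in for a Big Data extract; column names are assumptions.
data = pd.DataFrame({
    "sensor_a": [0.1, 0.4, None, 0.9],
    "sensor_b": [1.2, 0.7, 0.3, None],
    "label":    [0, 1, 0, 1],
})

# Batch pipeline: cleaning (imputation), feature derivation (scaling), learning.
pipeline = Pipeline([
    ("clean",    SimpleImputer(strategy="mean")),
    ("features", StandardScaler()),
    ("model",    LogisticRegression()),
])

X, y = data[["sensor_a", "sensor_b"]], data["label"]
pipeline.fit(X, y)          # developed and fitted in the experimental environment
print(pipeline.predict(X))  # the same pipeline object is promoted to test and live operations
</pre>

Keeping all three stages inside one pipeline object is one way to make the technology transfer between environments a deployment step rather than a reimplementation.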