Scouting Big Data Campaigns using TOREADOR Labs

Claudio A. Ardagna, Paolo Ceravolo
Università degli Studi di Milano
Computer Science Department
Crema, CR 26013, Italy
{claudio.ardagna,paolo.ceravolo}@unimi.it

Marcello Leida
Taiger
Madrid 28036, Spain
marcello.leida@taiger.com

Ernesto Damiani
Consorzio Interuniversitario Nazionale per l’Informatica
Rome 00198, Italy
ernesto.damiani@unimi.it

ABSTRACT
TOREADOR Labs offer a Big Data Analytics-as-a-Service environment for testing simplified but real-life vertical Big Data analytics scenarios. Users are challenged with requirements described from a business perspective and are asked to compare alternative options, investigating the consequences of their choices. This “trial and error” approach brings out the interconnections and interferences among the different design stages typically addressed when preparing a Big Data campaign.

Keywords
Parallel computing methodologies; Modeling and simulation.

1. INTRODUCTION
Today, the complexity of the architectures supporting Big Data Analytics (BDA) and their lack of standardisation represent a major barrier to the adoption of Big Data technologies, especially for organisations and SMEs lacking sufficient competences and skills. Another major hindering factor is the so-called “regulatory barrier”, that is, concerns about violating data access, sharing and custody regulations when using BDA, together with the high cost of obtaining legal clearance for specific scenarios. This barrier discourages companies, particularly SMEs, from adopting BDA.

Project TOREADOR aims to overcome some of these hurdles by providing a platform that supports customers lacking Big Data expertise in managing BDA and deploying a full Big Data pipeline [2]. Users with different skills and expertise can benefit from TOREADOR. Users lacking data science expertise (e.g., modeling, analysis, problem solving) can use TOREADOR to prepare the actual analytics, reason on data to find hidden patterns and information, and solve business problems. Users lacking data engineering expertise (e.g., building a robust and fault-tolerant data pipeline, installing a Big Data system) can use TOREADOR to automatically identify and deploy the set of technologies that meets their requirements. Users lacking both types of expertise can use TOREADOR as a proper initiation into the Big Data realm.

2. BIG DATA-AS-A-SERVICE
Big Data Analytics-as-a-Service (BDAaaS) consists of a set of automatic tools and methodologies that allow customers to design BDA and deploy a full Big Data pipeline addressing their goals [1]. BDAaaS can be seen as a function that takes as input users’ Big Data goals and preferences and returns as output a ready-to-execute Big Data pipeline.

While the declarative goals underlying the use of Big Data services are usually industry-dependent, we argue that identifying a core set of standard indicators is an important step towards increasing the transparency of the commitments made by Big Data service providers, as well as the awareness of users adopting a Big Data solution. Indicators provide a way of measuring or assessing a business goal, such as an analytics task or a regulatory constraint on personal data protection, and are accompanied by Big Data objectives representing the target to be achieved in order to fulfil the goal.

3. TOREADOR LABS
The model-driven approach adopted by TOREADOR supports the creation of a virtual environment particularly suited for training Big Data professionals using a “trial and error” approach. This environment helps users understand the interrelations and interferences among the different design options available when preparing a BDA.

In this context, the TOREADOR Labs provide free, limited access to TOREADOR through a Platform-as-a-Service solution. The Labs propose simplified versions of real-life vertical scenarios and success stories, organised in a set of challenges in which trainees are asked to identify alternative options and investigate the consequences of their choices. Note that this kind of experience is usually not available in the professional Big Data platforms currently on the market, where architectural and data complexity make it difficult to compare different runs of a composite BDA.

REFERENCES
[1] C. A. Ardagna, P. Ceravolo, and E. Damiani. Big data analytics as-a-service: Issues and challenges. In Proceedings of the 3rd International Workshop on Privacy and Security of Big Data (PSBD). IEEE, 2016.
[2] M. Leida, C. Ruiz, and P. Ceravolo. Facing big data variety in a model driven approach. In 2016 IEEE 2nd International Forum on Research and Technologies for Society and Industry Leveraging a better tomorrow (RTSI), pages 1–6. IEEE, 2016.

This project has received funding from the European Union’s Horizon 2020 research and innovation programme under the TOREADOR project, grant agreement No 688797; Project Coordinator: Prof. Ernesto Damiani, CINI, Italy; Project web site: http://www.toreador-project.eu/.

Copyright is with the authors. Published in the Workshop Proc. of the EDBT/ICDT 2017 Joint Conference (March 21, 2017, Venice, Italy) on CEUR-WS.org (ISSN 1613-0073). Distribution of this paper is permitted under the terms of the Creative Commons license CC-by-nc-nd 4.0.