Supporting OLAP-Based Big Data Analytics over Data-Intensive Business Processes: Issues, Models, Proposals, and a Real-Life Framework

Alfredo Cuzzocrea
University of Trieste and ICAR-CNR
Trieste, 34127
alfredo.cuzzocrea@dia.units.it

Copyright © CIKM 2018 for the individual papers by the papers' authors. Copyright © CIKM 2018 for the volume as a collection by its editors. This volume and its papers are published under the Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract

This paper focuses on the problem of supporting big data analytics over so-called data-intensive business processes, i.e. business processes connected to big data sources. This applicative setting is attracting growing interest in the community, also due to emerging computational paradigms like Cloud Computing. The paper explores issues, models and proposals in the field, and finally provides the architecture of a real-life framework that supports big data analytics over data-intensive business processes via successful OLAP metaphors.

1 Introduction

Nowadays, the problem of supporting big data analytics (e.g., [CSD11, Cuz13, CS14, Rus11, RR14]) over so-called data-intensive business processes (e.g., [ALRM17, SMM17, GK18]) plays a relevant role. This is because, on the one hand, business processes still hold most of the data, information and knowledge of very large enterprises and organizations and, on the other hand, they fit well with the emerging characteristics of big data (e.g., [CSU13, CBS13, LJYC15, ZE11, MCB+11]).

An important solution for supporting big data analytics consists in applying successful multidimensional metaphors and abstractions, mainly falling in the well-known OLAP context, thus originating an evolving trend that can be safely recognized within the term “OLAP-based big data analytics” (e.g., [Cuz17, CMF+16]).

Inspired by this research context, in this paper we focus on the problem of supporting OLAP-based big data analytics over data-intensive business processes, and we describe a real-life framework developed in the context of a real-life project, called REMS.PA. The framework is mainly designed on top of open-source technologies and focuses, in particular, on business processes of the Public Administration.

The remaining part of this paper is organized as follows. In Section 2, we report on the main research issues of supporting OLAP-based big data analytics over data-intensive business processes. In Section 3, we describe the proposed framework, and in Section 4 its logical architecture. Finally, in Section 5, we provide conclusions and future work for our research.

2 OLAP-Based Big Data Analytics over Data-Intensive Business Processes: Emerging Research Issues

OLAP-based big data analytics over data-intensive business processes opens the door to several emerging research issues, among which some noticeable ones are the following:

• computing multidimensional OLAP aggregations over data-intensive business processes;

• supporting OLAP querying, operators and operations over so-computed OLAP cubes;

• effective and efficient in-memory representation of business process cubes;

• supporting flexible big data prediction methodologies over so-computed OLAP cubes.

How to aggregate a collection of data-intensive business processes? This is a relevant question that has attracted the attention of several studies. Basically, classical OLAP aggregation algorithms cannot be applied as they are, but suitable adaptations must be devised. One possibility consists in exploiting the graph-like nature of business processes. In doing so, the scalability property, which is relevant for big data management and processing (e.g., [WXGM18, SYGZ18, YLHC14, CMX13]), must be taken into account.

After computing aggregations, the support for OLAP querying, operators and operations must be ensured. Among queries, range queries are very significant in this context. In addition, supporting roll-up and drill-down operators is, for instance, a first-class problem in this respect. At the same time, slice and dice operations are significant in order to provide comprehensive support to ad-hoc big data analytics procedures.

Effectively and efficiently supporting in-memory representation of business process cubes raises several challenges. Indeed, so-computed OLAP cubes can reach very large sizes when stored in suitable Cloud storage systems. Therefore, specialized approaches must be devised in order to tame such enormous sizes. Partition-based approaches seem a promising direction to this end.

Finally, another critical problem is represented by the issue of supporting flexible big data prediction methodologies over target OLAP cubes, as the final goal is that of discovering useful knowledge from data-intensive business processes (e.g., [BCC+14, WQL+18, She18]). Again, multidimensional paradigms, such as multidimensional clustering (e.g., [Mur85]), can be successfully applied to this end.

3 An Innovative Framework for Supporting OLAP-Based Big Data Analytics over Data-Intensive Business Processes

The proposed framework aims at supporting OLAP-based big data analytics over data-intensive business processes. It combines two main assets, namely analysis and prediction of business processes, with a focus on the case of business processes in the Public Administration, and it is intended to support the automated management and optimization of such processes. From a strictly technological point of view, the fundamental components of the framework are the following:

• tools to support multidimensional analysis of business process schemes using the OLAP paradigm;

• visual analytics tools for business processes based on multidimensional abstractions;

• tools to support the prediction of executions of business processes based on a data-driven approach.

The framework has been realized by using and integrating open-source software technologies for the support of business process management, with the aim of speeding up and simplifying the management of the operational workflows of the Public Administration, by defining and building the management processes in a rigorous and reliable way and, finally, monitoring the actual status of their execution. More generally, the proposed framework aims at optimizing and automating the management of Public Administration processes through the analysis of these processes and the prediction of their executions. Business process analysis and prediction are therefore the two central themes of the business process management framework, which recognizes in these two phases critical elements for improving both the management of Public Administration processes and the provision of services to citizens. The resulting optimizations thus tend towards the general objective of achieving efficiency and flexibility of the Public Administration processes. To this end, the proposed framework includes two innovative components to support the analysis and prediction phases: (i) visual analytics on business processes, which focuses on the analysis of business processes (and their execution traces) using multidimensional abstractions for the support of OLAP analysis on business process schemes; (ii) execution prediction on business processes, which focuses on the prediction of business process executions, to support their optimization, through an innovative data-driven approach. In short, this approach aims to predict the execution of Public Administration business processes by analyzing the variations that previous executions of business processes have produced on the data (focusing, therefore, on the nature of the data distributions that characterize these variations). A software tool has been implemented, so as to allow the Public Administration to optimize the management of internal processes, evaluate their effectiveness, and adopt the necessary corrections in order to make the service offered to the community efficient and transparent.

Indeed, the level of citizen satisfaction is a yardstick for the Public Administration with respect to public management. In this sense, the framework aims to ensure significant changes, including:

• improvement of administrative transparency (e.g., telematic desk for citizens, and so forth);

• certainty of compliance with procedures and regulations and the traceability of activities;

• control and optimization of processes;

• reduction in the time required for administrative procedures;

• increase in “company productivity”;

• global reduction of associated costs;

• automation of the planned activities;

• accountability and monitoring of the people involved.

The innovative features introduced by the proposed framework are the following.

Feature 1 – Innovative techniques and tools for OLAP analysis on business process schemes: Although OLAP is a methodology applied to many data models (such as graphs, sequences, text, etc.), in the literature, as well as in industry, there are no proposals that offer “explicit” OLAP support on business processes (for example: multidimensional browsing and exploration of aggregated business process schemes, coverage of the most common OLAP operators and operations, such as roll-up, drill-down and pivoting, and so forth), in spite of the embryonic support for multidimensional analysis made available by some tools (e.g., ProM [vDdMV+05]).

Feature 2 – Visual analytics tools and techniques on BP that exploit multidimensional abstractions: Even in this case, the visual analytics solution proposed by the framework directly exploits the power of multidimensional abstractions, for example thanks to multi-resolution analysis, which is both powerful and very intuitive. It should be noted that, both in the literature and in the field of industrial solutions, there are no approaches that propose this vision of visual analytics on business processes.

Feature 3 – Data-driven process mining: From a purely scientific and industrial point of view, the most valuable result that the framework introduces is represented by the innovative data-driven process mining methodology. This methodology is not only innovative in research (academic and industrial) but, despite its complexity, it also captures real-world application scenarios of business process management systems (which, in turn, are characterized by a certain intrinsic complexity) in a very powerful and flexible manner. It thus imposes a sound methodology (based on multidimensional abstractions), as opposed to other approaches known in the state-of-the-art literature that tackle the difficult problem of monitoring and optimizing business processes through solution-driven approaches. The latter offer little flexibility and extensibility, not only for application scenarios other than those for which they have been developed, but also for application scenarios whose execution settings differ only slightly from the original ones.

Summarizing, the main scientific and technical research issues addressed by the framework are the following:

• definition of methodologies, models and tools for supporting multidimensional analysis of business process schemes;

• effective and efficient representation of aggregated business process schemes in secondary storage;

• definition of paradigms for the support of OLAP functionalities and extensions on aggregated business process schemes;

• definition of methodologies, models and tools for supporting the multi-resolution OLAP analysis of business process schemes;

• optimization techniques for OLAP roll-up and drill-down operators on aggregated business process schemes;

• definition of appropriate multidimensional metaphors for the support of visual analytics for business processes using OLAP methodologies and paradigms;

• efficient and scalable solutions for the support of visual analytics for business processes;

• definition of the predictive analysis method of data-driven process mining;

• cumulative similarity techniques between discrete data distributions;

• techniques for optimizing procedures for processing and analyzing discrete distributions on big business process data.

4 Logical Architecture of the Proposed Framework

Figure 1 shows the logical architecture of the proposed framework for supporting OLAP-based big data analytics over data-intensive business processes.

Figure 1: Logical architecture

As shown in Figure 1, the proposed framework introduces the following layers:

• BPM Layer: it is the layer where the input business processes are located and exploited to populate the big data layer of the framework;

• OLAP Aggregation Layer: it is the layer where business processes are aggregated into cubes in order to support OLAP-based big data analytics;

• OLAP Analysis Layer: it is the layer where the OLAP querying, operators and operations over business processes are implemented;

• Application Layer: it is the layer where the consumer applications are located, with visual analytics and prediction analytics as the main supported functionalities.

5 Conclusions and Future Work

This paper has focused on the problem of supporting big data analytics over so-called data-intensive business processes, i.e. business processes connected to big data sources. We explored issues, models and proposals in the field, and finally provided the architecture of a real-life framework developed in the context of a real-life project.

Future work is mainly oriented towards enriching the proposed framework with innovative big data properties, such as: privacy preservation (e.g., [CB11, CR09]), open big data predicates (e.g., [Kar17]), and consistency checking (e.g., [KWR+15]).

Acknowledgments

This research has been developed in the context of the MISE Horizon 2020 – PON 2014/2020 project: “REMS.PA (Resource in Engineering Management for Software process automation in Public Administration)”.

References

[ALRM17] Saima Gulzar Ahmad, Chee Sun Liew, M. Mustafa Rafique, and Ehsan Ullah Munir. Optimization of data-intensive workflows in stream-based data processing models. The Journal of Supercomputing, 73(9):3901–3923, 2017.

[BCC+14] Peter Braun, Juan J. Cameron, Alfredo Cuzzocrea, Fan Jiang, and Carson Kai-Sang Leung. Effectively and efficiently mining frequent patterns from dense graph streams on disk. In 18th International Conference in Knowledge Based and Intelligent Information and Engineering Systems, KES 2014, Gdynia, Poland, 15-17 September 2014, pages 338–347, 2014.

[CB11] Alfredo Cuzzocrea and Elisa Bertino. Privacy preserving OLAP over distributed XML data: A theoretically-sound secure-multiparty-computation approach. J. Comput. Syst. Sci., 77(6):965–987, 2011.

[CBS13] Alfredo Cuzzocrea, Ladjel Bellatreche, and Il-Yeol Song. Data warehousing and OLAP over big data: current challenges and future research directions. In Proceedings of the sixteenth international workshop on Data warehousing and OLAP, DOLAP 2013, San Francisco, CA, USA, October 28, 2013, pages 67–70, 2013.

[CMF+16] Alfredo Cuzzocrea, Carmen De Maio, Giuseppe Fenza, Vincenzo Loia, and Mimmo Parente. OLAP analysis of multidimensional tweet streams for supporting advanced analytics. In Proceedings of the 31st Annual ACM Symposium on Applied Computing, Pisa, Italy, April 4-8, 2016, pages 992–999, 2016.

[CMX13] Alfredo Cuzzocrea, Rim Moussa, and Guandong Xu. OLAP*: Effectively and efficiently supporting parallel OLAP over big data. In Model and Data Engineering - Third International Conference, MEDI 2013, Amantea, Italy, September 25-27, 2013. Proceedings, pages 38–49, 2013.

[CR09] Alfredo Cuzzocrea and Vincenzo Russo. Privacy preserving OLAP and OLAP security. In Encyclopedia of Data Warehousing and Mining, Second Edition, pages 1575–1581. 2009.

[CS14] Alfredo Cuzzocrea and Il-Yeol Song. Big graph analytics: The state of the art and future research agenda. In Proceedings of the 17th International Workshop on Data Warehousing and OLAP, DOLAP 2014, Shanghai, China, November 3-7, 2014, pages 99–101, 2014.

[CSD11] Alfredo Cuzzocrea, Il-Yeol Song, and Karen C. Davis. Analytics over large-scale multidimensional data: the big data revolution! In DOLAP 2011, ACM 14th International Workshop on Data Warehousing and OLAP, Glasgow, United Kingdom, October 28, 2011, Proceedings, pages 101–104, 2011.

[CSU13] Alfredo Cuzzocrea, Domenico Saccà, and Jeffrey D. Ullman. Big data: a research agenda. In 17th International Database Engineering & Applications Symposium, IDEAS ’13, Barcelona, Spain - October 09 - 11, 2013, pages 198–203, 2013.

[Cuz13] Alfredo Cuzzocrea. Analytics over big data: Exploring the convergence of datawarehousing, OLAP and data-intensive cloud infrastructures. In 37th Annual IEEE Computer Software and Applications Conference, COMPSAC 2013, Kyoto, Japan, July 22-26, 2013, pages 481–483, 2013.

[Cuz17] Alfredo Cuzzocrea. Scalable OLAP-based big data analytics over cloud infrastructures: Models, issues, algorithms. In Proceedings of the 2017 International Conference on Cloud and Big Data Computing, ICCBDC 2017, London, United Kingdom, September 17 - 19, 2017, pages 17–21, 2017.

[GK18] Janis Grabis and Janis Kampars. Application of microservices for digital transformation of data-intensive business processes. In Proceedings of the 20th International Conference on Enterprise Information Systems, ICEIS 2018, Funchal, Madeira, Portugal, March 21-24, 2018, Volume 2., pages 736–742, 2018.

[Kar17] Holden Karau. Unifying the open big data world: The possibilities∗ of Apache BEAM. In 2017 IEEE International Conference on Big Data, BigData 2017, Boston, MA, USA, December 11-14, 2017, page 3981, 2017.

[KWR+15] Thanh Tran Thi Kim, Erhard Weiss, Christoph Ruhsam, Christoph Czepa, Huy Tran, and Uwe Zdun. Embracing process compliance and flexibility through behavioral consistency checking in ACM - A repair service management case. In Business Process Management Workshops - BPM 2015, 13th International Workshops, Innsbruck, Austria, August 31 - September 3, 2015, Revised Papers, pages 43–54, 2015.

[LJYC15] Kuan-Ching Li, Hai Jiang, Laurence T. Yang, and Alfredo Cuzzocrea, editors. Big Data - Algorithms, Analytics, and Applications. Chapman and Hall/CRC, 2015.

[MCB+11] James Manyika, Michael Chui, Brad Brown, Jacques Bughin, Richard Dobbs, Charles Roxburgh, and Angela Hung Byers. Big data: The next frontier for innovation, competition, and productivity. Technical report, McKinsey Global Institute, 2011.

[Mur85] Fionn Murtagh. Multidimensional clustering algorithms. Physica-Verlag, 1985.

[RR14] Wullianallur Raghupathi and Viju Raghupathi. Big data analytics in healthcare: promise and potential. Health Inf. Sci. Syst., 2(1):3, 2014.

[Rus11] Philip Russom. Big data analytics. Technical report, TDWI Research, Renton, WA, USA, 2011.

[She18] Bin Shen. Universal knowledge discovery from big data using combined dual-cycle. Int. J. Machine Learning & Cybernetics, 9(1):133–144, 2018.

[SMM17] Vladislav A. Shchapov, Aleksei G. Masich, and Grigorii F. Masich. The technology of processing intensive structured dataflow on a supercomputer. Journal of Systems and Software, 127:258–265, 2017.

[SYGZ18] Dawei Sun, Hongbin Yan, Shang Gao, and Zhangbing Zhou. Performance evaluation and analysis of multiple scenarios of big data stream computing on storm platform. TIIS, 12(7):2977–2997, 2018.

[vDdMV+05] Boudewijn F. van Dongen, Ana Karla A. de Medeiros, H. M. W. Verbeek, A. J. M. M. Weijters, and Wil M. P. van der Aalst. The ProM framework: A new era in process mining tool support. In Applications and Theory of Petri Nets 2005, 26th International Conference, ICATPN 2005, Miami, USA, June 20-25, 2005, Proceedings, pages 444–454, 2005.

[WQL+18] Xinyang Wang, Deyu Qi, Weiwei Lin, Mincong Yu, Zhishuo Zheng, Naqin Zhou, and Pengguang Chen. A general framework for big data knowledge discovery and integration. Concurrency and Computation: Practice and Experience, 30(13), 2018.

[WXGM18] Yulei Wu, Yang Xiang, Jingguo Ge, and Peter Mueller. High-performance computing for big data processing. Future Generation Comp. Syst., 88:693–695, 2018.

[YLHC14] Chao-Tung Yang, Jung-Chun Liu, Ching-Hsien Hsu, and Wei-Li Chou. On improvement of cloud virtual machine availability with virtualization fault tolerance mechanism. The Journal of Supercomputing, 69(3):1103–1122, 2014.

[ZE11] Paul Zikopoulos and Chris Eaton. Understanding Big Data: Analytics for Enterprise Class Hadoop and Streaming Data. McGraw-Hill Osborne Media, 1st edition, 2011.
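
As an illustration of the OLAP aggregation, roll-up, slice and range-query operations discussed in Section 2, the following is a minimal Python sketch. It is not the REMS.PA implementation: it assumes that each completed business process execution is flattened into a fact with a few dimensions, and it ignores the graph-like structure of processes that the paper points to. The fact layout, the sample data and the names `build_cube`, `slice_facts`, `range_query` and `quarter` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical execution traces: one fact per completed process instance.
FACTS = [
    {"process": "permit", "office": "north", "month": 1, "days": 10},
    {"process": "permit", "office": "north", "month": 2, "days": 14},
    {"process": "permit", "office": "south", "month": 4, "days": 8},
    {"process": "license", "office": "north", "month": 1, "days": 30},
    {"process": "license", "office": "south", "month": 5, "days": 20},
]


def quarter(month):
    """Map a month (1..12) to its quarter (1..4): one step up the time hierarchy."""
    return (month - 1) // 3 + 1


def build_cube(facts, dims, level=None):
    """Group facts by the given dimensions.

    `level` optionally maps a dimension name to a function that lifts its
    values to a coarser hierarchy level (this is what performs the roll-up).
    Each cell stores (number of executions, total duration in days).
    """
    level = level or {}
    cube = defaultdict(lambda: [0, 0])
    for fact in facts:
        key = tuple(level.get(d, lambda v: v)(fact[d]) for d in dims)
        cube[key][0] += 1
        cube[key][1] += fact["days"]
    return {key: tuple(cell) for key, cell in cube.items()}


def slice_facts(facts, dim, value):
    """Slice: keep only the facts whose dimension `dim` equals `value`."""
    return [f for f in facts if f[dim] == value]


def range_query(facts, dim, low, high):
    """Range query: keep the facts with low <= fact[dim] <= high."""
    return [f for f in facts if low <= f[dim] <= high]


# Finest level, then a roll-up from month to quarter over the same facts.
by_month = build_cube(FACTS, ["process", "month"])
by_quarter = build_cube(FACTS, ["process", "month"], {"month": quarter})
# Slice on one office, and a range query on the first two months.
north = build_cube(slice_facts(FACTS, "office", "north"), ["process"])
early = build_cube(range_query(FACTS, "month", 1, 2), ["office"])
```

Drill-down is the inverse navigation: re-running `build_cube` without the `level` mapping returns the month-level cells from which the quarter-level cells were obtained. Recomputing from the base facts keeps the sketch short; a real system would more plausibly derive coarser cells from finer ones.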
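
Section 3 lists “cumulative similarity techniques between discrete data distributions” among the issues addressed by the data-driven prediction component, without specifying the measure. The Python sketch below shows one possible reading, stated here as an assumption rather than as the framework's method: two histograms over the same bins are compared through their cumulative distributions, using one minus the maximum absolute difference between the two (the discrete analogue of the Kolmogorov–Smirnov statistic). The function names and the sample histograms are hypothetical.

```python
from itertools import accumulate


def normalize(counts):
    """Turn a histogram of non-negative counts into a probability distribution."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("histogram must have positive total mass")
    return [c / total for c in counts]


def cumulative_similarity(counts_a, counts_b):
    """Similarity in [0, 1] between two histograms defined over the same bins.

    Assumed measure: 1 - max |F_a(i) - F_b(i)|, where F_a and F_b are the
    cumulative distributions of the normalized histograms.
    """
    if len(counts_a) != len(counts_b):
        raise ValueError("histograms must share the same bins")
    cdf_a = list(accumulate(normalize(counts_a)))
    cdf_b = list(accumulate(normalize(counts_b)))
    return 1.0 - max(abs(a - b) for a, b in zip(cdf_a, cdf_b))


def most_similar(observed, history):
    """Return the label of the historical distribution closest to `observed`."""
    return max(history, key=lambda name: cumulative_similarity(observed, history[name]))


# Hypothetical distributions of a data variation (three bins) produced by
# past executions of three process variants, and one newly observed histogram.
HISTORY = {
    "fast_track": [8, 2, 0],
    "standard": [2, 6, 2],
    "escalated": [0, 3, 7],
}
predicted = most_similar([3, 5, 2], HISTORY)
```

Under these assumptions, a running process instance would be predicted to follow the variant whose historical data distribution is closest to the one observed so far. Other cumulative measures, for example the area between the two cumulative distributions, could be substituted without changing the structure of the sketch.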