
Supporting OLAP-Based Big Data Analytics over Data-Intensive Business Processes: Issues, Models, Proposals, and a Real-Life Framework

Alfredo Cuzzocrea, University of Trieste and ICAR-CNR, Trieste, 34127

2018


This paper focuses on the problem of supporting big data analytics over so-called data-intensive business processes, i.e., business processes connected to big data sources. This application setting is attracting growing interest in the community, also due to emerging computational paradigms such as Cloud Computing. The paper explores issues, models and proposals in the field, and finally presents the architecture of a real-life framework that supports big data analytics over data-intensive business processes via successful OLAP metaphors.

Nowadays, the problem of supporting big data analytics (e.g., [CSD11, Cuz13, CS14, Rus11, RR14]) over so-called data-intensive business processes (e.g., [ALRM17, SMM17, GK18]) plays a relevant role. This is because, on the one hand, business processes still hold most of the data, information and knowledge of very large enterprises and organizations and, on the other hand, they fit well with the emerging characteristics of big data (e.g., [CSU13, CBS13, LJYC15, ZE11, MCB+11]).

An important solution for supporting big data analytics consists in applying successful multidimensional metaphors and abstractions, mainly falling in the well-known OLAP context, thus originating an evolving trend that can be safely identified by the term "OLAP-based big data analytics" (e.g., [Cuz17, CMF+16]).

Copyright © CIKM 2018 for the individual papers by the papers' authors. Copyright © CIKM 2018 for the volume as a collection by its editors. This volume and its papers are published under the Creative Commons License Attribution 4.0 International (CC BY 4.0).

Inspired by this research context, in this paper we focus on the problem of supporting OLAP-based big data analytics over data-intensive business processes, and we describe a real-life framework developed in the context of a real-life project, called REMS.PA. The framework is mainly designed on top of open-source technologies and focuses, in particular, on business processes of the Public Administration.

The remainder of this paper is organized as follows. In Section 2, we report on the main research issues of supporting OLAP-based big data analytics over data-intensive business processes. In Section 3, we describe the proposed framework, and in Section 4 its logical architecture. Finally, in Section 5, we provide conclusions and future work for our research.

2 Research Issues

OLAP-based big data analytics over data-intensive business processes opens the door to several emerging research issues, among which some noticeable ones are the following: (i) computing multidimensional OLAP aggregations over data-intensive business processes; (ii) supporting OLAP querying, operators and operations over so-computed OLAP cubes; (iii) effective and efficient in-memory representation of business process cubes; (iv) supporting flexible big data prediction methodologies over so-computed OLAP cubes.

How should a collection of data-intensive business processes be aggregated? This is a relevant question that has attracted the attention of several studies. Classical OLAP aggregation algorithms cannot be applied as they are, and suitable adaptations must be devised. One possibility is to exploit the graph-like nature of business processes. In doing so, scalability, which is relevant for big data management and processing (e.g., [WXGM18, SYGZ18, YLHC14, CMX13]), must be taken into account.
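As a minimal illustration of aggregation driven by the graph-like nature of business processes, the following Python sketch groups execution traces by a set of dimensions and merges, for each resulting cell, the directly-follows edges of the traces into a single weighted graph. The trace format, the dimension names (`office`, `year`) and the function `aggregate_process_graphs` are assumptions introduced here for illustration only; they are not the aggregation algorithm of the framework.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical trace format: dimension values plus the ordered activities
# executed by one business process instance.
Trace = Tuple[Dict[str, str], List[str]]


def aggregate_process_graphs(
    traces: Iterable[Trace], group_by: Tuple[str, ...]
) -> Dict[Tuple[str, ...], Counter]:
    """Build one aggregated directly-follows graph per OLAP cell."""
    cube: Dict[Tuple[str, ...], Counter] = defaultdict(Counter)
    for dims, activities in traces:
        cell = tuple(dims[d] for d in group_by)
        # Each pair of consecutive activities is an edge of the process graph;
        # the edge weight counts how many times the transition was observed.
        for src, dst in zip(activities, activities[1:]):
            cube[cell][(src, dst)] += 1
    return dict(cube)


# Illustrative data only.
traces = [
    ({"office": "Registry", "year": "2017"}, ["receive", "check", "approve"]),
    ({"office": "Registry", "year": "2017"}, ["receive", "check", "reject"]),
    ({"office": "Registry", "year": "2018"}, ["receive", "check", "approve"]),
    ({"office": "Tax", "year": "2018"}, ["receive", "approve"]),
]

by_office_year = aggregate_process_graphs(traces, ("office", "year"))
by_office = aggregate_process_graphs(traces, ("office",))
print(by_office[("Registry",)][("receive", "check")])  # 3
```

Since edge counts are additive, grouping by fewer dimensions yields a coarser cube whose graphs are the edge-wise sums of the finer ones.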

After computing aggregations, support for OLAP querying, operators and operations must be ensured. Among queries, range queries are very significant in this context. In addition, supporting roll-up and drill-down operators is, for instance, a first-class problem in this respect. At the same time, slice and dice operations are significant in order to provide comprehensive support for ad-hoc big data analytics procedures.
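The operators mentioned above can be sketched on a toy cube as follows. This is only an illustrative sketch under stated assumptions: the cube is a plain dictionary keyed by hypothetical (office, month) cells with a numeric measure, and all function names are introduced here; a cube of aggregated business process schemes would carry richer measures.

```python
from collections import defaultdict
from typing import Callable, Dict, Tuple

Cell = Tuple[str, int]   # (office, month): hypothetical dimensions
Cube = Dict[Cell, int]   # measure: number of completed process instances

# Illustrative data only.
base: Cube = {
    ("Registry", 1): 40, ("Registry", 2): 55, ("Registry", 4): 30,
    ("Tax", 1): 20, ("Tax", 5): 25,
}


def to_quarter(cell: Cell) -> Cell:
    """Map an (office, month) cell to its parent (office, quarter) cell."""
    office, month = cell
    return (office, (month - 1) // 3 + 1)


def roll_up(cube: Cube, parent_of: Callable[[Cell], Cell]) -> Cube:
    """Aggregate cells to a coarser level of the hierarchy."""
    out: Dict[Cell, int] = defaultdict(int)
    for cell, value in cube.items():
        out[parent_of(cell)] += value
    return dict(out)


def dice(cube: Cube, keep: Callable[[Cell], bool]) -> Cube:
    """Keep the sub-cube whose cells satisfy a predicate on the dimensions."""
    return {cell: value for cell, value in cube.items() if keep(cell)}


def slice_office(cube: Cube, office: str) -> Cube:
    """Fix one dimension (office) to a single value."""
    return dice(cube, lambda cell: cell[0] == office)


def drill_down(cube: Cube, parent: Cell, parent_of: Callable[[Cell], Cell]) -> Cube:
    """Return the finer-grained cells that roll up into `parent`."""
    return dice(cube, lambda cell: parent_of(cell) == parent)


def range_sum(cube: Cube, month_lo: int, month_hi: int) -> int:
    """Range query: sum the measure over an inclusive interval of months."""
    return sum(v for (_, month), v in cube.items() if month_lo <= month <= month_hi)


by_quarter = roll_up(base, to_quarter)
print(range_sum(base, 1, 2))  # 115
```

In this sketch drill-down is answered from the base cells, so the finest level must be retained (or be recomputable) for the operator to be supported.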

Effectively and efficiently supporting the in-memory representation of business process cubes poses several challenges. Indeed, so-computed OLAP cubes can reach very large sizes when stored in suitable Cloud storage systems. Therefore, specialized approaches must be devised in order to tame such enormous sizes. Partition-based approaches seem a promising trend to this end.
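One way to picture a partition-based approach is sketched below: the cube is split into partitions (here, one per value of a hypothetical `office` dimension), only a bounded number of partitions is kept in memory, and the least recently used one is serialized to external storage when the bound is exceeded. The class `PartitionedCube`, the partitioning key and the dictionary standing in for a Cloud object store are all assumptions made for illustration; the source does not prescribe a specific partitioning scheme.

```python
import pickle
from collections import OrderedDict
from typing import Dict, Tuple

Cell = Tuple[str, int]  # (office, month): hypothetical dimensions


class PartitionedCube:
    """Cube split into per-office partitions; only a few stay in memory."""

    def __init__(self, max_resident: int = 2) -> None:
        self.max_resident = max_resident
        self.resident: "OrderedDict[str, Dict[Cell, int]]" = OrderedDict()
        self.storage: Dict[str, bytes] = {}  # stand-in for a Cloud object store
        self.loads = 0  # number of partitions fetched back from storage

    def _partition(self, key: str) -> Dict[Cell, int]:
        if key in self.resident:
            self.resident.move_to_end(key)  # mark as most recently used
            return self.resident[key]
        if key in self.storage:
            part = pickle.loads(self.storage[key])
            self.loads += 1
        else:
            part = {}
        self.resident[key] = part
        if len(self.resident) > self.max_resident:
            # Evict the least recently used partition to external storage.
            old_key, old_part = self.resident.popitem(last=False)
            self.storage[old_key] = pickle.dumps(old_part)
        return part

    def add(self, cell: Cell, value: int) -> None:
        part = self._partition(cell[0])
        part[cell] = part.get(cell, 0) + value

    def total(self, office: str) -> int:
        """Answer a query by touching only the relevant partition."""
        return sum(self._partition(office).values())


# Illustrative data only.
cube = PartitionedCube(max_resident=2)
for cell, value in [(("Registry", 1), 40), (("Tax", 1), 20),
                    (("Health", 1), 10), (("Registry", 2), 55)]:
    cube.add(cell, value)

print(cube.total("Registry"))  # 95
```

The design choice illustrated here is that a query restricted to one partition key never forces the other partitions into memory, which bounds the resident size regardless of the overall cube size.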

Finally, another critical problem is that of supporting flexible big data prediction methodologies over target OLAP cubes, as the final goal is to discover useful knowledge from data-intensive business processes (e.g., [BCC+14, WQL+18, She18]). Again, multidimensional paradigms, such as multidimensional clustering (e.g., [Mur85]), can be successfully applied to this end.

3 An Innovative Framework for Supporting OLAP-Based Big Data Analytics over Data-Intensive Business Processes

The proposed framework aims at supporting OLAP-based big data analytics over data-intensive business processes. It combines two main assets, analysis and prediction of business processes, with a focus on business processes in the Public Administration, and it is intended to define a framework for the automated management and optimization of those processes. From a strictly technological point of view, the fundamental components of the framework are the following: tools to support multidimensional analysis of business process schemes using the OLAP paradigm; visual analytics tools for business processes based on multidimensional abstractions; tools to support the prediction of business process executions based on a data-driven approach.

The framework has been realized by using and integrating open-source software technologies for business process management, with the aim of speeding up and simplifying the management of the operational workflows of the Public Administration by defining and building the management processes in a rigorous and reliable way, and finally monitoring the real status of their execution. More generally, the proposed framework aims at optimizing and automating the management of Public Administration processes through their analysis and the prediction of their executions. Business process analysis and prediction are therefore the two central themes of the framework, which recognizes in these two phases critical elements for improving both the management of Public Administration processes and the provision of services to citizens. The resulting optimizations thus pursue the general objective of achieving efficiency and flexibility of Public Administration processes. To this end, the proposed framework includes two innovative components to support the analysis and prediction phases: (i) visual analytics on business processes, which focuses on the analysis of business processes (and their execution traces) using multidimensional abstractions for the support of OLAP analysis on business process schemes; (ii) execution prediction on business processes, which focuses on the prediction of business process executions, to support their optimization, through an innovative data-driven approach. In short, this approach aims to predict the execution of Public Administration business processes by analyzing the variations that previous executions of business processes have produced on the data (thus focusing on the nature of the data distributions that characterize these variations).

A software tool has been implemented to allow the Public Administration to optimize the management of internal processes, evaluate their effectiveness, and adopt the necessary corrections in order to make the service offered to the community efficient and transparent.

Indeed, the level of citizen satisfaction is a yardstick for the Public Administration with respect to public management. In this sense, the framework aims to ensure significant changes, including: improvement of administrative transparency (e.g., a telematic desk for citizens); certainty of compliance with procedures and regulations, and traceability of activities; control and optimization of processes; reduction in the time required for administrative procedures; increase in "company productivity"; global reduction of associated costs; automation of the planned activities; accountability and monitoring of the people involved.

The innovative features introduced by the proposed framework are the following.

Feature 1 – Innovative techniques and tools for OLAP analysis on business process schemes: Although OLAP is a methodology applied to many data models (such as graphs, sequences, text, etc.), neither the literature nor industry offers proposals with "explicit" OLAP support for business processes (for example, multidimensional browsing and exploration of aggregated business process schemes, or coverage of the most common OLAP operators and operations, such as roll-up, drill-down and pivoting), in spite of the embryonic support for multidimensional analysis made available by some tools (e.g., ProM [vDdMV+05]).

Feature 2 – Visual analytics tools and techniques on business processes that exploit multidimensional abstractions: In this case too, the visual analytics solution proposed by the framework directly exploits the power of multidimensional abstractions, for example through multi-resolution analysis, which is both powerful and very intuitive. It should be noted that, both in the literature and in the field of industrial solutions, there are no approaches that propose this vision of visual analytics on business processes.

Feature 3 – Data-driven process mining: From a purely scientific and industrial point of view, the most valuable result that the framework introduces is the innovative data-driven process mining methodology. This methodology is not only innovative in research (academic and industrial) but, despite its complexity, it also effectively captures real-world application scenarios of business process management systems (which, in turn, are characterized by a certain intrinsic complexity) in a very powerful and flexible manner. It thus establishes a sound methodology (based on multidimensional abstractions), as opposed to other approaches in the state-of-the-art literature that address the difficult problem of monitoring and optimizing business processes through solution-driven approaches. The latter offer little flexibility and extensibility, not only for application scenarios other than those for which they were developed, but also for application scenarios whose execution settings are not very different from them.
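To make the data-driven idea more concrete, the sketch below compares the discrete distribution currently observed on the data with distributions recorded for past executions, and returns the label of the closest one. The similarity used here, the L1 distance between cumulative sums of the normalized histograms, is an assumption chosen as one plausible reading of a cumulative comparison between discrete distributions; the histograms, the labels and the function names are hypothetical, and the nearest-neighbour rule is a stand-in rather than the methodology of the framework.

```python
from itertools import accumulate
from typing import Dict, List, Sequence


def normalize(hist: Sequence[float]) -> List[float]:
    """Turn a histogram of counts into a discrete probability distribution."""
    total = sum(hist)
    if total == 0:
        raise ValueError("empty histogram")
    return [h / total for h in hist]


def cumulative_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """L1 distance between the cumulative sums of two discrete distributions."""
    if len(p) != len(q):
        raise ValueError("distributions must share the same buckets")
    cp = accumulate(normalize(p))
    cq = accumulate(normalize(q))
    return sum(abs(a - b) for a, b in zip(cp, cq))


def predict_next(observed: Sequence[float], history: Dict[str, Sequence[float]]) -> str:
    """Return the label of the historical distribution closest to `observed`."""
    return min(history, key=lambda label: cumulative_distance(observed, history[label]))


# Hypothetical histograms of a data attribute (same four buckets everywhere),
# each recorded when a past execution took the labelled path.
history = {
    "fast_track": [8, 2, 0, 0],
    "standard": [2, 5, 3, 0],
    "escalation": [0, 1, 3, 6],
}
observed = [1, 4, 4, 1]

print(predict_next(observed, history))  # standard
```

Comparing cumulative sums rather than individual buckets makes the distance sensitive to how far probability mass has shifted along the ordered buckets, not only to whether it has shifted.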

Summarizing, the main scientific and technical research issues addressed by the framework are the following: definition of methodologies, models and tools for supporting multidimensional analysis of business process schemes; effective and efficient representation of aggregated business process schemes in secondary storage; definition of paradigms for the support of OLAP functionalities and extensions on aggregated business process schemes; definition of methodologies, models and tools for supporting the multi-resolution OLAP analysis of business process schemes; optimization techniques for OLAP roll-up and drill-down operators on aggregated business process schemes; definition of appropriate multidimensional metaphors for the support of visual analytics for business processes using OLAP methodologies and paradigms; efficient and scalable solutions for the support of visual analytics for business processes; definition of the predictive analysis method of data-driven process mining; cumulative similarity techniques between discrete data distributions; techniques for optimizing procedures for processing and analyzing discrete distributions on big business process data.

4 Logical Architecture of the Proposed Framework

Figure 1 shows the logical architecture of the proposed framework for supporting OLAP-based big data analytics over data-intensive business processes.

As shown in Figure 1, the proposed framework introduces the following layers: BPM Layer: it is the layer where the input business processes are located and exploited to populate the big data layer of the framework; OLAP Aggregation Layer: it is the layer where business processes are aggregated into cubes in order to support OLAP-based big data analytics; OLAP Analysis Layer: it is the layer where the OLAP querying, operators and operations over business processes are implemented; Application Layer: it is the layer where the consumer applications are located, with visual analytics and prediction analytics being the main functionalities supported.

5 Conclusions and Future Work

This paper has focused on the problem of supporting big data analytics over so-called data-intensive business processes, i.e., business processes connected to big data sources. We have explored issues, models and proposals in the field, and finally presented the architecture of a real-life framework developed in the context of a real-life project.

Future work is mainly oriented towards enriching the proposed framework with innovative big data properties, such as privacy preservation (e.g., [CB11, CR09]), open big data predicates (e.g., [Kar17]), and consistency checking (e.g., [KWR+15]).

Acknowledgments

This research has been developed in the context of the MISE Horizon 2020 – PON 2014/2020 project "REMS.PA (Resource in Engineering Management for Software process automation in Public Administration)".

References

[ALRM17] Saima Gulzar Ahmad, Chee Sun Liew, M. Mustafa Rafique, and Ehsan Ullah Munir. Optimization of data-intensive workflows in stream-based data processing models. The Journal of Supercomputing, 73(9):3901–3923, 2017.

[BCC+14] Peter Braun, Juan J. Cameron, Alfredo Cuzzocrea, Fan Jiang, and Carson Kai-Sang Leung. Effectively and efficiently mining frequent patterns from dense graph streams on disk. In 18th International Conference in Knowledge Based and Intelligent Information and Engineering Systems, KES 2014, Gdynia, Poland, 15-17 September 2014, pages 338–347, 2014.

[CB11] Privacy preserving OLAP over distributed XML data: A theoretically-sound secure-multiparty-computation approach. J. Comput. Syst. Sci., 77(6):965–987, 2011.

[CBS13] In Proceedings of the sixteenth international workshop on Data warehousing and OLAP, DOLAP 2013, San Francisco, CA, USA, October 28, 2013, pages 67–70, 2013.

[CMF+16] Alfredo Cuzzocrea, Carmen De Maio, Giuseppe Fenza, Vincenzo Loia, and Mimmo Parente. OLAP analysis of multidimensional tweet streams for supporting advanced analytics. In Proceedings of the 31st Annual ACM Symposium on Applied Computing, Pisa, Italy, April 4-8, 2016, pages 992–999, 2016.

[CMX13]

[CR09]

[CS14]

[CSD11]

[CSU13]

[Cuz13] Alfredo Cuzzocrea. Analytics over big data: Exploring the convergence of data warehousing, OLAP and data-intensive cloud infrastructures. In 37th Annual IEEE Computer Software and Applications Conference, COMPSAC 2013, Kyoto, Japan, July 22-26, 2013, pages 481–483, 2013.

[Cuz17] Alfredo Cuzzocrea. Scalable OLAP-based big data analytics over cloud infrastructures: Models, issues, algorithms. In Proceedings of the 2017 International Conference on Cloud and Big Data Computing, ICCBDC 2017, London, United Kingdom, September 17-19, 2017, pages 17–21, 2017.

[GK18]

[Kar17] Holden Karau. Unifying the open big data world: The possibilities of Apache BEAM. In 2017 IEEE International Conference on Big Data, BigData 2017, Boston, MA, USA, December 11-14, 2017, page 3981, 2017.

[KWR+15] Thanh Tran Thi Kim, Erhard Weiss, Christoph Ruhsam, Christoph Czepa, Huy Tran, and Uwe Zdun. Embracing process compliance and flexibility through behavioral consistency checking in ACM - A repair service management case. In Business Process Management Workshops - BPM 2015, 13th International Workshops, Innsbruck, Austria, August 31 - September 3, 2015, Revised Papers, pages 43–54, 2015.

[LJYC15] Kuan-Ching Li, Hai Jiang, Laurence T. Yang, and Alfredo Cuzzocrea, editors. Big Data - Algorithms, Analytics, and Applications. Chapman and Hall/CRC, 2015.

[MCB+11] James Manyika, Michael Chui, Brad Brown, Jacques Bughin, Richard Dobbs, Charles Roxburgh, and Angela Hung Byers. Big data: The next frontier for innovation, competition, and productivity. Technical report, McKinsey Global Institute, 2011.

[Mur85] Fionn Murtagh. Multidimensional clustering algorithms. Physica-Verlag, 1985.

[RR14] Wullianallur Raghupathi and Viju Raghupathi. Big data analytics in healthcare: promise and potential. Health Inf. Sci. Syst., 2(1):3, 2014.

[Rus11] Philip Russom. Big data analytics. Technical report, TDWI Research, Renton, WA, USA, 2011.

[She18] Bin Shen. Universal knowledge discovery from big data using combined dual-cycle. Int. J. Machine Learning & Cybernetics, 9(1):133–144, 2018.

[SMM17] Journal of Systems and Software, 127:258–265, 2017.

[SYGZ18] Dawei Sun, Hongbin Yan, Shang Gao, and Zhangbing Zhou. Performance evaluation and analysis of multiple scenarios of big data stream computing on storm platform. TIIS, 12(7):2977–2997, 2018.

[vDdMV+05] Boudewijn F. van Dongen, Ana Karla A. de Medeiros, H. M. W. Verbeek, A. J. M. M. Weijters, and Wil M. P. van der Aalst. The ProM framework: A new era in process mining tool support. In Applications and Theory of Petri Nets 2005, 26th International Conference, ICATPN 2005, Miami, USA, June 20-25, 2005, Proceedings, pages 444–454, 2005.

[WQL+18] Xinyang Wang, Deyu Qi, Weiwei Lin, Mincong Yu, Zhishuo Zheng, Naqin Zhou, and Pengguang Chen. A general framework for big data knowledge discovery and integration. Concurrency and Computation: Practice and Experience, 30(13), 2018.

[WXGM18] Yulei Wu, Yang Xiang, Jingguo Ge, and Peter Mueller. High-performance computing for big data processing. Future Generation Comp. Syst., 88:693–695, 2018.

[YLHC14] Chao-Tung Yang, Jung-Chun Liu, Ching-Hsien Hsu, and Wei-Li Chou. On improvement of cloud virtual machine availability with virtualization fault tolerance mechanism. The Journal of Supercomputing, 69(3):1103–1122, 2014.

[ZE11] Paul Zikopoulos and Chris Eaton. Understanding Big Data: Analytics for Enterprise Class Hadoop and Streaming Data. McGraw-Hill Osborne Media, 1st edition, 2011.