=Paper= {{Paper |id=Vol-2482/paper19 |storemode=property |title=Supporting OLAP-Based Big Data Analytics over Data-Intensive Business Processes: Issues, Models, Proposals, and a Real-Life Framework |pdfUrl=https://ceur-ws.org/Vol-2482/paper19.pdf |volume=Vol-2482 |authors=Alfredo Cuzzocrea |dblpUrl=https://dblp.org/rec/conf/cikm/Cuzzocrea18 }} ==Supporting OLAP-Based Big Data Analytics over Data-Intensive Business Processes: Issues, Models, Proposals, and a Real-Life Framework== https://ceur-ws.org/Vol-2482/paper19.pdf
       Supporting OLAP-Based Big Data Analytics over
       Data-Intensive Business Processes: Issues, Models,
             Proposals, and a Real-Life Framework

                                                   Alfredo Cuzzocrea
                                          University of Trieste and ICAR-CNR
                                                      Trieste, 34127
                                             alfredo.cuzzocrea@dia.units.it



Copyright © CIKM 2018 for the individual papers by the papers' authors.
Copyright © CIKM 2018 for the volume as a collection by its editors. This
volume and its papers are published under the Creative Commons License
Attribution 4.0 International (CC BY 4.0).

                          Abstract

     This paper focuses on the problem of supporting big data analytics over
     so-called data-intensive business processes, i.e., business processes
     connected to big data sources. This application setting is attracting
     growing interest in the community, also due to emerging computational
     paradigms like Cloud Computing. The paper explores issues, models and
     proposals in the field, and finally provides the architecture of a
     real-life framework that supports big data analytics over data-intensive
     business processes via successful OLAP metaphors.

1    Introduction
Nowadays, the problem of supporting big data analytics (e.g., [CSD11, Cuz13,
CS14, Rus11, RR14]) over so-called data-intensive business processes (e.g.,
[ALRM17, SMM17, GK18]) plays a relevant role. This is because, on the one hand,
business processes still hold most of the data, information and knowledge of
very large enterprises and organizations and, on the other hand, they fit well
with the emerging characteristics of big data (e.g., [CSU13, CBS13, LJYC15,
ZE11, MCB+11]).
   An important solution for supporting big data analytics consists in applying
successful multidimensional metaphors and abstractions, mainly falling in the
well-known OLAP context, thus originating an evolving trend that can be safely
recognized under the term “OLAP-based big data analytics” (e.g., [Cuz17,
CMF+16]).
   Inspired by this research context, in this paper we focus on the problem of
supporting OLAP-based big data analytics over data-intensive business
processes, and we describe a real-life framework developed in the context of a
real-life project called REMS.PA. The framework is mainly designed on top of
open-source technologies and focuses, in particular, on business processes of
the Public Administration.
   The remaining part of this paper is organized as follows. In Section 2, we
report on the main research issues of supporting OLAP-based big data analytics
over data-intensive business processes. In Section 3, we describe the proposed
framework. In Section 4, we present its logical architecture. Finally, in
Section 5, we provide conclusions and future work for our research.

2     OLAP-Based Big Data Analytics over Data-Intensive Business Processes:
      Emerging Research Issues
OLAP-based big data analytics over data-intensive business processes opens the
door to several emerging research issues, among which some noticeable ones are
the following:

    • computing multidimensional OLAP aggregations over data-intensive
      business processes;

    • supporting OLAP querying, operators and operations over so-computed
      OLAP cubes;

    • effective and efficient in-memory representation of business process
      cubes;

    • supporting flexible big data prediction methodologies over so-computed
      OLAP cubes.
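
As an illustration of the first two issues listed above, the following minimal Python sketch aggregates a handful of business process execution records into a small OLAP cube and then applies roll-up, slice and range-query operations to it. The record layout (office, process type, month, duration), the sample values and all function names are hypothetical assumptions introduced here for illustration only. They are not part of the REMS.PA framework, and a flat in-memory dictionary stands in for whatever scalable, graph-aware aggregation a real system would need.

```python
from collections import defaultdict

# Hypothetical execution records of Public Administration processes:
# (office, process_type, month, duration_in_days). All values are made up.
EXECUTIONS = [
    ("Trieste", "permit", "2018-01", 12),
    ("Trieste", "permit", "2018-02", 9),
    ("Trieste", "license", "2018-01", 30),
    ("Udine", "permit", "2018-01", 15),
    ("Udine", "license", "2018-02", 21),
]
DIMENSIONS = ("office", "process_type", "month")


def build_cube(records):
    """Aggregate records into cells keyed by all dimensions.

    Each cell stores (count, total_duration), so it can be re-aggregated
    later without going back to the raw records.
    """
    cube = defaultdict(lambda: (0, 0))
    for *coords, duration in records:
        key = tuple(coords)
        count, total = cube[key]
        cube[key] = (count + 1, total + duration)
    return dict(cube)


def roll_up(cube, keep):
    """Roll-up: keep only the dimensions in `keep`, summing the others away."""
    idx = [DIMENSIONS.index(d) for d in keep]
    out = defaultdict(lambda: (0, 0))
    for coords, (count, total) in cube.items():
        key = tuple(coords[i] for i in idx)
        c, t = out[key]
        out[key] = (c + count, t + total)
    return dict(out)


def slice_cube(cube, dimension, value):
    """Slice: fix one dimension to a single value."""
    i = DIMENSIONS.index(dimension)
    return {k: v for k, v in cube.items() if k[i] == value}


def range_query(cube, dimension, low, high):
    """Range query: sum (count, total) over cells with low <= coord <= high."""
    i = DIMENSIONS.index(dimension)
    count = total = 0
    for coords, (c, t) in cube.items():
        if low <= coords[i] <= high:
            count, total = count + c, total + t
    return count, total


cube = build_cube(EXECUTIONS)
by_office = roll_up(cube, ["office"])                    # coarser view
permits = slice_cube(cube, "process_type", "permit")     # one process type
january = range_query(cube, "month", "2018-01", "2018-01")
```

In this sketch, drill-down is simply the inverse navigation: since `build_cube` keeps the finest-grained cells, moving from `by_office` back to `cube` restores the detail. Storing (count, total) pairs rather than averages keeps the cells additive, which is what allows roll-up to work on the cube alone.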
   How to aggregate a collection of data-intensive business processes? This is
a relevant question that has attracted the attention of several studies.
Basically, classical OLAP aggregation algorithms cannot be applied as they are,
but suitable adaptations must be devised. One possibility consists in
exploiting the graph-like nature of business processes. In doing so, the
scalability property, which is relevant for big data management and processing
(e.g., [WXGM18, SYGZ18, YLHC14, CMX13]), must be taken into account.
   After computing aggregations, the support for OLAP querying, operators and
operations must be ensured. Among queries, range queries are very significant
in this context. In addition, supporting roll-up and drill-down operators is,
for instance, a first-class problem in this respect. At the same time, slice
and dice operations are significant in order to provide comprehensive support
for ad-hoc big data analytics procedures.
   Effectively and efficiently supporting the in-memory representation of
business process cubes poses several challenges. Indeed, so-computed OLAP cubes
can reach very large sizes when stored in suitable Cloud storage systems.
Therefore, specialized approaches must be devised in order to tame such
enormous sizes. Partition-based approaches seem a promising direction to this
end.
   Finally, another critical problem is represented by the issue of supporting
flexible big data prediction methodologies over target OLAP cubes, as the final
goal is that of discovering useful knowledge from data-intensive business
processes (e.g., [BCC+14, WQL+18, She18]). Again, multidimensional paradigms,
such as multidimensional clustering (e.g., [Mur85]), can be successfully
applied to this end.

3     An Innovative Framework for Supporting OLAP-Based Big Data Analytics
      over Data-Intensive Business Processes
The proposed framework aims at supporting OLAP-based big data analytics over
data-intensive business processes. It combines two main assets, analysis and
prediction of business processes, with a focus on business processes in the
Public Administration, and it aims at defining a framework for the automated
management and optimization of such processes. From a strictly technological
point of view, the fundamental components of the framework are the following:

    • tools to support multidimensional analysis of business process schemes
      using the OLAP paradigm;

    • visual analytics tools for business processes based on multidimensional
      abstractions;

    • tools to support the prediction of executions of business processes
      based on a data-driven approach.

   The framework has been realized by using and integrating open-source
software technologies for the support of business process management, with the
aim of speeding up and simplifying the management of the operational workflows
of the Public Administration, by defining and building the management processes
in a rigorous and reliable way, and finally monitoring the real status of their
execution. More generally, the proposed framework aims at optimizing and
automating the management of Public Administration processes through the
analysis of these processes and the prediction of their executions. Business
process analysis and prediction are therefore the two central themes of the
business process management framework, which recognizes in these two phases
critical elements for improving the management of Public Administration
processes as well as the provision of services to citizens. The resulting
optimizations thus tend towards the general objective of achieving efficiency
and flexibility of the Public Administration processes. To this end, the
proposed framework includes two innovative components to support the analysis
and prediction phases: (i) visual analytics on business processes, which
focuses on the analysis of business processes (and their execution traces)
using multidimensional abstractions for the support of OLAP analysis on
business process schemes; (ii) execution prediction on business processes,
which focuses on the prediction of business process executions, to support
their optimization, through an innovative data-driven approach. In short, this
approach aims to predict the execution of Public Administration business
processes by analyzing the variations that previous executions of business
processes have produced on the data (focusing, therefore, on the nature of the
data distributions that characterize these variations). A software tool has
been implemented to allow the Public Administration to optimize the management
of internal processes, evaluate their effectiveness, and adopt the necessary
corrections in order to make the service offered to the community efficient and
transparent.
   Indeed, the level of citizen satisfaction is a yardstick for the Public
Administration with respect to public management. In this sense, the framework
aims to ensure significant changes, including:
    • improvement of administrative transparency (e.g., telematics desk for
      the citizen, and so forth);

    • certainty of compliance with procedures and regulations and the
      traceability of activities;

    • control and optimization of processes;

    • reduction in the time required for administrative procedures;

    • increase in “company productivity”;

    • global reduction of associated costs;

    • automation of the planned activities;

    • accountability and monitoring of the people involved.

   The innovative features introduced by the proposed framework are the
following.

Feature 1 – Innovative techniques and tools for OLAP analysis on business
process schemes: Although OLAP is a methodology applied to many data models
(such as graphs, sequences, text, etc.), in the literature, as well as in
industry, there are no proposals that offer “explicit” OLAP support on business
processes (for example: multidimensional browsing and exploration of aggregated
business process schemes, coverage of the most common OLAP operators and
operations, such as roll-up, drill-down, pivoting, and so forth), in spite of
the embryonic multidimensional analysis facilities made available by some tools
(e.g., ProM [vDdMV+05]).

Feature 2 – Visual analytics tools and techniques on BP that exploit
multidimensional abstractions: Even in this case, the visual analytics solution
proposed by the framework directly exploits the power of multidimensional
abstractions, for example thanks to multi-resolution analysis, which is both
powerful and very intuitive. It should be noted that, both in the literature
and in the field of industrial solutions, there are no approaches that propose
this vision of visual analytics on business processes.

Feature 3 – Data-driven process mining: From a purely scientific and industrial
point of view, the most valuable result that the framework introduces is
represented by the innovative data-driven process mining methodology. This
methodology is not only innovative in research (academic and industrial) but,
despite its complexity, it also captures real-world application scenarios of
business process management systems (which are, in turn, characterized by a
certain intrinsic complexity) in a very powerful and flexible manner. It thus
puts forward a sound methodology (based on multidimensional abstractions), as
opposed to other approaches in the state-of-the-art literature that address the
difficult problem of monitoring and optimizing business processes through
solution-driven approaches. The latter offer little flexibility and
extensibility, not only for application scenarios other than those for which
they were developed, but also for application scenarios whose execution
settings are not very different from those.

   Summarizing, the main scientific and technical research issues addressed by
the framework are the following:

    • definition of methodologies, models and tools for supporting
      multidimensional analysis of business process schemes;

    • effective and efficient representation of aggregated business process
      schemes in secondary storage;

    • definition of paradigms for the support of OLAP functionalities and
      extensions on aggregated business process schemes;

    • definition of methodologies, models and tools for supporting the
      multi-resolution OLAP analysis of business process schemes;

    • optimization techniques for OLAP roll-up and drill-down operators on
      aggregated business process schemes;

    • definition of appropriate multidimensional metaphors for the support of
      visual analytics for business processes using OLAP methodologies and
      paradigms;

    • efficient and scalable solutions for the support of visual analytics
      for business processes;

    • definition of the predictive analysis method of data-driven process
      mining;

    • cumulative similarity techniques between discrete data distributions;

    • techniques for optimizing procedures for processing and analyzing
      discrete distributions on big business process data.

4     Logical Architecture of the Proposed Framework
Figure 1 shows the logical architecture of the proposed framework for
supporting OLAP-based big data analytics over data-intensive business
processes.
   As shown in Figure 1, the proposed framework introduces the following
layers:
             Figure 1: Logical architecture

    • BPM Layer: it is the layer where the input business processes are
      located and exploited to populate the big data layer of the framework;

    • OLAP Aggregation Layer: it is the layer where business processes are
      aggregated into cubes in order to support OLAP-based big data
      analytics;

    • OLAP Analysis Layer: it is the layer where the OLAP querying, operators
      and operations over business processes are implemented;

    • Application Layer: it is the layer where the consumer applications are
      located, with visual analytics and prediction analytics being the main
      functionalities supported.

5     Conclusions and Future Work
This paper has focused on the problem of supporting big data analytics over
so-called data-intensive business processes, i.e., business processes connected
to big data sources. We explored issues, models and proposals in the field, and
finally provided the architecture of a real-life framework developed in the
context of a real-life project.
   Future work is mainly oriented to enriching the proposed framework via
innovative big data properties, such as: privacy preservation (e.g., [CB11,
CR09]), open big data predicates (e.g., [Kar17]), and consistency checking
(e.g., [KWR+15]).

Acknowledgments
This research has been developed in the context of the MISE Horizon 2020 – PON
2014/2020 project: “REMS.PA (Resource in Engineering Management for Software
process automation in Public Administration)”.

References
[ALRM17]     Saima Gulzar Ahmad, Chee Sun Liew, M. Mustafa Rafique, and Ehsan
             Ullah Munir. Optimization of data-intensive workflows in
             stream-based data processing models. The Journal of
             Supercomputing, 73(9):3901–3923, 2017.
[BCC+14]     Peter Braun, Juan J. Cameron, Alfredo Cuzzocrea, Fan Jiang, and
             Carson Kai-Sang Leung. Effectively and efficiently mining frequent
             patterns from dense graph streams on disk. In 18th International
             Conference in Knowledge Based and Intelligent Information and
             Engineering Systems, KES 2014, Gdynia, Poland, 15-17 September
             2014, pages 338–347, 2014.
[CB11]       Alfredo Cuzzocrea and Elisa Bertino. Privacy preserving OLAP over
             distributed XML data: A theoretically-sound
             secure-multiparty-computation approach. J. Comput. Syst. Sci.,
             77(6):965–987, 2011.
[CBS13]      Alfredo Cuzzocrea, Ladjel Bellatreche, and Il-Yeol Song. Data
             warehousing and OLAP over big data: current challenges and future
             research directions. In Proceedings of the sixteenth international
             workshop on Data warehousing and OLAP, DOLAP 2013, San Francisco,
             CA, USA, October 28, 2013, pages 67–70, 2013.
[CMF+16]     Alfredo Cuzzocrea, Carmen De Maio, Giuseppe Fenza, Vincenzo Loia,
             and Mimmo Parente. OLAP analysis of multidimensional tweet streams
             for supporting advanced analytics. In Proceedings of the 31st
             Annual ACM Symposium on Applied Computing, Pisa, Italy, April 4-8,
             2016, pages 992–999, 2016.
[CMX13]      Alfredo Cuzzocrea, Rim Moussa, and Guandong Xu. OLAP*: Effectively
             and efficiently supporting parallel OLAP
             over big data. In Model and Data Engineering - Third International
             Conference, MEDI 2013, Amantea, Italy, September 25-27, 2013.
             Proceedings, pages 38–49, 2013.
[CR09]       Alfredo Cuzzocrea and Vincenzo Russo. Privacy preserving OLAP and
             OLAP security. In Encyclopedia of Data Warehousing and Mining,
             Second Edition, pages 1575–1581. 2009.
[CS14]       Alfredo Cuzzocrea and Il-Yeol Song. Big graph analytics: The state
             of the art and future research agenda. In Proceedings of the 17th
             International Workshop on Data Warehousing and OLAP, DOLAP 2014,
             Shanghai, China, November 3-7, 2014, pages 99–101, 2014.
[CSD11]      Alfredo Cuzzocrea, Il-Yeol Song, and Karen C. Davis. Analytics
             over large-scale multidimensional data: the big data revolution!
             In DOLAP 2011, ACM 14th International Workshop on Data Warehousing
             and OLAP, Glasgow, United Kingdom, October 28, 2011, Proceedings,
             pages 101–104, 2011.
[CSU13]      Alfredo Cuzzocrea, Domenico Saccà, and Jeffrey D. Ullman. Big
             data: a research agenda. In 17th International Database
             Engineering & Applications Symposium, IDEAS ’13, Barcelona, Spain
             - October 09 - 11, 2013, pages 198–203, 2013.
[Cuz13]      Alfredo Cuzzocrea. Analytics over big data: Exploring the
             convergence of datawarehousing, OLAP and data-intensive cloud
             infrastructures. In 37th Annual IEEE Computer Software and
             Applications Conference, COMPSAC 2013, Kyoto, Japan, July 22-26,
             2013, pages 481–483, 2013.
[Cuz17]      Alfredo Cuzzocrea. Scalable OLAP-based big data analytics over
             cloud infrastructures: Models, issues, algorithms. In Proceedings
             of the 2017 International Conference on Cloud and Big Data
             Computing, ICCBDC 2017, London, United Kingdom, September 17 - 19,
             2017, pages 17–21, 2017.
[GK18]       Janis Grabis and Janis Kampars. Application of microservices for
             digital transformation of data-intensive business processes. In
             Proceedings of the 20th International Conference on Enterprise
             Information Systems, ICEIS 2018, Funchal, Madeira, Portugal, March
             21-24, 2018, Volume 2, pages 736–742, 2018.
[Kar17]      Holden Karau. Unifying the open big data world: The possibilities*
             of Apache BEAM. In 2017 IEEE International Conference on Big Data,
             BigData 2017, Boston, MA, USA, December 11-14, 2017, page 3981,
             2017.
[KWR+15]     Thanh Tran Thi Kim, Erhard Weiss, Christoph Ruhsam, Christoph
             Czepa, Huy Tran, and Uwe Zdun. Embracing process compliance and
             flexibility through behavioral consistency checking in ACM - A
             repair service management case. In Business Process Management
             Workshops - BPM 2015, 13th International Workshops, Innsbruck,
             Austria, August 31 - September 3, 2015, Revised Papers, pages
             43–54, 2015.
[LJYC15]     Kuan-Ching Li, Hai Jiang, Laurence T. Yang, and Alfredo Cuzzocrea,
             editors. Big Data - Algorithms, Analytics, and Applications.
             Chapman and Hall/CRC, 2015.
[MCB+11]     James Manyika, Michael Chui, Brad Brown, Jacques Bughin, Richard
             Dobbs, Charles Roxburgh, and Angela Hung Byers. Big data: The next
             frontier for innovation, competition, and productivity. Technical
             report, McKinsey Global Institute, 2011.
[Mur85]      Fionn Murtagh. Multidimensional clustering algorithms.
             Physica-Verlag, 1985.
[RR14]       Wullianallur Raghupathi and Viju Raghupathi. Big data analytics in
             healthcare: promise and potential. Health Inf. Sci. Syst.,
             2(1):3, 2014.
[Rus11]      Philip Russom. Big data analytics. Technical report, TDWI
             Research, Renton, WA, USA, 2011.
[She18]      Bin Shen. Universal knowledge discovery from big data using
             combined dual-cycle. Int. J. Machine Learning & Cybernetics,
             9(1):133–144, 2018.
[SMM17]      Vladislav A. Shchapov, Aleksei G. Ma-
             sich, and Grigorii F. Masich. The
             technology of processing intensive struc-
             tured dataflow on a supercomputer.
             Journal of Systems and Software,
             127:258–265, 2017.
[SYGZ18]     Dawei Sun, Hongbin Yan, Shang Gao,
             and Zhangbing Zhou. Performance eval-
             uation and analysis of multiple scenarios
             of big data stream computing on Storm
             platform. TIIS, 12(7):2977–2997, 2018.
[vDdMV+05]  Boudewijn F. van Dongen, Ana
            Karla A. de Medeiros, H. M. W. Ver-
            beek, A. J. M. M. Weijters, and Wil
            M. P. van der Aalst. The ProM frame-
            work: A new era in process mining tool
            support. In Applications and Theory
            of Petri Nets 2005, 26th International
            Conference, ICATPN 2005, Miami,
            USA, June 20-25, 2005, Proceedings,
            pages 444–454, 2005.
[WQL+18]     Xinyang Wang, Deyu Qi, Weiwei Lin,
             Mincong Yu, Zhishuo Zheng, Naqin
             Zhou, and Pengguang Chen. A gen-
             eral framework for big data knowledge
             discovery and integration. Concurrency
             and Computation: Practice and Experi-
             ence, 30(13), 2018.
[WXGM18]     Yulei Wu, Yang Xiang, Jingguo Ge, and
             Peter Mueller. High-performance com-
             puting for big data processing. Fu-
             ture Generation Comp. Syst., 88:693–
             695, 2018.
[YLHC14]     Chao-Tung Yang, Jung-Chun Liu,
             Ching-Hsien Hsu, and Wei-Li Chou. On
             improvement of cloud virtual machine
             availability with virtualization fault tol-
             erance mechanism. The Journal of Su-
             percomputing, 69(3):1103–1122, 2014.

[ZE11]       Paul Zikopoulos and Chris Eaton. Un-
             derstanding Big Data: Analytics for En-
             terprise Class Hadoop and Streaming
             Data. McGraw-Hill Osborne Media, 1st
             edition, 2011.