       Supporting OLAP-Based Big Data Analytics over
       Data-Intensive Business Processes: Issues, Models,
             Proposals, and a Real-Life Framework

                                                   Alfredo Cuzzocrea
                                          University of Trieste and ICAR-CNR
                                                      Trieste, 34127

                          Abstract

     This paper focuses on the problem of supporting big data analytics over so-called data-intensive business processes, i.e., business processes connected to big data sources. This application setting is of growing interest to the community, also due to emerging computational paradigms like Cloud Computing. The paper explores issues, models and proposals in the field, and finally provides the architecture of a real-life framework that supports big data analytics over data-intensive business processes via successful OLAP metaphors.

1    Introduction

Nowadays, the problem of supporting big data analytics (e.g., [CSD11, Cuz13, CS14, Rus11, RR14]) over so-called data-intensive business processes (e.g., [ALRM17, SMM17, GK18]) plays a relevant role. This is because, on the one hand, business processes still hold most of the data, information and knowledge of very large enterprises and organizations and, on the other hand, they fit well with the emerging characteristics of big data (e.g., [CSU13, CBS13, LJYC15, ZE11, MCB+11]).

   An important solution for supporting big data analytics consists in applying successful multidimensional metaphors and abstractions, mainly falling in the well-known OLAP context, thus originating an evolving trend that can be safely recognized within the term “OLAP-based big data analytics” (e.g., [Cuz17, CMF+16]).

   Inspired by this research context, in this paper we focus on the problem of supporting OLAP-based big data analytics over data-intensive business processes, and we describe a real-life framework developed in the context of a real-life project, called REMS.PA. The framework is mainly designed on top of open-source technologies and focuses, in particular, on business processes of the Public Administration.

   The remaining part of this paper is organized as follows. In Section 2, we report on the main research issues of supporting OLAP-based big data analytics over data-intensive business processes. In Section 3, we describe the proposed framework. Finally, in Section 4, we provide conclusions and future work for our research.

2    OLAP-Based Big Data Analytics over Data-Intensive Business Processes: Emerging Research Issues

OLAP-based big data analytics over data-intensive business processes opens the door to several emerging research issues, among which some noticeable ones are the following:

    • computing multidimensional OLAP aggregations over data-intensive business processes;

    • supporting OLAP querying, operators and operations over so-computed OLAP cubes;

    • effective and efficient in-memory representation of business process cubes;
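
The first two issues listed above can be illustrated with a minimal sketch. It assumes a hypothetical flat log of business process executions described by three dimensions (process, department, month) and one measure (duration in days). All names (`Execution`, `LOG`, `build_cube`, `roll_up`, `slice_cube`, `range_query`) are illustrative assumptions and are not part of the framework described in this paper. The sketch uses plain group-by aggregation and deliberately ignores the graph-like structure of business processes, so it is not the adapted aggregation algorithm that the text calls for.

```python
from collections import defaultdict
from typing import NamedTuple


class Execution(NamedTuple):
    """One (hypothetical) completed business process execution."""
    process: str
    department: str
    month: int
    duration_days: float


# Toy data, invented for illustration only.
LOG = [
    Execution("permit", "urban", 1, 10.0),
    Execution("permit", "urban", 1, 14.0),
    Execution("permit", "urban", 2, 12.0),
    Execution("permit", "tax", 2, 8.0),
    Execution("refund", "tax", 1, 5.0),
    Execution("refund", "tax", 3, 7.0),
]

DIMENSIONS = ("process", "department", "month")


def build_cube(log, dims):
    """Aggregate (execution count, total duration) for every cell of the cube."""
    cells = defaultdict(lambda: [0, 0.0])
    for execution in log:
        key = tuple(getattr(execution, d) for d in dims)
        cells[key][0] += 1
        cells[key][1] += execution.duration_days
    return {key: (count, total) for key, (count, total) in cells.items()}


def roll_up(cube, dims, keep):
    """Roll up to the dimensions in `keep` by summing over the dropped ones."""
    positions = [dims.index(d) for d in keep]
    cells = defaultdict(lambda: [0, 0.0])
    for key, (count, total) in cube.items():
        coarse_key = tuple(key[p] for p in positions)
        cells[coarse_key][0] += count
        cells[coarse_key][1] += total
    return {key: (count, total) for key, (count, total) in cells.items()}


def slice_cube(cube, dims, dim, value):
    """Slice: keep only the cells whose `dim` coordinate equals `value`."""
    position = dims.index(dim)
    return {key: cell for key, cell in cube.items() if key[position] == value}


def range_query(cube, dims, dim, low, high):
    """Range query: aggregate all cells with low <= `dim` coordinate <= high."""
    position = dims.index(dim)
    selected = [cell for key, cell in cube.items() if low <= key[position] <= high]
    return sum(c for c, _ in selected), sum(t for _, t in selected)


cube = build_cube(LOG, DIMENSIONS)
by_process = roll_up(cube, DIMENSIONS, ("process",))
tax_only = slice_cube(cube, DIMENSIONS, "department", "tax")
first_two_months = range_query(cube, DIMENSIONS, "month", 1, 2)
```

In this sketch, drill-down is simply the inverse navigation step: moving from `by_process` back to the finer-grained `cube`. Both sums and counts are kept per cell so that averages can still be derived after a roll-up.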
   How to aggregate a collection of data-intensive business processes? This is a relevant question that has attracted the attention of several studies. Basically, classical OLAP aggregation algorithms cannot be applied as they are, but suitable adaptations must be devised. One possibility is to exploit the graph-like nature of business processes. In doing so, the scalability property, which is relevant for big data management and processing (e.g., [WXGM18, SYGZ18, YLHC14, CMX13]), must be taken into account.

   After computing aggregations, support for OLAP querying, operators and operations must be ensured. Among queries, range queries are very significant in this context. In addition, supporting roll-up and drill-down operators is, for instance, a first-class problem in this respect. At the same time, slice and dice operations are significant in order to provide comprehensive support for ad-hoc big data analytics procedures.

   Effectively and efficiently supporting the in-memory representation of business process cubes poses several challenges. Indeed, so-computed OLAP cubes can reach very large sizes when stored in suitable Cloud storage systems. Therefore, specialized approaches must be devised in order to tame such enormous sizes. Partition-based approaches seem a promising trend to this end.

   Finally, another critical problem is represented by the issue of supporting flexible big data prediction methodologies over target OLAP cubes, as the final goal is that of discovering useful knowledge from data-intensive business processes (e.g., [BCC+14, WQL+18, She18]). Again, multidimensional paradigms, such as multidimensional clustering (e.g., [Mur85]), can be successfully applied to this end.

3    An Innovative Framework for Supporting OLAP-Based Big Data Analytics over Data-Intensive Business Processes

The proposed framework aims at supporting OLAP-based big data analytics over data-intensive business processes. It combines two main assets, analysis and prediction of business processes, with a focus on business processes in the Public Administration, and intends to define a framework for the automated management and optimization of such processes. From a strictly technological point of view, the fundamental components of the framework are the following:

    • tools to support multidimensional analysis of business process schemes using the OLAP paradigm;

    • visual analytics tools for business processes based on multidimensional abstractions;

    • tools to support the prediction of executions of business processes based on a data-driven approach.

   The framework has been realized by using and integrating open-source software technologies for the support of business process management, with the aim of speeding up and simplifying the management of the operational workflows of the Public Administration, by defining and building the management processes in a rigorous and reliable way, and finally monitoring the real status of their execution. More generally, the proposed framework aims at optimizing and automating the management of Public Administration processes through their analysis and the prediction of their executions. Business process analysis and prediction are therefore the two central themes of the business process management framework, which recognizes in these two phases critical elements for improving the management of Public Administration processes as well as the provision of services to the citizen. The resulting optimizations therefore tend towards the general objective of achieving efficiency and flexibility of the Public Administration processes. To this end, the proposed framework includes two innovative components to support the analysis and prediction phases: (i) visual analytics on business processes, which focuses on the analysis of business processes (and their execution traces) using multidimensional abstractions for the support of OLAP analysis on business process schemes; (ii) execution prediction on business processes, which focuses on the prediction of business process executions, to support their optimization, through an innovative data-driven approach. In short, this approach aims to predict the execution of Public Administration business processes by analyzing the variations that previous business process executions have produced on the data (focusing, therefore, on the nature of the data distributions that characterize these variations). A software tool has been implemented to allow the Public Administration to optimize the management of internal processes, evaluate their effectiveness, and adopt the necessary corrections in order to make the service offered to the community efficient and transparent.

   Indeed, the level of citizen satisfaction is a yardstick for the Public Administration with respect to public management. In this sense, the framework aims to ensure significant changes, including:
  • improvement of administrative transparency (e.g., telematics desk for the citizen, and so forth);

  • certainty of compliance with procedures and regulations and the traceability of activities;

  • control and optimization of processes;

  • reduction in the time required for administrative procedures;

  • increase in “company productivity”;

  • global reduction of associated costs;

  • automation of the planned activities;

  • accountability and monitoring of the people involved.

   The innovative features introduced by the proposed framework are the following.

Feature 1 – Innovative techniques and tools for OLAP analysis on business process schemes: Although OLAP is a methodology applied to many data models (such as graphs, sequences, text, etc.), in the literature, as well as in industry, there are no proposals that offer “explicit” OLAP support on business processes (for example: multidimensional browsing and exploration of aggregated business process schemes, coverage of the most common OLAP operators and operations such as roll-up, drill-down and pivoting, and so forth), in spite of the embryonic multidimensional analysis capabilities made available by some tools (e.g., ProM [vDdMV+05]).

Feature 2 – Visual analytics tools and techniques on BP that exploit multidimensional abstractions: In this case as well, the visual analytics solution proposed by the framework directly exploits the power of multidimensional abstractions, for example thanks to multi-resolution analysis, which is both powerful and very intuitive. It should be noted that, both in the literature and in the field of industrial solutions, there are no approaches that propose this vision of visual analytics on business processes.

Feature 3 – Data-driven process mining: From a purely scientific and industrial point of view, the most valuable result that the framework introduces is the innovative data-driven process mining methodology. This methodology is not only innovative in research (academic and industrial) but, despite its complexity, it also effectively captures real-world application scenarios of business process management systems (which, in turn, are characterized by a certain intrinsic complexity) in a very powerful and flexible manner. It thus imposes a sound methodology (based on multidimensional abstractions) as opposed to other approaches known in the state-of-the-art literature, which solve the difficult problem of monitoring and optimizing business processes through solution-driven approaches (which introduce little flexibility and extensibility not only for application scenarios other than those for which they have been developed, but also for application scenarios characterized by execution settings that are not very different from the latter).

   Summarizing, the main scientific and technical research issues addressed by the framework are the following:

    • definition of methodologies, models and tools for supporting multidimensional analysis of business process schemes;

    • effective and efficient representation of aggregated business process schemes in secondary storage;

    • definition of paradigms for the support of OLAP functionalities and extensions on aggregated business process schemes;

    • definition of methodologies, models and tools for supporting the multi-resolution OLAP analysis of business process schemes;

    • optimization techniques for OLAP roll-up and drill-down operators on aggregated business process schemes;

    • definition of appropriate multidimensional metaphors for the support of visual analytics for business processes using OLAP methodologies and

    • efficient and scalable solutions for the support of visual analytics for business processes;

    • definition of the predictive analysis method of data-driven process mining;

    • cumulative similarity techniques between discrete data distributions;

    • techniques for optimizing procedures for processing and analyzing discrete distributions on big business process data.

4     Logical Architecture of the Proposed

Figure 1 shows the logical architecture of the proposed framework for supporting OLAP-based big data analytics over data-intensive business processes.

   As shown in Figure 1, the proposed framework introduces the following layers:
