<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Supporting OLAP-Based Big Data Analytics over Data-Intensive Business Processes: Issues, Models, Proposals, and a Real-Life Framework</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Alfredo Cuzzocrea</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Trieste and ICAR-CNR</institution>
          ,
          <addr-line>Trieste, 34127</addr-line>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2018</year>
      </pub-date>
      <volume>2</volume>
      <fpage>21</fpage>
      <lpage>24</lpage>
      <abstract>
        <p>This paper focuses on the problem of supporting big data analytics over so-called data-intensive business processes, i.e., business processes connected to big data sources. This application setting is attracting growing interest in the community, partly due to emerging computational paradigms such as Cloud Computing. The paper explores issues, models and proposals in the field, and finally provides the architecture of a real-life framework that supports big data analytics over data-intensive business processes via successful OLAP metaphors.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>Nowadays, the problem of supporting big data
analytics (e.g., [CSD11, Cuz13, CS14, Rus11, RR14])
over so-called data-intensive business processes (e.g.,
[ALRM17, SMM17, GK18]) plays a relevant role. This is
because, on the one hand, business processes still hold
most of the data, information and knowledge of very
large enterprises and organizations and, on the other
hand, they fit well with the emerging
characteristics of big data (e.g., [CSU13, CBS13, LJYC15, ZE11,
MCB+11]).</p>
      <p>An important solution for supporting big data
analytics consists in applying successful
multidimensional metaphors and abstractions, mainly falling in
the well-known OLAP context, thus originating an
evolving trend that can be safely recognized under the
term "OLAP-based big data analytics" (e.g., [Cuz17,
CMF+16]).</p>
      <p>Copyright © CIKM 2018 for the individual papers by the papers'
authors. Copyright © CIKM 2018 for the volume as a collection
by its editors. This volume and its papers are published under
the Creative Commons License Attribution 4.0 International (CC
BY 4.0).</p>
      <p>Inspired by this research context, in this paper
we focus on the problem of
supporting OLAP-based big data analytics over data-intensive
business processes, and we describe a real-life
framework developed in the context of a real-life
project called REMS.PA. The framework is mainly designed on top of
open-source technologies and focuses, in particular,
on business processes of the Public
Administration.</p>
      <p>The remaining part of this paper is organized as
follows. In Section 2, we report on the main research issues of
supporting OLAP-based big data analytics over
data-intensive business processes. In Section 3, we describe
the proposed framework and, in Section 4, its logical architecture.
Finally, in Section 5, we
provide conclusions and future work for our research.</p>
      <p>2</p>
      <p>OLAP-based big data analytics over data-intensive
business processes opens the door to several
emerging research issues, among which some noticeable ones
are the following:</p>
      <list list-type="bullet">
        <list-item><p>computing multidimensional OLAP aggregations over data-intensive business processes;</p></list-item>
        <list-item><p>supporting OLAP querying, operators and operations over so-computed OLAP cubes;</p></list-item>
        <list-item><p>effective and efficient in-memory representation of business process cubes;</p></list-item>
        <list-item><p>supporting flexible big data prediction methodologies over so-computed OLAP cubes.</p></list-item>
      </list>
      <p>How can a collection of data-intensive
business processes be aggregated? This relevant question
has attracted the attention of several studies.
Basically, classical OLAP aggregation algorithms
cannot be applied as they are, and suitable adaptations
must be devised. One possibility consists in
exploiting the graph-like nature of business processes.
In doing so, scalability, which
is relevant for big data management and processing
(e.g., [WXGM18, SYGZ18, YLHC14, CMX13]), must
be taken into account.</p>
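      <p>As an illustration only, the following minimal sketch shows one way in which the graph-like nature of business processes could be exploited during aggregation. It assumes that each process execution is available as a trace tagged with two dimensions, and it merges the traces of each cube cell into a single directly-follows graph. The record layout, the dimension names and the function <monospace>build_process_cube</monospace> are hypothetical and are not taken from the framework described in this paper.</p>
      <preformat><![CDATA[
```python
from collections import Counter, defaultdict

# Hypothetical trace records: (department, year, ordered activities).
TRACES = [
    ("permits", 2017, ["submit", "review", "approve"]),
    ("permits", 2017, ["submit", "review", "reject"]),
    ("permits", 2018, ["submit", "approve"]),
    ("taxes", 2018, ["submit", "review", "approve"]),
]


def build_process_cube(traces):
    """Group traces by (department, year) and merge each group into one graph.

    Every cell stores the number of traces it covers and a Counter that maps
    a directly-follows edge (a, b) to how often b immediately followed a.
    """
    cube = defaultdict(lambda: {"traces": 0, "edges": Counter()})
    for department, year, activities in traces:
        cell = cube[(department, year)]
        cell["traces"] += 1
        # zip pairs each activity with its successor in the same trace.
        cell["edges"].update(zip(activities, activities[1:]))
    return dict(cube)


cube = build_process_cube(TRACES)
```
]]></preformat>
      <p>The sketch makes a single pass over the traces, and edge counters of different cells can be added together, so cells computed on separate data partitions could be merged afterwards. This is one possible way to address the scalability concern, not necessarily the one adopted by the framework.</p>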
      <p>After computing aggregations, support for
OLAP querying, operators and operations must be
ensured. Among queries, range queries are very
significant in this context. In addition, supporting roll-up
and drill-down operators is, for instance, a first-class
problem in this respect. At the same time, slice and dice
operations are significant in order to provide
comprehensive support for ad-hoc big data analytics
procedures.</p>
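      <p>The operators mentioned above can be sketched on a toy cube as follows. This is a simplified illustration under stated assumptions: the cube is a plain dictionary from coordinates to a single additive measure, and the dimension names, the sample values and the helper functions are all hypothetical rather than part of the framework.</p>
      <preformat><![CDATA[
```python
from collections import defaultdict

# Hypothetical base cube: (department, year, month) -> number of executions.
BASE = {
    ("permits", 2017, 11): 40,
    ("permits", 2017, 12): 25,
    ("permits", 2018, 1): 30,
    ("taxes", 2017, 12): 10,
    ("taxes", 2018, 1): 55,
}
DIMS = ("department", "year", "month")


def roll_up(cube, dims, keep):
    """Aggregate away every dimension that is not listed in `keep`."""
    idx = [dims.index(d) for d in keep]
    out = defaultdict(int)
    for key, value in cube.items():
        out[tuple(key[i] for i in idx)] += value
    return dict(out)


def slice_(cube, dims, dim, value):
    """Fix one dimension to a single value and drop it from the key."""
    i = dims.index(dim)
    return {k[:i] + k[i + 1:]: v for k, v in cube.items() if k[i] == value}


def dice(cube, dims, allowed):
    """Keep cells whose coordinates fall in the allowed set of each dimension."""
    checks = [(dims.index(d), set(vals)) for d, vals in allowed.items()]
    return {k: v for k, v in cube.items() if all(k[i] in s for i, s in checks)}


def range_sum(cube, dims, dim, low, high):
    """Sum the measure over cells with low <= coordinate <= high."""
    i = dims.index(dim)
    return sum(v for k, v in cube.items() if low <= k[i] <= high)


by_year = roll_up(BASE, DIMS, ("department", "year"))
```
]]></preformat>
      <p>In this sketch, drill-down corresponds to calling <monospace>roll_up</monospace> again with a finer list of dimensions, which is only possible because the base cells are retained.</p>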
      <p>Effectively and efficiently supporting the in-memory
representation of business process cubes poses
several challenges. Indeed, so-computed
OLAP cubes can reach very large sizes when stored
in suitable Cloud storage systems. Therefore,
specialized approaches must be devised in order to tame such
enormous sizes. Partition-based approaches seem a
promising trend to this end.</p>
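      <p>A partition-based approach could look roughly like the sketch below, in which cube cells are split by one coordinate, each partition is aggregated on its own, and the partial results are merged. The partitioning rule (year modulo the number of partitions), the cell layout and the function names are assumptions made for illustration and do not describe the framework's actual storage scheme.</p>
      <preformat><![CDATA[
```python
from collections import Counter

# Hypothetical cube cells: (department, year) -> number of executions.
CELLS = {
    ("permits", 2017): 65,
    ("permits", 2018): 30,
    ("taxes", 2017): 10,
    ("taxes", 2018): 55,
    ("health", 2018): 20,
}


def partition_by_year(cells, num_partitions):
    """Assign each cell to a partition using its year coordinate."""
    parts = [dict() for _ in range(num_partitions)]
    for (department, year), value in cells.items():
        parts[year % num_partitions][(department, year)] = value
    return parts


def partial_totals(partition):
    """Aggregate one partition on its own: executions per department."""
    totals = Counter()
    for (department, _year), value in partition.items():
        totals[department] += value
    return totals


def merge(partials):
    """Combine per-partition results; addition makes the merge order-free."""
    merged = Counter()
    for p in partials:
        merged.update(p)
    return merged


parts = partition_by_year(CELLS, 2)
totals = merge(partial_totals(p) for p in parts)
```
]]></preformat>
      <p>Because each partition is aggregated independently, only one partition needs to reside in memory at a time, while the others can stay in Cloud storage.</p>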
      <p>Finally, another critical problem is represented by
the issue of supporting flexible big data prediction
methodologies over target OLAP cubes, as the final
goal is that of discovering useful knowledge from
data-intensive business processes (e.g., [BCC+14, WQL+18,
She18]). Again, multidimensional paradigms, such
as multidimensional clustering (e.g., [Mur85]), can be
successfully applied to this end.</p>
      <p>3 An Innovative Framework for
Supporting OLAP-Based Big Data
Analytics over Data-Intensive Business
Processes</p>
      <p>
The proposed framework aims at supporting
OLAP-based big data analytics over data-intensive business
processes. It combines two main assets, namely the analysis and
the prediction of business processes, with a focus on the case
of business processes in the Public Administration,
and intends to provide a framework for
the automated management and optimization of
business processes in the Public Administration. From a
strictly technological point of view, the fundamental
components of the framework are the following:
        <list list-type="bullet">
          <list-item><p>tools to support multidimensional analysis of business process schemes using the OLAP paradigm;</p></list-item>
          <list-item><p>visual analytics tools for business processes based on multidimensional abstractions;</p></list-item>
          <list-item><p>tools to support the prediction of executions of business processes based on a data-driven approach.</p></list-item>
        </list></p>
      <p>The framework has been realized by using and
integrating open-source software technologies for
business process management, with the aim of
speeding up and simplifying the management of the
operational workflows of the Public Administration
by defining and building the management processes
in a rigorous and reliable way, and finally monitoring the
real status of their execution. More generally, the
proposed framework aims at optimizing and automating
the management of Public Administration processes
through the analysis and prediction of their
executions. Business process analysis and prediction are
therefore the two central themes of the
framework, which recognizes in these two phases critical elements for
improving both the management of Public
Administration processes and the provision of services
to citizens. The resulting optimizations
target the general objective of achieving efficiency
and flexibility of Public Administration
processes. To this end, the proposed framework includes
two innovative components to support the analysis and
prediction phases: (i) visual analytics on business
processes, which focuses on the analysis of business
processes (and their execution traces) using
multidimensional abstractions for the support of OLAP analysis
on business process schemes; (ii) execution prediction
on business processes, which focuses on the prediction
of business process executions, to support their
optimization, through an innovative data-driven approach.
In short, this approach aims to predict the execution of
Public Administration business processes by
analyzing the variations that previous executions of business processes
have produced on the data
(focusing, therefore, on the nature of the
data distributions that characterize these variations).
A software tool has been implemented to allow the
Public Administration to optimize the management
of internal processes, evaluate their effectiveness, and
adopt the necessary corrections in order to make the
service offered to the community efficient and
transparent.</p>
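      <p>The data-driven idea of comparing the data distributions produced by past executions can be illustrated with a small sketch. The paper does not specify the similarity measure, so the code below assumes one plausible choice: discrete distributions over the same ordered bins are compared through their cumulative curves, and the outcome of the most similar historical execution is returned as the prediction. The histograms, the outcome labels and the function names are hypothetical, and every histogram is assumed to be non-empty.</p>
      <preformat><![CDATA[
```python
from itertools import accumulate


def cdf(counts):
    """Turn per-bin counts into a cumulative distribution over the bins."""
    total = sum(counts)
    return [c / total for c in accumulate(counts)]


def cumulative_similarity(counts_a, counts_b):
    """Return 1 minus the largest gap between the two cumulative curves."""
    gaps = [abs(a - b) for a, b in zip(cdf(counts_a), cdf(counts_b))]
    return 1.0 - max(gaps)


def predict(history, observed):
    """Return the label of the historical distribution closest to `observed`."""
    best = max(history, key=lambda item: cumulative_similarity(item[0], observed))
    return best[1]


# Hypothetical histograms of how much a data field changed per execution,
# each paired with the outcome that execution eventually had.
HISTORY = [
    ([8, 1, 1, 0], "on_time"),
    ([1, 2, 3, 4], "delayed"),
]
```
]]></preformat>
      <p>For instance, <monospace>predict(HISTORY, [7, 2, 1, 0])</monospace> selects the first historical entry, since its cumulative curve stays closest to that of the observed histogram.</p>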
      <p>Indeed, the level of citizen satisfaction is a yardstick
for the Public Administration with respect to public
management. In this sense, the framework aims to
ensure significant changes, including:</p>
      <list list-type="bullet">
        <list-item><p>improvement of administrative transparency (e.g., telematics desk for the citizen, and so forth);</p></list-item>
        <list-item><p>certainty of compliance with procedures and regulations and the traceability of activities;</p></list-item>
        <list-item><p>control and optimization of processes;</p></list-item>
        <list-item><p>reduction in the time required for administrative procedures;</p></list-item>
        <list-item><p>increase in "company productivity";</p></list-item>
        <list-item><p>global reduction of associated costs;</p></list-item>
        <list-item><p>automation of the planned activities;</p></list-item>
        <list-item><p>accountability and monitoring of the people involved.</p></list-item>
      </list>
      <p>The innovative features introduced by the proposed
framework are the following.</p>
      <p>Feature 1 - Innovative techniques and tools for
OLAP analysis on business process schemes:
Although OLAP is a methodology applied to many
data models (such as graphs, sequences, text, etc.), in
the literature, as well as in industry, there are no proposals
that offer "explicit" OLAP support on business
processes (for example, multidimensional browsing and
exploration of aggregated business process schemes, or
coverage of the most common OLAP operators and
operations, such as roll-up, drill-down and pivoting),
in spite of the embryonic
multidimensional analysis capabilities made available by some tools
(e.g., ProM [vDdMV+05]).</p>
      <p>Feature 2 - Visual analytics tools and
techniques on business processes that exploit multidimensional
abstractions: In this case too, the visual analytics
solution proposed by the framework directly exploits the
power of multidimensional abstractions, for example
thanks to multi-resolution analysis, which is both
powerful and very intuitive. It should be noted that,
both in the literature and in the field of industrial
solutions, there are no approaches that propose this vision
of visual analytics on business processes.</p>
      <p>Feature 3 - Data-driven process mining: From a
purely scientific and industrial point of view, the most
valuable result that the framework introduces is
the innovative data-driven process mining
methodology. This methodology is not only innovative
in research (academic and industrial) but, despite its
complexity, it also effectively captures real-world
application scenarios of business process management systems
(which, in turn, are characterized by a certain intrinsic
complexity) in a very powerful and flexible manner.
It thus provides a sound methodology (based on
multidimensional abstractions), as opposed to other
approaches known in the state-of-the-art literature that
solve the difficult problem of monitoring and
optimizing business processes through solution-driven
approaches. The latter offer little flexibility and
extensibility, not only for application scenarios other than
those for which they have been developed, but also for
application scenarios whose execution
settings differ only slightly from the original ones.</p>
      <p>Summarizing, the main scientific and technical
research issues addressed by the framework are the
following:</p>
      <list list-type="bullet">
        <list-item><p>definition of methodologies, models and tools for supporting multidimensional analysis of business process schemes;</p></list-item>
        <list-item><p>effective and efficient representation of aggregated business process schemes in secondary storage;</p></list-item>
        <list-item><p>definition of paradigms for the support of OLAP functionalities and extensions on aggregated business process schemes;</p></list-item>
        <list-item><p>definition of methodologies, models and tools for supporting the multi-resolution OLAP analysis of business process schemes;</p></list-item>
        <list-item><p>optimization techniques for OLAP roll-up and drill-down operators on aggregated business process schemes;</p></list-item>
        <list-item><p>definition of appropriate multidimensional metaphors for the support of visual analytics for business processes using OLAP methodologies and paradigms;</p></list-item>
        <list-item><p>efficient and scalable solutions for the support of visual analytics for business processes;</p></list-item>
        <list-item><p>definition of the predictive analysis method of data-driven process mining;</p></list-item>
        <list-item><p>cumulative similarity techniques between discrete data distributions;</p></list-item>
        <list-item><p>techniques for optimizing procedures for processing and analyzing discrete distributions on big business process data.</p></list-item>
      </list>
      <p>4 Logical Architecture of the Proposed
Framework</p>
      <p>
Figure 1 shows the logical architecture of the proposed
framework for supporting OLAP-based big data
analytics over data-intensive business processes.</p>
      <p>As shown in Figure 1, the proposed framework
introduces the following layers:</p>
      <list list-type="bullet">
        <list-item><p>BPM Layer: it is the layer where the input business processes are located and exploited to populate the big data layer of the framework;</p></list-item>
        <list-item><p>OLAP Aggregation Layer: it is the layer where business processes are aggregated into cubes in order to support OLAP-based big data analytics;</p></list-item>
        <list-item><p>OLAP Analysis Layer: it is the layer where the OLAP querying, operators and operations over business processes are implemented;</p></list-item>
        <list-item><p>Application Layer: it is the layer where the consumer applications are located, with visual analytics and prediction analytics as the main supported functionalities.</p></list-item>
      </list>
    </sec>
    <sec id="sec-2">
      <title>5 Conclusions and Future Work</title>
      <p>This paper has focused on the problem
of supporting big data analytics over so-called
data-intensive business processes, i.e., business processes
connected to big data sources. We explored issues,
models and proposals in the field, and finally provided the
architecture of a real-life framework developed in the
context of a real-life project.</p>
      <p>Future work is mainly oriented towards enriching the
proposed framework with innovative big data properties,
such as privacy preservation (e.g., [CB11, CR09]),
open big data predicates (e.g., [Kar17]), and
consistency checking (e.g., [KWR+15]).</p>
    </sec>
    <sec id="sec-3">
      <title>Acknowledgments</title>
      <p>This research has been developed in the context of
the MISE Horizon 2020 - PON 2014/2020 project
"REMS.PA (Resource in Engineering Management for
Software process automation in Public
Administration)".</p>
      <sec id="sec-3-1">
        <title>[ALRM17]</title>
        <p>Saima Gulzar Ahmad, Chee Sun Liew, M. Mustafa Rafique, and Ehsan Ullah
Munir. Optimization of data-intensive
workflows in stream-based data
processing models. The Journal of
Supercomputing, 73(9):3901–3923, 2017.</p>
      </sec>
      <sec id="sec-3-2">
        <title>[Cuz13]</title>
        <p>Alfredo Cuzzocrea. Analytics over big data: Exploring the convergence of data warehousing, OLAP and data-intensive cloud infrastructures. In 37th
Annual IEEE Computer Software and
Applications Conference, COMPSAC
2013, Kyoto, Japan, July 22-26, 2013,
pages 481–483, 2013.</p>
      </sec>
      <sec id="sec-3-3">
        <title>[Cuz17]</title>
        <p>Alfredo Cuzzocrea. Scalable OLAP-based big data analytics over cloud infrastructures: Models, issues, algorithms.
In Proceedings of the 2017
International Conference on Cloud and Big
Data Computing, ICCBDC 2017,
London, United Kingdom, September 17-19,
2017, pages 17–21, 2017.</p>
      </sec>
      <sec id="sec-3-4">
        <title>[Kar17]</title>
        <p>Holden Karau. Unifying the open big data world: The possibilities of Apache
BEAM. In 2017 IEEE International
Conference on Big Data, BigData 2017,
Boston, MA, USA, December 11-14,
2017, page 3981, 2017.</p>
      </sec>
      <sec id="sec-3-5">
        <title>[KWR+15]</title>
        <p>Thanh Tran Thi Kim, Erhard Weiss,
Christoph Ruhsam, Christoph Czepa,
Huy Tran, and Uwe Zdun.
Embracing process compliance and flexibility
through behavioral consistency
checking in ACM - A repair service
management case. In Business Process
Management Workshops - BPM 2015, 13th
International Workshops, Innsbruck,
Austria, August 31 - September 3, 2015,
Revised Papers, pages 43–54, 2015.</p>
      </sec>
      <sec id="sec-3-6">
        <title>[LJYC15]</title>
        <p>Kuan-Ching Li, Hai Jiang, Laurence T. Yang, and Alfredo Cuzzocrea, editors.
Big Data - Algorithms, Analytics, and
Applications. Chapman and Hall/CRC,
2015.</p>
      </sec>
      <sec id="sec-3-7">
        <title>[MCB+11]</title>
        <p>James Manyika, Michael Chui, Brad
Brown, Jacques Bughin, Richard
Dobbs, Charles Roxburgh, and
Angela Hung Byers. Big data: The next
frontier for innovation, competition,
and productivity. Technical report,
McKinsey Global Institute, 2011.</p>
      </sec>
      <sec id="sec-3-8">
        <title>[Mur85]</title>
        <p>Fionn Murtagh. Multidimensional clustering algorithms. Physica-Verlag, 1985.</p>
      </sec>
      <sec id="sec-3-9">
        <title>[RR14]</title>
        <p>Wullianallur Raghupathi and Viju Raghupathi. Big data analytics in healthcare: promise and potential.
Health Inf. Sci. Syst., 2(1):3, 2014.</p>
      </sec>
      <sec id="sec-3-10">
        <title>[Rus11]</title>
        <p>Philip Russom. Big data analytics. Technical report, TDWI Research, Renton, WA, USA, 2011.</p>
      </sec>
      <sec id="sec-3-11">
        <title>[She18]</title>
        <p>Bin Shen. Universal knowledge discovery
from big data using combined
dual-cycle. Int. J. Machine Learning &amp;
Cybernetics, 9(1):133–144, 2018.</p>
      </sec>
      <sec id="sec-3-12">
        <title>[WQL+18]</title>
        <p>Xinyang Wang, Deyu Qi, Weiwei Lin,
Mincong Yu, Zhishuo Zheng, Naqin
Zhou, and Pengguang Chen. A
general framework for big data knowledge
discovery and integration. Concurrency
and Computation: Practice and
Experience, 30(13), 2018.</p>
      </sec>
      <sec id="sec-3-13">
        <title>[WXGM18]</title>
        <p>Yulei Wu, Yang Xiang, Jingguo Ge, and
Peter Mueller. High-performance
computing for big data processing.
Future Generation Comp. Syst., 88:693–695,
2018.</p>
      </sec>
      <sec id="sec-3-14">
        <title>[YLHC14]</title>
        <p>Chao-Tung Yang, Jung-Chun Liu,
Ching-Hsien Hsu, and Wei-Li Chou. On
improvement of cloud virtual machine
availability with virtualization fault
tolerance mechanism. The Journal of
Supercomputing, 69(3):1103–1122, 2014.</p>
      </sec>
      <sec id="sec-3-15">
        <title>[ZE11]</title>
        <p>Paul Zikopoulos and Chris Eaton. Understanding
Big Data: Analytics for
Enterprise Class Hadoop and Streaming
Data. McGraw-Hill Osborne Media, 1st
edition, 2011.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>[BCC+14] Peter Braun, Juan J. Cameron, Alfredo Cuzzocrea, Fan Jiang, and Carson Kai-Sang Leung. <article-title>Effectively and efficiently mining frequent patterns from dense graph streams on disk</article-title>. <source>In 18th International Conference in Knowledge Based and Intelligent Information and Engineering Systems</source>, KES 2014, Gdynia, Poland, 15-17 September 2014, pages <fpage>338</fpage>–<lpage>347</lpage>, <year>2014</year>.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>[CB11] <article-title>Privacy preserving OLAP over distributed XML data: A theoretically-sound secure-multiparty-computation approach</article-title>. <source>J. Comput. Syst. Sci.</source>, <volume>77</volume>(<issue>6</issue>):<fpage>965</fpage>–<lpage>987</lpage>, <year>2011</year>.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation><source>In Proceedings of the Sixteenth International Workshop on Data Warehousing and OLAP</source>, DOLAP 2013, San Francisco, CA, USA, October 28, 2013, pages <fpage>67</fpage>–<lpage>70</lpage>, <year>2013</year>.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>[CMF+16] Alfredo Cuzzocrea, Carmen De Maio, Giuseppe Fenza, Vincenzo Loia, and Mimmo Parente. <article-title>OLAP analysis of multidimensional tweet streams for supporting advanced analytics</article-title>. <source>In Proceedings of the 31st Annual ACM Symposium on Applied Computing</source>, Pisa, Italy, April 4-8, 2016, pages <fpage>992</fpage>–<lpage>999</lpage>, <year>2016</year>.</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation><source>Journal of Systems and Software</source>, <volume>127</volume>:<fpage>258</fpage>–<lpage>265</lpage>, <year>2017</year>.</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>[SYGZ18] Dawei Sun, Hongbin Yan, Shang Gao, and Zhangbing Zhou. <article-title>Performance evaluation and analysis of multiple scenarios of big data stream computing on storm platform</article-title>. <source>TIIS</source>, <volume>12</volume>(<issue>7</issue>):<fpage>2977</fpage>–<lpage>2997</lpage>, <year>2018</year>.</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>[vDdMV+05] Boudewijn F. van Dongen, Ana Karla A. de Medeiros, H. M. W. Verbeek, A. J. M. M. Weijters, and Wil M. P. van der Aalst. <article-title>The ProM framework: A new era in process mining tool support</article-title>. <source>In Applications and Theory of Petri Nets 2005, 26th International Conference, ICATPN 2005</source>, Miami, USA, June 20-25, 2005, Proceedings, pages <fpage>444</fpage>–<lpage>454</lpage>, <year>2005</year>.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>