<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>ProMetheuS: A Suite for Process Mining Applications</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Lucantonio Ghionna</string-name>
          <email>l.ghionna@mat.unical.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Luigi Granata</string-name>
          <email>luigi.granata@exeura.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Gianluigi Greco</string-name>
          <email>ggreco@mat.unical.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Massimo Guarascio</string-name>
          <email>guarascio@icar.cnr.it</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Dipartimento di Matematica, Università della Calabria</institution>
          ,
          <addr-line>I-87036 Rende</addr-line>
          ,
          <country country="IT">ITALY</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Exeura SRL</institution>
          ,
          <addr-line>Via Pedro Alvares Cabral - C.da Lecco, I-87036, Rende</addr-line>
          ,
          <country country="IT">ITALY</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>ICAR-CNR</institution>
          ,
          <addr-line>Via P.Bucci 41C, I-87036, Rende</addr-line>
          ,
          <country country="IT">ITALY</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Process mining is an established approach for analyzing and modeling complex business processes. In this paper we showcase ProMetheuS, a flexible and scalable suite for process mining natively designed for industrial applications. Building on the experience of the ProM framework, the state-of-the-art process mining tool, ProMetheuS introduces three innovative design elements. First, ProMetheuS defines the concept of flow of mining, which is aimed at supporting the design of complex mining applications, where various mining tasks can be combined and automatically orchestrated at run-time. Second, ProMetheuS provides a rich set of facilities to help developers build interactive applications that give on-the-fly feedback during analysis. Finally, behind the scenes, a powerful stream-based log-handling subsystem ensures scalability in data-intensive applications.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        In the context of enterprise automation, process mining is an established approach
to support the analysis and the design of complex business processes [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. In a typical
process mining scenario, the goal is to derive a model for a transactional process
capable of explaining all activities registered in a given log. Ultimately, the
“mined” model can be used to design a detailed process schema, possibly supporting
forthcoming enactments, or to describe the actual behavior of the process.
      </p>
      <p>
        The ProM framework [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] is an open and extensible tool for process mining, which
enables users to write and import their own mining algorithms as plug-ins. ProM
currently supports a wide range of process mining applications (e.g., control-flow
mining, decision tree induction, or clustering, to cite a few) and analysis tasks (e.g.,
validation of process models, performance analysis, or statistical evaluations). Thanks
to this valuable packaging, ProM represents the state-of-the-art tool for process mining,
and many real-world scenarios exploiting its mining capabilities have been discussed
in the literature (see e.g., [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]). Despite its success, however, certain issues of flexibility
and scalability might arise with the use of the framework, which limit its effectiveness
in handling complex industrial applications [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>In this paper, we describe ProMetheuS<sup>1</sup>, a novel suite for process mining
that introduces innovative design elements, aimed at addressing some limitations of
ProM and at providing new insights on the development of process analysis software.</p>
      <p>
        First, ProMetheuS has been specifically conceived to support the definition of
complex mining applications, where various mining tasks can be combined and
automatically orchestrated at run-time: Process mining applications may involve dozens
of different tasks, ranging from data acquisition to data manipulation, information
extraction based on different mining algorithms, recombination of mining results, and
visualization. These different kinds of tasks can be managed in ProM, but at the price
of requiring human intervention in their coordination. Indeed, constructing complex
mining applications requires manually invoking the various tasks, collecting and
storing each intermediate result, and reusing it as the input for further
tasks. ProM 6.0 has simplified the chaining of intermediate results by letting tasks be
aware of the kinds of inputs/outputs they support [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. In order to automate
and easily deploy mining applications involving different tasks, ProMetheuS
introduces instead the concept of “flow of mining”, a natural and manageable way of
designing complex mining processes. Indeed, ProMetheuS supports the deployment
of mining applications in their entirety, by allowing users to design mining processes as
complex flows of elementary bricks. Each brick produces an output that may be used as
input for other bricks in the flow. Consequently, users may incrementally build the
desired flow by connecting existing bricks or adding new ones to manipulate
produced outputs. ProMetheuS comes equipped with a run-time engine that
supports and monitors the execution of the mining flow and that orchestrates the
composition of the various elementary bricks. Notably, ProMetheuS allows the definition
and the organization of bricks in workspaces, grouping resources according to their
application domain (e.g., text mining, rule learning, etc.). Resources belonging to
different workspaces can be transparently connected together to build mixed flows.
      </p>
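      <p>The idea of a flow of elementary bricks, each consuming the outputs of other bricks, can be illustrated with a minimal Python sketch. This is not the ProMetheuS engine or its API: the class <monospace>Flow</monospace>, the method names, and the toy bricks are hypothetical and only show how a run-time component might order bricks so that each one runs after its inputs have been produced.</p>
      <preformat>
```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


class Flow:
    """Hypothetical flow of mining: named bricks wired output-to-input."""

    def __init__(self):
        self.bricks = {}   # brick name -> callable implementing the brick
        self.inputs = {}   # brick name -> names of the bricks it reads from

    def add_brick(self, name, func, inputs=()):
        self.bricks[name] = func
        self.inputs[name] = list(inputs)

    def run(self):
        # static_order() lists every brick after all of its predecessors,
        # so the inputs of a brick are always available when it is invoked.
        results = {}
        for name in TopologicalSorter(self.inputs).static_order():
            args = [results[dep] for dep in self.inputs[name]]
            results[name] = self.bricks[name](*args)
        return results


# Toy flow: a source brick, a filtering brick, and a counting brick.
flow = Flow()
flow.add_brick("log_source", lambda: [["a", "b", "c"], ["a", "c"]])
flow.add_brick("long_traces", lambda log: [t for t in log if len(t) > 2],
               inputs=["log_source"])
flow.add_brick("count", lambda log: len(log), inputs=["long_traces"])
results = flow.run()
```
      </preformat>
      <p>In this sketch, intermediate results are stored and passed along by the flow itself, which is the manual bookkeeping that the text above attributes to the user when tasks are chained by hand.</p>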
      <p>Second, ProMetheuS allows users to build interactive applications providing
on-the-fly feedback during analysis: A plug-in based architecture is a crucial factor in
providing flexibility for real-world applications. However, each plug-in is currently
viewed in the ProM framework as a monolithic box, where interaction is limited to
the startup phase in which users configure the execution environment of each
algorithm by setting all parameters. ProMetheuS extends the flexibility of each plug-in by
introducing an “interactive execution” mode (in addition to the standard “batch” one),
i.e., it supports an approach to process mining where users may continually interact
with the mining algorithms and provide feedback through the graphical user interface.</p>
      <p>
        Finally, ProMetheuS ensures scalability over large volumes of data: In real
industrial environments, enormous volumes of data are available for mining analysis. Yet,
few efforts (see e.g., [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]) have been made to provide adequate support for
data-intensive applications. ProM imports the whole log into main memory or, if this is
not feasible, loads only one batch of data at a time and stores the remaining batches in
disk-resident swap files, at the price of slower access time<sup>2</sup>. To address scalability issues,
ProMetheuS adopts instead a data management subsystem based on a stream-handling
model for data acquisition. Thus, rather than building a complete in-memory
representation of the data, this model stores statistical sketches only, while supporting
on-demand streaming access to the original log (no additional paging files are required).
        <fn id="fn1"><label>1</label><p>The system has been released by Exeura S.r.l., under the name of “OKT Process Mining
Suite”, as open source software for the OpenKnowTech project funded by the Italian
Ministry of University and Research (MIUR).</p></fn>
      </p>
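      <p>A stream-handling model of this kind can be sketched as follows. The example is only an illustration under stated assumptions: the one-line-per-case text format, the function <monospace>stream_traces</monospace>, and the class <monospace>LogSketch</monospace> are hypothetical and are not taken from ProMetheuS. The point is that traces are consumed one at a time and only a small summary is retained.</p>
      <preformat>
```python
from collections import Counter


def stream_traces(lines):
    """Yield one trace at a time from lines of the toy form 'case: a, b, c'."""
    for line in lines:
        _, _, activities = line.partition(":")
        yield [a.strip() for a in activities.split(",") if a.strip()]


class LogSketch:
    """Summary updated per trace; the traces themselves are never stored.

    Its size grows with the number of distinct activities, not with the
    length of the log.
    """

    def __init__(self):
        self.trace_count = 0
        self.event_count = 0
        self.activity_freq = Counter()

    def update(self, trace):
        self.trace_count += 1
        self.event_count += len(trace)
        self.activity_freq.update(trace)


def sketch_log(lines):
    """Single pass over the log; 'lines' may be any iterable, e.g. an open file."""
    sketch = LogSketch()
    for trace in stream_traces(lines):
        sketch.update(trace)
    return sketch


sample = ["c1: register, check, pay", "c2: register, pay"]
sketch = sketch_log(sample)
```
      </preformat>
      <p>Passing an open file object instead of the in-memory list would let the same code scan a log of arbitrary size line by line, and a later pass could re-open the file when details beyond the sketch are needed.</p>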
    </sec>
    <sec id="sec-2">
      <title>ProMetheuS Architecture</title>
      <p>
        As shown in Fig. 1, ProMetheuS is implemented over four distinct logical layers.
The data layer manages physical low-level operations for acquiring and storing
elementary data types. The layer defines primitives for the input/output and for the
modification of Log data, representing log files, of Model data, representing the abstraction
of a process model, and of Custom data, representing user-defined data types.
Regarding log representation, ProMetheuS supports the MXML data model [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], while a
subset of the standard XPDL 2.0 [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] is used for model representation.
      </p>
      <p>The API layer is responsible for two basic functionalities. First, it supports the
efficient internal storage of log files. In particular, it handles a main-memory repository
storing dependency graphs, i.e., graphs whose nodes represent the activities in the
process log and whose edges represent the precedence relationships among them.
These structures are built internally in a single scan of the input log, while further
I/O operations may be executed when additional information is needed<sup>3</sup>. Second, it
allows transparent access to the data layer through dedicated managers. In detail, a
LogManager, a DependenciesManager, and a ModelManager provide primitives to
manipulate logs, dependency graphs, and models, respectively (see Fig. 1).
<fn id="fn2"><label>2</label><p>ProM 6.0 uses the OpenXES library; see http://www.xes-standard.org/openxes/start.</p></fn>
<fn id="fn3"><label>3</label><p>SAX libraries are used for reading XML streams; see http://www.saxproject.org.</p></fn></p>
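      <p>Building a dependency graph in a single scan of an XML log can be sketched with Python's standard <monospace>xml.sax</monospace> module. The sketch assumes the usual MXML element names (<monospace>ProcessInstance</monospace>, <monospace>AuditTrailEntry</monospace>, <monospace>WorkflowModelElement</monospace>) and, as a simplification, treats every audit trail entry as one event regardless of its event type. The handler class and helper functions are hypothetical and do not reproduce the ProMetheuS managers.</p>
      <preformat>
```python
import io
import xml.etree.ElementTree as ET
import xml.sax
from collections import Counter


class DependencyGraphHandler(xml.sax.ContentHandler):
    """Collects activities (nodes) and direct-precedence counts (edges)."""

    def __init__(self):
        super().__init__()
        self.activities = set()
        self.edges = Counter()   # (a, b) -> times b directly followed a
        self._previous = None    # last activity seen in the current case
        self._in_activity = False
        self._text = []

    def startElement(self, name, attrs):
        if name == "ProcessInstance":
            self._previous = None        # a new case starts a new chain
        elif name == "WorkflowModelElement":
            self._in_activity = True
            self._text = []

    def characters(self, content):
        if self._in_activity:
            self._text.append(content)   # text may arrive in several chunks

    def endElement(self, name):
        if name == "WorkflowModelElement":
            activity = "".join(self._text).strip()
            self._in_activity = False
            self.activities.add(activity)
            if self._previous is not None:
                self.edges[(self._previous, activity)] += 1
            self._previous = activity


def build_dependency_graph(source):
    """Scan the log once; 'source' is a file name or a binary stream."""
    handler = DependencyGraphHandler()
    xml.sax.parse(source, handler)
    return handler


def demo_log():
    """Build a tiny two-case log in memory, for demonstration only."""
    root = ET.Element("WorkflowLog")
    process = ET.SubElement(root, "Process", id="demo")
    for case_id, trace in [("1", ["A", "B", "C"]), ("2", ["A", "C"])]:
        instance = ET.SubElement(process, "ProcessInstance", id=case_id)
        for activity in trace:
            entry = ET.SubElement(instance, "AuditTrailEntry")
            ET.SubElement(entry, "WorkflowModelElement").text = activity
            ET.SubElement(entry, "EventType").text = "complete"
    return ET.tostring(root)


graph = build_dependency_graph(io.BytesIO(demo_log()))
```
      </preformat>
      <p>Because the SAX handler only keeps the previous activity of the current case plus the edge counters, memory use depends on the number of distinct activities and precedence pairs rather than on the number of events in the log.</p>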
      <p>
        Above the API layer sits the computational layer. In ProMetheuS, a
computational resource is a plug-in component that performs a specific task in a flow
of mining. ProMetheuS provides three main templates for computational resources. A
source is a template conceived to access the input data on which the mining analysis
has to be performed. In particular, a Log Source, a Model Source, and a Custom
Source are provided for handling logs, models, and custom data sources, respectively.
Mining modules are responsible for running mining algorithms and statistical
evaluations on the input provided by source modules. In particular, a Log Miner template
takes a Log as input and produces as output one or more instances of Log. A
Model Miner template works on a Log input and produces a Model. ProMetheuS
comes equipped with various mining modules, with the default one being the α-miner
[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. A Custom Module template is conceived to work on Custom types. Finally, sink
templates are intended to manage the outputs of the mining process. These templates
are useful for visualization, statistical analysis, and storage of the computed results.
      </p>
      <p>At the computational layer, a Mediator manages communications between plug-ins
and the Front Office layer. In particular, during the batch execution mode, the
mediator automatically checks the dependencies among the involved plug-ins, by tracing
the state of the various executions and the availability of their inputs. Indeed, a
plug-in may be executed only when all its inputs are available. Importantly, during the
interactive mode, the mediator manages the interaction between the various graphical
components associated with the different plug-ins. Basically, when the state of a
component is modified, an event is generated and sent to the mediator, which is then in
charge of dispatching it to the other components, which can react accordingly.</p>
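      <p>The event dispatching performed in interactive mode follows the general shape of the mediator pattern, which can be sketched as below. The class, the method names, the component identifiers, and the event name <monospace>threshold_changed</monospace> are all hypothetical; the sketch only shows a state change in one component being forwarded to the other registered components.</p>
      <preformat>
```python
from collections import defaultdict


class Mediator:
    """Hypothetical mediator: routes events between graphical components."""

    def __init__(self):
        # event type -> list of (component id, callback) pairs
        self._listeners = defaultdict(list)

    def subscribe(self, event_type, component_id, callback):
        self._listeners[event_type].append((component_id, callback))

    def publish(self, event_type, sender_id, payload):
        # Dispatch to every other component; the sender is not notified
        # about its own state change.
        for component_id, callback in self._listeners[event_type]:
            if component_id != sender_id:
                callback(payload)


received = []
mediator = Mediator()
mediator.subscribe("threshold_changed", "model_view",
                   lambda p: received.append(("model_view", p)))
mediator.subscribe("threshold_changed", "param_pane",
                   lambda p: received.append(("param_pane", p)))

# The parameter pane changes state; only the model view should react.
mediator.publish("threshold_changed", "param_pane", {"threshold": 0.8})
```
      </preformat>
      <p>Components never reference each other directly in this arrangement, so a plug-in's graphical elements can be added or removed without changing the others.</p>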
      <p>The Front Office layer exports GUI functionalities for the creation of a process
mining flow, for the configuration of environment parameters, and for the
visualization of results. A workflow engine is provided for the batch execution of the analysis.</p>
    </sec>
    <sec id="sec-3">
      <title>ProMetheuS in action</title>
      <p>We now overview the functionalities of ProMetheuS by showcasing the complete
deployment of a sample flow of mining. We also discuss some scalability results
obtained by executing the flow in different configuration scenarios.</p>
      <p><bold>Designing and executing a flow of mining</bold></p>
      <p>To design a flow of mining, ProMetheuS provides the user with an intuitive GUI
consisting of several graphical elements and facilities. The Workspace Explorer
shows all available workspaces as navigable entries, in which plug-ins are organized
according to their type (i.e., sources, mining modules, or sinks). The Workarea is the
design panel on which users can freely customize mining flow properties. Users can
quickly add or remove concrete instances of plug-in definitions (by dragging them from
the workspace explorer), edit connections between plug-ins, combine inputs and outputs,
and control the execution flow. Once a plug-in instance is placed, users can configure its
execution environment in two steps: Parameters Configuration, where users set any
input parameters that the selected plug-in requires,
and Edge Configuration, where users can insert a new connection between modules,
rearrange an existing connection, or remove it from the mining flow.</p>
      <p>A sample flow of mining ready for execution is depicted in Fig. 2. Given a flow
of mining, users can run all executable plug-ins at once, or execute only selected ones.
Interestingly, users may define different flows in the same work area and run
unrelated plug-ins in parallel, reducing the overall execution time of the analysis. After the
computation, users can visualize the actual values of input/output data by using
ProMetheuS’s default inspectors. An inspector is a generic graphical
data explorer, which is able to produce a suitable representation of a specific data
type of the flow (see Fig. 3). Notably, users can program their own inspectors for
custom data types or can create multiple views on the same data set, each one
depicting the portion of the data that is of interest.</p>
      <p>Plug-ins can be graphically composed into high-level blocks of components
performing user-defined operations. On many occasions, it might be necessary to perform the
same operation many times in the same mining flow or in different flows. To
suit this need, ProMetheuS supports the grouping of connected plug-ins into
macros that can be used as bricks with their own inputs and outputs (see Fig. 4).</p>
      <fig id="fig-3">
        <label>Fig. 3.</label>
        <caption>
          <p>Inspecting the flow: The input/output of executed plug-ins can be inspected by clicking
on flow arrows or on connection ports. The log inspector shows relevant statistics (e.g., number
of activities) on the log, the model inspector draws the process workflow reporting summary
information (e.g., frequency of a transition), the output inspector provides the resulting XPDL.</p>
        </caption>
      </fig>
      <p>ProMetheuS allows users to modify the parameters of mining
algorithms at run-time and to manipulate their execution logic on the basis of the feedback they provide
during the current execution. To support interaction, plug-ins can be equipped with
customizable graphical components. In particular, a plug-in can be associated with a
menu, with a toolbar, with a main pane (i.e., a graphical area for controlling
execution), with a bottom pane (i.e., a panel for setting parameters), and with a quick view
(i.e., a panel providing an overview of the plug-in status). Fig. 5 depicts a possible
interactive refinement of a model computed by the α-miner of the flow of Fig. 2.</p>
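      <p>The execution time of the flow shown in Fig. 2 has been measured in both
ProMetheuS and ProM for different log sizes<sup>4</sup> (see Fig. 6). Notably, the time
required by ProMetheuS to complete the whole flow is much lower than the time
ProM needs just to import the log. Moreover, as shown in Table 1, the performance
of the ProM buffered importer deteriorates quickly as the log size grows,
because of the overhead of writing swap files.
<fn id="fn4"><label>4</label><p>We used a Xeon 4 quad-core machine with 8 GB of RAM, running Ubuntu 11.04 Server.</p></fn></p>
      <table-wrap id="tab-1">
        <label>Table 1</label>
        <table>
          <thead>
            <tr>
              <th>1 GB</th>
              <th>3 GB</th>
              <th>5 GB</th>
              <th>7 GB</th>
              <th>10 GB</th>
            </tr>
          </thead>
          <tbody>
            <tr>
              <td>55</td>
              <td>180</td>
              <td>225</td>
              <td>420</td>
              <td>580</td>
            </tr>
            <tr>
              <td>1750</td>
              <td>1800</td>
              <td>1800</td>
              <td>1800</td>
              <td>1800</td>
            </tr>
          </tbody>
        </table>
      </table-wrap>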
    </sec>
    <sec id="sec-4">
      <title>Conclusions</title>
      <p>
        In this paper we presented ProMetheuS, a suite for process mining applications [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]
providing novel design elements. ProMetheuS is based on the concept of flow of
mining, which enables users to design complex mining processes in which different
mining tasks can be combined. Each task can be controlled interactively, and users can
exploit run-time feedback to improve the quality of their analysis. When
mining large logs, the stream-based log handling may also help in achieving good
scalability by loading only the information needed for the analysis.
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>W.</given-names>
            <surname>van der Aalst</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Weijters</surname>
          </string-name>
          and
          <string-name>
            <given-names>L.</given-names>
            <surname>Maruster</surname>
          </string-name>
          ,
          <article-title>"Workflow Mining: Discovering Process Models from Event Logs,"</article-title>
          <source>IEEE Transactions on Knowledge and Data Engineering</source>
          , vol.
          <volume>16</volume>
          , no.
          <issue>9</issue>
          , pp.
          <fpage>1128</fpage>
          -
          <lpage>1142</lpage>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>H.</given-names>
            <surname>Verbeek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Buijs</surname>
          </string-name>
          ,
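          <string-name>
            <given-names>B.</given-names>
            <surname>van Dongen</surname>
          </string-name>
          and
          <string-name>
            <given-names>W.</given-names>
            <surname>van der Aalst</surname>
          </string-name>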
          ,
          <article-title>"ProM 6: The Process Mining Toolkit,"</article-title>
          <source>in Proceedings of BPM Demonstration Track</source>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>N.</given-names>
            <surname>van Beest</surname>
          </string-name>
          and
          <string-name>
            <given-names>L.</given-names>
            <surname>Maruster</surname>
          </string-name>
          ,
          <article-title>"A Process Mining Approach to Redesign Business Processes - A Case Study in Gas Industry,"</article-title>
          <source>in Proceedings of SYNASC '07</source>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>J.</given-names>
            <surname>Ingvaldsen</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Gulla</surname>
          </string-name>
          ,
          <article-title>"Preprocessing Support for Large Scale Process Mining of SAP transactions,"</article-title>
          <source>in Proceedings of Business Process Management Workshops</source>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
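          5.
          <string-name>
            <given-names>W.</given-names>
            <surname>van der Aalst</surname>
          </string-name>
          and
          <string-name>
            <given-names>B.</given-names>
            <surname>van Dongen</surname>
          </string-name>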
          ,
          <article-title>"A meta model for process mining data,"</article-title>
          <source>in Proceedings of the CAiSE 05 Workshops</source>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>J.</given-names>
            <surname>Ping</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Mair</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Newman</surname>
          </string-name>
          ,
          <article-title>"Using UML to design distributed collaborative workflows: from UML to XPDL,"</article-title>
          <source>in Proceedings of IEEE International Workshops on Enabling Technologies: Infrastructure for Collaborative Enterprises</source>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>W.</given-names>
            <surname>van der Aalst</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>van Dongen</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Herbst</surname>
          </string-name>
          ,
          <article-title>"Workflow mining: a survey of issues and approaches,"</article-title>
          <source>Data &amp; Knowledge Engineering</source>
          , vol.
          <volume>47</volume>
          , no.
          <issue>2</issue>
          , pp.
          <fpage>237</fpage>
          -
          <lpage>267</lpage>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>