Supporting Process Mining Workflows with
                   RapidProM

                R.S. Mans, W.M.P. van der Aalst, H.M.W. Verbeek

    Department of Mathematics and Computer Science, Eindhoven University of
      Technology, P.O. Box 513, NL-5600 MB, Eindhoven, The Netherlands.
             {r.s.mans,w.m.p.v.d.aalst,h.m.w.verbeek}@tue.nl


       Abstract. Process mining is gaining more and more attention in both
       industry and academia. As such, the number of process mining products is
       steadily increasing. However, none of these products allow for composing
       and executing analysis workflows consisting of multiple process mining
       algorithms. As a result, the analyst needs to perform repetitive process
       mining tasks manually, and scientific experiments are extremely labor
       intensive.
       To this end, we have connected RapidMiner 5, which allows for the
       definition and execution of analysis workflows, with the process mining
       framework ProM 6. As such, any discovery, conformance, or extension
       algorithm of ProM can be used within a RapidMiner analysis process, thus
       supporting process mining workflows.


1    Introduction
During the last decade, process mining has become a mature technique for
analyzing all kinds of business processes based on a so-called event log [3].
Not surprisingly, the number of process mining products has increased. However,
in all of these products the analysis steps need to be performed in an ad-hoc
fashion, so that the overview of the steps taken and their order is lost. In
other words, within the process mining domain there is currently no support for
the construction and execution of a workflow which describes all analysis steps
and their order.
    Within the scientific workflow domain, special kinds of workflow management
systems exist which are designed to compose and execute a series of
computational or data manipulation steps (e.g. RapidMiner, KNIME, and Taverna).
Applying scientific workflow concepts in the process mining field offers several
advantages. For example, comparable process mining analyses can be repeated with
a single click of a button, and scientific experiments can be executed in an
automated fashion. Furthermore, many data mining and machine learning techniques
are readily available within several scientific workflow management systems. As
such, different techniques can easily be combined for an end-to-end analysis.
    To this end, we have integrated the process mining framework ProM 6 [4]
within the scientific workflow management system RapidMiner 5. That is, pro-
cess mining functionality is added as an extension to RapidMiner. In this paper,
this extension, called RapidProM, is discussed in detail.


Copyright © 2014 for this paper by its authors. Copying permitted for private and academic
purposes.

2    Defining and Executing Process Mining Workflows

We first provide a general introduction to RapidMiner. Then we present the
extension which supports process mining analysis workflows.
    RapidMiner is a software product for advanced analytics, i.e. sophisticated
quantitative methods (for example, statistics, descriptive and predictive data
mining, simulation and optimization) that produce insights which traditional
approaches to Business Intelligence (BI) are unlikely to discover [2]. Whereas BI
focuses on querying and reporting combined with simple visualization techniques
showing dashboards and scorecards, advanced analytics aims at automatically
finding hidden patterns that are too complex for humans to find. Moreover, BI
looks back at the past, whereas advanced analytics also aims to provide
predictions about the future. In addition, RapidMiner provides a GUI to design
and execute an analytical pipeline. After execution, the results can be
inspected.


         [Figure 1: RapidMiner screenshot with the ProM operators marked]

         Fig. 1: A workflow in which several ProM 6 plug-ins are executed.

Table 1: For each subfolder a brief description is provided and some example operators
are mentioned.

Subfolder    Description                                         Example Operators
Import       ProM objects are imported from file (e.g. an        Read Log File and Read
             event log or a Petri net).                          PNML file.
Mining       A process mining algorithm is executed. The         ILP Miner, Passage Miner,
             algorithms may discover knowledge regarding         and Inductive Miner.
             the control-flow (e.g. a Petri net),
             organizational (e.g. a social network), and
             performance perspectives (e.g. a dotted chart).
Analysis     Typically an analysis is performed on a process     Replay a Log on Petri Net
             mining result. For example, timing information      for Conformance Analysis
             is projected on a Petri net.                        and Repair Model.
Export       A process mining result is saved to disk.           Export Log and PNML Export.
Filtering    A filter is applied to an event log. For example,   Add Artificial Start and
             an artificial start and end event is added to       End Event Filter and Add
             each trace.                                         Noise Log Filter.
Conversion   One type of process mining result is converted      Convert Process Tree into
             into another type of process mining result.         a Petri Net.

    Looking at advanced analytics in general, one drawback is that processes are
not made explicit. It is therefore useful to extend RapidMiner with process
mining capabilities. In the remainder, we focus on the process mining
capabilities that are available within the extension. Figure 1 shows a
screenshot of RapidMiner with a process mining analysis.
    First, within the “Operators” panel on the left side all the operators that
are available can be selected. There is a special “ProM6” folder in which all the
process mining operators can be found. A description of its subfolders can be
found in Table 1. Most of the operators correspond to existing ProM plug-ins.
    In the “Process” panel in Figure 1 some of the available operators can be seen
together with a visualization of the obtained results. First an event log is read
(Read Log operator). Afterwards, a Petri net is discovered using the ILP miner
(ILP Miner operator) and a dotted chart is created showing events in a graphical
way such that a “helicopter overview” of the process is obtained (Analyse using
Dotted chart operator). Finally, timing information is projected on the Petri
net so that bottlenecks can be identified within the process (Replay a Log on
Petri Net for Performance / Conformance Analysis operator).
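    The workflow of Figure 1 can be read as a chain of operators in which each
operator consumes the output of an earlier one. The following Python sketch
illustrates only this chaining idea; it is not RapidMiner or ProM code. The
event log is a hard-coded toy example and all function names are hypothetical.
In particular, the discovery step merely counts directly-follows pairs as a
simple stand-in for a real discovery algorithm such as the ILP Miner, and the
analysis step is a crude stand-in for replay-based conformance checking.

```python
from collections import Counter

# Toy event log (assumption): each trace is the ordered activity list of one case.
TOY_LOG = [
    ["register", "check", "pay"],
    ["register", "check", "check", "pay"],
    ["register", "pay"],
]


def read_log():
    """Stand-in for an import operator: returns a copy of the in-memory toy log."""
    return [list(trace) for trace in TOY_LOG]


def discover_directly_follows(log):
    """Stand-in for a mining operator: count how often a is directly followed by b."""
    counts = Counter()
    for trace in log:
        for a, b in zip(trace, trace[1:]):
            counts[(a, b)] += 1
    return counts


def keep_frequent(relation, min_count):
    """Stand-in model: keep only the pairs seen at least min_count times."""
    return {pair for pair, n in relation.items() if n >= min_count}


def fraction_of_fitting_traces(log, model):
    """Stand-in for an analysis operator: share of traces whose steps are all in the model."""
    fitting = sum(
        all(pair in model for pair in zip(trace, trace[1:])) for trace in log
    )
    return fitting / len(log)


def run_workflow(min_count=2):
    """Execute the operators in a fixed order, passing each result to the next step."""
    log = read_log()
    relation = discover_directly_follows(log)
    model = keep_frequent(relation, min_count)
    return model, fraction_of_fitting_traces(log, model)
```

    Calling `run_workflow()` again with a different `min_count` repeats the whole
analysis without any manual steps, which is the main benefit of making the
workflow explicit.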
    A wide variety of workflows can be made. Some examples are:


 – By using the Loop Parameters operator it is possible to iterate over a se-
   lection of operators for a set of parameter combinations. For example, the
   ILP Miner is repeated for each different option of its “Variant” parameter.
    – For each item in a collection of objects, the same workflow can be executed.
      For example, the Guide Tree Miner operator provides a collection of logs.
      Subsequently, for each log the corresponding Petri net can be discovered.
    – Within RapidMiner many data mining algorithms are available which can
      be used after converting the log into a feature set. For example, using the
      Case Data Extractor operator the log is converted into a feature set and
      subsequently a decision tree is obtained using the Decision Tree operator.
    – Within RapidMiner also many statistical techniques are available. These can
      be used for evaluation of process mining experiments. For example, by using
      the Loop Attributes operator and the Replay a Log on Petri Net for
      Performance / Conformance Analysis operator, for a varying number of
      maximal states, the fitness between a log and a Petri net can be calculated.
      Afterwards, using the Linear Regression operator, the strength of the re-
      lationship between the fitness and the number of states can be determined.
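
    The first two examples above share one pattern: the same sub-workflow is
executed repeatedly, once per parameter combination or once per item in a
collection. A minimal Python sketch of this pattern follows. The sub-workflow
`count_frequent_pairs`, its parameters `min_count` and `ignore_self_loops`, and
the toy logs are all hypothetical; they only stand in for real operators such as
the ILP Miner with its “Variant” parameter.

```python
from collections import Counter
from itertools import product


def count_frequent_pairs(log, min_count, ignore_self_loops):
    """Hypothetical sub-workflow: number of directly-follows pairs kept by the settings."""
    counts = Counter(
        (a, b) for trace in log for a, b in zip(trace, trace[1:])
    )
    return sum(
        1
        for (a, b), n in counts.items()
        if n >= min_count and not (ignore_self_loops and a == b)
    )


def loop_parameters(sub_workflow, log, grid):
    """Run sub_workflow once for every combination of the parameter values in grid.

    Result keys are value tuples ordered by sorted parameter name.
    """
    names = sorted(grid)
    results = {}
    for values in product(*(grid[name] for name in names)):
        settings = dict(zip(names, values))
        results[values] = sub_workflow(log, **settings)
    return results


def loop_collection(sub_workflow, logs, **settings):
    """Run sub_workflow with fixed settings once for every log in a collection."""
    return [sub_workflow(log, **settings) for log in logs]


# Toy collection of logs (assumption, for illustration only).
logs = [
    [["a", "b", "c"], ["a", "b", "b", "c"]],
    [["a", "c"], ["a", "c"], ["a", "d"]],
]
grid = {"min_count": [1, 2], "ignore_self_loops": [False, True]}

# Keys are (ignore_self_loops, min_count) because names are sorted alphabetically.
per_setting = loop_parameters(count_frequent_pairs, logs[0], grid)
per_log = loop_collection(
    count_frequent_pairs, logs, min_count=2, ignore_self_loops=True
)
```

    The results collected per setting or per log can then be handed to any
further step, for example a statistical evaluation as in the last example above.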

3      Architecture and Implementation
In this section, we elaborate on the architecture of ProM 6 and RapidMiner and
how both are connected. Furthermore, we focus on some implementation details.

    Figure 2 gives an architectural overview of RapidMiner and ProM 6 and shows
how they are connected. First, the most important part of ProM 6 is the
framework, which, roughly speaking, contains all the functionality needed to
execute process mining algorithms. The algorithms themselves are provided by
means of packages. A package may contain one or more plug-ins and a collection
of provided objects that are needed or produced by these plug-ins. Furthermore,
a plug-in needs a context to run in. Depending on the type of context, the
plug-in communicates with the user in a different way. Here, it is important
that there is a clear separation between the actual process mining algorithm
and the visualization of its results. Also, there is a clear separation between
the plug-in and the parameter settings it needs. At the moment, two types of
contexts are available: a GUI-aware context, called UITopia, and a headless
context. A plug-in running in the UITopia context may communicate with the user
through dialogs and/or wizards, whereas a plug-in running in a headless context
is not required to do so. Consequently, the latter plug-ins can be run by a
client. In case such a plug-in requires parameter settings, these can be
provided via its own input parameter object. Moreover, for the object that has
been obtained, a visualization can be produced by running the associated
visualization plug-in.
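
    The separation between algorithm, parameter settings, and user interaction
described above can be illustrated with a small sketch. It is written in Python
for brevity and does not reproduce the actual ProM API; all class and function
names (`HeadlessContext`, `InteractiveContext`, `MinerParameters`,
`mine_frequent_activities`) are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class MinerParameters:
    """Hypothetical parameter object, kept separate from the algorithm."""
    min_occurrences: int = 1


class HeadlessContext:
    """No user interaction: parameters are supplied up front, else defaults are used."""

    def obtain_parameters(self, supplied):
        return supplied if supplied is not None else MinerParameters()


class InteractiveContext:
    """GUI-like context: asks the user via a callback that stands in for a dialog."""

    def __init__(self, ask_user):
        self.ask_user = ask_user

    def obtain_parameters(self, supplied):
        if supplied is not None:
            return supplied
        answer = self.ask_user("min_occurrences")
        return MinerParameters(min_occurrences=int(answer))


def mine_frequent_activities(context, log, parameters=None):
    """Toy 'plug-in': the algorithm itself is identical in every context."""
    params = context.obtain_parameters(parameters)
    counts = Counter(activity for trace in log for activity in trace)
    return sorted(a for a, n in counts.items() if n >= params.min_occurrences)


def visualize(activities):
    """Separate visualization step, applied to the result object afterwards."""
    return "Frequent activities: " + ", ".join(activities)


log = [["a", "b"], ["a", "c"], ["a", "b"]]

# Headless run: a client passes the parameter object directly.
headless_result = mine_frequent_activities(
    HeadlessContext(), log, MinerParameters(min_occurrences=2)
)

# Interactive run: the lambda simulates a user typing "3" into a dialog.
interactive_result = mine_frequent_activities(
    InteractiveContext(lambda name: "3"), log
)
```

    Because the algorithm only talks to the context object, a client such as a
workflow engine can run it without any dialogs by choosing the headless variant.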
[Figure 2 labels: RapidMiner, Extensions, ProM 6 extension, Client, Context,
ProM Framework, Packages, Extensions, Files, User]

Fig. 2: Architectural overview of ProM and RapidMiner and how they are connected.

    The headless context of ProM 6 is used to execute ProM plug-ins within
RapidMiner. As can be seen in Figure 2, RapidMiner consists of a core that
executes operators. New operators can be added by means of an extension,
together with the objects that are needed by the operators [1]. To do so, for
each operator the algorithm needs to be defined, as well as the objects it uses
and produces. Furthermore, for each object it needs to be defined how it is
visualized. To run ProM 6 plug-ins, a special operator is created for each
plug-in. For a provided object that needs to be visualized, the associated ProM
visualizer is simply called. For example, regarding Figure 1, for the ILP Miner
operator the ILP Miner plug-in of ProM is called, and for the provided Petri net
the associated Visualize Petri Net visualizer is called.
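
    The wrapping described above, one operator per plug-in plus a visualizer
looked up by object type, can be sketched as follows. This is a Python
illustration under assumed names (`Operator`, `VISUALIZERS`, `EventLog`,
`PetriNet`, `toy_miner`); it is neither the RapidMiner extension API nor the
source code of the ProM extension.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class EventLog:
    traces: list


@dataclass
class PetriNet:
    transitions: list = field(default_factory=list)


@dataclass
class Operator:
    """Hypothetical operator: declares what it consumes and produces, and wraps a plug-in."""
    name: str
    input_type: type
    output_type: type
    plugin: Callable

    def execute(self, obj):
        # Check the declared input and output types around the plug-in call.
        if not isinstance(obj, self.input_type):
            raise TypeError(f"{self.name} expects {self.input_type.__name__}")
        result = self.plugin(obj)
        if not isinstance(result, self.output_type):
            raise TypeError(f"{self.name} must produce {self.output_type.__name__}")
        return result


def toy_miner(log):
    """Stand-in for a headless plug-in: one transition per distinct activity."""
    return PetriNet(sorted({activity for trace in log.traces for activity in trace}))


# Visualizers are registered per object type, independently of the operators.
VISUALIZERS = {
    PetriNet: lambda net: "PetriNet[" + ", ".join(net.transitions) + "]",
}


def show(obj):
    """Look up and call the visualizer associated with the object's type."""
    return VISUALIZERS[type(obj)](obj)


miner = Operator("Toy Miner", EventLog, PetriNet, toy_miner)
net = miner.execute(EventLog([["a", "b"], ["a", "c"]]))
```

    Keeping the visualizer registry separate from the operators mirrors the
separation between algorithm and visualization: any operator that produces a
`PetriNet` reuses the same visualizer.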
    Currently, over 40 operators are available covering a selection of the plug-ins
that are available within ProM 6. If a plug-in can run in a headless context, it
can easily be added to the ProM extension. In order to speed up this process,
we have developed a specific ProM plug-in, called RapidMiner Code Generator
Plug-in, that generates the code for adding a plug-in to the extension.
    The ProM extension has been tested for many scenarios. Furthermore, students
following the “Advanced Process Mining” course at TU/e are using the software
for their assignments. Ultimately, we want the extension to become robust and
mature so that it can be used successfully by many people from both industry
and science. Currently, the extension has been downloaded over 850 times at the
RapidMiner marketplace¹.

4     Links
For the ProM extension of RapidMiner, a dedicated website is available:
http://www.rapidprom.org. Amongst others, this website contains instructions for
installing and using the extension, descriptions of several use cases, and
several screencasts.

Acknowledgements
This research is supported by the Dutch Technology Foundation STW, applied
science division of NWO and the Technology Program of the Ministry of Eco-
nomic Affairs.

References
1. How to Extend RapidMiner 5. Rapid-I, 2012.
2. Gartner. Magic Quadrant for Advanced Analytics Platforms. 2014.
3. W.M.P. van der Aalst. Process Mining: Discovery, Conformance and Enhancement
   of Business Processes. Springer-Verlag, Berlin, 2011.
4. H.M.W. Verbeek, J.C.A.M. Buijs, B.F. van Dongen, and W.M.P. van der
   Aalst. ProM 6: The Process Mining Toolkit. In Proc. of BPM Demonstration Track
   2010, volume 615, pages 34–39. CEUR-WS.org, 2010.

¹ http://marketplace.rapid-i.com/