=Paper= {{Paper |id=Vol-2703/paperTD2 |storemode=property |title=An Open-Source Integration of Process Mining Features Into the Camunda Workflow Engine: Data Extraction and Challenges |pdfUrl=https://ceur-ws.org/Vol-2703/paperTD2.pdf |volume=Vol-2703 |authors=Alessandro Berti,Wil van der Aalst,David Zang,Magdalena Lang |dblpUrl=https://dblp.org/rec/conf/icpm/BertiAZL20 }} ==An Open-Source Integration of Process Mining Features Into the Camunda Workflow Engine: Data Extraction and Challenges== https://ceur-ws.org/Vol-2703/paperTD2.pdf
An Open-Source Integration of Process Mining Features Into the Camunda Workflow Engine: Data Extraction and Challenges

Alessandro Berti∗, Wil van der Aalst∗, David Zang†, Magdalena Lang†

∗ Process and Data Science Department, RWTH Aachen University
Process and Data Science department, Lehrstuhl für Informatik 9, 52074 Aachen, Germany
Emails: a.berti@pads.rwth-aachen.de, wvdaalst@pads.rwth-aachen.de

† viadee Unternehmensberatung AG
Konrad-Adenauer-Ufer 7, 50668
Emails: david.zang@viadee.de, magdalena.lang@viadee.de


Abstract: Process mining provides techniques to improve the performance and compliance of operational processes. Although sometimes the term “workflow mining” is used, the application in the context of Workflow Management (WFM) and Business Process Management (BPM) systems is limited. The main reason is that WFM/BPM systems control the process, leaving less room for flexibility and the corresponding deviations. However, as this paper shows, it is easy to extract event data from systems like Camunda, one of the leading open-source WFM/BPM systems. Moreover, although the respective process engines control the process flow, process mining is still able to provide valuable insights, such as the analysis of the performance of the paths and the mining of the decision rules. This demo paper presents a process mining connector to Camunda that extracts event logs and process models, allowing for the application of existing process mining tools. We also analyzed the added value of different process mining techniques in the context of Camunda. We discuss a subset of process mining techniques that nicely complements the process intelligence capabilities of Camunda. Through this demo paper, we hope to boost the use of process mining among Camunda users.

Index Terms: Process Mining; Workflow Management; Data Extraction and Preprocessing; Process Engine

I. INTRODUCTION

The vast majority of business processes (including enterprise resource planning, customer relationship management, document management) are nowadays supported by information systems. These systems manage (but do not always regulate) the execution of a business process, and record event data with fine detail about each step of the process. In this context, process mining [1] makes it possible to improve operational processes by exploiting the event data recorded by such systems. An event log can be extracted from an information system’s database in order to apply the process mining algorithms. An event log contains event data of multiple executions of the business process. Process mining techniques include: process discovery, i.e., the automated discovery of a process model from event data; conformance checking, i.e., the comparison between a process model and the event data; model enhancement, i.e., the enrichment of the model with additional perspectives (for example, execution guards [2]); and prediction and simulation algorithms. Open-source software supporting process mining includes ProM, APROMORE and PM4Py.

Process mining techniques have been applied to Workflow Management (WFM) and Business Process Management (BPM) systems. There are connectors to the YAWL WFM system [3] and some other WFM/BPM systems, including Signavio, Bizagi, and Bonita, that make it possible to extract the event data and operate on the process models contained in such systems. This paper focuses on Camunda and is a result of a collaborative project between RWTH Aachen University and viadee Unternehmensberatung AG. Previously, there was no open-source connector to extract logs useful for process mining purposes from Camunda, although Camunda is open-source and holds detailed event data. Therefore, we developed and evaluated such a connector, and viadee has integrated event log extraction techniques in its software stack.

Camunda is widely used, e.g., by Deutsche Telekom, Warner Music, Allianz, DB, Zalando and Generali. Camunda uses the BPMN 2.0 notation for modeling. Among the main selling points of Camunda are high throughput and collaboration and integration possibilities. Camunda can be easily integrated with different information systems, business intelligence and big data systems such as QlikView, Apache Spark and Kafka. This explains our goal to provide process mining for the large Camunda user base.

In this demo paper, (A) we present our implementation of a process mining extractor for Camunda that is able to extract a set of event logs for the processes executed by Camunda, and (B) we discuss the existing process mining techniques that complement the business intelligence capabilities of Camunda. Figure 1 provides an overview of the approach implemented in the paper. The extractor is publicly available. For demonstrative purposes, it is integrated with a graphical interface based on PM4Py that offers basic process mining functions. Moreover, the techniques analyzed in this paper are available in open-source software.

The remainder of this demo paper is organized as follows. Section II describes the basic structure of the Camunda database, along with a methodology for extracting event logs and process models from the Camunda database. Section III discusses the added value of process mining techniques for Camunda users. Section IV describes the set-up of the tool.




Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
Fig. 1. Overview of the toolchain supporting process mining in the context of Camunda. In this paper, we provide (A) a connector to Camunda that is
able to extract event logs and the BPMN diagrams modeling the processes, and (B) an overview of the most valuable process mining techniques complementing
Camunda. Although the connector is generic, we showcase the integration using PM4Py.



Finally, Section V concludes the paper.

II. EXTRACTING EVENT LOGS AND PROCESS MODELS FROM CAMUNDA

In this section, we focus on how to extract logs containing the historical executions of the processes supported by Camunda, and how to extract the process models of such processes.

1) Extracting Event Logs: The extraction is done directly at the database level. The Camunda workflow engine supports different relational databases (e.g., PostgreSQL, Oracle, MySQL). We focus exclusively on the completed executions and ignore ongoing executions.

The table containing the historical executions of the processes is the ACT_HI_ACTINST table. The rows of this table are the events thrown by Camunda. The table contains all the basic information that is needed to extract event logs:
  • The identifier of the process that is executed is stored in the proc_def_key column. This column contains as many distinct values as there are processes executed via the Camunda process engine.
  • The identifier of the process execution (case ID) is stored inside the proc_inst_id column.
  • The name of the BPMN element that is executed via the Camunda process engine is stored inside the act_name column.
  • The type of the BPMN element is stored inside the act_type column.
  • The start and end timestamps are stored inside the start_time and the end_time columns, respectively.
  • The identifier of the resource that performs the event is stored inside the assignee column.
An event log is created for each distinct value of the proc_def_key column. The resulting table for an individual process is enough to analyze the control flow of the process and its bottlenecks. Other attributes at the event level can be obtained by merging the ACT_HI_ACTINST table with the ACT_HI_DETAIL table. The latter contains a row for each distinct attribute that is associated with an event. These attributes can be useful to investigate the process more thoroughly, also for predictive analyses.

An important point is that the traversal of gateways and internal or boundary events is also included in the event log. Thus, not only the tasks are recorded, but also the exact path through the model. This can simplify the frequency or performance decoration of the process model: performing token-based replay or alignments to find the path that is followed is not necessary. A postprocessing step is only necessary when the analysis should be restricted to the execution of tasks.

We present the implementation of a connector in Section IV. A property of the connector is that it is incremental: the first extraction retrieves all the events from the beginning, while the following extractions retrieve only the events that have been inserted since the previous extraction. This keeps the log up to date while keeping the workload low.

2) Extracting Process Models: Aside from event logs, we can also extract the BPMN models of the processes supported by the workflow engine. In a Tomcat distribution of Camunda, each process supported by Camunda has its own folder in PATH-TO-CAMUNDA-SERVER/webapps/. For example, if a process is named invoice, its corresponding folder is PATH-TO-CAMUNDA-SERVER/webapps/invoice. To extract the BPMN model associated with the invoice process, the content of the PATH-TO-CAMUNDA-SERVER/webapps/invoice/WEB-INF/classes folder should be taken. Alternatively, the process diagram can be retrieved by querying the REST API.

The extracted BPMN models can be imported in different process mining tools. In order to perform analyses such as decision mining and conformance checking (see Section III), the BPMN model should be converted to a Petri net model. This is difficult for many constructs (for example, OR-joins and OR-splits, swimlanes, subprocesses) and thus can lead to problems for complex real-life processes. An overview of the problems of converting a BPMN model to a Petri net is given in [4].

III. PROCESS MINING ON TOP OF CAMUNDA

In the previous section, we have described an approach to extract event logs for the different processes supported
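The event-log extraction described in Section II can be sketched as follows. This is a minimal illustration under stated assumptions, not the code of the connector: an in-memory SQLite database stands in for the PostgreSQL database, the column names are the ones used in the text (real Camunda schemas typically add a trailing underscore, e.g. PROC_DEF_KEY_), and a case is treated as completed when none of its activity instances is still open, which is only an approximation. The function and variable names are hypothetical.

```python
import sqlite3
from collections import defaultdict

# Assumption: column names as written in the text; adjust them to the
# actual schema (Camunda columns usually end with an underscore).
# Assumption: a case counts as completed when it has no open activity.
QUERY = """
SELECT proc_def_key, proc_inst_id, act_name, act_type,
       start_time, end_time, assignee
FROM ACT_HI_ACTINST
WHERE proc_inst_id NOT IN (
    SELECT proc_inst_id FROM ACT_HI_ACTINST WHERE end_time IS NULL
)
ORDER BY proc_def_key, proc_inst_id, start_time, end_time
"""


def extract_event_logs(conn):
    """Return one list of events per distinct process definition key."""
    logs = defaultdict(list)
    for key, case_id, name, act_type, start, end, assignee in conn.execute(QUERY):
        logs[key].append({
            "case_id": case_id,
            "activity": name,
            "type": act_type,
            "start": start,
            "end": end,
            "resource": assignee,
        })
    return dict(logs)


def demo_connection():
    """Build an in-memory stand-in for the Camunda history table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ACT_HI_ACTINST (proc_def_key TEXT, proc_inst_id TEXT, "
        "act_name TEXT, act_type TEXT, start_time TEXT, end_time TEXT, "
        "assignee TEXT)"
    )
    rows = [
        ("invoice", "c1", "Invoice received", "startEvent",
         "2020-01-01T09:00:00", "2020-01-01T09:00:00", None),
        ("invoice", "c1", "Approve Invoice", "userTask",
         "2020-01-01T09:00:01", "2020-01-01T10:30:00", "john"),
        # Case c2 is still running (open user task), so it is skipped.
        ("invoice", "c2", "Invoice received", "startEvent",
         "2020-01-02T07:59:00", "2020-01-02T07:59:00", None),
        ("invoice", "c2", "Approve Invoice", "userTask",
         "2020-01-02T08:00:00", None, "mary"),
        ("review", "c3", "Review Invoice", "userTask",
         "2020-01-03T08:00:00", "2020-01-03T08:20:00", "mary"),
    ]
    conn.executemany("INSERT INTO ACT_HI_ACTINST VALUES (?, ?, ?, ?, ?, ?, ?)", rows)
    return conn


logs = extract_event_logs(demo_connection())
for process_key, events in logs.items():
    print(process_key, len(events))
```

Grouping by the process definition key mirrors the rule that one event log is created per distinct key. Additional event attributes from ACT_HI_DETAIL would require a further join that is not shown here.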
TABLE I
Analysis of the pros and cons of the application of several process mining techniques in the context of the Camunda engine. Many observations here hold generally for any WFM/BPM system.

Technique                 Pros                                                      Cons
Process Discovery         It is possible to show the frequent paths in processes.   The process model underlying the event data is already
                          Moreover, it becomes visible when people bypass the       contained in Camunda and probably not surprising.
                          system.
Conformance Checking      It is possible to measure the precision of the process    It is expected that the event data already follows the
                          model, in order to understand how much extra behavior     model. Hence, some measures such as the calculation of
                          is allowed.                                               fitness are not useful.
Decision Mining           It is possible to enrich the BPMN model with guards       Many execution guards are already inserted in the BPMN
                          that describe and regulate the behavior of the process    diagrams during the design phase. The discovered guards
                          at the decision points.                                   might be trivial or overfit the data.
Concept Drift Analysis    Process mining can be used to detect process changes.     Many of the possible change points are known or
                          Possible reasons include changes of the process model     deliberate.
                          or day-night shifts.
Prediction of the         The technique provides an estimation of the remaining     The quality of the predictions performed by
Remaining Time            time for the process instances, in order to detect        state-of-the-art approaches on real datasets must be
                          possible service level agreement violations.              improved.
Social Network Analysis   The collaboration between the resources can be analyzed   Roles are often set and controlled by the system.
                          from different angles (e.g., to see the effect on
                          performance).
Model Enhancement         It is possible to identify the bottlenecks of the         Basic performance measurements are already provided by
                          process, and the most frequent paths.                     the WFM/BPM system.
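To make the "Process Discovery" and "Model Enhancement" rows of Table I more concrete, the sketch below counts directly-follows relations and averages the waiting time between consecutive events of the same case. It is a plain-Python illustration and not the PM4Py implementation; the function name and the tuple layout of the events are assumptions made for this example.

```python
from collections import defaultdict
from datetime import datetime


def directly_follows(events):
    """Compute frequency and mean waiting time of directly-follows pairs.

    `events` is an iterable of (case_id, activity, start, end) tuples with
    datetime values. The waiting time of a pair (a, b) is measured from the
    end of a to the start of b, in seconds.
    """
    by_case = defaultdict(list)
    for case_id, activity, start, end in events:
        by_case[case_id].append((start, end, activity))

    frequency = defaultdict(int)
    total_wait = defaultdict(float)
    for trace in by_case.values():
        trace.sort()  # chronological order within the case
        for (_, prev_end, a), (next_start, _, b) in zip(trace, trace[1:]):
            frequency[(a, b)] += 1
            total_wait[(a, b)] += (next_start - prev_end).total_seconds()

    mean_wait = {pair: total_wait[pair] / n for pair, n in frequency.items()}
    return dict(frequency), mean_wait


def t(hour, minute):
    """Helper for the demo data: a timestamp on an arbitrary fixed day."""
    return datetime(2020, 1, 1, hour, minute)


demo_events = [
    ("c1", "A", t(9, 0), t(9, 10)),
    ("c1", "B", t(9, 30), t(9, 40)),
    ("c1", "C", t(10, 0), t(10, 5)),
    ("c2", "A", t(9, 0), t(9, 5)),
    ("c2", "B", t(9, 15), t(9, 20)),
]
frequency, mean_wait = directly_follows(demo_events)
for pair in sorted(frequency):
    print(pair, frequency[pair], mean_wait[pair])
```

Pairs with a high mean waiting time are candidate bottlenecks, and pairs with a high frequency indicate the most common paths. Because Camunda also records gateways and events, the same computation can be applied to all recorded BPMN elements or only to tasks.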

by Camunda. This enables the application of several process mining tools and techniques. In this section, we analyze which process mining techniques are most useful in the context of the Camunda processes. The techniques are implemented and released as open-source software, including the one based on PM4Py presented in Section IV. Table I provides an overview of the approaches, along with their pros and cons.

1) Process Discovery and Conformance Checking: The two most popular process mining disciplines are process discovery and conformance checking. The scope of application of process discovery is rather limited, since the event data contained in the database is regulated by the process models inserted in Camunda. A BPMN model is also a formal model that enables the application of conformance checking techniques. For less regulated processes, the goal of conformance checking is to identify deviations from the process model; the executions of the process are evaluated by their fitness with respect to the process model. For WFM/BPM systems, we can expect to have perfect fitness for all the process executions. However, another application of conformance checking is the measurement of precision. A model is precise when it does not allow for extra behavior, i.e., behavior that does not appear in the event data. Models can have low precision when they are flexible about the order in which activities are executed. Hence, measuring precision can provide a measure for the “flexibility” of the process model. A popular measure for precision is proposed in [5].

2) Decision Mining: The application of a decision mining technique makes it possible to enrich the model with execution guards that are extracted automatically from the event data. These are conditions that are required in order to execute a path in the model. Hence, decision mining helps to reduce the amount of behavior allowed in the process model by an adaptation of the model towards the guards that are discovered by the technique. A mature approach for decision mining is proposed in [2]. On the other hand, BPMN models are often already decorated with execution guards that are defined in the design phase. Hence, decision mining could end up finding exactly the same guards without adding anything new. Another problem is the discovery of trivial guards, or guards that overfit the data. A careful selection of the guards is necessary after performing the analysis.

3) Concept Drift Analysis: A concept drift analysis identifies the points in time where the execution of the process changes. Different types of concept drifts exist: sudden drifts (where the process becomes immediately significantly different), gradual drifts and seasonal drifts. An approach for the detection of concept drifts is described in [6]. While the technique is interesting, many concept drift points are already known in the context of WfMSs such as Camunda: for example, the underlying BPMN schema changes, or there are differences in the execution of a process between day and night.

4) Prediction: Given an incomplete process execution, it may be useful to estimate the remaining execution time based on historical executions. Several approaches have been proposed [7], [8]; however, in our experiments the quality of predictions on top of real-life datasets is still not completely satisfying. Moreover, this paper only covers the extraction of complete (historical) process instances.

5) Other Analyses: In this category, we include social network analysis [9], which uses different metrics (such as the handover of work, the working together, and the similar activities metric) to quantify the collaboration between the different organizational resources. Model enhancement with frequency and performance metrics is particularly important to identify the bottlenecks of the process (from a performance point of view) and the most frequent paths.

IV. SET-UP OF THE CONNECTOR

The connector presented in this section is completely open-source and is available at
https://github.com/Javert899/incremental-camunda-parquet-exporter. A completely working demo environment can be easily obtained by using docker-compose inside the folder of the project (see footnote 1). In the prepared environment, there are:
  • A PostgreSQL relational database that supports the Camunda BPM engine and is exposed at port 5432.
  • The Camunda BPM engine, which is running at port 8080. The Camunda interface can be reached at http://localhost:8080/camunda-welcome/index.html. The installation contains some demonstrative models and event data that can be extracted by the connector and used for process mining analysis.
  • The connector, which is written in the Python language and is configured to reach the PostgreSQL database (for the extraction of the event data) and the Camunda BPM docker container (to extract the BPMN models).
  • An open-source process mining solution [10], [11], with its own event logs database, that is reachable at port 80, providing admin/admin as access credentials for the interface. The services and the web interface offer the logs for all the processes contained in Camunda. In the demo interface, a single process is offered to the user.

The processes contained in Camunda are offered, through the connector, in the web interface, which graphically allows for the following operations:
  • Process discovery of a directly-follows graph, and of a process tree or Petri net discovered using the inductive miner algorithm. While the model itself is already known, the frequency and performance information are important to understand which parts of the process are more critical for key performance indicators and service level agreements.
  • Cases Exploration: understanding which cases have longer duration, and which are the events of such cases.
  • Social Network Analysis: shows the interaction between the organizational resources using the Camunda BPM engine through some classic metrics.

The deployment of the connector through docker-compose is integrated with the process mining tool, but the event log is available for usage in other process mining platforms. An example log that is extracted by the technique is available at http://www.alessandroberti.it/invoice.xes.

V. CONCLUSION

In this paper, we presented a tool to extract event logs and process models from the Camunda system and analyzed the applicability of process mining tools on such models and logs. Thereby, process mining comes into reach of all organizations using Camunda with almost no effort. The extractor we implemented was also integrated with the open-source process mining tool PM4Py, along with instructions on how to deploy a complete environment that contains Camunda supported by the PostgreSQL database, the process mining tool, and the extractor. The deployment shows how Camunda can be extended with process mining capabilities in a straightforward way. While the PM4Py deployment is for demonstrative purposes, the extractor can be used on real-life deployments of Camunda and combined with any process mining tool. An example use case for our tool is proposed at http://www.alessandroberti.it/only_appendix.pdf.

Next to process discovery and conformance checking, our integration makes it possible to discover the bottlenecks of the processes and to identify execution guards that are not contained in the process model, but implicitly assumed by the resources performing the process. Thereby, the analysis can help to improve the quality of the BPMN model. Other analyses, such as the detection of concept drifts and the prediction of the remaining time, can also be useful.

As a result of this project, process mining techniques have been integrated in the viadee Unternehmensberatung AG software stack. The project located at https://github.com/viadee/camunda-kafka-polling-client proposes an implementation of a polling client on top of the Apache Kafka event streaming platform to poll data from Camunda, and the project located at https://github.com/viadee/bpmn.ai proposes a data preparation pipeline for such process data.

REFERENCES

[1] W. van der Aalst, Process Mining: Discovery, Conformance and Enhancement of Business Processes. Springer, 2011, vol. 2.
[2] M. de Leoni and W. van der Aalst, “Data-aware process mining: discovering decisions in processes using alignments,” in Proceedings of the 28th Annual ACM Symposium on Applied Computing. ACM, 2013, pp. 1454–1461.
[3] A. Rozinat, M. Wynn, W. van der Aalst, A. ter Hofstede, and C. Fidge, “Workflow simulation for operational decision support using YAWL and ProM,” BPM Center Report BPM-08-04, BPMcenter.org, vol. 298, pp. 302–306, 2008.
[4] R. M. Dijkman, M. Dumas, and C. Ouyang, “Semantics and analysis of business process models in BPMN,” Information and Software Technology, vol. 50, no. 12, pp. 1281–1294, 2008.
[5] J. Muñoz-Gama and J. Carmona, “A fresh look at precision in process conformance,” in International Conference on Business Process Management. Springer, 2010, pp. 211–226.
[6] R. J. C. Bose, W. van der Aalst, I. Žliobaitė, and M. Pechenizkiy, “Handling concept drift in process mining,” in International Conference on Advanced Information Systems Engineering. Springer, 2011, pp. 391–405.
[7] M. Polato, A. Sperduti, A. Burattin, and M. de Leoni, “Time and activity sequence prediction of business process instances,” Computing, vol. 100, no. 9, pp. 1005–1031, 2018.
[8] N. Tax, I. Verenich, M. La Rosa, and M. Dumas, “Predictive business process monitoring with LSTM neural networks,” in International Conference on Advanced Information Systems Engineering. Springer, 2017, pp. 477–492.
[9] W. van der Aalst, H. A. Reijers, and M. Song, “Discovering social networks from event logs,” Computer Supported Cooperative Work (CSCW), vol. 14, no. 6, pp. 549–593, 2005.
[10] A. Berti, S. J. van Zelst, and W. van der Aalst, “Process Mining for Python (PM4Py): Bridging the Gap Between Process- and Data Science,” in ICPM Demo Track (CEUR 2374), 2019, pp. 13–16.
[11] A. Berti, S. van Zelst, and W. van der Aalst, “PM4Py Web Services: Easy Development, Integration and Deployment of Process Mining Features in any Application Stack,” in BPM Demo Track, 2019.

   1 The command docker-compose up starts the docker containers that are
referenced in the docker-compose.yml file.
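With the demo environment of Section IV running, the REST-based retrieval of a BPMN model mentioned in Section II could look roughly like the sketch below. It assumes a Camunda 7 engine whose REST API is served under the default /engine-rest context path on port 8080 and uses the "get XML by process definition key" resource, whose JSON response is assumed to carry the model in a bpmn20Xml field. The paper does not state which REST resource the connector queries, so this choice, the function names, and the offline stand-in are illustrative only.

```python
import io
import json
from urllib.parse import quote
from urllib.request import urlopen

# Assumption: default REST context path of a local Camunda 7 distribution.
DEFAULT_BASE = "http://localhost:8080/engine-rest"


def bpmn_xml_url(process_key, base=DEFAULT_BASE):
    """Build the URL of the XML resource for the given process key."""
    key = quote(process_key, safe="")
    return f"{base.rstrip('/')}/process-definition/key/{key}/xml"


def fetch_bpmn_xml(process_key, base=DEFAULT_BASE, opener=urlopen):
    """Return the BPMN 2.0 XML string of the latest version of a process.

    `opener` is injectable so the function can be tried without a server.
    """
    with opener(bpmn_xml_url(process_key, base)) as response:
        payload = json.load(response)
    return payload["bpmn20Xml"]


def fake_opener(url):
    """Offline stand-in that mimics the assumed JSON shape of the response."""
    body = {"id": "invoice:1:abc", "bpmn20Xml": "<definitions/>"}
    return io.BytesIO(json.dumps(body).encode("utf-8"))


# Against a running engine, drop the `opener` argument.
print(fetch_bpmn_xml("invoice", opener=fake_opener))
```

The returned XML string can be written to a .bpmn file and imported into a process mining tool.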