=Paper=
{{Paper
|id=Vol-2703/paperTD2
|storemode=property
|title=An Open-Source Integration of Process Mining Features Into the Camunda Workflow Engine: Data Extraction and Challenges
|pdfUrl=https://ceur-ws.org/Vol-2703/paperTD2.pdf
|volume=Vol-2703
|authors=Alessandro Berti,Wil van der Aalst,David Zang,Magdalena Lang
|dblpUrl=https://dblp.org/rec/conf/icpm/BertiAZL20
}}
==An Open-Source Integration of Process Mining Features Into the Camunda Workflow Engine: Data Extraction and Challenges==
Alessandro Berti∗, Wil van der Aalst∗, David Zang†, Magdalena Lang†
∗ Process and Data Science Department, RWTH Aachen University
Process and Data Science Department, Lehrstuhl für Informatik 9, 52074 Aachen, Germany
Emails: a.berti@pads.rwth-aachen.de, wvdaalst@pads.rwth-aachen.de
† viadee Unternehmensberatung AG
Konrad-Adenauer-Ufer 7, 50668
Emails: david.zang@viadee.de, magdalena.lang@viadee.de
Abstract—Process mining provides techniques to improve the performance and compliance of operational processes. Although the term “workflow mining” is sometimes used, the application of process mining in the context of Workflow Management (WFM) and Business Process Management (BPM) systems is limited. The main reason is that WFM/BPM systems control the process, leaving less room for flexibility and the corresponding deviations. However, as this paper shows, it is easy to extract event data from systems like Camunda, one of the leading open-source WFM/BPM systems. Moreover, although the respective process engines control the process flow, process mining is still able to provide valuable insights, such as the analysis of the performance of the paths and the mining of the decision rules. This demo paper presents a process mining connector to Camunda that extracts event logs and process models, allowing for the application of existing process mining tools. We also analyzed the added value of different process mining techniques in the context of Camunda. We discuss a subset of process mining techniques that nicely complements the process intelligence capabilities of Camunda. Through this demo paper, we hope to boost the use of process mining among Camunda users.

Index Terms—Process Mining; Workflow Management; Data Extraction and Preprocessing; Process Engine

I. INTRODUCTION

The vast majority of business processes (including enterprise resource planning, customer relationship management, document management) are nowadays supported by information systems. These systems manage (but do not always regulate) the execution of a business process, and record event data with fine detail about each step of the process. In this context, process mining [1] makes it possible to improve operational processes by exploiting the event data recorded by such systems. An event log can be extracted from an information system’s database in order to apply process mining algorithms. An event log contains event data of multiple executions of the business process. Process mining techniques include: process discovery, i.e., the automated discovery of a process model from event data; conformance checking, i.e., the comparison between a process model and the event data; model enhancement, i.e., the enrichment of the model with additional perspectives (for example, execution guards [2]); and prediction and simulation algorithms. Open-source software supporting process mining includes ProM, APROMORE and PM4Py.

Process mining techniques have been applied to Workflow Management (WFM) and Business Process Management (BPM) systems. There are connectors to the YAWL WFM system [3] and some other WFM/BPM systems, including Signavio, Bizagi, and Bonita, that make it possible to extract the event data and operate on the process models contained in such systems. This paper focuses on Camunda and is the result of a collaborative project between RWTH Aachen University and viadee Unternehmensberatung AG. Previously, there was no open-source connector to extract logs useful for process mining purposes from Camunda, although Camunda is open-source and holds detailed event data. Therefore, we developed and evaluated such a connector, and viadee has integrated event log extraction techniques in its software stack.

Camunda is widely used, e.g., by Deutsche Telekom, Warner Music, Allianz, DB, Zalando and Generali. Camunda uses the BPMN 2.0 notation for modeling. Among the main selling points of Camunda are high throughput and its collaboration and integration possibilities. Camunda can be easily integrated with different information systems, business intelligence and big data systems such as QlikView, Apache Spark and Kafka. This explains our goal to provide process mining for the large Camunda user base.

In this demo paper, (A) we present our implementation of a process mining extractor for Camunda that is able to extract a set of event logs for the processes executed by Camunda, and (B) we discuss the existing process mining techniques that complement the business intelligence capabilities of Camunda. Figure 1 provides an overview of the approach implemented in the paper. The extractor is publicly available. For demonstrative purposes, it is integrated with a graphical interface based on PM4Py that offers basic process mining functions. Moreover, the techniques analyzed in this paper are available in open-source software.

The remainder of this demo paper is organized as follows. Section II describes the basic structure of the Camunda database, along with a methodology for the extraction of event logs and process models from the Camunda database. Section III discusses the added value of process mining techniques for Camunda users. Section IV describes the set-up of the tool.
Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
Fig. 1. Overview of the toolchain supporting process mining in the context of Camunda. In this paper, we provide (A) a connector to Camunda that is able to extract event logs and the BPMN diagrams modeling the processes, and (B) an overview of the most valuable process mining techniques complementing Camunda. Although the connector is generic, we showcase the integration using PM4Py.
Finally, Section V concludes the paper.

II. EXTRACTING EVENT LOGS AND PROCESS MODELS FROM CAMUNDA

In this section, we focus on how to extract logs containing the historical executions of the processes supported by Camunda, and how to extract the process models of such processes.

1) Extracting Event Logs: The extraction is done directly at the database level. The Camunda workflow engine supports different relational databases (e.g., PostgreSQL, Oracle, MySQL). We focus exclusively on completed executions and ignore ongoing executions.

The table containing the historical executions of the processes is the ACT_HI_ACTINST table. The rows of this table are the events thrown by Camunda. The table contains all the basic information that is needed to extract event logs:
• The identifier of the process that is executed is stored in the proc_def_key column. This column contains as many different values as there are processes executed via the Camunda process engine.
• The identifier of the process execution (case ID) is stored in the proc_inst_id column.
• The name of the BPMN element that is executed via the Camunda process engine is stored in the act_name column.
• The type of the BPMN element is stored in the act_type column.
• The start and end timestamps are stored in the start_time and end_time columns, respectively.
• The identifier of the resource that performs the event is stored in the assignee column.

Basically, an event log is created for each distinct value of the proc_def_key column. The resulting table for an individual process is enough to analyze the control flow of the process and its bottlenecks. Other attributes at the event level can be obtained by merging the ACT_HI_ACTINST table with the ACT_HI_DETAIL table. The latter contains a row for each distinct attribute that is associated with an event. These attributes can be useful to investigate the process more thoroughly, and also for predictive analyses.

An important point is that the traversal of gateways and of internal or boundary events is also included in the event log. So, not only the tasks are recorded, but the exact path through the model. This can simplify the frequency or performance decoration of the process model: performing token-based replay or alignments to find the path that is followed is not necessary. A postprocessing activity is only necessary when the execution of tasks needs to be analysed.

We present the implementation of a connector in Section IV. A property of the connector is that it is incremental: the first extraction extracts all the events from the beginning of time, while the following extractions extract only the events that were inserted since the previous extraction. This keeps the log up to date while keeping the workload low.

2) Extracting Process Models: Aside from event logs, we can also extract the BPMN models of the processes supported by the workflow engine. In a Tomcat distribution of Camunda, each process supported by Camunda has its own folder in PATH-TO-CAMUNDA-SERVER/webapps/. As an example, if a process is named invoice, its corresponding folder is PATH-TO-CAMUNDA-SERVER/webapps/invoice. To extract the BPMN model associated with the invoice process, the content of the PATH-TO-CAMUNDA-SERVER/webapps/invoice/WEB-INF/classes folder should be taken. Alternatively, the process diagram can be obtained by querying the REST API.

The extracted BPMN models can be imported in different process mining tools. In order to perform analyses such as decision mining and conformance checking (see Section III), the BPMN model should be converted to a Petri net model. This is difficult for many constructs (for example, OR-joins and OR-splits, swimlanes, subprocesses) and thus can lead to problems for complex real-life processes. An overview of the problems in converting a BPMN model to a Petri net is found in [4].

III. PROCESS MINING ON TOP OF CAMUNDA

In the previous section, we have described an approach to extract event logs for the different processes supported
by Camunda. This enables the application of several process mining tools and techniques. In this section, we analyze which process mining techniques are most useful in the context of the Camunda processes. The techniques are implemented and released as open-source software, including the one based on PM4Py presented in Section IV. Table I provides an overview of the approaches, along with their pros and cons.

{| class="wikitable"
|+ TABLE I. Analysis of the pros and cons of the application of several process mining techniques in the context of the Camunda engine. Many observations here hold generally for any WFM/BPM system.
! Technique !! Pros !! Cons
|-
| Process Discovery || It is possible to show the frequent paths in processes. Moreover, it becomes visible when people bypass the system. || The process model underlying the event data is already contained in Camunda and probably not surprising.
|-
| Conformance Checking || It is possible to measure the precision of the process model, in order to understand how much extra behavior is allowed. || It is expected that the event data already follows the model. Hence, some measures such as the calculation of fitness are not useful.
|-
| Decision Mining || It is possible to enrich the BPMN model with guards that describe and regulate the behavior of the process at the decision points. || Many execution guards are already inserted in the BPMN diagrams during the design phase. The discovered guards might be trivial or overfit the data.
|-
| Concept Drift Analysis || Process mining can be used to detect process changes. Possible reasons include changes of the process model or day-night shifts. || Many of the possible change points are known or deliberate.
|-
| Prediction of the Remaining Time || The technique provides an estimation of the remaining time for the process instances, in order to detect possible service level agreement violations. || The quality of the predictions performed by state-of-the-art approaches on real datasets must be improved.
|-
| Social Network Analysis || The collaboration between the resources can be analyzed from different angles (e.g., to see the effect on performance). || Roles are often set and controlled by the system.
|-
| Model Enhancement || It is possible to identify the bottlenecks of the process, and the most frequent paths. || Basic performance measurements are already provided by the WFM/BPM system.
|}

1) Process Discovery and Conformance Checking: The two most popular process mining disciplines are process discovery and conformance checking. The scope of application of process discovery is pretty limited, since the event data contained in the database is regulated by the process models inserted in Camunda. A BPMN model is also a formal model that enables the application of conformance checking techniques. For less regulated processes, the goal of conformance checking is to identify deviations from the process model, and the executions of the process are evaluated by their fitness according to the process model. For WFM/BPM systems, we can expect to have perfect fitness for all the process executions. However, another application of conformance checking is the measurement of precision. A model is precise when it does not allow for extra behavior, i.e., behavior that does not appear in the event data. Models can have low precision when they are flexible about the execution sequence of activities. Hence, measuring precision can provide a measure for the “flexibility” of the process model. A popular measure for precision is proposed in [5].

2) Decision Mining: The application of a decision mining technique makes it possible to enrich the model with execution guards that are extracted automatically from the event data. These are conditions that are required in order to execute a path in the model. Hence, decision mining helps to reduce the amount of behavior allowed in the process model by adapting the model towards the guards that are discovered by the technique. A mature approach for decision mining is proposed in [2]. On the other hand, BPMN models are often already decorated with execution guards that are defined in the design phase. Hence, decision mining could end up finding exactly the same guards without adding anything new. Another problem is the discovery of trivial guards, or guards that overfit the data. A careful selection of the guards is necessary after performing the analysis.

3) Concept Drift Analysis: A concept drift analysis identifies the points in time where the execution of the process changes. Different types of concept drifts exist: sudden drifts (where the process becomes immediately significantly different), gradual drifts and seasonal drifts. An approach for the detection of concept drifts is described in [6]. While the technique is interesting, many concept drift points are already known in the context of WfMSs such as Camunda: for example, the underlying BPMN schema changes, or there are differences in the execution of a process between day and night.

4) Prediction: Given an incomplete process execution, it may be useful to estimate the remaining execution time based on historical executions. Several approaches have been proposed [7], [8]; however, in our experiments the quality of predictions on top of real-life datasets is still not completely satisfying. Moreover, this paper only covers the extraction of complete (historical) process instances.

5) Other Analyses: In this category, we include social network analysis [9], which uses different metrics (such as the handover of work, the working together, and the similar activities metrics) to quantify the collaboration between the different organizational resources. Model enhancement with frequency and performance metrics is particularly important to identify the bottlenecks of the process (from a performance point of view) and the most frequent paths.

IV. SET-UP OF THE CONNECTOR

The connector presented in this section is completely open-source and is available at
https://github.com/Javert899/incremental-camunda-parquet-exporter. A completely working demo environment can be easily obtained by using docker-compose inside the folder of the project¹. In the prepared environment, there are:
• A PostgreSQL relational database that supports the Camunda BPM engine and is exposed at port 5432.
• The Camunda BPM engine, which runs at port 8080. The Camunda interface can be reached at http://localhost:8080/camunda-welcome/index.html. The installation contains some demonstrative models and event data that can be extracted by the connector and used for process mining analysis.
• The connector, which is written in Python and is configured to reach the PostgreSQL database (for the extraction of the event data) and the Camunda BPM docker container (to extract the BPMN models).
• An open-source process mining solution [10], [11], with its own event log database, that is reachable at port 80, with admin/admin as access credentials for the interface. The services and the web interface offer the logs for all the processes contained in Camunda. In the demo interface, a single process is offered to the user.

The processes contained in Camunda are offered, through the connector, in the web interface, which graphically supports the following operations:
• Process Discovery: discovery of a directly-follows graph, and of a process tree or Petri net using the inductive miner algorithm. While the model itself is already known, the frequency and performance information is important to understand which parts of the process are more critical for key performance indicators and service level agreements.
• Cases Exploration: understanding which cases have a longer duration, and which are the events of such cases.
• Social Network Analysis: shows the interaction between the organizational resources using the Camunda BPM engine through some classic metrics.

The deployment of the connector through docker-compose is integrated with the process mining tool, but the event log is available for usage in other process mining platforms. An example log that is extracted by the technique is available at http://www.alessandroberti.it/invoice.xes.

¹ The command docker-compose up starts the docker containers that are referenced in the docker-compose.yml file.

V. CONCLUSION

In this paper, we presented a tool to extract event logs and process models from the Camunda system and analyzed the applicability of process mining tools on such models and logs. Thereby, process mining comes into reach of all organizations using Camunda with almost no effort. The extractor we implemented was also integrated with the open-source process mining tool PM4Py, along with instructions on how to deploy a complete environment that contains Camunda supported by the PostgreSQL database, the process mining tool, and the extractor. The deployment shows how Camunda can be extended with process mining capabilities in a straightforward way. While the PM4Py deployment is for demonstrative purposes, the extractor can be used on real-life deployments of Camunda and combined with any process mining tool. An example use case for our tool is proposed at http://www.alessandroberti.it/only_appendix.pdf.

Next to process discovery and conformance checking, our integration makes it possible to discover the bottlenecks of the processes and to identify execution guards that are not contained in the process model, but implicitly assumed by the resources performing the process. Thereby, the analysis can help to improve the quality of the BPMN model. Other analyses, such as the detection of concept drifts and the prediction of the remaining time, can also be useful.

As a result of this project, process mining techniques have been integrated in the viadee Unternehmensberatung AG software stack. The project located at https://github.com/viadee/camunda-kafka-polling-client proposes an implementation of a polling client on top of the Apache Kafka event streaming platform to poll data from Camunda, and the project located at https://github.com/viadee/bpmn.ai proposes a data preparation pipeline for such process data.

REFERENCES

[1] W. van der Aalst, Process mining: discovery, conformance and enhancement of business processes. Springer, 2011, vol. 2.
[2] M. De Leoni and W. van der Aalst, “Data-aware process mining: discovering decisions in processes using alignments,” in Proceedings of the 28th annual ACM symposium on applied computing. ACM, 2013, pp. 1454–1461.
[3] A. Rozinat, M. Wynn, W. van der Aalst, A. Ter Hofstede, and C. Fidge, “Workflow simulation for operational decision support using YAWL and ProM,” BPM Center Report BPM-08-04, BPMcenter.org, vol. 298, pp. 302–306, 2008.
[4] R. M. Dijkman, M. Dumas, and C. Ouyang, “Semantics and analysis of business process models in BPMN,” Information and Software Technology, vol. 50, no. 12, pp. 1281–1294, 2008.
[5] J. Muñoz-Gama and J. Carmona, “A fresh look at precision in process conformance,” in International Conference on Business Process Management. Springer, 2010, pp. 211–226.
[6] R. J. C. Bose, W. van der Aalst, I. Žliobaitė, and M. Pechenizkiy, “Handling concept drift in process mining,” in International Conference on Advanced Information Systems Engineering. Springer, 2011, pp. 391–405.
[7] M. Polato, A. Sperduti, A. Burattin, and M. de Leoni, “Time and activity sequence prediction of business process instances,” Computing, vol. 100, no. 9, pp. 1005–1031, 2018.
[8] N. Tax, I. Verenich, M. La Rosa, and M. Dumas, “Predictive business process monitoring with LSTM neural networks,” in International Conference on Advanced Information Systems Engineering. Springer, 2017, pp. 477–492.
[9] W. van der Aalst, H. A. Reijers, and M. Song, “Discovering social networks from event logs,” Computer Supported Cooperative Work (CSCW), vol. 14, no. 6, pp. 549–593, 2005.
[10] A. Berti, S. J. van Zelst, and W. van der Aalst, “Process Mining for Python (PM4Py): Bridging the Gap Between Process- and Data Science,” in ICPM Demo Track (CEUR 2374), 2019, pp. 13–16.
[11] A. Berti, S. van Zelst, and W. van der Aalst, “PM4Py Web Services: Easy Development, Integration and Deployment of Process Mining Features in any Application Stack,” in BPM Demo Track, 2019.
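
As a rough illustration of the event log extraction described in Section II, the following Python sketch builds one event log per process definition key and supports an incremental second run. It is not the code of the connector. Several things are assumptions made only for this example: an in-memory SQLite table stands in for the Camunda database, the column names are written in uppercase with a trailing underscore as in Camunda's history schema, the function names extract_event_logs and demo_connection are hypothetical, the demo rows are invented, and the end timestamp is used as the watermark for the incremental extraction. The sketch also filters on completed activity instances, which is a simplification of restricting the log to completed process executions.

```python
import sqlite3
from collections import defaultdict

# Select completed activity instances that ended after a given watermark.
QUERY = """
SELECT PROC_DEF_KEY_, PROC_INST_ID_, ACT_NAME_, ACT_TYPE_,
       START_TIME_, END_TIME_, ASSIGNEE_
FROM ACT_HI_ACTINST
WHERE END_TIME_ IS NOT NULL
  AND END_TIME_ > ?
ORDER BY PROC_INST_ID_, START_TIME_
"""


def extract_event_logs(conn, since=""):
    """Return {process key: [event, ...]} for events that ended after `since`.

    `since` is an ISO-8601 string; the empty string selects every event.
    """
    logs = defaultdict(list)
    for key, case_id, name, act_type, start, end, assignee in conn.execute(QUERY, (since,)):
        logs[key].append({
            "case_id": case_id,     # process instance identifier
            "activity": name,       # name of the BPMN element
            "type": act_type,       # type of the BPMN element
            "start": start,
            "end": end,
            "resource": assignee,
        })
    return dict(logs)


def demo_connection():
    """Create an in-memory stand-in for the history table (invented rows)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ACT_HI_ACTINST ("
        "PROC_DEF_KEY_ TEXT, PROC_INST_ID_ TEXT, ACT_NAME_ TEXT, ACT_TYPE_ TEXT, "
        "START_TIME_ TEXT, END_TIME_ TEXT, ASSIGNEE_ TEXT)"
    )
    rows = [
        ("invoice", "c1", "Assign Approver", "userTask",
         "2020-01-01T09:00:00", "2020-01-01T09:30:00", "demo"),
        ("invoice", "c1", "Approve Invoice", "userTask",
         "2020-01-01T09:30:00", "2020-01-01T10:00:00", "john"),
        # Still running: no end timestamp, so it is skipped.
        ("invoice", "c2", "Assign Approver", "userTask",
         "2020-01-02T09:00:00", None, "demo"),
        ("review", "c3", "Review Invoice", "userTask",
         "2020-01-03T09:00:00", "2020-01-03T11:00:00", "mary"),
    ]
    conn.executemany("INSERT INTO ACT_HI_ACTINST VALUES (?, ?, ?, ?, ?, ?, ?)", rows)
    return conn


conn = demo_connection()
full = extract_event_logs(conn)                                # first extraction
delta = extract_event_logs(conn, since="2020-01-02T00:00:00")  # later extraction
```

In a real deployment the same query shape would run against the PostgreSQL database backing Camunda, with the watermark of the previous run stored between extractions; whether the end timestamp is the right watermark depends on how events are written, so this choice should be checked against the actual connector.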