<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>An Open-Source Integration of Process Mining Features Into the Camunda Workflow Engine: Data Extraction and Challenges</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Alessandro Berti</string-name>
          <email>a.berti@pads.rwth-aachen.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Wil van der Aalst</string-name>
          <email>wvdaalst@pads.rwth-aachen.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>David Zang</string-name>
          <email>david.zang@viadee.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Magdalena Lang</string-name>
          <email>magdalena.lang@viadee.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Process and Data Science Department, RWTH Aachen University</institution>
          ,
          <addr-line>Lehrstuhl für Informatik 9, 52074 Aachen</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Process mining provides techniques to improve the performance and compliance of operational processes. Although the term “workflow mining” is sometimes used, the application of process mining in the context of Workflow Management (WFM) and Business Process Management (BPM) systems is limited. The main reason is that WFM/BPM systems control the process, leaving less room for flexibility and the corresponding deviations. However, as this paper shows, it is easy to extract event data from systems like Camunda, one of the leading open-source WFM/BPM systems. Moreover, although the respective process engines control the process flow, process mining can still provide valuable insights, such as the analysis of the performance of the paths and the mining of decision rules. This demo paper presents a process mining connector to Camunda that extracts event logs and process models, allowing for the application of existing process mining tools. We also analyze the added value of different process mining techniques in the context of Camunda and discuss a subset of techniques that nicely complements the process intelligence capabilities of Camunda. Through this demo paper, we hope to boost the use of process mining among Camunda users.</p>
        <p>Index Terms: Process Mining; Workflow Management; Data Extraction and Preprocessing; Process Engine</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>I. INTRODUCTION</title>
      <p>
        The vast majority of business processes (including
enterprise resource planning, customer relationship management,
document management) are nowadays supported by
information systems. These systems manage (but not always regulate)
the execution of a business process, and record event data
with fine detail about each step of the process. In this context,
process mining [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] makes it possible to improve operational processes by
exploiting the event data recorded by such systems. An event
log can be extracted from an information system’s database
in order to apply the process mining algorithms. An event
log contains event data of multiple executions of the business
process. Process mining techniques include: process discovery,
i.e., the automated discovery of a process model from event
data; conformance checking, i.e., the comparison between a
process model and the event data; model enhancement, i.e.,
the enrichment of the model with additional perspectives (for
example, execution guards [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]), prediction and simulation
algorithms. Open-source software supporting process mining
includes ProM, APROMORE and PM4Py.
      </p>
      <p>
        Process mining techniques have been applied to
Workflow Management (WFM) and Business Process Management
(BPM) systems. There are connectors to the YAWL WFM
system [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] and some other WFM/BPM systems, including
Signavio, Bizagi, and Bonita, that make it possible to extract the event
data and operate on the process models contained in such
systems. This paper focuses on Camunda and is a result of a
collaborative project between RWTH Aachen University
and viadee Unternehmensberatung AG. Previously, there was
no open-source connector to extract logs useful for process
mining purposes from Camunda, although Camunda is
open source and holds detailed event data. Therefore, we developed
and evaluated such a connector, and viadee has integrated
event log extraction techniques in its software stack.
      </p>
      <p>Camunda is widely used, e.g., by Deutsche Telekom,
Warner Music, Allianz, DB, Zalando and Generali. Camunda
uses the BPMN 2.0 notation for modeling. Among the main
selling points of Camunda are high throughput and
collaboration and integration possibilities. Camunda can be easily
integrated with different information systems, business
intelligence and big data systems such as QlikView, Apache Spark
and Kafka. This explains our goal to provide process mining
for the large Camunda user base.</p>
      <p>In this demo paper, (A) we present our implementation of a
process mining extractor for Camunda, that is able to extract
a set of event logs for the processes executed by Camunda,
and (B) we discuss the existing process mining techniques that
complement the business intelligence capabilities of Camunda.</p>
      <p>Figure 1 provides an overview of the approach implemented
in the paper. The extractor is publicly available. For
demonstrative purposes, it is integrated with a graphical interface
based on PM4Py that offers basic process mining functions.</p>
      <p>Moreover, the techniques analyzed in this paper are available
in open-source software.</p>
      <p>The remainder of this demo paper is organized as follows.
Section II describes the basic structure of the Camunda
database, along with a methodology for extracting event logs
and process models from the Camunda database. Section III
discusses the added value of process mining techniques for
Camunda users. Section IV describes the set-up of the tool.
Finally, Section V concludes the paper.</p>
      <p>Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).</p>
    </sec>
    <sec id="sec-2">
      <title>II. EXTRACTING EVENT LOGS AND PROCESS MODELS FROM CAMUNDA</title>
      <p>In this section, we will focus on how to extract logs
containing the historical executions of the processes supported
by Camunda, and how to extract the process models of such
processes.</p>
      <p>1) Extracting Event Logs: The extraction is done directly
at the database level. The Camunda workflow engine
supports different relational databases (e.g., PostgreSQL, Oracle,
MySQL). We will focus exclusively on the completed
executions, and ignore ongoing executions.</p>
      <p>The table containing the historical executions of the
processes is the ACT_HI_ACTINST table. The rows of this table
are the events thrown by Camunda. The table contains all the
basic information that is needed to extract event logs:</p>
      <list list-type="bullet">
        <list-item><p>The identifier of the process that is executed is stored
in the proc_def_key column. This column contains as
many distinct values as there are processes executed via the
Camunda process engine.</p></list-item>
        <list-item><p>The identifier of the process execution (case ID) is stored
in the proc_inst_id column.</p></list-item>
        <list-item><p>The name of the BPMN element that is executed via the
Camunda process engine is stored in the act_name
column.</p></list-item>
        <list-item><p>The type of the BPMN element is stored in the
act_type column.</p></list-item>
        <list-item><p>The start and end timestamps are stored in the
start_time and end_time columns, respectively.</p></list-item>
        <list-item><p>The identifier of the resource that performs the event is
stored in the assignee column.</p></list-item>
      </list>
      <p>These attributes can be useful to investigate the process more
thoroughly, also for predictive analyses.</p>
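      <p>The column list above maps naturally onto a single SQL query. The following is a minimal sketch, not the connector's actual code: it uses an in-memory SQLite table as a stand-in for ACT_HI_ACTINST, the column spellings follow the text (a real Camunda schema may spell them differently, for example in upper case with a trailing underscore), the sample rows are invented, and "completed execution" is approximated as a case without any open activity instance.</p>
      <preformat>
```python
import sqlite3
from collections import defaultdict

# Stand-in for the Camunda history table. Column names follow the text;
# verify them against the real schema before reusing the query.
SCHEMA = """
CREATE TABLE act_hi_actinst (
    proc_def_key TEXT, proc_inst_id TEXT, act_name TEXT, act_type TEXT,
    start_time TEXT, end_time TEXT, assignee TEXT
)
"""

# Hypothetical sample data: case c2 still has a running user task.
ROWS = [
    ("invoice", "c1", "Start", "startEvent", "2020-01-01T09:00:00", "2020-01-01T09:00:00", None),
    ("invoice", "c1", "Approve Invoice", "userTask", "2020-01-01T09:00:00", "2020-01-01T10:30:00", "john"),
    ("invoice", "c1", "End", "noneEndEvent", "2020-01-01T10:30:00", "2020-01-01T10:30:00", None),
    ("invoice", "c2", "Start", "startEvent", "2020-01-02T08:00:00", "2020-01-02T08:00:00", None),
    ("invoice", "c2", "Approve Invoice", "userTask", "2020-01-02T08:00:00", None, "mary"),
    ("review", "c3", "Start", "startEvent", "2020-01-03T11:00:00", "2020-01-03T11:00:00", None),
]

# Assumption: a case counts as completed when none of its rows lacks an end time.
QUERY = """
SELECT proc_def_key, proc_inst_id, act_name, act_type, start_time, end_time, assignee
FROM act_hi_actinst
WHERE proc_inst_id NOT IN (
    SELECT proc_inst_id FROM act_hi_actinst WHERE end_time IS NULL
)
ORDER BY proc_def_key, proc_inst_id, start_time, end_time
"""


def extract_event_logs(conn):
    """Return {process key: {case id: [event dict, ...]}} for finished cases."""
    logs = defaultdict(lambda: defaultdict(list))
    columns = ["act_name", "act_type", "start_time", "end_time", "assignee"]
    for proc_key, case_id, *values in conn.execute(QUERY):
        logs[proc_key][case_id].append(dict(zip(columns, values)))
    return {key: dict(cases) for key, cases in logs.items()}


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
conn.executemany("INSERT INTO act_hi_actinst VALUES (?, ?, ?, ?, ?, ?, ?)", ROWS)
event_logs = extract_event_logs(conn)
```
      </preformat>
      <p>Grouping first by process key and then by case identifier yields one event log per process, which can then be serialized to whatever format the downstream process mining tool expects.</p>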
      <p>An important point is that the traversal of gateways
and of internal or boundary events is also included in the event log.
So, not only the tasks are recorded, but also the exact path followed through
the model. This can simplify the frequency or performance
decoration of the process model: performing token-based
replay or alignments to find the path that is followed is not
necessary. A postprocessing step is only necessary when
the execution of tasks needs to be analyzed.</p>
      <p>We will present the implementation of a connector in
Section IV. A property of the connector is that it is incremental:
the first extraction retrieves all the events recorded so far,
while each following extraction retrieves only the
events that have been inserted since the previous extraction. This
keeps the log up to date while keeping the workload low.</p>
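      <p>One way to obtain this incremental behavior is to remember a watermark between runs. The sketch below is an assumption about how such bookkeeping could look, not a description of the connector's internals: it stores the largest end timestamp seen so far and asks only for rows that ended later. The table is an in-memory SQLite stand-in and the rows are invented.</p>
      <preformat>
```python
import sqlite3


def extract_new_events(conn, last_end_time=None):
    """Return (rows, watermark) for activity instances finished after the watermark.

    Passing None performs the initial full extraction. Rows that share the exact
    watermark timestamp but were inserted later would be missed, so a production
    version would need a tie-breaker such as a row identifier.
    """
    sql = (
        "SELECT proc_inst_id, act_name, end_time FROM act_hi_actinst "
        "WHERE end_time IS NOT NULL AND end_time > ? ORDER BY end_time"
    )
    # The empty string sorts before every ISO 8601 timestamp string.
    rows = conn.execute(sql, (last_end_time or "",)).fetchall()
    watermark = rows[-1][2] if rows else last_end_time
    return rows, watermark


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE act_hi_actinst (proc_inst_id TEXT, act_name TEXT, end_time TEXT)")
conn.executemany(
    "INSERT INTO act_hi_actinst VALUES (?, ?, ?)",
    [
        ("c1", "Start", "2020-01-01T09:00:00"),
        ("c1", "Approve Invoice", "2020-01-01T10:30:00"),
    ],
)

# First run: everything recorded so far.
first_batch, watermark = extract_new_events(conn)

# A new event arrives, and the second run returns only that event.
conn.execute("INSERT INTO act_hi_actinst VALUES (?, ?, ?)", ("c1", "End", "2020-01-01T11:00:00"))
second_batch, watermark = extract_new_events(conn, watermark)
```
      </preformat>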
      <p>2) Extracting Process Models: Aside from event logs, we
can also extract the BPMN models of the processes
supported by the workflow engine. In a Tomcat distribution of
Camunda, each process supported by Camunda has its own
folder in PATH-TO-CAMUNDA-SERVER/webapps/. For
example, if a process is named invoice, its corresponding folder is
PATH-TO-CAMUNDA-SERVER/webapps/invoice. To extract
the BPMN model associated with the invoice process, the content
of the
PATH-TO-CAMUNDA-SERVER/webapps/invoice/WEB-INF/classes folder should be taken. Alternatively,
the process diagram can be retrieved by querying the REST API.</p>
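      <p>For the REST route, a small helper might look as follows. This is a sketch under stated assumptions rather than the connector's code: it assumes the engine exposes a resource of the form process-definition/key/KEY/xml below a base URL such as http://localhost:8080/engine-rest, and that the JSON response carries the model in a bpmn20Xml field. Both should be checked against the REST API reference of the Camunda version in use.</p>
      <preformat>
```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def process_xml_url(base_url, process_key):
    """Build the URL of the (assumed) REST resource that returns the BPMN XML."""
    return f"{base_url.rstrip('/')}/process-definition/key/{quote(process_key, safe='')}/xml"


def parse_bpmn_xml(payload):
    """Pull the BPMN 2.0 XML string out of the JSON response body."""
    # Assumption: the model is delivered in a field named "bpmn20Xml".
    return json.loads(payload)["bpmn20Xml"]


def fetch_bpmn_xml(base_url, process_key, timeout=10):
    """Download the BPMN model for a process key from a running engine."""
    with urlopen(process_xml_url(base_url, process_key), timeout=timeout) as response:
        return parse_bpmn_xml(response.read().decode("utf-8"))
```
      </preformat>
      <p>With a running engine, a call such as fetch_bpmn_xml("http://localhost:8080/engine-rest", "invoice") would return the XML text, which can then be written to a .bpmn file and imported into a process mining tool.</p>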
      <p>The extracted BPMN models can be imported into different
process mining tools. In order to perform analyses such as
decision mining and conformance checking (see Section III),
the BPMN model should be converted to a Petri net model.
This is difficult for many constructs (for example, OR-joins
and OR-splits, swimlanes, subprocesses) and thus can lead to
problems for complex real-life processes. An overview of the
problems of converting a BPMN model to a Petri net
is found in [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
      <p>Basically, an event log is created for each distinct value
of the proc_def_key column. The resulting table for an
individual process is enough to analyze the control flow of
the process and its bottlenecks. Other attributes at the event
level can be obtained by merging the ACT_HI_ACTINST table
with the ACT_HI_DETAIL table. The latter contains a row
for each distinct attribute that is associated with an event.</p>
      <p>III. PROCESS MINING ON TOP OF CAMUNDA</p>
      <p>In the previous section, we have described an approach
to extract event logs for the different processes supported
by Camunda. This enables the application of several process
mining tools and techniques. In this section, we want to
analyze which process mining techniques are most useful in
the context of the Camunda processes. The techniques are
implemented and released as open-source software, including
the one based on PM4Py presented in Section IV. Table I
provides an overview of the approaches, along with their pros
and cons.</p>
      <p>1) Process Discovery and Conformance Checking: The two
most popular process mining disciplines are process discovery
and conformance checking. The scope of application of
process discovery is rather limited, since the event data contained
in the database is regulated by the process models inserted in
Camunda. A BPMN model is also a formal model that enables
the application of conformance checking techniques. For less
regulated processes, the goal of conformance checking is to
identify deviations from the process model, and the executions
of the process are evaluated by their fitness according to the
process model. For WFM/BPM systems, we can expect to have
perfect fitness for all the process executions. However, another
application of conformance checking is the measurement of
precision. A model is precise when it does not allow for extra
behavior, i.e., behavior that does not appear in the event data.
Models can have low precision when they allow flexibility in the
execution order of activities. Hence, measuring precision
can provide a measure for the “flexibility” of the process
model. A popular measure for precision is proposed in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
      <p>2) Decision Mining: The application of a decision mining
technique makes it possible to enrich the model with execution guards
that are extracted automatically from the event data. These are
conditions that are required in order to execute a path in the
model. Hence, decision mining helps to reduce the amount of
behavior allowed in the process model by adapting the
model to the guards that are discovered by the technique.
A mature approach for decision mining is proposed in [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. On
the other hand, BPMN models are often already decorated
with execution guards that are defined in the design phase.
Hence, decision mining could end up finding exactly the same
guards without adding anything new. Another problem is the
discovery of trivial guards, or guards that overfit the data. A
careful selection of the guards is necessary after performing
the analysis.</p>
      <p>
        3) Concept Drift Analysis: A concept drift analysis makes it possible
to identify the points in time where the execution of the
process changes. Different types of concept drifts exist: sudden
drifts (where the process immediately becomes significantly
different), gradual drifts and seasonal drifts. An approach for
the detection of concept drifts is described in [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. While the
technique is interesting, many concept drift points are already
known in the context of WFM systems such as Camunda: for example, the
underlying BPMN schema changes, or there are differences in
the execution of a process between day and night.
      </p>
      <p>
        4) Prediction: Given an incomplete process execution, it
may be useful to estimate the remaining execution time
based on historical executions. Several approaches have been
proposed [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]; however, in our experiments the quality of
predictions on real-life datasets is still not completely
satisfactory. Moreover, this paper only covers the extraction of
complete (historical) process instances.
      </p>
      <p>
        5) Other Analyses: In this category, we include social
network analysis [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], which uses different metrics (such as handover of work,
working together, and similar activities) to quantify
the collaboration between the different
organizational resources. Model enhancement with frequency and
performance metrics is particularly important to identify the
bottlenecks of the process (from a performance point of view)
and the most frequent paths.
      </p>
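      <p>To make the handover-of-work idea concrete, the following sketch counts how often one resource directly passes a case on to another. It is a deliberately simplified variant (direct succession only, events without an assignee skipped, self-handovers ignored) and not the full family of metrics from the cited work; the activity and resource names are invented.</p>
      <preformat>
```python
from collections import Counter


def handover_of_work(cases):
    """Count direct handovers between resources over all cases.

    `cases` maps a case id to its time-ordered list of (activity, resource)
    pairs; automatic steps carry None as resource and are skipped.
    """
    handovers = Counter()
    for events in cases.values():
        resources = [resource for _, resource in events if resource is not None]
        for source, target in zip(resources, resources[1:]):
            if source != target:  # simplification: ignore self-handovers
                handovers[(source, target)] += 1
    return handovers


# Hypothetical sample cases.
cases = {
    "c1": [("Start", None), ("Assign Approver", "demo"),
           ("Approve Invoice", "john"), ("Prepare Bank Transfer", "mary")],
    "c2": [("Start", None), ("Assign Approver", "demo"),
           ("Approve Invoice", "john"), ("Review Invoice", "john")],
}
network = handover_of_work(cases)
```
      </preformat>
      <p>The resulting counter can be read as a weighted directed graph whose nodes are the values of the assignee column.</p>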
    </sec>
    <sec id="sec-3">
      <title>IV. SET-UP OF THE CONNECTOR</title>
      <p>The connector presented in this section is completely
open-source and is available at
https://github.com/Javert899/incremental-camunda-parquetexporter. A fully working demo environment can be
easily obtained by using docker-compose inside the folder of
the project<sup>1</sup>. The prepared environment contains:</p>
      <list list-type="bullet">
        <list-item><p>A PostgreSQL relational database that supports the
Camunda BPM engine and is exposed at port 5432.</p></list-item>
        <list-item><p>The Camunda BPM engine, which is running at port 8080.
The Camunda interface can be reached at
http://localhost:8080/camunda-welcome/index.html. The installation
contains some demonstrative models and event data that can
be extracted by the connector and used for process mining
analysis.</p></list-item>
        <list-item><p>The connector, which is written in Python
and is configured to reach the PostgreSQL database (for
the extraction of the event data) and the Camunda BPM
Docker container (to extract the BPMN models).</p></list-item>
        <list-item><p>An open-source process mining solution [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], with
its own event log database, which is reachable at port
80 and provides admin/admin as access credentials for the
interface. The services and the web interface offer
the logs for all the processes contained in Camunda. In
the demo interface, a single process is offered to the user.</p></list-item>
      </list>
      <p>The processes contained in Camunda are offered, through
the connector, in the web interface, which graphically supports
the following operations:</p>
      <list list-type="bullet">
        <list-item><p>Process Discovery: discovery of a directly-follows graph, and of a
process tree or Petri net using the inductive
miner algorithm. While the model itself is already known,
the frequency and performance information are important
to understand which parts of the process are more critical
for key performance indicators and service level
agreements.</p></list-item>
        <list-item><p>Cases Exploration: understanding which cases have
a longer duration, and which events belong to such cases.</p></list-item>
        <list-item><p>Social Network Analysis: shows the interaction between
the organizational resources using the Camunda BPM
engine through some classic metrics.</p></list-item>
      </list>
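      <p>The frequency and performance decoration of a directly-follows graph can be illustrated with a few lines of plain Python. This sketch does not use the PM4Py API; it simply counts, for each pair of consecutive activities in a case, how often the pair occurs and how long the case waited on average between the end of the first and the start of the second. The event tuples and activity names are invented.</p>
      <preformat>
```python
from collections import defaultdict
from datetime import datetime


def directly_follows(cases):
    """Return {(a, b): (frequency, mean seconds from end of a to start of b)}.

    `cases` maps a case id to a list of (activity, start, end) tuples with
    ISO 8601 timestamps; events are ordered by their start timestamp.
    """
    waits = defaultdict(list)
    for events in cases.values():
        ordered = sorted(events, key=lambda event: event[1])
        for (a, _, a_end), (b, b_start, _) in zip(ordered, ordered[1:]):
            delta = datetime.fromisoformat(b_start) - datetime.fromisoformat(a_end)
            waits[(a, b)].append(delta.total_seconds())
    return {pair: (len(values), sum(values) / len(values)) for pair, values in waits.items()}


# Hypothetical sample cases.
cases = {
    "c1": [("Approve Invoice", "2020-01-01T09:00:00", "2020-01-01T10:00:00"),
           ("Prepare Bank Transfer", "2020-01-01T10:30:00", "2020-01-01T11:00:00")],
    "c2": [("Approve Invoice", "2020-01-02T09:00:00", "2020-01-02T09:30:00"),
           ("Prepare Bank Transfer", "2020-01-02T11:00:00", "2020-01-02T11:30:00")],
}
dfg = directly_follows(cases)
```
      </preformat>
      <p>Edges with a high mean waiting time point to candidate bottlenecks, while edge frequencies show the most common paths.</p>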
      <p>The deployment of the connector through docker-compose is
integrated with the process mining tool, but the event log
is available for usage in other process mining platforms. An
example log that is extracted by the technique is available at
http://www.alessandroberti.it/invoice.xes.</p>
      <p><sup>1</sup> The command docker-compose up starts the Docker containers that are
referenced in the docker-compose.yml file.</p>
      <p>V. CONCLUSION</p>
      <p>In this paper, we presented a tool to extract event logs
and process models from the Camunda system and analyzed
the applicability of process mining tools on such models
and logs. Thereby, process mining comes within reach of all
organizations using Camunda with almost no effort. The
extractor we implemented was also integrated with the
open-source process mining tool PM4Py, along with instructions
on how to deploy a complete environment that contains
Camunda supported by the PostgreSQL database, the process
mining tool, and the extractor. The deployment shows how
Camunda can be extended with process mining capabilities
in a straightforward way. While the PM4Py deployment is
for demonstrative purposes, the extractor can be used on
real-life deployments of Camunda and combined with any process
mining tool. An example use case for our tool is proposed at
http://www.alessandroberti.it/only_appendix.pdf.</p>
      <p>Next to process discovery and conformance checking, our
integration makes it possible to discover the bottlenecks of the processes
and to identify execution guards that are not contained in
the process model, but implicitly assumed by the resources
performing the process. Thereby, the analysis can help to
improve the quality of the BPMN model. Other analyses, such
as the detection of concept drifts and the prediction of the
remaining time, can also be useful.</p>
      <p>As a result of this project, process mining techniques have
been integrated in the viadee Unternehmensberatung AG
software stack. The project located at https://github.com/viadee/camunda-kafka-polling-client
proposes an implementation of
a polling client on top of the Apache Kafka event streaming
platform to poll data from Camunda, and the project located at
https://github.com/viadee/bpmn.ai proposes a data preparation
pipeline for such process data.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>W. van der Aalst</surname>
          </string-name>
          ,
          <article-title>Process mining: discovery, conformance and enhancement of business processes</article-title>
          . Springer,
          <year>2011</year>
          , vol.
          <volume>2</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>M. De Leoni</surname>
            and
            <given-names>W. van der Aalst,</given-names>
          </string-name>
          “
          <article-title>Data-aware process mining: discovering decisions in processes using alignments,”</article-title>
          <source>in Proceedings of the 28th annual ACM symposium on applied computing. ACM</source>
          ,
          <year>2013</year>
          , pp.
          <fpage>1454</fpage>
          -
          <lpage>1461</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>A.</given-names>
            <surname>Rozinat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wynn</surname>
          </string-name>
          , W. van der Aalst,
          <string-name>
            <given-names>A. Ter</given-names>
            <surname>Hofstede</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Fidge</surname>
          </string-name>
          , “
          <article-title>Workflow simulation for operational decision support using YAWL and ProM</article-title>
          ,
          <source>” BPM Center Report BPM-08-04</source>
          , BPMcenter. org, vol.
          <volume>298</volume>
          , pp.
          <fpage>302</fpage>
          -
          <lpage>306</lpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>R. M.</given-names>
            <surname>Dijkman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Dumas</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Ouyang</surname>
          </string-name>
          , “
          <article-title>Semantics and analysis of business process models in BPMN</article-title>,” <source>Information and Software Technology</source>
          , vol.
          <volume>50</volume>
          , no.
          <issue>12</issue>
          , pp.
          <fpage>1281</fpage>
          -
          <lpage>1294</lpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>J.</given-names>
            <surname>Muñoz-Gama</surname>
          </string-name>
          and J. Carmona, “
          <article-title>A fresh look at precision in process conformance</article-title>
          ,” in International Conference on Business Process Management. Springer,
          <year>2010</year>
          , pp.
          <fpage>211</fpage>
          -
          <lpage>226</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>R. J. C.</given-names>
            <surname>Bose</surname>
          </string-name>
          , W. van der Aalst, I. Žliobaitė, and
          <string-name>
            <given-names>M.</given-names>
            <surname>Pechenizkiy</surname>
          </string-name>
          , “
          <article-title>Handling concept drift in process mining</article-title>
          ,
          <source>” in International Conference on Advanced Information Systems Engineering</source>
          . Springer,
          <year>2011</year>
          , pp.
          <fpage>391</fpage>
          -
          <lpage>405</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>M.</given-names>
            <surname>Polato</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Sperduti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Burattin</surname>
          </string-name>
          , and M. de Leoni, “
          <article-title>Time and activity sequence prediction of business process instances</article-title>
          ,
          <source>” Computing</source>
          , vol.
          <volume>100</volume>
          , no.
          <issue>9</issue>
          , pp.
          <fpage>1005</fpage>
          -
          <lpage>1031</lpage>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>N.</given-names>
            <surname>Tax</surname>
          </string-name>
          , I. Verenich,
          <string-name>
            <given-names>M. La</given-names>
            <surname>Rosa</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Dumas</surname>
          </string-name>
          , “
          <article-title>Predictive business process monitoring with LSTM neural networks</article-title>
          ,
          <source>” in International Conference on Advanced Information Systems Engineering</source>
          . Springer,
          <year>2017</year>
          , pp.
          <fpage>477</fpage>
          -
          <lpage>492</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            W. van der Aalst,
            <given-names>H. A.</given-names>
            <surname>Reijers</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Song</surname>
          </string-name>
          , “
          <article-title>Discovering social networks from event logs</article-title>,” <source>Computer Supported Cooperative Work (CSCW)</source>
          , vol.
          <volume>14</volume>
          , no.
          <issue>6</issue>
          , pp.
          <fpage>549</fpage>
          -
          <lpage>593</lpage>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>A.</given-names>
            <surname>Berti</surname>
          </string-name>
          ,
          <string-name>
            <surname>S. J. van Zelst</surname>
          </string-name>
          , and W. van der Aalst, “
          <article-title>Process Mining for Python (PM4Py): Bridging the Gap Between Process-and Data Science,”</article-title>
          <source>in ICPM Demo Track (CEUR 2374)</source>
          ,
          <year>2019</year>
          , p.
          <fpage>13</fpage>
          -
          <lpage>16</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>A.</given-names>
            <surname>Berti</surname>
          </string-name>
          ,
          <string-name>
            <surname>S. van Zelst</surname>
          </string-name>
          , and W. van der Aalst, “PM4Py Web Services:
          <article-title>Easy Development, Integration and Deployment of Process Mining Features in any Application Stack,”</article-title>
          <source>in BPM Demo Track</source>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>