<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>sven.weinzierl@fau.de (S. Weinzierl); http://fg-bks.uni-koblenz.de/ (C. Drodt)</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>The Recomminder: A Decision Support Tool for Predictive Business Process Monitoring</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Christoph Drodt</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sven Weinzierl</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Martin Matzner</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Patrick Delfmann</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Institute for IS Research, Universität Koblenz-Landau</institution>
          ,
          <addr-line>Koblenz</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Institute of Information Systems, Friedrich-Alexander-Universität Erlangen-Nürnberg</institution>
          ,
          <addr-line>Nuremberg</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2021</year>
      </pub-date>
      <volume>000</volume>
      <fpage>0</fpage>
      <lpage>0002</lpage>
      <abstract>
        <p>Predictive business process monitoring (PBPM) provides a set of techniques to optimize the performance of operational business processes. Most recent PBPM techniques learn predictive models from historical event log data using machine learning (ML) algorithms. However, no single approach performs best across different event logs; the performance of a technique depends on the characteristics of the underlying event log. This paper demonstrates the decision support tool Recomminder. The main idea of our tool is to recommend an appropriate pre-processing procedure, ML algorithm, and hyper-parameter configuration for a new event log based on its characteristics. Our tool helps researchers better understand the relation between event log characteristics and ML-driven PBPM techniques, and it supports practitioners in developing effective PBPM techniques.</p>
      </abstract>
      <kwd-group>
        <kwd>Predictive Business Process Monitoring</kwd>
        <kwd>Machine Learning</kwd>
        <kwd>Business Process Management</kwd>
        <kwd>Process Mining</kwd>
        <kwd>Decision Support</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        In recent years, business process management (BPM) researchers have developed a plethora
of predictive business process monitoring (PBPM) techniques. PBPM techniques aim to predict
aspects such as next activities, process outcomes, or next timestamps in running business
processes. Based on these predictions, process stakeholders can proactively intervene in
running business processes. In doing so, they can improve the performance of
operational business processes by mitigating risks or avoiding failures before these occur, or by
exploiting opportunities in time [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>
        Most recent PBPM techniques generate predictions through predictive models learned from
historical event log data using machine learning (ML) algorithms [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Driven by the goal of achieving
accurate predictions, researchers have proposed many techniques with different data pre-processing procedures,
ML algorithms, and hyper-parameter configurations [
        <xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>
        ].
      </p>
      <p>
        However, PBPM research has shown that no single approach performs best across event logs
with different characteristics, even if predictive models are learned using deep learning (DL) [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
Instead, most approaches are optimized for a specific set or type of event logs
and often generalize poorly when applied to other event logs. Event logs
differ in their characteristics, e.g., the number of activities, the number of trace variants, or the
number of categorical data attributes. So far, little is known about the relation between event
log characteristics and ML-driven approaches in BPM research. As a consequence of
this gap, practitioners find it challenging to adopt an approach designed
and implemented by researchers, because their event logs may have characteristics
for which such an approach was not designed.
      </p>
      <p>To overcome this challenge, we demonstrate the decision support tool Recomminder in this
paper for the task of next activity prediction. The tool is structured into four
components: Log Analyzer, Trainer, Evaluator, and Recommender (see Fig. 1). For a given set
of event logs, the Log Analyzer component first determines the event log characteristics. Second,
the Trainer component learns predictive models for different combinations of pre-processing
procedures, ML algorithms, and hyper-parameter configurations. Third, the Evaluator tests the
predictive models and calculates ML metrics to assess the models’ predictive quality. Fourth, the
Recommender component selects, per event log, the predictive model with the highest predictive
quality and creates a meta predictive model in the form of a decision tree (DT) that learns the
mapping from the event log characteristics to the best-performing (non-meta) predictive models.
Finally, the meta predictive model can be applied to a new event log to determine an appropriate
pre-processing procedure, ML algorithm, and hyper-parameter configuration based on its
characteristics.</p>
      <p>Fig. 1: Overview of the Recomminder (image not reproduced). Figure labels: Training, Recommending, Event logs, New event log, Log Analyzer, Trainer, Evaluator, Recommender, Decision Support.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Backend</title>
      <p>In this section, we explain the structure of the Recomminder and elaborate on its technical
background. The presented tool is written in Python 3.8.2 and is designed as a Python package,
split into two main modules: backend (this section) and frontend (Section 3). The backend is
organized into four components, namely Log Analyzer, Trainer, Evaluator, and Recommender,
which are described in more detail in this section. Advanced users can access these
components and their modules by importing them and calling their methods in
their own code. Additionally, we designed two phases that represent the fundamental workflows
of the Recomminder: the offline phase (Training/Feeder) and the online phase (Recommending)
(see Fig. 1 and Section 1). From a coding perspective, these phases can be executed by calling
Recomminder.train() and Recomminder.recommend(). To make the features of the Recomminder
available to non-developers as well, we designed a web-based frontend that can be started via the
shell or command line. For more details about the frontend, please see Section 3.</p>
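      <p>The following Python sketch illustrates how the offline and online phases fit together. It is not the Recomminder's actual code: the class RecomminderSketch, the injected callables analyze, train_and_score, and fit_meta, and the toy scoring rules are hypothetical stand-ins for the Log Analyzer, the Trainer and Evaluator, and the DT meta model.</p>
      <preformat preformat-type="code"><![CDATA[
```python
# Hypothetical sketch of the two Recomminder workflows. RecomminderSketch and
# the injected callables are illustrative assumptions, not the package's API.
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Sequence

Features = Dict[str, float]
EventLog = Sequence[Sequence[str]]  # assumption: a log is a list of traces


@dataclass
class RecomminderSketch:
    analyze: Callable[[EventLog], Features]  # Log Analyzer
    train_and_score: Callable[[EventLog], Dict[str, float]]  # Trainer + Evaluator
    fit_meta: Callable[[List[Features], List[str]], Callable[[Features], str]]
    meta_model: Optional[Callable[[Features], str]] = field(default=None, init=False)

    def train(self, event_logs: List[EventLog]) -> None:
        """Offline phase: learn the mapping characteristics -> best configuration."""
        X, y = [], []
        for log in event_logs:
            scores = self.train_and_score(log)  # one score per configuration
            X.append(self.analyze(log))
            y.append(max(scores, key=scores.get))  # best configuration = target
        self.meta_model = self.fit_meta(X, y)

    def recommend(self, event_log: EventLog) -> str:
        """Online phase: map a new log's characteristics to a configuration."""
        if self.meta_model is None:
            raise RuntimeError("call train() before recommend()")
        return self.meta_model(self.analyze(event_log))


# Toy components for illustration only.
def analyze(log: EventLog) -> Features:
    return {"num_traces": float(len(log))}


def train_and_score(log: EventLog) -> Dict[str, float]:
    # Placeholder scores; a real Trainer/Evaluator would fit and test models.
    return {"lstm": float(len(log)), "random_forest": 3.0}


def fit_meta(X: List[Features], y: List[str]) -> Callable[[Features], str]:
    # Placeholder for the DT: 1-nearest neighbour on num_traces.
    pairs = list(zip(X, y))

    def predict(f: Features) -> str:
        return min(pairs, key=lambda p: abs(p[0]["num_traces"] - f["num_traces"]))[1]

    return predict


rec = RecomminderSketch(analyze, train_and_score, fit_meta)
rec.train([[["a", "b"]] * 2, [["a", "c"]] * 10])
print(rec.recommend([["a", "b"]] * 9))  # lstm
```
]]></preformat>
      <p>Injecting the components as callables keeps the two phases independent of any particular classifier or feature set, which mirrors the separation into Log Analyzer, Trainer, Evaluator, and Recommender.</p>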
      <p>Another important aspect is to allow other developers to extend this artifact, especially the
Trainer component, by implementing more classifiers. We developed a superclass to support
developers in implementing further classifiers. More details on executing, extending, and
installing the Recomminder can be found in the repository (see Section 4).</p>
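      <p>The actual superclass is documented in the repository. As a rough illustration only, a base class for pluggable classifiers could look like the sketch below; BaseClassifier, its fit, predict, and describe methods, and the toy MajorityClassifier subclass are hypothetical names, not the tool's real interface.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from abc import ABC, abstractmethod
from collections import Counter
from typing import Any, Dict, List, Sequence


class BaseClassifier(ABC):
    """Hypothetical superclass for Trainer classifiers (illustrative only)."""

    name: str = "base"

    def __init__(self, **hyper_params: Any) -> None:
        self.hyper_params = hyper_params

    @abstractmethod
    def fit(self, X: Sequence[Sequence[float]], y: Sequence[str]) -> "BaseClassifier":
        """Learn a predictive model from encoded prefixes X and next activities y."""

    @abstractmethod
    def predict(self, X: Sequence[Sequence[float]]) -> List[str]:
        """Predict the next activity for each encoded prefix."""

    def describe(self) -> Dict[str, Any]:
        # Shared helper inherited by every subclass, e.g. for storing results.
        return {"classifier": self.name, **self.hyper_params}


class MajorityClassifier(BaseClassifier):
    """Toy subclass: always predicts the most frequent next activity."""

    name = "majority"

    def fit(self, X, y):
        self.most_common_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.most_common_ for _ in X]


clf = MajorityClassifier(strategy="mode").fit([[0], [1], [2]], ["a", "b", "b"])
print(clf.predict([[5], [6]]))  # ['b', 'b']
print(clf.describe())  # {'classifier': 'majority', 'strategy': 'mode'}
```
]]></preformat>
      <p>Declaring fit and predict as abstract methods means a new classifier cannot be instantiated until both are implemented, while shared behaviour such as describe is written once in the superclass.</p>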
      <sec id="sec-2-1">
        <title>2.1. Log Analyzer</title>
        <p>
          For feature extraction, we consider three different types of event log
characteristics: event-log-based (e.g., concerning events, activities, and traces), process-model-based (e.g.,
concerning loops, noise, variants, and gateways), and process-context-based features (e.g.,
concerning categorical and numerical attributes). To extract some of these features, a
process discovery algorithm must first produce a process model. In this tool, we use PM4Py<sup>1</sup>’s
data structure for event logs and its Heuristic Miner [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] implementation to create a Petri net
from an event log file. The Heuristic Miner is a common discovery algorithm in process mining
(PM) that can, e.g., handle loops and noise in event log data [
          <xref ref-type="bibr" rid="ref7 ref8">7, 8</xref>
          ]. After feature extraction, the
results are stored in an SQLite database for later use.
        </p>
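        <p>To make the event-log-based and process-context-based characteristics more concrete, the sketch below computes a few of them from a plain Python representation of an event log and stores them in SQLite using the standard sqlite3 module. It assumes each trace is a list of event dictionaries with an "activity" key; the feature names, the table layout, and the rule for detecting numerical attributes are our own assumptions, and process-model-based features (which require the Heuristic Miner) are omitted.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import sqlite3
from typing import Dict, List

Event = Dict[str, object]  # assumption: each event has an "activity" key
Trace = List[Event]


def extract_log_features(log: List[Trace]) -> Dict[str, float]:
    """Event-log-based and process-context-based characteristics (sketch)."""
    events = [e for trace in log for e in trace]
    variants = {tuple(e["activity"] for e in trace) for trace in log}
    attributes = {k for e in events for k in e if k != "activity"}
    # Treat an attribute as numerical if every observed value is a number.
    numerical = {
        a for a in attributes
        if all(isinstance(e[a], (int, float)) for e in events if a in e)
    }
    return {
        "num_traces": float(len(log)),
        "num_events": float(len(events)),
        "num_activities": float(len({e["activity"] for e in events})),
        "num_variants": float(len(variants)),
        "avg_trace_length": len(events) / len(log) if log else 0.0,
        "num_categorical_attrs": float(len(attributes - numerical)),
        "num_numerical_attrs": float(len(numerical)),
    }


def store_features(conn: sqlite3.Connection, log_name: str, features: Dict[str, float]) -> None:
    """Persist one row per (log, feature) so later phases can reuse them."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS log_features "
        "(log_name TEXT, feature TEXT, value REAL, PRIMARY KEY (log_name, feature))"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO log_features VALUES (?, ?, ?)",
        [(log_name, name, value) for name, value in features.items()],
    )
    conn.commit()


# Tiny hypothetical event log with one categorical and one numerical attribute.
log = [
    [{"activity": "register", "channel": "web"}, {"activity": "approve", "amount": 120.0}],
    [{"activity": "register", "channel": "phone"}, {"activity": "reject", "amount": 80.0}],
    [{"activity": "register", "channel": "web"}, {"activity": "approve", "amount": 50.0}],
]
conn = sqlite3.connect(":memory:")
store_features(conn, "demo", extract_log_features(log))
print(extract_log_features(log)["num_variants"])  # 2.0
```
]]></preformat>
        <p>Storing features in a long (log, feature, value) table means new characteristics can be added without changing the schema.</p>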
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Trainer</title>
        <p>
          To determine the best classifier for a given event log file, a predictive model must be trained
for each classifier. In the Recomminder tool, we implemented two classifiers using existing
ML frameworks: Random Forest<sup>2</sup> [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] and LSTM<sup>3</sup> [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]. The presented tool also provides
methods for pre-processing the event log. These include event encoding methods (ordinal
and one-hot encoding for categorical attributes and min-max normalization for continuous
attributes) as well as sequence encoding techniques (window-based and index-based prefix
generation). In addition, we implemented a train-test split method to generate training and test
data sets. Before training, the best-fitting hyper-parameters are determined using
the Optuna<sup>4</sup> framework with 20 optimization rounds on a subset of the training set (90%
training and 10% test split). Finally, the results are stored in the database. After each classifier
has been trained, the model is transferred to the Evaluator.
        </p>
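        <p>The event and sequence encodings listed above can be sketched with the Python standard library as follows. This is a minimal illustration, not the tool's implementation; in particular, we assume that index-based prefix generation keeps every full prefix of a trace, whereas window-based generation keeps only the last w events of each prefix, and all function names are hypothetical.</p>
        <preformat preformat-type="code"><![CDATA[
```python
from typing import Dict, List, Optional, Sequence, Tuple


def ordinal_encode(values: Sequence[str]) -> Tuple[List[int], Dict[str, int]]:
    """Map each category to an integer (sorted order, an arbitrary choice)."""
    mapping = {v: i for i, v in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values], mapping


def one_hot_encode(values: Sequence[str]) -> List[List[int]]:
    """One binary column per category."""
    _, mapping = ordinal_encode(values)
    return [[1 if mapping[v] == j else 0 for j in range(len(mapping))] for v in values]


def min_max_normalize(values: Sequence[float]) -> List[float]:
    """Scale continuous values to [0, 1]; a constant column maps to 0."""
    low, high = min(values), max(values)
    span = high - low
    return [0.0 if span == 0 else (v - low) / span for v in values]


def prefixes_with_targets(
    trace: Sequence[str], window: Optional[int] = None
) -> List[Tuple[List[str], str]]:
    """Build (prefix, next activity) samples from one trace.

    window=None: index-based, every prefix trace[:k] is kept in full.
    window=w:    window-based, only the last w events of each prefix are kept.
    """
    samples = []
    for k in range(1, len(trace)):
        start = 0 if window is None else max(0, k - window)
        samples.append((list(trace[start:k]), trace[k]))
    return samples


print(one_hot_encode(["b", "a", "b"]))  # [[0, 1], [1, 0], [0, 1]]
print(min_max_normalize([10.0, 20.0, 15.0]))  # [0.0, 1.0, 0.5]
print(prefixes_with_targets(["a", "b", "c", "d"], window=2))
# [(['a'], 'b'), (['a', 'b'], 'c'), (['b', 'c'], 'd')]
```
]]></preformat>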
        <p><sup>1</sup> https://pm4py.fit.fraunhofer.de
<sup>2</sup> https://scikit-learn.org/
<sup>3</sup> https://www.tensorflow.org
<sup>4</sup> https://optuna.org</p>
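        <p>The hyper-parameter search described above follows a common pattern: split the data 90/10, evaluate an objective for a fixed number of trials, and keep the best configuration. The standard-library sketch below uses plain random search with 20 trials as a stand-in for Optuna (whose default sampler is TPE); the search space and the placeholder objective are hypothetical. With Optuna itself, the loop would roughly correspond to optuna.create_study(direction="maximize") followed by study.optimize(objective, n_trials=20), with the search space defined via trial.suggest_categorical inside the objective.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import random
from typing import Callable, Dict, Sequence, Tuple


def train_test_split(
    samples: Sequence, test_ratio: float = 0.1, seed: int = 0
) -> Tuple[list, list]:
    """Shuffle, then hold out `test_ratio` of the samples (at least one)."""
    items = list(samples)
    random.Random(seed).shuffle(items)
    n_test = max(1, round(len(items) * test_ratio))
    return items[n_test:], items[:n_test]


def random_search(
    objective: Callable[[Dict[str, object]], float],
    space: Dict[str, Sequence],
    n_trials: int = 20,
    seed: int = 0,
) -> Tuple[Dict[str, object], float]:
    """Evaluate n_trials random configurations and keep the best one."""
    rng = random.Random(seed)
    best_params: Dict[str, object] = {}
    best_score = float("-inf")
    for _ in range(n_trials):
        params = {name: rng.choice(list(choices)) for name, choices in space.items()}
        score = objective(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# 90% / 10% split of (a subset of) the training data.
train_part, valid_part = train_test_split(list(range(100)), test_ratio=0.1)


def objective(params: Dict[str, object]) -> float:
    # Placeholder: a real objective would fit a classifier on train_part with
    # `params` and return its accuracy on valid_part.
    return -abs(params["window"] - 3) - 0.001 * params["n_estimators"]


space = {"window": [1, 2, 3, 4, 5], "n_estimators": [50, 100, 200]}
best, score = random_search(objective, space, n_trials=20)
print(len(train_part), len(valid_part))  # 90 10
print(best, score)
```
]]></preformat>
        <p>Fixing the random seed makes both the split and the search reproducible, which helps when comparing classifiers across event logs.</p>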
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Evaluator</title>
        <p>This component calculates common metrics (accuracy, precision, recall, and F1 score)
and stores them in the database. Once all Trainer processes have finished, the best classifier
for each event log file is identified and used as the target value. In addition, the Evaluator gathers
the corresponding extracted features of each log, which serve as input samples. Together, these data form the
training set for the DT. The DT is trained by the Evaluator, and the resulting
model represents the meta predictive model for the Recommender. Finally, the Evaluator retrieves the feature
importance values of the trained meta model, which are the outcome of this component, and
stores a visualisation of the evaluation.</p>
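        <p>A minimal sketch of the Evaluator's two steps, computing the metrics and turning per-log scores into a training set for the meta model, is shown below. It assumes macro averaging for precision, recall, and F1 (the averaging scheme is not stated in the text) and F1 as the selection metric; the function names and the example scores are hypothetical.</p>
        <preformat preformat-type="code"><![CDATA[
```python
from typing import Dict, List, Sequence, Tuple


def classification_metrics(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, float]:
    """Accuracy plus macro-averaged precision, recall, and F1."""
    pairs = list(zip(y_true, y_pred))
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    n = len(labels)
    return {
        "accuracy": sum(t == p for t, p in pairs) / len(pairs),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


def build_meta_training_set(
    features_by_log: Dict[str, Dict[str, float]],
    scores_by_log: Dict[str, Dict[str, Dict[str, float]]],
    metric: str = "f1",
) -> Tuple[List[Dict[str, float]], List[str]]:
    """One row per event log: characteristics as input, best classifier as target."""
    X, y = [], []
    for log_name in sorted(scores_by_log):
        per_clf = scores_by_log[log_name]
        X.append(features_by_log[log_name])
        y.append(max(per_clf, key=lambda clf: per_clf[clf][metric]))
    return X, y


print(classification_metrics(["a", "a", "b", "b"], ["a", "b", "b", "b"]))
# Hypothetical scores, for illustration only.
X, y = build_meta_training_set(
    {"log1": {"num_variants": 12.0}, "log2": {"num_variants": 340.0}},
    {"log1": {"random_forest": {"f1": 0.81}, "lstm": {"f1": 0.74}},
     "log2": {"random_forest": {"f1": 0.62}, "lstm": {"f1": 0.70}}},
)
print(y)  # ['random_forest', 'lstm']
```
]]></preformat>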
      </sec>
      <sec id="sec-2-4">
        <title>2.4. Recommender</title>
        <p>As stated in the previous section, the Recommender component relies on a DT, which is trained
with the results from the Log Analyzer and Trainer. When the Recomminder.recommend()
method is called, the Recommender uses the Log Analyzer to extract the features of a given event log file
and feeds them to the trained DT model. The best-matching classifier is then returned.</p>
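        <p>To illustrate how a DT maps event log characteristics to a classifier, the sketch below trains a depth-1 decision tree (a decision stump with Gini impurity) and uses it in a recommend step. This is a deliberately simplified stand-in for the full DT meta model; the function names, the toy feature extractor, and the training data are hypothetical.</p>
        <preformat preformat-type="code"><![CDATA[
```python
from collections import Counter
from typing import Callable, Dict, List, Optional, Sequence

Features = Dict[str, float]


def gini(labels: Sequence[str]) -> float:
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


class DecisionStump:
    """Depth-1 decision tree: a simplified stand-in for the DT meta model."""

    def fit(self, X: List[Features], y: List[str]) -> "DecisionStump":
        self.default_ = Counter(y).most_common(1)[0][0]
        self.feature_: Optional[str] = None
        best = gini(y)
        for feature in X[0]:
            values = sorted({row[feature] for row in X})
            for a, b in zip(values, values[1:]):
                threshold = (a + b) / 2  # candidate split between neighbours
                left = [lab for row, lab in zip(X, y) if row[feature] <= threshold]
                right = [lab for row, lab in zip(X, y) if row[feature] > threshold]
                impurity = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
                if impurity < best:
                    best = impurity
                    self.feature_, self.threshold_ = feature, threshold
                    self.left_ = Counter(left).most_common(1)[0][0]
                    self.right_ = Counter(right).most_common(1)[0][0]
        return self

    def predict_one(self, features: Features) -> str:
        if self.feature_ is None:  # no split reduced the impurity
            return self.default_
        if features[self.feature_] <= self.threshold_:
            return self.left_
        return self.right_


def recommend(model: DecisionStump, extract: Callable[[list], Features], event_log: list) -> str:
    """Online phase: extract the characteristics, then query the meta model."""
    return model.predict_one(extract(event_log))


def toy_extract(log: List[List[str]]) -> Features:
    return {
        "num_variants": float(len({tuple(t) for t in log})),
        "num_activities": float(len({a for t in log for a in t})),
    }


# Hypothetical meta training set (characteristics -> best classifier).
X = [
    {"num_variants": 10.0, "num_activities": 5.0},
    {"num_variants": 15.0, "num_activities": 30.0},
    {"num_variants": 300.0, "num_activities": 8.0},
    {"num_variants": 500.0, "num_activities": 25.0},
]
y = ["random_forest", "random_forest", "lstm", "lstm"]
meta = DecisionStump().fit(X, y)
print(meta.feature_, meta.threshold_)  # num_variants 157.5
print(recommend(meta, toy_extract, [["a", "b"], ["a", "c"]]))  # random_forest
```
]]></preformat>
        <p>A shallow tree keeps the recommendation interpretable: the chosen split can be read directly as a rule over event log characteristics, which is also what the feature importance values summarize for deeper trees.</p>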
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Frontend</title>
      <p>In addition to the developer access level, we provide a rich, web-based frontend that allows
non-developers to use the Recomminder. To implement the frontend, this tool uses TurboGears2<sup>5</sup> for
content delivery, Bootstrap<sup>6</sup> to style the web pages, and jQuery<sup>7</sup> for asynchronous functions.
The navigation, located at the top of the page, is split into the two workflows described previously.
Within each phase, the tool provides a sub-navigation that leads the user through
the necessary steps. In both phases, users can upload event log files via an asynchronous upload
script, so that multiple event logs can be provided easily. The frontend directly interacts with the
backend and starts both functions (train and recommend) in a new thread. Using Python’s threading
module, the frontend can proceed without waiting for the thread to finish.
Furthermore, the frontend shows a live view of the Recomminder’s log file, so users can always
follow the current processing steps and monitor progress. Finally, the frontend presents the results;
for the training phase, it also shows three figures: a representation of
the DT, a plot of the metrics, and a bar chart of the feature importance. All progress is
stored in the session generated by TurboGears2, which allows users to return to the frontend
after some time and resume the process where they left off.</p>
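      <p>The threading behaviour described above can be sketched with Python's threading and logging modules: a job is started in a background thread, the caller returns immediately, and status changes are written to a log that a page could display. JobRunner and its status values are hypothetical and not part of the Recomminder or TurboGears2.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import logging
import threading
import time
from typing import Callable, Dict

logger = logging.getLogger("recomminder_sketch")


class JobRunner:
    """Run long jobs (e.g. train/recommend) in background threads (hypothetical)."""

    def __init__(self) -> None:
        self._status: Dict[str, str] = {}
        self._lock = threading.Lock()

    def _set(self, job_id: str, status: str) -> None:
        with self._lock:
            self._status[job_id] = status
        logger.info("%s: %s", job_id, status)  # what a live log view would show

    def start(self, job_id: str, func: Callable, *args) -> threading.Thread:
        def run() -> None:
            self._set(job_id, "running")
            try:
                func(*args)
                self._set(job_id, "finished")
            except Exception:
                logger.exception("job %s failed", job_id)
                self._set(job_id, "failed")

        self._set(job_id, "queued")
        thread = threading.Thread(target=run, name=job_id, daemon=True)
        thread.start()  # returns immediately; the caller does not wait
        return thread

    def status(self, job_id: str) -> str:
        with self._lock:
            return self._status.get(job_id, "unknown")


if __name__ == "__main__":
    # Logging to a file instead (logging.basicConfig(filename=...)) would give
    # a frontend a log file to display.
    logging.basicConfig(level=logging.INFO, format="%(threadName)s %(message)s")
    runner = JobRunner()
    job = runner.start("train-1", time.sleep, 0.2)  # stand-in for a training run
    print(runner.status("train-1"))  # queued or running, not yet finished
    job.join()
    print(runner.status("train-1"))  # finished
```
]]></preformat>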
    </sec>
    <sec id="sec-4">
      <title>4. Conclusion</title>
      <p>This paper demonstrated the decision support tool Recomminder, which consists of four components
and can support both research and practice. With our tool, researchers can better understand the
relationship between event log characteristics and ML-driven PBPM techniques. Practitioners, on the
other hand, can use our tool to develop effective PBPM techniques. In
future research, we plan to extend the tool with:
• further prediction tasks such as next timestamp prediction,
• further granularities of prediction tasks such as prediction per prefix size or prediction
per decision point,
• further ML algorithms to learn predictive models,
• further sequence and event encoding techniques,
• further intrinsically explainable ML algorithms such as linear regression to learn the mapping
between event log characteristics and the best-performing predictive models, and
• ensemble learning techniques such as stacking or boosting to improve the prediction
accuracy by combining several predictive models.</p>
      <p><sup>5</sup> https://turbogears.org
<sup>6</sup> https://getbootstrap.com
<sup>7</sup> https://jquery.com</p>
      <p>A demonstration video of the Recomminder can be found at https://youtu.be/37ikmt9g818.
The source code of the decision support tool is available under the GNU Lesser General Public License (LGPL)
at https://gitlab.uni-koblenz.de/fg-bks/predictive-recommining.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A. E.</given-names>
            <surname>Marquez-Chamorro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Resinas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ruiz-Cortes</surname>
          </string-name>
          ,
          <article-title>Predictive Monitoring of Business Processes: A Survey</article-title>
          ,
          <source>IEEE Transactions on Services Computing</source>
          <volume>11</volume>
          (
          <year>2018</year>
          )
          <fpage>962</fpage>
          -
          <lpage>977</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>C.</given-names>
            <surname>Di Francescomarino</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Ghidini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. M.</given-names>
            <surname>Maggi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Milani</surname>
          </string-name>
          ,
          <article-title>Predictive Process Monitoring Methods: Which One Suits Me Best?</article-title>
          , in: Business Process Management, volume
          <volume>11080</volume>
          , Springer International Publishing,
          <year>2018</year>
          , pp.
          <fpage>462</fpage>
          -
          <lpage>479</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>K.</given-names>
            <surname>Heinrich</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Zschech</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Janiesch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bonin</surname>
          </string-name>
          ,
          <article-title>Process Data Properties Matter: Introducing Gated Convolutional Neural Networks (GCNN) and Key-Value-Predict Attention Networks (KVP) for Next Event Prediction with Deep Learning</article-title>
          ,
          <source>Decision Support Systems</source>
          <volume>143</volume>
          (
          <year>2021</year>
          )
          <fpage>113494</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>S.</given-names>
            <surname>Weinzierl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Zilker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Brunk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Revoredo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Nguyen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Matzner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Becker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Eskofier</surname>
          </string-name>
          ,
          <article-title>An Empirical Comparison of Deep-Neural-Network Architectures for Next Activity Prediction Using Context-Enriched Process Event Logs</article-title>
          , arXiv (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>W.</given-names>
            <surname>Kratsch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Manderscheid</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Röglinger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Seyfried</surname>
          </string-name>
          ,
          <article-title>Machine Learning in Business Process Monitoring: A Comparison of Deep Learning and Classical Approaches Used for Outcome Prediction</article-title>
          ,
          <source>Business &amp; Information Systems Engineering</source>
          <volume>63</volume>
          (
          <year>2021</year>
          )
          <fpage>261</fpage>
          -
          <lpage>276</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>A.</given-names>
            <surname>Weijters</surname>
          </string-name>
          ,
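          <string-name>
            <given-names>W. M. P.</given-names>
            <surname>van der Aalst</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. K.</given-names>
            <surname>Alves de Medeiros</surname>
          </string-name>
          ,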
          <article-title>Process Mining with the HeuristicsMiner Algorithm</article-title>
          , Citeseer,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>A. P.</given-names>
            <surname>Kurniati</surname>
          </string-name>
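          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Kusuma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Wisudiawan</surname>
          </string-name>
          ,
          <article-title>Implementing heuristic miner for different types of event logs</article-title>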
          ,
          <source>International Journal of Applied Engineering Research</source>
          <volume>11</volume>
          (
          <year>2016</year>
          )
          <fpage>5523</fpage>
          -
          <lpage>5529</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>P.</given-names>
            <surname>Weber</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Bordbar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Tino</surname>
          </string-name>
          ,
          <article-title>A principled approach to mining from noisy logs using Heuristics Miner</article-title>
          , in:
          <source>2013 IEEE Symposium on Computational Intelligence and Data Mining (CIDM)</source>
          , IEEE,
          <year>2013</year>
          , pp.
          <fpage>119</fpage>
          -
          <lpage>126</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>L.</given-names>
            <surname>Breiman</surname>
          </string-name>
          ,
          <article-title>Random Forests</article-title>
          ,
          <source>Machine Learning</source>
          <volume>45</volume>
          (
          <year>2001</year>
          )
          <fpage>5</fpage>
          -
          <lpage>32</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>S.</given-names>
            <surname>Hochreiter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Schmidhuber</surname>
          </string-name>
          ,
          <article-title>Long Short-Term Memory</article-title>
          ,
          <source>Neural Computation</source>
          <volume>9</volume>
          (
          <year>1997</year>
          )
          <fpage>1735</fpage>
          -
          <lpage>1780</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>