<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Mining and Simulation to Guide Users Towards Process Improvement in mpmX</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Nina Kinkelin</string-name>
          <email>nina.kinkelin@mehrwerk.net</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Clemens Schreiber</string-name>
          <email>clemens.schreiber@kit.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Josua Reimold</string-name>
          <email>josua.reimold@mehrwerk.net</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Karlsruhe Institute of Technology</institution>
          ,
          <addr-line>Karlsruhe</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>MEHRWERK GmbH</institution>
          ,
          <addr-line>Karlsruhe</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Process mining and simulation can be a powerful combination for analyzing and improving business processes. While process mining is commonly applied to analyze past process executions (as-is process), simulation allows a user to explore process executions that might occur in the future (to-be process). Yet, the combination of process mining and simulation is not commonly used in practice. We see two main reasons for this, which we attempt to address: (1) missing tool support for the creation and execution of process simulations based on event logs, and (2) missing guidance for the user-based adaptation of simulation scenarios. Hence, we introduce a simulation extension to the Mehrwerk ProcessMining software mpmX, which enables the automatic discovery and execution of process simulation models based on an event log and, at the same time, provides suggestions for the creation of alternative simulation scenarios. The simulation scenarios cover a subset of the discovered process variants from the as-is process and enable a user to analyze the change in process performance based on a more standardized process.</p>
      </abstract>
      <kwd-group>
        <kwd>Process mining</kwd>
        <kwd>interactive process simulation</kwd>
        <kwd>business process reengineering</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>CEUR
ceur-ws.org</p>
    </sec>
    <sec id="sec-2">
      <title>1. Introduction</title>
      <p>
        The combination of process mining and simulation allows a user to compare process executions
in the past (as-is process) with process executions that might occur in the future (to-be process)
[
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ]. This also allows a user to simulate how process reengineering based on process mining
might impact the process performance in the future [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. As shown in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] such an approach can
improve business processes across multiple domains. Yet, the user support for an integration of
process mining and simulation is lacking. We are, therefore, introducing a simulation extension
to the existing mpmX process mining tool [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], which supports the integration of process
mining and simulation on multiple levels, i.e., the analysis of the as-is process, the creation of a
simulation scenario, the execution of the simulation, and the comparison between as-is and
to-be process. A user can thereby actively influence the creation of the simulation scenario
based on the selection of relevant process variants, the number of cases to be simulated, the
maximum trace length for each simulated case, and the case arrival ratio. We show in a use
case that our tool can generate suggestions for simulation scenarios that can lead to
process improvement in terms of throughput and waiting time, and that it allows for a comprehensive
comparison between as-is and to-be process. A video tutorial with a demonstration of the tool
is available at https://vimeo.com/user167009028/mpmxsim?share=copy.
      </p>
    </sec>
    <sec id="sec-3">
      <title>2. Guided Process Simulation</title>
      <p>Our guided process simulation approach consists of five main steps (Fig. 1): 1. load and analyze
an event log in mpmX, 2. define the simulation scenario, 3. create the simulation model, 4.
execute the simulation, and 5. analyze the simulation result. The user is guided through each
step by the mpmX tool and the extension. In the following, we briefly describe each step.</p>
      <p>[Fig. 1: Steps of the guided process simulation: Load and Analyze Event Log, Define Simulation Scenario, Create Simulation Model, Execute Simulation, Analyze Simulation Results.]</p>
      <sec id="sec-3-13">
        <sec id="sec-3-13-1">
          <title>2.1. Load and analyze event log</title>
          <p>After loading an event log into mpmX, the mpmX process variant analyzer detects all existing
process variants, i.e., event sequences in the event log. In addition, the process variants are
analyzed based on their frequency, i.e., the number of occurrences in the event log, and their
average lead time. This information is provided to the user in a process variant overview (see
Fig. 2).</p>
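          <p>The variant analysis described above can be illustrated with a minimal Python sketch. It is not the mpmX implementation: the toy event log, the function name variant_overview, and the choice to measure lead time as the span between the first and last event of a case are assumptions made for illustration only.</p>
          <preformat preformat-type="code"><![CDATA[
```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Hypothetical toy event log: (case id, activity, ISO timestamp).
EVENT_LOG = [
    ("c1", "Create Order", "2024-01-01T08:00:00"),
    ("c1", "Approve", "2024-01-02T08:00:00"),
    ("c1", "Pay", "2024-01-03T08:00:00"),
    ("c2", "Create Order", "2024-01-01T09:00:00"),
    ("c2", "Approve", "2024-01-01T21:00:00"),
    ("c2", "Pay", "2024-01-02T09:00:00"),
    ("c3", "Create Order", "2024-01-05T08:00:00"),
    ("c3", "Pay", "2024-01-09T08:00:00"),
]


def variant_overview(event_log):
    """Return {variant: (frequency, average lead time in days)}."""
    # Group the events by case.
    cases = defaultdict(list)
    for case_id, activity, timestamp in event_log:
        cases[case_id].append((datetime.fromisoformat(timestamp), activity))

    # A variant is the time-ordered activity sequence of a case.
    lead_times = defaultdict(list)
    for events in cases.values():
        events.sort()
        variant = tuple(activity for _, activity in events)
        duration = events[-1][0] - events[0][0]
        lead_times[variant].append(duration.total_seconds() / 86400)

    return {v: (len(days), mean(days)) for v, days in lead_times.items()}


overview = variant_overview(EVENT_LOG)
for variant, (frequency, avg_days) in overview.items():
    print(" > ".join(variant), frequency, round(avg_days, 2))
```
]]></preformat>
          <p>In this sketch, each row of the printed overview corresponds to one process variant with its frequency and average lead time, which is the information a user would need to decide which variants to keep.</p>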
        </sec>
        <sec id="sec-3-13-2">
          <title>2.2. Define simulation scenario</title>
          <p>Based on the process variant overview, the user can select relevant process variants for
the simulation. For example, the user can exclude process variants with a low frequency and a
high lead time if she assumes that these variants could be omitted in the future. In this way,
the process simulation model becomes a more standardized version of the as-is process.
By hovering the cursor over the different variants, the user can also see the specific event
sequences. This lets the user keep variants that might be essential to the
process while eliminating non-relevant variants with low performance. The
effect of the variant selection on the overall process performance depends
on how the eliminated variants are replaced by alternative variants (see Sect. 2.3).</p>
          <p>The user can adjust further simulation parameters: the number of cases to
be simulated, the case arrival ratio, and the maximum trace length. These parameters allow the
user to investigate how the process performance changes under varying basic simulation
conditions and constraints. Note that the simulation also considers the
probability distribution of process variants. Hence, the higher the number of cases to be
simulated, the closer the simulated process variant distribution will be to the distribution observed
in the event log. The default values for these parameters are calculated from the
provided event log.</p>
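          <p>The relation between the number of simulated cases and the variant distribution can be illustrated with a small sampling sketch. The variant shares, the function names, and the fixed random seed are hypothetical; the sketch only draws variants by weight and does not reproduce the mpmX simulation itself.</p>
          <preformat preformat-type="code"><![CDATA[
```python
import random
from collections import Counter

# Hypothetical relative variant frequencies taken from an event log.
VARIANT_SHARE = {"A-B-C": 0.7, "A-C": 0.2, "A-B-B-C": 0.1}


def sample_variants(share, n_cases, seed=0):
    """Draw one variant per simulated case, weighted by its relative frequency."""
    rng = random.Random(seed)
    variants = list(share)
    weights = [share[v] for v in variants]
    drawn = rng.choices(variants, weights=weights, k=n_cases)
    counts = Counter(drawn)
    return {v: counts[v] / n_cases for v in variants}


def max_deviation(share, simulated):
    """Largest absolute gap between observed and simulated variant shares."""
    return max(abs(share[v] - simulated[v]) for v in share)


for n in (10, 100, 10000):
    simulated = sample_variants(VARIANT_SHARE, n)
    print(n, round(max_deviation(VARIANT_SHARE, simulated), 3))
```
]]></preformat>
          <p>With few cases the simulated shares can differ noticeably from the observed ones, while for large numbers of cases the deviation tends to become small.</p>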
        </sec>
        <sec id="sec-3-13-3">
          <title>2.3. Create simulation model</title>
          <p>The simulation model is created from the process variants selected in step 2 (see Sect. 2.2).
Eliminating process variants requires redistributing their relative case frequencies among
the remaining process variants. This redistribution is handled by a mapping algorithm: the
relative case frequency of each eliminated variant is added to the most similar remaining variant,
in terms of some distance measure. This approach assumes that an eliminated process variant
will be replaced by the most similar process variant in the
future. The current implementation uses the Levenshtein distance to measure the similarity
between the process variants in the event log, but other distance measures are also possible. If
multiple process variants have an identical minimum distance to an eliminated variant,
the relative case frequency is assigned to one of them, chosen at random.</p>
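          <p>A minimal sketch of such a frequency mapping is given here, assuming variants are represented as tuples of activity names. The function names levenshtein and redistribute, the example shares, and the seeded random generator are hypothetical and are not taken from the mpmX code base.</p>
          <preformat preformat-type="code"><![CDATA[
```python
import random


def levenshtein(a, b):
    """Edit distance between two activity sequences (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        current = [i]
        for j, y in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,             # delete x
                current[j - 1] + 1,          # insert y
                previous[j - 1] + (x != y),  # substitute or match
            ))
        previous = current
    return previous[-1]


def redistribute(share, kept, rng=None):
    """Add the share of each eliminated variant to its nearest kept variant."""
    if not kept:
        raise ValueError("at least one variant must be kept")
    rng = rng or random.Random(0)
    result = {v: share[v] for v in kept}
    for variant, frequency in share.items():
        if variant in result:
            continue
        distances = {k: levenshtein(variant, k) for k in kept}
        best = min(distances.values())
        nearest = [k for k in kept if distances[k] == best]
        result[rng.choice(nearest)] += frequency  # ties are broken at random
    return result


# Hypothetical relative case frequencies per variant.
SHARE = {
    ("A", "B", "C"): 0.6,
    ("A", "C"): 0.3,
    ("A", "B", "B", "C"): 0.1,
}
KEPT = [("A", "B", "C"), ("A", "C")]
print(redistribute(SHARE, KEPT))
```
]]></preformat>
          <p>In this example the eliminated variant A, B, B, C has distance 1 to A, B, C and distance 2 to A, C, so its share is added to A, B, C. Any other sequence distance could be swapped in for levenshtein without changing redistribute.</p>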
        </sec>
        <sec id="sec-3-13-4">
          <title>2.4. Execute simulation</title>
          <p>
            Based on the process variants selected for the simulation model in step 3 (see Sect. 2.3), a Petri net is discovered, which
is then used for discrete event simulation. Our code is based on the implementation provided
by PMSIM [
            <xref ref-type="bibr" rid="ref2">2</xref>
            ] and uses the Python libraries PM4Py [
            <xref ref-type="bibr" rid="ref5">5</xref>
            ] for process discovery and SimPy for
discrete event simulation. The main difference between our implementation and PMSIM is that
we do not want the simulation to create new process variants that were not selected
by the user during the creation of the simulation scenario. This is achieved by selecting only
those execution paths in the Petri net during the simulation that also occur in the event log.
This ensures that the simulation actually considers a more standardized (“improved”)
version of the as-is process and does not involve new deviations.
          </p>
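          <p>The idea of restricting the simulation to known execution paths can be sketched without a Petri net. The following example is an assumption-laden simplification: it replaces the discovered Petri net with a prefix tree over the selected variants, chooses the next activity uniformly at random instead of by frequency, and ignores timing. It does not use PM4Py or SimPy, and all names are hypothetical.</p>
          <preformat preformat-type="code"><![CDATA[
```python
import random


def build_prefix_tree(variants):
    """Nested dicts; the key None marks that a selected variant ends here."""
    root = {}
    for variant in variants:
        node = root
        for activity in variant:
            node = node.setdefault(activity, {})
        node[None] = {}
    return root


def simulate_traces(variants, n_cases, max_length, seed=0):
    """Generate traces that always follow a selected variant of limited length."""
    allowed = [v for v in variants if len(v) <= max_length]
    if not allowed:
        raise ValueError("no selected variant fits the maximum trace length")
    tree = build_prefix_tree(allowed)
    rng = random.Random(seed)
    traces = []
    for _ in range(n_cases):
        trace, node = [], tree
        while True:
            # Only continuations seen in a selected variant are offered.
            step = rng.choice(list(node))
            if step is None:
                break
            trace.append(step)
            node = node[step]
        traces.append(tuple(trace))
    return traces


# Hypothetical selected variants; the third one exceeds the length limit of 3.
SELECTED = [("A", "B", "C"), ("A", "C"), ("A", "B", "B", "C")]
for trace in simulate_traces(SELECTED, n_cases=5, max_length=3):
    print(trace)
```
]]></preformat>
          <p>Because every step is taken from the prefix tree, each generated trace equals one of the selected variants, so no new deviations can appear in the simulated log.</p>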
        </sec>
        <sec id="sec-3-13-5">
          <title>2.5. Analyze simulation result</title>
          <p>Finally, the simulation results are evaluated to determine to what extent the standardization, based on
the selected process variants, might lead to a performance improvement. The simulated data
is reloaded into mpmX, and the resulting process model is shown in an overview together with different performance
indicators. This overview also allows for a direct comparison between
the as-is process and the simulated process (see Fig. 3).</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>3. Maturity</title>
      <p>
        The mpmX simulation extension was tested on a real-life event data set (BPIC’19). The
data set consists of 1525 cases and 174 process variants. By eliminating the 10 process variants
with the highest lead time and a frequency of one, and without changing any other parameters,
the simulation shows an improvement in average lead time from 95 days to 92 days, an increase
of the automation rate by 0.08 percentage points and a reduction of wait time by 12.74 percentage
points. The experiment was run on an Intel i5 CPU @ 1.60GHz machine with 8GB RAM. The
generation and execution of the simulation model took about 74 seconds. When no process
variants are eliminated, i.e., in case of highest computational complexity, the generation and
simulation took about 80 seconds. This might indicate that our approach is also feasible for
application in industry, although further testing is needed.
      </p>
      <p>
        As future work, we would particularly like to add two features to the current version: (1) the
ability to add new process variants to the simulation model that were not part of the
as-is process in the event log, and (2) an evaluation of the mapping accuracy, i.e., whether the similarity
between the mapped traces is rather high or rather low. This would also allow a better assessment of
the reliability of the simulation. Furthermore, we would also like to integrate additional process
perspectives, such as the resource [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] and the data perspective [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. However, based on the
current version of the mpmX simulation extension we could show that our proposed simulation
approach is viable and can provide additional guidance and insights to existing process mining
analysis.
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>van der Aalst</surname>
            ,
            <given-names>W. M.</given-names>
          </string-name>
          (
          <year>2018</year>
          ).
          <article-title>Process mining and simulation: a match made in heaven!</article-title>
          .
          <source>In Proceedings of the 50th Computer Simulation Conference</source>
          (pp.
          <fpage>1</fpage>
          -
          <lpage>12</lpage>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Pourbafrani</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vasudevan</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zafar</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xingran</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Singh</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>van der Aalst</surname>
            ,
            <given-names>W. M.</given-names>
          </string-name>
          (
          <year>2021</year>
          ).
          <article-title>A Python Extension to Simulate Petri nets in Process Mining</article-title>
          . arXiv preprint, arXiv:2102.08774.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Măruşter</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Van Beest</surname>
            ,
            <given-names>N. R.</given-names>
          </string-name>
          (
          <year>2009</year>
          ).
          <article-title>Redesigning business processes: a methodology based on simulation and process mining techniques</article-title>
          .
          <source>Knowledge and Information Systems</source>
          ,
          <volume>21</volume>
          ,
          <fpage>267</fpage>
          -
          <lpage>297</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Meyer</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Reimold</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Wehmschulte</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          (
          <year>2019</year>
          ).
          <article-title>An introduction to MPM - MEHRWERK ProcessMining</article-title>
          ,
          <source>CEUR Workshop Proceedings</source>
          , vol.
          <volume>2374</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Berti</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Van Zelst</surname>
            ,
            <given-names>S. J.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Schuster</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          (
          <year>2023</year>
          ).
          <article-title>PM4Py: A process mining library for Python</article-title>
          .
          <source>Software Impacts</source>
          ,
          <volume>17</volume>
          ,
          <fpage>100556</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>López-Pintado</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Halenok</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Dumas</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          (
          <year>2022</year>
          ).
          <article-title>Prosimos: Discovering and Simulating Business Processes with Differentiated Resources</article-title>
          .
          <source>In International Conference on Enterprise Design, Operations, and Computing</source>
          (pp.
          <fpage>346</fpage>
          -
          <lpage>352</lpage>
          ). Cham: Springer International Publishing.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>Fritsch</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schüler</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Forell</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Oberweis</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          (
          <year>2023</year>
          ).
          <article-title>Modelling and Execution of Data-Driven Processes with JSON-Nets</article-title>
          .
          <source>In International Conference on Business Process Modeling, Development and Support</source>
          (pp.
          <fpage>29</fpage>
          -
          <lpage>43</lpage>
          ). Cham: Springer Nature Switzerland.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>