<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Process Model Simplification based on Probabilities in Process Tree (Extended Abstract)</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Sabya Shaikh</string-name>
          <email>sabya.shaikh@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mohammadreza Fani Sani</string-name>
          <email>fanisani@pads.rwth-aachen.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Process and Data Science (PADS) Chair, RWTH-Aachen University</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2021</year>
      </pub-date>
      <abstract>
        <p>Process discovery aims to describe how a process is actually executed based on recorded data. However, most automated process discovery algorithms produce complex and imprecise process models because real event logs contain outlier behavior. Process discovery is usually exploratory and should be done interactively, taking users' preferences into account. This demo paper proposes an interactive ProM plug-in that allows users to simplify the discovered process model. The tool can also modify the event log based on the simplified process model. Index Terms: Process Mining, Process Discovery, Event log preprocessing</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>I. INTRODUCTION</title>
      <p>
        Process discovery is one of the sub-fields of process mining
that aims to describe process models based on recorded events
[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. The discovered process model is used by the business
analysts to comprehend the information available in event
logs to strengthen their process. Several process discovery
algorithms [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]–[
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] discover a process model assuming that
event logs are noise-free. In reality, however, event logs
contain noise caused by incorrect entries, which leads to a
complex process model. Several filtering algorithms have
therefore been proposed to eliminate such behaviour at a
preprocessing stage. The filtered event log must then be passed
to a process discovery algorithm to obtain a simple process model,
which makes the procedure tedious. Other process discovery
techniques have a built-in capability to eliminate noise and
infrequent behaviour. However, all of these algorithms filter on
the frequency of variants or activities, which is not effective
when the event log contains many unique variants.
      </p>
      <p>To address this problem, this demo paper presents
an interactive tool that enhances the process model with
probabilistic information and lets business analysts adjust
its complexity by modifying the model interactively, improving
its precision and simplicity. The pruned process model can then
be used to filter the event log. In other words, the user obtains
a modified event log that is based on the modified process model.</p>
      <p>
        We use the process trees discovered by the Inductive miner
[
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] as the base process model, which is then pruned based
on the node probabilities computed by the tool and the
threshold provided by the user. A process tree is a process
model that represents a process in the form of a tree (although
not every process can be described in process tree
notation). The tool covers both the discovery and enhancement
phases of process mining. It exploits the process exploration
feature to provide an interactive and iterative plug-in. The
developed plug-in is an extension of the ProM Framework [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]
that supports the development and execution of
various process mining algorithms.
      </p>
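      <p>The threshold-based pruning described above can be pictured with a minimal sketch. It is not the plug-in's implementation. The Node class, the probability values, and the rule used here (drop every child whose probability is below the threshold and collapse an operator that keeps a single child) are assumptions made for illustration. The formal definitions are given in [<xref ref-type="bibr" rid="ref11">11</xref>].</p>
      <preformat><![CDATA[
```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical process tree node: an operator or an activity leaf."""
    label: str                     # "seq", "xor", ... or an activity name
    probability: float = 1.0       # assumed to be precomputed from the log
    children: List["Node"] = field(default_factory=list)


def prune(node: Node, threshold: float) -> Node:
    """Return a copy of the tree without children below the threshold."""
    kept = [prune(c, threshold) for c in node.children
            if c.probability >= threshold]
    if node.children and len(kept) == 1:
        # An operator with a single remaining child collapses into it.
        return kept[0]
    # Simplification: an operator that loses all children is kept as is.
    return Node(node.label, node.probability, kept)


def show(node: Node) -> str:
    """Render the tree in a compact prefix notation."""
    if not node.children:
        return node.label
    return f"{node.label}({', '.join(show(c) for c in node.children)})"


# Toy tree: a, then a choice between b, c and a rare d, then e.
tree = Node("seq", 1.0, [
    Node("a"),
    Node("xor", 1.0, [Node("b", 0.70), Node("c", 0.28), Node("d", 0.02)]),
    Node("e"),
])

print(show(prune(tree, 0.05)))  # prints seq(a, xor(b, c), e)
print(show(prune(tree, 0.50)))  # prints seq(a, b, e)
```
]]></preformat>
      <p>Raising the threshold in this sketch removes more branches, which mirrors how a user would trade behaviour for simplicity in the interactive tool.</p>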
      <p>This paper is structured as follows. Section II discusses
related work and the motivation for developing the tool.
Section III presents the highlights of the tool, the case studies
carried out to evaluate it, and the link to the demo,
while Section IV concludes the paper.</p>
    </sec>
    <sec id="sec-2">
      <title>II. RELATED WORK AND MOTIVATION</title>
      <p>
        Numerous process discovery algorithms are applied to an
event log under the assumption that all behaviour in the log is
appropriate, ignoring noise. However, this approach generates
a complex process model that is difficult to understand [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]
and skews the analysis. Some basic process
discovery algorithms represent all the behaviour in the process
model. This calls for pre-processing the event log
with filtering techniques such as [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]–[
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] in order to discover a
comprehensible process model. A few improved process
discovery algorithms, e.g., [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], embed techniques that
filter infrequent behaviour before discovering the process
model. However, these techniques do not always guarantee a
simple model.
      </p>
      <p>Usually, a user provides multiple input parameters and
discovers a process model that may be unsatisfactory. This
procedure may have to be repeated until a satisfactory model
is discovered, which can be exhausting and time-consuming.
Furthermore, analysts usually need to understand the general
behaviour of the process first and only later dig deeper into it.
An interactive and iterative tool is therefore helpful
in many scenarios. Commercial tools offer some basic
interactive filtering methods that work purely on the
frequency of variants or activities. However, in many
applications this type of filtering is not beneficial because
the log contains many unique variants.</p>
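      <p>The limitation of frequency-based variant filtering can be illustrated with a small sketch. The toy log and the function filter_by_variant_frequency are hypothetical and do not correspond to any specific commercial tool. When every trace is its own variant, all variants have the same share of the log, so a frequency threshold keeps either everything or nothing.</p>
      <preformat><![CDATA[
```python
from collections import Counter


def filter_by_variant_frequency(log, min_share):
    """Keep traces whose variant covers at least min_share of the log."""
    variants = Counter(tuple(trace) for trace in log)
    total = len(log)
    return [trace for trace in log
            if variants[tuple(trace)] / total >= min_share]


# Toy log: five traces, five distinct variants, one shared backbone a ... e.
log = [
    ["a", "b", "c", "e"],
    ["a", "c", "b", "e"],
    ["a", "b", "d", "e"],
    ["a", "d", "b", "e"],
    ["a", "x", "b", "e"],
]

kept_low = filter_by_variant_frequency(log, 0.20)   # every variant has share 0.2
kept_high = filter_by_variant_frequency(log, 0.21)  # no variant reaches 0.21
print(len(kept_low), len(kept_high))  # prints 5 0
```
]]></preformat>
      <p>No threshold in this sketch can separate the rare activity x from the common behaviour, which is the situation that motivates filtering on the process model instead of on variants.</p>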
    </sec>
    <sec id="sec-3">
      <title>III. TOOL</title>
      <p>
        The tool is developed in the ProM framework, which eases its
use and its integration with other process mining plug-ins. It is
part of the research available in [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ].
      </p>
      <p>
        Fig. 1 shows an overview of the developed tool. The input
of the tool is an event log together with interactive user-provided
parameters, and the output is a (simplified) process model along
with a filtered event log. Users set the parameters interactively
in the right panel; they include
the threshold for pruning the process tree, the event classifier,
frequency types, probability types, and the different ways in which
the log can be filtered. The tool produces three outputs:
a simplified and enriched Petri net, a simplified and enriched
process tree, and a filtered event log. The Petri net is displayed
in the left panel. Each node in the process tree holds the following
information: the probability of its occurrence in the event log,
the frequency of the node’s execution, and the list of traces in
which the node is executed. This information is used to remove noisy
and infrequent behavior, resulting in a noise-free filtered event
log. We formally describe how to compute the probabilities
and simplify a process tree based on them in [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ].
      </p>
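      <p>The per-node information listed above can be sketched for the simplest case, activity leaves, where a node counts as executed by a trace if the activity occurs in it. This is a simplifying assumption and not the tool's implementation: the function names, the toy log, and the trace-removal rule are hypothetical, and inner operator nodes would require replaying traces on the tree as formalized in [<xref ref-type="bibr" rid="ref11">11</xref>].</p>
      <preformat><![CDATA[
```python
from collections import defaultdict


def leaf_statistics(log):
    """Per activity: event frequency, indices of executing traces, trace share."""
    frequency = defaultdict(int)
    traces = defaultdict(set)
    for index, trace in enumerate(log):
        for activity in trace:
            frequency[activity] += 1
            traces[activity].add(index)
    return {
        activity: {
            "frequency": frequency[activity],
            "traces": traces[activity],
            "probability": len(traces[activity]) / len(log),
        }
        for activity in frequency
    }


def filter_log(log, stats, threshold):
    """Drop every trace that executes an activity below the threshold."""
    pruned = {a for a, s in stats.items() if s["probability"] < threshold}
    return [trace for trace in log if not pruned.intersection(trace)]


log = [
    ["a", "b", "e"],
    ["a", "c", "e"],
    ["a", "b", "b", "e"],
    ["a", "x", "e"],
    ["a", "b", "e"],
]

stats = leaf_statistics(log)
filtered = filter_log(log, stats, 0.25)
print(stats["b"]["frequency"], sorted(stats["b"]["traces"]))  # prints 4 [0, 2, 4]
print(len(filtered))  # prints 3
```
]]></preformat>
      <p>Removing whole traces is only one possible filtering rule. The tool offers several ways of filtering the log, and removing only the offending events would be an alternative under the same statistics.</p>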
      <p>
        A tutorial video demo of our tool is provided here<sup>1</sup>. A more
comprehensive tutorial of this tool is available in [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ].
      </p>
      <sec id="sec-3-1">
        <title>A. Tool Capabilities</title>
        <p>The main capabilities of the developed tool are as follows.</p>
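        <list list-type="bullet">
          <list-item>
            <p>Simplifying the process model while potentially improving its quality.</p>
          </list-item>
          <list-item>
            <p>Enhancing the process model with probabilistic information, which is used for simplifying the process model.</p>
          </list-item>
          <list-item>
            <p>Filtering and modifying the given event log based on the simplified and enhanced process model.</p>
          </list-item>
        </list>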
      </sec>
      <sec id="sec-3-2">
        <title>B. Use Cases</title>
        <p>
          We have used the developed tool to simplify the process
models of four real event logs, i.e., [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ]–[
          <xref ref-type="bibr" rid="ref15">15</xref>
          ]. The results show
that our tool improves the quality of the process model when
simplified. We have found that using the developed tool, it is
possible to improve the precision and simplicity of process
models of these event logs. Moreover, the developed tool
performs in a reasonable time that makes it useful for real
applications. For detailed information on the event logs used
and the evaluation procedure carried out, please refer to [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ].
        </p>
      </sec>
      <sec id="sec-3-3">
        <title>C. Installation</title>
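        <p>The ProM 6 nightly build can be used to run and test the plug-in.</p>
        <list list-type="order">
          <list-item>
            <p>Download and install the ProM 6 nightly build, available at <sup>2</sup>.</p>
          </list-item>
          <list-item>
            <p>Open the ProM Manager and install the 'LogFiltering' package.</p>
          </list-item>
          <list-item>
            <p>Open the ProM tool and import an event log.</p>
          </list-item>
          <list-item>
            <p>Use the plug-in "Process model simplification and log filtering" as shown in the recorded demo.</p>
          </list-item>
        </list>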
        <p>The source code of this project is available at <sup>3</sup>, so
it is also possible to modify the approach and run it.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>IV. CONCLUSION</title>
      <p>We presented an interactive tool that enhances the process
model with frequency and probability information and enables
its simplification. Users can further filter
the event log based on the simplified process model. The
main contribution of this tool is a novel way of simplifying
the process model such that high fitness and
precision are maintained while complexity is reduced.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>W.</given-names>
            <surname>van der Aalst</surname>
          </string-name>
          ,
          <source>Process Mining - Data Science in Action, Second Edition</source>
          . Springer,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
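          <string-name>
            <given-names>W.</given-names>
            <surname>van der Aalst</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Weijters</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L.</given-names>
            <surname>Maruster</surname>
          </string-name>
          , “
          <article-title>Workflow mining: Discovering process models from event logs</article-title>
          ,”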
          <source>IEEE Trans. Knowl. Data Eng.</source>
          , vol.
          <volume>16</volume>
          , no.
          <issue>9</issue>
          , pp.
          <fpage>1128</fpage>
          -
          <lpage>1142</lpage>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>S. J. J.</given-names>
            <surname>Leemans</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Fahland</surname>
          </string-name>
          , and
          <string-name>
            <given-names>W.</given-names>
            <surname>van der Aalst</surname>
          </string-name>
          , “
          <article-title>Discovering block-structured process models from event logs - A constructive approach</article-title>
          ,” in
          <source>Application and Theory of Petri Nets and Concurrency - 34th International Conference, 2013. Proceedings</source>
          , vol.
          <volume>7927</volume>
          . Springer,
          <year>2013</year>
          , pp.
          <fpage>311</fpage>
          -
          <lpage>329</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>J. M. E. M.</given-names>
            <surname>van der Werf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. F.</given-names>
            <surname>van Dongen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. A. J.</given-names>
            <surname>Hurkens</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Serebrenik</surname>
          </string-name>
          , “
          <article-title>Process discovery using integer linear programming</article-title>
          ,” in
          <source>Applications and Theory of Petri Nets, 29th International Conference</source>
          , vol.
          <volume>5062</volume>
          . Springer,
          <year>2008</year>
          , pp.
          <fpage>368</fpage>
          -
          <lpage>387</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>S. J. J.</given-names>
            <surname>Leemans</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Fahland</surname>
          </string-name>
          , and
          <string-name>
            <given-names>W.</given-names>
            <surname>van der Aalst</surname>
          </string-name>
          , “
          <article-title>Discovering block-structured process models from event logs containing infrequent behaviour</article-title>
          ,” in
          <source>Business Process Management Workshops - BPM 2013 International Workshops</source>
          , vol.
          <volume>171</volume>
          . Springer,
          <year>2013</year>
          , pp.
          <fpage>66</fpage>
          -
          <lpage>78</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>B. F.</given-names>
            <surname>van Dongen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. K. A.</given-names>
            <surname>de Medeiros</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. M. W.</given-names>
            <surname>Verbeek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. J. M. M.</given-names>
            <surname>Weijters</surname>
          </string-name>
          , and
          <string-name>
            <given-names>W.</given-names>
            <surname>van der Aalst</surname>
          </string-name>
          , “
          <article-title>The ProM framework: A new era in process mining tool support</article-title>
          ,” in
          <source>Applications and Theory of Petri Nets 2005</source>
          , ser. Lecture Notes in Computer Science, vol.
          <volume>3536</volume>
          . Springer,
          <year>2005</year>
          , pp.
          <fpage>444</fpage>
          -
          <lpage>454</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>M.</given-names>
            <surname>Fani Sani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. J.</given-names>
            <surname>van Zelst</surname>
          </string-name>
          , and
          <string-name>
            <given-names>W.</given-names>
            <surname>van der Aalst</surname>
          </string-name>
          , “
          <article-title>Applying sequence mining for outlier detection in process mining</article-title>
          ,” in
          <source>On the Move to Meaningful Internet Systems. OTM 2018 Conferences</source>
          , ser. Lecture Notes in Computer Science, vol.
          <volume>11230</volume>
          . Springer,
          <year>2018</year>
          , pp.
          <fpage>98</fpage>
          -
          <lpage>116</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>R.</given-names>
            <surname>Conforti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>La Rosa</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A. H. M.</given-names>
            <surname>ter Hofstede</surname>
          </string-name>
          , “
          <article-title>Filtering out infrequent behavior from business process event logs</article-title>
          ,”
          <source>IEEE Trans. Knowl. Data Eng.</source>
          , vol.
          <volume>29</volume>
          , no.
          <issue>2</issue>
          , pp.
          <fpage>300</fpage>
          -
          <lpage>314</lpage>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>M.</given-names>
            <surname>Fani Sani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. J.</given-names>
            <surname>van Zelst</surname>
          </string-name>
          , and
          <string-name>
            <given-names>W. M. P.</given-names>
            <surname>van der Aalst</surname>
          </string-name>
          , “
          <article-title>Repairing outlier behaviour in event logs using contextual behaviour</article-title>
          ,”
          <source>Enterp. Model. Inf. Syst. Archit. Int. J. Concept. Model.</source>
          , vol.
          <volume>14</volume>
          , pp.
          <fpage>5:1</fpage>
          -
          <lpage>5:24</lpage>
          ,
          <year>2019</year>
          . [Online]. Available: https://doi.org/10.18417/emisa.14.5
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>J.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Song</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Pei</surname>
          </string-name>
          , “
          <article-title>Cleaning structured event logs: A graph repair approach</article-title>
          ,” in
          <source>31st IEEE International Conference on Data Engineering</source>
          . IEEE Computer Society
          ,
          <year>2015</year>
          , pp.
          <fpage>30</fpage>
          -
          <lpage>41</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>S.</given-names>
            <surname>Shaikh</surname>
          </string-name>
          , “
          <article-title>Process model simplification based on probabilities in process tree</article-title>
          ,” Master thesis, RWTH-Aachen University,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>F.</given-names>
            <surname>Mannhardt</surname>
          </string-name>
          , “
          <article-title>Hospital billing - event log</article-title>
          ,” https://data.4tu.nl/articles/dataset/Hospital_Billing_-_Event_Log/12705113/1, Aug
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>F.</given-names>
            <surname>Mannhardt</surname>
          </string-name>
          , “
          <article-title>Sepsis cases - event log</article-title>
          ,” https://data.4tu.nl/articles/dataset/Sepsis_Cases_-_Event_Log/12707639/1, Dec
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>M. M.</given-names>
            <surname>de Leoni</surname>
          </string-name>
          and
          <string-name>
            <given-names>F.</given-names>
            <surname>Mannhardt</surname>
          </string-name>
          , “
          <article-title>Road traffic fine management process</article-title>
          ,” https://data.4tu.nl/articles/dataset/Road_Traffic_Fine_Management_Process/12683249/1, Feb
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>B.</given-names>
            <surname>van Dongen</surname>
          </string-name>
          , “
          <article-title>BPI Challenge 2017 - Offer log</article-title>
          ,” https://data.4tu.nl/articles/dataset/BPI_Challenge_2017_-_Offer_log/12705737/1, Feb
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>