<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Trident: Generating Noisy Synthetic Processes with Ground Truth</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Eindhoven University of Technology</institution>
          ,
          <addr-line>Mathematics and Computer Science, Eindhoven</addr-line>
          ,
          <country country="NL">the Netherlands</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>ICPM'23: International Conference on Process Mining</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>We present Trident, a tool that modifies a process model to introduce realistic behavioral noise into the modeled behavior and generates an event log that can be used to evaluate process mining techniques on realistically noisy data with knowledge of the “ground truth”, i.e., the true behavior of the system. Traditional approaches for generating noisy process data take a process model that represents the true process, generate a simulated event log, and inject noise into the log. In reality, however, both the recorded and the modeled behavior are imprecise representations of a process, and noise in the log often follows certain patterns. In our approach, noise is introduced through a series of model transformations that apply user-defined deviation patterns to a designed base process model, which creates a basis for more advanced evaluation of process mining methods and tools.</p>
      </abstract>
      <kwd-group>
        <kwd>Synthetic process data</kwd>
        <kwd>realistic noise</kwd>
        <kwd>patterns</kwd>
        <kwd>model transformations</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>CEUR
ceur-ws.org</p>
    </sec>
    <sec id="sec-2">
      <title>1. Introduction</title>
      <p>Process mining is a powerful tool for understanding and improving business processes. It
explores the behavior of the system recorded in an event log, either to discover a process
model that could generate the observed behavior or to compare the recorded behavior with
normative process models. The challenge in process mining arises from the fact that the
system’s true nature S is often unknown, while the recorded log L is noisy and incomplete, and
the true model M is either unknown (in the context of process discovery) or imperfect (in the
context of conformance checking).</p>
      <p>Deviating behavior, or noise, can be categorized into random and behavioral noise. Simple
random noise arises from errors in logging or modeling and includes missing or incorrectly
included events, whereas behavioral noise entails the violation of structural patterns and dependencies
within a process. It is crucial that process mining algorithms are able to work in the presence of
both random and behavioral noise.</p>
      <p>One of the good practices in the field of process mining is that most of the developed methods
are evaluated on data from real-life processes, which is inherently noisy. However, no ground
truth is available for such data sets and the conclusions can only be made based on the feedback
from the process owner.</p>
      <p>
        Another standard practice is evaluating process mining methods in a controlled environment
where the ground truth is known, i.e., having knowledge about  and  . The usual approach to
creating such data is to simulate (play out) a designed process model and introduce random noise
in the resulting event log through log manipulations like insertion, deletion, and swaps [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. The
process model represents the true process, and the log represents the recorded behavior with
data quality issues [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. This method does not cover the full spectrum of evaluation challenges,
since it does not take into account imperfections in the modeled behavior as well as behavioral
noise in  and  .
      </p>
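      <p>The log manipulations mentioned above can be sketched as follows. This is a minimal illustration, not Trident's implementation: the function name <italic>add_random_noise</italic>, its parameters, and the representation of a trace as a list of activity labels are assumptions made for this example.</p>
      <preformat preformat-type="code">
```python
import random


def add_random_noise(trace, activities, p_delete=0.05, p_insert=0.05,
                     p_swap=0.05, seed=None):
    """Return a noisy copy of a trace (a list of activity labels).

    Hypothetical helper: each probability controls one log manipulation.
    """
    rng = random.Random(seed)
    noisy = []
    for event in trace:
        if p_delete > rng.random():
            continue  # deletion: the event is not recorded
        noisy.append(event)
        if p_insert > rng.random():
            noisy.append(rng.choice(activities))  # insertion: spurious event
    for i in range(len(noisy) - 1):
        if p_swap > rng.random():
            # swap: two neighboring events are recorded in the wrong order
            noisy[i], noisy[i + 1] = noisy[i + 1], noisy[i]
    return noisy


clean_trace = ["ring", "deliver", "ring", "deliver"]
noisy_trace = add_random_noise(clean_trace, ["ring", "deliver"], seed=42)
```
      </preformat>
      <p>Because every manipulation is drawn independently per event, the resulting noise carries no structure, which is exactly the limitation discussed above.</p>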
      <p>Typical deviations and behavioral noise can be defined as (anti-)patterns, e.g., the multitasking
pattern. As a running example, consider a delivery process, where a deliverer should first ring a
customer and then deliver a package without switching from one package to another between
ringing and delivering. An example of deviating behavior is multitasking between packages:
i.e., the deliverer rings another door before handing over the first package, which the normative
process model does not allow.</p>
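      <p>To make the running example concrete, the following sketch flags multitasking in a recorded event sequence. The event format (activity, deliverer, package), the activity labels "ring" and "deliver", and the function name are assumptions chosen for illustration only.</p>
      <preformat preformat-type="code">
```python
from collections import defaultdict


def multitasking_violations(events):
    """Return the indices of 'ring' events that start a second open package.

    events: list of (activity, deliverer, package) tuples in log order.
    """
    open_packages = defaultdict(set)  # deliverer: packages rung, not delivered
    violations = []
    for index, (activity, deliverer, package) in enumerate(events):
        if activity == "ring":
            if open_packages[deliverer]:
                # The deliverer rings again before delivering the open package.
                violations.append(index)
            open_packages[deliverer].add(package)
        elif activity == "deliver":
            open_packages[deliverer].discard(package)
    return violations


conforming = [("ring", "d1", "p1"), ("deliver", "d1", "p1"),
              ("ring", "d1", "p2"), ("deliver", "d1", "p2")]
deviating = [("ring", "d1", "p1"), ("ring", "d1", "p2"),
             ("deliver", "d1", "p1"), ("deliver", "d1", "p2")]
```
      </preformat>
      <p>Here the conforming sequence yields no violations, while the deviating one is flagged at its second event, the point where the deliverer rings another door too early.</p>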
      <p>Our tool, Trident, modifies a given business model M₀ by applying model
transformation patterns, with each pattern addressing a specific type of behavioral or random
noise. The resulting process model M′ represents the real process execution, including deviating
behavior, and is used to generate an event log L. During the simulation of M′, random noise
can additionally be applied to L through log manipulations. Complete knowledge of the
“true behavior” S is retained through the model transformations applied to M₀ and the log
manipulations applied to L, and both M₀ and L serve as imprecise representations of S.</p>
    </sec>
    <sec id="sec-3">
      <title>2. Generating a “Noisy” Process Model</title>
      <p>To set up an evaluation experiment using Trident, one takes a real-life process model or sketches
a hypothetical process with a set of appropriate deviations. With this tool, these deviations
are applied to a designed base model through a series of model transformations to simulate a
realistic synthetic process including deviating behavior.</p>
      <sec id="sec-3-1">
        <title>2.1. Model transformations</title>
        <p>An overview of the usage of Trident is shown in Fig. 1, with the designed base model M₀ at
the top and an (extendable) set of patterns on the right. A pattern is itself
a process model, partitioned into a match component and a create component. Trident
applies the model transformation of a pattern to a process model through a user-defined mapping
function. The transformation, denoted by Ψ, takes the process model, the pattern, and the mapping as
arguments and yields the transformed model M′, as depicted by the red trident shape. The mapping
function maps the match component of the pattern to the process model, which dictates how the
elements of the create component are added to the model. After the first
transformation, M′ contains the created components of the first pattern, connected to the matched elements
of M₀ via the mapping. This process is repeated until all deviation patterns are included in the model, after which
the process model can be either exported or immediately simulated by playing it out from the initial
to the final marking in the tool.</p>
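        <p>The transformation can be approximated with plain sets, as in the sketch below. It is an assumption-laden illustration rather than Trident's code: the classes <italic>Net</italic> and <italic>Pattern</italic>, the function <italic>apply_pattern</italic>, and all node names are hypothetical, token colors and markings are omitted, and no soundness validation is performed.</p>
        <preformat preformat-type="code">
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Net:
    places: frozenset
    transitions: frozenset
    arcs: frozenset  # set of (source, target) pairs


@dataclass(frozen=True)
class Pattern:
    net: Net
    match: frozenset  # pattern nodes that are identified with model nodes


def apply_pattern(model, pattern, mapping):
    """Return a new net: the model plus the created part of the pattern."""
    model_nodes = model.places | model.transitions
    pattern_nodes = pattern.net.places | pattern.net.transitions
    created = pattern_nodes - pattern.match
    if set(mapping) != set(pattern.match):
        raise ValueError("mapping must cover exactly the match component")
    if not set(mapping.values()).issubset(model_nodes):
        raise ValueError("mapping must point to existing model nodes")
    if not created.isdisjoint(model_nodes):
        raise ValueError("created nodes clash with model node names")

    def target(node):
        # Matched nodes are replaced by their image; created nodes keep their name.
        return mapping.get(node, node)

    new_arcs = frozenset((target(s), target(t)) for s, t in pattern.net.arcs)
    return Net(
        places=model.places | (pattern.net.places - pattern.match),
        transitions=model.transitions | (pattern.net.transitions - pattern.match),
        arcs=model.arcs | new_arcs,
    )


# Simplified delivery model: one deliverer resource, packages omitted.
base = Net(
    places=frozenset({"deliverer_available", "deliverer_busy"}),
    transitions=frozenset({"ring", "deliver"}),
    arcs=frozenset({
        ("deliverer_available", "ring"), ("ring", "deliverer_busy"),
        ("deliverer_busy", "deliver"), ("deliver", "deliverer_available"),
    }),
)
# Multitasking pattern: a created transition moves a resource from busy to available.
multitasking = Pattern(
    net=Net(
        places=frozenset({"available", "busy"}),
        transitions=frozenset({"switch"}),
        arcs=frozenset({("busy", "switch"), ("switch", "available")}),
    ),
    match=frozenset({"available", "busy"}),
)
transformed = apply_pattern(
    base, multitasking,
    {"available": "deliverer_available", "busy": "deliverer_busy"},
)
```
        </preformat>
        <p>In this sketch the matched places act only as anchors: they are not copied into the result, whereas the created transition and its arcs are attached to the places selected by the mapping.</p>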
        <p>Fig. 1 illustrates an example model transformation Ψ that applies the first pattern to M₀,
with the mapping assigning each matched element of the pattern to its counterpart in M₀.
The components’ colors in the pattern show the partitioning into matched (blue) and created
(green). The pattern models the behavioral deviation of multitasking of a resource by allowing a resource
to switch from the state “busy” to “available” at any point in time. By applying the pattern to M₀ on the
deliverer resource, the scenario described in Sec. 1 is enabled, where a deliverer can ring another
door before handing over the first package.</p>
        <p>[Fig. 1: overview of the usage of Trident. Labels recovered from the figure: Model transformation, Create new patterns, Simulate, Deviating event log.]</p>
        <p>
          The tool operates on a generalized ν-net formalism, which is a restricted version of colored
Petri nets and encompasses Petri net extensions such as resource-constrained (RC) ν-nets [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ], typed
Petri nets with identifiers (t-PNIDs), typed Jackson nets [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ], and object-centric Petri
nets [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. Such a ν-net acts as the base process model for the tool and as the process model for the methods using
the data. The user interface currently focuses on RC ν-nets, distinguishing
between regular, resource-available, and resource-busy place types. Our running
example is modeled as an RC ν-net, as shown in Fig. 1. In each iteration, after a pattern and a
mapping have been selected, the model transformation to be applied is validated to ensure that the resulting process
model retains its soundness.
        </p>
      </sec>
      <sec id="sec-3-2">
        <title>2.2. Creating new deviation patterns</title>
        <p>Trident provides a list of deviations that can be applied to the base process model.
This includes multitasking from Fig. 1, as well as temporarily increasing capacities, neglecting
resources in activity executions, overtaking in first-in-first-out queues, resources switching
roles, and more. The list is easily extendable: one can model a new deviation as a Petri
net from an explanation of the deviating behavior, e.g., skipping an activity that is assumed to be
necessary, such as deliver in M₀. Designing new deviation patterns is supported in the tool by providing the
user with a template and instructions on how to create the pattern. The skipping example
is considered behavioral noise, where a modeling pattern is violated in the true behavior of
the system. Random noise can be modeled by deviation patterns in the same way. Skipping
activities is included in the provided list as well.</p>
      </sec>
      <sec id="sec-3-3">
        <title>2.3. Simulation</title>
        <p>After iteratively applying model transformations to the base process model, the resulting model includes all
deviations deemed realistic by the user. The simulation module plays out this model from the
provided initial marking to the final marking, with the option to limit the number of transition
firings in case of infinite behavior. Probabilities are modeled by sampling a waiting time
for each transition firing from the moment the transition is enabled. If the transition is no longer enabled at
the scheduled time, the firing is canceled. The simulation is basic in terms of the probabilities
of which transitions fire, i.e., it does not take into account any dependencies other than
the sampled scheduling time from the moment a transition is enabled. If one requires a more advanced
simulation, the process model can be exported to .pnml for use in other tools.</p>
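        <p>The scheduling scheme described above can be sketched with a priority queue. The code below is a simplified illustration under stated assumptions, not the tool's simulator: the function names, the dictionary-based net encoding, and the exponential waiting time are all hypothetical, tokens are uncolored, and a transition that is disabled and re-enabled before its scheduled time keeps its original schedule.</p>
        <preformat preformat-type="code">
```python
import heapq
import random
from collections import Counter


def enabled(marking, pre):
    """A transition is enabled if every input place holds enough tokens."""
    return all(marking[place] >= count for place, count in pre.items())


def play_out(transitions, initial, final, sample_wait, max_firings=1000, seed=None):
    """Play out a net; transitions maps a name to a (pre, post) pair of dicts."""
    rng = random.Random(seed)
    marking = Counter(initial)
    target = +Counter(final)  # unary plus drops zero counts before comparing
    clock, fired, queue, scheduled = 0.0, [], [], set()

    def schedule():
        # Sample a waiting time for every transition that just became enabled.
        for name, (pre, _post) in transitions.items():
            if name not in scheduled and enabled(marking, pre):
                heapq.heappush(queue, (clock + sample_wait(name, rng), name))
                scheduled.add(name)

    schedule()
    while queue and max_firings > len(fired) and +marking != target:
        time, name = heapq.heappop(queue)
        scheduled.discard(name)
        pre, post = transitions[name]
        if not enabled(marking, pre):
            continue  # no longer enabled at the scheduled time: canceled
        clock = time
        marking.subtract(pre)
        marking.update(post)
        fired.append((clock, name))
        schedule()
    return fired, +marking


# Delivery of two packages by one deliverer (hypothetical place names).
delivery = {
    "ring": ({"available": 1, "waiting": 1}, {"busy": 1, "rung": 1}),
    "deliver": ({"busy": 1, "rung": 1}, {"available": 1, "delivered": 1}),
}
start = {"available": 1, "waiting": 2}
end = {"available": 1, "delivered": 2}
firings, last_marking = play_out(
    delivery, start, end, lambda name, rng: rng.expovariate(1.0), seed=7
)
```
        </preformat>
        <p>The firing limit bounds the play-out when the final marking is never reached, mirroring the option described above for models with infinite behavior.</p>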
      </sec>
    </sec>
    <sec id="sec-4">
      <title>3. Availability and Maturity</title>
      <p>The tool is written in Python and can be operated through a Flask GUI or a command-line
interface, or called from other Python code, to generate ground truth synthetic process
data. The source code, an installation manual, and a screencast are available at
gitlab.com/vignesh_dv/mira/-/tree/paper/mira/pattern.</p>
      <p>We have used the tool throughout our research project CERTIF-AI, which involves various industry
partners and for which we model hypothetical assembly processes fitting the companies’ data,
including the behavior of operators as resources. We add potential violations of inter-case
dependencies via these resources to generate a true representation of reality with realistic and
explainable deviations. With this synthetic process, we can evaluate our methods, which aim to
reveal the true nature of the system from its imprecise representations, the process model and the event log.</p>
    </sec>
    <sec id="sec-5">
      <title>4. Conclusion</title>
      <p>We developed an open-source tool for generating a synthetic process with realistic noise,
both random and behavioral, where the ground truth S is known together
with imprecise representations of S in the form of the process model M₀ and the simulated
event log L. This allows for the evaluation of process mining methods in which both L and M₀
are analyzed to reveal information about S, as in conformance checking, log repair, model
repair, and performance analysis. Unlike traditional approaches, where a process model M₀
denotes the perfect representation of S and only the generated event log L contains noise,
Trident takes an available business model M₀ and constructs a process model M′ that can serve
as the representation of the real process execution, using behavioral deviation patterns (e.g.,
multitasking or redo), or be used for the generation of a noisy event log, using log noise patterns
(e.g., delayed logging for certain event types). Complete knowledge of S is retained through the
transformed model M′ and the simulation method.</p>
      <p>The tool supports generalized ν-nets, making it applicable to many Petri net extensions;
however, the GUI currently focuses only on resource-constrained ν-nets. We aim to
extend it to generalized ν-nets in the future.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>This work is done within the project “Certification of production process quality through
Artificial Intelligence (CERTIF-AI)”, funded by NWO (project number: 17998).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>T.</given-names>
            <surname>Jouck</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Depaire</surname>
          </string-name>
          ,
          <article-title>Generating artificial data for empirical analysis of control-flow discovery algorithms: A process tree and log generator</article-title>
          ,
          <source>Business &amp; Information Systems Engineering</source>
          <volume>61</volume>
          (
          <year>2019</year>
          )
          <fpage>695</fpage>
          -
          <lpage>712</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>R. J. C.</given-names>
            <surname>Bose</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. S.</given-names>
            <surname>Mans</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W. M. v.</given-names>
            <surname>Aalst</surname>
          </string-name>
          ,
          <article-title>Wanna improve process mining results?</article-title>
          ,
          <source>in: 2013 IEEE Symposium on Computational Intelligence and Data Mining (CIDM)</source>
          , IEEE,
          <year>2013</year>
          , pp.
          <fpage>127</fpage>
          -
          <lpage>134</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>D.</given-names>
            <surname>Sommers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Sidorova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. F. v.</given-names>
            <surname>Dongen</surname>
          </string-name>
          ,
          <article-title>Aligning event logs to Resource-Constrained ν-Petri nets</article-title>
          ,
          <source>in: International Conference on Applications and Theory of Petri Nets and Concurrency</source>
          , Springer,
          <year>2022</year>
          , pp.
          <fpage>325</fpage>
          -
          <lpage>345</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>J. M. E.</given-names>
            <surname>van der Werf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Rivkin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Polyvyanyy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Montali</surname>
          </string-name>
          ,
          <article-title>Data and process resonance: Identifier soundness for models of information systems</article-title>
          ,
          <source>in: International Conference on Applications and Theory of Petri Nets and Concurrency</source>
          , Springer,
          <year>2022</year>
          , pp.
          <fpage>369</fpage>
          -
          <lpage>392</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>W. M. v.</given-names>
            <surname>Aalst</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Berti</surname>
          </string-name>
          ,
          <article-title>Discovering object-centric Petri nets</article-title>
          ,
          <source>Fundamenta Informaticae</source>
          <volume>175</volume>
          (
          <year>2020</year>
          )
          <fpage>1</fpage>
          -
          <lpage>40</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>