=Paper=
{{Paper
|id=Vol-3648/paper_9762
|storemode=property
|title=Trident: Generating Noisy Synthetic Processes with Ground Truth
|pdfUrl=https://ceur-ws.org/Vol-3648/paper_9762.pdf
|volume=Vol-3648
|authors=Dominique Sommers,Durairaja V. Varadarajan,Natalia Sidorova
|dblpUrl=https://dblp.org/rec/conf/icpm/SommersVS23
}}
==Trident: Generating Noisy Synthetic Processes with Ground Truth==
Dominique Sommers, Durairaja V. Varadarajan and Natalia Sidorova
Eindhoven University of Technology, Mathematics and Computer Science, Eindhoven, the Netherlands
Abstract
We present Trident: a tool that modifies a process model in order to introduce realistic behavioral noise into the modeled behavior and generates an event log that can be used to evaluate the performance of process mining techniques on realistically noisy data with knowledge of the "ground truth", i.e., the true behavior of the system. Traditional approaches for generating noisy process data take a process model that represents the true process, generate a simulated event log, and introduce noise into the log. In reality, however, both the recorded and the modeled behavior are imprecise representations of a process, and noise in the log often follows certain patterns. In our approach, noise is introduced through a series of model transformations that apply user-defined deviation patterns to a designed base process model, which creates a basis for more advanced evaluation of process mining methods and tools.
Keywords
Synthetic process data, realistic noise, patterns, model transformations
1. Introduction
Process mining is a powerful tool for understanding and improving business processes. It explores the behavior of a system recorded in an event log in order to discover a process model that could have generated the observed behavior, or to compare the recorded behavior with normative process models. A central challenge in process mining is that the system's true nature $S$ is often unknown, while the recorded log $L$ is noisy and incomplete, and the model $M$ is either unknown (in the context of process discovery) or imperfect (in the context of conformance checking).
Deviating behavior, or noise, can be categorized into random and behavioral noise. Simple random noise arises from errors in logging or modeling and includes missing or incorrectly included events, whereas behavioral noise entails the violation of structural patterns and dependencies within a process. It is crucial that process mining algorithms are able to work in the presence of both random and behavioral noise.
A good practice in the field of process mining is to evaluate most newly developed methods on data from real-life processes, which is inherently noisy. However, no ground truth is available for such data sets, and conclusions can only be drawn based on feedback from the process owner.
ICPM'23: International Conference on Process Mining, October 23–27, 2023, Rome, Italy
d.sommers@tue.nl (D. Sommers); d.v.varadarajan@student.tue.nl (D. V. Varadarajan); n.sidorova@tue.nl (N. Sidorova)
© 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073
Another standard practice is to evaluate process mining methods in a controlled environment where the ground truth is known, i.e., where both $L$ and $S$ are known. The usual approach to creating such data is to simulate (play out) a designed process model and to introduce random noise into the resulting event log through log manipulations such as insertions, deletions, and swaps of events [1]. The process model represents the true process, and the log represents the recorded behavior with data quality issues [2]. This method does not cover the full spectrum of evaluation challenges, since it takes into account neither imperfections in the modeled behavior nor behavioral noise in $M$ and $L$.
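For comparison, the traditional log-level noise described above can be sketched in a few lines of Python. This is a minimal illustration, not the generator of [1]: traces are assumed to be lists of activity names, and the function name `add_random_noise` and the uniform choice among the three operations are assumptions.

```python
import random


def add_random_noise(trace, activities, p_noise, rng):
    """Return a copy of `trace` with at most one random manipulation applied.

    With probability `p_noise`, one operation is chosen uniformly: insert a
    random activity, delete an event, or swap two adjacent events.
    Empty traces are returned unchanged.
    """
    noisy = list(trace)  # never modify the caller's trace
    if not noisy or rng.random() >= p_noise:
        return noisy
    op = rng.choice(["insert", "delete", "swap"])
    if op == "insert":
        noisy.insert(rng.randrange(len(noisy) + 1), rng.choice(activities))
    elif op == "delete":
        del noisy[rng.randrange(len(noisy))]
    elif len(noisy) >= 2:  # a swap needs at least two events
        i = rng.randrange(len(noisy) - 1)
        noisy[i], noisy[i + 1] = noisy[i + 1], noisy[i]
    return noisy


rng = random.Random(42)
clean_log = [["ring", "deliver"] for _ in range(5)]
noisy_log = [add_random_noise(t, ["ring", "deliver"], 0.5, rng) for t in clean_log]
```

Noise of this kind is independent of the process structure. A swap or deletion does not follow any behavioral pattern, which is the limitation discussed here.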
Typical deviations and behavioral noise can be defined as (anti-)patterns, e.g., the multitasking
pattern. As a running example, consider a delivery process, where a deliverer should first ring a
customer and then deliver a package without switching from one package to another between
ringing and delivering. An example of deviating behavior is multitasking between packages, i.e., the deliverer rings another door before handing over the first package, which the normative process model does not allow.
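The running example can be made concrete with a small, hypothetical check over the events of a single deliverer, encoded as (activity, package) pairs. The encoding and the function `find_multitasking` are illustrative assumptions and are not part of Trident. They only show which event orders the normative model rules out.

```python
def find_multitasking(events):
    """Return (pending, new) package pairs where the deliverer rang the door of
    `new` while `pending` had been rung but not yet delivered."""
    pending = []  # packages rung but not yet delivered, in ringing order
    violations = []
    for activity, package in events:
        if activity == "ring":
            violations.extend((p, package) for p in pending)
            pending.append(package)
        elif activity == "deliver" and package in pending:
            pending.remove(package)
    return violations


normative = [("ring", "p1"), ("deliver", "p1"), ("ring", "p2"), ("deliver", "p2")]
deviating = [("ring", "p1"), ("ring", "p2"), ("deliver", "p1"), ("deliver", "p2")]
print(find_multitasking(normative))  # []
print(find_multitasking(deviating))  # [('p1', 'p2')]
```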
Our tool, Trident, is designed to modify a given business process model $M_0$ by applying model transformation patterns, each of which addresses a specific type of behavioral or random noise. The resulting process model $M'$ represents the real process execution, including deviating behavior, and is used to generate an event log $L$. During the simulation of $M'$, random noise can additionally be applied to $L$ through log manipulations. Complete knowledge of the "true behavior" $S$ is retained through the model transformations applied to $M_0$ and the log manipulations applied to $L$, while both $M_0$ and $L$ serve as imprecise representations of $S$.
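Log manipulations applied during simulation can be illustrated with a log noise pattern the paper mentions elsewhere, delayed logging for certain event types. The sketch is an assumption-laden illustration rather than Trident's implementation. Events are (activity, timestamp) pairs, delays are drawn uniformly from [0, `max_delay`], and the function name `delay_logging` is hypothetical. Because the true trace is kept alongside the recorded one, the ground truth remains available.

```python
import random


def delay_logging(trace, delayed_types, max_delay, rng):
    """Return the recorded version of a true trace.

    Events whose activity is in `delayed_types` are logged up to `max_delay`
    time units late. The recorded trace is ordered by recorded timestamp, so a
    delay can also change the observed order of events.
    """
    recorded = []
    for activity, timestamp in trace:
        if activity in delayed_types:
            timestamp += rng.uniform(0, max_delay)
        recorded.append((activity, timestamp))
    return sorted(recorded, key=lambda event: event[1])  # stable sort keeps ties in order


true_trace = [("ring", 0.0), ("deliver", 1.0), ("ring", 2.0), ("deliver", 3.0)]
recorded_trace = delay_logging(true_trace, {"deliver"}, max_delay=5.0, rng=random.Random(1))
```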
2. Generating a "Noisy" Process Model
To set up an evaluation experiment using Trident, one takes a real-life process model or sketches
a hypothetical process with a set of appropriate deviations. With this tool, these deviations
are applied to a designed base model through a series of model transformations to simulate a
realistic synthetic process including deviating behavior.
2.1. Model transformations
An overview of the usage of Trident is shown in Fig. 1. The designed base model $M_0$ is at the top, and an (extendable) set of patterns ($P_1, P_2, P_3, \ldots$) is on the right. A pattern $P$ is itself a process model, partitioned into a match component $P_m$ and a create component $P_c$. Trident applies the model transformation of $P$ to a process model $M$ through a user-defined mapping function $\varphi$, denoted by $\Psi(M, P, \varphi) = M'$ and depicted by the red trident shape. The function $\varphi$ maps the elements of $P_m$ to elements of $M$, and thereby dictates how the elements of $P_c$ are added to $M$. In the first iteration, the resulting model $M'$ contains the created components of $P_1$, connected via $\varphi$ to the matched elements of $M_0$. This process is repeated, each time on the previously transformed model, until all desired deviation patterns are included. The resulting process model can then either be exported or be simulated directly in the tool by playing it out from the initial to the final marking.
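To make the roles of $P_m$, $P_c$, and $\varphi$ concrete, the transformation can be sketched on plain place/transition nets. This is a simplified illustration under stated assumptions, not Trident's code. It ignores the colors, identifiers, and place types of ν-nets and represents a net by sets of names. The names `Net`, `Pattern`, `apply_pattern`, and the `tag` prefix used for fresh names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Net:
    """A plain place/transition net; arcs are (source, target) name pairs."""
    places: set = field(default_factory=set)
    transitions: set = field(default_factory=set)
    arcs: set = field(default_factory=set)


@dataclass
class Pattern:
    net: Net
    match: set  # names of matched elements (P_m); all other elements form P_c


def kind(net, name):
    if name in net.places:
        return "place"
    if name in net.transitions:
        return "transition"
    return None


def apply_pattern(model, pattern, phi, tag):
    """Return a new net M' that glues the created part of `pattern` onto `model`,
    using `phi` to map each matched element to an element of the model."""
    p = pattern.net
    if set(phi) != pattern.match:
        raise ValueError("phi must map exactly the matched elements")
    for m, target in phi.items():
        if kind(p, m) is None or kind(p, m) != kind(model, target):
            raise ValueError(f"cannot map {m!r} to {target!r}")

    def rename(x):
        # matched elements are glued onto M via phi; created ones get a fresh name
        return phi[x] if x in pattern.match else f"{tag}:{x}"

    return Net(
        places=model.places | {rename(x) for x in p.places - pattern.match},
        transitions=model.transitions | {rename(x) for x in p.transitions - pattern.match},
        # only arcs with at least one created endpoint are added to M
        arcs=model.arcs | {(rename(s), rename(t)) for s, t in p.arcs
                           if s not in pattern.match or t not in pattern.match},
    )


# Running example: a deliverer is busy between "ring" and "deliver".
delivery = Net(
    places={"d_avail", "d_busy"},
    transitions={"ring", "deliver"},
    arcs={("d_avail", "ring"), ("ring", "d_busy"),
          ("d_busy", "deliver"), ("deliver", "d_avail")},
)
# Multitasking pattern: a created transition moves a busy resource back to available.
multitask = Pattern(
    net=Net(places={"avail", "busy"}, transitions={"switch"},
            arcs={("busy", "switch"), ("switch", "avail")}),
    match={"avail", "busy"},
)
result = apply_pattern(delivery, multitask, {"avail": "d_avail", "busy": "d_busy"}, "P1")
```

Matched elements are glued onto existing elements of $M$, whereas created elements receive fresh names. Applying a pattern therefore never removes anything from $M$; it only adds behavior.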
Fig. 1 illustrates an example model transformation $\Psi(M_0, P_1, \varphi) = M'$, in which $\varphi$ maps the matched resource places of $P_1$ onto the corresponding resource places of the deliverer in $M_0$. The colors of the components in $P_1$ show the partitioning into matched (blue) and created (green) components. $P_1$ models the behavioral deviation of multitasking of a resource by allowing a resource to switch from the state "busy" to "available" at any point in time. Applying $P_1$ to $M_0$ on the deliverer resource enables the scenario described in Sec. 1, in which a deliverer rings another door before handing over the first package.

Figure 1: Overview of Trident with running example (labeled steps: create new patterns, model transformation, simulate, deviating event log).
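The effect of the multitasking pattern can be checked with the standard Petri net firing rule. In this hypothetical encoding, package identities are abstracted away, so the net only counts tokens. Place and transition names are illustrative, and the transition `switch` stands for the created (green) part of $P_1$. The sketch shows that two consecutive rings are impossible in the base model but become possible once `switch` is added.

```python
from collections import Counter


def fire_sequence(pre, post, marking, sequence):
    """Replay `sequence` under the standard firing rule.

    Returns the final marking, or None as soon as a transition in the
    sequence is not enabled.
    """
    m = Counter(marking)
    for t in sequence:
        if any(m[p] < n for p, n in pre[t].items()):
            return None  # t is not enabled in the current marking
        m.subtract(pre[t])
        m.update(post[t])
    return +m  # unary plus drops places with zero tokens


# Base model: ringing makes the deliverer busy, delivering makes it available again.
pre = {"ring": {"to_ring": 1, "d_avail": 1}, "deliver": {"rung": 1, "d_busy": 1}}
post = {"ring": {"rung": 1, "d_busy": 1}, "deliver": {"delivered": 1, "d_avail": 1}}
start = {"to_ring": 2, "d_avail": 1}  # two packages, one deliverer

# Transformed model: the created transition lets a busy deliverer become available.
pre_dev = {**pre, "switch": {"d_busy": 1}}
post_dev = {**post, "switch": {"d_avail": 1}}

print(fire_sequence(pre, post, start, ["ring", "ring"]))  # None
print(fire_sequence(pre_dev, post_dev, start, ["ring", "switch", "ring"]))
```

Because identities are abstracted away, this only demonstrates that a second ring becomes reachable. It does not model which package is eventually handed over.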
The tool operates on a generalized ν-net formalism. This formalism is a restricted version of colored Petri nets that encompasses Petri net extensions such as resource-constrained (RC) ν-nets [3], typed Petri nets with identifiers (t-PNIDs), typed Jackson nets [4], and object-centric Petri nets [5]. Such a ν-net serves as the base process model for the tool and as $M$ for the methods using the generated data. The user interface is geared towards RC ν-nets and distinguishes between three place types: regular, resource available, and resource busy. Our running example is modeled as an RC ν-net, as shown in Fig. 1. In each iteration, after a pattern $P$ and a mapping $\varphi$ have been selected, the model transformation is validated before it is applied, to ensure that the resulting process model remains sound.
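Trident's validation targets generalized ν-nets. As a much simpler stand-in, the following sketch checks classical workflow-net soundness (option to complete, proper completion, no dead transitions) by exhaustively exploring the reachability graph of a bounded place/transition net. It is an assumption-based illustration, not the tool's validator. The function names, the `max_states` cut-off, and the net encoding are hypothetical.

```python
from collections import deque


def successors(pre, post, marking):
    """Yield (transition, next_marking) for every transition enabled in `marking`."""
    for t, inputs in pre.items():
        if all(marking.get(p, 0) >= n for p, n in inputs.items()):
            m = dict(marking)
            for p, n in inputs.items():
                m[p] = m.get(p, 0) - n
            for p, n in post[t].items():
                m[p] = m.get(p, 0) + n
            yield t, frozenset((p, n) for p, n in m.items() if n)


def is_sound(pre, post, source, sink, max_states=10_000):
    """Classical soundness of a bounded workflow net with places `source` and `sink`."""
    initial, final = frozenset({(source, 1)}), frozenset({(sink, 1)})
    graph, fired, queue = {}, set(), deque([initial])
    while queue:  # breadth-first construction of the reachability graph
        m = queue.popleft()
        if m in graph:
            continue
        if len(graph) >= max_states:
            raise RuntimeError("state space too large; the net may be unbounded")
        graph[m] = set()
        for t, nxt in successors(pre, post, dict(m)):
            graph[m].add(nxt)
            fired.add(t)
            queue.append(nxt)
    # Option to complete: the final marking is reachable from every reachable marking.
    can_finish = {final} if final in graph else set()
    changed = True
    while changed:
        changed = False
        for m, nxts in graph.items():
            if m not in can_finish and nxts & can_finish:
                can_finish.add(m)
                changed = True
    option_to_complete = len(can_finish) == len(graph)
    # Proper completion: once the sink is marked, no other tokens remain.
    proper_completion = all(m == final for m in graph if dict(m).get(sink, 0) > 0)
    no_dead_transitions = fired == set(pre)
    return option_to_complete and proper_completion and no_dead_transitions


# The sequential net i -> a -> p -> b -> o is sound ...
pre = {"a": {"i": 1}, "b": {"p": 1}}
post = {"a": {"p": 1}, "b": {"o": 1}}
print(is_sound(pre, post, "i", "o"))  # True
# ... but a leftover token in q violates option to complete and proper completion.
post_bad = {"a": {"p": 1, "q": 1}, "b": {"o": 1}}
print(is_sound(pre, post_bad, "i", "o"))  # False
```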
2.2. Creating new deviation patterns
Trident provides a list of deviation patterns that can be applied to the base process model. It includes multitasking from Fig. 1, as well as temporarily increasing capacities, neglecting resources in activity executions, overtaking in first-in-first-out queues, resources switching roles, and more. The list is easily extendable: one can model a new deviation as a Petri net from a description of the deviating behavior, e.g., skipping an activity that is assumed to be necessary, such as deliver in $M_0$. This example is considered behavioral noise, since a modeling pattern is violated in the true behavior of the system. The tool supports the design of new deviation patterns by providing the user with a template and instructions on how to create a pattern. Random noise can be modeled by deviation patterns in the same way; skipping activities is one such pattern that is included in the list as well.
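A skipping pattern of the kind mentioned above can be sketched as a function that adds a silent copy of an activity. The new transition consumes and produces the same tokens as the activity but carries no label, so firing it leaves no event in the log. This is a hypothetical illustration on plain place/transition nets. The naming scheme `skip_<activity>` and the use of `None` for an invisible label are assumptions.

```python
def add_skip(pre, post, labels, activity):
    """Return new (pre, post, labels) with a silent alternative to `activity`."""
    skip = f"skip_{activity}"  # hypothetical naming scheme for the created transition
    if activity not in pre or skip in pre:
        raise ValueError(f"cannot add a skip for {activity!r}")
    new_pre = {t: dict(arcs) for t, arcs in pre.items()}  # copy; inputs stay untouched
    new_post = {t: dict(arcs) for t, arcs in post.items()}
    new_pre[skip] = dict(pre[activity])    # same input places as the activity
    new_post[skip] = dict(post[activity])  # same output places as the activity
    new_labels = {**labels, skip: None}    # None marks an invisible transition
    return new_pre, new_post, new_labels


pre = {"ring": {"to_ring": 1, "d_avail": 1}, "deliver": {"rung": 1, "d_busy": 1}}
post = {"ring": {"rung": 1, "d_busy": 1}, "deliver": {"delivered": 1, "d_avail": 1}}
labels = {"ring": "ring", "deliver": "deliver"}
pre2, post2, labels2 = add_skip(pre, post, labels, "deliver")
```

In terms of Sec. 2.1, the input and output places of `deliver` form the matched part and the silent transition is the created part.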
2.3. Simulation
After iteratively applying model transformations to the base process model, $M'$ includes all deviations deemed realistic by the user. The simulation module plays out $M'$ from the provided initial marking to the final marking, with the option to limit the number of transition firings in case of infinite behavior. Probabilities are modeled by sampling, for each transition, a waiting time from the moment it becomes enabled. If the transition is no longer enabled at its scheduled time, the firing is canceled. The simulation is thus basic in terms of the probabilities of which transitions fire: it takes no dependencies into account other than the waiting time sampled from the moment a transition becomes enabled. If a more advanced simulation is required, the process model $M'$ can be exported to PNML (.pnml) for use in other tools.
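The play-out described above can be sketched as a race between enabled transitions. This is a minimal illustration under assumptions, not Trident's simulator. Waiting times are assumed to be exponentially distributed (the distribution is not specified here), the net is a plain place/transition net, and the function `simulate` and its parameters are hypothetical.

```python
import random


def simulate(pre, post, labels, initial, final, rng, max_firings=100, rate=1.0):
    """Play out a net from `initial` and return the recorded (activity, time) events.

    Each transition samples a waiting time when it becomes enabled. The earliest
    scheduled transition fires next, and scheduled firings of transitions that
    are no longer enabled are canceled. The play-out stops at the final marking,
    in a deadlock, or after `max_firings` firings.
    """
    marking, schedule, events, now = dict(initial), {}, [], 0.0

    def enabled(t):
        return all(marking.get(p, 0) >= n for p, n in pre[t].items())

    for _ in range(max_firings):
        if {p: n for p, n in marking.items() if n} == final:
            break
        for t in pre:
            if not enabled(t):
                schedule.pop(t, None)  # cancel: no longer enabled
            elif t not in schedule:
                schedule[t] = now + rng.expovariate(rate)  # newly enabled
        if not schedule:
            break  # deadlock
        t = min(schedule, key=schedule.get)
        now = schedule.pop(t)
        for p, n in pre[t].items():
            marking[p] -= n
        for p, n in post[t].items():
            marking[p] = marking.get(p, 0) + n
        if labels.get(t) is not None:  # silent transitions leave no event
            events.append((labels[t], now))
    return events


pre = {"ring": {"to_ring": 1, "d_avail": 1}, "deliver": {"rung": 1, "d_busy": 1}}
post = {"ring": {"rung": 1, "d_busy": 1}, "deliver": {"delivered": 1, "d_avail": 1}}
labels = {"ring": "ring", "deliver": "deliver"}
trace = simulate(pre, post, labels, {"to_ring": 2, "d_avail": 1},
                 {"delivered": 2, "d_avail": 1}, random.Random(3))
```

The firing limit `max_firings` plays the role of the limit mentioned above for models with infinite behavior.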
3. Availability and Maturity
The tool is written in Python and can be operated through a Flask GUI or a command-line interface, or run from other Python code, to generate synthetic process data with ground truth. The source code, an installation manual, and a screencast are available at gitlab.com/vignesh_dv/mira/-/tree/paper/mira/pattern.
We have used the tool throughout our research project CERTIF-AI, which involves various industry partners. For this project, we model hypothetical assembly processes that fit the companies' data, including the behavior of operators as resources. We add potential violations of inter-case dependencies via these resources to generate a true representation of reality with realistic and explainable deviations. With this synthetic process, we can evaluate our methods, which aim to reveal the true nature of $S$ from the imprecise representations $M$ and $L$.
4. Conclusion
We developed an open-source tool for generating a synthetic process with realistic noise, both random and behavioral. The ground truth $S$ is known, together with imprecise representations of $S$ in the form of the process model $M_0$ and the simulated event log $L$. This allows for the evaluation of process mining methods in which both $L$ and $M_0$ are analyzed to reveal information about $S$, as in conformance checking, log repair, model repair, and performance analysis. Traditional approaches take a process model $M_0$ as the perfect representation of $S$, and only the generated event log $L$ contains noise. Trident instead takes an available business process model $M_0$ and constructs a process model $M'$. This model can either serve as the representation of the real process execution, using behavioral deviation patterns (e.g., multitasking or redo), or be used to generate a noisy event log, using log noise patterns (e.g., delayed logging for certain event types). Complete knowledge of $S$ is retained through the transformed model $M'$ and the simulation method.
The tool supports generalized ν-nets, which makes it applicable to many Petri net extensions; however, the GUI currently supports only resource-constrained ν-nets. We aim to extend the GUI to generalized ν-nets in the future.
Acknowledgments
This work is done within the project "Certification of production process quality through Artificial Intelligence (CERTIF-AI)", funded by NWO (project number: 17998).
References
[1] T. Jouck, B. Depaire, Generating artificial data for empirical analysis of control-flow discovery algorithms: A process tree and log generator, Business & Information Systems Engineering 61 (2019) 695–712.
[2] R. J. C. Bose, R. S. Mans, W. M. v. Aalst, Wanna improve process mining results?, in: 2013 IEEE Symposium on Computational Intelligence and Data Mining (CIDM), IEEE, 2013, pp. 127–134.
[3] D. Sommers, N. Sidorova, B. F. v. Dongen, Aligning event logs to Resource-Constrained ν-Petri nets, in: International Conference on Applications and Theory of Petri Nets and Concurrency, Springer, 2022, pp. 325–345.
[4] J. M. E. van der Werf, A. Rivkin, A. Polyvyanyy, M. Montali, Data and process resonance: Identifier soundness for models of information systems, in: International Conference on Applications and Theory of Petri Nets and Concurrency, Springer, 2022, pp. 369–392.
[5] W. M. v. Aalst, A. Berti, Discovering object-centric Petri nets, Fundamenta Informaticae 175 (2020) 1–40.