Trident: Generating Noisy Synthetic Processes with Ground Truth

Dominique Sommers, Mathematics and Computer Science, Eindhoven University of Technology, Eindhoven, the Netherlands

Durairaja V. Varadarajan, Mathematics and Computer Science, Eindhoven University of Technology, Eindhoven, the Netherlands

Natalia Sidorova, Mathematics and Computer Science, Eindhoven University of Technology, Eindhoven, the Netherlands

Keywords: synthetic process data, realistic noise patterns, model transformations

ISSN 1613-0073

We present Trident, a tool that modifies a process model to inject realistic behavioral noise into the modeled behavior and generates an event log from the result. The log can be used to evaluate the performance of process mining techniques on realistically noisy data with knowledge of the "ground truth", i.e., the true behavior of the system. Traditional approaches for generating noisy process data take a process model that represents the true process, generate a simulated event log, and inject noise into the log. In reality, however, both the recorded and the modeled behavior are imprecise representations of a process, and noise in the log often follows certain patterns. In our approach, noise is introduced through a series of model transformations that apply user-defined deviation patterns to a designed base process model, which creates a basis for more advanced evaluation of process mining methods and tools.

Introduction

Process mining is a powerful tool for understanding and improving business processes by exploring the behavior of the system recorded in an event log in order to discover the process model that could generate the observed behavior or to compare the recorded behavior to normative process models. The challenge in process mining arises from the fact that the system's true nature 𝑆 is often unknown, while the recorded log 𝐿 is noisy and incomplete, and the true model 𝑀 is either unknown (in the context of process discovery) or imperfect (in the context of conformance checking).

Deviating behavior, or noise, can be categorized into random and behavioral noise. Simple random noise arises from errors in logging or modeling and includes missing or incorrectly included events, whereas behavioral noise entails the violation of structural patterns and dependencies within a process. It is crucial that process mining algorithms are able to work in the presence of both random and behavioral noise.

One good practice in the field of process mining is that most developed methods are evaluated on data from real-life processes, which is inherently noisy. However, no ground truth is available for such data sets, and conclusions can only be drawn from the feedback of the process owner. Another standard practice is to evaluate process mining methods in a controlled environment where the ground truth is known, i.e., having knowledge about 𝐿 and 𝑀. The usual approach to creating such data is to simulate (play out) a designed process model and introduce random noise in the resulting event log through log manipulations such as insertions, deletions, and swaps [1]. The process model represents the true process, and the log represents the recorded behavior with data quality issues [2]. This method does not cover the full spectrum of evaluation challenges, since it takes into account neither imperfections in the modeled behavior nor behavioral noise in 𝑀 and 𝐿.
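The log manipulations used by the traditional approach can be sketched in a few lines of Python. This is a minimal illustration, not code from Trident or from [1]: the function names, the per-manipulation probability `p`, and the representation of a trace as a list of activity labels are all assumptions made for the example.

```python
import random

def delete_event(trace, i):
    """Return a copy of the trace without the event at position i."""
    return trace[:i] + trace[i + 1:]

def insert_event(trace, i, activity):
    """Return a copy of the trace with `activity` inserted at position i."""
    return trace[:i] + [activity] + trace[i:]

def swap_events(trace, i):
    """Return a copy of the trace with the events at i and i + 1 swapped."""
    noisy = list(trace)
    noisy[i], noisy[i + 1] = noisy[i + 1], noisy[i]
    return noisy

def add_random_noise(trace, alphabet, rng, p=0.2):
    """Apply each manipulation at most once, each with probability p.

    Hypothetical helper: real generators differ in how often and where
    they apply noise.
    """
    noisy = list(trace)
    if noisy and rng.random() < p:
        noisy = delete_event(noisy, rng.randrange(len(noisy)))
    if rng.random() < p:
        position = rng.randrange(len(noisy) + 1)
        noisy = insert_event(noisy, position, rng.choice(alphabet))
    if len(noisy) > 1 and rng.random() < p:
        noisy = swap_events(noisy, rng.randrange(len(noisy) - 1))
    return noisy

clean = ["ring", "deliver"]
noisy = add_random_noise(clean, ["ring", "deliver"], random.Random(7), p=0.5)
```

Noise of this kind is independent of the process structure: a swap of ring and deliver is as likely as any other swap, which is why it cannot express behavioral deviations such as multitasking.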

Typical deviations and behavioral noise can be defined as (anti-)patterns, e.g., the multitasking pattern. As a running example, consider a delivery process, where a deliverer should first ring a customer and then deliver a package without switching from one package to another between ringing and delivering. An example of deviating behavior is multitasking between packages: i.e., the deliverer rings another door before handing over the first package, which the normative process model does not allow.

Our tool, Trident, is designed to modify a given business model 𝑀₀ by applying model transformation patterns, with each pattern addressing a specific type of behavioral or random noise. The resulting process model 𝑀′ represents the real process execution including deviating behavior, and is used to generate an event log 𝐿. During the simulation of 𝑀′, random noise can additionally be applied to 𝐿 through log manipulations. Complete knowledge of the "true behavior" 𝑆 is retained through the model transformations applied to 𝑀₀ and the log manipulations applied to 𝐿, and both 𝑀₀ and 𝐿 serve as imprecise representations of 𝑆.

Generating a "Noisy" Process Model

To set up an evaluation experiment using Trident, one takes a real-life process model or sketches a hypothetical process with a set of appropriate deviations. With this tool, these deviations are applied to a designed base model through a series of model transformations to simulate a realistic synthetic process including deviating behavior.

Model transformations

An overview of the usage of Trident is shown in Fig. 1, with the designed base model 𝑀₀ on the top and an (extendable) set of patterns (𝜋₁, 𝜋₂, 𝜋₃, …) on the right. A pattern 𝜋 is itself a process model, partitioned into a match component 𝜋_𝑚 and a create component 𝜋_𝑐. Trident applies the model transformation of 𝜋 to a process model 𝑀 through a user-defined mapping function 𝑓, denoted by Ψ(𝑀, 𝜋, 𝑓) = 𝑀′, as depicted in the red trident shape. 𝑓 defines a mapping from 𝜋_𝑚 to 𝑀, which dictates how the elements of 𝜋_𝑐 are added to 𝑀. Through the model transformation, 𝑀′ contains the created components of 𝜋₁, connected via 𝑓 to the matched elements of 𝑀₀. This process is repeated until all deviation patterns are included in 𝑀, after which the process model can be either exported or immediately simulated by playing it out from the initial to the final marking in the tool.

In the running example, the multitasking pattern 𝜋₁ allows a resource to switch from the state "busy" to "available" at any point in time. By applying 𝜋₁ to 𝑀₀ on the deliverer resource, the scenario described in Sec. 1 is enabled, where a deliverer can ring another door before handing over the first package.
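The transformation Ψ(𝑀, 𝜋, 𝑓) can be illustrated with a small Python sketch on plain Petri nets. It is a simplification under stated assumptions and not Trident's implementation: the classes `Net` and `Pattern`, the function `apply_pattern`, and all node names are hypothetical, tokens and identifiers of 𝜈-nets are ignored, and the multitasking pattern is modeled as a single created transition that moves a resource from "busy" back to "available".

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Net:
    places: frozenset
    transitions: frozenset
    arcs: frozenset  # (source, target) pairs

@dataclass(frozen=True)
class Pattern:
    match: Net   # pi_m: nodes that must be mapped onto the model
    create: Net  # pi_c: nodes to add; its arcs may touch nodes of pi_m

def apply_pattern(model, pattern, f):
    """Sketch of Psi(M, pi, f): add pi_c to M, glued onto M via f."""
    # f must map every matched node to a model node of the same kind.
    for place in pattern.match.places:
        if f.get(place) not in model.places:
            raise ValueError(f"unmapped match place: {place}")
    for transition in pattern.match.transitions:
        if f.get(transition) not in model.transitions:
            raise ValueError(f"unmapped match transition: {transition}")
    # Created nodes must not clash with nodes already in the model.
    created = pattern.create.places | pattern.create.transitions
    if created & (model.places | model.transitions):
        raise ValueError("created node already exists in the model")

    def glue(node):
        # Matched nodes are replaced by their image; created nodes are kept.
        return f.get(node, node)

    new_arcs = {(glue(s), glue(t)) for s, t in pattern.create.arcs}
    return Net(model.places | pattern.create.places,
               model.transitions | pattern.create.transitions,
               model.arcs | new_arcs)

# Base model M0 of the delivery example (hypothetical node names).
m0 = Net(
    places=frozenset({"start", "rung", "done",
                      "deliverer_available", "deliverer_busy"}),
    transitions=frozenset({"ring", "deliver"}),
    arcs=frozenset({("start", "ring"), ("deliverer_available", "ring"),
                    ("ring", "rung"), ("ring", "deliverer_busy"),
                    ("rung", "deliver"), ("deliverer_busy", "deliver"),
                    ("deliver", "done"), ("deliver", "deliverer_available")}),
)

# Multitasking pattern: match a busy/available pair, create a switch.
multitasking = Pattern(
    match=Net(frozenset({"busy", "available"}), frozenset(), frozenset()),
    create=Net(frozenset(), frozenset({"switch"}),
               frozenset({("busy", "switch"), ("switch", "available")})),
)

f = {"busy": "deliverer_busy", "available": "deliverer_available"}
m1 = apply_pattern(m0, multitasking, f)
```

In `m1`, the deliverer can become available again after ringing without delivering, so a second ring can fire before the first deliver. The sketch performs no soundness check on the result.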

The tool operates on a generalized 𝜈-net formalism, which is a restricted version of colored Petri nets and encompasses Petri net extensions such as resource-constrained (RC) 𝜈-nets [3], typed Petri nets with identifiers (t-PNIDs) and typed Jackson nets [4], as well as Object-Centric Petri nets [5]. Such a 𝜈-net acts as the base process model for the tool and as 𝑀 for the methods using the data. The user interface is geared towards RC 𝜈-nets and distinguishes between three place types: regular, resource available, and resource busy. Our running example is modeled as an RC 𝜈-net, as shown in Fig. 1. In each iteration, after the selection of a pattern 𝜋 and a mapping 𝑓, the model transformation to be applied is validated to ensure that the resulting process model retains its soundness.

Creating new deviation patterns

The tool provides a list of possible deviations that can be applied to the base process model. This includes multitasking from Fig. 1, as well as temporarily increasing capacities, neglecting resources in activity executions, overtaking in first-in-first-out queues, resources switching roles, and more. This list is easily extendable: one can model a new deviation as a Petri net from a description of the deviating behavior, e.g., skipping an activity that is assumed to be necessary, such as deliver in 𝑀₀. Designing new deviation patterns is supported in the tool by providing the user with a template and instructions on how to create a pattern. The multitasking example is considered behavioral noise, where a modeling pattern is violated in the true behavior of the system. Random noise can be modeled in the same way by deviation patterns. Skipping activities is an example that is included in the list as well.
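As an illustration of how a skip deviation might look at the Petri net level, the sketch below adds a silent transition with the same input and output places as the skipped activity. This is one common way to model skipping and is an assumption here, not necessarily the pattern shipped with Trident; the function `add_skip`, the arc-set representation, and the node names are hypothetical.

```python
def add_skip(arcs, activity, skip_name=None):
    """Return a new arc set in which `activity` can be bypassed.

    `arcs` is a set of (source, target) pairs. The added silent
    transition consumes from and produces to the same places as
    `activity`, so firing it has the same effect on the marking
    without recording the activity.
    """
    skip = skip_name or f"skip_{activity}"
    nodes = {node for arc in arcs for node in arc}
    if activity not in nodes:
        raise ValueError(f"unknown activity: {activity}")
    if skip in nodes:
        raise ValueError(f"node name already in use: {skip}")
    preset = {source for source, target in arcs if target == activity}
    postset = {target for source, target in arcs if source == activity}
    return (set(arcs)
            | {(place, skip) for place in preset}
            | {(skip, place) for place in postset})

# Delivery example with a deliverer resource (hypothetical node names).
base_arcs = {
    ("start", "ring"), ("deliverer_available", "ring"),
    ("ring", "rung"), ("ring", "deliverer_busy"),
    ("rung", "deliver"), ("deliverer_busy", "deliver"),
    ("deliver", "done"), ("deliver", "deliverer_available"),
}
noisy_arcs = add_skip(base_arcs, "deliver")
```

Because the silent transition leaves no event in the log, traces generated from the resulting net can end after ring while the model still reaches its final marking.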

Simulation

After iteratively applying model transformations to the base process model, 𝑀 includes all deviations deemed realistic by the user. The simulation module plays out 𝑀 from the provided initial marking to the final marking, with the option to set a limit on the number of transition firings in case of infinite behavior. Probabilities are modeled by sampling a waiting time for each transition firing from the moment the transition becomes enabled. If the transition is no longer enabled at the scheduled time, the firing is canceled. The simulation is basic in terms of the probabilities of which transitions fire, i.e., it takes into account no dependencies other than the sampled scheduling time. If a more advanced simulation is required, the process model 𝑀 can be exported to .pnml for use in other tools.
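The scheduling scheme described above can be sketched as a small discrete-event loop. The following Python code is a simplified illustration on an ordinary Petri net, not Trident's simulation module: the function `play_out`, its parameters, and the exponential waiting times in the example are assumptions, and token identifiers of 𝜈-nets are ignored.

```python
import heapq
import random
from collections import Counter

def play_out(transitions, initial, final, sample_delay, max_firings=1000):
    """Play out a net given as {name: (input_places, output_places)}.

    A firing is scheduled when a transition becomes enabled and is
    canceled if the transition is disabled before the scheduled time.
    """
    marking = Counter(initial)
    final = Counter(final)
    queue = []     # heap of (time, ticket, transition)
    pending = {}   # transition -> ticket of its currently valid schedule
    ticket = 0
    now = 0.0
    trace = []

    def enabled(t):
        needed = Counter(transitions[t][0])
        return all(marking[p] >= n for p, n in needed.items())

    def refresh():
        nonlocal ticket
        for t in transitions:
            if not enabled(t):
                pending.pop(t, None)  # cancel a scheduled firing, if any
            elif t not in pending:
                ticket += 1
                pending[t] = ticket
                heapq.heappush(queue, (now + sample_delay(t), ticket, t))

    refresh()
    while queue and len(trace) < max_firings and +marking != final:
        time, tick, t = heapq.heappop(queue)
        if pending.get(t) != tick:
            continue  # this firing was canceled earlier
        now = time
        del pending[t]
        marking.subtract(transitions[t][0])
        marking.update(transitions[t][1])
        trace.append((now, t))
        refresh()
    return trace, +marking

# Delivery example with one deliverer (hypothetical place names).
rng = random.Random(42)
delivery = {
    "ring": (["start", "available"], ["rung", "busy"]),
    "deliver": (["rung", "busy"], ["done", "available"]),
}
trace, marking = play_out(
    delivery,
    initial={"start": 1, "available": 1},
    final={"done": 1, "available": 1},
    sample_delay=lambda t: rng.expovariate(1.0),
)
```

The `max_firings` argument plays the role of the firing limit for nets with infinite behavior, and the ticket check implements the cancellation of firings whose transition was disabled in the meantime.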

Availability and Maturity

The tool is implemented in Python and can be operated through a Flask GUI or a command line interface, or called from other Python code, to generate synthetic process data with ground truth. The source code, an installation manual, and a screencast are available at gitlab.com/vig-nesh_dv/mira/-/tree/paper/mira/pattern.

We have used the tool throughout our research project CERTIF-AI involving various industry partners, for which we model hypothetical assembly processes fitting the companies' data including the behavior of operators as resources. We add potential violations on inter-case dependencies via these resources to generate a true representation of reality with realistic and explainable deviations. With this synthetic process, we can evaluate our methods which aim to reveal the true nature of 𝑆 from the imprecise representations 𝑀 and 𝐿.

Conclusion

We developed an open-source tool for generating a synthetic process with realistic noise, both random and behavioral, where the ground truth 𝑆 is known together with imprecise representations of 𝑆 in the form of the process model 𝑀₀ and the simulated event log 𝐿. This allows for the evaluation of process mining methods in which both 𝐿 and 𝑀₀ are analyzed to reveal information about 𝑆, as in conformance checking, log repair, model repair, and performance analysis. In traditional approaches, a process model 𝑀₀ denotes the perfect representation of 𝑆 and only the generated event log 𝐿 contains noise. Trident instead takes an available business model 𝑀₀ and constructs a process model 𝑀′ that can serve as the representation of the real process execution, using behavioral deviation patterns (e.g., multitasking or redo), or be used for the generation of a noisy event log, using log noise patterns (e.g., delayed logging for certain event types). Complete knowledge of 𝑆 is retained through the transformed model 𝑀′ and the simulation method.

The tool supports generalized 𝜈-nets, making it applicable to many Petri net extensions. However, the GUI is currently geared towards resource-constrained 𝜈-nets only. We aim to extend it to generalized 𝜈-nets in the future.

ICPM'23: International Conference on Process Mining, October 23-27, 2023, Rome, Italy
Email: d.sommers@tue.nl (D. Sommers); d.v.varadarajan@student.tue.nl (D. V. Varadarajan); n.sidorova@tue.nl (N. Sidorova)

Figure 1: Overview of Trident with running example.

Acknowledgments

This work is done within the project "Certification of production process quality through Artificial Intelligence (CERTIF-AI)", funded by NWO (project number: 17998).

References

[1] T. Jouck, B. Depaire, Generating artificial data for empirical analysis of control-flow discovery algorithms: A process tree and log generator, Business & Information Systems Engineering 61 (2019).

[2] R. J. C. Bose, R. S. Mans, W. M. van der Aalst, Wanna improve process mining results?, in: IEEE Symposium on Computational Intelligence and Data Mining (CIDM), IEEE, 2013.

[3] D. Sommers, N. Sidorova, B. F. van Dongen, Aligning event logs to Resource-Constrained Petri nets, in: International Conference on Applications and Theory of Petri Nets and Concurrency, Springer, 2022.

[4] J. M. E. van der Werf, A. Rivkin, A. Polyvyanyy, M. Montali, Data and process resonance: Identifier soundness for models of information systems, in: International Conference on Applications and Theory of Petri Nets and Concurrency, Springer, 2022.

[5] W. M. van der Aalst, A. Berti, Discovering object-centric Petri nets, Fundamenta Informaticae 175 (2020).