Robust Discovery of Complex Process-Structures (Extended Abstract) Lisa Luise Mannel Chair of Process and Data Science (PADS) Department of Computer Science, RWTH Aachen University, Ahornstr. 55, 52074, Aachen, Germany mannel@pads.rwth-aachen.de Abstract—More and more companies gather and analyze the TABLE I data generated by their processes. An often crucial step within OVERVIEW OF THE DESIRED PROPERTIES OF THE DEVELOPED ALGORITHM this analysis is the discovery of the process based on the collected data. Unfortunately, there are still real-live scenarios where algo- Representational Bias Frequencies & Noise Guarantees rithms developed for this purpose fail to meet expectations. The research project presented in this paper develops the discovery Precise Petri nets: Robustness: Reliability: 1) long-term dependencies 1) filter out noise 1) fitness algorithm eST-Miner, which introduces new concepts to address 2) self-loops (errors, deviations) 2) precision well-known challenges and aims to flexibly balance the strengths 3) silent transitions (skips, 2) retain infrequent 3) rediscover- and weaknesses known from existing approaches. loops, etc) behaviour ability Index Terms—process discovery, Petri nets, implicit places 4) duplicate transitions (rare but (counting, process well-defined) I. I NTRODUCTION variations) In process discovery, a process model is constructed aiming to reflect and summarize behavior defined by the example executions in a given event log. This task is challenging for in-depth explanations deferred to the corresponding papers. a variety of reasons. Ideally, a discovery algorithm returns Future research is described in more detail in Section IV. a model which is able to produce the important behavior Finally, Section V positions the research project with respect represented within the event log (fitness) and at the same to the state of the art and concludes the paper. time does not allow for much unobserved behavior (precision), The research methodology is guided by the principles of while remaining simple enough to be understood by a human Design Science Research [1]. For each subgoal, the developed interpreter. In real life logs, noise (errors, deviations) often solutions are formalized and implemented. Theoretical proofs further complicates this task. It is rarely possible to fulfill of correctness and applicability are complemented by practical all requirements simultaneously. Based on the capabilities and experiments based on adequate artificial examples as well as focus of the used algorithm, the discovered models can vary real-life data. Finally, the results are compared to existing greatly, and different trade-offs are possible. algorithms to evaluate the contribution. II. R ESEARCH G OAL III. C OMPLETED R ESEARCH The presented research project aims to develop a discovery algorithm that is able to reasonably balance the different To allow for accurate process models, the chosen repre- quality aspects described above and additionally provides the sentational bias (Table I) includes long-term dependencies, as flexibility to the users to skew the algorithm’s behavior to fit illustrated by the places p9 and p10 in Figure 1. The basic their needs. In particular, it focuses on the discovery of models idea of the eST-Miner presented in [2] is to evaluate all with high precision and guaranteed minimal fitness, which are possible places in an efficient way using a noise parameter τ , not overfitting to include noise but still represent well-defined and to return the places fitting a fraction of traces based infrequent behavior. Additionally, the algorithm should main- on τ . Therefore, it can reliably find frequent dependencies, tain reasonable model simplicity and computational efficiency. including long-term dependencies, without being hampered by Figure 1 shows an example of a model, which the developed noise. Heuristic approaches based on log characteristics can discovery algorithm should return based on the given log, even be utilized to immensely increase efficiency further, with very if noise were added. minor drawbacks to fitness and precision [3]. A variant of the To achieve this overall research goal, several subgoals and algorithm discovers the novel class of uniwired Petri nets [3], desired properties have been defined as shown in Table I. In further boosting efficiency and simplifying the returned models the following these goals will be explained and motivated. while still being able to discover long-term dependencies. Fi- Section III summarizes the results achieved so far, with nally, to retain only meaningful places, the eST-Miner removes implicit places, i.e., places that do not further restrict the I would like to thank my doctoral advisor Prof. Wil van der Aalst for his constant support. We thank the Alexander von Humboldt (AvH) Stiftung for models behavior. The novel approach introduced in [4] uses supporting our research. the even log as an additional source of information. Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). model is build and adapted to comply with traces that become available over time. Another possible extension would be the evaluation of places with respect to a partial orders rather than single traces [6], which could be applied, for example, in the context of process discovery on uncertain event data [7]. Fig. 1. The model above can be considered a desirable result for the event log L = [hI, a, c, c, c, d, i104 , hI b, c, c, c, e, i90 , hI a, c, c, c, d, f, i17 , V. R ELATED W ORK AND S UMMARY hI b, c, c, c, e, f, i21 ], but is hard to discover for existing algorithms. The combination of properties and capabilities listed in Table I and detailed in the previous sections set the eST-Miner IV. P LANNED R ESEARCH apart from existing discovery approaches. Algorithms based Considering the research goals in Table I, several directions on simple log abstractions, like the well-known Inductive remain open for investigation. The present lack of silent and Miner [8] and Alpha Miner [9], lack the ability to discover duplicate transition labels in the returned models severely complex control-flow structures and are thus limited in their hampers the attainability of high precision values. Certain precision. Region-based approaches [10], [11] guarantee high common log structures, some of which are illustrated in precision only for models with unique transition labels, and Figure 1, cannot be expressed by Petri nets restricted to have issues with overfitting as well as time and space re- uniquely labeled transitions without resorting to flowerlike quirements. Heuristic and genetic approaches do not provide submodels. Examples are activities that can either happen once guarantees and returned models are often hard to interpret. or be skipped (f ), or activities that happen at least once but can The eST-Miner combines the ability to reliably discover be repeated, activities that happen a certain number of times long-term dependencies with unique noise handling capabili- (c), or logs that include data on very dissimilar subprocesses ties and can provide guarantees with respect to fitness, preci- that would result in completely different submodels. The sion and rediscoverability. Future projects will investigate the identification of such patterns and their inclusion in the model targeted addition of silent and duplicate transitions to improve by the adequate use of non-unique transitions labels is bound fitness and precision, optimize the heuristic approaches and to significantly increase the models precision. explore the adaption of the concept to other application areas. Noise, and its differentiation from infrequent behavior, is Results will be evaluated theoretically and experimentally with often an issue in real life logs. The eST-Miner aims to achieve respect to the formulated goals as well as previous solutions. robustness to noise by requiring a place to replay only a certain The existing extensions, variants and parameters allow to fraction of traces to be considered fitting. This allows the flexibly tailor the basic concept of the eST-Miner towards algorithm to disregard patternless infrequent behavior while the unique needs of each applicant. The developed ideas and rare, yet well-defined, behavior is still discovered. An issue concepts can be valuable in other contexts as well, and thus caused by this approach is that the traces replayable by the contribute to the field of process mining as a whole. model are limited to the intersection of the traces replayable by R EFERENCES all places. To increase the number of replayable traces without [1] A. Hevner, S. March, J. Park, and S. Ram, “Design science in infor- jeopardizing simplicity or precision, the targeted insertion of mation systems research,” Management Information Systems Quarterly, vol. 28, pp. 75–, 03 2004. duplicate or silent transitions will be investigated. [2] L. L. Mannel and W. M. P. van der Aalst, “Finding complex process- The novel implicit place removal strategy introduced in [4] structures by exploiting the token-game,” in Application and Theory of works well for places perfectly fitting the event log. Worth Petri Nets and Concurrency. Springer Nature Switzerland AG, 2019. [3] L. L. Mannel, Y. Epstein, and W. M. P. van der Aalst, Improving the further investigation is its combination with the eST-Miner’s State-Space Traversal of the eST-Miner by Exploiting Underlying Log noise handling approach or, in a more general setting, with Structures, 01 2020, pp. 334–347. Petri nets that do not perfectly fit the log. In this context its [4] L. L. Mannel, R. Bergenthum, and W. M. P. van der Aalst, “Removing implicit places using regions for process discovery,” in Proceedings of use for repairing a given model or adapting a model within an the International Workshop on Algorithms & Theories for the Analysis online discovery setting would be of interest as well. of Event Data (ATAED) 2020, vol. 2625. CEUR-WS.org, pp. 20–32. The algorithm allows for a large variety of optional ex- [5] L. L. Mannel and W. M. P. van der Aalst, “Finding uniwired Petri nets using eST-miner,” in Business Process Management Workshops, 2019. tensions and applications, some of which use heuristics to [6] R. Bergenthum, “Firing partial orders in a petri net,” in Application and improve the quality of the returned model or to increase Theory of Petri Nets and Concurrency. Springer Int. Publishing, 2021. efficiency. Such approaches usually employ some kind of [7] M. Pegoraro, M. S. Uysal, and W. M. P. van der Aalst, “Discovering Process Models from Uncertain Event Data,” Sep 2019. [Online]. ranking on the activities or candidate places. Prior research Available: https://publications.rwth-aachen.de/record/782564 has focused on simple rankings based on the directly and [8] S. Leemans, D. Fahland, and W. van der Aalst, “Discovering block- eventually follows relations for candidate places and the structured process models from event logs - a constructive approach,” Application and Theory of Petri Nets and Concurrency, 2013. occurrence patterns of transitions [3], [5]. Besides refining [9] E. Badouel, “On the α-reconstructibility of workflow nets,” in Applica- these basic approaches, the use of alternative concepts can be tion and Theory of Petri Nets. Springer Berlin Heidelberg, 2012. investigated, which would be reusable for several extensions [10] E. Badouel, L. Bernardinello, and P. Darondeau, Petri Net Synthesis, ser. Text in Theoretical Computer Science, EATCS Series. Springer, 2015. of the eST-Miner. [11] J. M. van der Werf, B. van Dongen, C. Hurkens, and A. Serebrenik, The approach can be adapted towards new application “Process discovery using integer linear programming,” in Applications areas. One such option would be the online setting, where the and Theory of Petri Nets. Berlin, Heidelberg: Springer, 2008. Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).