ASP-Based Log Generation with Purposes in Declare4Py

Ivan Donadello¹,*, Fabrizio Maria Maggi¹, Francesco Riva² and Manpreet Singh³

¹ Free University of Bozen-Bolzano, Bolzano, Italy
² Datalane SRL, Verona, Italy
³ Wuerth Italia Srl, Egna, Italy

ICPM Doctoral Consortium and Demo Track 2023
* Corresponding author.
Email: ivan.donadello@unibz.it (I. Donadello); maggi@inf.unibz.it (F. M. Maggi); f.riva@datalane.nl (F. Riva); mani.sw.dev@gmail.com (M. Singh)
ORCID: 0000-0002-0701-5729 (I. Donadello); 0000-0002-9089-6896 (F. M. Maggi)
© 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073

Abstract
Process mining techniques are meant to extract non-trivial information from complex data. Controlled experiments on the algorithms underlying process mining techniques often require logs of process executions that fit the specific purpose of each test. Therefore, many tools have been developed for generating logs from both procedural models (e.g., Petri nets or BPMN models) and declarative models (e.g., based on LTLf or Declare). However, log generation from declarative models still lacks tools that address specific purposes, such as specifying the distribution of trace lengths, setting the number of variants that should appear in the log, or specifying the number of activations of a constraint that a trace should contain. We address this research gap by proposing an extension of the Declare4Py Python library that generates synthetic event logs using an Answer Set Programming-based solution whose flexibility supports the encoding of specific purposes.

Keywords
Process Mining, Declarative Models, Log Generation, Answer Set Programming

1. Introduction
Process Mining (PM) is a research area that analyzes the execution data of a business process (an event log) to extract useful information for process improvement. This is not a trivial task, as event logs are sets of process instances (a.k.a. traces), which are a complex type of data. A trace is a sequence of events arranged in chronological order, and each event contains a set of attributes that can be both symbolic (e.g., the name of the executed activity or the involved resource) and numeric (e.g., a timestamp). In addition, process instances are samples of a process model, that is, of background knowledge that constrains the execution order of the events. This background knowledge can be expressed with procedural models (e.g., Petri nets or BPMN models), which specify the exact control flow of the activities in a trace, or with declarative models (that is, constraints expressed in Linear Temporal Logic on finite traces (LTLf) or Declare [1]), which specify the constraints over process activities that should be satisfied during the process execution. The former models are more fine-grained; the latter are more coarse-grained but more flexible.

The controlled evaluation of PM algorithms requires event logs that fit the purposes for which each specific experiment has been designed. For example, one purpose can be to test how the performance of an algorithm is affected when the distribution of trace lengths in the event log varies. However, although several tools have been developed for generating event logs by simulating (declarative and procedural) process models [2, 3, 4, 5], only PURPLE (a tool presented in [4]) produces event logs that fulfill a given property/purpose. Since PURPLE uses procedural process models to generate logs, a purpose-guided log generator for declarative models is still missing.
To fill this gap, we extend the Declare4Py [6] Python library, which implements classical PM tasks starting from the MP-Declare [7] language, with an Answer Set Programming (ASP) functionality for generating purpose-guided event logs from declarative models [8]. This ASP-based method simulates an input declarative model by converting its associated deterministic finite automaton (DFA) into a logic program, whose solutions are computed by an ASP solver. The approach is highly flexible, as the desired purposes can be encoded as an additional logic program on top of the DFA encoding.

2. Log Generation with Declare4Py
The ASP-based solution adopted in Declare4Py takes as input the number N of traces to generate, the minimum length m and the maximum length n of a trace (i.e., the number of events in the trace), and a process model containing MP-Declare constraints. It encodes these inputs in an ASP program to be solved by the Clingo solver (https://potassco.org/). The solution is a set of N traces that satisfy the input model and the minimum/maximum length specification. We leverage this solution to generate event logs with the following four purposes as additional inputs.

Users can specify a trace length distribution, that is, a probability distribution over the lengths of the traces in the log. Declare4Py supports three probability distributions: i) the custom distribution, where the user specifies, for each k ∈ {m, m + 1, ..., n}, the probability that a generated trace has length k; ii) the uniform distribution, where all trace lengths have the same probability, i.e., 1/(n − m + 1); iii) the Gaussian distribution, where the user specifies a mean and a variance for sampling the trace lengths from a Gaussian distribution. Once the trace lengths have been sampled, the Clingo solver is called n − m + 1 times, once for each trace length k.
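As an illustration of the sampling step described above, the following Python sketch draws N trace lengths in {m, ..., n} and counts how many traces of each length k are needed. It is a hypothetical sketch, not Declare4Py's implementation: the function names length_weights and traces_per_length are illustrative, and the Gaussian case is modeled, as an assumption, as a discretized Gaussian truncated to {m, ..., n}, with weights proportional to exp(−(k − mean)²/(2·variance)).

```python
import math
import random
from collections import Counter


def length_weights(m, n, distribution, *, probabilities=None, mean=None, variance=None):
    """Return the candidate lengths m..n and a weight for each (hypothetical helper)."""
    lengths = list(range(m, n + 1))
    if distribution == "uniform":
        # Every length gets probability 1/(n - m + 1).
        return lengths, [1.0] * len(lengths)
    if distribution == "custom":
        if probabilities is None or len(probabilities) != len(lengths):
            raise ValueError("custom distribution needs one probability per length in [m, n]")
        return lengths, list(probabilities)
    if distribution == "gaussian":
        if mean is None or variance is None or variance <= 0:
            raise ValueError("gaussian distribution needs a mean and a positive variance")
        # Assumption: discretized Gaussian restricted to [m, n];
        # random.choices normalizes the weights, so no explicit division is needed.
        return lengths, [math.exp(-((k - mean) ** 2) / (2 * variance)) for k in lengths]
    raise ValueError(f"unknown distribution: {distribution}")


def traces_per_length(num_traces, m, n, distribution, seed=None, **params):
    """Sample num_traces lengths and count the traces required for each length k."""
    lengths, weights = length_weights(m, n, distribution, **params)
    rng = random.Random(seed)
    sampled = rng.choices(lengths, weights=weights, k=num_traces)
    counts = Counter(sampled)
    # One entry per length k in [m, n], i.e., n - m + 1 entries (possibly zero).
    return {k: counts.get(k, 0) for k in lengths}
```

For example, traces_per_length(100, 10, 20, "gaussian", seed=42, mean=15, variance=4) returns a dictionary with n − m + 1 = 11 entries whose counts sum to 100. Each entry would then correspond to one solver call asking for that many traces of length k; how the counts are passed to Clingo is an assumption of this sketch.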
Users can specify the number of variants V, so that the generated event log can be partitioned into V groups of traces that share the same control flow (i.e., the same sequence of activity names) but can differ in other attributes, such as timestamps or resources. Here, Clingo is called several times: first, to generate V traces representing the variants; then, once for each variant, to generate traces with the control flow of that variant. In this second call, the input ASP program also encodes the control flow of the variant.

Users can specify a subset of the constraints in the input model to generate negative traces, that is, traces that do not satisfy at least one of (or all) the constraints in the specified subset. This allows users to obtain event logs whose labels are defined by MP-Declare constraints. Such labeled traces can be used to train and test Machine Learning-based process mining algorithms. The positive traces satisfy all the constraints in the input model, whereas the negative traces violate the specified subset of constraints. Also in this case, the subset of constraints to be violated is encoded in an ASP program.

Users can specify the number of activations for a subset of the constraints in the input model. For example, for the constraint Response[(CRP, J), ReleaseB], the user can specify an activation number of 3, meaning that an event with activity name CRP and payload J should occur three times in the trace. The user provides as input a range of possible values for the number of activations of a specific constraint, and Declare4Py then randomly selects a value in that range for each generated trace. Both the subset of constraints for which the number of activations is specified and the user-defined value ranges are encoded in ASP.

3. Tool Maturity
The better computational times of the adopted ASP solution with respect to state-of-the-art (declarative) log generators have already been shown in [8].
We therefore compare our tool with the Alloy-based tool presented in [5] in terms of the diversity of the generated traces. We chose this tool because it is the only available tool that generates synthetic logs from MP-Declare models. It generates traces by using a SAT solver (instead of an ASP solver, as in Declare4Py), starting from a representation of the input MP-Declare model in Alloy (https://alloytools.org/). Diversity is measured as the average syntactic distance among the traces in the generated event log. A higher average distance indicates a higher diversification of the generated traces, which makes the event log a more interesting benchmark for testing purposes. As syntactic distance, we used the normalized Damerau-Levenshtein Distance (DLD). We used the activity names and resources available in the Sepsis log¹ (13 activity names, 26 resources), in the BPIC15_4 log² (87 activity names, 10 resources), and in the Road Traffic Fine Management log³ (RTFM, 10 activity names, 143 resources) to define three MP-Declare models. Each model contains 5 constraints and was used to generate a log of 100 traces with lengths ranging from 10 to 20 events. For each synthetic log, we computed the DLD between all pairs of synthetic traces and took the average. The DLD was measured on both the control flow and the resources (see Table 1). Declare4Py yields higher average distances than the Alloy-based tool on all three logs, that is, Declare4Py guarantees a higher variability in both control flow and resources. Finally, being purpose-guided, the log generator in Declare4Py also improves on existing log generation tools in terms of the functionalities it provides.
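The diversity measure used in this comparison can be sketched as follows. This is a minimal illustration under stated assumptions, not the evaluation code behind Table 1: it implements the restricted Damerau-Levenshtein distance (also known as optimal string alignment), normalizes it by the length of the longer trace, and averages it over all unordered pairs of traces. Both the restricted variant and this normalization are assumptions, since the exact choices are not specified here. A trace is represented as a list of activity names (control flow) or of resources.

```python
from itertools import combinations
from typing import Hashable, List, Sequence


def osa_distance(a: Sequence[Hashable], b: Sequence[Hashable]) -> int:
    """Restricted Damerau-Levenshtein (optimal string alignment) distance.

    Counts insertions, deletions, substitutions and transpositions of two
    adjacent elements, on any sequences (e.g., lists of activity names).
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent transposition
    return d[-1][-1]


def normalized_dld(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Distance divided by the longer length (assumed normalization), in [0, 1]."""
    longest = max(len(a), len(b))
    return 0.0 if longest == 0 else osa_distance(a, b) / longest


def average_pairwise_dld(traces: List[Sequence[Hashable]]) -> float:
    """Average normalized distance over all unordered pairs of traces."""
    pairs = list(combinations(traces, 2))
    if not pairs:
        return 0.0
    return sum(normalized_dld(x, y) for x, y in pairs) / len(pairs)
```

For the toy log [["A", "B", "C"], ["A", "C", "B"], ["A", "B", "C"]], the two non-identical pairs differ by one adjacent transposition (normalized distance 1/3 each) and the identical pair has distance 0, so average_pairwise_dld returns 2/9 ≈ 0.222. The same function applies to resource sequences by passing lists of resources instead of activity names.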
¹ 10.4121/uuid:915d2bfb-7e84-49ad-a286-dc35f063a460
² 10.4121/uuid:31a308ef-c844-48da-948c-305d167a0ec1
³ 10.4121/uuid:270fd440-1057-4fb9-89a9-b699b47990f5

Table 1
Average of the normalized DLD for different logs generated using Declare4Py and Alloy.

| Dataset  | Declare4Py: Resource flow | Declare4Py: Control flow | Alloy-based: Resource flow | Alloy-based: Control flow |
| Sepsis   | 0.565                     | 0.409                    | 0.027                      | 0.028                     |
| BPIC15_4 | 0.631                     | 0.798                    | 0.126                      | 0.050                     |
| RTFM     | 0.525                     | 0.363                    | 0.366                      | 0.308                     |

4. Screencast and Website
The GitHub repository of Declare4Py⁴ contains the source code of the tool and all the tutorials. A dedicated tutorial for the ASP-based log generator⁵ shows how to run the log generator with all the available options. The video presentation of this paper can be accessed at https://www.dropbox.com/scl/fi/cbihgbw34smkisb7u1sry/Screen-Recording-2023-09-12-at-19.17.17.mov?rlkey=pvzt6cuj5yk98azi611zd79xy&dl=0.

⁴ https://github.com/ivanDonadello/Declare4Py
⁵ https://github.com/ivanDonadello/Declare4Py/blob/main/docs/source/tutorials/9.Log_Generation.ipynb

References
[1] M. Pesic, H. Schonenberg, W. M. P. van der Aalst, DECLARE: Full support for loosely-structured processes, in: EDOC, IEEE Computer Society, 2007, pp. 287–300.
[2] A. Burattin, PLG2: Multiperspective process randomization with online and offline simulations, in: BPM (Demos), volume 1789 of CEUR Workshop Proceedings, CEUR-WS.org, 2016, pp. 1–6.
[3] C. Di Ciccio, M. L. Bernardi, M. Cimitile, F. M. Maggi, Generating event logs through the simulation of Declare models, in: EOMAS@CAiSE, volume 231 of Lecture Notes in Business Information Processing, Springer, 2015, pp. 20–36.
[4] A. Burattin, B. Re, L. Rossi, F. Tiezzi, PURPLE: A PURPose-guided Log GEnerator (Extended Abstract), in: ICPM Doctoral Consortium / Demo, volume 3299 of CEUR Workshop Proceedings, CEUR-WS.org, 2022, pp. 90–94.
[5] V. Skydanienko, C. Di Francescomarino, C. Ghidini, F. M. Maggi, A tool for generating event logs from multi-perspective Declare models, in: BPM (Dissertation/Demos/Industry), CEUR Workshop Proceedings, CEUR-WS.org, 2018, pp. 111–115.
[6] I. Donadello, F. Riva, F. M. Maggi, A. Shikhizada, Declare4Py: A Python library for declarative process mining, in: BPM (Demos), CEUR Workshop Proceedings, CEUR-WS.org, 2022, pp. 117–121.
[7] A. Burattin, F. M. Maggi, A. Sperduti, Conformance checking based on multi-perspective declarative process models, Expert Syst. Appl. 65 (2016) 194–211.
[8] F. Chiariello, F. M. Maggi, F. Patrizi, ASP-based declarative process mining, in: AAAI, AAAI Press, 2022, pp. 5539–5547.