Exploring L* for Process Mining Karnika Shivhare1 , Rushikesh K. Joshi1 1 Department of Computer Science Engineering, Indian Institute of Technology Bombay, Mumbai, India. Abstract When processes execute through their business logic, their activities generate event logs, which contribute to trace sets. Since its introduction, the field of process mining has evolved, however, accuracy issues persist. In this paper, we explore the L* algorithm in the context of process mining, especially towards development of interactive process mining techniques. We discuss the results of our experiments, the limitations of the approach, and possible future directions towards adapting the technique for covering richer set of features. Keywords Process, Traces, Trace Sets, Process Mining, Interactiveness, L* Algorithm, Automatons, Grammar Inferencing 1. Introduction a finite automaton for the given automaton language. It generates minimalistic DFAs [1]. Processes are progressions of activities that execute and Other related works in this direction of automata learn- generate event logs in the form of trace sets. These event ing and grammar extraction include learning of context logs present executed traces of the processes, and they free languages such as by Omphalos [2], extracting con- are routinely used by Process Mining algorithms to mine text free languages in extended Backus–Naur form [3] processes from them. Process Mining has been explored and learning of probabilistic finite state machines such for more than a decade, but discovering a process accu- as by PAutomaC [4]. Several other works [5] [6] [7] rately from its given trace set is still an attractive research provide surveys in this field of grammar inferencing [8] proposition that needs attention. A Trace from a process with a focus on constructing grammar for the underlying represents a control flow of the process during an execu- language by learning through examples. tion run, and it can be represented as a string of activities. We observe that the field of Process Mining can benefit Such a string can be obtained from a process execution by placing automaton learning in proximity with Process engine through its logs. Similarly, automatons can be con- Discovery. Process models can be learnt like formal gram- sidered as execution engines where strings of alphabets mars by using the idea of positive examples in learning are produced by virtue of transitions between states. automatons and applying them to event log trace sets. The set of strings generated by an automaton repre- sents the language accepted by the automaton, while the process execution engine generates strings of activities 2. Our Approach representing the trace set of the process. The former is a complete set, whereas, the latter is a set generated We present the observations and results (Table 1) of our by a finite number of executions. The L* algorithm [1] exploration of the L* algorithm to mine processes. In our is an algorithm that generates automatons from given approach, processes are mined as DFAs with the help of regular language represented in terms of set of strings. an oracle that validates the generated DFA by providing The algorithm learns a deterministic finite automaton counterexamples for incorrect DFAs. The algorithm is (DFA) from a set of positive and negative examples. It unaware about the process behavior to be mined, the begins with an initial hypothesis DFA that accepts all of correctness of its mined DFA, it’s iteration stage, or the the positive examples and none of the negative examples. point of incorrectness of its result. It is an interactive It then iteratively refines the hypothesis through inter- learning (or refining) algorithm that continues to take action with an oracle to tailor the alterations to obtain counterexamples and learn. This brings flexibility to the algorithm as to repeat its learn-and-refine cycle until a International Workshop on Petri Nets and Software Engineering 2023, correct DFA is constructed, in contrast to other process PNSE’23 mining algorithms that generate either a correct or an $ karnika@cse.iitb.ac.in (Karnika); rkj@cse.iitb.ac.in incorrect process. L* is an iterative approach to extract (Rushikesh K.) regular grammar, and we utilize its iterations for extract- € https://www.cse.iitb.ac.in/~karnika/ (Karnika); ing processes. We view process traces as strings in L* https://www.cse.iitb.ac.in/~rkj/ (Rushikesh K.)  0000-0001-6490-0380 (Karnika); 0000-0002-2712-1406 algorithm, and begin with a hypothesis of an empty tran- (Rushikesh K.) sition as the only trace in the trace set of the process to © 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). be mined. The hypothesis is then iteratively refined until CEUR Workshop Proceedings http://ceur-ws.org ISSN 1613-0073 CEUR Workshop Proceedings (CEUR-WS.org) Table 1 Process Number DFA Developments Count of Correct Mining Microconfigurations of DFA Successes by L* for Trace Set Correct DFA Counterexample Type of Constructions Obtained Provided Counterexample from [9] Process Mining ✗ acd Positive Bowtie [9] {acd, bce} 04 ✗ bce Positive 00 ✓ ✗ cce Negative ✗ abcd Positive Critical Section [9] {abcd, cdab} 04 ✗ cdab Positive 00 ✓ ✗ bdab Negative ✗ pqrs Positive {abcd, pqrs, ✗ pcd Positive Crossover [9] 05 02 ✓ pcd, ars} ✗ bcd Negative ✗ abcd Positive ✗ abd Positive Early Completion [9] {acb, abd} 04 ✗ bce Positive 00 ✓ ✗ adb Negative ✗ abcde Positive Initial Bypass [9] {abcde, de} 03 02 ✓ ✗ cbcde Positive Intertwined Active {abcde, abge, 03 ✗ abcde Positive 04 ✓ Bypass [9] afde} ✗ acde Negative Intertwined Vanilla {abcde, abde, 03 ✗ abcde Positive 01 ✓ Bypass [9] acde} ✗ abcdeabcde Negative Intertwined {abcde, 03 ✗ abcde Positive 02 ✓ Long Bypass [9] abe, ade} ✗ abcdeabcde Negative Seniority [9] {a, b, ab} 02 ✗ bb Negative 01 ✓ ✗ ac Positive Ordered {ab, bc, ac} 04 ✗ bc Positive 00 ✓ Subsequences [9] ✗ cc Negative {ac, abc, acb ✗ ac Positive Asynchronous abbc, abcb, acbb, 03 ✗ bac Negative 00 ✓ Service Loop [9] abbbc, abbcb, abcbb, acbbb } ✗ abc Positive Concurrent ✗ acb Positive {abc, acb, cab } 05 02 ✓ Branching [9] ✗ aab Negative ✗ cab Positive {ad, abd, acd, ✗ ad Positive abcd, acbd, abbd, ✗ bad Negative Ticketed accd, abbcd, 03 00 ✓ Service [9] abcbd, acbbd, accbd, acbcd, abccd } ✗ aabb Positive {𝜖, ab, aabb, ✗ aaabbb Positive 𝑎𝑛 𝑏𝑛 (Later negative aaabbb, ✗ not defined ✗ aaaabbbb Positive - counterexample) aaaabbbb, aaaaabbbbb} ✗ aaaaabbbbb Positive ✗ bbbbbb Negative ✗ aab Negative {𝜖, ab, aabb, ✗ bbb Negative 𝑎𝑛 𝑏𝑛 (Early negative aaabbb, not defined ✗ aaabbb Positive - ✗ counterexample) aaaabbbb, aaaaabbbbb} ✗ aaaabbbb Positive ✗ aaaaabbbbb Positive the intended behavior of the process is accurately built. It ing with Trace language configurations [9] for testing visualizes all next possible traces and their subsequences purposes. by spanning the transition space from the traces and their subsequences in trace set built by the algorithm at that progression stage. The component of Oracle interactivity 3. Experiments and Results is included as a record of all the traces of the given trace As part of experimentation, we have implemented the L* set. We experiment this adaptation of L* for process min- algorithm [1] for Process Mining. This adaptation of L* algorithm is then used to mine the microconfigurations 4. Conclusion presented by Karnika and Joshi in [9]. They presented thirteen inherent patterns among processes, which they We explored the L* algorithm in the context of process call as micro-configurations [9]. Moreover, they identi- mining, drawing an analogy between trace sets and lan- fied them as failure points for some process mining algo- guages. We observed that several earlier trace patterns rithms, and presented the test results of five algorithms which were found to be fracture points in most of the that are obtained on testing these micro-configurations mining algorithms are correctly identified by L*. How- using pm4py [10] framework [9]. ever, the algorithm requires an oracle in order to iterate Table 1 presents experimentation results of apply- with its help to generate correct process model. In future, ing the L* algorithm for Process Mining over 14 micro- the oracle can be replaced by an automated machine that configurations using their corresponding trace sets in- validates the generated results by comparing two trace cluded in the table. The table also includes relevant re- sets. Since the algorithm works for trace sets express- sults and details corresponding to various iterations and ible with regular expressions, as future work, it can be DFA constructions witnessed during mining by L*, such explored to solve those sub-parts of trace sets that are as counterexamples provided by Oracle, their types (pos- not solvable by it. Further, since the algorithm generates itive or negative), and the count of DFAs that are con- DFAs, the models generated by the algorithm need to structed by L* algorithm before producing the correct be accurately converted into Petri Nets, and removal of result. duplicity of transition labels is a challenge in conversion. The table also sums up the results of [9] to present Acknowledgments We thank Paritosh K. Pandya for a count of process mining algorithms that were found suggesting the L* algorithm for our explorations. in [9] to successfully discover that micro-configuration. However, the table marks a - (hyphen) to indicate a result of micro-configuration that was not presented by them. References We identify that all the microconfigurations, presented [1] D. Angluin, Learning regular sets from queries and as fracture points for most of the mining algorithms, were counterexamples, Information and Computation successfully mined by L* algorithm for Process Mining. 75 (1987) 87–106. The algorithm correctly identifies processes that gen- [2] A. Clark, Learning deterministic context free gram- erate regular trace sets, and it continues its iterations for mars: The omphalos competition, Machine Learn- processes that generate non-regular trace sets. For exam- ing 66 (2007) 93–110. ple, as shown in the last two entries in Table 1, the two [3] N. Wirth, What can we do about the unnecessary runs of the algorithm on a non-regular trace set {𝑎𝑛 𝑏𝑛 } diversity of notation for syntactic definitions?, Com- produce correct result only upto the current iteration mun. ACM 20 (1977). generating net to accept {𝑎𝑘−1 𝑏𝑘−1 } at 𝑘𝑡ℎ iteration. [4] S. Verwer, R. Eyraud, C. de la Higuera, Pautomac: a The algorithm generates DFAs, which may have mul- probabilistic automata and hidden markov models tiple occurrences of transition labels, whereas processes learning competition, Machine Learning (2014). tend to have unique transition labels, since they represent [5] C. Higuera, A bibliographical study of grammatical activities in processes. inference, Pattern Recognition 38 (2005). As the algorithm progresses, behavioral dependencies [6] W. Daelemans, Colin de la higuera: Grammatical across transitions are learned by the algorithm. However, inference: learning automata and grammars, 2010, sometimes, false positives mislead as it appears at an Machine Translation 24 (2010) 291–293. outset that the algorithm has learned some relationship, [7] A survey of grammatical inference in software en- but subsequent iterations of DFA miss out on required gineering, Science of Computer Programming 96 traces. (2014) 444–459. The last two entries in the Table 1 represent algorith- [8] M. Young-Lai, Grammar Inference, Springer US, mic results with change in order of type of counterexam- Boston, MA, 2009, pp. 1256–1260. ples. In the first attempt, all negative counterexamples are [9] K. Shivhare, R. Joshi, Trace language: Mining micro- tried before giving any positive counterexample. With an configurations from process transition traces, 2022. expectation to reduce the number of required negative [10] A. Berti, S. van Zelst, W. Aalst, Process mining for counterexamples, strategies to have positive counterex- python (pm4py): Bridging the gap between process- amples in the beginning are attempted. However, it can and data science, 2019. be observed in this case, that the negative counterexam- ples were still required to eliminate unwanted traces.