-

Exploring L* for Process Mining

Karnika Shivhare

Rushikesh K. Joshi

0 0 Department of Computer Science Engineering, Indian Institute of Technology Bombay , Mumbai , India

When processes execute through their business logic, their activities generate event logs, which contribute to trace sets. Since its introduction, the field of process mining has evolved, however, accuracy issues persist. In this paper, we explore the L* algorithm in the context of process mining, especially towards development of interactive process mining techniques. We discuss the results of our experiments, the limitations of the approach, and possible future directions towards adapting the technique for covering richer set of features.

eol>Process Traces Trace Sets Process Mining Interactiveness L* Algorithm Automatons Grammar Inferencing

1. Introduction a finite automaton for the given automaton language. It generates minimalistic DFAs [1].

Processes are progressions of activities that execute and Other related works in this direction of automata learngenerate event logs in the form of trace sets. These event ing and grammar extraction include learning of context logs present executed traces of the processes, and they free languages such as by Omphalos [2], extracting conare routinely used by Process Mining algorithms to mine text free languages in extended Backus–Naur form [3] processes from them. Process Mining has been explored and learning of probabilistic finite state machines such for more than a decade, but discovering a process accu- as by PAutomaC [4]. Several other works [5] [6] [7] rately from its given trace set is still an attractive research provide surveys in this field of grammar inferencing [ 8] proposition that needs attention. A Trace from a process with a focus on constructing grammar for the underlying represents a control flow of the process during an execu- language by learning through examples. tion run, and it can be represented as a string of activities. We observe that the field of Process Mining can benefit Such a string can be obtained from a process execution by placing automaton learning in proximity with Process engine through its logs. Similarly, automatons can be con- Discovery. Process models can be learnt like formal gramsidered as execution engines where strings of alphabets mars by using the idea of positive examples in learning are produced by virtue of transitions between states. automatons and applying them to event log trace sets.

The set of strings generated by an automaton represents the language accepted by the automaton, while the process execution engine generates strings of activities 2. Our Approach representing the trace set of the process. The former is a complete set, whereas, the latter is a set generated We present the observations and results (Table 1) of our by a finite number of executions. The L* algorithm [ 1] exploration of the L* algorithm to mine processes. In our is an algorithm that generates automatons from given approach, processes are mined as DFAs with the help of regular language represented in terms of set of strings. an oracle that validates the generated DFA by providing The algorithm learns a deterministic finite automaton counterexamples for incorrect DFAs. The algorithm is (DFA) from a set of positive and negative examples. It unaware about the process behavior to be mined, the begins with an initial hypothesis DFA that accepts all of correctness of its mined DFA, it’s iteration stage, or the the positive examples and none of the negative examples. point of incorrectness of its result. It is an interactive It then iteratively refines the hypothesis through inter- learning (or refining) algorithm that continues to take action with an oracle to tailor the alterations to obtain counterexamples and learn. This brings flexibility to the algorithm as to repeat its learn-and-refine cycle until a International Workshop on Petri Nets and Software Engineering 2023, correct DFA is constructed, in contrast to other process PNSE’23 mining algorithms that generate either a correct or an $ karnika@cse.iitb.ac.in (Karnika); rkj@cse.iitb.ac.in incorrect process. L* is an iterative approach to extract (Rushikesh K.) regular grammar, and we utilize its iterations for extracthttph0s0tt:0/p/0sw-:0/w/0w0w1w.-c6ws4e.9c.is0ite-b0.i.3iat8cb0..ian(cK/.~ianrrk/n~ji/kk(aaRr)nu;is0kh0ai0/k0e(-Ks0h0a0rKn2.i-)k2a7)1;2-1406 ianlggoprirtohcme,ssaensd. bWegeinviwe withparohcyepsosthtreasciessofasansterminpgtsyitnraLn*(Rushikesh K.) sition as the only trace in the trace set of the process to © 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License be mined. The hypothesis is then iteratively refined until CPWrEooUrckReshdoinpgs IhStpN:/c1e6u1r3-w-0s.o7r3g ACttEribUutRion W4.0oInrtekrnsahtioonpal (PCCroBYce4.0e).dings (CEUR-WS.org) by spanning the transition space from the traces and their subsequences in trace set built by the algorithm at that progression stage. The component of Oracle interactivity is included as a record of all the traces of the given trace set. We experiment this adaptation of L* for process mining with Trace language configurations [ 9] for testing purposes. 3. Experiments and Results As part of experimentation, we have implemented the L* algorithm [1] for Process Mining. This adaptation of L* Number of DFA Constructions

DFA Developments Correct DFA Obtained

Counterexample Provided

Type of Counterexample Count of Successes from [9] Correct Mining by L* for Process Mining

✗ ✗ ✗ ✗ ✗ ✗ ✗ ✗ ✗ ✗ ✗ ✗ ✗ ✗ ✗ ✗ ✗ ✗ ✗ ✗ ✗ ✗ ✗ ✗ ✗ ✗ ✗ ✗ ✗ ✗ ✗ ✗ ✗ ✗ ✗ ✗ ✗ ✗ ✗ ✗ ✗ ✗ ✗ acd bce cce abcd cdab bdab pqrs pcd bcd abcd abd bce adb abcde cbcde abcde acde abcde bb ac bc cc ac bac abc acb aab cab ad bad abcdeabcde abcde abcdeabcde aabb aaabbb aaaabbbb aaaaabbbbb bbbbbb aab bbb aaabbb aaaabbbb aaaaabbbbb

Positive Positive Negative Positive Positive Negative Positive Positive Negative Positive Positive Positive Negative Positive Positive Positive Negative Positive Negative Positive Negative Negative Positive Positive Negative Positive Negative Positive Positive Negative Positive Positive Negative Positive Positive Positive Positive Negative Negative Negative Positive Positive Positive 00 00

02 00 02 04 01 02 01 00 00 02 00 ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✗ ✗ 4. Conclusion algorithm is then used to mine the microconfigurations presented by Karnika and Joshi in [9]. They presented thirteen inherent patterns among processes, which they We explored the L* algorithm in the context of process call as micro-configurations [ 9]. Moreover, they identi- mining, drawing an analogy between trace sets and lanifed them as failure points for some process mining algo- guages. We observed that several earlier trace patterns rithms, and presented the test results of five algorithms which were found to be fracture points in most of the that are obtained on testing these micro-configurations mining algorithms are correctly identified by L*. Howusing pm4py [10] framework [9]. ever, the algorithm requires an oracle in order to iterate

Table 1 presents experimentation results of apply- with its help to generate correct process model. In future, ing the L* algorithm for Process Mining over 14 micro- the oracle can be replaced by an automated machine that configurations using their corresponding trace sets in- validates the generated results by comparing two trace cluded in the table. The table also includes relevant re- sets. Since the algorithm works for trace sets expresssults and details corresponding to various iterations and ible with regular expressions, as future work, it can be DFA constructions witnessed during mining by L*, such explored to solve those sub-parts of trace sets that are as counterexamples provided by Oracle, their types (pos- not solvable by it. Further, since the algorithm generates itive or negative), and the count of DFAs that are con- DFAs, the models generated by the algorithm need to structed by L* algorithm before producing the correct be accurately converted into Petri Nets, and removal of result. duplicity of transition labels is a challenge in conversion.

The table also sums up the results of [9] to present Acknowledgments We thank Paritosh K. Pandya for a count of process mining algorithms that were found suggesting the L* algorithm for our explorations. in [9] to successfully discover that micro-configuration.

However, the table marks a - (hyphen) to indicate a result References of micro-configuration that was not presented by them.

We identify that all the microconfigurations, presented [1] D. Angluin, Learning regular sets from queries and as fracture points for most of the mining algorithms, were counterexamples, Information and Computation successfully mined by L* algorithm for Process Mining. 75 (1987) 87–106.

The algorithm correctly identifies processes that gen- [2] A. Clark, Learning deterministic context free gramerate regular trace sets, and it continues its iterations for mars: The omphalos competition, Machine Learnprocesses that generate non-regular trace sets. For exam- ing 66 (2007) 93–110. ple, as shown in the last two entries in Table 1, the two [3] N. Wirth, What can we do about the unnecessary runs of the algorithm on a non-regular trace set {} diversity of notation for syntactic definitions?, Comproduce correct result only upto the current iteration mun. ACM 20 (1977). generating net to accept {− 1− 1} at ℎ iteration. [4] S. Verwer, R. Eyraud, C. de la Higuera, Pautomac: a

The algorithm generates DFAs, which may have mul- probabilistic automata and hidden markov models tiple occurrences of transition labels, whereas processes learning competition, Machine Learning (2014). tend to have unique transition labels, since they represent [5] C. Higuera, A bibliographical study of grammatical activities in processes. inference, Pattern Recognition 38 (2005).

As the algorithm progresses, behavioral dependencies [6] W. Daelemans, Colin de la higuera: Grammatical across transitions are learned by the algorithm. However, inference: learning automata and grammars, 2010, sometimes, false positives mislead as it appears at an Machine Translation 24 (2010) 291–293. outset that the algorithm has learned some relationship, [7] A survey of grammatical inference in software enbut subsequent iterations of DFA miss out on required gineering, Science of Computer Programming 96 traces. (2014) 444–459.

The last two entries in the Table 1 represent algorith- [8] M. Young-Lai, Grammar Inference, Springer US, mic results with change in order of type of counterexam- Boston, MA, 2009, pp. 1256–1260. ples. In the first attempt, all negative counterexamples are [9] K. Shivhare, R. Joshi, Trace language: Mining microtried before giving any positive counterexample. With an configurations from process transition traces, 2022. expectation to reduce the number of required negative [10] A. Berti, S. van Zelst, W. Aalst, Process mining for counterexamples, strategies to have positive counterex- python (pm4py): Bridging the gap between processamples in the beginning are attempted. However, it can and data science, 2019. be observed in this case, that the negative counterexamples were still required to eliminate unwanted traces.