Exploring L* for Process Mining
Karnika Shivhare1 , Rushikesh K. Joshi1
1
    Department of Computer Science Engineering, Indian Institute of Technology Bombay, Mumbai, India.


                                          Abstract
                                          When processes execute through their business logic, their activities generate event logs, which contribute to trace sets. Since
                                          its introduction, the field of process mining has evolved, however, accuracy issues persist. In this paper, we explore the L*
                                          algorithm in the context of process mining, especially towards development of interactive process mining techniques. We
                                          discuss the results of our experiments, the limitations of the approach, and possible future directions towards adapting the
                                          technique for covering richer set of features.

                                          Keywords
                                          Process, Traces, Trace Sets, Process Mining, Interactiveness, L* Algorithm, Automatons, Grammar Inferencing


1. Introduction                                                                                     a finite automaton for the given automaton language. It
                                                                                                    generates minimalistic DFAs [1].
Processes are progressions of activities that execute and                                              Other related works in this direction of automata learn-
generate event logs in the form of trace sets. These event                                          ing and grammar extraction include learning of context
logs present executed traces of the processes, and they                                             free languages such as by Omphalos [2], extracting con-
are routinely used by Process Mining algorithms to mine                                             text free languages in extended Backus–Naur form [3]
processes from them. Process Mining has been explored                                               and learning of probabilistic finite state machines such
for more than a decade, but discovering a process accu-                                             as by PAutomaC [4]. Several other works [5] [6] [7]
rately from its given trace set is still an attractive research                                     provide surveys in this field of grammar inferencing [8]
proposition that needs attention. A Trace from a process                                            with a focus on constructing grammar for the underlying
represents a control flow of the process during an execu-                                           language by learning through examples.
tion run, and it can be represented as a string of activities.                                         We observe that the field of Process Mining can benefit
Such a string can be obtained from a process execution                                              by placing automaton learning in proximity with Process
engine through its logs. Similarly, automatons can be con-                                          Discovery. Process models can be learnt like formal gram-
sidered as execution engines where strings of alphabets                                             mars by using the idea of positive examples in learning
are produced by virtue of transitions between states.                                               automatons and applying them to event log trace sets.
   The set of strings generated by an automaton repre-
sents the language accepted by the automaton, while the
process execution engine generates strings of activities                                            2. Our Approach
representing the trace set of the process. The former
is a complete set, whereas, the latter is a set generated                                              We present the observations and results (Table 1) of our
by a finite number of executions. The L* algorithm [1]                                                 exploration of the L* algorithm to mine processes. In our
is an algorithm that generates automatons from given                                                   approach, processes are mined as DFAs with the help of
regular language represented in terms of set of strings.                                               an oracle that validates the generated DFA by providing
The algorithm learns a deterministic finite automaton                                                  counterexamples for incorrect DFAs. The algorithm is
(DFA) from a set of positive and negative examples. It                                                 unaware about the process behavior to be mined, the
begins with an initial hypothesis DFA that accepts all of                                              correctness of its mined DFA, it’s iteration stage, or the
the positive examples and none of the negative examples.                                               point of incorrectness of its result. It is an interactive
It then iteratively refines the hypothesis through inter-                                              learning (or refining) algorithm that continues to take
action with an oracle to tailor the alterations to obtain                                              counterexamples and learn. This brings flexibility to the
                                                                                                       algorithm as to repeat its learn-and-refine cycle until a
International Workshop on Petri Nets and Software Engineering 2023, correct DFA is constructed, in contrast to other process
PNSE’23                                                                                                mining algorithms that generate either a correct or an
$ karnika@cse.iitb.ac.in (Karnika); rkj@cse.iitb.ac.in                                                 incorrect process. L* is an iterative approach to extract
(Rushikesh K.)                                                                                         regular grammar, and we utilize its iterations for extract-
 https://www.cse.iitb.ac.in/~karnika/ (Karnika);
                                                                                                       ing processes. We view process traces as strings in L*
https://www.cse.iitb.ac.in/~rkj/ (Rushikesh K.)
 0000-0001-6490-0380 (Karnika); 0000-0002-2712-1406                                                   algorithm, and begin with a hypothesis of an empty tran-
(Rushikesh K.)                                                                                         sition as the only trace in the trace set of the process to
          © 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License
          Attribution 4.0 International (CC BY 4.0).                                                   be mined. The hypothesis is then iteratively refined until
    CEUR
    Workshop
    Proceedings
                  http://ceur-ws.org
                  ISSN 1613-0073
                                       CEUR Workshop Proceedings (CEUR-WS.org)
Table 1
                              Process            Number                         DFA Developments                 Count of    Correct Mining
 Microconfigurations                             of DFA                                                          Successes   by L* for
                              Trace Set                          Correct DFA   Counterexample   Type of
                                                 Constructions   Obtained      Provided         Counterexample   from [9]    Process Mining
                                                                     ✗              acd            Positive
 Bowtie [9]                   {acd, bce}              04             ✗              bce            Positive         00             ✓
                                                                     ✗              cce            Negative
                                                                     ✗              abcd           Positive
 Critical Section [9]        {abcd, cdab}             04             ✗              cdab           Positive         00             ✓
                                                                     ✗              bdab           Negative
                                                                     ✗              pqrs           Positive
                             {abcd, pqrs,                            ✗              pcd            Positive
 Crossover [9]                                        05                                                            02             ✓
                             pcd, ars}                               ✗              bcd            Negative
                                                                     ✗              abcd           Positive
                                                                     ✗              abd            Positive
 Early Completion [9]         {acb, abd}              04             ✗              bce            Positive         00             ✓
                                                                     ✗              adb            Negative
                                                                     ✗             abcde           Positive
 Initial Bypass [9]           {abcde, de}             03                                                            02             ✓
                                                                     ✗             cbcde           Positive
 Intertwined Active          {abcde, abge,            03             ✗             abcde           Positive         04             ✓
 Bypass [9]                  afde}                                   ✗              acde           Negative
 Intertwined Vanilla         {abcde, abde,            03             ✗             abcde           Positive         01             ✓
 Bypass [9]                  acde}                                   ✗           abcdeabcde        Negative
 Intertwined                   {abcde,                03             ✗             abcde           Positive         02             ✓
 Long Bypass [9]               abe, ade}                             ✗           abcdeabcde        Negative
 Seniority [9]                 {a, b, ab}             02             ✗               bb            Negative         01             ✓
                                                                     ✗               ac            Positive
 Ordered
                              {ab, bc, ac}            04             ✗               bc            Positive         00             ✓
 Subsequences [9]
                                                                     ✗               cc            Negative
                           {ac, abc, acb                             ✗               ac            Positive
 Asynchronous              abbc, abcb, acbb,
                                                      03             ✗              bac            Negative         00             ✓
 Service Loop [9]          abbbc, abbcb,
                           abcbb, acbbb }
                                                                     ✗              abc            Positive
 Concurrent                                                          ✗              acb            Positive
                            {abc, acb, cab }          05                                                            02             ✓
 Branching [9]                                                       ✗              aab            Negative
                                                                     ✗              cab            Positive
                         {ad, abd, acd,                              ✗               ad            Positive
                         abcd, acbd, abbd,                           ✗              bad            Negative
 Ticketed
                         accd, abbcd,                 03                                                            00             ✓
 Service [9]
                         abcbd, acbbd,
                         accbd, acbcd, abccd }
                                                                     ✗              aabb           Positive
                             {𝜖, ab, aabb,                           ✗             aaabbb          Positive
 𝑎𝑛 𝑏𝑛 (Later negative       aaabbb,                                                                                               ✗
                                                  not defined        ✗            aaaabbbb         Positive          -
 counterexample)             aaaabbbb,
                             aaaaabbbbb}                             ✗           aaaaabbbbb        Positive
                                                                     ✗             bbbbbb          Negative
                                                                     ✗              aab            Negative
                             {𝜖, ab, aabb,                           ✗              bbb            Negative
 𝑎𝑛 𝑏𝑛 (Early negative       aaabbb,
                                                  not defined        ✗             aaabbb          Positive          -             ✗
 counterexample)             aaaabbbb,
                             aaaaabbbbb}                             ✗            aaaabbbb         Positive
                                                                     ✗           aaaaabbbbb        Positive


the intended behavior of the process is accurately built. It ing with Trace language configurations [9] for testing
visualizes all next possible traces and their subsequences purposes.
by spanning the transition space from the traces and their
subsequences in trace set built by the algorithm at that
progression stage. The component of Oracle interactivity 3. Experiments and Results
is included as a record of all the traces of the given trace
                                                             As part of experimentation, we have implemented the L*
set. We experiment this adaptation of L* for process min-
                                                             algorithm [1] for Process Mining. This adaptation of L*
algorithm is then used to mine the microconfigurations          4. Conclusion
presented by Karnika and Joshi in [9]. They presented
thirteen inherent patterns among processes, which they          We explored the L* algorithm in the context of process
call as micro-configurations [9]. Moreover, they identi-        mining, drawing an analogy between trace sets and lan-
fied them as failure points for some process mining algo-       guages. We observed that several earlier trace patterns
rithms, and presented the test results of five algorithms       which were found to be fracture points in most of the
that are obtained on testing these micro-configurations         mining algorithms are correctly identified by L*. How-
using pm4py [10] framework [9].                                 ever, the algorithm requires an oracle in order to iterate
   Table 1 presents experimentation results of apply-           with its help to generate correct process model. In future,
ing the L* algorithm for Process Mining over 14 micro-          the oracle can be replaced by an automated machine that
configurations using their corresponding trace sets in-         validates the generated results by comparing two trace
cluded in the table. The table also includes relevant re-       sets. Since the algorithm works for trace sets express-
sults and details corresponding to various iterations and       ible with regular expressions, as future work, it can be
DFA constructions witnessed during mining by L*, such           explored to solve those sub-parts of trace sets that are
as counterexamples provided by Oracle, their types (pos-        not solvable by it. Further, since the algorithm generates
itive or negative), and the count of DFAs that are con-         DFAs, the models generated by the algorithm need to
structed by L* algorithm before producing the correct           be accurately converted into Petri Nets, and removal of
result.                                                         duplicity of transition labels is a challenge in conversion.
   The table also sums up the results of [9] to present            Acknowledgments We thank Paritosh K. Pandya for
a count of process mining algorithms that were found            suggesting the L* algorithm for our explorations.
in [9] to successfully discover that micro-configuration.
However, the table marks a - (hyphen) to indicate a result
of micro-configuration that was not presented by them.
                                                                References
   We identify that all the microconfigurations, presented       [1] D. Angluin, Learning regular sets from queries and
as fracture points for most of the mining algorithms, were           counterexamples, Information and Computation
successfully mined by L* algorithm for Process Mining.               75 (1987) 87–106.
   The algorithm correctly identifies processes that gen-        [2] A. Clark, Learning deterministic context free gram-
erate regular trace sets, and it continues its iterations for        mars: The omphalos competition, Machine Learn-
processes that generate non-regular trace sets. For exam-            ing 66 (2007) 93–110.
ple, as shown in the last two entries in Table 1, the two        [3] N. Wirth, What can we do about the unnecessary
runs of the algorithm on a non-regular trace set {𝑎𝑛 𝑏𝑛 }            diversity of notation for syntactic definitions?, Com-
produce correct result only upto the current iteration               mun. ACM 20 (1977).
generating net to accept {𝑎𝑘−1 𝑏𝑘−1 } at 𝑘𝑡ℎ iteration.          [4] S. Verwer, R. Eyraud, C. de la Higuera, Pautomac: a
   The algorithm generates DFAs, which may have mul-                 probabilistic automata and hidden markov models
tiple occurrences of transition labels, whereas processes            learning competition, Machine Learning (2014).
tend to have unique transition labels, since they represent      [5] C. Higuera, A bibliographical study of grammatical
activities in processes.                                             inference, Pattern Recognition 38 (2005).
   As the algorithm progresses, behavioral dependencies          [6] W. Daelemans, Colin de la higuera: Grammatical
across transitions are learned by the algorithm. However,            inference: learning automata and grammars, 2010,
sometimes, false positives mislead as it appears at an               Machine Translation 24 (2010) 291–293.
outset that the algorithm has learned some relationship,         [7] A survey of grammatical inference in software en-
but subsequent iterations of DFA miss out on required                gineering, Science of Computer Programming 96
traces.                                                              (2014) 444–459.
   The last two entries in the Table 1 represent algorith-       [8] M. Young-Lai, Grammar Inference, Springer US,
mic results with change in order of type of counterexam-             Boston, MA, 2009, pp. 1256–1260.
ples. In the first attempt, all negative counterexamples are     [9] K. Shivhare, R. Joshi, Trace language: Mining micro-
tried before giving any positive counterexample. With an             configurations from process transition traces, 2022.
expectation to reduce the number of required negative           [10] A. Berti, S. van Zelst, W. Aalst, Process mining for
counterexamples, strategies to have positive counterex-              python (pm4py): Bridging the gap between process-
amples in the beginning are attempted. However, it can               and data science, 2019.
be observed in this case, that the negative counterexam-
ples were still required to eliminate unwanted traces.