<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Exploring L* for Process Mining</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Karnika Shivhare</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Rushikesh K. Joshi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science Engineering, Indian Institute of Technology Bombay</institution>
          ,
          <addr-line>Mumbai</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>When processes execute through their business logic, their activities generate event logs, which contribute to trace sets. Since its introduction, the field of process mining has evolved, however, accuracy issues persist. In this paper, we explore the L* algorithm in the context of process mining, especially towards development of interactive process mining techniques. We discuss the results of our experiments, the limitations of the approach, and possible future directions towards adapting the technique for covering richer set of features.</p>
      </abstract>
      <kwd-group>
        <kwd>eol&gt;Process</kwd>
        <kwd>Traces</kwd>
        <kwd>Trace Sets</kwd>
        <kwd>Process Mining</kwd>
        <kwd>Interactiveness</kwd>
        <kwd>L* Algorithm</kwd>
        <kwd>Automatons</kwd>
        <kwd>Grammar Inferencing</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>1. Introduction
a finite automaton for the given automaton language. It
generates minimalistic DFAs [1].</p>
      <p>Processes are progressions of activities that execute and Other related works in this direction of automata
learngenerate event logs in the form of trace sets. These event ing and grammar extraction include learning of context
logs present executed traces of the processes, and they free languages such as by Omphalos [2], extracting
conare routinely used by Process Mining algorithms to mine text free languages in extended Backus–Naur form [3]
processes from them. Process Mining has been explored and learning of probabilistic finite state machines such
for more than a decade, but discovering a process accu- as by PAutomaC [4]. Several other works [5] [6] [7]
rately from its given trace set is still an attractive research provide surveys in this field of grammar inferencing [ 8]
proposition that needs attention. A Trace from a process with a focus on constructing grammar for the underlying
represents a control flow of the process during an execu- language by learning through examples.
tion run, and it can be represented as a string of activities. We observe that the field of Process Mining can benefit
Such a string can be obtained from a process execution by placing automaton learning in proximity with Process
engine through its logs. Similarly, automatons can be con- Discovery. Process models can be learnt like formal
gramsidered as execution engines where strings of alphabets mars by using the idea of positive examples in learning
are produced by virtue of transitions between states. automatons and applying them to event log trace sets.</p>
      <p>The set of strings generated by an automaton
represents the language accepted by the automaton, while the
process execution engine generates strings of activities 2. Our Approach
representing the trace set of the process. The former
is a complete set, whereas, the latter is a set generated We present the observations and results (Table 1) of our
by a finite number of executions. The L* algorithm [ 1] exploration of the L* algorithm to mine processes. In our
is an algorithm that generates automatons from given approach, processes are mined as DFAs with the help of
regular language represented in terms of set of strings. an oracle that validates the generated DFA by providing
The algorithm learns a deterministic finite automaton counterexamples for incorrect DFAs. The algorithm is
(DFA) from a set of positive and negative examples. It unaware about the process behavior to be mined, the
begins with an initial hypothesis DFA that accepts all of correctness of its mined DFA, it’s iteration stage, or the
the positive examples and none of the negative examples. point of incorrectness of its result. It is an interactive
It then iteratively refines the hypothesis through inter- learning (or refining) algorithm that continues to take
action with an oracle to tailor the alterations to obtain counterexamples and learn. This brings flexibility to the
algorithm as to repeat its learn-and-refine cycle until a
International Workshop on Petri Nets and Software Engineering 2023, correct DFA is constructed, in contrast to other process
PNSE’23 mining algorithms that generate either a correct or an
$ karnika@cse.iitb.ac.in (Karnika); rkj@cse.iitb.ac.in incorrect process. L* is an iterative approach to extract
(Rushikesh K.) regular grammar, and we utilize its iterations for
extracthttph0s0tt:0/p/0sw-:0/w/0w0w1w.-c6ws4e.9c.is0ite-b0.i.3iat8cb0..ian(cK/.~ianrrk/n~ji/kk(aaRr)nu;is0kh0ai0/k0e(-Ks0h0a0rKn2.i-)k2a7)1;2-1406 ianlggoprirtohcme,ssaensd. bWegeinviwe
withparohcyepsosthtreasciessofasansterminpgtsyitnraLn*(Rushikesh K.) sition as the only trace in the trace set of the process to
© 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License be mined. The hypothesis is then iteratively refined until
CPWrEooUrckReshdoinpgs IhStpN:/c1e6u1r3-w-0s.o7r3g ACttEribUutRion W4.0oInrtekrnsahtioonpal (PCCroBYce4.0e).dings (CEUR-WS.org)
by spanning the transition space from the traces and their
subsequences in trace set built by the algorithm at that
progression stage. The component of Oracle interactivity
is included as a record of all the traces of the given trace
set. We experiment this adaptation of L* for process
mining with Trace language configurations [ 9] for testing
purposes.
3. Experiments and Results
As part of experimentation, we have implemented the L*
algorithm [1] for Process Mining. This adaptation of L*
Number
of DFA
Constructions</p>
      <p>DFA Developments
Correct DFA
Obtained</p>
      <p>Counterexample
Provided</p>
    </sec>
    <sec id="sec-2">
      <title>Type of Counterexample</title>
    </sec>
    <sec id="sec-3">
      <title>Count of Successes from [9]</title>
    </sec>
    <sec id="sec-4">
      <title>Correct Mining by L* for Process Mining</title>
      <p>✗
✗
✗
✗
✗
✗
✗
✗
✗
✗
✗
✗
✗
✗
✗
✗
✗
✗
✗
✗
✗
✗
✗
✗
✗
✗
✗
✗
✗
✗
✗
✗
✗
✗
✗
✗
✗
✗
✗
✗
✗
✗
✗
acd
bce
cce
abcd
cdab
bdab
pqrs
pcd
bcd
abcd
abd
bce
adb
abcde
cbcde
abcde
acde
abcde
bb
ac
bc
cc
ac
bac
abc
acb
aab
cab
ad
bad
abcdeabcde
abcde
abcdeabcde
aabb
aaabbb
aaaabbbb
aaaaabbbbb
bbbbbb
aab
bbb
aaabbb
aaaabbbb
aaaaabbbbb</p>
      <sec id="sec-4-1">
        <title>Positive</title>
      </sec>
      <sec id="sec-4-2">
        <title>Positive</title>
      </sec>
      <sec id="sec-4-3">
        <title>Negative</title>
      </sec>
      <sec id="sec-4-4">
        <title>Positive</title>
      </sec>
      <sec id="sec-4-5">
        <title>Positive</title>
      </sec>
      <sec id="sec-4-6">
        <title>Negative</title>
      </sec>
      <sec id="sec-4-7">
        <title>Positive</title>
      </sec>
      <sec id="sec-4-8">
        <title>Positive</title>
      </sec>
      <sec id="sec-4-9">
        <title>Negative</title>
      </sec>
      <sec id="sec-4-10">
        <title>Positive</title>
      </sec>
      <sec id="sec-4-11">
        <title>Positive</title>
      </sec>
      <sec id="sec-4-12">
        <title>Positive</title>
      </sec>
      <sec id="sec-4-13">
        <title>Negative</title>
      </sec>
      <sec id="sec-4-14">
        <title>Positive</title>
      </sec>
      <sec id="sec-4-15">
        <title>Positive</title>
      </sec>
      <sec id="sec-4-16">
        <title>Positive</title>
      </sec>
      <sec id="sec-4-17">
        <title>Negative</title>
      </sec>
      <sec id="sec-4-18">
        <title>Positive</title>
      </sec>
      <sec id="sec-4-19">
        <title>Negative</title>
      </sec>
      <sec id="sec-4-20">
        <title>Positive</title>
      </sec>
      <sec id="sec-4-21">
        <title>Negative</title>
      </sec>
      <sec id="sec-4-22">
        <title>Negative</title>
      </sec>
      <sec id="sec-4-23">
        <title>Positive</title>
      </sec>
      <sec id="sec-4-24">
        <title>Positive</title>
      </sec>
      <sec id="sec-4-25">
        <title>Negative</title>
      </sec>
      <sec id="sec-4-26">
        <title>Positive</title>
      </sec>
      <sec id="sec-4-27">
        <title>Negative</title>
      </sec>
      <sec id="sec-4-28">
        <title>Positive</title>
      </sec>
      <sec id="sec-4-29">
        <title>Positive</title>
      </sec>
      <sec id="sec-4-30">
        <title>Negative</title>
      </sec>
      <sec id="sec-4-31">
        <title>Positive</title>
      </sec>
      <sec id="sec-4-32">
        <title>Positive</title>
      </sec>
      <sec id="sec-4-33">
        <title>Negative</title>
      </sec>
      <sec id="sec-4-34">
        <title>Positive</title>
      </sec>
      <sec id="sec-4-35">
        <title>Positive</title>
      </sec>
      <sec id="sec-4-36">
        <title>Positive</title>
      </sec>
      <sec id="sec-4-37">
        <title>Positive</title>
      </sec>
      <sec id="sec-4-38">
        <title>Negative</title>
      </sec>
      <sec id="sec-4-39">
        <title>Negative</title>
      </sec>
      <sec id="sec-4-40">
        <title>Negative</title>
      </sec>
      <sec id="sec-4-41">
        <title>Positive</title>
      </sec>
      <sec id="sec-4-42">
        <title>Positive Positive 00 00</title>
        <p>02
00
02
04
01
02
01
00
00
02
00
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✗
✗
4. Conclusion
algorithm is then used to mine the microconfigurations
presented by Karnika and Joshi in [9]. They presented
thirteen inherent patterns among processes, which they We explored the L* algorithm in the context of process
call as micro-configurations [ 9]. Moreover, they identi- mining, drawing an analogy between trace sets and
lanifed them as failure points for some process mining algo- guages. We observed that several earlier trace patterns
rithms, and presented the test results of five algorithms which were found to be fracture points in most of the
that are obtained on testing these micro-configurations mining algorithms are correctly identified by L*.
Howusing pm4py [10] framework [9]. ever, the algorithm requires an oracle in order to iterate</p>
        <p>Table 1 presents experimentation results of apply- with its help to generate correct process model. In future,
ing the L* algorithm for Process Mining over 14 micro- the oracle can be replaced by an automated machine that
configurations using their corresponding trace sets in- validates the generated results by comparing two trace
cluded in the table. The table also includes relevant re- sets. Since the algorithm works for trace sets
expresssults and details corresponding to various iterations and ible with regular expressions, as future work, it can be
DFA constructions witnessed during mining by L*, such explored to solve those sub-parts of trace sets that are
as counterexamples provided by Oracle, their types (pos- not solvable by it. Further, since the algorithm generates
itive or negative), and the count of DFAs that are con- DFAs, the models generated by the algorithm need to
structed by L* algorithm before producing the correct be accurately converted into Petri Nets, and removal of
result. duplicity of transition labels is a challenge in conversion.</p>
        <p>The table also sums up the results of [9] to present Acknowledgments We thank Paritosh K. Pandya for
a count of process mining algorithms that were found suggesting the L* algorithm for our explorations.
in [9] to successfully discover that micro-configuration.</p>
        <p>However, the table marks a - (hyphen) to indicate a result References
of micro-configuration that was not presented by them.</p>
        <p>We identify that all the microconfigurations, presented [1] D. Angluin, Learning regular sets from queries and
as fracture points for most of the mining algorithms, were counterexamples, Information and Computation
successfully mined by L* algorithm for Process Mining. 75 (1987) 87–106.</p>
        <p>The algorithm correctly identifies processes that gen- [2] A. Clark, Learning deterministic context free
gramerate regular trace sets, and it continues its iterations for mars: The omphalos competition, Machine
Learnprocesses that generate non-regular trace sets. For exam- ing 66 (2007) 93–110.
ple, as shown in the last two entries in Table 1, the two [3] N. Wirth, What can we do about the unnecessary
runs of the algorithm on a non-regular trace set {} diversity of notation for syntactic definitions?,
Comproduce correct result only upto the current iteration mun. ACM 20 (1977).
generating net to accept {− 1− 1} at ℎ iteration. [4] S. Verwer, R. Eyraud, C. de la Higuera, Pautomac: a</p>
        <p>The algorithm generates DFAs, which may have mul- probabilistic automata and hidden markov models
tiple occurrences of transition labels, whereas processes learning competition, Machine Learning (2014).
tend to have unique transition labels, since they represent [5] C. Higuera, A bibliographical study of grammatical
activities in processes. inference, Pattern Recognition 38 (2005).</p>
        <p>As the algorithm progresses, behavioral dependencies [6] W. Daelemans, Colin de la higuera: Grammatical
across transitions are learned by the algorithm. However, inference: learning automata and grammars, 2010,
sometimes, false positives mislead as it appears at an Machine Translation 24 (2010) 291–293.
outset that the algorithm has learned some relationship, [7] A survey of grammatical inference in software
enbut subsequent iterations of DFA miss out on required gineering, Science of Computer Programming 96
traces. (2014) 444–459.</p>
        <p>The last two entries in the Table 1 represent algorith- [8] M. Young-Lai, Grammar Inference, Springer US,
mic results with change in order of type of counterexam- Boston, MA, 2009, pp. 1256–1260.
ples. In the first attempt, all negative counterexamples are [9] K. Shivhare, R. Joshi, Trace language: Mining
microtried before giving any positive counterexample. With an configurations from process transition traces, 2022.
expectation to reduce the number of required negative [10] A. Berti, S. van Zelst, W. Aalst, Process mining for
counterexamples, strategies to have positive counterex- python (pm4py): Bridging the gap between
processamples in the beginning are attempted. However, it can and data science, 2019.
be observed in this case, that the negative
counterexamples were still required to eliminate unwanted traces.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list />
  </back>
</article>