=Paper= {{Paper |id=Vol-3783/paper_343 |storemode=property |title=Ebi - a Stochastic Process Mining Framework |pdfUrl=https://ceur-ws.org/Vol-3783/paper_343.pdf |volume=Vol-3783 |authors=Sander J.J. Leemans,Tian Li,Jan Niklas van Detten |dblpUrl=https://dblp.org/rec/conf/icpm/LeemansLD24 }} ==Ebi - a Stochastic Process Mining Framework== https://ceur-ws.org/Vol-3783/paper_343.pdf
                                Ebi - a Stochastic Process Mining Framework
                                Sander J.J. Leemans¹,∗, Tian Li¹,² and Jan Niklas van Detten¹
                                ¹ RWTH Aachen, Germany
                                ² University of Melbourne, Australia


                                               Abstract
                                               Ebi is an open-source framework for stochastic process mining. Ebi currently contains over 30 techniques
                                               related to stochastic process mining, from completeness estimation to analysis, conformance checking,
                                               discovery, visualisation and statistical tests, and can be called from the command line.
                                                   For technique developers, Ebi allows abstractions on inputs through interfaces (“traits”), such that
                                               implemented techniques automatically support a variety of input and output types. Furthermore, Ebi
                                               performs almost all computations in exact arithmetic, avoiding precision issues with very small values.

                                               Keywords
                                               Stochastic process mining, stochastic process models, event logs




                                                Metadata description                                    Value
                                                Tool name                                               Ebi
                                                Current version                                         1.0
                                                Legal code license                                      Apache 2.0
                                                Languages, tools and services used                      Rust, GraphViz, Rust4pm
                                                Supported operating environment                         Windows, Linux, (Mac OS X with self-compilation)
                                                Download/Demo URL                                       https://www.ebitools.org/
                                                Documentation URL                                       https://www.ebitools.org/, click manual
                                                Source code repository                                  https://github.com/BPM-Research-Group/Ebi
                                                Screencast video                                        https://youtu.be/IEeTH3DCZ_0




                                1. Introduction
                                Using process mining techniques, business analysts can find inefficiencies in business processes,
                                with the ultimate aim of improving them. For efficient use of process improvement resources,
                                it is important to recognise whether certain to-be-improved behaviour is frequent or rare.
                                Such frequency information is referred to as the stochastic perspective in process mining, and
                                techniques that take this stochastic information into account explicitly are called stochastic
                                process mining techniques.
                                   Stochastic process mining techniques are scattered across several process mining frameworks,
                                such as Pm4py [1], BupaR [2] and ProM [3], each with its strengths and weaknesses, ranging
                                from computational efficiency to ease of applicability in high-performance computing settings.

                                ICPM 2024 Tool Demonstration Track, October 14-18, 2024, Kongens Lyngby, Denmark
                                ∗ Corresponding author.
                                s.leemans@bpm.rwth-aachen.de (S. J.J. Leemans)
                                ORCID 0000-0002-5201-7125 (S. J.J. Leemans)
                                © 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
                                CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073

None of these existing frameworks (i) allows a particular technique to be applied straightforwardly
to an input file or (ii) is easily accessible from the command line; moreover, (iii) all require
installation and external runtime environments and, most importantly, (iv) none supports exact
computations throughout.
   Ebi is a command-line utility focused on stochastic process mining techniques. For end users,
it allows the direct application of stochastic process mining techniques to files, and writes the
output to a file. There is no need to import files, click through options and export results, or even
to write scripts. In particular, Ebi supports exact computations, which are a necessity in stochastic
process mining (see Section 2).
   For developers, Ebi is a framework that allows abstractions on inputs through interfaces
(“traits”), such that implemented techniques automatically support a variety of input and
output types. Furthermore, Ebi makes exact computations easy to perform, through custom
implementations of the corresponding data structures and methods, such as matrix
computations, as well as solvers and optimisation routines. As a framework, Ebi ensures
consistent parameter handling, and is thus well positioned to act as a background library for
Pm4py, BupaR and ProM in the future. Ebi is open source and we welcome contributions from
the community.
   The remainder of this paper is organised as follows. In Section 2, we discuss the significance of
Ebi for stochastic process mining. Section 3 details its innovations. Section 4 describes its
architecture and lists the currently implemented techniques, Section 5 discusses its maturity and
Section 6 concludes the paper.


2. Significance
In stochastic process mining, techniques may need to deal with astonishingly small numbers.
For instance, the BPIC11 [4] log contains traces with more than 1 800 events. In any reasonable
stochastic process model, the probability of such a trace would be well below 10⁻³⁰⁰, which is
close to the smallest positive value that a double-precision float can represent (approximately
2.2 × 10⁻³⁰⁸ for normalised numbers). Even if such low values could be represented reliably,
computing with them would yield low precision. For instance, to compute the unit Earth-movers’
stochastic conformance measure [5], we need to sum one such value for each trace in the log; in a
typical log with thousands of traces, no precision would be left. In practice, computations on such
logs without exact arithmetic would be highly unreliable. Numerically more complex techniques,
for instance those that require matrix computations such as solving, inversion or multiplication
with such low values, would fare even worse. Hence, exact computations are a necessity in
stochastic process mining, and Ebi is the first process mining framework that provides exact
arithmetic for stochastic process mining.
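To make these two failure modes concrete, the following minimal Rust sketch uses plain `f64` arithmetic; it is not Ebi code. It assumes a toy model in which each of 1 800 events is taken with probability 0.5, and shows that the trace probability underflows to exactly zero, while tiny summands vanish entirely when added to a larger value. The function names are illustrative only.

```rust
/// Probability of a trace whose `n` steps each have probability `p`,
/// computed naively in double precision.
fn trace_probability_f64(p: f64, n: usize) -> f64 {
    (0..n).fold(1.0, |acc, _| acc * p)
}

/// Adds `eps` to `base`, `n` times, in double precision.
fn accumulate_f64(base: f64, eps: f64, n: usize) -> f64 {
    (0..n).fold(base, |acc, _| acc + eps)
}

fn main() {
    // 0.5^1800 = 2^-1800 (roughly 10^-542) is far below the smallest positive
    // double (2^-1074, a subnormal), so the product silently underflows to 0.
    let p = trace_probability_f64(0.5, 1_800);
    println!("P(trace) as f64: {p:e}");

    // 1e-17 is less than half the spacing of doubles around 1.0 (2^-52, about 2.2e-16),
    // so every addition rounds back to 1.0; the exact sum would be 1 + 1e-14.
    let s = accumulate_f64(1.0, 1e-17, 1_000);
    println!("1 + 1000 * 1e-17 as f64: {s}");
}
```

In both cases no error is raised: the lost information simply disappears, which is why such failures are easy to miss in experiments.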
   Furthermore, a command-line interface eases the application of stochastic process mining
techniques in demanding scientific experimental settings. Ebi is currently fully command-line
based, although integrations with existing process mining frameworks such as Pm4py [1],
BupaR [2] and ProM [3] are planned, in a similar fashion to Rust4pm [6].
   Finally, Ebi encourages techniques to accept a large variety of inputs and to produce a large
variety of outputs, as it moves the burden of matching the inputs and outputs of techniques to file
types into the framework itself: file types implement abstract interfaces (such as “a stochastic
language that we can iterate over, and whose iteration will end”), while techniques use them.
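The idea of file types implementing interfaces that techniques then use can be illustrated with a minimal Rust sketch. The trait `FiniteIterableStochasticLanguage`, the types `EventLog` and `ExplicitStochasticLanguage`, and the function `most_likely_trace` are hypothetical names chosen for illustration; Ebi's actual traits (such as the `FiniteStochasticLanguage` trait mentioned in Section 4) have their own signatures and use exact fractions rather than `f64`. For brevity, the trait returns all pairs as a vector instead of a lazy iterator.

```rust
use std::collections::HashMap;

/// Hypothetical trait: a stochastic language whose (trace, probability) pairs can be
/// iterated over, with an iteration that is guaranteed to end (f64 for brevity only).
trait FiniteIterableStochasticLanguage {
    fn traces(&self) -> Vec<(Vec<String>, f64)>;
}

/// An event log: its stochastic language is the relative frequency of each trace.
struct EventLog {
    traces: Vec<Vec<String>>,
}

impl FiniteIterableStochasticLanguage for EventLog {
    fn traces(&self) -> Vec<(Vec<String>, f64)> {
        let mut counts: HashMap<&Vec<String>, usize> = HashMap::new();
        for t in &self.traces {
            *counts.entry(t).or_insert(0) += 1;
        }
        let total = self.traces.len() as f64;
        counts.into_iter().map(|(t, c)| (t.clone(), c as f64 / total)).collect()
    }
}

/// An explicit finite stochastic language, as could be read from a .slang file.
struct ExplicitStochasticLanguage {
    entries: Vec<(Vec<String>, f64)>,
}

impl FiniteIterableStochasticLanguage for ExplicitStochasticLanguage {
    fn traces(&self) -> Vec<(Vec<String>, f64)> {
        self.entries.clone()
    }
}

/// A technique written once against the trait; it works for every implementing type.
fn most_likely_trace(lang: &dyn FiniteIterableStochasticLanguage) -> Option<(Vec<String>, f64)> {
    lang.traces().into_iter().max_by(|a, b| a.1.total_cmp(&b.1))
}

/// Helper: splits a space-separated trace such as "a b c" into activity labels.
fn trace(s: &str) -> Vec<String> {
    s.split_whitespace().map(String::from).collect()
}

fn main() {
    let log = EventLog { traces: vec![trace("a b c"), trace("a b c"), trace("a c")] };
    let slang = ExplicitStochasticLanguage {
        entries: vec![(trace("a c"), 0.7), (trace("a b c"), 0.3)],
    };
    println!("from the log:   {:?}", most_likely_trace(&log));
    println!("from the slang: {:?}", most_likely_trace(&slang));
}
```

Adding a new file type then only requires one more `impl` block; `most_likely_trace` itself does not change.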


3. Innovations
Ebi aims to provide stochastic process mining techniques to analysts in a consistent and coherent
fashion, accessible through the command line. The Ebi framework provides the following
innovations:

    • Background library & command line
      Ebi is a command-line tool, which allows it to be applied easily in scientific experiment
      workflows. Owing to its structured set-up, it is also well suited to serve as a fast
      background library for graphical tools and scripting libraries such as Pm4py [1],
      BupaR [2] and ProM [3].
    • Exact computation
      As described in Section 2, exact computations are a necessity in stochastic process mining.
      Transparently to end users and techniques, Ebi uses fractions of unlimited size, explicit
      logarithms and explicit roots to perform almost all computations exactly. Exact
      computations can nevertheless be disabled by the user for performance reasons; for more
      details, please refer to the manual.
      Unfortunately, most existing Rust libraries do not accept fractions of unlimited size as
      input, let alone explicit representations of logarithms and roots. Therefore, many secondary
      techniques, such as linear solvers, matrix computations and approximation techniques,
      were adapted to support exact computations.
    • Input and output detached from techniques
      Transparently to end users, an implemented technique that needs to, e.g., walk over the
      traces of an event log automatically supports all input formats that allow such a walk.
      That is, techniques preferably define their inputs in terms of interfaces (“traits”).
      Again transparently to end users and techniques, Ebi automatically converts the output
      of a technique based on the file extension of the desired output file. For more details,
      please refer to the manual.
    • Local and memory-safe
      Ebi runs locally and neither needs nor uses internet access, which makes it suitable
      for privacy-sensitive settings. The use of the Rust programming language rules out most
      types of memory-related errors. Furthermore, although this of course depends on the
      programmer, Rust tends to be more energy efficient than Java and much more energy
      efficient than Python [7].
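The idea behind exact computation with fractions of unlimited size can be sketched with only the Rust standard library. The types `BigUint` and `Fraction` below are hypothetical illustrations, not Ebi's implementation: the sketch does not reduce fractions by their greatest common divisor, supports only addition and multiplication, and omits the explicit logarithms and roots mentioned above. It reuses a toy trace of 1 800 events with probability 1/2 each, whose probability 2⁻¹⁸⁰⁰ cannot be represented as a double.

```rust
/// Minimal arbitrary-precision unsigned integer: little-endian base-2^32 limbs
/// without trailing zero limbs (zero is the empty vector).
#[derive(Clone, Debug, PartialEq, Eq)]
struct BigUint(Vec<u32>);

impl BigUint {
    fn from_u64(mut x: u64) -> Self {
        let mut limbs = Vec::new();
        while x > 0 {
            limbs.push(x as u32); // keep the low 32 bits
            x >>= 32;
        }
        BigUint(limbs)
    }

    fn normalised(mut limbs: Vec<u32>) -> Self {
        while limbs.last() == Some(&0) {
            limbs.pop();
        }
        BigUint(limbs)
    }

    /// Number of significant bits (0 for zero).
    fn bits(&self) -> u64 {
        match self.0.last() {
            None => 0,
            Some(&top) => (self.0.len() as u64 - 1) * 32 + (32 - top.leading_zeros() as u64),
        }
    }

    fn add(&self, other: &Self) -> Self {
        let n = self.0.len().max(other.0.len());
        let mut out = Vec::with_capacity(n + 1);
        let mut carry = 0u64;
        for i in 0..n {
            let a = *self.0.get(i).unwrap_or(&0) as u64;
            let b = *other.0.get(i).unwrap_or(&0) as u64;
            let t = a + b + carry;
            out.push(t as u32);
            carry = t >> 32;
        }
        out.push(carry as u32);
        Self::normalised(out)
    }

    /// Schoolbook multiplication; every intermediate value fits in a u64.
    fn mul(&self, other: &Self) -> Self {
        let mut out = vec![0u32; self.0.len() + other.0.len()];
        for (i, &a) in self.0.iter().enumerate() {
            let mut carry = 0u64;
            for (j, &b) in other.0.iter().enumerate() {
                let t = out[i + j] as u64 + a as u64 * b as u64 + carry;
                out[i + j] = t as u32;
                carry = t >> 32;
            }
            out[i + other.0.len()] = carry as u32;
        }
        Self::normalised(out)
    }
}

/// Exact non-negative fraction num/den; not reduced by the gcd, for brevity.
#[derive(Clone, Debug)]
struct Fraction {
    num: BigUint,
    den: BigUint,
}

impl Fraction {
    fn new(num: u64, den: u64) -> Self {
        assert!(den != 0, "denominator must be non-zero");
        Fraction { num: BigUint::from_u64(num), den: BigUint::from_u64(den) }
    }
    fn mul(&self, other: &Self) -> Self {
        Fraction { num: self.num.mul(&other.num), den: self.den.mul(&other.den) }
    }
    fn add(&self, other: &Self) -> Self {
        if self.den == other.den {
            // Common denominator: only the numerators need to be added.
            return Fraction { num: self.num.add(&other.num), den: self.den.clone() };
        }
        Fraction {
            num: self.num.mul(&other.den).add(&other.num.mul(&self.den)),
            den: self.den.mul(&other.den),
        }
    }
}

fn main() {
    // Toy trace of 1 800 events, each taken with probability 1/2: P = 1 / 2^1800.
    let half = Fraction::new(1, 2);
    let mut p = Fraction::new(1, 1);
    for _ in 0..1_800 {
        p = p.mul(&half);
    }
    // Summing three such probabilities stays exact: 3 / 2^1800, never zero.
    let sum = p.add(&p).add(&p);
    println!("numerator {:?}, denominator with {} bits", sum.num, sum.den.bits());
}
```

Without gcd reduction, denominators of sums with unequal denominators grow quickly; a practical implementation would reduce fractions after each operation and use faster multiplication for large operands.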


4. Techniques, Files & Architecture
In Ebi, a command is the implementation of an algorithm. As summarised in Figure 1, a command
defines the inputs it needs, in terms of traits and object types, as well as its output type, in terms
of an object type it will produce. For instance, the command Ebi analyse most-likely-traces
extracts the most likely traces from its input. Its inputs are (i) something that implements the
FiniteStochasticLanguage or StochasticDeterministicSemantics trait, and (ii) an integer.
When the user issues a command, Ebi first establishes which traits and object types can serve as
input for that command. For each of these traits and object types, Ebi then searches for the file
handlers that can provide it. In our example, the file handlers that can provide a usable input are
those for a finite stochastic language (.slang), a compressed event log (.xes.gz), a stochastic
labelled Petri net (.slpn), a stochastic deterministic finite automaton (.sdfa) and an event log
(.xes). Ebi attempts to load the given file with each of these file handlers in turn, until one of
them parses it. Then, the command is executed. The command produces an output, that is, an
object of a particular object type. Finally, Ebi considers the file extension of the file to which the
user wishes to write the result, and checks whether a file handler with that extension can export
the output object type.

Figure 1: Architecture of Ebi. [Diagram: a file handler imports an object type; an object type
implements traits; traits and object types are input to a command; a command's output is an
object type, which is exported through a file handler.]


Table 1
Files supported by Ebi.
 file type                                   extension(s)        remarks
 event log                                   .xes, .xes.gz
 finite stochastic language                  .slang
 finite language                             .lang
 stochastic deterministic finite automaton   .sdfa
 stochastic labelled Petri net               .slpn
 labelled Petri net                          .lpn
 accepting Petri net                         .pnml               every deadlock is considered a final marking
 directly follows model                      .dfm
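The dispatch steps described in this section can be sketched in Rust as follows. Everything in this sketch is hypothetical and heavily simplified: traits are represented by name strings, only two toy file formats loosely modelled on .slang and .lang are supported (with an invented `trace;probability` line syntax), and only a most-likely-traces command is wired in. It illustrates the control flow, not Ebi's actual types, file formats or command-line syntax.

```rust
/// Hypothetical, heavily simplified objects that file handlers can produce.
#[derive(Debug)]
enum Object {
    /// A finite stochastic language: traces (space-separated) with probabilities.
    StochasticLanguage(Vec<(String, f64)>),
    /// A finite (non-stochastic) language.
    Language(Vec<String>),
}

/// Hypothetical file handler: an extension, the traits its objects provide,
/// an importer and an exporter (`None` if it cannot export the given object).
struct FileHandler {
    extension: &'static str,
    provides: &'static [&'static str],
    import: fn(&str) -> Result<Object, String>,
    export: fn(&Object) -> Option<String>,
}

/// Parses lines of the form `trace;probability` (an invented, slang-like format).
fn import_slang(text: &str) -> Result<Object, String> {
    let mut entries = Vec::new();
    for line in text.lines().filter(|l| !l.trim().is_empty()) {
        let (trace, prob) = line.split_once(';').ok_or("expected `trace;probability`")?;
        let p: f64 = prob.trim().parse().map_err(|e| format!("bad probability: {e}"))?;
        entries.push((trace.trim().to_string(), p));
    }
    Ok(Object::StochasticLanguage(entries))
}

/// Parses one trace per line (an invented, lang-like format).
fn import_lang(text: &str) -> Result<Object, String> {
    let traces: Vec<String> =
        text.lines().map(str::trim).filter(|l| !l.is_empty()).map(String::from).collect();
    if traces.iter().any(|t| t.contains(';')) {
        return Err("unexpected `;` in a plain language".to_string());
    }
    Ok(Object::Language(traces))
}

fn export_lang(object: &Object) -> Option<String> {
    match object {
        Object::Language(traces) => Some(traces.join("\n")),
        _ => None,
    }
}

fn handlers() -> Vec<FileHandler> {
    vec![
        // In this sketch, the .slang handler can import but not export.
        FileHandler { extension: ".slang", provides: &["FiniteStochasticLanguage"], import: import_slang, export: |_| None },
        FileHandler { extension: ".lang", provides: &["FiniteLanguage"], import: import_lang, export: export_lang },
    ]
}

/// The command: keep the `k` most likely traces of a finite stochastic language.
fn most_likely_traces(object: &Object, k: usize) -> Result<Object, String> {
    match object {
        Object::StochasticLanguage(entries) => {
            let mut sorted = entries.clone();
            sorted.sort_by(|a, b| b.1.total_cmp(&a.1)); // descending probability
            Ok(Object::Language(sorted.into_iter().take(k).map(|(t, _)| t).collect()))
        }
        _ => Err("expected a finite stochastic language".to_string()),
    }
}

/// Dispatch: try the handlers providing the required trait until one parses the input,
/// run the command, then export through a handler matching the output extension.
fn run(handlers: &[FileHandler], required_trait: &str, input: &str, output_extension: &str, k: usize) -> Result<String, String> {
    let object = handlers
        .iter()
        .filter(|h| h.provides.iter().any(|t| *t == required_trait))
        .find_map(|h| (h.import)(input).ok())
        .ok_or("no file handler could parse the input")?;
    let result = most_likely_traces(&object, k)?;
    handlers
        .iter()
        .filter(|h| h.extension == output_extension)
        .find_map(|h| (h.export)(&result))
        .ok_or_else(|| format!("no file handler can export the result as {output_extension}"))
}

fn main() {
    let input = "a b c;0.5\na c;0.3\na b b c;0.2";
    match run(&handlers(), "FiniteStochasticLanguage", input, ".lang", 2) {
        Ok(text) => println!("{text}"),
        Err(e) => eprintln!("error: {e}"),
    }
}
```

With the input in `main`, the two most likely traces, `a b c` and `a c`, are exported in the .lang-like format; requesting an extension for which no handler can export the result yields an error instead of a silently wrong file.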
   The main techniques for which Ebi currently provides implementations are listed in Table 2.
In total, Ebi currently has 30 commands. For a full overview of the commands and their inputs
and outputs, please refer to the manual, accessible through http://ebitools.org.


5. Maturity
Ebi has been used in the experiments described in several recent papers by the author team
[12, 15, 16]. In these experiments, Ebi proved to be much faster than the corresponding earlier
implementations in other frameworks. This is presumably partly due to implementing these
techniques a second time, though perhaps also due to the lower-level access to computing
resources that Rust provides, something also observed in the Rust4pm project [6]. Furthermore,
Ebi has proven to be much easier to integrate in a scientific experimental setting, due to its
command-line interface: a simple command-line script suffices to perform complex chains of
tasks, whereas in ProM or Pm4py, all file I/O, the storage of intermediate results and exception
handling need to be programmed. Finally, Ebi has been applied in several case studies with
industry partners, among others in thesis projects.

Table 2
Techniques currently implemented in Ebi.
 Technique                                                              Command              Paper
 Return all traces with their probabilities                             Ebi ana all
 Estimate the completeness of an event log                              Ebi ana comp         [8]
 Obtain the trace(s) that are on average the closest to all traces in   Ebi ana med
   an event log
 Get all traces that have a likelihood higher than a threshold          Ebi ana minprob
 Get the most likely traces                                             Ebi ana mostlikely
 Align a finite (non-stochastic) language to a model                    Ebi anans ali        [9]
   ↣ computes alignments for any model, not just Petri nets
 Cluster traces into most-dissimilar groups (non-stochastic)            Ebi anans clus
 Obtain the trace(s) that are on average the closest to all traces in   Ebi anans med
   an event log (non-stochastic)
 Association between process behaviour and trace attributes             Ebi asso atts        [10]
 Compute entropic relevance (uniform)                                   Ebi conf er          [11]
   ↣ computes log-model comparisons for any model, not just SDFAs
 Compute Jensen-Shannon stochastic conformance                          Ebi conf jssc        [12]
 Compute unit earth-movers’ stochastic conformance                      Ebi conf uemsc       [5]
 Convert stochastic languages and stochastic deterministic finite       Ebi conv …
   automata into other types
 Discover a stochastic model using alignments                           Ebi disc ali         [13]
 Discover a stochastic model using occurrences                          Ebi disc occ         [13]
 Discover a stochastic model by giving each transition a weight of 1    Ebi disc uni
 Print basic information on any file                                    Ebi info
 Compute the probability that a stochastic model will produce a         Ebi prob mod         [14]
   trace of a specified language
 Compute the probability of a trace in a model                          Ebi prob trac        [14]
 Compute the most likely path a trace followed in a model               Ebi prob exptra
 Sample a stochastic language                                           Ebi sam
 Test whether the sub-logs defined by a categorical trace attribute     Ebi tst lcat         [10]
   are derived from identical processes
 Test-parse a file                                                      Ebi vali …
 Visualise as graph                                                     Ebi vis svg

6. Conclusion
Ebi is a framework for stochastic process mining. It offers exact arithmetic computation and,
currently, over 30 algorithms related to stochastic process mining, ranging from completeness
estimation to analysis, conformance checking, discovery, visualisation and statistical tests. For
technique implementers, Ebi takes care of input and output file handling: techniques only need
to define their inputs and outputs, and Ebi performs the necessary transformations automatically.
Currently, Ebi is a command-line tool; in future work, we intend Ebi to serve as a high-speed
background library for existing process mining frameworks such as BupaR, ProM and Pm4py.
References
 [1] A. Berti, S. J. van Zelst, W. M. P. van der Aalst, Process mining for Python (PM4Py):
     Bridging the gap between process- and data science, CoRR abs/1905.06169 (2019).
 [2] G. Janssenswillen, B. Depaire, M. Swennen, M. Jans, K. Vanhoof, bupaR: Enabling
     reproducible business process analysis, Knowl. Based Syst. 163 (2019) 927–930.
 [3] B. F. van Dongen, et al., The ProM framework: A new era in process mining tool support,
     in: ICATPN, volume 3536 of LNCS, Springer, 2005, pp. 444–454.
 [4] B. van Dongen, Real-life event logs - hospital log, 2011.
 [5] S. J. J. Leemans, A. F. Syring, W. M. P. van der Aalst, Earth movers’ stochastic conformance
     checking, in: BPM Forum, volume 360 of LNBIP, Springer, 2019, pp. 127–143.
 [6] A. Küsters, W. M. P. van der Aalst, Rust4pm: A versatile process mining library for when
     performance matters, in: BPM Demos, CEUR-WS.org, 2024, to appear.
 [7] R. Pereira, M. Couto, F. Ribeiro, R. Rua, J. Cunha, J. P. Fernandes, J. Saraiva, Ranking
     programming languages by energy efficiency, Sci. Comput. Program. 205 (2021) 102609.
 [8] M. Kabierski, M. Richter, M. Weidlich, Addressing the log representativeness problem
     using species discovery, in: ICPM, IEEE, 2023, pp. 65–72.
 [9] A. Adriansyah, B. F. van Dongen, W. M. P. van der Aalst, Conformance checking using
     cost-based fitness analysis, in: EDOC, IEEE Computer Society, 2011, pp. 55–64.
[10] S. J. J. Leemans, J. M. McGree, A. Polyvyanyy, A. H. M. ter Hofstede, Statistical tests and
     association measures for business processes, IEEE TKDE 35 (2023) 7497–7511.
[11] H. Alkhammash, et al., Entropic relevance: A mechanism for measuring stochastic process
     models discovered from event data, Inf. Syst. 107 (2022) 101922.
[12] T. Li, S. J. J. Leemans, A. Polyvyanyy, The Jensen-Shannon distance metric for stochastic
     conformance checking, in: ICPM Workshops, LNBIP, 2024, to appear.
[13] A. Burke, S. J. J. Leemans, M. T. Wynn, Stochastic process discovery by weight estimation,
     in: ICPM Workshops, volume 406 of LNBIP, Springer, 2020, pp. 260–272.
[14] S. J. J. Leemans, F. M. Maggi, M. Montali, Enjoy the silence: Analysis of stochastic Petri
     nets with silent transitions, Inf. Syst. 124 (2024) 102383.
[15] S. J. J. Leemans, T. Li, M. Montali, A. Polyvyanyy, Stochastic process discovery: Can it be
     done optimally?, in: CAiSE, volume 14663 of LNCS, Springer, 2024, pp. 36–52.
[16] W. M. P. van der Aalst, S. J. J. Leemans, Learning generalized stochastic Petri nets from event
     data, in: Festschrift Joost-Pieter Katoen, LNCS, 2024, to appear.