LogPPL: A Tool for Probabilistic Process Mining⋆
                                Martin Kuhn1 , Joscha Grüger2 , Christoph Matheja3 and Andrey Rivkin3
                                1
                                  German Research Center for Artificial Intelligence (DFKI), SDS Branch Trier, Trier, Germany
                                2
                                  University of Trier, Germany
                                3
                                  Technical University of Denmark, Kgs. Lyngby, Denmark


                                             Abstract
                                             This paper introduces LogPPL, a novel tool designed to bridge the gap between Data Petri Nets (DPNs)
                                             and probabilistic programming, enabling the generation of event logs with statistical guarantees via
                                             probabilistic program executions. LogPPL implements the transformation of DPNs into probabilistic
                                             programs written in the WebPPL language, allowing to harness the power of simulation and inference
                                             engines supplied for the WebPPL environment. Our tool simplifies the configuration of the DPN
                                             simulation setup and allows for exporting both event logs in XES format as well as WebPPL files. LogPPL
                                             capabilities are demonstrated through various scenarios, showcasing its potential to enhance process
                                             mining tasks by offering rigorous statistical modeling and advanced simulation features. The tool’s
                                             design, features, and performance are evaluated, highlighting its utility in both academic and industrial
                                             settings.

                                             Keywords
                                             Process Mining, Data Petri Nets, Probabilistic Programming, Event Log Generation, WebPPL, Statistical
                                             Simulation, DPN Simulation


                                    Metadata description                               Value
                                    Tool name                                          LogPPL
                                    Current version                                    0.1
                                    Legal code license                                 GPL-3.0
                                    Languages, tools and services used                 Python, WebPPL, Docker
                                    Supported operating environment                    Microsoft Windows, GNU/Linux, Mac OS
                                    Source code repository                             https://github.com/martinkuhn94/LogPPL
                                    Screencast video                                   https://github.com/martinkuhn94/LogPPL/tree/main/examples/screencast


                                1. Introduction
                                The proliferation of data-driven approaches in Business Process Management (BPM) and Process
                                Mining (PM) has led to a significant reliance on data Petri nets (DPNs) – a Petri net-based
                                formalism which combines classical P/T-nets with guarded reasoning about bounded memory

                                ICPM 2024 Tool Demonstration Track, October 14-18, 2024, Kongens Lyngby, Denmark
                                ∗
                                    Corresponding author.
                                †
                                    These authors contributed equally.
                                Envelope-Open martin.kuhn@dfki.de (M. Kuhn); grueger@uni-trier.de (J. Grüger); chmat@dtu.dk (C. Matheja); ariv@dtu.dk
                                (A. Rivkin)
                                Orcid 0000-0002-3242-1251 (M. Kuhn); 0000-0001-7538-1248 (J. Grüger); 0000-0001-9151-0441 (C. Matheja);
                                0000-0001-8425-2309 (A. Rivkin)
                                           © 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).


CEUR
                  ceur-ws.org
Workshop      ISSN 1613-0073
Proceedings
(which is also referred to as process or case variables) [1, 2]. DPNs serve as a powerful formalism
for modeling and analyzing complex processes, supporting various tasks such as process
discovery[1, 3], conformance checking [1, 3, 4, 5], formal verification [6, 7], and strategy
synthesis [8]. Despite the strengths of DPNs, traditional simulation techniques used within this
context have often lacked rigorous statistical foundations, especially when addressing stochastic
processes [5, 9]. This gap underscores the need for more robust approaches that can incorporate
statistical guarantees in the simulation and analysis of DPNs.
   Probabilistic Programming [10, 11] (PP for short) offers a promising paradigm that could
address these limitations by providing a systematic way to model statistical guarantees and
perform inference in complex systems. The recent work “Data Petri Nets Meet Probabilistic
Programming” [12] explores this intersection by proposing a novel approach which translates
DPNs into probabilistic programs and uses the latter as net simulation engines. This approach
proposed in [12] leverages the inherent capabilities of PP languages, such as WebPPL [13], Stan
[14] and PyMC3 [15] to provide a statistically grounded simulation framework that integrates
seamlessly with existing DPN models, and that can be used in such tasks as trace generation.
   In this context, we introduce LogPPL – a comprehensive tool designed to bridge the gap
between DPNs and probabilistic programming that enables event log generation with statistical
guarantees. LogPPL implements the approach proposed in [12], offering an intuitive interface
for converting DPN models into WebPPL programs, configuring various simulation parameters
and analyzing the results. This tool attempts at simplifying the application of Probabilistic Pro-
gramming (PP) engines in the process mining context, making them accessible by a potentially
broader audience. Specifically, through LogPPL, users can harness the power of PP in such tasks
as (synthetic) event log generation and “what-if” analysis.
   This paper presents the design and functionality of LogPPL, demonstrating its utility in
supporting advanced simulation and analysis tasks in BPM and PM. The tool’s implementation
and its application in various scenarios is showcased, which highlights its potential to enhance
the rigor and robustness of DPN-based simulations.


2. Innovations and Features
In [12], an approach was introduced that enables the use of probabilistic programming for
simulation and reasoning about data-aware processes. The approach proposed a systematic
translation of a data Petri net into a program written in a probabilistic programming language
(which, in turn, is supported by most probabilistic programming systems). It was also demon-
strated how resulting programs can be used to generate multi-perspective event logs with
statistical guarantees.
   DPNs extend traditional P/T-nets with transition guards that can manipulate scalar case
variables. Each guard is split into a pre- and post-condition expressions, where the latter can
define variable updates using primed copies of case variables. To make such nets simulatable
and suitable for translation into PP programs, [12] proposes to equip the nets with schedulers for
resolving non-deterministic choices. In particular, schedulers assign probability distributions
to transitions and primed variables in guards, which in turn provide the desired statistical
guarantees for simulating DPNs.
   Setting up schedulers and translating them together with corresponding DPNs into “ready-
to-simulate” PP programs is a complicated, error-prone, and time-consuming task. Thus, we
propose LogPPL which fully automates the process of scheduler configuration for a given DPN,
provides various predefined property-driven simulation setups, and automatically translates
such configuration together with the related model into a PP program in WebPPL. The tool thus
enables the generation of event logs with aforementioned statistical guarantees by performing
internal calls to the MCMC (Markov Chain Monte Carlo) statistical inference engine of WebPPL.
   The core features of the tool are:
       • Loading and visualizing DPN: LogPPL initially loads the data Petri net (provided in
         PNML1 format) and visualizes it together with all its guards. This visualization particularly
         aids in configuring the scheduler in subsequent steps.
       • Configuration of simulation parameters: First, the tool offers predefined simulation
         setups, which essentially define properties that target the simulation process. One of
         such properties is the final marking reachability, which is often desirable in the context of
         event log generation. One may also set up custom properties that are defined on events
         produced by the simulated net. For example, one may define a property ensuring that
         𝑥 is at least 17.5 and 𝑡3 was fired at most twice (𝑂𝑏𝑠𝑒𝑟𝑣𝑒(𝑥 > 17.5 ∧ #𝑡3 ≤ 2)). Another
         tow simulation parameters are the sample size which determines the number of traces
         to be generated, and the simulation length size which controls the maximum number of
         transition firings per simulation run.
       • Configuration of the scheduler: For each prime variable and each transition that con-
         tains it, probability distributions needed to resolve non-determinism during simulation
         runs need to be defined. The tool offers 23 pre-defined distribution functions (Bernoulli,
         Beta, Binomial, Categorical, Cauchy, Delta, DiagCovGaussian, Dirichlet, Discrete, Expo-
         nential, Gamma, Gaussian, KDE, Laplace, LogisticNormal, Mixture, Multinomial, Multi-
         variateBernoulli, MultivariateGaussian, Poisson, RandomInteger, TensorGaussian, Ten-
         sorLaplace, Uniform) together with the distribution that can be manually configured. To
         assist with the process of configuring distributions and their parameters, the tool provides
         comprehensive assistance. An example configuration can be seen in Figure 1.
       • Translation and running simulation: After the configuration step, the input DPN is
         translated into a probabilistic program that, apart from the net, takes into account the
         configuration of simulation parameters. The translation follows the procedure described
         in [12]. The following simulation process upon initiation invokes internally the WebPPL
         MCMC inference engine.
       • Event log & WebPPL export: After the simulation, the event log, which conforms to
         the defined probability distributions, can be exported in the XES format. Furthermore,
         the WebPPL representation of the given Data Petri Net (DPN) can be downloaded.
   These features make LogPPL a powerful tool for various process mining tasks. Synthetic
event logs of size 𝑛 are generated by executing the underlying WebPPL program simulating a
given DPN for at least 𝑛 times. Additionally, the same WebPPL program can be used to study
(using tools offered by the WebPPL implementation) the distribution of process runs and explore
1
    https://www.pnml.org/
conditional probabilities, thus enhancing understanding of the dynamics within a given data
Petri net. Such capabilities facilitate advanced what-if analyses and the investigation of rare
events, demonstrating PP’s flexibility and depth in modeling complex stochastic systems.


Figure 1: Scheduler configuration options in the LogPPL tool include settings for guards with prime
variables. In this context, init defines the initialization of a simple auction process, where the time
variable t’ is set based on a uniform distribution. The bid events generate the price o’ following the
Gaussian distribution.


3. Tool Maturity
LogPPL is a fully functional, stand-alone tool that is ready to be used by researchers and
practitioners to convert DPNs into probabilistic programs and use the latter to generate event
logs with statistical guarantees. The tool is designed with ease of use and accessibility in mind,
ensuring that it can be adopted across a wide range of use cases and user expertise levels.
   LogPPL supports Windows and Linux-based operating systems, and is packaged for installa-
tion via Docker, which simplifies the setup process and reduces potential compatibility issues.
From a technical perspective, LogPPL is implemented as a web application utilizing Python and
the Flask framework as its backend. The source code is released under the GPL-3.0 license.
   The tool’s maturity is further demonstrated by its evaluation through a proof-of-concept,
which was conducted to assess both its correctness and efficiency using tool performance
indicators. To demonstrate its practical applicability, the approach was tested on two distinct
DPNs: the Road Fine DPN [1], which includes 9 places, 19 transitions, 11 guards, and 8 variables,
and the more complex Melanoma DPN [16], which consists of 50 places, 76 transitions, 52
guards, and 26 variables. In these evaluations, WebPPL’s MCMC inference engine was employed
with various parameter settings, providing insights into the tool’s performance in different
scenarios. The runtime across multiple simulation runs was measured, under different parameter
combinations. Specifically, parameters such as run lengths (determined by the simulation loop
parameter) ranging from 10 to 50 and the number of samples which ranges from 100 to 819,200
were tested, with a 180-second timeout for each run. Figure 2 displays the runtime over five
computation cycles, where the number of generated runs doubles at each step, leading to an
exponential increase in runtime. The resulting runtimes are within a reasonable timeframe for
applications in real life scenarios, but still indicating potential for optimizations.
Figure 2: Evaluation results for Melanoma and Road Fine, highlighting the runtime (in seconds) in
relation to the number of generated runs (x-axis) [12]


4. Conclusion and Future Work
In this paper, we introduce LogPPL – a novel tool that bridges the gap between DPNs and
probabilistic programming, enabling the generation of event logs with statistical guarantees. By
leveraging the power of the WebPPL PP language, LogPPL facilitates the translation of DPNs into
probabilistic programs, allowing researchers and practitioners to simulate and analyze complex
processes with respect to statistical guarantees on simulated net runs. The tool’s capabilities
are particularly valuable for fields such as Business Process Management and Process Mining,
where accurate modeling and simulation of process behaviors are essential for decision-making
and process optimization. The tool’s design prioritizes usability and accessibility, providing a
user-friendly interface to facilitate adoption by a broad audience.
   While LogPPL represents a significant advancement in the simulation and analysis of DPNs,
there are several possibilities for future work. One promising direction is the extension of
LogPPL to support a wider range of process modeling languages like BPMN. This could serve
an even broader audience and provide more versatility in the types of process models that can
be simulated and analyzed.
   Another potential enhancement involves improving the tool’s user interface to include
advanced statistical visualizations of the simulation results. This could involve integrating
dashboards or interactive charts that allow users to explore the distribution of process attributes
across generated event logs. Such features can improve the interpretability of the simulation
outcomes and also provide users with more insights into the dynamics of the simulated processes.
   Additionally, future work could focus on automating the configuration of probabilistic models
based on existing event log data. Developing methods that can automatically infer appropriate
probabilistic models and parameters from historical data could significantly reduce the manual
effort required to set up simulation parameters and increase the accuracy of the results. This
automation would make LogPPL even more accessible to users who may not have a deep
understanding of probabilistic programming, but still wish to leverage its capabilities.


References
 [1] F. Mannhardt, M. de Leoni, H. A. Reijers, W. M. P. van der Aalst, Balanced multi-perspective
     checking of process conformance, Computing 98 (2016).
 [2] M. de Leoni, P. Felli, M. Montali, A holistic approach for soundness verification of decision-
     aware process models, in: ER, volume 11157 of LNCS, Springer, 2018, pp. 219–235.
 [3] F. Mannhardt, Multi-perspective Process Mining, Ph.D. thesis, TU/e, 2018.
 [4] P. Felli, A. Gianola, M. Montali, A. Rivkin, S. Winkler, Cocomot: Conformance checking
     of multi-perspective processes via SMT, in: A. Polyvyanyy, M. T. Wynn, A. V. Looy,
     M. Reichert (Eds.), Proc. of BPM 2021, volume 12875 of LNCS, Springer, 2021, pp. 217–234.
 [5] P. Felli, A. Gianola, M. Montali, A. Rivkin, S. Winkler, Conformance checking with
     uncertainty via SMT, in: Proc. of BPM 2022, LNCS, Springer, 2022.
 [6] P. Felli, M. Montali, S. Winkler, Soundness of data-aware processes with arithmetic
     conditions, in: X. Franch, G. Poels, F. Gailly, M. Snoeck (Eds.), Proc. of CAiSE 2022, volume
     13295 of LNCS, Springer, 2022, pp. 389–406.
 [7] P. Felli, M. Montali, S. Winkler, Ctl* model checking for data-aware dynamic systems with
     arithmetic, in: J. Blanchette, L. Kovács, D. Pattinson (Eds.), Proc. of IJCAR 2022, volume
     13385 of LNCS, Springer, 2022, pp. 36–56.
 [8] M. de Leoni, P. Felli, M. Montali, Strategy synthesis for data-aware dynamic systems with
     multiple actors, in: KR, 2020, pp. 315–325.
 [9] F. Mannhardt, S. J. J. Leemans, C. T. Schwanen, M. de Leoni, Modelling data-aware
     stochastic processes - discovery and conformance checking, in: L. Gomes, R. Lorenz (Eds.),
     Proc. of PETRI NETS 2023, LNCS, Springer, 2023.
[10] J.-W. van de Meent, B. Paige, H. Yang, F. Wood, An introduction to probabilistic program-
     ming, arXiv preprint arXiv:1809.10756 (2018).
[11] S. Russell, P. Norvig, Artificial Intelligence, Global Edition A Modern Approach, Pearson
     Deutschland, 2021.
[12] M. Kuhn, J. Grüger, C. Matheja, A. Rivkin, Data petri nets meet probabilistic programming,
     in: A. Marrella, M. Resinas, M. Jans, M. Rosemann (Eds.), Proc. of BPM 2024, volume 14940
     of LNCS, Springer, 2024.
[13] N. D. Goodman, A. Stuhlmüller, The Design and Implementation of Probabilistic Program-
     ming Languages, http://dippl.org, 2014. Accessed: 2024-3-8.
[14] B. Carpenter, A. Gelman, M. D. Hoffman, D. Lee, B. Goodrich, M. Betancourt, M. A.
     Brubaker, J. Guo, P. Li, A. Riddell, Stan: A probabilistic programming language, Journal of
     statistical software 76 (2017).
[15] J. Salvatier, T. V. Wiecki, C. Fonnesbeck, Probabilistic programming in python using
     pymc3, PeerJ Computer Science 2 (2016) e55.
[16] J. Grüger, T. Geyer, M. Kuhn, S. Braun, R. Bergmann, Verifying guideline compliance
     in clinical treatment using multi-perspective conformance checking: A case study, in:
     Process Mining Workshops, Springer, 2022.