=Paper= {{Paper |id=Vol-3098/dc_208 |storemode=property |title=Stochastic Process Mining (Extended Abstract) |pdfUrl=https://ceur-ws.org/Vol-3098/dc_208.pdf |volume=Vol-3098 |authors=Adam Burke |dblpUrl=https://dblp.org/rec/conf/icpm/Burke21 }} ==Stochastic Process Mining (Extended Abstract)== https://ceur-ws.org/Vol-3098/dc_208.pdf
      Stochastic Process Mining (Extended Abstract)
                                                         Adam Burke
                                             Queensland University of Technology
                                                     at.burke@qut.edu.au


   There are many information systems in the world, used by           A stochastic process discovery framework was developed
many organizations to build things and help people. In process     in the investigation of this question [3]. In it, a control flow
mining [1], models of organizations at work are automatically      discovery algorithm is first used to discover a Petri net [1,
constructed and analyzed. By describing large amounts of data      p60], and the result is combined with a weight estimation
quickly, process mining accelerates understanding of what an       step to produce a GSPN [2]. Six estimators fitting the frame-
organizations does, and how it may improve. For example,           work were implemented, and evaluated experimentally against
a medical doctor may note which common hospital intake             stochastic conformance measures, using established discovery
tasks are bottlenecks, or an auditor may see evidence that         algorithms and real-life public event logs1 . The framework is a
certain regulatory checks are carried out as expected. These       generalization of existing stochastic discovery techniques that
processes may be explicitly defined by the organization, as        compose control-flow discovery with a pipelined stochastic
in an insurance claim process, or implicit, as in a hospital       estimation step [5], [6]. It is also a specialization, in that
emergency room.                                                    it produces GSPNs with immediate transitions, rather than
   The successes of process mining to date have largely been       GDT SPNs [5].
with control-flow models. In a control-flow model, causality is       Stochastic quality measures of Entropy Recall and Preci-
represented, but probability is not. Stochastic process models     sion [7] and Earth-Movers’ Distance [8] were used for the
are potentially more powerful tools for some types of organiza-    evaluation, together with real-life logs from BPI challenges
tional analysis and optimization where frequency and variation     and a variety of discovery algorithms. Estimation techniques
are key, because one way of understanding organizations is to      found were of comparable quality, were applicable to a broader
look at the patterns of what they repeatedly do, and what they     range of event logs, and were generally faster than GDT SPN
treat as exceptional. Stochastic models may also be used in        discovery [5].
prediction and performance. This project then investigates:           RQ1.2 How may stochastic models be discovered directly
   How can processes in organizations be understood using          from event logs?
stochastic models mined from organizational data?                     Techniques based on a computing pipeline of control-flow
   The relatively small body of existing work on this topic        model discovery followed by some inference of stochastic
is reviewed with research sub-questions and methods in Sec-        data have some inherent design limitations. The control-flow-
tions I-III. This project extends and applies this existing work   only nature of the initially discovered model may introduce a
while contributing novel techniques and analysis for the con-      representational bias toward the structures of that output for-
struction and use of these models on real-world event data. The    malism. The multiple passes through the log is also awkward,
process models notations used are Generalized Stochastic Petri     and perhaps inefficient. Accordingly, this project introduced
Nets (GSPNs) [2] and closely related extensions. Section I         the novel Toothpaste miner framework, a set of direct dis-
summarizes already conducted research into stochastic process      covery algorithms where control-flow and stochastic aspects
discovery, while Sections II and III relate to ongoing and         are discovered in concert, through a process of reduction and
planned research into quality dimensions and concept drift.        abstraction, in polynomial time [4]. To do this, it introduces
                                                                   an intermediate abstraction targeted at stochastic process dis-
      I. D ISCOVERING S TOCHASTIC M ODELS (RQ1)                    covery, the Probabilistic Process Tree (PPT). The algorithms
                                                                   start with a trace model and reduce it to a target model using
   RQ1 How may stochastic process models be discovered             formally defined rules, “squeezing” trace information into a
automatically?                                                     usable form. A prototype was implemented2 and evaluated
   This project work investigated composition of existing          empirically against existing techniques with promising results.
control-flow discovery techniques with new weight estima-             A PPT is an extension of Process Trees that includes
tion [3] and direct stochastic process discovery [4] techniques.   relative probabilities in the form of node weights, and is
Direct Process Discovery algorithms output stochastic mod-         related formally to GSPNs. Toothpaste miner is inspired by
els without an intermediate control-flow discovery step. The       region-based miners for control-flow models [9], [10] and the
techniques for generating stochastic models are not themselves     A LERGIA [11] algorithm.
necessarily stochastic: they may also include analytic methods.
   RQ1.1 How may control-flow discovery techniques be lever-         1 Java code and data at https://github.com/adamburkegh/spd we

aged for stochastic process model discovery?                         2 Haskell code and data at https://github.com/adamburkegh/toothpaste



 Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0
 International (CC BY 4.0).
             II. Q UALITY D IMENSIONS (RQ2)                         Building on this foundation, the evolution of organizational
   RQ2 What quality dimensions are empirically observable in        processes can itself be analysed computationally, yielding
stochastic process models and logs?                                 models of process change. This long-run process drift can be
   The quality of process models is often quantified with           described with second-order process models [14].
conformance measures, and those measures related to four               This prospective research concerns novel algorithms, tech-
standard control-flow quality dimensions [1, p188]. Recent          niques and newly developed software for describing and un-
research into stochastic process measures suggests both con-        derstanding long-run process drift. The underlying formalisms
nections and challenges to these quality dimensions. For            are expected to be based on GSPNs and Reconfigurable Petri
example, entropy measures have been related to both precision       nets [16]. Models of long-run process drift will be constructed
and recall (fitness) [7], but Earth-Movers Distance does not        using a combination of stochastic process discovery and
obviously align with an existing dimension [12]. Challenges         adapted concept drift detection techniques. The techniques
range from needing to translate the technical definition of e.g.,   will be evaluated experimentally against real-world event logs
fitness, to a stochastic context, to differences in how adjacent    and stochastic quality measures. Initial exploration for suitable
fields theorize the role of simplicity and generalization.          event log data is underway. Automatic techniques can make it
   At least one empirical and quantitative study compares pro-      easier to understand the history of change in a process, what
cess measures and dimensions for control-flow models [13].          is routine, and what is exceptional, and thereby make better
Factor analysis showed Fitness and Precision as observable,         organizations over time.
orthogonal dimensions. Simplicity was excluded, and there                                         R EFERENCES
was insufficient evidence to support a Generalization dimen-
                                                                     [1] W. van der Aalst, Process Mining: Data Science in Action, 2nd ed.
sion. For stochastic process models, this suggests experimental          Berlin Heidelberg: Springer-Verlag, 2016.
investigation of how models may be distinguished can inform          [2] F. Bause and P. Kritzinger, Stochastic Petri Nets: An Introduction to the
and foster new formalized measures, and understand relations             Theory. Vieweg+Teubner Verlag, 2002.
                                                                     [3] A. Burke, S. J. J. Leemans, and M. T. Wynn, “Stochastic Process
between them. Process mining data is ultimately derived from             Discovery by Weight Estimation,” in Process Mining Workshops. Cham:
real-world social activity, and that will narrow the vast space          Springer, 2021, pp. 260–272.
of candidate formalisms.                                             [4] ——, “Discovering Stochastic Process Models By Reduction and Ab-
                                                                         straction,” in Application and Theory of Petri Nets and Concurrency,
   This research, which is ongoing, investigates empirically             ser. Lecture Notes in Computer Science. Springer, 2021, pp. 312–336.
identifiable orthogonal quality dimensions for stochastic pro-       [5] A. Rogge-Solti, W. M. P. van der Aalst, and M. Weske, “Discovering
cess models. The experiment design uses a dataset of stochas-            Stochastic Petri Nets with Arbitrary Delay Distributions from Event
                                                                         Logs,” in BPM Workshops, ser. LNBP. Springer, 2014, pp. 15–27.
tic process models for real-life processes, collected and evalu-     [6] M. Camargo, M. Dumas, and O. Gonzlez-Rojas, “Automated discovery
ated against a set of computationally cheap measures, termed             of business process simulation models from event logs,” Decision
exploration measures. Exploration measures are based on ex-              Support Systems, vol. 134, p. 113284, Jul. 2020.
                                                                     [7] S. J. J. Leemans and A. Polyvyanyy, “Stochastic-Aware Conformance
isting control-flow and stochastic model measures and around             Checking: An Entropy-Based Approach,” in Advanced Information
fifteen measures are being considered. Existing stochastic               Systems Engineering, ser. LNCS. Springer, 2020, pp. 217–233.
conformance measures [7], [8] are considered evaluation mea-         [8] S. J. J. Leemans, W. M. P. van der Aalst, T. Brockhoff, and
                                                                         A. Polyvyanyy, “Stochastic process mining: Earth movers stochastic
sures. The dataset for the experiment comprises thousands                conformance,” Information Systems, p. 101724, Feb. 2021.
of stochastic models, including randomly generated models            [9] J. Carmona, J. Cortadella, and M. Kishinevsky, “New Region-Based
and those from existing discovery algorithms. Evaluation mea-            Algorithms for Deriving Bounded Petri Nets,” IEEE Transactions on
                                                                         Computers, vol. 59, no. 3, pp. 371–384, Mar. 2010, conference Name:
sures will be used for a subset of discovered models where               IEEE Transactions on Computers.
conformance measure tools were expected to terminate. The           [10] V. Liesaputra, S. Yongchareon, and S. Chaisiri, “Efficient Process
subset excludes random models and discovered models which                Model Discovery Using Maximal Pattern Mining,” in BPM, ser. LNCS.
                                                                         Springer, 2015, pp. 441–456.
perform poorly using exploration measures. Undermeasured            [11] R. C. Carrasco and J. Oncina, “Learning stochastic regular grammars
phenomena and possible quality dimensions will be proposed               by means of a state merging method,” in Grammatical Inference and
based on a statistical analysis of the results.                          Applications, ser. LNCS. Berlin: Springer, 1994, pp. 139–152.
                                                                    [12] S. J. Leemans, A. F. Syring, and W. M. van der Aalst, “Earth movers
          III. L ONG -RUN P ROCESS D RIFT (RQ3)                          stochastic conformance checking,” in International Conference on Busi-
                                                                         ness Process Management. Springer, 2019, pp. 127–143.
   RQ3 How can we precisely describe the history of change          [13] G. Janssenswillen, N. Donders, T. Jouck, and B. Depaire, “A com-
in an organizations processes?                                           parative study of existing quality measures for process discovery,”
                                                                         Information Systems, vol. 71, pp. 1–15, Nov. 2017.
   A model describes a system. When the system changes, and         [14] R. P. J. C. Bose, W. M. P. van der Aalst, I. liobait, and M. Pechenizkiy,
an automatically discovered model needs to detect that, this             “Dealing With Concept Drifts in Process Mining,” IEEE Transactions
problem is called concept drift [1, p320]. In process mining,            on Neural Networks and Learning Systems, vol. 25, no. 1, pp. 154–171,
                                                                         Jan. 2014.
the computational discovery and analysis of organizational          [15] T. Brockhoff, M. S. Uysal, and W. M. P. v. d. Aalst, “Time-aware
process models, this becomes process drift [14]. Recent work             Concept Drift Detection Using the Earth Movers Distance,” in ICPM,
on stochastic modelling for concept drift [15] suggests stochas-         Oct. 2020, pp. 33–40.
                                                                    [16] M. Llorens and J. Oliver, “Structural and dynamic changes in concurrent
tic models can be a productively describe these phenomena.               systems: reconfigurable Petri nets,” IEEE Transactions on Computers,
   The significant existing literature on concept drift in pro-          vol. 53, no. 9, pp. 1147–1158, Sep. 2004.
cesses is focused on moments of drift, or anomaly detection.

 Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0
 International (CC BY 4.0).