<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Stochastic Process Mining (Extended Abstract)</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Adam Burke Queensland University of Technology</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2021</year>
      </pub-date>
      <abstract>
        <p>There are many information systems in the world, used by many organizations to build things and help people. In process mining [1], models of organizations at work are automatically constructed and analyzed. By describing large amounts of data quickly, process mining accelerates understanding of what an organizations does, and how it may improve. For example, a medical doctor may note which common hospital intake tasks are bottlenecks, or an auditor may see evidence that certain regulatory checks are carried out as expected. These processes may be explicitly defined by the organization, as in an insurance claim process, or implicit, as in a hospital emergency room. The successes of process mining to date have largely been with control-flow models. In a control-flow model, causality is represented, but probability is not. Stochastic process models are potentially more powerful tools for some types of organizational analysis and optimization where frequency and variation are key, because one way of understanding organizations is to look at the patterns of what they repeatedly do, and what they treat as exceptional. Stochastic models may also be used in prediction and performance. This project then investigates: How can processes in organizations be understood using stochastic models mined from organizational data? The relatively small body of existing work on this topic is reviewed with research sub-questions and methods in Sections I-III. This project extends and applies this existing work while contributing novel techniques and analysis for the construction and use of these models on real-world event data. The process models notations used are Generalized Stochastic Petri Nets (GSPNs) [2] and closely related extensions. Section I summarizes already conducted research into stochastic process discovery, while Sections II and III relate to ongoing and planned research into quality dimensions and concept drift.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>I. DISCOVERING STOCHASTIC MODELS (RQ1)</title>
      <p>RQ1 How may stochastic process models be discovered
automatically?</p>
      <p>
        This project work investigated composition of existing
control-flow discovery techniques with new weight
estimation [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] and direct stochastic process discovery [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] techniques.
Direct Process Discovery algorithms output stochastic
models without an intermediate control-flow discovery step. The
techniques for generating stochastic models are not themselves
necessarily stochastic: they may also include analytic methods.
      </p>
      <p>RQ1.1 How may control-flow discovery techniques be
leveraged for stochastic process model discovery?</p>
      <p>
        A stochastic process discovery framework was developed
in the investigation of this question [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. In it, a control flow
discovery algorithm is first used to discover a Petri net [1,
p60], and the result is combined with a weight estimation
step to produce a GSPN [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Six estimators fitting the
framework were implemented, and evaluated experimentally against
stochastic conformance measures, using established discovery
algorithms and real-life public event logs1. The framework is a
generalization of existing stochastic discovery techniques that
compose control-flow discovery with a pipelined stochastic
estimation step [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. It is also a specialization, in that
it produces GSPNs with immediate transitions, rather than
GDT SPNs [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
      <p>
        Stochastic quality measures of Entropy Recall and
Precision [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] and Earth-Movers’ Distance [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] were used for the
evaluation, together with real-life logs from BPI challenges
and a variety of discovery algorithms. Estimation techniques
found were of comparable quality, were applicable to a broader
range of event logs, and were generally faster than GDT SPN
discovery [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
      <p>RQ1.2 How may stochastic models be discovered directly
from event logs?</p>
      <p>
        Techniques based on a computing pipeline of control-flow
model discovery followed by some inference of stochastic
data have some inherent design limitations. The
control-flowonly nature of the initially discovered model may introduce a
representational bias toward the structures of that output
formalism. The multiple passes through the log is also awkward,
and perhaps inefficient. Accordingly, this project introduced
the novel Toothpaste miner framework, a set of direct
discovery algorithms where control-flow and stochastic aspects
are discovered in concert, through a process of reduction and
abstraction, in polynomial time [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. To do this, it introduces
an intermediate abstraction targeted at stochastic process
discovery, the Probabilistic Process Tree (PPT). The algorithms
start with a trace model and reduce it to a target model using
formally defined rules, “squeezing” trace information into a
usable form. A prototype was implemented2 and evaluated
empirically against existing techniques with promising results.
      </p>
      <p>
        A PPT is an extension of Process Trees that includes
relative probabilities in the form of node weights, and is
related formally to GSPNs. Toothpaste miner is inspired by
region-based miners for control-flow models [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] and the
ALERGIA [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] algorithm.
      </p>
      <p>1Java code and data at https://github.com/adamburkegh/spd we
2Haskell code and data at https://github.com/adamburkegh/toothpaste</p>
    </sec>
    <sec id="sec-2">
      <title>II. QUALITY DIMENSIONS (RQ2)</title>
      <p>RQ2 What quality dimensions are empirically observable in
stochastic process models and logs?</p>
      <p>
        The quality of process models is often quantified with
conformance measures, and those measures related to four
standard control-flow quality dimensions [1, p188]. Recent
research into stochastic process measures suggests both
connections and challenges to these quality dimensions. For
example, entropy measures have been related to both precision
and recall (fitness) [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], but Earth-Movers Distance does not
obviously align with an existing dimension [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. Challenges
range from needing to translate the technical definition of e.g.,
fitness, to a stochastic context, to differences in how adjacent
fields theorize the role of simplicity and generalization.
      </p>
      <p>
        At least one empirical and quantitative study compares
process measures and dimensions for control-flow models [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ].
Factor analysis showed Fitness and Precision as observable,
orthogonal dimensions. Simplicity was excluded, and there
was insufficient evidence to support a Generalization
dimension. For stochastic process models, this suggests experimental
investigation of how models may be distinguished can inform
and foster new formalized measures, and understand relations
between them. Process mining data is ultimately derived from
real-world social activity, and that will narrow the vast space
of candidate formalisms.
      </p>
      <p>
        This research, which is ongoing, investigates empirically
identifiable orthogonal quality dimensions for stochastic
process models. The experiment design uses a dataset of
stochastic process models for real-life processes, collected and
evaluated against a set of computationally cheap measures, termed
exploration measures. Exploration measures are based on
existing control-flow and stochastic model measures and around
fifteen measures are being considered. Existing stochastic
conformance measures [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] are considered evaluation
measures. The dataset for the experiment comprises thousands
of stochastic models, including randomly generated models
and those from existing discovery algorithms. Evaluation
measures will be used for a subset of discovered models where
conformance measure tools were expected to terminate. The
subset excludes random models and discovered models which
perform poorly using exploration measures. Undermeasured
phenomena and possible quality dimensions will be proposed
based on a statistical analysis of the results.
      </p>
    </sec>
    <sec id="sec-3">
      <title>III. LONG-RUN PROCESS DRIFT (RQ3)</title>
      <p>RQ3 How can we precisely describe the history of change
in an organizations processes?</p>
      <p>
        A model describes a system. When the system changes, and
an automatically discovered model needs to detect that, this
problem is called concept drift [1, p320]. In process mining,
the computational discovery and analysis of organizational
process models, this becomes process drift [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. Recent work
on stochastic modelling for concept drift [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] suggests
stochastic models can be a productively describe these phenomena.
      </p>
      <p>
        The significant existing literature on concept drift in
processes is focused on moments of drift, or anomaly detection.
Building on this foundation, the evolution of organizational
processes can itself be analysed computationally, yielding
models of process change. This long-run process drift can be
described with second-order process models [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ].
      </p>
      <p>
        This prospective research concerns novel algorithms,
techniques and newly developed software for describing and
understanding long-run process drift. The underlying formalisms
are expected to be based on GSPNs and Reconfigurable Petri
nets [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]. Models of long-run process drift will be constructed
using a combination of stochastic process discovery and
adapted concept drift detection techniques. The techniques
will be evaluated experimentally against real-world event logs
and stochastic quality measures. Initial exploration for suitable
event log data is underway. Automatic techniques can make it
easier to understand the history of change in a process, what
is routine, and what is exceptional, and thereby make better
organizations over time.
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>W. van der Aalst</surname>
          </string-name>
          , Process Mining: Data Science in Action, 2nd ed. Berlin Heidelberg: Springer-Verlag,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>F.</given-names>
            <surname>Bause</surname>
          </string-name>
          and
          <string-name>
            <given-names>P.</given-names>
            <surname>Kritzinger</surname>
          </string-name>
          , Stochastic Petri Nets:
          <article-title>An Introduction to the Theory</article-title>
          . Vieweg+Teubner Verlag,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>A.</given-names>
            <surname>Burke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. J. J.</given-names>
            <surname>Leemans</surname>
          </string-name>
          , and M. T. Wynn, “
          <article-title>Stochastic Process Discovery by Weight Estimation,” in Process Mining Workshops</article-title>
          . Cham: Springer,
          <year>2021</year>
          , pp.
          <fpage>260</fpage>
          -
          <lpage>272</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4] --, “
          <article-title>Discovering Stochastic Process Models By Reduction and Abstraction,” in Application and Theory of Petri Nets and Concurrency, ser</article-title>
          .
          <source>Lecture Notes in Computer Science</source>
          . Springer,
          <year>2021</year>
          , pp.
          <fpage>312</fpage>
          -
          <lpage>336</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>A.</given-names>
            <surname>Rogge-Solti</surname>
          </string-name>
          ,
          <string-name>
            <surname>W. M. P. van der Aalst</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Weske</surname>
          </string-name>
          , “
          <article-title>Discovering Stochastic Petri Nets with Arbitrary Delay Distributions from Event Logs,” in BPM Workshops, ser</article-title>
          .
          <source>LNBP</source>
          . Springer,
          <year>2014</year>
          , pp.
          <fpage>15</fpage>
          -
          <lpage>27</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>M.</given-names>
            <surname>Camargo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Dumas</surname>
          </string-name>
          , and
          <string-name>
            <given-names>O.</given-names>
            <surname>Gonzlez-Rojas</surname>
          </string-name>
          , “
          <article-title>Automated discovery of business process simulation models from event logs,” Decision Support Systems</article-title>
          , vol.
          <volume>134</volume>
          , p.
          <fpage>113284</fpage>
          ,
          <string-name>
            <surname>Jul</surname>
          </string-name>
          .
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>S. J. J.</given-names>
            <surname>Leemans</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Polyvyanyy</surname>
          </string-name>
          , “
          <article-title>Stochastic-Aware Conformance Checking: An Entropy-Based Approach,” in Advanced Information Systems Engineering, ser</article-title>
          .
          <source>LNCS</source>
          . Springer,
          <year>2020</year>
          , pp.
          <fpage>217</fpage>
          -
          <lpage>233</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>S. J. J.</given-names>
            <surname>Leemans</surname>
          </string-name>
          ,
          <string-name>
            <surname>W. M. P. van der Aalst</surname>
          </string-name>
          , T. Brockhoff,
          <article-title>and</article-title>
          <string-name>
            <given-names>A.</given-names>
            <surname>Polyvyanyy</surname>
          </string-name>
          , “
          <article-title>Stochastic process mining: Earth movers stochastic conformance</article-title>
          ,
          <source>” Information Systems</source>
          , p.
          <fpage>101724</fpage>
          ,
          <string-name>
            <surname>Feb</surname>
          </string-name>
          .
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>J.</given-names>
            <surname>Carmona</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Cortadella</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Kishinevsky</surname>
          </string-name>
          , “
          <article-title>New Region-Based Algorithms for Deriving Bounded Petri Nets,”</article-title>
          <source>IEEE Transactions on Computers</source>
          , vol.
          <volume>59</volume>
          , no.
          <issue>3</issue>
          , pp.
          <fpage>371</fpage>
          -
          <lpage>384</lpage>
          , Mar.
          <year>2010</year>
          , conference Name: IEEE Transactions on Computers.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>V.</given-names>
            <surname>Liesaputra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Yongchareon</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Chaisiri</surname>
          </string-name>
          , “
          <article-title>Efficient Process Model Discovery Using Maximal Pattern Mining,” in BPM, ser</article-title>
          .
          <source>LNCS</source>
          . Springer,
          <year>2015</year>
          , pp.
          <fpage>441</fpage>
          -
          <lpage>456</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>R. C.</given-names>
            <surname>Carrasco</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Oncina</surname>
          </string-name>
          , “
          <article-title>Learning stochastic regular grammars by means of a state merging method,” in Grammatical Inference and Applications, ser</article-title>
          .
          <source>LNCS</source>
          . Berlin: Springer,
          <year>1994</year>
          , pp.
          <fpage>139</fpage>
          -
          <lpage>152</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>S. J.</given-names>
            <surname>Leemans</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. F.</given-names>
            <surname>Syring</surname>
          </string-name>
          , and W. M. van der Aalst, “
          <article-title>Earth movers stochastic conformance checking</article-title>
          ,” in International Conference on Business Process Management. Springer,
          <year>2019</year>
          , pp.
          <fpage>127</fpage>
          -
          <lpage>143</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>G.</given-names>
            <surname>Janssenswillen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Donders</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Jouck</surname>
          </string-name>
          , and
          <string-name>
            <given-names>B.</given-names>
            <surname>Depaire</surname>
          </string-name>
          , “
          <article-title>A comparative study of existing quality measures for process discovery</article-title>
          ,
          <source>” Information Systems</source>
          , vol.
          <volume>71</volume>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>15</lpage>
          , Nov.
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>R. P. J. C.</given-names>
            <surname>Bose</surname>
          </string-name>
          ,
          <string-name>
            <surname>W. M. P. van der Aalst</surname>
          </string-name>
          , I. liobait, and M. Pechenizkiy, “
          <article-title>Dealing With Concept Drifts in Process Mining,”</article-title>
          <source>IEEE Transactions on Neural Networks and Learning Systems</source>
          , vol.
          <volume>25</volume>
          , no.
          <issue>1</issue>
          , pp.
          <fpage>154</fpage>
          -
          <lpage>171</lpage>
          , Jan.
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>T.</given-names>
            <surname>Brockhoff</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. S.</given-names>
            <surname>Uysal</surname>
          </string-name>
          , and
          <string-name>
            <surname>W. M. P.</surname>
          </string-name>
          v. d. Aalst, “
          <article-title>Time-aware Concept Drift Detection Using the Earth Movers Distance</article-title>
          ,” in ICPM, Oct.
          <year>2020</year>
          , pp.
          <fpage>33</fpage>
          -
          <lpage>40</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>M.</given-names>
            <surname>Llorens</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Oliver</surname>
          </string-name>
          , “
          <article-title>Structural and dynamic changes in concurrent systems: reconfigurable Petri nets</article-title>
          ,
          <source>” IEEE Transactions on Computers</source>
          , vol.
          <volume>53</volume>
          , no.
          <issue>9</issue>
          , pp.
          <fpage>1147</fpage>
          -
          <lpage>1158</lpage>
          , Sep.
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>