<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Online Comparison of Streaming Process Discovery Algorithms</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Kavya Baskar</string-name>
          <email>k.baskar@student.tue.nl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marwan Hassani</string-name>
          <email>m.hassani@tue.nl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Mathematics and Computer Science, Eindhoven University of Technology</institution>
          ,
          <addr-line>Eindhoven</addr-line>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In the active field of process mining, several techniques have been proposed in areas such as process discovery and conformance checking. The integration of data stream mining techniques into process mining has gained popularity in recent years, and the ProM framework has recently been extended to support process mining on event streams. In this paper, we present a new extension that builds upon existing work on obtaining process models from data streams within ProM. The extension enables researchers to visually compare the results of two different process discovery algorithms on a single incoming stream of events, each combined with a different algorithm for handling the data stream, such as Lossy Counting with Budget, Sliding Window or Exponential Decay.</p>
      </abstract>
      <kwd-group>
        <kwd>Process mining</kwd>
        <kwd>Event streams</kwd>
        <kwd>ProM</kwd>
        <kwd>Visualization</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        We assume that the reader is acquainted with the field of process mining and
refer to [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] for a detailed introduction to the field. Process mining in the
streaming domain has become a subject of interest among researchers in recent
years. To enable the application of process mining on event streams, several
techniques have been proposed for data storage, process discovery, conformance
checking and event stream-based process enhancement. One such development
approximates abstract representations using algorithms that are designed for
frequent item mining on data streams [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. A prototypical implementation of this
development is provided in the process mining toolkit ProM.
      </p>
      <p>This architecture generalizes and standardizes existing event stream-based
process discovery algorithms and defines a computational mechanism that is
applicable to a large class of process discovery algorithms. Owing to this
generalization, further event stream-based process discovery algorithms can be
included in the framework in the future.</p>
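      <p>To make the computational mechanism of [<xref ref-type="bibr" rid="ref8">8</xref>] more concrete, the following Java code is a minimal sketch under our own assumptions, not the ProM implementation. Each event (case identifier, activity) updates a per-case record of the last activity seen; every resulting directly-follows pair is counted with Lossy Counting, a classic frequent item mining algorithm; and the approximate counts form the abstraction that a miner would consume. The class names LossyCounter, DirectlyFollowsSketch and DirectlyFollowsDemo are hypothetical, and the per-case map is left unbounded for brevity.</p>
      <code language="java"><![CDATA[
```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Lossy Counting (Manku and Motwani): approximate frequencies with error at most epsilon * N. */
class LossyCounter<K> {
    private static final class Entry {
        long freq;        // occurrences counted since the entry was (re)inserted
        final long delta; // maximum possible undercount at insertion time
        Entry(long freq, long delta) { this.freq = freq; this.delta = delta; }
    }

    private final Map<K, Entry> entries = new HashMap<>();
    private final long bucketWidth; // w = ceil(1 / epsilon)
    private long seen = 0;          // N, number of items processed so far

    LossyCounter(double epsilon) {
        this.bucketWidth = (long) Math.ceil(1.0 / epsilon);
    }

    void add(K item) {
        seen++;
        final long currentBucket = (seen + bucketWidth - 1) / bucketWidth; // ceil(N / w)
        Entry e = entries.get(item);
        if (e != null) {
            e.freq++;
        } else {
            entries.put(item, new Entry(1, currentBucket - 1));
        }
        // At each bucket boundary, drop entries that cannot be frequent.
        if (seen % bucketWidth == 0) {
            entries.values().removeIf(x -> x.freq + x.delta <= currentBucket);
        }
    }

    long estimate(K item) {
        Entry e = entries.get(item);
        return e == null ? 0 : e.freq;
    }

    int size() { return entries.size(); }
}

/** Hypothetical directly-follows abstraction fed by (case id, activity) events. */
class DirectlyFollowsSketch {
    // Simplification: this per-case map is unbounded.
    private final Map<String, String> lastActivity = new HashMap<>();
    private final LossyCounter<List<String>> pairs;

    DirectlyFollowsSketch(double epsilon) {
        this.pairs = new LossyCounter<>(epsilon);
    }

    void onEvent(String caseId, String activity) {
        String previous = lastActivity.put(caseId, activity);
        if (previous != null) {
            pairs.add(List.of(previous, activity)); // directly-follows pair (previous, activity)
        }
    }

    long count(String from, String to) {
        return pairs.estimate(List.of(from, to));
    }
}

class DirectlyFollowsDemo {
    public static void main(String[] args) {
        DirectlyFollowsSketch dfr = new DirectlyFollowsSketch(0.01);
        String[][] events = {{"c1", "a"}, {"c2", "a"}, {"c1", "b"}, {"c2", "b"}, {"c1", "c"}};
        for (String[] ev : events) dfr.onEvent(ev[0], ev[1]);
        System.out.println(dfr.count("a", "b") + " " + dfr.count("b", "c")); // prints 2 1
    }
}
```
]]></code>
      <p>With epsilon = 0.01 the bucket width is 100, so nothing is pruned in this tiny example and the counts are exact. On a long stream, the estimate for any pair never exceeds its true count and falls short of it by at most epsilon times N, where N is the number of pairs processed.</p>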
    </sec>
    <sec id="sec-2">
      <p>
        This demo builds on this existing implementation of process mining with
streaming data in ProM (<ext-link ext-link-type="uri" xlink:href="http://www.promtools.org/">http://www.promtools.org/</ext-link>); hence, the remainder of this paper
focuses on the ProM framework [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. Previous implementations in ProM
mainly allow the user to obtain the resulting process model of only one
process discovery algorithm at a time [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Our proposal gives researchers and
practitioners the ability to visually compare the Petri nets resulting from
two specific process discovery algorithms. The idea of visually comparing, at each
time stamp, the output of two different streaming algorithms with different settings,
or of the same algorithm with different parameter settings, is inspired by the
stream clustering tab of the MOA (Massive Online Analysis) framework [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], an open source framework for data stream mining with concept drift. The
aim of this demo is to improve the evaluation of streaming process discovery
algorithms within the ProM framework by enabling an online analysis of the
resulting models. Hence, the design decision was made to compare
only two algorithms simultaneously.
      </p>
      <sec id="sec-2-1">
        <title>Architecture</title>
        <p>ProM is an open source framework for a wide variety of process mining
algorithms and techniques in the form of plug-ins. It is implemented in Java and is
therefore platform independent.</p>
        <p>A plug-in in ProM is an algorithm implementation that conforms to the
framework. For this demo, we have created a new plug-in based on five existing
packages within the ProM code base, namely Stream, EventStream,
StreamAbstractRepresentation, StreamAlphaMiner and StreamInductiveMiner. The new
plug-in is implemented in the package KavyaBaskar, which depends on the
aforementioned packages; these have been modified to suit the new architecture.</p>
        <p>
          The key feature of the architecture proposed in this demo is the visualization of
the resulting process models (Petri nets) from two different event stream-based
process discovery algorithms. In [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ], a standardized approach that extends the ProM
framework to handle streaming data is presented. In [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ], an
architecture is proposed that allows several process discovery techniques to be
adopted in the event stream context, which makes it very generic. The core of the
implementation in this demo builds upon the architecture presented in [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. The
two specific process discovery algorithms chosen for this implementation are the
Alpha Miner and the Inductive Miner. The Alpha Miner was chosen because it was the
very first miner created for process discovery, and the Inductive
Miner because it guarantees a sound workflow net and previous research
indicates that it is one of the best process discovery algorithms currently available.
A new plug-in called
CompareStreamInductiveMinerStreamAlphaMinerAPNXSEventReaderImpl has been created in the ProM framework (<ext-link ext-link-type="uri" xlink:href="http://www.processmining.org">http://www.processmining.org</ext-link>), and the respective
algorithms have been incorporated within this plug-in. The package KavyaBaskar is available at <ext-link ext-link-type="uri" xlink:href="https://www.dropbox.com/s/xko9gbntjgclrhy/eclipse-workspace.rar?dl=0">https://www.dropbox.com/s/xko9gbntjgclrhy/eclipse-workspace.rar?dl=0</ext-link>. Other combinations are
currently under development.
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
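      <p>The core idea of the plug-in, one incoming stream feeding two independently configured discovery pipelines, can be sketched as follows. This Java code illustrates the pattern under our own assumptions and is not the plug-in's source: MinerPipeline, ThresholdDfgPipeline, ComparisonHarness and ComparisonDemo are hypothetical names, and a model is reduced to the set of directly-follows edges whose count reaches a minimum support, standing in for the Petri nets produced by the Alpha Miner and the Inductive Miner.</p>
      <code language="java"><![CDATA[
```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical discovery pipeline: consumes events and reports its current model. */
interface MinerPipeline {
    void observe(String caseId, String activity);
    Set<String> currentModel();
}

/** Toy stand-in for a miner: directly-follows edges seen at least minSupport times. */
class ThresholdDfgPipeline implements MinerPipeline {
    private final int minSupport;
    private final Map<String, String> lastActivity = new HashMap<>();
    private final Map<String, Integer> edgeCounts = new HashMap<>();

    ThresholdDfgPipeline(int minSupport) { this.minSupport = minSupport; }

    @Override
    public void observe(String caseId, String activity) {
        String previous = lastActivity.put(caseId, activity);
        if (previous != null) edgeCounts.merge(previous + "->" + activity, 1, Integer::sum);
    }

    @Override
    public Set<String> currentModel() {
        Set<String> edges = new TreeSet<>(); // sorted for stable display
        edgeCounts.forEach((edge, n) -> { if (n >= minSupport) edges.add(edge); });
        return edges;
    }
}

/** Broadcasts every event of one stream to all registered pipelines. */
class ComparisonHarness {
    private final List<MinerPipeline> pipelines = new ArrayList<>();

    void register(MinerPipeline pipeline) { pipelines.add(pipeline); }

    void onEvent(String caseId, String activity) {
        // Each pipeline keeps its own state, so their models may diverge.
        for (MinerPipeline p : pipelines) p.observe(caseId, activity);
    }

    /** One model per pipeline, in registration order (e.g. left and right panel). */
    List<Set<String>> snapshot() {
        List<Set<String>> models = new ArrayList<>();
        for (MinerPipeline p : pipelines) models.add(p.currentModel());
        return models;
    }
}

class ComparisonDemo {
    public static void main(String[] args) {
        ComparisonHarness harness = new ComparisonHarness();
        harness.register(new ThresholdDfgPipeline(1)); // keeps every observed edge
        harness.register(new ThresholdDfgPipeline(2)); // keeps only repeated edges
        String[][] events = {{"c1", "a"}, {"c1", "b"}, {"c2", "a"}, {"c2", "b"}, {"c2", "c"}};
        for (String[] e : events) harness.onEvent(e[0], e[1]);
        System.out.println(harness.snapshot()); // prints [[a->b, b->c], [a->b]]
    }
}
```
]]></code>
      <p>Because every pipeline owns its state, the same algorithm with two parameter settings (as in this sketch) or two different algorithms can be compared on exactly the same events.</p>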
      <p>The new implementation additionally allows the user to choose the internal
data structure for each miner: Backward Exponential Decay, Forward Exponential
Decay, Lossy Counting with Budget or Sliding Window, in addition to the existing
choice among Frequent, Lossy Counting and Space Saving.</p>
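      <p>Of the newly offered data structures, Sliding Window is the simplest to illustrate. The Java sketch below assumes a count-based window that holds the last N events of the stream and rebuilds the directly-follows counts from the window contents on demand; SlidingWindowStore and SlidingWindowDemo are hypothetical names, and the ProM implementation may differ, for example by windowing per case or by time.</p>
      <code language="java"><![CDATA[
```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical count-based sliding window over (case id, activity) events. */
class SlidingWindowStore {
    private static final class Event {
        final String caseId;
        final String activity;
        Event(String caseId, String activity) { this.caseId = caseId; this.activity = activity; }
    }

    private final int capacity;
    private final Deque<Event> window = new ArrayDeque<>();

    SlidingWindowStore(int capacity) {
        if (capacity <= 0) throw new IllegalArgumentException("capacity must be positive");
        this.capacity = capacity;
    }

    void add(String caseId, String activity) {
        window.addLast(new Event(caseId, activity));
        if (window.size() > capacity) window.removeFirst(); // evict the oldest event
    }

    /** Directly-follows counts rebuilt from the events currently inside the window. */
    Map<String, Integer> directlyFollowsCounts() {
        Map<String, String> lastActivity = new HashMap<>();
        Map<String, Integer> counts = new HashMap<>();
        for (Event e : window) { // oldest to newest
            String previous = lastActivity.put(e.caseId, e.activity);
            if (previous != null) counts.merge(previous + "->" + e.activity, 1, Integer::sum);
        }
        return counts;
    }

    int size() { return window.size(); }
}

class SlidingWindowDemo {
    public static void main(String[] args) {
        SlidingWindowStore store = new SlidingWindowStore(3);
        store.add("c1", "a");
        store.add("c1", "b");
        store.add("c1", "c");
        store.add("c1", "d"); // evicts (c1, a), so the pair a->b is forgotten
        System.out.println(new TreeMap<>(store.directlyFollowsCounts())); // prints {b->c=1, c->d=1}
    }
}
```
]]></code>
      <p>Unlike Lossy Counting, a window forgets an observation exactly when its event leaves the window, which reflects recent behavior quickly but discards long-term frequencies.</p>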
      <p>The implementation of the Visualization object has been extended to split
the display into two panels, each showing the process model resulting from one of
the two algorithms. As in the earlier implementation, the Slider component at the
bottom of a panel and the start/pause/stop buttons can be used to view older
models and to analyze how the models evolve. The new plug-in implements two
separate Slider components, so that older models can be viewed in any order and
the models generated by the two mining algorithms can be compared at different
points in time. Moreover, the Update Result button can be used to generate the
model(s) at any point in time as long as events keep arriving on the stream.</p>
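      <p>The two independent sliders can be thought of as two cursors over separate histories of model snapshots. The following Java sketch shows one possible organization under our own assumptions, not the ProM Visualization object: ModelHistory, record, slideTo and TwoPanelDemo are hypothetical names, and a model is any immutable value (for example, a rendered Petri net).</p>
      <code language="java"><![CDATA[
```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical per-panel history of model snapshots with its own slider position. */
class ModelHistory<M> {
    private final List<Long> timestamps = new ArrayList<>();
    private final List<M> models = new ArrayList<>();
    private int slider = -1;             // index of the snapshot shown in this panel; -1 = none
    private boolean followLatest = true; // while true, each new snapshot moves the slider

    /** Stores a snapshot, e.g. each time a result is produced or Update Result is pressed. */
    void record(long timestamp, M model) {
        timestamps.add(timestamp);
        models.add(model);
        if (followLatest) slider = models.size() - 1;
    }

    /** Moves only this panel's slider; any other history is unaffected. */
    void slideTo(int index) {
        if (index < 0 || index >= models.size()) {
            throw new IndexOutOfBoundsException("no snapshot at index " + index);
        }
        slider = index;
        followLatest = (index == models.size() - 1); // back at the newest snapshot: follow again
    }

    M shown() { return slider < 0 ? null : models.get(slider); }

    long shownTimestamp() { return slider < 0 ? -1L : timestamps.get(slider); }

    int size() { return models.size(); }
}

class TwoPanelDemo {
    public static void main(String[] args) {
        ModelHistory<String> alphaPanel = new ModelHistory<>();
        ModelHistory<String> inductivePanel = new ModelHistory<>();
        for (long t = 1; t <= 3; t++) {
            alphaPanel.record(t, "alpha@" + t);
            inductivePanel.record(t, "inductive@" + t);
        }
        alphaPanel.slideTo(0); // look back at the first Alpha Miner model only
        System.out.println(alphaPanel.shown() + " vs " + inductivePanel.shown()); // prints alpha@1 vs inductive@3
    }
}
```
]]></code>
      <p>Keeping the slider state inside each history, rather than using one global time index, is what lets the two panels show models from different points in time: a panel that has been moved back stays pinned while the other keeps following the stream.</p>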
      <sec id="sec-3-1">
        <title>Case study</title>
        <p>As an explanatory case, we used the BPI Challenge 2017 data set (<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.4121/uuid:5f3067df-f10b-45da-b98b-86ae4c7a310b">https://doi.org/10.4121/uuid:5f3067df-f10b-45da-b98b-86ae4c7a310b</ext-link>) for the
stream generation. A detailed demonstration is provided as a screencast (<ext-link ext-link-type="uri" xlink:href="https://www.youtube.com/watch?v=2eMZe6NWhW0&amp;feature=youtu.be">https://www.youtube.com/watch?v=2eMZe6NWhW0&amp;feature=youtu.be</ext-link>).</p>
        <p>
          The log is imported and a stream is generated using the Generate Event
Stream plug-in with the default emission speed of 10 data packets per second. An
example of the resulting stream is depicted in Figure 1a. A tutorial for using the
tool is available at <ext-link ext-link-type="uri" xlink:href="https://www.dropbox.com/s/yh2rzymgy3higqn/Tutorial_Online_Comparison_of_Streaming_Process_Discovery_Algorithms_%28BPM_2019_Demo%29.pdf?dl=0">https://www.dropbox.com/s/yh2rzymgy3higqn/Tutorial_Online_Comparison_of_Streaming_Process_Discovery_Algorithms_%28BPM_2019_Demo%29.pdf?dl=0</ext-link>. While the stream is being generated, the
CompareStreamInductiveMinerStreamAlphaMinerAPNXSEventReaderImpl plug-in is selected, as shown in
Figure 1b, with the Event Stream (XSEvent) created by the Generate
Event Stream plug-in as input. In this plug-in, the Case and Activity Identifier(s) can
be specified. From the drop-down list of data structures, Forward
Exponential Decay has been chosen for the Inductive Miner (Renewal Rate = 1,
Threshold = 0.01 and Decay Rate = 0.01), and Lossy Counting with Budget (budget size = 1000)
has been chosen for the Alpha Miner. The result is depicted in Figure 2.
        </p>
        <fig id="fig-1"><label>Figure 1</label><caption><p>(a) Stream Generator object visualization of active stream. (b) Applying the process discovery algorithms on the live event stream.</p></caption></fig>
        <p>
          We plan to add a comparison of conformance checking metrics, such as fitness and
precision, for both models, visualized as a time series graph, in order to
compare the performance of both algorithms. The implementation is in progress,
as illustrated in Figure 3. A near-future goal is to allow selecting any two of the
existing algorithms from the event stream-based process discovery domain (e.g. StrProM [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]), with the possibility of including future ones
too (see [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] for the potential of sequential pattern mining approaches for
streaming process discovery). Implementing this architecture in other
streaming frameworks such as Storm, Spark, Flink, Kafka and Samza could also be
explored.
        </p>
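        <p>For intuition about the Forward Exponential Decay configuration used in the case study, the Java sketch below implements forward exponential decay in the sense of the forward decay model of Cormode et al.: arrival weights are computed relative to a landmark time, and all weights are periodically rescaled to a new landmark. It is an illustration under stated assumptions, not the ProM data structure: decayRate and threshold mirror the case-study parameters Decay Rate and Threshold, time is measured as the number of events seen, and the hypothetical renewEvery parameter (how often the landmark is renewed) is our own choice and may not match the semantics of ProM's Renewal Rate. ForwardDecayCounter and DecayDemo are hypothetical names.</p>
        <code language="java"><![CDATA[
```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical forward exponential decay counter. The decayed count of an item at time t is
 * the sum over its arrival times tj of exp(-decayRate * (t - tj)), tracked relative to a landmark L.
 */
class ForwardDecayCounter<K> {
    private final Map<K, Double> scaledSums = new HashMap<>(); // sum of exp(decayRate * (tj - L))
    private final double decayRate;
    private final double threshold; // entries whose decayed count falls below this are pruned
    private final long renewEvery;  // assumption: renew the landmark every renewEvery events
    private long landmark = 0;      // L
    private long now = 0;           // logical time = number of events seen so far

    ForwardDecayCounter(double decayRate, double threshold, long renewEvery) {
        if (renewEvery <= 0) throw new IllegalArgumentException("renewEvery must be positive");
        this.decayRate = decayRate;
        this.threshold = threshold;
        this.renewEvery = renewEvery;
    }

    void add(K item) {
        now++;
        // Forward decay: new arrivals get larger weights instead of shrinking all old ones.
        scaledSums.merge(item, Math.exp(decayRate * (now - landmark)), Double::sum);
        if (now % renewEvery == 0) renewLandmark();
    }

    double decayedCount(K item) {
        Double s = scaledSums.get(item);
        return s == null ? 0.0 : s * Math.exp(-decayRate * (now - landmark));
    }

    int size() { return scaledSums.size(); }

    /** Rescales every sum to the landmark L = now (keeps exponents small) and prunes light entries. */
    private void renewLandmark() {
        double factor = Math.exp(-decayRate * (now - landmark));
        scaledSums.replaceAll((k, s) -> s * factor);
        landmark = now;
        scaledSums.values().removeIf(s -> s < threshold);
    }
}

class DecayDemo {
    public static void main(String[] args) {
        // decayRate and threshold follow the case study; renewEvery = 100 is our own choice.
        ForwardDecayCounter<String> counts = new ForwardDecayCounter<>(0.01, 0.01, 100);
        counts.add("a->b");
        for (int i = 0; i < 69; i++) counts.add("b->c");
        System.out.printf("%.3f%n", counts.decayedCount("a->b")); // roughly 0.5 after 69 more events
    }
}
```
]]></code>
        <p>Under this event-count time model, with Decay Rate = 0.01 a single observation loses half its weight after about ln 2 / 0.01 ≈ 69 further events, and its weight drops below Threshold = 0.01 after about ln 100 / 0.01 ≈ 461 events, after which it is pruned at the next landmark renewal.</p>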
      </sec>
      <sec id="sec-3-2">
        <title>Conclusion</title>
        <p>The newly presented simultaneous visual analysis of process models for the same
event stream enables researchers, developers and business users to experiment
with process discovery on streaming data within the process
mining domain. The extension also allows users to analyze the internal data
structures used for handling the data stream within the ProM framework.</p>
        <p>The lessons learned during the development of the presented implementation
can help to overcome the hurdles in building a more generic and standardized
architecture, which will hopefully make it possible to compare the process models
resulting from any algorithms of choice.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>Albert</given-names>
            <surname>Bifet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Geoff</given-names>
            <surname>Holmes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Richard</given-names>
            <surname>Kirkby</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Bernhard</given-names>
            <surname>Pfahringer</surname>
          </string-name>
          .
          <article-title>MOA: Massive online analysis</article-title>
          .
          <source>Journal of Machine Learning Research</source>
          ,
          <volume>11</volume>
          (May):
          <fpage>1601</fpage>
          –
          <lpage>1604</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>Andrea</given-names>
            <surname>Burattin</surname>
          </string-name>
          .
          <article-title>Process mining for stream data sources</article-title>
          . In
          <source>Process Mining Techniques in Business Environments</source>
          , pages
          <fpage>177</fpage>
          –
          <lpage>204</lpage>
          . Springer,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>Marwan</given-names>
            <surname>Hassani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Sergio</given-names>
            <surname>Siccha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Florian</given-names>
            <surname>Richter</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Thomas</given-names>
            <surname>Seidl</surname>
          </string-name>
          .
          <article-title>Efficient process discovery from event streams using sequential pattern mining</article-title>
          . In
          <source>IEEE Symposium on Computational Intelligence and Data Mining (CIDM)</source>
          , pages
          <fpage>1366</fpage>
          –
          <lpage>1373</lpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>Marwan</given-names>
            <surname>Hassani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Sebastiaan J.</given-names>
            <surname>van Zelst</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Wil M. P.</given-names>
            <surname>van der Aalst</surname>
          </string-name>
          .
          <article-title>On the application of sequential pattern mining primitives to process discovery: Overview, outlook and opportunity identification</article-title>
          .
          <source>WIREs Data Mining and Knowledge Discovery</source>
          ,
          <elocation-id>e1315</elocation-id>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>Wil</given-names>
            <surname>van der Aalst</surname>
          </string-name>
          .
          <article-title>Process mining: discovery, conformance and enhancement of business processes</article-title>
          , volume
          <volume>2</volume>
          . Springer,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>Wil M. P.</given-names>
            <surname>van der Aalst</surname>
          </string-name>
          , Boudewijn F. van Dongen, Christian W. Gunther, R. S. Mans, A. K. Alves de Medeiros, Anne Rozinat, Vladimir Rubin, Minseok Song, H. M. W. Verbeek, and A. J. M. M. Weijters.
          <article-title>ProM 4.0: Comprehensive support for real process analysis</article-title>
          . In
          <source>Intl. Conf. on Application and Theory of Petri Nets</source>
          , pages
          <fpage>484</fpage>
          –
          <lpage>494</lpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>Sebastiaan J.</given-names>
            <surname>van Zelst</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Andrea</given-names>
            <surname>Burattin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Boudewijn F.</given-names>
            <surname>van Dongen</surname>
          </string-name>
          , and
          <string-name>
            <given-names>H. M. W. (Eric)</given-names>
            <surname>Verbeek</surname>
          </string-name>
          .
          <article-title>Data streams in ProM 6: A single-node architecture</article-title>
          . In
          <source>BPM (Demos)</source>
          , page
          <fpage>81</fpage>
          . Citeseer,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>Sebastiaan J.</given-names>
            <surname>van Zelst</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Boudewijn F.</given-names>
            <surname>van Dongen</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Wil M. P.</given-names>
            <surname>van der Aalst</surname>
          </string-name>
          .
          <article-title>Event stream-based process discovery using abstract representations</article-title>
          .
          <source>Knowledge and Information Systems</source>
          ,
          <volume>54</volume>
          (
          <issue>2</issue>
          ):
          <fpage>407</fpage>
          –
          <lpage>435</lpage>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>