<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>A. Burattin</string-name>
        </contrib>
        <aff>Kgs. Lyngby, Denmark</aff>
      </contrib-group>
      <kwd-group>
        <kwd>Streaming Process Mining</kwd>
        <kwd>Apache Flink</kwd>
        <kwd>Event stream</kwd>
      </kwd-group>
      <fpage>75</fpage>
      <lpage>79</lpage>
      <abstract>
        <p>Beamline is a Java framework designed to facilitate the prototyping and development of streaming process mining algorithms. The framework is built on top of Apache Flink, which makes it suitable for extremely efficient computation due to its distributed and stateful nature. It provides both algorithms as well as data structures, sources, and sinks to facilitate the development of process mining applications. The framework is licensed under Apache-2.0 and its companion website, https://www.beamline.cloud, contains real-life examples on actual live data and all the system's documentation.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Process mining [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ] is a family of techniques aiming at constructing abstract models (e.g., Petri
nets [
        <xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>
        ]) and verifying process executions with the final aim of understanding how these
processes are performed, starting from event logs (i.e., recordings of what happened).
      </p>
      <p>
        Process mining is typically divided into several sub-tasks including control-flow discovery [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ],
aiming at discovering a control-flow model starting from executions of the model itself, and
conformance checking [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], aiming to verify that the executions of a process conform to a normative
process description. Real-world applications of control-flow discovery could aim at
understanding how a firm manufactures or handles goods (with the goal of understanding
the in-vivo processes in order to optimize them); applications of conformance checking could target
clinical treatments and ensure that these are aligned with the expected protocols (with the goal
of spotting patients’ mistreatments as soon as possible).
      </p>
      <p>
        Process mining has been applied in many disciplines; one of the most impactful
applications at present is in healthcare [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] where clinical protocols/guidelines are the processes
and treatments of patients are the executions, or event logs. Particularly in this domain, a
fundamental requirement is the ability to change the course of treatment while the patient is
being medicated, thus requiring a streaming (or online) analysis (as opposed to a historical, or
offline, analysis).
      </p>
      <p>https://andrea.burattin.net/ (A. Burattin)</p>
      <p>
        Streaming data analysis [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] comes with a set of computational requirements that carry over directly
to the streaming process mining discipline [8]. In addition, in streaming process mining,
the fact that many data points, each observed at a different timestamp, should be
conceptually connected to each other introduces some complexity that depends on the observation
window (i.e., the period of time during which the analysis is performed) [9].
      </p>
      <p>The software presented in this paper, called Beamline, is built on top of Apache
Flink<sup>1</sup> [10] and enables the implementation of streaming data and process mining pipelines by
providing access to streaming process mining algorithms as well as common data analysis
techniques.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Overview and Design</title>
      <p>Beamline is defined as an extension of Apache Flink, a library for distributed
stateful computations over data streams. Specifically, Apache Flink allows the definition of
pipelines, called dataflows, that define which manipulations each event is expected to go through.
Beamline is a set of operations that extends the capabilities of Apache Flink with process
mining transformations, such as process-aware event filters or flat-mappers for the discovery
of processes or the computation of conformance.</p>
      <p>Because Beamline is an extension of Apache Flink, all event transformations (both
pre- and post-processing) and all the data connectors implemented for Apache Flink remain accessible.</p>
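      <p>To make the notion of a dataflow more concrete, the following plain-Java sketch chains a filter, a flat-mapper, and a sink into a single pipeline through which events are pushed one at a time. It deliberately does not use the Apache Flink or Beamline APIs: the class name, the method name, and the choice of plain strings as events are assumptions made only for illustration.</p>
      <preformat><![CDATA[
```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.Predicate;

/** Plain-Java illustration of a dataflow; it does not use the Apache Flink API. */
final class DataflowSketch {

    /**
     * Chains three stages (filter, flat-map, sink) and returns the entry point
     * into which events are pushed one at a time.
     */
    static <I, O> Consumer<I> pipeline(Predicate<I> filter,
                                       Function<I, List<O>> flatMapper,
                                       Consumer<O> sink) {
        return event -> {
            if (!filter.test(event)) {
                return; // the event is dropped by the filter stage
            }
            // A flat-mapper may emit zero, one, or many results per event.
            for (O result : flatMapper.apply(event)) {
                sink.accept(result);
            }
        };
    }

    public static void main(String[] args) {
        List<String> collected = new ArrayList<>();
        Consumer<String> entry = DataflowSketch.<String, String>pipeline(
                activity -> !activity.isEmpty(),
                activity -> List.of(activity.toUpperCase()),
                collected::add);
        entry.accept("register");
        entry.accept("");
        entry.accept("pay");
        System.out.println(collected); // prints [REGISTER, PAY]
    }
}
```
]]></preformat>
      <p>In an actual deployment the stages would be distributed and stateful operators managed by the stream processing engine; the sketch only conveys the order in which each event traverses them.</p>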
    </sec>
    <sec id="sec-3">
      <title>3. Functionalities available</title>
      <p>While Beamline is designed as a tool for researchers and practitioners for developing and
deploying new streaming process mining algorithms, many functionalities are available
off the shelf, so users can benefit from the tool immediately.</p>
      <p>It is possible to ingest events using all Apache Flink connectors. In addition, for testing
purposes, it is also possible to “replay” static logs as well as to simulate events referring to
known processes using the PLG2 simulator [11]. Once events are imported into the platform,
some process-aware filters are available, for example, to filter (retain/exclude) events based on
specific activities, process instances, or other event properties.</p>
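      <p>A retain/exclude filter of this kind can be sketched in plain Java as a predicate over events. The event class and the factory methods below are hypothetical and are not taken from the Beamline API; they only illustrate how filters on activities and on process instances can be defined and combined.</p>
      <preformat><![CDATA[
```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;
import java.util.stream.Collectors;

/** Hypothetical process-aware filters expressed as plain predicates. */
final class ActivityFilterSketch {

    /** Minimal hypothetical event: process instance identifier plus activity name. */
    static final class Event {
        final String caseId;
        final String activity;

        Event(String caseId, String activity) {
            this.caseId = caseId;
            this.activity = activity;
        }
    }

    /** Keeps only the events whose activity is one of the given names. */
    static Predicate<Event> retainActivities(String... names) {
        Set<String> allowed = new HashSet<>(Arrays.asList(names));
        return e -> allowed.contains(e.activity);
    }

    /** Drops the events whose activity is one of the given names. */
    static Predicate<Event> excludeActivities(String... names) {
        return retainActivities(names).negate();
    }

    /** Keeps only the events belonging to the given process instances. */
    static Predicate<Event> retainCases(String... caseIds) {
        Set<String> allowed = new HashSet<>(Arrays.asList(caseIds));
        return e -> allowed.contains(e.caseId);
    }

    public static void main(String[] args) {
        List<Event> events = List.of(
                new Event("c1", "A"), new Event("c1", "B"), new Event("c2", "A"));
        List<String> kept = events.stream()
                .filter(excludeActivities("B").and(retainCases("c1")))
                .map(e -> e.caseId + ":" + e.activity)
                .collect(Collectors.toList());
        System.out.println(kept); // prints [c1:A]
    }
}
```
]]></preformat>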
      <p>The first option to consume an event stream consists of performing control-flow discovery,
i.e., producing a process representation that describes the events currently
being observed. It is important to note that this representation can evolve over time. On top of
this representation, different dimensions could be added as well, for example, the average time
required to execute an activity or the maximum time between two activities, making it possible to
identify and locate bottlenecks. As an example of evolution over time, imagine the production process employed in a
frozen food factory. It is reasonable to think that such a process will be periodically switching
between ice creams (during the months approaching summer) and frozen pizza (during the rest
of the year). In this case, the changes will not involve only the control-flow but the frequencies
as well.</p>
      <sec id="sec-3-1">
        <title>1 See https://flink.apache.org/.</title>
        <p>Beamline supports the discovery of processes using different algorithms, producing
both imperative (e.g., using the Heuristics Miner with Lossy Counting) and declarative (e.g.,
with the Declare Discovery or DCR Miner) models.</p>
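        <p>As an illustration of how frequencies can be tracked over an unbounded stream with bounded memory, the sketch below applies the general Lossy Counting technique to directly-follows relations between activities. It is not the Heuristics Miner with Lossy Counting as implemented in Beamline: the class and method names are hypothetical and, as a simplifying assumption, the map holding the last activity of each process instance is left unbounded.</p>
        <preformat><![CDATA[
```java
import java.util.HashMap;
import java.util.Map;

/** Lossy Counting applied to directly-follows relations (illustrative sketch). */
final class LossyDirectlyFollows {
    private final int bucketWidth;  // w = ceil(1 / epsilon)
    private long observed = 0;      // N: number of relations observed so far
    // relation -> {frequency, maximum possible undercount}
    private final Map<String, long[]> counts = new HashMap<>();
    // process instance -> last activity seen (unbounded in this sketch)
    private final Map<String, String> lastActivity = new HashMap<>();

    LossyDirectlyFollows(double epsilon) {
        this.bucketWidth = (int) Math.ceil(1.0 / epsilon);
    }

    void observe(String caseId, String activity) {
        String previous = lastActivity.put(caseId, activity);
        if (previous == null) {
            return; // first event of the process instance: no relation yet
        }
        observed++;
        long currentBucket = (observed + bucketWidth - 1) / bucketWidth; // ceil(N / w)
        String relation = previous + "->" + activity;
        long[] entry = counts.get(relation);
        if (entry == null) {
            counts.put(relation, new long[] {1, currentBucket - 1});
        } else {
            entry[0]++;
        }
        if (observed % bucketWidth == 0) {
            // End of a bucket: forget relations that cannot be frequent.
            counts.values().removeIf(e -> e[0] + e[1] <= currentBucket);
        }
    }

    /** Returns the stored frequency of a relation, or 0 if it was never seen or was forgotten. */
    long frequency(String from, String to) {
        long[] entry = counts.get(from + "->" + to);
        return entry == null ? 0 : entry[0];
    }

    public static void main(String[] args) {
        LossyDirectlyFollows miner = new LossyDirectlyFollows(0.25);
        for (String activity : "A,B,A,B,A,B,C,A,B".split(",")) {
            miner.observe("c1", activity);
        }
        System.out.println(miner.frequency("A", "B")); // prints 4
        System.out.println(miner.frequency("B", "C")); // prints 0 (forgotten as infrequent)
    }
}
```
]]></preformat>
        <p>Because infrequent relations are periodically forgotten, the stored frequencies follow the recent behavior of the stream, which is one way a discovered model can evolve when the underlying process changes.</p>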
        <p>Another way of consuming an event stream is to perform conformance checking. This means
providing a normative (i.e., prescriptive) model and checking, for each event, whether the
process instance being executed conforms to the model. A meaningful use case
is healthcare, where clinical guidelines should be followed:
as soon as a violation is detected, an alert can be raised to prompt a second look at the
case and verify that the patient is treated properly. Beamline supports conformance checking
where normative models are specified using the Petri net notation.</p>
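        <p>The per-event nature of online conformance checking can be sketched as follows. As a loudly stated simplification, the normative model here is not a Petri net but a set of allowed start activities and allowed directly-follows relations; all names are hypothetical and do not come from the Beamline API.</p>
        <preformat><![CDATA[
```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/** Per-event conformance check against a simplified normative model (illustrative sketch). */
final class OnlineConformanceSketch {
    private final Set<String> startActivities;
    private final Map<String, Set<String>> allowedNext;
    // process instance -> last activity observed
    private final Map<String, String> lastActivity = new HashMap<>();

    OnlineConformanceSketch(Set<String> startActivities, Map<String, Set<String>> allowedNext) {
        this.startActivities = startActivities;
        this.allowedNext = allowedNext;
    }

    /**
     * Returns true if the event is allowed by the model given the current state
     * of its process instance. The state is updated even after a violation, so
     * that checking resumes from the activity actually observed.
     */
    boolean check(String caseId, String activity) {
        String previous = lastActivity.put(caseId, activity);
        if (previous == null) {
            return startActivities.contains(activity);
        }
        return allowedNext.getOrDefault(previous, Set.of()).contains(activity);
    }

    public static void main(String[] args) {
        OnlineConformanceSketch checker = new OnlineConformanceSketch(
                Set.of("admit"),
                Map.of("admit", Set.of("examine"), "examine", Set.of("treat")));
        System.out.println(checker.check("p1", "admit"));   // prints true
        System.out.println(checker.check("p1", "treat"));   // prints false (examination skipped)
    }
}
```
]]></preformat>
        <p>A false result is the point at which an alert could be raised for the corresponding process instance.</p>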
        <p>It is important to highlight that all results produced by Beamline can be forwarded, through a sink, to any other
system. For example, it is possible to forward the results of the computation to a time-series
database (such as InfluxDB) for visualization with “observability platforms” (such as Grafana), as
shown in Fig. 1. The website of Beamline and the GitHub repository provide examples
of all the operations mentioned in this section (including the storage of results in an external
database).</p>
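        <p>One possible way to hand a result over to a time-series database is to serialize it as text in the InfluxDB line protocol. The sketch below only formats such a line: the measurement name, the tag, and the field are hypothetical, and the code neither performs any network call nor uses the sink interfaces of Apache Flink or Beamline.</p>
        <preformat><![CDATA[
```java
/** Formats an activity frequency as one line of InfluxDB line protocol (illustrative sketch). */
final class LineProtocolSketch {

    /** Escapes commas, equals signs, and spaces, which are special inside tag values. */
    static String escapeTag(String raw) {
        return raw.replace(",", "\\,").replace("=", "\\=").replace(" ", "\\ ");
    }

    /**
     * Builds "measurement,activity=NAME frequency=Ni TIMESTAMP".
     * The measurement name is assumed to contain no special characters.
     */
    static String toLine(String measurement, String activity, long frequency, long timestampNanos) {
        return measurement
                + ",activity=" + escapeTag(activity)
                + " frequency=" + frequency + "i" // the "i" suffix marks an integer field
                + " " + timestampNanos;
    }

    public static void main(String[] args) {
        System.out.println(toLine("activity_frequency", "check in", 3, 1000L));
        // prints activity_frequency,activity=check\ in frequency=3i 1000
    }
}
```
]]></preformat>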
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Installation and Usage</title>
      <p>The Beamline framework is hosted on GitHub (https://github.com/beamline/framework/), with its interactive documentation hosted
on GitHub Pages (https://beamline.github.io/framework/), and installation instructions as well as many tutorials and “hands-on” real
examples available on the project website (https://www.beamline.cloud/). It is possible to use Beamline in any Java project
where dependencies are managed using either Gradle, Maven, sbt, or Leiningen. Beamline
comes with all modules and extensions already compiled; it is therefore enough to include
the proper dependency, and all necessary packages are automatically included.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Comparison to Related Software</title>
      <p>Several other open-source software packages for process mining are available, such as ProM<sup>5</sup> [12]
or PM4Py<sup>6</sup> [13]; however, their capability of handling streaming data is not (or only very
partially) developed. Previous implementations of streaming process mining algorithms have
been carried out using ad hoc software, which makes comparisons across techniques and
algorithms extremely complicated.</p>
      <p>
        When considering streaming data mining and streaming machine learning, several systems
have been developed in the past, such as MOA [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] or Apache Flink [10]. While leveraging
these is extremely important, as they already benefit from a huge community, none of them
implements any process mining capability.
      </p>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion</title>
      <p>Beamline is a Java framework designed to facilitate the prototyping and development of
streaming process mining algorithms. Thanks to its integration into Apache Flink, users can leverage all
capabilities of the latter platform to handle pre- and post-processing needed for their streaming
(process) mining challenges.</p>
      <p>A link to a screencast is available at https://youtu.be/8eagbpJ_hK4.</p>
      <sec id="sec-6-1">
        <title>5 https://www.promtools.org/ 6 https://pm4py.fit.fraunhofer.de/</title>
        <p>[8] A. Burattin, Streaming Process Discovery and Conformance Checking, in: S. Sakr, A. Y.
Zomaya (Eds.), Encyclopedia of Big Data Technologies, Springer International Publishing,
2018, pp. 1–8. doi:10.1007/978-3-319-63962-8_103-1.</p>
        <p>[9] A. Burattin, Streaming Process Mining, in: W. M. van der Aalst, J. Carmona (Eds.), Process
Mining Handbook, Springer, 2022, pp. 349–372. doi:10.1007/978-3-031-08848-3_11.</p>
        <p>[10] P. Carbone, A. Katsifodimos, S. Ewen, V. Markl, S. Haridi, K. Tzoumas, Apache Flink™:
Stream and Batch Processing in a Single Engine, in: Bulletin of the IEEE Computer Society
Technical Committee on Data Engineering, 2015, pp. 28–38.</p>
        <p>[11] A. Burattin, PLG2: Multiperspective Process Randomization with Online and Offline
Simulations, in: Online Proceedings of the BPM Demo Track 2016, CEUR-WS.org, 2016.</p>
        <p>[12] E. H. M. W. Verbeek, J. Buijs, B. van Dongen, W. M. van der Aalst, ProM 6: The Process
Mining Toolkit, in: BPM 2010 Demo, 2010, pp. 34–39.</p>
        <p>[13] A. Berti, S. J. van Zelst, W. M. van der Aalst, Process Mining for Python (PM4Py): Bridging
the Gap between Process- and Data Science, in: Proc. of ICPM Demo Track, 2019.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>W. M.</given-names>
            <surname>van der Aalst</surname>
          </string-name>
          ,
          <source>Process Mining</source>
          , Springer,
          <year>2016</year>
          . doi:
          <pub-id pub-id-type="doi">10.1007/978-3-662-49851-4</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <collab>IEEE Task Force on Process Mining</collab>
          ,
          <article-title>Process Mining Manifesto</article-title>
          , in: F. Daniel,
          <string-name>
            <given-names>K.</given-names>
            <surname>Barkaoui</surname>
          </string-name>
          , S. Dustdar (Eds.),
          <source>Business Process Management Workshops</source>
          , Springer-Verlag,
          <year>2011</year>
          , pp.
          <fpage>169</fpage>
          -
          <lpage>194</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>W. M.</given-names>
            <surname>van der Aalst</surname>
          </string-name>
          ,
          <article-title>Putting high-level Petri nets to work in industry</article-title>
          ,
          <source>Computers in Industry</source>
          <volume>25</volume>
          (
          <year>1994</year>
          )
          <fpage>45</fpage>
          -
          <lpage>54</lpage>
          . doi:
          <pub-id pub-id-type="doi">10.1016/0166-3615(94)90031-0</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>T.</given-names>
            <surname>Murata</surname>
          </string-name>
          ,
          <article-title>Petri nets: Properties, analysis and applications</article-title>
          ,
          <source>Proceedings of the IEEE</source>
          <volume>77</volume>
          (
          <year>1989</year>
          )
          <fpage>541</fpage>
          -
          <lpage>580</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>J.</given-names>
            <surname>Carmona</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>van Dongen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Solti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Weidlich</surname>
          </string-name>
          ,
          <source>Conformance Checking</source>
          , Springer International Publishing,
          <year>2018</year>
          . doi:
          <pub-id pub-id-type="doi">10.1007/978-3-319-99414-7</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>J.</given-names>
            <surname>Munoz-Gama</surname>
          </string-name>
          et al.,
          <article-title>Process mining for healthcare: Characteristics and challenges</article-title>
          ,
          <source>Journal of Biomedical Informatics</source>
          <volume>127</volume>
          (
          <year>2022</year>
          ).
          doi:
          <pub-id pub-id-type="doi">10.1016/j.jbi.2022.103994</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>A.</given-names>
            <surname>Bifet</surname>
          </string-name>
          , G. Holmes,
          <string-name>
            <given-names>R.</given-names>
            <surname>Kirkby</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Pfahringer</surname>
          </string-name>
          ,
          <article-title>MOA: Massive Online Analysis Learning Examples</article-title>
          ,
          <source>Journal of Machine Learning Research</source>
          <volume>11</volume>
          (
          <year>2010</year>
          )
          <fpage>1601</fpage>
          -
          <lpage>1604</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>