<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>The Process Mining ToolKit (PMTK): Enabling Advanced Process Mining in an Integrated Fashion (Extended Abstract)</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Alessandro Berti</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Chiao-Yun Li</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2021</year>
      </pub-date>
      <abstract>
        <p>Heaps of event data are being generated and stored during the execution of (business) processes. In recent years, various process mining solutions that translate such data into meaningful insights have been developed, both in industry and academia. However, there is a big gap between the number of analysis techniques proposed in the literature and the widespread availability of these techniques in commercial applications. At the same time, existing academic tools, which expose a plethora of analysis techniques, are designed neither to be seamlessly integrated into business environments nor to provide an end-to-end solution. Therefore, this paper presents the Process Mining ToolKit (PMTK), which is intended to bridge this gap. Building on top of the open-source project PM4Py, PMTK presents novel process mining algorithms and techniques in an easy-to-use, fully integrated solution.</p>
      </abstract>
      <kwd-group kwd-group-type="index-terms">
        <kwd>process mining</kwd>
        <kwd>process analytics</kwd>
        <kwd>visual analytics</kwd>
        <kwd>data science</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>I. INTRODUCTION</title>
      <p>
        The execution of (business) processes generates digital
records of historical process behavior, referred to as event
data. Process mining [<xref ref-type="bibr" rid="ref1">1</xref>] is concerned with developing
techniques and methods that translate such data into actionable
knowledge of the process. Examples of typical process mining
techniques include process discovery, i.e., the automated discovery
of process models describing the process based on the event
data, and conformance checking, i.e., assessing whether the
execution of a process as recorded in the event data conforms
to a given reference model. In recent years, various
academic and commercial software solutions implementing process
mining technology have been proposed. Commercial
solutions, such as Celonis (http://celonis.com), UiPath (http://uipath.com),
and Fluxicon Disco (http://fluxicon.com/disco/),
often provide basic process discovery functionalities and
various (customizable) statistics of the process. Academic
tools such as ProM [<xref ref-type="bibr" rid="ref2">2</xref>] (http://promtools.org), Apromore [<xref ref-type="bibr" rid="ref3">3</xref>]
(http://apromore.org), PM4Py [<xref ref-type="bibr" rid="ref4">4</xref>] (http://pm4py.org), and
bupaR [<xref ref-type="bibr" rid="ref5">5</xref>] (http://bupar.net) are often open-source and implement
a wider range of process mining techniques. However, most of these
academic tools are hard to integrate into a business context or require
extensive knowledge of a specific programming language to
be used. To bridge this gap, we present the Process Mining
ToolKit (PMTK). PMTK is built on top of the PM4Py library and
extends our earlier work presented in [<xref ref-type="bibr" rid="ref6">6</xref>]. As such,
PMTK allows non-technical users to exploit the advanced
process mining technology implemented in PM4Py.
      </p>
    </sec>
    <sec id="sec-tool-overview">
      <title>II. TOOL OVERVIEW</title>
      <p>In this section, we present a short overview of the core
components of PMTK. A screen recording
corresponding to this extended abstract can be found at
https://pmtk.fit.fraunhofer.de/icpm21/demo.mp4.<sup>1</sup> We briefly discuss
the overall architecture of PMTK and its main functionalities,
i.e., the work space, the main analysis capabilities, and the
integrated filtering.</p>
      <p>
        1) Architecture: Conceptually, PMTK consists of three
layers: an algorithmic layer (based on PM4Py), a web
service layer (i.e., a controller, based on [<xref ref-type="bibr" rid="ref6">6</xref>]), and a front-end
layer built using web technologies such as HTML5, JavaScript,
and Angular. PMTK is available both as a standalone tool, i.e.,
including the web services and the web interface, and as a
web application that can be deployed on any application
server.
      </p>
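      <p>To illustrate the separation between the algorithmic layer and the web service layer, the following Python sketch wraps a toy algorithm in a minimal WSGI application that returns JSON, which a front-end could then render. It is a library-independent illustration under stated assumptions: the endpoint path <monospace>/activity_counts</monospace>, the function names, and the in-memory toy log are hypothetical, and neither PM4Py nor the actual PMTK web service API is used.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import json
from collections import Counter

# Algorithmic layer (hypothetical stand-in for PM4Py): a toy event log
# mapping each case id to its ordered list of activity names.
EVENT_LOG = {
    "c1": ["register", "check", "pay"],
    "c2": ["register", "pay"],
}


def activity_counts(log):
    """Count how often each activity occurs over all cases."""
    return dict(Counter(a for trace in log.values() for a in trace))


# Web service layer (controller): maps a hypothetical URL path to an
# algorithm and serializes its result as JSON for the front-end.
ROUTES = {"/activity_counts": activity_counts}


def app(environ, start_response):
    """Minimal WSGI application dispatching requests by PATH_INFO."""
    headers = [("Content-Type", "application/json")]
    algorithm = ROUTES.get(environ.get("PATH_INFO", ""))
    if algorithm is None:
        start_response("404 Not Found", headers)
        return [json.dumps({"error": "unknown endpoint"}).encode("utf-8")]
    start_response("200 OK", headers)
    return [json.dumps(algorithm(EVENT_LOG)).encode("utf-8")]


if __name__ == "__main__":
    # Invoke the application directly, as a WSGI server would.
    body = app({"PATH_INFO": "/activity_counts"}, lambda status, hdrs: None)
    print(b"".join(body).decode("utf-8"))
```
]]></preformat>
      <p>Run directly, the sketch prints <monospace>{"register": 2, "check": 1, "pay": 2}</monospace>. Serving the same callable over HTTP would only require a WSGI server, e.g., <monospace>wsgiref.simple_server.make_server("", 8000, app).serve_forever()</monospace> from the Python standard library; keeping the algorithm free of any HTTP concerns is what allows the same function to be reused by different front-ends.</p>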
      <p>2) The Work Space: PMTK provides a work space in which
the user can organize various files. Figure 1a shows
a snapshot of the work space. In the work
space, the user can create a folder for each process she
intends to analyze. Subsequently, various objects, e.g., event
logs and filters, can be stored in the corresponding process
folder. Some objects, e.g., event logs, can be imported from
disk; other objects can be generated from within PMTK.</p>
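      <p>As a rough illustration of how such a work space could be persisted, the following Python sketch stores process folders and their objects (event logs and filters) in two relational tables, using the standard-library <monospace>sqlite3</monospace> module as a stand-in database. The table names, columns, and helper functions are hypothetical and do not reflect PMTK's actual schema or persistence layer.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import json
import sqlite3

# Hypothetical schema: one row per process folder, one row per stored object.
SCHEMA = """
CREATE TABLE IF NOT EXISTS process (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE IF NOT EXISTS workspace_object (
    id         INTEGER PRIMARY KEY,
    process_id INTEGER NOT NULL REFERENCES process(id),
    kind       TEXT NOT NULL CHECK (kind IN ('event_log', 'filter')),
    name       TEXT NOT NULL,
    payload    TEXT NOT NULL  -- e.g. a file path or a JSON-encoded filter
);
"""


def open_workspace(path=":memory:"):
    """Open (or create) a work space database and install the schema."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # enforce process references
    conn.executescript(SCHEMA)
    return conn


def create_process(conn, name):
    """Create a folder for a process and return its id."""
    cur = conn.execute("INSERT INTO process (name) VALUES (?)", (name,))
    return cur.lastrowid


def add_object(conn, process_id, kind, name, payload):
    """Store an event log reference or a filter in a process folder."""
    conn.execute(
        "INSERT INTO workspace_object (process_id, kind, name, payload) "
        "VALUES (?, ?, ?, ?)",
        (process_id, kind, name, payload),
    )


def list_objects(conn, process_name):
    """List (kind, name) of the objects in a process folder."""
    rows = conn.execute(
        "SELECT o.kind, o.name FROM workspace_object AS o "
        "JOIN process AS p ON p.id = o.process_id "
        "WHERE p.name = ? ORDER BY o.id",
        (process_name,),
    )
    return rows.fetchall()


if __name__ == "__main__":
    conn = open_workspace()
    pid = create_process(conn, "Purchase-to-Pay")  # example process name
    add_object(conn, pid, "event_log", "p2p.xes", "/data/p2p.xes")
    add_object(conn, pid, "filter", "starts with register",
               json.dumps({"start_activities": ["register"]}))
    print(list_objects(conn, "Purchase-to-Pay"))
```
]]></preformat>
      <p>In this sketch, a filter is stored as a JSON payload next to the event log of the same process, so it can be looked up and re-applied later; the <monospace>CHECK</monospace> constraint rejects object kinds other than the two assumed here.</p>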
      <p>3) Analysis Capabilities: When the user selects an object
from the work space, various analyses can be applied to the
selected object. Currently, the user can execute
the following analyses:</p>
      <p>Statistics: PMTK provides various typical event log
statistics, i.e., absolute and relative activity occurrences, the
average number of events per case, the number of events, case
arrivals, and active cases over time, and throughput statistics.</p>
      <p>Log Exploration: PMTK provides means to explore the
event log in detail, i.e., to gain a better understanding of
the process captured by the event log. Currently, PMTK
implements the following log exploration functionalities:
        <list list-type="bullet">
          <list-item>
            <p>Variant Explorer: the variant explorer shows which cases follow the same control-flow behavior (see Figure 1b).</p>
          </list-item>
          <list-item>
            <p>Dotted Chart: PMTK implements the dotted chart analysis, i.e., a visualization of events over time [<xref ref-type="bibr" rid="ref7">7</xref>].</p>
          </list-item>
          <list-item>
            <p>Performance View: PMTK implements the performance spectrum, as described in [<xref ref-type="bibr" rid="ref8">8</xref>].</p>
          </list-item>
        </list>
      </p>
      <fig id="fig1">
        <label>Figure 1</label>
        <caption>
          <p>(a) Screenshot of the work space. Two processes are defined, each containing an event log; the event log of the second process is selected. (b) Example screenshot of the Variant Explorer.</p>
        </caption>
      </fig>
      <p><sup>1</sup>Based on PMTK release 0.1.1, dated September 2, 2021. PMTK is available via http://pmtk.fit.fraunhofer.de.</p>
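      <p>The statistics and log exploration views rest on a few simple computations over an event log. The following library-independent Python sketch illustrates them on a hypothetical three-case toy log: activity occurrences, average events per case, throughput time taken as case duration, variants as groups of cases sharing the same activity sequence, dotted-chart coordinates, and performance-spectrum intervals for a pair of directly-following activities. All names are hypothetical; computing the relative occurrence with respect to the total number of events is an assumption, and PMTK itself relies on PM4Py, whose API is not reproduced here.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical toy event log: (case id, activity, timestamp) triples.
EVENTS = [
    ("c1", "register", datetime(2021, 9, 1, 9, 0)),
    ("c1", "check", datetime(2021, 9, 1, 10, 0)),
    ("c1", "pay", datetime(2021, 9, 1, 12, 0)),
    ("c2", "register", datetime(2021, 9, 1, 9, 30)),
    ("c2", "pay", datetime(2021, 9, 1, 11, 30)),
    ("c3", "register", datetime(2021, 9, 2, 8, 0)),
    ("c3", "check", datetime(2021, 9, 2, 8, 45)),
    ("c3", "pay", datetime(2021, 9, 2, 9, 0)),
]


def traces(events):
    """Group events per case as (timestamp, activity), sorted by time."""
    per_case = defaultdict(list)
    for case, activity, ts in events:
        per_case[case].append((ts, activity))
    return {case: sorted(evs) for case, evs in per_case.items()}


def activity_occurrences(events):
    """Absolute count and share of all events, per activity."""
    absolute = Counter(activity for _, activity, _ in events)
    total = sum(absolute.values())
    return {a: (n, n / total) for a, n in absolute.items()}


def average_events_per_case(events):
    return len(events) / len(traces(events))


def throughput_times(events):
    """Case duration: timestamp of the last minus the first event."""
    return {c: evs[-1][0] - evs[0][0] for c, evs in traces(events).items()}


def variants(events):
    """Variant explorer: group cases by their sequence of activities."""
    groups = defaultdict(list)
    for case, evs in traces(events).items():
        groups[tuple(a for _, a in evs)].append(case)
    return dict(groups)


def dotted_chart(events):
    """Dotted chart: one dot per event at (time, case row), coloured by activity."""
    rows = {case: i for i, case in enumerate(traces(events))}
    return [(ts, rows[case], activity) for case, activity, ts in events]


def performance_spectrum(events, source, target):
    """Performance spectrum for the segment source -> target: one
    (start, end) interval per occurrence of target directly after source."""
    intervals = []
    for evs in traces(events).values():
        for (t1, a1), (t2, a2) in zip(evs, evs[1:]):
            if (a1, a2) == (source, target):
                intervals.append((t1, t2))
    return sorted(intervals)


if __name__ == "__main__":
    print(activity_occurrences(EVENTS))
    print(variants(EVENTS))
    print(performance_spectrum(EVENTS, "register", "check"))
```
]]></preformat>
      <p>On the toy log, <monospace>variants</monospace> groups cases c1 and c3 under the sequence register, check, pay and case c2 under register, pay, which is the kind of grouping a variant explorer lists. Each performance-spectrum interval corresponds to one line drawn between the two time axes of a segment.</p>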
      <p>Process Map: PMTK implements a process map with
various filtering options, i.e., filtering of edges and activities.
The implemented layout algorithm is based on [<xref ref-type="bibr" rid="ref9">9</xref>].</p>
      <p>
        4) Integrated Filtering: In PMTK, event data filtering is
considered a first-class citizen. As such, various event data
filtering functionalities have been implemented. The user
can specify custom filters, e.g., based on start/end activities
or time ranges. Most of the analysis functionalities described
in Section II-3 provide interactive filtering functionality
as well. The created filters can be stored in the work space,
e.g., to be re-applied at a later stage of the analysis.
      </p>
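      <p>Process maps are commonly drawn as directly-follows graphs. Assuming that representation, the following Python sketch shows frequency-based filtering of activities and edges, together with a reusable filter object for start/end activities and time ranges of the kind that could be stored in the work space. The class and function names are hypothetical, the time-range semantics (the whole case must lie within the range) is one assumed choice among several, and the code does not use PM4Py or PMTK APIs.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, FrozenSet, List, Optional, Tuple

# A trace is the list of (timestamp, activity) events of one case, sorted by time.
Trace = List[Tuple[datetime, str]]

# Hypothetical toy log, keyed by case id.
LOG: Dict[str, Trace] = {
    "c1": [(datetime(2021, 9, 1, 9), "register"),
           (datetime(2021, 9, 1, 10), "check"),
           (datetime(2021, 9, 1, 12), "pay")],
    "c2": [(datetime(2021, 9, 1, 9, 30), "register"),
           (datetime(2021, 9, 1, 11, 30), "pay")],
    "c3": [(datetime(2021, 9, 2, 8), "register"),
           (datetime(2021, 9, 2, 8, 45), "check")],
}


def process_map(log, min_activity=1, min_edge=1):
    """Directly-follows graph with activity and edge frequency filtering."""
    activities = Counter(a for trace in log.values() for _, a in trace)
    edges = Counter(
        (a, b)
        for trace in log.values()
        for (_, a), (_, b) in zip(trace, trace[1:])
    )
    nodes = {a: n for a, n in activities.items() if n >= min_activity}
    # Keep an edge only if it is frequent enough and both endpoints survive.
    arcs = {
        (a, b): n
        for (a, b), n in edges.items()
        if n >= min_edge and a in nodes and b in nodes
    }
    return nodes, arcs


@dataclass(frozen=True)
class TraceFilter:
    """Filter definition that could be stored and re-applied later."""
    start_activities: Optional[FrozenSet[str]] = None
    end_activities: Optional[FrozenSet[str]] = None
    time_from: Optional[datetime] = None
    time_to: Optional[datetime] = None

    def keeps(self, trace: Trace) -> bool:
        if not trace:
            return False
        (first_ts, first_act), (last_ts, last_act) = trace[0], trace[-1]
        if self.start_activities is not None and first_act not in self.start_activities:
            return False
        if self.end_activities is not None and last_act not in self.end_activities:
            return False
        # Assumed time-range semantics: the whole case must lie in the range.
        if self.time_from is not None and first_ts < self.time_from:
            return False
        if self.time_to is not None and last_ts > self.time_to:
            return False
        return True

    def apply(self, log: Dict[str, Trace]) -> Dict[str, Trace]:
        return {case: t for case, t in log.items() if self.keeps(t)}


if __name__ == "__main__":
    only_paid = TraceFilter(end_activities=frozenset({"pay"}))
    print(sorted(only_paid.apply(LOG)))  # cases ending with "pay"
    print(process_map(LOG, min_edge=2))  # infrequent edges removed
```
]]></preformat>
      <p>On the toy log, <monospace>only_paid</monospace> keeps cases c1 and c2, and with <monospace>min_edge=2</monospace> only the edge from register to check (observed twice) remains in the map. Because a filter is an immutable value, it can be serialized into the work space and applied again before any of the analyses.</p>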
    </sec>
    <sec id="sec-2">
      <title>III. CONCLUSION</title>
      <p>Various software solutions exist that are able to translate
recorded event data into operational insights into the historical
execution of a process. However, commercial applications only
offer a small fraction of the algorithmic possibilities
available in the process mining literature. Academic and open-source
solutions do provide a larger range of functionalities,
yet often in a non-intuitive manner. In this paper, we have
presented the Process Mining ToolKit (PMTK), which aims
to bridge this gap by integrating advanced algorithms into
a user-friendly environment. As such, PMTK can
be seen as a front-end solution for the advanced open-source
process mining library PM4Py.</p>
      <p>
        Tool Maturity &amp; Novelty: The Fraunhofer FIT process
mining team has developed PMTK to provide an extensible,
customizable, easy-to-maintain product to its R&amp;D project
partners. Compared to [<xref ref-type="bibr" rid="ref6">6</xref>], the web service architecture has
been redesigned to increase the tool’s modularity. We have
adopted object-relational mapping for multi-database support,
which supports different artefacts (e.g., the integrated
filters) for the same process, exposed to the user as the work space.
Furthermore, all visualizations are now rendered in the
front-end, the layout algorithm of the process map has been
redesigned, and a performance overlay has been added. All
log exploration analyses, i.e., the dotted chart, the variant explorer, and the
performance spectrum, are new with respect to our previous work.
      </p>
      <p>Future Work: Adopting new functionalities in PMTK is
fairly straightforward, i.e., any algorithm in PM4Py can be
adopted by exposing it as a web service in PM4PyWS
and designing a corresponding
visualization. As future work, we aim to integrate several
new functionalities, e.g., various process discovery algorithms,
uploading and editing of process models, and conformance
checking functionalities. We additionally aim to support event
logs that are stored in a distributed environment.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>W. M. P. van der Aalst</surname>
          </string-name>
          ,
          <source>Process Mining - Data Science in Action, Second Edition</source>
          . Springer,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>E.</given-names>
            <surname>Verbeek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. C. A. M.</given-names>
            <surname>Buijs</surname>
          </string-name>
          ,
          <string-name>
            <surname>B. F. van Dongen</surname>
          </string-name>
          , and
          <string-name>
            <surname>W. M. P. van der Aalst</surname>
          </string-name>
          , “
          <article-title>ProM 6: The Process Mining Toolkit</article-title>
          ,” in
          <source>BPM Demonstration Track 2010</source>
          , Hoboken, NJ, USA, September 14-16, 2010, ser. CEUR Workshop Proceedings, M. L. Rosa, Ed., vol.
          <volume>615</volume>
          . CEUR-WS.org,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>M. L.</given-names>
            <surname>Rosa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. A.</given-names>
            <surname>Reijers</surname>
          </string-name>
          ,
          <string-name>
            <surname>W. M. P. van der Aalst</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. M.</given-names>
            <surname>Dijkman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Mendling</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Dumas</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L.</given-names>
            <surname>García-Bañuelos</surname>
          </string-name>
          , “
          <article-title>APROMORE: An Advanced Process Model Repository</article-title>
          ,”
          <source>Expert Syst. Appl.</source>
          , vol.
          <volume>38</volume>
          , no.
          <issue>6</issue>
          , pp.
          <fpage>7029</fpage>
          -
          <lpage>7040</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>A.</given-names>
            <surname>Berti</surname>
          </string-name>
          ,
          <string-name>
            <surname>S. J. van Zelst</surname>
          </string-name>
          , and
          <string-name>
            <surname>W. M. P. van der Aalst</surname>
          </string-name>
          , “
          <article-title>Process Mining for Python (PM4Py): Bridging the Gap Between Process- and Data Science</article-title>
          ,” in
          <source>ICPM Demo Track 2019</source>
          , Aachen, Germany, June 24-26,
          <year>2019</year>
          , pp.
          <fpage>13</fpage>
          -
          <lpage>16</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>G.</given-names>
            <surname>Janssenswillen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Depaire</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Swennen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Jans</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K.</given-names>
            <surname>Vanhoof</surname>
          </string-name>
          , “
          <article-title>bupaR: Enabling Reproducible Business Process Analysis</article-title>
          ,”
          <source>Knowl. Based Syst.</source>
          , vol.
          <volume>163</volume>
          , pp.
          <fpage>927</fpage>
          -
          <lpage>930</lpage>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>A.</given-names>
            <surname>Berti</surname>
          </string-name>
          and
          <string-name>
            <surname>S. J. van Zelst</surname>
          </string-name>
          , “
          <article-title>PM4Py Web Services: Easy Development, Integration and Deployment of Process Mining Features in any Application Stack</article-title>
          ,” in
          <source>BPM Demonstration Track 2019</source>
          , Vienna, Austria, September 1-6, 2019, ser. CEUR Workshop Proceedings, vol.
          <volume>2420</volume>
          . CEUR-WS.org,
          <year>2019</year>
          , pp.
          <fpage>174</fpage>
          -
          <lpage>178</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>M.</given-names>
            <surname>Song</surname>
          </string-name>
          and
          <string-name>
            <surname>W. M. P. van der Aalst</surname>
          </string-name>
          , “
          <article-title>Supporting process mining by showing events at a glance</article-title>
          ,” in
          <source>Proceedings of the 17th Annual Workshop on Information Technologies and Systems (WITS)</source>
          ,
          <year>2007</year>
          , pp.
          <fpage>139</fpage>
          -
          <lpage>145</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>V.</given-names>
            <surname>Denisov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Belkina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Fahland</surname>
          </string-name>
          , and
          <string-name>
            <surname>W. M. P. van der Aalst</surname>
          </string-name>
          , “
          <article-title>The performance spectrum miner: Visual analytics for fine-grained performance analysis of processes</article-title>
          ,” in
          <source>BPM Demonstration Track 2018</source>
          , Sydney, Australia, September 9-14, 2018, ser. CEUR Workshop Proceedings, vol.
          <volume>2196</volume>
          . CEUR-WS.org,
          <year>2018</year>
          , pp.
          <fpage>96</fpage>
          -
          <lpage>100</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>R. J. P.</given-names>
            <surname>Mennens</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Scheepens</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Westenberg</surname>
          </string-name>
          , “
          <article-title>A stable graph layout algorithm for processes</article-title>
          ,”
          <source>Comput. Graph. Forum</source>
          , vol.
          <volume>38</volume>
          , no.
          <issue>3</issue>
          , pp.
          <fpage>725</fpage>
          -
          <lpage>737</lpage>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>