<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Meets the KNIME Analytics Platform (Extended Abstract)</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Humam Kourani</string-name>
          <email>humam.kourani@fit.fraunhofer.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sebastiaan van Zelst</string-name>
          <email>sebastiaan.van.zelst@fit.fraunhofer.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Barry-Detlef Lehmann</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Gabriel Einsdorf</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Stefan Helfrich</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Fabian Liße</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Fraunhofer Institute for Applied Information Technology FIT</institution>
          ,
          <addr-line>Schloss Birlinghoven, 53757 Sankt Augustin</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>KNIME GmbH</institution>
          ,
          <addr-line>Reichenaustr. 11, 78467 Konstanz</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2022</year>
      </pub-date>
      <fpage>65</fpage>
      <lpage>69</lpage>
      <abstract>
        <p>Process mining allows organizations to transform the data recorded during the execution of their processes into meaningful insights. These insights can help to detect problems and to improve the processes. Various process mining solutions have been developed, both for industrial and academic purposes. However, most of these solutions do not support the creation and execution of analytics workflows. The KNIME Analytics Platform (KNIME in short) is an open-source workflow-based analytics platform that supports various techniques in the field of data science. KNIME is widely used in numerous industries across many countries. This paper presents the process mining extension for KNIME, which integrates many powerful process mining algorithms into KNIME. Using the process mining extension of KNIME, process mining can be combined with other types of data science techniques available in KNIME.</p>
      </abstract>
      <kwd-group>
        <kwd>process mining</kwd>
        <kwd>data science</kwd>
        <kwd>workflow</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-2">
      <title>1. Introduction</title>
      <p>
        Process mining helps to analyze and monitor processes based on the events recorded during
their execution. The goal of process mining is to extract information from these events to allow
organizations to detect problems in their processes and improve decision-making. The field of
process mining [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] covers all techniques for discovering process models, checking conformance
between event data and process models, and recommending process enhancements.
      </p>
      <p>
        The growing interest in process mining led to the development of numerous process mining
tools. ProM [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] is one of the most powerful (academic) process mining tools available: it
contains hundreds of plugins that implement numerous process mining algorithms. However,
its academic nature hampers integration in other applications, and it does not support the
creation and execution of analytical workflows. To bring process mining into a user-friendly
workflow-based environment, we present the open-source process mining extension of KNIME:
PM4KNIME.
      </p>
      <p>
        KNIME [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] is an open-source workflow-based analytics platform that supports various
techniques in the field of data science, e.g., machine learning, data mining, and modeling.
Workflows are built in KNIME by sequentially connecting different nodes, where each node is
dedicated to performing a specific task based on the results of the preceding nodes. The KNIME
Hub (https://hub.knime.com/) contains thousands of workflows ready to be applied to data sets.
KNIME provides extensions and nodes for integrating many projects, systems, web services,
and databases. For example, it supports the integration of Python (https://www.python.org/),
Apache Spark (https://spark.apache.org/), MongoDB (https://www.mongodb.com/), and many
cloud storage systems. KNIME Server is commercial software that enables collaboration between
users and supports automated and distributed executions of workflows, deployment options,
workflow management, and monitoring functionalities.
      </p>
      <p>Thanks to its ease of use and high scalability (distributed executors on the KNIME Server, big
data, and cloud integration), KNIME software is used by hundreds of companies in numerous
industries. PM4KNIME integrates process mining algorithms implemented in ProM into KNIME.
This allows for creating analytics workflows that combine process mining with the other types of
data science techniques available in KNIME in a scalable, user-friendly environment. Instructions
on how to install PM4KNIME can be found at https://pm4knime.github.io/userDoc/guides/installation.</p>
    </sec>
    <sec id="sec-3">
      <title>2. Tool Overview</title>
      <p>In this section, we provide an overview of PM4KNIME. A screen recording corresponding to
this overview is available under https://pm4knime.github.io/userDoc/guides/demo.</p>
      <sec id="sec-3-1">
        <title>2.1. KNIME Workflows</title>
        <p>KNIME stores data in table-based objects called DataTables. Algorithms in KNIME are
implemented as nodes. A node can have multiple input ports, output ports, views, and dialogs. The
input ports should be connected to the input objects required for executing the underlying
algorithm of the node. The dialogs are used to set the parameters of the algorithm. After the
successful termination of an algorithm, the output objects can be accessed through the output
ports. A workflow in KNIME is a directed graph connecting multiple nodes through their input
and output ports.</p>
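        <p>The node and port model described above can be illustrated with a minimal Python sketch (Python 3.9 or later). It is not KNIME's API: the Node class, the run_workflow function, and the toy data are hypothetical names introduced only to show how a directed graph of nodes can be executed so that every node runs after the nodes connected to its input ports.</p>
        <preformat>
```python
from graphlib import TopologicalSorter


class Node:
    """Hypothetical stand-in for a KNIME node: one algorithm plus its dialog settings."""

    def __init__(self, name, algorithm, **settings):
        self.name = name
        self.algorithm = algorithm  # callable(inputs, **settings) returning the output object
        self.settings = settings    # parameters that would normally be set in the node dialog


def run_workflow(nodes, connections):
    """Execute every node after all of its predecessors.

    connections maps a node name to the ordered list of node names
    that feed its input ports.
    """
    by_name = {node.name: node for node in nodes}
    graph = {node.name: connections.get(node.name, []) for node in nodes}
    outputs = {}
    for name in TopologicalSorter(graph).static_order():
        inputs = [outputs[pred] for pred in connections.get(name, [])]
        node = by_name[name]
        outputs[name] = node.algorithm(inputs, **node.settings)
    return outputs


# Toy workflow (assumed data): read a table, filter it, count the remaining rows.
rows = [
    {"case": "c1", "activity": "Create ticket"},
    {"case": "c1", "activity": "Resolve ticket"},
    {"case": "c2", "activity": "Create ticket"},
]
nodes = [
    Node("reader", lambda inputs: rows),
    Node(
        "filter",
        lambda inputs, activity: [r for r in inputs[0] if r["activity"] == activity],
        activity="Create ticket",
    ),
    Node("counter", lambda inputs: len(inputs[0])),
]
connections = {"filter": ["reader"], "counter": ["filter"]}
results = run_workflow(nodes, connections)
print(results["counter"])
```
        </preformat>
        <p>In this sketch, the settings passed to a node play the role of the dialog parameters, and the outputs dictionary plays the role of the output ports from which successor nodes read their inputs.</p>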
      </sec>
      <sec id="sec-3-2">
        <title>2.2. Functionalities</title>
        <p>PM4KNIME currently supports the following functionalities (see
https://hub.knime.com/pm4knime/extensions/org.pm4knime.feature/latest for a complete overview of all
available functionalities):</p>
        <list list-type="bullet">
          <list-item><p>Importing and exporting different objects (e.g., Petri nets).</p></list-item>
          <list-item><p>Exploring event logs (e.g., dotted chart).</p></list-item>
          <list-item><p>Converting objects (e.g., XES logs into DataTables).</p></list-item>
          <list-item><p>Event data manipulation (e.g., filtering).</p></list-item>
          <list-item><p>Process discovery (e.g., inductive miner).</p></list-item>
          <list-item><p>Conformance checking (e.g., alignment-based replay).</p></list-item>
          <list-item><p>JavaScript visualizations (e.g., for Petri nets).</p></list-item>
        </list>
        <p>Most implemented nodes work on DataTables. Internally, the nodes wrap the
implementations of the underlying process mining techniques provided by the plugins available in ProM.</p>
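        <p>To give an impression of what working directly on table-based event data can look like, the following Python sketch counts directly-follows relations between activities. This is a deliberately simple discovery step and not the inductive miner or any PM4KNIME node; the function name, the column names (case, activity, timestamp), and the example rows are assumptions made for illustration.</p>
        <preformat>
```python
from collections import Counter, defaultdict


def directly_follows(table, case_col="case", activity_col="activity", time_col="timestamp"):
    """Count how often one activity is directly followed by another within a case."""
    traces = defaultdict(list)
    # Order events per case by timestamp, then collect the activity sequence of each case.
    for row in sorted(table, key=lambda r: (r[case_col], r[time_col])):
        traces[row[case_col]].append(row[activity_col])
    counts = Counter()
    for activities in traces.values():
        counts.update(zip(activities, activities[1:]))
    return counts


# Assumed example rows, one event per row, as they might appear in a table.
table = [
    {"case": "c1", "activity": "Create ticket", "timestamp": 1},
    {"case": "c1", "activity": "Assign ticket", "timestamp": 2},
    {"case": "c1", "activity": "Resolve ticket", "timestamp": 3},
    {"case": "c2", "activity": "Create ticket", "timestamp": 1},
    {"case": "c2", "activity": "Resolve ticket", "timestamp": 2},
]
dfg = directly_follows(table)
for (source, target), count in sorted(dfg.items()):
    print(source, "->", target, count)
```
        </preformat>
        <p>Each row is one event, so no conversion into an XES log is needed before the relation is computed.</p>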
      </sec>
      <sec id="sec-3-3">
        <title>2.3. Example Workflows</title>
        <p>Figure 1 shows a typical workflow in the field of process mining. It contains nodes for importing
data from a CSV file, preprocessing, process discovery, JavaScript visualizations of the discovered
models, model and data transformation, and conformance checking. We applied this workflow
to a real-life data set that records the execution of a ticketing management process [4]. Further
workflow examples are available on the KNIME Hub under: https://kni.me/s/VJqKc-EypN7Jkrl2.</p>
      </sec>
      <sec id="sec-3-4">
        <title>2.4. Tool Novelty</title>
        <p>In [5], RapidProM was introduced as an extension of RapidMiner. It integrates process mining
algorithms from ProM into the workflow-based platform RapidMiner. The idea of [5] is similar to
our contribution, but PM4KNIME provides some features that differentiate it from RapidMiner.</p>
        <p>
          We adapted some process mining techniques not supported in RapidMiner (e.g., the hybrid
Petri net miner). Moreover, most implemented algorithms in PM4KNIME work on DataTables
(not XES logs), while internally wrapping the implementations of the underlying process mining
techniques in ProM. In data science, data is often stored in table-based files (e.g., CSV files) that
can be easily imported as DataTables in KNIME. Applying process mining algorithms directly
on DataTables improves run-time performance because KNIME uses powerful caching strategies
that ensure high scalability when processing large DataTables [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ].
        </p>
        <p>The KNIME Server provides many valuable features for organizations, such as automated
and distributed executions of workflows, deployment options, workflow management, and
monitoring functionalities. PM4KNIME provides JavaScript visualizations for the different types
of supported process models. This allows for building interactive web-based applications using
the deployment options on the KNIME Server.</p>
        <p>Both RapidProM and PM4KNIME allow for saving workflows to be reused later. However,
PM4KNIME additionally supports the serialization of intermediate results. Each node in a
KNIME workflow processes its entire input data and permanently stores its output before
forwarding it to the successor nodes. By saving a workflow, the settings of all nodes and
all already generated (intermediate) objects are stored together with the workflow structure.
Therefore, it is possible to stop the execution of a KNIME workflow at any node. The workflow
can be modified and saved to be resumed later without needing to re-execute already executed
nodes that are not affected by any modifications. For all implemented (intermediate) objects in
PM4KNIME, we created internal importers and exporters to support the serialization of results.</p>
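        <p>The idea of resuming a saved workflow without re-executing unaffected nodes can be sketched as follows (Python 3.9 or later). This is only an illustration under simplifying assumptions, not how KNIME or PM4KNIME store results: the functions downstream and resume, the JSON-based saving, and the toy node names are all hypothetical.</p>
        <preformat>
```python
import json
from graphlib import TopologicalSorter


def downstream(connections, changed):
    """Return the changed node plus every node that depends on it, directly or indirectly."""
    affected = {changed}
    grew = True
    while grew:
        grew = False
        for node, preds in connections.items():
            if node not in affected and affected.intersection(preds):
                affected.add(node)
                grew = True
    return affected


def resume(algorithms, connections, stored):
    """Run only the nodes without a stored output and return the names of executed nodes."""
    executed = []
    graph = {name: connections.get(name, []) for name in algorithms}
    for name in TopologicalSorter(graph).static_order():
        if name in stored:
            continue  # output was saved with the workflow, so the node is not re-executed
        inputs = [stored[pred] for pred in connections.get(name, [])]
        stored[name] = algorithms[name](inputs)
        executed.append(name)
    return executed


connections = {"filter": ["reader"], "discover": ["filter"], "export": ["reader"]}
algorithms = {
    "reader": lambda inputs: ["a", "b", "a", "c"],
    "filter": lambda inputs: [x for x in inputs[0] if x != "c"],
    "discover": lambda inputs: sorted(set(inputs[0])),
    "export": lambda inputs: len(inputs[0]),
}

stored = {}
first_run = resume(algorithms, connections, stored)

# Saving: intermediate outputs are serialized together with the workflow structure.
saved = json.dumps({"connections": connections, "outputs": stored})

# Later: reload, modify the filter node, and discard only the results it affects.
stored = json.loads(saved)["outputs"]
algorithms["filter"] = lambda inputs: [x for x in inputs[0] if x != "b"]
for name in downstream(connections, "filter"):
    stored.pop(name, None)
second_run = resume(algorithms, connections, stored)
print(first_run, second_run)
```
        </preformat>
        <p>After the modification, only the filter node and its successor are executed again; the reader and export nodes keep their stored outputs.</p>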
      </sec>
    </sec>
    <sec id="sec-4">
      <title>3. Conclusion</title>
      <p>In this paper, we introduced the process mining extension of KNIME (PM4KNIME). PM4KNIME
integrates process mining algorithms that are implemented in the academic process mining
tool ProM into a workflow-based data science analytics platform that is widely used in industry.
The process mining extension of KNIME supports many techniques for process discovery,
conformance checking, event data manipulation, and visualization of process models.</p>
      <p>As future work, we aim to adapt further algorithms to work directly on DataTables instead
of XES logs (e.g., conformance checking algorithms). Moreover, we aim to support more
types of process models (e.g., BPMN models) and to integrate more process mining algorithms
from ProM and/or other academic tools like PM4Py (http://pm4py.org/).</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
      <p>The authors would like to thank Kefang Ding and Ralf Riesen for their contribution to
PM4KNIME.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>W. M. P.</given-names>
            <surname>van der Aalst</surname>
          </string-name>
          ,
          <source>Process Mining - Data Science in Action, Second Edition</source>
          , Springer,
          <year>2016</year>
          . URL: https://doi.org/10.1007/978-3-662-49851-4. doi:10.1007/978-3-662-49851-4.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>B. F.</given-names>
            <surname>van Dongen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. K. A.</given-names>
            <surname>de Medeiros</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. M. W.</given-names>
            <surname>Verbeek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. J. M. M.</given-names>
            <surname>Weijters</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W. M. P.</given-names>
            <surname>van der Aalst</surname>
          </string-name>
          ,
          <article-title>The ProM Framework: A New Era in Process Mining Tool Support</article-title>
          , in: G. Ciardo, P. Darondeau (Eds.),
          <source>Applications and Theory of Petri Nets 2005, 26th International Conference, ICATPN 2005, Miami, USA, June 20-25, 2005, Proceedings</source>
          , volume
          <volume>3536</volume>
          of Lecture Notes in Computer Science, Springer,
          <year>2005</year>
          , pp.
          <fpage>444</fpage>
          -
          <lpage>454</lpage>
          . URL: https://doi.org/10.1007/11494744_25. doi:10.1007/11494744_25.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>M. R.</given-names>
            <surname>Berthold</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Cebron</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Dill</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. R.</given-names>
            <surname>Gabriel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Kötter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Meinl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Ohl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Sieb</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Thiel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Wiswedel</surname>
          </string-name>
          ,
          <article-title>KNIME: The Konstanz Information Miner</article-title>
          , in: C. Preisach, H. Burkhardt, L. Schmidt-Thieme, R. Decker (Eds.),
          <source>Data Analysis, Machine Learning and Applications - Proceedings of the 31st Annual Conference of the Gesellschaft für Klassifikation e.V., Albert-Ludwigs-Universität Freiburg, March 7-9, 2007</source>
          , Studies in Classification, Data Analysis, and Knowledge Organization, Springer,
          <year>2007</year>
          , pp.
          <fpage>319</fpage>
          –
          <lpage>326</lpage>
          . URL: https://doi.org/10.1007/978-3-540-78246-9_38. doi:10.1007/978-3-540-78246-9_38.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>M.</given-names>
            <surname>Polato</surname>
          </string-name>
          ,
          <source>Dataset belonging to the help desk log of an Italian Company</source>
          (
          <year>2017</year>
          ). URL: https://data.4tu.nl/articles/dataset/Dataset_belonging_to_the_help_desk_log_of_an_Italian_Company/12675977. doi:10.4121/uuid:0c60edf1-6f83-4e75-9367-4c63b3e9d5bb.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>R.</given-names>
            <surname>Mans</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W. M. P.</given-names>
            <surname>van der Aalst</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. M. W.</given-names>
            <surname>Verbeek</surname>
          </string-name>
          ,
          <article-title>Supporting Process Mining Workflows with RapidProM</article-title>
          , in: L. Limonad, B. Weber (Eds.),
          <source>Proceedings of the BPM Demo Sessions 2014 Co-located with the 12th International Conference on Business Process Management (BPM 2014), Eindhoven, The Netherlands, September 10, 2014</source>
          , volume
          <volume>1295</volume>
          of CEUR Workshop Proceedings, CEUR-WS.org,
          <year>2014</year>
          , p.
          <fpage>56</fpage>
          . URL: http://ceur-ws.org/Vol-1295/paper5.pdf.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>