=Paper=
{{Paper
|id=Vol-3299/Paper14
|storemode=property
|title=PM4KNIME: Process Mining Meets the KNIME Analytics Platform (Extended Abstract)
|pdfUrl=https://ceur-ws.org/Vol-3299/Paper14.pdf
|volume=Vol-3299
|authors=Humam Kourani,Sebastiaan van Zelst,Barry-Detlef Lehmann,Gabriel Einsdorf,Stefan Helfrich,Fabian Liße
|dblpUrl=https://dblp.org/rec/conf/icpm/KouraniZLEHL22
}}
==PM4KNIME: Process Mining Meets the KNIME Analytics Platform (Extended Abstract)==
Humam Kourani<sup>1,∗</sup>, Sebastiaan van Zelst<sup>1</sup>, Barry-Detlef Lehmann<sup>1</sup>, Gabriel Einsdorf<sup>2</sup>, Stefan Helfrich<sup>2</sup> and Fabian Liße<sup>2</sup>

<sup>1</sup> Fraunhofer Institute for Applied Information Technology FIT, Schloss Birlinghoven, 53757 Sankt Augustin, Germany

<sup>2</sup> KNIME GmbH, Reichenaustr. 11, 78467 Konstanz, Germany

===Abstract===
Process mining allows organizations to transform the data recorded during the execution of their processes into meaningful insights. These insights can help to detect problems and to improve the processes. Various process mining solutions have been developed, both for industrial and academic purposes. However, most of these solutions do not support the creation and execution of analytics workflows. The KNIME Analytics Platform (KNIME in short) is an open-source workflow-based analytics platform that supports various techniques in the field of data science. KNIME is widely used in numerous industries across many countries. This paper presents the process mining extension for KNIME, which integrates many powerful process mining algorithms into KNIME. Using the process mining extension of KNIME, process mining can be combined with other types of data science techniques available in KNIME.

===Keywords===
process mining, data science, workflow

===1. Introduction===
Process mining helps to analyze and monitor processes based on the events recorded during their execution. The goal of process mining is to extract information from these events to allow organizations to detect problems in their processes and improve decision-making. The field of process mining [1] covers all techniques for discovering process models, checking conformance between event data and process models, and recommending process enhancements. The growing interest in process mining led to the development of numerous process mining tools.
ProM [2] is one of the most powerful (academic) process mining tools available; it contains hundreds of plugins that implement numerous process mining algorithms. However, its academic nature hampers integration into other applications, and it does not support the creation and execution of analytical workflows. To bring process mining into a user-friendly workflow-based environment, we present the open-source process mining extension of KNIME: PM4KNIME. KNIME [3] is an open-source workflow-based analytics platform that supports various techniques in the field of data science, e.g., machine learning, data mining, and modeling.

ICPM 2022 Doctoral Consortium and Tool Demonstration Track. ∗ Corresponding author. Email: humam.kourani@fit.fraunhofer.de (H. Kourani); sebastiaan.van.zelst@fit.fraunhofer.de (S. van Zelst). © 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073, http://ceur-ws.org

Workflows are built in KNIME by sequentially connecting different nodes, where each node is dedicated to performing a specific task based on the results of the preceding nodes. The KNIME Hub (https://hub.knime.com/) contains thousands of workflows ready to be applied to data sets. KNIME provides extensions and nodes for integrating many projects, systems, web services, and databases. For example, it supports the integration of Python (https://www.python.org/), Apache Spark (https://spark.apache.org/), MongoDB (https://www.mongodb.com/), and many cloud storage systems. KNIME Server is commercial software that enables collaboration between users and supports automated and distributed executions of workflows, deployment options, workflow management, and monitoring functionalities.
Thanks to its ease of use and high scalability (distributed executors on the KNIME Server, big data, and cloud integration), KNIME software is used by hundreds of companies in numerous industries. PM4KNIME integrates process mining algorithms implemented in ProM into KNIME. This allows for creating analytics workflows that combine process mining with the other types of data science techniques available in KNIME in a scalable, user-friendly environment. Instructions on how to install PM4KNIME can be found at https://pm4knime.github.io/userDoc/guides/installation.

===2. Tool Overview===
In this section, we provide an overview of PM4KNIME. A screen recording corresponding to this overview is available at https://pm4knime.github.io/userDoc/guides/demo.

====2.1. KNIME Workflows====
KNIME stores data in table-based objects called DataTables. Algorithms in KNIME are implemented as nodes. A node can have multiple input ports, output ports, views, and dialogs. The input ports should be connected to the input objects required for executing the underlying algorithm of the node. The dialogs are used to set the parameters of the algorithm. After the successful termination of an algorithm, the output objects can be accessed through the output ports. A workflow in KNIME is a directed graph connecting multiple nodes through their input and output ports.

====2.2. Functionalities====
PM4KNIME currently supports:
* Importing and exporting different objects (e.g., Petri net).
* Exploring event logs (e.g., dotted chart).
* Converting objects (e.g., XES logs into DataTables).
* Event data manipulation (e.g., filtering).
* Process discovery (e.g., inductive miner).
* Conformance checking (e.g., alignment-based replay).
* JavaScript visualizations (e.g., for Petri nets).

See https://hub.knime.com/pm4knime/extensions/org.pm4knime.feature/latest for a complete overview of all available functionalities.

Figure 1: Example process mining workflow in KNIME.

Most implemented nodes work on DataTables.
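To illustrate what a table-shaped event log looks like, the following minimal Python sketch stores a log as rows with case, activity, and timestamp columns and derives directly-follows counts from it. It is an illustration only and does not use PM4KNIME or ProM code; the column names (`case_id`, `activity`, `timestamp`), the sample rows, and the helper functions are assumptions made for this example.

```python
from collections import Counter, defaultdict

# Hypothetical event log in table form: one row per event.
# Column names are assumptions for this sketch, not PM4KNIME's schema.
EVENT_TABLE = [
    {"case_id": "c1", "activity": "Create ticket", "timestamp": 1},
    {"case_id": "c1", "activity": "Assign ticket", "timestamp": 2},
    {"case_id": "c2", "activity": "Create ticket", "timestamp": 3},
    {"case_id": "c1", "activity": "Resolve ticket", "timestamp": 4},
    {"case_id": "c2", "activity": "Resolve ticket", "timestamp": 5},
]


def traces_from_table(rows):
    """Group events by case and order each case by timestamp."""
    by_case = defaultdict(list)
    for row in rows:
        by_case[row["case_id"]].append(row)
    return {
        case: [e["activity"] for e in sorted(events, key=lambda e: e["timestamp"])]
        for case, events in by_case.items()
    }


def directly_follows(traces):
    """Count how often activity a is immediately followed by activity b."""
    counts = Counter()
    for activities in traces.values():
        counts.update(zip(activities, activities[1:]))
    return counts


if __name__ == "__main__":
    traces = traces_from_table(EVENT_TABLE)
    for (a, b), n in sorted(directly_follows(traces).items()):
        print(f"{a} -> {b}: {n}")
```

Keeping the log as plain rows means that generic table operations (filtering, sorting, grouping) apply to it before any process mining step runs.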
Internally, we wrap the implementations of the underlying process mining techniques from the plugins available in ProM.

====2.3. Example Workflows====
Figure 1 shows a typical workflow in the field of process mining. It contains nodes for importing data from a CSV file, preprocessing, process discovery, JavaScript visualizations of the discovered models, model and data transformation, and conformance checking. We applied this workflow to a real-life data set that records the execution of a ticketing management process [4]. Further workflow examples are available on the KNIME Hub at https://kni.me/s/VJqKc-EypN7Jkrl2.

====2.4. Tool Novelty====
In [5], RapidProM was introduced as an extension of RapidMiner. It integrates process mining algorithms from ProM into the workflow-based platform RapidMiner. The idea of [5] is similar to our contribution, but PM4KNIME provides some features that differentiate it from RapidMiner. We adapted some process mining techniques not supported in RapidMiner (e.g., hybrid Petri net miner). Moreover, most implemented algorithms in PM4KNIME work on DataTables (not XES logs). We wrapped the implementations of the underlying process mining techniques in ProM. In data science, data is often stored in table-based files (e.g., CSV files) that can be easily imported as DataTables in KNIME. Applying process mining algorithms directly on DataTables improves runtime performance because KNIME uses powerful caching strategies that ensure high scalability when processing large DataTables [3].

The KNIME Server provides many valuable features for organizations, such as automated and distributed executions of workflows, deployment options, workflow management, and monitoring functionalities. PM4KNIME provides JavaScript visualizations for the different types of supported process models. This allows for building interactive web-based applications using the deployment options on the KNIME Server.
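The workflow model from Section 2.1, a directed graph of nodes connected through their ports, can be sketched in a few lines of Python. The sketch below is not KNIME's API: the `Node` and `Workflow` classes, their methods, and the example node names are hypothetical. Each node keeps its output once executed, so a repeated run re-executes only a node that was reset and the nodes that depend on it.

```python
class Node:
    """A workflow step: a function applied to the outputs of its predecessors."""

    def __init__(self, name, func, inputs=()):
        self.name = name
        self.func = func
        self.inputs = list(inputs)  # predecessor nodes (incoming edges)
        self.output = None
        self.executed = False


class Workflow:
    """Hypothetical container for nodes; not a KNIME class."""

    def __init__(self):
        self.nodes = []
        self.log = []  # names of nodes that were actually executed

    def add(self, name, func, inputs=()):
        node = Node(name, func, inputs)
        self.nodes.append(node)
        return node

    def run(self, node):
        """Execute a node after its predecessors, reusing stored outputs."""
        if not node.executed:
            args = [self.run(pred) for pred in node.inputs]
            node.output = node.func(*args)
            node.executed = True
            self.log.append(node.name)
        return node.output

    def reset(self, node):
        """Invalidate a node and every node that depends on it."""
        node.executed = False
        node.output = None
        for other in self.nodes:
            if node in other.inputs:
                self.reset(other)


if __name__ == "__main__":
    wf = Workflow()
    reader = wf.add("read", lambda: [3, 1, 2])
    sorter = wf.add("sort", lambda rows: sorted(rows), [reader])
    total = wf.add("sum", lambda rows: sum(rows), [sorter])
    wf.run(total)     # executes read, sort, sum
    wf.reset(sorter)  # simulate a changed setting on the sort node
    wf.run(total)     # re-executes only sort and sum
    print(wf.log)
```

In this toy model the stored outputs live in memory only; persisting them together with the workflow structure would additionally require a serializer for every object type that flows between nodes.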
Both RapidProM and PM4KNIME allow for saving workflows to be reused later. However, PM4KNIME additionally supports the serialization of intermediate results. Each node in a KNIME workflow processes its entire input data and permanently stores its output before forwarding it to the successor nodes. When a workflow is saved, the settings of all nodes and all already generated (intermediate) objects are stored together with the workflow structure. Therefore, it is possible to stop the execution of a KNIME workflow at any node. The workflow can be modified and saved to be resumed later without needing to re-execute already executed nodes that are not affected by any modifications. For all implemented (intermediate) objects in PM4KNIME, we created internal importers and exporters to support the serialization of results.

===3. Conclusion===
In this paper, we introduced the process mining extension of KNIME (PM4KNIME). PM4KNIME integrates process mining algorithms that are implemented in the academic process mining tool ProM into a workflow-based data science analytics platform that is widely used in industry. The process mining extension of KNIME supports many techniques for process discovery, conformance checking, event data manipulation, and visualization of process models. As future work, we aim to adapt further algorithms to work directly on DataTables instead of XES logs (e.g., conformance checking algorithms). Moreover, we aim to support more types of process models (e.g., BPMN models) and to integrate more process mining algorithms from ProM and/or other academic tools like PM4Py (http://pm4py.org/).

===Acknowledgments===
The authors would like to thank Kefang Ding and Ralf Riesen for their contribution to PM4KNIME.

===References===
[1] W. M. P. van der Aalst, Process Mining - Data Science in Action, Second Edition, Springer, 2016. URL: https://doi.org/10.1007/978-3-662-49851-4. doi:10.1007/978-3-662-49851-4.

[2] B. F. van Dongen, A. K. A.
de Medeiros, H. M. W. Verbeek, A. J. M. M. Weijters, W. M. P. van der Aalst, The ProM Framework: A New Era in Process Mining Tool Support, in: G. Ciardo, P. Darondeau (Eds.), Applications and Theory of Petri Nets 2005, 26th International Conference, ICATPN 2005, Miami, USA, June 20-25, 2005, Proceedings, volume 3536 of Lecture Notes in Computer Science, Springer, 2005, pp. 444–454. URL: https://doi.org/10.1007/11494744_25. doi:10.1007/11494744_25.

[3] M. R. Berthold, N. Cebron, F. Dill, T. R. Gabriel, T. Kötter, T. Meinl, P. Ohl, C. Sieb, K. Thiel, B. Wiswedel, KNIME: The Konstanz Information Miner, in: C. Preisach, H. Burkhardt, L. Schmidt-Thieme, R. Decker (Eds.), Data Analysis, Machine Learning and Applications - Proceedings of the 31st Annual Conference of the Gesellschaft für Klassifikation e.V., Albert-Ludwigs-Universität Freiburg, March 7-9, 2007, Studies in Classification, Data Analysis, and Knowledge Organization, Springer, 2007, pp. 319–326. URL: https://doi.org/10.1007/978-3-540-78246-9_38. doi:10.1007/978-3-540-78246-9_38.

[4] M. Polato, Dataset belonging to the help desk log of an Italian Company (2017). URL: https://data.4tu.nl/articles/dataset/Dataset_belonging_to_the_help_desk_log_of_an_Italian_Company/12675977. doi:10.4121/uuid:0c60edf1-6f83-4e75-9367-4c63b3e9d5bb.

[5] R. Mans, W. M. P. van der Aalst, H. M. W. Verbeek, Supporting Process Mining Workflows with RapidProM, in: L. Limonad, B. Weber (Eds.), Proceedings of the BPM Demo Sessions 2014 Co-located with the 12th International Conference on Business Process Management (BPM 2014), Eindhoven, The Netherlands, September 10, 2014, volume 1295 of CEUR Workshop Proceedings, CEUR-WS.org, 2014, p. 56. URL: http://ceur-ws.org/Vol-1295/paper5.pdf.