=Paper=
{{Paper
|id=Vol-3758/paper-16
|storemode=property
|title=Rust4PM: A Versatile Process Mining Library for When Performance Matters
|pdfUrl=https://ceur-ws.org/Vol-3758/paper-16.pdf
|volume=Vol-3758
|authors=Aaron Küsters,Wil M.P. van der Aalst
|dblpUrl=https://dblp.org/rec/conf/bpm/KustersA24
}}
==Rust4PM: A Versatile Process Mining Library for When Performance Matters==
Rust4PM: A Versatile Process Mining Library for When Performance Matters Aaron Küsters1 , Wil M.P. van der Aalst1 1 Chair of Process and Data Science (PADS), RWTH Aachen University, Germany Abstract Rust4PM provides an open-source software library for process mining focused on performance. For instance, it supports parsing the most common data formats for standard and object-centric event logs (XES and OCEL 2.0) significantly faster than other available process mining tools. The library is written in the compiled, memory safe, programming language Rust, which focuses on performance and explicit error handling. As such, it is a good match for processing huge event data and other computationally expensive tasks or algorithms. Rust4PM aims to form a solid basis for new process mining software that emphasize execution speed or reliability, written in either Rust or another language (e.g., Python, Java, or JavaScript) via the use of appropriate language bindings. Keywords Process Mining, Event Data, XES Standard, Object-Centric Event Data, OCEL 2.0 Standard, Rust 1. Introduction Apart from commercial process mining software, such as Celonis or Fluxicon Disco, there are also a few open-source solutions available, like ProM, PM4Py, or bupaR. The ProM framework, first presented in [1] is implemented in Java and features a graphical user interface (GUI) and a plugin system, with a large ecosystem of available plugins. ProM also has limited support for automated tasks through a command line interface (CLI). PM4Py is a Python software library and was first introduced in 2019 [2]. Instead of using a plugin system, users can implement custom approaches in Python programs that use the functionality exposed by PM4Py. In [3], we presented an approach for implementing algorithms only once in Rust and ex- posing them via Java and Python bindings, making them available to both the PM4Py and ProM ecosystems, without any large re-implementation efforts. The initial implementation of prerequisites in Rust has since evolved into a standalone Rust library for process mining. In this paper, we introduce Rust4PM as a standalone process mining library project focused on performance. The project is available at https://github.com/aarkue/rust4pm. We first describe the features of Rust4PM in Section 2. Next, in Section 3, we compare the performance of the different event data parsers implemented in ProM, PM4Py, and Rust4PM. To show its versatility, we present four example applications developed using Rust4PM in Section 4, focusing on different architectures and features. Finally, we conclude this paper in Section 5. Proceedings of the Best BPM Dissertation Award, Doctoral Consortium, and Demonstrations & Resources Forum co-located with 22nd International Conference on Business Process Management (BPM 2024), Krakow, Poland, September 1st to 6th, 2024. $ kuesters@pads.rwth-aachen.de (A. Küsters); wvdaalst@pads.rwth-aachen.de (W. M.P. van der Aalst) 0009-0006-9195-5380 (A. Küsters); 0000-0002-0955-6940 (W. M.P. van der Aalst) © 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR ceur-ws.org Workshop ISSN 1613-0073 Proceedings 2. Features The main Rust4PM library is available at https://crates.io/crates/process_mining. It currently consists of four main modules, covering different functionalities. • The event_log module contains the data structure definitions for the internal represen- tation of traditional XES event logs. Additionally, it contains a full parser for the XES 2.0 standard, as well as export functionality for exporting event logs to XES. Moreover, there are case-streaming versions of the importer and exporter available, which allow processing cases of huge event logs without loading the full file in memory. • The ocel module contains data structures for the OCEL 2.0 standard for object-centric event data, as well as import functionality for OCEL 2.0 files in the XML or JSON format. • The petri_net module consists of data structures for Petri nets, as well as the ability to import and export basic Petri nets using the Petri Net Markup Language (PNML). • The alphappp module implements the Alpha+++ process discovery algorithm, which was the starting point for the Rust implementation, as introduced in [3]. Moreover, Rust4PM integrates well with other projects. For example, PM4Py optionally supports importing XES or OCEL 2.0 files via the Rust4PM-based importer rustxes (see [3]). 3. Performance Evaluation In this section, we present evaluation results for the performance of the XES and OCEL 2.0 XML parsers, as implemented in Rust4PM, PM4Py and (if available) ProM. For evaluation, we used a subset of the publicly available BPI Challenge XES event logs (see https://data.4tu.nl/search? search=BPI+Challenge) as well as OCEL 2.0 logs from https://www.ocel-standard.org/. BPI_Challenge_2017.xes BPI Challenge 2018.xes BPI_Challenge_2019.xes Speedup: 2.27× – 10.86× Speedup: 2.47× – 2.47× Speedup: 2.32× – 12.20× 35 70 40 30 Parse Duration [s] Parse Duration [s] Parse Duration [s] 60 25 50 30 20 40 20 15 30 10 20 10 5 10 0 0 0 PM4Py ProM Rust4PM PM4Py ProM Rust4PM PM4Py ProM Rust4PM XES Parser XES Parser XES Parser angular_github_commits_ocel.xml ContainerLogistics.xml order-management.xml Speedup: 16.48× Speedup: 32.76× Speedup: 41.94× 14 4 12 2.5 Parse Duration [s] Parse Duration [s] Parse Duration [s] 3.5 10 2 3 8 2.5 1.5 2 6 1 1.5 4 1 2 0.5 0.5 0 0 0 PM4Py Rust4PM PM4Py Rust4PM PM4Py Rust4PM OCEL2 XML Parser OCEL2 XML Parser OCEL2 XML Parser Figure 1: The total import durations for different XES and OCEL 2.0 files across the tested implementa- tions. All configurations were repeated 5 times and the observed standard deviation is included as an error bar. Missing bars indicate a failed import (e.g., because the program ran out of memory). The evaluation results are plotted in Figure 1. For all evaluated XES logs, the Rust4PM XES parser is at least two times faster on average than the other tested implementations. In comparison to PM4Py specifically, speedups of at least 10 times were observed for all evaluated XES event logs. For OCEL 2.0 XML, Rust4PM also significantly outperforms the PM4Py implementation by a factor of at least 15 for all evaluated files. As of now, no implementation of the new 2.0 version of the OCEL standard is available in ProM. 4. Use Cases: Example Projects and Architecture Recipes In this section, we present four example applications built using the main Rust4PM library as a base. Figure 2 shows screenshots of the tools. Each application is based on a different architecture and usage context. Additionally, each application focuses on a specific feature set of the Rust4PM library. Thus, these examples do not follow a common theme, but should instead demonstrate the versatility of using Rust4PM in different contexts and architectures. (a) log_strip (b) event_hours_analyzer (c) ocel_graph (d) petri_net_wasm Figure 2: The presented example applications using the Rust4PM library. Next, we will shortly describe each demo application, mentioning the used architecture and leveraged library features. All examples are also publicly available at https://github.com/aarkue/ rust4pm_demos, together with a demo video on Rust4PM as a whole, as well as these examples. 4.1. CLI for Stripping XES Attributes (log_strip) The Rust command-line-interface (CLI) program log_strip can strip or modify certain at- tributes from an input XES event log, resulting in a stripped output log. A typical application for such a tool lays in privacy-preserving process mining, as presented in [4]. In particular, this tool removes all top-level log attributes, all case level attributes besides concept:name and all event-level attributes apart from concept:name and time:timestamp. Additionally, the second and nanosecond portion of the event timestamp is removed (i.e., set to 0), reducing the timestamp precision. The tool uses the XES trace stream importer and exporter from Rust4PM. Thus, the CLI can also be used to modify event logs which are too large to fit in system memory. 4.2. Python Script for Data Visualization (event_hours_analyzer) event_hours_analyzer is a Python Jupyter notebook for visualizing the number of events per hour of the day based on an input event log. The computationally expensive parts, i.e., the import of the XES event log and the computation of the event counts, are implemented in Rust. This functionality is then exposed as a Python library, which is used in the Jupyter notebook. In this example, the XES importer and the event log data structures provided by the Rust4PM library are used. 4.3. Web Service for Constructing Graphs from OCED (ocel_graph) This demo application is implemented as a web server backend implemented in Rust, which uses Rust4PM, and an interactive web-based frontend. The backend allows loading OCEL 2.0 event data and constructing graphs based on an event or object in the input data. In particular, all event-to-object and object-to-object relationships inside the object-centric event log are recursively expanded, based on specified parameters (e.g., the maximal number of recursion steps). The resulting interactive graph contains the encountered events and objects of the OCEL as nodes and edges between nodes corresponding to the relationships in the OCEL. The backend utilizes the OCEL 2.0 XML or JSON import functionality of Rust4PM, as well as the OCEL 2.0 data structure representation to construct the graph. 4.4. Petri Net Editor with PNML and Discovery Support via WASM (petri_net_wasm) This example application is a basic client-side Petri net editor. While the main editor is im- plemented using web technologies, it leverages the PNML import and export functionality from Rust4PM. Additionally, it allows discovering Petri nets from XES event logs based on the implementation of the Alpha+++ process discovery algorithm and the XES parser in Rust4PM. For that, the corresponding functions are exposed via WebAssembly (WASM), which is a possible compilation target for Rust. In contrast to the web server example presented before, WASM runs directly in the web browser of the user. We include this example specifically to demonstrate running Rust4PM in the web browser, which allows users to try out or use a tool simply by visiting a website, without the need to install anything on their machine. 5. Conclusion In this paper, we introduced the performance-centric Rust4PM software project. Among other features, the main library supports importing the most common traditional and object-centric event data file formats, XES and OCEL 2.0. Through streaming XES traces, it also supports importing, processing and exporting huge XES event logs that would otherwise not fit into system memory. We evaluated the performance of the XES and OCEL 2.0 XML import parsers between two other popular open-source solutions, ProM and PM4Py, and observed significantly improvements in import speeds for all considered configurations, with speedups factors ranging from 2 to 40. To demonstrate the flexibility of Rust4PM, we presented four example applications, each with an own software architecture and feature focus. Maturity The main Rust4PM library was first published in December 2023 with rudimentary features, and received more than 25 version updates since then. In this time, it was downloaded more than 10,000 times in total, according to crates.io. There are around 40 software test cases included, which cover a large portion of the implemented features. However, as the project is still rather young, there will likely still be changes to the exposed API surface in the future. Future work There are still plenty of interesting features to be implemented in the library. For example, export support for OCEL 2.0, or the implementation of computationally expensive process mining algorithms, such as the computation of alignments. Furthermore, there are many possible advancements for the example applications presented in Section 4. References [1] B. F. van Dongen, A. K. A. de Medeiros, H. M. W. Verbeek, A. J. M. M. Weijters, W. M. P. van der Aalst, The ProM Framework: A New Era in Process Mining Tool Support, in: G. Ciardo, P. Darondeau (Eds.), Applications and Theory of Petri Nets 2005, 26th International Conference, ICATPN 2005, Miami, USA, June 20-25, 2005, Proceedings, volume 3536 of Lecture Notes in Computer Science, Springer, 2005, pp. 444–454. URL: https://doi.org/10.1007/11494744_25. doi:10.1007/11494744\_25. [2] A. Berti, S. J. van Zelst, W. M. P. van der Aalst, Process Mining for Python (PM4Py): Bridging the Gap Between Process- and Data Science, CoRR abs/1905.06169 (2019). URL: http://arxiv.org/abs/1905.06169. arXiv:1905.06169. [3] A. Küsters, W. M. P. van der Aalst, Developing a High-Performance Process Mining Library with Java and Python Bindings in Rust, CoRR abs/2401.14149 (2024). URL: https://doi.org/ 10.48550/arXiv.2401.14149. doi:10.48550/ARXIV.2401.14149. arXiv:2401.14149. [4] M. Rafiei, W. M. P. van der Aalst, Privacy-Preserving Data Publishing in Process Mining, in: D. Fahland, C. Ghidini, J. Becker, M. Dumas (Eds.), Business Process Management Forum - BPM Forum 2020, Seville, Spain, September 13-18, 2020, Proceedings, volume 392 of Lecture Notes in Business Information Processing, Springer, 2020, pp. 122–138. URL: https://doi.org/10.1007/978-3-030-58638-6_8. doi:10.1007/978-3-030-58638-6\_8.