=Paper=
{{Paper
|id=Vol-3299/Paper22
|storemode=property
|title=Konekti: A Data Preparation Platform for Process Mining (Extended Abstract)
|pdfUrl=https://ceur-ws.org/Vol-3299/Paper22.pdf
|volume=Vol-3299
|authors=Lotte Vugs,Maarten van Asseldonk,Niek van Son
|dblpUrl=https://dblp.org/rec/conf/icpm/VugsAS22
}}
==Konekti: A Data Preparation Platform for Process Mining (Extended Abstract)==
<pdf width="1500px">https://ceur-ws.org/Vol-3299/Paper22.pdf</pdf>
<pre>
Konekti: A Data Preparation Platform for Process
Mining (Extended Abstract)
Lotte Vugs¹, Maarten van Asseldonk¹ and Niek van Son¹
¹ Waves Process Intelligence, Eindhoven, The Netherlands


Abstract
Data preprocessing frequently consumes a large proportion of the time spent on process mining projects.
Although several commercial process mining software solutions offer data preparation scripts, these
typically cover only a narrow set of process perspectives and source systems. When a particular
perspective or system is not supported, practitioners usually need to fall back on ad-hoc ETL solutions.
This paper introduces Konekti: a tool dedicated to offering a structured approach to prepare event data
for process mining.

Keywords
Process Mining, Data Preparation, Data Transformation, Object-Centric, System-agnostic


1. Introduction
Process mining consists of a set of methods, tools and techniques to discover process models,
analyze performance, check compliance and compare variants using event data recorded in
information systems [1]. The key input for process mining is an event log, a collection of events
corresponding to a specific process. In practice, event logs are usually not readily available.
Instead, event data needs to be selected and extracted from a set of tables (or systems), and then
cleaned and transformed in order to create an event log suitable for analysis. Depending on
the complexity of the source data model and the process to be discovered, up to 80% of a project's
time span is spent on data preprocessing and log creation, leaving 20% for the analytical
work [2].
   Thus far, most scientific research has focused on data analysis rather than data preprocessing
[2]. The preprocessing tools that do stem from scientific research are usually prototypes and do
not seem to be used widely by practitioners: none of them were mentioned as tools used for
data preprocessing in the recent XES survey [3], nor were they mentioned in the interviews
with practitioners we held as part of a user test. At the same time, commercial process
mining software primarily offers connectors: specific procedures that define for a particular
source system and process perspective how to extract relevant data, and which additional
transformations should be applied [4]. Practitioners who wish to preprocess event data from a
source and/or process for which their software vendor has no connector usually need to fall
back on ad-hoc solutions, like SQL editor tools.

ICPM 2022: International Conference on Process Mining, October 24-28, 2022, Bolzano, IT
† These authors contributed equally.
Email: lotte@wavespi.nl (L. Vugs); maarten@wavespi.nl (M. v. Asseldonk); niek@wavespi.nl (N. v. Son)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073


  This paper introduces Konekti, a no-code tool for preprocessing event data for process mining.
Its output, an event log, can be used directly as input for existing process mining tools, so
Konekti can be used in tandem with existing solutions.


2. Related Work
Several academic tools focus on extracting event data from operational systems and transforming
it into event logs. These include the EVS Model Builder [5], Xtract [6], and Eventifier [7]. An
underlying assumption of existing tools is that there is a single notion of a case. However,
processes in ERP systems are usually organized around multiple objects having one-to-many
and many-to-many relations. Often, more than one object could potentially serve as a case
notion. Which object is the best candidate depends on the perspective of interest, which can
change over time.
    Recently, more research has been done on how to extract and transform object-centric data in
a way that offers more flexibility in changing the case notion. A noteworthy contribution in
this field is the Onprom tool [8]. Furthermore, the Object-Centric Event Data standard [9] is
currently being developed. Although designed independently, Konekti’s meta-model closely matches
the proposed standard, apart from some semantic details.


3. Tool Outline
Konekti is a tool dedicated to offering a structured approach to preparing event data. It offers
some key benefits over the tools currently in use by practitioners:

       • System-agnostic: commercial process mining tools focus on data preparation scripts for
         a specific process perspective in a specific source system. Konekti focuses on simplifying
         data preparation when the available scripts do not match the process perspective
         and/or source systems in scope.
       • Built for practitioners: in contrast to academic prototypes, Konekti is designed to be
         used in industrial settings. Therefore, in developing the platform, a strong emphasis is
         put on user-friendliness.
       • No-code: in contrast to SQL editor tools, and most tools stemming from academic
         research, Konekti is completely no-code. This removes the hurdle of learning a new
         coding language, and stimulates shareability across users with different levels of coding
         skills.
       • Object-centric: rather than assuming the case notion is fixed at the start of the data
         preparation, Konekti implements an object-centric data model. From this data model,
         different event logs can be exported that use different case notions.

A screencast demonstrating the tool is available online at https://youtu.be/fjjexHpl9KM.


Main Components. A connector is the link to a specific database schema from which
a user can extract event data. A user sets up a connector by specifying the connection
details in a form.
   In the event store, the user identifies which data needs to be extracted and, simultaneously,
maps this data to Konekti’s object-centric data model. Konekti’s meta-model is built around
two main artifacts. The first of them is the object, representing a type of information carrier in a
process. The second main artifact is the activity, representing a mutation of an object. Objects
can be related to other objects using object-object relations. Similarly, objects can be related to
activities using object-activity relations.
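
The meta-model described above can be sketched as a handful of Python data classes. This is only an illustration under stated assumptions, not Konekti's actual schema: all class, field and method names (ObjectType, Activity, EventStore, relate_objects, and so on) are hypothetical and chosen for this example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class ObjectType:
    """A type of information carrier in a process, e.g. an order."""
    name: str
    attributes: tuple[str, ...] = ()


@dataclass(frozen=True)
class Activity:
    """A mutation of an object, e.g. 'create order'."""
    name: str
    attributes: tuple[str, ...] = ()


@dataclass
class EventStore:
    """Hypothetical container for the two artifacts and their relations."""
    objects: dict[str, ObjectType] = field(default_factory=dict)
    activities: dict[str, Activity] = field(default_factory=dict)
    object_object: list[tuple[str, str]] = field(default_factory=list)
    object_activity: list[tuple[str, str]] = field(default_factory=list)

    def add_object(self, obj: ObjectType) -> None:
        self.objects[obj.name] = obj

    def add_activity(self, activity: Activity) -> None:
        self.activities[activity.name] = activity

    def relate_objects(self, source: str, target: str) -> None:
        # An object-object relation may only link known object types.
        for name in (source, target):
            if name not in self.objects:
                raise KeyError(f"unknown object type: {name}")
        self.object_object.append((source, target))

    def relate_activity(self, object_name: str, activity_name: str) -> None:
        # An object-activity relation links a known object to a known activity.
        if object_name not in self.objects:
            raise KeyError(f"unknown object type: {object_name}")
        if activity_name not in self.activities:
            raise KeyError(f"unknown activity: {activity_name}")
        self.object_activity.append((object_name, activity_name))

    def activities_of(self, object_name: str) -> list[str]:
        """Names of all activities related to the given object type."""
        return [act for obj, act in self.object_activity if obj == object_name]
```

In this sketch relations are stored as plain name pairs, which keeps the structure easy to traverse later when an event log is configured.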
   Configuration of the artifacts in the event store is done using forms rather than scripts (i.e.
no-code). After configuration, a query is automatically generated and a materialized view of the
data is stored in Konekti’s database.
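
To illustrate how a form-based configuration could be turned into a query, the sketch below builds a materialized-view statement from a dictionary of form values. The paper does not describe Konekti's generated SQL, so everything here is an assumption: the form keys, the function names, the output column names, and the use of PostgreSQL's CREATE MATERIALIZED VIEW syntax.

```python
import re

# Only plain identifiers are accepted, so form values cannot inject SQL.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def quote_identifier(name: str) -> str:
    """Validate an identifier and wrap it in double quotes."""
    if not _IDENTIFIER.fullmatch(name):
        raise ValueError(f"unsafe identifier: {name!r}")
    return f'"{name}"'


def build_activity_view(form: dict) -> str:
    """Turn a (hypothetical) activity form into a materialized-view statement."""
    # The activity label is a string literal; double any single quotes in it.
    activity_literal = form["activity"].replace("'", "''")
    columns = [
        f'{quote_identifier(form["object_id_column"])} AS object_id',
        f"'{activity_literal}' AS activity",
        f'{quote_identifier(form["timestamp_column"])} AS event_time',
    ]
    # Optional activity attributes are passed through unchanged.
    columns += [quote_identifier(col) for col in form.get("attribute_columns", [])]
    return (
        f'CREATE MATERIALIZED VIEW {quote_identifier(form["view_name"])} AS\n'
        f'SELECT {", ".join(columns)}\n'
        f'FROM {quote_identifier(form["source_table"])}'
    )
```

A call such as build_activity_view({"view_name": "ev_create_order", "source_table": "orders", "object_id_column": "order_id", "timestamp_column": "created_at", "activity": "create order"}) returns a statement that selects one event row per source row.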
   In the export builder, the user configures event logs from the data in the event store.
Depending on the perspective of interest, the user first selects which object is used as the case notion.
In the background, a breadth-first search is executed over all object-object relations to generate
a list of related objects, from which the user selects the objects to include.
Second, the user selects the activities to include; these can be any activities that have an
object-activity relation to the case object or to any of the related objects selected previously.
Third, the user selects which attributes to include: event attributes are taken from
activity attributes, and case attributes from the case object. After configuration, a
single event log table is materialized and can be downloaded as a CSV file.
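
The export steps above can be sketched in a few lines of Python. This is a simplified illustration rather than Konekti's implementation. It assumes that object-object relations are traversed in both directions, that neighbours are visited in alphabetical order, and that every event already carries the identifier of the case it belongs to. All function and field names are hypothetical.

```python
import csv
import io
from collections import deque


def related_objects(case_object, relations):
    """Breadth-first search over object-object relations (treated as undirected)."""
    neighbours = {}
    for a, b in relations:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    seen = {case_object}
    found = []
    queue = deque([case_object])
    while queue:
        current = queue.popleft()
        for nxt in sorted(neighbours.get(current, ())):
            if nxt not in seen:
                seen.add(nxt)
                found.append(nxt)
                queue.append(nxt)
    return found


def selectable_activities(case_object, selected_objects, object_activity):
    """Activities related to the case object or to any selected related object."""
    allowed = {case_object, *selected_objects}
    return sorted({act for obj, act in object_activity if obj in allowed})


def export_event_log(events, activities):
    """Write the selected events to CSV text, ordered by case and timestamp."""
    rows = sorted(
        (e for e in events if e["activity"] in activities),
        key=lambda e: (e["case_id"], e["timestamp"]),
    )
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=["case_id", "activity", "timestamp"],
        extrasaction="ignore",  # drop attributes that were not selected
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Choosing a different case object only changes the starting point of the search, which is what allows several event logs with different case notions to be exported from the same event store.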

Limitations. Konekti will be released at the beginning of 2023; the version described
in this paper is a prototype under active development. In this version, data can only be
extracted from PostgreSQL and MSSQL databases, making it more difficult to apply Konekti
to data stored in other databases. Also, the process of setting up an event store still contains
some unnecessarily repetitive tasks. Furthermore, no case studies have been conducted with Konekti
yet. Test databases from several ERP systems have been used to test the tool, but it has not yet
been deployed in a practical setting. Additionally, the features to transform the data during
preprocessing are still limited. To work around this, users can preprocess the event data before
loading it into Konekti or post-process it after export.


4. Future Work
We are planning to launch Konekti at the beginning of 2023. Until the launch, we will focus
on implementing Konekti in practice and expanding its features (e.g., extending the connector
types and export formats and including activity aggregation). Furthermore, we are planning to
augment and improve the platform based on the feedback gathered from this conference and
alignment with the OCED working group.
   Once the foundations for event data preprocessing are laid, we plan to focus on enabling and
promoting the exchange of data preprocessing knowledge within the process mining community.
To enable that, all Konekti users who agree to share the steps taken in their data preprocessing will be
eligible for a free license.


5. Conclusion
The event log is the fundamental input for every process mining analysis, yet creating one can
be difficult and time-consuming. Practitioners who wish to analyze a process from a source
system that is not supported by their process mining tool of choice are often forced to fall back
on ad-hoc solutions. This paper introduces Konekti: a user-friendly, no-code platform that
supports practitioners in data preprocessing, regardless of the systems included in the scope of
their analysis.


Acknowledgments
We would like to thank Nathan Cassee and Hilde Weerts for their comments on an earlier
version of this manuscript.


References
[1] W. Van der Aalst, Process mining: Data science in action, 2016. doi:10.1007/978-3-662-49851-4.
[2] R. Accorsi, J. Lebherz, A practitioner’s view on process mining adoption, event log
    engineering and data challenges, in: Process Mining Handbook, Springer, Cham, 2022, pp.
    212–240.
[3] M. T. Wynn, J. Lebherz, W. M. van der Aalst, R. Accorsi, C. Di Ciccio, L. Jayarathna,
    H. Verbeek, Rethinking the input for process mining: insights from the xes survey and
    workshop, in: International Conference on Process Mining, Springer, 2022, pp. 3–16.
[4] J. De Weerdt, M. T. Wynn, Foundations of Process Event Data, Springer International
    Publishing, Cham, 2022, pp. 193–211. URL: https://doi.org/10.1007/978-3-031-08848-3_6.
    doi:10.1007/978-3-031-08848-3_6.
[5] J. E. Ingvaldsen, J. A. Gulla, Preprocessing support for large scale process mining of sap
    transactions, in: International Conference on Business process management, Springer, 2007,
    pp. 30–41.
[6] E. H. Nooijen, B. F. v. Dongen, D. Fahland, Automatic discovery of data-centric and artifact-
    centric processes, in: International conference on business process management, Springer,
    2012, pp. 316–327.
[7] C. Rodríguez, R. Engel, G. Kostoska, F. Daniel, F. Casati, M. Aimar, et al., Eventifier:
    Extracting process execution logs from operational databases, Proceedings of the demonstration
    track of BPM 940 (2012) 17–22.
[8] D. Calvanese, T. E. Kalayci, M. Montali, S. Tinella, Ontology-based data access for extracting
    event logs from legacy data: the onprom tool and methodology, in: International Conference
    on Business Information Systems, Springer, 2017, pp. 220–236.
[9] IEEE Task Force on Process Mining, OCED working group, Object-centric event data: A meta model, Internal
    Document (2022).



</pre>