<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Konekti: A Data Preparation Platform for Process Mining (Extended Abstract)</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Lotte Vugs</string-name>
          <email>lotte@wavespi.nl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Maarten van Asseldonk</string-name>
          <email>maarten@wavespi.nl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Niek van Son</string-name>
          <email>niek@wavespi.nl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="editor">
          <string-name>Process Mining, Data Preparation, Data Transformation, Object-Centric, System-agnostic</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Waves Process Intelligence</institution>
          ,
          <addr-line>Eindhoven</addr-line>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2022</year>
      </pub-date>
      <fpage>104</fpage>
      <lpage>107</lpage>
      <abstract>
        <p>Data preprocessing frequently consumes a large proportion of the time spent on process mining projects. Although several commercial process mining software solutions offer data preparation scripts, these typically cover only a narrow set of process perspectives and source systems. When a perspective or system is not supported, practitioners usually need to fall back on ad-hoc ETL solutions. This paper introduces Konekti: a tool dedicated to offering a structured approach to preparing event data for process mining.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Process mining consists of a set of methods, tools and techniques to discover process models,
analyze performance, check compliance and compare variants using event data recorded in
information systems [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. The key input for process mining is an event log, a collection of events
corresponding to a specific process. In practice, event logs are usually not readily available.
Instead, event data needs to be selected and extracted from a set of tables (or systems), and then
cleaned and transformed in order to create an event log suitable for analysis. Depending on
the complexity of the source data model and the process to be discovered, up to 80% of a project's
time span is spent on data preprocessing and log creation, leaving 20% for the process-analytical
work [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>
        Thus far, most scientific research has focused on data analysis rather than data preprocessing
[
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. The preprocessing tools that do stem from scientific research are usually prototypes and do
not seem to be used widely by practitioners: none of them were mentioned as tools used for
data pre-processing in the recent XES survey [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], nor were they mentioned in the interviews
with practitioners that we held as part of our user tests. At the same time, commercial process
mining software primarily offers connectors: specific procedures that define, for a particular
source system and process perspective, how to extract relevant data and which additional
transformations should be applied [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Practitioners who wish to preprocess event data from a
source and/or process for which their software vendor has no connector usually need to fall
back on ad-hoc solutions, such as SQL editor tools.
      </p>
      <p>†These authors contributed equally.</p>
      <p>This paper introduces Konekti, a no-code tool for preprocessing event data for process mining.
Konekti supports practitioners in preprocessing their process data. Its output, an event log, can
be directly used as input for existing process mining tools, making it an ideal tool to be used in
tandem with existing solutions.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        Several academic tools focus on extracting event data from operational systems and transforming
it into event logs. These include the EVS Model Builder [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], Xtract [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], and Eventifier [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. An
underlying assumption of existing tools is that there is a single notion of a case. However,
processes in ERP systems are usually organized around multiple objects having one-to-many
and many-to-many relations. Often, more than one object could potentially serve as a case
notion. Which object is the best candidate depends on the perspective of interest, which can
change over time.
      </p>
      <p>
        Recently, more research has been done on how to extract and transform object-centric data in
a way that offers more flexibility in changing the case notion. A noteworthy contribution in
this field is the Onprom tool [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. Furthermore, the Object-Centric Event Data standard [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] is
currently being developed. Although designed independently, Konekti’s meta-model mimics
the proposed standard almost completely, apart from some semantic details.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Tool Outline</title>
      <p>Konekti is a tool dedicated to offering a structured approach to preparing event data. Konekti offers
some key benefits over the tools currently used by practitioners:
• System-agnostic: commercial process mining tools focus on data preparation scripts for
a specific process perspective in a specific source system. Konekti focuses on simplifying
data preparation when the available scripts do not match the process perspective
and/or source systems in scope.
• Built for practitioners: in contrast to academic prototypes, Konekti is designed to be
used in industrial settings. Therefore, a strong emphasis is put on user-friendliness in
developing the platform.
• No-code: in contrast to SQL editor tools and most tools stemming from academic
research, Konekti is completely no-code. This removes the hurdle of learning a new
coding language and promotes sharing among users with different levels of coding
skills.
• Object-centric: rather than assuming the case notion is fixed at the start of the data
preparation, Konekti implements an object-centric data model. From this data model,
different event logs can be exported that use different case notions.</p>
      <p>A screencast demonstrating the tool is available online.</p>
      <p>Main Components. A connector is the link to a specific database schema from which
a user can extract event data. A user can set up a connector by specifying the connection
details in a form.</p>
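      <p>As an illustration only, the connection details collected by such a form could be held in a small record like the one sketched below. The class <monospace>ConnectorConfig</monospace>, its field names and the <monospace>to_url</monospace> helper are hypothetical and are not taken from Konekti; the URL layout follows the common dialect://user:password@host:port/database convention.</p>
      <preformat preformat-type="code">
```python
from dataclasses import dataclass
from urllib.parse import quote_plus


@dataclass(frozen=True)
class ConnectorConfig:
    """Hypothetical record of the fields a connector form might collect."""

    dialect: str   # e.g. "postgresql" or "mssql"
    host: str
    port: int
    database: str
    schema: str    # database schema the connector points at
    username: str
    password: str

    def to_url(self) -> str:
        """Build a dialect://user:password@host:port/database URL."""
        # Percent-encode the credentials so characters such as "@" stay unambiguous.
        user = quote_plus(self.username)
        secret = quote_plus(self.password)
        return f"{self.dialect}://{user}:{secret}@{self.host}:{self.port}/{self.database}"


config = ConnectorConfig(
    dialect="postgresql",
    host="localhost",
    port=5432,
    database="erp",
    schema="public",
    username="analyst",
    password="p@ss word",
)
print(config.to_url())  # postgresql://analyst:p%40ss+word@localhost:5432/erp
```
      </preformat>
      <p>Keeping the schema as a separate field reflects the statement above that a connector links to one specific database schema.</p>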
      <p>In the event store, the user identifies which data needs to be extracted and, simultaneously,
maps this data to Konekti’s object-centric data model. Konekti’s meta-model is built around
two main artifacts. The first of them is the object, representing a type of information carrier in a
process. The second main artifact is the activity, representing a mutation of an object. Objects
can be related to other objects using object-object relations. Similarly, objects can be related to
activities using object-activity relations.</p>
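      <p>The four concepts named above (objects, activities, object-object relations and object-activity relations) can be pictured with a minimal in-memory sketch. The class <monospace>EventStore</monospace> and its methods are hypothetical names chosen for illustration, and treating object-object relations as undirected is an assumption; neither reflects Konekti's actual schema.</p>
      <preformat preformat-type="code">
```python
from dataclasses import dataclass, field


@dataclass
class EventStore:
    """Hypothetical in-memory stand-in for an object-centric event store."""

    objects: set[str] = field(default_factory=set)
    activities: set[str] = field(default_factory=set)
    # Object-object relations; assumed undirected, so stored as unordered pairs.
    object_object: set[frozenset[str]] = field(default_factory=set)
    # Object-activity relations, stored as (object, activity) pairs.
    object_activity: set[tuple[str, str]] = field(default_factory=set)

    def relate_objects(self, left: str, right: str) -> None:
        """Link two known objects to each other."""
        self._require(left, self.objects)
        self._require(right, self.objects)
        self.object_object.add(frozenset((left, right)))

    def relate_activity(self, obj: str, activity: str) -> None:
        """Link a known activity to the object it mutates."""
        self._require(obj, self.objects)
        self._require(activity, self.activities)
        self.object_activity.add((obj, activity))

    def activities_of(self, obj: str) -> list[str]:
        """Return the activities related to one object, sorted by name."""
        return sorted(a for o, a in self.object_activity if o == obj)

    @staticmethod
    def _require(name: str, known: set[str]) -> None:
        # Reject relations that mention an artifact that was never configured.
        if name not in known:
            raise KeyError(f"unknown artifact: {name}")


store = EventStore(
    objects={"order", "delivery", "invoice"},
    activities={"create order", "ship delivery", "send invoice"},
)
store.relate_objects("order", "delivery")
store.relate_objects("order", "invoice")
store.relate_activity("order", "create order")
store.relate_activity("delivery", "ship delivery")
store.relate_activity("invoice", "send invoice")
print(store.activities_of("order"))  # ['create order']
```
      </preformat>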
      <p>Configuration of the artifacts in the event store is done using forms rather than scripts (i.e.
no-code). After configuration, a query is automatically generated and a materialized view of the
data is stored in Konekti’s database.</p>
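      <p>How a form-based configuration might be turned into a query can be sketched as follows. This is not Konekti's generator: the record <monospace>ActivityMapping</monospace>, the function <monospace>build_view_sql</monospace> and the output column names are assumptions, and the generated statement uses PostgreSQL's CREATE MATERIALIZED VIEW syntax only.</p>
      <preformat preformat-type="code">
```python
from dataclasses import dataclass


def quote_ident(name: str) -> str:
    """Quote an SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


@dataclass(frozen=True)
class ActivityMapping:
    """Hypothetical form input describing where one activity lives in the source."""

    activity: str          # activity name to appear in the event data
    source_schema: str
    source_table: str
    object_id_column: str  # column identifying the related object
    timestamp_column: str  # column holding the event time


def build_view_sql(view_name: str, mapping: ActivityMapping) -> str:
    """Render a PostgreSQL statement that materializes one activity."""
    # The activity name becomes a string literal, so single quotes are doubled.
    activity_literal = "'" + mapping.activity.replace("'", "''") + "'"
    return (
        f"CREATE MATERIALIZED VIEW {quote_ident(view_name)} AS\n"
        f"SELECT {quote_ident(mapping.object_id_column)} AS object_id,\n"
        f"       {activity_literal} AS activity,\n"
        f"       {quote_ident(mapping.timestamp_column)} AS event_time\n"
        f"FROM {quote_ident(mapping.source_schema)}.{quote_ident(mapping.source_table)}"
    )


mapping = ActivityMapping(
    activity="Create order",
    source_schema="sales",
    source_table="orders",
    object_id_column="order_id",
    timestamp_column="created_at",
)
sql = build_view_sql("evt_create_order", mapping)
print(sql)
```
      </preformat>
      <p>Quoting every identifier and escaping the literal keeps user-entered form values from altering the structure of the generated statement.</p>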
      <p>In the export builder, the user configures event logs from the data in the event store.
Depending on the perspective of interest, the user selects which object is used as case notion.
In the background, a breadth-first search is executed over all object-object relations to generate
a list of related objects. From the list of related objects, the user selects the objects to include.
Then, the user selects the activities to include; here the user can select activities that have an
object-activity relation to the case object or to any of the related objects selected previously.
Finally, the user selects which attributes to include. Event attributes can be selected from
activity attributes. Case attributes can be selected from the case object. After configuration, a
single event log table is materialized and can be downloaded as a CSV file.</p>
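      <p>The first two steps of the export builder, the breadth-first search and the activity selection, can be sketched as below. The function names, the example objects and the assumption that object-object relations are undirected are ours; the sketch only illustrates the traversal described above and is not Konekti's implementation.</p>
      <preformat preformat-type="code">
```python
from collections import deque


def related_objects(case_object, relations):
    """Breadth-first search over object-object relations (assumed undirected)."""
    neighbours = {}
    for left, right in relations:
        neighbours.setdefault(left, set()).add(right)
        neighbours.setdefault(right, set()).add(left)

    visited = {case_object}
    found = []                      # related objects in order of discovery
    queue = deque([case_object])
    while queue:
        current = queue.popleft()
        # Sorting makes the discovery order deterministic.
        for candidate in sorted(neighbours.get(current, ())):
            if candidate not in visited:
                visited.add(candidate)
                found.append(candidate)
                queue.append(candidate)
    return found


def selectable_activities(case_object, selected_objects, object_activity):
    """Activities related to the case object or to any selected related object."""
    allowed = {case_object, *selected_objects}
    return sorted({activity for obj, activity in object_activity if obj in allowed})


relations = [
    ("order", "delivery"),
    ("order", "invoice"),
    ("invoice", "payment"),
    ("supplier", "purchase order"),  # not connected to "order"
]
object_activity = [
    ("order", "create order"),
    ("delivery", "ship goods"),
    ("invoice", "send invoice"),
    ("payment", "receive payment"),
]

print(related_objects("order", relations))
# ['delivery', 'invoice', 'payment']
print(selectable_activities("order", ["invoice"], object_activity))
# ['create order', 'send invoice']
```
      </preformat>
      <p>Choosing a different case object, for example the invoice, changes both the list of related objects and the activities that can be selected, which is what allows several event logs to be exported from the same event store.</p>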
      <p>Limitations. Konekti will be released in the beginning of 2023; the version described
in this paper is a prototype under active development. In this version, data can only be
extracted from PostgreSQL and MSSQL databases, making it more difficult to apply Konekti
to data stored in other databases. Also, the process of setting up an event store still contains
some unnecessarily repetitive tasks. Furthermore, no case studies have been conducted with Konekti
yet. Test databases from several ERP systems have been used to test the tool, but it has not yet
been deployed in a practical setting. Additionally, the features for transforming the data during
preprocessing are still limited. To work around this, users can pre-process the event data before
loading it into Konekti or post-process it afterwards.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Future Work</title>
      <p>We are planning to launch Konekti in the beginning of 2023. Until the launch, we will focus
on implementing Konekti in practice, and expanding its features (e.g., extending the connector
types and export formats and including activity aggregation). Furthermore, we are planning to
augment and improve the platform based on the feedback gathered at this conference and
on alignment with the OCED working group.</p>
      <p>Once the foundations for event data preprocessing are laid, we plan to focus on enabling and
promoting the exchange of data preprocessing knowledge within the process mining community.
To enable that, all Konekti users who agree to share the steps taken in their data preprocessing will be
eligible for a free license.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>The event log is the fundamental input for every process mining analysis, yet creating one can
be difficult and time-consuming. Practitioners who wish to analyze a process from a source
system that is not supported by their process mining tool of choice are often forced to fall back
on ad-hoc solutions. This paper introduces Konekti: a user-friendly, no-code platform that
supports practitioners in data preprocessing, regardless of the systems included in the scope of
their analysis.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>We would like to thank Nathan Cassee and Hilde Weerts for their comments on an earlier
version of this manuscript.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>W.</given-names>
            <surname>van der Aalst</surname>
          </string-name>
          ,
          <source>Process mining: Data science in action</source>
          ,
          <year>2016</year>
          . doi:10.1007/978-3-662-49851-4
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>R.</given-names>
            <surname>Accorsi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lebherz</surname>
          </string-name>
          ,
          <article-title>A practitioner's view on process mining adoption, event log engineering and data challenges</article-title>
          ,
          <source>in: Process Mining Handbook</source>
          , Springer, Cham,
          <year>2022</year>
          , pp.
          <fpage>212</fpage>
          -
          <lpage>240</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>M. T.</given-names>
            <surname>Wynn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lebherz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W. M.</given-names>
            <surname>van der Aalst</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Accorsi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Di Ciccio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Jayarathna</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Verbeek</surname>
          </string-name>
          ,
          <article-title>Rethinking the input for process mining: insights from the XES survey and workshop</article-title>
          , in: International Conference on Process Mining, Springer,
          <year>2022</year>
          , pp.
          <fpage>3</fpage>
          -
          <lpage>16</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>J.</given-names>
            <surname>De Weerdt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. T.</given-names>
            <surname>Wynn</surname>
          </string-name>
          ,
          <source>Foundations of Process Event Data</source>
          , Springer International Publishing, Cham,
          <year>2022</year>
          , pp.
          <fpage>193</fpage>
          -
          <lpage>211</lpage>
          . URL: https://doi.org/10.1007/978-3-031-08848-3_6. doi:10.1007/978-3-031-08848-3_6.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>J. E.</given-names>
            <surname>Ingvaldsen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. A.</given-names>
            <surname>Gulla</surname>
          </string-name>
          ,
          <article-title>Preprocessing support for large scale process mining of SAP transactions</article-title>
          , in: International Conference on Business process management, Springer,
          <year>2007</year>
          , pp.
          <fpage>30</fpage>
          -
          <lpage>41</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>E. H.</given-names>
            <surname>Nooijen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. F. v.</given-names>
            <surname>Dongen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Fahland</surname>
          </string-name>
          ,
          <article-title>Automatic discovery of data-centric and artifact-centric processes</article-title>
          ,
          <source>in: International conference on business process management</source>
          , Springer,
          <year>2012</year>
          , pp.
          <fpage>316</fpage>
          -
          <lpage>327</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>C.</given-names>
            <surname>Rodríguez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Engel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Kostoska</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Daniel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Casati</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Aimar</surname>
          </string-name>
          , et al.,
          <article-title>Eventifier: Extracting process execution logs from operational databases</article-title>
          ,
          <source>Proceedings of the demonstration track of BPM 940</source>
          (
          <year>2012</year>
          )
          <fpage>17</fpage>
          -
          <lpage>22</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>D.</given-names>
            <surname>Calvanese</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. E.</given-names>
            <surname>Kalayci</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Montali</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Tinella</surname>
          </string-name>
          ,
          <article-title>Ontology-based data access for extracting event logs from legacy data: the onprom tool and methodology</article-title>
          ,
          <source>in: International Conference on Business Information Systems</source>
          , Springer,
          <year>2017</year>
          , pp.
          <fpage>220</fpage>
          -
          <lpage>236</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <surname>IEEE Task Force on Process Mining, OCED working group</surname>
          </string-name>
          ,
          <article-title>Object-centric event data: A meta model</article-title>
          ,
          <source>Internal Document</source>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>