<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>With PromG⋆</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Ava Swevels</string-name>
          <email>a.j.e.swevels@tue.nl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Eva L. Klijn</string-name>
          <email>e.l.klijn@tue.nl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dirk Fahland</string-name>
          <email>d.fahland@tue.nl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="editor">
          <string-name>Object-Centric Process Mining, Object-Centric Event Data, Event Knowledge Graphs, Neo4j</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Eindhoven University of Technology</institution>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>PromG is an extensible Python library for managing and enriching object-centric event data (OCED) and for developing object-centric process mining (OCPM) techniques. It does so by using Event Knowledge Graphs, which model process-related concepts as a property graph in a Neo4j database. The library automatically generates Cypher queries to transform, enhance, and manipulate object-centric event data, giving analysts a straightforward way to explore and analyze object-centric processes. To enable others to develop OCPM techniques, the library is available as a Python package on PyPI and has been tested with real-life examples.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>CEUR
ceur-ws.org</p>
    </sec>
    <sec id="sec-2">
      <title>1. Introduction</title>
      <p>
        Analysis of real-life processes with multiple interrelated objects has revealed the limitations
of traditional case-centric process mining techniques [
        <xref ref-type="bibr" rid="ref1 ref2 ref3">1, 2, 3</xref>
        ]. As a result, classical process
mining techniques such as control-flow discovery
and conformance checking must be adapted,
and new techniques must be developed addressing the multi-object interactions of the process.
These techniques are collectively referred to as object-centric process mining (OCPM) [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Some
techniques have already been proposed by academia [
        <xref ref-type="bibr" rid="ref5 ref6 ref7 ref8 ref9">5, 6, 7, 8, 9</xref>
        ] and process mining vendors
(notably MyInvenio/IBM and Celonis).
      </p>
      <p>However, an open-source ecosystem that enables development and application of OCPM
in the broader process mining community has yet to form. It should offer extensible,
easy-to-use functionality for (1) managing object-centric event data (OCED), e.g., import, storage,
preprocessing, export, (2) exploring OCED from various angles, (3) routine analysis of OCED,
e.g., discovery, performance, and (4) one-off analysis specific to a particular use case.</p>
      <p>Toward this goal, we developed the open-source Python library PromG, which uses the Neo4j
graph database system to store data and analysis results in a multi-layered knowledge graph. PromG
implements a recent community proposal for standard OCED<sup>1</sup> and provides standard functionality
for importing, managing, and analyzing OCED (by automatically generating queries against
Neo4j). Additionally, it allows users to script custom OCPM analyses and implement newly
developed OCPM techniques. By leveraging industrial GUIs for Neo4j, we relieve analysts
and researchers of engineering efforts for interactively querying, exploring, and visualizing
OCED.</p>
      <p>⋆The research underlying this paper was supported by AutoTwin EU GA n. 101092021.</p>
    </sec>
    <sec id="sec-3">
      <title>2. Overview and Design</title>
      <p>
        PromG is a Python library that realizes OCPM by using a Neo4j graph database as its data store.
Its architecture is illustrated in Fig. 1. The Neo4j database stores OCED and process mining
analysis results in multiple layers of an Event Knowledge Graph [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], a specific labeled property
graph, that describes (qualified) relations between events, objects, relations, and their attributes
(over time).
      </p>
      <p>PromG translates process mining tasks into Cypher queries that are run against the Neo4j
instance. It consists of modules that capture the logic to store and analyze the data, and of core
functionality that provides a query library, a database connection to the Neo4j instance, and the
data schema. The schema is implemented in the core because Neo4j, like graph databases in
general, does not enforce a schema by itself.</p>
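      <p>To illustrate the idea of generating Cypher from a task description, the following minimal sketch builds a query that connects consecutive events of each entity with a directly-follows relationship. It is not PromG's query library: the function name <monospace>build_df_query</monospace>, the node labels, the <monospace>CORR</monospace> and <monospace>DF</monospace> relationship types, and the <monospace>timestamp</monospace> property are assumptions made for this example. Because the label is interpolated into the query text, the sketch validates it before substitution.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import re
from string import Template

# Hypothetical query template (not PromG's own): for every node with the given
# label, order its correlated events by time and link consecutive events.
DF_QUERY = Template(
    "MATCH (n:$entity_label)<-[:CORR]-(e:Event)\n"
    "WITH n, e ORDER BY e.timestamp\n"
    "WITH n, collect(e) AS events\n"
    "UNWIND range(0, size(events) - 2) AS i\n"
    "WITH events[i] AS e1, events[i + 1] AS e2\n"
    "MERGE (e1)-[:DF {entityType: '$entity_label'}]->(e2)"
)

# Only plain identifiers are accepted as labels, to keep the generated
# query text free of injected Cypher.
_LABEL = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_df_query(entity_label: str) -> str:
    """Return the Cypher text that infers df-relations for one entity label."""
    if not _LABEL.fullmatch(entity_label):
        raise ValueError(f"not a valid label: {entity_label!r}")
    return DF_QUERY.substitute(entity_label=entity_label)


query = build_df_query("Application")
print(query)
```
]]></preformat>
      <p>Keeping the query text in one template and exposing only the label as an argument is one way to let analysts request an analysis step without writing Cypher themselves.</p>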
      <p>Users can build a process mining analysis using existing modules. Additionally, since the data
is stored in a Neo4j instance, it can be accessed through Cypher queries and industrial GUIs,
allowing further processing, exploration, and analysis to be built on top of PromG. Therefore, we
provide users with a template to create their own modules that interact with the core features,
thus enabling them to realize their own OCPM analysis techniques.</p>
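      <p>The shape of such a user-defined module can be sketched as follows. This is a hypothetical outline, not PromG's actual template: the classes <monospace>Query</monospace>, <monospace>Connection</monospace>, <monospace>RecordingConnection</monospace>, and <monospace>ActivityFrequencyModule</monospace>, as well as the node label and property names in the Cypher text, are assumptions. The stand-in connection only records queries, so the sketch runs without a database.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from dataclasses import dataclass, field
from typing import Any, Protocol


@dataclass(frozen=True)
class Query:
    """A Cypher query string together with its parameters."""
    text: str
    parameters: dict[str, Any] = field(default_factory=dict)


class Connection(Protocol):
    """What a module needs from the core: a way to run a query."""
    def run(self, query: Query) -> list[dict[str, Any]]: ...


class RecordingConnection:
    """Stand-in connection that records queries instead of executing them."""

    def __init__(self) -> None:
        self.executed: list[Query] = []

    def run(self, query: Query) -> list[dict[str, Any]]:
        self.executed.append(query)
        return []


class ActivityFrequencyModule:
    """Hypothetical custom module that adds a layer of activity frequencies."""

    def __init__(self, connection: Connection) -> None:
        self.connection = connection

    def create_layer(self, min_count: int = 1) -> list[dict[str, Any]]:
        query = Query(
            text=(
                "MATCH (e:Event) "
                "WITH e.activity AS activity, count(e) AS frequency "
                "WHERE frequency >= $min_count "
                "MERGE (a:ActivityFrequency {activity: activity}) "
                "SET a.frequency = frequency "
                "RETURN activity, frequency"
            ),
            parameters={"min_count": min_count},
        )
        return self.connection.run(query)


connection = RecordingConnection()
module = ActivityFrequencyModule(connection)
module.create_layer(min_count=10)
print(connection.executed[0].text)
```
]]></preformat>
      <p>Separating the module logic from the connection means the same module could be pointed at a real Neo4j connection or at a recording stub for testing.</p>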
    </sec>
    <sec id="sec-4">
      <title>3. Functionalities Available</title>
      <p>
        While PromG is designed to be easily extended with additional features, we discuss its current
capabilities along the currently available layers, which users can take advantage of
immediately.
      </p>
      <p>
        (a) Raw Records to OCED Event Layer. The OCED-PG module enables the automatic import
of legacy data records as nodes in a raw record layer (at the “bottom” of the graph). Based on
a user-provided semantic header (a JSON document describing the data’s domain semantics),
OCED-PG generates queries that automatically transform the raw record nodes into nodes of
related events, objects, and attributes, forming the domain-level event layer in OCED format [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ];
each node of the event layer is linked to the nodes of the raw record layer it originates from.
      </p>
      <p>
        (b) Object-path inference. For each object chosen by the user, OCED-PG infers the directly-follows (df)
path of its events (enhancing the event layer), resulting in a partial order over all events
that can be analyzed [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>
        (c) Event Layer to Process Model layer. The process discovery module enables the automated
discovery of object-centric process models as multi-object DFGs [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. The user specifies activity
features and objects (or relations) for which the model should be discovered; PromG then generates
queries that aggregate event nodes and df-relations of the event layer into activity nodes and
flow relations per object, together forming a process model layer. Each activity node is linked
to the event nodes in the event layer it models.
      </p>
      <p>
        (d) Task Layer. PromG supports OCED analysis beyond classical OCPM use cases. The task
identification module infers df-paths per resource and uses these to detect sub-graphs where a
resource continuously worked on related objects. Queries then abstract the entire event layer
into a task layer by aggregating sub-graphs into task execution nodes (linked to the underlying
events), giving insights into how actors collaborate across executions [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. Fig. 2 visualizes
the interconnected layers on BPIC’17: a task instance node (purple) linked to the underlying
event nodes (green) along their df-paths, and how (some) events link to the multi-object DFG
(blue/orange nodes) of BPIC’17.
      </p>
      <p>
        (e) Custom Modules. We provide a template for users to create their own module that generates
queries against Neo4j, enabling users to create custom routine and one-off analyses that enrich
existing layers or introduce new layers. Through the template architecture, routine analysis
modules can be included in PromG, facilitating open-source contributions.
      </p>
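      <p>The relation between the layers in (a) to (c) can be sketched in plain Python on a toy example. This is an in-memory illustration only: PromG performs these steps through generated Cypher queries in Neo4j, and the record fields, the simplified <monospace>header</monospace> dictionary (standing in for the much richer semantic header), and all function names are assumptions made for this example.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from collections import Counter, defaultdict

# Raw record layer: flat legacy records (made-up data).
records = [
    {"time": 1, "activity": "Create Order", "order": "o1"},
    {"time": 2, "activity": "Pick Item", "order": "o1", "item": "i1"},
    {"time": 3, "activity": "Pick Item", "order": "o1", "item": "i2"},
    {"time": 4, "activity": "Pack Item", "order": "o1", "item": "i1"},
    {"time": 5, "activity": "Pack Item", "order": "o1", "item": "i2"},
    {"time": 6, "activity": "Ship Order", "order": "o1"},
]

# Simplified stand-in for a semantic header: which fields carry what meaning.
header = {"timestamp": "time", "activity": "activity",
          "object_types": ["order", "item"]}


def build_event_layer(records, header):
    """(a) Turn raw records into events correlated to (type, id) objects."""
    events = []
    for index, record in enumerate(records):
        objects = [(t, record[t]) for t in header["object_types"]
                   if record.get(t) is not None]
        events.append({
            "id": index,  # index of the raw record this event originates from
            "time": record[header["timestamp"]],
            "activity": record[header["activity"]],
            "objects": objects,
        })
    return events


def infer_df(events):
    """(b) Directly-follows edges between consecutive events of one object."""
    per_object = defaultdict(list)
    for event in sorted(events, key=lambda e: (e["time"], e["id"])):
        for obj in event["objects"]:
            per_object[obj].append(event)
    return [
        (first["id"], second["id"], obj)
        for obj, path in per_object.items()
        for first, second in zip(path, path[1:])
    ]


def discover_dfg(events, df_edges):
    """(c) Aggregate df edges into activity-level flows per object type."""
    activity = {event["id"]: event["activity"] for event in events}
    return Counter(
        (obj_type, activity[source], activity[target])
        for source, target, (obj_type, _obj_id) in df_edges
    )


events = build_event_layer(records, header)
df_edges = infer_df(events)
dfg = discover_dfg(events, df_edges)
for (obj_type, source, target), count in sorted(dfg.items()):
    print(f"{obj_type}: {source} -> {target} ({count})")
```
]]></preformat>
      <p>In this toy log the order sees two consecutive "Pick Item" events, whereas each item moves directly from "Pick Item" to "Pack Item", which is the kind of per-object difference a multi-object DFG makes visible.</p>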
    </sec>
    <sec id="sec-5">
      <title>4. Installation, Usage, and Maturity</title>
      <p>The PromG library is hosted on PyPI<sup>2</sup> and is open-source<sup>3</sup>, with example analyses, a demo video,
and documentation. PromG can be used in any Python project as long as a Neo4j instance<sup>4</sup> (with
the APOC plugin<sup>5</sup> installed) is available. PromG provides example projects for constructing
EKGs of five public real-life event logs of different sizes (BPIC14, BPIC15, BPIC16, BPIC17, BPIC19).
Graph construction is a one-time operation whose cost depends on the number of relationships to
construct [<xref ref-type="bibr" rid="ref12">12</xref>, Tab. 4]. Improving PromG query performance is planned future work.</p>
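      <p>Whether a Neo4j instance with APOC is reachable can be checked before using the library, for example with the official Neo4j Python driver. The sketch below does not use PromG itself; the function name <monospace>apoc_version</monospace> is an assumption, and the connection details passed to it are placeholders to be replaced with those of the reader's own instance.</p>
      <preformat preformat-type="code"><![CDATA[
```python
# Cypher call to the APOC function that reports the installed plugin version.
APOC_VERSION_QUERY = "RETURN apoc.version() AS version"


def apoc_version(uri: str, user: str, password: str) -> str:
    """Return the APOC version reported by a running Neo4j instance.

    Raises an exception if the instance is unreachable or APOC is missing.
    """
    # Imported lazily so the sketch can be loaded without the driver installed.
    from neo4j import GraphDatabase

    with GraphDatabase.driver(uri, auth=(user, password)) as driver:
        driver.verify_connectivity()
        with driver.session() as session:
            record = session.run(APOC_VERSION_QUERY).single()
            return record["version"]
```
]]></preformat>
      <p>For a default local installation, a call could look like <monospace>apoc_version("bolt://localhost:7687", "neo4j", password)</monospace>, where <monospace>password</monospace> is the one configured for the instance.</p>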
      <p>
        PromG’s approach and queries have been used in developing custom analyses in multiple
industrial case studies in baggage handling systems [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], semiconductor [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] and ship
manufacturing [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], and configuration management [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] with consistently positive feedback that the
graph-based approach enables insights and analytics not obtainable previously. Incorporating
relevant analysis functions into PromG is planned future work.
      </p>
    </sec>
    <sec id="sec-6">
      <title>5. Comparison to Related Software</title>
      <p>
        Besides closed-source implementations of OCPM, only the open-source Python library OCPA [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]
addresses the same objective as PromG. OCPA currently offers more analytics functionality
than PromG and serves as the “backbone” for the GUI-based analysis tools OC-PM [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] and OCπ [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
      <p>PromG’s strengths lie in the multi-layered Event Knowledge Graph (EKG) within a
standardized data store (Neo4j): the EKG implements standard OCED with domain semantics; the
extensible layers persist analysis results linked to the source data (see Fig. 2); and Neo4j’s query
language Cypher and its GUIs enable advanced, interactive data exploration and visualization,
which are crucial for OCPM analysis.</p>
    </sec>
    <sec id="sec-7">
      <title>6. Conclusion</title>
      <p>PromG is an open-source Python library designed to manage and explore OCED and to perform
OCPM analyses. Although its current functionality is limited compared to some academic
counterparts, PromG’s architecture prioritizes ease of extension and future development, positioning
it as a valuable tool in the growing field of OCED and OCPM.</p>
      <p>
        In particular, PromG’s multi-layered knowledge graph supports the development of several
extensions: next to realizing further OCPM capabilities [
        <xref ref-type="bibr" rid="ref8 ref9">8, 9</xref>
        ], these include an inference engine for
missing or latent information [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] that builds on an integration of event data with system design
and context data [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]; analysis of actor behavior and organizational routines [
        <xref ref-type="bibr" rid="ref11 ref15">11, 15</xref>
        ]; and
detection of emergent behavior and its propagation across cases [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ].
      </p>
      <p><sup>2</sup> https://pypi.org/project/promg/</p>
      <p><sup>3</sup> https://github.com/PromG-dev</p>
      <p><sup>4</sup> https://neo4j.com/product/neo4j-graph-database/</p>
      <p><sup>5</sup> https://neo4j.com/labs/apoc/</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>W. M. P. van der Aalst</surname>
          </string-name>
          ,
          <article-title>Object-centric process mining: Dealing with divergence and convergence in event data</article-title>
          ,
          <source>in: SEFM</source>
          <year>2019</year>
          , volume
          <volume>11724</volume>
          <source>of LNCS</source>
          , Springer,
          <year>2019</year>
          , pp.
          <fpage>3</fpage>
          -
          <lpage>25</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>M.</given-names>
            <surname>Dumas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Fournier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Limonad</surname>
          </string-name>
          , et al.,
          <article-title>AI-augmented business process management systems: A research manifesto</article-title>
          ,
          <source>ACM Trans. Manag. Inf. Syst</source>
          .
          <volume>14</volume>
          (
          <year>2023</year>
          )
          <fpage>11:1</fpage>
          -
          <lpage>11:19</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>D.</given-names>
            <surname>Fahland</surname>
          </string-name>
          ,
          <article-title>Process mining over multiple behavioral dimensions with event knowledge graphs</article-title>
          ,
          <source>in: Process Mining Handbook</source>
          , volume
          <volume>448</volume>
          , Springer,
          <year>2022</year>
          , pp.
          <fpage>274</fpage>
          -
          <lpage>319</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>W. M. P. van der Aalst</surname>
          </string-name>
          ,
          <article-title>Twin transitions powered by event data - using object-centric process mining to make processes digital and sustainable</article-title>
          ,
          <source>in: ATAED</source>
          <year>2023</year>
          , volume
          <volume>3424</volume>
          <source>of CEUR-WS.org</source>
          ,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>X.</given-names>
            <surname>Lu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Nagelkerke</surname>
          </string-name>
          , D. van de Wiel, D. Fahland,
          <article-title>Discovering interacting artifacts from ERP systems</article-title>
          ,
          <source>IEEE Trans. Serv. Comput</source>
          .
          <volume>8</volume>
          (
          <year>2015</year>
          )
          <fpage>861</fpage>
          -
          <lpage>873</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>W. M. P. van der Aalst</surname>
          </string-name>
          , A. Berti,
          <article-title>Discovering object-centric Petri nets</article-title>
          ,
          <source>Fundam. Informaticae</source>
          <volume>175</volume>
          (
          <year>2020</year>
          )
          <fpage>1</fpage>
          -
          <lpage>40</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>A.</given-names>
            <surname>Berti</surname>
          </string-name>
          , W. M. van der Aalst,
          <article-title>OC-PM: analyzing object-centric event logs and process models</article-title>
          ,
          <source>International Journal on Software Tools for Technology Transfer</source>
          <volume>25</volume>
          (
          <year>2022</year>
          )
          <fpage>1</fpage>
          -
          <lpage>17</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>J. N.</given-names>
            <surname>Adams</surname>
          </string-name>
          , W. M. van der Aalst,
          <article-title>OCπ: Object-centric process insights</article-title>
          ,
          <source>in: Applications and Theory of Petri Nets</source>
          ,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <article-title>ocpa: A Python library for object-centric process analysis</article-title>
          ,
          <source>Software Impacts</source>
          <volume>14</volume>
          (
          <year>2022</year>
          )
          <fpage>100438</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>A.</given-names>
            <surname>Swevels</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Fahland</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Montali</surname>
          </string-name>
          ,
          <article-title>Implementing object-centric event data models in event knowledge graphs</article-title>
          ,
          <source>in: Process Mining Workshops, ICPM 2023, Lecture Notes in Business Information Processing</source>
          ,
          <year>2023</year>
          . Accepted, to appear.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>E. L.</given-names>
            <surname>Klijn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Mannhardt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Fahland</surname>
          </string-name>
          ,
          <article-title>Classifying and detecting task executions and routines in processes using event graphs</article-title>
          ,
          <source>in: BPM'21 Forum</source>
          , volume
          <volume>427</volume>
          <source>of LNBIP</source>
          , Springer,
          <year>2021</year>
          , pp.
          <fpage>212</fpage>
          -
          <lpage>229</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>S.</given-names>
            <surname>Esser</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Fahland</surname>
          </string-name>
          ,
          <article-title>Multi-dimensional event data in graph databases</article-title>
          ,
          <source>Journal on Data Semantics</source>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>V.</given-names>
            <surname>Chu</surname>
          </string-name>
          ,
          <article-title>Using event knowledge graphs to model multi-dimensional dynamics in a baggage handling system</article-title>
          ,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>A.</given-names>
            <surname>Swevels</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Dijkman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Fahland</surname>
          </string-name>
          ,
          <article-title>Inferring missing entity identifiers from context using event knowledge graphs</article-title>
          ,
          <source>in: BPM</source>
          <year>2023</year>
          , volume
          <volume>14159</volume>
          <source>of LNCS</source>
          ,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>Event graph model discovery for waiting time and workflow analysis in Damen's process</article-title>
          ,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>K.</given-names>
            <surname>Marangoz</surname>
          </string-name>
          ,
          <article-title>Capturing multi-dimensional dynamics in a configuration management process through event knowledge graphs</article-title>
          ,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>B.</given-names>
            <surname>Bakullari</surname>
          </string-name>
          ,
          <string-name>
            <surname>J. van Thoor</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Fahland</surname>
          </string-name>
          ,
          <string-name>
            <surname>W. M. P. van der Aalst</surname>
          </string-name>
          ,
          <article-title>The interplay between high-level problems and the process instances that give rise to them</article-title>
          ,
          <source>in: BPM 2023 Forum</source>
          , volume
          <volume>490</volume>
          <source>of LNBIP</source>
          ,
          <year>2023</year>
          , pp.
          <fpage>145</fpage>
          -
          <lpage>162</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>