=Paper=
{{Paper
|id=Vol-3730/keynote
|storemode=property
|title=Lifting Process Discovery and Conformance
  Checking to the Next Level: A General Approach to Object-Centric
  Process Mining (Invited Talk)
|pdfUrl=https://ceur-ws.org/Vol-3730/keynote.pdf
|volume=Vol-3730
|authors=Wil M.P. van der Aalst
|dblpUrl=https://dblp.org/rec/conf/apn/Aalst24
}}
==Lifting Process Discovery and Conformance
  Checking to the Next Level: A General Approach to Object-Centric
  Process Mining (Invited Talk)==
<pdf width="1500px">https://ceur-ws.org/Vol-3730/keynote.pdf</pdf>
<pre>
                                Lifting Process Discovery and Conformance Checking
                                to the Next Level: A General Approach to
                                Object-Centric Process Mining
                                    Wil M.P. van der Aalst1,2
                                1
                                    Process and Data Science, RWTH Aachen University, D-52074 Aachen, Germany
                                2
                                    Celonis, Theresienstraße 6, D-80333 München, Germany


                                              Abstract
                                              Traditional process mining approaches are case-centric, i.e., it is assumed that each event refers to a case,
                                              an activity, and a timestamp. However, in reality, events may refer to multiple objects of different types
                                              (instead of a single case). This simplifying assumption can be motivated by the fact that most process
                                              modeling notations are also case-centric, e.g., workflow nets, UML activity diagrams, BPMN models,
                                              and directly-follows graphs, all describe life-cycles of individual cases. However, as the process-mining
                                              discipline matures, we want to drop this assumption and better align event data and process models with
                                              the actual processes and the data stored in information systems. This explains the interest in Object-
                                              Centric Process Mining (OCPM). The significance of the transition from case-centric to process-centric
                                              is comparable to the transition from classical Petri nets to Colored Petri Nets (CPNs) and the transition
                                              from two-dimensional images (e.g., an X-ray) to three-dimensional images (e.g., a full-body MRI). This
                                              extended abstract shows how traditional techniques for process discovery and conformance checking can
                                              be lifted from case-centric to object-centric. We provide a generic framework that allows us to leverage
                                              traditional case-centric process mining techniques. This provides baseline approaches for object-centric
                                              process discovery and conformance checking.

                                              Keywords
                                              Process Mining, Object-Centric Process Mining, Process Discovery, Conformance Checking, Petri Nets,
                                              Object-Centric Event Data


                                1. Introduction
                                Object-Centric Process Mining (OCPM) aims to discover and analyze processes starting from
                                Object-Centric Event Data (OCED) [1, 2]. Traditional case-centric process mining allows for only
                                one type of objects (called cases) and assumes that each event refers to precisely one object. In
                                OCPM, there can be multiple object types, objects may be related, and one event may refer to
                                any number of objects.
                                   Figure 1 introduces basic process mining concepts. The right-hand side shows a small
                                fragment of a traditional event log where each event refers to one case, an activity, and a
                                timestamp. A case can be seen as a process instance consisting of events that are ordered using
                                the timestamps and labeled using the activity attribute. Events may have many more attributes,

                                International Workshop on Petri Nets and Software Engineering (PNSE 2024)
                                $ wvdaalst@pads.rwth-aachen.de ( Wil M.P. van der Aalst)
                                 https://vdaalst.com/ ( Wil M.P. van der Aalst)
                                 0000-0002-0955-6940 ( Wil M.P. van der Aalst)
                                            © 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).


CEUR
                  ceur-ws.org
Workshop      ISSN 1613-0073
Proceedings
Figure 1: A traditional event log (right) and the different types of process mining: (0) extract, (1) discover,
(2) check, (3) predict, and (4) act (left).


but the three attributes shown in Figure 1 are sufficient to discover case-centric process models,
e.g., a classical Petri net (typically a workflow net with a clear start and end), a Directly-Follows
Graph (DFG), a BPMN (Business Process Modeling Notation) model, or a UML activity diagram.
   Currently, there are over 50 commercial process mining tools, all able to automatically discover
process models using such event data [3, 4]. The high-end tools offer not only process discovery,
but support all the tasks depicted in Figure 1 (left), i.e., also the extraction of event data from
source systems such as SAP, Oracle, ServiceNow, SalesForce, etc., conformance checking to
compare the real and modeled behavior, prediction of performance and conformance measures,
and automatically triggering actions to improve processes based on process-mining diagnostics.
   Although process mining is widely adopted (especially in Europe) and has proven to help
organizations improve their processes, it is evident that the single-case assumption is severely
limiting the scope of analysis and leads to distortions such as convergence and divergence [1, 2].
Most processes involve many interacting and related objects (e.g., orders, items, customers,
suppliers, machines, etc.). Therefore, Object-Centric Process Mining (OCPM) is in focus, both
in research and among tool vendors. For example, as we will show, the new Celonis process
mining platform is completely based on OCPM.


Figure 2: Case-centric event data (left) versus object-centric event data (right). The object-centric
meta-model on the right adds Event-to-Object (E2O) and Object-to-Object (O2O) relations [5].


   Figure 2 shows two meta-models compactly showing the differences between case-centric
(left) and object-centric (right). The case-centric meta-model on the left represents the classical
view that each event refers to precisely one case. The object-centric meta-model on the right
uses objects instead of cases, and allows for arbitrary Event-to-Object (E2O) and Object-to-Object
(O2O) relations. These relations can also be qualified. Note that one event can have many
objects and one object may be involved in many events. Objects and events are both typed and
may have additional attributes. Often, we refer to an event type as the activity. In the remainder,
we use the terms “event type” and “activity” interchangeably.
  The rest of this extended abstract is organized as follows. Section 2 discusses Object-Centric
Process Discovery (OCPD) and Section 3 discusses Object-Centric Conformance Checking
(OCCC). Example implementations are briefly described in Section 4, followed by a discussion
and conclusion (Section 5).


2. Object-Centric Process Discovery
Since the turn of the century, many process discovery techniques have been developed to
automatically learn representations such as Petri nets, DFGs, and BPMN models from event
data. This is challenging task because the input is just a sample of possible behaviors (i.e., only
positive examples and incomplete). Process models with loops describe infinitely many possible
traces, and even models without loops (but with concurrency) may have an exponential number
of states and a factorial number of traces. Therefore, even for large event logs, one cannot
assume that “what did not happen, cannot happen”.
   Discovery approaches can be classified into two main categories: bottom-up process discovery
approaches, including the Alpha algorithm and region-based techniques [6, 7, 8, 9, 10, 11, 12],
and top-down process discovery approaches, such as inductive mining methods [13, 14, 15]. For a
comprehensive review of process discovery techniques, refer to [16].
   All of the mentioned approaches assume that each event refers to precisely one case. As a
result, each case refers to a sequence of activities, and for process discovery event data can
be reduced to a multiset of activity sequences (i.e., traces). We would like to leverage existing
techniques for case-centric process discovery and discover object-centric process models. The
only assumption that we make is that case-centric process discovery produces process models
where each activity is unique, i.e., it is not allowed to have the same activity at two places in the
process model. Very few process discovery techniques produce duplicate activities violating this
assumption. An exception is process discovery using state-based regions with label splitting
[10, 11]. We do allow for techniques that discover silent activities (i.e., skips) and gateways.
This is not a problem for the approach described here, because the different object flows are
only synchronized in events that correspond to unique activities. Therefore, we are able to
reuse most of the existing discovery approaches.
   Next, we describe a general approach to discovering object-centric process models.

   1. Selection: Given object-centric event data according to the object-centric meta-model
      presented before (Figure 2), select the object types and event types that are in scope. It is
      also possible to make additional, more fine-grained selections for the selected object types.
      For example, select subsets of objects based on some filter criterion (e.g., remove all orders
      placed before a certain start date). It is also possible to use qualifiers or event-object-type
      combinations to filter E2O relations. The resulting selection of objects and events forms
      again object-centric event data in the sense of the meta-model on the right-hand-side in
      Figure 2.
   2. Flatten: For each object type 𝑂𝑇 , create a traditional event log 𝐿𝑂𝑇 by flattening the
      event data. Given an object type 𝑂𝑇 , consider all events including at least one object
      of type OT. For each of these events, create an event in 𝐿𝑂𝑇 for each object of the
      selected type. For example, if a place order event refers to one order object and five item
      objects, then there will be one corresponding event in the event log for orders and five
      corresponding events in the event log for items. The result of this step is a traditional
      event log 𝐿𝑂𝑇 for each object type selected.
   3. Discover: For each object type 𝑂𝑇 , use the event log 𝐿𝑂𝑇 and a traditional process
      discovery technique to discover a process model per object type. Any process discovery
      technique that produces unique activities can be used. The result is a process model 𝑀𝑂𝑇
      per object type 𝑂𝑇 . Note that due to flattening, multiple process models may refer to the
      same activities, but the activity frequencies may be different.
   4. Correct: One event in the original event log may refer to a variable number of events in
      the flattened event logs. This poses a problem when merging the models. Each process
      model 𝑀𝑂𝑇 needs to be “repaired” such that the frequency counts are correct. This can be
      achieved by “batching”, i.e., if multiple events in the flattened event log correspond to the
      same event in the original event log, then the activity is assumed to handle all objects in
      the original event in a single step. This concept can be visualized using so-called variable
      arcs that do not show the flow of individual objects, but groups of objects.
   5. Merge: The previous steps ensured that we have a model for each object type such
      that the activity names are unique and the frequencies are consistent (i.e., the activity
      frequencies match the frequencies in the original event logs before flattening). This means
      that the models are “in sync” and can be merged by fusing the activities with the same
      label.
   6. Enrich: It is possible to enrich the merged process model with cardinality constraints
      learned from the original event log, e.g., activity place order involves one or more items
      and precisely one order. It is also possible to add descriptive statistics such as the minimal,
      maximal, and mean number of objects involved in an activity. Also timing information
      and frequencies can be added.

   To illustrate the six steps, consider object-centric event data relating to 2000 orders, 6000
items, and 3000 packages. Assume that there are orders consisting of just one item, but also
orders with over ten items. On average, orders consist of three items. Also, packages may
contain a variable number of items. There are many packages with just one item, but also
packages with five items. On average, a package has two items. Note that items belonging to
the same order may end up in different packages and items in the same package may originate
from different orders. Using case-centric process mining, one would need to focus on orders,
items, or packages separately. However, this leads to partial models and misleading insights.
Therefore, we apply the six steps mentioned before.
   There may be many more object types and event types (i.e., activities), but assume we selected
the object types order, item, and package (and the respective event types). Based on this selection,
we flattened the object-centric event data into three traditional event logs. These flattened event
logs are used to discover the three process models shown in Figure 3.
   Figure 3 shows the life-cycles of the individual objects. However, the frequencies of the
Figure 3: Three BPMN models discovered based on the three flattened event logs. Note that activity
place order occurs 2000 times in the process model discovered for object type order and 6000 times in the
process model discovered for object type item. Activity create package occurs 6000 times in the process
model discovered for object type item and 3000 times in the process model discovered for object type
package. These differences in frequencies need to be resolved in order to merge the different objects
flows into a single model.


activities do not match. Note that place order occurs 2000 times in the order process, but 6000
times in the item process. There is also disagreement between the item and package processes,
e.g., activity create package occurs 6000 times in the item process and 3000 times in the package
process. These are the usual problems when flatting event data (see the convergence and
divergence problems described in [1, 2]). Therefore, we apply the corrections mentioned in the
fourth step. The result is shown in Figure 4. Activity place order now occurs 2000 times in each
process and not 6000 times as suggested by the item process in Figure 3. Note that activities
Figure 4: Three corrected BPMN models showing the original frequencies. This is achieved through
so-called “variable arcs” (see the double-headed arcs going into the activities that may involve a variable
number of objects of that type). In this example, only the model for the item process has to be corrected.
Note that activity place order occurs 2000 times and, on average, handles three items. Note that activity
create package occurs 3000 times and, on average, handles two items.


place order, create package, send package, and package delivered have “variable arcs” (represented
by the double-headed arcs in Figure 4), and the correct frequencies are indicated. For example,
package delivered occurs 3000 times and handles 6000 items.
   Figure 5 shows the result of merging the three BPMN models from Figure 4. This step is
trivial because activity labels are unique, and frequencies match. The resulting model shows the
flow of three types of objects and the activities they are involved in. The object-centric BPMN
model can be further extended with constraints and descriptive statistics related to cardinalities,
frequencies, and times.
   Note that the approach is generic and does not depend on a specific notation or a specific
Figure 5: The merged object-centric BPMN model showing the object flows of three different object
types.


discovery technique. The same ideas have already been applied to DFGs, Petri nets, process
trees, etc. (see Section 4).


3. Object-Centric Conformance Checking
It is also possible to check the conformance of processes by comparing a process model with
event data [6, 17]. The two most frequently used conformance-checking approaches are token-
based replay [18] and alignments [19, 17]. Here, we can apply an approach similar to the one
used for discovery in Section 2.
   1. Selection: As input, we need object-centric event data and an object-centric process
      model. The assumption is that the scope of the data and model are the same. If not,
      further selection and alignment operations are needed.
   2. Flatten: For each object type 𝑂𝑇 , create again a traditional event log 𝐿𝑂𝑇 by flattening
      the event data. Moreover, project the object-centric process model onto one model per
      object type 𝑂𝑇 . While flattening the model, replace the variable arcs with ordinary arcs,
      i.e., activities handle one object at a time.
   3. Check object flows: For each object type 𝑂𝑇 , use the event log 𝐿𝑂𝑇 and the correspond-
      ing flattened model. Using this as input, traditional conformance-checking approaches
      can be used (e.g., token-based replay or alignment computations). This yields diagnostics
      per object type. Note that all deviations found per object type are also real deviations.
   4. Check cardinalities: Using the object-centric event data it is also possible to check
      cardinality constraints (e.g., a send package event without items) and report deviations.

  Note that the flattened event data and process models provide necessary but not sufficient
conditions for conformance checking. Any deviation found in using the approach described
before is a real deviation. However, some conformance problems may remain undetected. See
[20] for a more detailed problem analysis. The object-centric process models used in OCPM
tend to be underspecified. For example, create package, send package, and package delivered
need to work on the same subsets of objects, but this is not clear from the model in Figure 4. To
support this, we need to explicitly use O2O relations in our process models.


4. Implementation
The approaches for Object-Centric Process Discovery (OCPD) and Object-Centric Conformance
Checking (OCCC) discussed in the previous sections should be seen as generic, baseline ap-
proaches. They illustrate that existing techniques for case-centric process mining can be reused
and provide a good starting point. These approaches have been implemented in open-source
tools such as OCPM [21, 22], OCPI [23], and OCPA [24] and are now making their way into
commercial software products.
   Celonis was the first process mining vendor to fully embrace OCPM. The Celonis process
mining platform implemented the OCPM approaches described in this paper. In the new version
of the platform, event data are stored as OCED and views (called perspectives) can be analyzed.
All the types of process mining mentioned in Figure 1 have been adapted for the new setting.
The Celonis Multi-Object Process Explorer (MOPE) is able to generate an object-centric DFG. The
Celonis Process Adherence Manager (PAM) uses a variant of the Inductive Mining (IM) algorithm
to discover object-centric BPMN models. These can be edited, stored, imported, and exported.
Moreover, PAM supports conformance checking using alignments and allows for performance
analysis across different object types. For example, it is easy to answer questions such as “What
is the average time between placing an order and the last item being delivered?” and “How often
do we send a payment reminder before the first item is delivered?”. Note that these questions
involve multiple object types.
Figure 6: OCPM is also supported by the Celonis process mining platform. Data is stored using an
object-centric data model. It is possible to discover object-centric DFGs and object-centric BPMN
models. Also, conformance checking using alignments is supported for object-centric BPMN models.


5. Discussion and Conclusion
This extended abstract provides an overview of Object-Centric Process Mining (OCPM) and
presents some of the key principles. The transition from case-centric to object-centric process
mining is significant, akin to the shift from classical Petri nets to Colored Petri Nets (CPNs)
and from two-dimensional images (like X-rays) to three-dimensional images (such as full-body
MRIs). In classical Petri nets, tokens are indistinguishable. To represent cases, we need to assume
that the Petri net models one case in isolation. This is not possible when moving to multiple
object types. Moreover, we are interested in interactions between objects. Just like classical
Petri nets and X-rays are still useful, also case-centric process mining is useful. However, as
the field matures and is ready to tackle more ambitious questions, OCPM helps to take process
mining to the next level.
  There are three main reasons for using OCPM:
   1. Avoid repeatedly going back to your source systems. Object-Centric Event Data
      (OCED) offers a single system-agnostic source of truth. This saves time and helps to
      capture real-life events and objects. Data extraction is decoupled from particular analysis
      questions. In traditional process mining, data is extracted using a specific case notion.
      This is not needed anymore, because it is possible to generate the views (often called
      perspectives) on demand.
   2. Avoid distortions due to the single-case assumption. Squeezing reality into simple
      event logs creates distortions. This includes the unintentional replication of events
      (convergence) and loss of causal relations (divergence). In traditional process mining,
      frequencies of activities and information on costs and delays highly depend on the way
      the data was flattened. Squeezing multiple object types into a single case notion also
      results in more complex process models where all structure is lost.
   3. See and understand the interactions between different object types. Problems
      live at the intersections of processes and organizational entities. For example, low On-
      Time-In-Full (OTIF) scores may be caused by problems in sales, production, procurement,
      logistics, etc.
   In this extended abstract, we limited ourselves to Object-Centric Process Discovery (OCPD)
and Object-Centric Conformance Checking (OCCC). We showed how existing techniques can be
used to create baseline OCPD and OCCC approaches. These approaches have been implemented
in open-source tools and the widely used Celonis process mining platform. However, OCPM
also extends to other tasks, such as predictive analytics and predictive and generative Artificial
Intelligence (AI). For example, in [25] we show that using OCED helps to make more accurate
predictions. This is unsurprising because exploiting the underlying structure relating objects
and events and using more context provides a better basis for machine learning.
   Despite these early successes, many improvements are possible. The object-centric process
models used thus far in OCPM are rather underspecified compared to, for example, Colored
Petri Nets (CPNs) with arc inscriptions and guards. The reason is that the relations are in the
data and not in the model (e.g., which items belong to an order). It makes sense to consider
Object-to-Object (O2O) relations more explicitly, instead of focusing mostly on Event-to-Object
(E2O) relations. This will make process discovery and conformance checking more challenging.
However, it will allow us to create “digital shadows” that are much closer to the actual processes.


Acknowledgments
Thanks to the PADS and Celonis teams for implementing the various OCPM techniques (includ-
ing OCEL 2.0, OCPM, OCPI, OCPA, MOPE, Process Sphere, and PAM) and thanks the Alexander
von Humboldt (AvH) Stiftung for supporting our research. The research is also funded by the
Deutsche Forschungsgemeinschaft (DFG) under Germany’s Excellence Strategy, Internet of
Production (390621612).


References
 [1] W. van der Aalst, Object-Centric Process Mining: Unraveling the Fabric of Real Processes,
     Mathematics 11 (2023) 2691.
 [2] W. van der Aalst, A. Berti, Discovering Object-Centric Petri Nets, Fundamenta Informaticae
     175 (2020) 1–40.
 [3] Process Mining, Process Mining Website, www.processmining.org, 2024.
 [4] M. Kerremans, D. Sugden, N. Duffy, Magic Quadrant for Process Mining Platforms, Gartner
     Research Note G00790664, 2024. www.gartner.com.
 [5] Object-Centric, Object-Centric Event Log Standard Website, www.ocel-standard.org, 2024.
 [6] W. van der Aalst, Process Mining: Data Science in Action, Springer-Verlag, Berlin, 2016.
 [7] W. van der Aalst, A. Weijters, L. Maruster, Workflow Mining: Discovering Process Models
     from Event Logs, IEEE Transactions on Knowledge and Data Engineering 16 (2004)
     1128–1142.
 [8] A. Augusto, R. Conforti, M. Marlon, M. La Rosa, A. Polyvyanyy, Split Miner: Automated
     Discovery of Accurate and Simple Business Process Models from Event Logs, Knowledge
     Information Systems 59 (2019) 251–284.
 [9] R. Bergenthum, J. Desel, R. Lorenz, S. Mauser, Process Mining Based on Regions of
     Languages, in: G. Alonso, P. Dadam, M. Rosemann (Eds.), International Conference on
     Business Process Management (BPM 2007), volume 4714 of Lecture Notes in Computer
     Science, Springer-Verlag, Berlin, 2007, pp. 375–383.
[10] J. Carmona, J. Cortadella, M. Kishinevsky, A Region-Based Algorithm for Discovering Petri
     Nets from Event Logs, in: Business Process Management (BPM 2008), 2008, pp. 358–373.
[11] M. Solé, J. Carmona, Process Mining from a Basis of State Regions, in: J. Lilius, W. Penczek
     (Eds.), Applications and Theory of Petri Nets 2010, volume 6128 of Lecture Notes in Computer
     Science, Springer-Verlag, Berlin, 2010, pp. 226–245.
[12] J. Werf, B. Dongen, C. Hurkens, A. Serebrenik, Process Discovery using Integer Linear
     Programming, Fundamenta Informaticae 94 (2010) 387–412.
[13] S. Leemans, D. Fahland, W. van der Aalst, Discovering Block-structured Process Models
     from Event Logs: A Constructive Approach, in: J. Colom, J. Desel (Eds.), Applications and
     Theory of Petri Nets 2013, volume 7927 of Lecture Notes in Computer Science, Springer-
     Verlag, Berlin, 2013, pp. 311–329.
[14] S. Leemans, D. Fahland, W. van der Aalst, Discovering Block-Structured Process Models
     from Event Logs Containing Infrequent Behaviour, in: N. Lohmann, M. Song, P. Wohed
     (Eds.), Business Process Management Workshops, International Workshop on Business Pro-
     cess Intelligence (BPI 2013), volume 171 of Lecture Notes in Business Information Processing,
     Springer-Verlag, Berlin, 2014, pp. 66–78.
[15] S. Leemans, D. Fahland, W. van der Aalst, Scalable Process Discovery and Confor-
     mance Checking, Software and Systems Modeling 17 (2018) 599–631. doi:10.1007/
     s10270-016-0545-x.
[16] A. Augusto, R. Conforti, M. Dumas, M. La Rosa, F. Maggi, A. Marrella, M. Mecella, A. Soo,
     Automated Discovery of Process Models from Event Logs: Review and Benchmark, IEEE
     Transactions on Knowledge and Data Engineering 31 (2019) 686–705.
[17] J. Carmona, B. van Dongen, A. Solti, M. Weidlich, Conformance Checking: Relating
     Processes and Models, Springer-Verlag, Berlin, 2018.
[18] A. Rozinat, W. van der Aalst, Conformance Checking of Processes Based on Monitoring
     Real Behavior, Information Systems 33 (2008) 64–95.
[19] W. van der Aalst, A. Adriansyah, B. van Dongen, Replaying History on Process Models for
     Conformance Checking and Performance Analysis, WIREs Data Mining and Knowledge
     Discovery 2 (2012) 182–192.
[20] L. L. J. Adams, W. van der Aalst, Object-Centric Alignments, in: J. Almeida, J. Borbinha,
     G. Guizzardi, S. Link, J. Zdravkovic (Eds.), International Conference on Conceptual Model-
     ing (ER 2023), volume 14320 of Lecture Notes in Computer Science, Springer-Verlag, Berlin,
     2023, pp. 201–219.
[21] A. Berti, W. van der Aalst, OC-PM: Analyzing Object-Centric Event Logs and Process
     Models, International Journal on Software Tools for Technology Transfer 25 (2023) 1–17.
[22] A. Berti, G. Park, M. Rafiei, W. van der Aalst, A Generic Approach To Extract Object-
     Centric Event Data From Databases Supporting SAP ERP, Journal of Intelligent Information
     Systems 61 (2023) 835–857.
[23] J. Adams, W. van der Aalst, OCpi: Object-Centric Process Insights, in: L. Bernardinello,
     L. Petrucci (Eds.), Application and Theory of Petri Nets and Concurrency (Petri Nets 2022),
     volume 13288 of Lecture Notes in Computer Science, 2022, pp. 139–150.
[24] J. Adams, G. Park, W. van der Aalst, ocpa: A Python Library for Object-Centric Process
     Analysis, Software Impacts 14 (2022) 100438.
[25] J. Adams, G. Park, W. van der Aalst, Preserving Complex Object-Centric Graph Structures
     to Improve Machine Learning Tasks in Process Mining, Engineering Applications of
     Artificial Intelligence 125 (2023) 106764.

</pre>