=Paper= {{Paper |id=Vol-3758/paper-07 |storemode=property |title=Concept-drift-aware Prescriptive Analytics for Object-centric Processes |pdfUrl=https://ceur-ws.org/Vol-3758/paper-07.pdf |volume=Vol-3758 |authors=Ngoc-Diem Le |dblpUrl=https://dblp.org/rec/conf/bpm/Le24 }} ==Concept-drift-aware Prescriptive Analytics for Object-centric Processes== https://ceur-ws.org/Vol-3758/paper-07.pdf
                                Concept-drift-aware Prescriptive Analytics for
                                Object-centric Processes
                                Ngoc-Diem Le¹
                                ¹ University of Padua, Italy


                                               Abstract
                                               In the field of Process Mining, Process-aware recommender systems are designed to monitor process
                                               executions, predict future behavior, and recommend effective interventions based on the knowledge
                                               learned from historical event logs to reduce the risk of failure or to maximize a given reference Key
                                               Performance Indicator. In reality, concept drift in business processes involves changes in the underlying
                                               process dynamics over time, which can invalidate prediction models. Additionally, in complex business
                                               environments that involve multiple events and objects interacting, traditional recommender systems
                                               may struggle to accurately model these interactions, leading to less precise recommendations. Due
                                               to the intricate nature of the event data, existing methods for generating recommendations may be
                                               inadequate as they fail to account for the dependencies and relationships between object types. This
                                               Ph.D. project aims to develop a recommendation module that can generate meaningful and applicable
                                               recommendations. The objective is to apply counterfactual reasoning to provide clear insights into how
                                               particular changes from input can lead to desired results. Additionally, the project intends to adapt the
                                               system to handle concept drift effectively through continuous learning. This adjustment ensures that the
                                               recommendations maintain their accuracy and effectiveness even when the underlying process dynamics
                                               alter. Finally, the project seeks to provide recommendations for object-centric processes. This paradigm
                                               is gaining popularity for its ability to model processes more accurately and in greater detail.

                                               Keywords
                                               Process Prescriptive Analytics, Process-aware Recommendation systems, Concept drift detection,
                                               Counterfactuals, Object-centric Process Mining




                                1. Research Problem and Motivation
                                In recent years, process mining has emerged as a research area bridging between data science
                                and process science. Its goal is to discover, monitor, and enhance actual processes by extracting
                                information from event logs, helping businesses identify inefficiencies and bottlenecks and
                                optimize overall performance [1].
                                   Recommender systems are a particular class of Information Systems that aim to analyze
                                user data and behavior to provide personalized recommendations, thereby enhancing user
                                experience and decision-making. In the context of Process Mining, there is a growing trend
                                towards the development of Process-aware recommender systems (hereafter referred to as PAR
                                systems). Conceptually, a PAR system comprises three sub-systems: monitoring, predictive

                                Proceedings of the Best BPM Dissertation Award, Doctoral Consortium, and Demonstrations & Resources Forum co-located
                                with 22nd International Conference on Business Process Management (BPM 2024), Krakow, Poland, September 1st to 6th,
                                2024.
                                Email: ngocdiem.le@phd.unipd.it (N. Le)
                                URL: https://github.com/ngocdiemle296 (N. Le)
                                             © 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).




CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073
analytics block, and prescriptive analytics block. These systems are intended to (i) forecast
process execution outcomes, (ii) identify high-risk executions that may fail to meet performance
goals, such as costs, deadlines, and customer satisfaction, and (iii) recommend contingency
actions to improve the executions [2]. Generally, an outcome is defined through a so-called
process-specific KPI (Key Performance Indicator) that measures a process’ performance and
highlights improvement areas.
    Nevertheless, there is a lack of comprehensive studies on the application of prescriptive
analytics for dynamically generating the optimal combination of the next activity and resource
allocation in real time. Current recommendation modules predominantly focus on recommending
the next activity, with limited capabilities in simultaneously recommending the appropriate
resources for these activities. This gap is significant, as the impact of the recommended activity
on the process outcome can be highly dependent on the specific resources assigned to perform
it.
    The purpose of this Ph.D. project is to construct a recommendation module that suggests
the optimal combinations of the next activity and resource in a workflow. The current work of
Padella et al. [3] recommends the next activity, but testing the improvement for every possible
combination of activity and resource is unfeasible in practice, especially for processes involving
hundreds of potential resources. This leads to our primary research question.
    Research Question 1: How can we build a recommendation module that can efficiently help
improve a given reference KPI?
    Next, in reality, processes often change due to various factors such as market dynamics,
new regulations, or efforts toward process enhancement and repair [4]. These changes, known
as concept drift, challenge the underlying assumptions of classical methods, which
typically assume that processes are in a steady state [5]. The occurrence of concept drift in
the recommendation module can result in inadequate recommendations. This is because the
prediction and recommendation models may no longer accurately capture the current dynamics
of the workflow. As the process evolves, previously effective models may become outdated,
resulting in suboptimal or incorrect recommendations. This issue drives us to our second
research question.
    Research Question 2: How do we maintain the effectiveness of recommendations through
continuous learning as processes evolve?
    Traditional process mining methods usually assume that each event is associated with a
single object. In practice, however, an event may involve several objects of different types.
These considerations motivate the introduction of object-centric processes. This
paradigm has been gaining attention for its ability to model inter-organizational processes
naturally [6]. However, the transition from traditional single-flow processes to object-centric
ones poses significant challenges, especially regarding the complexity of the input data. This
makes generating recommendations particularly challenging when multiple object types and
varying numbers of objects are involved with events. This brings us to the third research
question.
    Research Question 3: How can we expand the use of our framework beyond conventional single-flow
processes?
2. Literature Analysis
In the last decade, a significant amount of research has focused on process predictive analytics.
These methods have been developed to predict various KPIs, tackle challenges from diverse
perspectives, and have been applied across multiple domains. There are comprehensive reviews
of predictive monitoring works before 2022 by Márquez-Chamorro et al. [7], Di Francescomarino
et al. [8], and Rama-Maneiro et al. [9]. In addition, some recent papers [10, 11, 3] focus on
predicting time-related outcomes, total costs, and activity occurrences, where the latter involves
forecasting whether a specific activity is likely to happen in the future. Galanti et al. [11]
proposed a predictive analytics framework to compare the effectiveness of gradient boosting
(specifically, CatBoost) and Long Short-Term Memory (LSTM) models on several real-life case
studies for predicting various KPIs. While both models provided similar prediction quality, the
experiments demonstrated that CatBoost significantly reduced the model’s training time.
   Once predictive analytics provides forecasts, prescriptive analytics uses these insights to
suggest specific actions to optimize outcomes. However, the benefits of prescriptive process
monitoring can only be fully realized if these methods prescribe effective interventions that are
followed [12]. Kubrak et al. [13] conducted a thorough study on prescriptive process monitoring,
discussing its evolution and future prospects and focusing on how predictive analytics can be
combined with optimization techniques to recommend actionable interventions in business
processes. However, the paper also mentions the lack of methods for explainability and feedback
loops between a prescriptive monitoring system and its end-users. In response to this, Padella
et al. [3] proposed a framework to accompany recommendations with sensible explanations
based on process-related characteristics, using Shapley values [14] to generate explanations for
the selected recommendations.
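To make the role of Shapley values [14] concrete, the sketch below computes them exactly for a tiny cooperative game. It is only an illustration of the general definition, not the explanation procedure of [3]: the feature names and the `toy_value` function are hypothetical, and practical tools approximate the sum rather than enumerating every coalition.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values; `value` maps a frozenset of players to a number."""
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):  # k = size of the coalition that p joins
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (value(s | {p}) - value(s))
        phi[p] = total
    return phi


def toy_value(coalition):
    """Hypothetical KPI gain explained by the features in `coalition`."""
    gain = 0.0
    if "activity" in coalition:
        gain += 4.0
    if "resource" in coalition:
        gain += 2.0
    if "activity" in coalition and "resource" in coalition:
        gain += 2.0  # interaction between activity and resource
    return gain


phi = shapley_values(["activity", "resource", "amount"], toy_value)
```

The interaction term is split evenly between `activity` and `resource`, the irrelevant `amount` feature receives zero, and the attributions sum to the value of the full feature set.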
   Sato et al. [4] conducted a comprehensive survey on concept drift detection in process mining,
categorizing approaches into two main branches: explicit drift detection and adapting process
mining techniques to handle event streams in evolving environments. However, none of the
papers reviewed in [4] addressed the resource perspective. Adams et al. [15] generalized the
concept drift detection framework using PELT (Pruned Exact Linear Time) proposed in [16]
by allowing for the testing of non-linear relationships, which can be potentially applied to
resource allocation within information systems. Moreover, their approach supports object-
centric event logs. A recent study [17] extended control-flow drift detection to multi-perspective
drift detection by extracting features from a multi-layered event knowledge graph (EKG). The
idea is to aggregate information in an EKG with actor and case paths to gain new insights into
actor behavior and task handovers and utilize an existing change point detection technique
proposed in [15]. However, it is worth noting that research on concept drift in process mining
still focuses primarily on offline analysis. To address the online setting, Hassani [18] introduced a method
for detecting concept drift in event streams by employing ADWIN (Adaptive Windowing)
[19], an adaptive window technique. This approach dynamically adjusts the window size
by focusing on short intervals for highly deviating periods or increasing its width in case of
uniform observations. Additionally, by integrating the advantages of both Heuristic Miner
and Fuzzy Miner, it enhances the detection and adaptation to changes in process behavior.
Nevertheless, there remains a lack of research on how concept drift affects the optimization
of process-specific KPIs and, in turn, the recommendations generated by the PAR
system.
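The adaptive-window idea behind ADWIN [19] can be sketched as follows. This is a simplified illustration under stated assumptions, not the method of [18] or a production implementation: it stores the whole window and tests every split (the original algorithm uses a compressed bucket structure), it assumes observations lie in [0, 1], and the function names and the example stream are hypothetical.

```python
from math import log, sqrt


def adwin_cut(window, delta=0.002):
    """Return the first split index whose two sub-window means differ
    by more than the ADWIN threshold, or None if no split does."""
    n = len(window)
    total = sum(window)
    head = 0.0
    for i in range(1, n):
        head += window[i - 1]
        n0, n1 = i, n - i
        mean0, mean1 = head / n0, (total - head) / n1
        m = 1.0 / (1.0 / n0 + 1.0 / n1)              # harmonic-style size term
        eps = sqrt(log(4.0 * n / delta) / (2.0 * m))  # cut threshold
        if abs(mean0 - mean1) > eps:
            return i
    return None


def detect_drifts(stream, delta=0.002):
    """Feed the stream one value at a time; report positions where the window shrinks."""
    window, drifts = [], []
    for t, x in enumerate(stream):
        window.append(x)
        cut = adwin_cut(window, delta)
        if cut is not None:
            drifts.append(t)
            while cut is not None:       # drop the older part until stable
                window = window[cut:]
                cut = adwin_cut(window, delta)
    return drifts


# Hypothetical stream, e.g. a normalized prediction error that jumps at position 100.
stream = [0.1] * 100 + [0.9] * 100
drifts = detect_drifts(stream)
```

The window grows while observations are uniform and is cut shortly after the jump, once enough new observations have accumulated for the difference in means to exceed the threshold.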
   The object-centric process paradigm is increasingly gaining traction in both academia and
industry [20]. While existing discovery algorithms focus on modeling the control flow of
processes, they often struggle to effectively represent the interactions and dependencies between
different objects in event data. Tour et al. [21] introduced Agent Miner, a divide-and-conquer
algorithm that constructs models of agents and their interactions from event data, providing
a new perspective on how agents work together to perform activities. Besides, Klijn et al.
[22] proposed a new aggregation method that focuses on analyzing task executions in event
knowledge graphs. Regarding predictive analytics, some studies [20, 23, 24] have proposed
approaches to incorporate information about object interactions into the predictive model. The
first two papers [20, 23] showed promising results when using techniques based on gradient
boosting, particularly the CatBoost model. In addition, Adams et al. [24] examined the impact
of flattening and the potential benefits of object-centric innovations in predictive process
monitoring. Despite these advancements, in the context of prescriptive analytics, no known
research has been conducted so far on providing recommendations for object-centric processes.
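
The flattening step mentioned above can be illustrated on a toy object-centric log. The sketch below is only an assumption-laden example: the event structure, the object types `order` and `item`, and the `flatten` helper are hypothetical and do not follow any particular log standard.

```python
from collections import defaultdict

# Hypothetical object-centric events: each event may reference several objects.
events = [
    {"id": "e1", "activity": "place order", "time": 1,
     "objects": {"order": ["o1"], "item": ["i1", "i2"]}},
    {"id": "e2", "activity": "pick item", "time": 2,
     "objects": {"item": ["i1"]}},
    {"id": "e3", "activity": "pick item", "time": 3,
     "objects": {"item": ["i2"]}},
    {"id": "e4", "activity": "ship order", "time": 4,
     "objects": {"order": ["o1"], "item": ["i1", "i2"]}},
]


def flatten(events, object_type):
    """Project the log onto one object type: one trace per object of that type."""
    traces = defaultdict(list)
    for ev in sorted(events, key=lambda e: e["time"]):
        for obj in ev["objects"].get(object_type, []):
            traces[obj].append(ev["activity"])
    return dict(traces)


by_order = flatten(events, "order")
by_item = flatten(events, "item")
```

Flattening on `order` loses the picking events, while flattening on `item` duplicates the order-level events across both item traces, which is the kind of distortion that motivates keeping the object-centric view.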


3. Project Roadmap
The project aims to develop a recommendation module within the PAR system that suggests
optimal combinations of the next activity and resource in a workflow, specifically targeting
administrative processes. Initially, the construction of the recommendation module requires
the development of a predictive analytics component. Due to its ease of implementation and
significant advantage in training time, we chose the CatBoost model [25] to predict the KPIs of
interest.
   In recent years, counterfactual reasoning has become increasingly crucial in many aspects of
process mining. This method seeks to demonstrate how modifying a real input instance can lead
to significant changes in the output based on what the machine learning prediction model has
learned. It is important to distinguish our approach from counterfactuals used in causal machine
learning. In our case, we are not estimating causal effects but rather aiming to understand
how changes in input affect the output. In the PAR system, counterfactual reasoning is mostly
applied in the field of predictive analytics [26, 27, 28]. However, to the best of our knowledge,
this approach has yet to be explored in the context of prescriptive analytics to recommend
the next activities and resources while optimizing process-specific KPIs. Therefore, to address
research question 1, we have developed a recommendation module that uses counterfactuals
to generate a list of recommendations that contains potential next activities and resources.
Specifically, we employ the DiCE (Diverse Counterfactual Explanations) algorithm [29] to
explore the potential KPIs that would occur if a different combination of activity and resource
was taken at a decision point in the process. Within DiCE, the CatBoost model predicts potential
KPIs for these alternative scenarios. The combinations of the next activity and resource are
considered hypothetical alternatives to the actual events that have happened. By using the data
about historical events and features, DiCE can create a series of “what-if” scenarios by altering
features related to the next activity and resource while keeping others constant. By assessing
these “what-if” scenarios, the system can infer potential KPIs without requiring exhaustive
testing, making it more feasible in practice. Through this methodology, the recommendation
module can effectively suggest optimal activity-resource combinations, thereby enhancing
workflow efficiency and productivity. The advantage of DiCE is that it uses genetic approaches
that avoid testing all combinations of activities and resources. We are aware that the solutions
proposed are not necessarily optimal, but we expect DiCE to give very good solutions while
keeping the problem tractable.
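The what-if search described above can be sketched with a small evolutionary loop. This is not DiCE and not the project's implementation: `predict_kpi` is a toy stand-in for the trained CatBoost predictor, and the activity names, resource identifiers, the `amount` feature, and all parameter values are hypothetical. Only the features for the next activity and resource are varied; the rest of the running case is held constant.

```python
import random

# Hypothetical candidate values; a real event log would supply these.
ACTIVITIES = ["check", "approve", "escalate", "notify"]
RESOURCES = [f"r{i}" for i in range(50)]
ACTIVITY_COST = {"check": 3.0, "approve": 1.0, "escalate": 5.0, "notify": 2.0}


def predict_kpi(features):
    """Toy stand-in for a trained KPI predictor (lower is better)."""
    resource_cost = (int(features["next_resource"][1:]) * 7 % 11) / 10.0
    activity_cost = ACTIVITY_COST[features["next_activity"]]
    return 10.0 + 0.01 * features["amount"] + activity_cost + resource_cost


def what_if_search(case_features, generations=8, pop_size=10, seed=0):
    """Evolve (activity, resource) pairs while all other features stay fixed."""
    rng = random.Random(seed)
    cache = {}  # each distinct pair is scored once

    def score(pair):
        if pair not in cache:
            scenario = dict(case_features, next_activity=pair[0], next_resource=pair[1])
            cache[pair] = predict_kpi(scenario)
        return cache[pair]

    population = [(rng.choice(ACTIVITIES), rng.choice(RESOURCES)) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=score)
        parents = population[: pop_size // 2]      # keep the better half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            activity, resource = a[0], b[1]        # crossover
            if rng.random() < 0.3:                 # mutation
                if rng.random() < 0.5:
                    activity = rng.choice(ACTIVITIES)
                else:
                    resource = rng.choice(RESOURCES)
            children.append((activity, resource))
        population = parents + children
    ranked = sorted(set(population), key=lambda p: (score(p), p))
    return [(pair, score(pair)) for pair in ranked[:3]], len(cache)


running_case = {"amount": 500, "last_activity": "submit"}
recommendations, n_evaluated = what_if_search(running_case)
```

With these settings the search scores at most 50 of the 200 possible pairs, so the returned list is a set of good candidates rather than a guaranteed optimum, which mirrors the trade-off stated in the text.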
   To assess the effectiveness of our recommendation module, we need to compare the KPI
values for the process instances being executed, when the recommendations are followed versus
when they are ignored. Ideally, one would run A/B tests with the system in production,
but that is unfeasible in practice because companies do not want to put their business at risk.
In [3], Padella et al. use the event log of past executions to find traces that are similar to those
subjected to recommendations and assume that the latter would behave similarly under the
same recommendations. Unfortunately, similar traces often cannot be found, which invalidates
this approach. In this project, we aim to use business process simulation to generate
executions that do or do not follow the recommendations, similar to what was proposed by
Padella et al. [30]. Simulated data can help generate a wide range of hypothetical scenarios
that may not have occurred in the past. We have already achieved preliminary results at this
step in our project. We will soon compile a paper on our findings, which we aim to submit to a
workshop at an upcoming conference.
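The comparison described above reduces to contrasting the KPI of simulated executions that followed the recommendations with those that did not. The sketch below shows one way to express it; the simulated values, the choice of total time as KPI, and the `relative_improvement` helper are hypothetical, and no simulator is included.

```python
from statistics import mean

# Hypothetical simulated cases: (followed_recommendation, total_time_hours).
simulated = [
    (True, 40.0), (True, 35.0), (True, 45.0), (True, 40.0),
    (False, 50.0), (False, 55.0), (False, 45.0), (False, 50.0),
]


def relative_improvement(cases):
    """Relative reduction of the mean KPI when recommendations are followed
    (assumes a KPI where lower values are better)."""
    followed = [kpi for flag, kpi in cases if flag]
    ignored = [kpi for flag, kpi in cases if not flag]
    baseline = mean(ignored)
    return (baseline - mean(followed)) / baseline


improvement = relative_improvement(simulated)
```

For this toy data the mean total time drops from 50 to 40 hours, a relative improvement of 20%.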
   The concept drift problem in Process Mining underscores the importance of continuously
monitoring and updating the model to maintain system effectiveness. In addressing research
question 2, we intend to detect changes in data patterns while keeping the predictive model
up-to-date. Since our goal is to recommend optimal combinations of the next activity and
resource, it is crucial to consider both actor behavior and control flow together when detecting
concept drift. Therefore, we intend to leverage the framework proposed in [17] for multi-
perspective concept-drift detection by extracting features from a multi-layered EKG. To enhance
the recommendation module against concept drift, we plan to leverage CatBoost’s support for
continuous learning. By utilizing CatBoost’s capabilities to adapt incrementally to new data,
we aim to maintain the system’s predictive accuracy and effectiveness as patterns evolve.
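The interplay between drift detection and continuous learning can be sketched as a monitor-and-refresh loop. The example below makes several substitutions that should be read as assumptions: a running mean stands in for the CatBoost predictor, the Page-Hinkley test stands in for the multi-perspective detector of [17], and the thresholds and KPI stream are hypothetical.

```python
class PageHinkley:
    """Page-Hinkley test for an upward shift in the mean of a stream."""

    def __init__(self, delta=0.05, threshold=5.0):
        self.delta = delta
        self.threshold = threshold
        self.reset()

    def reset(self):
        self.n = 0
        self.mean = 0.0
        self.cum = 0.0
        self.min_cum = 0.0

    def update(self, x):
        """Add one observation; return True if a shift is signalled."""
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.cum += x - self.mean - self.delta
        self.min_cum = min(self.min_cum, self.cum)
        return self.cum - self.min_cum > self.threshold


def monitor(kpis):
    """Predict each KPI as the mean of the data seen since the last drift,
    watch the absolute error, and discard old data when drift is signalled."""
    detector = PageHinkley()
    history = [kpis[0]]
    drifts = []
    for t in range(1, len(kpis)):
        prediction = sum(history) / len(history)
        error = abs(kpis[t] - prediction)
        if detector.update(error):
            drifts.append(t)
            history = []          # forget pre-drift behaviour
            detector.reset()
        history.append(kpis[t])   # incremental update with the new case
    return drifts, sum(history) / len(history)


# Hypothetical KPI stream whose level changes at position 60.
kpis = [10.0] * 60 + [20.0] * 60
drifts, final_prediction = monitor(kpis)
```

In a real deployment the reset step would correspond to continuing or restarting model training on post-drift data, and the monitored signal could be any error measure of the predictive model.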
   Regarding research question 3, we aim to extend our concept-drift-aware prescriptive
analytics framework to object-centric processes. Specifically, we plan to adapt the CatBoost
model to object-centric processes within the predictive analytics block to achieve optimal
performance. As highlighted by [20], the CatBoost model provides similar results to LSTM
and graph-based neural networks in terms of accuracy, while being significantly faster
than the methods relying on graph-based neural networks when applied to object-centric
processes. However, a significant challenge we encounter involves determining what constitutes
a recommendation for object-centric processes. Due to the intricate nature of the event data,
a recommendation for object-centric processes should consider the multiple dimensions and
interactions of the various objects involved. Therefore, the recommendations generated by
our initial framework may be inadequate as they fail to account for the dependencies and
relationships between different object types. To address this, it is crucial to develop a more
advanced framework capable of comprehensively identifying and analyzing these interactions.
This approach will ensure that the recommendations are well-informed and tailored to meet
the particular requirements and conditions of the interacting objects.
References
 [1] W. M. P. van der Aalst, Process Mining: Data Science in Action, Springer, 2016.
 [2] M. de Leoni, M. Dees, L. Reulink, Design and evaluation of a process-aware recommender
     system based on prescriptive analytics, in: Proceedings of the 2nd International Conference
     on Process Mining (ICPM 2020), IEEE, 2020.
 [3] A. Padella, M. de Leoni, O. Dogan, R. Galanti, Explainable process prescriptive analytics,
     in: Proceedings of the 4th International Conference on Process Mining (ICPM 2022), IEEE,
     2022.
 [4] D. M. V. Sato, S. C. De Freitas, J. P. Barddal, E. E. Scalabrin, A survey on concept drift in
     process mining, ACM Computing Surveys 54 (2022) 189:1–189:38.
 [5] R. J. C. Bose, W. M. P. van der Aalst, I. Žliobaitė, M. Pechenizkiy, Dealing with concept
     drifts in process mining, IEEE Transactions on neural networks and learning systems 25
     (2013) 154–171.
 [6] W. M. P. van der Aalst, Object-centric process mining: Dealing with divergence and
     convergence in event data, in: Proceedings of the 17th International Conference on
     Software Engineering and Formal Methods (SEFM 2019), Springer, 2019, pp. 3–25.
 [7] A. E. Márquez-Chamorro, M. Resinas, A. Ruiz-Cortés, Predictive monitoring of business
     processes: a survey, IEEE Transactions on Services Computing 11 (2017) 962–977.
 [8] C. Di Francescomarino, C. Ghidini, F. M. Maggi, F. Milani, Predictive process monitoring
     methods: Which one suits me best?, in: Proceedings of the 16th International Conference
     on Business Process Management (BPM 2018), Springer, 2018, pp. 462–479.
 [9] E. Rama-Maneiro, J. C. Vidal, M. Lama, Deep learning for predictive business process
     monitoring: Review and benchmark, IEEE Transactions on Services Computing 16 (2021)
     739–756.
[10] R. Galanti, B. Coma-Puig, M. de Leoni, J. Carmona, N. Navarin, Explainable predictive
     process monitoring, in: Proceedings of the 2nd International Conference on Process
     Mining (ICPM 2020), IEEE, 2020, pp. 1–8.
[11] R. Galanti, M. de Leoni, M. Monaro, N. Navarin, A. Marazzi, B. Di Stasi, S. Maldera,
     An explainable decision support system for predictive process analytics, Engineering
     Applications of Artificial Intelligence 120 (2023) 105904.
[12] M. Dees, M. de Leoni, W. M. van der Aalst, H. A. Reijers, What if process predictions are not
     followed by good recommendations? (technical report), arXiv preprint arXiv:1905.10173
     (2019).
[13] K. Kubrak, F. Milani, A. Nolte, M. Dumas, Prescriptive process monitoring: Quo vadis?,
     PeerJ Comput. Sci. 8 (2022) e1097.
[14] L. S. Shapley, et al., A value for n-person games (1953).
[15] J. N. Adams, S. J. van Zelst, T. Rose, W. M. van der Aalst, Explainable concept drift in
     process mining, Information Systems 114 (2023) 102177.
[16] J. Adams, S. van Zelst, L. Quack, K. Hausmann, W. M. P. van der Aalst, T. Rose, A framework
     for explainable concept drift detection in process mining, in: Proceedings of the 19th
     International Conference on Business Process Management (BPM 2021), 2021.
[17] E. L. Klijn, F. Mannhardt, D. Fahland, Multi-perspective concept drift detection: Including
     the actor perspective, in: Proceedings of the International Conference on Advanced
     Information Systems Engineering, Springer, 2024, pp. 141–157.
[18] M. Hassani, Concept drift detection of event streams using an adaptive window, in:
     Proceedings of the 33rd International ECMS Conference on Modelling and Simulation,
     ECMS 2019, 2019, pp. 230–239.
[19] A. Bifet, R. Gavalda, Learning from time-changing data with adaptive windowing, in:
     Proceedings of the 7th International Conference on Data Mining (2007), SIAM, 2007, pp.
     443–448.
[20] R. Galanti, M. de Leoni, Predictive analytics for object-centric processes: Do graph neural
     networks really help?, in: Proceedings of the 21st International Conference on Business
     Process Management (BPM 2023), Springer, 2023, pp. 521–533.
[21] A. Tour, A. Polyvyanyy, A. Kalenkova, A. Senderovich, Agent miner: an algorithm for
     discovering agent systems from event data, in: Proceedings of the International Conference
     on Business Process Management, Springer, 2023, pp. 284–302.
[22] E. L. Klijn, F. Mannhardt, D. Fahland, Aggregating event knowledge graphs for task
     analysis, in: Proceedings of the International Conference on Process Mining, Springer,
     2022, pp. 493–505.
[23] R. Galanti, M. de Leoni, N. Navarin, A. Marazzi, Object-centric process predictive analytics,
     Expert Systems with Applications 213 (2023) 119173.
[24] J. N. Adams, H. Drescher, A. Swoboda, N. Günnemann, G. Park, W. M. P. van der Aalst,
     Improving predictive process monitoring using object-centric process mining, in: Proceedings
     of the 32nd European Conference on Information Systems (ECIS), 2024.
[25] A. V. Dorogush, V. Ershov, A. Gulin, Catboost: Gradient boosting with categorical features
     support, arXiv preprint arXiv:1810.11363 (2018).
[26] C. Hsieh, C. Moreira, C. Ouyang, DiCE4EL: Interpreting process predictions using a
     milestone-aware counterfactual approach, in: Proceedings of the 3rd International
     Conference on Process Mining (ICPM 2021), IEEE, 2021, pp. 88–95.
[27] A. Buliga, C. Di Francescomarino, C. Ghidini, F. M. Maggi, Counterfactuals and ways to
     build them: Evaluating approaches in predictive process monitoring, in: Proceedings of
     the 35th International Conference on Advanced Information Systems Engineering (CAiSE
     2023), Springer, 2023, pp. 558–574.
[28] A. Stevens, C. Ouyang, J. De Smedt, C. Moreira, Generating feasible and plausible
     counterfactual explanations for outcome prediction of business processes, arXiv preprint
     arXiv:2403.09232 (2024).
[29] R. K. Mothilal, A. Sharma, C. Tan, Explaining machine learning classifiers through diverse
     counterfactual explanations, in: Proceedings of the 3rd ACM Conference on Fairness,
     Accountability, and Transparency (ACM FAT 2020), 2020, pp. 607–617.
[30] A. Padella, F. Mannhardt, F. Vinci, M. de Leoni, I. Vanderfeesten, Experience-based resource
     allocation for remaining time optimization, in: Proceedings of the 22nd International
     Conference on Business Process Management (BPM 2024), 2024. Accepted. In press.