Prescriptive Process Monitoring in Intelligent Process Automation with Chatbot Orchestration

Sergey Zeltyn, Segev Shlomov, Avi Yaeli and Alon Oved
IBM Research - Haifa, Haifa University Campus, Mount Carmel, Haifa 3498825, Israel

Abstract
Business processes that involve AI-powered automation have been gaining importance and market share in recent years. These business processes combine the characteristics of classical business process management, goal-driven chatbots, conversational recommendation systems, and robotic process automation. In the new context, prescriptive process monitoring demands innovative approaches. Unfortunately, data logs from these new processes are still not available in the public domain. We describe the main challenges in this new domain and introduce a synthesized dataset that is based on an actual use case of intelligent process automation with chatbot orchestration. Using this dataset, we demonstrate crowd-wisdom and goal-driven approaches to prescriptive process monitoring.

Keywords
Business Process Management, chatbots, Robotic Process Automation, prescriptive process monitoring

1. Introduction
In recent years, the term intelligent process automation (IPA) has been used to describe a new category of digital workers. IPA combines a mix of technologies from robotic process automation (RPA) [1], natural language processing (NLP), machine learning (ML), artificial intelligence (AI), and traditional digital business process automation and integration platforms. By combining these technologies, IPA promises to support a wider and more complex range of process automation needs, and enable non-professional developers to build, deploy, and engage more naturally with digital workers to perform their tasks. Overall, IPA is expected to become a key component for orchestrating human-to-bot work and is already being adopted across a variety of industries and business areas such as finance, human resources, operations, and sales.
An initial survey of IPA vision and characteristics was done by Chakraborti et al. [2]. One of the key characteristics of IPA is a conversational interface, since this represents a very natural way for humans to interact and collaborate with bots as part of the automation process. Another key component is the ability to dynamically orchestrate RPA and other types of automation based on dynamic and contextual user needs. For example, Rizk et al. [3] describe a conversational IPA platform that uses AI planning to orchestrate conversational dialogues and automation sequences based on user utterances and a pre-populated catalog of automation skills.

PMAI@IJCAI22: International IJCAI Workshop on Process Management in the AI era, July 23, 2022, Vienna, Austria
email: sergeyz@il.ibm.com (S. Zeltyn); segev.shlomov1@ibm.com (S. Shlomov); aviy@il.ibm.com (A. Yaeli); alon.oved@il.ibm.com (A. Oved)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073

As IPA platforms become more powerful, we can expect a new generation of conversation-oriented digital employees that can handle complex and dynamic processes. The duration of these processes can span from seconds and minutes to weeks and months. Prescriptive process monitoring (PPM) techniques are used to improve business processes by triggering interventions at runtime to optimize the process towards a goal such as a key performance indicator (KPI). Traditionally, PPM looks at case ID, activity timing and sequences, resources, and business attributes to predict the progress towards the goal and introduce interventions. Because IPA has many unique elements when compared to traditional BPM, PPM requires adaptations to support prescriptive tasks in IPA.
One scenario of these adaptations occurs when there is a need to prescribe a recommendation in the form of a human-to-bot interaction, such as a button or textual utterance. There are four possible situations to consider: First, the discovery of skills and utterances for new users who are not yet aware of what the bot can do and which utterances will trigger those skills. Second, recovery from interaction failure. Occasionally the bot will not understand the user, or an error may occur that will cause the interaction to derail from its original context. Third, goal-oriented actions that help or remind the user to perform some activity in order to achieve the process goals. Fourth, the wisdom of the crowd and personalization, which can recommend possible actions based on the behavior of other users or on personal preferences. Aside from the type of prescription, there are additional characteristics that differ from classical PPM, such as how process and session identifiers in human-to-bot interaction map to the concept of case ID, how disambiguation and errors are modeled as activities, how to treat or leverage human feedback, and the types of intervention that can achieve user engagement with the bot. IPA is a relatively new domain, and real-world deployments are not available to researchers due to confidentiality. Therefore, we believe that a synthetic dataset, inspired by an actual IPA deployment, will be of value in terms of understanding the data model, testing existing algorithms, and developing new ones that can later be validated on real-world datasets. The contribution of this paper is twofold. First, we introduce and share a first-of-a-kind synthetic dataset based on a real-world use case for an HR Management Incentive Program (MIP). This dataset presents a new type of data from IPA-based processes.
Second, we demonstrate an implementation of crowd-wisdom and goal-oriented prescription tasks for this new dataset and explain the IPA-related adaptations to traditional algorithms. The remaining part of the paper is organized as follows. Section 2 discusses the related art. Section 3 introduces the MIP dataset. Specifically, Section 3.1 explains its business use case, Section 3.2 presents the parameters of the dataset simulation, and Section 3.3 provides the dataset schema. The following sections apply two approaches for prescriptive process monitoring to the MIP dataset. Section 4 describes the crowd-wisdom approach based on the prediction of the next activity, and Section 5 explains the goal-driven approach based on lateness prediction. Finally, Section 6 summarizes the paper and future research challenges.

2. Related Art
The basic goal of prescriptive process monitoring (PPM) is to optimize a process at run-time [4]. The three main PPM methods are predictions (e.g., next activity [5, 6]), corrective actions, and resource optimization (e.g., which resource should perform the next task [7]). Over the years, many techniques have been developed using classical machine learning [8] and deep learning algorithms [9]. The recent emphasis on trustworthiness in AI systems has brought explainability and causality research into the process mining domain. Galanti et al. [10] showed how explanations can be given in predictive business process monitoring by using Shapley values to obtain explanations of KPI predictions, such as remaining time and activity execution. Bozorgi et al. [11] applied causal machine learning techniques to the BPM domain and explored which action should be applied to yield the highest causal effect on the business process outcome. Metzger et al. [12] used a reinforcement learning technique for prescription. They tackled the trade-off of earlier predictions, which leave more time for adaptations but exhibit lower accuracy.
The authors described when to trigger proactive process adaptations with online reinforcement learning. Our paper is also related to recommendations for goal-driven chatbots [13] and conversational recommendation systems, where many papers suggest using reinforcement learning techniques [14, 13, 12]. Chatbot research studies systems built with tools such as IBM Watson Assistant, Google Dialogflow, and Microsoft's Cortana. In conversational recommendation systems, users might ask questions about the recommendations and provide feedback. In a recent paper, Weinzierl et al. [6] used BPM prediction techniques in recommender systems. They modeled a sequence of user clicks as a process and used an NLP-based process embedding to recommend the next best click. The paper also argues for the importance of “crowd knowledge” in providing good recommendations.

Datasets: There are many open datasets that are process-based. Examples include the ones created for the BPI 2011-2020 challenges [15]. They vary in size, domain (e.g., incidents, loans, and complaints), and complexity. Other related datasets come from the dialogue domain. Some of them describe human-to-human interaction [16, 17], while others describe human-to-bot interaction [18]. These datasets cover many different tasks, such as intent prediction [19], slot filling [20], dialogue state tracking [21], and dialogue acts [22, 23]. While some of these datasets satisfy part of the new domain properties (see also [24]), none satisfies all of them; that is, none is a dataset that is process-based, contains both session IDs and case IDs, and consists of multi-person interactions.

3. MIP Dataset: Use Case and Description
Datasets from IPA systems with chatbot orchestration are still unavailable in the public domain. Thus, we strive to partially close this gap by providing a synthesized dataset based on a real-life use case.
Specifically, the use case focuses on human resource (HR) automation systems, which constitute an important application domain for the new technology. Guenole and Feinzig [25] summarize the IBM HR business case, with IPA automation systems being used for recruiting, onboarding new employees, career coaching, personalized learning, and other important tasks. As a source of inspiration for our dataset, we used an internal IBM application in the domain of compensation and promotion. The implementation is based on Watson Orchestrator^1. This use case satisfies several basic prerequisites for an IPA dataset. It is based on a multi-person business process, contains chatbot conversational interactions, and gives rise to two embedded process identifiers: a case ID defines an instance of a business process and a session ID defines a chatbot conversation instance.

3.1. The MIP Dataset Use Case
We consider a large software engineering organization. The Management Incentive Program (MIP) process is run in the organization twice a year. The goal of the process is to determine which employees will get a salary increase. There are two roles in the MIP process: first-line team leaders and second-line department managers. The team leaders initiate a case instance to select team employees who are eligible to participate in the program. Then they provide the names of nominated candidates to the department managers. For each candidate, a department manager decides whether the candidate nomination should be approved, approved with correction (e.g., the amount of salary increase is changed), or rejected. Then, the department manager submits the final decision to the HR system, completing the case instance for a specific team. In summary, each case instance consists of two sequential tasks, nomination and approval, which are performed by two different users.
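To make the two identifiers concrete, the following toy event records (hypothetical values, not taken from the dataset) show one case whose nomination and approval tasks run in two separate chatbot sessions; grouping by session ID recovers the conversation structure:

```python
from itertools import groupby

# Hypothetical toy event records: one case (= one team's MIP instance)
# spans two chatbot sessions, run by two different users.
events = [
    {"case_id": 1, "session_id": "s1", "role": "team leader",        "activity": "add_nomination"},
    {"case_id": 1, "session_id": "s1", "role": "team leader",        "activity": "submit_nomination"},
    {"case_id": 1, "session_id": "s2", "role": "department manager", "activity": "review_nominated_candidate"},
    {"case_id": 1, "session_id": "s2", "role": "department manager", "activity": "submit_final_nominations"},
]

# Group the case's events by conversation session (events are contiguous
# per session, so itertools.groupby is sufficient here).
sessions = {
    sid: [e["activity"] for e in evs]
    for sid, evs in groupby(events, key=lambda e: e["session_id"])
}
```

The nesting illustrated here — session IDs embedded within a case ID — is what distinguishes an IPA log from a classical BPM event log.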
In order to make informed decisions, the team leaders and department managers go over a number of reports that contain different employee performance metrics and summarize employee activities and feedback. The list of 20 available reports is provided in Table 1. The table also contains the probabilities that a user will look at each report at least once during the process. These reports differ in their importance, and the users view them in a random order with different frequencies. MIP criteria and yearly assessment reports are always viewed by the team leaders before initiating nomination actions. We also observe from Table 1 that the department managers view reports with a lower frequency than the team leaders. The activities that the team leaders and the department managers can perform are as follows. The team leader can: view report, add nomination, view nomination, submit nomination, and provide candidate name. The department manager can: view report, select candidate name, review nominated candidate, approve nomination, approve nomination with correction, reject nomination, and submit final nominations. We assume that most of these activities are initiated via free-text chat. The system recognizes a specific intent of a user and performs the activity that corresponds to this intent. Two name selection activities are performed via a slot-filling mechanism that is frequently used by goal-driven chatbots. The use of free text to trigger activities gives rise to two additional scenarios. First, sometimes the IPA system cannot detect a user intent with sufficient confidence. In this case, the user utterance is followed by a fallback: the user is asked to provide additional input. Second, a user utterance can potentially correspond to more than one intent. A standard solution for this problem is disambiguation, a widely used prescriptive technique in the chatbot domain.
The user is asked to select between several competing intents via the corresponding buttons. In the MIP dataset, we consider the four disambiguation scenarios shown in Table 2.

^1 https://www.ibm.com/cloud/automation/watson-orchestrate

Table 1
List of reports

Report                 | Viewing by team leader, % | Viewing by department manager, %
MIP criteria           | 100  | 62.6
Yearly assessments     | 100  | 50.8
Project assessments    | 91.1 | 56.7
Learning activities    | 87.2 | 47.0
Client feedback        | 86.5 | 50.7
Internal feedback      | 89.4 | 51.8
Compensation report    | 91.6 | 52.7
MIP history            | 91.9 | 56.7
Overtime               | 55.9 | 30.1
Innovation and patents | 70.9 | 35.5
Product defects        | 36.3 | 15.1
Sprints velocity       | 32.9 | 17.0
Bugs fixed             | 38.6 | 15.2
Pull requests          | 15.2 | 12.9
Features shipped       | 30.3 | 26.1
Defects repair time    | 17.1 | 10.8
Lead time              | 27.3 | 17.1
Project costs          | 51.1 | 27.2
Code churn             | 31.0 | 27.1
Absence                | 34.1 | 18.2

Table 2
List of disambiguation scenarios with examples

Activities                                               | Example
view client feedback report / internal feedback report   | show feedback report
view project assessment / project cost report            | view project data
view MIP criteria report / MIP history report            | MIP data
view product defects report / defects repair time report | I need defects report

Like many business processes, the MIP process has time constraints that include a regular and a “hard stop” deadline. It is technically possible but undesirable to violate the regular deadline. On the date of the “hard stop” deadline, process participants are forced to complete the MIP process within several hours. Finally, we assume that both team leaders and department managers are divided into two groups with different statistical properties: struggling users and successful users.
Struggling users have, on average, larger intervals between conversation sessions, a higher number of fallbacks, and a higher probability of abandoning a session without successful completion of the task. Naturally, struggling users have a tendency to push the business process over the deadline.

Table 3
Performance characteristics per role

User type                | Average interval between sessions (working days) | Average number of conversations | Probability of fallback per utterance
team leader, successful  | 1.5 | 2   | 0.05
team leader, struggling  | 4   | 5.5 | 0.3
dep. manager, successful | 1   | 1.5 | 0.05
dep. manager, struggling | 2   | 3.5 | 0.3

3.2. Parameters and Statistical Properties of the MIP Dataset
In this section, we describe some deterministic and statistical properties of the MIP dataset simulation. We assume that the process starts on Monday, Mar 7, 2022, and it is desirable to complete it by the regular deadline: the end of Monday, Mar 28, 2022. A second “hard stop” deadline takes place on Monday, Apr 11, 2022. Users can interact with the IPA system during a Monday-to-Friday working week, with working hours between 8am and 5pm, although sometimes conversation sessions take place later in the evening. There are 250 department managers, and 4 first-line team leaders under each department manager. Overall, there are 250 · 4 = 1,000 cases of the process, which should provide a sufficiently large data sample for the analysis in Sections 4 and 5. On average, there are 10 employees in each team. The MIP nomination rate is 20%. We assume that struggling users constitute 1/3 of team leaders and department managers. The main statistical characteristics of successful and struggling users are presented in Table 3. To generate user free text, we applied the Lambada [26] methodology. For each intent, we used a small manually prepared seed of utterances that was enriched via the Lambada algorithm.

3.3. Dataset Schema
The MIP dataset is provided in CSV format^2.
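As a minimal illustration of the schema described next, the snippet below parses a two-row extract laid out like the dataset file (the inline CSV values are illustrative; the real file from the repository is read the same way with pandas):

```python
import io
import pandas as pd

# Illustrative two-row extract with a subset of the dataset columns;
# in practice, pd.read_csv would point at the downloaded CSV file.
csv = io.StringIO(
    "case_id,session_id,role,user_id,timestamp,turn,activity\n"
    "1,M7vkTk2f537I,team leader,Robert North,2022-03-17T11:20:21,2,report_lead_time\n"
    "1,A1bCdEf,department manager,Jane Roe,2022-03-22T10:05:00,1,report_mip_criteria\n"
)
log = pd.read_csv(csv, parse_dates=["timestamp"])

# case_id groups one whole MIP process instance; session_id identifies
# each chatbot conversation nested inside it.
sessions_per_case = log.groupby("case_id")["session_id"].nunique()
```

The session-counting step at the end highlights the two-level structure (conversations nested within cases) that the prescription methods in Sections 4 and 5 rely on.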
Table 4 presents the names of the dataset columns with brief descriptions and examples. Each row of the file corresponds to a turn of a conversation with a chatbot. During all turns, except those corresponding to the chatbot welcome message, a user utterance triggers an activity in the IPA system. For some turns, the system recognizes a user intent, in which case the name of the activity coincides with the intent name. Additional activities are responsible for slot filling and user utterances with unrecognized intent (fallbacks and disambiguations). We assumed that the activities are orchestrated using an agent orchestration concept similar to the one used by Rizk et al. [3]. The score concept we used for activity selection by an IPA system was also introduced in [3].

^2 https://github.com/Sergey-Zeltyn/MIP-dataset

Table 4
MIP column descriptions

Feature name       | Description                                                                                   | Example
case_id            | corresponds to the MIP process per team                                                       | 1
session_id         | conversation session id                                                                       | M7vkTk2f537I
role               | role of the user                                                                              | team leader
user_id            | id of the user                                                                                | Robert North
timestamp          | timestamp of a user utterance                                                                 | 2022-03-17T11:20:21
turn               | current turn in a conversation                                                                | 2
activity           | IPA activity triggered by the user utterance                                                  | report_lead_time
user utterance     | user utterance that triggers chatbot and system activities                                    | view lead time table
chatbot response   | chatbot response to the user utterance                                                        | Lead Time Report
intent             | intent triggered by the user utterance                                                        | report_lead_time
intent_confidence  | conversation engine confidence in the user intent                                             | 0.898
entity             | appears when a user is engaged in slot-filling                                                | Troy Donovan
entity_confidence  | conversation engine confidence in the user entity                                             | 1.0
score              | IPA system orchestrator score based on the intent confidence and other system characteristics | 0.862
expecting_response | indicates if a user is expected to answer the chatbot question                                | False

Since chatbot responses are typically not used in the prescription methods, brief stub utterances (for example, “Welcome Message”)
replace them in the dataset.

4. Crowd-Wisdom Prescriptions
New users of IPA systems are frequently not fully aware of the actions that they can perform. They need a straightforward and intuitive way to flatten the learning curve. Crowd-wisdom methods help achieve this goal. Experienced users can implicitly guide inexperienced users to “happy paths”. For example, in the MIP use case, the path statistics of advanced users can be used to recommend the most important performance reports. Prediction of the next activity, or of the next several activities, is the key stage of the crowd-wisdom approach. Such prescription systems typically present several options for the user’s next actions. As a result, the underlying predictions should be probabilistic rather than single-activity ones. At the same time, undesirable activities, such as fallback or disambiguation, should not be mentioned in recommendations, even if they are performed frequently by other users. In this section, we focus on predicting the next activity for the MIP dataset, while emphasizing several different feature generation approaches. There are 36 activities overall:

• viewing of 20 reports, 4 disambiguation activities, and fallback for both roles;
• candidate nomination, candidate name selection, viewing nomination, and submitting nomination for the team leaders;
• reviewing nomination, nominee name selection, selecting one of three possible decisions, and final submission for the department managers;
• session end, which should be considered an additional activity in the prediction problem.

For the prediction task, we do not filter out undesirable activities and compare different prediction methods based on the overall goodness-of-fit. Feature generation is especially important in IPA processes that involve chatbots, where conversation sessions constitute sub-processes with special characteristics. For each prediction technique, we consider two feature generation dilemmas.
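Independently of either dilemma, the prediction target is derived the same way: the label for each turn is the next activity within the same session, with an artificial "end" class for the last turn. A minimal sketch, assuming the Table 4 column names and illustrative activity values:

```python
import pandas as pd

# Toy log in the Table 4 schema (activity values are illustrative).
log = pd.DataFrame({
    "session_id": ["a", "a", "a", "b", "b"],
    "turn":       [1, 2, 3, 1, 2],
    "activity":   ["report_mip_criteria", "add_nomination",
                   "submit_nomination", "report_mip_history", "fallback"],
})

# The target for each turn is the next activity in the same session;
# the last turn of every session gets the artificial "end" class.
log = log.sort_values(["session_id", "turn"])
log["next_activity"] = (
    log.groupby("session_id")["activity"].shift(-1).fillna("end")
)
```

The per-session shift ensures that the last turn of one conversation never "leaks" into the first turn of the next one.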
The first dilemma concerns how to extract the process features: is information on the previous turn sufficient for prediction, or should it be complemented by process-aware features? In the non-process-aware (npa) approach, features include the previous activity and several attributes of the previous conversation turn. The list of attributes includes role, intent confidence, score, expecting_response, and the number of turns in a session. The process-aware (pa) approach adds process path statistics to the feature vector. In our case, these path statistics include the number of occurrences of each activity during the current session up to the current conversation turn. The second, more subtle, dilemma is related to the session definition. Should it be based on a conversation (option conv in Table 5)? Or should we base it on the overall path of a user, even if the user was engaged in several conversations on different days (option user in Table 5)? This is an important question, since a user can behave differently over different sessions or forget details of the previous ones. In the conv process-aware setting, we count activities from the start of conversations and add the sequential conversation number of a specific user to the feature space. In the user process-aware setting, we count activities from the first user login into the system. Finally, we combine the two approaches and use the union of features from the two session definitions (option conv+user in Table 5). In Table 5, we compare the three process-aware approaches described above and an implementation of a non-process-aware approach. We implemented three prediction techniques: logistic regression, CatBoost, and XGBoost. Our goodness-of-fit metrics included accuracy, weighted Top-3 recall, average Top-3 recall, weighted 𝐹1 score, and average 𝐹1 score. We computed the averages over 35 prediction classes, while leaving out the “end” class, since it is defined differently for different session definitions.
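The Top-3 recall metric can be computed directly from predicted class probabilities; a minimal sketch of the (unweighted) Top-k recall, with toy probabilities that are not taken from the MIP experiments:

```python
import numpy as np

# Toy predicted probabilities over 4 classes for 3 samples
# (illustrative numbers only).
proba = np.array([
    [0.50, 0.20, 0.20, 0.10],
    [0.10, 0.20, 0.30, 0.40],
    [0.25, 0.35, 0.30, 0.10],
])
y_true = np.array([0, 1, 3])

def top_k_recall(y_true, proba, k=3):
    """Fraction of samples whose true class is among the k highest-probability predictions."""
    topk = np.argsort(proba, axis=1)[:, -k:]  # indices of the k largest probabilities per row
    return float(np.mean([y in row for y, row in zip(y_true, topk)]))
```

Here the first two samples have their true class in the top 3, and the third does not, so `top_k_recall(y_true, proba, k=3)` equals 2/3.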
Weighted averages were weighted by volume in the testing sets. The Top-3 recall can be interpreted as the fraction of samples where the actual activity belongs to the Top-3 predicted activities, ordered by predicted probabilities. This metric is important because crowd-wisdom recommendations typically provide several alternatives. We performed 5-fold cross-validation and averaged the metrics over the 5 testing datasets. Table 5 indicates that process-aware features significantly improve the goodness-of-fit for all prediction settings under consideration. XGBoost with the union of features from the two session definitions yields the best goodness-of-fit. The conversation-based session definition shows better results than the user-based definition for all prediction techniques. Given the large number of classes and the fact that the 20 performance reports had very significant variance in their sequences, the accuracy and Top-3 recall numbers are satisfactory. For example, a random class selection would yield approximately 0.03 accuracy.

Table 5
Goodness-of-fit for different methods of next activity prediction

Prediction method                  | Weighted Top-3 Recall | Average Top-3 Recall | Weighted 𝐹1-score | Average 𝐹1-score | Accuracy
Logistic Regression, npa, conv     | 0.519 | 0.412 | 0.230 | 0.176 | 0.302
Logistic Regression, pa, conv      | 0.588 | 0.490 | 0.293 | 0.241 | 0.351
Logistic Regression, pa, user      | 0.565 | 0.457 | 0.287 | 0.233 | 0.348
Logistic Regression, pa, conv+user | 0.596 | 0.497 | 0.306 | 0.251 | 0.361
CatBoost, npa, conv                | 0.518 | 0.414 | 0.243 | 0.196 | 0.313
CatBoost, pa, conv                 | 0.586 | 0.470 | 0.298 | 0.238 | 0.367
CatBoost, pa, user                 | 0.566 | 0.445 | 0.283 | 0.214 | 0.355
CatBoost, pa, conv+user            | 0.596 | 0.477 | 0.311 | 0.244 | 0.380
XGBoost, npa, conv                 | 0.518 | 0.415 | 0.245 | 0.200 | 0.311
XGBoost, pa, conv                  | 0.605 | 0.500 | 0.317 | 0.265 | 0.375
XGBoost, pa, user                  | 0.580 | 0.469 | 0.300 | 0.241 | 0.362
XGBoost, pa, conv+user             | 0.616 | 0.511 | 0.330 | 0.270 | 0.390

5.
Goal-Driven Prescriptive Process Monitoring
The crowd-wisdom approach presented in Section 4 is useful but, in many circumstances, should be complemented by a goal-driven prescription setting. Processes in the IPA domain can have goals related to process time, cost, quality, and outcomes. One of the mainstream approaches in business process management uses a two-step method [27]. First, a prediction is made for the current process instance with respect to the process goals. Second, a corrective action is implemented for the instances with unsatisfactory predictions. For the MIP dataset, we address the binary prediction problem of the Mar 28, 2022 deadline violation. In the MIP dataset, 32% of the process instances violated the deadline. We do not explicitly simulate a corrective action, assuming that it could be, for example, a reminder email to the user. We apply the same feature generation methods and prediction techniques as in Section 4, with a single change: the timestamp of the current turn is added to the features in all settings. The timestamp is transformed into the number of working days since the start of the process. We use the standard metrics for binary classification problems to compare the prediction methods. Table 6 summarizes the prediction results. The process-aware approach performs better than the non-process-aware one. In contrast to Section 4, the user-based session definition is preferable to the conversation-based one and yields a better balance between precision and recall. Applying a combination of the two approaches does not improve the goodness-of-fit. Although the results for the three prediction techniques are relatively close, CatBoost produces the best predictions. An analysis of the feature importance values for XGBoost provides insights into the results above. In addition to the user role and timestamp, which clearly affect lateness predictions, the number of fallbacks during the session is also in the Top-3 features based on importance.
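As an implementation note on the timestamp feature above, the working-days transformation can be sketched with NumPy's business-day utilities, which default to the Monday-to-Friday week matching the use case; the helper name is ours, and the process start date is taken from Section 3.2:

```python
import numpy as np

# Convert an utterance timestamp into "working days since process start"
# (start: Monday, Mar 7, 2022). np.busday_count counts Mon-Fri days in
# [start, end), so the start day itself maps to 0.
def working_days_since_start(timestamp, start="2022-03-07"):
    return int(np.busday_count(start, np.datetime64(timestamp, "D")))

days = working_days_since_start("2022-03-17T11:20:21")  # Mar 7 to Mar 17 spans 8 working days
```

Under this convention, the regular deadline of Mar 28, 2022 corresponds to working day 15.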
The possible reason is that struggling users have both a significant number of fallbacks and a high probability of being late for the deadline. It is reasonable to assume that the user-based session definition provides more reliable fallback statistics than a conversation-based one. In addition, the observation on the number of fallbacks demonstrates that a language-based conversation feature can be important for long-term process prediction: users with many fallbacks can be identified as risk-prone at an early stage of the process and provided with assistance.

Table 6
Goodness-of-fit for different methods of lateness prediction

Prediction method                  | Precision | Recall | 𝐹1-score | Accuracy
Logistic Regression, npa, conv     | 0.934 | 0.879 | 0.906 | 0.926
Logistic Regression, pa, conv      | 0.933 | 0.885 | 0.908 | 0.927
Logistic Regression, pa, user      | 0.938 | 0.897 | 0.917 | 0.934
Logistic Regression, pa, conv+user | 0.934 | 0.901 | 0.917 | 0.934
CatBoost, npa, conv                | 0.941 | 0.892 | 0.916 | 0.933
CatBoost, pa, conv                 | 0.939 | 0.914 | 0.926 | 0.941
CatBoost, pa, user                 | 0.940 | 0.930 | 0.936 | 0.947
CatBoost, pa, conv+user            | 0.939 | 0.927 | 0.933 | 0.946
XGBoost, npa, conv                 | 0.936 | 0.889 | 0.911 | 0.929
XGBoost, pa, conv                  | 0.936 | 0.909 | 0.922 | 0.938
XGBoost, pa, user                  | 0.938 | 0.928 | 0.933 | 0.945
XGBoost, pa, conv+user             | 0.938 | 0.925 | 0.931 | 0.944

6. Summary
We presented the emerging domain of intelligent process automation bots with a chat interface. We highlighted unique aspects of PPM for this new domain, such as the types of prescriptions and the data model mapping, which require adaptations of traditional prescriptive approaches to BPM. We further introduced HR MIP, a synthetic dataset inspired by a real-world IPA deployment. This first-of-a-kind dataset can be used by researchers to develop and test new algorithms in this domain. In addition, we presented an implementation of crowd-wisdom and goal-oriented prescriptive tasks and used it to illustrate the necessary adaptations to the data mapping and feature generation steps.
We hope that these contributions and the availability of an open dataset will be of value to other researchers in the community. Prescriptions in intelligent process automation with a chat interface pose many additional challenges for future research. The dynamic nature of AI-based digital employees and human-to-bot interaction may entail continuous concept drift. In this setting, explore-exploit techniques such as reinforcement learning could be applied to leverage implicit and explicit user feedback. Another challenge is how to deal with more complex utterances, e.g., those handled by an AI planner to dynamically orchestrate robotic process automation tasks. Such use cases may require deep learning, NLP, and program synthesis approaches to map from user utterances to activities and then back to textual recommendations. We plan to address some of these challenges in the next version of the HR IPA system that inspired the MIP use case.

Acknowledgments
The authors thank Slobodan Radenković, Tomáš Bene, Yara Rizk, Vatche Isahagian, and Vinod Muthusamy for fruitful discussions on methodology and use cases in intelligent process automation.

References
[1] W. M. Van der Aalst, M. Bichler, A. Heinzl, Robotic process automation, Business & Information Systems Engineering 60 (2018) 269–272.
[2] T. Chakraborti, V. Isahagian, R. Khalaf, Y. Khazaeni, V. Muthusamy, Y. Rizk, M. Unuvar, From robotic process automation to intelligent process automation, in: BPM 2020 Blockchain and RPA Forum, Springer, Cham, 2020, pp. 215–228.
[3] Y. Rizk, V. Isahagian, S. Boag, Y. Khazaeni, M. Unuvar, V. Muthusamy, R. Khalaf, A conversational digital assistant for intelligent process automation, in: BPM 2020 Blockchain and RPA Forum, Springer, Cham, 2020, pp. 85–100.
[4] K. Kubrak, F. Milani, A. Nolte, M. Dumas, Prescriptive process monitoring: Quo vadis?, https://arxiv.org/pdf/2112.01769.pdf, 2021.
[5] M. de Leoni, M. Dees, L.
Reulink, Design and evaluation of a process-aware recommender system based on prescriptive analytics, in: 2020 2nd International Conference on Process Mining (ICPM), IEEE, 2020, pp. 9–16.
[6] S. Weinzierl, M. Stierle, S. Zilker, M. Matzner, A next click recommender system for web-based service analytics with context-aware LSTMs, in: Proceedings of the 53rd Hawaii International Conference on System Sciences, IEEE, 2020, pp. 1542–1551.
[7] A. Wibisono, A. S. Nisafani, H. Bae, Y.-J. Park, On-the-fly performance-aware human resource allocation in the business process management systems environment using naïve Bayes, in: Asia-Pacific Conference on Business Process Management, Springer, 2015, pp. 70–80.
[8] C. Di Francescomarino, C. Ghidini, F. M. Maggi, F. Milani, Predictive process monitoring methods: Which one suits me best?, in: International Conference on Business Process Management, Springer, Cham, 2018, pp. 462–479.
[9] D. A. Neu, J. Lahann, P. Fettke, A systematic literature review on state-of-the-art deep learning methods for process prediction, Artificial Intelligence Review (2021) 1–27.
[10] R. Galanti, B. Coma-Puig, M. de Leoni, J. Carmona, N. Navarin, Explainable predictive process monitoring, in: 2nd International Conference on Process Mining (ICPM), IEEE, 2020, pp. 1–8.
[11] Z. D. Bozorgi, I. Teinemaa, M. Dumas, M. La Rosa, A. Polyvyanyy, Process mining meets causal machine learning: Discovering causal rules from event logs, in: 2nd International Conference on Process Mining (ICPM), IEEE, 2020, pp. 129–136.
[12] A. Metzger, T. Kley, A. Palm, Triggering proactive business process adaptations via online reinforcement learning, in: International Conference on Business Process Management, Springer, Cham, 2020, pp. 273–290.
[13] Z. Lipton, X. Li, J. Gao, L. Li, F. Ahmed, L.
Deng, BBQ-networks: Efficient exploration in deep reinforcement learning for task-oriented dialogue systems, in: Proceedings of the AAAI Conference on Artificial Intelligence, volume 32, 2018, pp. 5237–5244.
[14] M. M. Afsar, T. Crump, B. Far, Reinforcement learning based recommender systems: A survey, https://arxiv.org/pdf/2101.06286.pdf, 2021.
[15] B. van Dongen, BPI challenges: 10 years of real-life datasets, https://www.tf-pm.org/newsletter/newsletter-stream-2-05-2020, 2020.
[16] R. Lowe, N. Pow, I. Serban, J. Pineau, The Ubuntu dialogue corpus: A large dataset for research in unstructured multi-turn dialogue systems, in: SIGDIAL - Special Interest Group on Discourse and Dialogue, 2015, pp. 285–294.
[17] W. Wei, Q. V. Le, A. Dai, J. Li, AirDialogue: An environment for goal-oriented dialogue research, in: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, 2018, pp. 3844–3854.
[18] V. Logacheva, M. Burtsev, V. Malykh, V. Polulyakh, A. Seliverstove, ConvAI dataset of topic-oriented human-to-chatbot dialogues, in: NIPS'17 Competition: Building Intelligent Systems, Springer, Cham, 2018, pp. 47–57.
[19] S. Mehri, M. Eric, D. Hakkani-Tur, DialoGLUE: A natural language understanding benchmark for task-oriented dialogue, arXiv preprint arXiv:2009.13570 (2020).
[20] A. H. Miller, W. Feng, A. Fisch, J. Lu, D. Batra, A. Bordes, D. Parikh, J. Weston, ParlAI: A dialog research software platform, arXiv preprint arXiv:1705.06476 (2017).
[21] A. Rastogi, X. Zang, S. Sunkara, R. Gupta, P. Khaitan, Schema-guided dialogue state tracking task at DSTC8, arXiv preprint arXiv:2002.01359 (2020).
[22] P. Budzianowski, T.-H. Wen, B.-H. Tseng, I. Casanueva, S. Ultes, O. Ramadan, M. Gašić, MultiWOZ - a large-scale multi-domain wizard-of-oz dataset for task-oriented dialogue modelling, arXiv preprint arXiv:1810.00278 (2018).
[23] M. Eric, R. Goel, S. Paul, A. Sethi, S. Agarwal, S. Gao, D.
Hakkani-Tür, MultiWOZ 2.1: Multi-domain dialogue state corrections and state tracking baselines, arXiv preprint arXiv:1907.01669 (2019).
[24] M. Dumas, F. Fournier, L. Limonad, A. Marrella, M. Montali, J.-R. Rehse, R. Accorsi, D. Calvanese, G. De Giacomo, D. Fahland, et al., Augmented business process management systems: A research manifesto, arXiv preprint arXiv:2201.12855 (2022).
[25] N. Guenole, S. Feinzig, The business case for AI in HR, https://www.ibm.com/downloads/cas/AGKXJX6M, 2022.
[26] A. Anaby-Tavor, B. Carmeli, E. Goldbraich, A. Kantor, G. Kour, S. Shlomov, N. Tepper, N. Zwerdling, Do not have enough data? Deep learning to the rescue!, in: Proceedings of the AAAI Conference on Artificial Intelligence, volume 34, 2020, pp. 7383–7390.
[27] I. Teinemaa, N. Tax, M. de Leoni, M. Dumas, F. M. Maggi, Alarm-based prescriptive process monitoring, in: International Conference on Business Process Management, Springer, Cham, 2018, pp. 91–107.