A preliminary study on Business Process-aware Large Language Models

Mario Luca Bernardi2,†, Angelo Casciani1,*,†, Marta Cimitile3,† and Andrea Marrella1,†

2 Department of Engineering, University of Sannio, Piazza Roma 21, Benevento, 82100, Italy
1 Department of Computer, Control and Management Engineering, Sapienza University of Rome, Via Ariosto 25, Rome, 00185, Italy
3 Department of Law and Digital Society, UnitelmaSapienza, Piazza Sassari, Rome, 00185, Italy

Ital-IA 2024: 4th National Conference on Artificial Intelligence, organized by CINI, May 29-30, 2024, Naples, Italy
* Corresponding author.
† These authors contributed equally.
Email: bernardi@unisannio.it (M. L. Bernardi); angelo.casciani@uniroma1.it (A. Casciani); marta.cimitile@unitelasapienza.it (M. Cimitile); andrea.marrella@uniroma1.it (A. Marrella)
ORCID: 0000-0002-3223-7032 (M. L. Bernardi); 0009-0003-7843-8045 (A. Casciani); 0000-0003-2403-8313 (M. Cimitile); 0000-0002-1031-0374 (A. Marrella)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073

Abstract
AI-Augmented Business Process Management Systems (ABPMSs) are innovative information systems with increased flexibility, autonomy, and conversational capability. These systems can be boosted by Large Language Models (LLMs), renowned for their ability to handle natural language processing tasks. Nevertheless, no significant empirical validation of their usefulness in process-driven decision support exists. In this study, we propose a business process-oriented LLM framework for enacting actionable conversations with workers involved in a business process, leveraging Retrieval-Augmented Generation (RAG) to enrich process-specific knowledge. We assessed the methodology's capacity to produce precise responses to inquiries posed by users within a public administration context. The preliminary study shows the framework's ability to identify specific activities and sequence flows within the targeted process model, thereby providing valuable insights into its potential for improving ABPMSs.

Keywords
Business Process, Decision Support Systems, Large Language Models, Retrieval-Augmented Generation

1. Introduction
AI-Augmented Business Process Management Systems (ABPMSs) embody a new class of human-centered information systems distinguished by significant flexibility, autonomy, and extensive conversational and self-enhancement abilities [1]. In these systems, Artificial Intelligence (AI) extends conventional process-aware Decision Support Systems (DSSs) to facilitate prompt and effective decision-making by elucidating the underlying factors influencing the decisions [2]. Integrating ABPMSs into human workflows may introduce shifts in workforce dynamics, potentially leading to a lack of trust [3]. One possible remedy for this challenge is the incorporation of Conversational Systems (CSs). The emergence of CSs presents a promising avenue for enhancing Business Process Management (BPM) initiatives, significantly empowering ABPMSs [4, 5]. The adoption of Large Language Models (LLMs) could drive substantial advancements in these systems [5]. LLMs are an emerging class of machine learning models showing strong performance on a variety of Natural Language Processing (NLP) tasks [6].

Thanks to these advantages, practitioners are increasingly using LLMs across various domains, gaining significant benefits for industries and business operations while reshaping the dynamics of human interaction with management systems [7]. Notably, LLMs have been moving several organizations towards the paradigm of the autonomous enterprise and enable ABPMSs to hold a central position in assisting human activities and decisions across the system life cycle. Indeed, starting from business processes, LLMs should transcend local reasoning contexts, support the management of diverse scenarios, and enhance the understanding of business activities [7]. Despite the recognized potential of LLMs to assist human decisions in the business landscape [1], this topic is little explored in the literature [7] and, as far as we are aware, an empirical validation of the efficacy of LLMs for process-aware decision support is missing.

In this research context, our work presents an innovative methodology for business process analysis that leverages LLMs to develop a conversational process-aware DSS. We propose to adopt a process-aware Retrieval-Augmented Generation (RAG) [8] framework to extend process- and domain-specific knowledge, with the aim of improving the ability of an LLM to respond to business process-related inquiries. The overall system supports the user in a wide range of process comprehension and execution tasks using natural language. Our work evaluates the proficiency of the methodology in producing precise and contextually appropriate responses to process-related questions in different settings. In particular, we investigate the efficacy of the approach in a real-world scenario within the realm of public administration.

2. Related Work
As asserted in [4], the integration of CSs holds significant potential for enhancing ABPMSs. Numerous methodologies have emerged in recent years that leverage the capabilities of CSs to enhance various critical areas within BPM [5].

In the sub-field of Descriptive Process Analytics, which describes current business processes and identifies problems and potential improvements, NLP and neural architectures have proved effective in extracting process models from natural language descriptions [9, 10, 11]. Conversely, expressing business process models in natural language aids human comprehension [12, 13]. Moreover, conversational interfaces further enhance the understanding and accessibility of process mining findings [14, 15].

Predictive Process Analytics concerns building predictive models to forecast the future state and performance of business processes. Current trends in this area center on the development of conversational interfaces to assist the what-if analysis of digital process twins [16, 17] and predictive process monitoring [18, 19, 20].

Prescriptive Process Optimization primarily focuses on improving processes, often by translating insights into actionable steps aimed at enhancing process execution. CSs designed for this BPM area mainly support automated process optimization, suggesting adjustments to optimize process performance across various indicators [21, 22]. Additionally, these systems contribute to prescriptive process monitoring, providing real-time recommendations for actions to be taken, as illustrated in [23].

Augmented Process Execution embodies the concept wherein system-driven management actively oversees business process execution, with human operators providing support as needed. In this sub-field, various conversational agents have been developed to facilitate seamless interaction between systems and human users [24, 25, 26]. Furthermore, Robotic Process Automation (RPA), which involves creating software robots to automate repetitive tasks on application user interfaces, will likely benefit from the combination with CSs. Such integration enables the automation of business processes [27, 28, 29] and aids in identifying suitable routines for automation through natural language interaction [30, 31].

3. The Business Process LLM
In this study, we present a business process-oriented LLM framework, described in more detail in [32]. The steps used for answering queries pertaining to business processes are summarized in Figure 1. The overall architecture comprises two major phases: Knowledge Augmentation and Querying.

Knowledge Augmentation. The process-aware LLM pipeline takes a business process model as input and splits it into multiple chunks, so as to facilitate the LLM's understanding when generating responses. In this study, we used a Directly-Follows Graph (DFG) representation expressed in natural language. Chunking partitions large textual content into more manageable segments, so that the LLM ingests only the relevant context and the limits imposed by its context window are overcome. To ensure meaningful chunks and avoid unnatural segmentation of the process model, two distinct chunking strategies were evaluated: fixed-size and recursive chunking.

Subsequently, the framework transforms the raw input chunks into embeddings for storage in a vector index. These embeddings are dense, low-dimensional vectors designed to encapsulate the semantic information and contextual relationships needed by the subsequent retrieval and generation operations. The business process model embeddings are then stored in a specialized vector database to enable efficient retrieval. Retrieval is performed through semantic search that, in our case, relies on cosine similarity.

Querying. The Querying stage begins with the retrieval of the process model chunks needed to craft precise responses to the process-related questions. In particular, this step fetches relevant process chunks from the vector store through semantic search based on cosine similarity. These segments, together with the user question, are then fed into an LLM to generate an answer.

To offer contextually grounded answers based on the user query and the retrieved information, the proposed framework relies on two primary components: an LLM and its associated tokenizer. First, a prompt is formulated by merging the user query with the previously retrieved process context. Next, the tokenizer converts the prompt into a format the model can process. Finally, the prompt is fed to the LLM to generate contextually relevant answers. In particular, our process-aware approach integrates the Llama 2 13B [33] model as the LLM.

Figure 1: The business process-oriented LLM framework.

4. Evaluation
We performed a preliminary validation of the proposed framework by applying it to a real public administration procedure. The process model, illustrated in Figure 2, concerns the reimbursement of expenses for missions, a critical procedure within a university. This administrative process entails the processing of expense reports submitted by employees and the subsequent decision to either reimburse or reject these reports. In particular, the process was analyzed using textual DFG descriptions of its activities and sequence flows.

Figure 2: The DFG model of the reimbursement process in a university.

Since the proposed framework is rooted in generative models, it provides feedback to users in natural language. To assess its effectiveness in aiding users' comprehension of business processes, the validation measures the accuracy of the answers with respect to the entities and relationships present in both the process model and the response of the LLM. The conclusions of this effort center on the approach's overall effectiveness in assisting business process users and on its potential applications in real-world scenarios.

4.1. Evaluation Setting
All the evaluations are performed on the reimbursement process model introduced above, represented as a DFG expressed in natural language. Answering the queries adopted in this evaluation requires recognizing both structural and behavioral information within the model. Structural information refers to the presence of activities, events, and gateways in the process model, whereas behavioral information concerns the sequence flows linking these entities.

Specifically, for the analysis of structural correctness, we queried the presence of specific activities within the business process model, prompting the pipeline to answer with a simple "yes" or "no" and to provide relevant contextual references, if available. For behavioral features, the queries checked the presence of sequence flows between specified activities in the process representation, and the LLM was prompted to state their existence in a binary manner, reporting contextual references.

To obtain a thorough evaluation, we analyzed all single-pass transitions, an equal number of sequence flows between activities that are present in the model but not directly connected, and the same number of flows linking tasks that do not belong to the process.

First, we assessed the performance of the RAG-based framework against the basic version of the language model in responding to the queries about the reimbursement process model. Specifically, we estimated the capability of the Llama 2 13B model and of the RAG-based pipeline to address queries related to business processes, using accuracy as the measure.
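As an illustration of the query design above, the following Python sketch builds the structural and behavioral yes/no questions from a DFG and scores a system's answers with a simple oracle. It is a hypothetical harness, not the authors' implementation. It assumes that the DFG is available as a set of (source, target) activity pairs, that single-pass transitions correspond to these directly-follows edges, and that the invented question templates are acceptable. The hypothetical `ask` parameter stands for whichever system is under test (the plain LLM or the RAG pipeline). The activity names in the example are invented and do not reproduce the reimbursement model of Figure 2.

```python
# Hypothetical sketch of the yes/no evaluation set-up (not the authors' code).
# Assumptions: the DFG is a set of (source, target) activity pairs, the
# question templates are invented, and `ask` stands in for the system under test.
from itertools import product
from typing import Callable

Edge = tuple[str, str]
Query = tuple[str, str]  # (question, expected answer: "yes" or "no")


def activity_queries(activities: list[str], foreign: list[str]) -> list[Query]:
    """Structural checks: is a given activity part of the process model?"""
    template = "Does the process contain the activity '{}'? Answer yes or no."
    return ([(template.format(a), "yes") for a in activities]
            + [(template.format(f), "no") for f in foreign])


def flow_queries(edges: set[Edge], foreign: list[str]) -> list[Query]:
    """Behavioral checks in three groups of (at most) equal size n:
    every directly-follows edge (expected "yes"), in-model pairs that are
    not directly connected (expected "no"), and pairs involving an activity
    outside the process (expected "no")."""
    activities = sorted({a for edge in edges for a in edge})
    direct = sorted(edges)
    n = len(direct)
    indirect = [(a, b) for a, b in product(activities, repeat=2)
                if a != b and (a, b) not in edges][:n]
    outside = list(product(activities, foreign))[:n]
    template = "Is '{}' directly followed by '{}'? Answer yes or no."
    return ([(template.format(a, b), "yes") for a, b in direct]
            + [(template.format(a, b), "no") for a, b in indirect + outside])


def accuracy(queries: list[Query], ask: Callable[[str], str]) -> float:
    """Oracle: share of answers equal to the expected label, which for
    binary answers equals (TP + TN) / (TP + FP + TN + FN)."""
    correct = sum(ask(q).strip().lower() == expected for q, expected in queries)
    return correct / len(queries)


def always_yes(question: str) -> str:
    """Stub 'model' that confirms everything, mimicking a hallucinating LLM."""
    return "yes"


if __name__ == "__main__":
    # Invented toy model; NOT the reimbursement DFG of Figure 2.
    edges = {("Submit report", "Check report"),
             ("Check report", "Approve reimbursement"),
             ("Check report", "Reject report"),
             ("Approve reimbursement", "Pay expenses")}
    queries = flow_queries(edges, foreign=["Order supplies"])
    print(len(queries), accuracy(queries, always_yes))
```

On the toy model, the stub that answers "yes" to every question is correct only on the 4 direct transitions, so it scores 4 out of 12 (about 0.33) on the flow queries. Under these assumptions, the two equally sized negative groups make confirm-everything behavior costly in terms of accuracy. One example of such behavior is claiming familiarity with activities that are not in the model.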
To measure accuracy rigorously, we designed an evaluation approach based on binary response questions, for which "yes" and "no" are the only allowed answers. Accuracy quantifies the proportion of correct predictions generated by the LLM in answering the user's questions out of the total responses provided. We classify the framework's predictions as true positives (TP) when they correspond to positive expected outcomes and as true negatives (TN) when they match negative expected outcomes. Conversely, false positives (FP) arise when the approach answers positively although a negative answer is expected, whereas false negatives (FN) arise when it answers negatively although a positive answer is expected.

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN} \tag{1}$$

Subsequently, we estimated the effects of employing various chunking techniques within the process-aware LLM pipeline, alongside investigating how prompt engineering can further improve the framework's performance. Fixed-size and recursive chunking with different chunk sizes were tested.

In both assessments, we evaluate the accuracy (Formula 1) of the framework in answering the queries. We carried out this evaluation with an oracle that takes as input both the query and the corresponding binary response. The oracle compares the answers of the pipeline with the expected ones and computes the accuracy as the ratio of correct predictions to the total number of tests conducted in that particular assessment.

In our experiments, we found that retrieving the top 20 chunks was always sufficient to capture a comprehensive overview of the process model, enabling the language model to generate grounded responses.

The experiments were conducted on a workstation running the Linux Ubuntu 22.04.3 LTS operating system and equipped with an NVIDIA A100 GPU.

4.2. Evaluation results
We now analyze the results obtained during the evaluation phase under the different experimental conditions. Table 1 reports the accuracy of the basic LLM and of the RAG-based pipeline on the reimbursement DFG model described in natural language.

Table 1
Accuracy obtained for the basic LLM and the RAG-based framework.
Methodology            Representation          Accuracy
Basic LLM              None                    40.18%
RAG-based framework    Natural Language DFG    72.37%

The table shows a notable improvement in accuracy when the RAG-based LLM is used (from 40.18% to 72.37%, i.e., +32.19 percentage points), consistent with our expectations for the test. This enhancement brings the framework to an acceptable performance level, relying on the natural language representation to drive more informed and accurate decision-making.

Our observations revealed instances of hallucination, in which the pure LLM provided responses despite lacking pertinent information about the process model, occasionally asserting familiarity with certain activities even when such knowledge was absent.

Table 2 reports the accuracy obtained with different chunking methods: no chunking, fixed-size chunking, and recursive chunking.

Table 2
Accuracy obtained using different chunking strategies.
Chunking       Accuracy
No chunking    79.52%
Fixed-size     81.58%
Recursive      82.89%

Comparable outcomes are achieved with the fixed-size and the recursive chunking strategies applied to the natural language representation. In both cases, the ideal chunk size is 128 tokens with a 10-token overlap. We attribute this observation to the relatively modest scale of the process model, whose content fits almost entirely within a single chunk. The same consideration explains why the absence of chunking yields similar results.

5. Conclusion
In conclusion, this work introduced a business process-aware LLM, an innovative framework designed to facilitate actionable conversations and support process-aware DSSs, thereby laying the groundwork for intelligent interaction with ABPMSs. The proposed methodology, tailored to business process analysis, aims to enhance the conversational skills of LLMs in the business process context. This objective is realized through a RAG-based architecture, which extends the LLM's knowledge of the structural and behavioral aspects of process models by ingesting contextual information relevant to specific inquiries. Consequently, the process-aware framework can assist users in understanding and executing business processes through a natural language interface. Additionally, we assessed the performance of the process-aware LLM in providing precise and pertinent answers to the queries posed by users across diverse evaluation scenarios.

In future research within the domain of process discovery [34], we intend to analyze business process execution information and to explore the impact of different embedding models on the developed technique. Furthermore, integrating the framework with symbolic AI solvers to embed reasoning capabilities could present another intriguing avenue for future work.

Acknowledgments
The work of Angelo Casciani has been carried out within the Italian National Doctorate on AI run by Sapienza.

References
[1] M. Dumas, F. Fournier, L. Limonad, A. Marrella, M. Montali, et al., AI-augmented Business Process Management Systems: A Research Manifesto, ACM Trans. Manage. Inf. Syst. 14 (2023). doi:10.1145/3576047.
[2] P. Agarwal, B. Gao, S. Huo, P. Reddy, et al., A Process-Aware Decision Support System for Business Processes, in: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD '22, 2022, pp. 2673–2681. doi:10.1145/3534678.3539088.
[3] J. D. Lee, K. A. See, Trust in automation: Designing for appropriate reliance, Human Factors 46 (2004).
[4] D. Chapela-Campa, M. Dumas, From process mining to augmented process execution, Software and Systems Modeling (2023) 1–10.
[5] A. Casciani, M. L. Bernardi, M. Cimitile, A. Marrella, Conversational Systems for AI-Augmented Business Process Management, in: Proceedings of the 18th Research Challenges in Information Science (RCIS 2024), 2024, pp. 1–16.
[6] I. Ozkaya, Application of Large Language Models to Software Engineering Tasks: Opportunities, Risks, and Implications, IEEE Software 40 (2023) 4–8. doi:10.1109/MS.2023.3248401.
[7] D. Fahland, F. Fournier, L. Limonad, I. Skarbovsky, et al., How well can large language models explain business processes?, 2024. arXiv:2401.12846.
[8] P. Lewis, E. Perez, A. Piktus, F. Petroni, et al., Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks, in: Proceedings of the 34th International Conference on Neural Information Processing Systems, NIPS'20, 2020, pp. 1–16.
[9] K. Sintoris, K. Vergidis, Extracting business process models using natural language processing (NLP) techniques, in: Proceedings - 2017 IEEE 19th Conference on Business Informatics, CBI 2017, volume 1, Institute of Electrical and Electronics Engineers Inc., 2017, pp. 135–139.
[10] H. van der Aa, K. J. Balder, F. M. Maggi, A. Nolte, Say it in your own words: Defining declarative process models using speech recognition, BPM Forum (2020).
[11] C. Qian, L. Wen, A. Kumar, BEPT: A behavior-based process translator for interpreting and understanding process models, Int. Conf. on Information and Knowledge Management, Proceedings (2019).
[12] L. Ackermann, S. Schönig, et al., Natural language generation for declarative process models, CAiSE Workshops LNBIP 231 (2015) 3–19.
[13] Y. Fontenla-Seco, M. Lama, A. Bugarín, Process-To-Text: A Framework for the Quantitative Description of Processes in Natural Language, Trustworthy AI - Integrating Learning, Optimization and Reasoning Workshop (2020).
[14] L. Barbieri, E. Madeira, K. Stroeh, W. van der Aalst, A natural language querying interface for process mining, Journal of Intelligent Information Systems 61 (2023) 113–142.
[15] H. Yeo, E. Khorasani, V. Sheinin, I. Manotas, N. P. A. Vo, O. Popescu, P. Zerfos, Natural Language Interface for Process Mining Queries in Healthcare, in: Proceedings - 2022 IEEE Int. Conf. on Big Data, Big Data 2022, Institute of Electrical and Electronics Engineers Inc., 2022, pp. 4443–4452.
[16] D. Barón-Espitia, M. Dumas, O. González-Rojas, Coral: Conversational What-If Process Analysis, in: ICPM Demo, 2022, pp. 118–122.
[17] M. Li, R. Wang, X. Zhou, Z. Zhu, Y. Wen, R. Tan, ChatTwin: Toward Automated Digital Twin Generation for Data Center via Large Language Models, in: Proceedings of the 10th ACM Int. Conf. on Systems for Energy-Efficient Buildings, Cities, and Transportation, BuildSys '23, Association for Computing Machinery, 2023, pp. 208–211.
[18] K. Brennig, K. Benkert, B. Löhr, O. Müller, Text-Aware Predictive Process Monitoring of Knowledge-Intensive Processes: Does Control Flow Matter?, in: Int. Conf. on Business Process Management, Springer, 2023, pp. 440–452.
[19] L. Cabrera, S. Weinzierl, S. Zilker, M. Matzner, Text-Aware Predictive Process Monitoring with Contextualized Word Embeddings, in: BPM Workshops, volume 460 LNBIP, 2022, pp. 303–314.
[20] C. Warmuth, H. Leopold, On the Potential of Textual Data for Explainable Predictive Process Monitoring, ICPM Workshops 468 LNBIP (2023) 190–202.
[21] S. Badini, S. Regondi, E. Frontoni, R. Pugliese, Assessing the capabilities of ChatGPT to improve additive manufacturing troubleshooting, Advanced Industrial and Engineering Polymer Research (2023).
[22] A. Mustansir, K. Shahzad, M. K. Malik, Towards automatic business process redesign: an NLP based approach to extract redesign suggestions, Automated Software Engineering 29 (2022).
[23] S. Zeltyn, S. Shlomov, et al., Prescriptive Process Monitoring in Intelligent Process Automation with Chatbot Orchestration, in: PMAI@IJCAI, 2022, pp. 49–60.
[24] T. Chakraborti, S. Agarwal, Y. Khazaeni, et al., D3BA: A Tool for Optimizing Business Processes Using Non-deterministic Planning, in: BPM Workshops, 2020, pp. 181–193.
[25] L. F. Lins, G. Melo, T. Oliveira, et al., PACAs: Process-Aware Conversational Agents, in: BPM Workshops, 2021, pp. 312–318.
[26] J. Bandlamudi, K. Mukherjee, P. Agarwal, et al., Towards Hybrid Automation by Bootstrapping Conversational Interfaces for IT Operation Tasks, in: AAAI, 2023, pp. 15654–15660.
[27] P. D. Hung, D. T. Trang, T. Khai, Integrating Chatbot and RPA into Enterprise Applications Based on Open, Flexible and Extensible Platforms, in: Int. Conf. on Cooperative Design, Visualization and Engineering, 2021, pp. 183–194.
[28] G. Dan, D. Claudiu, F. Alexandra, et al., Multi-Channel Chatbot and Robotic Process Automation, in: IEEE Int. Conf. on Automation, Quality and Testing, Robotics, 2022, pp. 1–6.
[29] Y. Rizk, V. Isahagian, S. Boag, et al., A Conversational Digital Assistant for Intelligent Process Automation, BPM Forum (2020).
[30] H. van der Aa, H. Leopold, Automatically identifying process automation candidates using natural language processing, Blockchain and Robotic Process Automation (2022) 77–86.
[31] Z. Zeng, W. Watson, N. Cho, et al., FlowMind: Automatic Workflow Generation with LLMs, in: ACM Int. Conf. on AI in Finance, 2023, pp. 73–81.
[32] A. Casciani, M. L. Bernardi, M. Cimitile, A. Marrella, Conversational Systems for AI-Augmented Business Process Management, PREPRINT (Version 1) available at Research Square (2024). doi:10.21203/rs.3.rs-4125790/v1.
[33] H. Touvron, T. Lavril, G. Izacard, X. Martinet, et al., LLaMA: Open and Efficient Foundation Language Models, 2023. arXiv:2302.13971.
[34] M. L. Bernardi, M. Cimitile, C. Di Francescomarino, F. M. Maggi, Using discriminative rule mining to discover declarative process models with non-atomic activities, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 8620 LNCS (2014) 281–295. doi:10.1007/978-3-319-09870-8_21.