Stakeholder-specific Jargon-based Representation of Multimodal Data within Business Process

Aleksandar Gavric, Dominik Bork and Henderik A. Proper

Business Informatics, TU Wien, Favoritenstrasse 9-11/194-3, 1040 Vienna, Austria

Companion Proceedings of the 17th IFIP WG 8.1 Working Conference on the Practice of Enterprise Modeling Forum, M4S, FACETE, AEM, Tools and Demos co-located with PoEM 2024, Stockholm, Sweden, December 3-5, 2024
Email: aleksandar.gavric@tuwien.ac.at (A. Gavric); dominik.bork@tuwien.ac.at (D. Bork); henderik.proper@tuwien.ac.at (H. A. Proper)
ORCID: 0009-0005-1243-7722 (A. Gavric); 0000-0001-8259-2297 (D. Bork); 0000-0002-7318-2496 (H. A. Proper)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073

Abstract

Stakeholders can struggle to understand and engage with process models due to a mismatch between the technical language used and their own domain-specific jargon and personal communication styles. The paper explores the application of transformer-based architectures to enhance the representation of process models and additional multimodal process data by tailoring them to the language of stakeholders. We present an approach that personalizes process model representations through two types of paraphrasers: one that aligns with domain-specific jargon and another that adapts to individual stakeholder styles. We developed a golden dataset from process model-stakeholder interaction simulation and a silver dataset using large language models to train and validate our approach. Initial findings suggest that these methods could enhance stakeholder engagement and contribute to better teaching of process mining and procedural thinking.

Keywords

Process Models, Transformer models, Multimodal Evidence, Process Representation

1. Introduction

Process mining focuses on extracting insights from event logs to discover, monitor, and improve actual processes by analyzing the flow of activities within an organization [1]. Beerepoot et al. [2] have highlighted that the struggle between human involvement and task automation in managing work processes points to the significant impact that resolving these challenges will have on knowledge-intensive work. Despite its potential, one of the significant challenges in process mining is effectively communicating the insights gained from these analyses to stakeholders [3], who often come from diverse backgrounds with varying levels of familiarity with the technical and domain-specific language.

The precision required in process models leads to the use of jargon, which, while transparent to domain experts, can be confusing or opaque to others. This communication barrier can hold back the adoption of process mining insights, limiting their impact on decision-making and process improvement.

For instance, let’s imagine a project manager, Sarah, who oversees the implementation of a new digital healthcare system in a large hospital. Sarah has extensive experience in project management, but her familiarity with healthcare-specific jargon is limited. She works closely with a team of doctors, nurses, and IT specialists, each fluent in their domain language. During meetings, the medical professionals often discuss processes in terms that are second nature to them, such as “EHR integration,” “clinical workflows,” and “patient pathway optimization.” To them, these phrases precisely capture the complexity of the processes involved in patient care. However, to Sarah, who lacks a clinical background, these discussions often feel like a foreign language, making it difficult for her to grasp the important details of the system she’s managing.

Now, let’s imagine another scenario where Sarah is working on a similar project, but this time, the communication has been tailored to her level of understanding. Instead of using healthcare-specific jargon, the process models are described in more general project management terms. For instance, rather than discussing “EHR integration,” the conversation revolves around “aligning the digital system with existing hospital processes.” Instead of “clinical workflows,” they talk about “task sequences in patient care.” In this scenario, Sarah could feel more confident and engaged in the project because the information is presented in a way that resonates with her background and expertise.

This contrast between jargon-heavy communication and language tailored to the listener’s experience is not just a hypothetical situation; it is an identified challenge in many industries [4, 5], especially those that rely on complex processes and specialized knowledge, such as healthcare, finance, manufacturing, or education.

The problem of jargon-laden communication is intensified in process mining, where the interpretation and representation of process models are essential for understanding and improving organizational workflows. Process mining involves extracting knowledge from event logs to visualize and analyze processes. However, when these process models are presented in technical or domain-specific language that stakeholders may not understand, the benefits of process mining can be significantly diminished.

Existing methods have made progress in addressing similar challenges by processing language [6, 7] or using visual aids [8, 9] to make process models more accessible. Natural language processing (NLP) technologies, particularly those based on transformer architectures, have also shown promise in generating more understandable text by leveraging vast amounts of contextual information. These approaches [10], however, often remain one-size-fits-all solutions, lacking the personalization needed to engage stakeholders who may have varying levels of familiarity with the subject matter.

This paper proposes an approach that leverages transformer-based architectures to create personalized representations of process models. Our goal is to bridge the communication gap by developing two types of paraphrasing: one that aligns with the domain-specific jargon used by experts and another that adapts to the individual communication styles of different stakeholders.

But language is only part of the equation. In complex fields like healthcare, education, and industrial operations, process models can benefit from integrating multimodal evidence, combining text, images, data visualizations, and even video to provide a comprehensive understanding of the processes involved. Integrating this multimodal data into personalized process models adds another layer of complexity and offers an opportunity to enhance the discovery of learning patterns within process mining [11]. By understanding how different stakeholders interact with these multimodal representations, we can gain insights into their learning processes, which can improve how we teach and implement process mining techniques [10].

In the sections that follow, we discuss the related work (Section 2), detail our methodology for developing the jargon-specific and personalized paraphrasers (Section 3), explore the implications of our findings for the future of process mining and stakeholder communication (Section 4), and conclude with closing remarks (Section 5). Through this work, we hope to contribute to the ongoing efforts to make complex processes more accessible, understandable, and actionable for all stakeholders involved.

2. Related Work

This section provides an overview of the most relevant research in process modeling and machine learning techniques, particularly in the domain of paraphrasing and semantic transformation of process-related representations, contextualizing our work within the broader landscape of NLP-based process model management.

2.1. Paraphrasing and Semantic Transformation in Process Models

One key challenge in business process model management is ensuring that models are interpretable and usable by various stakeholders, each with varying expertise and domain-specific knowledge. Early work by Leopold et al. [12] addressed this issue by introducing automated techniques for transforming business process models into natural language descriptions. Their approach laid the groundwork for subsequent research by demonstrating that computerized tools could effectively bridge the gap between formal process models and natural language, albeit with limited adaptability to different domains or stakeholder needs.

Recent advancements in transformer-based models, such as BERT [13] and GPT [14], have opened new avenues for paraphrasing and semantic transformation tasks. These models have been applied to various domains, including text summarization, translation, and paraphrasing, but their application to process models is still an emerging field. In a recent work [15], Kourani et al. (2024) leverage the capabilities of Large Language Models (LLMs) to represent process models in the context of Business Process Management (BPM). This study introduces a novel framework that harnesses LLMs to enhance the interpretability of complex process models, addressing challenges that arise as organizations scale and processes become increasingly intricate.

Our work builds on these advancements by applying transformer models specifically designed to paraphrase process models across multiple domains and stakeholder profiles, addressing the limitations of earlier approaches in handling domain-specific jargon and personalized communication.

2.2. Domain-Specific Language Models

The use of domain-specific language models has been explored in several contexts, particularly in medical [16] and legal [17] domains, where the accuracy of language processing is critical due to the specialized terminology involved.
These studies highlight the importance of tailoring language models to specific domains to improve performance. For instance, Lee et al. [16] developed BioBERT, a variant of BERT pre-trained on biomedical text, which significantly outperformed general-purpose models on tasks like named entity recognition and relation extraction in the medical domain.

Our work draws inspiration from these works by training custom transformers on process model representations specific to different domains. This approach allows our models to effectively generate paraphrases that are accurate and contextually relevant, aligning with each domain’s specialized terminology and communication styles.

2.3. Hybrid Approaches to Process Model Translation

The literature has also explored hybrid approaches that combine rule-based methods with machine-learning techniques. For example, Friedrich et al. [18] developed a hybrid approach to automatically generate textual descriptions of process models by combining rule-based transformations with statistical methods. While effective, these approaches often require extensive domain knowledge to implement and are less adaptable to new or evolving domains. Our work diverges from these traditional hybrid methods by leveraging entirely data-driven transformer models, which learn the nuances of process model paraphrasing directly from training data.

Zerbato et al. (2023) develop methodological guidance [19] to assist novice analysts during their analysis and build an empirical basis for process mining, laying the foundation for the development of user-centered support. Our work aims to contribute to question development in process mining and interactive modeling, addressing areas where support is still lacking.

Overall, our work extends the existing literature on process model paraphrasing by introducing transformer-based models explicitly tailored to domain-specific jargon and stakeholder communication styles.

3. Adaptive Transformer-Based Framework for Process Model Communication

We propose a transformer-based architecture to address the challenges of effectively communicating process models to stakeholders with varying levels of domain expertise (see Fig. 1). This architecture consists of two core components: a jargon-specific paraphraser (proc2jargon) and a personalized paraphraser (proc2ownw); both are designed to translate process models and multimodal data into text that is accessible and meaningful to different audiences.

Figure 1: Illustration of our solution that includes (left) a jargon-specific paraphraser (proc2jargon) for domain-specific language and (right) a personalized paraphraser (proc2ownw) tailored to individual stakeholder preferences.

3.1. Jargon-Specific Paraphraser (proc2jargon)

proc2jargon generates text that aligns with the specialized domain jargon. This model leverages pre-trained knowledge combined with the specifics of the process model and any associated multimodal data to produce outputs that maintain the technical rigor and precision expected by domain experts.

Let $D_j$ represent the domain-specific jargon dictionary, and $\mathbf{X}_{pm}$ denote the input process model, which includes both the textual and multimodal data features. The model’s task is to generate a sequence $\mathbf{Y}_j = (y_{j1}, y_{j2}, \ldots, y_{jn})$ where each $y_{ji} \in D_j$. The architecture of proc2jargon can be represented as

$$\mathbf{Y}_j = \mathrm{Transformer}_{\mathrm{jargon}}(\mathbf{E}_{ptk}, \mathbf{E}_{pm})$$

where $\mathbf{E}_{ptk}$ is the embedding of pre-trained knowledge, and $\mathbf{E}_{pm}$ is the embedding of the process model and multimodal data. This model aims to maximize the conditional probability $P(\mathbf{Y}_j \mid \mathbf{X}_{pm}, \mathbf{E}_{ptk})$, such that:

$$P(\mathbf{Y}_j \mid \mathbf{X}_{pm}, \mathbf{E}_{ptk}) = \prod_{i=1}^{n} P(y_{ji} \mid \mathbf{X}_{pm}, \mathbf{E}_{ptk}, y_{j1}, \ldots, y_{j(i-1)})$$

The conditional probability $P(\mathbf{Y}_j \mid \mathbf{X}_{pm}, \mathbf{E}_{ptk})$ represents the likelihood of generating the sequence of jargon terms $\mathbf{Y}_j$ given the input process model $\mathbf{X}_{pm}$ and the embedding of pre-trained knowledge $\mathbf{E}_{ptk}$.
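
The chain-rule factorization can be illustrated with a toy example. The sketch below is not the proc2jargon model: `toy_next_token_probs` is a hypothetical stand-in for the transformer's conditional distribution, and the three-term jargon dictionary is invented for illustration. It only shows how the probability of a sequence is the product of per-token conditionals (accumulated as a sum of logarithms for numerical stability) and how tokens are produced one at a time.

```python
import math

# Hypothetical jargon dictionary D_j (invented for illustration).
JARGON = {"triage", "admit", "discharge"}

def toy_next_token_probs(prefix):
    """Hypothetical stand-in for P(y_i | X_pm, E_ptk, y_1..y_{i-1}).

    A real model would condition on the process model and pre-trained
    knowledge; this toy version looks only at the previous token.
    """
    if not prefix:
        return {"triage": 0.5, "admit": 0.25, "discharge": 0.25}
    if prefix[-1] == "triage":
        return {"triage": 0.125, "admit": 0.75, "discharge": 0.125}
    return {"triage": 0.125, "admit": 0.125, "discharge": 0.75}

def sequence_log_prob(tokens):
    """Chain rule: log P(Y) = sum_i log P(y_i | y_1..y_{i-1})."""
    total = 0.0
    for i, token in enumerate(tokens):
        if token not in JARGON:
            raise ValueError(f"{token!r} is not in the jargon dictionary")
        total += math.log(toy_next_token_probs(tokens[:i])[token])
    return total

def greedy_decode(length):
    """Generate tokens one by one, always taking the most likely next token."""
    out = []
    for _ in range(length):
        probs = toy_next_token_probs(out)
        out.append(max(probs, key=probs.get))
    return out
```

For the sequence `["triage", "admit", "discharge"]`, the toy conditionals are 0.5, 0.75, and 0.75, so `math.exp(sequence_log_prob(...))` recovers their product, 0.28125.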
In essence, the model generates each jargon term one by one, ensuring that each term is not only contextually appropriate based on the process model and pre-trained knowledge but also coherent with the previously generated terms in the sequence.

3.2. Personalized Paraphraser (proc2ownw)

proc2ownw produces text personalized to individual stakeholders’ communication styles and language preferences. This model balances integrating human-model interaction data with pre-trained knowledge and process model inputs to generate outputs that resonate with non-expert stakeholders.

Let $H$ represent the set of human-model interaction embeddings, which encode personalized communication preferences, and let $w$ be the weight that determines the influence of these interactions on the model’s output. The text sequence $\mathbf{Y}_p = (y_{p1}, y_{p2}, \ldots, y_{pn})$ generated by proc2ownw is computed as:

$$\mathbf{Y}_p = \mathrm{Transformer}_{\mathrm{personal}}(w \cdot \mathbf{E}_H + (1 - w) \cdot \mathbf{E}_{ptk}, \mathbf{E}_{pm})$$

where $\mathbf{E}_H$ is the embedding of human-model interactions, and $\mathbf{E}_{ptk}$ and $\mathbf{E}_{pm}$ are as defined previously. The model optimizes the conditional probability $P(\mathbf{Y}_p \mid H, \mathbf{X}_{pm}, \mathbf{E}_{ptk})$, expressed as:

$$P(\mathbf{Y}_p \mid H, \mathbf{X}_{pm}, \mathbf{E}_{ptk}) = \prod_{i=1}^{n} P(y_{pi} \mid H, \mathbf{X}_{pm}, \mathbf{E}_{ptk}, y_{p1}, \ldots, y_{p(i-1)})$$

The weights $w$ and $1 - w$ are determined based on historical interaction data, allowing the model to adapt over time and refine its outputs for each stakeholder. This formulation allows for dynamic adaptation to the stakeholder’s language preferences, producing outputs that are accurate in content and tailored in style.

Figure 2: The Solve4X [21] process data used for training of our framework. We explore different inputs to our framework, in particular: (A) an Object-Centric Event Log (OCEL), (B) multimodal evidence, and (C) a process model example. (The labels in the model example are illustrative and should be read according to [22].)

3.3. Multimodal Data Handling

To incorporate multimodal data, including textual descriptions, images, videos, and structured data (e.g., event logs), our models process input by embedding these different data types into a unified representation space. The multimodal embeddings are combined with the process model embeddings through a fusion function [20] that integrates these various data modalities. We use the unified representation space to input pre-trained multimodal embeddings as tokens.

3.4. Training proc2jargon and proc2ownw models

The training process begins with the preparation of the dataset. In our case, the dataset includes detailed process instances related to asset disbursement, Solve4X [21], where each instance provides a multimodal source of information for the model to learn from. The process model outlines several key activities: the IT staff using an asset management system to manage the issuance of items, performing quality checks, and the eventual handover of assets to clients. Alongside these textual descriptions, the dataset includes multimodal data such as sensor readings. We take the event log and multimodal evidence from Solve4X and create oc-DFG (object-centric Directly Follows Graph), oc-Petri net, and oc-BPMN (Business Process Model and Notation) models using the OC-PM process mining tool [22]. The data is illustrated in Fig. 2.

We utilized a supervised learning approach to train the two transformer-based models. The transformer model’s attention mechanism plays a crucial role in this process. Multi-head self-attention allows the model to focus on different parts of the input sequence, capturing complex relationships between the elements of the process description.
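
Multi-head self-attention is built from scaled dot-product attention. The following is a minimal single-head sketch in plain Python, without masking or learned projections; it is an illustration of the standard operation, not the PyTorch/NanoGPT code used for the experiments, and the function names are chosen here for illustration. A multi-head layer would run several such heads in parallel on differently projected inputs and concatenate their outputs.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(queries, keys, values):
    """For each query, return the attention-weighted sum of the values.

    queries, keys: lists of d-dimensional vectors.
    values: one vector per key.
    """
    d = len(keys[0])
    dim_v = len(values[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys
        ]
        weights = softmax(scores)
        # Weighted sum of the value vectors.
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim_v)]
        )
    return outputs
```

A zero query is equally similar to every key, so its output is the plain mean of the value vectors; a query strongly aligned with one key returns almost exactly that key's value.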
The attention mechanism calculates the weighted sum of values based on the similarity between queries and keys, enabling the model to attend to relevant information from different subspaces. This is implemented using PyTorch and NanoGPT¹, simplifying the creation of these attention layers. The overall architecture, which includes both encoder and decoder components, is trained on a single NVIDIA A40 GPU, optimizing the model using the AdamW optimizer.

We pre-trained our model on the openwebtext [23] dataset and then trained it on custom Solve4X-based data with the following configuration: a micro-batch size of 12 with 40 gradient accumulation steps; a block size of 1024; 12 layers with 12 attention heads and an embedding size of 768; no dropout during pretraining; no bias in LayerNorm and Linear layers; an AdamW optimizer with a maximum learning rate of 6e-4, weight decay of 0.1, and beta values of 0.9 and 0.95; gradient clipping at 1.0; and learning rate decay over 600,000 iterations, including 2,000 warmup steps, down to a minimum learning rate of 6e-5.

4. Evaluation

To guide our exploration of personalized process model representation, we formulate our research question (RQ) as: how can transformer-based architectures be effectively utilized to generate paraphrased process data that adapt to the diverse linguistic preferences of stakeholders? This question addresses the challenge of adapting technical language to improve stakeholder engagement without losing the precision required for accurate decision-making.

Table 1: Summary of Profile Background Knowledge

Profile     Background Knowledge
Profile A   Advanced technical background in engineering
Profile B   Background in project management with moderate technical knowledge
Profile C   Executive-level knowledge, low technical detail required

In this section, we provide details on the evaluation of the custom transformer models proc2ownw and proc2jargon by creating simulated stakeholder profiles with distinct personalities and generating domain-specific jargon corpora for five different fields. The evaluation aims to assess the ability of proc2ownw to tailor communication to specific stakeholders and of proc2jargon to generate domain-specific language accurately. The experiments use simulated conversations parameterized to cover realistic scenarios.

We developed three distinct stakeholder personality profiles to test the personalization capabilities of proc2ownw. Each profile has unique communication preferences and background knowledge, which influence how they interact with process models. The profiles were designed to reflect common stakeholders in various professional environments, as detailed in Table 1. These profiles were used to generate simulated conversations, which were then fed into the model to evaluate its ability to adapt its output to different communication styles.

4.1. Interviewing

We interview each profile using a set of evaluation questions, with examples provided in the following. The questions follow the steps of an exploratory data analysis discovered in a process-oriented experiment [24] conducted by Daele and Janssenswillen (2023). For the test scenarios, we created questions both from the same domain as the training data [21] and from other domains to assess cross-domain generalization in relation to the domain of the process being taught. For the other domain, we chose data that originate from a large multinational company operating from The Netherlands in the area of coatings and paints [25].

Profile A (Analytical):
Human: “Can you explain the process of asset disbursement in detail?”
Model: “Certainly. The process begins with IT staff utilizing an asset management system to track each asset. Every step is logged, from retrieval to quality control, ensuring compliance with technical standards. Detailed logs and sensor data verify the accuracy at each stage.”

Profile B (Collaborative):
Human: “I think the team needs a clear outline of the steps involved.”
Model: “Sure! The process is straightforward: IT staff check out the item in the system, perform a quality check, and hand it over to the client. Everyone on the team can see these steps and collaborate to ensure nothing is missed.”

Profile C (Strategic):
Human: “Give me the high-level overview of asset disbursement.”
Model: “The key steps involve tracking inventory in the asset management system, verifying quality, and issuing assets to clients. It’s crucial for maintaining operational efficiency and reducing errors.”

To evaluate proc2jargon, we generated domain-specific jargon corpora for the medical, police, software development, logistics, and accounting domains. These jargon corpora were created using a large language model (LLM) [7] to simulate the typical language used in these domains. This evaluation aimed to test how well proc2jargon could adapt its output to match the specialized language of different fields. We used the LLM to generate jargon-heavy content for each domain by crafting specific prompts. These prompts were designed to evoke responses filled with domain-specific language, which was then used to build the jargon corpus for proc2jargon.

¹ https://github.com/karpathy/nanoGPT

4.2. Examples

In the following, we provide examples of a tailored explanation for the processes of asset management systems for different professionals.

For a Medical Doctor (MD): The IT team manages new equipment by logging its details into a management system, conducting thorough quality inspections, tagging each item, and storing it in designated areas.
This process ensures accurate tracking and integration into the inventory, like how patient records are carefully maintained and monitored for precise medical management.

For a Police Officer: IT staff handle new equipment by documenting its details in a system, inspecting it for any issues, tagging it with a barcode, and storing it in secure locations. Every action is logged, akin to how evidence is handled and recorded to ensure proper tracking and security.

For a Software Engineer: New IT assets are processed by entering their specifications into a management system, verifying their condition through quality checks, tagging them with barcodes, and updating their storage locations. This ensures that all inventory data is accurately synchronized and traceable, much like managing code versions and deployments.

For a Logistics Representative: IT staff process new assets by recording their details in a digital system, performing quality inspections, labeling the items, and placing them in specified warehouse locations. This organized approach to inventory management ensures precise tracking and efficient storage, similar to coordinating shipments and warehouse operations.

For an Accountant: The IT department handles new assets by entering their details into an asset management system, verifying their condition, tagging them for identification, and logging their storage locations. This careful documentation and tracking are crucial for maintaining accurate financial records and ensuring proper asset management, similar to tracking expenses and inventory in accounting.

4.3. Results

The evaluation reports the effectiveness of both proc2ownw and proc2jargon in adapting to stakeholder preferences and domain-specific language. The results demonstrate that both proc2ownw and proc2jargon can generate customized outputs that meet the specific needs of stakeholders and domains. The findings also suggest that while pre-existing social network data can provide a useful foundation, exposure to relevant process models is essential for achieving the highest levels of personalization and accuracy.

Table 2 summarizes the results of training custom models (proc2jargon and proc2ownw) to paraphrase various types of process models. The evaluation involved multiple process model representations, including Object-Centric Event Logs (OCEL), multimodal evidence (mmevd), Directly Follows Graphs (DFG), Petri nets, and Business Process Model and Notation (BPMN). The models were trained and tested across five domains (medical, police, software development, logistics, and accounting) and three distinct stakeholder personalities, as described earlier.

Table 2: Results of Model Training Across Different Process Model Types, Domains, and Personalities. For proc2ownw, each value in the Average column spans profiles A, B, and C of the given process type.

Model        Process Type   Inter. Prof.  Medical  Police   Soft. Dev.  Logistics  Account.  Average
proc2jargon  ocel2jargon    ABC           90.55%   87.06%   84.01%      90.30%     87.17%    87.82%
proc2jargon  mmevd2jargon   ABC           86.92%   91.77%   85.37%      87.83%     92.80%    88.94%
proc2jargon  dfg2jargon     ABC           87.55%   85.01%   85.16%      89.10%     87.88%    86.94%
proc2jargon  petri2jargon   ABC           91.02%   85.86%   87.59%      87.57%     89.85%    88.38%
proc2jargon  bpmn2jargon    ABC           91.04%   85.47%   90.60%      90.78%     85.22%    88.62%
proc2ownw    ocel2ownw      A             84.95%   90.88%   88.01%      91.08%     88.70%
proc2ownw    ocel2ownw      B             90.00%   91.16%   88.08%      91.65%     86.11%    88.67%
proc2ownw    ocel2ownw      C             89.21%   87.99%   86.56%      90.32%     85.40%
proc2ownw    mmevd2ownw     A             88.81%   89.94%   87.00%      86.32%     84.68%
proc2ownw    mmevd2ownw     B             91.10%   86.97%   92.25%      89.38%     87.81%    89.12%
proc2ownw    mmevd2ownw     C             86.20%   91.73%   91.86%      91.64%     91.07%
proc2ownw    dfg2ownw       A             88.24%   89.60%   91.79%      88.10%     84.54%
proc2ownw    dfg2ownw       B             85.06%   84.24%   85.47%      89.46%     88.40%    87.42%
proc2ownw    dfg2ownw       C             86.23%   84.47%   92.45%      87.46%     85.84%
proc2ownw    petri2ownw     A             92.73%   91.41%   89.73%      92.46%     90.05%
proc2ownw    petri2ownw     B             92.05%   90.09%   91.92%      90.13%     84.23%    90.55%
proc2ownw    petri2ownw     C             89.65%   90.16%   91.07%      91.18%     91.37%
proc2ownw    bpmn2ownw      A             88.95%   84.52%   85.28%      92.26%     88.15%
proc2ownw    bpmn2ownw      B             88.03%   85.94%   84.67%      92.77%     90.13%    88.42%
proc2ownw    bpmn2ownw      C             92.60%   87.24%   84.86%      92.25%     88.60%
Average                                   89.04%   88.08%   88.19%      90.10%     87.90%    88.49%
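
The percentages in Table 2 are next-token accuracies computed over process-relevant tokens. As a minimal sketch of such a metric, and not the authors' evaluation code, the function below compares predicted and actual next tokens at the positions flagged as relevant. The function name, the boolean `relevant` mask (standing in for the manual relevance annotation), and the example tokens are assumptions made for illustration.

```python
def next_token_accuracy(predicted, actual, relevant):
    """Percentage of process-relevant positions where the prediction matches.

    predicted, actual: equal-length lists of tokens.
    relevant: equal-length list of booleans (hypothetical stand-in for a
    manual annotation of which tokens are process-relevant).
    """
    if not (len(predicted) == len(actual) == len(relevant)):
        raise ValueError("all three sequences must have the same length")
    # Keep only the positions annotated as process-relevant.
    scored = [(p, a) for p, a, r in zip(predicted, actual, relevant) if r]
    if not scored:
        raise ValueError("no process-relevant tokens to score")
    hits = sum(1 for p, a in scored if p == a)
    return 100.0 * hits / len(scored)
```

For example, with predictions `["check", "out", "the", "asset"]`, actual tokens `["check", "in", "the", "asset"]`, and the third position marked as not relevant, two of the three scored positions match, giving roughly 66.7%.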

When it comes to process model representation, Table 2 shows the percentage accuracy in predicting the next token in the test conversation sets across different process model types, domains, and stakeholder personalities. It indicates that the proc2ownw approach generally achieves higher accuracy than proc2jargon, particularly when representing the Petri net model, with an average accuracy of 90.55%. Accuracy refers to the percentage of correct predictions made by the model when forecasting the next token (sub-word unit) in the sequences of the test conversation sets. This accuracy was measured by comparing the predicted process-relevant token against the actual token that appeared next in the sequence. The higher the percentage, the more frequently the model correctly predicted the next token in the conversation. The relevance of each token is manually annotated based on common sense and the process data description.

Beyond process models, when it comes to the representation of sub-model process data (event log and multimodal evidence), Table 2 shows that models trained with multimodal evidence data (mmevd2{jargon,ownw}) generally perform better, achieving a higher average accuracy compared to those trained with OCEL data (ocel2{jargon,ownw}). This suggests that multimodal data, which includes various forms of input like images and visualizations, enhances the model’s ability to adapt to stakeholder preferences and predict the next token more effectively than using OCEL alone.

Overall, the models demonstrate a robust ability to generate domain-specific, personalized outputs, with an average accuracy of 88.49% across all categories.

5. Conclusion

In this study, we explored how transformer-based architectures can be leveraged to generate paraphrased process data that adapt to the diverse linguistic preferences of stakeholders across various domains.
We focused on two custom transformer models, proc2ownw and proc2jargon, to assess their ability to tailor communication effectively to different stakeholder profiles and domain-specific languages. The paper also proposes an approach that extends beyond linguistic personalization to integrate multimodal evidence, combining text and images, into process models.

The evaluation results demonstrate that proc2ownw is particularly effective in adapting to stakeholder communication preferences, especially when tested across distinct personality profiles. On the other hand, proc2jargon successfully generated domain-specific language, as evaluated in fields such as medical, police, software development, logistics, and accounting.

Overall, our findings suggest that transformer-based models can improve communication in process management by tailoring outputs to both the stakeholder’s linguistic preferences and the specific jargon of their domain. Additionally, integrating multimodal data into personalized process models adds another layer of complexity and opens up new opportunities for enhancing stakeholder understanding and engagement. Future work will involve further refining these models to handle more complex communication scenarios, expanding the scope of evaluation to include additional domains and stakeholder profiles, and exploring the full potential of multimodal evidence in process mining, while conducting studies and experiments with stakeholders.

References

[1] W. M. P. van der Aalst, J. Carmona (Eds.), Process Mining Handbook, volume 448 of Lecture Notes in Business Information Processing, Springer, 2022.
[2] Beerepoot, I. et al., The biggest business process management problems to solve before we die, Computers in Industry 146 (2023) 103837. doi:10.1016/j.compind.2022.103837.
[3] W. M. Van der Aalst, Process-aware information systems: Lessons to be learned from process mining, in: Transactions on Petri Nets and Other Models of Concurrency II, 2009, pp. 1–26.
[4] T. Fassier, E. Azoulay, Conflicts and communication gaps in the intensive care unit, Current Opinion in Critical Care 16 (2010) 654–665. doi:10.1097/MCC.0b013e32834044f0.
[5] J. D. Peters, The gaps of which communication is made, Critical Studies in Media Communication 11 (1994) 117–140.
[6] K. Busch, A. Rochlitzer, D. Sola, H. Leopold, Just tell me: Prompt engineering in business process management, in: Enterprise, Business-Process and Information Systems Modeling, Springer Nature Switzerland, Cham, 2023, pp. 3–11.
[7] OpenAI, ChatGPT, 2024. URL: https://chat.openai.com, accessed: 2024-08-01.
[8] A. Alman, A. Arleo, I. Beerepoot, A. Burattin, C. Di Ciccio, M. Resinas, Tiramisù: Making sense of multi-faceted process information through time and space, Journal of Intelligent Information Systems (2024). doi:10.1007/s10844-024-00875-8.
[9] I. Bider, E. Perjons, D. Bork, Towards on-the-fly creation of modeling language jargons, in: 17th International Conference on ICT in Education, Research and Industrial Applications, 2021, pp. 142–157.
[10] S. Yin, C. Fu, S. Zhao, K. Li, X. Sun, T. Xu, E. Chen, A survey on multimodal large language models, ArXiv abs/2306.13549 (2023).
[11] A. Gavric, D. Bork, H. Proper, Multimodal process mining, in: 26th International Conference on Business Informatics, IEEE, 2024. URL: https://model-engineering.info/publications/papers/CBI-MultiModalProcessMining.pdf, in press.
[12] H. Leopold, J. Mendling, A. Polyvyanyy, Generating natural language texts from business process models, in: Advanced Information Systems Engineering - 24th International Conference, 2012, pp. 64–79.
[13] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT: Pre-training of deep bidirectional transformers for language understanding, in: Conference of the North American Chapter of the Association for Computational Linguistics, 2019, pp. 4171–4186.
[14] A. Radford, K. Narasimhan, T. Salimans, I. Sutskever, Improving language understanding by generative pre-training, OpenAI (2018). https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf.
[15] H. Kourani, A. Berti, J. Henrich, W. Kratsch, R. Weidlich, C.-Y. Li, A. Arslan, D. Schuster, W. M. P. van der Aalst, Leveraging large language models for enhanced process model comprehension, 2024. URL: https://arxiv.org/abs/2408.08892. arXiv:2408.08892.
[16] J. Lee, W. Yoon, S. Kim, D. Kim, S. Kim, C. H. So, J. Kang, BioBERT: a pre-trained biomedical language representation model for biomedical text mining, Bioinformatics 36 (2020) 1234–1240.
[17] I. Chalkidis, M. Fergadiotis, P. Malakasiotis, N. Aletras, I. Androutsopoulos, LEGAL-BERT: The muppets straight out of law school, arXiv preprint arXiv:2010.02559 (2020).
[18] F. Friedrich, J. Mendling, F. Puhlmann, Process model generation from natural language text, in: Advanced Information Systems Engineering: 23rd International Conference, CAiSE 2011, Springer, 2011, pp. 482–496.
[19] F. Zerbato, L. Zimmermann, H. Völzer, B. Weber, Promise: Process mining support for end-users, in: CAiSE Research Projects Exhibition, 2023, pp. 60–67.
[20] R. Girdhar, A. El-Nouby, Z. Liu, M. Singh, K. V. Alwala, A. Joulin, I. Misra, ImageBind: One embedding space to bind them all, in: CVPR, 2023.
[21] T. Fehrer, A. Egger, D. Chvirova, J. Wittmann, N. Wördehoff, W. Kratsch, M. Röglinger, Business Processes in IT Asset Management Multimedia Event Log, 2024. doi:10.6084/m9.figshare.25246291.
[22] A. Berti, W. M. P. van der Aalst, OC-PM: analyzing object-centric event logs and process models, International Journal on Software Tools for Technology Transfer 25 (2023) 1–17. doi:10.1007/s10009-022-00668-w.
[23] A. Gokaslan, V. Cohen, OpenWebText corpus, http://Skylion007.github.io/OpenWebTextCorpus, 2019.
[24] S. V. Daele, G. Janssenswillen, Identifying the steps in an exploratory data analysis: A process-oriented approach, in: Process Mining Workshops, 2023, pp. 526–538.
[25] S. Esser, D. Fahland, Multi-dimensional event data in graph databases, 2021. URL: https://arxiv.org/abs/2005.14552. arXiv:2005.14552.