<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Process Management in the AI era, Oct</journal-title>
      </journal-title-group>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Variability⋆</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>IBM Research</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Israel</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Fabiana Fournier</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lior Limonad</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yuval David</string-name>
        </contrib>
        <contrib contrib-type="editor">
          <string-name>Business Process Management, AI, Agents, LLM</string-name>
        </contrib>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <volume>25</volume>
      <issue>2025</issue>
      <fpage>3</fpage>
      <lpage>13</lpage>
      <abstract>
        <p>AI agents that leverage Large Language Models (LLMs) are increasingly becoming core building blocks of modern software systems. A wide range of frameworks is now available to support the specification of such applications. These frameworks enable the definition of agent setups using natural language prompting, which specifies the roles, goals, and tools assigned to the various agents involved. Within such setups, agent behavior is nondeterministic for any given input, highlighting the critical need for robust debugging and observability tools. In this work, we explore the use of process and causal discovery applied to agent execution trajectories as a means of enhancing developer observability. This approach aids in monitoring and understanding the emergent variability in agent behavior. Additionally, we complement this with LLM-based static analysis techniques to distinguish between intended and unintended behavioral variability. We argue that such instrumentation is essential for giving developers greater control over evolving specifications and for identifying aspects of functionality that may require more precise and explicit definitions.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction and Motivation</title>
      <p>Artificial intelligence is advancing swiftly, transitioning from basic task automation to the development
of sophisticated, autonomous systems. A key development in this progression is the emergence of
Agentic AI. “This concept refers to AI systems that can perceive their environment, reason, plan, and
act to achieve specific goals, much like human agents.” 1. In most contemporary frameworks realizing</p>
      <p>developer to better crystallize the specification.</p>
      <p>
        In this work, we aim to provide software engineers who develop AI agents with the means to examine
points of variability arising in the specification of their agentic applications employing LLM-based static
analysis. We consider agent execution trajectories as process event logs that constitute timestamped
events (e.g., tool invocations) as the data source for analysis. This allows using Process Mining [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]
and Causal Process Discovery [
        <xref ref-type="bibr" rid="ref4 ref5">4, 5</xref>
        ] capabilities to reveal invocation dependencies and to recognize
variability that arises as split points in such views.
      </p>
      <p>
        Following the terminology presented in [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], processes with similar inputs and outcomes can be
considered variations of a single process and are referred to as variants. In the process model, each
branching point is either a variation point or a decision point. It is a variation point if its branches
correspond to different process variants; otherwise, it is a decision point. In the context of our work, we
adopt this terminology to distinguish between intended variability, which arises from explicit decision
statements in the agent specifications (i.e., decision points), and unintended variability, which results
from the non-deterministic nature of LLM agents leading to inconsistent execution trajectories (i.e.,
variation points).
      </p>
      <p>The work presented here represents early efforts in the emerging area of agent observability. Our
main contribution lies in treating agent execution trajectories as the target of process mining. This
perspective enables the use of causal and process discovery techniques to explore the behavior and
collaboration of AI agents. As part of our approach, LLM-based static analysis complements the
discovery process by providing additional insights into behavioral variability.</p>
      <p>The remainder of the paper is organized as follows. Section 2 reviews related work. Section 3
introduces an example application of a calculator in CrewAI. Section 4 details our overall method for
agentic process observability, which is then instantiated in the context of the example application,
with results presented in Section 5. We conclude with key insights and future research directions in
Section 6.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        Agentic Business Process Management (Agentic BPM) traces its roots to early work at the intersection
of Multi-Agent Systems (MAS) and BPM [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. In these early efforts, agent-centric data abstractions
helped reshape complex system behavior specifications by partitioning them into smaller, encapsulated
components that were easier to specify and verify. Since the introduction of the artifact-centric
approach [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], this line of work has progressively laid the groundwork for process mining across multiple
behavioral dimensions [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], and more recently, has initiated discussions around an Object-Centric Event
Data (OCED) [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] standard. Today, with the widespread adoption of AI and the rise of LLMs, Agentic
AI is experiencing a renaissance in BPM, reflected in the AI-Augmented Business Process Management
Systems (ABPMS) manifesto [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], and echoed in the development of AI agent-centric BPM systems,
namely Agentic BPM [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ].
      </p>
      <p>
        In this work, we employ process and causal mining in the scope of Agentic AI for the sake of process
observability. The former is a relatively mature discipline with instances of an agentic flavor already
explored with the goal of specification verification [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. The application of causal mining to Agentic AI
is relatively new.
      </p>
      <p>
        Causal discovery aims to uncover causal relationships from observational data, distinguishing
cause-effect directionality from mere correlation [
        <xref ref-type="bibr" rid="ref13 ref14">13, 14</xref>
        ]. Previous work on causal discovery from process
data generated graphs merely based on key performance indicators [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] or decision points [16, 17, 18].
Our method for causal discovery in business processes [
        <xref ref-type="bibr" rid="ref4 ref5">4, 5</xref>
        ] infers causal graphs from the activity
timestamps by adapting and extending the work in [19]. We employ causal discovery over agent
execution trajectories to uncover function-call and tool-invocation dependencies within and between
the agents. The novelty of our work lies in leveraging process mining and causal process discovery,
based on the execution times of activities, to identify variability in the non-deterministic behavior of AI
agents. More specifically, it aims to reveal invocation dependencies and recognize variability that arises
at split points in these views.
      </p>
      <p>The types and the configuration of LLMs employed by agents can significantly influence agent
behavior. Although parameters such as temperature, top-k, top-p, and repetition penalty are commonly
used to reduce non-deterministic responses to identical or similar inputs, recent work already concludes
that even with stricter settings, such as setting the temperature to zero, LLMs can still exhibit notable
instability [20, 21]. Consequently, observability of such behavioral variabilities is crucial, not only
for selecting among different LLM models to be associated with different agents, but also for guiding
developers in ‘tightening’ all loose ends in the agent specifications, ultimately supporting a more
consistent user experience.</p>
      <p>The concept of variability has been extensively studied in Software Engineering (SE), particularly in
the context of feature modeling within the paradigm of software product lines [22]. In this paradigm,
variability is seen as a means of introducing flexibility into the software architecture, enabling multiple
alternative instantiations of a single specification to suit different deployment needs. A similar concept
was adopted in BPM, as in [23], where variability denotes customizable elements in a process model
representing a family of business process variants. In our work, by contrast, we focus on the undesired
form of variability that arises accidentally due to insufficiently rigorous specifications. These ‘loose
ends’ in the design enable agents to perform unforeseen behaviors during execution.</p>
      <p>We leverage an LLM-based static analysis approach to highlight the sources of variability in the
specifications. Traditionally, static analysis is an integral component of SE and involves examining
source code without executing it, to identify potential errors, code quality issues, and security
vulnerabilities [24]. Given the natural language style in which AI agents are currently specified, captured by
the recently coined term vibe coding<sup>5</sup>, our approach aligns with recent work leveraging LLMs for static
analysis in SE [25, 26, 27].</p>
      <p>Figure 1: CrewAI specification of the calculator application.</p>
      <preformat>from crewai import Agent, Crew, Process, Task

# math_tools (the arithmetic tools) and decomposition_task are defined
# elsewhere in the application and are not part of this listing.
# The model identifier is quoted here so that the listing parses as Python.

# Define the agents
decomposer_agent = Agent(
    role="Expression Decomposer",
    goal="Decompose the given expression into a sequence of operations",
    backstory="""You are a mathematical expression decomposer. Your job is to take a mathematical
expression and break it down into a sequence of simple operations that can be calculated
step by step. You follow PEMDAS rules and assign variables to intermediate results. You never calculate
values - you only identify the operations needed.""",
    llm="llama-3-3-70b-instruct",
    verbose=True,
    allow_delegation=False
)

calculator_agent = Agent(
    role="Calculator",
    goal="Calculate expressions using only the provided tools",
    backstory="""You are a calculator that can only work by using tools. For every mathematical operation, you must use the
corresponding tool. You carefully track variables and substitute their values when needed.""",
    llm="llama-3-3-70b-instruct",
    verbose=True,
    allow_delegation=False,
    tools=math_tools,
    temperature=0.1
)

manager = Agent(
    role="Project Manager",
    goal="Efficiently manage the crew and ensure high-quality calculation completion, you are not allowed to call tools only to delegate work to other agents",
    backstory="You're an experienced calculation manager, skilled in overseeing complex calculations and guiding teams to correctly compute mathematical formulae. Your role is to coordinate the efforts of the crew members, ensuring that each task is completed on time and to the highest standard. but you do not call the tools yourself only to your agents",
    allow_delegation=True,
    llm="llama-3-3-70b-instruct"
)

# Define the calculation task assigned to the Calculator agent
calculation_task = Task(
    description=f"""
Use the provided operations to calculate the result of the expression.

For each operation in the sequence:
1. If an operand is a variable (like E0), substitute its current value
2. Use the appropriate tool to perform the calculation:
   - addition(a, b)
   - subtraction(a, b)
   - multiplication(a, b)
   - division(a, b)
   - evaluate_parentheses(expr)
3. Store the result in the variable specified by "name"

For every calculation step, show:
- The operation being performed: "[name] = [operation]([op1], [op2])"
- The tool being used with resolved values: "Using tool: [tool_name]([value1], [value2])"
- The result: "Result: [value]"

IMPORTANT:
- You MUST use the exact tool matching the operation
- You MUST show your work for each step
- You MUST substitute variable values correctly
- If you have multiple mathematical operations you should execute the calculation in the following order: First do Multiplication then Division then Addition and lastly Subtraction

Return only the final numerical result at the end.
""",
    expected_output="The calculated result as a number",
    agent=calculator_agent
)

# Assemble the crew with the manager overseeing a hierarchical process
crew = Crew(
    agents=[decomposer_agent, calculator_agent],
    tasks=[decomposition_task, calculation_task],
    verbose=True,
    process=Process.hierarchical,
    manager_agent=manager,
    tools=math_tools
)</preformat>
    </sec>
    <sec id="sec-3">
      <title>3. Example Application</title>
      <p>We use a simple toy example of a calculator application in CrewAI as shown in Figure 1 to evaluate
basic calculations when given mathematical formulae as input. In its setup, three agents were explicitly
defined, Decomposer, Calculator, and Manager. The Calculator agent was assigned a calculation task,
and the Decomposer agent was assigned a decomposition task relevant to expressions with parentheses.
The Manager agent oversees the overall execution process and alters the delegation of responsibilities
between the other agents. To fulfill the calculation task, a set of math tools is made available, including
multiplication, division, addition, subtraction, and parentheses evaluation.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Approach</title>
      <p>
        Overall, our approach is the process depicted in Figure 2. It facilitates an ongoing, high-level
create–insight–improve development cycle as the code is being shaped by the developer. The process consists of
the following steps:
1. Trajectory files generation – A set of #k runs is invoked with a given input as a basis for the
analysis. In each run, the full execution trajectory of the agents is captured in a corresponding
log file, recording every agent action, particularly tool invocations, along with its associated
timestamp. For this step, we conducted 290 runs of the Calculator application using the same
input. Specifically, we used the formula 1 + 2 − 3 ∗ 4/5, chosen to ensure that each of the basic
arithmetic operations appears exactly once.
2. Event-log processing – A single consolidated process event log is compiled from the trajectory
files, the 290 trajectory logs in our case. This processing step extracts the tool invocations
performed by each agent, along with their corresponding timestamps. The resulting data is
organized as a tabular event-log structure, where each row represents a single, timestamped
invocation of a tool by an agent. From a process mining perspective, the log is examined by
using the concatenation of the agent and tool columns as the activity type identifier and the run
number as the trace identifier.
3. Process and causal discovery – Subsequently, process and causal discovery are applied as two
complementary views to form the collective execution flows. More specifically, the causal view
depicts functional dependencies (among tool invocations) and variability via logical gateways [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]
(i.e., split points in this view), whereas the process view captures temporal dependencies and
frequencies. We employed Heuristics [28] and Causal [
        <xref ref-type="bibr" rid="ref4 ref5">4, 5</xref>
        ] Process Mining on the input event
log.
4. Rule derivation – For each split point identified as a gateway within the causal model, the
developer can examine its essence (i.e., whether it represents a variation or decision point). To
support this, rule derivation is applied, producing a rule statement for the selected gateway that
captures the control-flow structure it represents within the causal model. In our example, we
chose the XOR_0 gateway to demonstrate this. In a more realistic scenario, it is likely that the
developer will address all other split points.
5. Static analysis – For any selected gateway, we apply LLM-based static analysis to distinguish
decision points from variation points. Given the corresponding rule statement as input, the
LLM is prompted to match it against the source text of the agentic application to identify its
manifestation. If a matching instruction is found in the application specification, it is highlighted
for the developer’s attention. Details about the choice of LLM input and output prompts are
elaborated below in our example application results.
6. Reliability calculation – Complementing the gateway selection, our approach also includes a
statistical computation of reliability based on the frequencies in the process model to determine
the number of runs required and the degree of confidence, as elaborated below.
The analysis of gateways also raises the question of whether the data acquired is sufficient to infer
faithful conclusions about each variation point, considering the number of observation runs traversing
it and the proportions of observed outbound runs.
      </p>
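      <p>As a rough illustration of the event-log processing step and of how split points surface in a discovered flow, the following minimal Python sketch builds a tabular event log and counts directly-follows relations. It is not the Heuristics or Causal Process Mining implementation used in this work: the records in RAW_INVOCATIONS are hypothetical, the function names are ours, and the "." separator used to concatenate agent and tool is an assumption.</p>

```python
from collections import Counter

# Hypothetical trajectory records: (run number, agent, tool, ISO timestamp).
RAW_INVOCATIONS = [
    (1, "Calculator", "multiplication", "2025-01-01T10:00:01"),
    (1, "Calculator", "division", "2025-01-01T10:00:03"),
    (1, "Calculator", "addition", "2025-01-01T10:00:05"),
    (1, "Calculator", "subtraction", "2025-01-01T10:00:07"),
    (2, "Project Manager", "multiplication", "2025-01-01T11:00:01"),
    (2, "Calculator", "division", "2025-01-01T11:00:04"),
    (2, "Project Manager", "addition", "2025-01-01T11:00:06"),
    (2, "Calculator", "subtraction", "2025-01-01T11:00:09"),
]


def build_event_log(invocations):
    """Return rows using the run as trace id and agent+tool as activity."""
    rows = [
        {"case_id": run, "activity": f"{agent}.{tool}", "timestamp": ts}
        for run, agent, tool, ts in invocations
    ]
    # ISO-8601 strings in one format sort chronologically.
    return sorted(rows, key=lambda r: (r["case_id"], r["timestamp"]))


def directly_follows(event_log):
    """Count how often one activity is immediately followed by another."""
    traces = {}
    for row in event_log:
        traces.setdefault(row["case_id"], []).append(row["activity"])
    counts = Counter()
    for activities in traces.values():
        counts.update(zip(activities, activities[1:]))
    return counts


def split_points(dfg):
    """Activities with two or more distinct successors (candidate gateways)."""
    successors = {}
    for source, target in dfg:
        successors.setdefault(source, set()).add(target)
    return {a: sorted(s) for a, s in successors.items() if len(s) >= 2}


if __name__ == "__main__":
    log = build_event_log(RAW_INVOCATIONS)
    for activity, targets in split_points(directly_follows(log)).items():
        print(activity, "splits to", targets)
```

      <p>On this toy data, the only activity with more than one successor is the division call, which is the kind of branching a developer would then inspect as a potential variation or decision point.</p>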
      <p>Drawing on the normal approximation to the binomial distribution, the minimum required sample
size to estimate an observed proportion $p$ of process runs following a specific branch at a gateway is
given by the formula $n = \frac{Z^2 \cdot p \cdot (1 - p)}{E^2}$, where $n$ is the required sample size, $Z$ is the Z-score corresponding
to the desired confidence level (e.g., $Z = 1.96$ for 95% confidence), $p$ is the observed branch proportion,
and $E$ is the desired margin of error (e.g., $E = 0.05$ for ±5%). To estimate the observed proportion, a
pilot sample is required. In our case, we used the initial set of 290 runs for this purpose.</p>
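      <p>The sample-size computation described above can be sketched in a few lines of Python. This is a minimal illustration under the normal approximation, not the tooling used in this work; the function and parameter names are ours, and the proportion of 0.5 in the usage line is an arbitrary worst-case value rather than a figure from the experiments.</p>

```python
import math


def required_sample_size(p, z=1.96, margin=0.05):
    """Minimum number of runs to estimate proportion p within +/- margin."""
    return math.ceil(z ** 2 * p * (1.0 - p) / margin ** 2)


def margin_of_error(p, n, z=1.96):
    """Margin of error for proportion p observed over n runs."""
    return z * math.sqrt(p * (1.0 - p) / n)


if __name__ == "__main__":
    # Worst case p = 0.5 at 95% confidence and a 5% margin.
    print(required_sample_size(0.5))
```

      <p>Rounding up with math.ceil keeps the achieved margin at or below the target, since a fractional run count cannot be collected.</p>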
      <p>This should be complemented by ensuring sufficient sampling to detect rare branches that may
not yet have been observed. Specifically, to be 95% confident that a branch with true prevalence $p$ is
observed at least once, the number of required runs $n$ must also satisfy $(1 - p)^n &lt; 0.05$.</p>
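      <p>The rare-branch condition can be solved for the number of runs by taking logarithms. The sketch below is a minimal illustration; the function name and its parameters are ours, and the prevalence of 1% in the usage line is only an example input.</p>

```python
import math


def runs_to_observe_branch(prevalence, confidence=0.95):
    """Smallest n for which (1 - prevalence)**n falls below 1 - confidence."""
    bound = math.log(1.0 - confidence) / math.log(1.0 - prevalence)
    # floor + 1 enforces the strict inequality even if bound is an integer.
    return math.floor(bound) + 1


if __name__ == "__main__":
    # A branch taken in 1% of runs, to be seen at least once with 95% confidence.
    print(runs_to_observe_branch(0.01))
```

      <p>For a prevalence of 1% the bound evaluates to roughly 298.07, so the number of runs has to exceed 298.</p>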
      <p>Overall, for any given gateway, the number of observed runs must exceed both the minimum required
to estimate observed branch proportions accurately and the threshold necessary to detect unobserved
rare branches with high confidence. Complementing our last step, we also pursue an analysis of the
minimal number of runs.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Example Application Results</title>
      <p>Applying our approach to the calculator example yielded the following results. For the given input, the
two execution graphs are illustrated in Figure 3.</p>
      <p>The heuristics view (Figure 3A) helps trace outlier (e.g., less frequent) trajectories. In our example
application, it shows an unusual loophole invocation of the “Evaluate_parentheses” tool by the Calculator
agent despite the fact that there were no parentheses in the input. From analyzing the logs, we
discovered that this behavior was triggered by the LLM arbitrarily surrounding some sub-expressions
with parentheses.</p>
      <p>The causal view (Figure 3B) depicts the invocation tool calls associated with each agent, helping to
identify possible ‘breaches of responsibility’, having an agent invoke a tool that does not correspond
to its role according to the application specification. In our case, the Project Manager agent was not
explicitly granted access to any of the math_tools. However, in the majority of the execution trajectories,
it invoked these operations directly without delegation.</p>
      <p>Figure 3: (A) Heuristics view; (B) Agent-centric causal view.</p>
      <p>In addition, the causal view also captures the variability in the execution of the trajectories as
illustrated by the diamond-shaped gateways. This serves our further exploration of the concrete type
of each of these gateways. For static analysis, we populated the prompt, which also included the rule
statement, to describe the junction structure of the XOR_0 gateway as highlighted in Figure 4. This
prompt was presented to an LLM (LLaMA 3-3 70B Instruct), along with the application specification
shown in Figure 1, for matching purposes. We used LLaMA 3-3 70B Instruct for static Python code
analysis due to its strong instruction-following capabilities and demonstrated effectiveness in code
understanding and generation tasks, as evidenced in recent evaluations [29]. As illustrated, the LLM’s
response (Figure 5) identified a part of the task specification (also highlighted in Figure 1) as implying
the rule statement for the XOR_0 gateway, recognizing it as a decision point. However, as also noted,
this implication fails to account for the alternation of control between the two agents, an aspect not
explicitly specified in the application and one that requires further attention from the application
developer. With the help of the aforementioned views and the static analysis, the developer of the
application is able to determine that the Project Manager agent’s definition should be augmented with
a tools=[] entry to eliminate the discovered breach of responsibility.</p>
      <p>Figure 5: LLM response to the static analysis prompt.</p>
      <preformat>After carefully examining the prompt, I found a statement that **implies** the rule:
"...If you have multiple mathematical operations you should execute the calculation in the following order: First do Multiplication then Division then Addition and lastly Subtraction..."
This statement implies that after performing a division operation (using the division tool), the next operation will be either an addition operation (which could be invoked by the Calculator agent or the Project Manager agent). Although it does not explicitly mention the agents, the order of operations suggests that after division, addition is the next step, which aligns with the rule.</preformat>
      <sec id="sec-5-1">
        <title>Reliability Assessment of the XOR_0 Gateway</title>
        <p>Lastly, we examined the reliability of the XOR_0 gateway with respect to the minimum number of runs
required. Given the current number of observed runs through this gateway, and targeting a margin of
error of 5% at a 95% confidence level (i.e., $E = 0.05$ and $Z = 1.96$), each of the two branches requires
approximately 157 runs. This implies that 35 additional observations are needed beyond the current
122. Otherwise, with only 122 runs, the current margin of error remains at ±5.66%. Adding 35 more
runs (to reach 157) will reduce this to ±5%.</p>
        <p>However, to also ensure—with 95% confidence—that a branch with a minimum true prevalence of 1%
is observed at least once, the total number of runs must exceed 298. Therefore, to fully validate the
gateway both in terms of proportion estimation and rare-branch detection, an additional 176 runs are
required for this split, entailing 418 additional runs in total. Given the nature of our simple application
example, we considered the observed margin of error of ±5.66% satisfactory.</p>
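        <p>As a plausibility check on the figures above, the margin of error under the normal approximation scales with the inverse square root of the number of runs. The short derivation below is a sketch that treats 157 runs as yielding exactly a 5% margin, which is itself a rounded-up value; the subscripted symbols are introduced here for illustration only.</p>

```latex
E = Z \sqrt{\frac{p\,(1 - p)}{n}}
\quad\Longrightarrow\quad
\frac{E_{122}}{E_{157}} = \sqrt{\frac{157}{122}} \approx 1.134,
\qquad
E_{122} \approx 1.134 \times 0.05 \approx 0.057 .
```

        <p>The result of about 5.7% agrees with the reported ±5.66% up to the rounding of the required sample size to a whole number of runs.</p>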
      </sec>
      <sec id="sec-5-2">
        <title>Modifying the specification</title>
        <p>We adapted the specification shown in Figure 1, adding tools=[] to explicitly prohibit the manager agent
from executing any tools. Figure 6(A) illustrates the result of this modification. As shown, while this
change eliminated the manager agent’s breach of responsibility, it still continued to invoke some tools.</p>
        <p>To address this, we further revised the description of the calculation task and the manager’s backstory
to more strictly prohibit tool usage by the manager agent, as depicted in Figure 7. This additional
refinement successfully enforced the intended behavior, restricting tool invocation solely to the calculator
agent as shown in Figure 6(B).</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusions and Outlook</title>
      <p>In this work, we propose an approach for agent observability that leverages two complementary
techniques of process and causal discovery to identify points of variability in agent trajectories. We then
apply LLM-based static analysis to determine the nature of these variation points. Our contribution is
further complemented by a reliability measurement for split points.</p>
      <p>We illustrated the approach using an example of a calculator application, demonstrating the possible
valuable insights that such instrumentation can provide to support developers engaged in Agents
DevOps. Our preliminary results show the potential of applying our framework in the context of
observability. We acknowledge that further empirical validation on real-world applications and with
other agentic frameworks is needed to establish the robustness of our approach. Furthermore, future
work should investigate how multiple input utterances can be populated to enable joint observation
and robust testing coverage.</p>
      <p>Figure 6(A): Restricting tool usages with “tools=[]”.</p>
      <p>Our control-flow-based realization for rule derivation is currently agnostic to the potential data
richness underlying such decision points, and future research could extend this with data-aware analysis
of these decisions.</p>
      <p>Whether through single-agent input analysis or the cumulative investigation of multi-agent
trajectories recorded over time in a running Agentic AI system, the domain of agent process observability
presents a fresh playground for exploration using process and causal discovery tools developed over
the past decades. As seen in the evolution of other application domains, this area may first emerge with
observability tools and gradually progress toward realizing the vision of self-debugging and adaptive
agents—agents that monitor their own execution, explain their actions, debug and enhance one another’s
behavior, and learn to evolve over time to become more reliable and autonomous.</p>
    </sec>
    <sec id="sec-7">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used GPT-4o for grammar and spelling checks. After
using this tool/service, the authors reviewed and edited the content as needed and take full responsibility
for the publication’s content.</p>
      <p>[16] A. J. Alaee, M. Weidlich, A. Senderovich, Data-Driven Decision Support for Business Processes: Causal Reasoning and Discovery, 2024. URL: https://link.springer.com/10.1007/978-3-031-70418-5_6.</p>
      <p>[17] T. Narendra, P. Agarwal, M. Gupta, S. Dechu, Counterfactual reasoning for process optimization using structural causal models, in: Lecture Notes in Business Information Processing, volume 360, 2019. URL: https://doi.org/10.1007/978-3-030-26643-1_6.</p>
      <p>[18] S. J. J. Leemans, N. Tax, Causal Reasoning over Control-Flow Decisions in Process Models, in: CAiSE 2022, Leuven, Belgium, June 6-10, 2022, Proceedings, volume 13295 of LNCS, Springer, 2022, pp. 183–200.</p>
      <p>[19] S. Shimizu, Statistical Causal Discovery: LiNGAM Approach, SpringerBriefs in Statistics, Springer Japan, Tokyo, 2022. URL: https://link.springer.com/10.1007/978-4-431-55784-5.</p>
      <p>[20] S. Ouyang, J. M. Zhang, M. Harman, M. Wang, An empirical study of the non-determinism of ChatGPT in code generation, ACM Trans. Softw. Eng. Methodol. 34 (2025). URL: https://doi.org/10.1145/3697010. doi:10.1145/3697010.</p>
      <p>[21] B. Atil, S. Aykent, A. Chittams, L. Fu, R. J. Passonneau, E. Radcliffe, G. R. Rajagopal, A. Sloan, T. Tudrej, F. Ture, Z. Wu, L. Xu, B. Baldwin, Non-determinism of “deterministic” LLM settings, 2025. URL: https://arxiv.org/abs/2408.04667. arXiv:2408.04667.</p>
      <p>[22] K. Czarnecki, S. Helsen, U. Eisenecker, Staged configuration using feature models, in: R. L. Nord (Ed.), Software Product Lines, Springer, Berlin, Heidelberg, 2004, pp. 266–283.</p>
      <p>[23] M. L. Rosa, W. van der Aalst, M. Dumas, F. P. Milani, Business process variability modeling: A survey, ACM Comput. Surv. 50 (2017). URL: https://doi.org/10.1145/3041957.</p>
      <p>[24] A. P. S. Venkatesh, S. Sabu, A. M. Mir, S. Reis, E. Bodden, The Emergence of Large Language Models in Static Analysis: A First Look through Micro-Benchmarks, in: Proceedings of the 2024 IEEE/ACM First International Conference on AI Foundation Models and Software Engineering, ACM, New York, NY, USA, 2024, pp. 35–39. doi:10.1145/3650105.3652288.</p>
      <p>[25] A. Fan, B. Gokkaya, M. Harman, M. Lyubarskiy, S. Sengupta, S. Yoo, J. M. Zhang, Large Language Models for Software Engineering: Survey and Open Problems, in: Proceedings - 2023 IEEE/ACM International Conference on Software Engineering: Future of Software Engineering, ICSE-FoSE 2023, 2023. doi:10.1109/ICSE-FoSE59343.2023.00008.</p>
      <p>[26] A. Carleton, M. Klein, J. Robert, E. Harper, R. Cunningham, D. de Niz, J. Foreman, J. Goodenough, J. Herbsleb, I. Ozkaya, D. Schmidt, F. Shull, Architecting the Future of Software Engineering: A National Agenda for Software Engineering Research &amp; Development, 2021.</p>
      <p>[27] I. Ozkaya, Application of Large Language Models to Software Engineering Tasks: Opportunities, Risks, and Implications, 2023. doi:10.1109/MS.2023.3248401.</p>
      <p>[28] A. Weijters, W. van der Aalst, A. K. A. De Medeiros, Process mining with the heuristics miner algorithm, Technische Universiteit Eindhoven, Tech. Rep. WP 166 (2006) 1–34.</p>
      <p>[29] P. Ersoy, M. Erşahin, Benchmarking Llama 3 70B for code generation: A comprehensive evaluation, Orclever Proceedings of Research and Development 4 (2024) 52–58. doi:10.56038/oprd.v4i1.444.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name><given-names>A.</given-names> <surname>Plaat</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>van Duijn</surname></string-name>,
          <string-name><given-names>N.</given-names> <surname>van Stein</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Preuss</surname></string-name>,
          <string-name><given-names>P.</given-names> <surname>van der Putten</surname></string-name>,
          <string-name><given-names>K. J.</given-names> <surname>Batenburg</surname></string-name>,
          <article-title>Agentic large language models, a survey</article-title>,
          <year>2025</year>. URL: https://arxiv.org/abs/2503.23037. arXiv:2503.23037.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>L.</given-names>
            <surname>Dong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Lu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <article-title>AgentOps: Enabling observability of LLM agents</article-title>
          ,
          <year>2024</year>
          . URL: https://arxiv.org/abs/2411.05285. arXiv:2411.05285.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>W.</given-names>
            <surname>van der Aalst</surname>
          </string-name>
          , Process Mining, Springer, Berlin, Heidelberg,
          <year>2016</year>
          . URL: http://link.springer.com/10.1007/978-3-662-49851-4. doi:10.1007/978-3-662-49851-4.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>F.</given-names>
            <surname>Fournier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Limonad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Skarbovsky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>David</surname>
          </string-name>
          ,
          <article-title>The WHY in Business Processes: Discovery of Causal Execution Dependencies</article-title>
          ,
          <source>Künstliche Intelligenz</source>
          (
          <year>2025</year>
          ). URL: https://rdcu.be/d52Qz. doi:10.1007/s13218-024-00883-4.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Y.</given-names>
            <surname>David</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Fournier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Limonad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Skarbovsky</surname>
          </string-name>
          ,
          <article-title>The WHY in Business Processes: Unification of Causal Process Models</article-title>
          , in: BPM Forum in BPM Conference (to appear),
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>F.</given-names>
            <surname>Milani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Dumas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Matulevičius</surname>
          </string-name>
          ,
          <article-title>Identifying and classifying variations in business processes</article-title>
          ,
          <source>in: Lecture Notes in Business Information Processing</source>
          , volume
          <volume>113</volume>
          LNBIP,
          <year>2012</year>
          , pp.
          <fpage>136</fpage>
          -
          <lpage>150</lpage>
          . doi:10.1007/978-3-642-31072-0_10.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>F.</given-names>
            <surname>Belardinelli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Lomuscio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Patrizi</surname>
          </string-name>
          ,
          <article-title>An abstraction technique for the verification of artifact-centric systems</article-title>
          ,
          <source>in: Proceedings of the 13th International Conference on Principles of Knowledge Representation and Reasoning</source>
          , KR'12, AAAI Press,
          <year>2012</year>
          , pp.
          <fpage>319</fpage>
          -
          <lpage>328</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>R.</given-names>
            <surname>Hull</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Damaggio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Fournier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Gupta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Heath</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Hobson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Linehan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Maradugu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Nigam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Sukaviriya</surname>
          </string-name>
          , et al.,
          <article-title>Introducing the guard-stage-milestone approach for specifying business entity lifecycles</article-title>
          ,
          <source>in: Web Services and Formal Methods: 7th International Workshop, WS-FM 2010, Hoboken, NJ, USA, September 16-17, 2010. Revised Selected Papers 7</source>
          , Springer,
          <year>2011</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>24</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>D.</given-names>
            <surname>Fahland</surname>
          </string-name>
          ,
          <article-title>Process Mining over Multiple Behavioral Dimensions with Event Knowledge Graphs</article-title>
          , Springer International Publishing, Cham,
          <year>2022</year>
          , pp.
          <fpage>274</fpage>
          -
          <lpage>319</lpage>
          . URL: https://doi.org/10.1007/978-3-031-08848-3_9. doi:10.1007/978-3-031-08848-3_9.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>D.</given-names>
            <surname>Fahland</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Montali</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lebherz</surname>
          </string-name>
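          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>van der Aalst</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>van Asseldonk</surname>
          </string-name>
          ,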
          <string-name>
            <given-names>P.</given-names>
            <surname>Blank</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Bosmans</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Brenscheidt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>di Ciccio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Delgado</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Calegari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Peeperkorn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Verbeek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Vugs</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. T.</given-names>
            <surname>Wynn</surname>
          </string-name>
          ,
          <article-title>Towards a simple and extensible standard for object-centric event data (OCED) - core model, design space, and lessons learned</article-title>
          ,
          <year>2024</year>
          . URL: https://arxiv.org/abs/2410.14495. arXiv:2410.14495.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>M.</given-names>
            <surname>Dumas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Fournier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Limonad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Marrella</surname>
          </string-name>
          , et al.,
          <article-title>AI-augmented Business Process Management Systems: A Research Manifesto</article-title>
          ,
          <source>ACM Transactions on Management Information Systems</source>
          <volume>14</volume>
          (
          <year>2023</year>
          ). doi:10.1145/3576047.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>H.</given-names>
            <surname>Vu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Klievtsova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Leopold</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Rinderle-Ma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Kampik</surname>
          </string-name>
          ,
          <article-title>Agentic business process management: The past 30 years and practitioners' future perspectives</article-title>
          ,
          <year>2025</year>
          . URL: https://arxiv.org/abs/2504.03693. arXiv:2504.03693.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>P.</given-names>
            <surname>Spirtes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Glymour</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Scheines</surname>
          </string-name>
          ,
          <source>Causation, Prediction, and Search</source>
          , The MIT Press,
          <year>2001</year>
          . URL: https://direct.mit.edu/books/book/2057/causation-prediction-and-search. doi:10.7551/mitpress/1754.001.0001.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>J.</given-names>
            <surname>Pearl</surname>
          </string-name>
          ,
          <article-title>Causality: Models, reasoning, and inference, second edition</article-title>
          ,
          <source>Causality: Models, Reasoning, and Inference, Second Edition</source>
          (
          <year>2011</year>
          )
          <fpage>1</fpage>
          -
          <lpage>464</lpage>
          . URL: https://www.cambridge.org/core/books/causality/B0046844FAE10CBF274D4ACBDAEB5F5B. doi:10.1017/CBO9780511803161.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>B. F. A.</given-names>
            <surname>Hompes</surname>
          </string-name>
          , et al.,
          <article-title>Discovering Causal Factors Explaining Business Process Performance Variation</article-title>
          ,
          <source>in: Advanced Information Systems Engineering</source>
          , Springer,
          <year>2017</year>
          , pp.
          <fpage>177</fpage>
          -
          <lpage>192</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>