<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Towards Explainable Multi-Agent Systems with GraphRAG: A Guide to Explain Explanations in the Audit Domain</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Emanuel Slany</string-name>
          <email>emanuel.slany@dab-gmbh.de</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jonas Amling</string-name>
          <email>jonas.amling@dab-gmbh.de</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alexander Frummet</string-name>
          <email>alexander.frummet@dab-gmbh.de</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Moritz Lang</string-name>
          <email>moritz.lang@dab-gmbh.de</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Stephan Scheele</string-name>
          <email>stephan.scheele@oth-regensburg.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>OTH Regensburg</institution>
          ,
          <addr-line>Prüfeninger Straße 58, Regensburg</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>dab:GmbH</institution>
          ,
          <addr-line>Hans-Obser-Straße 12, Deggendorf</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2026</year>
      </pub-date>
      <abstract>
<p>Combining Large Language Models (LLMs) with graph-based retrieval augmented generation (GraphRAG) in multi-agent Artificial Intelligence (AI) systems promises new levels of process automation, even in highly restricted and regulated domains such as financial audits. Since trustworthiness is essential, we introduce an architecture designed with it in mind: Every agent is either inherently explainable or its decision-making mechanism is rendered transparent through post-hoc Explainable AI techniques. Procedural knowledge, agent outcomes, and their explanations are represented as nodes in a knowledge graph accessible via GraphRAG, while LLMs are confined to the role of semantic translators, bridging graph and natural-language representations. However, when user prompts involve multiple agent subgraphs, the attribution of individual agents remains opaque. We introduce an occlusion-based agent importance metric that quantifies the relative attribution of each serialized agent subgraph. Our evaluation demonstrates that the quantification of agent importance is feasible, while the presence of systematic agent interactions or narrative context effects requires further investigation.</p>
      </abstract>
      <kwd-group>
        <kwd>Explainable AI</kwd>
        <kwd>Knowledge Graphs</kwd>
        <kwd>GraphRAG</kwd>
        <kwd>Multi-Agent Systems</kwd>
        <kwd>Large Language Models</kwd>
        <kwd>Audit</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Related Work. While LLMs have been used for several downstream tasks such as text generation [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ],
prediction [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], explanatory model finetuning [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], or explainable exploration of event data [
        <xref ref-type="bibr" rid="ref10 ref11">10, 11</xref>
        ],
the complementary combination of LLMs and knowledge graphs from a transparency perspective has
recently received greater attention [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Currently, LLMs are frequently discussed in relation to agentic
AI systems [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. Knowledge graphs have traditionally been leveraged for explainability [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. The short
history of graph representations for LLMs [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] has now transitioned into XAI applications [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ].
      </p>
      <p>[Figure 1: example chat. User: 'Can you explain your inference?' – System: 'The company explanation highlights a strong influence of […]. In contrast, the sector explanation places greater emphasis on […].' User: 'Where does this information originate?' – System: 'Agent 1 is primarily responsible for the explanation, whereas […].']</p>
      <p>Problem. Although the decision-making mechanism behind each agent has become tractable through XAI
techniques, the LLM result does not disclose the attribution of agents to the verbalization (Figure 1, left).
Consequently, an auditor in our running example might still be confronted with over- or under-amplified
risks, undermining the organization of the entire examination strategy.</p>
      <p>Solution. We quantify agent importance in three steps (Figure 1, right): (i) We extract the relevant
subgraph sequence from the knowledge graph. (ii) Then, we generate verbalizations: a baseline
representing the entire subgraph sequence and separate variants for each subgraph occlusion. (iii) Finally,
we quantify the agent importance by semantic similarity metrics. The application of our approach
enables auditors to understand the origin of information and evidence in multi-agent system outcomes.
Contributions. Our core contributions are twofold: First, we present a multi-agent GraphRAG
architecture in a case study using open-source finance data sets that mimic the dynamics of annual
audits (Section 2). Second, we formalize the proposed agent importance approach (Section 3).
Research Questions. We pose two research questions: (R1) Does agent importance reliably quantify
the attribution of agent contexts retrieved from knowledge graphs to LLM verbalizations? And, (R2)
does agent importance account for biases in the narrative task specification? The research questions
are evaluated in a preliminary ablation study (Section 4).</p>
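<p>The three occlusion steps above can be sketched as follows. This is a minimal stand-in, not the paper's implementation: `verbalize` (the LLM call) and `similarity` (the semantic metric) are supplied by the caller, and all names are illustrative.</p>

```python
from typing import Callable, Sequence

def agent_importance(
    verbalize: Callable[[str, Sequence[str]], str],  # LLM: (prompt, subgraphs) -> text
    similarity: Callable[[str, str], float],         # semantic similarity in [0, 1]
    prompt: str,
    subgraphs: Sequence[str],                        # one serialized subgraph per agent
) -> list[float]:
    """Occlude one agent subgraph at a time; importance = 1 - similarity."""
    # (i) the subgraph sequence is assumed to be already retrieved and serialized
    baseline = verbalize(prompt, subgraphs)          # (ii) baseline verbalization
    scores = []
    for i in range(len(subgraphs)):
        occluded = [s for j, s in enumerate(subgraphs) if j != i]  # drop subgraph i
        variant = verbalize(prompt, occluded)        # (ii) occluded variant
        scores.append(1.0 - similarity(baseline, variant))         # (iii) score
    return scores
```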
    </sec>
    <sec id="sec-2">
      <title>2. Case Study</title>
      <p>To address the domain-specific requirements of financial auditing without disclosing proprietary
information, we validate our architecture in a controlled scenario using open-source datasets1: Specifically,
we model the influence of macroeconomic indicators on the relative annual return of individual company
stocks and the average annual return of stocks within the same industry sector. The primary goal in
this setting is to compare the explanatory factors for the company prediction with the reasons driving
the sector prediction. We employ labeled property graphs for knowledge representations2.</p>
      <p>The agents graph (Figure 2, left) encodes the procedural task knowledge and is responsible for the
task execution. Among multiple properties, agents have an API address and Pydantic models for input
and output validation3. Natural language queries are converted to an input model and trigger API
execution, the outcome of which is verified using the output model. We distinguish between two
types of agents (Figure 3): static agents, which rely on deterministic computations (write) or classical
probabilistic methods (predict_company, predict_sector), and LLM agents, which generate natural
language responses (compare). Figure 3 depicts their functionality in detail.
1The following data sets are merged wrt. the date and the company symbol: economic indicators: https://www.kaggle.com/datasets/alfredkondoro/u-s-economic-indicators-1974-2024?select=cpi_data.csv, stock prices: https://www.kaggle.com/datasets/camnugent/sandp500?select=all_stocks_5yr.csv, stock information: https://www.kaggle.com/datasets/paytonfisher/sp-500-companies-with-financial-information?select=financials.csv, all accessed 6th June 2025.
2https://memgraph.com/, 6th June 2025.
3https://docs.pydantic.dev/latest/, 6th June 2025.</p>
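<p>The agent contract described above – an API address plus models for input and output validation – can be sketched as follows. As a hedged illustration, stdlib dataclasses stand in for the Pydantic models, the field names and the stubbed API call are hypothetical, and only the validate–execute–verify cycle mirrors the text.</p>

```python
from dataclasses import dataclass

@dataclass
class PredictInput:               # stand-in for an agent's input model
    symbol: str
    year: int

    def __post_init__(self):
        # illustrative range check; real validation would live in Pydantic
        if not (1974 <= self.year <= 2024):
            raise ValueError("year outside data range")

@dataclass
class PredictOutput:              # stand-in for an agent's output model
    symbol: str
    relative_return: float

@dataclass
class Agent:
    name: str
    description: str
    address: str                  # API address property from the agents graph

    def run(self, payload: dict) -> PredictOutput:
        """Validate input, call the API (stubbed here), verify output."""
        request = PredictInput(**payload)             # input validation
        # ...the real system would call the API at self.address...
        result = {"symbol": request.symbol, "relative_return": 0.12}  # stub
        return PredictOutput(**result)                # output verification
```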
      <p>[Figure 2 detail: agent node properties – name, description, address, input model, output model]</p>
      <p>
        The results graph (Figure 2, right) contains nodes generated with Cypher queries [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] yielded by the
write method. It encodes domain knowledge in the sense that it models dependencies between the
feature and the target spaces, and it facilitates agent explainability.
      </p>
      <p>We combine LLMs4 and knowledge graphs as follows: Each natural language prompt is mapped
to a predefined Cypher query, which either triggers agents or retrieves subgraphs. Novel queries are
generated for out-of-distribution requests. The obtained subgraphs serve as additional context for the
initial prompt such that the generated verbalization contains only graph-encoded information.</p>
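<p>The prompt-to-query routing described above can be sketched as follows. The intent keys and Cypher query texts are illustrative assumptions, not the system's actual queries, and the out-of-distribution fallback (LLM-generated Cypher) is stubbed.</p>

```python
# Map known intents to predefined Cypher queries (illustrative texts only).
PREDEFINED_QUERIES = {
    "company explanation": (
        "MATCH (c:Company {symbol: $symbol})-[:EXPLAINED_BY]->(e:Explanation) "
        "RETURN e"
    ),
    "sector explanation": (
        "MATCH (s:Sector {name: $sector})-[:EXPLAINED_BY]->(e:Explanation) "
        "RETURN e"
    ),
}

def route_prompt(prompt: str) -> str:
    """Return the predefined Cypher query whose intent the prompt mentions."""
    text = prompt.lower()
    for intent, query in PREDEFINED_QUERIES.items():
        if intent in text:
            return query
    # out-of-distribution request: a novel query would be generated by the LLM
    raise LookupError("no predefined query; generate one")
```

<p>The retrieved subgraph is then appended as context to the initial prompt, so the verbalization draws only on graph-encoded information.</p>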
      <p>In proposing an XAI method for multi-agent systems integrated with GraphRAG, we aim to address
a core contradiction, which we term the explanation paradox: All agents in our architecture are either
intrinsically explainable or accompanied by XAI techniques. Via GraphRAG, an LLM exclusively accesses
this precomputed information. Still, recipients of an LLM verbalization are not yet aware of which agents
contributed to the answer. In particular, when prompting an LLM to combine explanations, the user
cannot comprehend the relevance of each incorporated agent. The agent importance method proposed
in the next section is designed to resolve this paradox.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Methods</title>
      <p>Agent importance quantifies the influence of a serialized agent subgraph on the generated LLM
verbalization for a given task in a GraphRAG scenario. The approach approximates
feature attributions inspired by Shapley Additive Explanations (SHAP) [16]. It systematically occludes
serialized subgraphs within a GraphRAG multi-agent system, assuming a fixed verbalization task. Given
a task, we retrieve and serialize the subgraphs of all addressed agents to generate a baseline verbalization.
Next, we occlude one subgraph at a time and regenerate verbalizations. The intuitive idea is: the
more similar the occluded output is to the baseline, the less the agent contributes to the verbalization.
Attribution values are estimated by semantic similarity [17]. We compute the cosine similarity between
the embeddings of the occluded and baseline verbalizations; agent importance is defined as the inverse
of the calculated similarity.
4https://platform.openai.com/docs/models/gpt-4o-mini, 28th May 2025.</p>
      <p>Definition. Let s be a subgraph serialization of a knowledge graph, S = (s_1, ..., s_i, ..., s_n) ∈ 𝒮 be
an ordered sequence of serialized subgraphs, p ∈ P and v ∈ V denote prompts and verbalizations, and
f : P × 𝒮 → V be an LLM5, generating a verbalization given a prompt and a subgraph sequence. Let
h : V × V → ℝ represent the cosine similarity between verbalization embeddings6. Let
S_∖i := (s_1, ..., s_{i−1}, ε, s_{i+1}, ..., s_n) with |ε| = 0
denote the occlusion of s_i in S – practically, a substitution of s_i with an empty sequence ε. Finally, let
a_{f,p}(s_i) := a_{f,p}(S, S_∖i) for all i ∈ (1, ..., n) with a_{f,p}(S, S_∖i) = 1 − h(f(p, S), f(p, S_∖i))
measure the attribution of each subgraph in the subgraph sequence given a model f and prompt p. We
assume that subgraph serializations are agent outcomes and thus can be mapped to their origin.</p>
      <p>Example. Suppose an AI literature search uses three agents: one for researchers, one for publications,
and one for scientific areas, with results encoded as a knowledge graph, serialized as S = (s_1, s_2, s_3).</p>
      <p>subgraph | node label | node
s_1 | :Researcher | 'Ina Marie'
s_2 | :Publication | 'XAI in Multi-Agent Systems for Audit: Why Our Method is Important'
s_3 | :Area | 'Financial Audits'</p>
      <p>Suppose the following user prompt for a GraphRAG system, which accesses the (sequence of) subgraph
serializations: p = 'Summarize the publications of Ina Marie in the field of financial audits'.</p>
      <p>context | verbalization
S | 'Ina Marie has published several influential works, including XAI in Multi-Agent Systems for Audit: Why Our Method is Important, which highlights the importance of explainable AI in enhancing the efficiency and effectiveness of financial audits.'
S_∖1 | 'The publication XAI in Multi-Agent Systems for Audit: Why Our Method is Important explores the role of explainable AI in improving the accuracy and efficiency of financial audits.'
S_∖2 | 'Ina Marie has made significant contributions to the field of financial audits through her research on innovative approaches to auditing practices.'
S_∖3 | 'Ina Marie's work focuses on the applications of explainable AI in the auditing process.'</p>
      <p>a_{f,p}(S, S_∖i): 0.0, 0.0482
The publication node subgraph frames the most important context for the verbalization.</p>
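<p>The attribution score can be instantiated with a toy bag-of-words embedding, as a hedged stand-in for the sentence-transformer embeddings used in the paper; the helper names are illustrative.</p>

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy embedding: word-count vector (stand-in for a sentence transformer)."""
    return Counter(text.lower().split())

def cosine(u: Counter, v: Counter) -> float:
    dot = sum(u[t] * v[t] for t in u)
    nu = math.sqrt(sum(c * c for c in u.values()))
    nv = math.sqrt(sum(c * c for c in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def importance(baseline: str, occluded: str) -> float:
    """a_{f,p}(S, S_\\i) = 1 - cosine similarity of the two verbalizations."""
    return 1.0 - cosine(embed(baseline), embed(occluded))
```

<p>Identical verbalizations yield an importance of 0; disjoint verbalizations yield 1, matching the intuition that the occluded agent contributed nothing or everything, respectively.</p>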
      <p>The original SHAP framework is characterized by theoretical properties – local accuracy, faithfulness,
missingness, and consistency – which have been formally established over the course of its
development [18, 16, 19]. Furthermore, its computational feasibility has been systematically analyzed [20].
Although we propose our method in a mathematically sound style, we leave a formal assessment for
future work, yet highlight its importance. In contrast, the experiments in the subsequent
section provide quantitative evidence through an ablation study that evaluates the importance of agent
subgraph serializations in diverse tasks.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Experiments</title>
      <p>We consider two single-agent tasks – explanation interpretations – and three multi-agent tasks – explanation
comparisons, either with or without narrative biases – in the domain of Section 2. We select the 100
companies with the highest relative return in 2016 and retrieve the subgraphs for the company and the
corresponding sector explanation. We apply the following evaluation strategy (Figure 4): (i) obtain
occluded and baseline verbalizations, and (ii) estimate the agent importance (Definition 3).
5https://platform.openai.com/docs/models/gpt-4o-mini, 28th May 2025.
6https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2, 4th June 2025.</p>
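<p>The ablation in step (ii) can be sketched as follows: besides the single-subgraph occlusions, we also occlude no subgraph and all subgraphs, so additivity of the scores can be inspected. `verbalize` and `importance` are placeholders for the LLM call and the similarity-based score; nothing here is the system's actual implementation.</p>

```python
from typing import Callable, Sequence

def ablation_scores(
    verbalize: Callable[[str, Sequence[str]], str],
    importance: Callable[[str, str], float],  # (baseline, variant) -> score
    prompt: str,
    subgraphs: Sequence[str],
) -> dict[tuple[int, ...], float]:
    """Importance score per occluded index set: none, singles, and all."""
    baseline = verbalize(prompt, subgraphs)
    occlusion_sets = [()] + [(i,) for i in range(len(subgraphs))]
    occlusion_sets.append(tuple(range(len(subgraphs))))   # occlude everything
    scores = {}
    for occluded in occlusion_sets:
        kept = [s for i, s in enumerate(subgraphs) if i not in occluded]
        scores[occluded] = importance(baseline, verbalize(prompt, kept))
    return scores
```

<p>If the scores were strictly additive, the sum of the single occlusions would equal the all-occluded score; deviations indicate agent interactions or context effects.</p>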
      <p>Figure 5 presents one subplot per task, with each containing a boxplot corresponding to an agent
importance evaluation. Three observations can be drawn from the results: (i) Occluding agents presumed
to be relevant increases their attribution scores. (ii) Narrative task context effects are only partially
reflected, as the sector-level agent generally receives lower attribution. (iii) In multi-agent configurations,
the ablation study (the additional occlusion of none or all subgraphs) demonstrates that agent importance
scores are not strictly additive.</p>
      <p>Within a narrow experimental scope, the research questions can be answered as follows: (R1) Agent
importance reliably quantifies the attribution of agent contexts retrieved from knowledge graphs.
However, context effects undermine desirable mathematical properties such as additivity. (R2) Agent
importance tends to account for biases in the task specification. The experiments reveal narrative
specificities of the domain, such as an elevated importance of the term company compared to the term sector.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Discussion</title>
      <p>The integration of structured knowledge, XAI, and LLMs enables the automation of redundant tasks even
in domains requiring substantial domain expertise, such as financial auditing. The use of multi-agent
systems in combination with GraphRAG offers a promising architecture – provided that each agent
discloses its decision-making. Determining the extent to which an LLM relies on the outputs of individual
agents remains intractable. This challenge is encapsulated in what we term the explanation paradox:
Even if every system component is individually explainable, the attribution of each component remains
opaque. The agent importance method addresses this gap. It estimates the relative contribution of each
agent by computing the inverse of the semantic similarity between the LLM’s baseline verbalization
and the verbalization generated after occluding the respective agent.</p>
      <p>
        Main Findings. Three findings emerge from our contribution: (i) Quantifying agent importance
addresses a critical gap on the way towards trustworthy multi-agent systems in high-stakes domains, and it
is feasible, as evidenced by our preliminary results. (ii) While our method is mathematically grounded, it
remains theoretically incomplete. Assumptions derived from supervised learning models, e.g., those
from SHAP [16], are not seamlessly transferable to generative models. (iii) Our findings hint at two key
challenges: (a) Narrative context effects introduced by task prompts may bias the relevance estimates.
And, (b) correlations between agent subgraphs might confound the measured importance values.
Related Results. Our main findings can be situated within a line of methods that aims to overcome
semantic limitations [21] or hallucinations [22] in LLMs. What started as a systematic combination
of supplementary information with LLMs [23] has transitioned into a structured representation of
domain knowledge, with researchers questioning its attribution [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. Attribution methods have a
rich tradition in XAI [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], some of which obtain their importance estimate by occlusion [16].
Closely related to our approach are [24], [25], and [26], who study the alignment of multiple information
sources in traditional LLM settings and classic or graph-enhanced RAG architectures, respectively.
Limitations. Four major limitations can be identified in our work: (i) Domain: In general, our
approach is domain-agnostic. However, although it is motivated by the audit domain, we abstract
the case study to the broader finance domain. Due to domain specificities, our method may not fully
generalize to the intended context. (ii) Applicability: The motivating example centers on the aggregation
of agent explanations. While we emphasize that agent importance is applicable to any task prompt in a
multi-agent system employing GraphRAG, our experiments are limited to explanation interpretations
or comparisons. (iii) Experiments: The experimental design is sparse, and the presented results are
preliminary. More comprehensive empirical validation and a baseline comparison are necessary to
draw robust conclusions. (iv) Theoretical contradictions: The findings expose unresolved theoretical
challenges, e.g., the impact of prompt phrasing, contextual influence in natural language, and correlations
among subgraphs. Also, in contrast to many predictive models, LLM outcomes are not deterministic.
Future Work. First, we will enhance agent execution from natural language prompts by enabling the
parallelization of multiple agent calls. Second, we aim to improve and publicly release a user interface
to facilitate more intuitive and accessible interactions with the system. With respect to the proposed
attribution method, we will formally derive and prove desirable attribution properties within the context
of generative AI systems. Lastly, we will extend our experimental evaluation by defining a range of
diverse tasks across various multi-agent data sets and comparing attribution results across LLMs.
Reproducibility Statement. We used open-source data and provided each model, its corresponding instruction, and graph
query. An extended evaluation is in progress, and we plan to publish the code in the future.
      </p>
      <p>Ethical Considerations. There are no specific ethical concerns to declare. Our overarching goal is to develop trustworthy
multi-agent systems. We believe transparency in the decision-making process is essential.</p>
      <p>Declaration on Generative AI. During the preparation of this work, we used gpt-4o and Grammarly for spell-checking
and grammar correction, and gpt-4o-mini for experimental purposes7.</p>
      <p>Acknowledgments. This article is part of the project BayVFP Data Tales (# DIK-2407-00007 // DIK0660/01).
7gpt-4o: https://platform.openai.com/docs/models/gpt-4o, Grammarly: https://www.grammarly.com/, gpt-4o-mini: https:
//platform.openai.com/docs/models/gpt-4o-mini, all 28th May 2025.
2025. arXiv:2506.00783.
[16] S. M. Lundberg, S. Lee, A Unified Approach to Interpreting Model Predictions, in: I. Guyon, U. von
Luxburg, S. Bengio, H. M. Wallach, R. Fergus, S. V. N. Vishwanathan, R. Garnett (Eds.), Advances in
Neural Information Processing Systems 30: Annual Conference on Neural Information Processing
Systems 2017, December 4-9, 2017, Long Beach, CA, USA, 2017, pp. 4765–4774. URL: https:
//proceedings.neurips.cc/paper/2017/hash/8a20a8621978632d76c43dfd28b67767-Abstract.html.
[17] T. Zhang, V. Kishore, F. Wu, K. Q. Weinberger, Y. Artzi, BERTScore: Evaluating Text Generation
with BERT, in: 8th International Conference on Learning Representations, ICLR 2020, Addis
Ababa, Ethiopia, April 26-30, 2020, 2020. URL: https://openreview.net/forum?id=SkeHuCVFDr.
[18] H. P. Young, Monotonic solutions of cooperative games, International Journal of Game Theory 14
(1985) 65–72. doi:10.1007/BF01769885.
[19] L. Heidrich, E. Slany, S. Scheele, U. Schmid, FairCaipi: A Combination of Explanatory Interactive
and Fair Machine Learning for Human and Machine Bias Reduction, Machine Learning and
Knowledge Extraction 5 (2023) 1519–1538. doi:10.3390/make5040076.
[20] M. Arenas, P. Barcelo, L. Bertossi, M. Monet, On the Complexity of SHAP-Score-Based Explanations:
Tractability via Knowledge Compilation and Non-Approximability Results, Journal of Machine
Learning Research 24 (2023) 1–58. URL: http://jmlr.org/papers/v24/21-0389.html.
[21] K. J. Hammond, D. B. Leake, Large Language Models Need Symbolic AI, in: A. S. d’Avila Garcez,
T. R. Besold, M. Gori, E. Jiménez-Ruiz (Eds.), Proceedings of the 17th International Workshop
on Neural-Symbolic Learning and Reasoning, La Certosa di Pontignano, Siena, Italy, July
3–5, 2023, volume 3432 of CEUR Workshop Proceedings, CEUR-WS.org, 2023, pp. 204–209. URL:
https://ceur-ws.org/Vol-3432/paper17.pdf.
[22] L. Huang, W. Yu, W. Ma, W. Zhong, Z. Feng, H. Wang, Q. Chen, W. Peng, X. Feng, B. Qin, T. Liu, A
Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open
Questions (2023). doi:10.48550/ARXIV.2311.05232.
[23] P. Lewis, E. Perez, A. Piktus, F. Petroni, V. Karpukhin, N. Goyal, H. Küttler, M. Lewis, W.-t. Yih,
T. Rocktäschel, S. Riedel, D. Kiela, Retrieval-Augmented Generation for Knowledge-Intensive NLP
Tasks, 2021. arXiv:2005.11401.
[24] X. Yue, B. Wang, Z. Chen, K. Zhang, Y. Su, H. Sun, Automatic Evaluation of Attribution by
Large Language Models, in: H. Bouamor, J. Pino, K. Bali (Eds.), Findings of the Association for
Computational Linguistics: EMNLP 2023, Association for Computational Linguistics, Singapore,
2023, pp. 4615–4635. doi:10.18653/v1/2023.findings-emnlp.307.
[25] A. Abolghasemi, L. Azzopardi, S. H. Hashemi, M. de Rijke, S. Verberne, Evaluation of Attribution</p>
      <p>Bias in Retrieval-Augmented Large Language Models, 2024. arXiv:2410.12380.
[26] J. Gao, X. Zou, Y. Ai, D. Li, Y. Niu, B. Qi, J. Liu, Graph Counselor: Adaptive Graph Exploration via
Multi-Agent Synergy to Enhance LLM Reasoning, 2025. arXiv:2506.03939.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>J.</given-names>
            <surname>Kokina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Blanchette</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. H.</given-names>
            <surname>Davenport</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Pachamanova</surname>
          </string-name>
          ,
          <article-title>Challenges and opportunities for artificial intelligence in auditing: Evidence from the field</article-title>
          ,
          <source>International Journal of Accounting Information Systems</source>
          <volume>56</volume>
          (
          <year>2025</year>
          )
          <article-title>100734</article-title>
          . doi:10.1016/j.accinf.2025.100734.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>R.</given-names>
            <surname>Samiolo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Spence</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Toh</surname>
          </string-name>
          ,
          <article-title>Auditor judgment in the fourth industrial revolution</article-title>
          ,
          <source>Contemporary Accounting Research</source>
          <volume>41</volume>
          (
          <year>2023</year>
          )
          <fpage>498</fpage>
          -
          <lpage>528</lpage>
          . doi:10.1111/1911-3846.12901.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>T.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Pei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. V.</given-names>
            <surname>Chawla</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Wiest</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <article-title>Large Language Model Based Multi-agents: A Survey of Progress and Challenges</article-title>
          ,
          <source>in: Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, IJCAI</source>
          <year>2024</year>
          , Jeju, South Korea,
          <source>August 3-9</source>
          ,
          <year>2024</year>
          , ijcai.org,
          <year>2024</year>
          , pp.
          <fpage>8048</fpage>
          -
          <lpage>8057</lpage>
          . URL: https://www.ijcai.org/proceedings/2024/890.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>S.</given-names>
            <surname>Pan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Luo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <article-title>Unifying Large Language Models and Knowledge Graphs: A Roadmap</article-title>
          ,
          <source>IEEE Trans. Knowl. Data Eng</source>
          .
          <volume>36</volume>
          (
          <year>2024</year>
          )
          <fpage>3580</fpage>
          -
          <lpage>3599</lpage>
          . doi:10.1109/TKDE.2024.3352100.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>N.</given-names>
            <surname>Francis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Green</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Guagliardo</surname>
          </string-name>
          , J. Holland, P. Llewellyn,
          <string-name>
            <given-names>P.</given-names>
            <surname>Selmer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Taylor</surname>
          </string-name>
          , P. Wood,
          <article-title>Cypher: An Evolving Query Language for Property Graphs</article-title>
          ,
          <source>in: Proceedings of the 2018 International Conference on Management of Data (SIGMOD)</source>
          , ACM,
          <year>2018</year>
          , pp.
          <fpage>1433</fpage>
          -
          <lpage>1445</lpage>
          . doi:10.1145/3183713.3190657.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>G.</given-names>
            <surname>Schwalbe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Finzel</surname>
          </string-name>
          ,
          <article-title>A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts</article-title>
          ,
          <source>Data Mining and Knowledge Discovery</source>
          (
          <year>2023</year>
          ).
          doi:10.1007/s10618-022-00867-8.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>T. B.</given-names>
            <surname>Brown</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Mann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ryder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Subbiah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kaplan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Dhariwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Neelakantan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Shyam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Sastry</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Askell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Agarwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Herbert-Voss</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Krueger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Henighan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Child</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ramesh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. M.</given-names>
            <surname>Ziegler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Winter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Hesse</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Sigler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Litwin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gray</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Chess</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Clark</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Berner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>McCandlish</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Radford</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Sutskever</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Amodei</surname>
          </string-name>
          ,
          <article-title>Language Models are Few-Shot Learners</article-title>
          , in:
          <string-name>
            <given-names>H.</given-names>
            <surname>Larochelle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ranzato</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Hadsell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Balcan</surname>
          </string-name>
          , H. Lin (Eds.),
          <source>Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6-12, 2020, virtual</source>
          ,
          <year>2020</year>
          . URL: https://proceedings.neurips.cc/paper/2020/hash/1457c0d6bfcb4967418bfb8ac142f64a-Abstract.html.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>H.</given-names>
            <surname>Tang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Jin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Jin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , M. Du,
          <article-title>Time Series Forecasting with LLMs: Understanding and Enhancing Model Capabilities</article-title>
          ,
          <source>SIGKDD Explor. Newsl.</source>
          <volume>26</volume>
          (
          <year>2025</year>
          )
          <fpage>109</fpage>
          -
          <lpage>118</lpage>
          . doi:10.1145/3715073.3715083.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>E.</given-names>
            <surname>Slany</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Scheele</surname>
          </string-name>
          , U. Schmid,
          <article-title>Explanatory Interactive Machine Learning with Counterexamples from Constrained Large Language Models</article-title>
          , in: A.
          <string-name>
            <surname>Hotho</surname>
          </string-name>
          , S. Rudolph (Eds.),
          <source>KI 2024: Advances in Artificial Intelligence</source>
          , Springer Nature Switzerland, Cham,
          <year>2024</year>
          , pp.
          <fpage>324</fpage>
          -
          <lpage>331</lpage>
          . doi:10.1007/978-3-031-70893-0_26.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>J.</given-names>
            <surname>Amling</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Slany</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Dormagen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kretschmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Scheele</surname>
          </string-name>
          ,
          <article-title>Bridging the Interpretability Gap in Process Mining: A Comprehensive Approach Combining Explainable Clustering and Generative AI</article-title>
          ,
          <source>in: Explainable Artificial Intelligence - 3rd World Conference, xAI 2025</source>
          , Istanbul, Turkey,
          July 09-11,
          <year>2025</year>
          , Springer, to appear.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>C.</given-names>
            <surname>Dormagen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Amling</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Scheele</surname>
          </string-name>
          , U. Schmid,
          <article-title>Explaining Process Behavior: A Declarative Framework for Interpretable Event Data</article-title>
          ,
          <source>in: 3rd World Conference, xAI 2025, Late-breaking Work, Demos and Doctoral Consortium</source>
          , Istanbul, Turkey, July 09-11,
          <year>2025</year>
          , CEUR-WS, to appear.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>S.</given-names>
            <surname>Hosseini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Seilani</surname>
          </string-name>
          ,
          <article-title>The role of agentic AI in shaping a smart future: A systematic review</article-title>
          ,
          <source>Array</source>
          <volume>26</volume>
          (
          <year>2025</year>
          )
          <fpage>100399</fpage>
          . doi:10.1016/j.array.2025.100399.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>E.</given-names>
            <surname>Rajabi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Etminani</surname>
          </string-name>
          ,
          <article-title>Knowledge-graph-based explainable AI: A systematic review</article-title>
          ,
          <source>Journal of Information Science</source>
          <volume>50</volume>
          (
          <year>2024</year>
          )
          <fpage>1019</fpage>
          -
          <lpage>1029</lpage>
          . doi:10.1177/01655515221112844.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>J.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Tang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Gong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. M.</given-names>
            <surname>Ni</surname>
          </string-name>
          , H.-Y. Shum,
          <string-name>
            <given-names>J.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <article-title>Think-on-Graph: Deep and Responsible Reasoning of Large Language Model on Knowledge Graph</article-title>
          ,
          <year>2024</year>
          . arXiv:2307.07697.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>R.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Cai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Mei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Wen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Fu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Shi</surname>
          </string-name>
          ,
          <article-title>KG-TRACES: Enhancing Large Language Models with Knowledge Graph-constrained Trajectory Reasoning and Attribution Supervision</article-title>
          ,
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>