<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Multi-Agent Inspection Planning via Semantically Enriched Knowledge Graphs for Non-Destructive Testing</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Ghezal Ahmad Jan Zia</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andre Valdestilhas</string-name>
          <email>a.valdestilhas@vu.nl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Benjamin Moreno Torres</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sabine Kruschwitz</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="editor">
          <string-name>Non Destructive Testing, Materials Science, Knowledge Graph, Large Language Models</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science, Vrije Universiteit Amsterdam</institution>
          ,
          <addr-line>De Boelelaan 1105, 1081 HV Amsterdam</addr-line>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Federal Institute For Materials Research and Testing (BAM)</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
        <p>This paper presents a multi-agent system for inspection planning in Non-Destructive Testing (NDT), extending our prior work on Knowledge Graph (KG) generation. We integrate a KG with three agents (PlannerAgent, ToolSelectorAgent, and ForecasterAgent) that collaborate via LangChain to create context-aware inspection plans, select tools, and forecast timelines. Key contributions include enriching the KG with new semantic relationships for infrastructure types, materials, and defects, as well as developing a real-time Streamlit-based UI for visualizing plans and reasoning subgraphs. Case studies demonstrate the system's improved relevance, explainability, and precision in defect-to-NDT mapping. We also address interoperability with Linked Data standards and validate plan consistency using SHACL constraints. Overall, this work showcases a novel integration of semantic web technology and LLM-based reasoning for infrastructure maintenance.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Non-Destructive Testing (NDT) refers to a range of inspection methods used to detect defects and
assess material integrity without causing damage. It is crucial for ensuring the safety and longevity
of infrastructure across domains like civil engineering, aerospace, and manufacturing. In civil
infrastructure (bridges, tunnels, industrial plants), materials such as concrete and steel can undergo various
deterioration mechanisms (e.g. corrosion, fatigue) that lead to observable defects (cracks, spalling,
discoloration). Selecting the proper NDT method to detect a given defect is non-trivial because of
the diversity of materials, defect manifestations, and test parameters. For example, moisture-driven
cracking in a concrete bridge might be detectable via ultrasonic testing or infrared thermography
depending on conditions. The challenge is compounded by the need to plan when and how to inspect
complex structures under evolving conditions.</p>
      <p>
        In recent years, knowledge graphs (KGs) have emerged as a promising approach to organize and
retrieve complex multi-modal knowledge in domains like materials science and infrastructure[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
KGs represent information as a network of entities (nodes) and relationships (edges), which can
facilitate knowledge discovery and decision support. In the context of NDT, a KG can explicitly link
materials, deterioration mechanisms, defects, and applicable NDT methods to assist engineers in
choosing appropriate inspection techniques.
      </p>
      <p>∗Corresponding author. CEUR Workshop Proceedings, ISSN 1613-0073.</p>
      <p>
        In our previous work [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], we leveraged a large language
model (LLM) to automatically extract such relationships from literature and populate a Neo4j-based
NDT knowledge graph. That system introduced material-specific agents (for concrete, wood, steel,
and brick) which used curated glossaries and static rules to guide the LLM’s extraction of triples (e.g.,
Concrete – HAS-DETERIORATION-MECHANISM → Corrosion – CAUSES-PHYSICAL-CHANGE →
Cracking – DETECTED-BY → Ultrasonic Testing). The resulting KG successfully captured intricate
relations, like the applicability of specific NDT methods to materials under certain degradation conditions
[
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. However, the 2024 approach was limited to offline knowledge extraction and lacked a dynamic
mechanism for using this knowledge in inspection planning.
      </p>
      <p>In this paper, we address that gap by introducing an LLM-driven multi-agent system that utilizes
the NDT knowledge graph for automated inspection planning. Our approach is influenced by the
advent of LLMs that can act as autonomous agents reasoning and taking actions (e.g., via the LangChain
framework). We design a set of specialized agents – a PlannerAgent to propose inspection plans, a
ToolSelectorAgent to identify optimal NDT tools for detected defects, and a ForecasterAgent to predict
timelines and future inspection needs. Each agent is powered by a local LLM (such as the 7B-parameter
Mistral model, chosen for its strong performance relative to its size<sup>1</sup> and its ability to run on-premises via
Ollama) and has access to the semantic knowledge graph and external tools (e.g., databases or analysis
libraries). Dividing the complex task among focused agents exploits the observation that specialized
LLM agents can perform better on focused sub-tasks and provide more interpretable outcomes<sup>2</sup>.</p>
      <p>Our semantic knowledge graph has been extended and enriched to support this multi-agent reasoning.
We incorporate new entity types for Infrastructure (e.g., Bridge, Tunnel, Plant) and InspectionPlan,
and we introduce named relationships to represent inspection planning knowledge. For instance, an
InspectionPlan node may have relationships like RECOMMENDS (to an NDT Method), ADDRESSES (a
particular Defect), TARGETS (an Infrastructure or component), and JUSTIFIED-BY (linking to evidence
nodes or subgraphs in the KG that explain the recommendation). By treating the inspection plan itself
as a first-class entity in the KG, we can log the decisions of each agent along with their justifications
in a structured, queryable form. This design is aligned with Semantic Web principles, enabling us to
export the knowledge graph to OWL/RDF and even apply SHACL shapes for validating plan structures.</p>
      <p>The contributions of this paper are summarized as follows:
• Multi-Agent Inspection Planning: We develop a novel multi-agent architecture for
infrastructure inspection planning that integrates LLM reasoning with a domain knowledge graph. The
agents (Planner, ToolSelector, Forecaster) collaborate to generate context-aware inspection plans
(including scheduling) that are informed by both learned knowledge (from LLMs) and symbolic
domain knowledge (from the KG).
• Semantically Enriched NDT Knowledge Graph: We extend our previous NDT KG to include
infrastructure context and inspection planning concepts. The KG uses meaningful relationship
types (e.g., HAS-DEFECT, DETECTED-BY, RECOMMENDS) to enable explainable traversals.
We demonstrate how the KG can be exported as an OWL ontology and linked with existing
ontologies/standards for NDT and infrastructure (ensuring interoperability with Linked Data
ecosystems).
• Explainable Decision Support: Our system provides explainability by design. Each agent’s
decisions (e.g., why a certain NDT tool was selected) are justified by referencing relevant nodes
and relations in the KG, which can be visualized as a subgraph. A custom Web UI displays these
reasoning subgraphs (using PyVis network graphs) alongside the recommendations, allowing
engineers to trace the chain of reasoning for each suggestion. This addresses the typical black-box
nature of LLM decisions by grounding them in a shared knowledge graph.
• Forecasting and Scheduling: We integrate a ForecasterAgent that uses semantic knowledge
(e.g., defect progression rates, maintenance rules) to produce a timeline for inspections. The
agent’s output is presented as an interactive Gantt chart, giving a clear view of the proposed
inspection schedule (e.g., immediate tests vs. follow-up inspections after certain intervals). This
adds a temporal dimension to the KG, linking planned actions with time, and provides a preview
of how an inspection program might unfold.
• Case Study Evaluation: We present a case study focused on concrete infrastructure (bridges and
tunnels) suffering from common defects (such as cracking under high humidity and chloride exposure).
We simulate inspection planning for an example scenario and evaluate the system’s performance
in terms of recommendation accuracy, traceability of reasoning, and user feedback. Although
no formal benchmark exists for this novel task, we report qualitative results and feedback from
domain experts (in infrastructure NDT) to assess the system’s usefulness and reliability.</p>
      <p><sup>1</sup> https://klu.ai/blog/open-source-llm-models
<sup>2</sup> https://blog.langchain.dev/langgraph-multi-agent-workflows/</p>
      <p>The rest of the paper is organized as follows. Section 2 discusses related work, comparing our
system to existing approaches in semantic web, multi-agent systems, and infrastructure maintenance.
Section 3 describes the methodology, including the multi-agent system design, knowledge graph schema,
LLM integration, and user interface. Section 4 shows how key elements of our knowledge graph and
ontology are exported in OWL/RDF, illustrating interoperability and constraint checking. Section 5
details the experimental setup and case study scenario used to evaluate our approach and presents the
results and observations. The remaining sections discuss the implications, limitations, and possible
extensions of this work, conclude the paper, and outline future research directions.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>Our research sits at the intersection of semantic web technologies, large language models,
multi-agent systems, and infrastructure maintenance. Here we discuss the relevant state of the art and how our
approach compares.</p>
      <p>
        Recent years have seen a surge in using LLMs to assist with knowledge graph tasks. LLMs have been
applied to extract triples from text, populate KGs, and even to generate ontologies from natural language
[
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Our prior work [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] is one such example, leveraging GPT-based extraction for NDT knowledge.
Surveys like [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] and workshops like LLM+KG [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] highlight a key theme: LLMs are powerful but can
hallucinate or lack consistency, whereas KGs provide structured, factual grounding. Our current system
exemplifies this symbiosis—LLM agents perform reasoning and language-based planning, but always
check or log their facts against a curated KG. This falls under what some literature calls “neuro-symbolic
approaches”, combining neural language models with symbolic knowledge [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Unlike pure
retrieval-augmented generation, which might retrieve text passages, we specifically retrieve and use knowledge
graph relations, ensuring the LLM’s decisions align with an ontology.
      </p>
      <p>
        The idea of multiple agents cooperating is a longstanding topic in AI (e.g., BDI agents, blackboard
systems). With LLMs, new frameworks have emerged to orchestrate multiple language model instances
on different tasks<sup>3</sup>. For example, HuggingGPT [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] is a framework where ChatGPT delegates tasks
to specialist models (though not exactly peers) – it demonstrates how a main LLM can coordinate
multiple tools or models. Similarly, the LangChain community has introduced patterns for multi-agent
collaboration, such as in the LangGraph system. Our approach aligns with these by giving each agent
a clear role and connecting them via a shared state (our knowledge graph acts as a kind of shared
memory or blackboard). A key difference is the explicit semantic structure we use; many LLM
multi-agent demos share a text conversation as the medium of communication, whereas we use the KG to
mediate communication (agents post their decisions to the KG). This could be seen as a step towards
“cognitive architectures” for LLMs, where state is structured and persistent. Our ForecasterAgent and
PlannerAgent dynamic is analogous to a planner-scheduler interplay in multi-agent planning literature,
but here both are realized with LLM reasoning enhanced by data.
      </p>
      <p>
        The use of ontologies and KGs in civil infrastructure and NDT is an active area. The SODIA ontology
and related works by [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] aim to model non-destructive evaluation processes for bridges. They focus on
linking inspection data with BIM (Building Information Models) for bridges, addressing interoperability
issues. Our work shares the goal of structured representation but differs in scope: we incorporate
LLM-driven decision-making on top of the knowledge base. The SODIA approach is more deterministic,
manually encoding expert knowledge in an ontology for planning inspections, whereas we allow an AI
agent to propose plans (bounded by an ontology). Potentially, our system could integrate SODIA as
part of its KG (e.g., use classes from SODIA, or output plans in SODIA format), combining the strengths
of both. Additionally, ontologies like SODIA often emphasize integration with BIM and sensor data,
which is complementary to our focus on literature-derived knowledge and planning logic.
      </p>
      <p>
        <sup>3</sup> https://blog.langchain.dev/langgraph-multi-agent-workflows/
      </p>
      <p>
        Another relevant thread is NDE 4.0, a movement to modernize NDT with digital technologies [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
NDE 4.0 highlights connecting NDT results with digital twins, using data for predictive maintenance.
Our system contributes to NDE 4.0 by providing a semantic layer that could interface with a digital twin
(for instance, the KG could be linked to a digital model of the bridge, marking where defects are and
what inspections are planned). Some prior works have looked at optimizing inspection schedules using
traditional AI (like genetic algorithms or Bayesian methods), but they often do not incorporate semantic
knowledge or explainability. Our approach offers a new angle: using reasoning over a knowledge graph
to derive plans that are explainable and adaptable.
      </p>
      <p>There is emerging research on using LLMs in engineering domains for tasks like code generation,
design assistance, or troubleshooting. For instance, LLMs have been used to parse technical documents
and answer questions in domains such as aerospace and automotive maintenance. Our use of a local LLM
(Mistral) with domain-specific prompting is in line with efforts to fine-tune or prompt-engineer LLMs
for specialized tasks<sup>4</sup>. One challenge often cited is the need to handle domain-specific terminology
and not have the LLM stray into incorrect reasoning. By continuously referencing a knowledge graph,
our system mitigates this issue; it’s a form of retrieval-augmented decision-making with the retrieval
source being a curated KG rather than a text corpus. This approach is related to what others have done
with tools like GraphGPT or similar systems that allow LLMs to manipulate graphs directly, although
those are more in early experimental stages.</p>
      <p>In AI for decision support, explainability is crucial. Prior work in expert systems and case-based
reasoning often involved showing the chain of reasoning or similar past cases as explanations. Our
system’s design of storing justifications in the KG echoes those principles by making the reasoning
steps explicit. The advantage here is that the explanation is not a post-hoc add-on; it is generated
simultaneously as part of the reasoning (since the agent has to output the rationale and we store it). This
is aligned with the concept of XAI (explainable AI), where explanations should be a native part of the
system. There have been some related works in explainable recommendation systems using knowledge
graphs, which show that linking recommendations to knowledge entities improves user trust. In our
case, the “recommendation” is an NDT method and it’s linked to knowledge entities (defects, materials)
in exactly that spirit.</p>
      <p>In summary, our approach builds upon and extends the state of the art by marrying a semantic
knowledge base with the flexibility of LLM-based agents. We introduce a level of interactivity and
adaptiveness that static ontologies alone don’t provide, while maintaining a level of rigor and
explainability that raw LLM approaches lack. We believe this is one of the first applications of LLM-based
multi-agent systems in the Semantic Web field for a concrete engineering domain (infrastructure NDT),
demonstrating a practical use case of Semantic Web + LLM integration beyond question answering or
triple extraction.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Methodology</title>
      <p>Our approach integrates a knowledge graph (KG) with a multi-agent system to automate inspection
planning, detailed in Figure X (conceptual architecture). The KG acts as a central repository for domain
knowledge and evolving plans, while three specialized LLM-driven agents generate, refine, and schedule
inspection recommendations. A real-time UI facilitates user interaction and visualizes agent outputs.</p>
      <p><sup>4</sup> https://blog.langchain.dev/langgraph-multi-agent-workflows/</p>
      <sec id="sec-3-1">
        <title>3.1. Multi-Agent System Architecture</title>
        <p>We employ a multi-agent design, with agents communicating via the KG and LangChain’s orchestration.
• PlannerAgent: Initiates the inspection plan based on user context or alerts, querying the KG for
relevant knowledge and proposing a draft plan (e.g., NDT methods, follow-up inspections). Its
decisions and rationale are logged in the KG for transparency.
• ToolSelectorAgent: Selects optimal NDT methods and tools by querying the KG for
defect-to-NDT mappings and considering practical factors. It updates the InspectionPlan node in the KG
with recommendations and justifications.
• ForecasterAgent: Adds a temporal dimension by predicting inspection schedules, including
immediate and future actions, based on knowledge-based rules and learned patterns. It stores
recommendations in the KG and triggers a Gantt chart visualization in the UI.</p>
        <p>Agents operate in a LangChain-orchestrated loop, running on a local Mistral 7B LLM via Ollama for
privacy and low latency. This modular design enhances explainability by providing distinct rationales
for each agent’s contribution.</p>
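        <p>The division of labour described above can be illustrated with a minimal sketch. It is not the authors' implementation: the LLM calls are replaced by deterministic rules, the Neo4j KG by an in-memory triple list, and names such as Blackboard, run_pipeline, HAS_TASK, and the underscore-style relationship names are assumptions made for illustration. What it shows is the communication pattern, where each agent reads from and writes to the shared graph and every write carries a rationale.</p>
        <preformat><![CDATA[
```python
from dataclasses import dataclass, field
from typing import List, Tuple

Triple = Tuple[str, str, str]


@dataclass
class Blackboard:
    """In-memory stand-in for the shared KG (the paper stores this in Neo4j)."""
    triples: List[Triple] = field(default_factory=list)
    log: List[str] = field(default_factory=list)

    def add(self, agent: str, triple: Triple, rationale: str) -> None:
        # Record the fact together with who asserted it and why.
        self.triples.append(triple)
        self.log.append(f"{agent}: {triple} because {rationale}")

    def objects(self, subject: str, predicate: str) -> List[str]:
        return [o for s, p, o in self.triples if s == subject and p == predicate]


def planner_agent(kg: Blackboard, plan: str, asset: str) -> None:
    # Draft plan: target the asset and address each defect recorded on it.
    kg.add("PlannerAgent", (plan, "TARGETS", asset), "user scenario")
    for defect in kg.objects(asset, "HAS_DEFECT"):
        kg.add("PlannerAgent", (plan, "ADDRESSES", defect), f"{asset} has {defect}")


def tool_selector_agent(kg: Blackboard, plan: str, asset: str) -> None:
    # Recommend every method the KG links to an addressed defect.
    for defect in kg.objects(plan, "ADDRESSES"):
        for method in kg.objects(defect, "DETECTED_BY"):
            kg.add("ToolSelectorAgent", (plan, "RECOMMENDS", method),
                   f"{defect} is detected by {method}")


def forecaster_agent(kg: Blackboard, plan: str, asset: str) -> None:
    # Placeholder scheduling rule (hypothetical): run each method immediately.
    for method in kg.objects(plan, "RECOMMENDS"):
        kg.add("ForecasterAgent", (plan, "HAS_TASK", f"{method}: immediate"),
               "placeholder rule")


def run_pipeline(kg: Blackboard, plan: str, asset: str) -> Blackboard:
    for agent in (planner_agent, tool_selector_agent, forecaster_agent):
        agent(kg, plan, asset)
    return kg


kg = Blackboard()
kg.triples += [
    ("Bridge-A1", "HAS_DEFECT", "Cracking"),
    ("Cracking", "DETECTED_BY", "UltrasonicTesting"),
]
run_pipeline(kg, "Plan001", "Bridge-A1")
for entry in kg.log:
    print(entry)
```
]]></preformat>
        <p>Because the agents only exchange information through the graph, the log doubles as a provenance trail: each line names the agent, the asserted relationship, and the fact that motivated it.</p>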
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Semantic Knowledge Graph Modeling</title>
        <p>The KG, built on Neo4j, ensures consistent knowledge sharing among agents. Our schema extends
previous work by introducing new node and relationship types for inspection context and plans:</p>
        <p>• Infrastructure: Represents assets (e.g., bridges) with properties like type and material, linked to
Material and Defect nodes.
• Defect / Physical Change: Links to DeteriorationMechanism and Material/Infrastructure nodes,
and to NDTMethod via DETECTED-BY relationships.
• NDTMethod: Represents testing techniques, linked to Defects they detect and Materials they
apply to.
• InspectionPlan: A new node type centralizing plan information, linked to infrastructure, defects,
recommended NDT methods, tasks, and justifications (KG subgraphs).
• InspectionTask: Finer-grained nodes for individual actions within a plan, with properties like
method and scheduled date.
• Agent (optional): Nodes representing the agents, useful for provenance tracking.</p>
        <p>All KG relationships are semantically named, aligning with standard ontologies where possible. Agents
interact with the KG using Cypher queries for data retrieval and updates. Justification tracing is a
critical aspect, achieved by linking InspectionPlan/Task to knowledge nodes that informed the decision,
providing transparent reasoning via KG subgraphs.</p>
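        <p>As an illustration of how agents might read from and write to such a schema, the following sketch builds two parameterized Cypher statements: one that looks up the methods detecting a defect, and one that logs a recommendation together with its justification. The node labels, property names (name, id), and underscore-style relationship types are assumptions; the paper writes relationship names with hyphens, which Cypher would require to be backtick-quoted. The helper function names are hypothetical.</p>
        <preformat><![CDATA[
```python
from typing import Dict, Tuple


def methods_for_defect_query(defect: str) -> Tuple[str, Dict[str, str]]:
    """Read query: which NDT methods does the KG link to this defect?"""
    query = (
        "MATCH (d:Defect {name: $defect})-[:DETECTED_BY]->(m:NDTMethod) "
        "RETURN m.name AS method"
    )
    return query, {"defect": defect}


def log_recommendation_query(plan_id: str, defect: str, method: str) -> Tuple[str, Dict[str, str]]:
    """Write query: attach a recommendation and its justification to a plan.

    The MATCH only succeeds if the KG already contains the DETECTED_BY edge,
    so a recommendation can never be logged without supporting knowledge.
    """
    query = (
        "MERGE (p:InspectionPlan {id: $plan_id}) "
        "WITH p "
        "MATCH (d:Defect {name: $defect})-[:DETECTED_BY]->(m:NDTMethod {name: $method}) "
        "MERGE (p)-[:ADDRESSES]->(d) "
        "MERGE (p)-[:RECOMMENDS]->(m) "
        "MERGE (p)-[:JUSTIFIED_BY]->(d) "
        "RETURN p.id AS plan"
    )
    return query, {"plan_id": plan_id, "defect": defect, "method": method}


read_query, read_params = methods_for_defect_query("Cracking")
write_query, write_params = log_recommendation_query("Plan001", "Cracking", "UltrasonicTesting")
print(read_query)
print(write_query)
```
]]></preformat>
        <p>Passing values as parameters rather than formatting them into the query string keeps LLM-generated text out of the Cypher itself, which matters when agent output is only partly trusted.</p>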
      </sec>
      <sec id="sec-3-3">
        <title>3.3. User Interface for Explainable Planning</title>
        <p>A Streamlit web application serves as the UI for scenario input, real-time display of agent outputs, and
visualization. Users define inspection scenarios or load predefined ones. Real-Time Interaction: The
UI displays a running log of agent actions, enhancing user trust. Knowledge Graph Visualization:
PyVis is used to visualize relevant KG subgraphs, showing connections between materials, defects, and
NDT methods, serving as justification for recommendations. Timeline Visualization: Plotly generates
Gantt charts from the ForecasterAgent’s schedule, providing an overview of proposed inspection
timelines. Interactive Exploration: An interactive KG query panel allows advanced users to query
the graph. The UI updates in near real-time, benefiting from local LLM execution. This methodology
ensures transparent, LLM-driven inspection planning grounded in a structured knowledge graph.</p>
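        <p>To make the timeline step concrete, the sketch below converts a schedule expressed as day offsets into rows with start and finish dates, which is the tabular shape a Gantt chart needs. The task names, offsets, durations, start date, and the to_gantt_rows helper are all hypothetical; the paper does not specify the ForecasterAgent's output format.</p>
        <preformat><![CDATA[
```python
from datetime import date, timedelta
from typing import Dict, List, Tuple


def to_gantt_rows(start: date, tasks: List[Tuple[str, int, int]]) -> List[Dict[str, str]]:
    """Turn (task name, offset in days, duration in days) into Gantt rows."""
    rows = []
    for name, offset_days, duration_days in tasks:
        begin = start + timedelta(days=offset_days)
        finish = begin + timedelta(days=duration_days)
        rows.append({"Task": name, "Start": begin.isoformat(), "Finish": finish.isoformat()})
    return rows


# Illustrative schedule: one immediate task and one follow-up after 30 days.
rows = to_gantt_rows(
    date(2025, 1, 1),
    [("Visual inspection", 0, 1), ("Half-cell potential mapping", 30, 2)],
)
for row in rows:
    print(row)
# Rows of this shape could be handed to a plotting library, for example
# plotly.express.timeline with x_start="Start", x_end="Finish", y="Task".
```
]]></preformat>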
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Ontology Exports and Semantic Integration</title>
      <p>
        A core advantage of using a semantic knowledge graph is the ability to integrate with existing ontologies
and Linked Data standards. We developed an ontology for NDT inspection planning, based on the KG
schema, which can be exported in OWL/RDF. This ontology enables interoperability with other data
sources (e.g., combining our NDT knowledge with a bridge ontology like SODIA [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]) and allows the use
of semantic web tools like reasoners and SHACL validators.
      </p>
      <sec id="sec-4-1">
        <title>4.1. OWL/RDF Ontology Fragments</title>
        <p>Our ontology defines the main classes (concepts) and properties (predicates) of the domain:</p>
        <p>• Key classes include Infrastructure, Material, Defect (subclass of a more general PhysicalChange
or Damage class), DeteriorationMechanism, NDTMethod, InspectionPlan, and InspectionTask.
These roughly correspond to the node labels in Neo4j.
• Object properties include hasMaterial, hasDefect, causes, detects (with inverse isDetectedBy),
recommendsMethod, addressesDefect, hasTask, justifiedBy, etc. We align some of these with
existing vocabularies: for instance, for general use we might align causes with a property from
an ontology like Schema.org’s cause or a relevant standard ontology property if available. For
detects, we did not find a direct one in common ontologies, so we define ndt:detects and its
inverse ndt:isDetectedBy. Each property has domain and range restrictions reflecting our usage
(e.g., ndt:detects domain = ndt:NDTMethod, range = ndt:Defect).
• We also include data properties for capturing literal values, such as defect severity (e.g.,
defectSeverity could be a datatype property of Defect), or scheduling details like scheduledDate
(datatype property of InspectionTask).</p>
        <p>Figure 1 shows a snippet of our ontology<sup>5</sup>: an excerpt of the NDT Inspection Planning ontology in
Turtle syntax. In this example, Bridge-1 (an infrastructure) has material Concrete. A Defect Cracking
is caused by Corrosion and detected by UltrasonicTesting. An InspectionPlan (Plan001) addresses
that cracking, recommends ultrasonic testing, and is justified by the existence of that cracking (the
justifiedBy here could also reference a statement or a literature source; for simplicity we link to the
defect node itself as the justification context).</p>
        <p>By exporting to OWL/RDF, we can leverage reasoners to infer new knowledge. For instance, if we
model inverse properties and some class hierarchies (e.g., Bridge SubClassOf
Infrastructure as shown, or Crack as a subclass of Defect), a reasoner could infer that if UltrasonicTesting
detects Cracking and Cracking is a Defect, then UltrasonicTesting detects some Defect. While trivial in
this case, such inference becomes useful if we have class-level knowledge (e.g., “All acoustic methods
detect internal defects” could let a reasoner infer that an AcousticEmission test is applicable if the defect
type is internal).</p>
        <p><sup>5</sup> Available online in Turtle (TTL) format at: https://github.com/firmao/semats2025ZiaAndre/blob/main/pieceonto.ttl</p>
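        <p>The kind of inference discussed here can be sketched in a few lines. The code below is a toy forward-chaining loop, not an OWL reasoner: it handles only inverse properties and subclass propagation on instance types, and the individual ndt:Crack1 and class ndt:Crack are hypothetical names introduced for the example.</p>
        <preformat><![CDATA[
```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]


def infer(triples: Set[Triple], inverses: Dict[str, str], subclass_of: Dict[str, str]) -> Set[Triple]:
    """Apply inverse-property and subclass rules until no new triple appears."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        for s, p, o in list(inferred):
            new = set()
            if p in inverses:
                # Inverse rule: s p o  implies  o inverse(p) s
                new.add((o, inverses[p], s))
            if p == "rdf:type" and o in subclass_of:
                # Subclass rule: an instance of a class is an instance of its superclass
                new.add((s, "rdf:type", subclass_of[o]))
            if not new <= inferred:
                inferred |= new
                changed = True
    return inferred


facts = {
    ("ndt:UltrasonicTesting", "ndt:detects", "ndt:Crack1"),
    ("ndt:Crack1", "rdf:type", "ndt:Crack"),
}
inverses = {"ndt:detects": "ndt:isDetectedBy", "ndt:isDetectedBy": "ndt:detects"}
subclass_of = {"ndt:Crack": "ndt:Defect"}

closure = infer(facts, inverses, subclass_of)
for triple in sorted(closure - facts):
    print(triple)
```
]]></preformat>
        <p>The two derived triples state that the crack is detected by ultrasonic testing and that it is a Defect, which together give the conclusion described in the text: ultrasonic testing detects some Defect.</p>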
        <p>
          Another benefit is linking to external ontologies. The domain of infrastructure maintenance has
existing ontologies, such as the SODIA ontology for structural diagnostics [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ], and various Bridge
Information Modeling ontologies. We can align our ontology to these. For example, SODIA has a
concept of InspectionProcedure and InspectionEquipment in bridge maintenance planning. Our
InspectionPlan could be aligned to their InspectionProcedure, and NDTMethod to InspectionEquipment where
appropriate. Such alignments (done via OWL owl:equivalentClass or rdfs:subClassOf relationships)
would enable data interchange – e.g., one could integrate our KG with a BIM model of a bridge annotated
with SODIA concepts [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. This is part of future work but demonstrates the flexibility gained by the
semantic representation.
        </p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. SHACL Constraints and SPARQL Integration</title>
        <p>We enhanced our knowledge graph (KG) by using SHACL (Shapes Constraint Language) to enforce
integrity constraints. For example, our SHACL shapes ensure that each InspectionPlan includes at
least one recommendsMethod and at least one addressesDefect, and that defects are linked to Infrastructure or
Material so that each defect can be located.</p>
        <p>After each update, we validate the KG using a Python SHACL library to identify inconsistencies,
thereby helping to maintain data quality. We also set up a Fuseki server to make the KG queryable via
SPARQL, allowing complex queries and integration with external RDF datasets.</p>
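        <p>The first constraint above amounts to a minimum-cardinality check. The sketch below mimics that check in plain Python so the logic can be read without a SHACL engine; it is a stand-in for illustration, not the SHACL shapes or the validation library used in the paper, and the ndt: prefixed names and the check_min_count helper are assumptions.</p>
        <preformat><![CDATA[
```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

# Properties that every instance of a class must have at least once
# (comparable in intent to sh:minCount 1 on a property shape).
REQUIRED: Dict[str, List[str]] = {
    "ndt:InspectionPlan": ["ndt:recommendsMethod", "ndt:addressesDefect"],
}


def check_min_count(triples: List[Triple], required: Dict[str, List[str]] = REQUIRED) -> List[str]:
    """Return one message per typed node that lacks a required property."""
    violations = []
    for s, p, o in triples:
        if p == "rdf:type" and o in required:
            for prop in required[o]:
                if not any(s2 == s and p2 == prop for s2, p2, _ in triples):
                    violations.append(f"{s} is missing {prop}")
    return violations


data = [
    ("ndt:Plan001", "rdf:type", "ndt:InspectionPlan"),
    ("ndt:Plan001", "ndt:recommendsMethod", "ndt:UltrasonicTesting"),
    ("ndt:Plan002", "rdf:type", "ndt:InspectionPlan"),
]
violations = check_min_count(data)
for message in violations:
    print(message)
```
]]></preformat>
        <p>Running such a check after each agent update, as the text describes for the SHACL validation, would flag a plan that an agent created but never completed.</p>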
        <p>In summary, our approach combines ontology export, SHACL shapes for consistency, and SPARQL
for interoperability, ensuring connectivity with the broader semantic web.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Experiments and Case Study</title>
      <p>We evaluated our LLM-driven multi-agent system through a case study approach, focusing on a realistic
scenario in bridge and tunnel inspection where domain experts could assess the outputs. The goal was
to test the system’s ability to generate sensible inspection plans and to validate the explainability and
usability of the knowledge graph integration.</p>
      <sec id="sec-5-1">
        <title>5.1. Experimental Setup</title>
        <p>Because there is no established benchmark for automated inspection planning, we designed a simulated
scenario informed by real-world infrastructure issues and expert input. We concentrated on concrete
structures, as concrete is widely used and prone to various deterioration mechanisms, and on steel
components to a lesser extent due to their variety. The scenario description included: the type of
infrastructure, its context, the material(s) involved, and an observed problem (defect or symptom). We
then ran our system to generate an inspection plan for the scenario, capturing the agents’ outputs and
the final KG state. These results were then reviewed by experienced NDT engineers (two members of
our team with civil engineering backgrounds), who provided feedback on the correctness and usefulness
of the plans.</p>
        <p>Example Scenario:
• Bridge with humidity-induced cracking: A concrete road bridge in a humid coastal
environment, where dark cracks have appeared on the underside of the deck. The scenario implies
possible reinforcement corrosion due to chloride ingress (sea salt) leading to cracking.</p>
        <p>For the scenario, we configured the initial KG state appropriately. For instance, the
KG includes facts like Concrete –HAS-DETERIORATION-MECHANISM→ Corrosion
–CAUSES-PHYSICAL-CHANGE→ Cracking –DETECTED-BY→ Ultrasonic, Visual, AcousticEmission based on
our prior extracted knowledge. Where needed, we added scenario-specific nodes (e.g., an instance node for
the bridge with a Defect node for “crack”). This simulates the situation where the system has already
recognized an issue (via sensors or human inspection noticing a crack) and now must plan further NDT.</p>
        <p>The LLM agents (Mistral 7B models) were run on a machine with an NVIDIA GPU with 24 GB of memory, allowing
reasonably fast generation. We used a temperature of 0.2 (fairly deterministic) for the
ToolSelectorAgent to ensure consistent choices, and around 0.5 for the PlannerAgent and ForecasterAgent to
allow some creativity in planning (but still constrained by system prompts). Each agent had a cutoff of
1024 tokens to avoid overly verbose outputs.</p>
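        <p>These settings can be summarized as a small configuration table. The sketch below is only one way to record them: the AgentConfig class is hypothetical, the model tag "mistral" is an assumed identifier for the local Mistral 7B model, and 0.5 stands in for the "around 0.5" reported above.</p>
        <preformat><![CDATA[
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentConfig:
    model: str          # assumed model tag for the local Mistral 7B model
    temperature: float  # lower values give more deterministic output
    max_tokens: int     # generation cutoff per agent call


AGENTS = {
    "PlannerAgent": AgentConfig(model="mistral", temperature=0.5, max_tokens=1024),
    "ToolSelectorAgent": AgentConfig(model="mistral", temperature=0.2, max_tokens=1024),
    "ForecasterAgent": AgentConfig(model="mistral", temperature=0.5, max_tokens=1024),
}

for name, cfg in AGENTS.items():
    print(name, cfg.temperature, cfg.max_tokens)
```
]]></preformat>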
        <p>We logged various metrics for analysis:
• Recommendation Accuracy: We define this in terms of whether the recommended NDT
methods and the inspection schedule are appropriate for the scenario. Since there is no ground
truth label, we rely on expert judgment. The experts would compare the system’s recommended
methods to what they would have chosen.
• Traceability and Justification: We examine if the system provided a clear justification for each
key decision. This is somewhat subjective, but we looked for the presence of KG-derived facts
in the explanation and whether the visualization made sense (e.g., did the highlighted subgraph
indeed relate to the decision?).
• Timeline Usefulness: We assess if the scheduling (especially for follow-up inspections) seems
reasonable (not too frequent or too sparse) and if the Gantt chart representation is clear.
• Agent Cooperation Behavior: To ensure the multi-agent setup is working well, we monitor if
any agent overrides another incorrectly or if any loops occur. Ideally, the PlannerAgent’s plan is
refined but not completely rewritten by others (unless necessary).
• Performance (qualitative): We note how responsive the system feels and if the agents produce
results within an interactive time frame consistently.</p>
        <p>No formal quantitative “score” was computed, given the exploratory nature of the evaluation, but we
summarized the outcomes of our scenario.</p>
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Case Study Results</title>
        <sec id="sec-5-2-1">
          <title>Here we describe the outcomes for one illustrative scenario.</title>
          <p>
            Scenario: Concrete Bridge with Cracking in Humid Environment. The system was given the
context of a concrete bridge (we created an Infrastructure node “Bridge-A1” with material Concrete)
and an observed defect “cracking”. The KG had prior knowledge that cracking in concrete can be caused
by corrosion and that ultrasonic testing detects cracking. The agents produced the following plan:
• Recommended Methods: Visual Inspection (immediate, to map cracks), Ultrasonic Pulse-Echo
testing (to measure crack depth and check for internal delamination), and Half-Cell Potential
measurement (to assess rebar corrosion potential).
• Schedule: Immediately perform the detailed visual and ultrasonic inspections. Within 1 month,
perform half-cell potential mapping on the deck. Follow-up: conduct moisture monitoring and
another ultrasonic test in 6 months (especially after the wet season), and schedule a comprehensive
inspection in 12 months.
• Justification: The system (via ToolSelectorAgent) explained: “Ultrasonic testing is recommended
as it can detect and characterize cracking depth in concrete. The humid coastal environment
suggests possible chloride-induced corrosion, so a Half-Cell Potential test is included to check
corrosion activity. Visual inspection is a baseline for mapping all crack locations.” The
ForecasterAgent noted: “High humidity and salt exposure accelerate corrosion, so a follow-up in
6 months is warranted [
            <xref ref-type="bibr" rid="ref7">7</xref>
            ].” The KG subgraph visualization highlighted Concrete, Cracking, and
UltrasonicTesting with the edge Cracking–DETECTED-BY→UltrasonicTesting, as well as a node
representing HalfCellPotential linked to Corrosion (we had an entry that half-cell testing detects
corrosion in rebar).
• Expert Feedback: The experts agreed that the methods chosen were appropriate. They
particularly liked the inclusion of half-cell potential, noting that it shows the system considered
corrosion as a cause. One suggestion was that for a humid environment, infrared (IR) thermography
could also detect moisture areas. (Our system did not select IR in this instance, possibly because the KG
lacked a strong link for IR detecting cracking or moisture in concrete; this is something to
improve in the knowledge base.) The schedule was deemed slightly aggressive (they commented that
a 6-month follow-up might be a bit early if crack widths were small, but understandable given the
uncertainty). The experts easily understood the subgraph explanation and found the visual of
Concrete–Crack–Ultrasonic very intuitive: “It’s basically showing me the textbook rationale,”
one said. Overall accuracy was high: all recommended actions were ones an inspector might take,
and no irrelevant method was suggested.
          </p>
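          <p>A minimal sketch of how such a justification subgraph could be retrieved is shown below. It uses an in-memory list of triples in place of the actual KG store; the relation names DETECTED_BY and CAUSED_BY, the HAS_MATERIAL triple, and the function names are assumptions made for illustration. Visual inspection is omitted because the text does not state a KG fact for it.</p>
          <preformat><![CDATA[
```python
from collections import defaultdict

# Toy triples mirroring the facts mentioned in the bridge scenario.
TRIPLES = [
    ("Bridge-A1", "HAS_MATERIAL", "Concrete"),
    ("Cracking", "CAUSED_BY", "Corrosion"),
    ("Cracking", "DETECTED_BY", "UltrasonicTesting"),
    ("Corrosion", "DETECTED_BY", "HalfCellPotential"),
]


def build_index(triples):
    """Index objects by (subject, relation) for quick lookups."""
    index = defaultdict(list)
    for subject, relation, obj in triples:
        index[(subject, relation)].append(obj)
    return index


def methods_with_justification(defect, triples=TRIPLES):
    """Return {method: path}, where path lists the triples that support the method."""
    index = build_index(triples)
    result = {}
    # Methods that detect the defect directly.
    for method in index[(defect, "DETECTED_BY")]:
        result[method] = [(defect, "DETECTED_BY", method)]
    # Methods that detect a possible cause of the defect.
    for cause in index[(defect, "CAUSED_BY")]:
        for method in index[(cause, "DETECTED_BY")]:
            result.setdefault(
                method,
                [(defect, "CAUSED_BY", cause), (cause, "DETECTED_BY", method)],
            )
    return result
```
]]></preformat>
          <p>For the defect Cracking, the sketch returns UltrasonicTesting with a one-edge path and HalfCellPotential with a two-edge path through Corrosion, which mirrors the highlighted subgraph described above.</p>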
        </sec>
        <sec id="sec-5-2-2">
          <title>Across the scenarios, we observed:</title>
          <p>• The PlannerAgent generally did well at outlining multi-step plans (often combining methods). In
one case, it missed a method that an expert expected (no IR in scenario 1). This is likely due to
KG limitations or the LLM’s bias; however, the multi-agent loop allows for adding such methods
if the knowledge exists. We plan to incorporate a heuristic that always considers IR when a defect
involves moisture, possibly as a rule-based nudge to the agent.
• The ToolSelectorAgent ensured that no truly inappropriate method was chosen. We never saw,
for example, a suggestion of using eddy current testing on concrete (which would be wrong). This
confirms that grounding the choices in the KG (which inherently doesn’t have such mismatches)
filters out many errors.
• The ForecasterAgent tended to err on the side of caution (shorter intervals). We adjusted its
prompt to consider standard inspection intervals (typically 12-24 months for many structures).
Getting the balance right may require more explicit rules or data on defect growth rates. Still, the
schedules were reasonable and provided a clear starting point that an engineer
could adjust.
• The interactive UI proved to be a key strength. Users could see each step, and importantly, could
intervene. During testing, if an expert user didn’t agree with something, they could pause and
manually edit the KG or ask the system to reconsider. For instance, one expert tried adding a
node “ChlorideTest” (for checking chloride content in concrete) to the KG connected to Corrosion,
then re-ran the planner. The system then included a chloride ion test in the plan. This shows the
system can incorporate new knowledge on the fly, which is a powerful feature of the semantic
approach.</p>
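          <p>Two of the observations above, the on-the-fly KG edit and the planned moisture heuristic, can be illustrated with a short sketch. The set of triples stands in for the real KG, and the relation names, the node name InfraredThermography, the list of moisture terms, and both function names are hypothetical choices for this example.</p>
          <preformat><![CDATA[
```python
# Hypothetical in-memory stand-in for the KG: (subject, relation, object) triples.
kg = {
    ("Cracking", "CAUSED_BY", "Corrosion"),
    ("Cracking", "DETECTED_BY", "UltrasonicTesting"),
    ("Corrosion", "DETECTED_BY", "HalfCellPotential"),
}

# Assumed keywords that signal a moisture-related context.
MOISTURE_TERMS = {"moisture", "humid", "humidity", "wet"}


def candidate_methods(kg, defect):
    """Methods linked to the defect directly or through one of its causes."""
    targets = {defect} | {o for s, r, o in kg if s == defect and r == "CAUSED_BY"}
    return sorted(o for s, r, o in kg if s in targets and r == "DETECTED_BY")


def apply_moisture_nudge(methods, context_terms):
    """Rule-based nudge: add IR thermography whenever the context mentions moisture."""
    mentions_moisture = bool(MOISTURE_TERMS & {t.lower() for t in context_terms})
    if mentions_moisture and "InfraredThermography" not in methods:
        return methods + ["InfraredThermography"]
    return methods


before = candidate_methods(kg, "Cracking")
# Expert edit: connect a new ChlorideTest node to Corrosion, then query again.
kg.add(("Corrosion", "DETECTED_BY", "ChlorideTest"))
after = candidate_methods(kg, "Cracking")
nudged = apply_moisture_nudge(after, ["concrete", "humid"])
```
]]></preformat>
          <p>After the edit, ChlorideTest appears among the candidates without any change to the planning code, and the nudge appends InfraredThermography only when a moisture term is present in the context.</p>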
        </sec>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Results</title>
      <p>We compile the findings from the case studies and user feedback to evaluate the system along the
dimensions of accuracy, explainability, and usefulness.</p>
      <sec id="sec-6-1">
        <title>6.1. Accuracy of Recommendations</title>
        <p>Overall, the multi-agent system produced highly relevant inspection recommendations in our tested
scenarios. Across the scenarios, with a total of 10 distinct recommended NDT actions, the domain experts
judged 9 of them to be appropriate. The one debatable recommendation was the suggestion of using
ground-penetrating radar (GPR) in the tunnel scenario, which was still considered acceptable albeit not always standard practice.
No clearly wrong techniques were recommended in any scenario (zero false positives in that sense).
This is a notable improvement over a naive approach of using an LLM without KG grounding, which in
preliminary trials sometimes hallucinated irrelevant techniques.</p>
        <p>
          A qualitative observation is that the material-specific context was always respected. For example,
methods that are only applicable to metals (like eddy current testing) never appeared for concrete
scenarios, because the KG doesn’t link them to concrete and the ToolSelectorAgent consequently didn’t
consider them. This confirms that the KG served as an effective constraint network to narrow the LLM’s
choices [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. It essentially prunes the search space of the LLM to plausible options, leveraging factual
knowledge.
        </p>
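        <p>The pruning effect described above can be sketched as a filter over LLM-proposed methods. The applicability table below is a hypothetical stand-in for the KG links between methods and materials, and the function name and the placeholder method MagicScan are assumptions for illustration only.</p>
        <preformat><![CDATA[
```python
# Hypothetical applicability facts: which materials each method is linked to.
APPLICABLE_TO = {
    "UltrasonicTesting": {"Concrete", "Steel"},
    "HalfCellPotential": {"Concrete"},
    "VisualInspection": {"Concrete", "Steel"},
    "EddyCurrentTesting": {"Steel"},
}


def filter_by_material(proposed, material, applicable_to=APPLICABLE_TO):
    """Split proposed methods into (kept, rejected) using the applicability links.

    Methods unknown to the table are rejected, which also drops hallucinated names.
    """
    kept, rejected = [], []
    for method in proposed:
        if material in applicable_to.get(method, set()):
            kept.append(method)
        else:
            rejected.append(method)
    return kept, rejected


kept, rejected = filter_by_material(
    ["UltrasonicTesting", "EddyCurrentTesting", "MagicScan"], "Concrete"
)
```
]]></preformat>
        <p>In this example only UltrasonicTesting is kept for concrete; EddyCurrentTesting is rejected because it has no link to concrete, and MagicScan is rejected because it does not exist in the table at all.</p>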
        <p>The PlannerAgent’s sequencing of actions (which method first, etc.) was reasonable in all cases. It
tended to list visual inspection first as a baseline (which aligns with standard practice: one typically
does the simplest inspection first). More sophisticated techniques followed and often complementary
techniques were included (like ultrasonic + half-cell for cracking/corrosion). The experts noted that the
plans were sometimes “ambitious” in including multiple methods, but not illogical. In practice, budget
constraints might limit doing all suggested tests, but from a technical standpoint, more information is
better.</p>
      </sec>
      <sec id="sec-6-2">
        <title>6.2. Explainability and Traceability</title>
        <p>This was a strong point of the system. Each plan came with an explanation that could be traced
through the knowledge graph. The use of the KG subgraph visualization was very well received. In a
post-evaluation survey, both experts strongly agreed that “the system’s reasoning is transparent and
understandable.” They gave examples: “I can see exactly why ultrasonic was suggested, because the
graph shows ultrasonic connected to the crack issue” referring to the visual justification.</p>
        <p>One expert even commented that the subgraph itself taught them something: “I didn’t know that
method X could detect phenomenon Y until I saw it in the graph and recalled a paper about it.” This
suggests the system not only justifies but can also serve as a quick refresher of domain knowledge.</p>
        <p>We logged the justifications that each agent recorded. The PlannerAgent typically cited 1-2 facts from
the KG in its internal reasoning. The ToolSelectorAgent often cited multiple if available (for instance,
it might list all possible methods from the KG with some scores before picking one; we saw evidence
of this in its prompt outputs). All those cited facts were indeed present as relationships in the KG,
confirming that the agent wasn’t hallucinating those reasons – it was using the provided data.</p>
        <p>The SHACL validations were run on final plans and found no violations, meaning our plans met the
expected structure (each had methods and defects linked properly). This gives confidence that the data
in the KG is consistent, which indirectly supports explainability (no missing links that would confuse
users).</p>
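        <p>The kind of structural constraint checked here can be illustrated in plain Python. This is not SHACL and not the shapes used in the system, which are not shown in this section; the dictionary layout of a plan, the key names, and the function name are assumptions chosen to mirror the requirement that each plan has methods and that each method is linked to a defect.</p>
        <preformat><![CDATA[
```python
def validate_plan(plan):
    """Return a list of violation messages; an empty list means the plan conforms."""
    violations = []
    steps = plan.get("steps", [])
    if not steps:
        violations.append("plan has no inspection steps")
    for i, step in enumerate(steps):
        # Every step must name a method ...
        if not step.get("method"):
            violations.append(f"step {i}: missing method")
        # ... and every method must be linked to the defect it addresses.
        if not step.get("targets_defect"):
            violations.append(f"step {i}: method is not linked to a defect")
    return violations


good_plan = {"steps": [{"method": "UltrasonicTesting", "targets_defect": "Cracking"}]}
bad_plan = {
    "steps": [
        {"method": "", "targets_defect": "Cracking"},
        {"method": "HalfCellPotential"},
    ]
}
```
]]></preformat>
        <p>A conforming plan yields an empty list, which corresponds to the "no violations" outcome reported above, while the second plan yields one message per broken link.</p>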
      </sec>
      <sec id="sec-6-3">
        <title>6.3. Usefulness of Timeline Forecasts</title>
        <p>The timeline or scheduling aspect was somewhat experimental, but it turned out to be quite useful as a
communication tool. The Gantt charts gave a clear picture of immediate vs future actions. Experts said
that in a real setting, they would refine the schedule but having an initial suggestion is helpful. One
engineer mentioned that often scheduling is left out of automated tools, so including it is a plus: “It’s
nice that it doesn’t just say what to do, but also when to do it.”</p>
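        <p>As an illustration of the data behind such a chart, the bridge schedule from Section 5.2 (immediate, 1 month, 6 months, 12 months) can be turned into dated rows. The task labels, the helper names, and the plan start date used in the usage line are assumptions; the month offsets are the ones reported for the scenario.</p>
        <preformat><![CDATA[
```python
from datetime import date, timedelta

# Offsets from the bridge scenario, in months after the plan date.
SCHEDULE = [
    ("Visual inspection", 0),
    ("Ultrasonic pulse-echo testing", 0),
    ("Half-cell potential mapping", 1),
    ("Moisture monitoring and repeat ultrasonic test", 6),
    ("Comprehensive inspection", 12),
]


def add_months(start, months):
    """Shift a date by whole months, clamping the day to the end of the target month."""
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    # Last day of the target month: first day of the following month minus one day.
    next_month_first = date(year + month // 12, month % 12 + 1, 1)
    last_day = (next_month_first - timedelta(days=1)).day
    return date(year, month, min(start.day, last_day))


def gantt_rows(start, schedule=SCHEDULE):
    """Return (task, due_date) rows that a charting library could plot."""
    return [(task, add_months(start, offset)) for task, offset in schedule]


rows = gantt_rows(date(2025, 3, 15))  # hypothetical plan date
```
]]></preformat>
        <p>The rows separate immediate actions from follow-ups, which is the distinction the experts found useful in the chart.</p>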
        <p>The timeline was harder to evaluate than the method recommendations because optimal intervals can vary widely.
However, for the example scenario (bridge with cracks), the suggestion of a 6-month follow-up aligned
with guidelines for significant defects in critical members.</p>
        <p>In a scenario not detailed earlier (the steel corrosion scenario), the ForecasterAgent proposed
inspections every 12 months after an initial fix. The expert said they would normally use 24 months for
that case, indicating the agent was somewhat conservative. This again points to tuning the agent
with more knowledge or data about corrosion rates. Importantly, the agent’s explanation for the 12
months was “because the corrosion was severe, a yearly check is recommended”, showing that it tried to
justify the interval. This reasoning can be adjusted if needed.</p>
      </sec>
      <sec id="sec-6-4">
        <title>6.4. System Performance and Agent Collaboration</title>
        <p>All components ran smoothly in our tests. The multi-agent orchestration did not produce any deadlocks
or contradictory outputs. On average, the PlannerAgent took 5 seconds, the ToolSelectorAgent 30 seconds, and
the ForecasterAgent 10 seconds per scenario on our hardware. The overhead of KG queries was negligible
(on the order of milliseconds for the relevant subgraph queries). An end-to-end plan was usually generated
in 50-60 seconds, which we consider acceptable for interactive use.</p>
        <p>We observed one case where the PlannerAgent and ForecasterAgent had a minor tug-of-war: the
PlannerAgent initially didn’t include a follow-up inspection in the plan, but the ForecasterAgent added
one (with reasoning). When we re-ran with the updated KG, the PlannerAgent then also started
including that follow-up by default (since the plan node now existed and was fed in context). This kind
of loop stabilized quickly and actually enhanced the final output. It shows the agents can iteratively
converge on a more complete plan.</p>
        <p>The local LLMs (Mistral 7B) handled domain content well, likely thanks to the prompts including
concrete data. We did not encounter any toxic or irrelevant outputs (which can sometimes happen with
raw LLMs), likely because the domain and context kept the generation focused. Running everything
locally also gave us control and reproducibility.</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>7. Discussion</title>
      <p>The development and evaluation of our LLM-driven multi-agent inspection planning system revealed
numerous insights, as well as challenges and opportunities for further improvement.</p>
      <p>Our results reinforce the value of combining symbolic knowledge (KG/ontology) with sub-symbolic
reasoning (LLMs). The system was able to produce coherent plans that align with domain knowledge,
something that would be difficult using either approach alone. Pure rule-based systems (symbolic only)
might be rigid and hard to scale, whereas pure LLM approaches risk factual errors. By integrating
the two, we achieved a balance: the KG provided a knowledge scope and memory, and the LLM
agents provided reasoning and language generation capabilities within that scope. This neuro-symbolic
synergy is an encouraging sign for other domains too – e.g., one could imagine similar systems in
medical diagnosis (ontology + LLM agents) or manufacturing process planning.</p>
      <p>While our case study was in NDT for infrastructure, the architecture is largely domain-agnostic.
The Planner/ToolSelector/Forecaster pattern could be applied to any scenario where a plan needs to
be formulated using knowledge of tools and expected outcomes. For example, in IT security incident
response, one could have a planner agent (to propose actions), a tool selector (pick which software or
script to run), and a forecaster (predict next steps or schedule follow-ups), all grounded in a security
knowledge graph. The key requirement is a well-structured knowledge graph and a means to prompt
LLMs with that knowledge. We used LangChain and Neo4j, but other implementations could use
different orchestrators or graph databases. The success of our approach suggests that investing in a
good KG for a domain and then layering LLM agents on top can be a fruitful pattern.</p>
      <p>Our KG was initially built on literature; thus, its content and accuracy influence the system’s output.
We noticed that gaps in the KG (like the missing link between moisture and IR thermography in our case
study scenario) led to that method not being suggested. This highlights a knowledge engineering challenge:
ensuring the KG is comprehensive. We see two solutions: (1) continue to use LLMs to ingest more
literature and expand the KG (maybe fine-tune GPT on NDT literature for better extraction, or use other
extraction techniques), and (2) allow the system to query external sources when the KG lacks info. For
example, if the KG doesn’t have any method for a certain defect, the ToolSelectorAgent could do a quick
search (perhaps an internet search or a query on a larger knowledge base) to find candidates. However,
integrating that in a safe and controlled way is non-trivial and could reintroduce unverified info. For
now, keeping the KG curated by domain experts (with LLM assistance) might be the safer route.</p>
      <p>We opted for local LLMs (Mistral 7B) for reasons of privacy, cost, and speed. The downside is that
such models, despite being surprisingly capable, are not as knowledgeable as a massive model like
GPT-4. There might be niche domain facts or reasoning patterns a smaller model misses. We mitigate
this by providing the KG context, but we might also explore hybrid approaches. One could use a larger
model in the loop for particularly complex reasoning, or fine-tune a local model on the outputs of a
larger model in similar tasks. The field is quickly evolving, and models in 2025 may become both more
powerful and efficient, which will directly benefit systems like ours.</p>
      <p>A critical factor for deployment in the field is whether engineers and inspectors will trust and adopt
such a system. Our explainability features are designed to build trust, and initial feedback has been
positive. Users felt they could trust the system because it showed its work. However, trust also comes
from extensive validation. In the future, we would like to test the system using real inspection data and
scenarios, possibly in collaboration with infrastructure managers. A gradual introduction as a decision
support tool (not an automated decision-maker) will help—human inspectors should have the final say,
with the system serving as an intelligent assistant.</p>
      <p>One limitation of our case study evaluation is its scale. We tested a handful of scenarios with
a small expert panel. To robustly evaluate performance, more scenarios covering a broader range of
conditions (such as different defect types and multi-material structures) should be used. Additionally,
stress-testing the system’s knowledge boundaries would be beneficial: feed it a scenario it doesn’t
“expect” (such as a very novel material or method) and observe how it responds. Another limitation is
that our ForecasterAgent currently relies on fairly simple heuristic rules. A more advanced approach could involve
simulating deterioration (perhaps via physics-based models or historical data) to inform scheduling.
Integrating such models with the LLM agent would be an interesting extension (e.g., an agent could call
a small physics simulation tool to predict corrosion growth and then decide on an inspection interval).</p>
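      <p>As a rough illustration of this extension, the sketch below derives an inspection interval from a projected time to a critical section loss. It assumes a constant (linear) corrosion rate, which is a deliberate simplification and not a validated deterioration model; the function name, all parameters, the safety fraction, and the 6 to 24 month bounds are hypothetical values chosen for the example.</p>
      <preformat><![CDATA[
```python
def inspection_interval_months(current_loss_mm, critical_loss_mm, rate_mm_per_year,
                               safety_fraction=0.5, min_months=6, max_months=24):
    """Suggest an interval as a fraction of the projected time to a critical loss.

    Assumes a constant corrosion rate; all defaults are illustrative placeholders.
    """
    if rate_mm_per_year <= 0:
        # No measurable growth: fall back to the longest allowed interval.
        return max_months
    remaining_mm = max(critical_loss_mm - current_loss_mm, 0.0)
    months_to_critical = remaining_mm / rate_mm_per_year * 12
    suggested = int(months_to_critical * safety_fraction)
    # Clamp the suggestion to the allowed range.
    return max(min_months, min(max_months, suggested))


slow = inspection_interval_months(1.0, 3.0, 0.5)    # far from critical
medium = inspection_interval_months(1.0, 3.0, 1.0)  # faster growth
urgent = inspection_interval_months(2.5, 3.0, 0.5)  # close to critical
```
]]></preformat>
      <p>An agent calling such a tool would receive a number it can cite in its justification, instead of relying only on prompt-level heuristics.</p>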
      <p>In a practical deployment, our system would ideally connect with asset management databases or
BIM systems. For instance, if a bridge has a digital twin model, the system could annotate it with
the inspection plan results (e.g., mark on the 3D model where to inspect). Conversely, sensor data or
inspection reports from the field could feed back into the KG (closing a loop for continuous learning).
One idea is to use the LLM agents to parse new inspection reports and update the KG, similar to how
we parsed literature. This would keep the knowledge updated with real findings over time.</p>
      <p>While automating infrastructure inspection planning has clear benefits (consistency, possibly catching
overlooked options), we must ensure that the AI does not encourage unsafe practices. We intentionally
did not include any suggestion that the AI could replace actual inspections; it only plans them. The
human in the loop is essential. Also, using local models avoids sharing potentially sensitive
information about infrastructure vulnerabilities with third parties, an important consideration for critical
infrastructure security.</p>
      <p>To conclude the discussion, our work demonstrates a feasible path forward for intelligent
inspection planning, but it also opens many avenues (improving the KG, better models, integration, domain
expansion). It serves as a case study of how Semantic Web technologies can enhance the deployment of
LLMs in specialized domains, resulting in systems that are not just smart, but also interpretable and
integrable in enterprise workflows.</p>
    </sec>
    <sec id="sec-8">
      <title>8. Conclusion</title>
      <p>We developed a system that combines a semantic knowledge graph for non-destructive testing (NDT)
with a multi-agent framework powered by a large language model (LLM) to automate and explain
inspection planning. Our approach features specialized agents that generate inspection plans, select
NDT methods, and propose schedules, all with clear justifications.</p>
      <p>The user-friendly web interface, featuring visualizations, enhances trust and understanding among
domain experts. By ensuring interoperability through OWL/RDF and exploring further data integration,
our system positions itself within the broader Semantic Web.</p>
      <p>Future work will focus on expanding the knowledge graph, refining agents, evaluating real-world
applications, and incorporating feedback loops for continuous improvement.</p>
      <p>In essence, our LLM-driven system aims to support infrastructure maintenance by combining expert
knowledge with AI, thereby enhancing safety and resilience.</p>
      <p>During the preparation of this work, the authors used Grammarly for grammar and spelling checks and
to improve the readability of the text. After using the tool, the authors reviewed and edited
the content as needed and take full responsibility for the publication’s content.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A.</given-names>
            <surname>Valdestilhas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Bayerlein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. Moreno</given-names>
            <surname>Torres</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. A. J.</given-names>
            <surname>Zia</surname>
          </string-name>
          , T. Muth,
          <article-title>The intersection between semantic web and materials science</article-title>
          ,
          <source>Advanced Intelligent Systems</source>
          <volume>5</volume>
          (
          <year>2023</year>
          )
          <elocation-id>2300051</elocation-id>
          . URL: https://advanced.onlinelibrary.wiley.com/doi/abs/10.1002/aisy.202300051. doi:10.1002/aisy.202300051.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>G. A. J.</given-names>
            <surname>Zia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Valdestilhas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. M.</given-names>
            <surname>Torres</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kruschwitz</surname>
          </string-name>
          ,
          <article-title>Leveraging large language models for automated knowledge graphs generation in non-destructive testing</article-title>
          , in:
          <source>SeMatS 2024: Proceedings of the first international workshop on semantic materials science: Harnessing the power of semantic web technologies in materials science</source>
          , volume
          <volume>3760</volume>
          ,
          <publisher-name>RWTH Aachen</publisher-name>
          ,
          <year>2024</year>
          , pp.
          <fpage>101</fpage>
          -
          <lpage>110</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>C.</given-names>
            <surname>Shimizu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Hitzler</surname>
          </string-name>
          ,
          <article-title>Accelerating knowledge graph and ontology engineering with large language models</article-title>
          ,
          <source>Journal of Web Semantics</source>
          (
          <year>2025</year>
          )
          <fpage>100862</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>P.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , L. Huang,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , G. Wang,
          <string-name>
            <given-names>S.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <article-title>Lmkg: A large-scale and multi-source medical knowledge graph for intelligent medicine applications</article-title>
          ,
          <source>Knowledge-Based Systems</source>
          <volume>284</volume>
          (
          <year>2024</year>
          )
          <elocation-id>111323</elocation-id>
          . URL: https://www.sciencedirect.com/science/article/pii/S0950705123010717. doi:10.1016/j.knosys.2023.111323.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>A. A.</given-names>
            <surname>Khan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. T.</given-names>
            <surname>Hasan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. K.</given-names>
            <surname>Kemell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Rasku</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Abrahamsson</surname>
          </string-name>
          ,
          <article-title>Developing retrieval augmented generation (rag) based llm systems from pdfs: An experience report</article-title>
          ,
          <source>arXiv preprint arXiv:2410.15944</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>X.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Lian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Tang</surname>
          </string-name>
          , E. Chen,
          <article-title>Understanding the planning of llm agents: A survey</article-title>
          ,
          <source>arXiv preprint arXiv:2402.02716</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>J.-I.</given-names>
            <surname>Jäkel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Heinlein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I. H.</given-names>
            <surname>Morgenstern</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. P. H.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Klemt-Albert</surname>
          </string-name>
          ,
          <article-title>System-and data-integrated linking of digital 3d models of existing bridge structures with knowledge graphs of non-descriptive diagnostic methods</article-title>
          ,
          <source>Journal of Information Technology in Construction (ITcon)</source>
          <volume>30</volume>
          (
          <year>2025</year>
          )
          <fpage>603</fpage>
          -
          <lpage>630</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>E.</given-names>
            <surname>Niederleithinger</surname>
          </string-name>
          ,
          <article-title>Nde 4.0 in civil engineering</article-title>
          ,
          in:
          <source>Handbook of Nondestructive Evaluation 4.0</source>
          , Springer,
          <year>2024</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>17</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>