<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>LDAC</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Developing a RAG-Based System for Natural Language Access to Linked Building Data on Construction Sites</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Lukas Kirner</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jyrki Oraskari</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sigrid Brell-Cokcan</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Chair for Individualized Production, RWTH Aachen University</institution>
          ,
          <addr-line>52074 Aachen</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <volume>13</volume>
      <fpage>09</fpage>
      <lpage>11</lpage>
      <abstract>
        <p>Digitalizing construction site workflows requires handling heterogeneous data from various sources in a user-friendly way. Linked Building Data (LBD) offers a powerful framework for this but remains largely inaccessible to on-site users unfamiliar with Semantic Web technologies. To address this, we developed a natural language interface integrated into a 3D construction site-focused LBD viewer, enabling intuitive access to structured and unstructured data without requiring SPARQL or RDF expertise. Our system employs a locally hosted large language model (LLM) and a Retrieval-Augmented Generation (RAG) pipeline to retrieve and generate responses based on both static and real-time data. We detail the architecture, implementation, and trade-offs of three RAG design approaches and present a robust, prebuilt query pipeline tailored for dynamic construction site use cases. This work demonstrates the feasibility, limitations, and potential of LLM-enhanced access to LBD for non-expert users in real-world construction environments.</p>
      </abstract>
      <kwd-group>
        <kwd>RAG</kwd>
        <kwd>LLM</kwd>
        <kwd>Linked Building Data</kwd>
        <kwd>Construction Site Management</kwd>
        <kwd>Usability</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        The construction industry is characterized by a multitude of heterogeneous data sources, creating a
fragmented digital landscape. Interoperability across these sources is essential for effective
collaboration and for addressing environmental, social, and economic sustainability goals, especially as
construction increasingly adopts digital and robotic technologies. Linked Building Data (LBD) offers a
promising approach to improve interoperability [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. However, despite its potential and existing
implementations focused on Building Information Modeling (BIM), LBD remains largely inaccessible to
on-site users such as field engineers, equipment operators, and logistics coordinators.
      </p>
      <p>
        This is mainly due to two factors: (1) on-site users are mostly unfamiliar with Semantic Web
technologies like SPARQL and RDF, and (2) most existing tools require a technical setup, e.g., bringing
a laptop to the work area, which is unsuitable for field environments. This gap has been noted
across academic and industry settings and became particularly evident in the EConoM research project,
which investigates AI, edge computing, and 5G for construction site management. One of its key
use cases, autonomous intralogistics, relies on integrating, and letting users interact with, data about logistics
processes, building elements, materials, and real-time telemetry from mobile robots [
        <xref ref-type="bibr" rid="ref2 ref3">2, 3</xref>
        ].
      </p>
      <p>
        To improve accessibility for such use cases, we began developing a web-based software prototype.
As a foundation, we adapted the open-source three.js-based FOG demo application by Mathias Bonduel
[
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]<sup>1</sup>. The resulting viewer supports endpoint selection, loads geometry generated via the IFCtoLBD
converter [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], and provides a SPARQL interface alongside a result viewer (see Figure 1). While this
prototype demonstrated the benefits of LBD, it was of little help to on-site users because many of their
requirements were not met. These included support for real-time data integration (e.g., via MQTT or
ROS), access to non-RDF sources like time-series databases, and reduced reliance on SPARQL.
      </p>
      <p>
        Motivated by these needs and inspired by the LDAC Summer School 2024 and its “LLM+RAG+Ontologies”
repository<sup>2</sup>, we explored a Retrieval-Augmented Generation (RAG) approach powered by a local large
language model (LLM) [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. RAG enables LLMs to retrieve relevant external data based on user
input and combine it with their internal knowledge to generate context-aware responses [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], making it
well-suited for accessing both static and dynamic construction data.
      </p>
      <p>This paper presents the design and implementation of a RAG-powered LBD interface tailored for
real-world construction environments. We evaluate different RAG strategies and propose a robust
architecture based on prebuilt queries, assessing its feasibility, limitations, and potential for broader
application.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related work</title>
      <p>
        Even before LLMs and RAG techniques became widely available, natural language access to semantic
data attracted interest among researchers exploring new ways to retrieve,
analyze, and interact with construction data. In 2016, Lin et al. [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] introduced a cloud-based natural
language retrieval system for BIM, enabling non-experts to query data intuitively. Their approach used
MongoDB, MapReduce, and IFC-based keyword mapping to enhance query efficiency and semantic
understanding. LD-BIM [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], in turn, is a notable first-generation tool that enables users to
query Linked Building Data (LBD) using natural language. It can understand plain English commands
such as “Show windows” and focuses on queries that list elements of a specific type. In contrast, our
approach extends beyond BIM models. We offer a chat interface that allows users to query and create
content through conversation and to interact with devices on a construction site.
      </p>
      <p>
        Zheng et al. [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] proposed a Knowledge Graph (KG)-enhanced approach to improve LLM-based
contract risk identification, demonstrating better fact recall and domain-specific reasoning. LLMs have
also been applied to automated reporting and review in construction. Pu et al.
[
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] introduced AutoRepo, a framework that uses unmanned vehicles and multimodal large language
models to automate inspection report generation. Tested on a real-world site, it speeds up
inspections, reduces resource waste, and meets regulatory standards. Cruz-Castro et al.
[
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] examined an LLM-based system for real-time feedback on technical reports in AEC
education, addressing the challenge of time-consuming traditional evaluations.
      </p>
      <p>
        LLMs face challenges such as inaccurate responses and fabricated answers. Uhm et al. [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]
evaluated a Retrieval-Augmented Generation-based Generative Pre-trained Transformer (RAG-GPT)
model for generating detailed construction safety information. The RAG-GPT model outperformed other
models, with experts praising its contextual relevance and accuracy. Yi et al. [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] discuss using a
generative pre-trained transformer (GPT) model to organize and retrieve data from a dataset covering more than
200 wells, including various reports and data types. The model allows users to access information
and provides answers with references, addressing concerns about accuracy. Further, natural language
processing (NLP) techniques can be used to match data within a semantic model. One approach is
to create a similarity index for triple patterns, such as comments or literals. Information retrieval
then leverages the semantic similarity between a given sentence and the sought-after information.
GraphDB<sup>3</sup>, for example, uses Random Projection for this purpose in its Semantic Similarity Searches
plugin.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Requirements and Design Decisions</title>
      <p>The primary use case for the system was its deployment on the local edge server of the reference
construction site in Aachen, a full-scale construction testbed at RWTH Aachen University. This
computational core, running a large number of dockerized services, is connected to a dedicated 5G campus
network, enabling low-latency, high-throughput communication with all machines and devices on-site.</p>
      <p>
        Due to the network topology and security constraints, the RAG system had to be split into at least two
components: a user interface (UI) running on the user’s device, and backend RAG services deployed
on the edge server, where access to databases and machine APIs is provided. Several existing solutions
could be integrated into the LBD viewer with minimal modification for the UI. Since the edge server
is equipped with four Nvidia A40 GPUs (48 GB VRAM each), deploying the LLM locally via Ollama
[
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] was a natural fit. Ollama is a software tool designed to simplify hosting open-source LLMs in
containerized environments.
      </p>
      <p>However, multiple approaches to implementing RAG exist, each with distinct trade-offs. The
following section outlines three options we investigated, along with a comparative summary of their
respective strengths and limitations (see Table 1).</p>
      <sec id="sec-3-1">
        <title>3.1. Tested RAG Approaches</title>
        <p><bold>Vector Database with Data Embeddings.</bold> This method involves storing data as dense embeddings
in a vector database to capture semantic relationships. Embeddings are fixed-size, high-dimensional
vectors generated by the LLM to represent concepts or text. Similarity search over these vectors enables
retrieving the most relevant content without explicit keyword matching. While powerful for static,
document-like data, this approach performs poorly for dynamic or real-time use cases due to the need
for frequent, costly re-embedding.</p>
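        <p>The similarity search described above can be illustrated with a minimal sketch. It is not the system's implementation: the three-dimensional vectors, the document names, and the helper functions are made up for illustration, and a real deployment would use model-generated embeddings and a dedicated vector database.</p>
        <preformat><![CDATA[
```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equally sized vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


def top_k(query_vec, store, k=2):
    """Return the ids of the k stored vectors most similar to the query."""
    ranked = sorted(
        store.items(),
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [doc_id for doc_id, _ in ranked[:k]]


# Toy embeddings (hypothetical values, not produced by any model).
STORE = {
    "delivery_note": [0.9, 0.1, 0.0],
    "safety_manual": [0.1, 0.9, 0.1],
    "robot_log": [0.0, 0.2, 0.9],
}

best = top_k([1.0, 0.0, 0.0], STORE, k=1)
```
]]></preformat>
        <p>Every change to a stored document requires recomputing its vector, which is the re-embedding cost that makes this approach less attractive for frequently changing site data.</p>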
        <p><bold>Fully LLM-generated Queries.</bold> This approach leverages the LLM’s ability to dynamically create
SPARQL or SQL queries based on natural language input and an understanding of the data schema.
For example, the model can convert a user query into a valid SPARQL command by referencing an
embedded ontology. While this enables flexible and expressive querying, the generated queries often
suffer from syntax errors or semantic mismatches, especially in heterogeneous data environments.</p>
        <p><bold>Pipeline with Prebuilt Queries and APIs.</bold> This method employs a modular execution pipeline in
which the LLM selects and parameterizes prebuilt or partially prebuilt scripts. These are connected to
specific RDF stores, time-series databases, and real-time data streams. Although this approach requires
more manual setup, it ensures predictable behavior, simplifies testing and debugging, and allows for
fine-grained control over sensitive or safety-critical operations.</p>
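        <p>As a rough sketch of the third option, the LLM's output can be reduced to a structured choice (a function name plus arguments) that is checked against a registry of prebuilt functions before anything runs. The function names, the call format, and the returned strings below are hypothetical stand-ins, not the functions used in the prototype.</p>
        <preformat><![CDATA[
```python
import inspect
from typing import Callable, Dict


def list_elements(etype: str) -> str:
    # Stand-in for a prebuilt SPARQL query against the RDF store.
    return f"elements of type {etype}"


def transport_details(transport_id: str) -> str:
    # Stand-in for a prebuilt logistics lookup; int() rejects non-numeric ids.
    return f"details for transport {int(transport_id)}"


# Only functions listed here can ever be executed.
REGISTRY: Dict[str, Callable[..., str]] = {
    "list_elements": list_elements,
    "transport_details": transport_details,
}


def dispatch(call: dict) -> str:
    """Run a function chosen by the LLM, but only if it is registered
    and the supplied arguments match its signature."""
    name = call.get("function")
    if name not in REGISTRY:
        raise ValueError(f"unknown function: {name!r}")
    func = REGISTRY[name]
    args = call.get("arguments", {})
    unexpected = set(args) - set(inspect.signature(func).parameters)
    if unexpected:
        raise ValueError(f"unexpected arguments: {sorted(unexpected)}")
    return func(**args)


result = dispatch(
    {"function": "transport_details", "arguments": {"transport_id": "187005"}}
)
```
]]></preformat>
        <p>Because the model can only pick from the registry, a malformed or unexpected selection fails early with an error instead of producing an arbitrary query.</p>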
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Rationale for the Pipeline Approach with Prebuilt Queries</title>
        <p>Given the dynamic nature of construction site processes and machine data, and the architectural
constraints of the edge server infrastructure, the pipeline approach with prebuilt queries emerged as the
most suitable option. Unlike the other methods, it supports both persistent RDF datasets and near-real-time
data streams or machine control via dedicated APIs. This flexibility allows the system to bridge static
knowledge bases and dynamic sources while maintaining architectural modularity. A major strength
of this approach is its predictability: prebuilt queries and scripted logic ensure stable and repeatable
behavior. This is particularly important when user interactions do not simply retrieve data but can also
create new data entries or trigger machine actions. In such contexts, reliability and system safety take
precedence over flexibility. Although the pipeline approach requires more upfront development effort,
we considered this trade-off essential to ensure robust performance and operational security. In
contrast, while theoretically more flexible, dynamically generated queries based on embedded ontologies
proved error-prone, difficult to debug, and hard to predict in production scenarios. Regenerating
similar queries for recurring requests (e.g., highlighting specific building elements) also introduces
unnecessary computational overhead without added user value.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Implementation</title>
      <sec id="sec-4-1">
        <title>4.1. General Setup</title>
        <p>To implement a functional prototype, we designed a layered system architecture using modular,
open-source tools, specifically from Open WebUI’s pipelines toolchain<sup>4</sup>. The architecture follows a clear
separation of concerns: each layer is responsible for a specific function within the data retrieval and
interaction pipeline. This modularity improves maintainability, allows distributed deployment across
constrained edge hardware, and ensures safe execution of potentially sensitive operations such as
machine control or data creation.</p>
        <p>The architecture consists of five layers (see Figure 2):</p>
        <list list-type="bullet">
          <list-item><p>User Interface Layer: Enables user interaction through natural language queries and visualization of data. Built by embedding Open WebUI’s chat interface into a Three.js-based LBD viewer.</p></list-item>
          <list-item><p>Instruction Allocation Layer: Matches user intent with available functions and API calls. Implemented using custom Python code adapted from Open WebUI’s pipelines toolkit.</p></list-item>
          <list-item><p>Data Retrieval Layer: Fetches information from RDF stores (via SPARQL), time-series databases, and real-time sources like MQTT or ROS.</p></list-item>
          <list-item><p>Context Augmentation Layer: Enriches the user query with retrieved data to prepare a structured prompt for the LLM.</p></list-item>
          <list-item><p>Response Generation Layer: Uses a locally deployed LLM to generate responses. This layer can also interact with the viewer via MQTT to update tables or visual highlights.</p></list-item>
        </list>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. The Web Interface</title>
        <p>The existing JavaScript implementation was updated to use a newer version of the Comunica library
for more advanced SPARQL queries. Due to changes in the library’s API, this required modifying
the internal logic used to process and display result sets. We chose React/Node.js as the frontend
framework to improve modularity and maintainability, replacing global JavaScript variables with React
Context, Hooks, and Props. This change also enabled the integration of React Three Fiber for 3D
rendering.</p>
        <p>The conversational chat interface is based on Open WebUI and is embedded into the application via
an HTML iframe. The backend LLM agent communicates with the frontend through MQTT channels,
pushing updates such as SPARQL query results or structured messages. These are used to populate
tables, highlight 3D elements, or provide feedback directly in the chat window (see Figure 3). While
the modular design allows for easy extension and reconfiguration, it also introduces complexity in user
session handling and authentication. For example, ensuring consistent identity across the LBD viewer
and the embedded chat remains a challenge and is the subject of ongoing work.</p>
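        <p>The backend-to-viewer updates described above can be sketched as topic and payload pairs. The topic names and the JSON layout below are assumptions made for illustration and are not taken from the prototype; the publishing function is injected so that the sketch runs without a broker.</p>
        <preformat><![CDATA[
```python
import json

# Hypothetical topic names; the prototype's actual topics are not documented here.
HIGHLIGHT_TOPIC = "viewer/highlight"
TABLE_TOPIC = "viewer/table"


def build_viewer_messages(element_iris, rows):
    """Turn a query result into (topic, payload) pairs for the viewer."""
    messages = []
    if element_iris:
        payload = json.dumps({"elements": list(element_iris)})
        messages.append((HIGHLIGHT_TOPIC, payload))
    if rows:
        # Collect all keys so the table gets a stable column order.
        columns = sorted({key for row in rows for key in row})
        payload = json.dumps({"columns": columns, "rows": rows})
        messages.append((TABLE_TOPIC, payload))
    return messages


def push(publish, messages):
    """Send each message through a publish(topic, payload) callable."""
    for topic, payload in messages:
        publish(topic, payload)


# Usage with a list standing in for a real MQTT client.
sent = []
push(
    lambda topic, payload: sent.append((topic, payload)),
    build_viewer_messages(
        ["urn:example:window1"],
        [{"id": "window1", "type": "Window"}],
    ),
)
```
]]></preformat>
        <p>In a deployment, the publish callable would be the publish method of an MQTT client connected to the site broker.</p>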
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Linked Data Retrieval</title>
        <p>
          One of the first and simplest functionalities used to test the system was retrieving LBD data by querying
for specific building elements, using filters based on Building Element Ontology (BEO) classes or the
Building Topology Ontology (BOT). For the BEO, this was achieved by passing the variable Etype
(element type) into the Python script, which then used it in a simple SPARQL SELECT query. Since
BEO aligns closely with IFC class names, mapping the user’s query to the nearest IFC class and passing
it as Etype worked well for most elements. With multilingual models (we tested Llama
3.1 (8B and 70B) [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ] as well as Mistral NeMo 12B [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ]), this worked for many languages. To our
surprise, a visiting Japanese delegation created successful queries in Japanese. For example,
the prompt “ウィンドウタイプのコンポーネントをすべて表示する” (“Display all components of a
window type”) correctly retrieved all elements of class beo:Window.
        </p>
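        <p>A minimal sketch of how Etype could be turned into a SELECT query is given below. The function name, the validation rule, and the BEO namespace IRI are assumptions for illustration; the namespace in particular should be checked against the ontology actually loaded in the store.</p>
        <preformat><![CDATA[
```python
import re

# Assumed BEO namespace; verify against the ontology used in the RDF store.
BEO_NS = "https://pi.pauwel.be/voc/buildingelement#"

# Accept only plain class names such as "Window" or "Wall".
_LOCAL_NAME = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def build_element_query(etype: str) -> str:
    """Build a SELECT query listing all elements of one BEO class."""
    if not _LOCAL_NAME.fullmatch(etype):
        raise ValueError(f"not a valid class name: {etype!r}")
    return (
        f"PREFIX beo: <{BEO_NS}>\n"
        "SELECT ?element\n"
        f"WHERE {{ ?element a beo:{etype} . }}"
    )


query = build_element_query("Window")
```
]]></preformat>
        <p>Restricting Etype to a plain class name keeps text produced by the model from altering the structure of the query.</p>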
        <p>To ensure functions could be discovered and invoked by the Instruction Allocation layer, they have
to be clearly described. This includes what each script does, what parameters it takes, and what it
returns (see Listing 1).</p>
        <p>Listing 1: Description of the Python function that retrieves the OLCC transport details.</p>
        <preformat>
"""
Gets the detailed information of an OLCC transport for a specific transport ID.

:param id: The ID of the specific OLCC transport for which we want more information.
:return: The detailed information about the specific OLCC process as markdown.
"""
</preformat>
        <p>This description was used for a function retrieving logistics process data from the Online Logistics
Control Center (OLCC), used by Zeppelin Rental GmbH for tracking and billing of material transports.
In our observed use cases, users typically start by requesting all inbound shipments for a given day.
A follow-up query, such as “Return all data for OLCC ID 187005,” triggers the function to fetch the
shipment details. The SPARQL query used in the backend is shown in Listing 2.</p>
        <p>Listing 2: Query for the logistics data belonging to a given OLCC ID.</p>
        <preformat>
PREFIX olcc: &lt;http://w3id.org/olcc#&gt;
PREFIX xsd: &lt;http://www.w3.org/2001/XMLSchema#&gt;

DESCRIBE ?resource
WHERE {?resource olcc:buchungsId "{id}"^^xsd:integer.}
</preformat>
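        <p>One way to fill the {id} placeholder of Listing 2 safely is sketched below. The function name and the digits-only check are assumptions, not the prototype's code. Plain string replacement is used because the braces of the WHERE clause would be misread as fields by str.format.</p>
        <preformat><![CDATA[
```python
import re

OLCC_QUERY_TEMPLATE = """PREFIX olcc: <http://w3id.org/olcc#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

DESCRIBE ?resource
WHERE {?resource olcc:buchungsId "{id}"^^xsd:integer.}"""


def build_olcc_query(transport_id) -> str:
    """Insert a transport ID into the query after checking it is numeric."""
    text = str(transport_id).strip()
    if not re.fullmatch(r"[0-9]+", text):
        raise ValueError(f"OLCC ID must consist of digits only: {text!r}")
    # Replace only the placeholder; the other braces belong to SPARQL.
    return OLCC_QUERY_TEMPLATE.replace("{id}", text)


query = build_olcc_query("187005")
```
]]></preformat>
        <p>Rejecting anything other than digits means a misheard or manipulated ID produces a clear error rather than a malformed query.</p>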
        <p>The resulting RDF graph is pre-processed and returned in multiple ways. To keep the chat interface
clean, only a short message is displayed, while full tabular data is transmitted via MQTT to the viewer
and shown in the application table. An interesting feature of the response generation layer is that it
can distinguish between internal “knowledge” (e.g., JSON results not shown to the user) and final
user-facing output. This allows, for example, internal logistics metadata like olcc:ladezone (loading zone) to
remain in context and be retrieved later when the user asks follow-up questions such as “Where is the
last known position of the package?”</p>
        <p>
          More complex queries, such as prompting for scheduled construction progress (see Figure 3),
require more attention in the data retrieval layer. For our construction site data model, we can
query planned and accomplished construction progress using the Internet of Construction Process
Ontology (IoC)<sup>5</sup>, specifically the concepts ioc:Schedule and ioc:Status. However, the instances we want to
filter are modelled using xsd:dateTime literals according to ISO 8601 [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ]. Users rarely phrase dates this way
and instead refer to time using phrases like “calendar week 42” or relative expressions such as
“end of next week”.
        </p>
        <p>To address this, we implemented a lightweight solution using an instruction-tuned language model
(Mistral-Nemo-Instruct) to convert natural language date expressions into ISO 8601 format. The
function can be called directly by the instruction allocation layer or triggered as a fallback when a regex
check finds that a date parameter is not ISO-compliant. To ensure reliable parsing, we took care to
prevent the model from adding annotations or altering the structure of the output. While instruct
models are fine-tuned to follow explicit prompts, they also retain general world knowledge. For
example, prompting the model with “the day Columbus reached the New World” correctly produces the
ISO-compliant literal “1492-10-12T02:00:00Z” for use in filtering queries.</p>
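        <p>The regex check with an LLM fallback can be sketched as follows. The pattern, the function names, and the llm_convert callable are assumptions for illustration; the pattern only checks the shape of the literal, not whether the date exists. The last helper shows that some expressions, such as a calendar week, could also be resolved deterministically without a model.</p>
        <preformat><![CDATA[
```python
import re
from datetime import date, datetime, time, timezone

# Shape check only: YYYY-MM-DDThh:mm:ss followed by Z or a UTC offset.
ISO_DATETIME = re.compile(
    r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(Z|[+-]\d{2}:\d{2})"
)


def is_iso_datetime(value: str) -> bool:
    return ISO_DATETIME.fullmatch(value) is not None


def normalize_date(value: str, llm_convert) -> str:
    """Return value unchanged if it already looks like ISO 8601,
    otherwise ask the (hypothetical) LLM converter and re-check."""
    if is_iso_datetime(value):
        return value
    converted = llm_convert(value).strip()
    if not is_iso_datetime(converted):
        raise ValueError(f"could not convert {value!r} to ISO 8601")
    return converted


def start_of_calendar_week(year: int, week: int) -> str:
    """Monday 00:00 UTC of an ISO calendar week (Python 3.8+)."""
    monday = date.fromisocalendar(year, week, 1)
    moment = datetime.combine(monday, time.min, tzinfo=timezone.utc)
    return moment.strftime("%Y-%m-%dT%H:%M:%SZ")
```
]]></preformat>
        <p>Checking the model's answer with the same pattern guards against annotations or extra text slipping into the filter literal.</p>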
      </sec>
      <sec id="sec-4-4">
        <title>4.4. Generating process information</title>
        <p>
          The presented RAG approach using LBD can certainly improve data availability. More
promising, however, are the opportunities to generate data on demand in the field, e.g., for the planning and
execution of robotic processes, as there are currently no accessible methods for doing so [
          <xref ref-type="bibr" rid="ref23">23</xref>
          ]. Because the
initial effort required of the user, before any gain in efficiency, often determines whether
a new technology is adopted, the concept presented here could help make
construction robotics more attractive for construction practice. Figure 4 shows a chat interaction that
instantiates a process chain using the ioc:Process concept.
        </p>
        <p>The prompt contains the parameters of what to transport (“Paket1875-1”), where to transport it to
(“WorkStorage_5”), and which machine to use (“Innok223”). At the current stage, these parameters
must either be explicitly provided by the user or inferred from the context of previous answers. While
this approach enables flexibility, it also poses challenges to usability, as users may not always know
or remember the exact identifiers needed. We consider this a limitation of the current implementation
and plan to integrate more robust disambiguation and auto-suggestion mechanisms in future iterations.
The chat response shows that one process has been instantiated and also mentions that six child
processes have been created. These processes depend on the machine requested. The Innok Robotics Heros
223 is an autonomous ground vehicle (AGV) equipped with an automatic trailer coupling system.
Logistics processes with this AGV therefore follow a reference process scheme, which is queried first. It
consists of loading the trailer with the requested payload (1), moving the AGV to the trailer position
(2), coupling the trailer (3), transporting to the target position (4), decoupling (5), and moving the AGV
out of the target zone (6).<fn><label>5</label><p>https://internet-of-construction.github.io/IoC-Process-Ontology/</p></fn></p>
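        <p>The six-step scheme can be pictured as a parent process with six ordered children. The sketch below hard-codes the steps for illustration, whereas the prototype queries the reference scheme first; the class and function names are hypothetical, and no RDF is produced.</p>
        <preformat><![CDATA[
```python
from dataclasses import dataclass, field
from typing import List

# Steps of the reference scheme for the trailer-coupling AGV,
# hard-coded here only to keep the sketch self-contained.
REFERENCE_STEPS = [
    "load trailer with payload",
    "move AGV to trailer position",
    "couple trailer",
    "transport to target position",
    "decouple trailer",
    "move AGV out of target zone",
]


@dataclass
class Process:
    label: str
    children: List["Process"] = field(default_factory=list)


def instantiate_transport(payload: str, target: str, machine: str) -> Process:
    """Create one parent process with one child per reference step."""
    parent = Process(f"Transport {payload} to {target} with {machine}")
    parent.children = [
        Process(f"{number}. {step}")
        for number, step in enumerate(REFERENCE_STEPS, start=1)
    ]
    return parent


chain = instantiate_transport("Paket1875-1", "WorkStorage_5", "Innok223")
```
]]></preformat>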
        <p>Generating the process sequence involves several validation steps. If any check fails—such as
missing material location or unavailable target zone—the system responds with clarification requests or
error messages. We plan to extend this validation further to include capability matching and
integration with the machine scheduling system. Once the processes are instantiated, the backend services
handle path planning and execution, which can be launched immediately if the schedule permits. As
seen in Figure 4, the resulting path planning and current operation status are both formalized in RDF
and visualized in the 3D viewer. While the system currently operates with predefined process
templates, the underlying architecture could be extended to support standardized workflow definitions,
such as BPMN, and executed through workflow engines (e.g., Camunda, WSO2). However,
introducing such complexity would require additional semantic modeling efort and more granular user control
over process instantiation and lifecycle management.</p>
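        <p>The validation step can be sketched as a function that returns clarification requests instead of raising errors. The two checks mirror the examples named above (missing material location, unavailable target zone); the function name, data structures, and message wording are assumptions.</p>
        <preformat><![CDATA[
```python
def validate_transport(payload, target, material_locations, free_zones):
    """Return a list of clarification requests; an empty list means
    the process chain may be instantiated."""
    problems = []
    if payload not in material_locations:
        problems.append(
            f"No known location for '{payload}'. Where is it stored?"
        )
    if target not in free_zones:
        problems.append(
            f"Target zone '{target}' is not available. Please choose another zone."
        )
    return problems


# Hypothetical site state for illustration.
locations = {"Paket1875-1": "Ladezone_2"}
zones = {"WorkStorage_5"}

ok = validate_transport("Paket1875-1", "WorkStorage_5", locations, zones)
blocked = validate_transport("Paket9999", "WorkStorage_5", locations, zones)
```
]]></preformat>
        <p>Returning messages rather than exceptions lets the chat layer phrase a follow-up question to the user.</p>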
      </sec>
      <sec id="sec-4-5">
        <title>4.5. Digital Twin approach by integrating machines and their data</title>
        <p>As the RAG pipeline has access to a Linked Data model, including machines, their network endpoints,
associated APIs, and the 5G network infrastructure, it can be extended to retrieve machine data or
even trigger control mechanisms on the construction site with relative ease. The web app can be used,
for example, on a tablet and operated via voice using the integrated microphone and speech-to-text
interface. Command success rates depend on the strictness settings of the pipeline, which can be
adjusted by setting the model’s temperature, and on the clarity of the user’s speech. Figure 5 shows
examples of how reality is shadowed in the Digital Twin.</p>
        <p>
          For the autonomous transportation robot, the system can access onboard sensors and retrieve media
streams, such as images or video, from its cameras. Through a time-series database, users can also
query machine telemetry, such as battery status (shown at the bottom of Figure 5). In addition, live
streams of data like rotational speed, location, or the robot’s point cloud, used for obstacle detection,
can be made available. These point cloud updates (visualized as red dots) provide temporal, real-time
information not typically modeled. For example, the red line at the lower center of Figure 5 represents
the open door of a construction container. Based on previous work in the project [
          <xref ref-type="bibr" rid="ref24">24</xref>
          ], the system also
supports access to network coverage simulations for the construction site. The increased accessibility
of this information presents significant potential for improving decision-making on site (e.g., within a
Last Planner System).
        </p>
        <p>In addition to data retrieval, the system can also be used for actuation—controlling devices through
exposed APIs. A practical example is a construction lift that can move vertically. Here, voice
commands offer tangible benefits, allowing workers to control the lift remotely without dismounting from
a vehicle. While this control functionality is technically feasible, it introduces critical safety concerns.
Unlike passive data queries (e.g., requesting an onboard image), misinterpreted or imprecise control
commands could lead to unsafe actions such as activating the wrong equipment. For this reason,
safeguards must be put in place to ensure that potentially hazardous control operations are only executed
with clear user intent and proper authentication.</p>
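        <p>A minimal sketch of such a safeguard is a gate that every command passes before execution. The action names, roles, and confirmation flag are hypothetical; the checks illustrate the idea and are not a complete safety concept.</p>
        <preformat><![CDATA[
```python
from dataclasses import dataclass

# Hypothetical classification of actions and permitted roles.
HAZARDOUS = {"move_lift"}
ALLOWED_ROLES = {"move_lift": {"operator", "site_manager"}}


@dataclass
class Command:
    action: str
    user_role: str
    confirmed: bool = False  # set only after an explicit user confirmation


def authorize(cmd: Command):
    """Return (allowed, reason). Passive queries pass; hazardous actions
    need a permitted role and an explicit confirmation."""
    if cmd.action in HAZARDOUS:
        if cmd.user_role not in ALLOWED_ROLES.get(cmd.action, set()):
            return False, "role not permitted"
        if not cmd.confirmed:
            return False, "explicit confirmation required"
    return True, "ok"


decision = authorize(Command("move_lift", "operator"))
```
]]></preformat>
        <p>With this gate, a misrecognized voice command for the lift leads to a confirmation request rather than immediate movement.</p>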
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Discussion</title>
      <p>The presented RAG-based system for accessing Linked Building Data (LBD) and other construction
site information demonstrates both significant strengths and critical limitations. To evaluate its overall
impact, we outline the main strengths, challenges, opportunities, and associated risks.</p>
      <sec id="sec-5-1">
        <title>Strengths and Usability Gains</title>
        <p>A key advantage of the system is its ability to lower the entry barrier to LBD and other complex,
structured data environments for non-expert users on construction sites. Users unfamiliar with SPARQL,
RDF, or system-specific APIs can now interact with LBD models and digital twins using natural
language. This enhances usability and accessibility compared to traditional interfaces. Beyond data
retrieval, the system enables advanced functionalities such as initiating autonomous robotic processes or
accessing real-time sensor streams without requiring technical expertise. The conversational interface
contributes to user confidence by guiding interactions, requesting missing parameters, and providing
feedback instead of cryptic errors.</p>
      </sec>
      <sec id="sec-5-2">
        <title>Challenges and Limitations</title>
        <p>While the use of prebuilt queries ensures predictable and safe behavior, it also limits system flexibility.
The current approach restricts interactions to predefined templates and cannot yet accommodate
arbitrary queries or dynamically composed logic. Error handling remains a challenge, particularly when
users input vague or malformed parameters. This is especially noticeable in speech-based interaction,
where identifiers like WorkStorage_5 are prone to misinterpretation. As the system scales to include
more data types and interaction patterns, retrieval accuracy and validation become increasingly
important to ensure reliable operation and user trust.</p>
      </sec>
      <sec id="sec-5-3">
        <title>Opportunities and Future Directions</title>
        <p>Future improvements should address scalability and adaptability. They could integrate specialized
LLMs optimized for structured retrieval, improving accuracy while reducing manual setup.
Agent-based frameworks capable of refining user prompts, selecting appropriate data sources, and validating
retrieved content could further increase robustness. The ability to instantiate and monitor robotic
processes opens pathways toward adaptive scheduling, predictive diagnostics, and seamless
human-machine collaboration. These extensions would support real-time decision-making and broaden the
accessibility of robotic workflows in construction.</p>
      </sec>
      <sec id="sec-5-4">
        <title>Risks and Safety Considerations</title>
        <p>Integrating AI-based interaction into operational construction systems introduces serious safety and
security concerns. While the current implementation enforces strict limitations on executable actions,
the potential for voice or text-based commands to trigger real-world machine operations necessitates
rigorous safeguards. Misinterpretations, unauthorized access, or malicious prompts could result in
hazardous scenarios. Future iterations must implement multi-layer authentication, role-based
permissions, and validation checkpoints before executing any critical command.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion</title>
      <p>
        This work presents an approach for improving access to Linked Building Data through a
Retrieval-Augmented Generation (RAG) architecture powered by a locally hosted large language model (LLM).
The system enables natural language interaction with complex structured and real-time data, lowering
the barrier to entry for non-expert users in construction. By embedding a conversational interface into
a 3D viewer and integrating prebuilt queries, the system supports both robust information retrieval and
the generation of autonomous workflows, such as robotic material transport. While the pipeline
approach ensures stability and safety, challenges remain regarding query flexibility, input robustness, and
system scalability. Future work will focus on improving query interpretation, expanding support for
dynamic instruction allocation, and refining safety mechanisms for interacting with physical systems.
Recent developments in agent-based architectures [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ] and domain-specific language modeling offer
promising directions. This study provides a foundation for more intuitive access to LBD, setting the
stage for AI-augmented digital twins, real-time construction data augmentation, and semi-autonomous
construction site management systems.
      </p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>This work is part of the EConoM research project funded by the Federal Ministry for Digital and
Transport of Germany within the initiative InnoNT (funding number 19Ol22009F). It was supported within
the TARGET-X framework, a project funded by the Smart Networks and Services Joint Undertaking
(SNS JU) under Horizon Europe (funding number 101096614). The authors are responsible for the
content.</p>
    </sec>
    <sec id="sec-8">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used a local chatbot based on Mistral Nemo 12b and
DeepL for spell checking, translation, and paraphrasing to shorten the text. After using these tools,
the authors reviewed and edited the content as needed and take full responsibility for the publication’s
content.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>P.</given-names>
            <surname>Pauwels</surname>
          </string-name>
          ,
          <article-title>Supporting decision-making in the building life-cycle using linked building data</article-title>
          ,
          <source>Buildings</source>
          <volume>4</volume>
          (
          <year>2014</year>
          )
          <fpage>549</fpage>
          -
          <lpage>579</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>J.</given-names>
            <surname>Fottner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Clauer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Hormes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Freitag</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Beinke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Overmeyer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gottwald</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Elbert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Sarnow</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Schmidt</surname>
          </string-name>
          ,
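          <string-name>
            <given-names>K.-B.</given-names>
            <surname>Reith</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Zadek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Thomas</surname>
          </string-name>
          ,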
          <article-title>Autonomous systems in intralogistics - state of the art and future research challenges</article-title>
          ,
          <source>Logistics Research</source>
          <volume>14</volume>
          (
          <year>2021</year>
          ). doi:
          <pub-id pub-id-type="doi">10.23773/2021_2</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>J.</given-names>
            <surname>Oraskari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Kirner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zöcklein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Brell-Cokcan</surname>
          </string-name>
          ,
          <article-title>Towards human-machine collaboration in autonomous material handling on construction sites</article-title>
          ,
          <source>Human-Machine Communication</source>
          <volume>9</volume>
          (
          <year>2024</year>
          )
          <fpage>189</fpage>
          -
          <lpage>213</lpage>
          . doi:
          <pub-id pub-id-type="doi">10.30658/hmc.9.11</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>M.</given-names>
            <surname>Bonduel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Wagner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Pauwels</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Vergauwen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Klein</surname>
          </string-name>
          ,
          <article-title>Including widespread geometry formats in semantic graphs using rdf literals</article-title>
          ,
          <source>in: Proceedings of the 2019 European Conference on Computing in Construction, Chania, Greece</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>341</fpage>
          -
          <lpage>350</lpage>
          . URL: https://ec-3.org/publications/conference/paper/?id=EC32019_166. doi:
          <pub-id pub-id-type="doi">10.35490/EC3.2019.166</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>J.</given-names>
            <surname>Oraskari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bonduel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>McGlinn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Pauwels</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Priyatna</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Wagner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Kukkonen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Steyskal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lehtonen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lefrançois</surname>
          </string-name>
          ,
          <source>Ifctolbd: Ifctolbd v 2.44.0</source>
          ,
          <year>2024</year>
          . URL: https://github.com/jyrkioraskari/IFCtoLBD.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>P.</given-names>
            <surname>Lewis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Perez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Piktus</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Petroni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Karpukhin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Goyal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Küttler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lewis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.-t.</given-names>
            <surname>Yih</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Rocktäschel</surname>
          </string-name>
          , et al.,
          <article-title>Retrieval-augmented generation for knowledge-intensive nlp tasks</article-title>
          ,
          <source>Advances in Neural Information Processing Systems</source>
          <volume>33</volume>
          (
          <year>2020</year>
          )
          <fpage>9459</fpage>
          -
          <lpage>9474</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>V.</given-names>
            <surname>Trimborn</surname>
          </string-name>
          ,
          <article-title>Enhancing Project Knowledge Management in Construction: Integrating Generative AI and Large Language Models</article-title>
          , Master's thesis, TUM School of Engineering and Design, Technical University of Munich,
          <year>2023</year>
          . URL: https://mediatum.ub.tum.de/doc/1755185/document.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>J.-R.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.-Z.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-P.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.-Q.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <article-title>A natural-language-based approach to intelligent data retrieval and representation for cloud bim</article-title>
          ,
          <source>Computer-Aided Civil and Infrastructure Engineering</source>
          <volume>31</volume>
          (
          <year>2016</year>
          )
          <fpage>18</fpage>
          -
          <lpage>33</lpage>
          . doi:
          <pub-id pub-id-type="doi">10.1111/mice.12151</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>M.</given-names>
            <surname>Holten Rasmussen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Schlachter</surname>
          </string-name>
          ,
          <source>Ld-bim</source>
          , https://ld-bim.web.app/,
          <year>2023</year>
          . Accessed: 07 April 2025.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>C. M.</given-names>
            <surname>Zheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y. Q.</given-names>
            <surname>Tang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Su</surname>
          </string-name>
          ,
          <article-title>A knowledge graph modeling approach for augmenting language model-based contract risk identification</article-title>
          ,
          <source>in: Proceedings of the 2024 European Conference on Computing in Construction, European Council on Computing in Construction, Chania, Greece</source>
          ,
          <year>2024</year>
          . URL: https://ec-3.org/publications/conference/paper/?id=EC32024_178. doi:
          <pub-id pub-id-type="doi">10.35490/EC3.2024.178</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>H.</given-names>
            <surname>Pu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <article-title>Autorepo: A general framework for multimodal llm-based automated construction reporting</article-title>
          ,
          <source>Expert Systems with Applications</source>
          <volume>255</volume>
          (
          <year>2024</year>
          )
          <fpage>124601</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>L.</given-names>
            <surname>Cruz-Castro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Castelblanco</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Antonenko</surname>
          </string-name>
          ,
          <article-title>Llm-based system for technical writing real-time review in urban construction and technology</article-title>
          ,
          <source>Proceedings of 60th Annual Associated Schools</source>
          <volume>5</volume>
          (
          <year>2024</year>
          )
          <fpage>130</fpage>
          -
          <lpage>138</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>M.</given-names>
            <surname>Uhm</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ahn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Jeong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <article-title>Efficacy of retrieval augmented generation-based large language models for generating construction safety information</article-title>
          ,
          <source>SSRN Electronic Journal</source>
          (
          <year>2024</year>
          ). URL: https://ssrn.com/abstract=4819837. doi:
          <pub-id pub-id-type="doi">10.2139/ssrn.4819837</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>M.</given-names>
            <surname>Yi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Ceglinski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Ashok</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Behounek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>White</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Peroyea</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Thetford</surname>
          </string-name>
          ,
          <article-title>Applications of large language models in well construction planning and real-time operation</article-title>
          ,
          <source>in: SPE/IADC Drilling Conference and Exhibition</source>
          ,
          <publisher-name>SPE</publisher-name>
          ,
          <year>2024</year>
          , p.
          <fpage>D021S014R003</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <collab>Ollama Team</collab>
          ,
          <article-title>Ollama: Open-source large language model serving platform</article-title>
          , https://ollama.com,
          <year>2024</year>
          . Accessed: 2024-02-11.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>J.</given-names>
            <surname>Wang</surname>
          </string-name>
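          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Hanson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Li</surname>
          </string-name>
          ,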
          <string-name>
            <given-names>Y.</given-names>
            <surname>Papakonstantinou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Simhadri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Xie</surname>
          </string-name>
          ,
          <article-title>Vector databases: What's really new and what's next? (vldb 2024 panel)</article-title>
          ,
          <source>Proceedings of the VLDB Endowment</source>
          <volume>17</volume>
          (
          <year>2024</year>
          )
          <fpage>4505</fpage>
          -
          <lpage>4506</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>E.</given-names>
            <surname>Krippner</surname>
          </string-name>
          ,
          <article-title>Rethinking vector embeddings search for analytical database systems</article-title>
          , Master's thesis, Institut für Software &amp; Systems Engineering, Augsburg,
          <year>2024</year>
          . URL: https://homepages.cwi.nl/~boncz/msc/2024-ElenaKrippner.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>L.</given-names>
            <surname>Kovriguina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Teucher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Radyush</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Mouromtsev</surname>
          </string-name>
          ,
          <article-title>Sparqlgen: One-shot prompt-based approach for sparql query generation</article-title>
          ,
          <source>in: SEMANTiCS (Posters &amp; Demos)</source>
          ,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <collab>LangChain Contributors</collab>
          ,
          <article-title>Langchain: Building applications with llms</article-title>
          ,
          <year>2025</year>
          . URL: https://github.com/hwchase17/langchain, accessed: 2025-04-07.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <surname>Grattafiori</surname>
          </string-name>
          et al.,
          <article-title>The llama 3 herd of models</article-title>
          ,
          <source>arXiv preprint arXiv:2407.21783</source>
          (
          <year>2024</year>
          ). doi:
          <pub-id pub-id-type="doi">10.48550/arXiv.2407.21783</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <collab>NVIDIA and Mistral AI</collab>
          ,
          <article-title>Mistral nemo 12b base model</article-title>
          ,
          <source>Hugging Face</source>
          ,
          <year>2024</year>
          . URL: https://huggingface.co/nvidia/Mistral-NeMo-12B-Base, accessed: 2025-02-11.
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <collab>International Organization for Standardization</collab>
          ,
          <article-title>Data elements and interchange formats - information interchange - representation of dates and times</article-title>
          ,
          <year>2019</year>
          . ISO 8601-1:2019, Geneva, Switzerland.
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>J.</given-names>
            <surname>Owen</surname>
          </string-name>
          ,
          <article-title>Ai and robotics: Transforming automation and labor productivity in the construction industry</article-title>
          ,
          <source>EasyChair Preprint 15086, EasyChair</source>
          ,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>L.</given-names>
            <surname>Kirner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Oraskari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zöcklein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Möller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Brell-Cokcan</surname>
          </string-name>
          ,
          <article-title>An ontology for signal strength estimation of nomadic 5g networks on construction sites</article-title>
          ,
          <source>in: Proceedings of the 2024 European Conference on Computing in Construction, European Council on Computing in Construction, Chania, Greece</source>
          ,
          <year>2024</year>
          . URL: https://ec-3.org/publications/conference/paper/?id=EC32024_202. doi:
          <pub-id pub-id-type="doi">10.35490/EC3.2024.202</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>A.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ehtesham</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kumar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. T.</given-names>
            <surname>Khoei</surname>
          </string-name>
          ,
          <article-title>Agentic retrieval-augmented generation: A survey on agentic rag</article-title>
          ,
          <source>arXiv preprint arXiv:2501.09136</source>
          (
          <year>2025</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>