<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>O. Vyshnevskyy);</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Combined Large Language Models and Ontology Approach for Energy Consumption Analysis Software</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Oleksandr Vyshnevskyy</string-name>
          <email>oleksandr.k.vyshnevskyi@lpnu.ua</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Liubov Zhuravchak</string-name>
          <email>liubov.m.zhuravchak@lpnu.ua</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Lviv Polytechnic National University</institution>
          ,
          <addr-line>Bandery street 12, Lviv</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <volume>00</volume>
      <fpage>0</fpage>
      <lpage>0002</lpage>
      <abstract>
        <p>The paper explores the potential of leveraging Large Language Models for human interaction with a digital twin representing a building's energy consumption system. A novel method is proposed for constructing a digital representation of building energy consumption. This approach integrates a previously developed domain ontology, a graph database, and Large Language Models to enable intuitive, natural language access to energy performance data via a knowledge-driven interface. Within the scope of this research, a domain-specific knowledge base was developed using the Neo4j graph database and an ontological framework. This knowledge base encodes relationships between buildings, meters, consumption indicators, weather stations, and climate data. A prototype chatbot was implemented using the LangChain framework, an agent-based architecture, and Retrieval-Augmented Generation. The study evaluates the ability of the GPT-4o-mini model to interpret energy-related queries, leverage semantic relations within the knowledge graph, generate Cypher queries for Neo4j, execute them, and provide context-aware responses based on custom-designed agent prompts. The proposed ontology-based framework for energy analysis formalizes domain knowledge into a machine-readable structure, enhancing human understanding and interaction through natural language interfaces. This facilitates more informed planning and decision-making. The results confirm that digital twins can support system management by detecting anomalies, forecasting trends, and identifying operational issues.</p>
      </abstract>
      <kwd-group>
        <kwd>Retrieval Augmented Generation</kwd>
        <kwd>Large Language Model</kwd>
        <kwd>ontology</kwd>
        <kwd>knowledge graph</kwd>
        <kwd>energy</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Ontologies have become foundational in modeling processes across various industries,
fundamentally transforming the way physical systems are described. Through structured
representation, ontologies simplify interaction, data integration, and automation of various building
management tasks. Semantic ontologies provide a formalized structure for representing knowledge
in a machine-readable format, enabling the structured description of complex systems [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. In the field
of building energy consumption, the implementation of ontologies has changed the way processes
are modeled in building systems, ensuring a standardized, interoperable language. However, the steep
learning curve involved in developing semantic models and queries remains a significant challenge, because
the process is labor-intensive and requires specialized expertise. Semantic models simplify the
development and deployment of applications by providing a standardized structure to understand
spatial and functional relationships between equipment, allowing generalizations to detect faults or
diagnose and optimize control systems. Some researchers have utilized time series measurements for
more accurate classification of specific sensor data. These approaches work well for determining the
class of each sensor and providing information on the relationships between sensors and associated
equipment. However, much less research has been done on data-driven approaches to determining
spatial and functional relationships between equipment to provide contextual information for
control elements and analytics.
      </p>
      <p>Generative Artificial Intelligence refers to an unsupervised or partially supervised machine
learning framework that creates content using statistics derived from training on existing digital
content (e.g., text, video, images, and audio). A Large Language Model (LLM) is a statistical
model of the token distribution in a large, publicly available corpus of human-created text,
which, after training, can generate human-like language. A Generative Pre-trained Transformer
(GPT) is an LLM-based system designed to generate or statistically predict sequences of words,
code, or other data, starting from an input known as a prompt. GPT is based on the transformer architecture,
which processes large volumes of publicly available data in parallel. Recent studies have shown that
LLMs can be used to create ontologies from unstructured text, extend existing
ontologies to represent new concepts, and validate ontologies.</p>
      <p>The Retrieval-Augmented Generation (RAG) approach has proven promising for enhancing LLM
performance in domain-specific tasks. RAG combines the broad knowledge of pre-trained language
models with the ability to retrieve and integrate relevant external information. In building semantic
models, RAG offers advantages such as improved accuracy, domain adaptation, inclusion of
up-to-date knowledge, and a reduction in "hallucinations" (responses containing factually incorrect
information). However, its success depends on the relevance of the retrieved information, on
maintaining coherence, and on contextual understanding in ontologies. Intelligent chatbots based on
LLMs have gained tremendous popularity worldwide for a wide range of industrial applications,
especially since the end of 2022.</p>
      <p>
        Before the release of ChatGPT with integrated LLM, conversational chatbots were already widely
used in various sectors such as education, manufacturing, healthcare, and government services.
There are three main types of conversational chatbots: rule-based chatbots, live chatbots, and basic
AI-based intelligent chatbots. The first two types rely, respectively, on predefined rules and on
human operators who converse through chatbot software to provide customer service. The
third type, basic AI-based chatbots, facilitates communication beyond predefined commands without
human intervention, laying the foundation for advanced intelligent chatbots. LLM-based intelligent
chatbots, including ChatGPT, have an immense number of model parameters [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>
        LLMs based on GPT architecture have shown considerable promise in interpreting energy sector
tasks using input data in natural language format [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. That study explored the capabilities and
limitations of LLMs applied to the electricity sector and discussed their effectiveness in answering
general energy system queries, generating code, and analyzing data. Additionally,
through the RAG approach, LLMs can serve as knowledge bases for documentation and assist in tasks
such as training operators. The multimodal capabilities of LLMs can be beneficial for diagnosing
equipment malfunctions and remote monitoring. LLMs have demonstrated strong capabilities in
detecting correlations between objects (text, images, data), but they are still weak in solving problems
related to physics, which typically involve complex mathematical principles.
      </p>
      <p>The aim of the work is to model the building energy consumption system using a digital twin
based on an ontological approach, a graph database, and LLMs. This will increase the energy
efficiency of the building and provide an easily accessible interface for direct communication
between the user and the knowledge base.</p>
      <p>The object of the study is the energy consumption of buildings.</p>
      <p>The subject of the study is the ontological approach and the agent-based method of retrieval
augmented generation using LLMs.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Works</title>
      <p>
        To achieve the goals of limiting global warming, it is necessary to reduce energy consumption in
buildings, which account for about 30% of global greenhouse gas emissions. Evaluating energy
consumption during building design using performance simulations is crucial. However, manually
enriching missing semantic information is still a very labor-intensive process. In the study [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], a new
methodology was proposed for automatically enriching missing information using Semantic Textual
Similarity and fine-tuning of LLMs. The authors matched room types and structures with
missing thermal properties using the most semantically similar pairs in the BIM model and
corresponding databases. Three practical examples were used for fine-tuning the LLM.
Various fine-tuning strategies (using different loss functions, adding opposite pairs of words, or
domain-related abbreviations) significantly increased matching accuracy.
      </p>
      <p>
        The work [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] explores how LLMs can assist in solving problems related to the development of
semantic models, focusing on their role in query construction for semantic models, specifically using
the Brick Schema ontology. It is noted that although much effort has been made to capture the
nuances of system construction, tools that enable building managers and application developers to
efficiently create and query these models have not been properly developed. Therefore, the
lack of user-friendly tools restricts these models to users with advanced knowledge of programming and
information systems, who must also have a deep understanding of energy consumption systems,
their components, and modeling options.
      </p>
      <p>
        Query generation allows users to obtain structured information from the semantic model, which
is crucial for fault detection, diagnostics, and analytics. Studies of LLMs have shown their potential for
creating SPARQL (SPARQL Protocol and RDF Query Language) queries with minimal training, offering
greater adaptability across domains. A method called SGPT (SPARQL GPT) was introduced, which
bypasses the need for manual SPARQL creation by using LLMs to learn graph patterns and generate
queries. Based on this, the SPARQLGEN (SPARQL Generation) approach used GPT-3 in a one-shot
SPARQL generation structure, where providing relevant context significantly improved the quality
of generated queries [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
      <p>
        Existing research indicates that while the use of LLMs in building energy efficiency remains
relatively unexplored, interest in this domain is growing quickly, with a rising number of studies
emerging. This work reflects the approach developed by the authors, using modern tools, including
ontologies [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] and LLM, for creating semantic queries for digital building models.
      </p>
      <p>
        Study [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] explored how LLMs can acquire domain-specific knowledge in Heating, Ventilation, and
Air Conditioning (HVAC). The results showed that LLMs possess sufficient understanding and
expertise to help users successfully complete the ASHRAE Certified HVAC Designer exam. The
models demonstrated human-level performance in tackling complex problems, reasoning, and
learning, similar to skilled HVAC professionals. Three key knowledge properties (recall, analysis,
and application) were examined on twelve typical models. Additionally, the GPT-3.5 model passed
the exam twice out of five attempts. This demonstrated that some LLMs, such as GPT-4 and GPT-3.5,
have great potential for assisting or replacing humans in the design and operation of HVAC systems.
      </p>
      <p>
        In the work [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], the potential of generative pre-trained transformers in building energy
management based on data analysis was investigated. The potential of the GPT-4 model was explored
in three energy consumption data analysis scenarios for buildings: energy load forecasting, fault
diagnosis, and anomaly detection. An approach was proposed to assess the performance of GPT-4’s
capabilities in creating software code for energy load forecasting, device fault diagnosis, and
detecting patterns of system anomaly behavior. It was shown that GPT-4 can automatically solve
most data analysis tasks in the HVAC field.
      </p>
      <p>The work [10] presents a study of a prototype virtual university support agent, built on an LLM
model, to resolve queries from students, teachers, and staff. The integration of generative artificial
intelligence and the natural language features inherent in LLMs was explored to overcome customer
service shortcomings. As a result, the university support agent provided a viable
question-and-answer interface for students, teachers, and administrators to learn about the university’s guidelines
and policies.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Experiments</title>
      <p>The research was conducted on a laptop with the following characteristics: an 8-core AMD Ryzen 7
6800H 3.20 GHz processor and a 12-core AMD Radeon 680M 2200 MHz graphics processor. A Neo4j
graph database, PyCharm development environment, and Python programming language with
LangChain, FastAPI, and Streamlit libraries were used.</p>
      <p>
        The ontology, previously developed in our work [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], describes a range of energy and construction
terms and relationships. Ontology is a form of knowledge representation in a domain model that
makes data smarter, transfers part of the program logic, and generates new data.
      </p>
      <p>Let us consider the relationship between ontologies and graph databases. As of 2025, according to
[11], Neo4j holds the top position in popularity among graph data stores. Neo4j implements
the following ontology concepts: interoperability (defining a shared vocabulary) and logical inference
(inferencing), which derives new facts from known fragments of the vocabulary.</p>
      <sec id="sec-3-1">
        <title>3.1. Neo4j graph database and ontology</title>
        <p>Neo4j is a native graph database that implements a true graph model all the way to the storage level.
It uses the Cypher query language, designed specifically for working with graphs. Since 2007, it has
grown into a versatile platform with support for multiple programming languages and frameworks.
Its ecosystem includes the Graph Data Science library, which enables advanced analytics and
machine learning on graph data. It is available as the Neo4j AuraDB cloud service and can also be run in a Docker
container or a Kubernetes cluster. In this study, we worked with a local deployment and used Neo4j
Desktop version 1.6.1.</p>
        <p>Neo4j can load and write an ontology in RDF format. Therefore, as shown in Figure 1
(Representation of the OWL ontology in the Neo4j knowledge graph database), the ontology we
developed can be loaded into the Neo4j knowledge base using the Neosemantics (n10s) extension.
Until now, reasoning in RDF and OWL (Web Ontology Language) has been applied only to
full-fledged triple stores or dedicated reasoning engines. Neo4j can be enhanced with reasoning
capabilities for RDF data, even though it is not a triple store and does not support SPARQL directly.
Instead, it uses the Cypher language, and many SPARQL queries can be converted into Cypher for
semantic querying. From the user's perspective, Neo4j combines two technologies in one platform:
a transactional analytical graph system and an RDF/OWL reasoning mechanism capable of answering
complex semantic Cypher queries over the materialized graph in Neo4j. Studies have shown that
obtaining all derived facts (so-called materialization) using Neo4j demonstrates a much lower
reasoning time growth rate as the amount of data increases compared to other systems. Therefore,
materialization is the key to efficient real-world queries [12]. Without prior materialization, a triple
store with reasoning must temporarily fetch all answers and relevant facts for each individual query
on demand.</p>
        <p>
          According to the ontology we developed earlier [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ], the following nodes and relationships were
created in the graph database. The Customer (tenant) is connected to the Building via the HAS
relationship, the Building is connected to the Apartment via the LOCATED_AT relationship, and the
Meter is connected to the Apartment via the MEASURES relationship. A sample of the relationships
between a building, the apartments it contains, and the meters that measure their energy
consumption is shown in Figure 2. Here, a Cypher query is provided to find all meters that take
measurements for all apartments located in a building.
        </p>
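        <p>The traversal described above can be sketched as a parameterised Cypher query assembled in Python. This is a minimal illustration rather than the query from Figure 2: the node labels and relationship types come from the text, while the name property, the undirected pattern, and the function name meters_in_building_query are assumptions made for the example.</p>
        <preformat>
```python
from typing import Any, Dict, Tuple

# Labels and relationship types follow the text; the `name` property and the
# undirected match (no arrow heads) are assumptions of this sketch.
METERS_IN_BUILDING = (
    "MATCH (b:Building {name: $building_name})"
    "-[:LOCATED_AT]-(a:Apartment)-[:MEASURES]-(m:Meter) "
    "RETURN a.name AS apartment, m.name AS meter "
    "ORDER BY apartment, meter"
)


def meters_in_building_query(building_name: str) -> Tuple[str, Dict[str, Any]]:
    """Return the Cypher text and its parameter map for one building."""
    if not building_name.strip():
        raise ValueError("building_name must not be empty")
    return METERS_IN_BUILDING, {"building_name": building_name}


# "Building A" is a made-up example value.
query, params = meters_in_building_query("Building A")
print(query)
print(params)
```
        </preformat>
        <p>The returned pair could, for instance, be passed to the run method of a session from the official Neo4j Python driver. Keeping the building name in a parameter rather than in the query text avoids quoting problems and lets the database reuse the query plan.</p>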
        <p>The Meter node is connected to the Consumption node via the HAS_CONSUMPTION
relationship and to the MeterReading node via the HAS_READING relationship. Since the process of
collecting meter readings is continuous, with specific intervals (e.g., new values are read every hour),
it is important to semantically represent the sequential connection of readings: the MeterReading
node at time t is connected to the MeterReading node at time t+1 via the NEXT relationship.
Therefore, the problem of time series becomes a graph problem and can be effectively solved using
existing graph algorithms. Since consumption per unit of time is the difference between the meter
reading at time t and at time t+1, this is reflected in the semantic relationship of the readings. The
Consumption node at time t is connected to the MeterReading node at time t via the
START_READING relationship and to the MeterReading node at time t+1 via the STOP_READING
relationship. Consumption is also a sequential process, so the Consumption node at time t is
connected to the Consumption node at time t+1 via the NEXT relationship. A sample of sequential
relationships between meter readings and consumption over several consecutive hours is shown in
Figure 3.</p>
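        <p>The reading and consumption chains can be illustrated with a small in-memory sketch. It is not the loading code used in this study: the dataclasses, the epoch-second timepoints, and the helper names derive_consumption and next_edges are hypothetical, and only the START_READING, STOP_READING, and NEXT semantics are taken from the description above.</p>
        <preformat>
```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class MeterReading:
    timepoint: int   # epoch seconds (assumed representation)
    value: float     # cumulative meter value


@dataclass(frozen=True)
class Consumption:
    start: MeterReading   # plays the role of START_READING
    stop: MeterReading    # plays the role of STOP_READING
    amount: float         # stop.value - start.value


def derive_consumption(readings: List[MeterReading]) -> List[Consumption]:
    """Pair each reading with its successor, mirroring the NEXT relationship."""
    ordered = sorted(readings, key=lambda r: r.timepoint)
    return [
        Consumption(start=a, stop=b, amount=b.value - a.value)
        for a, b in zip(ordered, ordered[1:])
    ]


def next_edges(items: list) -> List[Tuple[int, int]]:
    """Index pairs (i, i + 1) standing in for NEXT relationships."""
    return [(i, i + 1) for i in range(len(items) - 1)]


readings = [
    MeterReading(0, 100.0),
    MeterReading(3600, 101.5),
    MeterReading(7200, 104.0),
]
consumption = derive_consumption(readings)
print([c.amount for c in consumption])   # [1.5, 2.5]
print(next_edges(readings))              # [(0, 1), (1, 2)]
print(next_edges(consumption))           # [(0, 1)]
```
        </preformat>
        <p>Three hourly readings yield two consumption values, and the stop reading of one consumption is the start reading of the next, which is what makes the time series traversable as a graph.</p>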
        <p>The WeatherStation node (weather forecast station) is connected to the Temperature node (ambient
temperature) by the HAS_TEMPERATURE relationship. Weather station readings arrive as a continuous
process with a specific interval (e.g., new values are received every day), so the Temperature node
at time t is connected by the NEXT relationship to the Temperature node at time t+1. A sample
of sequential connections between ambient temperature readings over several consecutive days is
shown in Figure 4. The Consumption, MeterReading, and Temperature nodes have an index on the
timepoint field to speed up search.</p>
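        <p>As an illustration of the indexing step, the following sketch generates one index statement per time-stamped label. The labels and the timepoint property come from the text; the index naming scheme and the helper name timepoint_index_statements are assumptions, and the statement form follows the CREATE INDEX syntax of recent Neo4j versions, which may differ from the deployment used here.</p>
        <preformat>
```python
from typing import List, Sequence

# Labels named in the text as carrying an indexed timepoint field.
INDEXED_LABELS = ("Consumption", "MeterReading", "Temperature")


def timepoint_index_statements(
    labels: Sequence[str] = INDEXED_LABELS, prop: str = "timepoint"
) -> List[str]:
    """Build one Cypher index statement per label."""
    statements = []
    for label in labels:
        index_name = f"{label.lower()}_{prop}"  # hypothetical naming scheme
        statements.append(
            f"CREATE INDEX {index_name} IF NOT EXISTS "
            f"FOR (n:{label}) ON (n.{prop})"
        )
    return statements


for statement in timepoint_index_statements():
    print(statement)
```
        </preformat>
        <p>With such indexes in place, range filters on timepoint (for example, selecting one season) no longer require scanning every node of a label.</p>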
        <p>Storing data in a graph structure has the following advantages:</p>
        <list list-type="order">
          <list-item>
            <p>Natural and simple modeling of real-world relationships without complex joins.</p>
          </list-item>
          <list-item>
            <p>Efficient processing of connected data via graph traversal.</p>
          </list-item>
          <list-item>
            <p>Flexible structure.</p>
          </list-item>
          <list-item>
            <p>Faster performance for complex, multi-relationship queries compared to relational databases.</p>
          </list-item>
          <list-item>
            <p>Pattern discovery: support for pattern matching queries facilitates the search for specific
structures in data.</p>
          </list-item>
        </list>
        <p>Graph databases simplify design and querying for complex data by avoiding intricate table joins.
Their flexible structure also helps LLMs generate more accurate queries with fewer errors. With
relational databases, by contrast, LLMs must handle schema knowledge and relationships between keys, which
can increase errors in generated SQL queries.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Retrieval Augmented Generation</title>
        <p>This study utilized an ontological approach to build a knowledge model for the domain of energy
consumption in buildings. An agent-based Retrieval Augmented Generation method was applied,
and a set of tools was developed to interact with the knowledge base and the end-user, using LLMs.
The diagram shown in Figure 6 illustrates the working principle of the developed chatbot application
and demonstrates how a user query in natural language is transformed into a Cypher query
understandable by the Neo4j graph database; the execution result is passed to the agent, which then
returns the final response to the user. The GPT-4o-mini model from OpenAI was used because
its low cost and latency make it efficient for such tasks.</p>
        <sec id="sec-3-2-1">
          <title>Here is a brief description of each component:</title>
          <p>LangChain agent: upon receiving a user request, the agent decides which tool to call and
what input data to provide to the tool. The agent then monitors the tool's response and
decides what to return to the user (agent response).</p>
          <p>Neo4j graph database stores structured data about buildings, apartments, residents, meters,
and consumption.</p>
          <p>LangChain Neo4j Cypher Chain tool translates user queries into Cypher, executes them
on the Neo4j database, and returns the results through a question-answering chain using the
GraphCypherQAChain class.</p>
          <p>Tool function with building name parameter is used when the LangChain agent
successfully extracts the building name from the user query. The building name is passed as
an input parameter for a Python function containing logic specific to the building. For
example, it could make a request to an external resource providing metadata about the
building, and this result is returned to the agent.</p>
          <p>Tool function with meter name parameter is used when the LangChain agent
successfully extracts the meter name from the user query. The meter name is passed as an
input parameter for a Python function that provides a consumption forecast for that specific
meter for the next 24 hours, and the result is returned to the agent.</p>
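          <p>The division of labour between the agent and its tools can be sketched in plain Python. This is a deliberately simplified stand-in and not the LangChain implementation: fixed regular expressions replace the LLM's tool-selection decision, and all function names, patterns, and returned strings are hypothetical placeholders.</p>
          <preformat>
```python
import re
from typing import Optional


def cypher_qa_tool(question: str) -> str:
    """Stand-in for the Cypher question-answering chain."""
    return f"[graph answer for: {question}]"


def building_tool(building_name: str) -> str:
    """Stand-in for building-specific logic (e.g. fetching metadata)."""
    return f"[metadata for building {building_name}]"


def meter_forecast_tool(meter_name: str) -> str:
    """Stand-in for the 24-hour consumption forecast of one meter."""
    return f"[24h forecast for meter {meter_name}]"


def extract(pattern: str, question: str) -> Optional[str]:
    """Return the first captured group, or None when nothing matches."""
    match = re.search(pattern, question, flags=re.IGNORECASE)
    return match.group(1) if match else None


def route(question: str) -> str:
    """Pick a tool with fixed rules; the real agent lets the LLM decide."""
    meter = extract(r"forecast .*meter (\w+)", question)
    if meter:
        return meter_forecast_tool(meter)
    building = extract(r"about (?:the )?building (\w+)", question)
    if building:
        return building_tool(building)
    # Everything else is treated as a question for the knowledge graph.
    return cypher_qa_tool(question)


print(route("Give me a forecast for the meter M1"))
print(route("Tell me about building B7"))
print(route("Which building is the oldest?"))
```
          </preformat>
          <p>The fallback branch reflects the role of the Cypher chain as the general-purpose tool, while the two parameterised functions are only reached when a building or meter name can be extracted from the question.</p>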
        </sec>
        <sec id="sec-3-2-2">
          <title>The open-source code for the developed building energy agent is provided in Figure 6.</title>
          <p>For communication with the agent, an API endpoint with the POST method is added. It accepts a
single parameter with the user’s question, runs the agent, and returns the result from the LLM using
one of the defined tools. FastAPI, a Python web framework, was used to build the API. Figure 7 shows
the documentation page for the developed API methods.</p>
          <p>To create a graphical web interface for interaction with the end user, the Streamlit library was
used. The deployment of the developed application was implemented using the Docker Compose
container manager with two services: one for the API and one for the graphical interface. The user
interface accepts the user’s queries and sends a POST request to the agent’s API endpoint. The user is
also provided with an explanation of how the agent generated the response. This can be used during
testing to check if the agent invoked the correct tool and provided the correct response. The result of
searching for data about buildings and meters in the developed application is shown in Figure 8,
where the user asked several questions: “Which meters were recently connected and when exactly?”,
“Which meters are connected to the building {Building Name}?”, “Which building is the oldest and
how many years old is it?”.</p>
          <p>Details of the query 'Which building is the oldest and how old is it?' can be seen in the API service
console. It is noted that the agent selected the correct tool, GraphCypherQAChain, and generated the
Cypher query shown in Figure 9, which, after execution, returned the value '34'. This value was then
passed to the agent, which formulated the response: 'The oldest building is {Building Name}, and it is
34 years old,' with the agent knowing that the unit of measurement is indeed a year, not a day or an
hour, due to the generated prompts.</p>
          <p>Let's consider the query 'Are there anomalies for the meter {MeterName}?'. In the API service
console, we can see that the agent, using the GraphCypherQAChain tool, generated the Cypher
query shown in Figure 10. This query checks if there is any consumption where the ratio of the
difference between consecutive meter readings to the time difference is greater than 10. This query to
the Neo4j knowledge base returned an empty array '[]', which was passed to the agent, which then
formulated the final response: 'No anomalies found for the meter {MeterName}'. It should be noted
that there were no predefined prompts for anomaly detection; the GraphCypherQAChain chain
automatically generated this Cypher query based on the structure of the built knowledge base.
By modifying the prompts, an instruction can be added to define an anomalous value
differently, taking into account the characteristics of the meter type. This way, different parameters will be
applied for meters of different utilities.</p>
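          <p>The rule that the generated query applies can be restated outside the database as a short Python sketch. It is an illustration of the criterion described above, not the generated Cypher from Figure 10; the tuple layout, the time unit, and the function name find_anomalies are assumptions, and the default threshold of 10 is the value mentioned in the text.</p>
          <preformat>
```python
from typing import List, Tuple

Reading = Tuple[int, float]  # (timepoint, meter value); units are assumed


def find_anomalies(
    readings: List[Reading], threshold: float = 10.0
) -> List[Tuple[int, int, float]]:
    """Flag consecutive readings whose rate of change exceeds the threshold."""
    ordered = sorted(readings)
    anomalies = []
    for (t0, v0), (t1, v1) in zip(ordered, ordered[1:]):
        elapsed = t1 - t0
        if elapsed == 0:
            continue  # duplicate timepoints carry no rate information
        rate = (v1 - v0) / elapsed
        if rate > threshold:
            anomalies.append((t0, t1, rate))
    return anomalies


# Made-up hourly readings: only the jump from 102.0 to 130.0 exceeds 10.
hourly = [(0, 100.0), (1, 102.0), (2, 130.0), (3, 131.0)]
print(find_anomalies(hourly))       # [(1, 2, 28.0)]
print(find_anomalies(hourly[:2]))   # []
```
          </preformat>
          <p>Passing a different threshold per call corresponds to the prompt-level adjustment mentioned above, where each utility type would receive its own definition of an anomalous value.</p>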
          <p>Consider the query “What is the consumption profile of the meter {MeterName} during the
summer period?”. In the API service console, we can see that the agent, using the
GraphCypherQAChain tool, generated the Cypher query shown in Figure 11. This query searches for
the meter consumption where the date falls within the range from June 1st to September 1st. This
query to the Neo4j knowledge base returned an array of values, which was passed to the agent, which
then generated the final response in the form of a list of consumption values for specific dates. It
should be noted that no prior prompts were provided to generate the consumption profile, and the
GraphCypherQAChain chain automatically generated this Cypher query based on the structure of
the constructed knowledge base, automatically filtering the data and establishing the condition for
the beginning and end of the summer period. Thanks to prompt modifications, it is possible to add
instructions to display the date in the format "2023-08-01 04:00:00" (instead of the epoch format) and
to ensure that electricity consumption is displayed in kilowatts.</p>
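          <p>The seasonal filter and the date formatting discussed above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the generated query from Figure 11: timepoints are taken to be epoch seconds in UTC, the end of the range (September 1st) is treated as exclusive, and the names summer_profile and Point are hypothetical.</p>
          <preformat>
```python
from datetime import datetime, timezone
from typing import List, Tuple

Point = Tuple[int, float]  # (epoch seconds, consumption value)


def summer_profile(points: List[Point], year: int) -> List[Tuple[str, float]]:
    """Keep points from June 1st up to (not including) September 1st."""
    start = datetime(year, 6, 1, tzinfo=timezone.utc).timestamp()
    end = datetime(year, 9, 1, tzinfo=timezone.utc).timestamp()
    rows = []
    for epoch, value in sorted(points):
        if epoch >= start and end > epoch:
            stamp = datetime.fromtimestamp(epoch, tz=timezone.utc)
            # Human-readable form instead of the raw epoch value.
            rows.append((stamp.strftime("%Y-%m-%d %H:%M:%S"), value))
    return rows


# Made-up sample points: one in August, one in December.
august = int(datetime(2023, 8, 1, 4, tzinfo=timezone.utc).timestamp())
december = int(datetime(2023, 12, 1, tzinfo=timezone.utc).timestamp())
profile = summer_profile([(december, 3.4), (august, 1.2)], 2023)
print(profile)   # [('2023-08-01 04:00:00', 1.2)]
```
          </preformat>
          <p>Only the August point survives the filter, and its timestamp is rendered in the "2023-08-01 04:00:00" style mentioned in the text rather than as an epoch value.</p>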
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Discussion</title>
      <p>According to [13], small and medium enterprises currently lack an integrated system that combines data from various
media sources into a centralized information system, which limits their ability to effectively navigate
energy sustainability initiatives. To address this gap, the authors introduced Energy
Chatbot, a system using an LLM integrated with a multi-source search add-on. This system covers
various media sources, including news, government reports, industry publications, scientific
research, and social networks. The chatbot delivers real-time, up-to-date information, helping to identify
long-term sustainability strategies and maintain a competitive advantage in the evolving energy sector. This
approach reduces costs through the use of open-source models. The authors selected Llama
models (Llama2, Llama3, and Llama3.1) developed by Meta, which are updated versions trained on
large datasets and known for their high performance compared to other open-source models.</p>
      <p>The work [14] investigates the improvement of chatbot resources for medical services using LLMs,
parameter-efficient fine-tuning, and effective information processing strategies. For the
chatbot structure development, the LangChain library and a two-dimensional data search
methodology were used to connect the chatbot with a large medical service information repository,
including document loading and chunking using vector stores. Evaluation used BLEU and ROUGE metrics
together with accuracy evaluations, customer satisfaction research, and clinical expert assessments. The
study showed promising results regarding the chatbot’s potential as an important tool for educating
patients and as an internal organizational resource. It was noted that unlocking the potential of LLMs for chatbot
development requires careful consideration of resource constraints, particularly in environments
such as Google Colab. The TinyLlama project focuses on creating compact and efficient LLMs by
pretraining models with 1.1 billion parameters. Attention mechanisms allow LLMs to selectively focus
on relevant parts of the input sequence, capturing long-term dependencies and improving context
understanding. The model stack and memory were optimized by applying quantization, disabling
caching, and thoughtful tokenizer configuration. These methods enable the development of
effective chatbots even with limited computational resources.</p>
      <p>In the work [15], the development and implementation of the GreenIFTTT (Green If This Then That)
application, based on the GPT-4 model, for creating and controlling home automation procedures was
presented. Its agent helps users understand what energy optimization procedures can be created and
applied to make their household appliances more eco-friendly, reduce overall energy consumption,
and simplify routine tasks through smart technologies. The system focuses on
creating sequences of automation procedures in the home environment, based on the sequential
execution of certain actions triggered by various conditions. The main interaction paradigm is
conversation, using LLM capabilities to help users find and control their smart devices. The system
also integrates data from connected sensors, providing users with real-time information about their
daily activities.</p>
      <p>The work [16] proposes an approach that utilizes the unique capabilities of LLM for generating
SQL code for data transformation based on prompts with domain knowledge and historical data
templates. The SQLMorpher library was developed, demonstrating the effectiveness of LLM in
complex tasks related to specific domains, emphasizing their potential to drive sustainable solutions
in energy efficiency.</p>
      <p>The potential use of retrieval-augmented generation to answer questions regarding
electricity consumption measurement in buildings, considering aspects of an energy digital twin based
on domain knowledge, was investigated in [17, 18]. The authors used ChatGPT, Gemini, and Llama
models to answer questions based on a knowledge graph concerning the building's electricity
consumption. Their knowledge graph was created in RDF (Resource Description Framework) format,
stored in a Blazegraph database, and can accept queries via the SPARQL language. The authors
compared answers generated by LLM and RAG methods using the existing digital twin based on
electricity knowledge. Their conclusions showed that the RAG approach not only reduces the
amount of incorrect information generated by LLM but also significantly improves the quality of the
result, justifying answers with verifiable data.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>The research conducted in this paper highlights the significant potential of using large language
models for human interaction with building energy consumption systems by building a digital twin
of the system based on an ontological approach.</p>
      <p>The experiments explored how LLMs can translate a natural-language question about energy
consumption into an appropriate Cypher query against a knowledge base. Because they are trained on
massive datasets, LLMs can generalize across domains without extensive domain-specific training. A
hybrid approach combining retrieval-augmented generation with the ontology improved query
generation accuracy: before generating a query, it searches for relevant information using the
semantic relationships in the schema of the constructed Neo4j knowledge base.</p>
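      <p>The retrieve-then-generate step can be illustrated with a minimal Python sketch. It is not the
implementation used in this work: the schema triples, relationship names, and function names are
hypothetical, a simple keyword overlap stands in for the semantic search over the knowledge base
schema, and the call to the LLM is omitted. The sketch only shows how schema patterns relevant to a
question could be selected and placed into a Cypher-generation prompt.</p>
      <preformat>
```python
import re

# Hypothetical schema triples (start label, relationship type, end label).
# The labels loosely follow the entities named in the paper; the
# relationship names are assumptions made for illustration only.
SCHEMA = [
    ("Building", "HAS_METER", "Meter"),
    ("Meter", "RECORDS", "ConsumptionIndicator"),
    ("Building", "NEAR", "WeatherStation"),
    ("WeatherStation", "OBSERVES", "ClimateData"),
]


def tokens(text):
    """Return the lowercase words of text, splitting CamelCase and underscores."""
    spaced = re.sub(r"([a-z])([A-Z])", r"\1 \2", text).replace("_", " ")
    return set(re.findall(r"[a-z]+", spaced.lower()))


def relevant_triples(question, schema=SCHEMA):
    """Keep the triples that share at least one word with the question."""
    q = tokens(question)
    return [t for t in schema if q.intersection(tokens(" ".join(t)))]


def build_prompt(question, schema=SCHEMA):
    """Assemble a Cypher-generation prompt from the retrieved schema patterns."""
    # Fall back to the full schema when nothing matches the question.
    triples = relevant_triples(question, schema) or schema
    lines = [f"(:{a})-[:{r}]->(:{b})" for a, r, b in triples]
    return (
        "Write one Cypher query for the question below.\n"
        "Use only these schema patterns:\n"
        + "\n".join(lines)
        + f"\nQuestion: {question}\nCypher:"
    )


if __name__ == "__main__":
    print(build_prompt("What is the consumption of each meter?"))
```
      </preformat>
      <p>Restricting the prompt to the matching patterns keeps weather-related relations out of a
question about meters, which is the intent of the prior search described above. A production system
would replace the keyword overlap with a semantic or embedding-based lookup.</p>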
      <p>The proposed ontological model for building a knowledge base about buildings proved to be an
effective tool for interacting with LLMs, because its formalized, machine-readable structure
describes the system and represents domain-specific knowledge. The interactive interface makes these
approaches more adaptive and accessible to people, since they are not limited by predefined rules or
models, which allows broader application in the buildings and energy field. LLMs reduce dependence
on specialized knowledge, and ontology-based models promote their widespread use by end users.</p>
      <p>Future research may focus on fine-tuning compact and efficient LLMs such as TinyLlama, which
would reduce costs and make it possible to run these models offline on embedded devices.</p>
    </sec>
    <sec id="sec-6">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used X-GPT-4 for grammar and spelling
checks. After using this tool, the authors reviewed and edited the content as needed and take full
responsibility for the publication’s content.</p>
      <p>[10] J. B. Ilagan, J. R. Ilagan, A prototype of a conversational virtual university support agent
powered by a large language model that addresses inquiries about policies in the student
handbook, Procedia Computer Science 239 (2024) 1124–1131. doi:10.1016/j.procs.2024.06.278.</p>
      <p>[11] Historical trend of graph DBMS popularity, DB-Engines – Knowledge Base of Relational and
NoSQL Database Management Systems. URL: https://db-engines.com/en/ranking_trend/graph+dbms.</p>
      <p>[12] T. Liebig, GraphScale: Adding expressive reasoning to semantic data stores, in: T. Liebig, V.
Vialard, M. Opitz, S. Metzl, Proceedings of the 14th International Semantic Web Conference
(ISWC 2015), vol. 1486, 2015. URL: https://ceur-ws.org/Vol-1486/paper_117.pdf</p>
      <p>[13] M. Arslan, L. Mahdjoubi, S. Munawar, Driving sustainable energy transitions with a
multisource RAG-LLM system, Energy and Buildings 324 (2024) 114827.
doi:10.1016/j.enbuild.2024.114827.</p>
      <p>[14] S. Vidivelli, M. Ramachandran, A. Dharunbalaji, Efficiency-driven custom chatbot development:
Unleashing LangChain, RAG, and performance-optimized LLM fusion, Computers, Materials &amp;
Continua (2024) 1–10. doi:10.32604/cmc.2024.054360.</p>
      <p>[15] M. Giudici et al., Designing home automation routines using an LLM-based chatbot, Designs 8
(2024) 43. doi:10.3390/designs8030043.</p>
      <p>[16] A. Sharma et al., Automatic data transformation using large language model – An experimental
study on building energy data, in: Proceedings of the 2023 IEEE International Conference on Big
Data (BigData), Sorrento, Italy, 15–18 December 2023, IEEE, 2023.
doi:10.1109/bigdata59044.2023.10386931.</p>
      <p>[17] C. Fortuna, V. Hanžel, B. Bertalanič, Natural language interaction with a household electricity
knowledge-based digital twin, in: Proceedings of the 2024 IEEE International Conference on
Communications, Control, and Computing Technologies for Smart Grids (SmartGridComm),
Oslo, Norway, 17–20 September 2024, IEEE, 2024, pp. 8–14.
doi:10.1109/smartgridcomm60555.2024.10738062.</p>
      <p>[18] C. Fortuna, V. Hanžel, B. Bertalanič, Towards data-driven electricity management: Multi-region
harmonized data and knowledge graph, 2024. URL: http://arxiv.org/abs/2405.18869.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>O.</given-names>
            <surname>Vyshnevskyy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zhuravchak</surname>
          </string-name>
          ,
          <article-title>Semantic models for buildings energy management</article-title>
          ,
          <source>in: Proceedings of the 2023 IEEE 18th International Conference on Computer Science and Information Technologies (CSIT)</source>
          , Lviv, Ukraine, 19–21 October 2023, IEEE,
          <year>2023</year>
          . doi:10.1109/csit61576.2023.10324108.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>P.</given-names>
            <surname>Jiang</surname>
          </string-name>
          et al.,
          <article-title>Preventing the immense increase in the life-cycle energy and carbon footprints of LLM-powered intelligent chatbots</article-title>
          ,
          <source>Engineering</source>
          (
          <year>2024</year>
          ). doi:10.1016/j.eng.2024.04.002.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>S.</given-names>
            <surname>Majumder</surname>
          </string-name>
          et al.,
          <article-title>Exploring the capabilities and limitations of large language models in the electric energy sector</article-title>
          ,
          <source>Joule</source>
          <volume>8</volume>
          (
          <year>2024</year>
          )
          <fpage>1544</fpage>
          -
          <lpage>1549</lpage>
          . doi:10.1016/j.joule.2024.05.009.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>K.</given-names>
            <surname>Forth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Borrmann</surname>
          </string-name>
          ,
          <article-title>Semantic enrichment for BIM-based building energy performance simulations using semantic textual similarity and fine-tuning multilingual LLM</article-title>
          ,
          <source>Journal of Building Engineering</source>
          (
          <year>2024</year>
          ) 110312. doi:10.1016/j.jobe.2024.110312.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>O. B.</given-names>
            <surname>Mulayim</surname>
          </string-name>
          et al.,
          <article-title>Large language models for the creation and use of semantic ontologies in buildings: Requirements and challenges</article-title>
          ,
          <source>in: Proceedings of the 11th ACM International Conference on Systems for Energy-Efficient Buildings, Cities, and Transportation (BuildSys '24)</source>
          , Hangzhou, China, ACM Press, New York, NY,
          <year>2024</year>
          , pp.
          <fpage>312</fpage>
          -
          <lpage>317</lpage>
          . doi:10.1145/3671127.3698792.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>C. V. S.</given-names>
            <surname>Avila</surname>
          </string-name>
          et al.,
          <article-title>Experiments with text-to-SPARQL based on ChatGPT</article-title>
          ,
          <source>in: Proceedings of the 2024 IEEE 18th International Conference on Semantic Computing (ICSC)</source>
          , Laguna Hills, CA, USA, 5–7 February 2024, IEEE,
          <year>2024</year>
          . doi:10.1109/icsc59802.2024.00050.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>O.</given-names>
            <surname>Vyshnevskyy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zhuravchak</surname>
          </string-name>
          ,
          <article-title>Machine learning methods to increase the energy efficiency of buildings</article-title>
          ,
          <source>Visnik Nacionalnogo universitetu "Lvivska politehnika". Ser. Informacijni sistemi ta merezi</source>
          <volume>14</volume>
          (
          <year>2023</year>
          )
          <fpage>189</fpage>
          -
          <lpage>209</lpage>
          . doi:10.23939/sisn2023.14.189.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>J.</given-names>
            <surname>Lu</surname>
          </string-name>
          et al.,
          <article-title>Evaluation of large language models (LLMs) on the mastery of knowledge and skills in the heating, ventilation and air conditioning (HVAC) industry</article-title>
          ,
          <source>Energy and Built Environment</source>
          (
          <year>2024</year>
          ). doi:10.1016/j.enbenv.2024.03.010.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>C.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <article-title>Generative pre-trained transformers (GPT)-based automated data mining for building energy management: Advantages, limitations and the future</article-title>
          ,
          <source>Energy and Built Environment</source>
          (
          <year>2023</year>
          ). doi:10.1016/j.enbenv.2023.06.005.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>