<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Maria Claudia Reis Cavalcanti</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Samir de Oliveira Ramos</string-name>
          <email>samir.ramos@ime.eb.br</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ronaldo Ribeiro Goldschmidt</string-name>
          <email>ronaldo.rgold@ime.eb.br</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Wallace Anacleto Pinheiro</string-name>
          <email>wallaceapinheiro@gmail.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alexandra Miguel Raibolt da Silva</string-name>
          <email>raibolt@ime.eb.br</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alex Garcia</string-name>
          <email>garcia@ime.eb.br</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Bernardo Alkmim</string-name>
          <email>balkmim@inf.puc-rio.br</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Robinson Callou</string-name>
          <email>robinson.rcmbf@gmail.com</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Edward Hermann Haeusler</string-name>
          <email>hermann@inf.puc-rio.br</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Cecília de Azevedo Castro César</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ferrucio de Franco Rosa</string-name>
          <email>ferrucio@ita.br</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>José Maria Parente de Oliveira</string-name>
          <email>parente@ita.br</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Aeronautics Institute of Technology (ITA), Praça Marechal Eduardo Gomes</institution>
          ,
          <addr-line>50, São José dos Campos - SP, 12228-900</addr-line>
          ,
          <country country="BR">Brazil</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Military Institute of Engineering (IME), Praça Gen. Tiburcio</institution>
          ,
          <addr-line>80, Rio de Janeiro - RJ, 22290-270</addr-line>
          ,
          <country country="BR">Brazil</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Pontifical Catholic University of Rio de Janeiro (PUC-Rio), Rua Marquês de São Vicente</institution>
          ,
          <addr-line>225, Rio de Janeiro -RJ, 22451-900</addr-line>
          ,
          <country country="BR">Brazil</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2026</year>
      </pub-date>
      <fpage>23</fpage>
      <lpage>37</lpage>
      <abstract>
        <p>One of the early challenges in ontology creation is developing a standardized vocabulary with clear definitions, especially when integrating concepts from various and often conflicting sources. We propose the LLM-Assisted Vocabulary Harmonization (LAVOHA) method, which leverages large language models (LLMs) to systematically analyze and reconcile concept definitions in natural language. Our approach is demonstrated in the cybersecurity domain, where we harmonized definitions from multiple established cybersecurity vocabularies. In a case study, the LAVOHA definitions were evaluated against a human consensus using criteria such as clarity, completeness, and alignment with expert understanding. The results indicate that LAVOHA produces definitions that are more consistent and comprehensive than those generated by an LLM without harmonization guidance. These findings suggest that LAVOHA can significantly enhance the quality and interoperability of ontology vocabularies in complex domains.</p>
      </abstract>
      <kwd-group>
        <kwd>LLM</kwd>
        <kwd>Ontology</kwd>
        <kwd>Vocabulary Harmonization</kwd>
        <kwd>BM25</kwd>
        <kwd>RAG</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        According to the literature on ontology engineering [
        <xref ref-type="bibr" rid="ref1 ref2 ref3">1, 2, 3</xref>
        ], when formalizing an ontology, building or
reusing a glossary of terms is required. However, existing glossaries may have problematic definitions
due to ambiguous and conflicting terminology. In the cybersecurity incident response domain, many
official standardization documents contain conflicting definitions of terms. Identifying the best definition
of a term in this context is hard, time-consuming, and requires great human effort.
      </p>
      <p>Upon identifying the definitions for the same concept, merging them is a complex task fraught with
several key problems, such as inconsistencies, name conflicts, and redundant hierarchies, to name
a few. In summary, merging different descriptions of the same concept risks either loss of detail or
overgeneralization. The process may fail to distinguish between closely related but distinct concepts,
collapsing them into a single broad concept and losing important nuances. Conversely, the merging of truly
identical concepts may be neglected, leading to unnecessary duplication.</p>
      <p>
        In our literature mapping, we identified the need for agile approaches aimed at helping humans
agree on consensual definitions. Natural language processing techniques have been used in ontology
engineering and, when effectively fine-tuned, LLMs might work as suitable assistants for ontology
construction [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
      <p>We propose the LLM-Assisted Vocabulary Harmonization (LAVOHA) method to harmonize definitions
from various conflicting sources (e.g., vocabularies and glossaries). In LAVOHA, documents are segmented into
smaller chunks and transformed into vector embeddings for efficient storage. When a query is received,
the system retrieves the most relevant chunks using a similarity function and provides them as context.</p>
      <p>When building an ontology from natural language definitions, engaging in full alignment with
existing ontologies is typically premature. At this stage, the focus should be on clarifying concepts,
normalizing terminology, and establishing initial structures, tasks that require semantic flexibility
and lack the grounding needed for reliable correspondence with formal models. Premature ontology
alignment risks distorting the intended meaning or introducing misinterpretations.</p>
      <p>To mitigate these risks, the early phase must prioritize vocabulary construction: identifying and
standardizing domain-relevant terms to ensure internal coherence. Although this process has not yet
involved mapping to external ontologies, it lays the foundation for future alignment by creating a
stable semantic base. In this sense, vocabulary construction acts as a form of pre-alignment, shaping
definitions and relationships that will later facilitate integration.</p>
      <p>For these reasons, this work does not address the ontology alignment itself but the prior step of
harmonizing vocabulary definitions. Alignment becomes feasible only once the core vocabulary and
conceptual structures have stabilized.</p>
      <p>The remainder of this paper is organized as follows. Section 2 provides theoretical foundations;
Section 3 presents a synthesis of the literature review, focusing on related work; Section 4 introduces
the LLM-Assisted Vocabulary Harmonization method; Section 5 describes a case study on applying the
proposed method and discusses the results; and Section 6 presents the conclusions.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Foundations</title>
      <sec id="sec-2-1">
        <title>2.1. Ontology Building Methodologies</title>
        <p>
          Most of the methodologies for ontology engineering found in the literature [
          <xref ref-type="bibr" rid="ref1 ref2 ref3">1, 2, 3</xref>
          ] include sub-processes,
ranging from requirements elicitation to testing. Usually, in the requirements elicitation sub-process,
they gather existing vocabularies and other related standard and reference documents. Figure 1 shows
a variation of the main sub-processes (white boxes) of the SABIO methodology [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] in BPMN notation.
The conceptualization/formalization sub-process embeds a vocabulary construction task, in which the
ontologist must define the terms that can be covered by the ontology. Note that in Figure 1 this task is
reified (gray box) and represented as a sub-process. We treated this task as a sub-process because its
complexity is greater than it first appears.
        </p>
        <p>
          According to some authors [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ], building a vocabulary, i.e. a list of terms and their corresponding
definitions, is not a trivial task and may afect the quality and consistency of the ontology under
construction. Many challenges emerge and must be solved with the support of domain experts. Some of
the main issues that should be addressed are [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]: (i) Inconsistencies – there are plenty of terminological
resources, and diferent definitions coming from these resources can be conflicting and lead to logical
contradictions; (ii) Homonyms – two diferent concepts might share the same name but refer to diferent
things, resulting in either unintended duplication and ambiguity, and should be distinguished by using
diferent terms; (iii) Synonyms - multiple terms that refer to the same concept may create confusion,
thus a preferred term must be chosen; (iv) Recursive or Circular Definitions - definitions that define a
term in terms of itself, or in terms of another that references it, can create logical inconsistencies, and
should be avoided; (v) Redundant Hierarchies – merging can introduce multiple paths between concepts
or duplicate subclass relationships, cluttering the structure and making maintenance dificult.
        </p>
        <p>Based on these issues, a set of tasks should be planned, such as reconciling definitions, checking for
cycles and inconsistencies, choosing preferred terms, disambiguating homonyms, among others.</p>
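        <p>As an illustration of one such task, checking for recursive or circular definitions can be sketched as a cycle search over a graph in which each term points to the terms mentioned in its definition. The sketch below uses a toy vocabulary of our own (not drawn from any cited source) and flags every term that is transitively defined in terms of itself:</p>

```python
def find_cycles(defs):
    """defs maps each term to the set of terms used in its definition.
    Returns the set of terms that are (transitively) defined in terms of themselves."""
    def reaches(start, current, seen):
        for ref in defs.get(current, ()):
            if ref == start:
                return True
            if ref not in seen:
                seen.add(ref)
                if reaches(start, ref, seen):
                    return True
        return False
    return {t for t in defs if reaches(t, t, set())}

# Toy vocabulary: "threat" and "attack" define each other (a cycle).
defs = {
    "threat": {"attack"},
    "attack": {"threat", "vulnerability"},
    "vulnerability": {"weakness"},
    "weakness": set(),
}
circular = find_cycles(defs)
```

        <p>In a real vocabulary, the edge sets would be extracted from the definition texts themselves; only the cycle check is shown here.</p>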
        <p>Moreover, especially in the case of the cybersecurity domain, there are many reference and terminological
documents to take into account, which turns vocabulary construction into an even more expensive
and time-consuming task. In the present work, we intend to address some of these problems in an agile
way, by proposing a vocabulary preparation and enrichment sub-process using a RAG/LLM approach.</p>
      </sec>
      <sec id="sec-2-rag">
        <title>2.2. RAG</title>
        <p>
          Based on insights derived from references [
          <xref ref-type="bibr" rid="ref10 ref11 ref7 ref8 ref9">7, 8, 9, 10, 11</xref>
          ], Retrieval-Augmented Generation (RAG) is a
technique that enhances text generation in large language models (LLMs) by integrating information
from external, private, or proprietary data sources that are reliable, up-to-date, and provide additional
context. This approach improves the accuracy and reliability of the model’s output by referencing an
external knowledge base before generating a response. Additionally, RAG promotes transparency by
linking the generated text to specific, relevant sources, offering users insight into the model’s generative
process.
        </p>
        <p>
          The RAG technology has been developing rapidly. RAG originated alongside the Transformers
framework, designed to enhance text generation by incorporating external context [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ]. Subsequently,
the technology focused on prioritizing the most relevant information to enhance LLM responses.
RAG was later utilized to assist in fine-tuning LLMs, further improving their contextual awareness
and accuracy [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ]. Thus, there are different types of RAG, which can be classified based on their
implementation, architecture, or approach to integrating external data. Among the various types of
approaches, the following stand out [
          <xref ref-type="bibr" rid="ref14 ref15 ref16">14, 15, 16</xref>
          ]:
• Naïve RAG: Documents are segmented into smaller chunks and transformed into vector
embeddings for efficient storage. When a query is received, the system retrieves the most relevant
chunks using a similarity function and provides them as context for a language model.
• Advanced RAG: Enhances the base model with sophisticated preprocessing (e.g., query
reformulation) and post-processing (e.g., document re-ranking) to optimize retrieval accuracy. It can
integrate LLMs, Large Retrieval Models (LRMs), and Small Language Models (SLMs) to generate
precise, coherent, and contextually enriched responses.
• Modular RAG: Allows individual components to be replaced or fine-tuned independently,
including the retriever (which fetches relevant data), the processor (which pre-processes information),
and the generator (which leverages an LLM or LRM to produce coherent and contextually accurate
text).
• Corrective RAG: After generating a response, the system cross-references it with trusted data
sources to identify and correct inaccuracies, ensuring greater factual accuracy and reliability.
• Speculative RAG: Typically utilizes an LLM or LRM to generate plausible responses by combining
retrieved information with pattern-based reasoning and informed assumptions.
        </p>
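        <p>The Naïve RAG retrieval step described above can be sketched as follows. This is a minimal illustration with a toy bag-of-words "embedding" and cosine similarity; the function names and sample chunks are ours, not taken from any cited implementation:</p>

```python
from collections import Counter
import math

def embed(text):
    """Toy bag-of-words 'embedding': lowercase term counts."""
    return Counter(text.lower().split())

def similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, chunks, k=2):
    """Return the k chunks most similar to the query (the Naive RAG retrieval step)."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: similarity(q, embed(c)), reverse=True)
    return ranked[:k]

# Documents are segmented into chunks and "embedded"; a query retrieves context.
chunks = [
    "An incident is a violation of security policy.",
    "An event is any observable occurrence in a system.",
    "Encryption protects data confidentiality.",
]
context = retrieve("what is a security incident", chunks, k=1)
```

        <p>In practice, dense neural embeddings and an approximate nearest-neighbor index replace the toy embedding and the linear scan used here; the retrieved chunks are then provided as context to the language model.</p>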
      </sec>
      <sec id="sec-2-2">
        <title>2.3. Information Retrieval and Ranking Functions</title>
        <p>
          Information Retrieval (IR) [
          <xref ref-type="bibr" rid="ref17 ref18">17, 18</xref>
          ] is an area concerned with the extraction of desired information from
various sources, but mainly textual content. The most used models in IR are vector space models and
probabilistic models, although there are techniques that do not involve embedding of text, such as
ranking functions.
        </p>
        <p>
          Vector space models [
          <xref ref-type="bibr" rid="ref19 ref20">19, 20</xref>
          ] aim to embed words (or even entire fragments of text) as vectors in a
high-dimensional space. Relevance or similarity is then translated into a distance between such vectors,
which can be computed in many ways, most commonly via the dot product between two vectors.
        </p>
        <p>
          Probabilistic models are based on the principle that documents in a corpus should be ranked by
decreasing probability of their relevance to a queried term - the Probabilistic Ranking Principle [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ].
Different implementations provide their own probability estimation techniques, and each domain
requires that they be adequately tailored.
        </p>
        <p>Ranking Functions are another way to retrieve information from texts, usually paired with
word embeddings. Given a set (usually called a corpus) of fragments of text (documents) and a certain
term to be queried in these documents, such a function computes a score for each document, indicating
its relevance to the queried term.</p>
        <p>
          Tf-idf [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ] is a ranking function whose name stands for a mix of term frequency and inverse document frequency.
It is, in fact, a way to take both these concepts into account when scoring documents with respect to certain
queried terms. Term frequency is the number of times the queried term appears in each document.
Inverse document frequency is the logarithm of the inverse of the frequency with which the term
appears across the documents, indicating how informative the term is for distinguishing one
document from another. These scores are then multiplied (let N be the number of documents, d a document, and t a
queried term):
        <p>TF-IDF(t, d) = tf(t, d) · log(N / df(t))</p>
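        <p>As a concrete illustration of the formula, a minimal tf-idf scorer might look like this (a sketch over a toy corpus of our own, not the implementation used in this work):</p>

```python
import math

def tf(term, doc):
    """Term frequency: number of times the term appears in the document."""
    return doc.split().count(term)

def df(term, corpus):
    """Document frequency: number of documents containing the term."""
    return sum(1 for doc in corpus if term in doc.split())

def tf_idf(term, doc, corpus):
    """TF-IDF(t, d) = tf(t, d) * log(N / df(t))."""
    n = len(corpus)
    d = df(term, corpus)
    return tf(term, doc) * math.log(n / d) if d else 0.0

corpus = [
    "incident response plan",
    "incident handling guide",
    "risk management framework",
]
score = tf_idf("incident", corpus[0], corpus)  # "incident" occurs in 2 of 3 documents
```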
        <p>
          Over decades, Tf-idf has been used in several applications (e.g., [
          <xref ref-type="bibr" rid="ref23">23</xref>
          ]), but it does have certain limitations.
It does not consider concepts such as normalization and saturation. Normalization is related to the size
of each document, i.e. if a term appears 10 times in a document that has 100 words, it should be more
relevant than appearing 11 times in 1000 words. Saturation indicates that there must be a threshold
up to which the term is still relevant in the document - the more it appears above this point, the less
important the syntactical presence of the term in the document is to its semantics.
        </p>
        <p>
          BM25 [
          <xref ref-type="bibr" rid="ref24">24</xref>
          ] stands for Best Match 25 (25 indicates that it was the 25th iteration of the refinement of this
algorithm). It is a series of improvements over Tf-idf considering concepts such as the ones described
above:
        </p>
        <p>BM25(d, q) = ∑_{t ∈ q} log(N / df(t)) · ((k1 + 1) · tf(t, d)) / (tf(t, d) + k1 · ((1 − b) + b · len(d) / avgdl))</p>
        <p>where k1 and b are free parameters, len(d) is the size of document d, and avgdl is the average document size.</p>
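        <p>The BM25 score can be computed directly from the formula. The sketch below uses the common default parameter values k1 = 1.5 and b = 0.75, which are our choice for illustration rather than values prescribed by the original algorithm:</p>

```python
import math

def bm25(doc, query, corpus, k1=1.5, b=0.75):
    """BM25 score of a document for a query, following the formula above."""
    n = len(corpus)
    avgdl = sum(len(d.split()) for d in corpus) / n
    dl = len(doc.split())
    score = 0.0
    for term in query.split():
        df = sum(1 for d in corpus if term in d.split())
        if df == 0:
            continue
        tf = doc.split().count(term)
        idf = math.log(n / df)
        # k1 controls term-frequency saturation; b controls length normalization
        score += idf * ((k1 + 1) * tf) / (tf + k1 * (1 - b + b * dl / avgdl))
    return score

corpus = [
    "incident response plan",
    "incident incident incident response",
    "risk management framework",
]
s = bm25(corpus[0], "incident", corpus)
```

        <p>Unlike Tf-idf, repeated occurrences of a term contribute less and less to the score (saturation), and documents longer than the corpus average are penalized (normalization).</p>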
        <p>
          There have been many implementations of BM25 over the years [
          <xref ref-type="bibr" rid="ref25 ref26 ref27">25, 26, 27</xref>
          ], each adjusting parameters,
considering different domains, and considering different linguistic concepts. However, most of them
perform better than plain Tf-idf in most scenarios.
        </p>
        <p>In this section, we used the terms corpus and document to define ranking functions, but they can be confusing with respect to our implementation. In the following sections, we will refer to a corpus as a document, and to the documents within it exclusively as fragments of text or sentences.
        </p>
      </sec>
      <sec id="sec-2-3">
        <title>2.4. Cybersecurity</title>
        <p>
          Cybersecurity, the domain addressed in this article, is the prevention of damage to, protection of, and
restoration of computers, electronic communications systems, electronic communications services, wire
communication and electronic communication, including information contained therein, to ensure its
availability, integrity, authentication, confidentiality, and nonrepudiation [
          <xref ref-type="bibr" rid="ref28">28</xref>
          ]. The ontological analysis
of the cybersecurity domain is highly valued, as it reveals how different initiatives model reality in
distinct ways, which can directly impact the understanding of the domain, the interoperability of security
systems, and the subsequent implementation of policies and actions. We will extract fundamental
concepts from these initiatives; therefore, we provide a brief description of each.
        </p>
        <p>Several initiatives have been proposed to guide best practices in cybersecurity, providing guidelines
for identifying, preventing, and responding to threats. Some of the most prominent cybersecurity
frameworks and standards include the MITRE strategies, NIST approaches, ISO/IEC 27001, CIS Controls,
COBIT, and OWASP.</p>
        <p>
          MITRE is an American corporation that has developed two complementary frameworks: MITRE
ATT&amp;CK and MITRE D3FEND. The philosophy of MITRE ATT&amp;CK is to move from a reactive defense
to a proactive defense based on adversary behavior [
          <xref ref-type="bibr" rid="ref29">29</xref>
          ]. Therefore, they focus on how adversaries
achieve their objectives, i.e., the tactics, techniques, and procedures (TTPs) employed in real-world
attacks. MITRE D3FEND, on the other hand, focuses on defensive countermeasures mapped onto
ATT&amp;CK offensive tactics and techniques [
          <xref ref-type="bibr" rid="ref30">30</xref>
          ]. Once one knows how adversaries attack, one can detail
how to defend against them.
        </p>
        <p>
          NIST (National Institute of Standards and Technology) produces a wide range of publications,
standards, and frameworks related to cybersecurity. The NIST Cybersecurity Framework (CSF) is a high-level
strategic guide for organizations to manage and mitigate cyber risk [
          <xref ref-type="bibr" rid="ref31">31</xref>
          ]. It is organized into six main
functions: Govern (establishing strategy, expectations, and policies), Identify (mapping assets, risks, and
vulnerabilities), Protect (implementing controls such as encryption and access management), Detect
(continuous monitoring to identify incidents), Respond (executing mitigation and communication plans),
and Recover (restoring services and improving resilience). The CSF acts as an aggregator, pointing
to other standards and norms for the implementation details. The NIST Special Publication (SP) 800
Collection is the most comprehensive collection of NIST cybersecurity guidelines and recommendations,
such as NIST 800-53, a catalog of security controls that organizations should implement [
          <xref ref-type="bibr" rid="ref28">28</xref>
          ]. This
extensive catalog provides technology-agnostic controls used by risk teams, while security operations
teams more commonly use MITRE mitigations. It is possible to map between the two to find a common
language. The glossary of terms [
          <xref ref-type="bibr" rid="ref32">32</xref>
          ] used in all technical publications is useful for ontology design.
        </p>
        <p>
          ISO/IEC 27001 [
          <xref ref-type="bibr" rid="ref33">33</xref>
          ], developed by the International Organization for Standardization (ISO) and the
International Electrotechnical Commission (IEC), is an international standard that defines requirements
for an Information Security Management System (ISMS), focusing on the confidentiality, integrity, and
availability of data. It covers risk management and security controls organized into domains (such as
policies, asset management, incident response, and certification). An organization may seek an ISO
27001 certification to demonstrate compliance with an international ISMS standard, which often covers
a large portion of the NIST CSF requirements and implements many of the controls in NIST SP 800-53.
        </p>
        <p>
          CIS Controls [
          <xref ref-type="bibr" rid="ref34">34</xref>
          ], developed by the Center for Internet Security (CIS), provide a set of practical and
prioritized controls to protect systems and data. They focus on immediate implementation, making them
accessible for organizations seeking quick and effective solutions. The CIS Controls are closely aligned
with the NIST approaches, sharing common goals and complementary structures.
        </p>
        <p>
          Control Objectives for Information and Related Technology (COBIT), developed by the Information
Systems Audit and Control Association [
          <xref ref-type="bibr" rid="ref35">35</xref>
          ], is an IT governance framework that combines cybersecurity
with organizational strategic objectives. It promotes effective technology and information management,
aligning security with business processes.
        </p>
        <p>
          OWASP (Open Web Application Security Project) [
          <xref ref-type="bibr" rid="ref36">36</xref>
          ] is an organization focused on software security,
particularly in web applications and APIs, providing guidelines and tools for developers and businesses.
Its most well-known project, the OWASP Top 10, lists the primary vulnerabilities in web applications,
such as SQL injection and authentication failures, helping prioritize mitigation actions.
        </p>
        <p>These initiatives, each with its specificities, are complementary and can be combined to create robust
cybersecurity strategies tailored to the needs and objectives of each organization. In the context of
ontologies, they provide systematic frameworks that organize, standardize, and implement security
concepts in a consistent and interoperable manner, facilitating the development and maintenance of ontologies.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Related Work</title>
      <p>
        A literature mapping was conducted by analyzing 19 selected articles, following the principles and
phases proposed by Kitchenham (2004) [
        <xref ref-type="bibr" rid="ref37">37</xref>
        ]. We explored well-known scientific databases, including
Scopus, Web of Science, IEEE Xplore, SpringerLink, and Google Scholar. Our review identified key
challenges and opportunities for using AI approaches, such as large language models (LLMs) and
retrieval-augmented generation (RAG), to support ontology development. We analyze works addressing various
aspects of the ontology development process, including studies focused specifically on vocabulary
concept harmonization. Of the related works reviewed, 12 studies focused on building or improving
ontologies using LLM during the initial phases of ontology engineering, such as specification and
conceptualization.
      </p>
      <p>
        LLMs have proven to be a promising approach for ontology learning and engineering, as they combine
efficient extraction of structured knowledge from natural text with human collaboration for refinement
and validation. Techniques for automatic discovery of taxonomic relationships and dynamic generation
of ontological components using RAG have been proposed in recent studies. Giglou et al. (2023) [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]
propose an approach that uses LLMs for Ontology Learning (OL). The authors investigate whether LLMs
can automatically extract and structure knowledge from natural language text. Nine LLM families were
evaluated for three main OL tasks: term typing, taxonomy discovery, and extraction of non-taxonomic
relations. Doumanas et al. (2024) [
        <xref ref-type="bibr" rid="ref38">38</xref>
        ] present an LLM-enhanced ontology engineering (OE) approach,
aiming to identify how OE tasks can be completed with LLM and human collaboration. LLMs are
employed to generate domain ontologies for modeling Search and Rescue (SAR) missions in wildfire
incidents. The authors analyze LLM capabilities to OE and evaluate the human-machine synergy to
represent knowledge, focusing on the SAR domain. Toro et al. (2024) [
        <xref ref-type="bibr" rid="ref39">39</xref>
        ] present an ontology generation
method employing LLM and RAG (DRAGON-AI) aiming at generating textual and logical ontology
components. According to the authors, DRAGON-AI has high precision for relationship generation,
but slightly lower precision than logic-based reasoning; evaluators with the highest level
of confidence in a domain were better able to discern flaws in AI-generated definitions. Mateiu and
Groza (2023) [
        <xref ref-type="bibr" rid="ref40">40</xref>
        ] enrich ontologies by translating Natural Language (NL) into Description Logic. A
GPT model is fine-tuned to convert NL into OWL. Pairs of NL sentences and their corresponding
translations are designed for fine-tuning. The training pairs cover aspects of ontology engineering, such
as instances, domain and range of relations, and object property relationships. The resulting axioms
were used to enrich an ontology, supervised by human experts.
      </p>
      <p>
        Abolhasani and Bran (2025) [
        <xref ref-type="bibr" rid="ref41">41</xref>
        ] propose the OntoKGen platform, which uses LLMs to extract
ontologies from technical texts and create new branches of these ontologies through user interaction
and validation to define concepts, relationships, and properties. Based on the confirmed ontology, the
platform generates KG in an automated, interactive, and adaptive manner, reducing user intervention
while allowing necessary adjustments. In future work, the authors propose to integrate the
OntoKGen-generated KGs into RAG systems, enabling dynamic data manipulation via an interface. Although
OntoKGen does not directly work with RAG systems, it represents an advance in the extraction and
creation of ontologies using LLMs.
      </p>
      <p>
        Vrolijk et al. (2023) [
        <xref ref-type="bibr" rid="ref42">42</xref>
        ] present an ontology learning system for the job market based on the ESCO
ontology, which uses LLM combined with RAG techniques to extract, classify and relate mentions of
skills and occupations from online job advertisements. The system proposes a three-layer architecture
that integrates automatic processing and human interaction to keep the ontology updated, identifying
new entities and relationships. The experiments indicate that the method improves performance in
extracting mentions, classifying relationships, discovering knowledge, and suggesting new entities for
ontology extension.
      </p>
      <p>
        Bran et al. (2025) [
        <xref ref-type="bibr" rid="ref43">43</xref>
        ] introduce the OntoRAG methodology, which combines LLM with ontologies
to enhance knowledge generation in scientific domains, where the goal is to mitigate "hallucination"
problems. The approach was tested on a benchmark in the Single Atom Catalysis (SAC) domain,
showing its effectiveness in predicting synthesis procedures. The results indicate that OntoRAG
outperforms traditional RAG methods, highlighting the potential of integrating ontologies as knowledge
representation alongside LLM models. In addition, the authors present the OntoGen tool, which allows
the automatic generation of ontologies from documents. The OntoGen process can be divided into
three stages, namely: (a) vocabulary extraction; (b) category generation; and (c) taxonomy extraction.
These stages facilitate the adaptation of the OntoRAG method when applied to new domains. Despite
the results presented, the authors note that user supervision is still needed when creating ontologies.
      </p>
      <p>
        LLMs have supported the expansion and enrichment of ontologies, demonstrating effectiveness in the
automated generation of ontological components (e.g., competency questions and RDF mappings) and
the structured extraction of knowledge from unstructured texts, with applications in diverse domains.
Yang et al. (2024) [
        <xref ref-type="bibr" rid="ref44">44</xref>
        ] propose an LLM-based ontology expansion method. LLMs are used to formulate
competency questions (CQs) and to extend the initial ontology. The authors created a knowledge graph
for breast cancer treatment. Mukanova et al. (2024) [
        <xref ref-type="bibr" rid="ref45">45</xref>
        ] propose an LLM-powered NLP method for
ontology enrichment. The authors aim to process natural language texts and extract data from the text
that matches the semantics of an ontological model. The LLM extracts data from a Web page and converts it
into lists with information relevant to an ontology. The proposed method is implemented using the
example of an ontological model that describes a geographical configuration. Val-Calvo et al. (2025) [
        <xref ref-type="bibr" rid="ref46">46</xref>
        ]
use LLMs to aid in the development of ontologies from data sets, increasing automation of ontology-based
KG generation. The authors developed an LLM method to enhance ontology engineering through
data pre-processing, ontology planning, building, and entity improvement. The proposed method can
generate mappings and RDF data, but the authors focus on ontologies.
      </p>
      <p>
        LLMs in conjunction with semi-automated approaches can support KG and ontology engineering,
e.g., formulating CQs, developing or evaluating KGs with lower human intervention, and enabling
conversational frameworks for eliciting requirements in ontologies. Kommineni et al. (2024) [
        <xref ref-type="bibr" rid="ref47">47</xref>
        ]
present an LLM-supported approach for semi-automatically building an ontology and KG. The proposed
approach involves: i) formulating competency questions (CQs); ii) developing an ontology (TBox) based
on these CQs; iii) constructing KGs using the developed ontology; and iv) evaluating the resultant
KG with minimal to no involvement of human experts. To evaluate the answers generated via RAG
and the KG concepts automatically extracted using LLMs, the authors designed a judge LLM that
rates the generated content. Zhang et al. (2024) [
        <xref ref-type="bibr" rid="ref48">48</xref>
        ] present a framework for conversational ontology
engineering (OntoChat), aimed at supporting requirement elicitation, analysis, and testing. OntoChat
aids users in creating user stories and extracting competency questions. The authors replicated the
component from users. The authors replicated the engineering of the Music Meta Ontology and collected preliminary metrics on the effectiveness of each
component from users.
      </p>
      <p>
        Our approach (LAVOHA) is focused on defining concepts from cybersecurity vocabularies and
ontologies. We harness LLMs to analyze and harmonize concept definitions in natural language and
to propose relationships among the terms of a security incident response glossary. LAVOHA supports ontologists in
the specification and conceptualization phases of the ontology engineering process. LAVOHA shares
similarities with other works, particularly in leveraging LLMs for ontology-related tasks. Unlike related
works, our method employs LLMs to process natural language and extract structured knowledge,
supporting ontology engineering. The focus on automating parts of the ontology development process
aligns with [
        <xref ref-type="bibr" rid="ref46">46</xref>
        ], [
        <xref ref-type="bibr" rid="ref38">38</xref>
        ], and [
        <xref ref-type="bibr" rid="ref47">47</xref>
        ], which also aim to reduce human effort through LLM assistance. Similarly
to [
        <xref ref-type="bibr" rid="ref44">44</xref>
        ] and [
        <xref ref-type="bibr" rid="ref47">47</xref>
        ], LAVOHA involves defining and refining concepts (in our case, cybersecurity terms)
using LLM-generated insights. The emphasis on human-AI collaboration is another shared aspect, as
seen in [
        <xref ref-type="bibr" rid="ref38">38</xref>
        ] and [
        <xref ref-type="bibr" rid="ref48">48</xref>
        ], where human expertise guides and validates the LLM outputs.
      </p>
      <p>
        Although related work (e.g. [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], [
        <xref ref-type="bibr" rid="ref45">45</xref>
        ], [
        <xref ref-type="bibr" rid="ref39">39</xref>
        ], [
        <xref ref-type="bibr" rid="ref40">40</xref>
        ]) focuses on general ontology learning or enrichment,
our approach is domain-specific, targeting cybersecurity vocabularies and incident response terminology.
Unlike [
        <xref ref-type="bibr" rid="ref39">39</xref>
        ] and [
        <xref ref-type="bibr" rid="ref40">40</xref>
        ], which use LLMs for the generation or translation of logical axioms, our method
focuses on conceptual harmonization and the proposal of relationships, supporting the early stages of the
ontology engineering process. [
        <xref ref-type="bibr" rid="ref47">47</xref>
        ] and [
        <xref ref-type="bibr" rid="ref48">48</xref>
        ] present automated evaluation mechanisms (e.g., judge LLM or
conversational interfaces), whereas our approach prioritizes ontologist-guided refinement rather than
full automation. This distinguishes our work from [
        <xref ref-type="bibr" rid="ref47">47</xref>
        ], which minimizes human involvement, and aligns
more closely with the emphasis on human-machine synergy from [
        <xref ref-type="bibr" rid="ref38">38</xref>
        ]. Finally, [
        <xref ref-type="bibr" rid="ref44">44</xref>
        ] and [
        <xref ref-type="bibr" rid="ref46">46</xref>
        ] integrate
ontologies with KGs, and our current scope is limited to glossary and ontology conceptualization,
although future extensions could explore KG integration. Our approach shares foundational
LLM-based strategies with other works, but distinguishes itself through its cybersecurity focus, conceptual
harmonization goals, and balanced human-AI collaboration. Despite the large number of studies on the
use of LLMs in supporting ontology development, the specific issue of concept harmonization has been
little explored in the literature. Table 1 presents a comparative analysis of related work.
      </p>
    </sec>
    <sec id="sec-4">
      <title>4. The LAVOHA Method</title>
      <p>This section introduces LAVOHA, a simplified version of an Advanced RAG-inspired method (see
subsection 2.2) designed to support vocabulary harmonization in the ontology creation process.</p>
      <sec id="sec-4-1">
        <title>4.1. Conceptual Description</title>
        <p>Given a term t to be incorporated into a vocabulary, together with a set Q = {q1, q2, . . . , q|Q|}2
of queried terms (i.e., words related to the definition of term t), and a corpus D = {d1, d2, . . . , d|D|}
consisting of relevant documents within the target domain, LAVOHA extracts relevant sentences from
documents in D and uses them to query an LLM for suggested definitions for t. It is important to
highlight that if t is a single-word term, then t ∈ Q, so that the algorithm will return the sentences
related to t itself. In case t is a compound name, then each word in t must be in Q. Figure 2 presents a
modular overview of the designed method.
2In this article, |X| denotes the cardinality of any arbitrary set X.</p>
        <p>Step 1 (Sentence Split) splits each document d ∈ D into sentences, yielding a Bag of Sentences S, the set
of all sentences in D. In Step 2 (Top Sentences Selection), given S, Q, and N, this step evaluates a relevance
score for each pair (q, s) ∈ Q × S. For each q ∈ Q, the top N sentences are returned, resulting in
a set T of up to |Q| × N distinct sentences. Then, for each sentence s ∈ T, Step 3 (Select Neighbor
Sentences) retrieves the M sentences that precede s and the M sentences that succeed s in S. The
output of this step is, therefore, a set V of up to |T| × (2M + 1) distinct sentences, since the original
sentence in T is also included. The next step (Step 4 - LLM Request Formulation Task) constructs P, the
prompt expression for an LLM L chosen by the user. To this end, the following strings are concatenated:
"You are a specialist in "; name-of-the-domain-of-the-application; ". "; "Define the term "; t; ", based on the
definitions stated in the following set: "; V. Note that name-of-the-domain-of-the-application, t, and V
are variables and are not strings themselves. Indeed, they contain the strings to be concatenated to
form the prompt. Finally, LAVOHA's last step (Step 5 - LLM Engine Execution) consists of the execution
of L's engine, given the prompt P. Its output is the definition for t suggested by L.</p>
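        <p>The prompt-formulation step (Step 4) amounts to a plain string concatenation of the fixed prompt fragments with the variable parts (the domain name, the term, and the retrieved sentence set). The sketch below illustrates this; the function and variable names are ours, not taken from the LAVOHA implementation.</p>
        <preformat>
```python
def build_prompt(domain: str, term: str, sentences: list[str]) -> str:
    """Concatenate the fixed prompt strings of Step 4 with the variable
    parts: the application domain, the term, and the retrieved sentences."""
    sentence_set = "{" + "; ".join(sentences) + "}"
    return (
        "You are a specialist in " + domain + ". "
        + "Define the term " + term
        + ", based on the definitions stated in the following set: "
        + sentence_set
    )

# Illustrative call with two made-up retrieved sentences.
prompt = build_prompt(
    "cybersecurity",
    "attack",
    ["An attack is an attempt to destroy or damage an asset.",
     "Attacks exploit vulnerabilities to gain access."],
)
```
        </preformat>
        <p>The resulting string is then submitted verbatim to the chosen LLM engine in Step 5.</p>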
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Implementation Details</title>
        <p>We provide further details on the implementation of the most challenging steps of the LAVOHA method.
The LAVOHA implementation artifacts are available in the GitHub repository3. The system was
developed in Python (version 3.12), due to its vast availability of useful packages and documentation
online.</p>
        <p>Before Sentence Split, the PDF documents are converted to text files. This process is done only once
per document, as a pre-processing step, since it is highly time-costly. Whenever a new document is
added to the pool, it is quickly converted to a text file. To convert PDF files to text, we make use of
PyMuPDF4, version 1.25.5.</p>
        <sec id="sec-4-2-1">
          <title>Step 1 (Sentence Split)</title>
          <p>In order to properly break the textual content of each file into sentences, we make use of the nltk5 NLP package,
version 3.9.1, providing the appropriate language identifier depending on the document ("english"
for documents in English, and "portuguese" for documents in Portuguese) in order to find and properly ignore the
correct stopwords. The results presented in section 5 concern only the English version of the documents.</p>
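          <p>The actual implementation calls nltk's sentence tokenizer with the language identifier; since the behavior is easy to mimic, the self-contained sketch below uses a simplified regular-expression stand-in (our own, not the nltk code) to show what this step produces: a bag of sentences per document.</p>
          <preformat>
```python
import re

def split_sentences(text: str) -> list[str]:
    # Simplified stand-in for nltk.tokenize.sent_tokenize(text, language="english"):
    # break after '.', '!' or '?' when followed by whitespace and a capital letter.
    parts = re.split(r"(?<=[.!?])\s+(?=[A-Z])", text.strip())
    return [p for p in parts if p]

doc = ("An incident is an adverse event. "
       "It may violate security policies. See ISO 27035.")
bag = split_sentences(doc)  # the Bag of Sentences for this document
```
          </preformat>
          <p>Unlike this sketch, the nltk tokenizer also handles abbreviations and language-specific punctuation, which is why it is used in the implementation.</p>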
        </sec>
        <sec id="sec-4-2-2">
          <title>Step 2 (Top Sentences Selection)</title>
          <p>The selection of the best sentences was made using the BM25 algorithm. Our choice of BM25
implementation is the rank-bm25 package6, version 0.2.2.</p>
          <p>The definitions of the BM25 parameters for each term are in a module (queries.py), containing
a dictionary with information pertaining to the BM25 parameters for the 37 terms initially assigned
to the glossary. This module centralizes the BM25 parameters and makes it easier to change and adapt them
should the result of a BM25 call be unsatisfactory; this happens mostly in situations where not enough
relevant sentences are selected, or when many irrelevant sentences are selected by the algorithm. The
number of sentences retrieved in this module can be set in a property file.</p>
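          <p>To show what the rank-bm25 call computes, the self-contained sketch below reimplements the standard BM25 (Okapi) scoring over tokenized sentences; the function, parameter defaults, and example data are ours, not taken from the LAVOHA code or the rank-bm25 module.</p>
          <preformat>
```python
import math
from collections import Counter

def bm25_scores(query, sentences, k1=1.5, b=0.75):
    """Score each tokenized sentence against the query terms with the
    standard BM25 formula (k1 and b are the usual free parameters)."""
    n = len(sentences)
    avgdl = sum(len(s) for s in sentences) / n  # average sentence length
    df = Counter()                              # document frequency per term
    for s in sentences:
        df.update(set(s))
    scores = []
    for s in sentences:
        tf = Counter(s)                         # term frequency in this sentence
        score = 0.0
        for t in query:
            idf = math.log((n - df[t] + 0.5) / (df[t] + 0.5) + 1)
            denom = tf[t] + k1 * (1 - b + b * len(s) / avgdl)
            score += idf * tf[t] * (k1 + 1) / denom
        scores.append(score)
    return scores

sentences = [
    "an attack is a malicious attempt to damage an asset".split(),
    "the weather report for tomorrow is sunny".split(),
    "campaigns group intrusion activities over a period".split(),
]
scores = bm25_scores(["attack", "malicious", "damage"], sentences)
top = max(range(len(scores)), key=scores.__getitem__)  # index of the best sentence
```
          </preformat>
          <p>In LAVOHA, such scores are computed per query term and the top N sentences are retained, as described in Step 2.</p>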
        </sec>
        <sec id="sec-4-2-3">
          <title>Step 3 (Neighbor Sentences Selection)</title>
          <p>The number of neighbors recovered in this step can be adjusted via a property file. In the experiment
reported in the present paper, the number of neighbors was set to 0 (no neighbors), since only in special
cases does the context surrounding the selected sentences provide useful information. This evaluation
should be conducted individually for each case.</p>
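          <p>Step 3 reduces to a window slice over the ordered bag of sentences; with M = 0, as in the reported experiment, only the selected sentence itself is returned. A minimal sketch (names are illustrative):</p>
          <preformat>
```python
def with_neighbors(sentences, index, m):
    """Return the selected sentence plus its m preceding and m succeeding
    neighbors in document order, clipped at the document boundaries."""
    lo = max(0, index - m)
    hi = min(len(sentences), index + m + 1)
    return sentences[lo:hi]

bag = ["s0", "s1", "s2", "s3", "s4"]
```
          </preformat>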
        </sec>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Case Study</title>
      <p>We are developing an ontology for the cybersecurity domain, following a 4-phase methodology:
specification, conceptualization, formalization, and implementation. In this methodology, the glossary
of terms should be drafted preliminarily during specification and finalized during conceptualization.
During the conceptualization phase, we faced the difficulty of reconciling different definitions of terms
originating from documents assigned as knowledge sources. At this point, we created the LAVOHA
method to support this task. This section presents the results of the LAVOHA method applied to this
difficulty.</p>
      <sec id="sec-5-1">
        <title>5.1. Experiment Configuration</title>
        <p>As the document corpus D for the cybersecurity domain, we used the following set of documents:
• ATTACK_Design_and_Philosophy_March_2020.pdf
• getting-started-with-attack-october-2019.pdf
• mitre_TTPs.pdf
• NBRISO-IEC 27035.pdf
4https://pypi.org/project/PyMuPDF/
5https://www.nltk.org/
6https://pypi.org/project/rank-bm25/
The following list shows each term t used in the experiment, together with its set of related words
(queries) Q. It is worth recalling that the terms and their respective related words, defined by the user,
are precisely the concepts that require harmonization, since the meanings may vary in each document.
• Attack: "attack", "attempt", "access", "damage", "interrupt", "malicious", "degrade", "destroy"
• Attack vector: "vector", "attack", "method", "methods", "technique", "techniques", "access"
• Campaign: "campaign", "grouping", "activities", "intrusion", "period", "targets", "objectives"
• Damage: "damage", "effect", "event", "incident", "occurrence", "loss"
• Event: "event", "occurrence", "observable", "indication", "incident", "suspicion", "adverse"
• Incident: "incident", "occurrence", "risk", "confidentiality", "integrity", "availability", "information",
"violation", "threat", "policies"
• Information Asset: "asset", "information", "value", "person", "organization", "organisation",
"medium", "resource", "critical"
The configuration properties used in the experiment were the following:
• N = 6. Step 2 selects the top six sentences.
• M = 0. Step 3 adds no neighbors.</p>
        <p>• L ∈ {GPT-4o, DeepSeek-V3}. In Step 5, we used both GPT-4o and DeepSeek-V3.</p>
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Results</title>
        <p>To assess whether LAVOHA indeed improved the LLM output, we generated the output (reconciled
definitions) in three different ways:
• Using only the LLM, without LAVOHA;
• Using the LLM with LAVOHA;
• Through human discussions until reaching a consensual definition.</p>
        <p>
          The human consensual definition was considered the appropriate response, and it was later compared
to the first two definitions to evaluate whether the use of LAVOHA improved the LLM response for
this task. To achieve this, we calculated each sentence embedding as the average of its word vectors
and computed the cosine similarity between the responses generated by the LLMs and the consensus definition. The
definition embeddings were computed with the pre-trained FastText library [
          <xref ref-type="bibr" rid="ref49 ref50">49, 50</xref>
          ].
        </p>
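        <p>This similarity measure is straightforward to reproduce: average the word vectors of each definition and compare the averages by cosine similarity. In the sketch below, the toy 3-dimensional vectors are illustrative stand-ins for the 300-dimensional pre-trained FastText embeddings used in the evaluation.</p>
        <preformat>
```python
import math

def sentence_embedding(tokens, word_vectors):
    """Embed a sentence as the average of its tokens' word vectors."""
    dim = len(next(iter(word_vectors.values())))
    total = [0.0] * dim
    for t in tokens:
        for i, x in enumerate(word_vectors[t]):
            total[i] += x
    return [x / len(tokens) for x in total]

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# Toy vectors standing in for the pre-trained FastText embeddings.
vectors = {
    "attack": [0.9, 0.1, 0.0],
    "malicious": [0.8, 0.2, 0.1],
    "attempt": [0.7, 0.3, 0.2],
    "sunny": [0.0, 0.1, 0.9],
}
llm_def = sentence_embedding(["attack", "malicious"], vectors)
consensus = sentence_embedding(["malicious", "attempt"], vectors)
unrelated = sentence_embedding(["sunny"], vectors)
```
        </preformat>
        <p>A definition closer in meaning to the consensus yields a higher cosine similarity, which is the quantity averaged over the seven terms in the comparison.</p>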
        <p>Table 2 shows the mean value and standard deviation of the cosine similarity over the seven terms'
definitions. We note that with the use of LAVOHA, the quality of the GPT-4o output improved from a 0.9261
mean similarity with the consensus definition to 0.9317, whereas DeepSeek improved from 0.9279 to
0.9443. Therefore, the DeepSeek model responded better with the help of LAVOHA. This result may be
due to DeepSeek having a smaller training set than GPT-4o.</p>
        <p>We performed the automated evaluation because it is difficult to analyze manually the nuances that
differentiate the two definitions. Still, it may be useful to present definitions produced with and without
LAVOHA to illustrate the quality improvement achieved by the method. As an example, we list two
definitions of attack: the first was generated by DeepSeek-V3 alone, and the second by DeepSeek-V3
assisted by LAVOHA:
1. A deliberate, malicious attempt by an individual, group, or system to exploit vulnerabilities
in a computer system, network, or digital infrastructure with the intent to: - Compromise
confidentiality, integrity, or availability (CIA triad) of data or services. - Gain unauthorized access,
disrupt operations, steal information, or cause harm. - Deploy malware, execute code, manipulate
systems, or escalate privileges. Attacks can be active (directly altering or damaging systems) or
passive (eavesdropping without modification). Common types include phishing, ransomware,
DDoS, SQL injection, and zero-day exploits.
2. A deliberate, malicious attempt by an adversary to compromise, disrupt, or destroy the
confidentiality, integrity, or availability of systems, networks, or data. Attacks may employ various tactics,
techniques, and procedures (TTPs), such as exploiting vulnerabilities, deploying malware, or
leveraging social engineering, to achieve objectives like unauthorized access, data theft, service
disruption (e.g., Denial of Service), or system destruction. These actions can target technical
infrastructure (e.g., endpoints, cloud resources) or human elements (e.g., phishing), and often
mimic normal activity to evade detection.</p>
        <p>The second definition is cleaner and more precise. In the first sentence, it defines attack; in the
second, it demonstrates how attacks may be performed; and in the third sentence, it enumerates the
possible targets of an attack. Furthermore, it uses the verb "target", an important predicate, because it
characterizes the relation between a typical attack and the attacked assets, suggesting a possible triple
modeling (Attack, targets, Asset). Then it identifies "technical infrastructure" as a target, a general term
that encompasses the vulnerable entities listed in the first definition. It also includes "human elements",
which is a conceptual gap in the first definition.</p>
        <p>The other six term definitions generated by DeepSeek and the seven term definitions generated by
GPT-4o presented similar improvements with the aid of LAVOHA. Furthermore, DeepSeek's answers for the
seven terms without LAVOHA seem to vary little, as if reading from the same source. For instance,
DeepSeek mentions "confidentiality, integrity, or availability (CIA triad)" in 4 of the 7 definitions without
LAVOHA. However, when assisted by LAVOHA, it mentions the CIA triad in only one of the definitions,
which shows a better separation of concepts across the different definitions.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion</title>
      <p>We introduced LAVOHA, a method designed to harmonize conflicting concept definitions, specifically
applied within cybersecurity vocabularies and ontologies. The approach leverages natural language
analysis to produce unified consensus-based definitions of security concepts, as shown through a case
study that compares LAVOHA-generated definitions with human consensus. The results favor LAVOHA
over relying solely on large language models (LLMs): LAVOHA-assisted definitions are closer to the consensus.</p>
      <p>As future work, we intend to extend our approach to other phases of Ontology Engineering,
investigating how LLMs can assist in writing an ontology and whether LAVOHA-like methods can enhance
their performance. Regarding the focus of the present work, namely vocabulary harmonization, future
research could explore additional quantitative evaluation methods, such as measuring the
perplexity of the consensus definition when submitted to the LLM in different scenarios. We expect that
better-equipped LLMs (possibly enhanced by LAVOHA) will exhibit lower perplexity scores for the
consensual definition. It would also be worthwhile to test alternatives to the BM25 algorithm for
sentence selection, such as using LLM-based embeddings to represent queries and candidate sentences
in Step 2 of LAVOHA.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>This work was supported by the Brazilian Funding Authority for Studies and Projects (FINEP) through the
projects [CyberSemantics - Contract No. 0.1.22.0335.00/Ref. FINEP No. 0172/22] and [S2C2 - Contract
No. 0.1.20.0272.00/Ref. FINEP No. 2904/20] and the Systems Development Center of the Brazilian Army.
E. H. Haeusler was partially supported by CNPq grant 309287/2023-5.</p>
    </sec>
    <sec id="sec-8">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used GPT-4o for grammar and spelling checks and text
translations. After using this tool, the authors reviewed and edited the content as needed and assume
full responsibility for the content of the publication.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>R. de Almeida Falbo</surname>
          </string-name>
          ,
          <article-title>Sabio: Systematic approach for building ontologies</article-title>
          , in: G. Guizzardi,
          <string-name>
            <given-names>O.</given-names>
            <surname>Pastor</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wand</surname>
          </string-name>
          , S. de Cesare,
          <string-name>
            <given-names>F.</given-names>
            <surname>Gailly</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lycett</surname>
          </string-name>
          ,
          <string-name>
            <surname>C.</surname>
          </string-name>
          Partridge (Eds.),
          <source>Proceedings of the 1st Joint Workshop ONTO.COM / ODISE on Ontologies in Conceptual Modeling and Information Systems Engineering co-located with 8th International Conference on Formal Ontology in Information Systems, ONTO.COM/ODISE@FOIS</source>
          <year>2014</year>
          , Rio de Janeiro, Brazil,
          <year>September 21</year>
          ,
          <year>2014</year>
          , volume
          <volume>1301</volume>
          <source>of CEUR Workshop Proceedings, CEUR-WS.org</source>
          ,
          <year>2014</year>
          . URL: https://ceur-ws.org/Vol-1301/ontocomodise2014_2.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>M. C.</given-names>
            <surname>Suárez-Figueroa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gómez-Pérez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Fernández-López</surname>
          </string-name>
          ,
          <article-title>The neon methodology framework: A scenario-based methodology for ontology development</article-title>
          ,
          <source>Applied Ontology</source>
          <volume>10</volume>
          (
          <year>2015</year>
          )
          <fpage>107</fpage>
          -
          <lpage>145</lpage>
          . URL: https://journals.sagepub.com/doi/abs/10.3233/AO-150145. doi:10.3233/AO-150145.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>N. J.</given-names>
            <surname>Mariano</surname>
          </string-name>
          Fernández-López,
          <article-title>Asunción Gómez-Pérez, Methontology: From ontological art towards ontological engineering</article-title>
          ,
          <source>Miscellaneous</source>
          (
          <year>1997</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>H.</given-names>
            <surname>Babaei Giglou</surname>
          </string-name>
          ,
          <string-name>
            <surname>J. D'Souza</surname>
            ,
            <given-names>S. Auer,</given-names>
          </string-name>
          <article-title>Llms4ol: Large language models for ontology learning</article-title>
          , in: International Semantic Web Conference, Springer,
          <year>2023</year>
          , pp.
          <fpage>408</fpage>
          -
          <lpage>427</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>A. De Nicola</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Missikof</surname>
          </string-name>
          ,
          <article-title>A lightweight methodology for rapid ontology engineering</article-title>
          ,
          <source>Commun. ACM</source>
          <volume>59</volume>
          (
          <year>2016</year>
          )
          <fpage>79</fpage>
          -
          <lpage>86</lpage>
          . URL: https://doi.org/10.1145/2818359. doi:10.1145/2818359.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>P. M. L. Scheidegger</surname>
          </string-name>
          ,
          <string-name>
            <surname>M. L. M. Campos</surname>
            ,
            <given-names>M. C.</given-names>
          </string-name>
          <string-name>
            <surname>Cavalcanti</surname>
          </string-name>
          ,
          <article-title>An approach for systematic definitions construction based on ontological analysis</article-title>
          , in: E. Garoufallou,
          <string-name>
            <given-names>S.</given-names>
            <surname>Virkus</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Siatri</surname>
          </string-name>
          , D. Koutsomiha (Eds.), Metadata and Semantic Research - 11th International Conference, MTSR 2017 Tallinn, Estonia,
          <source>November 28 - December 1</source>
          ,
          <year>2017</year>
          , Proceedings, volume
          <volume>755</volume>
          of Communications in Computer and Information Science, Springer,
          <year>2017</year>
          , pp.
          <fpage>87</fpage>
          -
          <lpage>99</lpage>
          . URL: https://doi.org/10.1007/978-3-319-70863-8_9. doi:10.1007/978-3-319-70863-8_9.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Amazon</given-names>
            <surname>Web</surname>
          </string-name>
          <string-name>
            <surname>Services</surname>
          </string-name>
          , What is RAG?
          <article-title>- Retrieval-Augmented Generation explained</article-title>
          , https://aws.amazon.com/what-is/retrieval-augmented-generation/,
          <year>2023</year>
          . Accessed October 10, 2025.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>IBM</given-names>
            <surname>Research</surname>
          </string-name>
          ,
          <article-title>What is retrieval-augmented generation (RAG)?</article-title>
          , https://research.ibm.com/blog/retrieval-augmented-generation-RAG,
          <year>2023</year>
          . Accessed October 10, 2025.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <surname>Intel</surname>
          </string-name>
          , What is RAG?
          <article-title>Retrieval-Augmented Generation explained</article-title>
          , https://www.intel.com/content/www/us/en/learn/what-is-rag.html,
          <year>2025</year>
          . Accessed October 10, 2025.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Elastic</surname>
          </string-name>
          ,
          <article-title>What is retrieval-augmented generation?</article-title>
          , https://www.elastic.co/what-is/retrieval-augmented-generation,
          <year>2025</year>
          . Accessed October 10, 2025.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <surname>Oracle</surname>
          </string-name>
          ,
          <article-title>What is Retrieval-Augmented Generation (RAG)?</article-title>
          , https://www.oracle.com/artificial-intelligence/generative-ai/retrieval-augmented-generation-rag/,
          <year>2023</year>
          . Accessed October 10, 2025.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>P.</given-names>
            <surname>Lewis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Perez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Piktus</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Petroni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Karpukhin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Goyal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Küttler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lewis</surname>
          </string-name>
          , W.- t. Yih,
          <string-name>
            <given-names>T.</given-names>
            <surname>Rocktäschel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Riedel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Kiela</surname>
          </string-name>
          ,
          <article-title>Retrieval-augmented generation for knowledgeintensive nlp tasks</article-title>
          , in: H.
          <string-name>
            <surname>Larochelle</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Ranzato</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          <string-name>
            <surname>Hadsell</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Balcan</surname>
          </string-name>
          , H. Lin (Eds.),
          <source>Advances in Neural Information Processing Systems</source>
          , volume
          <volume>33</volume>
          ,
          <string-name>
            <surname>Curran</surname>
            <given-names>Associates</given-names>
          </string-name>
          , Inc.,
          <year>2020</year>
          , pp.
          <fpage>9459</fpage>
          -
          <lpage>9474</lpage>
          . URL: https://proceedings.neurips.cc/paper_files/paper/2020/file/6b493230205f780e1bc26945df7481e5-Paper.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Gao</surname>
          </string-name>
          , et al.,
          <article-title>Retrieval-augmented generation for large language models: A survey</article-title>
          , https://arxiv.org/abs/2312.10997,
          <year>2024</year>
          . Accessed October 10,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          IBM,
          <article-title>What are RAG techniques?</article-title>
          , https://www.ibm.com/think/topics/rag-techniques,
          <year>2025</year>
          . Accessed October 10,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          Weka,
          <article-title>What is Retrieval-Augmented Generation (RAG)?</article-title>
          , https://www.weka.io/learn/guide/ai-ml/retrieval-augmented-generation/,
          <year>2024</year>
          . Accessed October 10,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>S.</given-names>
            <surname>Homayoun</surname>
          </string-name>
          ,
          <article-title>6 types of Retrieval-Augmented Generation (RAG) techniques you should know</article-title>
          , https://homayounsrp.medium.com/6-types-of-retrieval-augmented-generation-rag-techniques-you-should-know-b45de9071c79,
          <year>2024</year>
          . Accessed October 10,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>A.</given-names>
            <surname>Singhal</surname>
          </string-name>
          ,
          <article-title>Modern information retrieval: A brief overview</article-title>
          ,
          <source>IEEE Data Eng. Bull</source>
          .
          <volume>24</volume>
          (
          <year>2001</year>
          )
          <fpage>35</fpage>
          -
          <lpage>43</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>K.</given-names>
            <surname>Hambarde</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Proença</surname>
          </string-name>
          ,
          <article-title>Information retrieval: Recent advances and beyond</article-title>
          ,
          <source>IEEE Access</source>
          <volume>11</volume>
          (
          <year>2023</year>
          )
          <fpage>76581</fpage>
          -
          <lpage>76604</lpage>
          . doi:10.1109/ACCESS.2023.3295776.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>A.</given-names>
            <surname>Neelima</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Mehrotra</surname>
          </string-name>
          ,
          <article-title>A comprehensive review on word embedding techniques</article-title>
          ,
          <source>in: 2023 International Conference on Intelligent Systems for Communication, IoT and Security (ICISCoIS)</source>
          ,
          <year>2023</year>
          , pp.
          <fpage>538</fpage>
          -
          <lpage>543</lpage>
          . doi:10.1109/ICISCoIS56541.2023.10100347.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>L.</given-names>
            <surname>Gutiérrez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. Keith</given-names>
            <surname>Norambuena</surname>
          </string-name>
          ,
          <article-title>A Systematic Literature Review on Word Embeddings</article-title>
          ,
          <source>Proceedings of the 7th International Conference on Software Process Improvement (CIMPS 2018)</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>132</fpage>
          -
          <lpage>141</lpage>
          . doi:10.1007/978-3-030-01171-0_12.
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>A.</given-names>
            <surname>de Vries</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Wilschut</surname>
          </string-name>
          ,
          <article-title>On the integration of IR and databases</article-title>
          , in: Database issues in multimedia,
          <year>1999</year>
          , pp.
          <fpage>16</fpage>
          -
          <lpage>31</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>G.</given-names>
            <surname>Salton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Buckley</surname>
          </string-name>
          ,
          <article-title>Term Weighting Approaches in Automatic Text Retrieval</article-title>
          ,
          <source>Technical Report, USA</source>
          ,
          <year>1987</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>S.</given-names>
            <surname>Qaiser</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Ali</surname>
          </string-name>
          ,
          <article-title>Text mining: Use of TF-IDF to examine the relevance of words to documents</article-title>
          ,
          <source>International Journal of Computer Applications</source>
          <volume>181</volume>
          (
          <year>2018</year>
          )
          <fpage>25</fpage>
          -
          <lpage>29</lpage>
          . URL: https://ijcaonline.org/archives/volume181/number1/29681-2018917395/. doi:10.5120/ijca2018917395.
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>S. E.</given-names>
            <surname>Robertson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Walker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Jones</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hancock-Beaulieu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Gatford</surname>
          </string-name>
          ,
          <article-title>Okapi at TREC-3</article-title>
          , in: D. K. Harman (Ed.),
          <source>Proceedings of The Third Text REtrieval Conference, TREC 1994, Gaithersburg, Maryland, USA, November 2-4, 1994</source>
          , volume
          <volume>500</volume>
          -225 of NIST Special Publication, National Institute of Standards and Technology (NIST),
          <year>1994</year>
          , pp.
          <fpage>109</fpage>
          -
          <lpage>126</lpage>
          . URL: http://trec.nist.gov/pubs/trec3/papers/city.ps.gz.
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>X. H.</given-names>
            <surname>Lù</surname>
          </string-name>
          ,
          <article-title>BM25S: Orders of magnitude faster lexical search via eager sparse scoring</article-title>
          ,
          <year>2024</year>
          . URL: https://arxiv.org/abs/2407.03618. arXiv:2407.03618.
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>A.</given-names>
            <surname>Trotman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Puurula</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Burgess</surname>
          </string-name>
          ,
          <article-title>Improvements to BM25 and language models examined</article-title>
          ,
          <source>in: Proceedings of the 19th Australasian Document Computing Symposium</source>
          , ADCS '14, Association for Computing Machinery, New York, NY, USA,
          <year>2014</year>
          , p.
          <fpage>58</fpage>
          -
          <lpage>65</lpage>
          . URL: https://doi.org/10.1145/2682862.2682863. doi:10.1145/2682862.2682863.
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>M.</given-names>
            <surname>Taylor</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Zaragoza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Craswell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Robertson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Burges</surname>
          </string-name>
          ,
          <article-title>Optimisation methods for ranking functions with multiple parameters</article-title>
          ,
          <source>in: Proceedings of the 15th ACM International Conference on Information and Knowledge Management</source>
          , CIKM '06, Association for Computing Machinery, New York, NY, USA,
          <year>2006</year>
          , p.
          <fpage>585</fpage>
          -
          <lpage>593</lpage>
          . URL: https://doi.org/10.1145/1183614.1183698. doi:10.1145/1183614.1183698.
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          National Institute of Standards and Technology,
          <article-title>Security and Privacy Controls for Information Systems and Organizations: NIST Special Publication 800-53, Revision 5</article-title>
          , Technical Report, National Institute of Standards and Technology, Gaithersburg,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>B. E.</given-names>
            <surname>Strom</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Applebaum</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. P.</given-names>
            <surname>Miller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. C.</given-names>
            <surname>Nickels</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. G.</given-names>
            <surname>Pennington</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. B.</given-names>
            <surname>Thomas</surname>
          </string-name>
          ,
          <article-title>MITRE ATT&amp;CK: Design and Philosophy</article-title>
          ,
          <source>Technical Report, MITRE Corporation</source>
          ,
          <year>2020</year>
          . URL: https://attack.mitre.org/docs/ATTACK_Design_and_Philosophy_March_2020.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>P. E.</given-names>
            <surname>Kaloroumakis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. J.</given-names>
            <surname>Smith</surname>
          </string-name>
          ,
          <article-title>Toward a Knowledge Graph of Cybersecurity Countermeasures</article-title>
          ,
          <source>Technical Report, MITRE Corporation</source>
          ,
          <year>2023</year>
          . URL: https://d3fend.mitre.org/resources/D3FEND.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [31]
          National Institute of Standards and Technology,
          <article-title>Cybersecurity Framework</article-title>
          ,
          <year>2025</year>
          . Available at: https://www.nist.gov/cyberframework, accessed on May 28,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [32] Computer Security Resource Center, Glossary,
          <year>2025</year>
          . URL: https://csrc.nist.gov/glossary.
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          [33] International Organization for Standardization,
          <article-title>ISO/IEC 27001:2022 - Information security, cybersecurity and privacy protection</article-title>
          ,
          <year>2022</year>
          . Available at: https://www.iso.org/standard/27001, accessed on May 28,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          [34]
          Center for Internet Security,
          <article-title>CIS Critical Security Controls</article-title>
          ,
          <year>2025</year>
          . Available at: https://www.cisecurity.org/controls, accessed on May 29,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          [35]
          ISACA,
          <article-title>COBIT 2019 Framework: Introduction and Methodology</article-title>
          ,
          <year>2019</year>
          . Available at: https://www.isaca.org/resources/cobit, accessed on May 29,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          [36]
          OWASP,
          <article-title>OWASP Top Ten</article-title>
          ,
          <year>2025</year>
          . Available at: https://owasp.org/www-project-top-ten/, accessed on May 29,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          [37]
          <string-name>
            <given-names>B.</given-names>
            <surname>Kitchenham</surname>
          </string-name>
          ,
          <article-title>Procedures for performing systematic reviews</article-title>
          , Keele, UK, Keele University
          <volume>33</volume>
          (
          <year>2004</year>
          )
          <fpage>1</fpage>
          -
          <lpage>26</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref38">
        <mixed-citation>
          [38]
          <string-name>
            <given-names>D.</given-names>
            <surname>Doumanas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Soularidis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Kotis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Vouros</surname>
          </string-name>
          ,
          <article-title>Integrating LLMs in the engineering of a SAR ontology</article-title>
          ,
          <source>in: IFIP International Conference on Artificial Intelligence Applications and Innovations</source>
          , Springer,
          <year>2024</year>
          , pp.
          <fpage>360</fpage>
          -
          <lpage>374</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref39">
        <mixed-citation>
          [39]
          <string-name>
            <given-names>S.</given-names>
            <surname>Toro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. V.</given-names>
            <surname>Anagnostopoulos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. M.</given-names>
            <surname>Bello</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Blumberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Cameron</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Carmody</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. D.</given-names>
            <surname>Diehl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. M.</given-names>
            <surname>Dooley</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W. D.</given-names>
            <surname>Duncan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Fey</surname>
          </string-name>
          , et al.,
          <article-title>Dynamic retrieval augmented generation of ontologies using artificial intelligence (DRAGON-AI)</article-title>
          ,
          <source>Journal of Biomedical Semantics</source>
          <volume>15</volume>
          (
          <year>2024</year>
          )
          <fpage>19</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref40">
        <mixed-citation>
          [40]
          <string-name>
            <given-names>P.</given-names>
            <surname>Mateiu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Groza</surname>
          </string-name>
          ,
          <article-title>Ontology engineering with large language models</article-title>
          ,
          <source>in: 2023 25th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing (SYNASC)</source>
          , IEEE,
          <year>2023</year>
          , pp.
          <fpage>226</fpage>
          -
          <lpage>229</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref41">
        <mixed-citation>
          [41]
          <string-name>
            <given-names>M. S.</given-names>
            <surname>Abolhasani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Pan</surname>
          </string-name>
          ,
          <article-title>OntoKGen: A genuine ontology and knowledge graph generator using large language model</article-title>
          ,
          <source>in: 2025 Annual Reliability and Maintainability Symposium (RAMS)</source>
          , IEEE,
          <year>2025</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>6</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref42">
        <mixed-citation>
          [42]
          <string-name>
            <given-names>J.</given-names>
            <surname>Vrolijk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Poslavsky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Bijl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Popov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Mahdavi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Shokri</surname>
          </string-name>
          ,
          <article-title>Ontology learning for ESCO: Leveraging LLMs to navigate labor dynamics</article-title>
          ,
          <source>Proceedings of the 2nd Workshop on Knowledge Base Construction from Pre-Trained Language Models (KBC-LM 2024)</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref43">
        <mixed-citation>
          [43]
          <string-name>
            <given-names>A. M.</given-names>
            <surname>Bran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Oarga</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hart</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lederbauer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Schwaller</surname>
          </string-name>
          ,
          <article-title>Ontology-retrieval augmented generation for scientific discovery</article-title>
          ,
          <source>Under review as a conference paper at ICLR</source>
          <year>2025</year>
          (
          <year>2025</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref44">
        <mixed-citation>
          [44]
          <string-name>
            <given-names>H.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Xiao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <article-title>An LLM supported approach to ontology and knowledge graph construction</article-title>
          ,
          <source>in: 2024 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)</source>
          , IEEE,
          <year>2024</year>
          , pp.
          <fpage>5240</fpage>
          -
          <lpage>5246</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref45">
        <mixed-citation>
          [45]
          <string-name>
            <given-names>A.</given-names>
            <surname>Mukanova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Milosz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Dauletkaliyeva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Nazyrova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Yelibayeva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Kuzin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Kussepova</surname>
          </string-name>
          ,
          <article-title>LLM-powered natural language text processing for ontology enrichment</article-title>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref46">
        <mixed-citation>
          [46]
          <string-name>
            <given-names>M.</given-names>
            <surname>Val-Calvo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. E.</given-names>
            <surname>Aranguren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Mulero-Hernández</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Almagro-Hernández</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Deshmukh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. A.</given-names>
            <surname>Bernabé-Díaz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Espinoza-Arias</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. L.</given-names>
            <surname>Sánchez-Fernández</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Mueller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. T.</given-names>
            <surname>Fernández-Breis</surname>
          </string-name>
          ,
          <article-title>OntoGenix: Leveraging large language models for enhanced ontology engineering from datasets</article-title>
          ,
          <source>Information Processing &amp; Management</source>
          <volume>62</volume>
          (
          <year>2025</year>
          )
          <fpage>104042</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref47">
        <mixed-citation>
          [47]
          <string-name>
            <given-names>V. K.</given-names>
            <surname>Kommineni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>König-Ries</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Samuel</surname>
          </string-name>
          ,
          <article-title>From human experts to machines: An LLM supported approach to ontology and knowledge graph construction</article-title>
          ,
          <source>arXiv preprint arXiv:2403.08345</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref48">
        <mixed-citation>
          [48]
          <string-name>
            <given-names>B.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V. A.</given-names>
            <surname>Carriero</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Schreiberhuber</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Tsaneva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. S.</given-names>
            <surname>González</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>de Berardinis</surname>
          </string-name>
          ,
          <article-title>OntoChat: a framework for conversational ontology engineering using language models</article-title>
          ,
          <source>in: European Semantic Web Conference</source>
          , Springer,
          <year>2024</year>
          , pp.
          <fpage>102</fpage>
          -
          <lpage>121</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref49">
        <mixed-citation>
          [49]
          <string-name>
            <given-names>P.</given-names>
            <surname>Bojanowski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Grave</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Joulin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Mikolov</surname>
          </string-name>
          ,
          <article-title>Enriching word vectors with subword information</article-title>
          ,
          <source>Transactions of the Association for Computational Linguistics</source>
          <volume>5</volume>
          (
          <year>2017</year>
          )
          <fpage>135</fpage>
          -
          <lpage>146</lpage>
          . URL: https://aclanthology.org/Q17-1010/. doi:10.1162/tacl_a_00051.
        </mixed-citation>
      </ref>
      <ref id="ref50">
        <mixed-citation>
          [50]
          <string-name>
            <given-names>C.</given-names>
            <surname>De Boom</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Van Canneyt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bohez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Demeester</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Dhoedt</surname>
          </string-name>
          ,
          <article-title>Learning semantic similarity for very short texts</article-title>
          ,
          <source>in: 2015 IEEE International Conference on Data Mining Workshop (ICDMW)</source>
          ,
          <year>2015</year>
          , pp.
          <fpage>1229</fpage>
          -
          <lpage>1234</lpage>
          . doi:10.1109/ICDMW.2015.86.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>