<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>November</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Large Language Model for Ontology Learning in Drinking Water Distribution Network Domain</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Yiwen Huang</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Erkan Karabulut</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Victoria Degeler</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Amsterdam</institution>
          ,
          <addr-line>Amsterdam</addr-line>
          ,
          <country country="NL">Netherlands</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2024</year>
      </pub-date>
      <volume>2</volume>
      <fpage>6</fpage>
      <lpage>28</lpage>
      <abstract>
        <p>Currently, most ontologies are created manually, which is time-consuming and labour-intensive. Meanwhile, the advanced capabilities of Large Language Models (LLMs) have proven beneficial in various domains, significantly improving the efficiency of text processing and text generation. This paper therefore focuses on the use of LLMs for ontology learning, taking a manual ontology construction method, the UPON Lite Ontology, as the basis for guiding the LLMs. The proposed approach is based on Retrieval-Augmented Generation (RAG), and the queries passed to the LLMs are derived from the steps of the UPON Lite Ontology. Two different LLM variants were tested, and both demonstrate the capability of ontology learning to varying degrees. This approach shows promising initial results in the direction of (semi-)automated ontology learning using LLMs and makes the ontology construction process easier for people without prior domain expertise. The final ontology was evaluated by a domain expert and ranked according to the defined criteria. Based on the evaluation results, the final ontology could be used as a base version, but it requires further fine-tuning by domain experts to ensure its accuracy and completeness.</p>
      </abstract>
      <kwd-group>
        <kwd>LLM</kwd>
        <kwd>Ontology Learning</kwd>
        <kwd>Drinking water distribution network</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        The widespread success of Large Language Models (LLMs), such as ChatGPT, has made them one of the
most popular tools in the AI field [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. An LLM is trained on massive amounts of data and, based on the
patterns observed in the training data, can perform Natural Language Processing (NLP)
tasks in real time, such as generating answers to queries and summarizing texts.
      </p>
      <p>
        An ontology, defined as "a formal, explicit specification of a shared conceptualization" [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], usually
serves as a formal model that defines the common vocabulary and relationships in a given domain. The
construction of an ontology can follow one of two approaches: the manual approach or the (semi-)
automatic approach. Manual ontology construction is usually done by domain experts, and
this process involves considerable effort [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Manual ontology construction offers the advantages of
precision and structure, but is time-consuming. In recent years, LLM-based (semi-)automatic ontology
construction has gained popularity [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. It refers to part of, or the entire, ontology construction
process being handled by automated tools or algorithms. (Semi-)automatic methods are efficient in
terms of production time, but they are less accurate and less structured, as they lack domain expertise
in the creation phase.
      </p>
      <p>Therefore, this paper explores a new way of creating an ontology that combines manual and (semi-)
automatic methods: it uses LLMs to facilitate ontology construction within a framework
derived from a manual method. The aim is to automate the generation of an ontology while
following the established steps of manual ontology construction. This hybrid method could increase the
efficiency of ontology construction while preserving the structure of the construction
process. The drinking water distribution network (DWDN) domain was chosen as a case study to test the
feasibility of the approach. The research question is as follows: How can LLMs be used to
facilitate the construction of an ontology in the drinking water distribution network domain
based on a manual ontology construction method?</p>
      <p>The remainder of this paper is organized as follows: First, a literature review on ontology construction
and existing ontology construction methods is conducted in Section 2. Section 3 describes the DWDN
use case. Next, Section 4 explains the approach of this research to create the ontology. This is followed
by the results in Section 5. Finally, the paper discusses the limitations and future work in Section 6, and
provides conclusions in Section 7.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Literature Review</title>
      <p>This section reviews existing manual and (semi-)automatic ontology construction methods to identify the
most suitable one, and examines the use of LLMs for ontology construction as well as
retrieval-augmented generation.</p>
      <sec id="sec-2-1">
        <title>2.1. Manual ontology construction</title>
        <p>
          Two systematic literature reviews (SLRs) have investigated ontology construction, listing a total of
21 manual ontology construction methods [
          <xref ref-type="bibr" rid="ref4 ref5">4, 5</xref>
          ]. To identify the method best suited as a framework
for (semi-)automatic ontology construction, eight evaluation criteria, either derived
from the original paper [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] (domain analysis, level of detail, evaluation, documentation, maintenance,
reusability, sample application) or tailored to this paper (domain specificity), were selected to assess the
quality of the manual methods.
        </p>
        <p>
          Two methodologies stand out after the evaluation: the UPON Lite Ontology [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] and the NeOn
Methodology [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. The NeOn Methodology offers high flexibility and somewhat more detailed results,
allowing users to select from the proposed scenarios based on the specific situation, whereas the UPON Lite
Ontology provides direct and concise instructions for ontology construction. Given the context of this
study, the UPON Lite Ontology is the preferred approach, as it provides comprehensive instructions and covers
the entire construction cycle by default. A few relevant methods from recent years are
domain-specific and cannot be generalized to other domains, such as those for the construction industry [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] and
product development [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ].
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. (Semi) Automatic ontology construction</title>
        <p>
          Zulkipli et al. [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] discuss 19 (semi-)automatic methods for ontology construction; according to the paper's categorisation,
8 of these methods are semi-automatic and 11 are fully automatic. Since the authors did
not define any criteria for evaluating these methods, we devised several
new criteria (Description of automation (C3), Main tools or algorithms (C4), Availability of tools (C5),
Input data for automation (C7), Accessibility (C8)) and combined them with criteria used for the manual
methods (Level of detail (C1), Evaluation (C2), Domain specificity (C6)) to identify the most suitable
(semi-)automatic methods. The evaluation results are available in the GitHub supplementary material
(https://github.com/DiTEC-project/Large-Language-Model-for-Ontology-Learning-In-Drinking-Water-Distribution-Network-Domain).
        </p>
        <p>
          Most methods contain detailed descriptions and clearly outline the automated ontology
construction process. They use various tools to facilitate automation, such as NLP techniques [
          <xref ref-type="bibr" rid="ref11 ref12">11, 12</xref>
          ] and the
RelExOnt algorithm [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ]. Input data formats also vary, although most methods take text-based sources, such as
databases and websites, as input.
        </p>
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Ontology learning</title>
        <p>
          Ontology learning refers to the "integration of knowledge from diverse fields, utilizing technology to
automatically or semi-automatically construct ontology based on various input data" [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ]. It is often
associated with ontology engineering and machine learning.
        </p>
        <p>
          LLMs are already used in various tasks relevant to different aspects of ontology
construction, such as ontology matching and competency question generation [
          <xref ref-type="bibr" rid="ref15 ref16">15, 16</xref>
          ], and they can act as assistants in this process [
          <xref ref-type="bibr" rid="ref17 ref18">17, 18</xref>
          ]. However, none of these works addresses ontology
construction itself directly with LLMs.
        </p>
        <p>
          ChatGPT performs well in the task of term typing, recognizing a type taxonomy, and discovering
non-taxonomic relations between types [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ]. Besides ChatGPT, open-source models such as Flan-T5
also demonstrate the ability to perform these tasks [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ]. Furthermore, LLMs have demonstrated exceptional
capability in translating natural language sentences into description logic [19].
        </p>
        <sec id="sec-2-3-1">
          <title>2.3.1. Retrieval-augmented Generation</title>
          <p>
            Although LLMs have demonstrated remarkable generation capabilities, they have a significant limitation,
hallucination: they may produce incorrect or incomplete information when queried beyond the scope of
their training data [20]. This limitation affects the reliability and validity of the responses provided by
LLMs. It is, however, not insurmountable. In 2020, a technique known as
Retrieval-Augmented Generation (RAG) was introduced [21]; it has the potential to mitigate this risk by
augmenting generative models with facts from external documents. RAG combines pre-trained parametric
and non-parametric memory for language generation: it exploits the strong generation abilities
of LLMs while incorporating external data as relevant input. RAG therefore serves as a
powerful tool for producing factual answers tailored to specific domains. A previous study
demonstrated the feasibility of developing competency questions and then using LLMs to support the
automation of ontology and knowledge graph construction via RAG [
            <xref ref-type="bibr" rid="ref18">18</xref>
            ].
          </p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Use case: Drinking water distribution network</title>
      <p>A drinking water distribution network (DWDN) [22] is a large and complex network that connects water
treatment plants or water sources (in the absence of treatment) to customers via a network of pipes,
storage facilities, valves, and pumps (https://www.epa.gov/dwreginfo/drinking-water-distribution-system-tools-and-resources).
A DWDN serves several purposes, such as providing water for households, supplying water for
firefighting, and supporting industrial processes [22].</p>
      <p>No DWDN ontology currently exists: the published ontologies in the water networks domain focus on
drinking water quality [23, 24] or water resources management [25] rather than on the DWDN itself.
EPANET is widely used software for modeling and analyzing DWDNs (https://www.epa.gov/water-research/epanet).
It provides various outputs that can be used for modeling, such as colour-coded network maps, data
tables, energy consumption, response, calibration, and time-series graphs. The EPANET documentation
contains extensive information about the DWDN domain, and our methodology uses this information as
the external knowledge source for the RAG method.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Methodology</title>
      <p>This section outlines the complete ontology construction process, including the RAG construction, the
query engineering strategy, and the query pipeline. The final goal is to construct an ontology and represent
it in OWL. The flowchart is available in Table 1.</p>
      <sec id="sec-4-1">
        <title>4.1. Framework of the RAG</title>
        <p>RAG enables the LLMs to generate contextual responses, and it is the core framework of ontology
construction in this paper. We implement a basic RAG pipeline, which can be divided into three
parts: input, retrieval, and generation. It is built with the LangChain library and its Hugging Face
integration (https://python.langchain.com/v0.1/docs/integrations/platforms/huggingface/).</p>
        <p>Pipeline flowchart (step: output): Domain terminology: dictionary {Entity: Definition}; Domain glossary,
taxonomy, and predication: dictionary {Entity: {Synonyms: [ ]; Taxonomy: [ ]; Predication: [ ]}}; Parthood:
relationships between entities in string format; Initial ontology: Turtle syntax of the initial ontology;
Final ontology: Turtle syntax of the final ontology; Expert fine-tuning.</p>
        <p>Input In this step, users provide various types of data to serve as external information sources.
These documents are split into smaller units (chunks) and transformed into vector embeddings using an
OpenAI embedding model ("OpenAIEmbeddings"). In our use case, the resource is the EPANET documentation,
which comprehensively covers both physical and non-physical components of water networks,
as well as simulation models for water flow.</p>
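        <p>The Input step can be illustrated with a minimal chunking sketch. The paper does not report how the EPANET documentation was split, so the fixed-size, overlapping character windows and the values chunk_size=1000 and overlap=200 below are assumptions; LangChain's text splitters (for example RecursiveCharacterTextSplitter, configured through chunk_size and chunk_overlap) offer a more refined equivalent. Embedding the chunks with OpenAIEmbeddings is omitted here.</p>
        <preformat preformat-type="python"><![CDATA[
```python
from typing import List


def split_into_chunks(text: str, chunk_size: int = 1000, overlap: int = 200) -> List[str]:
    """Split text into fixed-size character windows that overlap by `overlap` characters.

    chunk_size and overlap defaults are illustrative assumptions, not values from the paper.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("require chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap  # how far each window advances
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # this window already reaches the end
            break
    return chunks


if __name__ == "__main__":
    doc = "A pipe is a link that conveys water from one node to another. " * 40
    pieces = split_into_chunks(doc)
    print(len(pieces), len(pieces[0]))
```
]]></preformat>
        <p>The overlap keeps a sentence that straddles a window boundary retrievable from at least one chunk; larger chunks reduce the number of units but make each retrieved unit less focused, which interacts with the choice of K for retrieval.</p>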
        <p>Retrieval Information retrieval is the process of extracting relevant information from a given input
document based on a user query [26]. The information retriever identifies the chunks that best match
the query based on their vector similarity. The retriever requires two input parameters: the search type
and the search arguments. The search type was set to Maximal Marginal Relevance (MMR), which
offers the advantage of diversified search results: its scoring mechanism combines the relevance
of each document chunk to the user query with the novelty of the chunk compared to the chunks already
selected [27]. The search arguments include K, which sets the retriever to return the top K chunks based
on the MMR score. The choice of K is a trade-off between the diversity and the relevance of the result.
Typically, K falls within the range of 5 to 10, depending on the size of the chunks. In small preliminary
experiments, the results were better with K = 10; therefore, K was set to 10.</p>
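        <p>To make the MMR criterion concrete, the sketch below selects chunks greedily: at each step it picks the candidate c that maximizes λ·sim(q, c) - (1 - λ)·max over already selected s of sim(c, s), where sim is cosine similarity. It follows the standard MMR formulation rather than reproducing LangChain internals, and the trade-off weight λ (lambda_mult) is an assumption, since only K = 10 is reported. In LangChain, the configuration described above corresponds to a retriever created with search_type="mmr" and search_kwargs={"k": 10}.</p>
        <preformat preformat-type="python"><![CDATA[
```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr(query_vec: Sequence[float], doc_vecs: List[Sequence[float]],
        k: int = 10, lambda_mult: float = 0.5) -> List[int]:
    """Greedily pick up to k chunk indices by Maximal Marginal Relevance."""
    relevance = [cosine(query_vec, d) for d in doc_vecs]
    selected: List[int] = []
    candidates = list(range(len(doc_vecs)))

    def score(i: int) -> float:
        # Redundancy = similarity to the closest chunk that is already selected.
        redundancy = max((cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0)
        return lambda_mult * relevance[i] - (1 - lambda_mult) * redundancy

    while candidates and len(selected) < k:
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


if __name__ == "__main__":
    query = [1.0, 0.0]
    chunks = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0]]
    print(mmr(query, chunks, k=2, lambda_mult=0.3))  # prints [0, 2]
```
]]></preformat>
        <p>With lambda_mult=1.0 the function reduces to ranking by relevance alone; in the example, chunk 1 is nearly a duplicate of chunk 0, so a lower weight skips it in favour of the dissimilar chunk 2.</p>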
        <p>LLMs sometimes produce erroneous results. The algorithm is therefore configured to attempt retrieval
up to three times per query: if the first attempt fails, the same query is retried in the
same block, and if the second attempt also fails, a third and final attempt is made.</p>
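        <p>A minimal sketch of this retry policy follows. The paper does not define what counts as a failed attempt, so the sketch treats both a raised exception and an answer rejected by a caller-supplied is_valid check (for example, output that cannot be parsed in the requested format) as failures; run_query and is_valid are hypothetical stand-ins for the RAG chain and the format check.</p>
        <preformat preformat-type="python"><![CDATA[
```python
from typing import Callable, Optional


def query_with_retries(run_query: Callable[[str], str], query: str,
                       is_valid: Callable[[str], bool],
                       max_attempts: int = 3) -> Optional[str]:
    """Send the same query up to max_attempts times; return the first valid answer, else None."""
    for attempt in range(1, max_attempts + 1):
        try:
            answer = run_query(query)
        except Exception as exc:  # e.g. an API error raised by the hypothetical run_query
            print(f"attempt {attempt} raised: {exc}")
            continue
        if is_valid(answer):
            return answer
        print(f"attempt {attempt} returned an answer in the wrong format")
    return None  # all attempts failed; the caller decides how to proceed
```
]]></preformat>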
        <p>Table 1 summarizes all the mentioned hyperparameters to ensure reproducibility.</p>
        <p>Generation model In the generation phase, the LLM produces a contextually relevant response that
is coherent with the query and the retrieved units. Four variants of LLMs were used in this paper:
gpt-4-0125-preview (gpt-4), gpt-3.5-turbo-0125 (gpt-3.5-turbo), gpt-4-turbo-2024-04-09 (gpt-4-turbo) and
huggingfaceh4/zephyr-7b-beta (7b-beta).</p>
        <p>Constructing the RAG also requires three further components. The template provides a
structured format for the RAG; it consists of two parts: a system template that applies to all queries,
and a user query specific to each step. The query is defined by the user and passed to the LLM via the
predefined system template. The parser converts the generated response into a human-readable
string, making it more suitable for display and interpretation.</p>
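        <p>The interplay of template, query, and parser can be sketched as follows. SYSTEM_TEMPLATE reproduces the system template given in Section 4.3; where the retrieved chunks are placed inside the prompt is not stated, so the Context/Question layout in build_prompt is an assumption, and the llm argument is a hypothetical stand-in for the actual model call. parse_response plays the role of a plain string output parser.</p>
        <preformat preformat-type="python"><![CDATA[
```python
from typing import Callable, List

# System template from Section 4.3; "{query}" is its only placeholder.
SYSTEM_TEMPLATE = (
    "You are an AI assistant that follows instruction extremely well. Please give "
    "direct and complete answers. Please be truthful, if you don't know the answer, "
    "just say that you don't know, don't try to make up an answer or return the "
    "wrong answer. {query}\nAssistant"
)


def build_prompt(user_query: str, context_chunks: List[str]) -> str:
    """Fill the system template. The Context/Question layout is an assumption."""
    context = "\n\n".join(context_chunks)
    full_query = f"Context:\n{context}\n\nQuestion:\n{user_query}"
    return SYSTEM_TEMPLATE.format(query=full_query)


def parse_response(raw: str) -> str:
    """Plain string parser: return the generated text without surrounding whitespace."""
    return raw.strip()


def run_step(user_query: str, context_chunks: List[str],
             llm: Callable[[str], str]) -> str:
    """One pipeline step: template, then the (hypothetical) LLM call, then the parser."""
    return parse_response(llm(build_prompt(user_query, context_chunks)))


if __name__ == "__main__":
    fake_llm = lambda prompt: "  {'pipe': 'A link that conveys water'}\n"
    print(run_step("Answer as a Python dictionary {Entity: Definition}.",
                   ["Pipes are links that convey water between nodes."], fake_llm))
```
]]></preformat>
        <p>Because str.format substitutes the query as a value, braces inside user queries such as {Entity: Definition} pass through unchanged rather than being treated as further placeholders.</p>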
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Query generation</title>
        <p>Once the RAG environment has been configured, the next step is to create queries. The user query
instructs the LLM to generate the answer in the desired format and direction. The queries in
this paper are based on the UPON Lite Ontology, which includes the creation of a domain terminology,
a domain glossary, a taxonomy, predication, parthood, and an ontology in Turtle syntax, a way of
expressing data in the RDF data model (https://en.wikipedia.org/wiki/Turtle_(syntax)).</p>
        <p>The creation of queries can be divided into three phases. The first phase involves experimentation,
focusing on the query instructions. The primary aim is to gain a basic understanding of the answers
generated by the LLMs. During this phase, the step descriptions are copied directly from the UPON
Lite Ontology and then tested multiple times with all LLMs. Based on the outputs, the query is slightly
modified, for example by adding descriptions or removing redundant content. If the output
indicates a lack of understanding of a certain concept in the query, an explanation of that concept is
added. At the end of this phase, the core set of queries, consisting of the step descriptions, is defined.</p>
        <p>Having gained a basic understanding of the responses provided by the LLMs, the second phase shifts
the focus to data structure. The primary objective of this phase is to phrase the query so
that the LLM returns an answer in the desired format.</p>
        <p>Prompt engineering, or querying, is about creating the "recipe" that guides the LLMs to perform the
desired task [28]. Several papers have introduced query patterns to enhance prompt
engineering [29, 30], but these patterns are all tailored to OpenAI models
(https://platform.openai.com/docs/models/overview). After experimenting with a promising template
pattern [29]: "I am going to provide a template for your output. X is my
placeholder for content. Try to fit the output into one or more of the placeholders that I list. Please preserve
the formatting and overall template that I provide. This is the template: PATTERN with PLACEHOLDERS",
we found that only the gpt-4 model processed this template pattern effectively.</p>
        <p>The original query templates were created, tested, and adjusted based on the outputs. Placeholders
are usually used in the query to structure the response format, for example: "Please provide the
complete answer formatted as a Python Dictionary {Entity: Definition}". Incorporating customized
format templates made the queries longer and the answers more structured.
As a result, by the end of this phase, the queries consisted of two parts: format instructions and step
descriptions.</p>
        <p>The third phase, the fine-tuning of queries, focuses on improving both the quality and the structure of
the results. Although the answers from the second phase already possess some initial structure and
quality, it is crucial to refine them to further optimize the results. During this phase, the existing queries
are fine-tuned based on the output from the different models, by adjusting the content details and
changing the position of the content within the query. At the end of this phase, all the queries yield
results of overall satisfactory quality, comprising coherent answers structured in accordance with the
specified query format.</p>
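        <p>Because the format instructions ask for a Python dictionary, the raw answer still has to be turned into a data structure before the next step can use it. A minimal sketch, assuming the answer contains a single dictionary literal possibly surrounded by extra text: extract_dict is a hypothetical helper, not part of the described pipeline, and uses ast.literal_eval so that model output is parsed without being executed.</p>
        <preformat preformat-type="python"><![CDATA[
```python
import ast
from typing import Any, Dict


def extract_dict(response: str) -> Dict[str, Any]:
    """Parse the dictionary literal spanning the first '{' to the last '}' of a response."""
    start, end = response.find("{"), response.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no dictionary literal found in the response")
    # literal_eval only accepts Python literals, so model output is never executed.
    parsed = ast.literal_eval(response[start:end + 1])
    if not isinstance(parsed, dict):
        raise ValueError("the literal is not a dictionary")
    return parsed


if __name__ == "__main__":
    answer = 'Here is the terminology:\n{"Pipe": "A link conveying water", "Junction": "A node"}\nDone.'
    print(sorted(extract_dict(answer)))  # prints ['Junction', 'Pipe']
```
]]></preformat>
        <p>A ValueError or SyntaxError raised here is a natural trigger for the retry policy described in Section 4.1.</p>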
        <p>For example, one question was whether the format template or the step descriptions should come first.
In the ontology generation step, it was observed that LLMs struggled to process large amounts of information
to create the ontology in Turtle syntax all at once, resulting in numerous mistakes and loss of information.
As a solution, a final ontology step was introduced to give the LLMs more room and time to generate
the final Turtle syntax, thereby enriching the initial ontology, which only contains entities and their
relationships.</p>
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Query pipeline</title>
        <p>This subsection provides an overview of the queries, including their structure, purpose, and components.
The queries are derived from the UPON Lite Ontology.</p>
        <p>The system template provides a general structure for the input query and the output. Within
the system template, "{query}" and "Assistant" indicate the starting positions of the query
and the answer, respectively. The components of the query are introduced below.</p>
        <p>1. System template: This part provides a general template for generating answers and handling input.
"template: | You are an AI assistant that follows instruction extremely well. Please give
direct and complete answers. Please be truthful, if you don’t know the answer, just say that
you don’t know, don’t try to make up an answer or return the wrong answer. {query}
Assistant"</p>
        <p>2. Query:
• Introduction: This part introduces the query and explains its context.
• Input: Specifies the input for the query, such as the result from previous steps.
• Step Descriptions: Describes the instructions for each step.
• Answer Format: Outlines the format for presenting the query results.
• Example: Provides an illustrative example of the query in action.</p>
        <p>Domain terminology The first step creates the domain terminology by identifying the entities or
concepts within the domain; e.g., in the DWDN domain, the entities include "junction" and "pipe". The
output of the first step is important because it forms the basis of the ontology. The response should
adopt a dictionary format, which makes it easy to loop over the entities in the dictionary in
subsequent steps if needed.</p>
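        <p>The benefit of the dictionary format can be sketched as follows: the terminology dictionary is iterated and a placeholder in a follow-up query is filled with one entity at a time. The template text is abridged from the steps 2, 3, and 4 query in Section 4.4.2, which uses the {ResultQ1} placeholder; per_entity_queries is a hypothetical helper, and filling the placeholder entity by entity is an assumption for illustration.</p>
        <preformat preformat-type="python"><![CDATA[
```python
from typing import Dict, List

# Abridged from the steps 2-4 query; {ResultQ1} is the placeholder used there.
STEP_234_QUERY = (
    "We are creating the ontology in the drinking water distribution network domain. "
    "Steps 2, 3, and 4 involve creating Synonyms, Taxonomy, and Predication for the "
    "input entity: {ResultQ1} based on the step descriptions."
)


def per_entity_queries(terminology: Dict[str, str],
                       template: str = STEP_234_QUERY) -> List[str]:
    """Build one query per entity by substituting a one-entry dict for {ResultQ1}."""
    queries = []
    for entity, definition in terminology.items():
        # str.replace leaves any other literal braces in the template untouched.
        queries.append(template.replace("{ResultQ1}", repr({entity: definition})))
    return queries


if __name__ == "__main__":
    terms = {"pipe": "A link that conveys water", "junction": "A node where links meet"}
    for q in per_entity_queries(terms):
        print(q)
```
]]></preformat>
        <p>str.replace is used instead of str.format because the full queries also contain literal braces, such as the answer-format template, which str.format would try to interpret as placeholders.</p>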
        <p>Domain glossary, taxonomy, and predication The second step is to determine the domain glossary
(synonyms), taxonomy, and predication (property) for each entity that was generated in the previous
step. For example, in the DWDN domain, we may establish that ’control device’ is a synonym for ’valve’;
create a hierarchical taxonomy where ’butterfly valve’ is a category of ’valve’; and specify that ’valve’
has the property of ’material’. This step is designed to take the entire dictionary result of the domain
terminology as input and return the domain glossary, taxonomy, and predication in a dictionary format.</p>
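        <p>The query for this step requires that no term appear in more than one of the three lists. A lightweight post-check can flag violations before the results are passed on. The sketch assumes the answer structure given in Section 4.4.2, with Taxonomy as a {parent: [children]} mapping and Synonyms and Predication as lists, and compares terms case-insensitively; overlapping_terms is a hypothetical helper.</p>
        <preformat preformat-type="python"><![CDATA[
```python
from typing import Any, Dict, Set


def terms_in(value: Any) -> Set[str]:
    """Lower-cased terms from a list, or from a {parent: [children]} mapping."""
    terms: Set[str] = set()
    if isinstance(value, dict):
        for parent, children in value.items():
            terms.add(parent.lower())
            terms.update(child.lower() for child in children)
    else:
        terms.update(term.lower() for term in value)
    return terms


def overlapping_terms(entry: Dict[str, Any]) -> Dict[str, Set[str]]:
    """Terms shared by two of Synonyms, Taxonomy and Predication (should be empty)."""
    sections = ["Synonyms", "Taxonomy", "Predication"]
    found = {name: terms_in(entry.get(name, [])) for name in sections}
    clashes: Dict[str, Set[str]] = {}
    for i, first in enumerate(sections):
        for second in sections[i + 1:]:
            common = found[first] & found[second]
            if common:
                clashes[f"{first}/{second}"] = common
    return clashes


if __name__ == "__main__":
    valve = {"Synonyms": ["control device"],
             "Taxonomy": {"valve": ["butterfly valve", "gate valve"]},
             "Predication": ["material", "Butterfly Valve"]}
    print(overlapping_terms(valve))  # prints {'Taxonomy/Predication': {'butterfly valve'}}
```
]]></preformat>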
        <p>Parthood This step establishes the relationships between the entities generated in
the first step. Parthood defines part-whole relationships between different entities within the ontology.
This step fills the input placeholder in the query with the entire domain terminology dictionary,
provides step descriptions, and specifies requirements such as avoiding the
generation of conflicting relationships.</p>
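        <p>The conflict rule in this step (entity A is part of B while B is also part of A) can be checked mechanically. The sketch parses lines in the "Entity: Relationship: Entity" form requested by the parthood query, reads hasPart as the inverse of partOf, and reports mutually contained pairs. The helper names, and the assumption that exactly these two relation labels are used, are illustrative; entity names that themselves contain a colon would need a different parser.</p>
        <preformat preformat-type="python"><![CDATA[
```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]


def parse_parthood(text: str) -> List[Triple]:
    """Parse 'Entity: Relationship: Entity' lines (optionally bulleted); skip other lines."""
    triples: List[Triple] = []
    for line in text.splitlines():
        parts = [p.strip() for p in line.strip().lstrip("-*").split(":")]
        if len(parts) == 3 and all(parts):
            triples.append((parts[0], parts[1], parts[2]))
    return triples


def conflicting_pairs(triples: List[Triple]) -> Set[Tuple[str, str]]:
    """Pairs (A, B) where A is part of B and B is also part of A."""
    part_of = set()
    for subj, rel, obj in triples:
        if rel == "partOf":
            part_of.add((subj, obj))
        elif rel == "hasPart":  # "B hasPart A" is read as "A partOf B"
            part_of.add((obj, subj))
    return {(min(a, b), max(a, b)) for a, b in part_of if (b, a) in part_of}


if __name__ == "__main__":
    answer = """Valve: partOf: Pipe Network
Pipe Network: hasPart: Valve
Pump: partOf: Pump Station
Pump Station: partOf: Pump"""
    print(conflicting_pairs(parse_parthood(answer)))  # prints {('Pump', 'Pump Station')}
```
]]></preformat>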
        <p>Initial ontology By this point, all stages of information gathering have been completed. This
step encodes the parthood relationships into machine-readable Turtle syntax, creating the initial
ontology from the output of the previous step. Through experimentation, we observed that the
LLMs have difficulty generating correct ontology Turtle syntax without explicit instructions, which
are not provided by the UPON Lite Ontology. Therefore, we designed this query ourselves, without reference
to the UPON Lite Ontology. The expected output of this step is correct and complete initial Turtle
syntax.</p>
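        <p>As a point of comparison for the LLM output, the sketch below encodes parthood triples deterministically in Turtle. It is not the paper's method (there, the LLM writes the Turtle), and the namespace http://example.org/dwdn#, the helper names, and the choice to express "A partOf B" as an OWL existential restriction (every A is part of some B) are all assumptions.</p>
        <preformat preformat-type="python"><![CDATA[
```python
from typing import Iterable, List, Set, Tuple

# Hypothetical namespace; the paper does not fix one for the DWDN ontology.
PREFIXES = (
    "@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .\n"
    "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .\n"
    "@prefix owl: <http://www.w3.org/2002/07/owl#> .\n"
    "@prefix dwdn: <http://example.org/dwdn#> .\n"
)


def local_name(label: str) -> str:
    """'butterfly valve' -> 'ButterflyValve' (keeps only letters and digits)."""
    words = ["".join(ch for ch in w if ch.isalnum()) for w in label.split()]
    return "".join(w[:1].upper() + w[1:] for w in words if w)


def parthood_to_turtle(triples: Iterable[Tuple[str, str, str]]) -> str:
    """Encode (entity, relation, entity) triples as OWL classes and restrictions."""
    classes: Set[str] = set()
    properties: Set[str] = set()
    axioms: List[str] = []
    for subj, rel, obj in triples:
        s, o = local_name(subj), local_name(obj)
        name = local_name(rel)
        prop = name[:1].lower() + name[1:]  # 'partOf' stays 'partOf'
        classes.update((s, o))
        properties.add(prop)
        # "Every s is <prop> some o", written as an existential restriction.
        axioms.append(
            f"dwdn:{s} rdfs:subClassOf [ a owl:Restriction ;\n"
            f"    owl:onProperty dwdn:{prop} ;\n"
            f"    owl:someValuesFrom dwdn:{o} ] ."
        )
    lines = [PREFIXES]
    lines += [f"dwdn:{c} rdf:type owl:Class ." for c in sorted(classes)]
    lines += [f"dwdn:{p} rdf:type owl:ObjectProperty ." for p in sorted(properties)]
    lines += axioms
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(parthood_to_turtle([("valve", "partOf", "pipe network")]))
```
]]></preformat>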
        <p>Final ontology This step also encodes information in Turtle syntax. Ideally, the input
to this step includes the initial ontology from the previous step along with the domain glossary, taxonomy,
and predication for all the entities. However, free models such as 7b-beta have limited processing
capacity and cannot handle very large texts at once. Therefore, the alternative approach is to use
a slightly different predefined query whose input part feeds each entity from the
dictionary individually, gradually generating the final ontology Turtle syntax. As in the previous step,
specific step descriptions and examples are developed. The expected outcome of this step is correct
and complete Turtle syntax containing all the information generated before.</p>
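        <p>The per-entity strategy can be sketched as a loop that encodes one dictionary entry at a time and merges the statements. Following the step 7 query, synonyms become owl:equivalentClass axioms and taxonomy entries become rdfs:subClassOf axioms; treating every predication as an owl:DatatypeProperty whose domain is the entity is a simplifying assumption (reference properties, RP, would more naturally be object properties), as are the dwdn prefix and the helper names. The output omits prefix declarations because it is meant to extend an initial ontology that already declares them.</p>
        <preformat preformat-type="python"><![CDATA[
```python
from typing import Any, Dict, List


def local_name(label: str) -> str:
    """'control device' -> 'ControlDevice' (keeps only letters and digits)."""
    words = ["".join(ch for ch in w if ch.isalnum()) for w in label.split()]
    return "".join(w[:1].upper() + w[1:] for w in words if w)


def entity_to_turtle(entity: str, info: Dict[str, Any], ns: str = "dwdn") -> List[str]:
    """Encode one steps 2-4 dictionary entry as Turtle statements (no prefixes)."""
    cls = local_name(entity)
    out = [f"{ns}:{cls} rdf:type owl:Class ."]
    for synonym in info.get("Synonyms", []):
        out.append(f"{ns}:{cls} owl:equivalentClass {ns}:{local_name(synonym)} .")
    taxonomy = info.get("Taxonomy", {})
    if isinstance(taxonomy, list):  # tolerate a flat list of sub-terms of the entity
        taxonomy = {entity: taxonomy}
    for parent, children in taxonomy.items():
        for child in children:
            out.append(f"{ns}:{local_name(child)} rdfs:subClassOf {ns}:{local_name(parent)} .")
    for prop in info.get("Predication", []):
        name = local_name(prop)
        # Simplification: every predication becomes a datatype property of the entity.
        out.append(f"{ns}:{name[:1].lower() + name[1:]} rdf:type owl:DatatypeProperty ;"
                   f" rdfs:domain {ns}:{cls} .")
    return out


def build_final_ontology(results: Dict[str, Dict[str, Any]]) -> str:
    """Process entities one at a time and merge their statements without duplicates."""
    statements: List[str] = []
    for entity, info in results.items():
        for stmt in entity_to_turtle(entity, info):
            if stmt not in statements:
                statements.append(stmt)
    return "\n".join(statements) + "\n"


if __name__ == "__main__":
    results = {"valve": {"Synonyms": ["control device"],
                         "Taxonomy": {"valve": ["butterfly valve"]},
                         "Predication": ["material"]}}
    print(build_final_ontology(results))
```
]]></preformat>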
        <p>Expert fine-tuning After the previous step, the LLM has generated the final ontology, and its
quality should be checked to ensure accuracy and completeness. Therefore, domain
experts evaluate it and provide suggestions for improvement for each of the six steps. This is a generic
fine-tuning step and is not specific to, e.g., error correction.</p>
      </sec>
      <sec id="sec-4-4">
        <title>4.4. Query Code</title>
        <p>This subsection details the queries sent to the LLMs at each step.</p>
        <sec id="sec-4-4-1">
          <title>4.4.1. Domain terminology</title>
          <p>We are creating an ontology in the water distribution network domain. The first step involves creating
a domain-specific terminology, or list of terms characterizing the domain at hand. This is a preliminary
step to start identifying domain knowledge and drawing the boundaries of the observed domain. The
drinking water distribution network could be modeled as a collection of links connected to nodes.
The outcome of this step should be a domain lexicon, or information structure used to answer the
question "What are the physical components typically used while building drinking water distribution
networks?"
Provide the output in a Python dictionary format suitable for direct iteration within a Python for loop,
structured as {"Entity": "Short definition of the entity"}, with no additional explanations outside of the
dictionary.</p>
        </sec>
        <sec id="sec-4-4-2">
          <title>4.4.2. Domain glossary, taxonomy, and predication</title>
          <p>We are creating the ontology in the drinking water distribution network domain. Steps 2, 3, and 4
involve creating Synonyms, Taxonomy, and Predication for the input entity: {ResultQ1} based on the
step descriptions. There should not be any repeated answers between steps 2, 3, and 4. A term can be
either a synonym, a taxonomical term, or a predicate.</p>
          <p>Step 2: Provide 0-2 domain glossary (synonyms) for the input entity. The terms of the lexicon are
associated with a textual description, indicating also possible synonyms; Having produced a first lexicon,
you could, in this step, enrich it by associating a textual description with each entry. You can enrich the
lexicon by associating a textual description with each entry.</p>
          <p>Step 3: Provide taxonomy for the input entity. Domain terms are organized in a
generalization/specialization (ISA) hierarchy; The first is a taxonomy based on the specialization relation, or the ISA
relationship connecting a more specific concept to a more general one (such as invoice ISA business
document). You must not only identify ISA relations between existing terms but also introduce more
abstract terms or generic concepts seldom used in everyday life that are extremely useful in organizing
knowledge. During this step, you thus provide feedback to the two previous knowledge levels—lexicon
and glossary—since taxonomy building is also an opportunity to validate the two previous levels and
extend them with new terms. You must find a good balance between the breadth of the taxonomy, or
average number of children of intermediate nodes, and its depth, or levels of specialization and the
granularity of taxonomy leaves.</p>
          <p>Step 4: Provide predication (CP, AP, RP) for the input entity. Terms representing properties from the
glossary are identified and connected to the entities they characterize; This step is similar to a database
design activity, as it concentrates on the properties that, in the domain at hand, characterize the relevant
entities. You generally identify atomic properties (AP) and complex properties (CP). The former can be
seen as printable data fields (such as unit price), and the latter exhibit an internal structure and have
components (such as address composed of, say, street, city, postal code, and state). Finally, if a property
refers to other entities (such as a customer referred to in an invoice) it is called a reference property
(RP). In a relational database, an RP is represented by a foreign key. The resulting predicate hierarchy is
organized with the entity at the top, and then a property hierarchy below it, where nodes are tagged
with CP, AP, and RP.</p>
          <p>Here’s how you can structure it:
{"Entity1": {"Synonyms": ["synonym1"],
"Taxonomy": {"term1": ["subterm1"]},
"Predication": ["property1"]
# Add more or delete properties as needed
}}</p>
        </sec>
        <sec id="sec-4-4-3">
          <title>4.4.3. Parthood</title>
          <p>We are creating the ontology in the drinking water distribution network domain. Step 5 involves
relationship mapping for the input entities: {ResultQ1}. Step 5: Parthood (meronymy). Complex entity
names connected to their components, with all names needing to be present in the glossary; This step
concentrates on the ’architectural’ structure of business entities, or parts of composite entities, whether
objects, processes, or actors, by eliciting their decomposition hierarchy (or part-whole hierarchy). To
this end, you would analyze the structure and components an entity exhibits, creating the hierarchy
based on the partOf (inverse hasPart) relationship. Parthood can also be applied to immaterial entities
(such as a regulation subdivided into sections and articles or a process subdivided into sub-processes
and activities). Please identify and map the clear relationships between entities and ensuring that no
conflicting relationships exist. For example, avoid situations where entity A is considered a part of
entity B while simultaneously entity B is also considered a part of entity A. You don’t need to provide
the explanation. You can structure it like this: Entity: Relationship: Entity.</p>
        </sec>
        <sec id="sec-4-4-4">
          <title>4.4.4. Initial ontology</title>
          <p>We are creating the ontology in the drinking water distribution network domain. Step 6 involves
creating the ontology schema based on the input: {ResultQ5}.</p>
          <p>Step 6: Please produce the formally encoded ontology by using the Web Ontology Language, or OWL,
based on this input.</p>
          <p>When constructing an ontology schema, follow these steps:
1) Define prefixes for readability.
2) Create classes to represent entities.
3) Organize classes hierarchically using subclass relationships.</p>
          <p>Please return the turtle syntax encompassing all classes and their relationships, excluding any
explanatory text. Here is an example of ontology schema in another domain:
# Define prefixes
@prefix rdf: &lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#&gt; .
@prefix rdfs: &lt;http://www.w3.org/2000/01/rdf-schema#&gt; .
@prefix owl: &lt;http://www.w3.org/2002/07/owl#&gt; .
@prefix ex: &lt;http://example.org/ontology#&gt; .
# Create classes and relationships
ex:Animal rdf:type owl:Class .
ex:Mammal rdf:type owl:Class;
rdfs:subClassOf ex:Animal.</p>
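          <p>The three schema-building steps in this query (prefixes, classes, subclass relationships) are simple enough to reproduce deterministically. A deterministic version can serve as a reference when checking an LLM's Turtle output. The standard-library Python sketch below emits class declarations in the same form as the example. It assumes the ex:, rdf:, rdfs: and owl: prefixes are declared separately, as in the prefix block above. The helper names local_name and class_statements are hypothetical.</p>
```python
def local_name(term: str) -> str:
    """Turn a term such as 'pumping station' into the local name 'PumpingStation'."""
    return "".join(word[:1].upper() + word[1:] for word in term.split())


def class_statements(hierarchy: dict) -> str:
    """Emit Turtle class declarations from {class: parent class or None}.

    Parent classes that are not keys are declared too, so every class used
    in rdfs:subClassOf is also declared as an owl:Class.
    """
    names = set(hierarchy) | {p for p in hierarchy.values() if p}
    blocks = []
    for name in sorted(names):
        parent = hierarchy.get(name)
        if parent:
            blocks.append(
                f"ex:{local_name(name)} rdf:type owl:Class ;\n"
                f"    rdfs:subClassOf ex:{local_name(parent)} ."
            )
        else:
            blocks.append(f"ex:{local_name(name)} rdf:type owl:Class .")
    return "\n".join(blocks)
```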
        </sec>
        <sec id="sec-4-4-5">
          <title>4.4.5. Final ontology</title>
          <p>We are creating the ontology in the drinking water distribution network domain. Step 7 is ontology
finalization, which integrates the knowledge gathered in previous steps: step 2 (Synonyms), step
3 (Taxonomy), and step 4 (Predication). The results from these steps are stored in the dictionary
{ResultQ234}. This is the ontology schema: {ResultQ6}.</p>
          <p>Your task is to formally encode the previous result and combine it with the provided ontology schema.
When generating the answers, you need to keep everything from the ontology schema, but you don’t
need to provide any explanation. You should provide a complete ontology by repeating these steps:
1. Identify the key of the input, which represents the entity in the ontology, and use it as the class name
in the turtle syntax.
2. Define equivalent classes (e.g. owl:equivalentClass) for each entity based on synonyms. Two classes may
be stated to be equivalent.
3. Incorporate the taxonomy of each entity as relationships (e.g. rdfs:subClassOf). Class hierarchies
may be created by making one or more statements that a class is a subclass of another class.
4. Define properties for each entity based on predication. Properties can be used to state relationships
between individuals or from individuals to data values. If there are repetitive properties between entities,
you can simply add information on top of existing properties rather than creating duplicates. Here is
an example of encoded information in turtle syntax:
ex:Person rdf:type owl:Class.
# Equivalent Classes
ex:Individual rdf:type owl:Class ;
owl:equivalentClass ex:Person.
# Taxonomy Relationships
ex:Employee rdf:type owl:Class ;
rdfs:subClassOf ex:Person.
# Properties
ex:hasChild rdf:type owl:ObjectProperty ;
rdfs:domain ex:Person.</p>
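          <p>The four encoding steps above can also be sketched deterministically for a dictionary shaped like {ResultQ234}. The Python sketch below illustrates the idea under stated assumptions and is not the authors' method. Taxonomy keys are treated as direct subclasses of the entity, and their listed sub-terms as subclasses of those keys. Every predication becomes an owl:ObjectProperty, as in the query's example. The naming helpers and function names are hypothetical.</p>
```python
def local_name(term: str) -> str:
    """'water main' -> 'WaterMain' (hypothetical naming convention)."""
    return "".join(w[:1].upper() + w[1:] for w in term.split())


def prop_name(term: str) -> str:
    """'pipe diameter' -> 'pipeDiameter' (hypothetical naming convention)."""
    name = local_name(term)
    return name[:1].lower() + name[1:]


def encode_entities(entities: dict) -> str:
    """Encode {entity: {"Synonyms": [...], "Taxonomy": {...}, "Predication": [...]}}.

    Prefix declarations are assumed to come from the ontology schema.
    """
    out = []
    domains = {}  # property -> domain classes, in first-seen order
    for entity, fields in entities.items():
        cls = f"ex:{local_name(entity)}"
        out.append(f"{cls} rdf:type owl:Class .")
        # Step 2: synonyms become equivalent classes.
        for syn in fields.get("Synonyms", []):
            out.append(f"ex:{local_name(syn)} rdf:type owl:Class ;\n    owl:equivalentClass {cls} .")
        # Step 3: taxonomy terms become subclasses (assumed direction).
        for term, subterms in fields.get("Taxonomy", {}).items():
            parent = f"ex:{local_name(term)}"
            out.append(f"{parent} rdf:type owl:Class ;\n    rdfs:subClassOf {cls} .")
            for sub in subterms:
                out.append(f"ex:{local_name(sub)} rdf:type owl:Class ;\n    rdfs:subClassOf {parent} .")
        # Step 4: collect properties so a repeated property is declared only once.
        for prop in fields.get("Predication", []):
            owners = domains.setdefault(f"ex:{prop_name(prop)}", [])
            if cls not in owners:
                owners.append(cls)
    for prop, owners in domains.items():
        if len(owners) == 1:
            domain = owners[0]
        else:
            domain = "[ rdf:type owl:Class ; owl:unionOf ( " + " ".join(owners) + " ) ]"
        out.append(f"{prop} rdf:type owl:ObjectProperty ;\n    rdfs:domain {domain} .")
    return "\n".join(out)
```
          <p>The union in the last step is a design choice. Stating rdfs:domain twice for a shared property would make its domain the intersection of both classes. The query's instruction to add information to an existing property instead of duplicating it is closer to a union.</p>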
        </sec>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Result</title>
      <p>This section first describes the answers generated at each step and then evaluates the performance
of four LLMs: gpt-4-0125-preview (gpt-4), gpt-3.5-turbo-0125 (gpt-3.5-turbo), gpt-4-turbo-2024-04-09
(gpt-4-turbo), and huggingfaceh4/zephyr-7b-beta (7b-beta).</p>
      <p>Domain terminology. In the input document, the DWDN is modeled as nodes and links, and its main
physical components are pipes, pumps, valves, junctions, tanks, and reservoirs. Across all the
responses generated by the different LLMs, the most common terms are pipe(s), valve(s), pump(s),
reservoir(s), and hydrant(s). Surprisingly, the term "hydrant" does not appear in the input document.
It is nevertheless part of the drinking water distribution network: a hydrant is "a discharge pipe with
a valve and spout at which water may be drawn from a water main" (Merriam-Webster,
https://www.merriam-webster.com/dictionary/hydrant). The term "junction" is not consistently included
among these frequent terms; only the response from 7b-beta includes it. Below is an example answer
returned by 7b-beta for this step:</p>
      <p>{'tank': 'A container used to store water in a drinking water distribution network.', 'junction': 'A point
where multiple pipes join in a drinking water distribution network.', 'manhole': 'A structure used to provide
access to the drinking water distribution network for maintenance and repair.'}</p>
      <p>
        Domain glossary, taxonomy, and predication. The expected result of this step is a structured
dictionary containing the domain glossary, taxonomy, and predication. All the LLMs return a
structured dictionary; however, not all models are able to create the structure within the dictionary
that the query illustrates. gpt-3.5-turbo only returns the predication, without distinguishing the
property types: atomic properties (AP, printable data fields), complex properties (CP, internal structure
and components), and reference properties (RP, properties that refer to other entities) [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
Moreover, gpt-4 and gpt-4-turbo generate very similar answers in this step. Example answers from all
models are shown in Table 2.
      </p>
      <p>Parthood. This step generates the relationships between entities. In general, gpt-4 and gpt-4-turbo
return similar results, in which most entities, such as "pump", are connected to the overarching concept
"drinking water distribution network". Both gpt-3.5-turbo and 7b-beta create relationships between the
entities but put less emphasis on placing them under the single overarching system "drinking water
distribution network". In addition, 7b-beta sometimes returns conflicting relationships, even though the
query explicitly instructs it not to. An example answer for this step is shown in Table 2.</p>
      <p>Initial ontology. The expected result of this step is a formal initial ontology in Turtle syntax that
encodes the answer from the parthood step. However, not all models were able to produce correct Turtle
syntax: 7b-beta performed particularly poorly, while gpt-3.5-turbo gave the most straightforward
answer. The 7b-beta model either failed to generate an answer due to server errors or produced Turtle
syntax with incorrect content. The GPT models did generate Turtle syntax, but their output was
inconsistent across runs and contained errors such as repeated class creation or missing input
information. An example result is provided in Table 2.</p>
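          <p>Errors of the kind described above, such as repeated class creation and missing input entities, can be screened for before manual review. The Python sketch below uses a regular-expression heuristic rather than a full Turtle parser; an RDF library would be more robust. It only recognizes declarations written as "subject rdf:type owl:Class" or "subject a owl:Class" at the start of a line. The function names are hypothetical.</p>
```python
import re
from collections import Counter

# Heuristic: a subject at the start of a line followed by "rdf:type owl:Class"
# or "a owl:Class". This is not a Turtle parser and misses other layouts.
CLASS_DECL = re.compile(r"^\s*(\S+)\s+(?:rdf:type|a)\s+owl:Class\b", re.MULTILINE)


def repeated_classes(turtle: str) -> dict:
    """Return {class: count} for classes declared more than once."""
    counts = Counter(CLASS_DECL.findall(turtle))
    return {name: n for name, n in counts.items() if n > 1}


def missing_entities(turtle: str, expected: list) -> list:
    """Return the expected class names that are never declared as classes."""
    declared = set(CLASS_DECL.findall(turtle))
    return [name for name in expected if name not in declared]
```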
      <p>Final ontology. Ideally, the final ontology should contain all the information generated in the initial
ontology, domain glossary, taxonomy, and predication steps. However, all the models struggle with this to
some degree. Although they all follow the structure given in the query instructions, the results are
disorganized, contain errors, or miss significant amounts of information. Nevertheless, gpt-4-turbo
outperforms the other models to some extent, as it produces a Turtle file with fewer missing details and
syntax errors. Examples from this model are shown in Figure 2, which presents the relationship between
the main line (Large-diameter pipe) and the water distribution network, and in Figure 3, which shows the
properties of the entity fire hydrant.</p>
      <p>Expert fine-tuning. Two domain experts acknowledged the quality of the ontology produced by
gpt-4-turbo and suggested that it could serve as a valuable starting point. They also highlighted its
significant utility for individuals who lack domain knowledge but are interested in ontology construction.
The main criticism concerned overly complex responses, which included many less important terms,
such as color as a property and hydrant as an entity, while essential entities such as water demand
were missing. In addition, not all entities have a definition, and one domain expert noted that this made
them difficult to understand. All suggestions have been integrated into the final ontology.
A visualization of the full fine-tuned ontology is available in the GitHub supplementary material.</p>
      <sec id="sec-5-1">
        <title>5.1. Model evaluation</title>
        <p>The evaluation consists of two parts: the domain expert's evaluation of the results and a ranking of
the models based on the experience gained during the project. The full evaluation results are available
in Table 3.</p>
        <p>In this paper, the overall performance of the models is ranked qualitatively along three dimensions:
time (the duration required to generate the answer), scalability (the length of the returned final ontology),
and cost (whether the use of the LLM is free or incurs a cost). gpt-4-turbo performs best in terms of
cost-effectiveness and scalability, followed by gpt-4. The 7b-beta model is free of cost but has a longer
response time.</p>
        <p>In addition, to better understand the stability of each LLM, the domain terminology query was run
ten times, and the resulting term frequencies are plotted in Figure 4. The 7b-beta model returns the same
answer across runs, whereas the GPT models return answers that are very similar but differ in details
such as the order and the definitions of terms.</p>
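          <p>The term-frequency analysis behind Figure 4 can be approximated by counting the number of runs in which each term appears. The sketch below assumes each run's answer has already been parsed into a dictionary mapping terms to definitions, like the 7b-beta example answer above. Lower-casing terms and leaving plurals unmerged are illustrative choices, not details from the paper. The function name is hypothetical.</p>
```python
from collections import Counter


def term_frequencies(runs: list) -> Counter:
    """Count in how many runs each term appears (at most once per run).

    Each run is a dict mapping term -> definition. Terms are stripped and
    lower-cased so 'Pipe' and 'pipe' are merged; plurals are not merged.
    """
    counts = Counter()
    for answer in runs:
        counts.update({term.strip().lower() for term in answer})
    return counts
```
          <p>The resulting counts can then be plotted as a bar chart, one bar per term.</p>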
        <p>Meanwhile, a similarity check is performed to compare the answers generated by the same model for
each query over three runs. This check involves vectorizing the responses and then using cosine
similarity to measure the difference between them. For each model, every query is run three times,
yielding three answers, and a similarity score is computed over these answers. The comparison shows
that the answers generated by each model are not highly similar, indicating that the outputs may be
unstable. gpt-4 achieved the highest similarity score, suggesting that its outputs are somewhat more
consistent across runs than those of the other models. Even so, the overall similarity between the
answers remains modest for all models.</p>
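          <p>The paper does not specify how responses were vectorized. As an assumption, the Python sketch below uses bag-of-words term counts and averages the cosine similarity over all pairs of runs, which gives three pairs for three runs. A TF-IDF or embedding-based vectorizer could be substituted without changing the pairwise structure. The function names are hypothetical.</p>
```python
import math
import re
from collections import Counter
from itertools import combinations


def vectorize(text: str) -> Counter:
    """Bag-of-words term counts (an assumed stand-in for the paper's vectorizer)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse count vectors."""
    dot = sum(count * b[term] for term, count in a.items())  # missing terms count as 0
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def mean_pairwise_similarity(answers: list) -> float:
    """Average cosine similarity over all pairs of answers to the same query."""
    vecs = [vectorize(a) for a in answers]
    sims = [cosine(u, v) for u, v in combinations(vecs, 2)]
    if not sims:
        raise ValueError("need at least two answers")
    return sum(sims) / len(sims)
```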
        <p>Looking at the results of each step, the models perform similarly in the intermediate steps; the
main differences between them lie in the initial and final phases. While 7b-beta excels in the first step,
domain terminology, its performance drops in the last two steps, initial ontology generation and final
ontology generation. Conversely, gpt-4-turbo performs well in the final step, and gpt-3.5-turbo performs
well in initial ontology construction.</p>
        <p>A domain expert from the water network company Vitens, a senior data scientist, evaluated the
models blind, without knowing which model produced which output. The expert rated the results on a
scale of 1 to 10 from three perspectives: accuracy (factual correctness), relevance (alignment with the
DWDN domain), and completeness (how comprehensively the query is addressed). Each model's scores
are based on both its initial and its final ontology. No reference ontology was used for scoring; instead,
all models were scored together after a thorough review of all their outputs. All models achieve a score
of 7 or above, meaning that the results are good as a first version but not perfect. Of all the models
tested, gpt-4-turbo achieved the highest score.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Discussion</title>
      <p>This section discusses the limitations of this approach.</p>
      <p>Instability of LLMs. Even with the same query and model, the answers from different runs are not
very similar. Although the temperature (randomness) of the LLMs was set to a relatively low value of 0.5,
the responses remain noticeably unstable. To reduce the effect of this randomness, each query was run
multiple times and the best answer was selected by comparison. However, the model could still produce
better or worse results with the same query.</p>
      <p>Parameter settings. In this research, parameter values were chosen based on commonly used values.
For example, at the document splitting stage, the chunk size was set to 256 and the chunk overlap to 20,
although other values could also be used; the maximum length and the maximum number of new tokens
were set to 16384 and 8096, respectively. These settings can affect the quality of the retriever and the
LLM's ability to generate answers, which in turn affects the final result. The project may therefore not
have achieved optimal results with these settings.</p>
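          <p>The sketch below makes the effect of chunk size and chunk overlap concrete. It splits text into overlapping windows using the values reported above (256 and 20). It assumes both values are measured in characters and ignores separators such as paragraph breaks. The splitter actually used in the project may measure size differently, and the function name is hypothetical.</p>
```python
def split_with_overlap(text: str, chunk_size: int = 256, chunk_overlap: int = 20) -> list:
    """Split text into fixed-size character windows that overlap.

    Simplified stand-in: a production splitter may also respect separators
    and may count tokens rather than characters.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # each window starts this many characters later
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks
```
          <p>With these values, consecutive chunks share 20 characters. This overlap reduces the chance that a definition is cut in half at a chunk boundary, at the cost of some duplicated text in the retriever's index.</p>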
      <p>Document input. One noteworthy piece of feedback from the domain expert is that the final ontology
lacks certain concepts, such as water demand. We subsequently found that the original input document
does not cover the topic of water demand extensively. This suggests that the input document used in
this paper may lack sufficient information. The chosen documents must fully cover the topics that are
expected to be included in the ontology.</p>
      <p>Full automation of the process. While the methodology used in this paper generates a basic version
of the ontology within minutes, the final ontology still needs to be fine-tuned by a domain expert to
ensure accuracy and relevance before publication and official use. Generating the first version of the
ontology is fully automated, but human intervention is still required afterward.</p>
      <p>Utilizing multiple LLM architectures. The results show that model performance varies across the
steps of the process, and no single model consistently excels or fails at every step. Assigning each step
to the model that performs best on it could therefore improve the overall accuracy of the final result.</p>
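          <p>One way to combine models is to route each step to a different model. The Python sketch below is hypothetical. The step names are assumptions, and so is the assignment, which is taken from the per-step observations in Section 5.1: 7b-beta for domain terminology, gpt-3.5-turbo for the initial ontology, and gpt-4-turbo for the final ontology. The default model and the ask callable, which would wrap the actual LLM client, are also assumptions.</p>
```python
# Hypothetical step-to-model assignment based on the per-step observations in
# Section 5.1; steps without an entry fall back to DEFAULT_MODEL.
STEP_MODEL = {
    "domain_terminology": "huggingfaceh4/zephyr-7b-beta",
    "initial_ontology": "gpt-3.5-turbo-0125",
    "final_ontology": "gpt-4-turbo-2024-04-09",
}
DEFAULT_MODEL = "gpt-4-turbo-2024-04-09"


def run_pipeline(steps, ask):
    """Run the steps in order, sending each query to its assigned model.

    steps: list of (step_name, build_query) pairs; build_query receives the
    results of earlier steps and returns the query text.
    ask: caller-supplied function ask(model_name, query) returning the answer;
    the real LLM client is deliberately left abstract here.
    """
    results = {}
    for name, build_query in steps:
        model = STEP_MODEL.get(name, DEFAULT_MODEL)
        results[name] = ask(model, build_query(results))
    return results
```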
      <p>Beyond these limitations, future work could focus on evaluation. Multiple criteria are being
developed for selecting the best ontology construction methods; these criteria are still a work in progress,
and standard rules for criteria selection could be introduced in the future. Additionally, the current
evaluation process lacks objectivity. For example, generation time was not precisely measured with a
timer. In the future, multiple domain experts should be involved in the evaluation process, and the
evaluation criteria should be quantified.</p>
    </sec>
    <sec id="sec-7">
      <title>7. Conclusion</title>
      <p>In conclusion, this paper focuses on the use of LLMs for ontology learning, combining manual and
automated methods into a new approach to ontology construction. The UPON Lite ontology construction
method was chosen as the foundation for the queries passed to the LLMs. This hybrid approach speeds up
generation, significantly reducing production time. It also makes ontology construction more accessible
to non-domain experts, since domain experts are needed only in the fine-tuning step.</p>
      <p>This paper tested multiple LLMs to provide insights into the capabilities of different LLM variants for
ontology learning in the drinking water distribution network domain. Results from four LLMs are
evaluated both qualitatively and quantitatively to give a more comprehensive picture of the overall
quality of the answers.</p>
      <p>Currently, the final ontology requires refinement by domain experts before it can be used. One
suggestion for domain experts is that refinement could start by adapting the query so that the LLMs
return only the most important concepts, with detailed explanations.</p>
      <p>The choice of LLM is crucial. While no LLM can produce answers identical to those of a domain
expert, a more advanced LLM trained on better data is more likely to achieve a better result. In this
paper, gpt-4-turbo, the latest OpenAI model at the time of the experiments, has the best overall
performance of the models tested. Under budget constraints, 7b-beta remains a viable option.</p>
      <p>Last but not least, the user query plays an important role in answer generation, as it directly
influences the quality and format of the answer. During query generation, it is important to experiment
continually to find the right level of query complexity. If the query is too complex, LLMs may struggle to
process it correctly; if it is too simple, LLMs may miss the underlying concept. Developing query
templates and a query generation strategy can greatly improve this process. Such templates help create
queries that are both informative and easy for LLMs to interpret accurately.</p>
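          <p>A query template can be as simple as a string with named placeholders, mirroring the {ResultQ5}-style placeholders in the queries of Section 4.4. The Python sketch below shortens the Step 6 query for illustration. The template text and the helper name fill_template are assumptions.</p>
```python
# Shortened, illustrative version of the Step 6 query; {ResultQ5} follows the
# placeholder naming used in the paper's queries.
STEP6_TEMPLATE = (
    "We are creating the ontology in the {domain} domain. "
    "Step 6 involves creating the ontology schema based on the input: {ResultQ5}."
)


def fill_template(template: str, **fields) -> str:
    """Fill a query template, failing loudly on empty or missing fields.

    str.format raises KeyError if a placeholder has no matching field.
    """
    empty = [key for key, value in fields.items() if not str(value).strip()]
    if empty:
        raise ValueError(f"empty fields: {empty}")
    return template.format(**fields)
```
          <p>With str.format, literal braces in a template must be doubled ({{ and }}) so they are not treated as placeholders. This applies, for example, to the JSON structure in the Step 2 to 4 query.</p>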
      <p>Acknowledgements. This work was supported by The Dutch Research Council (NWO), in the
scope of the Digital Twin for Evolutionary Changes in water networks (DiTEC) project, number 19454.</p>
      <ref-list>
        <ref id="ref19">
          <mixed-citation>[19] P. Mateiu, A. Groza, Ontology engineering with large language models, arXiv preprint arXiv:2307.16699 (2023).</mixed-citation>
        </ref>
        <ref id="ref20">
          <mixed-citation>[20] J. Zhao, G. Haffari, E. Shareghi, Generating synthetic speech from SpokenVocab for speech translation, arXiv preprint arXiv:2210.08174 (2022).</mixed-citation>
        </ref>
        <ref id="ref21">
          <mixed-citation>[21] P. Lewis, E. Perez, A. Piktus, F. Petroni, V. Karpukhin, N. Goyal, H. Küttler, M. Lewis, W.-t. Yih, T. Rocktäschel, et al., Retrieval-augmented generation for knowledge-intensive NLP tasks, Advances in Neural Information Processing Systems 33 (2020) 9459–9474.</mixed-citation>
        </ref>
        <ref id="ref22">
          <mixed-citation>[22] National Research Council, Division on Earth and Life Studies, Water Science and Technology Board, Committee on Public Water Supply Distribution Systems: Assessing and Reducing Risks, Drinking water distribution systems: Assessing and reducing risks, National Academies Press, 2007.</mixed-citation>
        </ref>
        <ref id="ref23">
          <mixed-citation>[23] S. Shahsavani, A. Mohammadpour, M. R. Shooshtarian, H. Soleimani, M. R. Ghalhari, A. Badeenezhad, Z. Baboli, R. Morovati, P. Javanmardi, An ontology-based study on water quality: probabilistic risk assessment of exposure to fluoride and nitrate in Shiraz drinking water, Iran using fuzzy multi-criteria group decision-making models, Environmental Monitoring and Assessment 195 (2023) 35.</mixed-citation>
        </ref>
        <ref id="ref24">
          <mixed-citation>[24] L. Ahmedi, E. Jajaga, F. Ahmedi, An ontology framework for water quality management, SSN@ISWC 1063 (2013) 35–50.</mixed-citation>
        </ref>
        <ref id="ref25">
          <mixed-citation>[25] P. Escobar, M. d. M. Roldán-García, J. Peral, G. Candela, J. Garcia-Nieto, An ontology-based framework for publishing and exploiting linked open data: A use case on water resources management, Applied Sciences 10 (2020) 779.</mixed-citation>
        </ref>
        <ref id="ref26">
          <mixed-citation>[26] N. T. W. Khin, N. N. Yee, Query classification based information retrieval system, in: 2018 International conference on intelligent informatics and biomedical sciences (ICIIBMS), volume 3, IEEE, 2018, pp. 151–156.</mixed-citation>
        </ref>
        <ref id="ref27">
          <mixed-citation>[27] J. Carbonell, J. Goldstein, The use of MMR, diversity-based reranking for reordering documents and producing summaries, in: Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval, 1998, pp. 335–336.</mixed-citation>
        </ref>
        <ref id="ref28">
          <mixed-citation>[28] V. Liu, L. B. Chilton, Design guidelines for prompt engineering text-to-image generative models, in: Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems, 2022, pp. 1–23.</mixed-citation>
        </ref>
        <ref id="ref29">
          <mixed-citation>[29] J. White, Q. Fu, S. Hays, M. Sandborn, C. Olea, H. Gilbert, A. Elnashar, J. Spencer-Smith, D. C. Schmidt, A prompt pattern catalog to enhance prompt engineering with ChatGPT, arXiv preprint arXiv:2302.11382 (2023).</mixed-citation>
        </ref>
        <ref id="ref30">
          <mixed-citation>[30] L. Giray, Prompt engineering with ChatGPT: a guide for academic writers, Annals of biomedical engineering 51 (2023) 2629–2633.</mixed-citation>
        </ref>
      </ref-list>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>F.</given-names>
            <surname>Chiarello</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Giordano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Spada</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Barandoni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Fantoni</surname>
          </string-name>
          ,
          <article-title>Future applications of generative large language models: A data-driven case study on chatgpt</article-title>
          ,
          <source>Technovation</source>
          <volume>133</volume>
          (
          <year>2024</year>
          )
          <fpage>103002</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>T.</given-names>
            <surname>Gruber</surname>
          </string-name>
          ,
          <source>What is an ontology</source>
          ,
          <year>1993</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>F. N.</given-names>
            <surname>Al-Aswadi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. Y.</given-names>
            <surname>Chan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. H.</given-names>
            <surname>Gan</surname>
          </string-name>
          ,
          <article-title>Automatic ontology construction from text: a review from shallow to deep learning trend</article-title>
          ,
          <source>Artificial Intelligence Review</source>
          <volume>53</volume>
          (
          <year>2020</year>
          )
          <fpage>3901</fpage>
          -
          <lpage>3928</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>A.</given-names>
            <surname>Sattar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. S. M.</given-names>
            <surname>Surin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. N.</given-names>
            <surname>Ahmad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ahmad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. K.</given-names>
            <surname>Mahmood</surname>
          </string-name>
          ,
          <article-title>Comparative analysis of methodologies for domain ontology development: A systematic review</article-title>
          ,
          <source>International Journal of Advanced Computer Science and Applications</source>
          <volume>11</volume>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Alfaifi</surname>
          </string-name>
          ,
          <article-title>Ontology development methodology: a systematic review and case study</article-title>
          ,
          <source>in: 2022 2nd International Conference on Computing and Information Technology (ICCIT)</source>
          , IEEE,
          <year>2022</year>
          , pp.
          <fpage>446</fpage>
          -
          <lpage>450</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>A.</given-names>
            <surname>De Nicola</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Missikoff</surname>
          </string-name>
          ,
          <article-title>A lightweight methodology for rapid ontology engineering</article-title>
          ,
          <source>Communications of the ACM</source>
          <volume>59</volume>
          (
          <year>2016</year>
          )
          <fpage>79</fpage>
          -
          <lpage>86</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>M. C.</given-names>
            <surname>Suárez-Figueroa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gómez-Pérez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Fernandez-Lopez</surname>
          </string-name>
          ,
          <article-title>The neon methodology framework: A scenario-based methodology for ontology development</article-title>
          ,
          <source>Applied ontology 10</source>
          (
          <year>2015</year>
          )
          <fpage>107</fpage>
          -
          <lpage>145</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>K.</given-names>
            <surname>Farghaly</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. K.</given-names>
            <surname>Soman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Collinge</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Manu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. H.</given-names>
            <surname>Mosleh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Cheung</surname>
          </string-name>
          ,
          <article-title>Construction safety ontology development and alignment with industry foundation classes (ifc)</article-title>
          ,
          <source>Electronic Journal of Information Technology in Construction 27</source>
          (
          <year>2022</year>
          )
          <fpage>94</fpage>
          -
          <lpage>108</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>A.</given-names>
            <surname>Wagner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Sprenger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Maurer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. E.</given-names>
            <surname>Kuhn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>U.</given-names>
            <surname>Rüppel</surname>
          </string-name>
          ,
          <article-title>Building product ontology: core ontology for linked building product data</article-title>
          ,
          <source>Automation in Construction</source>
          <volume>133</volume>
          (
          <year>2022</year>
          )
          <fpage>103927</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Z. Z.</given-names>
            <surname>Zulkipli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Maskat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. H. I.</given-names>
            <surname>Teo</surname>
          </string-name>
          ,
          <article-title>A systematic literature review of automatic ontology construction</article-title>
          ,
          <source>Indones. J. Electr. Eng. Comput. Sci</source>
          <volume>28</volume>
          (
          <year>2022</year>
          )
          <fpage>878</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>M.</given-names>
            <surname>Alobaidi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. M.</given-names>
            <surname>Malik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sabra</surname>
          </string-name>
          ,
          <article-title>Linked open data-based framework for automatic biomedical ontology generation</article-title>
          ,
          <source>BMC bioinformatics 19</source>
          (
          <year>2018</year>
          )
          <fpage>1</fpage>
          -
          <lpage>13</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Saberi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <article-title>Semantic-based lightweight ontology learning framework: a case study of intrusion detection ontology</article-title>
          ,
          <source>in: Proceedings of the international conference on web intelligence</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>1171</fpage>
          -
          <lpage>1177</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>N.</given-names>
            <surname>Kaushik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Chatterjee</surname>
          </string-name>
          ,
          <article-title>Automatic relationship extraction from agricultural text for ontology construction</article-title>
          ,
          <source>Information processing in agriculture 5</source>
          (
          <year>2018</year>
          )
          <fpage>60</fpage>
          -
          <lpage>73</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>A.</given-names>
            <surname>Maedche</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Staab</surname>
          </string-name>
          ,
          <article-title>Ontology learning</article-title>
          , in: Handbook on ontologies, Springer,
          <year>2004</year>
          , pp.
          <fpage>173</fpage>
          -
          <lpage>190</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>I.</given-names>
            <surname>Horrocks</surname>
          </string-name>
          ,
          <article-title>A language model based framework for new concept placement in ontologies</article-title>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Rebboud</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Tailhardat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Lisena</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Troncy</surname>
          </string-name>
          ,
          <article-title>Can llms generate competency questions?</article-title>
          ,
          <source>in: ESWC</source>
          <year>2024</year>
          , Extended Semantic Web Conference,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>H.</given-names>
            <surname>Babaei Giglou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>D'Souza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Auer</surname>
          </string-name>
          ,
          <article-title>Llms4ol: Large language models for ontology learning</article-title>
          , in: International Semantic Web Conference, Springer,
          <year>2023</year>
          , pp.
          <fpage>408</fpage>
          -
          <lpage>427</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>V. K.</given-names>
            <surname>Kommineni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>König-Ries</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Samuel</surname>
          </string-name>
          ,
          <article-title>From human experts to machines: An llm supported approach to ontology and knowledge graph construction</article-title>
          ,
          <source>arXiv preprint arXiv:2403.08345</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>