<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Assessing the Capability of Large Language Models for Domain-Specific Ontology Generation</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Anna Sofia Lippolis</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mohammad Javad Saeedizade</string-name>
          <email>javad.saeedizade@liu.se</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Robin Keskisärkkä</string-name>
          <email>robin.keskisarkka@liu.se</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Aldo Gangemi</string-name>
          <email>aldo.gangemi@unibo.it</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Eva Blomqvist</string-name>
          <email>eva.blomqvist@liu.se</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andrea Giovanni Nuzzolese</string-name>
          <email>andrea.nuzzolese@cnr.it</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Linköping University</institution>
          ,
          <country country="SE">Sweden</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Bologna</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>ISTC-CNR</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Large Language Models (LLMs) have shown significant potential for ontology engineering. However, it is still unclear to what extent they are applicable to the task of domain-specific ontology generation. In this study, we explore the application of LLMs for automated ontology generation and evaluate their performance across different domains. Specifically, we investigate the generalizability of two state-of-the-art LLMs, DeepSeek and o1-preview, both equipped with reasoning capabilities, by generating ontologies from a set of competency questions (CQs) and related user stories. Our experimental setup comprises six distinct domains drawn from existing ontology engineering projects and a total of 95 curated CQs designed to test the models' reasoning for ontology engineering. Our findings show that, with both LLMs, performance is remarkably consistent across all domains, indicating that these methods are capable of generalizing over ontology generation tasks irrespective of the domain. These results highlight the potential of LLM-based approaches in achieving scalable and domain-agnostic ontology construction and lay the groundwork for further research into enhancing automated reasoning and knowledge representation techniques.</p>
      </abstract>
      <kwd-group>
        <kwd>ontology engineering</kwd>
        <kwd>large language models</kwd>
        <kwd>ontology generation</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        The rapid evolution of large language models (LLMs) has transformed numerous areas of natural
language processing as well as knowledge engineering, and ontology engineering is no exception.
Ontologies, formal representations of knowledge that enable semantic interoperability, have long
been essential in domains ranging from healthcare to environmental science. However, traditional
ontology engineering, as shown by requirement-based methods such as eXtreme Design [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ],
remains a labour-intensive, expertise-driven task, often hindering timely and scalable knowledge
modelling. Recent studies, such as the ones by Saeedizade and Blomqvist [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] and Lippolis
et al. [
        <xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>
        ] suggest a potential for using LLMs in ontology engineering, where LLMs offer a
promising addition by automating key aspects of ontology construction through their language
understanding and reasoning capabilities.
      </p>
      <p>Building on these earlier studies that showed the feasibility of LLM-supported ontology
generation, this paper extends the investigation by suggesting an ontology generation prompt for
reasoning models and examining the generalizability of automated ontology generation across
diverse and more specific domains. Neither of these aspects has been addressed in previous
work. In particular, we explore the performance of two state-of-the-art LLMs, DeepSeek and
OpenAI o1-preview, which are equipped with reasoning capabilities. By leveraging a dataset
comprising 95 carefully curated competency questions paired with corresponding user stories
from specific knowledge domains, our approach systematically tests whether these models can
generate good ontology drafts based on requirements that capture domain-specific needs.</p>
      <p>The contributions of our study are the following: First, we present an automated pipeline
for ontology generation, including a prompt strategy, that harnesses the advanced reasoning
capabilities of the most recent commercial and open-source LLMs to interpret natural language
requirements and suggest corresponding ontological modules. Second, we evaluate this approach
across six distinct knowledge domains, providing empirical evidence of its consistent performance
and domain-agnostic potential.</p>
      <p>The remainder of this paper is organized as follows. Section 2 surveys the relevant literature in
LLM-based knowledge engineering. Section 3 establishes the preliminary definitions and concepts
used in the remainder of the paper. In Section 4, we describe our methodology, including dataset
creation and details about the prompt. Section 5 describes our experimental setup and evaluation
framework. Section 6 presents our results, Section 7 provides a discussion, and
Section 8 focuses on limitations and potential risks. Finally, Section 9 concludes the paper and
outlines directions for future research.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        Ontology generation based on requirements using large language models has been recently
studied. In Benson et al. [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], the authors present a study on producing outputs that are consistent
with BFO through the chat interface of ChatGPT (with GPT-4). However, this work was
based on a few examples and was solely focused on LLMs with non-reasoning capabilities,
noting reasoning models needed a new, ad-hoc analysis. In Saeedizade and Blomqvist [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ],
LLM-generated ontologies from a set of competency questions from a semantic web course were
compared to student performance in that course. However, the scope of that study was
limited to ontology stories designed for that specific course. The study by Lippolis et al. [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] on
ontology generation was based on the African Wildlife Ontology, a gold standard outlined in
Potoniec et al. [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. Additionally, Doumanas et al. [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] fine-tuned LLMs to generate ontologies by
considering the domain of the ontology. However, that study does not compare performance
across different domains. In the same way, the work by Lippolis et al. [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]
addresses LLM-based ontology generation on a dataset with 100 CQs from different domains, but
does not analyze the LLMs' performance per domain and focuses on three ontology modelling
assignments from a semantic web course, the one presented in [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. That study concludes
that OpenAI's o1-preview model is the best performing one, outperforming non-reasoning models.
Regarding domain-specific applications of LLMs, Fathallah [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] focuses
on ontology generation in life sciences, Huang et al. [
        <xref ref-type="bibr" rid="ref9">9</xref>
          ] on drinking water distribution networks,
and Xu et al. [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] evaluate retrieval augmented generation for E-commerce and [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] on drug
indication, assessing the capability of LLMs in generating knowledge graphs from text using
predefined ontologies. Val-Calvo et al. propose a pipeline to automate ontology development
and knowledge graph generation with GPT-4 from CSV files from the e-commerce domain to
streamline the engineering process in enterprise settings [12]. From these studies, it is possible
to conclude that there is no systematic evaluation of one method across different ontology domains
of varying complexity, especially with different LLMs with reasoning capabilities.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Preliminaries</title>
      <p>
        In this section, we present and define terminology that will be used throughout this paper,
building on the one already outlined in Lippolis et al. [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
Data, code, and other material for this study are available at https://github.com/dersuchendee/Domain-OntoGen.
      </p>
      <p>Ontology Generation: We define ontology generation as the process of creating formal
representations of knowledge by identifying the necessary classes, properties, and axioms to capture
domain-specific concepts and their relationships. This process can be performed manually or
supported by automated techniques, such as LLMs, to draft ontologies from natural language
inputs in the form of requirements.</p>
      <p>
        Modelling Competency Question: Following the definitions outlined in previous work [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], a
CQ is considered modelled in an ontology if and only if a SPARQL query exists to extract the
answer of the CQ from the ontology, irrespective of the quality of the ontology modelling or
adherence to optimal modelling practices.
      </p>
      <p>
        Minor issue: Similar to [
        <xref ref-type="bibr" rid="ref4">4</xref>
          ], for a pair consisting of an ontology and a competency question, if the ontology
includes all necessary elements (named classes or properties) except for only one object property
or only one data property, and adding this single element would make the CQ modelled, this is
considered a minor issue in modelling the CQ.
      </p>
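      <p>The two definitions above amount to a small decision procedure over the named elements a SPARQL query for the CQ would need. A minimal Python sketch of that procedure (the helper, the element sets, and the example names are our illustration, not part of the paper's tooling):</p>

```python
def classify_cq(required, ontology_elements, property_names):
    """Classify a CQ against an ontology draft, following the paper's definitions.

    required: set of named classes/properties a SPARQL query for the CQ needs.
    ontology_elements: set of named classes/properties present in the draft.
    property_names: subset of `required` that are object/data properties.
    """
    missing = required - ontology_elements
    if not missing:
        return "modelled"        # a SPARQL query can extract the answer
    if len(missing) == 1 and missing <= property_names:
        return "minor issue"     # exactly one object/data property is missing
    return "not modelled"

# Hypothetical CQ: "Which musician performed at which event?"
required = {"Musician", "Event", "performedAt"}
props = {"performedAt"}
print(classify_cq(required, {"Musician", "Event", "performedAt"}, props))  # modelled
print(classify_cq(required, {"Musician", "Event"}, props))                 # minor issue
print(classify_cq(required, {"Musician"}, props))                          # not modelled
```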
      <p>Reasoning Capability of LLMs: The reasoning capability of LLMs refers to their ability to
generate logically structured and coherent responses by recognizing and using patterns learned
from large amounts of text. This definition includes simulating various forms of reasoning, such
as deducing, generalizing from examples, and drawing analogies, often by processing information
step-by-step (a process sometimes called chain-of-thought, which is natively employed in models
like o1-preview). Importantly, LLM reasoning is based on statistical patterns rather than true
human understanding or deliberate logic.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Methodology</title>
      <p>In this section, we outline the methodology employed in this study. We begin by describing
the dataset creation process and its statistical properties, followed by the explanation of the
prompting technique used in this work. Finally, we present the pipeline for ontology generation.</p>
      <sec id="sec-4-1">
        <title>4.1. Dataset Creation</title>
        <p>To evaluate ontologies generated by LLMs with reasoning capabilities, we constructed a dataset
comprising 95 competency questions (CQs) with 12 ontology stories across 6 distinct domains.
Each dataset entry includes: (i) a CQ, (ii) a user story (providing the contextual background),
(iii) a binary label for the difficulty level of the CQ (easy or hard), and (iv) the domain of the
ontology. Table 1 shows statistics for the dataset.</p>
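        <p>A dataset entry as described above can be represented as a simple four-field record; a minimal sketch (the field names and the sample values are our assumption, not the dataset's actual schema):</p>

```python
from dataclasses import dataclass

@dataclass
class DatasetEntry:
    cq: str          # (i) the competency question
    user_story: str  # (ii) contextual background
    difficulty: str  # (iii) binary label: "Easy" or "Hard"
    domain: str      # (iv) e.g. "Circular Economy", "Events"

# Hypothetical entry for illustration only.
entry = DatasetEntry(
    cq="Which events took place in a given venue?",
    user_story="A music historian wants to trace concert venues over time.",
    difficulty="Easy",
    domain="Events",
)
```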
        <sec id="sec-4-1-1">
          <title>4.1.1. Dataset composition</title>
          <p>The CQs were extracted from four sources: Onto-DESIDE (domain: Circular Economy), the
Polifonia Project (domains: Music and Events) [13], the WHOW Project (domain: Water and
Health) [14], and AquaDiva (domains: Microbe Habitat and Carbon and Nitrogen Cycling) [15].
In particular, these derive from carefully curated large-scale projects, with the fourth one being
a semi-automatically annotated dataset.</p>
        </sec>
        <sec id="sec-4-1-2">
          <title>4.1.2. CQ classification</title>
          <p>To ensure balanced and comparable evaluations, each CQ was classified into one of two categories,
“Easy” and “Hard”, based on the complexity required for its formal representation. An LLM,
o1-preview, was used to assign these labels by estimating the difficulty level of classes and
properties needed to model the CQ. Specifically, if a CQ required at most 2 classes and 1
property (either a data or an object property), it was classified as “Easy”; otherwise, it was
classified as “Hard”. The labels were then checked manually by an ontology engineer. This
labelling strategy was implemented to maintain a consistent distribution of CQ difficulties across
all domains.</p>
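          <p>The thresholds in the rule above come from the paper; expressed as a function (the function itself is our illustration), the classification is:</p>

```python
def classify_difficulty(n_classes: int, n_properties: int) -> str:
    """Label a CQ by the elements needed to model it, per the paper's rule:
    at most 2 classes and 1 property (data or object) -> "Easy", else "Hard"."""
    return "Easy" if n_classes <= 2 and n_properties <= 1 else "Hard"

print(classify_difficulty(2, 1))  # Easy
print(classify_difficulty(3, 1))  # Hard
print(classify_difficulty(2, 2))  # Hard
```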
        </sec>
        <sec id="sec-4-1-3">
          <title>4.1.3. Manual annotations</title>
          <p>
            The labelling of the dataset for the verification of CQs has been carried out manually by two
ontology engineers. Following the error classification outlined in previous work [
            <xref ref-type="bibr" rid="ref4">4</xref>
            ], the output
ontologies were considered to be modelled, not modelled, or not modelled due to a minor error.
          </p>
        </sec>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Prompting Techniques</title>
        <p>
          A prompt serves as an instruction for LLMs to generate responses to a given input. The prompt
employed in this study follows a few-shot prompting technique, distinguishing it from
previous approaches, which mainly employed a decomposed prompting
technique [
          <xref ref-type="bibr" rid="ref2 ref3 ref4">2, 3, 4</xref>
          ]. This choice is motivated by the observation that reasoning models have
been shown in Lippolis et al. [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] to perform more efectively with minimal instructions rather
than being explicitly guided to think step by step. After trials with and without a few-shot
example, the few-shot variant proved more accurate. The prompt
is structured into three key components:
• Instruction: A description of the task (ontology generation) and its objective (ontology
must model the input CQ).
• Example: A Turtle syntax sample ontology model corresponding to a CQ and its associated
ontology story, illustrating an ideal response.
• Input: A designated section where a new CQ and ontology story are inserted, prompting
the LLM to generate an ontology model accordingly.
        </p>
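        <p>The three components above can be assembled into a single prompt string. The following is a hypothetical sketch: the actual prompt wording is in the paper's supplementary materials, and the instruction text, example CQ, story, and Turtle snippet here are all invented for illustration:</p>

```python
INSTRUCTION = (
    "You are an ontology engineer. Generate an OWL ontology in Turtle syntax "
    "that models the competency question below."
)

# Example component: a CQ, its story, and an ideal Turtle answer (all invented here).
EXAMPLE = """Example CQ: Which person authored which document?
Example story: A librarian tracks document authorship.
@prefix : <http://example.org/onto#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
:Person a owl:Class .
:Document a owl:Class .
:authored a owl:ObjectProperty ;
    rdfs:domain :Person ;
    rdfs:range :Document .
"""

def build_prompt(cq: str, story: str) -> str:
    """Assemble the prompt from its three components: Instruction, Example, Input."""
    return (
        f"{INSTRUCTION}\n\n"
        f"### Example\n{EXAMPLE}\n"
        f"### Input\nUser story: {story}\nCompetency question: {cq}\n"
    )

print(build_prompt("Which musician performed at which event?",
                   "A historian catalogues concerts.").count("###"))  # 2 section markers
```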
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Ontology Generation</title>
        <p>
          We used the Independent Ontology Generation method described in [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]. In this approach, each
CQ and its associated ontology story are provided to an LLM through a prompt to generate
the corresponding ontology. This method treats each CQ as an independent unit, enabling a
precise evaluation of the ontology generation process on a per-question level. By handling CQs
in isolation, the impact of the settings can be assessed.
        </p>
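        <p>Treating each CQ as an independent unit reduces the pipeline to a loop over (CQ, story) pairs. A minimal sketch, where both the prompt builder and the LLM call are injected placeholders rather than the paper's actual implementation:</p>

```python
from typing import Callable

def generate_ontologies(entries, build_prompt: Callable[[str, str], str],
                        call_llm: Callable[[str], str]) -> dict:
    """Independent Ontology Generation: each (CQ, story) pair is sent to the
    LLM in isolation, yielding one ontology draft per competency question."""
    drafts = {}
    for cq, story in entries:
        prompt = build_prompt(cq, story)  # prompt assembly per CQ
        drafts[cq] = call_llm(prompt)     # e.g. an API call returning Turtle text
    return drafts

# Stubbed usage: call_llm stands in for a real API call.
entries = [("CQ1?", "story A"), ("CQ2?", "story B")]
drafts = generate_ontologies(
    entries,
    build_prompt=lambda cq, s: f"{s}\n{cq}",
    call_llm=lambda prompt: "# ontology draft for: " + prompt.splitlines()[-1],
)
print(drafts["CQ1?"])
```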
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Experiment Setup</title>
      <p>In this section, we describe the experimental setup. To run the experiments, we used Azure API
to call o1-preview and the original DeepSeek API with default hyperparameters. The prompt,
with a fixed few-shot example, is combined with the ontology story and CQ and sent to the API,
and the resulting output is saved as a Turtle file.</p>
      <p>LLMs: In the present study, we have selected two large language models with reasoning
capabilities: DeepSeek and OpenAI o1-preview. This selection was motivated by the desire to
perform a comparative analysis between current state-of-the-art closed-source and open-source
LLMs. The OpenAI o1-preview model was accessed through the Microsoft Azure API,
whereas DeepSeek was used according to the specifications detailed in its official documentation (https://api-docs.deepseek.com/).</p>
      <p>Prompt: The prompt used in this work is available in the supplementary materials on
GitHub (https://github.com/dersuchendee/Domain-OntoGen). Initially, several prompt configurations were evaluated, including versions with
and without examples (i.e., few-shot and zero-shot settings). The results demonstrated that
prompts incorporating examples consistently outperformed those without, justifying their use.
Furthermore, the impact of varying the complexity of the examples was examined in a separate
experiment involving multiple ontology generation tasks. In the interest of conciseness, the
detailed results of these additional experiments are not reported here.</p>
      <p>Hyperparameters: The default hyperparameters for the LLMs, including temperature and
penalty, were employed throughout this study.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Results</title>
      <p>Overall, both the o1-preview and DeepSeek models exhibited high and comparable accuracy.
Initially, nine and ten CQs remained unmodeled for the o1-preview and DeepSeek models,
respectively, out of a total of 95 CQs. However, when minor modelling issues are excluded, the
number of unmodeled CQs decreases to eight for the o1-preview model and five for the DeepSeek
model.</p>
      <p>Table 2 provides a breakdown of mistakes on both easy and hard CQs by domain. Regarding
the o1-preview scores, performance was consistent across all domains, suggesting that the
ontology generation method is broadly applicable rather than being limited to a specific domain.
With slight differences, DeepSeek generally demonstrated similar consistency, although the scores
for the Events domain were noticeably lower, indicating a need for further investigation into the
model’s performance in that particular domain.</p>
    </sec>
    <sec id="sec-7">
      <title>7. Discussion</title>
      <p>
        Our results reveal a performance trade-off between the two models. Specifically, while o1-preview
tends to commit more “minor” errors, DeepSeek is prone to a higher incidence of incomplete
modellings. When we consider cases with “minor” mistakes equivalent to complete modellings,
both models demonstrate similar performance across the domains. This finding suggests that
diferences in error types may not translate into significant disparities in overall outcomes.
Domain generalizability. Our work extends previous findings by Lippolis et al. [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] by showing
that reasoning models working for a specific domain can generalize effectively across other
datasets. Therefore, the improvements achieved in earlier studies are not limited to a single
domain, affirming the broader applicability of the developed techniques.
      </p>
      <p>Reasoning models architecture. One possible explanation for the similar outputs observed
from both DeepSeek and o1-preview is their reasoning capabilities. It is plausible that these
models benefit from mutual fine-tuning or shared architectural strengths. The collective evidence
points to a significant enhancement in reasoning compared to models without reasoning
capabilities, which appears to underlie their comparable performance.</p>
      <p>Difficulty levels of the requirements. Furthermore, the analysis indicates that both models
handle hard and easy CQs with similar performance. This observation challenges the common
assumption that increased question complexity necessarily results in lower scores. Instead, our
findings suggest that the models are robust enough to process questions of varying complexity
without a marked performance penalty. Nonetheless, this is limited to only one evaluation
criterion.</p>
      <p>Error analysis. Finally, a closer look at the errors reveals that the majority occur within
the Events domain. The corresponding CQs in this category are much shorter than those in
other domains, but there are multiple user stories that serve as requirements. This pattern could
suggest that longer, more descriptive CQs could provide additional context that aids the models
in processing and reasoning, leading to improved performance, especially if multiple scenarios
are provided. Therefore, addressing the lack of standardization in requirements is also fundamental
when dealing with automated techniques.</p>
      <p>Overall, these insights highlight the importance of both model architecture and the design
and selection of requirements in achieving balanced and generalizable performance.</p>
    </sec>
    <sec id="sec-8">
      <title>8. Limitations and Risks</title>
      <p>One potential risk associated with this study is dataset leakage, which may impact the quality
and reliability of the LLM-generated ontology. Specifically, in certain domains, an LLM may
have encountered parts of the dataset during pretraining, while in others, it relies solely on
its reasoning capabilities. To the best of our knowledge, the CQs and ontology modules were
not publicly available in a single dataset, reducing the likelihood that LLMs were trained on
these elements in direct succession as consecutive tokens. Another limitation of this work is the
exclusive use of reasoning-based LLMs. These models, while beneficial for ontology generation
tasks, are generally resource-intensive and less widely available compared to standard LLMs.
Additionally, the scope of this study is restricted to only two LLMs, which may limit the
generalizability of the findings.</p>
    </sec>
    <sec id="sec-9">
      <title>9. Conclusion</title>
      <p>In this study, we investigated the potential of two LLMs, DeepSeek and o1-preview, with reasoning
capabilities for automated ontology generation across six diverse domains. Our experiments,
conducted on a dataset of 95 competency questions paired with user stories, reveal that both
LLMs similarly exhibit robust and consistent performance across all domains and difficulty levels
of the CQs. Overall, this work lays a foundation for further advancements in automated ontology
engineering, highlighting the potential of LLMs to deliver scalable, domain-agnostic knowledge
representation. Future research will build on these results to explore additional LLMs.</p>
    </sec>
    <sec id="sec-10">
      <title>Acknowledgments</title>
      <p>This project has received funding from the European Union's Horizon Europe research and
innovation programme under grant agreements no. 101058682 (Onto-DESIDE) and 101070588
(HACID), and is supported by the strategic research area Security Link. Additional financial
support to this project was provided by NextGenerationEU under NRRP Grant agreement n.
MUR IR0000008 - FOSSR (CUP B83C22003950001). This work was also supported by the
PhD scholarship “Discovery, Formalisation and Re-use of Knowledge Patterns and Graphs for the
Science of Science”, funded by CNR-ISTC through the WHOW project (EU CEF programme
grant agreement no. INEA/CEF/ICT/A2019/2063229). Finally, we thank OpenAI's Researcher
Access Program Grant for the API credits.</p>
    </sec>
    <sec id="sec-11">
      <title>Disclosure of Interests</title>
      <p>The authors have no competing interests to declare that are relevant to the content of this
article.</p>
    </sec>
    <sec id="sec-12">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the author(s) used GPT-4 and Grammarly for grammar
and spelling checking. After using these tools/services, the author(s) reviewed and edited the
content as needed and take(s) full responsibility for the publication's content.</p>
      <p>with medical context, in: 15th International Semantic Web Applications and Tools for
Healthcare and Life Sciences (SWAT4HCLS 2024) (to appear), 2024.
[12] M. Val-Calvo, M. Egaña Aranguren, J. Mulero-Hernández, G. Almagro-Hernández,
P. Deshmukh, J. A. Bernabé-Díaz, P. Espinoza-Arias, J. L. Sánchez-Fernández, J. Mueller,
J. T. Fernández-Breis, Ontogenix: Leveraging large language models for enhanced ontology
engineering from datasets, Information Processing &amp; Management 62 (2025) 104042.
URL: https://www.sciencedirect.com/science/article/pii/S0306457324004011.
doi:10.1016/j.ipm.2024.104042.
[13] J. de Berardinis, V. A. Carriero, N. Jain, N. Lazzari, A. Meroño-Peñuela, A. Poltronieri,
V. Presutti, The polifonia ontology network: Building a semantic backbone for musical
heritage, in: International Semantic Web Conference, Springer, 2023, pp. 302–322.
[14] A. S. Lippolis, G. Lodi, A. G. Nuzzolese, The water health open knowledge graph, Scientific
Data 12 (2025) 274.
[15] A. Algergawy, H. Hamed, S. Thiel, B. König-Ries, Towards semantic annotation for scientific
datasets, in: European Semantic Web Conference, Springer, 2024, pp. 164–167.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>V.</given-names>
            <surname>Presutti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Daga</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gangemi</surname>
          </string-name>
          , E. Blomqvist,
          <article-title>eXtreme design with content ontology design patterns</article-title>
          ,
          <source>in: Proc. Workshop on Ontology Patterns, CEUR-WS</source>
          ,
          <year>2009</year>
          , pp.
          <fpage>83</fpage>
          -
          <lpage>97</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>M. J.</given-names>
            <surname>Saeedizade</surname>
          </string-name>
          , E. Blomqvist,
          <article-title>Navigating ontology development with large language models</article-title>
          ,
          <source>in: European Semantic Web Conference</source>
          , Springer,
          <year>2024</year>
          , pp.
          <fpage>143</fpage>
          -
          <lpage>161</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>A. S.</given-names>
            <surname>Lippolis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ceriani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Zuppiroli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. G.</given-names>
            <surname>Nuzzolese</surname>
          </string-name>
          , Ontogenia:
          <article-title>Ontology generation with metacognitive prompting in large language models</article-title>
          ,
          <source>in: European Semantic Web Conference</source>
          , Springer,
          <year>2024</year>
          , pp.
          <fpage>259</fpage>
          -
          <lpage>265</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>A. S.</given-names>
            <surname>Lippolis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. J.</given-names>
            <surname>Saeedizade</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Keskisärkkä</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Zuppiroli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ceriani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gangemi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Blomqvist</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. G.</given-names>
            <surname>Nuzzolese</surname>
          </string-name>
          ,
          <article-title>Ontology generation using large language models</article-title>
          ,
          <source>arXiv preprint arXiv:2503.05388</source>
          (
          <year>2025</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>C.</given-names>
            <surname>Benson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Sculley</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Liebers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Beverley</surname>
          </string-name>
          ,
          <article-title>My ontologist: Evaluating bfo-based ai for definition support</article-title>
          ,
          <source>arXiv preprint arXiv:2407.17657</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>J.</given-names>
            <surname>Potoniec</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Wiśniewski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ławrynowicz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. M.</given-names>
            <surname>Keet</surname>
          </string-name>
          ,
          <article-title>Dataset of ontology competency questions to SPARQL-OWL queries translations</article-title>
          ,
          <source>Data in Brief</source>
          <volume>29</volume>
          (
          <year>2020</year>
          )
          <fpage>105098</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>D.</given-names>
            <surname>Doumanas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Soularidis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Spiliotopoulos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Vassilakis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Kotis</surname>
          </string-name>
          ,
          <article-title>Fine-tuning large language models for ontology engineering: A comparative analysis of GPT-4 and Mistral</article-title>
          ,
          <source>Applied Sciences</source>
          <volume>15</volume>
          (
          <year>2025</year>
          )
          <fpage>2146</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>N.</given-names>
            <surname>Fathallah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Staab</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Algergawy</surname>
          </string-name>
          ,
          <article-title>LLMs4Life: Large language models for ontology learning in life sciences</article-title>
          ,
          <source>arXiv preprint arXiv:2412.02035</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Karabulut</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Degeler</surname>
          </string-name>
          ,
          <article-title>Large language model for ontology learning in drinking water distribution network domain</article-title>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Shen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Ristoski</surname>
          </string-name>
          ,
          <article-title>Knowledge graph-enhanced retrieval augmented generation for e-commerce</article-title>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>R.</given-names>
            <surname>Alharbi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>U.</given-names>
            <surname>Ahmed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Dobriy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Łajewska</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Menotti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. J.</given-names>
            <surname>Saeedizade</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Dumontier</surname>
          </string-name>
          ,
          <article-title>Exploring the role of generative AI in constructing knowledge graphs for drug indications</article-title>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>