<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Special Session on Harmonising Generative AI and Semantic Web Technologies, November</journal-title>
      </journal-title-group>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>LLMs for Ontology Engineering: A Landscape of Tasks and Benchmarking Challenges</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Daniel Garijo</string-name>
          <email>daniel.garijo@upm.es</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>María Poveda-Villalón</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Elvira Amador-Domínguez</string-name>
          <email>elvira.amador@upm.es</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>ZiYuan Wang</string-name>
          <email>ziyuan.wang@upm.es</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Raúl García-Castro</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Oscar Corcho</string-name>
          <email>oscar.corcho@upm.es</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <kwd-group>
          <kwd>Large Language Models</kwd>
          <kwd>Ontology Engineering</kwd>
          <kwd>Benchmark</kwd>
          <kwd>Challenges</kwd>
        </kwd-group>
        <aff id="aff1">
          <label>1</label>
          <institution>Universidad Politécnica de Madrid</institution>
          ,
          <addr-line>Madrid</addr-line>
          ,
          <country country="ES">Spain</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2024</year>
      </pub-date>
      <volume>13</volume>
      <issue>2024</issue>
      <abstract>
        <p>Large Language Models (LLMs) have emerged as a powerful technology for text generation tasks, showing promise in supporting the Ontology Engineering (OE) process. In this paper, we review current research on applying LLMs to OE tasks, aiming to identify commonalities and gaps in the state of the art. We categorize these efforts using the Linked Open Terms (LOT) methodology, characterizing them by their input and expected output. From this analysis, we highlight key challenges when creating benchmarks to evaluate LLM performance in OE tasks.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Ontologies are a key component of Knowledge Engineering for integrating, validating and reasoning
with data in Knowledge Graphs [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. However, developing ontologies is a challenging and time-consuming
task. According to existing methodologies for ontology development [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], Knowledge Engineers should
follow an iterative process to 1) distill the knowledge of the target domain by interviewing experts
and understanding their data-driven requirements, 2) implement a shared conceptualization by assessing
existing standard ontologies described in the domain and validating it against the requirements, 3) make
the ontology available on the web in both a human- and machine-readable manner, and 4) assess and
maintain the ontology by addressing any new requirements that may arise from its use. While different
tools have been developed by the scientific community to assist in the Ontology Engineering process
(e.g., for formalizing tests to assess requirements [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], creating human-readable documentation [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ],
ontology assessment [
        <xref ref-type="bibr" rid="ref5 ref6">5, 6</xref>
        ], etc.), a significant manual effort is still required from knowledge engineers
to conceptualize, reuse and validate existing ontologies.
      </p>
      <p>
        In recent years, Large Language Models (LLMs) [
        <xref ref-type="bibr" rid="ref7 ref8 ref9">7, 8, 9</xref>
        ] have emerged as a disruptive AI technology
for text generation tasks. On the one hand, LLMs have revolutionized the state of the art by providing
impressive results in challenging AI tasks such as code generation [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], question answering [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] or
text summarization [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], and are easy to adapt as chatbots such as ChatGPT.1 On the other hand,
LLMs have limited reasoning skills [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], hallucination problems (i.e., producing inaccurate answers and
information) [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], lack transparency when providing results [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] and present bias problems [15].
      </p>
      <p>
        A number of works have started using LLMs to aid developers in ontology engineering tasks
(e.g., proposing competency questions [16], learning ontologies from text [17], aligning concepts to
existing taxonomies [18], etc.). However, the tasks addressed in existing works are usually defined in
a heterogeneous manner, with different scopes, inputs and expected outputs. In this paper we provide
an overview of existing Ontology Engineering tasks addressed in the state of the art and map them
against the different phases described in the Linked Open Terms (LOT) methodology [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. In addition,
we characterize each task by its expected input and output. Our work helps characterize
existing tasks, describes existing gaps and discusses the main challenges in creating reference
benchmarks for evaluation.
      </p>
      <p>[Figure 1: The core phases of the LOT methodology (ontology requirements specification, ontology implementation, ontology publication and ontology maintenance), the main activities in each phase, and the actors involved (users, domain experts and ontology developers). Boxes entitled “...” group activities not considered in this analysis.]</p>
    </sec>
    <sec id="sec-2">
      <title>2. Mapping LLM for OE Tasks to the LOT methodology</title>
      <p>Ontology development projects may involve a substantial number of activities. For example, the NeOn
methodology identifies 10 processes and 49 activities involved in ontology engineering [19]. Some
activities are carried out in almost every ontology development project, for example ontology evaluation,
while others are needed only in some specific cases, for example ontology customization. Ontology
development methodologies, like Linked Open Terms (LOT), NeOn, METHONTOLOGY [20], DILIGENT
[21] and SAMOD [22], among others, orchestrate the execution of ontology engineering activities in
a guided way that usually involves requirements specification (written as Competency Questions or
affirmative statements), implementation and evaluation as core steps. We take the LOT workflow
as the basis for our analysis, as it considers the core steps from traditional methodologies and extends
them with additional steps for ontology publication and maintenance. Figure 1 depicts the core phases
included in LOT (requirements specification, implementation, publication and maintenance) and the main
activities included in each phase. Boxes entitled “...” in Figure 1 group activities from the methodology
that are not considered for this analysis due to the nature of the activity, e.g., proposing a candidate
release to be published, or describing the purpose of the ontology.</p>
      <p>In order to categorize existing works using LLMs for OE tasks, we reviewed each
approach and mapped it to the activities from the LOT methodology. During this process, we considered which input
is given to the LLM (or system described in each work) and which output each approach expects,
so as to select the corresponding LOT activity and characterize each task. For example, if the input is a
set of competency questions and the output is OWL code, the task is mapped to “Ontology
encoding”, while if the output is a diagram or other conceptualization formalism, it is mapped
to “Ontology conceptualization”. Table 1 shows the results of our mapping, outlining the input and
output of each approach.</p>
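      <p>The characterization described above can be sketched as a small lookup over each approach's expected output. This is an illustrative sketch only: the data class, field names and output labels below are our own, not an artifact of the reviewed works.</p>
      <preformat>
```python
# Illustrative sketch: characterize a reviewed approach by the input given
# to the LLM and the output expected from it, then derive the LOT activity,
# mirroring the mapping rule described in the text.
from dataclasses import dataclass

@dataclass
class Approach:
    name: str        # e.g. "a hypothetical approach"
    llm_input: str   # e.g. "CQs", "Text", "Ontology file"
    llm_output: str  # e.g. "OWL code", "Diagram", "CQs"

def lot_activity(approach: Approach) -> str:
    """Select a LOT activity from the expected output, as in Section 2."""
    if approach.llm_output == "OWL code":
        return "Ontology encoding"
    if approach.llm_output in ("Diagram", "Conceptual model"):
        return "Ontology conceptualization"
    if approach.llm_output == "CQs":
        return "CQ writing"
    return "Unmapped"

example = Approach("a hypothetical approach", "CQs", "OWL code")
print(lot_activity(example))  # prints "Ontology encoding"
```
      </preformat>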
      <p>As described in Table 1, most of the existing efforts on applying LLMs to OE focus on the initial
stages: requirements specification and implementation. Regarding requirements specification, existing
approaches focus mostly on generating Competency Questions from text, like Kommineni et al. [26]
and Antia and Keet [28]. Other works, e.g., Alharbi et al. [23], Rebboud et al. [24] and Ciroku et al. [25],
analyzed the opposite activity, that is, extracting CQs from existing ontologies, which may be applied
during a reverse ontological engineering process. It can also be observed that Saeedizade and Blomqvist
[36] use LLMs to formalize ontological requirements as SPARQL queries.</p>
      <p>[Table 1: Mapping of the reviewed approaches to LOT activities, characterized by the input given to the LLM and the expected output. LOT tasks: purpose, scope and non-functional requirement writing; functional requirements writing; CQ writing; requirement improvement; requirement formalization; reuse; conceptualization; encoding; evaluation; documentation; and online publication. Resources: Alharbi et al. [23], Rebboud et al. [24], Ciroku et al. [25], Kommineni et al. [26], Zhang et al. [27], Antia and Keet [28], Tufek et al. [29], Lopes et al. [18], Hertling and Paulheim [30], Babaei Giglou et al. [31], Toro et al. [32], Mateiu and Groza [33], Köhler and Neuhaus [34], Doumanas et al. [35], Saeedizade and Blomqvist [36], Caufield et al. [37] and Amini et al. [38]. Inputs include ontology files, text, user stories, CQs, CLaRO templates, terms with informal definitions and domain entities, source texts and terminologies, partially completed ontology terms, schemas and ORSDs. Outputs include CQs, SPARQL queries, corresponding classes in a top-level ontology, ontology mappings, taxonomies with relationships and axioms, JSON/YAML objects with logical definitions and relationships, ontology files and term definitions.]</p>
      <p>Regarding the activities involved in the ontology implementation phase, we observe that most works
aim to facilitate the ontology encoding activity, using LLMs to generate OWL ontology files from text
or CQs (e.g., Mateiu and Groza [33], Doumanas et al. [35], Saeedizade and Blomqvist [36], Kommineni
et al. [26], Caufield et al. [37], Amini et al. [38] and Köhler and Neuhaus [34]). Other works, such as
Babaei Giglou et al. [31], focus on a previous step by generating conceptualizations from texts and
terminologies. In addition, some approaches, like Lopes et al. [18] and Hertling and Paulheim [30], have
explored the use of LLMs for assisting the ontology matching activity, which may also be useful for
ontology reuse (i.e., helping identify candidate terms in existing ontologies). Finally, one work was
identified that aids ontology documentation: Toro et al. [32] present an LLM-powered ontology completion
approach that contributes to the ontology conceptualization task, since it extracts relations and logical
definitions for a given term, but it can also be categorized within ontology documentation, since it
provides a definition for the target input term.</p>
    </sec>
    <sec id="sec-3">
      <title>3. LLMs for Ontology Engineering tasks: Gaps and challenges</title>
      <p>Table 1 illustrates some of the main gaps in the state of the art. Within the requirements specification
phase, no approaches deal with writing non-functional requirements, an often neglected
task when building ontologies. Another important gap concerns requirement improvement, where a new
task may be derived from CQ writing in order to use LLMs to enhance a current set of requirements,
given an initial set of CQs.</p>
      <p>
        Next, the implementation phase has received most of the attention so far, especially the
conceptualization and encoding of ontologies. However, there are two notable gaps. First, for the reuse activity no
approaches have been presented that support ontology search, selection or adaptation by using LLMs.
Second, no approaches cover ontology evaluation so far. While [29] generates SPARQL queries from
CQs in order to validate an ontology against its instances, no approaches address the assessment of
the ontology itself. New tasks may be proposed that feed the reports from ontology evaluation
tools like OOPS! [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] or FOOPS! [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] to an LLM in order to translate their suggestions into changes in the
ontology.
      </p>
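      <p>Such a report-to-repair task could be realized as prompt construction. The sketch below is hypothetical: the (code, description) report format is an assumption made for this example, not the actual OOPS! or FOOPS! report schema.</p>
      <preformat>
```python
# Hypothetical sketch of the proposed task: turn an ontology evaluation
# report into an instruction asking an LLM for a revised ontology.
# The report structure shown here is invented for illustration.
def repair_prompt(ontology_ttl, pitfalls):
    lines = ["The following ontology was flagged with these pitfalls:"]
    for code, description in pitfalls:
        lines.append(f"- {code}: {description}")
    lines.append("Propose a revised ontology in Turtle that fixes each pitfall.")
    lines.append("Ontology:")
    lines.append(ontology_ttl)
    return "\n".join(lines)

report = [("P08", "Missing annotations: a term lacks a human-readable label")]
print(repair_prompt(":Person a owl:Class .", report))
```
      </preformat>
      <p>The resulting string would then be sent to an LLM together with instructions to return only the revised ontology, whose changes could in turn be re-checked with the same evaluation tool.</p>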
      <p>Finally, the publication and maintenance phases are barely addressed by the reviewed works, despite
the potential of LLMs to contribute to tasks like generating ontology examples, proposing definitions and
labels in multiple languages, or suggesting changes in an existing OWL file based on a new requirement.</p>
      <p>Given the interest of the community in automatically assisting OE activities with LLMs, it is becoming
increasingly important to promote a common evaluation framework and benchmarks for the different
OE tasks. Works such as Hertling and Paulheim [30] and Alharbi et al. [23] have started moving in this
direction. We have identified three main challenges from the conducted review:</p>
      <p>Challenge 1: Homogenizing OE task definitions. As shown in Table 1, different approaches
tackle the same LOT tasks with different aims. For example, in CQ writing some approaches take plain
text [23], while others take text and external templates [24]. Others reverse engineer the questions from
an ontology file [23]. Therefore, the first challenge is to identify precisely each OE task, specifying the
expected input and output for the LLM.</p>
      <p>Challenge 2: Establishing common evaluation methods and metrics for OE tasks. Many
tasks from Table 1 have a clear input and output, but may be challenging to evaluate. For example,
for the ontology conceptualization and encoding tasks, the ontology models generated by LLMs
may be compared against multiple valid representations instead of a single reference ontology. In addition,
different similarity metrics may be used to consider similar entities in the ontology graph (e.g., if the
LLM proposes object properties that are synonyms of a reference property) or the completeness of
the proposed model (correct number of classes, object properties and data properties). Similarly to the BLEU
score [39] established in other areas like Machine Translation, new metrics may have to be developed
to establish a fair result evaluation for OE tasks.</p>
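      <p>To make this concrete, the following minimal sketch scores a generated ontology against a reference under our own strong simplifications: ontologies are reduced to sets of term labels, and a hand-made synonym table stands in for a real lexical-similarity measure.</p>
      <preformat>
```python
# Illustrative sketch of a completeness-style metric: precision, recall
# and F1 over ontology terms, with a synonym table so that, e.g., a
# generated property that is a synonym of a reference property counts
# as a match.
def normalize(term, synonyms):
    return synonyms.get(term, term)

def f1(generated, reference, synonyms):
    gen = {normalize(t, synonyms) for t in generated}
    ref = {normalize(t, synonyms) for t in reference}
    tp = len(gen.intersection(ref))   # terms recovered by the LLM
    if tp == 0:
        return 0.0
    precision = tp / len(gen)
    recall = tp / len(ref)
    return 2 * precision * recall / (precision + recall)

syn = {"hasAuthor": "hasCreator"}     # treat synonym properties as matches
gen = {"Person", "Document", "hasAuthor"}
ref = {"Person", "Document", "hasCreator"}
print(f1(gen, ref, syn))  # prints 1.0: all reference terms recovered
```
      </preformat>
      <p>A realistic metric would, of course, replace the synonym table with embedding- or lexicon-based similarity and compare graph structure, not just term sets; the sketch only illustrates why a single reference ontology and exact matching are insufficient.</p>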
      <p>Challenge 3: Establishing OE task-specific curated benchmarks. In order to ensure a fair
evaluation of LLM results in different tasks, different benchmarks must be defined and adapted to the
different inputs and outputs of the respective OE tasks. Some approaches have started working in this
direction [23], e.g., by reusing existing CQ benchmarks like CORAL [40], which collects hundreds of CQs
across 14 ontologies. However, the quality of these CQs is heterogeneous (i.e., some CQs may be
ambiguous, or not provide enough context), as they belong to different initiatives with no common set of
practices or guidelines for their creation. In addition, existing resources may be subject to contamination,
i.e., resources are ingested as pre-training or fine-tuning data by LLMs. New task-specific benchmarks
should define common guidelines for curators, and ensure that a portion of the benchmark remains
concealed from web crawlers (but available for full evaluations on demand).</p>
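      <p>One way to realize the concealed-portion idea is a deterministic split of the benchmark, so that curators and evaluators always agree on which items stay private. The sketch below uses assumptions of our own (the hashing rule and the roughly 20% ratio are illustrative choices):</p>
      <preformat>
```python
# Illustrative sketch: deterministically split benchmark items (e.g. CQs)
# into a public part and a concealed part released only for on-demand
# evaluations. The first digest byte decides the bucket, so the split is
# stable across runs and machines.
import hashlib

def split_benchmark(items, hidden_buckets=51):
    """Put roughly hidden_buckets/256 of the items in the hidden split."""
    public, hidden = [], []
    for item in items:
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        if digest[0] in range(hidden_buckets):  # about 20% for 51/256
            hidden.append(item)
        else:
            public.append(item)
    return public, hidden

cqs = [f"CQ {i}: What is the domain of property p{i}?" for i in range(100)]
public, hidden = split_benchmark(cqs)
print(len(public) + len(hidden))  # prints 100
```
      </preformat>
      <p>Because the split depends only on the item text, the hidden portion never needs to be published for the partition to be reproducible, which keeps it out of crawled pre-training corpora while remaining usable for full evaluations.</p>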
    </sec>
    <sec id="sec-4">
      <title>4. Conclusions and Future work</title>
      <p>In this paper we provide an overview of existing efforts using LLMs for Ontology Engineering tasks,
grouping them according to the different phases described in the Linked Open Terms methodology
and characterizing them in terms of the input and output each work proposes. Our analysis highlights
unexplored areas where LLMs may be used to aid OE tasks (e.g., evaluation, documentation, maintenance),
describing the main research challenges to be addressed in order to create reference benchmarks for each
of these tasks. We believe that creating high-quality benchmarks will provide a common framework for
automated evaluation, promoting the use of LLMs for OE tasks and reducing the human effort required in
their evaluation. Extending the analysis started in this paper with methodologies and activities
for building Knowledge Graphs (or other types of ontology exploitation scenarios) will likely be highly
valuable for assessing automated Knowledge Graph construction processes.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Acknowledgments</title>
      <p>This work was supported by the grant “SOEL: Supporting Ontology Engineering with Large Language
Models” (PID2023-152703NA-I00), funded by MCIN/AEI/10.13039/501100011033 and by “ERDF/UE”.</p>
      <p>J. Launay, The refinedweb dataset for falcon llm: outperforming curated corpora with web data, and web data only, arXiv preprint arXiv:2306.01116 (2023).
[15] I. O. Gallegos, R. A. Rossi, J. Barrow, M. M. Tanjim, S. Kim, F. Dernoncourt, T. Yu, R. Zhang, N. K. Ahmed, Bias and fairness in large language models: A survey, Computational Linguistics (2024) 1–79.
[16] R. Alharbi, V. Tamma, F. Grasso, T. Payne, An experiment in retrofitting competency questions for existing ontologies, in: Proceedings of the 39th ACM/SIGAPP Symposium on Applied Computing, SAC ’24, Association for Computing Machinery, New York, NY, USA, 2024, pp. 1650–1658. URL: https://doi.org/10.1145/3605098.3636053. doi:10.1145/3605098.3636053.
[17] P. Mateiu, A. Groza, Ontology engineering with large language models, in: 2023 25th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing (SYNASC), IEEE, 2023, pp. 226–229.
[18] A. Lopes, J. Carbonera, D. Schmidt, L. Garcia, F. Rodrigues, M. Abel, Using terms and informal definitions to classify domain entities into top-level ontology concepts: An approach based on language models, Knowledge-Based Systems 265 (2023) 110385. URL: https://www.sciencedirect.com/science/article/pii/S0950705123001351. doi:10.1016/j.knosys.2023.110385.
[19] M. C. Suárez-Figueroa, A. Gómez-Pérez, M. Fernández-López, The NeOn Methodology framework: A scenario-based methodology for ontology development, Applied Ontology 10 (2015) 107–145.
[20] M. Fernández-López, A. Gómez-Pérez, N. Juristo, METHONTOLOGY: from ontological art towards ontological engineering, in: Proceedings of the Ontological Engineering AAAI-97 Spring Symposium Series, American Association for Artificial Intelligence, 1997.
[21] H. Pinto, C. Tempich, S. Staab, Ontology engineering and evolution in a distributed world using DILIGENT, in: S. Staab, R. Studer (Eds.), Handbook on Ontologies, International Handbooks on Information Systems, Springer Berlin Heidelberg, 2009, pp. 153–176.
[22] S. Peroni, A simplified agile methodology for ontology development, in: OWL: Experiences and Directions – Reasoner Evaluation: 13th International Workshop, OWLED 2016, and 5th International Workshop, ORE 2016, Bologna, Italy, November 20, 2016, Revised Selected Papers, Springer, 2017, pp. 55–69.
[23] R. Alharbi, V. Tamma, F. Grasso, T. Payne, An experiment in retrofitting competency questions for existing ontologies, in: Proceedings of the 39th ACM/SIGAPP Symposium on Applied Computing, SAC ’24, Association for Computing Machinery, New York, NY, USA, 2024, pp. 1650–1658. URL: https://doi.org/10.1145/3605098.3636053. doi:10.1145/3605098.3636053.
[24] Y. Rebboud, L. Tailhardat, P. Lisena, R. Troncy, Can LLMs generate competency questions?, in: ESWC 2024, Extended Semantic Web Conference, Special Track on Large Language Models for Knowledge Engineering, Hersonissos, Greece, 26–30 May 2024.
[25] F. Ciroku, J. de Berardinis, J. Kim, A. Meroño-Peñuela, V. Presutti, E. Simperl, RevOnt: Reverse engineering of competency questions from knowledge graphs via language models, Journal of Web Semantics 82 (2024) 100822. URL: https://www.sciencedirect.com/science/article/pii/S1570826824000088. doi:10.1016/j.websem.2024.100822.
[26] V. K. Kommineni, B. König-Ries, S. Samuel, From human experts to machines: An LLM supported approach to ontology and knowledge graph construction, 2024. URL: https://arxiv.org/abs/2403.08345. arXiv:2403.08345.
[27] B. Zhang, V. A. Carriero, K. Schreiberhuber, S. Tsaneva, L. S. González, J. Kim, J. de Berardinis, OntoChat: a framework for conversational ontology engineering using language models, 2024. URL: https://arxiv.org/abs/2403.05921. arXiv:2403.05921.
[28] M.-J. Antia, C. M. Keet, Automating the generation of competency questions for ontologies with AgOCQs, in: F. Ortiz-Rodriguez, B. Villazón-Terrazas, S. Tiwari, C. Bobed (Eds.), Knowledge Graphs and Semantic Web, Springer Nature Switzerland, Cham, 2023, pp. 213–227.
[29] N. Tufek, A. Saissre, A. Hanbury, Validating semantic artifacts with large language models, in: Proceedings of the 21st European Semantic Web Conference (ESWC), Crete, Greece, 2024, pp. 24–30.
[30] S. Hertling, H. Paulheim, OLaLa: Ontology matching with large language models, in: Proceedings of the 12th Knowledge Capture Conference, K-CAP ’23, Association for Computing Machinery, New York, NY, USA, 2023, pp. 131–139. URL: https://doi.org/10.1145/3587259.3627571. doi:10.1145/3587259.3627571.
[31] H. Babaei Giglou, J. D’Souza, S. Auer, LLMs4OL: Large language models for ontology learning, in: T. R. Payne, V. Presutti, G. Qi, M. Poveda-Villalón, G. Stoilos, L. Hollink, Z. Kaoudi, G. Cheng, J. Li (Eds.), The Semantic Web – ISWC 2023, Springer Nature Switzerland, Cham, 2023, pp. 408–427.
[32] S. Toro, A. V. Anagnostopoulos, S. M. Bello, K. Blumberg, R. Cameron, L. Carmody, A. D. Diehl, D. M. Dooley, W. D. Duncan, P. Fey, P. Gaudet, N. L. Harris, M. P. Joachimiak, L. Kiani, T. Lubiana, M. C. Munoz-Torres, S. O’Neil, D. Osumi-Sutherland, A. Puig-Barbe, J. T. Reese, L. Reiser, S. M. Robb, T. Ruemping, J. Seager, E. Sid, R. Stefancsik, M. Weber, V. Wood, M. A. Haendel, C. J. Mungall, Dynamic retrieval augmented generation of ontologies using artificial intelligence (DRAGON-AI), Journal of Biomedical Semantics 15 (2024) 19. URL: https://doi.org/10.1186/s13326-024-00320-3. doi:10.1186/s13326-024-00320-3.
[33] P. Mateiu, A. Groza, Ontology engineering with Large Language Models, in: 2023 25th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing (SYNASC), IEEE Computer Society, Los Alamitos, CA, USA, 2023, pp. 226–229. URL: https://doi.ieeecomputersociety.org/10.1109/SYNASC61333.2023.00038. doi:10.1109/SYNASC61333.2023.00038.
[34] N. Köhler, F. Neuhaus, The Mercurial Top-Level Ontology of Large Language Models, 2024. URL: https://arxiv.org/abs/2405.01581. arXiv:2405.01581.
[35] D. Doumanas, A. Soularidis, K. Kotis, G. Vouros, Integrating LLMs in the engineering of a SAR ontology, in: I. Maglogiannis, L. Iliadis, J. Macintyre, M. Avlonitis, A. Papaleonidas (Eds.), Artificial Intelligence Applications and Innovations, Springer Nature Switzerland, Cham, 2024, pp. 360–374.
[36] M. J. Saeedizade, E. Blomqvist, Navigating ontology development with large language models, in: A. Meroño Peñuela, A. Dimou, R. Troncy, O. Hartig, M. Acosta, M. Alam, H. Paulheim, P. Lisena (Eds.), The Semantic Web, Springer Nature Switzerland, Cham, 2024, pp. 143–161.
[37] J. H. Caufield, H. Hegde, V. Emonet, N. L. Harris, M. P. Joachimiak, N. Matentzoglu, H. Kim, S. A. T. Moxon, J. T. Reese, M. A. Haendel, P. N. Robinson, C. J. Mungall, Structured prompt interrogation and recursive extraction of semantics (SPIRES): A method for populating knowledge bases using zero-shot learning, 2023. URL: https://arxiv.org/abs/2304.02711. arXiv:2304.02711.
[38] R. Amini, S. S. Norouzi, P. Hitzler, R. Amini, Towards Complex Ontology Alignment using Large Language Models, 2024. URL: https://arxiv.org/abs/2404.10329. arXiv:2404.10329.
[39] K. Papineni, S. Roukos, T. Ward, W.-J. Zhu, BLEU: a method for automatic evaluation of machine translation, in: Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, ACL ’02, Association for Computational Linguistics, USA, 2002, pp. 311–318. URL: https://doi.org/10.3115/1073083.1073135. doi:10.3115/1073083.1073135.
[40] A. Fernández-Izquierdo, M. Poveda-Villalón, R. García-Castro, CORAL: a corpus of ontological requirements annotated with lexico-syntactic patterns, in: The Semantic Web: 16th International Conference, ESWC 2019, Portorož, Slovenia, June 2–6, 2019, Proceedings, Springer, 2019, pp. 443–458.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>[1] A. Hogan, E. Blomqvist, M. Cochez, C. d'Amato, G. de Melo, C. Gutierrez, S. Kirrane, J. E. L. Gayo, R. Navigli, S. Neumaier, et al., Knowledge graphs, ACM Computing Surveys (CSUR) 54 (<year>2021</year>) <fpage>1</fpage>–<lpage>37</lpage>.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>[2] M. Poveda-Villalón, A. Fernández-Izquierdo, M. Fernández-López, R. García-Castro, <article-title>LOT: An industrial oriented ontology engineering framework</article-title>, <source>Engineering Applications of Artificial Intelligence</source> <volume>111</volume> (<year>2022</year>) 104755. doi:10.1016/j.engappai.2022.104755.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>[3] A. Fernández-Izquierdo, R. García-Castro, <article-title>Themis: a tool for validating ontologies through requirements</article-title>, in: <source>International Conference on Software Engineering and Knowledge Engineering</source>, <year>2019</year>. URL: https://api.semanticscholar.org/CorpusID:199571789.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>[4] D. Garijo, <article-title>WIDOCO: a wizard for documenting ontologies</article-title>, in: International Semantic Web Conference, Springer, Cham, <year>2017</year>, pp. <fpage>94</fpage>–<lpage>102</lpage>. URL: http://dgarijo.com/papers/widoco-iswc2017.pdf. doi:10.1007/978-3-319-68204-4_9.</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>M.</given-names>
            <surname>Poveda-Villalón</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gómez-Pérez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. C.</given-names>
            <surname>Suárez-Figueroa</surname>
          </string-name>
          ,
          <article-title>OOPS! (OntOlogy Pitfall Scanner!): An on-line tool for ontology evaluation</article-title>
          ,
          <source>International Journal on Semantic Web and Information Systems (IJSWIS)</source>
          <volume>10</volume>
          (
          <year>2014</year>
          )
          <fpage>7</fpage>
          -
          <lpage>34</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>D.</given-names>
            <surname>Garijo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Corcho</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Poveda-Villalón</surname>
          </string-name>
          ,
          <article-title>FOOPS!: An Ontology Pitfall Scanner for the FAIR Principles</article-title>
          ,
          <volume>2980</volume>
          (
          <year>2021</year>
          ). URL: http://ceur-ws.org/Vol-2980/paper321.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>C.</given-names>
            <surname>Raffel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Shazeer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Roberts</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Narang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Matena</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. J.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <article-title>Exploring the limits of transfer learning with a unified text-to-text transformer</article-title>
          ,
          <source>Journal of Machine Learning Research</source>
          <volume>21</volume>
          (
          <year>2020</year>
          )
          <fpage>1</fpage>
          -
          <lpage>67</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>A.</given-names>
            <surname>Chowdhery</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Narang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Devlin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bosma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Mishra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Roberts</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Barham</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. W.</given-names>
            <surname>Chung</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Sutton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gehrmann</surname>
          </string-name>
          , et al.,
          <article-title>Palm: Scaling language modeling with pathways</article-title>
          ,
          <source>Journal of Machine Learning Research</source>
          <volume>24</volume>
          (
          <year>2023</year>
          )
          <fpage>1</fpage>
          -
          <lpage>113</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>H.</given-names>
            <surname>Touvron</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Lavril</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Izacard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Martinet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-A.</given-names>
            <surname>Lachaux</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Lacroix</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Rozière</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Goyal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Hambro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Azhar</surname>
          </string-name>
          , et al.,
          <article-title>LLaMA: Open and efficient foundation language models</article-title>
          ,
          <source>arXiv preprint arXiv:2302.13971</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>B.</given-names>
            <surname>Roziere</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gehring</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Gloeckle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sootla</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Gat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X. E.</given-names>
            <surname>Tan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Adi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Sauvestre</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Remez</surname>
          </string-name>
          , et al.,
          <article-title>Code llama: Open foundation models for code</article-title>
          ,
          <source>arXiv preprint arXiv:2308.12950</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name><given-names>T. B.</given-names> <surname>Brown</surname></string-name>,
          <string-name><given-names>B.</given-names> <surname>Mann</surname></string-name>,
          <string-name><given-names>N.</given-names> <surname>Ryder</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Subbiah</surname></string-name>,
          <string-name><given-names>J.</given-names> <surname>Kaplan</surname></string-name>,
          <string-name><given-names>P.</given-names> <surname>Dhariwal</surname></string-name>,
          <string-name><given-names>A.</given-names> <surname>Neelakantan</surname></string-name>,
          <string-name><given-names>P.</given-names> <surname>Shyam</surname></string-name>,
          <string-name><given-names>G.</given-names> <surname>Sastry</surname></string-name>,
          <string-name><given-names>A.</given-names> <surname>Askell</surname></string-name>,
          <string-name><given-names>S.</given-names> <surname>Agarwal</surname></string-name>,
          <string-name><given-names>A.</given-names> <surname>Herbert-Voss</surname></string-name>,
          <string-name><given-names>G.</given-names> <surname>Krueger</surname></string-name>,
          <string-name><given-names>T.</given-names> <surname>Henighan</surname></string-name>,
          <string-name><given-names>R.</given-names> <surname>Child</surname></string-name>,
          <string-name><given-names>A.</given-names> <surname>Ramesh</surname></string-name>,
          <string-name><given-names>D. M.</given-names> <surname>Ziegler</surname></string-name>,
          <string-name><given-names>J.</given-names> <surname>Wu</surname></string-name>,
          <string-name><given-names>C.</given-names> <surname>Winter</surname></string-name>,
          <string-name><given-names>C.</given-names> <surname>Hesse</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Chen</surname></string-name>,
          <string-name><given-names>E.</given-names> <surname>Sigler</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Litwin</surname></string-name>,
          <string-name><given-names>S.</given-names> <surname>Gray</surname></string-name>,
          <string-name><given-names>B.</given-names> <surname>Chess</surname></string-name>,
          <string-name><given-names>J.</given-names> <surname>Clark</surname></string-name>,
          <string-name><given-names>C.</given-names> <surname>Berner</surname></string-name>,
          <string-name><given-names>S.</given-names> <surname>McCandlish</surname></string-name>,
          <string-name><given-names>A.</given-names> <surname>Radford</surname></string-name>,
          <string-name><given-names>I.</given-names> <surname>Sutskever</surname></string-name>,
          <string-name><given-names>D.</given-names> <surname>Amodei</surname></string-name>
          ,
          <article-title>Language models are few-shot learners</article-title>
          ,
          <source>in: Proceedings of the 34th International Conference on Neural Information Processing Systems</source>
          , NIPS '20, Curran Associates Inc., Red Hook, NY, USA,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>K.</given-names>
            <surname>Valmeekam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Olmo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sreedharan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kambhampati</surname>
          </string-name>
          ,
          <article-title>Large language models still can't plan (a benchmark for LLMs on planning and reasoning about change)</article-title>
          ,
          <source>in: NeurIPS 2022 Foundation Models for Decision Making Workshop</source>
          ,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>V.</given-names>
            <surname>Rawte</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Sheth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Das</surname>
          </string-name>
          ,
          <article-title>A Survey of Hallucination in Large Foundation Models</article-title>
          ,
          <source>arXiv preprint arXiv:2309.05922</source>
          (
          <year>2023</year>
          ). URL: https://arxiv.org/abs/2309.05922.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>G.</given-names>
            <surname>Penedo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Malartic</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Hesslow</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Cojocaru</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Cappelli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Alobeidli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Pannier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Almazrouei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Launay</surname>
          </string-name>
          ,
          <article-title>The RefinedWeb dataset for Falcon LLM: Outperforming curated corpora with web data, and web data only</article-title>
          ,
          <source>arXiv preprint arXiv:2306.01116</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>