<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Enhancing Small Open-Source Language Models for Ontology Generation through Metric-Guided Continual Pretraining</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Miquel Canal-Esteve</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Research Group of Language Processing and Information System, University of Alicante</institution>
          ,
          <country country="ES">Spain</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
        <p>Ontology development requires expert knowledge and structural precision. While Large Language Models (LLMs) show promise for ontology tasks, small open-source models like Llama 3.2-1B still lack strong semantic and structural understanding. We propose a two-phase approach: continual pretraining on high-quality ontology datasets, guided by two frameworks: one for semantic metrics and another for lexical-structural metrics. We pretrained Llama 3.2-1B using semantic-based high-quality subsets and evaluated improvements through manual and structural analyses. Results show small, high-quality subsets yield rapid gains, while larger, diverse datasets improve long-term performance. Since semantic metrics need complete ontologies, ORI remains key for evaluating fragments. Future work will apply instruction tuning and fine-tuning for specialized tasks such as those in the LLMs4OL benchmark or for generating structured resources across domains using ontology-based methods. This work shows that thoughtful data selection and continual pretraining can push small LLMs toward expert-level ontology generation.</p>
      </abstract>
      <kwd-group>
        <kwd>Ontology generation</kwd>
        <kwd>continual pretraining</kwd>
        <kwd>semantic metrics</kwd>
        <kwd>large language models</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>2. Related Work</title>
      <p>
        Ontology-to-ontology generation remains a largely untapped area, especially when considering the
capabilities of large language models (LLMs). While most prior research focuses on applying LLMs
to tasks like ontology refinement, enrichment, or generation from unstructured text sources [
        <xref ref-type="bibr" rid="ref11 ref6 ref7">6, 7, 11</xref>
        ],
these works primarily provide the groundwork upon which this study builds.
      </p>
      <p>
        Recent research highlights the diverse roles LLMs can play in ontology engineering. For instance,
Zhao et al. [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] incorporate OntoClean principles to improve refinement; Toro et al. [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] leverage
retrieval-augmented generation (RAG) in DRAGON-AI for dynamic construction; Fathallah et al. [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]
address structured translation using NeOn-GPT; Zhang et al. [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] develop conversational approaches
with OntoChat; He et al. [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] apply deep learning for ontology completion in DeepOnto; and Mukanova
et al. [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] use LLMs for enrichment tasks. Yet, none of these approaches directly tackle the challenge of
autonomously generating new ontologies from partial or incomplete inputs.
      </p>
      <p>
        Additional research on text-to-ontology generation also informs our approach. Babaei et al. [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] break
down the generation process into subtasks and emphasize the importance of fine-tuning; Saeedizade et
al. [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] guide progressive ontology construction through competency questions; and Da Silva et al. [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]
demonstrate how few-shot prompting can enhance generation outcomes. By contrast, our work adopts
continual pretraining to help the model internalize domain-relevant patterns and knowledge prior to
task-specific fine-tuning, thus reducing dependence on prompt-driven examples.
      </p>
      <p>
        Moreover, the importance of data cleaning for improving model performance is well established
across the literature [
        <xref ref-type="bibr" rid="ref19 ref20">19, 20</xref>
        ], with several studies introducing sophisticated filtering methods during
preprocessing [
        <xref ref-type="bibr" rid="ref21 ref22 ref23">21, 22, 23</xref>
        ]. However, these methods are predominantly designed for unstructured text,
and to date, no standardized methodology exists for selecting high-quality data specifically for continual
pretraining in the context of ontologies. Our study addresses this gap by proposing a systematic
approach to identify and leverage high-quality ontological data for improving LLM performance.
      </p>
    </sec>
    <sec id="sec-2">
      <title>3. Description of the Proposed Research</title>
      <p>This research is structured in two main phases. First, we focus on improving the models’ general
semantic and ontology knowledge through continual pretraining on ontology datasets. For this, we have
developed two dedicated repositories to measure the quality of datasets: one for computing semantic
metrics 1 (covering classes, taxonomic and non-taxonomic relations) and another for computing lexical
and structural metrics 2 (focused on vocabulary usage and structural patterns). The goal is to combine
these into a global metric that can evaluate the quality of both the datasets and the outputs generated
by the model, helping to identify high-quality subsets, since it is well known that less data of higher
quality is more effective for training. Some experiments have already been carried out in this direction.</p>
      <p>
        Second, we will apply instruction-tuning and fine-tuning to adapt the continually pretrained models
for specific, high-value ontology tasks, such as those defined in the LLMs4OL [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] benchmark, including
term typing, type taxonomy discovery, and type non-taxonomic relation extraction. Together, these
efforts aim to systematically enhance small, open-source LLMs’ ability to handle advanced ontology
engineering challenges and structured knowledge tasks more effectively than base models. These tasks
can help automate the creation of didactic material based on ontologies.
      </p>
    </sec>
    <sec id="sec-3">
      <title>4. Methodology</title>
      <p>This section details the methodological framework designed to improve small open-source language
models for ontology generation. We combine curated ontology repositories, carefully designed semantic
and lexical-structural metrics, and a continual pretraining strategy. By integrating dataset selection,
metric-driven evaluation, and robust training configurations, our approach systematically enhances
the model’s semantic, lexical, and structural capabilities. Below, we describe the ontology sources, the
metric systems, the segmentation strategies, the evaluation framework, and the pretraining setup.</p>
      <p>1: https://github.com/miquelcanalesteve/LLM4Onto 2: https://github.com/miquelcanalesteve/ontology-metrics-pretraining</p>
      <sec id="sec-3-1">
        <sec id="sec-3-1-1">
          <title>4.1. Ontology Repository</title>
          <p>We base our methodology on an ontology repository that provides structured knowledge for pretraining.
Such repositories include ontologies of varying size, completeness, and semantic richness, requiring
additional filtering before use.</p>
          <p>
            For this study, we selected DBpedia Archivo3, a widely used repository of ontologies across diverse
domains [
            <xref ref-type="bibr" rid="ref24">24</xref>
            ]. Its files are provided in TTL (Turtle) format, a popular and human-readable RDF serialization,
making it ideal for evaluating how quality-based dataset selection affects model performance.
          </p>
          <p>Our dataset, downloaded on July 15, 2024, includes 1,766 ontologies totaling 71 million triples, ranging
from small sets under 10 triples to large ontologies exceeding 10 million.</p>
        </sec>
        <sec id="sec-3-1-2">
          <title>4.2. Lexical and Structural Ontology Metrics</title>
          <p>This section introduces the text-based metrics designed to quantify vocabulary usage, lexical richness,
and structural variability in ontology files.</p>
          <p>
            To evaluate an ontology, we assess its raw text representation—whether it comes from an existing
dataset or is generated by a model—using lightweight text-based metrics inspired by Palomar et al. [
            <xref ref-type="bibr" rid="ref19">19</xref>
            ].
These metrics capture vocabulary use and structural diversity without requiring reasoning or formal
parsing. We then aggregate them into the Ontology Reference Index (ORI), drawing on concepts from
[
            <xref ref-type="bibr" rid="ref25">25</xref>
            ], to support data ranking and performance evaluation.
          </p>
          <p>Vocabulary-specific density (den). Average number of predefined vocabulary terms per non-empty line
(dependent on the typical one-relation-per-line format):</p>
          <p>den = (1/N) Σ_{i=1..N} v_i</p>
          <p>where N is the number of non-empty lines, and v_i is the number of vocabulary terms detected in line
i. The vocabulary is a predefined set of ontology modelling terms commonly used across structured
knowledge representations, including those from RDF, RDFS, OWL, and XSD (the full vocabulary is
available in the repository). Terms inside quoted literals are excluded.</p>
          <p>Vocabulary-specific diversity (div). Proportion of vocabulary terms that appear at least once in the file:</p>
          <p>div = |V_doc| / |V_spec|</p>
          <p>where V_doc ⊆ V_spec is the set of vocabulary terms found in the file, and V_spec is the same vocabulary
used for den. A higher value indicates broader use of available modeling constructs.</p>
          <p>Logical block uniqueness ratio (LBUR). Fraction of unique logical blocks in the ontology:</p>
          <p>LBUR = |unique_blocks| / |blocks|</p>
          <p>Logical blocks are defined as minimal self-contained RDF/OWL units, starting from a subject and
continuing until the terminating period. These typically include class declarations, property assertions,
or grouped triples.</p>
          <p>Line uniqueness ratio (LUR). Fraction of unique non-empty lines:</p>
          <p>LUR = |unique_nonempty_lines| / |nonempty_lines|</p>
          <p>This metric captures surface-level textual redundancy, regardless of line type (structural, directive, or
annotation).</p>
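As an illustration, the four text-based ratios above can be computed with a short script. The vocabulary set below is a small illustrative subset, not the full RDF/RDFS/OWL/XSD list from the repository, and the block-splitting heuristic is a simplification of the subject-to-period definition:

```python
import re

# Illustrative subset of the predefined vocabulary (the full set,
# covering RDF/RDFS/OWL/XSD terms, lives in the paper's repository).
VOCAB = {"rdf:type", "rdfs:subClassOf", "rdfs:label", "owl:Class",
         "owl:ObjectProperty", "rdfs:domain", "rdfs:range"}

def strip_literals(line: str) -> str:
    """Blank out quoted literals so terms inside them are not counted."""
    return re.sub(r'"[^"]*"', '""', line)

def lexical_metrics(ttl_text: str) -> dict:
    lines = [l for l in ttl_text.splitlines() if l.strip()]
    n = len(lines)
    # den: average vocabulary terms per non-empty line
    hits = [sum(strip_literals(l).count(t) for t in VOCAB) for l in lines]
    den = sum(hits) / n if n else 0.0
    # div: fraction of the vocabulary that appears at least once
    seen = {t for t in VOCAB if any(t in strip_literals(l) for l in lines)}
    div = len(seen) / len(VOCAB)
    # LUR: fraction of unique non-empty lines
    lur = len(set(lines)) / n if n else 0.0
    # LBUR: blocks approximated as runs ending in a terminating period
    blocks = [b.strip() for b in re.split(r'\.\s*\n', ttl_text) if b.strip()]
    lbur = len(set(blocks)) / len(blocks) if blocks else 0.0
    return {"den": den, "div": div, "LUR": lur, "LBUR": lbur}
```

In practice the full vocabulary list and the exact block delimiter rules from the repository would replace these simplified placeholders.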
        </sec>
      </sec>
      <sec id="sec-3-2">
        <title>Brunet Index (BI). Lexical richness index:</title>
        <p>BI = N^(V^(−0.165))</p>
        <p>where N is the total number of word tokens and V is the number of unique word types. Composite
terms (e.g., prefix-based identifiers) are tokenized accordingly. Lower values indicate greater lexical
diversity.</p>
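A minimal sketch of this index, under a simplified tokenization (splitting on non-alphanumeric characters, which is an assumption about how composite terms are handled):

```python
import re

def brunet_index(text: str) -> float:
    """Brunet's index W = N ** (V ** -0.165); lower means more diverse.

    Composite terms (e.g., prefix:local identifiers) are split on
    non-alphanumeric characters, a simplification of the paper's
    tokenization rule.
    """
    tokens = re.findall(r"[A-Za-z0-9]+", text)
    n_tokens = len(tokens)       # N: total word tokens
    n_types = len(set(tokens))   # V: unique word types
    if n_tokens == 0:
        return 0.0
    return n_tokens ** (n_types ** -0.165)
```

For equal token counts, a text with more repeated types scores higher (worse) than a fully diverse one.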
        <p>Ontology Reference Index (ORI) and Evaluation The Ontology Reference Index (ORI) provides a
weighted measure of an ontology’s alignment with an idealized reference, which aggregates the best
observed values for each of the five previously defined metrics. This reference does not represent any
single ontology but instead reflects the per-metric maxima identified across the dataset.</p>
        <p>The computation normalizes all metric values using min-max scaling. Because lower Brunet Index
values indicate better lexical diversity, the method inverts this metric using 1 − norm(BI), where norm(·)
denotes the min-max normalized value. The ORI score is then calculated as:</p>
        <p>ORI = Σ_{m ∈ M} w(m) · f(m), where M = {den, div, LBUR, LUR, BI}, and</p>
        <p>f(m) = norm(m) if m ≠ BI; f(m) = 1 − norm(m) if m = BI</p>
        <p>The weight w(m) assigned to each metric reflects the performance gap between the base model (Llama
3.2-1B) and the top-performing ontology for that metric. For the Brunet Index, the method computes
this ratio inversely (base / best) to maintain consistency with its inverted interpretation. The procedure
then normalises these gains to derive the final weights, which appear in Table 1.</p>
        <p>Table 1: Per-metric values for the base model and the top-ranked dataset, the resulting gains, and the normalized weights.
den div LBUR LUR BI
Llama 3.2-1B 0.500 0.035 0.955 0.738 16.11
Top-1 dataset 1.257 0.622 1 1 4.382
Gain 2.514 17.744 1.048 1.345 3.679
Weights 0.096 0.673 0.040 0.051 0.140</p>
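Putting the pieces together, the ORI can be sketched as a weighted sum of min-max normalized metrics with the Brunet Index inverted. The weights are those of Table 1; the min-max bounds would normally come from the whole dataset and are invented here for illustration:

```python
# Weights from Table 1 (normalized per-metric gains).
WEIGHTS = {"den": 0.096, "div": 0.673, "LBUR": 0.040, "LUR": 0.051,
           "BI": 0.140}

def minmax(value, lo, hi):
    """Min-max scale a value into [0, 1]."""
    if hi == lo:
        return 0.0
    return (value - lo) / (hi - lo)

def ori(metrics, bounds):
    """Weighted sum of normalized metrics; BI is inverted (1 - norm)
    because a lower Brunet Index means better lexical diversity."""
    score = 0.0
    for name, w in WEIGHTS.items():
        norm = minmax(metrics[name], *bounds[name])
        if name == "BI":
            norm = 1.0 - norm
        score += w * norm
    return score
```

An ontology that attains the best observed value on every metric scores 1.0, matching the idealized reference described above.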
        <p>To estimate base model values, we sampled 12 ontology fragments of 150 tokens each and
generated 6 completions of 450 tokens per fragment. The generation used the following configuration:
do_sample=True, top_k=50, top_p=0.95, and temperature=0.7. The trained models followed
the same ontology completion protocol.</p>
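The completion protocol can be sketched as follows. The sampling parameters are those stated above; the wrapper function and its name are illustrative, and running it requires the Hugging Face transformers library and access to the model checkpoint:

```python
# Sampling configuration stated in the paper; the helper below is an
# illustrative wrapper, not the authors' actual script.
GEN_KWARGS = dict(do_sample=True, top_k=50, top_p=0.95,
                  temperature=0.7, max_new_tokens=450)

def complete_fragment(model, tokenizer, fragment, n_completions=6):
    """Sample n_completions 450-token continuations of a 150-token
    ontology fragment, per the evaluation protocol."""
    inputs = tokenizer(fragment, return_tensors="pt")
    completions = []
    for _ in range(n_completions):
        out = model.generate(**inputs, **GEN_KWARGS)
        new_tokens = out[0][inputs["input_ids"].shape[1]:]
        completions.append(
            tokenizer.decode(new_tokens, skip_special_tokens=True))
    return completions
```

With 12 fragments and 6 completions each, this yields the 72 samples used to estimate the base-model metric values.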
        <sec id="sec-3-2-1">
          <title>4.3. Semantic Metrics</title>
          <p>
            To evaluate the structural quality of an ontology, we propose lightweight complexity-based metrics
inspired by Tello et al. [
            <xref ref-type="bibr" rid="ref26">26</xref>
            ] and Gutiérrez et al. [
            <xref ref-type="bibr" rid="ref27">27</xref>
            ]. These metrics quantify the richness and density
of the ontology without requiring reasoning or formal entailment, making them scalable for large
repositories. We then aggregate them into a unified quality score to support dataset filtering and model
evaluation.
          </p>
          <p>Average Subclasses per Class (SC). Average number of subclasses per class, reflecting the
hierarchical depth and granularity of the ontology taxonomy:</p>
          <p>SC = (Σ_{i=1..C} sub(c_i)) / C</p>
          <p>where C is the number of classes and sub(c_i) is the number of subclasses of class c_i.</p>
          <p>Average Non-Taxonomic Relations per Class (NTR). Average number of non-taxonomic
relationships per class, indicating the density of semantic links beyond simple hierarchies:</p>
          <p>NTR = (Σ_{i=1..C} not(c_i)) / C</p>
          <p>where not(c_i) is the number of non-taxonomic relationships attached to class c_i.</p>
          <p>Property Density (PD). Average number of attributes and non-taxonomic relations per class, serving
as a proxy for schema richness and information density:</p>
          <p>PD = (Σ_{i=1..C} (att(c_i) + not(c_i))) / C</p>
          <p>where att(c_i) denotes the number of data properties (attributes) of class c_i.</p>
          <p>To consolidate these aspects into a unified quality score, we normalize the three metrics using
min-max scaling and compute:</p>
          <p>score = norm(SC) + norm(NTR) + norm(PD)</p>
          <p>These metrics are computed using the rdflib Python library, providing an efficient and reproducible
basis for ontology quality analysis.</p>
          <p>4.3.1. Segmentation of Datasets</p>
          <p>While the segmentation approach described here is applied using the semantic quality score, the same
logic could be extended to the Ontology Reference Index (ORI) or other metrics, allowing future work to
explore dataset splits that prioritize lexical and structural quality alongside semantic complexity. For this
study, however, segmentation is based solely on the semantic quality score, which focuses on semantic
richness, density, and hierarchy.</p>
          <p>To segment the dataset, we first compute the token count for each ontology. This allows us to
define partitions based on token distribution, ensuring that different subsets capture varying levels of
quality and diversity (i.e., more ontologies lead to greater diversity). While segmentation can be done
in multiple ways—by quartiles, deciles, halves, or other thresholds—we adopt three specific strategies:
1. Q1 (Prioritizing Quality): Ontologies are ranked by the semantic quality score, and those with the highest scores are
selected until reaching at least 25% of the total tokens. Since selection is done without truncation,
the last ontology added may slightly exceed this threshold. In our case, this resulted in 31% of the
total tokens.
2. Q1,2 (Quality + Diversity): Ontologies are again ranked by the semantic quality score, and selection continues until
reaching at least 50% of the total tokens. This strategy balances quality and diversity while
ensuring that no ontology is arbitrarily truncated.
3. Q1-4 (Full Dataset): This set includes all available ontologies, covering the entire range of
quality levels and structural complexities. It serves as a baseline to assess the impact of training
on the full, unfiltered dataset.</p>
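The three strategies share one selection loop, sketched below. Ontologies are treated as whole-file units, so the last one added may overshoot the token budget, as noted for Q1:

```python
# Sketch of the Q1 / Q1,2 segmentation: rank by quality score, then
# take whole ontologies until a token budget (25% or 50% of the
# corpus) is reached, with no truncation.
def select_subset(ontologies, budget_fraction):
    """ontologies: list of (name, score, n_tokens) tuples.
    Returns the chosen names and the fraction of tokens they cover."""
    total = sum(t for _, _, t in ontologies)
    ranked = sorted(ontologies, key=lambda o: o[1], reverse=True)
    chosen, used = [], 0
    for name, score, n_tokens in ranked:
        if used >= budget_fraction * total:
            break
        chosen.append(name)   # last item may overshoot the budget
        used += n_tokens
    return chosen, used / total
```

With a 25% budget this can legitimately return more than 25% of the tokens (31% in the paper's Q1 split), since no ontology is cut mid-file.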
          <p>This segmentation enables a systematic assessment of how training on subsets with varying semantic
quality affects model performance. Table 3 summarizes the selected datasets, showing the average
values and standard deviations for each key quality metric.</p>
        </sec>
        <sec id="sec-3-2-2">
          <title>4.4. Manual Evaluation</title>
          <p>
            The manual evaluation framework is based on da Silva et al. [
            <xref ref-type="bibr" rid="ref18">18</xref>
            ], which categorizes errors into syntactic,
semantic, and structural issues to comprehensively assess ontology quality. Additional criteria follow
Chen et al. [
            <xref ref-type="bibr" rid="ref28">28</xref>
            ] to address ambiguity and redundancy, and Xu et al. [
            <xref ref-type="bibr" rid="ref29">29</xref>
            ] to capture text repetition.
Errors include syntactic violations (e.g., missing delimiters), triplet repetition, text repetition within
comments or literals, semantic redundancy, ambiguity between entities, semantic contradictions (e.g.,
conflicting OWL types), and vocabulary misuse involving incorrect ontology terms. A complete guide
for the evaluation is available in the repository 4.
          </p>
          <p>
            To quantify performance, we compute the mean error rate per triple across categories, following
da Silva et al. [
            <xref ref-type="bibr" rid="ref18">18</xref>
            ]. The evaluation uses unseen ontology fragments drawn from diverse repositories such
as AGRO5, EDAM6, MDS7, and SWEET8, covering domains like biology, spatial data, and agriculture.
Each fragment (150 tokens) was randomly sampled and generated six times using the Hugging Face
library with do_sample=True, top_k=50, top_p=0.95, and temperature=0.7, ensuring robust
and unbiased measurement of generalization capabilities.
          </p>
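The aggregation step can be sketched as follows; the category keys are paraphrased from the error list above, and the function is illustrative rather than the authors' exact script:

```python
# Illustrative aggregation of the manual evaluation: mean error rate
# per triple, averaged across the error categories (da Silva-style).
ERROR_CATEGORIES = ("syntactic", "triplet_repetition", "text_repetition",
                    "semantic_redundancy", "ambiguity", "contradiction",
                    "vocabulary_misuse")

def mean_error_rate(error_counts: dict, n_triples: int) -> float:
    """Average, over categories, of (errors in category) / (triples)."""
    rates = [error_counts.get(c, 0) / n_triples
             for c in ERROR_CATEGORIES]
    return sum(rates) / len(rates)
```

A fragment set with 100 generated triples and 7 syntactic errors, for example, contributes 0.07 in the syntactic category before averaging over the seven categories.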
        </sec>
        <sec id="sec-3-2-3">
          <title>4.5. Pretraining LLM</title>
          <p>
            For continual pretraining, we used the Llama 3.2-1B model [
            <xref ref-type="bibr" rid="ref8">8</xref>
            ], chosen for its compact yet expressive
1.2 billion parameter architecture, which balances adaptability and computational efficiency. The TTL
ontologies were processed as plain text, allowing standard NLP tokenization without specialized parsing,
resulting in 1.25 billion tokens across all subsets. To scale training effectively, we applied a Distributed
Data Parallel (DDP) strategy [
            <xref ref-type="bibr" rid="ref30">30</xref>
            ], distributing parameters and gradients across four NVIDIA A100
GPUs (40 GB each), with gradient accumulation and checkpointing to optimize batch size. Each dataset
subset was pretrained for two epochs, as longer runs showed no additional gains.
          </p>
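A hypothetical configuration mirroring this setup is sketched below; the field names and the batch/accumulation values are assumptions for illustration, not figures reported by the paper:

```python
# Hypothetical DDP continual-pretraining configuration: 4 GPUs,
# gradient accumulation and checkpointing, two epochs per subset.
TRAIN_CONFIG = {
    "model_name": "meta-llama/Llama-3.2-1B",
    "num_train_epochs": 2,              # longer runs showed no gains
    "gradient_checkpointing": True,
    "gradient_accumulation_steps": 8,   # assumed value
    "per_device_train_batch_size": 4,   # assumed value
    "nproc_per_node": 4,                # 4 x NVIDIA A100 (40 GB)
}
# Launch sketch (shell): torchrun --nproc_per_node=4 pretrain.py
```

Gradient accumulation multiplies the effective batch size (here, 4 GPUs x 4 sequences x 8 steps = 128 sequences per optimizer step under these assumed values) without exceeding per-GPU memory.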
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>5. Experiments</title>
      <p>The results reported in Table 2 use quartiles selected based on semantic quality metrics, which guided
the segmentation of the dataset into high-quality (Q1), top-half (Q1,2), and full (Q1...4) subsets. While
the same segmentation logic could, in principle, be applied using the Ontology Reference Index (ORI) to
emphasize lexical and structural quality, in this study it was only tested with the semantic quality metric, which
focuses on semantic richness, density, and hierarchy.</p>
      <p>Importantly, the semantic metrics cannot be applied to model-generated outputs because these are only partial ontology
fragments, and the rdflib library requires complete, parsable ontology structures to compute these
metrics, whereas ORI remains applicable because it operates directly over the raw text representation.</p>
      <sec id="sec-4-1">
        <p>4: https://github.com/miquelcanalesteve/LLM4Onto/tree/main/results
5: https://bioportal.bioontology.org/ontologies/AGRO
6: https://bioportal.bioontology.org/ontologies/EDAM
7: https://matportal.org/ontologies/MDS
8: https://earthportal.eu/ontologies/SWEET</p>
        <p>Table 2: Error rates for the base Llama 3.2-1B model and for models pretrained on the Q1, Q1,2, and
Q1...4 subsets after one and two epochs.</p>
        <p>The base Llama 3.2-1B model shows a high total error rate of 6.6%, mainly driven by repetition errors
(30.7%) and syntactic issues (3.0%). Semantic and vocabulary-specific errors are almost negligible in
the base outputs, reflecting structurally shallow generations. Pretraining on the high-quality subset
(Q1) for one epoch sharply reduces the total error rate to 1.6%, with substantial improvements in
repetition (4.4%) and syntactic errors (0.6%). A second epoch on Q1 slightly increases total errors to
2.4%, suggesting diminishing returns or mild overfitting.</p>
        <p>Expanding the training to larger subsets, such as Q1,2 or the full dataset (Q1...4), stabilizes error
rates between 2.4% and 2.5%, with the best redundancy and text repetition reduction achieved under
the Q1,2 (2 epochs) and Q1...4 (1 epoch) settings. These configurations show that simply increasing
dataset size or epochs does not linearly improve performance, making it crucial to calibrate training
parameters carefully.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>6. Conclusions and Future Work</title>
      <p>Overall, the results demonstrate that continual pretraining meaningfully boosts the model’s ability to
generate coherent, semantically aligned, and structurally rich ontologies. While small, high-quality
subsets like Q1 enable rapid improvements, broader datasets like Q1...4 maximize long-term structural
gains, provided training configurations are carefully balanced to avoid performance plateaus. These
findings highlight the need to rethink data selection strategies: although this study segmented data
using semantic metrics, future work should explore integrating lexical and structural dimensions into a
combined metric. Such a mixed metric could help isolate subsets that offer the best balance between
semantic depth, lexical richness, and structural complexity, potentially driving even more robust model
improvements.</p>
      <p>
        Additionally, the evaluation framework itself presents an opportunity for advancement. The current
manual assessment, while informative, is labor-intensive and limits scalability; developing an automated
evaluation pipeline would not only streamline the process but also enhance reproducibility and allow
finer-grained analysis across larger test sets. Looking ahead, the next research phase will apply
instruction tuning and task-specific fine-tuning, aligning pretrained models with specialized ontology
tasks such as those outlined in the LLMs4OL [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] benchmark, as well as expanding applications across
diverse domains like education and biomedicine. Together, these steps aim to move small, open-source
LLMs beyond general improvements toward expert-level performance in key ontology engineering
applications.
      </p>
    </sec>
    <sec id="sec-6">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used ChatGPT for grammar and spelling
checking, paraphrasing, translation, and rewording. After using this tool, the authors reviewed and edited
the content as needed and take full responsibility for the publication’s content.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>M.</given-names>
            <surname>Fernández-López</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gómez-Pérez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Juristo</surname>
          </string-name>
          ,
          <article-title>Methontology: from ontological art towards ontological engineering</article-title>
          (
          <year>1997</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>M.</given-names>
            <surname>Poveda-Villalón</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Fernández-Izquierdo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Fernández-López</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>García-Castro</surname>
          </string-name>
          ,
          <article-title>Lot: An industrial oriented ontology engineering framework</article-title>
          ,
          <source>Engineering Applications of Artificial Intelligence</source>
          <volume>111</volume>
          (
          <year>2022</year>
          )
          <fpage>104755</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>P.</given-names>
            <surname>Lambrix</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Armiento</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Hartig</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Abd Nikooie Pour</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <article-title>The materials design ontology</article-title>
          ,
          <source>Semantic Web</source>
          (
          <year>2024</year>
          )
          <fpage>1</fpage>
          -
          <lpage>35</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Lu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Song</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>Development of an ontology for construction carbon emission tracking and evaluation</article-title>
          ,
          <source>Journal of Cleaner Production</source>
          <volume>443</volume>
          (
          <year>2024</year>
          )
          <fpage>141170</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>A.</given-names>
            <surname>Amalki</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Tatane</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bouzit</surname>
          </string-name>
          ,
          <article-title>Deep learning-driven ontology learning: A systematic mapping study</article-title>
          ,
          <source>Engineering, Technology &amp; Applied Science Research</source>
          <volume>15</volume>
          (
          <year>2025</year>
          )
          <fpage>20085</fpage>
          -
          <lpage>20094</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>R.</given-names>
            <surname>Du</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>An</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <article-title>A short review for ontology learning: Stride to large language models trend</article-title>
          ,
          <source>arXiv preprint arXiv:2404.14991</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>H.</given-names>
            <surname>Babaei Giglou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>D'Souza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Auer</surname>
          </string-name>
          ,
          <article-title>Llms4ol: Large language models for ontology learning</article-title>
          , in: International Semantic Web Conference, Springer,
          <year>2023</year>
          , pp.
          <fpage>408</fpage>
          -
          <lpage>427</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>Llama</surname>
          </string-name>
          ,
          <source>Model cards and prompt formats - Llama 3.2</source>
          , https://www.llama.com/docs/model-cards-and-prompt-formats/llama3_2/,
          <year>2024</year>
          . Accessed: 2025-03-04.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>G.</given-names>
            <surname>Team</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kamath</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Ferret</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Pathak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Vieillard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Merhej</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Perrin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Matejovicova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ramé</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Rivière</surname>
          </string-name>
          , et al.,
          <article-title>Gemma 3 technical report</article-title>
          ,
          <source>arXiv preprint arXiv:2503.19786</source>
          (
          <year>2025</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>S.</given-names>
            <surname>Biderman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Schoelkopf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q. G.</given-names>
            <surname>Anthony</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Bradley</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>O'Brien</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Hallahan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Khan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Purohit</surname>
          </string-name>
          ,
          <string-name>
            <given-names>U. S.</given-names>
            <surname>Prashanth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Raff</surname>
          </string-name>
          , et al.,
          <article-title>Pythia: A suite for analyzing large language models across training and scaling</article-title>
          ,
          <source>in: International Conference on Machine Learning</source>
          , PMLR,
          <year>2023</year>
          , pp.
          <fpage>2397</fpage>
          -
          <lpage>2430</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>M. J.</given-names>
            <surname>Saeedizade</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Blomqvist</surname>
          </string-name>
          ,
          <article-title>Navigating ontology development with large language models</article-title>
          ,
          <source>in: European Semantic Web Conference</source>
          , Springer,
          <year>2024</year>
          , pp.
          <fpage>143</fpage>
          -
          <lpage>161</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Vetter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Aryan</surname>
          </string-name>
          ,
          <article-title>Using large language models for OntoClean-based ontology refinement</article-title>
          ,
          <source>arXiv preprint arXiv:2403.15864</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>S.</given-names>
            <surname>Toro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. V.</given-names>
            <surname>Anagnostopoulos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. M.</given-names>
            <surname>Bello</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Blumberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Cameron</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Carmody</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. D.</given-names>
            <surname>Diehl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. M.</given-names>
            <surname>Dooley</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W. D.</given-names>
            <surname>Duncan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Fey</surname>
          </string-name>
          , et al.,
          <article-title>Dynamic retrieval augmented generation of ontologies using artificial intelligence (DRAGON-AI)</article-title>
          ,
          <source>Journal of Biomedical Semantics</source>
          <volume>15</volume>
          (
          <year>2024</year>
          )
          <fpage>19</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>N.</given-names>
            <surname>Fathallah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Das</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>De Giorgis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Poltronieri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Haase</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Kovriguina</surname>
          </string-name>
          ,
          <article-title>NeOn-GPT: a large language model-powered pipeline for ontology learning</article-title>
          ,
          <source>in: European Semantic Web Conference</source>
          , Springer,
          <year>2024</year>
          , pp.
          <fpage>36</fpage>
          -
          <lpage>50</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>B.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V. A.</given-names>
            <surname>Carriero</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Schreiberhuber</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Tsaneva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. S.</given-names>
            <surname>González</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>de Berardinis</surname>
          </string-name>
          ,
          <article-title>OntoChat: a framework for conversational ontology engineering using language models</article-title>
          ,
          <source>in: European Semantic Web Conference</source>
          , Springer,
          <year>2024</year>
          , pp.
          <fpage>102</fpage>
          -
          <lpage>121</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>Y.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Dong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Horrocks</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Allocca</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Sapkota</surname>
          </string-name>
          ,
          <article-title>DeepOnto: A Python package for ontology engineering with deep learning</article-title>
          ,
          <source>Semantic Web</source>
          <volume>15</volume>
          (
          <year>2024</year>
          )
          <fpage>1991</fpage>
          -
          <lpage>2004</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>A.</given-names>
            <surname>Mukanova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Milosz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Dauletkaliyeva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Nazyrova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Yelibayeva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Kuzin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Kussepova</surname>
          </string-name>
          ,
          <article-title>LLM-powered natural language text processing for ontology enrichment</article-title>
          ,
          <source>Applied Sciences</source>
          <volume>14</volume>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>L. M. V.</given-names>
            <surname>da Silva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kocher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Gehlhof</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Fay</surname>
          </string-name>
          ,
          <article-title>On the use of large language models to generate capability ontologies</article-title>
          ,
          <source>in: 2024 IEEE 29th International Conference on Emerging Technologies and Factory Automation (ETFA)</source>
          , IEEE,
          <year>2024</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>8</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>J.</given-names>
            <surname>Palomar-Giner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. J.</given-names>
            <surname>Saiz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Espuña</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Da Dalt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Llop</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ostendorf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. O.</given-names>
            <surname>Suarez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Rehm</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gonzalez-Agirre</surname>
          </string-name>
          , et al.,
          <article-title>A curated catalog: Rethinking the extraction of pretraining corpora for mid-resourced languages</article-title>
          ,
          <source>in: Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)</source>
          ,
          <year>2024</year>
          , pp.
          <fpage>335</fpage>
          -
          <lpage>349</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>P. J. O.</given-names>
            <surname>Suárez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Sagot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Romary</surname>
          </string-name>
          ,
          <article-title>Asynchronous pipeline for processing huge corpora on medium to low resource infrastructures</article-title>
          ,
          <source>in: 7th Workshop on the Challenges in the Management of Large Corpora (CMLC-7)</source>
          ,
          <source>Leibniz-Institut für Deutsche Sprache</source>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <article-title>OPAL: Ontology-aware pretrained language model for end-to-end task-oriented dialogue</article-title>
          ,
          <source>Transactions of the Association for Computational Linguistics</source>
          <volume>11</volume>
          (
          <year>2023</year>
          )
          <fpage>68</fpage>
          -
          <lpage>84</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>A.</given-names>
            <surname>Conneau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Khandelwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Goyal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Chaudhary</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Wenzek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Guzmán</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Grave</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ott</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zettlemoyer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Stoyanov</surname>
          </string-name>
          ,
          <article-title>Unsupervised cross-lingual representation learning at scale</article-title>
          ,
          <source>arXiv preprint arXiv:1911.02116</source>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>S.</given-names>
            <surname>Kudugunta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Caswell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Garcia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Xin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kusupati</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Stella</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bapna</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Firat</surname>
          </string-name>
          ,
          <article-title>MADLAD-400: A multilingual and document-level large audited dataset</article-title>
          ,
          <source>Advances in Neural Information Processing Systems</source>
          <volume>36</volume>
          (
          <year>2023</year>
          )
          <fpage>67284</fpage>
          -
          <lpage>67296</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>J.</given-names>
            <surname>Frey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Streitmatter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Götz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Hellmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Arndt</surname>
          </string-name>
          ,
          <article-title>DBpedia Archivo: a web-scale interface for ontology archiving under consumer-oriented aspects</article-title>
          ,
          <source>Semantic Systems. In the Era of Knowledge Graphs</source>
          <volume>12378</volume>
          (
          <year>2020</year>
          )
          <fpage>19</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>H.</given-names>
            <surname>Alani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Brewster</surname>
          </string-name>
          ,
          <article-title>Metrics for ranking ontologies</article-title>
          ,
          <source>in: Proceedings of the Evaluating Ontologies for the Web Workshop (EON2006)</source>
          , 15th International World Wide Web Conference, EON Workshop, Edinburgh, Scotland,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>A. J. L.</given-names>
            <surname>Tello</surname>
          </string-name>
          ,
          <article-title>Métrica de idoneidad de ontologías</article-title>
          ,
          <source>Ph.D. thesis</source>
          , Universidad de Extremadura,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Gutierrez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Tomas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Moreno</surname>
          </string-name>
          ,
          <article-title>Developing an ontology schema for enriching and linking digital media assets</article-title>
          ,
          <source>Future Generation Computer Systems</source>
          <volume>101</volume>
          (
          <year>2019</year>
          )
          <fpage>381</fpage>
          -
          <lpage>397</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>H.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Cao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Ding</surname>
          </string-name>
          ,
          <article-title>A practical framework for evaluating the quality of knowledge graph</article-title>
          ,
          <source>in: Knowledge Graph and Semantic Computing: Knowledge Computing and Language Understanding: 4th China Conference, CCKS 2019, Hangzhou, China, August 24-27, 2019, Revised Selected Papers 4</source>
          , Springer,
          <year>2019</year>
          , pp.
          <fpage>111</fpage>
          -
          <lpage>122</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>J.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Yan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Cai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <article-title>Learning to break the loop: Analyzing and mitigating repetitions for neural text generation</article-title>
          ,
          <source>Advances in Neural Information Processing Systems</source>
          <volume>35</volume>
          (
          <year>2022</year>
          )
          <fpage>3082</fpage>
          -
          <lpage>3095</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>J.</given-names>
            <surname>Duan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Qu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Weng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Yan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , et al.,
          <article-title>Efficient training of large language models on distributed infrastructures: a survey</article-title>
          ,
          <source>arXiv preprint</source>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>