
Enriching Ontologies with Disjointness Axioms using Large Language Models

Elias Crum

elias.crum@ugent.be 0

Antonio De Santis

antonio.desantis@polimi.it 1

Manon Ovide

manon.ovide@univ-tours.fr 5

Jiaxin Pan

jiaxin.pan@ki.uni-stuttgart.de 4

Alessia Pisu

alessia.pisu96@unica.it 2

Nicolas Lazzari

nicolas.lazzari3@unibo.it 3

Sebastian Rudolph

TU Dresden

Germany sebastian.rudolph@tu-dresden.de

0 Ghent University , Belgium 1 Politecnico di Milano , Italy 2 University of Cagliari , Italy 3 University of Pisa and University of Bologna , Italy 4 University of Stuttgart , Germany 5 University of Tours , France

Ontologies often lack explicit disjointness declarations between classes, despite their usefulness for sophisticated reasoning and consistency checking in Knowledge Graphs. In this study, we explore the potential of Large Language Models (LLMs) to enrich ontologies by identifying and asserting class disjointness axioms. Our approach aims at leveraging the implicit knowledge embedded in LLMs, using prompt engineering to elicit this knowledge for classifying ontological disjointness. We validate our methodology on the DBpedia ontology, focusing on open-source LLMs. Our findings suggest that LLMs, when guided by effective prompt strategies, can reliably identify disjoint class relationships, thus streamlining the process of ontology completion without extensive manual input. For comprehensive disjointness enrichment, we propose a process that takes logical relationships between disjointness and subclass statements into account in order to maintain satisfiability and reduce the number of calls to the LLM. This work provides a foundation for future applications of LLMs in automated ontology enhancement and offers insights into optimizing LLM performance through strategic prompt design. Our code is publicly available on GitHub at https://github.com/n28div/llm-disjointness.

Keywords: Large Language Models, Disjointness Learning, Ontology Enrichment

1. Introduction

It is generally understood that complementing the factual (assertional) knowledge represented in Knowledge Graphs with ontological (terminological) information greatly advances the usefulness of the ensuing knowledge base in terms of querying and many other downstream tasks. This is because combining assertional information with terminological background knowledge allows for the derivation of a vast amount of implicit knowledge, which is not explicitly stated in the knowledge base but follows logically from it and thus can be taken into account for all kinds of knowledge management activities, including query answering.

By far the most widespread type of ontological information added to knowledge graphs is taxonomic in nature, that is, it is related to (i) putting the individual objects of interest into categories, usually referred to as classes, based on shared characteristics and (ii) establishing set-theoretic relationships between these classes. Among the diverse taxonomic relationships that are possible, the subclass/superclass relationships – tightly connected to the linguistic hyponymy/hypernymy relationships of the corresponding class names – are the ones predominantly found across numerous ontologies today, typically forming sizeable conceptual hierarchies. As an example, the subclass/superclass relationship between the classes Mammal and Vertebrate implies that any object that belongs to (or, in more technical terms, is an instance of) the class Mammal must also belong to the class Vertebrate.

Another well-known basic type of taxonomic relationship between two classes is that of disjointness. Two classes are said to be disjoint if it is impossible that they have common instances, which, intuitively, means that the two classes cannot overlap, and membership in these two classes is mutually exclusive. For example, disjointness of the classes Mammal and Fish implies that any instance of Mammal must not be an instance of Fish. Given the symmetric nature of disjointness, this is logically equivalent to saying that any instance of Fish must not be an instance of Mammal. As opposed to subclass statements, which allow for inferring positive facts from other positive facts, disjointness statements enable the inference of negated facts. For example, given the fact that Flipper is an instance of Mammal, the above subclass relationship gives rise to the information that Flipper is an instance of Vertebrate, whereas the disjointness statement allows us to infer the information that Flipper must not be an instance of Fish. This fact makes disjointness information particularly valuable in the context of machine-learning approaches that rely on the presence of negative examples, such as Knowledge Graph Embedding.

When specifying taxonomic relationships between classes in the course of the ontology design process, it should be kept in mind that they are not meant to reflect spurious relationships in the data currently available, but rather they are supposed to represent immutable background knowledge that continues to hold in different situations or at different points in time. For instance, although historically, no woman has served as US President, a woman may be elected as the US President in the future. Therefore, the corresponding classes Woman and USPresident are not (ontologically) disjoint.¹ To reflect this situation more formally, one can employ the idea of possible or conceivable worlds (referred to as interpretations in model-theoretic terms), which, e.g., include potential future or just hypothetical circumstances. Then, a certain taxonomic relationship (such as subclass or disjointness) between two classes holds if the corresponding set relation (such as subset or intersection-emptiness) holds between the sets of class instances in every conceivable world (under every conceivable interpretation). Based on this, we will employ a very lightweight logical framework to give our arguments a formal underpinning: Stipulating a set $\mathbb{I}$ of conceivable worlds, we define taxonomic relationships for this set. The goal of ontological knowledge modeling is to capture $\mathbb{I}$ using a knowledge base $\mathcal{K}$ whose statements rule out the inconceivable worlds so that only the conceivable ones remain as models of $\mathcal{K}$.

¹ We might call them materially disjoint due to the absence of material evidence demonstrating their non-disjointness.

Definition 1. Fixing a vocabulary consisting of a set $\mathbf{C}$ of class names and a set $\mathbf{I}$ of individual names, an interpretation $\mathcal{I} = (\Delta, \cdot^{\mathcal{I}})$ consists of a set $\Delta$ called the domain and a function $\cdot^{\mathcal{I}}$ mapping every class name $C \in \mathbf{C}$ to a subset $C^{\mathcal{I}} \subseteq \Delta$ and every individual name $i \in \mathbf{I}$ to an element $i^{\mathcal{I}} \in \Delta$.

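Let $\mathbb{I}$ be a set of interpretations, representing the conceivable worlds. Then, for an individual name $i \in \mathbf{I}$ and for class names $C, D \in \mathbf{C}$, we call

• $i$ an instance of $C$ (written $i : C$) if every interpretation $\mathcal{I} \in \mathbb{I}$ satisfies $i^{\mathcal{I}} \in C^{\mathcal{I}}$,
• $C$ a subclass of $D$ (written $C \sqsubseteq D$) if every interpretation $\mathcal{I} \in \mathbb{I}$ satisfies $C^{\mathcal{I}} \subseteq D^{\mathcal{I}}$,
• $C$ disjoint with $D$ if every interpretation $\mathcal{I} \in \mathbb{I}$ satisfies $C^{\mathcal{I}} \cap D^{\mathcal{I}} = \emptyset$,
• $C$ incoherent if every interpretation $\mathcal{I} \in \mathbb{I}$ satisfies $C^{\mathcal{I}} = \emptyset$.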
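To make these definitions concrete, the following minimal sketch checks them over a small, finite set of conceivable worlds. The `Interpretation` type, the domain elements `d1` to `d4`, and the two toy worlds are illustrative assumptions and are not part of the paper's framework; the sketch only mirrors the quantification over "every interpretation".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interpretation:
    """One conceivable world (hypothetical helper type, not from the paper)."""
    classes: dict      # class name -> set of domain elements
    individuals: dict  # individual name -> domain element


def is_instance(worlds, i, c):
    # i : C holds if i denotes a member of C in every conceivable world.
    return all(w.individuals[i] in w.classes[c] for w in worlds)


def is_subclass(worlds, c, d):
    # C is a subclass of D if C's extension is a subset of D's in every world.
    return all(w.classes[c] <= w.classes[d] for w in worlds)


def is_disjoint(worlds, c, d):
    # C and D are disjoint if their extensions never overlap in any world.
    return all(not (w.classes[c] & w.classes[d]) for w in worlds)


def is_incoherent(worlds, c):
    # C is incoherent if its extension is empty in every world.
    return all(not w.classes[c] for w in worlds)


# Toy worlds (assumed data): "today" has no woman as US President,
# the hypothetical "future" world does.
today = Interpretation(
    classes={
        "Mammal": {"d1", "d3", "d4"},
        "Vertebrate": {"d1", "d2", "d3", "d4"},
        "Fish": {"d2"},
        "Woman": {"d3"},
        "USPresident": {"d4"},
    },
    individuals={"flipper": "d1"},
)
future = Interpretation(
    classes={**today.classes, "USPresident": {"d3"}},
    individuals=today.individuals,
)
WORLDS = [today, future]

print(is_disjoint(WORLDS, "Mammal", "Fish"))
print(is_disjoint(WORLDS, "Woman", "USPresident"))
```

Restricting the check to `[today]` alone would report Woman and USPresident as disjoint, which corresponds to the merely "material" disjointness discussed in the footnote.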

As discussed above, ontologically dictated taxonomic relationships can be leveraged for sophisticated reasoning and consistency-checking tasks when reasoning over a knowledge graph. Yet, despite their usefulness, disjointness relationships are rarely explicitly recorded within an ontology. Research on 1,275 ontologies showed that only 97 of them include disjointness assertions [1]. Arguably, this can be explained by the fact that disjointness information is so self-evident from a human common-sense point of view that human experts are often not aware that it is not logically “built-in” but needs to be explicitly specified. For this reason, semi-automated labeling of disjoint classes could be advantageous. Recent approaches [2, 3, 4] propose supervised and unsupervised models using various features in disjointness axioms. However, these methods generalize poorly beyond their specific datasets and are difficult to apply on a large scale. Additionally, the sophisticated feature engineering required hinders their practical application. Therefore, a method that functions independently of feature design and dataset restrictions is highly desirable.

Given that (i) ontological class descriptions are often recorded as (or associated with) terms in natural language and (ii) LLMs have been found to possess wide linguistic and semantic working knowledge, we aim to assess the potential of LLMs to decide which classes ought to be disjoint, while also assessing the impact of prompt engineering on classification validity. We hypothesize that, through the use of prompt engineering, LLMs are able to classify ontologically disjoint classes with high validity in both positive cases (two classes are ontologically disjoint) and negative cases (two classes are not ontologically disjoint). We test our hypothesis on the DBpedia ontology using LLMs. We propose a method that intertwines the LLM-based disjointness classification with basic logical inferencing to increase efficiency, maintain consistency, and minimize the number of calls to the LLM.

Thus, this paper is dedicated to answering the following main research questions:

RQ1: Can LLMs help enrich ontologies with class disjointness axioms?

RQ2: Which LLM prompts work better for disjointness discovery?

RQ3: How can we exploit taxonomic relationships to reduce interaction with the LLM?

2. Related Work

Disjointness Learning. Models for disjointness learning can be categorized into supervised and unsupervised approaches. In the unsupervised category, Schlobach [5] follows the strong disjointness assumption [6], which posits that children of a common parent in the subsumption hierarchy should be considered disjoint. This work introduced a pinpointing algorithm to identify minimal sets of axioms that need revision to make an ontology coherent, thereby enriching the ontology with appropriate disjointness statements. However, this approach neglects background knowledge, which could be beneficial in identifying disjoint classes. Rizzo et al. [4] propose an unsupervised approach based on concept learning and inductive classification. This method employs a hierarchical conceptual clustering technique capable of providing intensional cluster descriptions and utilizes a novel form of semi-distances over individuals in an ontological knowledge base, incorporating available background knowledge. In the supervised category, Völker et al. [2, 3] gather syntactic and semantic evidence, such as positive and negative association rules as well as correlation coefficients, from various sources to establish a strong foundation for learning disjointness. However, their work exploits background knowledge and reasoning only to a limited extent. Subsequent work, the DL-Learner by Lehmann [7], uses Inductive Logic Programming (ILP) for learning class descriptions, including disjointness. Despite these advancements, disjointness learning with LLMs remains largely underexplored.

Large Language Models. In recent years, Large Language Models (LLMs) have become state-of-the-art for Natural Language Processing and have also significantly impacted other fields such as knowledge engineering [8, 9, 10, 11]. LLMs rely on pre-training Transformer models [12] over large-scale unlabeled corpora. Pre-trained context-aware word representations achieve state-of-the-art performance on various downstream tasks and set the “pre-training and fine-tuning” learning paradigm. Early LLMs, such as BERT [13], utilized relatively small training corpora and required fine-tuning for specific downstream tasks. However, subsequent research demonstrated that scaling up both model size and dataset volume significantly enhances performance. GPT-3 [14], for instance, achieves competitive results through few-shot learning and in-context learning without parameter updates. GPT-3.5 further improves capabilities by incorporating reinforcement learning from human feedback (RLHF). The introduction of GPT-4 [15] marked a milestone by extending beyond text input to include multimodal signals. Meta AI introduced the collection of LLaMA models [16, 17] with four different sizes. Other notable LLMs, such as Claude, Gemini [18], and Mixtral [19], have also garnered significant attention.

Prompt Engineering. Designing effective prompts for LLMs is essential for maximizing their potential. Key strategies in prompt engineering include zero-shot [20], few-shot [14], and chain-of-thought [21] prompting. Zero-shot prompting [20] involves providing task descriptions to LLMs without any input-output examples, relying on the models’ pre-existing knowledge to generate responses. Few-shot prompting [14] includes input-output examples, guiding the models’ generation process. Chain-of-Thought (CoT) prompting [22] promotes coherent and step-by-step reasoning by decomposing a complex question into a series of simpler logical reasoning questions, mimicking human problem-solving processes. This method has been shown to significantly improve performance on reasoning tasks [22]. However, the need for multiple prompts makes this approach difficult to use at large scales. With this in mind, Kojima et al. [23] proposed Zero-shot-CoT prompting. They found that by appending the phrase “Let’s think step by step.” to the end of a question, LLMs can generate a chain of thought that leads to more accurate answers than the vanilla zero-shot approach.

3. Resources

To effectively assess the ability of LLMs to support the assertion of disjointness axioms, we ideally require a reference ontology that includes (i) a sizeable set of classes, to ensure diversity during the experiments, and (ii) some disjoint classes in its description, preferably specified through a specific disjoint class property such as owl:disjointWith. These criteria maximize the generalizability of the approach and encourage its use for future studies.

Several ontologies can be identified for this task, from foundational ontologies, such as DOLCE³ or UFO⁴, to domain-specific ontologies, such as FoodOn⁵. Disjointness axioms from these ontologies, however, are not intuitive and require extensive common-sense reasoning and domain knowledge. For instance, DOLCE defines an Event to be disjoint from an Object while UFO does not. Both modeling choices are correct, as they deeply depend on the philosophical commitment of each ontology to these abstract concepts. Similarly, the FoodOn ontology asserts that the Arabica coffee plant⁶, the plant used to produce black coffee, is disjoint with Camellia sinensis⁷, the plant used to produce black tea. In this case, deciding whether the two plants should be considered disjoint highly depends on the domain of the ontology. To avoid feeding the LLM with classes whose disjointness highly depends on the context or domain, we choose to avoid foundational and domain-specific ontologies for our initial experiments. Moreover, as our interaction with the LLM is based on natural language, we only consider ontologies that provide natural language labels for classes via labeling properties, such as skos:prefLabel or rdfs:label.

We ultimately decided to use the DBpedia ontology⁸ because of its general popularity and because it meets the minimal requirements stated above. Since the DBpedia ontology is created through a crowdsourcing approach [24], its disjointness axioms cannot be expected to be equally accurate across all classes, as their quality depends on the annotators’ expertise and diligence. This issue has been actively discussed within the DBpedia community⁹. The main drawback is the lack of a systematic approach in the creation of the taxonomy, which greatly impacts the consistency of the ontology when disjointness axioms are asserted. In particular, we found 23 explicit disjointness axioms in the DBpedia ontology. In Section 4, we show how exploiting automated reasoning techniques allows the creation of a larger pool of disjoint classes. Table 1 shows a selection of disjointness axioms within the ontology. Indeed, most of the disjointness axioms are universally known common-sense relations, such as disjointness between dbo:Fish and dbo:Mammal or dbo:Agent and dbo:Place.

³ https://github.com/appliedontolab/DOLCE/blob/main/OWL/DOLCEbasic.owl, http://www.ontologydesignpatterns.org/ont/dul/DUL.owl
⁴ https://nemo-ufes.github.io/gufo/
⁵ https://foodon.org/
⁶ https://en.wikipedia.org/wiki/Coffea_arabica
⁷ https://en.wikipedia.org/wiki/Camellia_sinensis
⁸ https://DBpedia.org/ontology/, often referred to with the dbo: namespace, which we omit hereafter
⁹ https://github.com/DBpedia/ontology-tracker/issues/2

Table 1: A selection of disjointness axioms in the DBpedia ontology.

Class A                               Class B
http://DBpedia.org/ontology/Person    http://DBpedia.org/ontology/ProtohistoricalPeriod
http://DBpedia.org/ontology/Person    http://DBpedia.org/ontology/UnitOfWork
http://DBpedia.org/ontology/Agent     http://DBpedia.org/ontology/Place
http://DBpedia.org/ontology/Fish      http://DBpedia.org/ontology/Mammal
http://DBpedia.org/ontology/Event     http://DBpedia.org/ontology/Person

4. Proposed approach

We now describe our approach which, given a knowledge base, determines for every pair of named classes of its ontology whether disjointness should hold between the two classes or not. At the core of the approach is prompting an LLM to exploit the semantic and linguistic “world knowledge” it has obtained from training on vast amounts of textual data. The two major underlying objectives of our approach are:

1. Ensuring that the resulting disjointness-enriched ontology is satisfiable (i.e., contradiction-free), since otherwise it would be unusable for any reasoning tasks, including ontology-supported querying.
2. Minimizing the number of interactions with the LLM, for reasons of efficiency and cost-awareness.

We propose to address both objectives using automated reasoning. More specifically, we continuously materialize all the (non-)disjointness information that follows logically from the original knowledge base plus the already acquired disjointness information. Thus, the LLM is only queried about the disjointness status of pairs of classes when neither of the outcomes would result in an inconsistency. In this way, the derived information remains contradiction-free “by design” and, at the same time, the number of queries to the LLM is significantly reduced. Our approach relies on several logical correspondences, discussed in the following.

Proposition 1. Let $\mathcal{K}$ be a knowledge base and let $C_1, C_2, D_1, D_2$ be classes of $\mathcal{K}$ such that the following statements follow from $\mathcal{K}$: (i) $C_1$ and $C_2$ are disjoint, (ii) $D_1$ is a subclass of $C_1$, (iii) $D_2$ is a subclass of $C_2$. Then $\mathcal{K}$ also entails that $D_1$ and $D_2$ are disjoint.

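Proof. Consider an arbitrary model $\mathcal{I}$ of $\mathcal{K}$. According to the assumptions and in view of Definition 1, we know that (i) $C_1^{\mathcal{I}} \cap C_2^{\mathcal{I}} = \emptyset$, (ii) $D_1^{\mathcal{I}} \subseteq C_1^{\mathcal{I}}$, and (iii) $D_2^{\mathcal{I}} \subseteq C_2^{\mathcal{I}}$. We equivalently express (ii) and (iii) by (ii') $D_1^{\mathcal{I}} = C_1^{\mathcal{I}} \cap D_1^{\mathcal{I}}$ and (iii') $D_2^{\mathcal{I}} = C_2^{\mathcal{I}} \cap D_2^{\mathcal{I}}$. This allows us to infer

$$D_1^{\mathcal{I}} \cap D_2^{\mathcal{I}} = (C_1^{\mathcal{I}} \cap D_1^{\mathcal{I}}) \cap (C_2^{\mathcal{I}} \cap D_2^{\mathcal{I}}) = (C_1^{\mathcal{I}} \cap C_2^{\mathcal{I}}) \cap (D_1^{\mathcal{I}} \cap D_2^{\mathcal{I}}) = \emptyset \cap (D_1^{\mathcal{I}} \cap D_2^{\mathcal{I}}) = \emptyset.$$

We exploit this property to use subclass relationships from $\mathcal{K}$ to deduce class disjointness statements from existing class disjointness statements. This way we avoid posing redundant disjointness queries to the underlying LLM.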
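The use of Proposition 1 can be sketched as follows: every known disjointness statement is pushed down to all pairs of (direct or indirect) subclasses. This is a minimal illustration under the assumption that the taxonomy is given as a plain dictionary from each class to its direct superclasses; the function names and the toy taxonomy are hypothetical and not taken from the paper's code.

```python
from itertools import product


def descendants(subclass_of, cls):
    """Return cls together with all of its direct and indirect subclasses.

    subclass_of maps each class name to the set of its direct superclasses.
    """
    result = {cls}
    changed = True
    while changed:
        changed = False
        for child, parents in subclass_of.items():
            if child not in result and parents & result:
                result.add(child)
                changed = True
    return result


def propagate_disjointness(subclass_of, disjoint_pairs):
    """Apply Proposition 1: subclasses of disjoint classes are disjoint."""
    inferred = set()
    for c1, c2 in disjoint_pairs:
        below_c1 = descendants(subclass_of, c1)
        below_c2 = descendants(subclass_of, c2)
        for d1, d2 in product(below_c1, below_c2):
            if d1 != d2:  # a shared subclass would be empty; not handled here
                inferred.add(frozenset((d1, d2)))
    return inferred


# Toy taxonomy (assumed data).
taxonomy = {
    "Dog": {"Mammal"},
    "Shark": {"Fish"},
    "Mammal": {"Animal"},
    "Fish": {"Animal"},
}
pairs = propagate_disjointness(taxonomy, {("Mammal", "Fish")})
for pair in sorted(sorted(p) for p in pairs):
    print(pair)
```

In this toy case a single asserted axiom yields four disjoint pairs, so three potential LLM queries become unnecessary.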

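Proposition 2. Let $\mathcal{K}$ be a knowledge base and let $C_1, C_2, C$ be classes of $\mathcal{K}$ such that the following statements follow from $\mathcal{K}$: (i) $C_1$ and $C_2$ are disjoint, (ii) $C$ is a subclass of $C_1$, (iii) $C$ is a subclass of $C_2$. Then $\mathcal{K}$ also entails that $C$ is incoherent.

Proof. Consider an arbitrary model $\mathcal{I}$ of $\mathcal{K}$. According to the assumptions and in view of Definition 1, we know that (i) $C_1^{\mathcal{I}} \cap C_2^{\mathcal{I}} = \emptyset$, (ii) $C^{\mathcal{I}} \subseteq C_1^{\mathcal{I}}$, and (iii) $C^{\mathcal{I}} \subseteq C_2^{\mathcal{I}}$. We equivalently express (ii) and (iii) by (ii') $C^{\mathcal{I}} = C_1^{\mathcal{I}} \cap C^{\mathcal{I}}$ and (iii') $C^{\mathcal{I}} = C_2^{\mathcal{I}} \cap C^{\mathcal{I}}$. This allows us to infer

$$C^{\mathcal{I}} = C^{\mathcal{I}} \cap C^{\mathcal{I}} = (C_1^{\mathcal{I}} \cap C^{\mathcal{I}}) \cap (C_2^{\mathcal{I}} \cap C^{\mathcal{I}}) = (C_1^{\mathcal{I}} \cap C_2^{\mathcal{I}}) \cap C^{\mathcal{I}} = \emptyset \cap C^{\mathcal{I}} = \emptyset.$$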

We exploit this property indirectly, under the assumption that any named class in the considered ontology is supposed to have instances. This seems to be a reasonable assumption since, otherwise, the definition of the class would be meaningless. Under this assumption, any two classes that have a common subclass cannot be disjoint.
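Read contrapositively, Proposition 2 lets non-disjointness be harvested directly from the taxonomy. The sketch below collects, for every class, all pairs among the class itself and its superclasses; the dictionary encoding of the taxonomy, the function names, and the example classes are assumptions made for illustration only.

```python
from itertools import combinations


def ancestors(subclass_of, cls):
    """Return cls together with all of its direct and indirect superclasses."""
    seen = {cls}
    stack = [cls]
    while stack:
        current = stack.pop()
        for parent in subclass_of.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def non_disjoint_pairs(subclass_of):
    """Pairs that cannot be disjoint, assuming every named class has instances.

    Any two classes sharing a common subclass (including a class and its own
    superclass) are marked as non-disjoint, following Proposition 2.
    """
    classes = set(subclass_of) | {p for ps in subclass_of.values() for p in ps}
    pairs = set()
    for cls in classes:
        for a, b in combinations(sorted(ancestors(subclass_of, cls)), 2):
            pairs.add(frozenset((a, b)))
    return pairs


# Toy taxonomy (assumed data): Platypus is below two sibling classes.
taxonomy = {
    "Platypus": {"Mammal", "EggLayingAnimal"},
    "Mammal": {"Animal"},
    "EggLayingAnimal": {"Animal"},
}
result = non_disjoint_pairs(taxonomy)
print(frozenset(("Mammal", "EggLayingAnimal")) in result)
```

The example also shows why the strong disjointness assumption for sibling classes can fail: a single common subclass is enough to rule out disjointness of two siblings.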

Proposition 3. Let $\mathcal{K}$ be a knowledge base, let $C_1, C_2$ be classes, and let $e$ be an individual of $\mathcal{K}$ such that the following statements follow from $\mathcal{K}$: (i) $C_1$ and $C_2$ are disjoint, (ii) $e$ is an instance of $C_1$, (iii) $e$ is an instance of $C_2$. Then $\mathcal{K}$ is unsatisfiable.

Proof. Suppose $\mathcal{I}$ is a model of $\mathcal{K}$. According to the assumptions and in view of Definition 1, we know that (i) $C_1^{\mathcal{I}} \cap C_2^{\mathcal{I}} = \emptyset$, (ii) $e^{\mathcal{I}} \in C_1^{\mathcal{I}}$, and (iii) $e^{\mathcal{I}} \in C_2^{\mathcal{I}}$. Then, combining (ii) and (iii), we obtain $e^{\mathcal{I}} \in C_1^{\mathcal{I}} \cap C_2^{\mathcal{I}}$, and applying (i) yields $e^{\mathcal{I}} \in \emptyset$, which is a contradictory statement. Thus $\mathcal{K}$ cannot have any models, which means it is unsatisfiable.

Again, this property can be exploited by noting that any two classes having common instances must not be disjoint. These considerations lead to the proposed methodology, detailed in Algorithm 1, which achieves the above-mentioned objective of producing an enriched knowledge base that is guaranteed to be contradiction-free, provided that the original knowledge base is.
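As an illustration of how the three propositions might feed a labeled list of class pairs, the following sketch derives non-disjointness from shared instances (Proposition 3) and then labels every pair as "disjoint", "not disjoint", or "unknown". Algorithm 1 itself is not reproduced in this text, so the sketch should be read as one plausible reading under stated assumptions, with hypothetical function names and toy data, rather than as the authors' procedure.

```python
from itertools import combinations


def non_disjoint_from_instances(types):
    """Classes sharing an individual cannot be disjoint (Proposition 3).

    types maps each individual name to the set of classes it is asserted
    (or inferred) to be an instance of.
    """
    pairs = set()
    for classes in types.values():
        for a, b in combinations(sorted(classes), 2):
            pairs.add(frozenset((a, b)))
    return pairs


def label_pairs(classes, disjoint, non_disjoint):
    """Label every unordered class pair from the already derived knowledge."""
    clash = disjoint & non_disjoint
    if clash:
        # By Propositions 2 and 3, such a knowledge base is contradictory.
        raise ValueError(f"contradictory knowledge for pairs: {sorted(map(sorted, clash))}")
    labels = {}
    for a, b in combinations(sorted(classes), 2):
        pair = frozenset((a, b))
        if pair in disjoint:
            labels[pair] = "disjoint"
        elif pair in non_disjoint:
            labels[pair] = "not disjoint"
        else:
            labels[pair] = "unknown"
    return labels


# Toy data (assumed): one individual and one asserted disjointness axiom.
types = {"flipper": {"Mammal", "Vertebrate"}}
disjoint = {frozenset(("Fish", "Mammal"))}
labels = label_pairs(
    ["Fish", "Mammal", "Place", "Vertebrate"],
    disjoint,
    non_disjoint_from_instances(types),
)
for pair, label in sorted((sorted(p), l) for p, l in labels.items()):
    print(pair, label)
```

Only the pairs left as "unknown" would then need to be passed on to the LLM.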

Algorithm 2 achieves the objective of reducing the number of interactions with the LLM and maintaining satisfiability as new disjointness information is added through the LLM. The aim of producing an output that accurately reflects taxonomic relationships crucially depends on the quality and accuracy of the LLM’s responses. This, in turn, is influenced by both the LLM itself and the chosen prompting strategy. We focus on these issues in Section 5.
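The interplay between LLM queries and logical materialization can be sketched as a simple loop. Since the body of Algorithm 2 is not reproduced in this text, the code below is an assumption-laden approximation: a "disjoint" answer is propagated downwards to all subclass pairs (Proposition 1), a "not disjoint" answer is propagated upwards to all superclass pairs (its contrapositive), and only pairs still labeled "unknown" are ever queried. The `ask_llm` callback is a stand-in for the real model call, and all names are hypothetical.

```python
import random


def closure(relation, cls):
    """Return cls plus everything reachable from it via relation."""
    seen, stack = {cls}, [cls]
    while stack:
        for nxt in relation.get(stack.pop(), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def enrich(labels, parents, ask_llm, rng=None):
    """Query the LLM only for 'unknown' pairs and materialize consequences.

    labels maps frozenset({A, B}) to 'disjoint', 'not disjoint' or 'unknown';
    parents maps each class to the set of its direct superclasses.
    Returns the completed labels and the number of LLM calls made.
    """
    rng = rng or random.Random(0)
    children = {}
    for child, supers in parents.items():
        for parent in supers:
            children.setdefault(parent, set()).add(child)
    calls = 0
    while True:
        unknown = sorted(tuple(sorted(p)) for p, l in labels.items() if l == "unknown")
        if not unknown:
            return labels, calls
        c1, c2 = rng.choice(unknown)  # random selection of the next pair
        answer = ask_llm(c1, c2)
        calls += 1
        # Disjointness is inherited by subclasses, non-disjointness by superclasses.
        relation = children if answer == "disjoint" else parents
        for d1 in closure(relation, c1):
            for d2 in closure(relation, c2):
                pair = frozenset((d1, d2))
                if labels.get(pair) == "unknown":
                    labels[pair] = answer


# Toy run with a stub in place of the LLM (assumed data).
parents = {"Dog": {"Mammal"}, "Shark": {"Fish"}}
labels = {
    frozenset(("Dog", "Mammal")): "not disjoint",
    frozenset(("Fish", "Shark")): "not disjoint",
    frozenset(("Dog", "Fish")): "unknown",
    frozenset(("Dog", "Shark")): "unknown",
    frozenset(("Fish", "Mammal")): "unknown",
    frozenset(("Mammal", "Shark")): "unknown",
}
labels, calls = enrich(labels, parents, lambda a, b: "disjoint")
print(calls)
```

How many calls are saved depends on the order in which pairs are drawn: asking about Mammal and Fish first settles all four unknown pairs with a single query.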

The last steps of Algorithm 2 (lines 14 and 15) are optional, but highly recommended, as they remove logically redundant statements from the disjointness-enriched knowledge base $\mathcal{K} \cup \mathcal{D}$, where $\mathcal{D}$ denotes the set of acquired disjointness axioms. This yields a knowledge base that is logically equivalent but typically much smaller in size and hence both easier to process algorithmically and to scrutinize and maintain manually. Also, this “pruning step” is not computationally expensive, as it only requires one call to a reasoner per disjointness axiom.
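A minimal sketch of such a pruning step is given below. Instead of a full reasoner, it uses a hand-written entailment check that covers only Proposition 1 over a dictionary-encoded taxonomy; this simplification, the function names, and the toy data are assumptions for illustration. Axioms are removed one at a time against the set that is still kept, so that two mutually entailing axioms are never both discarded.

```python
def ancestors(parents, cls):
    """Return cls together with all of its direct and indirect superclasses."""
    seen, stack = {cls}, [cls]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def entailed(axiom, others, parents):
    """Check whether a disjointness axiom follows from the others (Proposition 1)."""
    c1, c2 = tuple(axiom)
    up1, up2 = ancestors(parents, c1), ancestors(parents, c2)
    return any(
        (a in up1 and b in up2) or (a in up2 and b in up1)
        for a, b in map(tuple, others)
    )


def prune(axioms, parents):
    """Drop every axiom that is entailed by the axioms still being kept."""
    kept = set(axioms)
    for axiom in sorted(axioms, key=sorted):  # one entailment check per axiom
        if entailed(axiom, kept - {axiom}, parents):
            kept.remove(axiom)
    return kept


# Toy data (assumed): four axioms, three of which are redundant.
parents = {"Dog": {"Mammal"}, "Shark": {"Fish"}}
axioms = {
    frozenset(("Mammal", "Fish")),
    frozenset(("Dog", "Fish")),
    frozenset(("Mammal", "Shark")),
    frozenset(("Dog", "Shark")),
}
pruned = prune(axioms, parents)
print(sorted(sorted(a) for a in pruned))
```

In the toy example only the axiom between Mammal and Fish survives, since the other three follow from it via the subclass hierarchy.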

5. Experiments

In this section, we experiment with the approach proposed in Section 4 on the classes extracted from the DBpedia ontology. In particular, by relying on Algorithm 1, we obtain the list $\mathcal{L}$ related to the DBpedia ontology. We have that $|\mathcal{L}| = 1148$, with 370 pairs labeled as disjoint and 778 pairs labeled as not disjoint. In Table 2, we provide some examples of class pairs in $\mathcal{L}$.

Algorithm 2: Determine the set of disjointness statements consistent with $\mathcal{K}$

Input: A list $\mathcal{L}$ containing pairs of classes labeled as “unknown”, “disjoint”, or “not disjoint”; a prompt for disjointness classification, together with the function $\mathbf{C} \times \mathbf{C} \to \{\text{“disjoint”}, \text{“not disjoint”}\}$ that queries an LLM for the disjointness of two classes using this prompt.

Output: A set $\mathcal{D}$ of class disjointness axioms, such that all valid disjointness statements logically follow from $\mathcal{K} \cup \mathcal{D}$ and no invalid disjointness statements follow from it.

Note that the list $\mathcal{L}$ assumes that the ontology designers carefully produced a taxonomy that is intended to also reflect disjointness between classes. As shown in Section 3, however, this is not the case. The taxonomy of DBpedia is structured such that disjointness axioms might result in unwanted inconsistencies. For this reason, we employ multiple metrics to evaluate the LLMs’ performance, each measuring a different behavior of the model. For all metrics, a higher score indicates better performance, with 1 being the maximum score. In particular, disjoint recall (DR) measures how much the LLM aligns with humans, as the share of true disjointness axioms that have been identified by the LLM. This measure provides an evaluation of the reliability of the prompt. Non-disjoint F1 (NDF1) measures the F1 score between the non-disjoint pairs in $\mathcal{L}$ and the ones identified by the LLM. This provides a measure of how conservative the LLM is in its answers, i.e., how much the LLM acknowledges the open-world assumption. The F1 metric measures the end-to-end performance of the model. The symmetric consistency metric (SC) measures how much the answers provided by the LLM respect the symmetric property of the disjointness axiom, i.e., if A is disjoint from B, then B is disjoint from A. Finally, we measure the overall accuracy of each model.
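The evaluation metrics described in this section can be made concrete with a short sketch. The exact formulas are not spelled out in the text, so the code assumes the standard definitions of recall, F1, and accuracy, and it assumes that symmetric consistency is the fraction of class pairs for which both query orders receive the same answer. The function names and the toy gold and predicted labels are illustrative.

```python
def disjoint_recall(gold, pred):
    """Share of gold 'disjoint' pairs that the model also labels 'disjoint'."""
    relevant = [pair for pair, label in gold.items() if label == "disjoint"]
    return sum(pred[pair] == "disjoint" for pair in relevant) / len(relevant)


def f1(gold, pred, label):
    """F1 score of one label, e.g. 'not disjoint' for NDF1."""
    tp = sum(1 for p in gold if gold[p] == label and pred[p] == label)
    fp = sum(1 for p in gold if gold[p] != label and pred[p] == label)
    fn = sum(1 for p in gold if gold[p] == label and pred[p] != label)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def symmetric_consistency(pred):
    """Share of pairs whose answer is the same for (A, B) and (B, A)."""
    pairs = [(a, b) for (a, b) in pred if a < b and (b, a) in pred]
    return sum(pred[(a, b)] == pred[(b, a)] for a, b in pairs) / len(pairs)


def accuracy(gold, pred):
    return sum(pred[p] == gold[p] for p in gold) / len(gold)


# Toy labels (assumed data); predictions contain both query orders.
gold = {
    ("Fish", "Mammal"): "disjoint",
    ("Agent", "Place"): "disjoint",
    ("Mammal", "Vertebrate"): "not disjoint",
    ("Castle", "Prison"): "not disjoint",
}
pred = {
    ("Fish", "Mammal"): "disjoint", ("Mammal", "Fish"): "disjoint",
    ("Agent", "Place"): "not disjoint", ("Place", "Agent"): "disjoint",
    ("Mammal", "Vertebrate"): "not disjoint", ("Vertebrate", "Mammal"): "not disjoint",
    ("Castle", "Prison"): "not disjoint", ("Prison", "Castle"): "not disjoint",
}
print(disjoint_recall(gold, pred), f1(gold, pred, "not disjoint"))
print(symmetric_consistency(pred), accuracy(gold, pred))
```

In the toy data, the inconsistent answers for Agent and Place lower both the disjoint recall and the symmetric consistency, which illustrates that the two metrics capture different failure modes.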

Prompting. We adopt different prompting strategies: a naive approach, where the LLM has to autonomously understand the task; a task description approach, where the disjointness task is described; and a few-shot approach that extends the task description by also providing some positive and negative examples. For each prompt, we frame the problem as a question-answering (QA) task, where the LLM has to answer positively or negatively to classify two classes as disjoint. To identify the best QA approach, we compare two prompts: (i) one where the LLM has to answer positively to classify two classes as disjoint and (ii) one where the LLM has to answer negatively. Table 3 describes the prompt templates we used. When possible, we rely on the instruction format of each LLM and use the Prompting Strategy template to instruct the LLM, while we use the QA Strategy as a query to the instructed LLM.

5.1. Experimental setup

We perform our experiments on publicly available LLMs to ensure full reproducibility of the experiments. For each LLM, we set the sampling temperature to 0 to reduce the randomness of the results. Moreover, we only rely on small LLMs, i.e., LLMs with approximately 8 billion parameters. Through the use of proper optimization techniques, it is possible to run these models on consumer-level devices without the need for specialized hardware. We perform our experiments on a selection of the current state-of-the-art models, including Mistral 0.3 7B [25], Gemma 2 9B [26], Llama 3 8B¹⁰, and Qwen 2 7B [27]¹¹. All experiments are run on 8-bit quantized models on an RTX 3090 with 24 GB of RAM. We experiment with each combination of the prompts of Table 3.

Table 3 (few-shot Prompting Strategy template):

This is a question about ontological disjointness, answer only with “yes” or “no”.

Examples of disjoint are: “person” and “file system”, “tower” and “person”, “place” and “agent”, “continent” and “sea”, “baseball league” and “bowling league”, “planet” and “star”.

Examples of not disjoint are: “basketball player” and “baseball player”, “means of transportation” and “reptile”, “garden” and “historic place”, “president” and “beauty queen”, “castle” and “prison”.
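The combination of a prompting strategy with a QA strategy can be sketched as follows. The task description and the example phrases are taken from the few-shot template quoted in this section (with a shortened example list); the wording of the two QA questions is not recoverable from the text, so both question templates below are hypothetical, as are the function names. No model is called here; the sketch only builds the prompt parts and maps a yes/no reply to a label.

```python
TASK_DESCRIPTION = (
    "This is a question about ontological disjointness, "
    'answer only with "yes" or "no".'
)
FEW_SHOT_EXAMPLES = (
    'Examples of disjoint are: "person" and "file system", '
    '"tower" and "person", "place" and "agent".\n'
    'Examples of not disjoint are: "garden" and "historic place", '
    '"castle" and "prison".'
)
# Hypothetical question wordings; the paper's exact templates are not shown.
QUESTIONS = {
    "positive": 'Is "{a}" disjoint from "{b}"?',
    "negative": 'Can a "{a}" also be a "{b}"?',
}


def build_prompt(strategy, qa, a, b):
    """Return (instruction, question) for one class pair."""
    if strategy == "naive":
        instruction = ""
    elif strategy == "task_description":
        instruction = TASK_DESCRIPTION
    elif strategy == "few_shot":
        instruction = TASK_DESCRIPTION + "\n" + FEW_SHOT_EXAMPLES
    else:
        raise ValueError(f"unknown prompting strategy: {strategy}")
    return instruction, QUESTIONS[qa].format(a=a, b=b)


def parse_answer(qa, reply):
    """Map a yes/no reply to a disjointness label, depending on the QA framing."""
    said_yes = reply.strip().lower().startswith("yes")
    if qa == "positive":
        return "disjoint" if said_yes else "not disjoint"
    return "not disjoint" if said_yes else "disjoint"


instruction, question = build_prompt("few_shot", "negative", "fish", "mammal")
print(instruction)
print(question)
print(parse_answer("negative", "No."))
```

Keeping the instruction and the question separate matches the setup described above, where the prompting strategy instructs the model and the QA strategy forms the actual query.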

5.2. Results

The overall results are shown in Table 4. In general, LLMs achieve promising results in disjointness detection. Notably, the best prompting technique is not providing few-shot examples, but rather providing the LLM with little to no description of the task. Indeed, it has been observed that few-shot prompting is more effective when in-context learning is required, while zero-shot prompting is more effective when the implicit knowledge of the LLM should be exploited [28]. Nonetheless, further research on few-shot prompting for disjointness classification should be performed, as lower performance can also be attributed to the amount and nature of the examples we provide in the prompt. We manually select examples that are likely to provide meaningful disjointness instances. However, a more complex approach could be employed, such as exploiting Retrieval Augmented Generation (RAG) techniques to provide examples that are more likely to be relevant for the classes used as input. Different heuristics can be used to measure the relevance of other classes, such as word embeddings or knowledge graph embeddings. Interestingly, framing the problem as a negative QA task, i.e., asking whether an individual of a class can also be an instance of another class, consistently outperforms the positive QA prompt. This could be attributed to the fact that the negative approach is more consistent with natural language questions. LLMs can actively exploit their pre-training phase, which generally includes a fine-tuning phase to solve QA tasks akin to our negative prompt. On average, Gemma 2 performs better than the other LLMs. However, depending on the requirements, other LLMs might be better suited. For instance, Mistral 0.3 is better aligned with human judgment, since it has a higher recall on disjointness axioms.

¹⁰ https://llama.meta.com/
¹¹ Due to their closed-source nature and high costs, we reserve the exploration of GPT-3.5 and GPT-4 for future work.

5.3. Disjointness on DBpedia

Given the results of Table 4, we consider Gemma 2 with the task description prompt and a negative QA strategy as the most effective way of producing disjointness axioms among the methods tested. We execute Algorithm 2 on the whole DBpedia ontology. We rely on a straightforward random selection for the pair $(D_1, D_2)$ (line 3). In total, the algorithm takes 21,589.75 s (≈ 6 h) to execute. Note that, given the random selection, we are not able to exploit parallelism and query the LLM with single prompts. However, a selection strategy that enables parallel selection would greatly enhance the performance of the algorithm. In total, we find 510,600 disjointness axioms, which results in ≈ 98% of the classes participating in at least one disjointness axiom. The number of axioms can be greatly reduced by relying on the “pruning” operation of Algorithm 2 (line 15). In the case of the DBpedia ontology, the number of resulting axioms is 170,122, a reduction of ≈ 67%.

For illustration and discussion purposes, Table 5 shows a non-representative selection of particularly discussion-worthy positive and negative disjointness statements retrieved via Algorithm 2. We observe that for some class relationships, including both common-sense and domain-specific classes, our approach resulted in the “conservative” misclassification of classes as non-disjoint, meaning the LLM classified the classes as non-disjoint despite the classes actually being disjoint. Examples include dbo:VideogamesLeague and dbo:Website, dbo:GeneLocation and dbo:HumanGene, and dbo:Identifier and dbo:District. Conversely, we also observed “aggressive” misclassification, where our approach classified classes as disjoint although they are in fact non-disjoint. Straightforward examples include dbo:PlayboyPlaymate and dbo:Camera or dbo:WikimediaTemplate and dbo:WomensTennisAssociationTournament, with a more complicated example being dbo:Mosque and dbo:Museum. The latter is disproven by a counterexample: the famous Hagia Sophia mosque in Turkey. To address these misclassification instances, we suspect that providing more contextual information in the prompt may improve classification accuracy, especially for domain-specific scenarios. Also, future work could assess how, through prompt design, the approach could encourage more “aggressive” or “conservative” disjointness classifications in scenarios where relationships are more uncertain.

6. Conclusion and Future Work

This work shows that LLMs can roughly identify and assert disjointness axioms in ontologies, with a different degree of reliability depending on the model. By harnessing their inherent background knowledge and employing strategic prompt engineering, we showed that these models can classify ontological disjointness with minimal human intervention. This capability simplifies ontology management and supports more robust reasoning in knowledge graphs. Our findings underscore the potential of LLMs as valuable tools for the automated enrichment of ontologies, which encourages future exploration and innovation in this domain.

Future work includes testing the approach proposed in Section 4 on other ontologies, to assess its effectiveness on different types of ontologies, including domain-specific ontologies. Additionally, comprehensive validation by human domain experts would be required to obtain conclusive insights into the degree of reliability of the axioms asserted by the LLM.

Moreover, using diferent LLMs with diferent numbers of parameters and improving and expanding our strategies for testing disjointness constitutes interesting future work.

It could be worthwhile to look into heuristics that, given a large list of disjointness candidate pairs, pick those entries that are particularly “promising”. One option would be to follow the strong disjointness assumption [6] and pick “sibling classes”, that is, pairs of classes that share a common direct superclass. Furthermore, it could be interesting to test class pairs with just one or two examples of non-disjointness, as these instances may be errors that should be removed from the KG. On another note, one could develop strategies for gauging the reliability of an LLM response by rephrasing the question asked. Such rephrasing could include adding descriptions of the classes to the prompt to see whether this improves the answers, relying on proper ontology serialization techniques [29]. Finally, advanced prompting techniques such as chain-of-thought may improve the results, as may RAG techniques for selecting the few-shot examples. Similarly, a richer prompt that includes qualifying phrases such as “at the same time” (to check the temporality of the disjointness) or “theoretically” (to force abstraction) might steer the model toward a more effective framing of the problem.
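The sibling-class heuristic mentioned above can be illustrated with a minimal sketch. It assumes the subclass hierarchy is already available as a plain mapping from each class to its direct superclasses; the function name `sibling_pairs` and the toy hierarchy are hypothetical and are not taken from our implementation or from DBpedia.

```python
from collections import defaultdict
from itertools import combinations


def sibling_pairs(direct_superclasses):
    """Return sorted, unordered pairs of classes sharing a direct superclass.

    `direct_superclasses` maps a class name to a list of its direct
    superclasses (an assumed input format, chosen for illustration).
    """
    # Invert the mapping: superclass -> set of its direct subclasses.
    children = defaultdict(set)
    for cls, parents in direct_superclasses.items():
        for parent in parents:
            children[parent].add(cls)

    # Every pair of direct subclasses of the same superclass is a candidate.
    pairs = set()
    for siblings in children.values():
        for a, b in combinations(sorted(siblings), 2):
            pairs.add((a, b))
    return sorted(pairs)


# Toy hierarchy (hypothetical example data).
toy = {
    "Mosque": ["ReligiousBuilding"],
    "Church": ["ReligiousBuilding"],
    "Museum": ["Building"],
    "ReligiousBuilding": ["Building"],
}
candidates = sibling_pairs(toy)
```

In this toy hierarchy, Mosque and Museum are not siblings and would therefore not be prioritized, whereas Church and Mosque would be. Such a filter only orders or restricts the candidate list; each selected pair would still need to be checked by the LLM.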

Acknowledgments

Elias Crum acknowledges funding provided by VITO NV (UG_PhD_2303_contract). Antonio De Santis’s doctoral scholarship is funded by the Italian Ministry of University and Research (MUR) under the National Recovery and Resilience Plan (NRRP), by Thales Alenia Space, and by the European Union (EU) under the NextGenerationEU project. Alessia Pisu acknowledges MUR and EU-FSE for financial support of the PON Research and Innovation 2014-2020 (D.M. 1061/2021). Nicolas Lazzari has received funding from the FAIR – Future Artificial Intelligence Research Foundation as part of the grant agreement MUR n. 341. Sebastian Rudolph is funded by the Bundesministerium für Bildung und Forschung (BMBF, Federal Ministry of Education and Research) and DAAD (German Academic Exchange Service) in project 57616814 (SECAI, School of Embedded and Composite AI).

References

[1] T. D. Wang, Gauging Ontologies and Schemas by Numbers, in: D. Vrandecic, M. C. Suárez-Figueroa, A. Gangemi, Y. Sure (Eds.), Proceedings of 4th International EON Workshop 2006, Evaluation of Ontologies for the Web, Co-located with the WWW2006, Edinburgh, UK, May 22, 2006, volume 179 of CEUR Workshop Proceedings, CEUR-WS.org, 2006. URL: https://ceur-ws.org/Vol-179/eon2006wang.pdf.

[2] J. Völker, D. Vrandecic, Y. Sure, A. Hotho, Learning Disjointness, in: E. Franconi, M. Kifer, W. May (Eds.), The Semantic Web: Research and Applications, 4th European Semantic Web Conference, ESWC 2007, Innsbruck, Austria, June 3-7, 2007, Proceedings, volume 4519 of Lecture Notes in Computer Science, Springer, 2007, pp. 175–189. URL: https://doi.org/10.1007/978-3-540-72667-8_14. doi:10.1007/978-3-540-72667-8_14.

[3] J. Völker, D. Fleischhacker, H. Stuckenschmidt, Automatic acquisition of class disjointness, J. Web Semant. 35 (2015) 124–139. URL: https://doi.org/10.1016/j.websem.2015.07.001. doi:10.1016/J.WEBSEM.2015.07.001.

[4] G. Rizzo, C. d'Amato, N. Fanizzi, An unsupervised approach to disjointness learning based on terminological cluster trees, Semantic Web 12 (2021) 423–447. URL: https://doi.org/10.3233/SW-200391. doi:10.3233/SW-200391.

[5] S. Schlobach, Debugging and Semantic Clarification by Pinpointing, in: A. Gómez-Pérez, J. Euzenat (Eds.), The Semantic Web: Research and Applications, Second European Semantic Web Conference, ESWC 2005, Heraklion, Crete, Greece, May 29 - June 1, 2005, Proceedings, volume 3532 of Lecture Notes in Computer Science, Springer, 2005, pp. 226–240. URL: https://doi.org/10.1007/11431053_16. doi:10.1007/11431053_16.

[6] R. Cornet, A. Abu-Hanna, Usability of expressive description logics - a case study in UMLS, in: AMIA 2002, American Medical Informatics Association Annual Symposium, San Antonio, TX, USA, November 9-13, 2002, AMIA, 2002. URL: https://knowledge.amia.org/amia-55142-a2002a-1.610020/t-001-1.612667/f-001-1.612668/a-036-1.613143/a-037-1.613140.

[7] J. Lehmann, DL-Learner: Learning Concepts in Description Logics, J. Mach. Learn. Res. 10 (2009) 2639–2642. URL: https://dl.acm.org/doi/10.5555/1577069.1755874. doi:10.5555/1577069.1755874.

[8] B. P. Allen, L. Stork, P. Groth, Knowledge Engineering Using Large Language Models, Transactions on Graph Data and Knowledge 1 (2023) 3:1–3:19. URL: https://drops.dagstuhl.de/entities/document/10.4230/TGDK.1.1.3. doi:10.4230/TGDK.1.1.3.

[9] S. Pan, L. Luo, Y. Wang, C. Chen, J. Wang, X. Wu, Unifying large language models and knowledge graphs: A roadmap, IEEE Transactions on Knowledge and Data Engineering 36 (2024) 3580–3599. doi:10.1109/TKDE.2024.3352100.

[10] A. D. Santis, M. Balduini, F. D. Santis, A. Proia, A. Leo, M. Brambilla, E. D. Valle, Integrating large language models and knowledge graphs for extraction and validation of textual test data, 2024. URL: https://arxiv.org/abs/2408.01700. arXiv:2408.01700.

[11] F. Petroni, T. Rocktäschel, S. Riedel, P. Lewis, A. Bakhtin, Y. Wu, A. Miller, Language models as knowledge bases?, in: K. Inui, J. Jiang, V. Ng, X. Wan (Eds.), Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Association for Computational Linguistics, Hong Kong, China, 2019, pp. 2463–2473. URL: https://aclanthology.org/D19-1250. doi:10.18653/v1/D19-1250.

[12] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, I. Polosukhin, Attention is All you Need, in: I. Guyon, U. von Luxburg, S. Bengio, H. M. Wallach, R. Fergus, S. V. N. Vishwanathan, R. Garnett (Eds.), Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, December 4-9, 2017, Long Beach, CA, USA, 2017, pp. 5998–6008. URL: https://proceedings.neurips.cc/paper/2017/hash/3f5ee243547dee91fbd053c1c4a845aa-Abstract.html.

[13] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, in: Proceedings of NAACL-HLT, 2019, pp. 4171–4186.

[14] T. B. Brown, B. Mann, N. Ryder, M. Subbiah, J. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, S. Agarwal, A. Herbert-Voss, G. Krueger, T. Henighan, R. Child, A. Ramesh, D. M. Ziegler, J. Wu, C. Winter, C. Hesse, M. Chen, E. Sigler, M. Litwin, S. Gray, B. Chess, J. Clark, C. Berner, S. McCandlish, A. Radford, I. Sutskever, D. Amodei, Language Models are Few-shot Learners, in: H. Larochelle, M. Ranzato, R. Hadsell, M. Balcan, H. Lin (Eds.), Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6-12, 2020, virtual, 2020. URL: https://proceedings.neurips.cc/paper/2020/hash/1457c0d6bfcb4967418bfb8ac142f64a-Abstract.html.

[15] OpenAI, GPT-4 Technical Report, arXiv preprint arXiv:2303.08774 (2023).

[16] H. Touvron, T. Lavril, G. Izacard, X. Martinet, M. Lachaux, T. Lacroix, B. Rozière, N. Goyal, E. Hambro, F. Azhar, A. Rodriguez, A. Joulin, E. Grave, G. Lample, LLaMA: Open and Efficient Foundation Language Models, CoRR abs/2302.13971 (2023). URL: https://doi.org/10.48550/arXiv.2302.13971. doi:10.48550/ARXIV.2302.13971. arXiv:2302.13971.

[17] H. Touvron, L. Martin, K. Stone, P. Albert, A. Almahairi, Y. Babaei, N. Bashlykov, S. Batra, P. Bhargava, S. Bhosale, D. Bikel, L. Blecher, C. Canton-Ferrer, M. Chen, G. Cucurull, D. Esiobu, J. Fernandes, J. Fu, W. Fu, B. Fuller, C. Gao, V. Goswami, N. Goyal, A. Hartshorn, S. Hosseini, R. Hou, H. Inan, M. Kardas, V. Kerkez, M. Khabsa, I. Kloumann, A. Korenev, P. S. Koura, M. Lachaux, T. Lavril, J. Lee, D. Liskovich, Y. Lu, Y. Mao, X. Martinet, T. Mihaylov, P. Mishra, I. Molybog, Y. Nie, A. Poulton, J. Reizenstein, R. Rungta, K. Saladi, A. Schelten, R. Silva, E. M. Smith, R. Subramanian, X. E. Tan, B. Tang, R. Taylor, A. Williams, J. X. Kuan, P. Xu, Z. Yan, I. Zarov, Y. Zhang, A. Fan, M. Kambadur, S. Narang, A. Rodriguez, R. Stojnic, S. Edunov, T. Scialom, Llama 2: Open Foundation and Fine-tuned Chat Models, CoRR abs/2307.09288 (2023). URL: https://doi.org/10.48550/arXiv.2307.09288. doi:10.48550/ARXIV.2307.09288. arXiv:2307.09288.

[18] M. Reid, N. Savinov, D. Teplyashin, D. Lepikhin, T. P. Lillicrap, J. Alayrac, R. Soricut, A. Lazaridou, O. Firat, J. Schrittwieser, I. Antonoglou, R. Anil, S. Borgeaud, A. M. Dai, K. Millican, E. Dyer, M. Glaese, T. Sottiaux, B. Lee, F. Viola, M. Reynolds, Y. Xu, J. Molloy, J. Chen, M. Isard, P. Barham, T. Hennigan, R. McIlroy, M. Johnson, J. Schalkwyk, E. Collins, E. Rutherford, E. Moreira, K. Ayoub, M. Goel, C. Meyer, G. Thornton, Z. Yang, H. Michalewski, Z. Abbas, N. Schucher, A. Anand, R. Ives, J. Keeling, K. Lenc, S. Haykal, S. Shakeri, P. Shyam, A. Chowdhery, R. Ring, S. Spencer, E. Sezener, et al., Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context, CoRR abs/2403.05530 (2024). URL: https://doi.org/10.48550/arXiv.2403.05530. doi:10.48550/ARXIV.2403.05530. arXiv:2403.05530.

[19] A. Q. Jiang, A. Sablayrolles, A. Roux, A. Mensch, B. Savary, C. Bamford, D. S. Chaplot, D. de Las Casas, E. B. Hanna, F. Bressand, G. Lengyel, G. Bour, G. Lample, L. R. Lavaud, L. Saulnier, M. Lachaux, P. Stock, S. Subramanian, S. Yang, S. Antoniak, T. L. Scao, T. Gervet, T. Lavril, T. Wang, T. Lacroix, W. E. Sayed, Mixtral of Experts, CoRR abs/2401.04088 (2024). URL: https://doi.org/10.48550/arXiv.2401.04088. doi:10.48550/ARXIV.2401.04088. arXiv:2401.04088.

[20] A. Radford, J. Wu, R. Child, D. Luan, D. Amodei, I. Sutskever, Language Models are Unsupervised Multitask Learners, OpenAI Blog, 2019.

[21] B. Chen, Z. Zhang, N. Langrené, S. Zhu, Unleashing the potential of prompt engineering: a comprehensive review, 2024. arXiv:2310.14735.

[22] J. Wei, X. Wang, D. Schuurmans, M. Bosma, B. Ichter, F. Xia, E. H. Chi, Q. V. Le, D. Zhou, Chain-of-thought Prompting Elicits Reasoning in Large Language Models, in: S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, A. Oh (Eds.), Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, NeurIPS 2022, New Orleans, LA, USA, November 28 - December 9, 2022, 2022. URL: http://papers.nips.cc/paper_files/paper/2022/hash/9d5609613524ecf4f15af0f7b31abca4-Abstract-Conference.html.

[23] T. Kojima, S. S. Gu, M. Reid, Y. Matsuo, Y. Iwasawa, Large Language Models are Zero-Shot Reasoners, in: S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, A. Oh (Eds.), Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, NeurIPS 2022, New Orleans, LA, USA, November 28 - December 9, 2022, 2022. URL: http://papers.nips.cc/paper_files/paper/2022/hash/8bb0d291acd4acf06ef112099c16f326-Abstract-Conference.html.

[24] J. Lehmann, R. Isele, M. Jakob, A. Jentzsch, D. Kontokostas, P. N. Mendes, S. Hellmann, M. Morsey, P. van Kleef, S. Auer, C. Bizer, DBpedia - A large-scale, multilingual knowledge base extracted from Wikipedia, Semantic Web 6 (2015) 167–195. URL: https://doi.org/10.3233/SW-140134. doi:10.3233/SW-140134.

[25] A. Q. Jiang, A. Sablayrolles, A. Mensch, C. Bamford, D. S. Chaplot, D. de Las Casas, F. Bressand, G. Lengyel, G. Lample, L. Saulnier, L. R. Lavaud, M. Lachaux, P. Stock, T. L. Scao, T. Lavril, T. Wang, T. Lacroix, W. E. Sayed, Mistral 7B, CoRR abs/2310.06825 (2023). URL: https://doi.org/10.48550/arXiv.2310.06825. doi:10.48550/ARXIV.2310.06825. arXiv:2310.06825.

[26] T. Mesnard, C. Hardin, R. Dadashi, S. Bhupatiraju, S. Pathak, L. Sifre, M. Rivière, M. S. Kale, J. Love, P. Tafti, L. Hussenot, A. Chowdhery, A. Roberts, A. Barua, A. Botev, A. Castro-Ros, A. Slone, A. Héliou, A. Tacchetti, A. Bulanova, A. Paterson, B. Tsai, B. Shahriari, C. L. Lan, C. A. Choquette-Choo, C. Crepy, D. Cer, D. Ippolito, D. Reid, E. Buchatskaya, E. Ni, E. Noland, G. Yan, G. Tucker, G. Muraru, G. Rozhdestvenskiy, H. Michalewski, I. Tenney, I. Grishchenko, J. Austin, J. Keeling, J. Labanowski, J. Lespiau, J. Stanway, J. Brennan, J. Chen, J. Ferret, J. Chiu, et al., Gemma: Open Models Based on Gemini Research and Technology, CoRR abs/2403.08295 (2024). URL: https://doi.org/10.48550/arXiv.2403.08295. doi:10.48550/ARXIV.2403.08295. arXiv:2403.08295.

[27] A. Yang, B. Yang, B. Hui, B. Zheng, B. Yu, C. Zhou, C. Li, C. Li, D. Liu, F. Huang, et al., Qwen2 Technical Report, arXiv preprint arXiv:2407.10671 (2024).

[28] L. Reynolds, K. McDonell, Prompt Programming for Large Language Models: Beyond the Few-shot Paradigm, in: Y. Kitamura, A. Quigley, K. Isbister, T. Igarashi (Eds.), CHI ’21: CHI Conference on Human Factors in Computing Systems, Virtual Event / Yokohama Japan, May 8-13, 2021, Extended Abstracts, ACM, 2021, pp. 314:1–314:7. URL: https://doi.org/10.1145/3411763.3451760. doi:10.1145/3411763.3451760.

[29] C. Ringwald, F. Gandon, C. Faron, F. Michel, H. Abi Akl, 12 shades of RDF: Impact of Syntaxes on Data Extraction with Language Models, in: ESWC 2024 Extended Semantic Web Conference, May 2024, Hersonissos, Greece, 2024.