1. Introduction

On the Way to Diverse Datasets for Evaluating ABox Abduction Algorithms (Extended Abstract)

Janka Boborová

Jakub Kloc

Martin Homola

Júlia Pukancová

Comenius University in Bratislava, Mlynská dolina, 842 41 Bratislava, Slovakia

2025


We propose a method for generating evaluation datasets for ABox abduction algorithms, using diverse real-world knowledge bases, logical consequences as observations to ensure meaningfulness, justifications to guarantee explanations exist, and ontology modules to constrain the search space.

Abduction [1, 2] is a form of inference that explains an observation by identifying its possible causes (explanations). We focus specifically on ABox abduction, where both the observation and its explanations consist of ABox assertions. To enable meaningful comparison and evaluation of ABox abduction algorithms, a suitable dataset of ABox abduction problems is needed. Such a dataset should include multiple real-world knowledge bases, each with meaningful observations. For each observation, it should provide abducibles (a set representing the search space) of varying sizes. Additionally, the ABox abduction problems should vary in the number and length of explanations. Datasets in existing evaluations suffer from several limitations, including the use of only one knowledge base [3]; artificial, automatically generated observations [3, 4, 5]; observations that may have no explanations [4]; limited diversity in explanation length and number [3]; and weak constraints on the search space [6, 7, 3, 4, 5]. Building on prior approaches, we have begun constructing an evaluation dataset that addresses these shortcomings.

Keywords: Description logics, ABox abduction, evaluation dataset, test case

2. Construction of a Robust Evaluation Dataset

To generate meaningful, non-artificial observations and reduce the search space without losing explanations, we propose two methods: one for generating observations (applicable to any knowledge base) and another for generating abducibles (applicable to any ABox abduction problem). For simplicity, the methods are defined for atomic concept assertions but extend easily to all atomic assertions and their complements. Applying these methods to diverse real-world knowledge bases enables the construction of a robust evaluation dataset.

Real-World Knowledge Bases: Koopmann et al. [4] proposed using the 2015 OWL Reasoner Competition Corpus [8], which provides 1,920 diverse real-world ontologies, as a suitable set of knowledge bases for the evaluation of abduction algorithms. To focus only on relevant ontologies with the potential to produce interesting problems, we defined the following requirements: consistency; consistency check time ≤ 30 seconds; individual count ≥ 1. After applying these requirements, we obtained 865 ontologies as candidate knowledge bases.
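The three requirements above amount to a simple filter over per-ontology measurements. The following is a minimal sketch of such a filter; the `OntologyStats` record, its field names, and the toy corpus are hypothetical and only illustrate the thresholds stated in the text, not the tooling actually used.

```python
from dataclasses import dataclass

# Thresholds taken from the requirements stated in the text.
MAX_CHECK_SECONDS = 30.0
MIN_INDIVIDUALS = 1


@dataclass(frozen=True)
class OntologyStats:
    """Hypothetical per-ontology measurements (illustrative only)."""
    name: str
    consistent: bool
    consistency_check_seconds: float
    individual_count: int


def is_candidate(stats: OntologyStats) -> bool:
    """Return True if the ontology meets all three requirements."""
    return (
        stats.consistent
        and stats.consistency_check_seconds <= MAX_CHECK_SECONDS
        and stats.individual_count >= MIN_INDIVIDUALS
    )


def select_candidates(corpus):
    """Keep only ontologies that qualify as candidate knowledge bases."""
    return [stats for stats in corpus if is_candidate(stats)]


# Toy corpus with made-up values, one ontology failing each requirement.
corpus = [
    OntologyStats("ont_a", True, 2.5, 40),
    OntologyStats("ont_b", False, 1.0, 12),   # inconsistent
    OntologyStats("ont_c", True, 45.0, 300),  # consistency check too slow
    OntologyStats("ont_d", True, 0.8, 0),     # no individuals
]
candidates = select_candidates(corpus)
```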

Consequences as Observations: Although many knowledge bases are available, we are not aware of real-world use cases with predefined observations; therefore, observations must be generated separately. We aim to generate meaningful observations by selecting logical consequences O of a knowledge base 𝒦. To ensure explanatoriness (𝒦 ⊭ O), each 𝒦 must be modified so that it no longer entails the observation O. This is done by removing at least one assertion from each justification of O, i.e., from a minimal set of axioms responsible for the entailment of O [9]. Our approach is described in Algorithm 1.

The core idea is to “corrupt” 𝒦 by removing assertions that can later be recovered as explanations through ABox abduction. As ABox abduction yields only ABox assertions, other axiom types cannot be removed during the modification of 𝒦.
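The reason why removing one assertion from every justification suffices can be spelled out in a few steps. The derivation below is a sketch in notation assumed here ($\mathcal{K}$ for the knowledge base, $O$ for the observation, $R$ for the set of removed assertions, and a finite $\mathcal{K}$); it is a standard argument about justifications rather than a result stated in this paper.

```latex
\begin{align*}
&\text{Let } J_1, \dots, J_n \text{ be all justifications of } O \text{ in } \mathcal{K},
  \text{ and let } R \subseteq \mathcal{K} \text{ satisfy } R \cap J_i \neq \emptyset \text{ for every } i. \\
&\text{Suppose, for contradiction, that } \mathcal{K} \setminus R \models O. \\
&\text{Then there is a minimal } J \subseteq \mathcal{K} \setminus R \text{ with } J \models O;
  \text{ since } J \subseteq \mathcal{K}, \text{ it is a justification of } O \text{ in } \mathcal{K}. \\
&\text{Hence } J = J_i \text{ for some } i, \text{ so } R \cap J \neq \emptyset,
  \text{ contradicting } J \subseteq \mathcal{K} \setminus R. \\
&\text{Therefore } \mathcal{K} \setminus R \not\models O.
\end{align*}
```

Because only ABox assertions may be removed, this argument applies only when every justification has a non-empty ABox part.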

Algorithm 1 Generating ABox Abduction Problems
Input: knowledge base 𝒦
Output: a set of ABox abduction problems 𝒫

𝒫 ← {}
Obs ← {C(a) | C ∈ N_C, a ∈ N_I, 𝒦 ⊨ C(a), C(a) ∉ 𝒦}    ◁ generate observations
for C(a) in Obs do    ◁ compute observation justifications
    𝒥(C(a)) ← get the ABox parts of justifications for C(a) using OWLExplanation
end for
for C(a) in Obs do    ◁ generate ABox abduction problems
    n ← get the size of the largest justification in 𝒥(C(a))
    for i in ⟨1, n⟩ do
        𝒦′ ← 𝒦    ◁ modify 𝒦
        for J in 𝒥(C(a)) do
            R ← randomly select min(i, |J|) assertions from J
            𝒦′ ← 𝒦′ ∖ R
        end for
        𝒫 ← 𝒫 ∪ {P = (𝒦′, C(a))}
    end for
end for
return 𝒫
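The problem-generation loop of Algorithm 1 can be illustrated without a reasoner by assuming that the observations and the ABox parts of their justifications have already been computed. The sketch below makes that assumption explicit: assertions are plain strings, `justification_parts` is a hypothetical precomputed mapping, and skipping observations that have a justification with an empty ABox part is a choice made here, not one stated in the paper.

```python
import random


def generate_problems(abox, justification_parts, rng):
    """Build (modified ABox, observation) pairs in the spirit of Algorithm 1.

    abox: frozenset of assertion strings.
    justification_parts: dict mapping each observation to a list of frozensets,
        the ABox parts of its justifications (assumed precomputed elsewhere).
    rng: random.Random instance, passed in for reproducibility.
    """
    problems = []
    for observation, parts in justification_parts.items():
        # Assumption of this sketch: an observation is skipped if some
        # justification has no ABox assertions that could be removed.
        if not parts or any(len(part) == 0 for part in parts):
            continue
        largest = max(len(part) for part in parts)
        for i in range(1, largest + 1):
            modified = set(abox)
            for part in parts:
                # Remove min(i, |J|) randomly chosen assertions from each part.
                removed = rng.sample(sorted(part), min(i, len(part)))
                modified.difference_update(removed)
            problems.append((frozenset(modified), observation))
    return problems


# Toy input: one observation with a single two-assertion justification part.
abox = frozenset({"Cat(tom)", "Owned(tom)", "Dog(rex)"})
justification_parts = {"Pet(tom)": [frozenset({"Cat(tom)", "Owned(tom)"})]}
problems = generate_problems(abox, justification_parts, random.Random(0))
```

With this input, two problems are produced for the observation: one with a single assertion removed and one with both removed.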

To explore a wider range of possibilities, we generated multiple modified knowledge bases for each observation by progressively removing more assertions, aiming to produce different explanations for the same observation.

Module-Based Abducibles: During evaluation, it is useful to examine how algorithms perform with search spaces of varying sizes. However, reducing the search space requires a careful strategy to preserve explanations.

We propose using module extraction [10, 11], a technique that extracts a meaningful fragment of an ontology while preserving all axioms relevant to the complete meaning of a given signature. This technique can be used to generate module abducibles, i.e., assertions relevant to an observation O, excluding symbols unlikely to appear in explanations: Abd_module = {C(a) | C ∈ N_C, a ∈ N_I from Σ(ℳ)}, where ℳ is the extracted module. Specifically, we use the ⊤-module, which includes all subclasses of the atomic classes in the signature of O, as explanations typically involve concepts subsumed by the concept in O.

To differentiate within Abd_module when generating abducibles of a given size, we prioritise assertions involving individuals from O, as they are more likely to appear in explanations.
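One plausible reading of this prioritisation is to fill a size-limited abducible set first with assertions about the observation's individuals and only then with the remaining module abducibles. The sketch below follows that reading; the tuple representation `(concept, individual)`, the function names, and the random tie-breaking are assumptions made for illustration, and the module signature is taken as given rather than extracted from an ontology.

```python
import random


def module_abducibles(module_concepts, module_individuals):
    """All atomic concept assertions C(a) over a (given) module signature."""
    return [(c, a) for c in sorted(module_concepts) for a in sorted(module_individuals)]


def select_abducibles(abducibles, observation_individuals, size, rng):
    """Pick `size` abducibles, preferring those about observation individuals.

    Order within each group is randomised; this tie-breaking is an assumption.
    """
    preferred = [x for x in abducibles if x[1] in observation_individuals]
    others = [x for x in abducibles if x[1] not in observation_individuals]
    rng.shuffle(preferred)
    rng.shuffle(others)
    return (preferred + others)[:size]


# Toy module signature: 3 concepts x 3 individuals = 9 abducibles.
abducibles = module_abducibles({"Cat", "Pet", "Animal"}, {"tom", "rex", "ann"})
# Observation about the individual "tom": its 3 assertions are taken first.
selected = select_abducibles(abducibles, {"tom"}, 4, random.Random(0))
```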

3. Analysis of Generated Inputs

Generating ABox Abduction Problems: The observation generation process applied to 865 knowledge bases resulted in 37,042 ABox abduction problems. The largest ABox part of a justification contained 12 assertions, and the maximum number of justifications for an observation was 38 (all with a single-assertion ABox part). Over 90% of observations had one justification with a single-element ABox part.

Table 1: Selected sample of ABox abduction problems. Recoverable column header: |ℰ| (length ℓ: |{E | E ∈ ℰ, |E| = ℓ}|).

In theory, more justifications should lead to more explanations, and justifications with more assertions should lead to longer explanations. In practice, justifications may contain complex assertions that cannot be reconstructed by algorithms limited to atomic assertions and their complements. Additionally, observations may have explanations beyond those found in the justifications. Therefore, given the large number of generated problems, identifying those with interesting properties is challenging without additional information.

Generating Abducibles: To analyse abducible generation, we selected a sample of ABox abduction problems (Table 1) by applying MergeXplain (MXP)² [13] with Abd_default = {C(a) | C ∈ N_C, a ∈ N_I from Σ(𝒦 ∪ O)} to a random subset of the generated problems. MergeXplain returns a set of explanations, ℰ, containing all explanations of length 1 and, if present, at least one additional explanation of a greater length. We selected the final sample based on the number and length of explanations. Notably, only one problem (ont934_obs01) in the subset produced explanations of varying lengths, including some longer than one.

For each problem in the sample, we generated module abducibles, Abd, which on average reduced the search space to 46%. Abd consistently included all explanations found by MergeXplain, ensuring none were lost.

To generate abducibles of varying sizes, three methods were applied: (a) module abducibles prioritising assertions with individuals from the observation (our proposed approach), (b) module abducibles without prioritisation, and (c) completely random selection. Each method was used to generate abducibles of sizes 10, 25, 50, 100, 250, and 500, and was run three times per size to obtain averaged results. For each method and size, we report the percentage of explanation assertions covered by the generated Abd sets, relative to their size, computed as

$$\frac{|\text{expl. assertions in } \mathit{Abd}|}{\min(|\mathit{Abd}|,\ |\text{expl. assertions}|)}$$

(e.g., a set of size 10 can cover at most 10 explanation assertions, and the maximum possible coverage is bounded by the total number of explanation assertions). The results (Table 2) were averaged over all problems and runs. Method (a) was the most successful, consistently generating sets that covered the highest number of explanation assertions across all sizes. At sizes 250 and 500, it achieved full coverage for all problems except ont934_obs01, where it generated no more than two explanation assertions per Abd set. Still, even on this problem, it outperformed the other methods, which on average generated none. Out of 108 runs, the generated set contained no explanation assertions in 5 cases for method (a) (all for ont934_obs01), 57 for method (b), and 78 for method (c).

² MXP was run using CATS [12]: https://github.com/Comenius-Abduction-Team/CATS-Abduction-Solver
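The coverage measure used in this evaluation can be computed directly from an abducible set and a set of explanations. The following is a minimal sketch, assuming that "explanation assertions" means the union of the assertions occurring in all known explanations and that coverage is defined as 0 when either set is empty; both are assumptions of this example, and the toy data is made up.

```python
def coverage(abducibles, explanations):
    """Share of explanation assertions contained in the abducible set.

    The denominator is min(|Abd|, |explanation assertions|), so a small
    abducible set is not penalised for explanation assertions it could
    never hold.
    """
    # Assumption: explanation assertions = union over all explanations.
    explanation_assertions = set().union(*explanations)
    bound = min(len(set(abducibles)), len(explanation_assertions))
    if bound == 0:
        return 0.0  # assumption: empty inputs count as zero coverage
    covered = len(set(abducibles) & explanation_assertions)
    return covered / bound


# Toy data: three distinct explanation assertions across two explanations.
explanations = [{"A(a)"}, {"B(a)", "C(b)"}]
partial = coverage(["A(a)", "B(a)", "D(c)", "E(d)"], explanations)  # 2 of at most 3
small_full = coverage(["A(a)", "C(b)"], explanations)               # 2 of at most 2
```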

4. Discussion and Outlook

The dataset generation process needs refinement, especially in producing ABox abduction problems. To narrow down the generated problems and focus on the most relevant ones, we plan to analyse the ABox parts of justifications. Since many observations produce only single-assertion explanations, we aim to use observations composed of multiple assertions.

In contrast, for generating abducibles, the module-based approach prioritising individuals from the observation seems promising.

Acknowledgments

This research was sponsored by the Slovak Republic under the SRDA grant APVV-23-0292 (DyMAX) and VEGA grant no. 1/0630/25 (eXSec), and it is also based on the work from COST Action CA23147 GOBLIN. J. Boborová was supported by Grant no. UK/1283/2025 awarded by Comenius University in Bratislava.

Declaration on Generative AI

During the preparation of this work, the authors used gpt-4o in order to: Text translation, Grammar and spelling check, Improve writing style. After using these tools, the authors reviewed and edited the content as needed and take full responsibility for the publication’s content.

References

[1] C. Elsenbroich, O. Kutz, U. Sattler, A case for abductive reasoning over ontologies, in: Proceedings of the OWLED*06 Workshop on OWL: Experiences and Directions, Athens, GA, US, volume 216 of CEUR-WS, CEUR-WS.org, 2006. URL: https://ceur-ws.org/Vol-216/submission_25.pdf.

[2] C. S. Peirce, Illustrations of the logic of science VI: Deduction, induction, and hypothesis, Popular Science Monthly 13 (1878) 470–482.

[3] M. Homola, J. Pukancová, I. Balintová, J. Boborová, Hybrid MHS-MXP ABox abduction solver: First empirical results, in: Proceedings of the 35th International Workshop on Description Logics (DL 2022) co-located with Federated Logic Conference (FLoC 2022), Haifa, Israel, August 7th to 10th, 2022, volume 3263 of CEUR Workshop Proceedings, CEUR-WS.org, 2022. URL: https://ceur-ws.org/Vol-3263/paper-13.pdf.

[4] P. Koopmann, W. Del-Pinto, S. Tourret, R. A. Schmidt, Signature-based abduction for expressive description logics, in: Proceedings of the 17th International Conference on Principles of Knowledge Representation and Reasoning, KR 2020, Rhodes, Greece, 2020, pp. 592–602. URL: https://doi.org/10.24963/kr.2020/59. doi:10.24963/KR.2020/59.

[5] J. Du, K. Wang, Y. Shen, A tractable approach to ABox abduction over description logic ontologies, in: Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence, July 27-31, 2014, Québec City, Québec, Canada, AAAI Press, 2014, pp. 1034–1040. URL: https://doi.org/10.1609/aaai.v28i1.8852. doi:10.1609/AAAI.V28I1.8852.

[6] J. Pukancová, M. Homola, ABox abduction for description logics: The case of multiple observations, in: Proceedings of the 31st International Workshop on Description Logics, Tempe, Arizona, US, volume 2211 of CEUR-WS, CEUR-WS.org, 2018. URL: https://ceur-ws.org/Vol-2211/paper-31.pdf.

[7] D. Mrózek, J. Pukancová, M. Homola, ABox abduction solver exploiting multiple DL reasoners, in: Proceedings of the 31st International Workshop on Description Logics co-located with 16th International Conference on Principles of Knowledge Representation and Reasoning (KR 2018), Tempe, Arizona, US, October 27th to 29th, 2018, volume 2211 of CEUR Workshop Proceedings, CEUR-WS.org, 2018. URL: https://ceur-ws.org/Vol-2211/paper-24.pdf.

[8] B. Parsia, N. Matentzoglu, R. S. Gonçalves, B. Glimm, A. Steigmiller, The OWL reasoner evaluation (ORE) 2015 competition report, J. Autom. Reason. 59 (2017) 455–482. URL: https://doi.org/10.1007/s10817-017-9406-8. doi:10.1007/S10817-017-9406-8.

[9] A. Kalyanpur, B. Parsia, M. Horridge, E. Sirin, Finding all justifications of OWL DL entailments, in: The Semantic Web, 6th International Semantic Web Conference, 2nd Asian Semantic Web Conference, ISWC 2007 + ASWC 2007, Busan, Korea, November 11-15, 2007, volume 4825 of Lecture Notes in Computer Science, Springer, 2007, pp. 267–280. URL: https://doi.org/10.1007/978-3-540-76298-0_20. doi:10.1007/978-3-540-76298-0_20.

[10] C. D. Vescovo, D. Gessler, P. Klinov, B. Parsia, U. Sattler, T. Schneider, A. Winget, Decomposition and modular structure of bioportal ontologies, in: The Semantic Web - ISWC 2011 - 10th International Semantic Web Conference, Bonn, Germany, October 23-27, 2011, Proceedings, Part I, volume 7031 of Lecture Notes in Computer Science, Springer, 2011, pp. 130–145. URL: https://doi.org/10.1007/978-3-642-25073-6_9.

[11] B. C. Grau, I. Horrocks, Y. Kazakov, U. Sattler, Just the right amount: Extracting modules from ontologies, in: Proceedings of the 16th International Conference on World Wide Web, WWW 2007, Banff, Alberta, Canada, May 8-12, 2007, ACM, 2007, pp. 717–726. URL: https://doi.org/10.1145/1242572.1242669. doi:10.1145/1242572.1242669.

[12] J. Kloc, J. Boborová, M. Homola, J. Pukancová, CATS solver: The rise of hybrid abduction algorithms, in: Proceedings of the 38th International Workshop on Description Logics (DL 2025), Opole, Poland, September 3–6, 2025, 2025. To appear.

[13] K. M. Shchekotykhin, D. Jannach, T. Schmitz, MergeXplain: Fast computation of multiple conflicts for diagnosis, in: Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence, IJCAI 2015, Buenos Aires, Argentina, July 25-31, 2015, AAAI Press, 2015, pp. 3221–3228. URL: http://ijcai.org/Abstract/15/454.