<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>CEUR Workshop Proceedings</journal-title>
      </journal-title-group>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Human-Friendly Explanation for Ontology-based Concept Similarity: Design and Development</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Watanee Jearanaiwongkul</string-name>
          <email>watanee@tohoku.ac.jp</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Teeradaj Racharak</string-name>
          <email>racharak@tohoku.ac.jp</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Advanced Institute of So-Go-Chi (Convergence Knowledge) Informatics, Tohoku University</institution>
          ,
          <addr-line>Miyagi</addr-line>
          ,
          <country country="JP">Japan</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
        <p>While recent neuro-symbolic approaches have enabled interpretable computation of concept similarity in ontologies, translating these formal explanations into human-friendly forms remains a challenge. In this work, we investigate how large language models (LLMs), particularly ChatGPT and Gemini, can be prompted to generate natural language explanations that justify similarity results in a way that is understandable to end users. Building on a neuro-symbolic framework for measuring concept similarity in Description Logic (DL) ontologies, we explore two types of human-friendly explanations, i.e., node-based and path-based explanations. Furthermore, we evaluate LLMs' ability to generate each component of these path-based explanations using our small curated dataset. We evaluate the effectiveness of prompting approaches along different dimensions such as clarity, informativeness, and perceived usefulness through both qualitative analysis and user studies. Our results show the potential and limitations of using LLMs as a tool to bridge the gap between formal similarity reasoning and human interpretability, paving the way for more transparent ontology-driven systems.</p>
      </abstract>
      <kwd-group>
        <kwd>Explainable AI</kwd>
        <kwd>Explanation generation</kwd>
        <kwd>Ontology Concept Similarity</kwd>
        <kwd>LLMs</kwd>
        <kwd>Interpretable AI</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
    </sec>
    <sec id="sec-2">
      <title>2. Dataset Preparation</title>
      <p>Our goal was to construct a dataset consisting of original logic-based explanations and their corresponding
human-friendly explanations. To achieve this, we designed two explanation formats according to the
homomorphism-based semantic similarity introduced in [5]: (1) node-based explanation, which
describes how a similarity score was calculated by comparing the similarity between pairs of
nodes, and (2) path-based explanation, which describes the calculation by comparing pairs of paths instead.</p>
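      <p>For illustration, the two formats can be contrasted on the SeaFood example shown later in Table 2. The sketch below uses a hypothetical Python layout of our own (it does not reproduce the notation of [5] or our implementation): a node-based explanation justifies the score through compared node pairs, whereas a path-based explanation justifies it through compared paths starting from the shared top parent node.</p>
      <preformat>
# Hypothetical sketch of the two explanation formats; names and layout are
# illustrative only and do not reproduce the notation of [5].

# Node-based: the score is justified by pairs of compared nodes.
node_based = [
    ("Fish", "Shrimp", 0.70),    # node pair and its (embedding-based) similarity
    ("Shrimp", "Shrimp", 1.00),  # exact match
]

# Path-based: the same result is justified by pairs of compared paths
# (edges from the top parent node, e.g., Menu, down to a leaf concept).
path_based = [
    (("Menu", "contain", "Fish"),   ("Menu", "contain", "Shrimp"), 0.70),
    (("Menu", "contain", "Shrimp"), ("Menu", "contain", "Shrimp"), 1.00),
]
      </preformat>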
      <p>To evaluate these formats, we conducted a survey with 20 participants (10 per explanation type)
and used their feedback to improve and select one design for our experiment. The survey provided
participants with: (1) the similarity score between two concepts, (2) background knowledge represented as
knowledge graphs, (3) a summary explanation, (4) a detailed explanation, and (5) a table explanation.</p>
      <p>Participants were asked to review the similarity score and the explanation of why the two concepts are considered
similar, then respond to eight questions. The participants were recruited via Prolific (https://www.prolific.com/), compensated
fairly, and represented a wide range of demographics (ages 19–65, educational backgrounds from high
school to PhD, and professions including data analysis, engineering, IT, and research). All participants
were fluent in English and resided in countries such as the US, South Africa, Australia, Mexico, Canada,
and France. Table 1 presents the main results from six questions. The responses were rated on
a 0–7 Likert scale (0 = lowest satisfaction, 7 = highest). While the node-based explanations were
easier to “read”, path-based explanations were easier to “understand” and made the explanation
more sufficient, as indicated by lower scores on the questions about the need for additional explanation.
We therefore selected the path-based explanation approach for our experiment and prepared a dataset
in this format. A total of 20 well-constructed explanations were created and used for this study,
examples of which are shown in Table 2.</p>
      <sec id="sec-2-1">
        <title>Question</title>
      </sec>
      <sec id="sec-2-2">
        <title>Q1. Does the explanation provide understandable reasons?</title>
      </sec>
      <sec id="sec-2-3">
        <title>Q2. Is the explanation easy to read?</title>
      </sec>
      <sec id="sec-2-4">
        <title>Q3. Is the explanation suficient to answer why the 2 words are similar?</title>
      </sec>
      <sec id="sec-2-5">
        <title>Q4. Are you interested in more explanation of why they are similar?</title>
      </sec>
      <sec id="sec-2-6">
        <title>Q5. Is the explanation enough to answer how the 2 words are similar?</title>
      </sec>
      <sec id="sec-2-7">
        <title>Q6. Are you interested in more explanation on how the similarity score was calculated?</title>
      </sec>
      <sec id="sec-2-8">
        <title>Average</title>
      </sec>
      <sec id="sec-2-9">
        <title>Median</title>
        <p>Node-based Path-based Node-based Path-based
4.5
5.4
4.9
5.4
5.2
5.6
5.8
4.4
5.4
4.3
5.5
4.5</p>
      </sec>
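      <p>To make the prepared dataset concrete, one entry can be pictured as pairing the original logic-based explanation with its human-friendly counterparts. The schema below is a hypothetical sketch of such an entry; the actual files released with this work may be organized differently.</p>
      <preformat>
# Hypothetical sketch of one dataset entry (illustrative layout only).
entry = {
    "concepts": ("SeaFood", "FriedFood"),
    "similarity": 0.52,
    "original_explanation": [
        "[G1*SeaFood][G2*FriedFood] -> (0.52, G1*Menu, G1*contain.Fish, "
        "G1*contain.Shrimp, G1*serveWith.LemonSauce, (contain, serveWith))",
        "[G1*Shrimp][G2*Shrimp] -> (1.0, G1*Shrimp, { }, { })",
    ],
    "summary_explanation": "They are 52% similar because ...",
    "detailed_explanation": "They are 52% similar because they share top parent node ...",
    "table_explanation": "...",
}
      </preformat>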
    </sec>
    <sec id="sec-3">
      <title>3. Experiments and Results</title>
      <sec id="sec-3-1">
        <title>3.1. Experimental setting</title>
        <p>We conducted an experiment using the GPT-4o model via the OpenAI API, configured with a temperature of 0
and a top-p of 0.05 to ensure deterministic and focused outputs. The objective was to evaluate the model’s
performance in generating human-friendly explanations from the given logic-based explanations. As
mentioned earlier, our study involves three types of explanations: summary, detailed, and table-based.
Due to space limitations, a shortened version of the one-shot prompt is provided in Prompt 1, while the
complete prompt is publicly available in this GitHub repository: https://github.com/realearn-people/sim-elh-explainer-to-text.</p>
        <table-wrap id="tbl2">
          <label>Table 2</label>
          <caption>
            <p>Various types of explanations and graph representations for two concepts (namely, SeaFood and FriedFood). Note that an asterisk indicates super-sub structures of a graph’s elements.</p>
          </caption>
          <table>
            <tbody>
              <tr>
                <td>Description graphs representing the meaning of two concepts</td>
                <td>Graph 1: SeaFood<break/>− (Menu, contain, Fish)<break/>− (Menu, contain, Shrimp)<break/>− (Menu, serveWith, LemonSauce)<break/>Graph 2: FriedFood<break/>− (Menu, contain, Shrimp)</td>
              </tr>
              <tr>
                <td>Original explanation (from [5])</td>
                <td>[G1*SeaFood][G2*FriedFood] → (0.52, G1*Menu, G1*contain.Fish, G1*contain.Shrimp, G1*serveWith.LemonSauce, (contain, serveWith))<break/>[G1*Fish][G2*Shrimp] → (0.7, G1*Fish, { }, (Fish, Shrimp))<break/>[G1*Shrimp][G2*Shrimp] → (1.0, G1*Shrimp, { }, { })<break/>[G1*LemonSauce][G2*Shrimp] → (0.02, G1*Shrimp, { }, (LemonSauce, Shrimp))</td>
              </tr>
              <tr>
                <td>Summary explanation</td>
                <td>They are 52% similar because<break/>Their Top Parent node comparison: 1 of 1 matched 100%<break/>Their Path comparison: 1 of 3 matched 100%</td>
              </tr>
              <tr>
                <td>Detailed explanation</td>
                <td>They are 52% similar because<break/>They share top parent node:<break/>100% match: Menu<break/>They share exactly the same “path” from Graph SeaFood to Graph FriedFood:<break/>100% match: (Menu, contain, Shrimp) and (Menu, contain, Shrimp)<break/>They share the similar “path” from Graph SeaFood to Graph FriedFood (partly computed using the used embeddings):<break/>70% match: (Menu, contain, Fish) and (Menu, contain, Shrimp)<break/>2% match: (Menu, serveWith, LemonSauce) and (Menu, contain, Shrimp)</td>
              </tr>
            </tbody>
          </table>
        </table-wrap>
        <p>Prompt 1 (shortened):</p>
        <p>• Input structure: You are given 2 things: 1. Graph: A pair of knowledge graphs. 2. Explanation: A list of structured tuples that
show similarity scores and the reason how each score was calculated: …</p>
        <p>• Example of input: Graph 1: (…), Graph 2: (…)
Explanation:
[G1*ActivePlace][G2*Mangrove] → (0.62, {G1*Place}, {∃{G1*canWalk, G1*canMoveWithLeg}.Trekking,
∃G1*canTravelWithSail.Kayaking}, {(canMoveWithLeg, canTravelWithSail)})
[G1*Trekking][G2*Trekking] → (1.0, {G1*Trekking}, { }, { })
[G1*Kayaking][G2*Trekking] → (0.97, {G1*Kayaking}, { }, {(Kayaking, Trekking)})</p>
        <p>• Output format of human-friendly explanation: ActivePlace and Mangrove are 62% similar in meaning.
• Summary Explanation: …
• Detail Explanation: …
• Table Explanation: …</p>
        <p>Your task: generate a friendly, human-readable explanation based on the input below, using the exact format provided. [Input:]</p>
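        <p>As a minimal illustration of this setting, the sketch below calls GPT-4o through the OpenAI Python SDK with the decoding parameters reported above (temperature 0, top-p 0.05). This is a hypothetical sketch rather than our released code; the function name, message framing, and environment-variable assumption are illustrative only.</p>
        <preformat>
# Minimal sketch (not the released code) of the experimental setting in Section 3.1:
# GPT-4o via the OpenAI Python SDK, temperature 0 and top-p 0.05.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def generate_explanation(prompt_template: str, logic_explanation: str) -> str:
    """Fill the one-shot prompt with a logic-based explanation and return the
    model's human-friendly explanation (summary, detailed, and table parts)."""
    response = client.chat.completions.create(
        model="gpt-4o",
        temperature=0,   # deterministic decoding, as in the paper
        top_p=0.05,      # narrow nucleus sampling, as in the paper
        messages=[
            {"role": "user",
             "content": prompt_template + "\n[Input:]\n" + logic_explanation},
        ],
    )
    return response.choices[0].message.content
        </preformat>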
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Results and Discussion</title>
        <p>For Detailed Explanation, the model achieves an F1 score of 0.824 for listing the “same” path comparison, indicating balanced performance with minimal false
positives and false negatives. Moreover, the “similar” path comparison achieves perfect precision (1.0) but
a lower recall (0.5), resulting in a moderate F1 score (0.67), which indicates that many true matches are
missed. For Table Explanation, the performance is slightly better, with precision (0.833) and recall (0.882)
for the “same” path comparison leading to an F1 score of 0.857, and the “similar” path comparison shows
perfect precision (1.0) and a recall of 0.6, indicating the same trend of missing true matches as observed in
Detailed Explanation. Overall, these results suggest that Table Explanation provides the most balanced
performance across metrics, particularly for listing the results of the “same” path comparisons.</p>
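        <p>To clarify how these numbers can be read, the sketch below computes precision, recall, and F1 over the path comparisons listed by the model against a gold reference list. Treating the comparisons as set intersections is our own simplification for illustration, not the exact scoring script used in the experiments.</p>
        <preformat>
# Hypothetical sketch of the path-comparison scoring: predicted and gold
# comparisons are treated as sets of (path-in-graph-1, path-in-graph-2) pairs.
def precision_recall_f1(predicted, gold):
    tp = len(predicted.intersection(gold))   # correctly listed comparisons
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Example: one of two gold "similar" path comparisons is listed by the model,
# so precision is perfect (1.0) while recall is 0.5 and F1 is about 0.67.
gold = {("Menu,contain,Fish", "Menu,contain,Shrimp"),
        ("Menu,serveWith,LemonSauce", "Menu,contain,Shrimp")}
pred = {("Menu,contain,Fish", "Menu,contain,Shrimp")}
print(precision_recall_f1(pred, gold))       # (1.0, 0.5, 0.666...)
        </preformat>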
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Related work</title>
        <p>Logic-based explainability has recently emerged as a rigorous and interpretable alternative to post-hoc
explainers. Marques-Silva [6] provides a comprehensive survey of logic-based explanations in trustable
AI/ML, highlighting the need for rigorous definitions and rigorous computation of explanations,
as well as for expressive explanations. The paper also discusses ongoing challenges and outlines future
directions, including the integration of symbolic and sub-symbolic methods and the need for scalable,
user-centric explanations. The work in [7] focuses on explaining the predictions of machine learning
models using interpretable concepts and logic rules. Specifically, explanations are provided in a simple
first-order logic format for its expressiveness. Regarding the explanation of concept similarity in ontologies,
Racharak [5] introduced an ℰ ℒ ℋ-based similarity scoring framework that combines description logic
with (pre-trained) embedding-based representations and offers structured symbolic explanations as
output. These methods offer precise explanations, but the output form can be further optimized for
clarity to non-expert users. This work fills that gap by investigating how to generate human-friendly
explanations from the neuro-symbolic structures for ontology-based concept similarity.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Conclusion</title>
      <p>This study presents an initial step toward exploring human-friendly explanation generation for concept
similarity scores. By defining node-based and path-based explanations and focusing on the latter in
a ChatGPT-based experiment, we demonstrated both the potential and the challenges of using LLMs for
interpretable AI explanations. Our findings show that ChatGPT can effectively generate explanations
for single-node comparisons, particularly top-parent nodes, in all three types of explanations. However, a
limitation remains in capturing true matches for similar path comparisons in both Detailed Explanations
and Table Explanations. Upon examining the errors, we observed that LLMs tend to miss correct
matches as the description graphs grow, i.e., as the number of nodes and the graphs’ depth
increase. Another source of error arose when edges in the explanation graphs were labeled by sets. Our
future steps are to enlarge the dataset, collect more feedback to improve the explanation format, and
experiment with other LLMs and with constructing deep learning-based models for text generation. We
also plan to fine-tune LLMs on our dataset for explanation-graph-to-text translation in the future.</p>
    </sec>
    <sec id="sec-5">
      <title>Declaration of Generative AI and AI-assisted Technologies</title>
      <p>During the preparation of this work, the authors used ChatGPT for grammatical checking. After using this
tool, the authors reviewed and edited the content as needed. Thus, they take full responsibility for the
content of the publication.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>F.</given-names>
            <surname>Doshi-Velez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <article-title>Towards a rigorous science of interpretable machine learning</article-title>
          ,
          <source>arXiv preprint arXiv:1702.08608</source>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>A.</given-names>
            <surname>Adadi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Berrada</surname>
          </string-name>
          ,
          <article-title>Peeking inside the black-box: a survey on explainable artificial intelligence (xai)</article-title>
          ,
          <source>IEEE access 6</source>
          (
          <year>2018</year>
          )
          <fpage>52138</fpage>
          -
          <lpage>52160</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>C.</given-names>
            <surname>Molnar</surname>
          </string-name>
          ,
          <article-title>Interpretable machine learning</article-title>
          ,
          <source>Lulu.com</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>J.</given-names>
            <surname>Marques-Silva</surname>
          </string-name>
          ,
          <article-title>Logic-based explainability in machine learning</article-title>
          ,
          <source>in: Reasoning Web. Causality, Explanations and Declarative Knowledge: 18th International Summer School</source>
          <year>2022</year>
          , Berlin, Germany,
          <source>September 27-30</source>
          ,
          <year>2022</year>
          , Tutorial Lectures, Springer,
          <year>2023</year>
          , pp.
          <fpage>24</fpage>
          -
          <lpage>104</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>T.</given-names>
            <surname>Racharak</surname>
          </string-name>
          ,
          <article-title>On approximation of concept similarity measure in description logic ELH with pretrained word embedding</article-title>
          ,
          <source>IEEE Access 9</source>
          (
          <year>2021</year>
          )
          <fpage>61429</fpage>
          -
          <lpage>61443</lpage>
          . URL: https://doi.org/10.1109/ACCESS.2021.3073730. doi:10.1109/ACCESS.2021.3073730.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>J.</given-names>
            <surname>Marques-Silva</surname>
          </string-name>
          ,
          <article-title>Logic-based explainability: past, present and future</article-title>
          ,
          <source>in: International Symposium on Leveraging Applications of Formal Methods</source>
          , Springer,
          <year>2024</year>
          , pp.
          <fpage>181</fpage>
          -
          <lpage>204</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>G.</given-names>
            <surname>Ciravegna</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Barbiero</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Giannini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Gori</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Lió</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Maggini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Melacci</surname>
          </string-name>
          ,
          <article-title>Logic explained networks</article-title>
          ,
          <source>Artificial Intelligence</source>
          <volume>314</volume>
          (
          <year>2023</year>
          )
          <fpage>103822</fpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>