<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>CEUR Workshop Proceedings</journal-title>
      </journal-title-group>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Human-Friendly Explanation for Ontology-based Concept Similarity: Design and Development</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Watanee Jearanaiwongkul</string-name>
          <email>watanee@tohoku.ac.jp</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Teeradaj Racharak</string-name>
          <email>racharak@tohoku.ac.jp</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Advanced Institute of So-Go-Chi (Convergence Knowledge) Informatics, Tohoku University</institution>
          ,
          <addr-line>Miyagi</addr-line>
          ,
          <country country="JP">Japan</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
        <p>While recent neuro-symbolic approaches have enabled interpretable computation of concept similarity in ontologies, translating these formal explanations into human-friendly forms remains a challenge. In this work, we investigate how large language models (LLMs), particularly ChatGPT and Gemini, can be prompted to generate natural language explanations that justify similarity results in a way that is understandable to end users. Building on a neuro-symbolic framework for measuring concept similarity in Description Logic (DL) ontologies, we explore two types of human-friendly explanations, i.e., node-based and path-based explanations. Furthermore, we evaluate LLMs' ability to generate each component of these path-based explanations using our small curated dataset. We evaluate the effectiveness of prompting approaches along different dimensions such as clarity, informativeness, and perceived usefulness through both qualitative analysis and user studies. Our results show the potential and limitations of using LLMs as a tool to bridge the gap between formal similarity reasoning and human interpretability, paving the way for more transparent ontology-driven systems.</p>
      </abstract>
      <kwd-group>
        <kwd>Explainable AI</kwd>
        <kwd>Explanation generation</kwd>
        <kwd>Ontology Concept Similarity</kwd>
        <kwd>LLMs</kwd>
        <kwd>Interpretable AI</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
    </sec>
    <sec id="sec-2">
      <title>2. Dataset Preparation</title>
      <p>Our goal was to construct a dataset consisting of original logic-based explanations and their corresponding
human-friendly explanations. To achieve this, we designed two explanation formats according to the
homomorphism-based semantic similarity introduced in [5]: (1) node-based explanation, which
describes how a similarity score was calculated by comparing the similarity between pairs of
nodes, and (2) path-based explanation, which describes the calculation by comparing pairs of paths instead.</p>
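      <p>For illustration, the two formats can be contrasted on the SeaFood example shown later in Table 2. The sketch below uses a hypothetical Python layout of our own (it does not reproduce the notation of [5] or our implementation): a node-based explanation justifies the score through compared node pairs, whereas a path-based explanation justifies it through compared paths starting from the shared top parent node.</p>
      <preformat>
# Hypothetical sketch of the two explanation formats; names and layout are
# illustrative only and do not reproduce the notation of [5].

# Node-based: the score is justified by pairs of compared nodes.
node_based = [
    ("Fish", "Shrimp", 0.70),    # node pair and its (embedding-based) similarity
    ("Shrimp", "Shrimp", 1.00),  # exact match
]

# Path-based: the same result is justified by pairs of compared paths
# (edges from the top parent node, e.g., Menu, down to a leaf concept).
path_based = [
    (("Menu", "contain", "Fish"),   ("Menu", "contain", "Shrimp"), 0.70),
    (("Menu", "contain", "Shrimp"), ("Menu", "contain", "Shrimp"), 1.00),
]
      </preformat>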
      <p>To evaluate these formats, we conducted a survey with 20 participants (10 per explanation type)
and used their feedback to improve and select one design for our experiment. The survey provided
participants with: (1) the similarity score between two concepts, (2) background knowledge represented as
knowledge graphs, (3) a summary explanation, (4) a detailed explanation, and (5) a table explanation.</p>
      <p>Participants were asked to review the similarity score and the explanation of why the two concepts are considered
similar, then respond to eight questions. The participants were recruited via Prolific (https://www.prolific.com/), compensated
fairly, and represented a wide range of demographics (ages 19–65, educational backgrounds from high
school to PhD, and professions including data analysis, engineering, IT, and research). All participants
were fluent in English and resided in countries such as the US, South Africa, Australia, Mexico, Canada,
and France. Table 1 presents the main results from six questions. The responses were rated on
a 0–7 Likert scale (0 = lowest satisfaction, 7 = highest). While the node-based explanations were
easier to “read”, path-based explanations were easier to “understand” and made the explanation
more sufficient, as indicated by lower scores on the questions about the need for additional explanation.
We therefore selected the path-based explanation approach for our experiment and prepared a dataset
in this format. A total of 20 well-constructed explanations were created and used for this study,
examples of which are shown in Table 2.</p>
      <sec id="sec-2-1">
        <title>Question</title>
      </sec>
      <sec id="sec-2-2">
        <title>Q1. Does the explanation provide understandable reasons?</title>
      </sec>
      <sec id="sec-2-3">
        <title>Q2. Is the explanation easy to read?</title>
      </sec>
      <sec id="sec-2-4">
        <title>Q3. Is the explanation suficient to answer why the 2 words are similar?</title>
      </sec>
      <sec id="sec-2-5">
        <title>Q4. Are you interested in more explanation of why they are similar?</title>
      </sec>
      <sec id="sec-2-6">
        <title>Q5. Is the explanation enough to answer how the 2 words are similar?</title>
      </sec>
      <sec id="sec-2-7">
        <title>Q6. Are you interested in more explanation on how the similarity score was calculated?</title>
      </sec>
      <sec id="sec-2-8">
        <title>Average</title>
      </sec>
      <sec id="sec-2-9">
        <title>Median</title>
        <p>Node-based Path-based Node-based Path-based
4.5
5.4
4.9
5.4
5.2
5.6
5.8
4.4
5.4
4.3
5.5
4.5</p>
      </sec>
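      <p>To make the prepared dataset concrete, one entry can be pictured as pairing the original logic-based explanation with its human-friendly counterparts. The schema below is a hypothetical sketch of such an entry; the actual files released with this work may be organized differently.</p>
      <preformat>
# Hypothetical sketch of one dataset entry (illustrative layout only).
entry = {
    "concepts": ("SeaFood", "FriedFood"),
    "similarity": 0.52,
    "original_explanation": [
        "[G1*SeaFood][G2*FriedFood] -> (0.52, G1*Menu, G1*contain.Fish, "
        "G1*contain.Shrimp, G1*serveWith.LemonSauce, (contain, serveWith))",
        "[G1*Shrimp][G2*Shrimp] -> (1.0, G1*Shrimp, { }, { })",
    ],
    "summary_explanation": "They are 52% similar because ...",
    "detailed_explanation": "They are 52% similar because they share top parent node ...",
    "table_explanation": "...",
}
      </preformat>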
    </sec>
    <sec id="sec-3">
      <title>3. Experiments and Results</title>
      <sec id="sec-3-1">
        <title>3.1. Experimental setting</title>
        <p>We conducted an experiment using the GPT-4o model via the OpenAI API, configured with a temperature of 0
and a top-p of 0.05 to ensure deterministic and focused outputs. The objective was to evaluate the model’s
performance in generating human-friendly explanations from the given logic-based explanations. As
mentioned earlier, our study involves three types of explanations: summary, detailed, and table-based.
Due to space limitations, a shortened version of the one-shot prompt is provided in Prompt 1, while the
complete prompt is publicly available in this GitHub repository: https://github.com/realearn-people/sim-elh-explainer-to-text.</p>
        <table-wrap id="tbl2">
          <label>Table 2</label>
          <caption>
            <p>Various types of explanations and graph representations for two concepts (namely, SeaFood and FriedFood). Note that an asterisk indicates super-sub structures of a graph’s elements.</p>
          </caption>
          <table>
            <tbody>
              <tr>
                <td>Description graphs representing the meaning of two concepts</td>
                <td>Graph 1: SeaFood<break/>− (Menu, contain, Fish)<break/>− (Menu, contain, Shrimp)<break/>− (Menu, serveWith, LemonSauce)<break/>Graph 2: FriedFood<break/>− (Menu, contain, Shrimp)</td>
              </tr>
              <tr>
                <td>Original explanation (from [5])</td>
                <td>[G1*SeaFood][G2*FriedFood] → (0.52, G1*Menu, G1*contain.Fish, G1*contain.Shrimp, G1*serveWith.LemonSauce, (contain, serveWith))<break/>[G1*Fish][G2*Shrimp] → (0.7, G1*Fish, { }, (Fish, Shrimp))<break/>[G1*Shrimp][G2*Shrimp] → (1.0, G1*Shrimp, { }, { })<break/>[G1*LemonSauce][G2*Shrimp] → (0.02, G1*Shrimp, { }, (LemonSauce, Shrimp))</td>
              </tr>
              <tr>
                <td>Summary explanation</td>
                <td>They are 52% similar because<break/>Their Top Parent node comparison: 1 of 1 matched 100%<break/>Their Path comparison: 1 of 3 matched 100%</td>
              </tr>
              <tr>
                <td>Detailed explanation</td>
                <td>They are 52% similar because<break/>They share top parent node:<break/>100% match: Menu<break/>They share exactly the same “path” from Graph SeaFood to Graph FriedFood:<break/>100% match: (Menu, contain, Shrimp) and (Menu, contain, Shrimp)<break/>They share the similar “path” from Graph SeaFood to Graph FriedFood (partly computed using the used embeddings):<break/>70% match: (Menu, contain, Fish) and (Menu, contain, Shrimp)<break/>2% match: (Menu, serveWith, LemonSauce) and (Menu, contain, Shrimp)</td>
              </tr>
            </tbody>
          </table>
        </table-wrap>
        <p>Prompt 1 (shortened):</p>
        <p>• Input structure: You are given 2 things: 1. Graph: A pair of knowledge graphs. 2. Explanation: A list of structured tuples that
show similarity scores and the reason how each score was calculated: …</p>
        <p>• Example of input: Graph 1: (…), Graph 2: (…)
Explanation:
[G1*ActivePlace][G2*Mangrove] → (0.62, {G1*Place}, {∃{G1*canWalk, G1*canMoveWithLeg}.Trekking,
∃G1*canTravelWithSail.Kayaking}, {(canMoveWithLeg, canTravelWithSail)})
[G1*Trekking][G2*Trekking] → (1.0, {G1*Trekking}, { }, { })
[G1*Kayaking][G2*Trekking] → (0.97, {G1*Kayaking}, { }, {(Kayaking, Trekking)})</p>
        <p>• Output format of human-friendly explanation: ActivePlace and Mangrove are 62% similar in meaning.
• Summary Explanation: …
• Detail Explanation: …
• Table Explanation: …</p>
        <p>Your task: generate a friendly, human-readable explanation based on the input below, using the exact format provided. [Input:]</p>
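        <p>As a minimal illustration of this setting, the sketch below calls GPT-4o through the OpenAI Python SDK with the decoding parameters reported above (temperature 0, top-p 0.05). This is a hypothetical sketch rather than our released code; the function name, message framing, and environment-variable assumption are illustrative only.</p>
        <preformat>
# Minimal sketch (not the released code) of the experimental setting in Section 3.1:
# GPT-4o via the OpenAI Python SDK, temperature 0 and top-p 0.05.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def generate_explanation(prompt_template: str, logic_explanation: str) -> str:
    """Fill the one-shot prompt with a logic-based explanation and return the
    model's human-friendly explanation (summary, detailed, and table parts)."""
    response = client.chat.completions.create(
        model="gpt-4o",
        temperature=0,   # deterministic decoding, as in the paper
        top_p=0.05,      # narrow nucleus sampling, as in the paper
        messages=[
            {"role": "user",
             "content": prompt_template + "\n[Input:]\n" + logic_explanation},
        ],
    )
    return response.choices[0].message.content
        </preformat>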
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Results and Discussion</title>
        <p>For Detailed Explanation, the model achieves an F1 score of 0.824 for listing the “same” path comparison, indicating balanced performance with minimal false
positives and false negatives. Moreover, the “similar” path comparison achieves perfect precision (1.0) but
a lower recall (0.5), resulting in a moderate F1 score (0.67), which indicates that many true matches are
missed. For Table Explanation, the performance is slightly better, with precision (0.833) and recall (0.882)
for the “same” path comparison leading to an F1 score of 0.857, and the “similar” path comparison shows
perfect precision (1.0) and a recall of 0.6, indicating the same trend of missing true matches as observed in
Detailed Explanation. Overall, these results suggest that Table Explanation provides the most balanced
performance across metrics, particularly for listing the results of the “same” path comparisons.</p>
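        <p>To clarify how these numbers can be read, the sketch below computes precision, recall, and F1 over the path comparisons listed by the model against a gold reference list. Treating the comparisons as set intersections is our own simplification for illustration, not the exact scoring script used in the experiments.</p>
        <preformat>
# Hypothetical sketch of the path-comparison scoring: predicted and gold
# comparisons are treated as sets of (path-in-graph-1, path-in-graph-2) pairs.
def precision_recall_f1(predicted, gold):
    tp = len(predicted.intersection(gold))   # correctly listed comparisons
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Example: one of two gold "similar" path comparisons is listed by the model,
# so precision is perfect (1.0) while recall is 0.5 and F1 is about 0.67.
gold = {("Menu,contain,Fish", "Menu,contain,Shrimp"),
        ("Menu,serveWith,LemonSauce", "Menu,contain,Shrimp")}
pred = {("Menu,contain,Fish", "Menu,contain,Shrimp")}
print(precision_recall_f1(pred, gold))       # (1.0, 0.5, 0.666...)
        </preformat>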
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Related work</title>
        <p>Logic-based explainability has recently emerged as a rigorous and interpretable alternative to post-hoc
explainers. Marques-Silva [6] provides a comprehensive survey of logic-based explanations in trustable
AI/ML, highlighting the need for rigorous definitions and rigorous computation of explanations,
as well as for expressive explanations. The paper also discusses ongoing challenges and outlines future
directions, including the integration of symbolic and sub-symbolic methods and the need for scalable,
user-centric explanations. The work in [7] focuses on explaining the predictions of machine learning
models using interpretable concepts and logic rules. Specifically, explanations are provided in a simple
first-order logic format for its expressiveness. Regarding the explanation of concept similarity in ontologies,
Racharak [5] introduced an ℰ ℒ ℋ-based similarity scoring framework that combines description logic
with (pre-trained) embedding-based representations and offers structured symbolic explanations as
output. These methods offer precise explanations, but the output form can be further optimized for
clarity to non-expert users. This work fills that gap by investigating how to generate human-friendly
explanations from the neuro-symbolic structures for ontology-based concept similarity.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Conclusion</title>
      <p>This study presents an initial step toward exploring human-friendly explanation generation for concept
similarity scores. By defining node-based and path-based explanations and focusing on the latter in
a ChatGPT-based experiment, we demonstrated both the potential and the challenges of using LLMs for
interpretable AI explanations. Our findings show that ChatGPT can effectively generate explanations
for single-node comparisons, particularly top-parent nodes, in all three types of explanations. However, a
limitation remains in capturing true matches for similar path comparisons in both Detailed Explanations
and Table Explanations. Upon examining the errors, we observed that LLMs tend to miss correct
matches as the description graphs grow, i.e., as the number of nodes and the graphs’ depth
increase. Another source of error arose when edges in the explanation graphs were labeled by sets. Our
future steps are to enlarge the dataset, collect more feedback to improve the explanation format, and
experiment with other LLMs and with constructing deep learning-based models for text generation. We
also plan to fine-tune LLMs on our dataset for explanation-graph-to-text translation in the future.</p>
    </sec>
    <sec id="sec-5">
      <title>Declaration of Generative AI and AI-assisted Technologies</title>
      <p>During the preparation of this work, the authors used ChatGPT for grammatical checking. After using this
tool, the authors reviewed and edited the content as needed. Thus, they take full responsibility for the
content of the publication.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>F.</given-names>
            <surname>Doshi-Velez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <article-title>Towards a rigorous science of interpretable machine learning</article-title>
          ,
          <source>arXiv preprint arXiv:1702.08608</source>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>A.</given-names>
            <surname>Adadi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Berrada</surname>
          </string-name>
          ,
          <article-title>Peeking inside the black-box: a survey on explainable artificial intelligence (xai)</article-title>
          ,
          <source>IEEE access 6</source>
          (
          <year>2018</year>
          )
          <fpage>52138</fpage>
          -
          <lpage>52160</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>C.</given-names>
            <surname>Molnar</surname>
          </string-name>
          ,
          <article-title>Interpretable machine learning</article-title>
          ,
          <source>Lulu.com</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>J.</given-names>
            <surname>Marques-Silva</surname>
          </string-name>
          ,
          <article-title>Logic-based explainability in machine learning</article-title>
          ,
          <source>in: Reasoning Web. Causality, Explanations and Declarative Knowledge: 18th International Summer School</source>
          <year>2022</year>
          , Berlin, Germany,
          <source>September 27-30</source>
          ,
          <year>2022</year>
          , Tutorial Lectures, Springer,
          <year>2023</year>
          , pp.
          <fpage>24</fpage>
          -
          <lpage>104</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>T.</given-names>
            <surname>Racharak</surname>
          </string-name>
          ,
          <article-title>On approximation of concept similarity measure in description logic ELH with pretrained word embedding</article-title>
          ,
          <source>IEEE Access 9</source>
          (
          <year>2021</year>
          )
          <fpage>61429</fpage>
          -
          <lpage>61443</lpage>
          . URL: https://doi.org/10.1109/ACCESS.2021.3073730. doi:10.1109/ACCESS.2021.3073730.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>J.</given-names>
            <surname>Marques-Silva</surname>
          </string-name>
          ,
          <article-title>Logic-based explainability: past, present and future</article-title>
          ,
          <source>in: International Symposium on Leveraging Applications of Formal Methods</source>
          , Springer,
          <year>2024</year>
          , pp.
          <fpage>181</fpage>
          -
          <lpage>204</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>G.</given-names>
            <surname>Ciravegna</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Barbiero</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Giannini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Gori</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Lió</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Maggini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Melacci</surname>
          </string-name>
          ,
          <article-title>Logic explained networks</article-title>
          ,
          <source>Artificial Intelligence</source>
          <volume>314</volume>
          (
          <year>2023</year>
          )
          <fpage>103822</fpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>