<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Towards Pattern-based Complex Ontology Matching using SPARQL and LLM</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Ondřej Zamazal</string-name>
          <email>ondrej.zamazal@vse.cz</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="editor">
          <string-name>Ontology, Ontology Matching, Complex Ontology Matching, Large Language Model, Knowledge Graph</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Prague University of Economics and Business</institution>
          ,
          <addr-line>Czechia</addr-line>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2024</year>
      </pub-date>
      <fpage>17</fpage>
      <lpage>19</lpage>
      <abstract>
        <p>Complex ontology matching is a process to match complex structures in ontologies. While many matching tools tackle simple ontology matching, complex ontology matching is still rare. However, one entity in one ontology can be similar to a complex structure (1-to-n) or even complex structures can be on both sides (m-to-n). Therefore, the application, e.g., data integration, must consider complex correspondences within ontology alignment. Our poster paper presents a pattern-based approach where particular SPARQL queries correspond to a specific pattern, e.g., Class by Attribute Type (CAT), for its detection. SPARQL queries are anchored to entities from simple correspondences on input. Detected complex correspondence candidates are verbalized to be validated by the Large Language Model (LLM). Further, we provide a zero-shot prompting preliminary experiment and evaluation. The poster paper is equipped with the Jupyter notebook for automation of the pipeline and the full report of the experiment at: https://github.com/OndrejZamazal/ComplexOntologyMatching-SEMANTiCS2024</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>CEUR
ceur-ws.org</p>
    </sec>
    <sec id="sec-2">
      <title>1. Introduction</title>
      <p>
        Sharing domain knowledge is often made via domain ontology [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Since different agents see the
domain differently, multiple ontologies for one domain are inevitable. To enable interoperability,
Ontology Matching [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] aims at discovering relationships (e.g., equivalence) between entities
of two ontologies (O1, O2); these relationships are called correspondences, and a set of them is an alignment.
Simple correspondences involve a single entity from each ontology, e.g., O1:Document=O2:Manuscript.
Such a correspondence can enable data interoperability between systems based on O1 and O2, respectively.
There are ample matching tools,
such as LogMap [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], to discover such correspondences. However, correspondences targeting
single entities can only solve some interoperability issues. For example, O1 has an entity Reviewer
that is not explicitly present in O2, yet O2 contains the terminology to describe the Reviewer
concept. Therefore, discovering complex correspondences would further support data and
schema interoperability: e.g., partly in Manchester OWL syntax, O1:Reviewer is equivalent to the
complex concept O2:Person and (O2:authorOf some O2:Review). Complex ontology matching aims
at matching complex concepts/structures (e.g., using a logic constructor) on at least one side of
an ontology matching pair [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. While ontology matching has attracted ample matching
tools and evaluation efforts (e.g., OAEI 2023<sup>1</sup> had ten tracks targeting simple matching), only
one OAEI 2023 track focused on complex matching, and even that track had no tool participation in 2023.
There have been two complex alignment benchmarks in OAEI so far: the GeoLink complex
alignment benchmark [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] and the consensual dataset for complex ontology matching [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] based
on conference ontologies from the OntoFarm collection [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ].
      </p>
      <p>© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
Author homepage: https://nb.vse.cz/~svabo/ (O. Zamazal)</p>
    </sec>
    <sec id="sec-3">
      <title>2. Related Work</title>
      <p>
        Several approaches have dealt theoretically with complex ontology matching, mostly
pattern-based approaches [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. For example, [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] detects complex correspondences using several
structural and lexical matching conditions, such as hyponyms and head-nouns of labels and their
relationship. Similarly, in [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], the detection of the naming aspect played a crucial role. Our
approach relies on the pattern-based, structural aspect only. While the corresponding structural
matching conditions are usually straightforward, designing and applying lexical matching
conditions is often challenging. We therefore avoid lexical aspects during the detection phase
and instead use verbalization and a Large Language Model (LLM) for the validation step.
      </p>
      <p>
        LLMs have already been applied for simple ontology matching. Conference ontologies have
been matched using ChatGPT [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. While the OLaLa matching system [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] addressed more tracks
within OAEI, the approach [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] targeted the Bio-ML track in OAEI. While there is still room
for improvement, the initial results are promising. Recently, an approach of complex ontology
alignment using LLMs has been proposed [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. It works on the GeoLink complex alignment
benchmark, which includes two ontologies: the GeoLink Base Ontology (GBO) and the
GeoLink Modular Ontology (GMO). The approach is based on chain-of-thought prompting. The
input consists of the GMO and a specific complex structure from the GBO, together with a prompt
asking for the parts of the GMO related to the given GBO structure. The following prompt
contains parts of related content from the GMO as text and code (module information), which
makes the LLM’s answer more appropriate. The paper primarily addresses an experiment with a very
specific setting. We share the general idea that an LLM could reduce the number of candidates that
humans eventually validate. However, the approach [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] heavily involves human intervention
in each step. In contrast, we do not involve humans in prompting, and we use SPARQL for
detection. Moreover, we do not restrict complex matching to manually selected complex structure
instances of one ontology on input. Our approach is instead driven by an alignment pattern.
      </p>
    </sec>
    <sec id="sec-4">
      <title>3. Approach</title>
      <p>
        We describe our pattern-based pipeline along with an example (in italics) targeting the alignment
pattern Class by Attribute Type (CAT) [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. This pattern specifies equivalence between a class in
O1 and a class in O2 whose scope is restricted by an existential restriction; in Manchester OWL
syntax: O1:Class1 EquivalentTo O2:Class1 and (O2:property some O2:Class2).
      </p>
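      <p>The CAT pattern above can be held in a small record type and serialized with a template. The following is a minimal sketch only: the class name CatCandidate and its field names are hypothetical and are not taken from the accompanying notebook.</p>
      <preformat><![CDATA[
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatCandidate:
    """One Class by Attribute Type (CAT) candidate (hypothetical helper type)."""

    o1_class: str     # named class in O1, e.g. "Reviewer"
    o2_class: str     # restricted class in O2, e.g. "Person"
    o2_property: str  # object property in O2, e.g. "authorOf"
    o2_filler: str    # filler class of the existential restriction, e.g. "Review"

    def to_manchester(self) -> str:
        # Mirrors the pattern:
        # O1:Class1 EquivalentTo O2:Class1 and (O2:property some O2:Class2)
        return (
            f"O1:{self.o1_class} EquivalentTo "
            f"O2:{self.o2_class} and (O2:{self.o2_property} some O2:{self.o2_filler})"
        )


reviewer = CatCandidate("Reviewer", "Person", "authorOf", "Review")
print(reviewer.to_manchester())
```
]]></preformat>
      <p>Keeping the four slots separate makes it straightforward to reuse the same record for other serializations, such as a natural-language template.</p>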
      <p>Often, ontology matching techniques limit the search space. We use a simple alignment
and limit complex matching to entities in complex structures related to entities from the simple
alignment. We assume that potentially matching complex structures involve entities from simple
correspondences.</p>
      <p><sup>1</sup>https://oaei.ontologymatching.org/2023/</p>
      <sec id="sec-4-1">
        <p>Input ontologies are conference ontologies: O1=cmt, O2=ekaw. For input alignment, we used the
subset of the alignment from LogMap OAEI 2023 (O1:Paper=O2:Paper; O1:Person=O2:Person) to
keep the experiment shorter. Any highly certain correspondences could be used.</p>
        <p>Step 1. Detection is based on a structural aspect. We use a pair of SPARQL queries designed
according to the given alignment pattern for O1 and O2, respectively.<sup>2</sup> One alignment pattern can lead
to different structural aspects for detection (i.e., different SPARQL queries) in O1 and O2.
Some entities in the SPARQL queries are anchored based on the input alignment.</p>
        <p>Figure 1 visualizes the SPARQL queries for O1 and O2 for CAT; the
correspondence from the input alignment is depicted as a double-headed arrow. The detection is run for each
correspondence from the input alignment. In our case, there are 4 and 7 SPARQL results for O1 and
2 and 14 SPARQL results for O2 for the Paper and Person input correspondences, respectively. Example for O1 given
Person (?ent1) from the input correspondence: O1:Class1=Reviewer. Example for O2 given Person (?ent1)
from the input correspondence: O2:property1=authorOf, O2:class1=Document, O2:class2=Review.</p>
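        <p>To illustrate anchored structural detection, the sketch below matches triple patterns over toy in-memory triples instead of running SPARQL. The query shapes are an assumed reading of the example results (a subclass of the anchor in O1; a property from the anchor to class1 with class2 below class1 in O2), and the triples are invented for illustration; they are not the actual cmt or ekaw axioms, nor the queries from the supplementary material.</p>
        <preformat><![CDATA[
```python
SUBCLASS, DOMAIN, RANGE = "subClassOf", "domain", "range"

# Toy fragments (illustrative only, not the real cmt/ekaw ontologies).
O1 = {("Reviewer", SUBCLASS, "Person"), ("Author", SUBCLASS, "Person")}
O2 = {
    ("authorOf", DOMAIN, "Person"),
    ("authorOf", RANGE, "Document"),
    ("Review", SUBCLASS, "Document"),
    ("Abstract", SUBCLASS, "Document"),
}


def detect_o1(triples, anchor):
    """O1 side (assumed shape): named classes that specialise the anchored class."""
    return sorted(s for s, p, o in triples if p == SUBCLASS and o == anchor)


def detect_o2(triples, anchor):
    """O2 side (assumed shape): (property1, class1, class2) where property1 leads
    from the anchor to class1 and class2 is a subclass of class1."""
    properties = {s for s, p, o in triples if p == DOMAIN and o == anchor}
    hits = []
    for prop in properties:
        for s, p, class1 in triples:
            if s == prop and p == RANGE:
                for class2, q, parent in triples:
                    if q == SUBCLASS and parent == class1:
                        hits.append((prop, class1, class2))
    return sorted(hits)


# The anchor comes from the input correspondence O1:Person=O2:Person.
print(detect_o1(O1, "Person"))
print(detect_o2(O2, "Person"))
```
]]></preformat>
        <p>Anchoring the queries on an entity from a simple correspondence keeps the search space small, since only structures attached to that entity are inspected.</p>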
        <p>Step 2. Results from the detection in both ontologies are joined according to the alignment pattern,
separately for each input correspondence.</p>
        <p>In our example, there are 4 × 2 = 8 complex correspondence candidates for the Paper entity from
the input correspondence and 14 × 7 = 98 complex correspondence candidates for the Person entity
from the input correspondence. E.g., regarding the input correspondence O1:Person=O2:Person,
O1:Reviewer is joined with O2:Person, O2:authorOf, and O2:Review.</p>
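        <p>The join in Step 2 amounts to a Cartesian product of the two result sets per input correspondence. The sketch below reproduces only the candidate counts reported above; the result items are placeholders, and the function name join_candidates is hypothetical.</p>
        <preformat><![CDATA[
```python
from itertools import product


def join_candidates(o1_hits, o2_hits):
    """Pair every O1 detection result with every O2 detection result."""
    return list(product(o1_hits, o2_hits))


# Result-set sizes per input correspondence as reported in the text: (O1, O2).
reported_sizes = {"Paper": (4, 2), "Person": (7, 14)}

candidate_counts = {}
for anchor, (n_o1, n_o2) in reported_sizes.items():
    # Placeholder result items; only their number matters for this sketch.
    o1_hits = [f"o1_{anchor}_{i}" for i in range(n_o1)]
    o2_hits = [f"o2_{anchor}_{i}" for i in range(n_o2)]
    candidate_counts[anchor] = len(join_candidates(o1_hits, o2_hits))

print(candidate_counts)                # {'Paper': 8, 'Person': 98}
print(sum(candidate_counts.values()))  # 106
```
]]></preformat>
        <p>The total of 106 matches the number of candidates evaluated in the experiment.</p>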
        <p>Step 3. Pattern-based template-driven verbalization to natural language (English) is applied
to complex correspondence candidates to enable their validation using LLM. We also apply
several natural language preprocessing steps, such as tokenization and lowercasing. Similarly,
we use a template for serialization into Manchester OWL syntax.</p>
        <p>For CAT, we use the following verbalization pattern into English: "&lt;O1:class1&gt; is the same
as &lt;O2:ent1&gt; which is<sup>3</sup> &lt;O2:property1&gt; of &lt;O2:class2&gt;". Example: Reviewer is the same as person
which is author of review.</p>
        <p><sup>2</sup>The approach is pattern-based in two ways: it is driven by the alignment pattern, and the SPARQL
query can also be considered a pattern for detection.</p>
        <p><sup>3</sup>If no "has" exists in property1, "is" is added.</p>
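        <p>A template-driven verbalization for CAT could look like the following sketch. Two points are assumptions rather than documented behavior: a trailing "of" in the property name is dropped so that the template's own "of" is not duplicated, and the sentence is capitalized and terminated with a period to match the printed examples. The function names are hypothetical.</p>
        <preformat><![CDATA[
```python
import re


def tokenize(name):
    """Split camelCase and underscores, then lowercase (hyphens are kept)."""
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", name)
    return spaced.replace("_", " ").lower().split()


def verbalize_cat(o1_class, o2_ent, o2_property, o2_class):
    """Fill the CAT template:
    <O1:class1> is the same as <O2:ent1> which is <O2:property1> of <O2:class2>
    """
    prop = tokenize(o2_property)
    # Assumption: drop a trailing "of" to avoid "author of of review".
    if len(prop) > 1 and prop[-1] == "of":
        prop = prop[:-1]
    # "is" is added only if the property does not already contain "has".
    verb = "" if "has" in prop else "is "
    sentence = (
        f"{' '.join(tokenize(o1_class))} is the same as "
        f"{' '.join(tokenize(o2_ent))} which {verb}"
        f"{' '.join(prop)} of {' '.join(tokenize(o2_class))}"
    )
    return sentence[0].upper() + sentence[1:] + "."


print(verbalize_cat("Reviewer", "Person", "authorOf", "Review"))
```
]]></preformat>
        <p>With these assumptions, the call above yields the example sentence given in the text.</p>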
        <p>Step 4. Finally, an LLM is used to validate whether verbalized complex correspondence
candidates are (probably) positives/negatives.</p>
        <p>We experimented with different LLMs. So far, the best results have been achieved using GPT-4o.<sup>4</sup></p>
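        <p>The validation step could be automated along the lines of the sketch below. It is purely illustrative: the prompt wording is hypothetical and is not the prompt from the supplementary material, and the ask argument stands for any user-supplied function that sends a prompt to an LLM and returns its reply as text. Only the four labels are taken from the paper.</p>
        <preformat><![CDATA[
```python
# Longer labels come first so that "probably positive" is not read as "positive".
LABELS = ("probably positive", "probably negative", "positive", "negative")


def build_prompt(statement):
    """Zero-shot prompt for one verbalized candidate (hypothetical wording)."""
    return (
        "Is the following statement about conference organisation correct? "
        "Answer with exactly one of: positive, probably positive, "
        "probably negative, negative.\n"
        f"Statement: {statement}"
    )


def parse_label(reply):
    """Map a free-text reply to one of the four labels, or 'unparsed'."""
    text = reply.strip().lower()
    for label in LABELS:
        if label in text:
            return label
    return "unparsed"


def validate(statements, ask):
    """Label each statement using `ask`, a caller-provided prompt -> reply function."""
    return {s: parse_label(ask(build_prompt(s))) for s in statements}


def canned_reply(prompt):
    """Stand-in for a real LLM call; always returns the same reply."""
    return "Probably positive."


print(validate(["Reviewer is the same as person which is author of review."], canned_reply))
```
]]></preformat>
        <p>Passing the LLM call in as a function keeps the sketch independent of any particular model or client library.</p>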
      </sec>
    </sec>
    <sec id="sec-5">
      <title>4. Preliminary Experiment and Evaluation</title>
      <p>The experiment is related to Step 4, in which an LLM is used to validate whether verbalized complex
correspondence candidates are (probably) positives/negatives. Our online supplementary material
contains the full report (SPARQL queries, their results and their joins, verbalized complex
correspondence candidates, the prompt for GPT-4o,<sup>5</sup> its results, the evaluation, and complex correspondences
in EDOAL and Manchester OWL syntax). Of 106 candidates, 14 were labeled as negatives, 74 as
probably negatives, 10 as probably positives, and 8 as positives. The supplementary material
contains a detailed evaluation of all eight candidates labeled by the LLM as positives. The human
evaluation of these eight candidates yielded one true positive, one partly true positive,
and six partly false positives. For the sake of brevity, we present three evaluation examples here:</p>
      <p>1. Reviewer is the same as person which is author of review.</p>
      <p>Although it is debatable whether being an author of a review is enough to be a real reviewer,
it is certainly close enough. It was evaluated as a true positive.</p>
      <p>2. Meta-reviewer is the same as person which is author of review.</p>
      <p>A meta-reviewer is not only the author of a review but also has a specific role within the
reviewing process. Meta-reviewer is instead a subclass of person which is an author of review,
i.e., Class ⊑ Class Expression. It was evaluated as a partly true positive.</p>
      <p>3. Author is the same as person which is author of abstract.</p>
      <p>Since some conferences call for abstracts, being an author can be merely based on abstract
authorship. However, the subsumption relation would be more fitting (i.e., Author subsumes
person which is author of abstract). Since this leads to a General Concept Inclusion (GCI) subsumption,
i.e., Class ⊒ Class Expression, which is not always allowed, it was evaluated as a partly false positive.</p>
      <p>Considering only equivalence, precision equals 1/8 = 0.125. However, subsumption is also
important for interoperability, so a relaxed precision could be used. If GCI axioms
(partly false positives) are not allowed, relaxed precision equals 2/8 = 0.25. If GCI axioms are allowed,<sup>6</sup> relaxed precision equals 8/8 = 1.0.
Regarding recall, the preliminary evaluation (details in the online supplementary material)
shows that all candidates labeled as negatives are true negatives. However, it needs further evaluation in terms of relaxed recall.</p>
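      <p>The precision figures above follow directly from the reported counts. The short sketch below recomputes them; the variable names are illustrative.</p>
      <preformat><![CDATA[
```python
from fractions import Fraction

# LLM labels over all candidates, as reported in the text.
llm_labels = {
    "negative": 14,
    "probably negative": 74,
    "probably positive": 10,
    "positive": 8,
}
total_candidates = sum(llm_labels.values())

# Human evaluation of the eight candidates labeled as positives.
human_eval = {
    "true positive": 1,
    "partly true positive": 1,
    "partly false positive": 6,
}
n_positive = llm_labels["positive"]

# Strict precision: only equivalence counts.
strict = Fraction(human_eval["true positive"], n_positive)
# Relaxed precision without GCI axioms: Class-below-expression subsumption also counts.
relaxed_no_gci = Fraction(
    human_eval["true positive"] + human_eval["partly true positive"], n_positive
)
# Relaxed precision with GCI axioms: every evaluated candidate counts.
relaxed_gci = Fraction(sum(human_eval.values()), n_positive)

print(total_candidates)                                        # 106
print(float(strict), float(relaxed_no_gci), float(relaxed_gci))  # 0.125 0.25 1.0
```
]]></preformat>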
    </sec>
    <sec id="sec-6">
      <title>5. Conclusion and Future Work</title>
      <p>We reported on our approach and preliminary experiments with pattern-based complex ontology
matching using SPARQL for detection and verbalization before validation with LLM.</p>
      <p>
        Similarly to [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], we focus on the structural aspect of detection. In contrast, we do not capture
lexical aspects for detection; instead, we employ verbalization before validation by an LLM. Similarly
to [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], we employ an LLM for complex ontology matching, but we only use it in the final step
of the approach for validation. Contrary to [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], we do not require human involvement in prompting,
the complete ontology as input, pre-selected complex structure instances to be matched, or manually
selected additional information for complex matching.
      </p>
      <p><sup>4</sup>via https://chatgpt.com/</p>
      <p><sup>5</sup>We ran GPT-4o several times; its replies varied only slightly, which does not substantially affect the evaluation.</p>
      <p><sup>6</sup>GCI subsumptions could also help with interoperability in some scenarios.</p>
      <sec id="sec-6-1">
        <p>For now, we consider a 1-to-n relationship. However, it can be generalized to an m-to-n
relationship. In our experiments, we use GPT-4o. However, we will experiment more with other
LLMs (e.g., Llama 3, Mixtral). Further, while we employ direct question prompting, we will also
experiment with contextual question prompting, where context will be gathered automatically.
We aim to explore more alignment patterns, involve more ontologies, and conduct in-depth
evaluations in future work. While the provided Jupyter notebook covers the first three steps, the fourth
step, dealing with the LLM, will be implemented after further experimentation with other LLMs.</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>This work has been supported by the EU’s Horizon Europe grant no. 101058682 (Onto-DESIDE).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Amini</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Norouzi</surname>
            ,
            <given-names>S. S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hitzler</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Amini</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          <article-title>Towards Complex Ontology Alignment using Large Language Models</article-title>
          .
          <source>arXiv preprint arXiv:2404.10329</source>
          .
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Euzenat</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shvaiko</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <article-title>Ontology matching</article-title>
          .
          <source>In: Springer. 978-3-642-38720-3</source>
          .
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Gruber</surname>
            ,
            <given-names>T. R.</given-names>
          </string-name>
          <article-title>Toward principles for the design of ontologies used for knowledge sharing?</article-title>
          <source>In: International Journal of Human-Computer Studies</source>
          ,
          <volume>43</volume>
          (
          <issue>5-6</issue>
          ).
          <year>1995</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>He</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dong</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Horrocks</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          <article-title>Exploring large language models for ontology alignment</article-title>
          .
          <source>In: ISWC 2023 Posters, and Demos</source>
          .
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Hertling</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Paulheim</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          <article-title>OLaLa: Ontology matching with large language models</article-title>
          .
          <source>In Proc. of the 12th Knowledge Capture Conference</source>
          <year>2023</year>
          .
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Jiménez-Ruiz</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cuenca Grau</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          <article-title>LogMap: Logic-based and scalable ontology matching</article-title>
          . In: International Semantic Web Conference. Springer.
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>Norouzi</surname>
            ,
            <given-names>S. S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mahdavinejad</surname>
            ,
            <given-names>M. S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hitzler</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <article-title>Conversational Ontology Alignment with ChatGPT</article-title>
          .
          <source>In: Proc. of the Ontology Matching workshop at ISWC</source>
          .
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>Ritze</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Völker</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Meilicke</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Šváb-Zamazal</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          <article-title>Linguistic analysis for complex ontology matching</article-title>
          .
          <source>In: Proc. of the Ontology Matching workshop at ISWC</source>
          .
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <surname>Scharfe</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          <article-title>Correspondence patterns representation</article-title>
          .
          <source>PhD thesis</source>
          ,
          <source>Univ. of Innsbruck</source>
          .
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Šváb-Zamazal</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Svátek</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          <article-title>Towards ontology matching via pattern-based detection of semantic structures in OWL ontologies</article-title>
          .
          <source>In: Proc. of the Znalosti conference</source>
          .
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <surname>Thiéblin</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Haemmerlé</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hernandez</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Trojahn</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          <article-title>Survey on complex ontology matching</article-title>
          .
          <source>Semantic Web</source>
          ,
          <volume>11</volume>
          (
          <issue>4</issue>
          ),
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <surname>Thiéblin</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cheatham</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Trojahn</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zamazal</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          <article-title>A consensual dataset for complex ontology matching evaluation</article-title>
          .
          <source>The Knowledge Engineering Review</source>
          ,
          <volume>35</volume>
          :
          <fpage>e34</fpage>
          .
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <surname>Zamazal</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Svátek</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          <article-title>The ten-year ontofarm and its fertilization within the onto-sphere</article-title>
          .
          <source>Journal of Web Semantics</source>
          ,
          <volume>43</volume>
          .
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <surname>Zhou</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cheatham</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Krisnadhi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hitzler</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <article-title>Geolink data set: A complex alignment benchmark from real-world ontology</article-title>
          .
          <source>Data Intelligence</source>
          ,
          <volume>2</volume>
          (
          <issue>3</issue>
          ).
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>