<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Conversational Ontology Alignment with ChatGPT</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Sanaz Saki Norouzi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mohammad Saeid Mahdavinejad</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pascal Hitzler</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science, Kansas State University</institution>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This study evaluates the applicability and efficiency of ChatGPT for ontology alignment using a naive approach. ChatGPT's output is compared to the results of the Ontology Alignment Evaluation Initiative 2022 campaign on the conference track ontologies. This comparison is intended to provide insight into the capabilities of a conversational large language model when used in a naive way for ontology matching, and to investigate the potential advantages and disadvantages of this approach.</p>
      </abstract>
      <kwd-group>
        <kwd>Ontology alignment</kwd>
        <kwd>ChatGPT</kwd>
        <kwd>Schema matching</kwd>
        <kwd>Ontology matching</kwd>
        <kwd>Large language models</kwd>
        <kwd>Prompt engineering</kwd>
        <kwd>LLM behavior</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Ontology alignment (OA), also referred to as ontology matching, is a central task in semantic
web technologies that aims to find semantic correspondences between two ontologies with
overlapping domains. As the use of ontologies extends to ever more fields, the task's
importance keeps growing: ontology matching is needed to bridge the semantic gap
between heterogeneous ontologies [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Although OA looks back on many years of research,
the task remains challenging, often requiring expert intervention to ensure accurate results.
Expert-driven matching can be both time-consuming and subject to human biases, so even
then absolute precision remains elusive [
        <xref ref-type="bibr" rid="ref2 ref3 ref4">2, 3, 4</xref>
        ]. To tackle this challenge, a variety
of ontology matching systems, incorporating natural language processing (NLP) techniques
that handle grammatical variation and different similarity measures, machine learning, fuzzy
lexical matching, and other advanced methodologies, have been proposed in the Ontology
Alignment Evaluation Initiative (OAEI) 2022 [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Each approach attempts to automate the matching process
and alleviate the need for extensive human involvement.
      </p>
      <p>
        With the emergence of large language models (LLMs), we have seen impressive results on many
NLP downstream tasks. The use of LLMs for human-centric tasks has recently increased, and
models like ChatGPT by OpenAI have attracted attention for tasks such as logical
reasoning [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], question answering [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], and mental health analysis [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. Prompt engineering is a
skill required to work with LLMs efficiently: a prompt can be seen as a directive for
interacting with an LLM that adjusts and controls its output [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. Generally, there
are three main approaches to using LLMs: fine-tuning, few-shot prompting, and zero-shot
prompting. Fine-tuning is helpful for downstream tasks because it adapts the knowledge the
LLM acquired during pre-training to the specific task. However, models like GPT-3 [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] have been reported to
generate useful responses for tasks they were never trained on, which has made prompt
engineering more popular. In few-shot prompting, a few examples of the task and of the
input/output format are given to the model so that it can produce output in that format,
whereas in zero-shot prompting the model's performance is evaluated from a single prompt,
using only its pre-trained knowledge. Thus, prompt patterns strongly influence the results
these LLMs produce.
      </p>
      <p>In this paper, we conduct a comparative analysis of ChatGPT’s performance in ontology
alignment when prompted with different strategies. We compare ChatGPT's output with the
reference alignments provided by the Ontology Alignment Evaluation Initiative (OAEI) 2022
campaign, which uses conference-related ontologies. By evaluating ChatGPT’s performance
in a zero-shot manner, we aim to shed light on the capabilities and limitations of using a
conversational large language model for ontology matching. Furthermore, we discuss the
implications of our findings and propose potential directions for future research in this exciting
area.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Methodology</title>
      <sec id="sec-2-1">
        <title>Data</title>
        <p>
          Our evaluation focuses on conference track ontologies provided by the OAEI [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ], encompassing
seven ontologies: cmt, conference, sigkdd, iasted, ekaw, edas, and confOf. This selection yields
21 pairs of ontologies to match. We use the original reference alignment, known as ra1
(https://oaei.ontologymatching.org/2023/conference/data), for our evaluation. The OAEI notes
that M3 evaluation means both properties and classes are considered for matching; we therefore
use the ra1-M3 OAEI 2022 results for comparison.
        </p>
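<p>The count of 21 matching tasks follows directly from taking every unordered pair of the seven ontologies. A minimal sketch (variable names are ours, not from the paper):</p>

```python
from itertools import combinations

# The seven OAEI conference-track ontologies used in the evaluation.
ontologies = ["cmt", "conference", "sigkdd", "iasted", "ekaw", "edas", "confOf"]

# Each unordered pair of distinct ontologies is one matching task: C(7, 2) = 21.
pairs = list(combinations(ontologies, 2))
print(len(pairs))  # 21
```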
      </sec>
      <sec id="sec-2-2">
        <title>Prompts and Formatting</title>
        <p>An essential aspect of this evaluation involves designing prompts that effectively incorporate
the triples from the conference track ontologies. We explore different approaches to including
ontology triples in the prompts, with two primary methods considered: converting triples into
sentences and transforming them into formatted text following the pattern Predicate(Subject,
Object).</p>
        <p>After conducting experiments and assessing the effectiveness of different prompt
approaches, we chose to adopt the formatted-text approach for our prompts, which aligns well
with suggestions from OpenAI. This formatting presents triples in a structured manner, making
it easier for ChatGPT to comprehend them and generate appropriate responses. For instance, an
original triple such as "track subclassOf conference_part" can be represented as "Is-a(track,
conference part)" using the formatted-text approach. Similarly, properties are expressed in the
same structured format, such as "authorOf(Person, Document)".
</p>
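<p>The triple-to-text conversion described above can be sketched as a small helper. The function name and the underscore handling are our illustrative choices; the paper only specifies the Predicate(Subject, Object) pattern and the Is-a rendering of subclassOf:</p>

```python
def format_triple(subject: str, predicate: str, obj: str) -> str:
    """Render an ontology triple as Predicate(Subject, Object) text.

    'subClassOf' is rewritten as the more readable 'Is-a', and underscores
    in entity names are replaced with spaces, following the paper's example.
    """
    if predicate.lower() == "subclassof":
        predicate = "Is-a"
    return f"{predicate}({subject.replace('_', ' ')}, {obj.replace('_', ' ')})"

print(format_triple("track", "subClassOf", "conference_part"))
# Is-a(track, conference part)
print(format_triple("Person", "authorOf", "Document"))
# authorOf(Person, Document)
```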
        <p>The context-length limitation of the basic version of ChatGPT (v3.5), which we elaborate on
in the discussion section, led us to divide the input into smaller parts instead of using one long
prompt. This approach allowed us to maintain essential context throughout the interaction,
resulting in better understanding by the model and more accurate responses.</p>
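<p>Splitting the input into smaller parts can be sketched as simple batching of the formatted triples. The batch size of 50 is an illustrative value, not the one used in the experiments:</p>

```python
def chunk_triples(triples, max_per_prompt=50):
    """Split a long list of formatted triples into smaller batches so that
    each prompt stays within the model's context window."""
    return [triples[i:i + max_per_prompt]
            for i in range(0, len(triples), max_per_prompt)]

# 120 placeholder triples split into batches of at most 50.
batches = chunk_triples([f"t{i}" for i in range(120)], max_per_prompt=50)
print([len(b) for b in batches])  # [50, 50, 20]
```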
        <p>In our early experiments, we found that adding more complex ontology axioms made it more
difficult for ChatGPT to capture the best possible matches between two ontologies. Therefore,
we decided to include only axioms that can be directly expressed as triples. We formulated our
prompt with a structured approach as follows:</p>
        <p>&lt;Problem Definition&gt;
In this task, we are given two ontologies in the form of Relation(Subject, Object), which
consist of classes and properties.</p>
        <p>&lt;Ontologies Triples&gt;</p>
        <p>Ontology 1:
Ontology 1 Triples</p>
        <p>Ontology 2:
Ontology 2 Triples</p>
        <p>&lt;Objective&gt;
Our objective is to provide ontology mapping for the provided ontologies based on
their semantic similarities.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Results and Analysis</title>
      <p>In this section, we present the results of our evaluation. The objective was to gain insight
into this approach's potential advantages and disadvantages. Among the prompts, "prompt 7"
demonstrated the highest recall. However, it generated considerably more statements than
"prompt 1", since it is repeated for each class/property name and tries to find the best match
for each of them. The increased recall thus came at the cost of reduced precision; some of the
generated statements were deemed irrelevant even by non-expert evaluators. Nonetheless,
"prompt 7" exhibited the highest F1-score among all the prompts, striking a balance between
recall and precision.</p>
      <p>The first three prompts are similar in essence but have different objectives, and their
F1-scores are almost the same. Asking for a complete and comprehensive matching gives the
highest recall, but also the lowest precision. On average, the first prompt achieved the best
balance between recall and precision. Interestingly, prompts that explicitly asked for matching
classes or properties, such as prompts 4 and 5, yielded higher recall but lower precision and
F1-scores. This drawback can, however, be mitigated by domain experts, who can easily filter
out irrelevant generated statements. For a more comprehensive evaluation, we compare our
results with the OAEI 2022 results in Table 2; the prompts' results are shown in Table 3.</p>
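<p>The precision, recall, and F1 figures discussed above follow the standard set-based alignment evaluation: generated correspondences are compared against the reference alignment. A minimal sketch with toy correspondences (the entity names below are illustrative, not results from the paper):</p>

```python
def precision_recall_f1(predicted, reference):
    """Score a predicted alignment (set of correspondences) against the
    reference alignment using precision, recall, and F1."""
    predicted, reference = set(predicted), set(reference)
    tp = len(predicted & reference)  # true positives
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(reference) if reference else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Hypothetical example: two generated matches, one of which is in the reference.
pred = {("cmt:Paper", "ekaw:Paper"), ("cmt:Author", "ekaw:Writer")}
ref = {("cmt:Paper", "ekaw:Paper")}
p, r, f = precision_recall_f1(pred, ref)  # high recall, lower precision
```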
    </sec>
    <sec id="sec-4">
      <title>4. Discussion</title>
      <p>Our evaluation highlighted a significant challenge related to precision. The generated statements
often introduced errors that caused a decrease in precision. We identified several factors
contributing to this issue:</p>
      <p>ChatGPT context length limit: We used ChatGPT (v4.0) in our experiments because
ChatGPT (v3.5) struggled to retain context when the input was lengthy, affecting its performance
on ontology alignment tasks. ChatGPT (v4.0) has improved contextual understanding and adapts
better to long inputs, and its maximum context length of 8192 tokens accommodates both
ontologies' triples within the prompt.</p>
      <p>Inverse Functional Properties: These properties can lead to imprecise matches if they are
not properly accounted for. For example, ChatGPT matched the statement hasBeenAssigned(Reviewer,
Paper) to hasReviewer(Paper, Possible_Reviewer). However, the correct entity for this match is
ReviewerOfPaper, which is the inverse of hasReviewer. Properly accounting for this inverse
relationship enhances precision by reducing the number of false positives.</p>
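<p>One way to account for inverse relationships, sketched under our own naming (the inverse table here is illustrative), is to canonicalize each matched property before scoring, flipping subject and object whenever a declared inverse is used:</p>

```python
# Illustrative table of declared inverse properties (not from the paper).
INVERSES = {"hasReviewer": "ReviewerOfPaper"}

def normalize(match):
    """Canonicalize a (predicate, subject, object) triple by rewriting a
    declared inverse property, so hasReviewer(Paper, Reviewer) and
    ReviewerOfPaper(Reviewer, Paper) count as the same correspondence."""
    pred, subj, obj = match
    if pred in INVERSES:
        return (INVERSES[pred], obj, subj)
    return match

a = normalize(("hasReviewer", "Paper", "Possible_Reviewer"))
print(a)  # ('ReviewerOfPaper', 'Possible_Reviewer', 'Paper')
```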
      <p>Matches with Subclasses: The generated alignments sometimes matched a class in one
ontology to one class and all its subclasses in the other, leading to unintended matches.
For instance, in the conference-edas matching, "active_conference_participant" and
"passive_conference_participant", which are subclasses of conf_participant, are both matched
to the same class.</p>
      <p>Uncertain Matching: In certain cases, even though ChatGPT acknowledges that a matching
is unlikely, it still generates such matches and proposes new entities to be included in the graph.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion and Future Work</title>
      <p>In this paper, we have evaluated the applicability and efficiency of ChatGPT for ontology
alignment using a naive approach. Our evaluation showed that ChatGPT can achieve high
recall but suffers from low precision. We identified several factors contributing to this issue,
including ChatGPT's context length limit, the handling of inverse functional properties, matches
with subclasses, unseen alignments, and uncertain matchings. Despite these challenges, we
believe that ChatGPT has the potential to be a valuable tool for ontology alignment. Its high
recall means that it can be used to identify a large number of potential matches, which can then
be filtered by domain experts. Additionally, its ability to generate new entities suggests that it
could be used to expand reference ontologies. In future work, we plan to address the precision
issues identified in this paper and to explore other ways of using ChatGPT for ontology
alignment, such as generating prompts for more sophisticated alignment algorithms. Overall,
the results of this paper demonstrate the potential of ChatGPT to improve the efficiency and
effectiveness of ontology alignment tasks.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>This work was supported by the National Science Foundation (NSF) under Grant 2033521 A1.
Any opinions, findings, conclusions, or recommendations expressed in this material are those
of the authors and do not necessarily reflect the views of the NSF.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>P.</given-names>
            <surname>Shvaiko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Euzenat</surname>
          </string-name>
          ,
          <article-title>Ontology matching: state of the art and future challenges</article-title>
          ,
          <source>IEEE Transactions on Knowledge and Data Engineering</source>
          <volume>25</volume>
          (
          <year>2011</year>
          )
          <fpage>158</fpage>
          -
          <lpage>176</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>C.</given-names>
            <surname>Trojahn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Vieira</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Schmidt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Pease</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Guizzardi</surname>
          </string-name>
          ,
          <article-title>Foundational ontologies meet ontology matching: A survey</article-title>
          ,
          <source>Semantic Web</source>
          <volume>13</volume>
          (
          <year>2022</year>
          )
          <fpage>685</fpage>
          -
          <lpage>704</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>R.</given-names>
            <surname>Stevens</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Lord</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Malone</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Matentzoglu</surname>
          </string-name>
          ,
          <article-title>Measuring expert performance at manually classifying domain entities under upper ontology classes</article-title>
          ,
          <source>Journal of Web Semantics</source>
          <volume>57</volume>
          (
          <year>2019</year>
          )
          <fpage>100469</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>M.</given-names>
            <surname>Cheatham</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Hitzler</surname>
          </string-name>
          ,
          <article-title>Conference v2.0: An uncertain version of the OAEI conference benchmark</article-title>
          , in: P. Mika, T. Tudorache, A. Bernstein, C. Welty, C. A. Knoblock, D. Vrandecic, P. Groth, N. F. Noy, K. Janowicz, C. A. Goble (Eds.),
          <source>The Semantic Web - ISWC 2014 - 13th International Semantic Web Conference, Riva del Garda, Italy, October 19-23, 2014. Proceedings, Part II</source>
          , volume
          <volume>8797</volume>
          of Lecture Notes in Computer Science, Springer,
          <year>2014</year>
          , pp.
          <fpage>33</fpage>
          -
          <lpage>48</lpage>
          . doi:10.1007/978-3-319-11915-1_3.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>M. A. N.</given-names>
            <surname>Pour</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Algergawy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Buche</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. J.</given-names>
            <surname>Castro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Dong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Fallatah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Faria</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Fundulaki</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Hertling</surname>
          </string-name>
          , et al.,
          <article-title>Results of the ontology alignment evaluation initiative 2022</article-title>
          ,
          <source>CEUR Workshop Proceedings</source>
          ,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>H.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Ning</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Teng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <article-title>Evaluating the logical reasoning ability of chatgpt and gpt-4</article-title>
          ,
          <source>arXiv preprint arXiv:2304.03439</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Tan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Min</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Qi</surname>
          </string-name>
          ,
          <article-title>Evaluation of chatgpt as a question answering system for answering complex questions</article-title>
          ,
          <source>arXiv preprint arXiv:2303.07992</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>K.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ji</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Xie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ananiadou</surname>
          </string-name>
          ,
          <article-title>On the evaluations of chatgpt and emotion-enhanced prompting for mental health analysis</article-title>
          ,
          <source>arXiv preprint arXiv:2304.03347</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>J.</given-names>
            <surname>White</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Fu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Hays</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sandborn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Olea</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Gilbert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Elnashar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Spencer-Smith</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. C.</given-names>
            <surname>Schmidt</surname>
          </string-name>
          ,
          <article-title>A prompt pattern catalog to enhance prompt engineering with chatgpt</article-title>
          ,
          <source>arXiv preprint arXiv:2302.11382</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>T.</given-names>
            <surname>Brown</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Mann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ryder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Subbiah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. D.</given-names>
            <surname>Kaplan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Dhariwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Neelakantan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Shyam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Sastry</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Askell</surname>
          </string-name>
          , et al.,
          <article-title>Language models are few-shot learners</article-title>
          ,
          <source>Advances in neural information processing systems</source>
          <volume>33</volume>
          (
          <year>2020</year>
          )
          <fpage>1877</fpage>
          -
          <lpage>1901</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>O.</given-names>
            <surname>Zamazal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Svátek</surname>
          </string-name>
          ,
          <article-title>The ten-year ontofarm and its fertilization within the onto-sphere</article-title>
          ,
          <source>Journal of Web Semantics</source>
          <volume>43</volume>
          (
          <year>2017</year>
          )
          <fpage>46</fpage>
          -
          <lpage>53</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>