<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Using LLM to Improve Knowledge Graph Entity Matching</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Victor Eiti Yamamoto</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Hideaki Takeda</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>National Institute of Informatics</institution>
          ,
          <addr-line>2-1-2 Hitotsubashi, Chiyoda-ku, Tokyo 101-8430</addr-line>
          <country country="JP">Japan</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>The Graduate University for Advanced Studies</institution>
          ,
          <addr-line>SOKENDAI, Shonan Village, Hayama, Kanagawa 240-0193</addr-line>
          <country country="JP">Japan</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <fpage>2</fpage>
      <lpage>6</lpage>
      <abstract>
        <p>Knowledge graphs (KGs) are powerful tools for representing and reasoning over structured information. Entity matching between KGs helps integrate multiple KGs. However, the performance of entity matching tools can be sensitive to parameter settings, such as thresholds. Large language models (LLMs) have emerged as powerful tools for solving reasoning problems and show potential for improving entity alignment. Our approach incorporates two LLM-based steps: filtering and expansion. In the filtering step, an LLM is used to validate entity mappings. The expansion step then uses an LLM to select the correct mapping from a candidate list for any source entity that lacks a corresponding pair after the filtering step. Experiments on the OAEI KG track dataset and matchings with DBpedia datasets show that using an LLM as a filter achieves a low false-negative rate and a favorable false-positive rate, indicating that it can improve precision without significantly lowering recall. However, the expansion step has low precision because the LLM tries to select a corresponding entity even when no correct match exists in the candidate list.</p>
      </abstract>
      <kwd-group>
        <kwd>Knowledge graph</kwd>
        <kwd>entity matching</kwd>
        <kwd>large language models</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
    </sec>
    <sec id="sec-2">
      <title>2. Related Works</title>
      <p>Existing ontology matching systems employ diverse strategies. For instance, PARIS [6] is a probabilistic
matcher that computes equivalence probabilities for predicates and entities by leveraging the concept of
functionality and comparing attribute values. LogMap [7], in contrast, uses an anchor-based approach
that combines lexical indexing with an ontology reasoner to achieve high-precision mappings, though its
effectiveness depends on a well-structured class ontology. Another approach, the Full Triple Matcher [8],
first maps entities and predicates based on their labels, then uses these mappings to find compatible
triples, which are subsequently used to refine the initial entity mappings.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Approach</title>
      <sec id="sec-3-1">
        <title>3.1. Filtering</title>
        <p>Our approach refines entity alignments produced by an existing matcher, specifically leveraging the
output of the Full Triple Matcher (FTM). The FTM takes two distinct KGs as input and outputs a set of
aligned entities and a set of aligned triples. We introduce a two-step post-processing pipeline to
enhance these initial alignments: Filtering and Expansion.</p>
        <p>The filtering step prunes the initial set of candidate entity alignments to keep only pairs that refer to
the same real-world entity. For each source entity appearing in a candidate pair, we first identify a set of
top candidate target entities based on the similarity scores generated by the FTM. Each candidate pair,
consisting of the source entity and one of its candidate targets, is then passed to an LLM for verification.
To provide context, this input is augmented with the top-10 most similar triples associated with each of
the two entities, selected based on the FTM scores. If the LLM confirms that the entities in the pair refer
to the same real-world entity, the pair is added to a new filtered set; otherwise, the pair is discarded.</p>
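        <p>As a minimal sketch, the filtering step can be written as follows. The prompt wording and
helper names are our own illustration, and verify_fn is a stand-in: in the experiments this role is
played by a locally hosted LLM, not the trivial callable shown here.</p>

```python
def build_filter_prompt(src, tgt, src_triples, tgt_triples):
    """Assemble the verification input: the candidate pair plus the
    top-10 most similar triples of each entity as context."""
    ctx_src = "\n".join(f"  {s} {p} {o}" for s, p, o in src_triples[:10])
    ctx_tgt = "\n".join(f"  {s} {p} {o}" for s, p, o in tgt_triples[:10])
    return (
        f"Do '{src}' and '{tgt}' refer to the same real-world entity?\n"
        f"Triples about '{src}':\n{ctx_src}\n"
        f"Triples about '{tgt}':\n{ctx_tgt}\n"
        "Answer yes or no."
    )

def filter_alignments(pairs, triples_of, verify_fn):
    """Keep only the candidate pairs that the verifier confirms.

    pairs      -- iterable of (source, target) pairs from the FTM
    triples_of -- maps an entity to its FTM-ranked similar triples
    verify_fn  -- boolean judge over a prompt (an LLM call in practice)
    """
    kept = []
    for src, tgt in pairs:
        prompt = build_filter_prompt(src, tgt, triples_of(src), triples_of(tgt))
        if verify_fn(prompt):
            kept.append((src, tgt))
    return kept
```

        <p>Any boolean-returning judge can be plugged in as verify_fn, which also makes the pipeline easy to
test with a deterministic stub before attaching the LLM.</p>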
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Expansion</title>
        <p>This step expands the matching set by recovering correct entity pairs that were erroneously removed
during the filtering phase. First, we identify the source entities that appear in the initial set of pairs
but lack a corresponding match in the filtered set. For each such entity, we retrieve the top-10 most
similar candidate entities from the target knowledge graph, based on the scores obtained from the FTM.
These candidates are then passed to an LLM, which is prompted to perform a multiple-choice selection
to identify the single best match that represents the same real-world object. The resulting pair is then
added to the filtered set to form the final expanded matching set.</p>
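        <p>Under the same assumptions as before, the expansion step can be sketched as below; choose_fn
stands in for the LLM's multiple-choice selection, and the helper names are ours. The sketch also makes
the step's failure mode visible: a pair is always appended, even when no candidate is correct.</p>

```python
def expand_matches(initial_pairs, filtered_pairs, candidates_of, choose_fn):
    """Recover matches for source entities dropped by the filter.

    initial_pairs  -- (source, target) pairs from the FTM
    filtered_pairs -- pairs that survived the LLM filter
    candidates_of  -- maps a source entity to its FTM-ranked targets
    choose_fn      -- returns the index of the picked candidate
                      (a multiple-choice LLM prompt in practice)
    """
    matched_sources = {src for src, _ in filtered_pairs}
    expanded = list(filtered_pairs)
    for src in {s for s, _ in initial_pairs}:
        if src in matched_sources:
            continue
        options = candidates_of(src)[:10]  # top-10 candidate list
        choice = choose_fn(src, options)
        # A pair is always appended, even if no option is correct --
        # the behaviour that limits this step's precision.
        expanded.append((src, options[choice]))
    return expanded
```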
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Results</title>
      <p>We compared our methods, Filter and Expansion, against several baselines: BaselineAltLabel, LogMap
[7], PARIS [6], and the Full Triple Matcher (FTM) [8]. Our methods extend FTM by adding a filtering
step (Filter) or both filtering and expansion (Expansion). We tuned the baselines for optimal F-measure
using a threshold, a step omitted by our "(w/o threshold)" variants. We used Gemma 3 [9] with 27
billion parameters as the LLM; we selected this model because it can run locally on a single GPU,
making our results easier to replicate. The code and results are available on GitHub at
https://github.com/eitiyamamoto/llm-kg-matcher.git.</p>
      <p>On the OAEI 2023 KG track using the DBkWik datasets [10], our Expansion (w/o threshold) method
achieved the highest recall on four datasets, while LogMap achieved the highest precision on three.
Furthermore, our standard Expansion method achieved the top F-measure on three datasets
(Table 1).</p>
      <p>On the large-scale Gollum dataset [5] (MAL/SWW-DBpedia), our Filter method achieved the highest
precision, while our Expansion method yielded the best F-measure across both tasks (Table 2). In
contrast, LogMap failed to load the data, and PARIS did not terminate on SWW-DBpedia.</p>
      <p>
        On the MCU-MDB dataset, we analyzed our method’s performance using confusion matrices (Table
3). The filtering step achieves a low false-negative rate, correctly pruning non-matches while passing
most correct pairs to the next step (Table 3a). The expansion step, which relies on a top-10 candidate list
for the LLM, achieves a precision of 0.13. Table 3b shows a confusion matrix with correct@n, which
indicates whether the correct answer appears at the n-th position of the candidate list. The LLM correctly
identifies the entity in 49% of cases when it is in the top-10. This accuracy rises to 60% when the correct
entity is ranked first, but drops to 32% for lower-ranked (2-10) entities.
      </p>
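      <p>The reported rates follow the standard confusion-matrix definitions; the sketch below shows how
we read the tables (the function and parameter names are our own shorthand, not from the paper).</p>

```python
def precision_recall(tp, fp, fn):
    """Precision and recall from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

def correct_at_n(selections):
    """Share of multiple-choice rounds where the LLM picked the gold
    candidate, restricted to rounds whose candidate list contains it.

    selections -- list of (chosen_rank, gold_rank) pairs, 1-based ranks
    """
    if not selections:
        return 0.0
    hits = sum(1 for chosen, gold in selections if chosen == gold)
    return hits / len(selections)
```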
      <p>Figure 1 compares the set of results obtained from the FTM method with those of the four new
configurations, "Filtered" and "Expansion", each with and without a selection threshold. The Venn
diagrams show a substantial overlap in all four scenarios, indicating strong agreement between
the FTM method and the alternative approaches. In particular, applying a threshold to the
"Filtered" and "Expansion" methods significantly reduces their number of unique results, making them
almost entirely subsets of the FTM results. This high degree of similarity is quantitatively confirmed by
the Jaccard coefficient, which was calculated for each comparison and ranges from 0.91 to 0.98,
signifying very strong agreement between the result sets.</p>
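      <p>The Jaccard coefficient used here is the usual set-overlap measure, computed over sets of
(source, target) alignment pairs; a minimal sketch:</p>

```python
def jaccard(result_a, result_b):
    """Jaccard coefficient of two result sets: the size of their
    intersection divided by the size of their union."""
    a, b = set(result_a), set(result_b)
    union = a.union(b)
    if not union:
        return 1.0  # two empty result sets agree perfectly
    return len(a.intersection(b)) / len(union)
```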
    </sec>
    <sec id="sec-5">
      <title>5. Discussion</title>
      <p>The FTM with filter and expansion outperformed the baselines because the filter improves precision
while the expansion considers non-top-1 candidates to improve recall. However, Table 3b reveals that
the expansion step’s performance is unexpectedly low. This is because the LLM often fails to select
the correct pair from the candidate list and forces a choice even when the correct match is absent.
Consequently, the method’s effectiveness still relies heavily on a well-tuned similarity threshold to
filter out incorrect pairs.</p>
      <p>The filter step alone, however, presents a viable alternative to a threshold. As shown in Tables 1 and
2, it achieves comparable F-measure on most datasets, making it advantageous when a labeled dataset
for tuning a threshold is unavailable. According to Table 3a, the filter produces few false negatives, thus
preserving recall. Its primary limitation is a high false-positive rate, which restricts precision gains. In
summary, the filter is highly reliable when rejecting a pair (predicting ’false’) but less reliable when
accepting one (predicting ’true’).</p>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion</title>
      <p>In this work, we proposed a two-step (filter and expansion) extension to a statistical method using an
LLM. Our approach achieved the best results in most cases when combining both steps with threshold
filtering. We also found that the filter step alone yields comparable results, suggesting it can replace
threshold filtering when an optimal threshold is unknown. However, the filter step maintains high
recall at the cost of lower precision. On the other hand, the expansion step helps find pairs beyond
the top-1 candidate, but its tendency to always select a pair, even when none is correct, results in
low precision. As future work, we will refine these steps to learn from the selected pairs, test how the
choice of LLM impacts performance, and investigate how to reduce LLM usage to lower the computational
effort, for example by restricting it to corner cases and entity pairs with close similarity scores.</p>
      <p>[Figure 1: Venn diagrams comparing the FTM result set with the Filtered and Expansion result sets,
with and without a threshold.]</p>
    </sec>
    <sec id="sec-7">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used Gemini and Grammarly in order to paraphrase
and reword, improve writing style, and check grammar and spelling. After using these tools, the
authors reviewed and edited the content as needed and take full responsibility for the publication’s
content.</p>
      <p>[4] (continued) golden hammer bias, in: The Semantic Web: 17th International Conference, ESWC 2020, Heraklion,
Crete, Greece, May 31-June 4, 2020, Proceedings 17, Springer, 2020, pp. 343-359.
[5] S. Hertling, H. Paulheim, Gollum: A gold standard for large scale multi source knowledge graph
matching, arXiv preprint arXiv:2209.07479 (2022).
[6] F. M. Suchanek, S. Abiteboul, P. Senellart, PARIS: Probabilistic alignment of relations, instances,
and schema, Proc. VLDB Endow. 5 (2011) 157-168. doi:10.14778/2078331.2078332.
[7] E. Jiménez-Ruiz, B. Cuenca Grau, LogMap: Logic-based and scalable ontology matching, in: The
Semantic Web - ISWC 2011: 10th International Semantic Web Conference, Bonn, Germany, October
23-27, 2011, Proceedings, Part I 10, Springer, 2011, pp. 273-288.
[8] V. E. Yamamoto, H. Takeda, Full triple matcher: Integrating all triple elements between
heterogeneous knowledge graphs, ACM Trans. Web (2025). doi:10.1145/3754338. Just Accepted.
[9] A. Kamath, J. Ferret, S. Pathak, N. Vieillard, R. Merhej, S. Perrin, T. Matejovicova, A. Ramé,
M. Rivière, L. Rouillard, et al., Gemma 3 technical report, CoRR (2025).
[10] S. Hertling, H. Paulheim, DBkWik: Extracting and integrating knowledge from thousands of wikis,
Knowledge and Information Systems 62 (2020) 2169-2190.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>L.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.-Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          , H. Liu,
          <article-title>Two heads are better than one: Integrating knowledge from knowledge graphs and large language models for entity alignment</article-title>
          ,
          <source>arXiv preprint arXiv:2401.16960</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>X.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Lu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wang</surname>
          </string-name>
          , Llm-align:
          <article-title>Utilizing large language models for entity alignment in knowledge graphs</article-title>
          ,
          <source>arXiv preprint arXiv:2412.04690</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>M.</given-names>
            <surname>Leone</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Huber</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Arora</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>García-Durán</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>West</surname>
          </string-name>
          ,
          <article-title>A critical re-evaluation of neural methods for entity alignment</article-title>
          ,
          <source>Proc. VLDB Endow</source>
          .
          <volume>15</volume>
          (
          <year>2022</year>
          )
          <fpage>1712</fpage>
          -
          <lpage>1725</lpage>
          . URL: https://doi.org/10.14778/3529337.3529355. doi:10.14778/3529337.3529355.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>S.</given-names>
            <surname>Hertling</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Paulheim</surname>
          </string-name>
          ,
          <article-title>The knowledge graph track at oaei: Gold standards, baselines, and the golden hammer bias</article-title>
          , in: The Semantic Web: 17th International Conference, ESWC 2020, Heraklion, Crete, Greece, May 31-June 4, 2020, Proceedings 17, Springer,
          <year>2020</year>
          , pp.
          <fpage>343</fpage>
          -
          <lpage>359</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>