<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Knowledge Editing for Large Language Models Using Knowledge Graph-based Analysis</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Patipon Wiangnak</string-name>
          <email>w.patipon@jaist.ac.jp</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Natthawut Kertkeidkachorn</string-name>
          <email>natt@jaist.ac.jp</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kiyoaki Shirai</string-name>
          <email>kshirai@jaist.ac.jp</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Japan Advanced Institute of Science and Technology</institution>
          ,
          <addr-line>Ishikawa</addr-line>
          ,
          <country country="JP">Japan</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <fpage>2</fpage>
      <lpage>6</lpage>
      <abstract>
<p>Large Language Models (LLMs), particularly those based on Generative Pre-trained Transformers (GPT), have achieved strong performance in various natural language tasks. However, LLMs are limited by a knowledge cut-off, so their information becomes outdated. Common methods for updating LLM knowledge, such as fine-tuning, retrieval-augmented generation, and machine unlearning, are often resource-intensive and may introduce unintended effects, including the loss of relevant context or conflicts with existing knowledge. Knowledge Editing (KE) offers a more efficient alternative by enabling precise updates to specific facts without retraining the entire model, while preserving unrelated information. Still, such edits can trigger unexpected ripple effects, known as the Butterfly Effect, where modifying one fact causes errors in related knowledge. In this work, we introduce ButterflyKE, a knowledge graph-based analysis method that probes neighboring knowledge to identify local side effects caused by a single factual update. Using Wikidata as a reference knowledge graph, ButterflyKE extracts directly connected triples to provide a structural view of how knowledge propagates after editing. We evaluate three main KE approaches: External Memory-based, Global Optimization-based, and Local Modification-based, using the Llama-3.1-8B-Instruct model. Our findings confirm the presence of the Butterfly Effect in KE, with side effects intensifying as structural connections increase. To measure this impact, we propose the Butterfly Index, a metric to evaluate editing methods and their influence on surrounding knowledge. ButterflyKE serves as a practical method for extending existing benchmarks and supports a deeper analysis of knowledge integrity in LLMs.</p>
      </abstract>
<kwd-group>
        <kwd>Large Language Models</kwd>
        <kwd>Knowledge Editing</kwd>
        <kwd>Butterfly Effects</kwd>
        <kwd>Hallucination</kwd>
        <kwd>Knowledge Graph-Based Analysis</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        In this modern era, Large Language Models (LLMs), particularly those based on Generative Pre-trained
Transformers (GPT), have revolutionized various fields, including Question Answering, Machine
Translation, and Natural Language Inference (NLI). Nevertheless, as black-box models, the complexity of
LLMs presents challenges: they are limited by a knowledge cut-off, so their information becomes outdated.
In recent years, Knowledge Editing (KE) [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] has emerged as a promising alternative for updating
knowledge in LLMs without full retraining or harming unrelated information. However, LLM knowledge
is often sensitive to edits, and a single update can introduce unintended consequences [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. We define
this phenomenon as the Butterfly Effect, where one edit disrupts related knowledge. While recent
work focuses on making edits more accurate and precise, limited attention has been given to evaluating
potential side effects. Current KE methods can be broadly categorized into three strategies:
1. External Memory-based Approach: Stores new knowledge externally without changing
internal weights, such as RAG and In-context Knowledge Editing (IKE) [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
2. Global Optimization-based Approach: Updates the model using gradients from new
knowledge, such as Model Editor Networks with Gradient Decomposition (MEND) [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
3. Local Modification-based Approach: Locates and updates only specific parameters related to
the target fact, such as Rank-One Model Editing (ROME) [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
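<p>As a concrete illustration of the first strategy, an in-context edit can be expressed purely as prompt construction, leaving the model weights untouched. The template below is a simplified sketch in the spirit of IKE, not the exact prompt format from [3]; the function name and wording are assumptions.</p>

```python
def in_context_edit(new_fact, question):
    """External Memory-based editing sketch: the new fact is supplied in the
    prompt rather than written into the model's weights."""
    return (f"New fact: {new_fact}\n"
            f"Answer the question using the new fact.\n"
            f"Question: {question}\nAnswer:")

# The resulting prompt is then passed unchanged to the frozen LLM.
prompt = in_context_edit("The parent of Britney Spears is Errol Musk.",
                         "Who is the parent of Britney Spears?")
```

Because nothing is written back into the model, this strategy is trivially reversible, but the edit only holds while the fact remains in the context window.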
<p>In this work, we introduce ButterflyKE: Butterfly Effects in Knowledge Editing, a method
designed to systematically probe the side effects of knowledge editing in LLMs. By leveraging structured
knowledge from a knowledge graph, it traces how a single factual update may propagate through
semantically connected facts. We evaluate these effects using the proposed Butterfly Index, which
quantifies the model’s ability to maintain factual correctness in next-hop neighboring knowledge.</p>
    </sec>
    <sec id="sec-2">
<title>2. Butterfly Effect in Knowledge Editing for Large Language Models</title>
<p>To identify and interpret the side effects of Knowledge Editing in Large Language Models, we introduce
ButterflyKE, a knowledge graph-based framework that probes next-hop neighboring knowledge
to detect localized effects from a single factual update. Instead of constructing a new dataset, we
use a public knowledge graph, Wikidata, to enable structural reasoning and trace how edits
can propagate through semantically connected facts in the model’s internal knowledge. Figure 1 (a)
illustrates the three main components of the framework, while Figure 1 (b) illustrates an example of
next-hop probing across different types of relationships. In this figure, solid black arrows indicate
original triples, dashed arrows represent next-hop connections retrieved from the knowledge graph,
red arrows denote induced triples inferred from ontological properties, and red arrows marked with a
cross indicate removed connections resulting from the knowledge edit.</p>
      <p>1. Next-Hop Relation Extraction: Given an edit instance in the form of a triple (subject, predicate,
object), we first retrieve its adjacent triples from an external knowledge graph using SPARQL
queries. Here, adjacency refers to triples that share either the subject entity or the object entity
with the edited triple. For example, for the triple [Britney_Spears, :parent, Jamie_P_Spears],
adjacent triples include [Jamie_P_Spears, :spouse, Lynn_Spears] and [Britney_Spears, :sibling,
Jamie_Lynn_Spears]. This step defines the immediate structural context in which the edit may
have side effects.
2. Inherent Relation Induction: After performing the edit by replacing the original target entity
with a new entity, we expand the neighborhood by enforcing inherent relation constraints derived
from ontological properties. Specifically, we consider: inverse relations, if (e1, r, e2) holds, then
(e2, r⁻¹, e1) should also hold; symmetric relations, if (e1, r, e2) holds, then (e2, r, e1) should also hold;
transitive relations, if (e1, r, e2) and (e2, r, e3) hold, then (e1, r, e3) can be inferred. For example, if
Britney is edited to have Errol Musk as a parent, the induced inverse relation makes Britney
a child of Errol; if Elon is a sibling of Kimbal, then Kimbal must also be a sibling of Elon; and
if Errol is the parent of Elon and Elon is the parent of another entity, then Errol becomes the
grandparent of that entity. These induced triples represent the logical consequences of the edit
that may introduce inconsistencies or propagate across unrelated domains.
3. Butterfly Effect Measurement: We aim to probe the impact of a knowledge edit using factual
questions derived from next-hop knowledge and inherent relations identified in previous steps.
After performing the edit, the model is queried to observe changes in its responses. For example,
from [Britney_Spears, :parent, Errol_Musk] we ask “Who is the parent of Britney Spears?”,
and from the induced inverse relation [Errol_Musk, :child, Britney_Spears] we ask “Who is the
child of Errol Musk?”. By comparing the model’s answers before and after the edit, we identify
discrepancies in correctness that signal local disruptions.</p>
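<p>The three steps above can be sketched as follows. The in-memory triple store, the inverse/symmetric relation tables, and the question template are illustrative assumptions standing in for Wikidata SPARQL access; transitive closure is omitted for brevity.</p>

```python
# Sketch of the ButterflyKE probing pipeline on a toy in-memory triple store.
# A real implementation would retrieve adjacency from Wikidata with SPARQL.

TRIPLES = [
    ("Britney_Spears", "parent", "Jamie_P_Spears"),
    ("Jamie_P_Spears", "spouse", "Lynn_Spears"),
    ("Britney_Spears", "sibling", "Jamie_Lynn_Spears"),
]

INVERSE = {"parent": "child", "child": "parent"}  # assumed inverse pairs
SYMMETRIC = {"sibling", "spouse"}                 # assumed symmetric predicates

def next_hop(triple, triples):
    """Step 1: adjacent triples sharing the subject or object entity."""
    s, _, o = triple
    return [t for t in triples
            if t != triple and (s in (t[0], t[2]) or o in (t[0], t[2]))]

def induce(triple):
    """Step 2: triples induced by inverse/symmetric relation constraints."""
    s, p, o = triple
    induced = []
    if p in INVERSE:
        induced.append((o, INVERSE[p], s))  # (e1, r, e2) -> (e2, r^-1, e1)
    if p in SYMMETRIC:
        induced.append((o, p, s))           # (e1, r, e2) -> (e2, r, e1)
    return induced

def probe_question(triple):
    """Step 3: render a triple as a factual probe question."""
    s, p, _ = triple
    return f"Who is the {p} of {s.replace('_', ' ')}?"

original = ("Britney_Spears", "parent", "Jamie_P_Spears")  # pre-edit fact
edited = ("Britney_Spears", "parent", "Errol_Musk")        # counterfactual edit
neighbors = next_hop(original, TRIPLES)
induced = induce(edited)
questions = [probe_question(t) for t in neighbors + induced]
```

Each question is posed to the model before and after the edit; a flipped answer on any of them marks a local disruption.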
<p>To evaluate the side effects of knowledge editing on semantically related information, we introduce
the Butterfly Index (Equation 1). This metric quantifies the degradation in factual accuracy on
next-hop knowledge due to the edit by comparing the model’s answers before and after the update.</p>
<p>ButterflyIndex = (1/N) ∑_{i=1}^{N} [ 𝟙(f_orig(q_i) = a_i) − 𝟙(f_edit(q_i) = a_i) ]   (1)</p>
<p>Here, f_orig and f_edit denote the language model before and after editing, respectively; q_i is the factual
question derived from the i-th of the N probed triples; a_i is the ground truth answer; and 𝟙(⋅) is the indicator
function, returning 1 if the answer is correct and 0 otherwise. A higher Butterfly Index reflects a
greater loss in accuracy on neighboring knowledge, thereby indicating stronger unintended side effects
of the edit.</p>
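<p>Equation 1 translates directly into code. In the minimal sketch below, f_orig and f_edit are mocked as lookup tables over probe questions; in practice they would be calls to the LLM before and after editing, and the mocked answers are assumptions for illustration only.</p>

```python
def butterfly_index(f_orig, f_edit, probes):
    """Equation 1: (1/N) * sum_i [ 1(f_orig(q_i) = a_i) - 1(f_edit(q_i) = a_i) ]
    over N probed (question, answer) pairs."""
    n = len(probes)
    return sum(int(f_orig(q) == a) - int(f_edit(q) == a) for q, a in probes) / n

# Mocked model answers before/after an edit (illustrative assumption).
before = {"Who is the sibling of Britney Spears?": "Jamie Lynn Spears"}
after = {"Who is the sibling of Britney Spears?": "Errol Musk"}  # collateral error

probes = [("Who is the sibling of Britney Spears?", "Jamie Lynn Spears")]
bi = butterfly_index(before.get, after.get, probes)  # 1.0: the neighbor fact broke
```

An index of 0 means no neighboring fact changed correctness; values near 1 mean the edit broke most of its structural neighborhood.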
    </sec>
    <sec id="sec-3">
      <title>3. Experiment</title>
      <sec id="sec-3-1">
        <title>3.1. Experimental Setup</title>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Results and Discussion</title>
        <p>Table 2 presents the performance of representative KE-for-LLMs approaches evaluated on the original
CounterFact dataset and its probed counterpart, CounterFact-Probed, which includes next-hop
neighboring triples generated using the ButterflyKE framework. All methods achieve high accuracy on
CounterFact, confirming their effectiveness at injecting and retrieving the edited knowledge. However,
when evaluated on CounterFact-Probed, accuracy drops significantly across all methods, revealing local
inconsistencies introduced by the edit. This degradation is quantified by the Butterfly Index, which
measures the difference in accuracy before and after editing on next-hop knowledge. For example, IKE
achieves a perfect editing accuracy of 1.0, but its accuracy on neighboring facts drops to 0.511, resulting
in a Butterfly Index of 0.489. Similarly, ROME drops from 0.87 to 0.26, yielding the highest Butterfly
Index of 0.61 among all methods. These results indicate that although edits are successful in isolation,
they often disrupt related factual knowledge embedded within the model.</p>
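<p>Reading the quoted figures as accuracy on next-hop facts before and after editing, the reported index values follow directly from Equation 1; the helper below is just that arithmetic, not part of the framework.</p>

```python
# Butterfly Index as an accuracy difference on next-hop knowledge (Equation 1).
def bi(acc_before, acc_after):
    return round(acc_before - acc_after, 3)

ike = bi(1.0, 0.511)   # IKE:  neighbor accuracy 1.0 -> 0.511
rome = bi(0.87, 0.26)  # ROME: neighbor accuracy 0.87 -> 0.26
```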
<p>These findings demonstrate the presence of the Butterfly Effect in knowledge editing. A localized
factual change can unintentionally affect semantically related information within the model. This
observation reveals a key limitation of current knowledge editing techniques, which often fail to
maintain the broader contextual consistency of the model’s internal knowledge. The Butterfly Index
helps bridge this gap by offering a principled metric that captures not only the factual accuracy but also
the semantic stability of the model after an edit.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Conclusion</title>
<p>In this study, we presented ButterflyKE, a framework to evaluate local side effects of KE-for-LLMs.
By enriching the CounterFact dataset with next-hop neighboring triples, we constructed
CounterFact-Probed, enabling probing of unintended impacts on semantically related knowledge. To quantify these
effects, we proposed the Butterfly Index, measuring accuracy differences on surrounding facts before
and after editing. Experimental results show that while KE methods succeed in updating the target
information, they vary significantly in their ability to preserve adjacent facts. In particular, they
experience substantial drops in accuracy on neighboring triples, revealing local disruptions despite
successful edits. These findings confirm the presence of the Butterfly Effect in KE. This highlights a key
limitation of current approaches and emphasizes the need for methods that ensure both factual precision
and semantic stability. In future work, we will broaden the evaluation to diverse editing techniques and
domains, and extend analysis to advanced foundation models such as ChatGPT, DeepSeek, and Gemini.
We also plan to investigate deeper graph structures and multi-hop interactions to better understand
interference mechanisms and guide the design of more robust editing strategies.</p>
    </sec>
    <sec id="sec-5">
      <title>Declaration on Generative AI</title>
<p>During the preparation of this work, the authors used ChatGPT-4o for grammar checking, paraphrasing, and
rewording. After using this tool, the authors reviewed and edited the content as needed and assumed full
responsibility for the publication’s content.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>S.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
<article-title>Knowledge Editing for Large Language Models: A Survey</article-title>
          , ACM Comput. Surv. 57 (
          <year>2024</year>
          ) 59:1-59:37. URL: https://dl.acm.org/doi/10.1145/3698590. doi:10.1145/3698590.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <source>Unveiling the Pitfalls of Knowledge Editing for Large Language Models</source>
          ,
          <year>2024</year>
          . URL: http://arxiv.org/abs/2310.02129. doi:10.48550/arXiv.2310.02129, arXiv:2310.02129 [cs].
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>C.</given-names>
            <surname>Zheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Dong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Fan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
<article-title>Can We Edit Factual Knowledge by In-Context Learning?</article-title>
          ,
          <year>2023</year>
          . URL: http://arxiv.org/abs/2305.12740. doi:10.48550/arXiv.2305.12740, arXiv:2305.12740 [cs].
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>E.</given-names>
            <surname>Mitchell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bosselut</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Finn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. D.</given-names>
            <surname>Manning</surname>
          </string-name>
          , Fast Model Editing at Scale,
          <year>2022</year>
          . URL: http://arxiv.org/abs/2110.11309. doi:10.48550/arXiv.2110.11309, arXiv:2110.11309 [cs].
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>K.</given-names>
            <surname>Meng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Bau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Andonian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Belinkov</surname>
          </string-name>
          , Locating and Editing Factual Associations in GPT,
          <year>2023</year>
          . URL: http://arxiv.org/abs/2202.05262. doi:10.48550/arXiv.2202.05262, arXiv:2202.05262 [cs].
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>