<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Protecting the Privacy in Velvet with Model Editing</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Giancarlo A. Xompero</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Elena Sofia Ruzzetti</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Cristina Giannone</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andrea Favalli</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Raniero Romagnoli</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Fabio Massimo Zanzotto</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Almawave S.p.A., Via di Casal Boccone</institution>
          ,
          <addr-line>188-190 00137, Rome</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Human Centric ART, University of Rome Tor Vergata</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
<p>Large Language Models (LLMs) showed impressive generation abilities and are now integrated in many real-world applications. However, LLMs also tend to memorize information, including Personally Identifiable Information (PII), which can be learned and generated during inference, posing a risk to users' privacy. In this context, Model Editing techniques have recently been proposed to prevent the leakage of private information by modifying LLMs' parameters directly while preserving their generation capabilities. In this work, we show an application of Model Editing for privacy protection in the context of Italian data on Velvet, a recently released multilingual LLM. In particular, we focus on protection against Training Data Extraction (TDE) attacks. Empirical results from the experiments show that model editing techniques can be effective in mitigating privacy leakage in LLMs, even for Italian data, while preserving their multilingual generation capabilities.</p>
      </abstract>
      <kwd-group>
        <kwd>Large Language Models</kwd>
        <kwd>Model Editing</kwd>
        <kwd>Privacy</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Large Language Models (LLMs) showed impressive generation capabilities in managing various tasks, and they are now integrated into many real-world applications. Given the popularity and potential of these models, several open-weight LLMs have been released to the public in the last years, including multilingual ones. Following this trend, LLMs that support the Italian language have also been made available [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], thus allowing to manage tasks even in Italian.
      </p>
      <p>
        However, since LLMs are now employed in many services, they can be affected by some well-known issues, such as toxicity or privacy leakage [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], which can have an important negative impact on model performance. These problems raised concerns about privacy due to the possible presence of undetected private information in training data. Prior research showed that these models tend to memorize training data [
        <xref ref-type="bibr" rid="ref3 ref4 ref5">3, 4, 5, 6</xref>
        ], thus they are prone to memorizing Personal Identifiable Information (PII), which might be disclosed during text generation. Italian LLMs can also be affected, as the data used for training these models is often scraped from public web pages [7, 8], and although processes to identify and remove private information are used to clean the data, PII could still be present.
      </p>
      <p>
        Privacy is critical for LLMs deployed as services, raising concerns about privacy leakage and thus requiring attention. Carlini et al. [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] showed that extracting private information from an LLM is possible by prompting textual sequences from training data. The success of these attacks is evidence that the privacy of real individuals is at risk, so methods to prevent the leakage of PII are necessary. Recently, many solutions have been proposed to mitigate this phenomenon, such as machine unlearning [9, 10]. Alternatively, Model Editing approaches showed promising effects for protecting the privacy of users [11, 12, 13]. The application of these methods allows us to modify the knowledge encoded in the LLMs by breaking the association between some memorized prompts and the corresponding PII. Among these methods, Private Memorization Editing (PME) [13] is an approach that exploits the memorization mechanism of transformers to modify the association between a prompt and its related private information, showing its effectiveness in protecting LLMs from TDE attacks.
      </p>
      <p>
        In this work, we show an application of PME [13] to protect users' privacy for Italian data in LLMs. We focus on Velvet-2B, a recent multilingual LLM for the English and Italian languages. Even though the training data has been curated to remove PII, the model may learn some information during training. Our main objective is to understand whether model editing can be extended and used to protect the privacy of users whose PII might be included in training data obtained from public datasets. With PME, we can define an automatic process for obscuring private information and making Velvet robust to external attacks.
      </p>
      <p>
        We evaluate the effectiveness of our approach through an experimental process to make Velvet more robust against external attacks aimed at prompting the LLM to generate memorized PII. We obtain Training Data Extraction (TDE) attacks from a subset of documents in Italian used to train Velvet to induce the model to leak PII; in particular, we focus on email addresses (Section 4.1). Then, we adapt PME to Velvet and edit the model to protect the LLM against the identified TDE attacks (Section 4.2). Finally, we measure the effectiveness of our approach by observing the behaviour of Velvet against TDE attacks, and we evaluate the preservation of post-edit Velvet's multilingual generation capabilities to ensure the edit had no negative impact on the model (Section 4.3). Results show that model editing can be adapted to Italian data and make Velvet more robust against TDE attacks by notably reducing the accuracy of the attacks (Section 5.1). In addition, the evaluation of post-edit Velvet suggests that the edit does not affect multilingual capabilities for both the English and Italian languages (Section 5.2).
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Background</title>
      <p>Given the large amount of data that is necessary to train
an LLM, the risks connected to privacy violations have
been largely investigated (Section 2.1). We describe what
mechanisms in LLMs have been identified to control
model predictions (Section 2.2), and how these insights
allow editing some undesired predictions without the
need of re-training the model (Section 2.3).</p>
      <sec id="sec-2-1">
        <title>2.1. LLMs &amp; Privacy</title>
        <p>As LLMs require large amounts of data for training, some undesirable information may have been included in the training material inadvertently: a person's name, address, email address, social security number, phone number, as well as any other data that, when combined, could lead to the identification of individuals, are considered private information and should not be further disseminated during inference by an LLM. This kind of information, defined as Personal Identifiable Information (PII), can in fact be used to identify a specific individual, and threatens their privacy if disseminated.</p>
        <p>
          However, once a PII is included in the training material, an LLM can leak it during inference. In fact, LLMs may memorize that information [14, 15, 16] and consequently cause privacy leaks at inference time. A number of attacks have been designed to exploit this tendency and extract private information from LLMs [2, 17, 18]. For LLMs, even in black-box access the right prompt may be sufficient to obtain private information. While some attacks require the attacker to craft an adversarial input for the model [19, 20], other attacks do not even rely on potentially harmful prompts [
          <xref ref-type="bibr" rid="ref3 ref4 ref5">3, 6, 4, 5</xref>
          ].
        </p>
        <p>Developing techniques for the preservation of individuals' privacy is central to making LLMs more robust and trustworthy.</p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Knowledge Mechanism of Transformers</title>
        <p>Transformer-based Language Model Predictions. We consider the forward pass of a Transformer-based decoder-only model ℳ of L layers and describe it in terms of its sub-components on a prompt x. Given the tokenized prompt x = [x_1, ..., x_T] and the corresponding input embeddings h^(0), a model builds the prediction for the next token x_{T+1} with an iterative refinement across layers. At a given layer l, given the Attention block Attn, the layer normalization LN and the Feed Forward block FFN, the output h^(l) of that layer is computed as:</p>
        <p>
          ∀ l ∈ {1, ..., L}:
          a^(l) = Attn(LN(h^(l-1)))
          h̃^(l) = h^(l-1) + a^(l)
          m^(l) = FFN(LN(h̃^(l)))
          h^(l) = h̃^(l) + m^(l)
          (1)
        </p>
        <p>On the last position T, at the last layer L, the hidden representation h_T^(L) is projected by a matrix E ∈ R^(d×|V|) onto the vocabulary space V. The scores obtained, normalized by a softmax function σ, predict the next token:</p>
        <p>ℳ(x) = argmax σ(E^T h_T^(L)) = x_{T+1}</p>
        <p>We aim to understand which mechanisms control the generation of the next token, and whether it is possible to alter them to modify the predictions when the model leaks private information.</p>
        <p>
          FFN Layers as Knowledge Memories. Feed Forward blocks FFN play a crucial role in the generation mechanism of the model, and not only because they account for most of the parameters of the network. The interpretation of the Feed Forward block in a Transformer model is that it implements a mapping of paired keys to values [
          <xref ref-type="bibr" rid="ref6">21, 22</xref>
          ]. Geva et al. [21] notice that, with the exception of the activation function, which is usually a ReLU rather than a softmax, the equation for the Feed Forward layer resembles the one that describes a neural memory [
          <xref ref-type="bibr" rid="ref7">23</xref>
          ]. The Feed Forward block is in fact composed of two matrices W_K^(l) ∈ R^(d_m×d) and W_V^(l) ∈ R^(d×d_m), and an activation function f that processes each position i ∈ [1, ..., T] of the input independently from the others. The output m_i^(l) of the Feed Forward block at the i-th position of the input is computed as follows:
        </p>
        <p>m_i^(l) = W_V^(l) f(W_K^(l) h̃_i^(l))   (2)</p>
        <p>where h̃_i^(l) is the sum of the output of the Attention block and the output of the previous layer, as in Equation 1. The keys of the memory are produced by the output of W_K^(l) and the non-linear function f, while the values are the corresponding columns of W_V^(l).</p>
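<p>
The key-value reading of Equation 2 can be made concrete with a toy NumPy sketch (hypothetical dimensions, not Velvet's actual weights): the non-linear activations act as memory coefficients that select slots, and the output is the corresponding combination of value columns.
</p>
<preformat>
```python
import numpy as np

rng = np.random.default_rng(0)
d, d_m = 8, 32                      # toy hidden size and FFN inner size (assumptions)

W_K = rng.normal(size=(d_m, d))     # rows act as "key" patterns
W_V = rng.normal(size=(d, d_m))     # columns are the paired "value" vectors

def ffn(h_tilde):
    # m = W_V f(W_K h) as in Equation 2: the ReLU activations are the
    # memory coefficients, and the output is a weighted sum of value columns
    coeffs = np.maximum(W_K @ h_tilde, 0.0)
    return W_V @ coeffs

h_tilde = rng.normal(size=d)
m = ffn(h_tilde)
assert m.shape == (d,)
```
</preformat>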
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Editing Knowledge of LLMs</title>
        <p>In the last years, there was a major interest around alternative methods to modify specific behaviors of LLMs without retraining the entire model from scratch. Based on the insights about the knowledge mechanism of transformers, the research area of knowledge editing has been flourishing, with the number of methods and approaches growing further.</p>
        <p>
          Currently, knowledge editing methods can be roughly divided in two categories: parameter-preserving and parameter-editing methods [
          <xref ref-type="bibr" rid="ref8">24</xref>
          ]. While parameter-preserving methods rely on external adapters or memories to intervene whenever there is a specific situation requiring a different response, parameter-editing methods are based on the theory about the knowledge mechanism of transformers and modify the parameters of the LLM directly, without the need of external modules.
        </p>
        <p>
          We focus on parameter-editing methods: given an LLM ℳ with parameters θ, parameter-editing methods aim at finding a shift in parameters Δθ to obtain a new model ℳ_(θ+Δθ), which allows to modify a specific prediction while preserving the non-target generation capabilities. ROME [
          <xref ref-type="bibr" rid="ref9">25</xref>
          ] and MEMIT [
          <xref ref-type="bibr" rid="ref10">26</xref>
          ], in particular, are parameter-editing approaches designed to edit the LLMs' parameters in a localized manner and are based on the interpretation of Feed Forward layers as memories, as introduced in Section 2.2. Under this interpretation, the matrix W_V^(l) optimizes the mapping between keys and values, that is:
        </p>
        <p>W_V^(l) = argmin_Ŵ Σ_(k_0, v_0) ||Ŵ k_0 − v_0||^2   (3)</p>
        <p>
          with k_0 ∈ K_0 being a set of keys to memorize and v_0 ∈ V_0 the corresponding values [
          <xref ref-type="bibr" rid="ref10 ref11 ref9">25, 26, 27</xref>
          ]. Given the linearity of the system in Equation 3, the optimal solution can be computed as:
        </p>
        <p>W_V^(l) = V_0 K_0^T (K_0 K_0^T)^(-1)   (4)</p>
        <p>
          Additionally, a closed-form equation can be found to calculate the edit that introduces new keys and values into the mapping [
          <xref ref-type="bibr" rid="ref10 ref9">25, 26</xref>
          ]. Given a representation of the keys K_0 and values V_0 stored in that matrix, and the representations for the new keys K_* and values V_* to store:
        </p>
        <p>ΔW^(l) = (V_* − W^(l) K_*) K_*^T (K_0 K_0^T + K_* K_*^T)^(-1)   (5)</p>
        <p>The term V_* − W^(l) K_* represents the residual between the new desired values V_* and the old values currently stored in W^(l) for the new keys K_*. Since we have K_* ⊆ K_0, because the new keys are representations already stored in W^(l), and the new values V_0* satisfy V_0* ⊆ V_0, we can define W^(l) K_* = V_0*. The equation for ΔW^(l) can then be written as:</p>
        <p>ΔW^(l) = (V_* − V_0*) K_*^T (K_0 K_0^T + K_* K_*^T)^(-1)   (6)</p>
        <p>
          We will use the matrix ΔW^(l) to edit the memorized mapping in layer l, without retraining. Since we do not have access to K_0, Meng et al. [
          <xref ref-type="bibr" rid="ref10">26</xref>
          ] assume that this representation can be modeled with a random sample of inputs, so K_0 K_0^T can be defined as follows:
        </p>
        <p>C_0^(l) = λ · E[k k^T] ≜ K_0 K_0^T   (7)</p>
        <p>where λ · E[k k^T] is an uncentered covariance statistic computed on an empirical sample of vector inputs to the layer. In this paper, we refer to it with C_0 rather than C_0^(l) for simplicity.</p>
      </sec>
      <sec id="sec-2-4">
        <title>2.4. Model Editing for Privacy Preservation</title>
        <p>In recent studies, model editing techniques have been applied to the context of privacy protection.</p>
        <p>Wu et al. [11] propose DEPN, a method that locates neurons associated with private information and then edits their corresponding activations to remove their contribution to the prediction.</p>
        <p>
          Patil et al. [
          <xref ref-type="bibr" rid="ref12">28</xref>
          ] showed an application of ROME [
          <xref ref-type="bibr" rid="ref9">25</xref>
          ] and MEMIT [
          <xref ref-type="bibr" rid="ref10">26</xref>
          ] to remove private information from the FFN layers of transformers. This approach exploits the association mechanism to break the associations leading to the leakage of private information.
        </p>
        <p>Venditti et al. [12] propose PAE, a data-driven approach based on the editing mechanism of MEMIT, aiming to break the association between an individual and their corresponding PIIs. The method uses prompt templates filled with the information about an individual and their corresponding PII to replace the private information with a dummy PII, thus preventing the leakage of the real PII.</p>
        <p>Ruzzetti et al. [13] propose PME, an automatic approach taking advantage of the memorization mechanism in LLMs. This approach uses memorized prompts inducing privacy violations to remove the associated PIIs. Unlike other locate-and-edit methods, PME distributes the residual for the editing among all the FFN layers of the transformer. The main advantage of this method is that it can be used automatically on collected prompts without the need of further manual analysis to determine the source of the knowledge, allowing for an automatic algorithm for privacy protection.</p>
        <p>In this paper, we apply PME because of its advantages, in particular the fact that it does not rely on assumptions such as which layers to modify or which part of a text retrieves the critical information, thus allowing for an automated process.</p>
      </sec>
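<p>
Equations 4 to 7 can be exercised end to end on random toy matrices. The NumPy sketch below (assumed toy dimensions, not tied to any real model) fits W by least squares as in Equation 4, forms the residual of Equation 5, and applies the update of Equation 6, with the covariance estimate of Equation 7 standing in for K_0 K_0^T.
</p>
<preformat>
```python
import numpy as np

rng = np.random.default_rng(1)
d_in, d_out, n_old, n_new = 16, 8, 64, 4   # toy dimensions (assumptions)

K0 = rng.normal(size=(d_in, n_old))        # keys already stored (as columns)
V0 = rng.normal(size=(d_out, n_old))       # their paired values
W = V0 @ K0.T @ np.linalg.inv(K0 @ K0.T)   # Equation 4: least-squares mapping

K_new = K0[:, :n_new]                      # new keys are representations already stored
V_new = rng.normal(size=(d_out, n_new))    # desired privacy-preserving values
V0_new = W @ K_new                         # old values for those keys (residual term, Eq. 5)

C0 = K0 @ K0.T                             # Eq. 7: uncentered covariance estimate of K0 K0^T
delta = (V_new - V0_new) @ K_new.T @ np.linalg.inv(C0 + K_new @ K_new.T)  # Eq. 6

W_edited = W + delta                       # edited mapping, no retraining involved
```
</preformat>
<p>
With W fitted by least squares, W_edited coincides with the joint least-squares solution over the old and new key-value pairs, which is what the closed form of Equation 6 encodes.
</p>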
    </sec>
    <sec id="sec-3">
      <title>3. Application and Method</title>
      <sec id="sec-3-1">
        <title>3.1. PII Leakage via Training Data Extraction Attacks</title>
        <sec id="sec-3-2-1">
          <p>
            PII is private information that may have been inadvertently included in the training dataset and can be extracted from an LLM using Training Data Extraction (TDE) attacks [
            <xref ref-type="bibr" rid="ref3 ref4 ref5">3, 4, 5, 6</xref>
            ]. In the initial formulation of TDE attacks, Carlini et al. [
            <xref ref-type="bibr" rid="ref3">3</xref>
            ] demonstrate that black-box access to an LLM can be sufficient to extract memorized information from a model: when prompted with a context that has been included in the training material, the target LLM tends to generate verbatim the continuation of the original document. Among the generated verbatim memorized content, a model may also generate private information that should not be disseminated.
          </p>
          <p>
            Formally, given a model ℳ, a string s is k-extractable if there exists a context string c of k tokens such that the concatenation [c || s] is contained in the training material for ℳ and ℳ generates s exactly when prompted with c in greedy decoding. When the context exactly matches a sequence of the training material, the success of the attack is maximized [
            <xref ref-type="bibr" rid="ref4">4</xref>
            ], and since this is the most informative setting that the attacker can obtain, this is the worst-case scenario.
          </p>
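<p>
The definition above can be operationalized as a simple check. In this sketch, generate_greedy is a hypothetical stand-in for any greedy-decoding call to the target model; the toy model below "memorizes" a single training sequence and continues it verbatim.
</p>
<preformat>
```python
def is_k_extractable(generate_greedy, context_tokens, secret_tokens):
    """A string s is k-extractable if the model, prompted with the k-token
    context c that precedes s in the training data, emits s verbatim
    under greedy decoding."""
    generated = generate_greedy(context_tokens, max_new_tokens=len(secret_tokens))
    return generated[: len(secret_tokens)] == secret_tokens

# toy "model" that has memorized one training sequence (hypothetical example PII)
TRAINING_DOC = "contact us at mario.rossi@example.it for info".split()

def toy_generate(context, max_new_tokens):
    # if the context matches a span of the memorized document,
    # continue it verbatim, as a greedily-decoded memorized completion would
    for i in range(len(TRAINING_DOC) - len(context) + 1):
        if TRAINING_DOC[i : i + len(context)] == list(context):
            j = i + len(context)
            return TRAINING_DOC[j : j + max_new_tokens]
    return []

assert is_k_extractable(toy_generate, ["contact", "us", "at"], ["mario.rossi@example.it"])
```
</preformat>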
        </sec>
        <sec id="sec-3-2-2">
          <p>
            The success of the attack increases as the attacker gets more information regarding the training material: one crucial aspect is the length k of the context that the model is fed with [
            <xref ref-type="bibr" rid="ref4 ref5">5, 4</xref>
            ]: the longer the context, the larger the probability of emission of verbatim memorized information.
          </p>
        </sec>
        <p>
          Since LLMs have been shown to memorize PII rather than associating it with an individual identity [5, 12, 2], those attacks represent one of the crucial challenges to protect individuals whose information has been inadvertently added to the training material of an LLM.
        </p>
        <p>
          Hence, we initially perform TDE attacks against our target model: we simulate an informed attacker who has some background knowledge regarding the training material, with increasing levels of information. For a given PII, we collect the context that precedes it in the training material, and produce 50-, 100-, and 200-token-long sequences (see Section 4.1 for further details), as we expect that a more informed attacker may obtain a larger volume of information. The model is then prompted to generate the subsequent 100 tokens: the attack succeeds if, in greedy decoding, the generated PII matches the original PII in the training material. The evaluation is rigorous, since a strict match between the generated PII and the one found in the training material is required.
        </p>
      </sec>
      <sec id="sec-3-3">
        <title>3.2. PME for Automatic Privacy Mitigation</title>
        <p>
          To address the threats posed by TDE attacks, we adopt Private Memorization Editing (PME) [13], a model editing strategy that leverages the memorization tendencies of LLMs as a defense. The objective of the method is to reduce the success of TDE attacks by replacing the memorized PII with semantically equivalent, but privacy-preserving, information. PME applies the editing on the Feed Forward layers of the model, similarly to other model editing techniques like ROME [
          <xref ref-type="bibr" rid="ref9">25</xref>
          ] and MEMIT [
          <xref ref-type="bibr" rid="ref10">26</xref>
          ].
        </p>
        <p>As discussed in Section 2.3, once one knows the correct representations for the keys and values that W_V^(l) encodes, it is possible to apply the closed-form solution in Equation 6 to perform the update. To compute the correct representations for keys and values, PME directly exploits training material verbatim memorized by the model.</p>
        <p>When the model is prompted with a context c that is included in the training material and that causes the generation of a PII, PME edits the model to obtain a privacy-preserving output instead. In each layer, the keys are the hidden representations that the model computes for the context prompt as in Equation 2, so k^(l) = f(W_K^(l) h̃^(l)).</p>
        <p>For the values, the new privacy-preserving value should be encoded with an appropriate vector representation. For this reason, PME initially optimizes a hidden representation z_* in the last layer of the model: using Gradient Descent, PME optimizes z_* so that, once decoded with the projection matrix on the vocabulary, it gives the highest probability of generating a dummy, privacy-preserving value.</p>
        <p>
          The underlying hypothesis in PME is then that each layer should contribute to the generation of this last-layer representation z_*. PME mimics the generation of the PII: with a forward pass on the memorized context, the method quantifies how much each layer contributes to the generation of the memorized PII. Instead of relying on Causal Mediation Analysis as in MEMIT [
          <xref ref-type="bibr" rid="ref10">26</xref>
          ], or on other localization techniques that have been shown to not inform the edit [
          <xref ref-type="bibr" rid="ref13 ref14">29, 30</xref>
          ], for identifying a restricted number of contributing layers, a contribution coefficient is computed for each layer following a geometric approach.
        </p>
        <p>
          Since the computation of a Transformer model can be described as a sum of its sub-components at each layer [
          <xref ref-type="bibr" rid="ref15">31, 32</xref>
          ], PME computes the contribution coefficient of each layer as the projection of that layer's Feed Forward output onto the last-layer Feed Forward representation: the larger the projection, the larger the impact of that layer on the overall sum. This contribution coefficient, rescaled so that the coefficients sum to one across the different layers, is then used to represent a fraction of z_* proportional to the contribution coefficient α^(l) of that layer, that is, at each layer the value v^(l) = α^(l) z_*.
        </p>
        <p>Once the correct representations for the keys and the privacy-preserving values are computed, the edit can be performed as in Equation 6, and the post-edit model should not generate the target PII under TDE attacks.</p>
      </sec>
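<p>
The geometric distribution of the residual described above can be sketched as follows (NumPy, toy shapes, not Velvet's real activations): each layer's Feed Forward output is projected onto the last-layer representation, the coefficients are rescaled to sum to one, and each layer receives its fraction of z_*.
</p>
<preformat>
```python
import numpy as np

rng = np.random.default_rng(2)
L, d = 6, 16                       # toy number of layers and hidden size (assumptions)

ffn_out = rng.normal(size=(L, d))  # m^(l): per-layer FFN output at the PII position
z_star = rng.normal(size=d)        # optimized last-layer privacy-preserving representation

last = ffn_out[-1]
proj = ffn_out @ last / (last @ last)  # scalar projection of each m^(l) onto m^(L)
alpha = proj / proj.sum()              # rescale so the coefficients sum to one
values = alpha[:, None] * z_star       # v^(l) = alpha^(l) * z_*, one target value per layer

assert np.isclose(alpha.sum(), 1.0)
assert np.allclose(values.sum(axis=0), z_star)
```
</preformat>
<p>
Because the coefficients sum to one, the per-layer values recompose z_* exactly when summed across layers, which is the property the residual distribution relies on.
</p>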
      <sec id="sec-3-5">
        <title>4. Experimental Setting</title>
        <p>In this section, we discuss the experimental setting we use to assess the effectiveness of our approach. Specifically, we define: (1) the process for data preparation to obtain the TDE attacks and the related leaked information (Section 4.1), (2) how PME is adapted and applied to Velvet (Section 4.2), and (3) how we evaluate the effectiveness of our privacy protection approach and the post-edit preservation of Velvet's capabilities (Section 4.3). For these experiments, we focus on email addresses in Italian data as PII, and on Velvet-2B as our target LLM.</p>
        <sec id="sec-3-5-1">
          <title>4.1. Data Preparation</title>
          <p>Training Data Extraction Attacks. As we discussed in Section 3.1, Training Data Extraction attacks are based on documents and prompts that the LLM has seen during training, which induce a target LLM to complete the given prompts with text verbatim memorized by the model. Since LLMs are prone to leaking PII during generation due to the possible contamination of training data with PII, we prepare Training Data Extraction attacks by analyzing a subset of the training data used for Velvet. We focus on the Italian subset of CulturaY [33], one of the public datasets seen by Velvet during the pre-training phase.</p>
          <p>We focus on potentially harmful prompts, since our main objective is to study the feasibility of protecting against TDE rather than assessing their accuracy. To do that, we define the following protocol. We filter all documents in the dataset that contain at least one email address. Then, once we obtain only documents containing PII, we prepare batches of potential TDE attack prompts of different lengths k ∈ {50, 100, 200}, by selecting the k tokens preceding the identified PII. After obtaining a set of potential attacks, we deduplicate similar prompts. In order to select effective attacks, we prompt Velvet-2B with the collected attacks and induce the model to generate 100 tokens: if the email address generated by the model for a given prompt is the one expected as in the training data, we add it to the set of TDE attacks.</p>
          <p>Sample for Computing PME Editing Statistics. An important step required by PME to perform the desired edit is the uncentered covariance statistic C_0^(l) described in Eq. 7. This is an estimation of the keys stored in the corresponding l-th FFN layer, so we need to build an empirical sample of vector inputs for the layer, which are obtained by feeding the LLM with sample texts. Since we are dealing with a multilingual LLM trained on both English and Italian texts, we prepare two samples of 100k documents each from the English and Italian Wikipedia subsets of the pre-training data used for Velvet-2B. The purpose of these samples is to understand the effects on the editing performance of C_0 computed on different languages.</p>
        </sec>
        <sec id="sec-3-5-2">
          <title>4.2. Application of PME</title>
          <p>Mitigating Privacy Leakage. Our strategy is to prevent Velvet from generating memorized PIIs during inference by applying PME to Velvet on the identified TDE attacks reported in Section 4.1. PME allows editing the knowledge of a PII associated with multiple memorized prompts by modifying the LLM's parameters directly.</p>
          <p>
            The main advantage of this method is that we can edit on the TDE attacks directly and there is no need to specify which layers are the target of the edit, unlike methods such as MEMIT [
            <xref ref-type="bibr" rid="ref10">26</xref>
            ]. Based on this, for every attack (c, s) with s = ℳ(c), c the attack prompt and s the leaked PII, we use PME to edit the knowledge encoded in Velvet's FFN layers to force the new association (c, s'), where s' is the new dummy PII mail@domain.com, which is semantically similar to the original PII. With this method, our objective is to reduce the accuracy of the attacks, modifying the prediction of the LLM to prevent the generation of the leaked information.
          </p>
          <p>We perform the editing process with an approach called sequential batch editing [12, 13], in which several prompts are edited in multiple steps, with a batch of multiple examples edited at each step. For our experiments, we fixed the batch size to 16.</p>
          <p>
            Computing Multilingual C_0 for PME. PME [13], ROME [
            <xref ref-type="bibr" rid="ref9">25</xref>
            ] and MEMIT [
            <xref ref-type="bibr" rid="ref10">26</xref>
            ] require a representation of the keys K_0 stored in the l-th FFN layer to apply the formula defined in Eq. 6, which can be modeled as the quantity C_0^(l) defined in Eq. 7. This quantity is obtained by computing an uncentered covariance statistic on an empirical sample of vector inputs to the layer when parsing a sample of documents. For our experiments, we prepare three types of C_0 for PME on the text samples described in Section 4.1:
            • IT: computed on the Italian sample;
            • EN: computed on the English sample;
            • multi: computed on the English and Italian samples combined.
          </p>
          <p>
            We compute these statistics for all the FFN layers of Velvet following the same procedure carried out by Meng et al. [
            <xref ref-type="bibr" rid="ref10">26</xref>
            ].
          </p>
          <p>This statistic plays a crucial role in Eq. 6, as it allows us to determine the interaction between the new keys and the knowledge stored in that layer. An effective computation of this statistic is necessary to obtain effective edits, and we empirically explore how different estimates of C_0 may affect the edit in a multilingual setting.</p>
        </sec>
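<p>
A minimal sketch of the statistic in Eq. 7: accumulate the uncentered second moment of the vectors entering an FFN layer over a text sample. Here layer_inputs is a random stand-in for hidden vectors collected from the model, not an actual Velvet API, and the sizes are toy assumptions.
</p>
<preformat>
```python
import numpy as np

rng = np.random.default_rng(3)
d, n_samples, lam = 16, 1000, 1.0   # toy hidden size, sample count, scaling factor

# stand-in for the hidden vectors entering one FFN layer over a document sample
layer_inputs = rng.normal(size=(n_samples, d))

# C0 = lam * E[k k^T]: the uncentered second-moment statistic of Eq. 7
C0 = lam * (layer_inputs.T @ layer_inputs) / n_samples

assert C0.shape == (d, d)
assert np.allclose(C0, C0.T)        # an uncentered covariance is symmetric
```
</preformat>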
      </sec>
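<p>
Sequential batch editing, as used here with batch size 16, reduces to a plain loop in which each step edits one batch of attacks starting from the already-edited model. In this sketch, apply_pme_edit is a hypothetical stand-in for one PME update over a batch.
</p>
<preformat>
```python
def sequential_batch_editing(model, attacks, apply_pme_edit, batch_size=16):
    """Edit several (prompt, PII) attacks in multiple steps: at each step a
    batch of examples is edited, and the next batch starts from the
    already-edited model."""
    for start in range(0, len(attacks), batch_size):
        batch = attacks[start : start + batch_size]
        model = apply_pme_edit(model, batch)   # one sequential editing step
    return model

# toy check: count how many editing steps a dummy "model" goes through
steps = sequential_batch_editing(0, list(range(40)), lambda m, b: m + 1)
assert steps == 3   # 40 attacks with batch size 16 take 3 steps
```
</preformat>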
      <sec id="sec-3-6">
        <title>4.3. Evaluation</title>
        <p>Post-Edit Attack Accuracy PME effectively protects
the privacy in Velvet if the parameter edit reduces the
number of successful TDE attacks against the model.
Therefore, the effectiveness of our approach is assessed
by measuring the post-edit privacy leakage effects and
comparing them with the ones of the pre-edit model.</p>
        <p>We adopted the same measure used by Ruzzetti et al.
[13], that is, the Attack Accuracy for memorization
attacks. After we edit Velvet for TDE attacks of context
lengths in {50, 100, 200}, we measure the Attack
Accuracy of post-edit models and compare their scores
with the ones of the pre-edit version of Velvet. We feed
the TDE prompts to both the post-edit and pre-edit
versions of Velvet, and then let them generate 100 tokens:
if the generated text for an attack contains the expected
PII, then the attack is considered successful.</p>
        <p>Post-Edit Multilingual Generation Capabilities An
important aspect of model editing methods is that they
are designed to modify specific knowledge of LLMs, while
preserving the non-related generative capabilities of the
model. For this reason, we need to determine whether
the editing had a negative impact on the multilingual
generative capabilities of our LLM, thus affecting its skills
in non-related tasks.</p>
        <p>We adopt an automatic evaluation strategy similar to
the one used by Venditti et al. [12] to measure the
reliability of our post-edit models. We compare the generation
capabilities of the post-edit and pre-edit versions of
Velvet by measuring the similarity of generated texts on a
sample of prompts in terms of BLEU [34] and METEOR
[35] scores. For comparison, we consider the subsequent
50 tokens generated by each model after receiving in
input the first 100 tokens of each prompt of our sample.</p>
        <p>We perform the evaluation on a sample of 500 prompts
for the English and Italian languages, which is defined
as follows:</p>
        <p>• English sample: 100 prompts from the Books3,
Wikipedia-en, and Pile-CC subsets of the Pile,
respectively;
• Italian sample: 100 prompts from Clean-C4 and
Wikipedia-it, respectively.</p>
        <p>The composition of this sample allows us to have an
indication of the impact of PME editing on the post-edit
language capabilities of Velvet.</p>
        <p>We also extend the utility evaluation by measuring the
post-edit accuracy of Velvet on LAMBADA [36], one of
the tasks included in the EleutherAI Language Model
Evaluation Harness [37]. LAMBADA is used to measure the
accuracy of a model in generating the missing target word
from a passage given in input. For the evaluation, we
focus on the full test split of the dataset to measure the
reliability of the edit. Since we are interested in evaluating
the preservation of the post-edit multilingual capabilities
of the model, we use both the English and Italian versions
of the dataset.</p>
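        <p>The Attack Accuracy measure described above can be sketched as follows; model_generate is a hypothetical stand-in for prompting the model and decoding 100 tokens, not the authors' actual interface:</p>
```python
# Hedged sketch of Attack Accuracy: the fraction of TDE attacks for which
# the model's continuation contains the expected PII.

def attack_accuracy(model_generate, attacks):
    """attacks: list of (prompt, expected_pii); model_generate: prompt -> text."""
    successes = sum(
        1 for prompt, pii in attacks
        if pii in model_generate(prompt)  # substring check on the generation
    )
    return successes / len(attacks)

# Toy model that "memorized" one address.
memorized = {"Contact me at": "alice@example.com"}
fake_generate = lambda p: memorized.get(p, "no leak here")
attacks = [("Contact me at", "alice@example.com"),
           ("Write to", "bob@example.com")]
print(attack_accuracy(fake_generate, attacks))  # 0.5
```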
      </sec>
    </sec>
    <sec id="sec-4">
      <title>5. Results and Discussion</title>
      <sec id="sec-4-1">
        <title>5.1. Editing reduces Privacy Risks</title>
        <sec id="sec-4-1-1">
          <p>As we observed during the extraction and filtering phase
of TDE attacks (see Sec. 4.1), Velvet memorized some PII
contained in the pre-training data. For different context
lengths in {50, 100, 200}, we obtained 83, 380, and
34 leaked email addresses, respectively, with the same
number of memorized prompts. Surprisingly, contexts of
200 tokens yielded fewer leaked PII than shorter prompts.
In this phase, we observe that a slightly different prompt
composition might affect the results: so in pre- and post-edit
we adopt the same batch size and batch composition,
to ensure the reproducibility of the results.</p>
          <p>The results reported in Table 1 show that PME is
effective in reducing the risks of privacy leakage. The
post-edit versions of Velvet for contexts 50 and 100 are
more robust than the pre-edit model, leaking fewer than
9 and 16 PII with respect to the 75 and 341 leaked by the
pre-edit Velvet. The effect is similar for all the versions
of C₀ used by PME for editing, with minimal differences
among them: in fact, the difference is of 4 more leaked
PII at best for context 100.</p>
          <p>The number of leaked email addresses is reduced even
for context-200 attacks, where post-edit Velvet leaked
17 PII instead of the 31 of the pre-edit model. However, the
reduction here is lower compared with the other attacks,
probably due to the lower number of PII extracted during
the data processing phase.</p>
          <p>Note that the results also show that the model tends to
generate a large number of email addresses in general,
which are different from the correct ones. These different
email addresses could be the model's hallucinations, or
email addresses that follow the original one in the pre-training
corpus. However, results in terms of successfully
leaked PII suggest that PME is still sufficiently effective
in preserving privacy on edited prompts.</p>
          <p>Finally, we observe that the different statistics
computed as an approximation of C₀ do not greatly affect the
post-edit attack accuracy, with a rather similar number
of leaked PII in each configuration.</p>
        </sec>
      </sec>
      <sec id="sec-4-2">
        <title>5.2. Generation Capabilities are Preserved</title>
        <p>The results reported in Table 2 show that BLEU and
METEOR scores are high in general for all the different
versions of C₀ and attacks used for editing, and the same
observation holds for both English and Italian generation
capabilities. The overall high scores suggest that
the generations of post-edit models are quite similar to
the generated texts of the pre-edit model. This aspect, as
discussed in [12], suggests that the edit is robust, because
it does not interfere with multilingual capabilities in either
the English or the Italian language.</p>
        <p>Interestingly, the scores show that there is no real
consensus on the type of statistics that is the best for the
English language, since the highest scores are shared
between the EN and multi C₀. However, we note that
the IT version of C₀ obtains lower scores than the other
two versions in general, suggesting that the IT statistics
leads to a less effective preservation of Velvet's generation
capabilities for English.</p>
        <p>Observing the evaluation results for Italian, we notice
that the IT version of C₀ achieves higher BLEU and
METEOR scores, suggesting that this version is necessary to
preserve the generation capabilities of Velvet for Italian.
Also, we note that the EN version of C₀ tends to achieve
lower scores with respect to the other types, indicating
that this C₀ is less effective for preserving the abilities
for Italian.</p>
        <p>In general, the observed results indicate that using
versions of C₀ computed on a language different from the
target one is less effective for preserving the generative
capabilities of the target language in post-edit. In fact, the
IT version of C₀ obtained lower scores for the English
language, and the EN version of C₀ was less effective for
the Italian language. Thus, these experiments suggest
that C₀ should be computed on samples containing texts
in the target languages.</p>
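        <p>As a toy illustration of this similarity check between pre-edit and post-edit continuations, a simplified unigram-precision proxy is shown below; the actual evaluation uses BLEU [34] and METEOR [35] on 50-token continuations:</p>
```python
# Minimal sketch of the utility check: compare the pre-edit and post-edit
# continuations of the same prompt. A simple unigram-precision proxy is
# used here for self-containment; it is not BLEU or METEOR.

def unigram_precision(reference, hypothesis):
    """Fraction of hypothesis tokens that also appear in the reference."""
    ref_tokens = hypothesis_tokens = None
    ref_tokens = reference.split()
    hypothesis_tokens = hypothesis.split()
    if not hypothesis_tokens:
        return 0.0
    matches = sum(1 for t in hypothesis_tokens if t in ref_tokens)
    return matches / len(hypothesis_tokens)

pre_edit = "the cat sat on the mat"    # continuation by the pre-edit model
post_edit = "the cat sat on a mat"     # continuation by the post-edit model
score = unigram_precision(pre_edit, post_edit)
print(round(score, 3))  # 5 of 6 tokens overlap, i.e. 0.833
```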
        <p>About task performance, the results of the LAMBADA
benchmark reported in Table 2 corroborate the utility
preservation already observed with the previous evaluation
analysis. The accuracy scores of post-edit models are
comparable with the pre-edit ones, suggesting that the
edits performed by PME do not considerably affect the
capabilities of the model. The same observation holds for
both the English and Italian versions of LAMBADA.
Differently from the previous analysis, there are no noticeable
losses in terms of performance with respect to the
version of C₀ used for the editing, except for the Italian
score of context-100 editing with EN C₀, which is lower
than the pre-edit score (42.1 vs. 45.2). Hence, this result
indicates that the edits performed by PME are reliable in
general, allowing privacy protection of Velvet for Italian data
without loss of task performance.</p>
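        <p>The LAMBADA-style accuracy referenced above can be illustrated with the following sketch; the toy predictor is hypothetical and stands in for the model's completion of the missing final word:</p>
```python
# Sketch of LAMBADA-style scoring (our illustration, not the Evaluation
# Harness code): the model must produce the missing final word of a passage.

def lambada_accuracy(predict_last_word, examples):
    """examples: list of (passage_without_last_word, target_word)."""
    correct = sum(1 for ctx, target in examples if predict_last_word(ctx) == target)
    return correct / len(examples)

examples = [("She opened the door and saw her", "dog"),
            ("He poured coffee into his", "cup")]
toy_model = lambda ctx: "dog" if "door" in ctx else "mug"  # hypothetical predictor
print(lambada_accuracy(toy_model, examples))  # 0.5
```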
      </sec>
    </sec>
    <sec id="sec-5">
      <title>6. Conclusions and Future Work</title>
      <sec id="sec-5-1">
        <p>In this work, we show an application of model editing for protecting the privacy of Italian data on Velvet-2B, a multilingual model trained on both Italian and English data.</p>
        <p>Our method is based on a recent model editing
technique named Private Memorization Editing (PME), which
prevents LLMs from generating memorized PII that might be
included in the training data. The results of our experiments
on privacy protection for email addresses show that
model editing is effective in reducing the privacy risks
of Velvet, thus reducing the success of Training Data
Extraction (TDE) attacks: harmful prompts obtained from
the training data that are effective for extracting private
information from the original model. In addition, we
show that our approach mitigates the privacy risks while
preserving the model's multilingual generation
capabilities.</p>
        <p>In conclusion, our approach shows that we can adapt
and apply model editing techniques for privacy
protection in multilingual LLMs for Italian data.</p>
        <p>For future work, several aspects could further improve
this work. First, our approach
should be extended to types of PII other than
email addresses, and further investigation is necessary to
understand the effects of the approach with different PII.</p>
        <p>Another aspect to consider is how well PME scales with
larger models such as Velvet-14B: this model requires
additional investigation, because it manages languages
other than English and Italian, and the magnitude of the
data used for its training is larger than the one used
for Velvet-2B. Finally, the evaluation of Velvet's
post-edit capabilities should be extended to other tasks
of the Language Model Evaluation Harness [37] or other
benchmarks, and include human evaluation to have a
better perspective on the overall quality of post-edit models
instead of relying exclusively on automatic metrics.</p>
        <p>[6] M. Nasr, N. Carlini, J. Hayase, M. Jagielski, A. F. Cooper, D. Ippolito, C. A. Choquette-Choo, E. Wallace, F. Tramèr, K. Lee, Scalable extraction of training data from (production) language models, arXiv preprint arXiv:2311.17035 (2023).</p>
        <p>[7] T. Nguyen, C. V. Nguyen, V. D. Lai, H. Man, N. T. Ngo, F. Dernoncourt, R. A. Rossi, T. H. Nguyen, CulturaX: A cleaned, enormous, and multilingual dataset for large language models in 167 languages, in: N. Calzolari, M.-Y. Kan, V. Hoste, A. Lenci, S. Sakti, N. Xue (Eds.), Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), ELRA and ICCL, Torino, Italia, 2024, pp. 4226–4237. URL: https://aclanthology.org/2024.lrec-main.377.</p>
        <p>[8] L. Gao, S. Biderman, S. Black, L. Golding, T. Hoppe, C. Foster, J. Phang, H. He, A. Thite, N. Nabeshima, S. Presser, C. Leahy, The Pile: An 800GB dataset of diverse text for language modeling, 2020. arXiv:2101.00027.</p>
        <p>[9] Y. Yao, X. Xu, Y. Liu, Large language model unlearning, 2024. URL: https://arxiv.org/abs/2310.10683. arXiv:2310.10683.</p>
        <p>[10] A. Kassem, O. Mahmoud, S. Saad, Preserving privacy through dememorization: An unlearning technique for mitigating memorization risks in language models, in: H. Bouamor, J. Pino, K. Bali (Eds.), Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, Singapore, 2023, pp. 4360–4379. URL: https://aclanthology.org/2023.emnlp-main.265. doi:10.18653/v1/2023.emnlp-main.265.</p>
        <p>[11] X. Wu, J. Li, M. Xu, W. Dong, S. Wu, C. Bian, D. Xiong, DEPN: Detecting and editing privacy neurons in pretrained language models, in: H. Bouamor, J. Pino, K. Bali (Eds.), Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, Singapore, 2023, pp. 2875–2886. URL: https://aclanthology.org/2023.emnlp-main.174. doi:10.18653/v1/2023.emnlp-main.174.</p>
        <p>[12] D. Venditti, E. S. Ruzzetti, G. A. Xompero, C. Giannone, A. Favalli, R. Romagnoli, F. M. Zanzotto, Enhancing data privacy in large language models through private association editing, 2024. URL: https://arxiv.org/abs/2406.18221. arXiv:2406.18221.</p>
        <p>[13] E. S. Ruzzetti, G. A. Xompero, D. Venditti, F. M. Zanzotto, Private memorization editing: Turning memorization into a defense to strengthen data privacy in large language models, in: W. Che, J. Nabende, E. Shutova, M. T. Pilehvar (Eds.), Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Association for Computational Linguistics, Vienna, Austria, 2025, pp. 16572–16592. URL: https://aclanthology.org/2025.acl-long.810/.</p>
        <p>[14] S. Biderman, U. Prashanth, L. Sutawika, H. Schoelkopf, Q. Anthony, S. Purohit, E. Raff, Emergent and predictable memorization in large language models, Advances in Neural Information Processing Systems 36 (2023) 28072–28090.</p>
        <p>[15] F. Ranaldi, E. S. Ruzzetti, D. Onorati, L. Ranaldi, C. Giannone, A. Favalli, R. Romagnoli, F. M. Zanzotto, Investigating the impact of data contamination of large language models in text-to-SQL translation, in: L.-W. Ku, A. Martins, V. Srikumar (Eds.), Findings of the Association for Computational Linguistics: ACL 2024, Association for Computational Linguistics, Bangkok, Thailand, 2024, pp. 13909–13920. URL: https://aclanthology.org/2024.findings-acl.827/. doi:10.18653/v1/2024.findings-acl.827.</p>
        <p>[16] H. Kiyomaru, I. Sugiura, D. Kawahara, S. Kurohashi, A comprehensive analysis of memorization in large language models, in: S. Mahamood, N. L. Minh, D. Ippolito (Eds.), Proceedings of the 17th International Natural Language Generation Conference, Association for Computational Linguistics, Tokyo, Japan, 2024, pp. 584–596. URL: https://aclanthology.org/2024.inlg-main.45/.</p>
        <p>[17] B. Yan, K. Li, M. Xu, Y. Dong, Y. Zhang, Z. Ren, X. Cheng, On protecting the data privacy of large language models (LLMs): A survey, arXiv preprint arXiv:2403.05156 (2024).</p>
        <p>[18] A. Verma, S. Krishna, S. Gehrmann, M. Seshadri, A. Pradhan, T. Ault, L. Barrett, D. Rabinowitz, J. Doucette, N. Phan, Operationalizing a threat model for red-teaming large language models (LLMs), arXiv preprint arXiv:2407.14937 (2024).</p>
        <p>[19] F. Perez, I. Ribeiro, Ignore previous prompt: Attack techniques for language models, in: NeurIPS ML Safety Workshop, 2022.</p>
        <p>[20] X. Shen, Z. Chen, M. Backes, Y. Shen, Y. Zhang, "Do anything now": Characterizing and evaluating in-the-wild jailbreak prompts on large language models, in: Proceedings of the 2024 ACM SIGSAC Conference on Computer and Communications Security, 2024, pp. 1671–1685.</p>
        <p>[21] M. Geva, R. Schuster, J. Berant, O. Levy, Transformer feed-forward layers are key-value memories, in: M.-F. Moens, X. Huang, L. Specia, S. W.-t. Yih (Eds.), Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, Online and Punta Cana, Dominican Republic, 2021, pp. 5484–5495. URL: https://aclanthology.org/2021.emnlp-main.446. doi:10.18653/v1/2021.emnlp-main.446.</p>
        <p>Declaration on Generative AI
During the preparation of this work, the author(s) used Grammarly in order to: Grammar and
spelling check. After using these tool(s)/service(s), the author(s) reviewed and edited the content as
needed and take(s) full responsibility for the publication’s content.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>R.</given-names>
            <surname>Orlando</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Moroni</surname>
          </string-name>
          , P.-L. Huguet
          <string-name>
            <surname>Cabot</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <string-name>
            <surname>Conia</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          <string-name>
            <surname>Barba</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <string-name>
            <surname>Orlandini</surname>
            , G. Fiameni,
            <given-names>R.</given-names>
          </string-name>
          <string-name>
            <surname>Navigli</surname>
          </string-name>
          , Minerva LLMs:
          <article-title>The first family of large language models trained from scratch on Italian data</article-title>
          , in: F.
          <string-name>
            <surname>Dell'Orletta</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Lenci</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <string-name>
            <surname>Montemagni</surname>
          </string-name>
          , R. Sprugnoli (Eds.),
          <source>Proceedings of the 10th Italian Conference on Computational Linguistics (CLiCit</source>
          <year>2024</year>
          ), CEUR Workshop Proceedings, Pisa, Italy,
          <year>2024</year>
          , pp.
          <fpage>707</fpage>
          -
          <lpage>719</lpage>
          . URL: https://aclanthology.org/
          <year>2024</year>
          .clicit-
          <volume>1</volume>
          .77/.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>M.</given-names>
            <surname>Miranda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. S.</given-names>
            <surname>Ruzzetti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Santilli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. M.</given-names>
            <surname>Zanzotto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bratières</surname>
          </string-name>
          , E. Rodolà,
          <article-title>Preserving privacy in large language models: A survey on current threats and solutions</article-title>
          ,
          <source>Transactions on Machine Learning Research</source>
          (
          <year>2025</year>
          ). URL: https://openreview.net/forum? id=
          <fpage>Ss9MTTN7OL</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>N.</given-names>
            <surname>Carlini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Tramer</surname>
          </string-name>
          , E. Wallace,
          <string-name>
            <given-names>M.</given-names>
            <surname>Jagielski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Herbert-Voss</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Roberts</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Brown</surname>
          </string-name>
          , D. Song,
          <string-name>
            <given-names>U.</given-names>
            <surname>Erlingsson</surname>
          </string-name>
          , et al.,
          <article-title>Extracting training data from large language models</article-title>
          ,
          <source>in: 30th USENIX Security Symposium (USENIX Security 21)</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>2633</fpage>
          -
          <lpage>2650</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>N.</given-names>
            <surname>Carlini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Ippolito</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Jagielski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Tramer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <source>Quantifying memorization across neural language models</source>
          ,
          <year>2023</year>
          . arXiv:
          <volume>2202</volume>
          .
          <fpage>07646</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5] J. Huang, H. Shao, K. C.-C. Chang, Are large pre-trained language models leaking your personal information?, in: Y. Goldberg, Z. Kozareva, Y. Zhang (Eds.), Findings of the Association for Computational Linguistics: EMNLP 2022, Association for Computational Linguistics, Abu Dhabi, United Arab Emirates, 2022, pp. 2038–2047. URL: https://aclanthology.org/2022.findings-emnlp.148. doi:10.18653/v1/2022.findings-emnlp.148.
          [32] J. Ferrando, G. Sarti, A. Bisazza, M. R. Costa-jussà, A primer on the inner workings of transformer-based language models, 2024. URL: https://arxiv.org/abs/2405.00208. arXiv:2405.00208.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [22] M. Geva, A. Caciularu, K. Wang, Y. Goldberg, Transformer feed-forward layers build predictions by promoting concepts in the vocabulary space, in: Y. Goldberg, Z. Kozareva, Y. Zhang (Eds.), Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, Abu Dhabi, United Arab Emirates, 2022, pp. 30–45. URL: https://aclanthology.org/2022.emnlp-main.3. doi:10.18653/v1/2022.emnlp-main.3.
          [33] H. N. Thuat Nguyen, T. Nguyen, CulturaY: A large cleaned multilingual dataset of 75 languages, 2024.
          [34] T. Glushkova, C. Zerva, A. F. T. Martins, BLEU meets COMET: Combining lexical and neural metrics towards robust machine translation evaluation, in: M. Nurminen, J. Brenner, M. Koponen, S. Latomaa, M. Mikhailov, F. Schierl, T. Ranasinghe, E. Vanmassenhove, S. A. Vidal, N. Aranberri, M. Nunziatini, C. P. Escartín, M. Forcada, M. Popovic, C. Scarton, H. Moniz (Eds.), Proceedings of the 24th Annual Conference of the European Association for Machine Translation, European Association for Machine Translation, Tampere, Finland, 2023, pp. 47–58. URL: https://aclanthology.org/2023.eamt-1.6.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [23] S. Sukhbaatar, J. Weston, R. Fergus, et al., End-to-end memory networks, Advances in Neural Information Processing Systems 28 (2015).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [24] Y. Yao, P. Wang, B. Tian, S. Cheng, Z. Li, S. Deng, H. Chen, N. Zhang, Editing large language models: Problems, methods, and opportunities, 2023. arXiv:2305.13172.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [25] K. Meng, D. Bau, A. Andonian, Y. Belinkov, Locating and editing factual associations in GPT, 2023. arXiv:2202.05262.
          [35] S. Banerjee, A. Lavie, METEOR: An automatic metric for MT evaluation with improved correlation with human judgments, in: J. Goldstein, A. Lavie, C.-Y. Lin, C. Voss (Eds.), Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization, Association for Computational Linguistics, Ann Arbor, Michigan, 2005, pp. 65–72. URL: https://aclanthology.org/W05-0909.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [26] K. Meng, A. S. Sharma, A. Andonian, Y. Belinkov, D. Bau, Mass-editing memory in a transformer, 2023. arXiv:2210.07229.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>T.</given-names>
            <surname>Kohonen</surname>
          </string-name>
          ,
          <article-title>Correlation matrix memories</article-title>
          , IEEE rization, Association for Computational LinguisTransactions on Computers C-
          <volume>21</volume>
          (
          <year>1972</year>
          )
          <fpage>353</fpage>
          -
          <lpage>359</lpage>
          . tics, Ann Arbor, Michigan,
          <year>2005</year>
          , pp.
          <fpage>65</fpage>
          -
          <lpage>72</lpage>
          . URL: URL: https://api.semanticscholar.org/CorpusID: https://aclanthology.org/W05-0909.
          <fpage>21483100</fpage>
          . [36]
          <string-name>
            <given-names>D.</given-names>
            <surname>Paperno</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Kruszewski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Lazaridou</surname>
          </string-name>
          ,
          <string-name>
            <surname>N. Q.</surname>
          </string-name>
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>V.</given-names>
            <surname>Patil</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Hase</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bansal</surname>
          </string-name>
          , Can sensitive in- Pham,
          <string-name>
            <given-names>R.</given-names>
            <surname>Bernardi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Pezzelle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Baroni</surname>
          </string-name>
          ,
          <string-name>
            <surname>G.</surname>
          </string-name>
          <article-title>Boleda, formation be deleted from llms? objectives R. Fernández, The LAMBADA dataset: Word predicfor defending against extraction attacks, 2023. tion requiring a broad discourse context</article-title>
          , in: K. Erk, arXiv:
          <fpage>2309</fpage>
          .17410. N. A.
          <string-name>
            <surname>Smith</surname>
          </string-name>
          (Eds.),
          <source>Proceedings of the 54th An-</source>
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [29]
          <string-name>
            <surname>T.-Y. Chang</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Thomason</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          <string-name>
            <surname>Jia</surname>
          </string-name>
          ,
          <article-title>Do localiza- nual Meeting of the Association for Computational tion methods actually localize memorized data in Linguistics (Volume 1: Long Papers), Association LLMs? a tale of two benchmarks</article-title>
          , in: K. Duh, for Computational Linguistics, Berlin, Germany,
          <string-name>
            <given-names>H.</given-names>
            <surname>Gomez</surname>
          </string-name>
          , S. Bethard (Eds.),
          <source>Proceedings of the 2016</source>
          , pp.
          <fpage>1525</fpage>
          -
          <lpage>1534</lpage>
          . URL: https://aclanthology.org/ 2024 Conference of the North American Chapter P16-1144. doi:
          <volume>10</volume>
          .18653/v1/
          <fpage>P16</fpage>
          -1144. of the Association for Computational Linguistics: [37]
          <string-name>
            <given-names>L.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Tow</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Abbasi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Biderman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Black</surname>
          </string-name>
          ,
          <article-title>Human Language Technologies (Volume 1: Long A</article-title>
          . DiPofi,
          <string-name>
            <given-names>C.</given-names>
            <surname>Foster</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Golding</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Hsu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Le Noac'h</surname>
          </string-name>
          , Papers), Association for Computational Linguis
          <string-name>
            <surname>- H. Li</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          <string-name>
            <surname>McDonell</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          <string-name>
            <surname>Muennighof</surname>
          </string-name>
          , C. Ociepa, tics, Mexico City, Mexico,
          <year>2024</year>
          , pp.
          <fpage>3190</fpage>
          -
          <lpage>3211</lpage>
          . J.
          <string-name>
            <surname>Phang</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          <string-name>
            <surname>Reynolds</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          <string-name>
            <surname>Schoelkopf</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Skowron</surname>
          </string-name>
          , URL: https://aclanthology.org/
          <year>2024</year>
          .
          <article-title>naacl-long</article-title>
          .
          <volume>176</volume>
          /. L.
          <string-name>
            <surname>Sutawika</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          <string-name>
            <surname>Tang</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Thite</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          <string-name>
            <surname>Wang</surname>
          </string-name>
          , doi:10.18653/v1/
          <year>2024</year>
          .
          <article-title>naacl-long.176. A. Zou, A framework for few-shot language model</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>P.</given-names>
            <surname>Hase</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bansal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ghandeharioun</surname>
          </string-name>
          , evaluation,
          <year>2024</year>
          . URL: https://zenodo.org/records/ Does localization inform editing? surprising dif-
          <volume>12608602</volume>
          . doi:
          <volume>10</volume>
          .5281/zenodo.12608602.
          <article-title>ferences in causality-based localization vs. knowledge editing in language models</article-title>
          ,
          <year>2023</year>
          . URL: https: //arxiv.org/abs/2301.04213. arXiv:
          <volume>2301</volume>
          .
          <fpage>04213</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [31]
          <string-name>
            <given-names>T.</given-names>
            <surname>Mickus</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Paperno</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Constant</surname>
          </string-name>
          ,
          <article-title>How to dissect a Muppet: The structure of transformer embedding spaces</article-title>
          ,
          <source>Transactions of the Association for Computational Linguistics</source>
          <volume>10</volume>
          (
          <year>2022</year>
          )
          <fpage>981</fpage>
          -
          <lpage>996</lpage>
          . URL: https://aclanthology.org/
          <year>2022</year>
          .tacl-
          <volume>1</volume>
          .
          <fpage>57</fpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>