<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Remember to Forget: A Study on Verbatim Memorization of Literature in Large Language Models</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Xinhao Zhang</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Olga Seminck</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pascal Amsili</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Lattice (UMR 8094, CNRS, ENS-PSL, Sorbonne Nouvelle)</institution>
          ,
          <addr-line>1 rue Maurice Arnoux, 92120 Montrouge</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
      </contrib-group>
      <fpage>961</fpage>
      <lpage>981</lpage>
      <abstract>
        <p>We examine the extent to which English and French literature is memorized by freely accessible LLMs, using a name cloze inference task (which focuses on the model's ability to recall proper names from a book). We replicate the key findings of previous research conducted with OpenAI models, concluding that, overall, the degree of memorization is low. Factors that tend to enhance memorization include the absence of copyright, belonging to the Fantasy or Science Fiction genres, and the work's popularity on the Internet. Delving deeper into the experimental setup using the open source model Olmo and its freely available corpus Dolma, we conducted a study on the evolution of memorization during the LLM's training phase. Our findings suggest that excerpts of a book online can result in some level of memorization, even if the full text is not included in the training corpus. This observation leads us to conclude that the name cloze inference task is insufficient to definitively determine whether copyright violations have occurred during the training process of an LLM. Furthermore, we highlight certain limitations of the name cloze inference task, particularly the possibility that a model may recognize a book without memorizing its text verbatim. In a pilot experiment, we propose an alternative method that shows promise for producing more robust results.</p>
      </abstract>
      <kwd-group>
        <kwd>memorization</kwd>
        <kwd>Large Language Models</kwd>
        <kwd>membership inference attacks</kwd>
        <kwd>literature</kwd>
        <kwd>cloze task</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>CHR 2024: Computational Humanities Research Conference, December 4–6, 2024, Aarhus, Denmark
∗Corresponding author.
†</p>
    </sec>
    <sec id="sec-2">
      <title>1. Introduction</title>
      <p>
        The emergence of Large Language Models (LLMs) has advanced the field of Natural Language
Processing (NLP) significantly. Successive models have consistently set new records on
language understanding benchmarks [
        <xref ref-type="bibr" rid="ref22 ref6">36, 35, 22</xref>
        ]. Notably, LLMs can now tackle a broad range
of tasks, allowing a single, general-purpose model to handle many NLP tasks. In the past, this
required specialized models for each specific task. This shift has significantly increased the
accessibility of NLP techniques, even for those without a specialized background. The ability
to interact with LLMs through natural language, particularly via chat interfaces, has partially
eliminated the need for programming knowledge.
      </p>
      <p>These features have made LLMs ubiquitous, enabling their use for a wide range of purposes,
including within the field of Digital Humanities, where they offer new perspectives. In addition
to their ability to focus on specific tasks by learning from data curated by researchers [e.g. 16,
11], they also come equipped with pre-built knowledge and can be used even when there are
no, or very few, specific data at hand: the so-called zero-shot learning framework [e.g. 21, 5].</p>
      <p>While the knowledge acquired during the training phase enables an LLM to function with
few or no additional training data, this pre-training practice also presents several drawbacks
and risks. One of the primary issues is that we lack a clear understanding of the specific
knowledge these models possess, even though this knowledge is of course crucial for accomplishing the tasks
we give them.</p>
      <p>The primary reason for this issue is that, for nearly all models, the specific data used for
training remain unknown. When models are made available on platforms such as Hugging
Face, users can typically access the model weights, but the training corpus itself is often not
disclosed.</p>
      <p>
        The second reason is that the actual learning process of such models is largely unknown,
particularly regarding what determines whether certain data are remembered or forgotten. During
training, billions of parameters are automatically adjusted within the model’s neural network,
and once this process is complete, it becomes impossible to interpret the activity of individual
neurons. In this regard, these models are often referred to as “black boxes”: the processes that
generate a model’s response to a user’s task or question are virtually impossible to interpret.
The main way to get an idea of a model’s knowledge is to query it systematically and analyze
its answers, but it still remains to be seen to what extent this allows us to get a full view of the
knowledge. After all, even a slight change in the user’s input can lead to significant variations
in the results [
        <xref ref-type="bibr" rid="ref3">13</xref>
        ] and some models’ outputs are not stable anyway (non-determinism).
      </p>
      <p>Lacking a clear understanding of LLMs’ knowledge presents a significant obstacle to their
use in the field of Digital Humanities. We concur with Underwood [33] that a model’s
knowledge carries with it a certain world view and, consequently, a view of culture. When querying
a model about literature, the texts included in its training corpus play a crucial role, as they
fundamentally shape its understanding of the subject [12]. Questions regarding aesthetics, style,
poetics, and so on will yield responses colored by the specific literature the model was trained
on. Furthermore, it is essential to assess what a model retains from the books encountered
during its training phase.</p>
      <p>These questions are important not only in the context of literary research, but also for
copyright compliance. If work covered by copyright is, unfortunately, in the training data, it is
important to be able to estimate to what extent it can be reproduced.</p>
      <p>In this paper, we aim to address the extent to which literature is memorized by LLMs and the
factors that contribute to this memorization. Additionally, we investigate whether it is possible
to determine if work protected by copyright is in the training data of LLMs.</p>
      <p>Our starting point is Chang, Cramer, Soni, and Bamman’s study8][who used a name cloze
task to determine to what extent OpenAI’s ChatGPT and GPT4 models are able to reproduce
literary works verbatim (word for word). We applied the same method with freely accessible
models, for English and French literature. In addition, we conducted a number of
supplementary studies to gain a deeper understanding of the memorization process during training as
well as the possible influence of the practice of prompting.</p>
    </sec>
    <sec id="sec-3">
      <title>2. Related Work</title>
      <p>
        Memorization in LLMs is generally defined as the verbatim reproduction of the training data
[
        <xref ref-type="bibr" rid="ref23 ref3">24, 3</xref>
        ]. The phenomenon is typically associated with overfitting [7
        <xref ref-type="bibr" rid="ref36">, 37</xref>
        ]. It has been found that
the following aspects can have a significant impact on memorization: data repetition in the
training corpus, the number of model parameters (more parameters leading to a higher degree
of memorization), and the number of tokens of context used to prompt the model [6].
      </p>
      <p>
        Memorization is undesirable for various reasons. The first — and the most extensively
studied by researchers — is that it includes privacy risks: generative models could disclose personal
information (including URLs, phone numbers, and addresses) in their output if it has been
memorized verbatim from the training data, making LLMs vulnerable to training data
extraction attacks [
        <xref ref-type="bibr" rid="ref29 ref3 ref6">6, 30, 3</xref>
        ]. In the case of fiction, the privacy risk is less salient, but it is important
that LLMs do not reproduce copyrighted material [15]. Furthermore, there are also risks of the
memorization of literature from the public domain: as D’Souza and Mimno [9] stated: ‘LLMs
are poised to perpetuate the echoic nature of the literary canon within a new digital context’. That
is to say: the view of what is literature and what is not will be more and more influenced by
how LLMs perceive it, because the number of applications of these models will only increase
in the future, not only in the domain of literary studies, but in the entire culture sector, where
decisions about what should be commercialized are increasingly data driven [34].
      </p>
      <p>
        Finally, in the context of literature, there is also the question of whether certain copyrighted
works have been used to train LLMs. Memorization provides a lever to answer this question:
if the model can be prompted to reproduce specific passages, it is an indication that the work
has been used during training. Prompting a model to discover which data were present in the
training set is called a membership inference attack [
        <xref ref-type="bibr" rid="ref31">32</xref>
        ]. Chang, Cramer, Soni, and Bamman
[
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] used this framework to study the verbatim memorization of literature by the LLMs of
OpenAI: ChatGPT and GPT4. They found a high degree of memorization for some copyrighted
works and an influence of the popularity of a book on the Internet with respect to the degree
of memorization (popular books were better memorized), but the effect of memorization on
downstream tasks remains equivocal. They expressed their concerns about the biases induced
by memorization for studies in the field of cultural analytics where LLMs are used. They
proposed the use of open models (with freely accessible training data) as a solution to the use of
LLMs in the field of Digital Humanities.
      </p>
      <p>In the remainder of this paper, we present the name cloze task proposed in [8], which we used
and adapted for English and French with a variety of freely available models (section 3.1); we
report and discuss the results that we obtained in section 3.3, along with several analyses of the
behaviour of the models depending on the copyright status, sub-genre, and popularity of the
works chosen to probe the models. We also present further studies that we ran to get a better
understanding of the learning, memorization and recalling processes. These are presented in
sections 3.4 and 4.</p>
    </sec>
    <sec id="sec-4">
      <title>3. Name cloze task</title>
      <p>
        3.1. Task
To assess the memorization of literary data by language models, Chang, Cramer, Soni, and
Bamman [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] formulated a membership inference attack task, which they call name cloze inference,
where models have to predict a proper name missing from a text passage. Unlike other
completion tasks focusing on predicting named entities [17
        <xref ref-type="bibr" rid="ref26">, 27</xref>
        ], the text passages used by Chang,
Cramer, Soni, and Bamman [8] contain no other named entities than the target name.
Therefore, this type of task tests the models’ ability to ‘remember’ very specific information from the
training data. By way of comparison, human performance on this task was assessed at 0% by
Chang, Cramer, Soni, and Bamman [8]: the contexts were not informative enough for humans
to guess the target names.
      </p>
      <p>
        The experiments presented in this section used the protocol of Chang, Cramer, Soni, and
Bamman [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. We used the prompt presented in Figure 1, which displays two examples (that did
not vary across items) followed by the target item.
3.2. Data
3.2.1. English
The items we used for the task were taken from Chang, Cramer, Soni, and Bamman [8] for the
English experiment (3.2.1), and we used a similar method to construct the items for the French
experiment (3.2.2).
      </p>
      <p>
        Chang, Cramer, Soni, and Bamman [8] created an item set by running the BookNLP1 pipeline
[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] on the literary corpus presented in Table 1 to extract passages with a proper name of the
type character and no other named entities. They then randomly sampled 100 passages per
book. Books with fewer than 100 passages were excluded from the experiment. In total, there
were 57,100 items.2 Two examples are given below:
(1)
a.
      </p>
      <p>There is but such a quantity of merit between them; just enough to make one good
sort of man; and of late it has been shifting about pretty much. For my part, I am
inclined to believe it all [MASK]’s; but you shall do as you choose.
1https://github.com/booknlp/booknlp
2Items generated from these books can be found in a github repository:
https://github.com/bamman-group/gpt4-books/tree/main/data/model_output/chatgpt_results
b.</p>
      <p>I would go and see her if I could have the carriage.” [MASK], feeling really anxious,
was determined to go to her, though the carriage was not to be had; and as she was
no horsewoman, walking was her only alternative.</p>
      <p>
        Items from the book Pride and Prejudice
3.2.2. French
The French item set was selected from the Chapitres corpus [23], which includes about 3,000
digitized books in French. Thanks to the fr-BookNLP pipeline [26], we were able to easily
extract passages from books and produce items in the same manner as Chang, Cramer, Soni,
and Bamman [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. Each of the items contains exactly one proper name of a character (named
entity of type PERSON) as a single token (see Example (2)).
(2)
a.
      </p>
      <p>Le campagnard, à ces mots, lâcha l’étui qu’il tournait entre ses doigts. Une saccade</p>
      <p>de ses épaules fit craquer le dossier de la chaise. Son chapeau tomba.– Je m’en
doutais, dit [MASK] en appliquant son doigt sur la veine.</p>
      <p>En passant auprès des portes, la robe d’[MASK], par le bas, s’ériflait au pantalon ;
leurs jambes entraient l’une dans l’autre ; il baissait ses regards vers elle, elle levait
les siens vers lui ; une torpeur la prenait, elle s’arrêtait.
Items from the book Madame Bovary
After excluding books with fewer than 100 generated elements, 2,459 books remained.
However, limiting the number of books is still necessary in order to avoid an excessive experiment
runtime. We selected 575 French books by balancing per genre, as shown in Table 2. For all
books, we also carried out a random selection of 100 items each.</p>
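      <p>The construction procedure described above can be sketched in a few lines of Python. This is an illustrative sketch only: the structure of passages_by_book and its (surface_form, entity_type) annotations stand in for the actual output format of the BookNLP and fr-BookNLP pipelines.

```python
import random

def build_items(passages_by_book, min_items=100, sample_size=100, seed=0):
    """Build name cloze items: mask the single PERSON name in each passage.

    passages_by_book maps a book title to a list of (text, entities) pairs,
    where entities is a list of (surface_form, entity_type) annotations
    (an illustrative stand-in for the BookNLP output format).
    """
    rng = random.Random(seed)
    items = {}
    for book, passages in passages_by_book.items():
        candidates = []
        for text, entities in passages:
            persons = [e for e, t in entities if t == "PERSON"]
            # Keep passages whose only named entity is a single-token PERSON.
            if len(entities) == 1 and len(persons) == 1 and " " not in persons[0]:
                name = persons[0]
                candidates.append({"input": text.replace(name, "[MASK]"),
                                   "answer": name})
        # Books with fewer than min_items usable passages are excluded.
        if len(candidates) >= min_items:
            items[book] = rng.sample(candidates, sample_size)
    return items
```

With min_items=100 and sample_size=100, this mirrors the filtering and sampling steps described above.</p>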
      <sec id="sec-4-1">
        <title>3.3. Replication</title>
        <p>In this section, we report on the replication of Chang, Cramer, Soni, and Bamman’s name cloze
inference task using freely accessible models. The data we used are described in the previous
subsection.</p>
        <p>
          Figure 2 captions: (a) English. The accuracies marked with an asterisk (*) are results reported by Chang, Cramer, Soni, and Bamman [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. (b) French. For CamemBERT or FlauBERT, [0] means that we only counted a hit if the highest-ranking
answer was the correct proper name. For the other versions, we considered that there was a hit if
the correct answer was among the top 5 highest-ranking answers.
        </p>
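        <p>The hit criteria used for the BERT-type models reduce to a simple top-k test over each model's ranked candidate list; a minimal sketch (the ranked lists themselves would come from the models' fill-mask output, which is not reproduced here):

```python
def accuracy(predictions, answers, top_k=1):
    """Fraction of items counted as hits.

    predictions: one ranked candidate list per item (most probable first),
    e.g. taken from a masked language model's fill-mask output;
    answers: the gold proper names. top_k=1 corresponds to the strict
    [0] setting; top_k=5 counts any gold name in the top 5 as a hit.
    """
    hits = sum(1 for ranked, gold in zip(predictions, answers)
               if gold in ranked[:top_k])
    return hits / len(answers)
```

The generative models are scored with top_k=1, since they produce a single completion per item.</p>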
        <sec id="sec-4-1-1">
          <title>3.3.1. Replication with open models</title>
          <p>
            English: We tested MistralAI (Mistral7B, Mistral7B-Instruct and Mixtral8x7B) [19,
            <xref ref-type="bibr" rid="ref20">20</xref>
            ],
Olmo7B [
            <xref ref-type="bibr" rid="ref14">14</xref>
            ], Pythia (7B and 12B) [
            <xref ref-type="bibr" rid="ref4">4</xref>
            ] and Llama2 7B [31], in order to compare the performance
of all these models. For the ChatGPT, GPT-4 and BERT [10] models, the scores were taken
directly from the data of Chang, Cramer, Soni, and Bamman [8]. The performance of each model
on the task is plotted in Figure 2a.
          </p>
          <p>First, we observe that, with an average accuracy of 6.81%, GPT-4 clearly stands out as the
best-performing model, followed by ChatGPT (GPT 3.5 turbo) with an average score of 2.51%.
The Mixtral8x7B, Mistral7B and Mistral7B-Instruct models show scores just under 1%. The
other models (Olmo 7B, BERT, Pythia12B, Pythia7B and Llama2 7B) show lower accuracies,
ranging from 0.27% to 0.01%.</p>
          <p>Interestingly, the vast majority of books score (close to) 0%. The outliers are relatively few in
number, and it is probably only for these that we can speak of memorization. Intriguingly, for
almost all models (except BERT), the text Alice’s Adventures in Wonderland obtains the highest
scores, probably due to its notoriety and high frequency in the training corpus.</p>
          <p>
            French: We decided not to test all the models we tested for English. As running these
models is time- and resource-consuming (about one night per model, and even a whole week
for Mixtral8x7B) on our server with one GPU, we decided to exclude Mixtral8x7B because
of its cost and unexceptional level of memorization, and Mistral7B-Instruct, Llama2
and all the versions of Pythia because of very low degrees of memorization. To replace BERT
for English, we introduced comparable models specialized for French: CamemBERT [25] and
FlauBERT [
            <xref ref-type="bibr" rid="ref22">22</xref>
            ]. The scores of these models can be found in Figure 2b.
          </p>
          <p>Remarkably, for French, the language-specialized model CamemBERT performed by far the
best; in contrast to English, where the BERT model was one of the lowest-scoring compared
to latest-generation LLMs, the BERT-architecture models for French performed similarly to
Mistral7B and better than Olmo7B.</p>
        </sec>
        <sec id="sec-4-1-2">
          <title>3.3.2. Analysis of copyright status</title>
          <p>Figure 3 captions: (a) Average accuracy of books from the public domain (public) and under copyright (private) for English. (b) Average accuracy of books from the public domain (public) and under copyright (private) for French.</p>
        </sec>
        <sec id="sec-4-1-3">
          <title>3.3.3. Analysis of the sub-genres of books</title>
          <p>We have already noted that freely accessible LLMs can predict certain elements from books,
regardless of their copyright status. Table 3 explores this capability by detailing the performances
by specific genres of the sub-corpus in English.</p>
          <p>Apart from a significant difference in accuracy scores, the trends observed on the English
items are similar to those of Chang, Cramer, Soni, and Bamman [8]. The tested models seem to
have the best knowledge of science fiction and fantasy works and public domain texts.
However, they are less familiar with Global Anglophone fiction and works from black authors. For
French, we observe that CamemBERT, FlauBERT and Mistral7B obtain the highest score on
children’s literature and Olmo7B on historical novels (see Table 4).
Table 4 caption: Name cloze average accuracy regarding sub-genres of books in the French experiment.</p>
          <p>On the one hand, it certainly makes sense that the models perform better on public domain
texts, due to the regulations on the use of free works. On the other hand, the specificity of the
science fiction and fantasy genres seems to facilitate the models’ prediction. By closely
examining items from the ‘Science-Fiction/Fantasy’ genre, we found words that are not named entities
but that are still very indicative of the book, such as for instance ‘Quidditch’, ‘Witchcraft’, or
‘Muggles’ in items from Harry Potter.</p>
        </sec>
        <sec id="sec-4-1-4">
          <title>3.3.4. Analysis of book popularity on the web</title>
          <p>
            According to Chang, Cramer, Soni, and Bamman [8], a book’s popularity should be defined by
its presence in many academic libraries, its frequency in large-scale training datasets (such as
Books3, part of The Pile), its citations in non-indexed academic journals, and its appearance on
the public web (both in excerpts and full text). In line with Chang, Cramer, Soni, and Bamman
[
            <xref ref-type="bibr" rid="ref8">8</xref>
            ], we checked whether there was a relationship between the popularity of a book online and
the degree of memorization of models for the English items. We used the number of hits from
Bing, Google and the C4 corpus directly from their data and calculated a Spearman’s correlation
with the accuracy scores of the freely accessible models that we tested.
          </p>
          <p>Most open language models showed a positive correlation between prediction performance
and book popularity on the web (see Table 5). This experiment therefore reinforces the
hypothesis that web prevalence is correlated with performance on the name-cloze inference task.
However, the models that performed poorly (i.e. those that failed to give the right prediction
for most books) do not show a high correlation with any engine/corpus. It is for this reason
that we decided not to repeat this experiment for French: as generative LLMs perform poorly
on the French dataset, we did not expect high correlations between the accuracy on the French
items and the popularity of a work online.</p>
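          <p>The correlation itself needs no external dependencies; below is a minimal pure-Python Spearman's rho (the Pearson correlation of average ranks, the same quantity that scipy.stats.spearmanr computes), applied to per-book popularity counts and accuracy scores. It assumes non-constant inputs.

```python
def _ranks(xs):
    """Average ranks (1-based); tied values share their mean rank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with xs[order[i]].
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    rx, ry = _ranks(x), _ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx) ** 0.5
    vy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (vx * vy)
```

Given the per-book hit counts from one search engine and the per-book accuracies of one model, spearman(hits, accuracies) yields one coefficient of the kind reported in Table 5.</p>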
        </sec>
      </sec>
      <sec id="sec-4-2">
        <title>3.4. Evolution of memorization during training</title>
        <p>
          Since a high degree of memorization was found for some books and some models, and since
the popularity of a work online is correlated with the performance of the models, it seems
natural to wonder whether memorizing a book requires access to the full text, or if it can also
take place via excerpts from websites. In this section, we therefore present a new series of
experiments, in which we monitored the memorization of books during the pre-training
process of an LLM. Inspired by Biderman, Schoelkopf, Anthony, Bradley, O’Brien, Hallahan, Khan,
Purohit, Prashanth, Raff, et al. [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] and Biderman, Prashanth, Sutawika, Schoelkopf, Anthony,
Purohit, and Raff [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ], we studied the emerging pattern of memorization as a function of the
book’s popularity online and whether it is in the public domain or under copyright.
        </p>
        <p>
          For this experiment, we used the OLMo7B model [14], as it has been trained on fully public
data, the Dolma corpus [
          <xref ref-type="bibr" rid="ref9">29</xref>
          ], and provides numerous checkpoints (states of the model during
the pre-training phase).
        </p>
        <p>It is beyond our computational resources to run experiments for all 571 books on OLMo’s
more than 500 checkpoints. (As many OLMo models would have to be downloaded as there
are checkpoints; i.e. more than 500, and the experiment would therefore take 500 times longer
than the initial experiment with this model.) That is why, in our study, we focused on fourteen
checkpoints — chosen at regular intervals — and four particularly representative books, selected
according to two dimensions, as illustrated in Figure 4: copyright status (public or private), and
their popularity (few hits or many hits). These works are respectively The Mysteries of Udolpho,
Pride and Prejudice, The Chosen and The Silmarillion.</p>
        <p>Figure 5 shows the evolution of memorization during the training of OLMo. For the works
in the public domain (The Mysteries of Udolpho and Pride and Prejudice), there is a noticeable
increase in accuracy towards the end of training, particularly between steps 450,000 and 557,000.
It can reasonably be suggested that at this stage of training, the model is seeing the full texts
of free works, such as those available in reputable projects like Project
Gutenberg. This hypothesis is reinforced by the observation that in the Dolma corpus [29], corpora
representing literature are placed at the end.4</p>
        <p>In contrast, for the copyrighted works, The Chosen and The Silmarillion, their performance
evolved continuously and steadily throughout the training period, without showing such a
sharp and sudden increase. For example, right from the start of the pre-training phase, from
step 50,000 onwards, the OLMo model successfully predicted a masked proper noun in The
Silmarillion items. For these works, the accuracy fluctuated slightly but remained relatively
stable throughout the training phase, right up to the end, although there were some additional
good predictions. This could support the hypothesis that excerpts or quotations from this
book are scattered throughout various sub-corpora and distributed throughout the pre-training
phase. Furthermore, it is clear that the influence of web popularity, measured by the number
of ‘hits’, also plays an important role in evolution, especially for copyrighted works. This is
particularly true for The Silmarillion, whose popularity on the web is associated with more
pronounced fluctuations in predictive scores.</p>
      </sec>
      <sec id="sec-4-3">
        <title>3.5. Discussion</title>
        <p>
          The experiments in this section on the name cloze tasks first show that most models do not
feature a high degree of memorization in general. However, for some particular works the
degree of memorization can be very high. Despite the fact that average scores for ChatGPT
and GPT-4 were higher, our data show the same distribution as Chang, Cramer, Soni, and
Bamman [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]’s, for English and for French. Interestingly, our experiments suggest that the
number of parameters is not a determining factor for memorization: heavier models from the
same series do not show an enhancement in accuracy on the task (e.g. Pythia12B with respect
to Pythia7B and Mixtral8x7B versus Mistral7B). For French, it is noteworthy that the
BERT-type models were the highest performing models, in contrast to English. Our hypothesis is
that there might be a higher overlap between the pre-training corpus of CamemBERT and
FlauBERT and the French items we constructed than there is between the items for English
and the pre-training corpus of BERT. We also think that the amount of training data in French,
which is smaller than the amount of English training data, must play an important role.
4Unfortunately, we could not find a map explaining which checkpoint corresponded exactly to which part of Dolma.
        </p>
        <p>In our experiments, we also replicated Chang’s findings that public domain books were better
remembered by LLMs than copyrighted books; we found this for both English and French. We
also replicated the relationship between the online popularity of books and scores on the name
cloze task, although this relationship was not strong for books for which LLMs showed low
levels of memorization anyway. Also, for the English items, we replicated the finding that
books from the genre of science fiction and fantasy were better memorized than those from
other genres.</p>
        <p>However, during the replication with open models we ran into various problems with the
protocol of the name cloze task. In section 3.3.3, we already identified the problem of words
that are not named entities, but are very specific to a particular book (e.g. Muggles in Harry
Potter). Moreover, during our experiments, we also saw that some items do contain named
entities that are not detected by BookNLP (for example, ‘Hogwarts’ and ‘Voldemort’ in Harry
Potter). Also, style is sometimes very recognizable, for example — to stay with the example of
Harry Potter — the way the character Hagrid speaks (see example (3)).
(3)
“Anyway, what does he know about it, some o’ the best I ever saw were the only ones
with magic in ’em in a long line o’ Muggles — look at yer mum! Look what she had fer
a sister!” “So what is [MASK]?”
This suggests that it is possible that instead of recognizing verbatim a sentence from the
training data, a model recognizes a book based on specific vocabulary, unfiltered named entities
and style, and guesses the name of the main character. This strategy would lead to a high
performance, as we checked for the English items that the main character was the correct answer
29.48% of the time, which is much higher than the performance of any LLM on the name cloze
inference task.</p>
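        <p>The 29.48% figure above comes from a simple baseline of always answering with the book's main character; a sketch, in which the main_character mapping is a hypothetical stand-in for our book metadata:

```python
def main_character_baseline(items, main_character):
    """Accuracy of always guessing the book's main character.

    items: list of (book, gold_name) pairs, one per name cloze item;
    main_character: maps each book to its protagonist's name
    (a hypothetical metadata table, for illustration).
    """
    hits = sum(1 for book, gold in items if main_character[book] == gold)
    return hits / len(items)
```

Any model exploiting book recognition rather than verbatim recall could approach this baseline without having memorized a single sentence.</p>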
        <p>Another concern that we have about the name cloze task is the exclusive focus on proper
names. A proper name might not be the most representative morpho-syntactic category for
all words. Indeed, Pang, Ye, Wang, Yu, Wong, Shi, and Tu [28] found in a morpho-syntactic
analysis carried out in the context of LLMs that proper nouns are systematically given higher
attention weights than common nouns or other word types.</p>
        <p>Finally, we also question whether prompting is the most ideal way to access the memory
of LLMs. We wonder if the lower scores we found for open models with respect to Chang,
Cramer, Soni, and Bamman [8]’s findings on OpenAI models can be explained by a better
chat module of the latter, i.e. it could be the case that memorization seems lower than it is for open
models because memory cannot be accessed conveniently by prompting (the comprehension
of instructions might be higher for the OpenAI models).</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>4. Further analysis</title>
      <p>These concerns with the name cloze task led us to design two new experiments: the first aims
at checking whether the prompting framework is suited to querying open LLMs (section 4.1)
and the second proposes an alternative protocol to the name cloze inference task (section 4.2).</p>
      <sec id="sec-5-1">
        <title>4.1. Evaluating the appropriateness of prompting for the name cloze task</title>
        <p>
          In this section, we present a fine-tuning experiment on the Mistral7B model [
          <xref ref-type="bibr" rid="ref9">19</xref>
          ] to assess
whether prompting influences model performance on the name cloze task. The idea is the
following: we seek to enhance task comprehension by fine-tuning the LLM on English
items from books in the public domain. These books are certainly in the training data
because they are widely available, for example in Project Gutenberg5 or on Wikibooks6. Our
hypothesis is that if books have been memorized, the fine-tuning helps the model to learn how
to access the information from its memory.
        </p>
        <p>An example of an item from the fine-tuning training data is shown below:
[
  {
    "input": "You want breakfast, [MASK], or piss me off?",
    "output": "&lt;name&gt;Gard&lt;/name&gt;",
    "instruction": "You have seen the following passage ..."
  },
  ...
]</p>
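<p>Such records can be assembled mechanically from a passage and the character name it contains. The following is a minimal sketch under our own naming assumptions: the helper make_item is hypothetical, and for simplicity it returns the bare name rather than the tag-wrapped answer used in the actual training data.

```python
def make_item(passage: str, name: str) -> dict:
    # Hypothetical helper: turn a passage and its character name into one
    # fine-tuning record (instruction / masked input / expected output).
    # For simplicity the output is the bare name, without the XML-style
    # wrapper shown in the example record above.
    return {
        "instruction": "You have seen the following passage ...",
        "input": passage.replace(name, "[MASK]"),
        "output": name,
    }
```

For instance, make_item("You want breakfast, Gard, or piss me off?", "Gard") yields the masked input shown in the example record.</p>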
        <p>
          Regarding the fine-tuning method, we employed LoRA [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ], a low-rank adaptation technique
available in the Python library peft (https://pypi.org/project/peft/). The fine-tuned model has been integrated and is accessible
on our Hugging Face account (https://huggingface.co/LivevreXH/mistral_finetuned_items_livres/tree/main), where it is presented with the results of the fine-tuning
experiment.
        </p>
        <p>The evolution of the loss value is shown in Figure 6. It can be observed that this value
decreases significantly only during the initial steps. The average accuracy score of the
Mistral7B model without fine-tuning is 0.00830, while the fine-tuned version achieves a score of
0.00893, so fine-tuning did not yield substantial gains on the task’s performance. We conclude
that the fact that open models fail at the name cloze inference task cannot be explained by a
misunderstanding of the prompt.</p>
      </sec>
      <sec id="sec-5-2">
        <title>4.2. Pilot experiment: studying memorization with n-grams</title>
        <p>Memorization of proper names may not be representative of other part-of-speech categories.
Therefore, we conducted a pilot experiment to evaluate an alternative method to
the name cloze inference task. The idea is very simple: we ask an LLM to complete a passage
extracted from a book and count the overlap of the first ten tokens it produces with the real
text of the book. For this pilot, we took the four books presented in Figure 4 and used the
corresponding items from Chang, Cramer, Soni, and Bamman [8] in the following manner: first
we replaced the [MASK] token with the proper name, and then we took the first ten tokens
to be presented in the prompt and the following ten tokens as a gold answer. Our prompt is
provided in Figure 7. To compare this method to the name cloze inference task, we decided to
test ChatGPT and study the correlation between the scores on the two tasks. The results can
be found in Figure 8.</p>
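<p>The scoring step can be sketched as follows. Whitespace tokenization and position-by-position comparison are our own assumptions for this sketch; the function name is hypothetical.

```python
def ngram_overlap_score(continuation: str, gold: str, n: int = 10) -> float:
    # Compare the first n tokens the model produces with the n tokens that
    # actually follow the prompt in the book, position by position.
    # Whitespace tokenization is an assumption of this sketch.
    generated = continuation.split()[:n]
    reference = gold.split()[:n]
    matches = sum(g == r for g, r in zip(generated, reference))
    return matches / n
```

A per-book score is then the average of this value over the 100 items for that book.</p>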
        <p>As a sanity check, we also established a baseline score for the n-gram method. A young
novelist, Jingyi, provided us with an unpublished draft of her next novel, written in Chinese.
We translated this text into English using the DeepL translation tool (https://www.deepl.com/fr/translator). From the translated
manuscript, we selected 100 random excerpts. We submitted this manuscript to the same
prediction task. The memorization score was very low: 0.005. In comparison, the lowest-scoring
novel from Figure 8 obtained a score of 0.038, more than seven times as high.</p>
        <p>The number of books tested in this framework remains low and therefore the performance
of the pilot should be interpreted with caution. Still, we want to put forward a first evaluation
of the n-gram method as opposed to the name cloze inference. A first observation is that both
tasks show a substantial level of correlation (0.77), but that the scores for the
n-gram task are more fine-grained than those of the name cloze task. Indeed, whereas for the
name cloze task we have 100 items per book, for the n-gram task we have 100 x 10 tokens to
evaluate, which can help to make a better distinction amongst the lower-scoring works. The
baseline of the unseen manuscript shows that there still is some distinction to make between
very low degrees of memorization and no memorization at all (admittedly, the translation of a
Chinese novel by DeepL might not be the most representative literature, and this experiment
should be repeated using an unpublished draft by a native speaker writer). Furthermore, our results
suggest that the n-gram method could help against the sensitivity of the name cloze task to
recognizing a style, or a specific word from a fictional universe, and guessing a random character
from a work without true memorization of the exact passage. Looking at ”The Silmarillion” in
Figure 8, we see that its n-gram score is lower than would be expected from the name
cloze inference score. Inspecting Chang, Cramer, Soni, and Bamman’s items for this book more
closely, we observe that there are important differences in the answers chosen by ChatGPT.
For example: 8 items should receive the answer ‘Melkor’, but ChatGPT never put forward this
name, whereas it predicts ‘Aragorn’ 4 times even though this is never the correct answer. This
leads us to suspect that the name cloze task is sensitive to the shortcut of guessing a character
from a book rather than retrieving the correct name from its memory.</p>
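<p>The correlation between the two tasks can be computed as a plain Pearson coefficient over per-book score pairs; a minimal stdlib sketch (the exact computation behind the 0.77 figure is not detailed here, so this is an assumption):

```python
from math import sqrt

def pearson(xs: list, ys: list) -> float:
    # Pearson correlation between per-book scores on the two tasks.
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)
```

With only four books, a single outlying title (such as ”The Silmarillion”) can move this value noticeably, which is another reason to treat the pilot's correlation with caution.</p>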
      </sec>
    </sec>
    <sec id="sec-6">
      <title>5. Conclusion</title>
      <p>The memorization of English and French literature is low on average in freely accessible LLMs,
while a small number of fictional works seem to undergo an extreme degree of memorization.
Memorization is favored by the presence of quotes and excerpts of the books on the Internet,
which makes it impossible to say if a high score for memorization means that the full text of
the novel was actually used to train an LLM, except if the training corpus has been released,
which is only the case for a very small number of LLMs.</p>
      <p>For our research, we used the name cloze inference task, in which an LLM must guess a
proper name from a sentence without the presence of any other named entities. Using this
method, it occurred to us that it has some undesirable effects that were initially unforeseen.
The first is that the method is sensitive to errors. As items are automatically filtered for named
entities, not all named entities are removed from the context, and these could be used by the LLM to
guess the name of a character from the book without there being real verbatim memorization.
The same can happen because of a recognizable style and typical words (such as in science
fiction novels). Given the fact that the memorization score of LLMs is low, this noise cannot
be ignored. When testing a very simple alternative method that counts n-gram overlap when
the model is prompted to continue a passage from a novel, our pilot experiment showed that
this method has the potential to be more robust than the name cloze inference task.</p>
      <p>In future work, we aim to explore not only verbatim memorization, but also memorization
of plots and stories. Ultimately, coming back to the introduction in which we argued that LLMs
give a biased point of view on culture and literature, we would like to not only measure the
spread and memorization of exact texts, but also of ideas and more abstract patterns present in
literature.</p>
    </sec>
    <sec id="sec-7">
      <title>6. Availability of Resources and Code</title>
      <p>All the experimental items and programming code for our experiments can be found on the
following GitHub page: https://github.com/XINHAO-ZHANG/books-memorization.</p>
    </sec>
    <sec id="sec-8">
      <title>Acknowledgments</title>
      <p>This work was funded in part by the French government under management of Agence
Nationale de la Recherche as part of the ”Investissements d’avenir” program, reference
ANR-19-P3IA-0001 (PRAIRIE 3IA Institute, Thierry Poibeau’s Chair).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1] [2]
          <string-name>
            <given-names>D.</given-names>
            <surname>Bamman</surname>
          </string-name>
          .
          <source>BookNLP</source>
          .
          <year>2021</year>
          . url: https://github.com/booknlp/booknlp
          <string-name>
            <given-names>D.</given-names>
            <surname>Bamman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Lewke</surname>
          </string-name>
          ,
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Mansoor</surname>
          </string-name>
          . “
          <article-title>An Annotated Dataset of Coreference in English Literature”</article-title>
          .
          <source>In:Proceedings of the Twelfth Language Resources and Evaluation Conference .</source>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <surname>Isahara</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          <string-name>
            <surname>Maegaard</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Mariani</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          <string-name>
            <surname>Mazo</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Moreno</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Odijk</surname>
            , and
            <given-names>S.</given-names>
          </string-name>
          <string-name>
            <surname>Piperidis</surname>
          </string-name>
          . Marseille, France: European Language Resources Association,
          <year>2020</year>
          , pp.
          <fpage>44</fpage>
          -
          <lpage>54</lpage>
          . url: https://aclanthology.org/2020.lrec-1.6
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>S.</given-names>
            <surname>Biderman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>U.</given-names>
            <surname>Prashanth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Sutawika</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Schoelkopf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Anthony</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Purohit</surname>
          </string-name>
          , and
          <string-name>
            <given-names>E.</given-names>
            <surname>Raff</surname>
          </string-name>
          “
          <article-title>Emergent and predictable memorization in large language models”</article-title>
          .
          <source>In: Advances in Neural Information Processing Systems</source>
          <volume>36</volume>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>S.</given-names>
            <surname>Biderman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Schoelkopf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q. G.</given-names>
            <surname>Anthony</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Bradley</surname>
          </string-name>
          ,
          <string-name>
            <surname>K. O'Brien</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          <string-name>
            <surname>Hallahan</surname>
            ,
            <given-names>M. A.</given-names>
          </string-name>
          <string-name>
            <surname>Khan</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <string-name>
            <surname>Purohit</surname>
            ,
            <given-names>U. S.</given-names>
          </string-name>
          <string-name>
            <surname>Prashanth</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          <string-name>
            <surname>Raf</surname>
          </string-name>
          , et al. “
          <article-title>Pythia: A suite for analyzing large language models across training and scaling”</article-title>
          .
          <source>In: International Conference on Machine Learning. PMLR</source>
          .
          <year>2023</year>
          , pp.
          <fpage>2397</fpage>
          -
          <lpage>2430</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>J.</given-names>
            <surname>Borst</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Klähn</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Burghardt</surname>
          </string-name>
          . “
          <article-title>Death of the Dictionary?-The Rise of Zero-Shot Sentiment Classification”</article-title>
          .
          <source>In: CHR 2023: Computational Humanities Research Conference</source>
          .
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>N.</given-names>
            <surname>Carlini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Ippolito</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Jagielski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Tramer</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Zhang</surname>
          </string-name>
          . “
          <article-title>Quantifying Memorization Across Neural Language Models”</article-title>
          .
          <source>In: The Eleventh International Conference on Learning Representations</source>
          .
          <year>2023</year>
          . url: https://openreview.net/forum?id=TatRHT_1cK.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>N.</given-names>
            <surname>Carlini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Tramèr</surname>
          </string-name>
          , E. Wallace,
          <string-name>
            <given-names>M.</given-names>
            <surname>Jagielski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Herbert-Voss</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Roberts</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Brown</surname>
          </string-name>
          , D. Song, Ú. Erlingsson,
          <string-name>
            <given-names>A.</given-names>
            <surname>Oprea</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Raffel</surname>
          </string-name>
          . “
          <article-title>Extracting Training Data from Large Language Models”</article-title>
          .
          <source>In:30th USENIX Security Symposium (USENIX Security 21)</source>
          .
          <source>USENIX Association</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>2633</fpage>
          -
          <lpage>2650</lpage>
          . url: https://www.usenix.org/conference/usenixsecurity21/presentation/carlini-extracting
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>K.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Cramer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Soni</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Bamman</surname>
          </string-name>
          . “
          <article-title>Speak, Memory: An Archaeology of Books Known to ChatGPT/GPT-4”</article-title>
          .
          <source>In:Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing</source>
          . Ed. by
          <string-name>
            <given-names>H.</given-names>
            <surname>Bouamor</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Pino</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K.</given-names>
            <surname>Bali</surname>
          </string-name>
          . Singapore: Association for Computational Linguistics,
          <year>2023</year>
          , pp.
          <fpage>7312</fpage>
          -
          <lpage>7327</lpage>
          .
          doi: 10.18653/v1/2023.emnlp-main.453.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>L.</given-names>
            <surname>D'Souza</surname>
          </string-name>
          and
          <string-name>
            <given-names>D.</given-names>
            <surname>Mimno</surname>
          </string-name>
          . “
          <article-title>The Chatbot and the Canon: Poetry Memorization in LLMs”</article-title>
          .
          <source>In: CHR 2023: Computational Humanities Research Conference</source>
          .
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>J.</given-names>
            <surname>Devlin</surname>
          </string-name>
          , M.-
          <string-name>
            <given-names>W.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K.</given-names>
            <surname>Toutanova</surname>
          </string-name>
          . “BERT:
          <article-title>Pre-training of Deep Bidirectional Transformers for Language Understanding”. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</article-title>
          , Volume
          <volume>1</volume>
          (Long and Short Papers). Ed. by
          <string-name>
            <given-names>J.</given-names>
            <surname>Burstein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Doran</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T.</given-names>
            <surname>Solorio</surname>
          </string-name>
          . Minneapolis, Minnesota: Association for Computational Linguistics,
          <year>2019</year>
          , pp.
          <fpage>4171</fpage>
          -
          <lpage>4186</lpage>
          . doi: 10.18653/v1/N19-1423.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>G. G.</given-names>
            <surname>Garcia</surname>
          </string-name>
          and
          <string-name>
            <given-names>C.</given-names>
            <surname>Weilbach</surname>
          </string-name>
          . “
          <article-title>If the Sources Could Talk: Evaluating Large Language Models for Research Assistance in History”</article-title>
          .
          <source>In: CHR 2023: Computational Humanities Research Conference</source>
          .
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>T.</given-names>
            <surname>Gebru</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Morgenstern</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Vecchione</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. W.</given-names>
            <surname>Vaughan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Wallach</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. D.</given-names>
            <surname>Iii</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K.</given-names>
            <surname>Crawford</surname>
          </string-name>
          . “
          <article-title>Datasheets for datasets”</article-title>
          .
          <source>In: Communications of the ACM 64.12</source>
          (
          <year>2021</year>
          ), pp.
          <fpage>86</fpage>
          -
          <lpage>92</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>H.</given-names>
            <surname>Gonen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Iyer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Blevins</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Smith</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L.</given-names>
            <surname>Zettlemoyer</surname>
          </string-name>
          . “
          <article-title>Demystifying Prompts in Language Models via Perplexity Estimation”</article-title>
          .
          <article-title>In: Findings of the Association for Computational Linguistics: EMNLP 2023</article-title>
          . Ed. by
          <string-name>
            <given-names>H.</given-names>
            <surname>Bouamor</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Pino</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K.</given-names>
            <surname>Bali</surname>
          </string-name>
          . Singapore: Association for Computational Linguistics,
          <year>2023</year>
          , pp.
          <fpage>10136</fpage>
          -
          <lpage>10148</lpage>
          .
          doi: 10.18653/v1/2023.findings-emnlp.679.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>D.</given-names>
            <surname>Groeneveld</surname>
          </string-name>
          ,
          <string-name>
            <surname>I. Beltagy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Walsh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bhagia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Kinney</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Tafjord</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. H.</given-names>
            <surname>Jha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Ivison</surname>
          </string-name>
          , I. Magnusson,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          , et al. “
          <article-title>Olmo: Accelerating the science of language models”</article-title>
          .
          <source>In: arXiv preprint arXiv:2402.00838</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>P.</given-names>
            <surname>Henderson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Jurafsky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Hashimoto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Lemley</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Liang</surname>
          </string-name>
          . “
          <article-title>Foundation Models and Fair Use”</article-title>
          .
          <source>In:Journal of Machine Learning Research 24.400</source>
          (
          <year>2023</year>
          ), pp.
          <fpage>1</fpage>
          -
          <lpage>79</lpage>
          . url: http://jmlr.org/papers/v24/23-0569.html
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>R. M.</given-names>
            <surname>Hicke</surname>
          </string-name>
          and
          <string-name>
            <given-names>D.</given-names>
            <surname>Mimno</surname>
          </string-name>
          . “
          <article-title>T5 meets Tybalt: Author Attribution in Early Modern English Drama Using Large Language Models”</article-title>
          .
          <source>In: CHR 2023: Computational Humanities Research Conference</source>
          .
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>F.</given-names>
            <surname>Hill</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Reichart</surname>
          </string-name>
          ,
          <article-title>and</article-title>
          <string-name>
            <given-names>A.</given-names>
            <surname>Korhonen</surname>
          </string-name>
          . “SimLex-999:
          <article-title>Evaluating Semantic Models With (Genuine) Similarity Estimation”</article-title>
          .
          <source>In: Computational Linguistics 41.4</source>
          (
          <issue>2015</issue>
          ), pp.
          <fpage>665</fpage>
          -
          <lpage>695</lpage>
          . doi: 10.1162/COLI_a_00237. url: https://aclanthology.org/J15-4004.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>E. J.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Shen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Wallis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Allen-Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Wang</surname>
          </string-name>
          , and W. Chen.
          <source>LoRA: Low-Rank Adaptation of Large Language Models</source>
          .
          <year>2021</year>
          . arXiv: 2106.09685 [cs.CL].
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>A. Q.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Sablayrolles</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mensch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Bamford</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. S.</given-names>
            <surname>Chaplot</surname>
          </string-name>
          , D. d. l. Casas,
          <string-name>
            <given-names>F.</given-names>
            <surname>Bressand</surname>
          </string-name>
          , G. Lengyel,
          <string-name>
            <given-names>G.</given-names>
            <surname>Lample</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Saulnier</surname>
          </string-name>
          , et al. “
          <article-title>Mistral 7B”</article-title>
          .
          <source>In: arXiv preprint arXiv:2310.06825</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>A. Q.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Sablayrolles</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Roux</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mensch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Savary</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Bamford</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. S.</given-names>
            <surname>Chaplot</surname>
          </string-name>
          , D. d. l. Casas,
          <string-name>
            <given-names>E. B.</given-names>
            <surname>Hanna</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Bressand</surname>
          </string-name>
          , et al. “
          <article-title>Mixtral of experts”</article-title>
          .
          <source>In: arXiv preprint arXiv:2401.04088</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>P.</given-names>
            <surname>Kaganovich</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Münz-Manor</surname>
          </string-name>
          , and E. Ezra-Tsur.
          <article-title>“Style Transfer of Modern Hebrew Literature Using Text Simplification and Generative Language Modeling”</article-title>
          .
          <source>In: CHR 2023: Computational Humanities Research Conference</source>
          .
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>H.</given-names>
            <surname>Le</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Vial</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Frej</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Segonne</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Coavoux</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Lecouteux</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Allauzen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Crabbé</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Besacier</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Schwab</surname>
          </string-name>
          . “
          <article-title>FlauBERT: Unsupervised Language Model Pre-training for French”</article-title>
          .
          <source>In: Proceedings of the Twelfth Language Resources and Evaluation Conference</source>
          . Ed. by
          <string-name>
            <given-names>N.</given-names>
            <surname>Calzolari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Béchet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Blache</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Choukri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Cieri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Declerck</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Goggi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Isahara</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Maegaard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Mariani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Mazo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Moreno</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Odijk</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Piperidis</surname>
          </string-name>
          . Marseille, France: European Language Resources Association,
          <year>2020</year>
          , pp.
          <fpage>2479</fpage>
          -
          <lpage>2490</lpage>
          . url: https://aclanthology.org/2020.lrec-1.302.
        </mixed-citation>
      </ref>
      <ref id="ref22a">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>A.</given-names>
            <surname>Leblond</surname>
          </string-name>
          .
          <source>Corpus Chapitres. Version v1.0.0</source>
          .
          <year>2022</year>
          . doi: 10.5281/zenodo.7446728.
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Ippolito</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Nystrom</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Eck</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Callison-Burch</surname>
          </string-name>
          ,
          and
          <string-name>
            <given-names>N.</given-names>
            <surname>Carlini</surname>
          </string-name>
          . “
          <article-title>Deduplicating Training Data Makes Language Models Better”</article-title>
          .
          <source>In: Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)</source>
          . Ed. by
          <string-name>
            <given-names>S.</given-names>
            <surname>Muresan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Nakov</surname>
          </string-name>
          ,
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Villavicencio</surname>
          </string-name>
          . Dublin, Ireland: Association for Computational Linguistics,
          <year>2022</year>
          , pp.
          <fpage>8424</fpage>
          -
          <lpage>8445</lpage>
          . doi: 10.18653/v1/2022.acl-long.577.
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>L.</given-names>
            <surname>Martin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Muller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. J. Ortiz</given-names>
            <surname>Suárez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Dupont</surname>
          </string-name>
          , L. Romary, É. de la Clergerie,
          <string-name>
            <given-names>D.</given-names>
            <surname>Seddah</surname>
          </string-name>
          , and
          <string-name>
            <given-names>B.</given-names>
            <surname>Sagot</surname>
          </string-name>
          . “
          <article-title>CamemBERT: a Tasty French Language Model”</article-title>
          .
          <source>In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics</source>
          . Ed. by
          <string-name>
            <given-names>D.</given-names>
            <surname>Jurafsky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Chai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Schluter</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Tetreault</surname>
          </string-name>
          . Online: Association for Computational Linguistics,
          <year>2020</year>
          , pp.
          <fpage>7203</fpage>
          -
          <lpage>7219</lpage>
          . doi: 10.18653/v1/2020.acl-main.645.
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>F.</given-names>
            <surname>Mélanie-Becquet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Barré</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Seminck</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Plancq</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Naguib</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Pastor</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T.</given-names>
            <surname>Poibeau</surname>
          </string-name>
          .
          <article-title>BookNLP-fr, the French Versant of BookNLP. A Tailored Pipeline for 19th and 20th Century French Literature</article-title>
          .
          <source>Tech. rep. 1</source>
          .
          Darmstadt,
          <year>2024</year>
          , 34 pages. doi: https://doi.org/10.26083/tuprints-00027396.
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>T.</given-names>
            <surname>Onishi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bansal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Gimpel</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>McAllester</surname>
          </string-name>
          . “
          <article-title>Who did What: A Large-Scale Person-Centered Cloze Dataset”</article-title>
          .
          <source>In: Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing</source>
          . Ed. by
          <string-name>
            <given-names>J.</given-names>
            <surname>Su</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Duh</surname>
          </string-name>
          , and
          <string-name>
            <given-names>X.</given-names>
            <surname>Carreras</surname>
          </string-name>
          . Austin, Texas: Association for Computational Linguistics,
          <year>2016</year>
          , pp.
          <fpage>2230</fpage>
          -
          <lpage>2235</lpage>
          . doi: 10.18653/v1/D16-1241.
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>J.</given-names>
            <surname>Pang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Ye</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. F.</given-names>
            <surname>Wong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Shi</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Z.</given-names>
            <surname>Tu</surname>
          </string-name>
          .
          <source>Salute the Classic: Revisiting Challenges of Machine Translation in the Age of Large Language Models</source>
          .
          <year>2024</year>
          . url: http://arxiv.org/abs/2401.08350.
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>L.</given-names>
            <surname>Soldaini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Kinney</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bhagia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Schwenk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Atkinson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Authur</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Bogin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Chandu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Dumas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Elazar</surname>
          </string-name>
          , et al. “
          <article-title>Dolma: An Open Corpus of Three Trillion Tokens for Language Model Pretraining Research”</article-title>
          .
          <source>In: arXiv preprint arXiv:2402.00159</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>R.</given-names>
            <surname>Staab</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Vero</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Balunovic</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Vechev</surname>
          </string-name>
          . “
          <article-title>Beyond Memorization: Violating Privacy via Inference with Large Language Models”</article-title>
          .
          <source>In: The Twelfth International Conference on Learning Representations</source>
          .
          <year>2024</year>
          . url: https://openreview.net/forum?id=kmn0BhQk7p.
        </mixed-citation>
      </ref>
      <ref id="ref29a">
        <mixed-citation>
          [31]
          <string-name>
            <given-names>H.</given-names>
            <surname>Touvron</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Martin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Stone</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Albert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Almahairi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Babaei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Bashlykov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Batra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Bhargava</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bhosale</surname>
          </string-name>
          , et al. “
          <article-title>Llama 2: Open foundation and fine-tuned chat models”.</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          <source>In: arXiv preprint arXiv:2307.09288</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [32]
          <string-name>
            <given-names>S.</given-names>
            <surname>Truex</surname>
          </string-name>
          , L. Liu,
          <string-name>
            <given-names>M. E.</given-names>
            <surname>Gursoy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Yu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>W.</given-names>
            <surname>Wei</surname>
          </string-name>
          . “
          <article-title>Towards demystifying membership inference attacks”</article-title>
          .
          <source>In: arXiv preprint arXiv:1807.09173</source>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [33]
          <string-name>
            <given-names>T.</given-names>
            <surname>Underwood</surname>
          </string-name>
          .
          <article-title>Mapping the latent spaces of culture. Essay prepared for a roundtable</article-title>
          .
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          [34]
          <string-name>
            <given-names>M.</given-names>
            <surname>Walsh</surname>
          </string-name>
          .
          <article-title>Where is all the book data? Online essay</article-title>
          .
          <year>2022</year>
          . url: https://www.publicbooks.org/where-is-all-the-book-data/.
        </mixed-citation>
      </ref>
      <ref id="ref33a">
        <mixed-citation>
          [35]
          <string-name>
            <given-names>A.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Pruksachatkun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Nangia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Michael</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Hill</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Levy</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S. R.</given-names>
            <surname>Bowman</surname>
          </string-name>
          . “
          <article-title>SuperGLUE: a stickier benchmark for general-purpose language understanding systems”</article-title>
          .
          <source>In: Proceedings of the 33rd International Conference on Neural Information Processing Systems</source>
          . Red Hook, NY, USA: Curran Associates Inc.,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          [36]
          <string-name>
            <given-names>A.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Michael</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Hill</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Levy</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Bowman</surname>
          </string-name>
          . “
          <article-title>GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding”</article-title>
          .
          <source>In: Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP</source>
          . Ed. by
          <string-name>
            <given-names>T.</given-names>
            <surname>Linzen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Chrupała</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Alishahi</surname>
          </string-name>
          . Brussels, Belgium: Association for Computational Linguistics,
          <year>2018</year>
          , pp.
          <fpage>353</fpage>
          -
          <lpage>355</lpage>
          . doi: 10.18653/v1/W18-5446.
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          [37]
          <string-name>
            <given-names>C.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , S. Bengio,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hardt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Recht</surname>
          </string-name>
          , and
          <string-name>
            <given-names>O.</given-names>
            <surname>Vinyals</surname>
          </string-name>
          . “
          <article-title>Understanding deep learning (still) requires rethinking generalization”</article-title>
          .
          <source>In: Communications of the ACM 64.3</source>
          (
          <year>2021</year>
          ), pp.
          <fpage>107</fpage>
          -
          <lpage>115</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>