<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Remember to Forget: A Study on Verbatim Memorization of Literature in Large Language Models ⋆</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Xinhao</forename><surname>Zhang</surname></persName>
							<email>zhangxinhao672@gmail.com</email>
							<affiliation key="aff0">
								<orgName type="laboratory" key="lab1">Lattice (UMR 8094</orgName>
								<orgName type="laboratory" key="lab2">ENS-PSL</orgName>
								<orgName type="institution" key="instit1">CNRS</orgName>
								<orgName type="institution" key="instit2">Sorbonne Nouvelle)</orgName>
								<address>
									<addrLine>1 rue Maurice Arnoux</addrLine>
									<postCode>92120</postCode>
									<settlement>Montrouge</settlement>
									<country key="FR">France</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Olga</forename><surname>Seminck</surname></persName>
							<email>olga.seminck@cnrs.fr</email>
							<affiliation key="aff0">
								<orgName type="laboratory" key="lab1">Lattice (UMR 8094</orgName>
								<orgName type="laboratory" key="lab2">ENS-PSL</orgName>
								<orgName type="institution" key="instit1">CNRS</orgName>
								<orgName type="institution" key="instit2">Sorbonne Nouvelle)</orgName>
								<address>
									<addrLine>1 rue Maurice Arnoux</addrLine>
									<postCode>92120</postCode>
									<settlement>Montrouge</settlement>
									<country key="FR">France</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Pascal</forename><surname>Amsili</surname></persName>
							<email>pascal.amsili@ens.fr</email>
							<affiliation key="aff0">
								<orgName type="laboratory" key="lab1">Lattice (UMR 8094</orgName>
								<orgName type="laboratory" key="lab2">ENS-PSL</orgName>
								<orgName type="institution" key="instit1">CNRS</orgName>
								<orgName type="institution" key="instit2">Sorbonne Nouvelle)</orgName>
								<address>
									<addrLine>1 rue Maurice Arnoux</addrLine>
									<postCode>92120</postCode>
									<settlement>Montrouge</settlement>
									<country key="FR">France</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Remember to Forget: A Study on Verbatim Memorization of Literature in Large Language Models ⋆</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">F478BF57C4BD7B96B3F77ABCCBC84B99</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T19:48+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>memorization</term>
					<term>Large Language Models</term>
					<term>membership inference attacks</term>
					<term>literature</term>
					<term>cloze task</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>We examine the extent to which English and French literature is memorized by freely accessible LLMs, using a name cloze inference task (which focuses on the model's ability to recall proper names from a book). We replicate the key findings of previous research conducted with OpenAI models, concluding that, overall, the degree of memorization is low. Factors that tend to enhance memorization include the absence of copyright, belonging to the Fantasy or Science Fiction genres, and the work's popularity on the Internet. Delving deeper into the experimental setup using the open-source model Olmo and its freely available corpus Dolma, we conducted a study on the evolution of memorization during the LLM's training phase. Our findings suggest that excerpts of a book online can result in some level of memorization, even if the full text is not included in the training corpus. This observation leads us to conclude that the name cloze inference task is insufficient to definitively determine whether copyright violations have occurred during the training process of an LLM. Furthermore, we highlight certain limitations of the name cloze inference task, particularly the possibility that a model may recognize a book without memorizing its text verbatim. In a pilot experiment, we propose an alternative method that shows promise for producing more robust results.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>The emergence of Large Language Models (LLMs) has advanced the field of Natural Language Processing (NLP) significantly. Successive models have consistently set new records on language understanding benchmarks <ref type="bibr" target="#b36">[36,</ref><ref type="bibr" target="#b35">35,</ref><ref type="bibr" target="#b21">22]</ref>. Notably, LLMs can now tackle a broad range of tasks, allowing a single, general-purpose model to handle many NLP tasks that previously required specialized models for each specific task. This shift has significantly increased the accessibility of NLP techniques, even for those without a specialized background. The ability to interact with LLMs through natural language, particularly via chat interfaces, has partially eliminated the need for programming knowledge.</p><p>These features have made LLMs ubiquitous, enabling their use for a wide range of purposes, including within the field of Digital Humanities, where they offer new perspectives. In addition to their ability to focus on specific tasks by learning from data curated by researchers [e.g. <ref type="bibr" target="#b15">16,</ref><ref type="bibr" target="#b10">11]</ref>, they also come equipped with pre-built knowledge and can be used even when there are few or no task-specific data at hand: the so-called zero-shot learning framework [e.g. <ref type="bibr" target="#b20">21,</ref><ref type="bibr" target="#b4">5]</ref>.</p><p>While the knowledge acquired during the training phase enables an LLM to function with few or no additional training data, this pre-training practice also presents several drawbacks and risks. One of the primary issues is that we lack a clear understanding of the specific knowledge these models possess, even though this knowledge is, of course, crucial for accomplishing the tasks we give them.</p><p>The primary reason for this issue is that, for nearly all models, the specific data used for training remain unknown. When models are made available on platforms such as Hugging Face, users can typically access the model weights, but the training corpus itself is often not disclosed.</p><p>The second reason is that the actual learning process of such models is largely unknown, particularly regarding what determines whether certain data are remembered or forgotten. During training, billions of parameters are automatically adjusted within the model's neural network, and once this process is complete, it becomes impossible to interpret the activity of individual neurons. In this regard, these models are often referred to as "black boxes": the processes that generate a model's response to a user's task or question are virtually impossible to interpret. The main way to get an idea of a model's knowledge is to query it systematically and analyze its answers, but it remains unclear to what extent this provides a full view of that knowledge. After all, even a slight change in the user's input can lead to significant variations in the results <ref type="bibr" target="#b12">[13]</ref>, and some models' outputs are not deterministic in any case.</p><p>The lack of a clear understanding of LLMs' knowledge presents a significant obstacle to their use in the field of Digital Humanities. We concur with Underwood <ref type="bibr" target="#b33">[33]</ref> that a model's knowledge carries with it a certain world view and, consequently, a view of culture. 
When querying a model about literature, the texts included in its training corpus play a crucial role, as they fundamentally shape its understanding of the subject <ref type="bibr" target="#b11">[12]</ref>. Questions regarding aesthetics, style, poetics, and so on will yield responses colored by the specific literature the model was trained on. Furthermore, it is essential to assess what a model retains from the books encountered during its training phase.</p><p>These questions are important not only in the context of literary research, but also for copyright compliance. If copyrighted work is, unfortunately, included in the training data, it is important to be able to estimate to what extent it can be reproduced.</p><p>In this paper, we aim to address the extent to which literature is memorized by LLMs and the factors that contribute to this memorization. Additionally, we investigate whether it is possible to determine if work protected by copyright is in the training data of LLMs.</p><p>Our starting point is the study by Chang, Cramer, Soni, and Bamman <ref type="bibr" target="#b7">[8]</ref>, who used a name cloze task to determine to what extent OpenAI's ChatGPT and GPT-4 models are able to reproduce literary works verbatim (word for word). We applied the same method to freely accessible models, for English and French literature. In addition, we conducted a number of supplementary studies to gain a deeper understanding of the memorization process during training, as well as of the possible influence of the practice of prompting.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Related Work</head><p>Memorization in LLMs is generally defined as the verbatim reproduction of the training data <ref type="bibr" target="#b23">[24,</ref><ref type="bibr" target="#b2">3]</ref>. The phenomenon is typically associated with overfitting <ref type="bibr" target="#b6">[7,</ref><ref type="bibr" target="#b37">37]</ref>. It has been found that the following aspects can have a significant impact on memorization: data repetition in the training corpus, the number of model parameters (more parameters leading to a higher degree of memorization), and the number of tokens of context used to prompt the model <ref type="bibr" target="#b5">[6]</ref>.</p><p>Memorization is undesirable for various reasons. The first -and the most extensively studied by researchers -is that it entails privacy risks: generative models could disclose personal information (e.g. URLs, phone numbers, and addresses) in their output if it has been memorized verbatim from the training data, making LLMs vulnerable to training data extraction attacks <ref type="bibr" target="#b5">[6,</ref><ref type="bibr" target="#b30">30,</ref><ref type="bibr" target="#b2">3]</ref>. In the case of fiction, the privacy risk is less salient, but it is important that LLMs do not reproduce copyrighted material <ref type="bibr" target="#b14">[15]</ref>. Furthermore, there are also risks attached to the memorization of literature from the public domain: as D'Souza and Mimno <ref type="bibr" target="#b8">[9]</ref> stated, 'LLMs are poised to perpetuate the echoic nature of the literary canon within a new digital context'. That is to say, the view of what is and is not literature will be increasingly influenced by how LLMs perceive it, because the number of applications of these models will only grow, not only in the domain of literary studies but in the entire culture sector, where decisions about what should be commercialized are increasingly data-driven <ref type="bibr" target="#b34">[34]</ref>.</p><p>Finally, in the context of literature, there is also the question of whether certain copyrighted works have been used to train LLMs. Memorization provides a lever to answer this question: if the model can be prompted to reproduce specific passages, it is an indication that the work has been used during training. Prompting a model to discover which data were present in the training set is called a membership inference attack <ref type="bibr" target="#b32">[32]</ref>. Chang, Cramer, Soni, and Bamman <ref type="bibr" target="#b7">[8]</ref> used this framework to study the verbatim memorization of literature by the LLMs of OpenAI: ChatGPT and GPT-4. They found a high degree of memorization for some copyrighted works, as well as an influence of a book's popularity on the Internet on its degree of memorization (popular books were better memorized), but the effect of memorization on downstream tasks remains equivocal. They expressed their concerns about the biases induced by memorization for studies in the field of cultural analytics where LLMs are used. 
They proposed the use of open models (with freely accessible training data) as a solution for the use of LLMs in the field of Digital Humanities.</p><p>In the remainder of this paper, we present the name cloze task proposed in <ref type="bibr" target="#b7">[8]</ref>, which we used and adapted for English and French with a variety of freely available models (section 3.1); we report and discuss the results that we obtained in section 3.3, along with several analyses of the behaviour of the models depending on the copyright status, sub-genre, and popularity of the works chosen to probe the models. We also present further studies that we ran to gain a better understanding of the learning, memorization, and recall processes; these are presented in sections 3.4 and 4.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Name cloze task</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">Task</head><p>To assess the memorization of literary data by language models, Chang, Cramer, Soni, and Bamman <ref type="bibr" target="#b7">[8]</ref> formulated a membership inference attack task, which they call name cloze inference, in which models have to predict a proper name missing from a text passage. Unlike other completion tasks focusing on predicting named entities <ref type="bibr" target="#b16">[17,</ref><ref type="bibr" target="#b27">27]</ref>, the text passages used by Chang, Cramer, Soni, and Bamman <ref type="bibr" target="#b7">[8]</ref> contain no other named entities than the target name. Therefore, this type of task tests the models' ability to 'remember' very specific information from the training data. By way of comparison, human performance on this task was assessed at 0% by Chang, Cramer, Soni, and Bamman <ref type="bibr" target="#b7">[8]</ref>: the contexts were not informative enough for humans to guess the target names.</p><p>The experiments presented in this section used the protocol of Chang, Cramer, Soni, and Bamman <ref type="bibr" target="#b7">[8]</ref>. We used the prompt presented in Figure <ref type="figure" target="#fig_0">1</ref>, which displays two examples (identical across items) followed by the target item.</p></div>
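To make this querying step concrete, the following minimal sketch shows how a single name cloze item could be submitted to a freely accessible causal language model through the Hugging Face transformers library. The model identifier, the exact prompt wording and the answer extraction are illustrative assumptions; they do not reproduce the exact prompt of Chang, Cramer, Soni, and Bamman [8] shown in Figure 1, nor our precise setup.

import re
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "mistralai/Mistral-7B-v0.1"  # assumed model identifier
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, torch_dtype=torch.float16, device_map="auto")

def name_cloze(passage: str) -> str:
    """Ask the model for the proper name hidden behind [MASK] in the passage."""
    prompt = (
        "You have seen the following passage in your training data. "
        "What is the proper name that fills the [MASK] token in it? "
        "Answer with one word surrounded by <name> and </name>.\n"
        f"Input: {passage}\nOutput: "
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=10, do_sample=False)
    completion = tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
    match = re.search(r"<name>(.*?)</name>", completion)
    return match.group(1).strip() if match else completion.strip()

# For item (1b) below, the gold answer is "Elizabeth"; a hit is counted only on an exact match.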
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">Data</head><p>The items we used for the task were taken from Chang, Cramer, Soni, and Bamman <ref type="bibr" target="#b7">[8]</ref> for the English experiment (3.2.1), and we used a similar method to construct the items for the French experiment (3.2.2).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.1.">English</head><p>Chang, Cramer, Soni, and Bamman <ref type="bibr" target="#b7">[8]</ref> created an item set by running the BookNLP<ref type="foot" target="#foot_0">1</ref> pipeline <ref type="bibr" target="#b0">[1]</ref> on the literary corpus presented in Table <ref type="table" target="#tab_0">1</ref> to extract passages with a proper name of the type character and no other named entities. They then randomly sampled 100 passages per book. Books with fewer than 100 passages were excluded from the experiment. In total, there were 57,100 items. <ref type="foot" target="#foot_1">2</ref> Two examples are given below:</p><p>(1) a. There is but such a quantity of merit between them; just enough to make one good sort of man; and of late it has been shifting about pretty much. For my part, I am inclined to believe it all [MASK]'s; but you shall do as you choose.</p><p>b. I would go and see her if I could have the carriage. " [MASK], feeling really anxious, was determined to go to her, though the carriage was not to be had; and as she was no horsewoman, walking was her only alternative.</p><p>Items from the book Pride and Prejudice</p></div>
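The item construction can be illustrated with the sketch below, which applies the same filtering logic (exactly one single-token PERSON entity and no other named entities, then 100 random items per book). Here spaCy named entity recognition is used as a stand-in for the BookNLP and fr-BookNLP pipelines actually employed; the function and variable names are our own assumptions.

import random
import spacy

nlp = spacy.load("en_core_web_sm")  # stand-in NER model; BookNLP was used in practice

def build_items(passages, n_items=100, seed=0):
    """Turn raw passages into name cloze items; return None for books with too few items."""
    items = []
    for passage in passages:
        doc = nlp(passage)
        persons = [ent for ent in doc.ents if ent.label_ == "PERSON"]
        others = [ent for ent in doc.ents if ent.label_ != "PERSON"]
        # keep passages with exactly one single-token character name and no other entities
        if len(persons) == 1 and len(persons[0]) == 1 and not others:
            name = persons[0].text
            items.append({"text": passage.replace(name, "[MASK]", 1), "answer": name})
    if len(items) < n_items:
        return None  # books with fewer than 100 items are excluded
    random.seed(seed)
    return random.sample(items, n_items)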
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.2.">French</head><p>The French item set was selected from the Chapitres corpus <ref type="bibr" target="#b22">[23]</ref>, which includes about 3,000 digitized books in French. Thanks to the fr-BookNLP pipeline <ref type="bibr" target="#b26">[26]</ref>, we were able to easily extract passages from books and produce items in the same manner as Chang, Cramer, Soni, and Bamman <ref type="bibr" target="#b7">[8]</ref>. Each of the items contains exactly one proper name of a character (named entity of type PERSON) as a single token (see Example (2)).</p><p>(2) a. Le campagnard, à ces mots, lâcha l'étui qu'il tournait entre ses doigts. Une saccade de ses épaules fit craquer le dossier de la chaise. Son chapeau tomba. -Je m'en doutais, dit [MASK] en appliquant son doigt sur la veine.</p><p>b. En passant auprès des portes, la robe d'[MASK], par le bas, s'ériflait au pantalon ; leurs jambes entraient l'une dans l'autre ; il baissait ses regards vers elle, elle levait les siens vers lui ; une torpeur la prenait, elle s'arrêta.</p><p>Items from the book Madame Bovary</p><p>After excluding books with fewer than 100 generated items, 2,459 books remained. However, limiting the number of books was still necessary in order to avoid an excessive experiment runtime. We selected 575 French books, balanced across genres, as shown in Table <ref type="table">2</ref>. For each of these books, we randomly selected 100 items.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Table 2</head><p>Breakdown by genre of the 575 books that were selected from the Chapitres corpus <ref type="bibr" target="#b22">[23]</ref> to build the French item set. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3.">Replication</head><p>In this section, we report on the replication of Chang, Cramer, Soni, and Bamman's name cloze inference task using freely accessible models. The data we used are described in the previous subsection.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3.1.">Replication with open models</head><p>English: We tested MistralAI models (Mistral7B, Mistral7B-Instruct and Mixtral8x7B) <ref type="bibr" target="#b18">[19,</ref><ref type="bibr" target="#b19">20]</ref>, Olmo7B <ref type="bibr" target="#b13">[14]</ref>, Pythia (7B and 12B) <ref type="bibr" target="#b3">[4]</ref> and Llama2 7B <ref type="bibr" target="#b31">[31]</ref>, in order to compare the performance of all these models. For the ChatGPT, GPT-4 and BERT <ref type="bibr" target="#b9">[10]</ref> models, the scores were taken directly from the data of Chang, Cramer, Soni, and Bamman <ref type="bibr" target="#b7">[8]</ref>. The performance of each model on the task is plotted in Figure <ref type="figure" target="#fig_1">2a</ref>. First, we observe that, with an average accuracy of 6.81%, GPT-4 clearly stands out as the best-performing model, followed by ChatGPT (GPT 3.5 turbo) with an average score of 2.51%. The Mixtral8x7B, Mistral7B and Mistral7B-Instruct models show scores just under 1%. The other models (Olmo 7B, BERT, Pythia12B, Pythia7B and Llama2 7B) show lower accuracies, ranging from 0.27% to 0.01%.</p><p>Interestingly, the vast majority of books score (close to) 0%. The outliers are relatively few in number, and it is probably only for these that we can speak of memorization. Intriguingly, for almost all models (except BERT), the text Alice's Adventures in Wonderland obtains the highest scores, probably due to its notoriety and high frequency in the training corpus.</p><p>French: We decided not to test all the models we tested for English. As running these models is time- and resource-consuming (about one night per model, and even a whole week for Mixtral8x7B, on our server with one GPU), we excluded Mixtral8x7B because of its resource consumption and unexceptional level of memorization, and Mistral7B-Instruct, Llama2 and all the versions of Pythia because of their very low degrees of memorization. As counterparts to BERT for English, we introduced comparable models specialized for French: CamemBERT <ref type="bibr" target="#b25">[25]</ref> and FlauBERT <ref type="bibr" target="#b21">[22]</ref>. The scores of these models can be found in Figure <ref type="figure" target="#fig_1">2b</ref>.</p><p>Remarkably, for French, the language-specialized model CamemBERT performed by far the best, and in contrast to English, where the BERT model was one of the lowest scoring compared to latest-generation LLMs, the BERT-architecture models for French performed similarly to Mistral7B and better than Olmo7B.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3.2.">Analysis of copyright status</head><p>Figure <ref type="figure" target="#fig_3">3a</ref> shows the accuracy of the models according to copyright status. A general trend can be observed: all models scored higher on public domain works, for both English and French, although the difference is smaller for French. This result is consistent with our hypothesis that the models are mainly trained on public domain books, and replicates the findings of Chang, Cramer, Soni, and Bamman <ref type="bibr" target="#b7">[8]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3.3.">Analysis of the sub-genres of books</head><p>We have already noted that freely accessible LLMs can predict certain elements from books, regardless of their copyright status. Table <ref type="table" target="#tab_2">3</ref> explores this capability by detailing performance per sub-genre for the English items.</p><p>Apart from an overall difference in accuracy scores, the trends observed on the English items are similar to those of Chang, Cramer, Soni, and Bamman <ref type="bibr" target="#b7">[8]</ref>. The tested models seem to have the best knowledge of science fiction and fantasy works and of public domain texts. However, they are less familiar with Global Anglophone fiction and works by Black authors. For French, we observe that CamemBERT, FlauBERT and Mistral7B obtain their highest scores on children's literature, and Olmo7B on historical novels (see Table <ref type="table" target="#tab_3">4</ref>). On the one hand, it certainly makes sense that the models perform better on public domain texts, given the legal restrictions on the use of works that are not in the public domain. On the other hand, the distinctive vocabulary of the science fiction and fantasy genres seems to facilitate the models' predictions. By closely examining items from the 'Science-Fiction/Fantasy' genre, we found words that are not named entities but that are still very indicative of the book, such as 'Quidditch', 'Witchcraft', or 'Muggles' in items from Harry Potter.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3.4.">Analysis of book popularity on the web</head><p>According to Chang, Cramer, Soni, and Bamman <ref type="bibr" target="#b7">[8]</ref>, a book's popularity should be defined by its presence in many academic libraries, its frequency in large-scale training datasets (such as Books3, part of The Pile), its citations in non-indexed academic journals, and its appearance on the public web (both in excerpts and full text). In line with Chang, Cramer, Soni, and Bamman <ref type="bibr" target="#b7">[8]</ref>, we checked whether there was a relationship between the popularity of a book online and the models' degree of memorization on the English items. We took the numbers of hits from Bing, Google, the C4 corpus and The Pile directly from their data and calculated Spearman's correlations with the accuracy scores of the freely accessible models that we tested.</p><p>Most open language models showed a positive correlation between prediction performance and book popularity on the web (see Table <ref type="table" target="#tab_4">5</ref>). This experiment therefore reinforces the hypothesis that web prevalence is correlated with performance on the name cloze inference task. However, the models that performed poorly (i.e. those that failed to give the right prediction for most books) do not show a high correlation with any engine/corpus. For this reason, we decided not to repeat this experiment for French: as generative LLMs perform poorly on the French dataset, we did not expect high correlations between the accuracy on the French items and the popularity of a work online.</p></div>
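For reference, the correlation analysis itself amounts to a few lines; the sketch below assumes a table with one row per book containing its accuracy and its hit counts (the column and file names are illustrative assumptions).

import pandas as pd
from scipy.stats import spearmanr

df = pd.read_csv("book_scores.csv")  # assumed layout: one row per book
for source in ["bing_hits", "google_hits", "c4_hits", "pile_hits"]:
    rho, p = spearmanr(df["accuracy"], df[source])
    print(f"{source}: rho = {rho:.3f} (p = {p:.3g})")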
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.4.">Evolution of memorization during training</head><p>Since a high degree of memorization was found for some books and some models, and since the popularity of a work online is correlated with the performance of the models, it seems natural to wonder whether memorizing a book requires access to the full text, or whether it can also take place via excerpts from websites. In this section, we therefore present a new series of experiments, in which we monitored the memorization of books during the pre-training process of an LLM. Inspired by Biderman, Schoelkopf, Anthony, Bradley, O'Brien, Hallahan, Khan, Purohit, Prashanth, Raff, et al. <ref type="bibr" target="#b3">[4]</ref> and Biderman, Prashanth, Sutawika, Schoelkopf, Anthony, Purohit, and Raff <ref type="bibr" target="#b2">[3]</ref>, we studied the emerging pattern of memorization as a function of a book's popularity online and of whether it is in the public domain or under copyright.</p><p>For this experiment, we used the OLMo7B model <ref type="bibr" target="#b13">[14]</ref>, as it has been trained on fully public data, the Dolma corpus <ref type="bibr" target="#b29">[29]</ref>, and provides numerous checkpoints (states of the model during the pre-training phase).</p><p>It is beyond our computational resources to run this experiment for all 571 books on each of OLMo's more than 500 checkpoints: as many OLMo models would have to be downloaded as there are checkpoints, and the experiment would therefore take more than 500 times longer than the initial experiment with this model. That is why, in our study, we focused on fourteen checkpoints, chosen at regular intervals, and four particularly representative books, selected according to two dimensions, as illustrated in Figure <ref type="figure" target="#fig_4">4</ref>: copyright status (public or private) and popularity (few hits or many hits). These works are, respectively, The Mysteries of Udolpho, Pride and Prejudice, The Chosen and The Silmarillion. A sketch of how successive checkpoints can be probed is given at the end of this section.</p><p>Figure <ref type="figure" target="#fig_5">5</ref> shows the evolution of memorization during the training of OLMo. For the works in the public domain (The Mysteries of Udolpho and Pride and Prejudice) there is a noticeable increase in accuracy towards the end of training, particularly between steps 450,000 and 557,000. It can reasonably be suggested that at this stage of training, the model is seeing the full texts of public domain works, such as those available through well-known projects like Project Gutenberg. This hypothesis is reinforced by the observation that, in the Dolma corpus <ref type="bibr" target="#b29">[29]</ref>, the sub-corpora representing literature are placed at the end.<ref type="foot" target="#foot_2">4</ref> In contrast, for the copyrighted works, The Chosen and The Silmarillion, performance evolved continuously and steadily throughout the training period, without showing such a sharp and sudden increase. For example, right from the start of the pre-training phase, from step 50,000 onwards, the OLMo model successfully predicted some masked proper names in The Silmarillion items. For these works, the accuracy fluctuated slightly but remained relatively stable throughout the training phase, right up to the end, apart from some additional correct predictions. This could support the hypothesis that excerpts or quotations from these books are scattered across various sub-corpora and are therefore distributed throughout the pre-training phase. 
Furthermore, the influence of web popularity, measured by the number of 'hits', also appears to play an important role in this evolution, especially for the copyrighted works. This is particularly true for The Silmarillion, whose popularity on the web is associated with more pronounced fluctuations in prediction scores.</p></div>
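The checkpoint probing described above can be sketched as follows: each pre-training checkpoint of OLMo7B is published as a revision of the Hugging Face repository, and the same name cloze evaluation is run at each revision. The revision names below are assumptions and should be checked against the model card; name_cloze_hit stands for a hit function such as the one sketched in section 3.1.

from transformers import AutoModelForCausalLM, AutoTokenizer

REPO = "allenai/OLMo-7B"
# Assumed revision names; the actual branch names must be taken from the repository.
CHECKPOINTS = ["step50000-tokens220B", "step250000-tokens1100B", "step557000-tokens2460B"]

def accuracy_at(revision, items, hit_fn):
    """Load one pre-training checkpoint and compute name cloze accuracy on the items."""
    tok = AutoTokenizer.from_pretrained(REPO, revision=revision, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(REPO, revision=revision, trust_remote_code=True)
    hits = sum(hit_fn(model, tok, item) for item in items)
    return hits / len(items)

# curve = {rev: accuracy_at(rev, items, name_cloze_hit) for rev in CHECKPOINTS}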
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.5.">Discussion</head><p>The experiments in this section on the name cloze task first show that most models do not feature a high degree of memorization in general. However, for some particular works the degree of memorization can be very high. Although the average scores for ChatGPT and GPT-4 were higher, our data show the same distribution as Chang, Cramer, Soni, and Bamman's <ref type="bibr" target="#b7">[8]</ref>, for English and for French. Interestingly, our experiments suggest that the number of parameters is not a determining factor for memorization: larger models from the same series do not show higher accuracy on the task (e.g. Pythia12B with respect to Pythia7B, and Mixtral8x7B with respect to Mistral7B). For French, it is noteworthy that the BERT-type models were the highest-performing models, in contrast to English. Our hypothesis is that there might be a higher overlap between the pre-training corpus of CamemBERT and FlauBERT and the French items we constructed than there is between the items for English and the pre-training corpus of BERT. We also think that the amount of training data in French, which is smaller than the amount of English training data, must play an important role.</p><p>In our experiments, we also replicated Chang, Cramer, Soni, and Bamman's finding that public domain books were better remembered by LLMs than copyrighted books; we found this for both English and French. We also replicated the relationship between the online popularity of books and scores on the name cloze task, although this relationship was not strong for books for which LLMs showed low levels of memorization anyway. Also, for the English items, we replicated the finding that books from the genres of science fiction and fantasy were better memorized than those from other genres.</p><p>However, during the replication with open models we ran into various problems with the protocol of the name cloze task. In section 3.3.3, we already identified the problem of words that are not named entities, but are very specific to a particular book (e.g. Muggles in Harry Potter). Moreover, during our experiments, we also saw that some items do contain named entities that are not detected by BookNLP (for example, 'Hogwarts' and 'Voldemort' in Harry Potter). Also, style is sometimes very recognizable, for example -to stay with the example of Harry Potter -the way the character Hagrid speaks (see example (3)). This suggests that, instead of recognizing a sentence from the training data verbatim, a model may recognize a book based on specific vocabulary, unfiltered named entities and style, and guess the name of the main character. This strategy would lead to relatively high performance: we checked for the English items that the main character was the correct answer 29.48% of the time, which is much higher than the performance of any LLM on the name cloze inference task (a sketch of this baseline computation is given at the end of this section).</p><p>Another concern that we have about the name cloze task is its exclusive focus on proper names. Proper names might not be representative of how other morpho-syntactic categories are memorized. Indeed, Pang, Ye, Wang, Yu, Wong, Shi, and Tu <ref type="bibr" target="#b28">[28]</ref> found, in a morpho-syntactic analysis carried out in the context of LLMs, that proper nouns are systematically given higher attention weights than common nouns or other word types.</p><p>Finally, we also question whether prompting is the most suitable way to access the memory of LLMs. 
We wonder whether the lower scores we found for open models, compared with Chang, Cramer, Soni, and Bamman's <ref type="bibr" target="#b7">[8]</ref> findings on OpenAI models, can be explained by the better chat module of the latter: memorization may appear lower than it is for open models because their memory cannot be accessed conveniently by prompting (the comprehension of instructions might be higher for the OpenAI models).</p></div>
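A minimal sketch of the 'guess the main character' baseline mentioned above is given below; it uses the most frequent gold answer among a book's items as a proxy for its main character, and the data layout is an illustrative assumption.

from collections import Counter

def main_character_baseline(items_by_book):
    """items_by_book: {book_id: [(passage, gold_name), ...]}"""
    hits, total = 0, 0
    for items in items_by_book.values():
        golds = [gold for _, gold in items]
        main_character = Counter(golds).most_common(1)[0][0]  # most frequent gold name
        hits += sum(gold == main_character for gold in golds)
        total += len(golds)
    return hits / total  # we found 29.48% on the English items (see section 3.5)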
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Further analysis</head><p>These concerns with the name cloze task led us to design two new experiments: the first aims to check whether the prompting framework is suited to querying open LLMs (section 4.1), and the second proposes an alternative protocol to the name cloze inference task (section 4.2).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.">Evaluating the appropriateness of prompting for the name cloze task</head><p>In this section, we present a fine-tuning experiment on the Mistral7B model <ref type="bibr" target="#b18">[19]</ref> to assess whether prompting influences model performance on the name cloze task. The idea is the following: we seek to enhance task comprehension by fine-tuning the LLM on English items from books in the public domain. These books are almost certainly in the training data, because they are widely available, for example through Project Gutenberg <ref type="foot" target="#foot_3">5</ref> or Wikibooks <ref type="foot" target="#foot_4">6</ref>. Our hypothesis is that if books have been memorized, fine-tuning helps the model learn how to access the information in its memory.</p><p>An example of an item from the fine-tuning training data is shown below:</p><p>[ { "input": "You want breakfast, [MASK], or piss me off?", "output": "&lt;name&gt;Gard&lt;/name&gt;", "instruction": "You have seen the following passage ..." }, ...]</p><p>Regarding the fine-tuning method, we employed LoRA <ref type="bibr" target="#b17">[18]</ref>, a low-rank adaptation technique for parameter-efficient fine-tuning available in the Python library peft <ref type="foot" target="#foot_5">7</ref> (a sketch of the configuration is given at the end of this section). The fine-tuned model is accessible on our Hugging Face account <ref type="foot" target="#foot_6">8</ref>, where it is presented together with the results of the fine-tuning experiment.</p><p>The evolution of the loss value is shown in Figure <ref type="figure" target="#fig_7">6</ref>. It can be observed that this value decreases significantly only during the initial steps. The average accuracy score of the Mistral7B model without fine-tuning is 0.00830, while the fine-tuned version achieves a score of 0.00893, so fine-tuning did not yield substantial gains in task performance. We conclude that the fact that open models fail at the name cloze inference task cannot be explained by a misunderstanding of the prompt.</p></div>
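For reference, a minimal sketch of a LoRA configuration with the peft library is given below; the hyperparameters and target modules are illustrative assumptions and do not necessarily match the values used in our experiment.

from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base_model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")
lora_config = LoraConfig(
    r=8,                                   # rank of the low-rank update matrices
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # attention projections (assumed choice)
    task_type="CAUSAL_LM",
)
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # only a small fraction of the weights is trained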
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.">Pilot experiment: study memorization with n-grams</head><p>Memorization of proper names may not be representative of the memorization of other part-of-speech categories. Therefore, we conducted a pilot experiment to evaluate an alternative method to the name cloze inference task. The idea is very simple: we ask an LLM to complete a passage extracted from a book and count the overlap of the first ten tokens it produces with the real text of the book (a sketch of this scoring is given at the end of this section). For this pilot, we took the four books presented in Figure <ref type="figure" target="#fig_4">4</ref> and used the corresponding items from Chang, Cramer, Soni, and Bamman <ref type="bibr" target="#b7">[8]</ref> in the following manner: first we replaced the [MASK] token with the proper name, and then we took the first ten tokens as the text presented in the prompt and the following ten tokens as the gold answer. Our prompt is provided in Figure <ref type="figure" target="#fig_8">7</ref>. To compare this method to the name cloze inference task, we decided to test ChatGPT and study the correlation between the scores on the two tasks. The results can be found in Figure <ref type="figure" target="#fig_9">8</ref>.</p><p>As a sanity check, we also established a baseline score for the n-gram method. A young novelist, Jingyi, provided us with an unpublished draft of her next novel, written in Chinese. We translated this text into English using the DeepL translation tool <ref type="foot" target="#foot_7">9</ref>. From the translated manuscript, we selected 100 random excerpts and submitted them to the same prediction task. The memorization score was very low: 0.005. In comparison, the lowest-scoring novel from Figure <ref type="figure" target="#fig_9">8</ref> obtained a score of 0.038, more than seven times as high.</p><p>The number of books tested in this framework remains low, and the results of the pilot should therefore be interpreted with caution. Still, we want to put forward a first evaluation of the n-gram method as an alternative to name cloze inference. A first observation is that both tasks show a substantial level of correlation (0.77), but that the scores for the n-gram task are more fine-grained than those of the name cloze task. Indeed, whereas for the name cloze task we have 100 items per book, for the n-gram task we have 100 x 10 tokens to evaluate, which helps to better distinguish among the lower-scoring works. The baseline of the unseen manuscript shows that there is still some distinction to be made between very low degrees of memorization and no memorization at all. <ref type="foot" target="#foot_8">10</ref> Furthermore, our results suggest that the n-gram method could reduce the sensitivity of the name cloze task to recognizing a style, or a specific word from a fictional universe, and guessing a character from the work without true memorization of the exact passage. Looking at "The Silmarillion" in Figure <ref type="figure" target="#fig_9">8</ref>, we see that its n-gram score is lower than would be expected from its name cloze inference score. Inspecting Chang, Cramer, Soni, and Bamman's items for this book more closely, we observe notable differences in ChatGPT's choice of answers. For example, eight items should receive the answer 'Melkor', but ChatGPT never put forward this name, whereas it predicted 'Aragorn' four times, even though this is never the correct answer. 
This leads us to suspect that the name cloze task is sensitive to the shortcut of guessing a character from a book rather than retrieving the correct name from memory.</p></div>
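The continuation score used in this pilot can be sketched as follows; whitespace tokenization and the position-by-position comparison are simplifying assumptions about the exact implementation.

def continuation_score(generate_fn, passage_tokens, prefix_len=10, target_len=10):
    """Prompt with the first ten tokens and count matches in the next ten generated tokens."""
    prefix = passage_tokens[:prefix_len]
    gold = passage_tokens[prefix_len:prefix_len + target_len]
    predicted = generate_fn(" ".join(prefix), target_len)  # returns a list of tokens
    matches = sum(p == g for p, g in zip(predicted, gold))
    return matches / len(gold)

# A book's score is the average of continuation_score over its 100 items.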
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Conclusion</head><p>The memorization of English and French literature is low on average in freely accessible LLMs, while a small number of fictional works show an extreme degree of memorization. Memorization is favored by the presence of quotes and excerpts of the books on the Internet, which makes it impossible to say whether a high memorization score means that the full text of the novel was actually used to train an LLM, unless the training corpus has been released, which is the case for only a very small number of LLMs.</p><p>For our research, we used the name cloze inference task, in which an LLM must guess a proper name from a sentence that contains no other named entities. While using this method, we noticed some undesirable effects that were initially unforeseen. The first is that the method is sensitive to errors: as items are filtered for named entities automatically, not all named entities are removed from the context, and these can be used by the LLM to guess the name of a character from the book without there being real verbatim memorization. The same can happen because of a recognizable style and typical words (such as in science fiction novels). Given that the memorization scores of LLMs are low, this noise cannot be ignored. When testing a very simple alternative method that counts n-gram overlap when the model is prompted to continue a passage from a novel, our pilot experiment showed that this method has the potential to be more robust than the name cloze inference task.</p><p>In future work, we aim to explore not only verbatim memorization, but also the memorization of plots and stories. Ultimately, coming back to the introduction, in which we argued that LLMs give a biased point of view on culture and literature, we would like to measure not only the spread and memorization of exact texts, but also that of ideas and more abstract patterns present in literature.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.">Availability of Resources and Code</head><p>All the experimental items and programming code for our experiments can be found on the following GitHub page: https://github.com/XINHAO-ZHANG/books-memorization.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Prompt for Name Cloze Inference. The prompt is almost identical to that of Chang, Cramer, Soni, and Bamman [8]; the difference is that we added the sentences 'This is the end of the examples. Then please give me the output in one word surrounded by &lt;name&gt; and &lt;/name&gt; without any explanation for the following input:'. The examples are identical. We made these decisions based on preliminary tests performed on Mixtral8x7B<ref type="bibr" target="#b19">[20]</ref>. This prompt was used for English and French. After some preliminary testing, we decided not to translate the prompt for French, as this seemed to lead to results of lower quality.</figDesc><graphic coords="5,99.64,89.99,395.99,282.85" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: Box-plots of the scores of various models in English and French on the name-cloze inference task. (a) English: accuracies marked with an asterisk (*) are results reported by Chang, Cramer, Soni, and Bamman <ref type="bibr" target="#b7">[8]</ref>. (b) French: for CamemBERT and FlauBERT, [0] means that we only counted a hit if the highest-ranking answer was the correct proper name; for the other versions, we counted a hit if the correct answer was among the top 5 highest-ranking answers.</figDesc><graphic coords="7,89.28,300.26,383.38,125.55" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head></head><label></label><figDesc>(a) Average accuracy of books from the public domain (public) and under copyright (private) for English. (b) Average accuracy of books from the public domain (public) and under copyright (private) for French.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head>Figure 3 :</head><label>3</label><figDesc>Figure 3: Comparative accuracy of books based on copyright status in English and French.</figDesc><graphic coords="8,89.28,408.32,187.52,170.63" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_4"><head>Figure 4 :</head><label>4</label><figDesc>Figure 4: Four books selected based on two criteria: copyright status and the popularity of the works online as measured by Chang, Cramer, Soni, and Bamman [8].</figDesc><graphic coords="11,183.11,84.17,229.06,298.81" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_5"><head>Figure 5 :</head><label>5</label><figDesc>Figure 5: Evolution of accuracy scores across different checkpoints</figDesc><graphic coords="11,89.28,417.28,432.02,288.02" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_6"><head></head><label></label><figDesc>Example (3), referred to in section 3.5: "what does he know about it, some o' the best I ever saw were the only ones with magic in 'em in a long line o' Muggles -look at yer mum! Look what she had fer a sister!" "So what is [MASK]?"</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_7"><head>Figure 6 :</head><label>6</label><figDesc>Figure 6: Evolution of training loss during fine-tuning. After a first gain in performance, the model quickly stagnates.</figDesc><graphic coords="15,197.43,84.17,200.42,120.01" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_8"><head>Figure 7 :</head><label>7</label><figDesc>Figure 7: Prompt of the n-gram pilot experiment.</figDesc><graphic coords="16,99.64,89.99,395.99,261.84" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_9"><head>Figure 8 :</head><label>8</label><figDesc>Figure 8: The correlation between the scores on the name cloze inference task and the n-gram task for the ChatGPT model on the four selected books from Figure 4.</figDesc><graphic coords="17,110.12,84.17,375.04,187.52" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1</head><label>1</label><figDesc>Number of books selected by collection and genre for English. NY Times and Publishers Weekly 95; The Black Book Interactive Project &amp; the Black Caucus American Library Association 101; Global Anglophone fiction (outside the U.S. and U.K.) 95; Science fiction, fantasy, horror, mystery, crime, romance and spy novels 99; Total 571.</figDesc><table><row><cell>Genre</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 3</head><label>3</label><figDesc>Name cloze average accuracy regarding sub-genres of books in the English experiment. Numbers in bold are the highest scores per column.</figDesc><table><row><cell>Source</cell><cell cols="2">Olmo-7B Mistral7B Inst</cell><cell cols="4">Mixtral7x8B Mistral7B GPT-4* ChatGPT*</cell></row><row><cell>BBIP</cell><cell>0.0016</cell><cell>0.0042</cell><cell>0.0051</cell><cell>0.0039</cell><cell>0.0191</cell><cell>0.0126</cell></row><row><cell>BCALA</cell><cell>0.0008</cell><cell>0.0032</cell><cell>0.0032</cell><cell>0.0016</cell><cell>0.0112</cell><cell>0.0076</cell></row><row><cell>Bestsellers</cell><cell>0.0028</cell><cell>0.0069</cell><cell>0.0061</cell><cell>0.0068</cell><cell>0.0332</cell><cell>0.0160</cell></row><row><cell>Genre Fiction:Action/Spy</cell><cell>0.0015</cell><cell>0.0030</cell><cell>0.0050</cell><cell>0.0045</cell><cell>0.0320</cell><cell>0.0070</cell></row><row><cell>Genre Fiction:Horror</cell><cell>0.0021</cell><cell>0.0032</cell><cell>0.0095</cell><cell>0.0068</cell><cell>0.0542</cell><cell>0.0279</cell></row><row><cell>Genre Fiction:Mystery/Crime</cell><cell>0.0000</cell><cell>0.0070</cell><cell>0.0075</cell><cell>0.0005</cell><cell>0.0290</cell><cell>0.0140</cell></row><row><cell>Genre Fiction:Romance Genre Fiction:SF/Fantasy</cell><cell>0.0025 0.0040</cell><cell>0.0030 0.0215</cell><cell>0.0055 0.0285</cell><cell>0.0045 0.0345</cell><cell>0.0290 0.2350</cell><cell>0.0110 0.1075</cell></row><row><cell>Global</cell><cell>0.0014</cell><cell>0.0029</cell><cell>0.0039</cell><cell>0.0028</cell><cell>0.0204</cell><cell>0.0087</cell></row><row><cell>Pulitzer pre-1923 LitBank</cell><cell>0.0012 0.0076</cell><cell>0.0061 0.0157</cell><cell>0.0052 0.0224</cell><cell>0.0051 0.0221</cell><cell>0.0259 0.2440</cell><cell>0.0113 0.0715</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head>Table 4</head><label>4</label><figDesc>Name cloze average accuracy regarding sub-genres of books in the French experiment. Numbers in bold are the highest scores per column.</figDesc><table><row><cell>Literary genre</cell><cell cols="2">Olmo-7B Camembert Large[0]</cell><cell>Camembert Large</cell><cell>Flaubert Large</cell><cell>Flaubert Large[0]</cell><cell>Flaubert Base</cell><cell>Mistral7B</cell></row><row><cell>Cycle and series Children's literature Short stories</cell><cell>0.0008 0.0012 0.0012</cell><cell>0.0082 0.0099 0.0086</cell><cell>0.0272 0.0481 0.0296</cell><cell>0.0052 0.0079 0.0059</cell><cell>0.0016 0.0011 0.0014</cell><cell>0.0003 0.0002 0.0005</cell><cell>0.0072 0.0093 0.0086</cell></row><row><cell>Thriller</cell><cell>0.0005</cell><cell>0.0023</cell><cell>0.0136</cell><cell>0.0018</cell><cell>0.0005</cell><cell>0.0000</cell><cell>0.0025</cell></row><row><cell>Adventure novels Historical fiction</cell><cell>0.0011 0.0025</cell><cell>0.0050 0.0085</cell><cell>0.0191 0.0372</cell><cell>0.0041 0.0058</cell><cell>0.0015 0.0017</cell><cell>0.0003 0.0002</cell><cell>0.0057 0.0054</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_4"><head>Table 5</head><label>5</label><figDesc>Spearman's correlation between model accuracy and the online popularity of books from the English data set.</figDesc><table><row><cell>Model accuracy</cell><cell cols="4">Bing Hits Google Hits C4 Hits Pile Hits</cell></row><row><cell>Llama2-7B</cell><cell>0.086</cell><cell>0.107</cell><cell>0.120</cell><cell>0.098</cell></row><row><cell>Pythia7B</cell><cell>0.009</cell><cell>-0.027</cell><cell>0.020</cell><cell>0.019</cell></row><row><cell>Pythia12B</cell><cell>-0.013</cell><cell>0.014</cell><cell>0.027</cell><cell>0.072</cell></row><row><cell>Olmo-7B</cell><cell>0.105</cell><cell>0.084</cell><cell>0.102</cell><cell>0.107</cell></row><row><cell>Mistral7B Instruct</cell><cell>0.245</cell><cell>0.244</cell><cell>0.263</cell><cell>0.182</cell></row><row><cell>Mixtral7x8B</cell><cell>0.313</cell><cell>0.305</cell><cell>0.306</cell><cell>0.233</cell></row><row><cell>Mistral7B</cell><cell>0.276</cell><cell>0.235</cell><cell>0.265</cell><cell>0.209</cell></row><row><cell>GPT-4</cell><cell>0.550</cell><cell>0.537</cell><cell>0.540</cell><cell>0.461</cell></row><row><cell>ChatGPT</cell><cell>0.439</cell><cell>0.410</cell><cell>0.426</cell><cell>0.359</cell></row><row><cell>BERT</cell><cell>0.014</cell><cell>-0.015</cell><cell>0.020</cell><cell>-0.004</cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">https://github.com/booknlp/booknlp</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_1">Items generated from these books can be found in a github repository: https://github.com/bamman-group/gpt4-books/tree/main/data/model_output/chatgpt_results</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_2">Unfortunately, we could not find a map explaining which checkpoint corresponded exactly to which part of Dolma.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="5" xml:id="foot_3">https://www.gutenberg.org</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="6" xml:id="foot_4">https://www.wikibooks.org</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="7" xml:id="foot_5">https://pypi.org/project/peft/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="8" xml:id="foot_6">https://huggingface.co/LivevreXH/mistral_finetuned_items_livres/tree/main</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="9" xml:id="foot_7">https://www.deepl.com/fr/translator</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="10" xml:id="foot_8">Admittedly, a DeepL translation of a Chinese novel might not be the most representative literary text, and this experiment should be repeated using an unpublished draft by a writer who is a native speaker.</note>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgments</head><p>This work was funded in part by the French government under management of Agence Nationale de la Recherche as part of the "Investissements d'avenir" program, reference ANR-19-P3IA-0001 (PRAIRIE 3IA Institute, Thierry Poibeau's Chair).</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<monogr>
		<title level="m" type="main">BookNLP</title>
		<author>
			<persName><forename type="first">D</forename><surname>Bamman</surname></persName>
		</author>
		<ptr target="https://github.com/booknlp/booknlp" />
		<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">An Annotated Dataset of Coreference in English Literature</title>
		<author>
			<persName><forename type="first">D</forename><surname>Bamman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Lewke</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Mansoor</surname></persName>
		</author>
		<ptr target="https://aclanthology.org/2020.lrec-1.6" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Twelfth Language Resources and Evaluation Conference</title>
				<editor>
			<persName><forename type="first">J</forename><surname>Maegaard</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">H</forename><surname>Mariani</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><surname>Mazo</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">J</forename><surname>Moreno</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">S</forename><surname>Odijk</surname></persName>
		</editor>
		<editor>
			<persName><surname>Piperidis</surname></persName>
		</editor>
		<meeting>the Twelfth Language Resources and Evaluation Conference<address><addrLine>Marseille, France</addrLine></address></meeting>
		<imprint>
			<publisher>European Language Resources Association</publisher>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="44" to="54" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Emergent and predictable memorization in large language models</title>
		<author>
			<persName><forename type="first">S</forename><surname>Biderman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">U</forename><surname>Prashanth</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Sutawika</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Schoelkopf</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><surname>Anthony</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Purohit</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Raff</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Advances in Neural Information Processing Systems</title>
		<imprint>
			<biblScope unit="volume">36</biblScope>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Pythia: A suite for analyzing large language models across training and scaling</title>
		<author>
			<persName><forename type="first">S</forename><surname>Biderman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Schoelkopf</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><forename type="middle">G</forename><surname>Anthony</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Bradley</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>O'Brien</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Hallahan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">A</forename><surname>Khan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Purohit</surname></persName>
		</author>
		<author>
			<persName><forename type="first">U</forename><forename type="middle">S</forename><surname>Prashanth</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Raff</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International Conference on Machine Learning. PMLR</title>
				<imprint>
			<date type="published" when="2023">2023</date>
			<biblScope unit="page" from="2397" to="2430" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Death of the Dictionary? – The Rise of Zero-Shot Sentiment Classification</title>
		<author>
			<persName><forename type="first">J</forename><surname>Borst</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Klähn</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Burghardt</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Computational Humanities Research Conference</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
	<note>CHR 2023</note>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Quantifying Memorization Across Neural Language Models</title>
		<author>
			<persName><forename type="first">N</forename><surname>Carlini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Ippolito</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Jagielski</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Tramer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Zhang</surname></persName>
		</author>
		<ptr target="https://openreview.net/forum?id=TatRHT_1cK" />
	</analytic>
	<monogr>
		<title level="m">The Eleventh International Conference on Learning Representations</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Extracting Training Data from Large Language Models</title>
		<author>
			<persName><forename type="first">N</forename><surname>Carlini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Tramèr</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Wallace</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Jagielski</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Herbert-Voss</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Roberts</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Brown</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Song</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ú</forename><surname>Erlingsson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Oprea</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Raffel</surname></persName>
		</author>
		<ptr target="https://www.usenix.org/conference/usenixsecurity21/presentation/carlini-extracting" />
	</analytic>
	<monogr>
		<title level="m">30th USENIX Security Symposium (USENIX Security 21)</title>
				<imprint>
			<publisher>USENIX Association</publisher>
			<date type="published" when="2021">2021</date>
			<biblScope unit="page" from="2633" to="2650" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Speak, Memory: An Archaeology of Books Known to ChatGPT/GPT-4</title>
		<author>
			<persName><forename type="first">K</forename><surname>Chang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Cramer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Soni</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Bamman</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2023.emnlp-main.453</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing</title>
				<editor>
			<persName><forename type="first">H</forename><surname>Bouamor</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">J</forename><surname>Pino</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">K</forename><surname>Bali</surname></persName>
		</editor>
		<meeting>the 2023 Conference on Empirical Methods in Natural Language Processing<address><addrLine>Singapore</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2023">2023</date>
			<biblScope unit="page" from="7312" to="7327" />
		</imprint>
	</monogr>
	<note>Association for Computational Linguistics</note>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">The Chatbot and the Canon: Poetry Memorization in LLMs</title>
		<author>
			<persName><forename type="first">L</forename><surname>D'Souza</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Mimno</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Computational Humanities Research Conference</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
	<note>CHR 2023</note>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding</title>
		<author>
			<persName><forename type="first">J</forename><surname>Devlin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M.-W</forename><surname>Chang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Toutanova</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/N19-1423</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</title>
				<editor>
			<persName><forename type="first">J</forename><surname>Burstein</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">C</forename><surname>Doran</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">T</forename><surname>Solorio</surname></persName>
		</editor>
		<meeting>the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies<address><addrLine>Minneapolis, Minnesota</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="4171" to="4186" />
		</imprint>
	</monogr>
	<note>Association for Computational Linguistics</note>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">If the Sources Could Talk: Evaluating Large Language Models for Research Assistance in History</title>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">G</forename><surname>Garcia</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Weilbach</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">CHR 2023: Computational Humanities Research Conference</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Datasheets for datasets</title>
		<author>
			<persName><forename type="first">T</forename><surname>Gebru</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Morgenstern</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Vecchione</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">W</forename><surname>Vaughan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Wallach</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Daumé III</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Crawford</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Communications of the ACM</title>
		<imprint>
			<biblScope unit="volume">64</biblScope>
			<biblScope unit="issue">12</biblScope>
			<biblScope unit="page" from="86" to="92" />
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Demystifying Prompts in Language Models via Perplexity Estimation</title>
		<author>
			<persName><forename type="first">H</forename><surname>Gonen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Iyer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Blevins</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Smith</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Zettlemoyer</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2023.findings-emnlp.679</idno>
	</analytic>
	<monogr>
		<title level="m">Findings of the Association for Computational Linguistics: EMNLP 2023</title>
				<editor>
			<persName><forename type="first">H</forename><surname>Bouamor</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">J</forename><surname>Pino</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">K</forename><surname>Bali</surname></persName>
		</editor>
		<meeting><address><addrLine>Singapore</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2023">2023</date>
			<biblScope unit="page" from="10136" to="10148" />
		</imprint>
	</monogr>
	<note>Association for Computational Linguistics</note>
</biblStruct>

<biblStruct xml:id="b13">
	<monogr>
		<title level="m" type="main">OLMo: Accelerating the science of language models</title>
		<author>
			<persName><forename type="first">D</forename><surname>Groeneveld</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Beltagy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Walsh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Bhagia</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Kinney</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Tafjord</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">H</forename><surname>Jha</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Ivison</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Magnusson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Wang</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2402.00838</idno>
		<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Foundation Models and Fair Use</title>
		<author>
			<persName><forename type="first">P</forename><surname>Henderson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Jurafsky</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Hashimoto</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">A</forename><surname>Lemley</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Liang</surname></persName>
		</author>
		<ptr target="http://jmlr.org/papers/v24/23-0569.html" />
	</analytic>
	<monogr>
		<title level="j">Journal of Machine Learning Research</title>
		<imprint>
			<biblScope unit="volume">24</biblScope>
			<biblScope unit="page" from="1" to="79" />
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">T5 meets Tybalt: Author Attribution in Early Modern English Drama Using Large Language Models</title>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">M</forename><surname>Hicke</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Mimno</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Computational Humanities Research Conference</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
	<note>CHR 2023</note>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">SimLex-999: Evaluating Semantic Models With (Genuine) Similarity Estimation</title>
		<author>
			<persName><forename type="first">F</forename><surname>Hill</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Reichart</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Korhonen</surname></persName>
		</author>
		<idno type="DOI">10.1162/COLI_a_00237</idno>
		<ptr target="https://aclanthology.org/J15-4004" />
	</analytic>
	<monogr>
		<title level="j">Computational Linguistics</title>
		<imprint>
			<biblScope unit="volume">41</biblScope>
			<biblScope unit="issue">4</biblScope>
			<biblScope unit="page" from="665" to="695" />
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<monogr>
		<title level="m" type="main">LoRA: Low-Rank Adaptation of Large Language Models</title>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">J</forename><surname>Hu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Shen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Wallis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Allen-Zhu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Chen</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2106.09685</idno>
		<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
	<note>cs.CL</note>
</biblStruct>

<biblStruct xml:id="b18">
	<monogr>
		<title level="m" type="main">Mistral 7B</title>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">Q</forename><surname>Jiang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Sablayrolles</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Mensch</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Bamford</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">S</forename><surname>Chaplot</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Casas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Bressand</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Lengyel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Lample</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Saulnier</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2310.06825</idno>
		<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b19">
	<monogr>
		<title level="m" type="main">Mixtral of experts</title>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">Q</forename><surname>Jiang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Sablayrolles</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Roux</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Mensch</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Savary</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Bamford</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">S</forename><surname>Chaplot</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Casas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">B</forename><surname>Hanna</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Bressand</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2401.04088</idno>
		<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">Style Transfer of Modern Hebrew Literature Using Text Simplification and Generative Language Modeling</title>
		<author>
			<persName><forename type="first">P</forename><surname>Kaganovich</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Münz-Manor</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Ezra-Tsur</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Computational Humanities Research Conference</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
	<note>CHR 2023</note>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">FlauBERT: Unsupervised Language Model Pre-training for French</title>
		<author>
			<persName><forename type="first">H</forename><surname>Le</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Vial</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Frej</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Segonne</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Coavoux</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Lecouteux</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Allauzen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Crabbé</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Besacier</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Schwab</surname></persName>
		</author>
		<ptr target="https://aclanthology.org/2020.lrec-1.302" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Twelfth Language Resources and Evaluation Conference</title>
				<editor>
			<persName><forename type="first">B</forename><surname>Maegaard</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">J</forename><surname>Mariani</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">H</forename><surname>Mazo</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><surname>Moreno</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">J</forename><surname>Odijk</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">S</forename><surname>Piperidis</surname></persName>
		</editor>
		<meeting>the Twelfth Language Resources and Evaluation Conference<address><addrLine>Marseille, France</addrLine></address></meeting>
		<imprint>
			<publisher>European Language Resources Association</publisher>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="2479" to="2490" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<monogr>
		<title level="m" type="main">Corpus Chapitres</title>
		<author>
			<persName><forename type="first">A</forename><surname>Leblond</surname></persName>
		</author>
		<idno type="DOI">10.5281/zenodo.7446728</idno>
		<imprint>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<analytic>
		<title level="a" type="main">Deduplicating Training Data Makes Language Models Better</title>
		<author>
			<persName><forename type="first">K</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Ippolito</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Nystrom</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Eck</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Callison-Burch</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Carlini</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics</title>
				<meeting>the 60th Annual Meeting of the Association for Computational Linguistics</meeting>
		<imprint>
			<date type="published" when="2022">2022</date>
			<biblScope unit="volume">1</biblScope>
		</imprint>
	</monogr>
	<note>Long Papers</note>
</biblStruct>

<biblStruct xml:id="b24">
	<monogr>
		<editor>
			<persName><forename type="first">S</forename><surname>Muresan</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">P</forename><surname>Nakov</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><surname>Villavicencio</surname></persName>
		</editor>
		<idno type="DOI">10.18653/v1/2022.acl-long.577</idno>
		<title level="m">Association for Computational Linguistics</title>
				<meeting><address><addrLine>Dublin, Ireland</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2022">2022</date>
			<biblScope unit="page" from="8424" to="8445" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b25">
	<analytic>
		<title level="a" type="main">CamemBERT: a Tasty French Language Model</title>
		<author>
			<persName><forename type="first">L</forename><surname>Martin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Muller</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">J</forename><surname>Ortiz Suárez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Dupont</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Romary</surname></persName>
		</author>
		<author>
			<persName><forename type="first">É</forename><surname>De La Clergerie</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Seddah</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Sagot</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2020.acl-main.645</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics</title>
				<editor>
			<persName><forename type="first">D</forename><surname>Jurafsky</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">J</forename><surname>Chai</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">N</forename><surname>Schluter</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">J</forename><surname>Tetreault</surname></persName>
		</editor>
		<meeting>the 58th Annual Meeting of the Association for Computational Linguistics</meeting>
		<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="7203" to="7219" />
		</imprint>
	</monogr>
	<note>Association for Computational Linguistics</note>
</biblStruct>

<biblStruct xml:id="b26">
	<monogr>
		<title level="m" type="main">BookNLP-fr, the French Versant of BookNLP. A Tailored Pipeline for 19th and 20th Century French Literature</title>
		<author>
			<persName><forename type="first">F</forename><surname>Mélanie-Becquet</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Barré</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Seminck</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Plancq</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Naguib</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Pastor</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Poibeau</surname></persName>
		</author>
		<idno type="DOI">10.26083/tuprints-00027396</idno>
		<ptr target="https://doi.org/10.26083/tuprints-00027396" />
		<imprint>
			<date type="published" when="2024">2024</date>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page">34</biblScope>
			<pubPlace>Darmstadt</pubPlace>
		</imprint>
	</monogr>
	<note type="report_type">Tech. rep</note>
</biblStruct>

<biblStruct xml:id="b27">
	<analytic>
		<title level="a" type="main">Who did What: A Large-Scale Person-Centered Cloze Dataset</title>
		<author>
			<persName><forename type="first">T</forename><surname>Onishi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Bansal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Gimpel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Mcallester</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/D16-1241</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing</title>
				<editor>
			<persName><forename type="first">J</forename><surname>Su</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">K</forename><surname>Duh</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">X</forename><surname>Carreras</surname></persName>
		</editor>
		<meeting>the 2016 Conference on Empirical Methods in Natural Language Processing<address><addrLine>Austin, Texas</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="2230" to="2235" />
		</imprint>
	</monogr>
	<note>Association for Computational Linguistics</note>
</biblStruct>

<biblStruct xml:id="b28">
	<monogr>
		<title level="m" type="main">Salute the Classic: Revisiting Challenges of Machine Translation in the Age of Large Language Models</title>
		<author>
			<persName><forename type="first">J</forename><surname>Pang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Ye</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Yu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">F</forename><surname>Wong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Shi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Tu</surname></persName>
		</author>
		<ptr target="http://arxiv.org/abs/2401.08350" />
		<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b29">
	<monogr>
		<title level="m" type="main">Dolma: An Open Corpus of Three Trillion Tokens for Language Model Pretraining Research</title>
		<author>
			<persName><forename type="first">L</forename><surname>Soldaini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Kinney</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Bhagia</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Schwenk</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Atkinson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Authur</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Bogin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Chandu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Dumas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Elazar</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2402.00159</idno>
		<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b30">
	<analytic>
		<title level="a" type="main">Beyond Memorization: Violating Privacy via Inference with Large Language Models</title>
		<author>
			<persName><forename type="first">R</forename><surname>Staab</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Vero</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Balunovic</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Vechev</surname></persName>
		</author>
		<ptr target="https://openreview.net/forum?id=kmn0BhQk7p" />
	</analytic>
	<monogr>
		<title level="m">The Twelfth International Conference on Learning Representations</title>
				<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b31">
	<monogr>
		<title level="m" type="main">Llama 2: Open foundation and fine-tuned chat models</title>
		<author>
			<persName><forename type="first">H</forename><surname>Touvron</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Martin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Stone</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Albert</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Almahairi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Babaei</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Bashlykov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Batra</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Bhargava</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Bhosale</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2307.09288</idno>
		<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b32">
	<monogr>
		<title level="m" type="main">Towards demystifying membership inference attacks</title>
		<author>
			<persName><forename type="first">S</forename><surname>Truex</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">E</forename><surname>Gursoy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Yu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Wei</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1807.09173</idno>
		<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b33">
	<analytic>
		<title level="a" type="main">Mapping the latent spaces of culture</title>
		<author>
			<persName><forename type="first">T</forename><surname>Underwood</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Essay prepared for a roundtable</title>
				<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b34">
	<monogr>
		<title level="m" type="main">Where is all the book data?</title>
		<author>
			<persName><forename type="first">M</forename><surname>Walsh</surname></persName>
		</author>
		<ptr target="https://www.publicbooks.org/where-is-all-the-book-data/" />
		<imprint>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
	<note>Online essay</note>
</biblStruct>

<biblStruct xml:id="b35">
	<analytic>
		<title level="a" type="main">SuperGLUE: a stickier benchmark for general-purpose language understanding systems</title>
		<author>
			<persName><forename type="first">A</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Pruksachatkun</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Nangia</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Singh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Michael</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Hill</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Levy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">R</forename><surname>Bowman</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 33rd International Conference on Neural Information Processing Systems</title>
				<meeting>the 33rd International Conference on Neural Information Processing Systems<address><addrLine>Red Hook, NY, USA</addrLine></address></meeting>
		<imprint>
			<publisher>Curran Associates Inc</publisher>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b36">
	<analytic>
		<title level="a" type="main">GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding</title>
		<author>
			<persName><forename type="first">A</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Singh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Michael</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Hill</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Levy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Bowman</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/W18-5446</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP</title>
				<editor>
			<persName><forename type="first">T</forename><surname>Linzen</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">G</forename><surname>Chrupała</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><surname>Alishahi</surname></persName>
		</editor>
		<meeting>the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP<address><addrLine>Brussels, Belgium</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2018">2018</date>
			<biblScope unit="page" from="353" to="355" />
		</imprint>
	</monogr>
	<note>Association for Computational Linguistics</note>
</biblStruct>

<biblStruct xml:id="b37">
	<analytic>
		<title level="a" type="main">Understanding deep learning (still) requires rethinking generalization</title>
		<author>
			<persName><forename type="first">C</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Bengio</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Hardt</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Recht</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Vinyals</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Communications of the ACM</title>
		<imprint>
			<biblScope unit="volume">64</biblScope>
			<biblScope unit="issue">3</biblScope>
			<biblScope unit="page" from="107" to="115" />
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
