<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Early Modern Book Catalogues and Multilingualism: Identifying Multilingual Texts and Translations using Titles</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Yann</forename><surname>Ryan</surname></persName>
							<email>yann.ryan@kuleuven.be</email>
							<affiliation key="aff0">
								<orgName type="department">Faculty of Arts</orgName>
								<orgName type="institution">KU Leuven</orgName>
								<address>
									<addrLine>Blijde-Inkomststraat 21</addrLine>
									<postCode>3000</postCode>
									<settlement>Leuven</settlement>
									<country key="BE">Belgium</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Margherita</forename><surname>Fantoli</surname></persName>
							<email>margherita.fantoli@kuleuven.be</email>
							<affiliation key="aff0">
								<orgName type="department">Faculty of Arts</orgName>
								<orgName type="institution">KU Leuven</orgName>
								<address>
									<addrLine>Blijde-Inkomststraat 21</addrLine>
									<postCode>3000</postCode>
									<settlement>Leuven</settlement>
									<country key="BE">Belgium</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Early Modern Book Catalogues and Multilingualism: Identifying Multilingual Texts and Translations using Titles</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">0E5E6167E8CEB10EF678404C9C73B971</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T19:50+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>multilingualism</term>
					<term>metadata</term>
					<term>transformer models</term>
					<term>few-shot classification</term>
					<term>library catalogues</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>With this paper we aim to assess whether Early Modern book titles can be exploited to track two aspects of multilingualism in book publishing: publications featuring multiple languages, and the distinction between editions of works in their original language and editions in translation. To this end, we leverage the manually annotated language information available in two book catalogs: the Collectio Academica Antiqua, which records publications by scholars of the Old University of Leuven (1425-1797), and a subset of Eighteenth Century Collections Online, namely publications of Ancient Greek and Latin works. We evaluate three approaches: we train a simple tf-idf-based support vector classifier, we fine-tune a multilingual transformer model (BERT), and we use a few-shot approach with a pre-trained sentence transformer model. To better understand the results, we use SHAP, a library for explaining the output of any machine learning model. We conclude that while the few-shot approach is not currently usable for this task, the tf-idf approach and BERT fine-tuning perform comparably and are both usable. BERT shows better results for identifying translations and when generalizing across datasets.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Metadata catalogues, particularly library catalogues, are increasingly valuable for reconstructing the cultural and intellectual life of the past <ref type="bibr" target="#b32">[33,</ref><ref type="bibr" target="#b17">18,</ref><ref type="bibr" target="#b29">30,</ref><ref type="bibr" target="#b26">27,</ref><ref type="bibr" target="#b18">19]</ref>. These catalogues provide insights into both cultural artefacts and the actors behind the publishing industry, often spanning vast temporal and spatial ranges. Widely implemented metadata schemes such as MARC21<ref type="foot">1</ref> and Dublin Core<ref type="foot">2</ref> facilitate large-scale mining of these resources. The manual creation of catalogues by experts familiar with the period and place covered, as well as with cataloguing best practices, ensures their reliability as data sources.</p><p>In this paper, we investigate whether machine learning and Large Language Models can support the labelling of Early Modern book records with respect to language. Specifically, we explore the use of titles to identify multilingual publications and to distinguish between works published in their original language and works published in translation. The full titles recorded in several catalogues of Early Modern books are highly informative about the linguistic form of the book's content: they may mention the translator, the language in which the text is printed, and the language from which the text is translated. A typical example is the title 'A poetical translation of the works of Horace: with the original text, and critical notes collected from his best Latin and French commentators. By the Rev.d Mr. Philip Francis. In four volumes.' 
This paper aims to answer three research questions:</p><p>• RQ1: Do the titles recorded in catalogues of Early Modern books contain sufficient information to predict whether the books were multilingual or monolingual, and printed in the original language or in translation?</p><p>• RQ2: Which approach yields the best results: a simple tf-idf classifier, fine-tuning a Large Language Model, or a few-shot approach?</p><p>• RQ3: Given the heterogeneity of Early Modern publications, can models trained on one dataset yield satisfactory results on others? Does diversifying the training data improve the results on the datasets analyzed?</p><p>The paper is structured as follows: in Section 2, we discuss the importance of multilingualism for Early Modern studies and the current possibilities for automatically extracting language information. Section 3 introduces the two datasets used in this experiment.<ref type="foot" target="#foot_0">3</ref> In Section 4, we describe the tasks (Section 4.1) and models (Section 4.2) employed. Finally, Sections 5 and 6 present the results and discuss the potential of this approach.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Related work</head><p>Early Modern Europe was marked by multilingualism. As Latin's dominance as the lingua franca waned, vernacular languages gained ground in scientific and literary production. This shift influenced various printing practices, drawing interest from linguistics, book history, literary studies, and translation studies <ref type="bibr" target="#b1">[2]</ref>. A key focus is the reception of classical texts. During Humanism and the Renaissance, Ancient Greek and Latin gained prominence. On the one hand, reading the original works became central to humanistic education <ref type="bibr" target="#b21">[22,</ref><ref type="bibr" target="#b16">17]</ref>; on the other hand, this interest led to significant translation efforts, which shaped the cultural landscape <ref type="bibr" target="#b3">[4,</ref><ref type="bibr" target="#b10">11,</ref><ref type="bibr" target="#b19">20]</ref>.</p><p>This study examines two datasets reflecting aspects of Early Modern multilingualism: the diverse linguistic environment of the Low Countries and the evolving practice of printing classical authors in England. The Low Countries, a multilingual hub due to their political situation <ref type="bibr" target="#b12">[13,</ref><ref type="bibr" target="#b37">38]</ref>, saw significant scholarly activity around the Old University of Leuven, captured by the catalog Collectio Academica Antiqua (CAA). The CAA features several Ancient Greek and Latin authors, reflecting the high value placed on the classics in the learned society of the Low Countries, as exemplified by the curriculum of the Collegium Trilingue <ref type="bibr" target="#b14">[15,</ref><ref type="bibr" target="#b13">14,</ref><ref type="bibr" target="#b5">6]</ref>. For England, we focus on the printing of the classics in the eighteenth century. 
The influence of Ancient Greek and Latin on Grammar School curricula and the role of translations in circulating the classics have been well documented <ref type="bibr" target="#b38">[39,</ref><ref type="bibr" target="#b2">3,</ref><ref type="bibr" target="#b40">41]</ref>. This resulted in multilingual publications recorded in catalogs such as the English Short Title Catalog (ESTC) and Eighteenth Century Collections Online (ECCO), the latter used in this study. More details are provided in Section 3.</p><p>Our work uses the long titles of Early Modern books to annotate their linguistic characteristics. Book titles have been leveraged for metadata enrichment and large-scale analysis in several studies: from the decline in the average length of modern British novel titles <ref type="bibr" target="#b24">[25]</ref>, to genre classification <ref type="bibr" target="#b25">[26]</ref>, <ref type="foot" target="#foot_1">4</ref> and topic modeling (two examples based on art catalogs are <ref type="bibr" target="#b9">[10,</ref><ref type="bibr" target="#b4">5]</ref>). Recent experiments have leveraged language and multimodal models to semantically enrich metadata sets <ref type="bibr" target="#b39">[40,</ref><ref type="bibr" target="#b0">1,</ref><ref type="bibr" target="#b23">24,</ref><ref type="bibr" target="#b30">31]</ref>. In this paper, we assess whether titles can be used to track multilingualism phenomena in a catalogue (i.e., to enrich metadata with specific language information). As noted by Hatzel, Stiemer, Biemann, and Gius <ref type="bibr" target="#b11">[12]</ref>, traditional, feature-based machine learning approaches are still widely applied in the Humanities. Hence, we compare a tf-idf-based classifier with Large Language Models (LLMs) <ref type="bibr" target="#b36">[37]</ref> (here, BERT <ref type="bibr" target="#b7">[8]</ref>) fine-tuned for multilingual sentence classification. 
Transformer-based LLMs are increasingly used for annotation, for enriching metadata and for analysing historical text collections, for example to predict the year of publication from text <ref type="bibr" target="#b41">[42]</ref>, or to investigate genre within books <ref type="bibr" target="#b27">[28]</ref>. The availability of multilingual and historical text models through easy-to-use APIs, such as those provided by HuggingFace, means that the potential for such models to enhance research or augment our bibliographic understanding of large collections has greatly increased in recent years. Given the high resource cost of fine-tuning LLMs, we also test a few-shot approach for the same task, in which only a few examples are used to tune the model (see Section 4.2).</p><p>We aim to achieve two objectives: to label a work as multilingual or monolingual, and to identify whether it is printed in the original language or in translation. These tasks, while related to language identification <ref type="bibr" target="#b15">[16]</ref>, are tailored to Early Modern book history: a title may be monolingual but describe a multilingual work, and identifying the title's language alone is insufficient to determine whether the work is a translation or an original edition. The presence of multiple languages in metadata sets has already been recognized as a major challenge in metadata processing <ref type="bibr" target="#b22">[23]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Data</head><p>The present study relies on two datasets: the CAA<ref type="foot" target="#foot_2">5</ref> from KU Leuven, and a version of Eighteenth Century Collections Online (ECCO)<ref type="foot" target="#foot_3">6</ref> manually enriched by a group of students. The CAA is curated by the Special Collections of KU Leuven Libraries and comprises books related to the Old University of Leuven (1425-1797), mostly by scholars who were affiliated with this university at some point in their careers. The CAA version used for this study (exported on 28 July 2023) comprises 3660 holdings, each described by a MARC XML record. ECCO is a digital database assembled by Gale that stores the OCRed full text of 184,536 titles published in the eighteenth century. Within this collection, we identified the classical publications as those authored by Ancient Greek or Latin authors living before the sixth century.<ref type="foot" target="#foot_4">7</ref> The classical editions amount to 5237 rows. We refer to this dataset as ECCO-classics. These two datasets were chosen because of their meticulous language annotation, their partial chronological overlap and the shared presence of classics (several classical works were printed in Early Modern Flanders and feature in the CAA),<ref type="foot" target="#foot_5">8</ref> but also because of clear differences in the languages included and in their cultural and geographical background: these characteristics make them useful for comparing the generalization capacity of the different approaches.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">Linguistic annotation</head><p>Both datasets have been manually annotated with respect to language. The MARC21 metadata schema includes a specific field for language annotation (041), further specified by several subfields, two of which are used in the CAA: 'a', indicating the language of the text, and 'h', indicating the original language. Accordingly, multilingual works are those with several 'a' codes, regardless of the presence of an 'h' code, while monolingual works have only one 'a' code. Some monolingual works also have an 'h' code, which is recorded when the original language differs from the language of the edition. We speak of a monolingual edition if no 'h' code is recorded, and of a monolingual translation if it is recorded (and is consequently different from the 'a' code). In practice, monolingual translations are usually works translated into a single target language and published without the original text. We include only monolingual works when identifying translations, because for multilingual works it is hard to single out the function of each language and to be sure that one of them serves as a translation. An example of a multilingual work in the CAA is 'Les dialogves de Iean Loys Vives, traduits de Latin en François pour l'exercice des deux langues .../Les dialogues de Jean Loys Vives', which is labeled as French and Latin. Table <ref type="table" target="#tab_0">1</ref> lists the most frequently attested language combinations for multilingual works in the CAA.</p><p>'Histoire de Notre-Dame de Hale, par Juste Lipse ... Traduit du latin, &amp; augmentée de plusieurs merveilles, venues en lumière depuis la mort de l'auteur' is the title of a work labeled as a monolingual translation. Table <ref type="table" target="#tab_1">2</ref> shows the most frequent pairs of original and target languages in the CAA. 
As both Table <ref type="table" target="#tab_1">1 and 2</ref> The same schema was used to label the books in ECCO-classics, and the most frequently attested language combinations are shown in Tables <ref type="table" target="#tab_1">1 and 2</ref>. An example of a multilingual work is 'Phaedri Augusti liberti Fabularum aesopiarum libri quinque. Or, a correct latin edition of the Fables of Phaedrus: with a new literal English translation, and a copious parsing-index; Whereby young Beginners may easily and speedily attain the Knowledge of the Latin Tongue. By a gentleman of the University of Cambridge. For the Use of Schools', while an example of a monolingual translation is 'The iliad of Homer. Translated by Alexander Pope, Esq.'</p></div>
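<div xmlns="http://www.tei-c.org/ns/1.0"><p>The labelling scheme described above reduces to a short rule over the 041 subfields. The Python sketch below is an illustration under stated assumptions, not the authors' code: it assumes records in MARCXML using the standard MARC21 slim namespace, parses them with the standard-library ElementTree module, counts repeated identical 'a' codes only once, and uses hypothetical helper names (language_codes, language_label).</p><p><![CDATA[
```python
import xml.etree.ElementTree as ET

# Standard MARCXML namespace (MARC21 "slim" schema).
MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}


def language_codes(record_xml: str) -> tuple[list[str], list[str]]:
    """Return the 041 $a (language of the text) and $h (original language) codes."""
    root = ET.fromstring(record_xml)
    a_codes: list[str] = []
    h_codes: list[str] = []
    for field in root.iterfind(".//marc:datafield[@tag='041']", MARC_NS):
        for sub in field.iterfind("marc:subfield", MARC_NS):
            value = (sub.text or "").strip()
            if sub.get("code") == "a":
                a_codes.append(value)
            elif sub.get("code") == "h":
                h_codes.append(value)
    return a_codes, h_codes


def language_label(a_codes: list[str], h_codes: list[str]) -> str:
    """Derive the task label from 041 codes (hypothetical helper)."""
    if not a_codes:
        return "unlabelled"
    if len(set(a_codes)) > 1:
        # Several distinct $a codes: multilingual, regardless of $h.
        return "multilingual"
    if h_codes:
        # A single $a code plus an original-language code: a translation.
        return "monolingual translation"
    return "monolingual edition"


RECORD = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="041" ind1="1" ind2=" ">
    <subfield code="a">fre</subfield>
    <subfield code="h">lat</subfield>
  </datafield>
</record>"""

print(language_label(*language_codes(RECORD)))  # monolingual translation
```
]]></p><p>Because the rule looks only at subfield codes, it reproduces the cataloguers' decision rather than anything in the title: it is the source of the gold labels, not a classifier.</p></div>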
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Methodology</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.">Tasks</head><p>As mentioned above, we classify the titles according to two criteria: whether the edition is monolingual or multilingual (henceforth the multilingual task), and, if it is monolingual, whether it contains a work in its original language or in translation (henceforth the monolingual translation task). We work with four combinations of the datasets, as listed in Table <ref type="table" target="#tab_2">3</ref>: the CAA, ECCO-classics, balanced CAA,<ref type="foot" target="#foot_6">9</ref> and ECCO and CAA combined. Each dataset was split 80-20 into training and test sets.</p><p>Multilingual and translated works are proportionally more frequent in the ECCO-classics dataset, because printing multilingual editions (i.e. the original text plus a commentary or a translation into a modern language) was common practice for the circulation of classical works. When testing the different models, we train on each dataset separately and test on each dataset separately; we also train on the union of the two and test both on each dataset separately and on the combined test set. In this way, we assess both the capacity of the separate models to generalize and whether increasing and diversifying the training data improves the final results (RQ3).</p></div>
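<div xmlns="http://www.tei-c.org/ns/1.0"><p>The train/test design can be made concrete with a small, dependency-free Python sketch. It assumes that each dataset is a list of (title, label) pairs; the stratification by label, the fixed random seed, building the combined test set from the individual test splits (so that no title appears in both a training and a test set) and the helper names are all assumptions, since the paper states only that an 80-20 split was used. The balanced CAA variant is omitted.</p><p><![CDATA[
```python
import random
from collections import defaultdict

Row = tuple[str, int]  # (title, label); hypothetical format


def stratified_split(rows: list[Row], test_frac: float = 0.2, seed: int = 0):
    """Split rows into training and test sets, keeping label proportions."""
    by_label: dict[int, list[Row]] = defaultdict(list)
    for row in rows:
        by_label[row[1]].append(row)
    rng = random.Random(seed)
    train: list[Row] = []
    test: list[Row] = []
    for label in sorted(by_label):
        group = list(by_label[label])
        rng.shuffle(group)
        n_test = round(len(group) * test_frac)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


def evaluation_grid(caa: list[Row], ecco: list[Row], seed: int = 0):
    """Yield (train name, test name, train rows, test rows) for every combination."""
    caa_train, caa_test = stratified_split(caa, seed=seed)
    ecco_train, ecco_test = stratified_split(ecco, seed=seed)
    train_sets = {"CAA": caa_train, "ECCO": ecco_train, "combined": caa_train + ecco_train}
    test_sets = {"CAA": caa_test, "ECCO": ecco_test, "combined": caa_test + ecco_test}
    for train_name, train_rows in train_sets.items():
        for test_name, test_rows in test_sets.items():
            yield train_name, test_name, train_rows, test_rows


# Toy data with two classes (illustrative only).
caa = [(f"caa title {i}", i % 2) for i in range(20)]
ecco = [(f"ecco title {i}", i % 2) for i in range(10)]
for train_name, test_name, train_rows, test_rows in evaluation_grid(caa, ecco):
    print(f"train {train_name:>8} ({len(train_rows):>2}) -> test {test_name:>8} ({len(test_rows)})")
```
]]></p><p>Each of the nine combinations would then be passed to one of the models described in Section 4.2.</p></div>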
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.">Models and approaches</head><p>In order to answer RQ2, we tested three different approaches: (1) a simple tf-idf model with Linear Support Vector classification <ref type="bibr" target="#b34">[35]</ref> (henceforth ML), (2) fine-tuning a Large Language Model (henceforth BERT), and (3) a few-shot approach to fine-tune a sentence transformer model (henceforth SetFit). For the ML approach, we performed minimal preprocessing of the titles (they were lowercased, and punctuation was stripped), and created a common vocabulary comprising CAA and ECCO titles. For each model trained, we optimized the following hyperparameters: the n-gram range (all combinations of unigrams, bigrams and trigrams), the regularization penalty used to avoid overfitting ('l1', 'l2', 'elasticnet', None), and whether to weight the classes to limit the impact of very frequent classes ('weighted', None).</p><p>For the BERT approach, we fine-tuned the base model bert-base-multilingual-cased <ref type="bibr" target="#b6">[7]</ref>, using the HuggingFace API and packages. We used the hyperparameters set out in the HuggingFace documentation for fine-tuning BERT for text classification <ref type="bibr" target="#b31">[32]</ref>, and for this paper we did not perform any further hyperparameter optimization.</p><p>For the few-shot experiment, the aim was to provide a small number of examples that were as representative as possible of each task. Separate sets were made for the multilingual and translation tasks. For the multilingual task, the final training set contains 5 examples for each language or language pair, with an equal number of monolingual and multilingual titles drawn from both the ECCO and CAA datasets, resulting in about 80 examples in the training set. 
The training set for the translation task was constructed in a similar way, but with equal numbers of original-language and translated works. The resulting models were evaluated on the same test sets as above.</p><p>To perform the few-shot classification, we used the SetFit library. SetFit fine-tunes a pretrained SentenceTransformers model <ref type="bibr" target="#b28">[29]</ref> using a contrastive training approach. SentenceTransformers models are Transformer-based language models that can be trained to generate embeddings at the sentence, paragraph, or document level (rather than at the word level, as a regular LLM does). These embeddings are typically used for tasks such as semantic textual similarity or semantic search. SetFit is a framework for few-shot fine-tuning of SentenceTransformers models, and it has been shown to achieve performance comparable to LLM-based approaches on tasks such as text classification, with far less data and training time <ref type="bibr" target="#b35">[36]</ref>. We used the pre-trained SentenceTransformers model distiluse-base-multilingual-cased-v2 with the hyperparameters from the examples in the introductory guide <ref type="bibr" target="#b33">[34]</ref>, and fine-tuned it on the small example sets described above.</p><p>For each set of results, we recorded the accuracy, as well as the precision, recall and f1 scores for each class separately. We include tables comparing the results of the two main tasks, and provide the full tables in the Appendix. Moreover, we used the SHAP (SHapley Additive exPlanations) library <ref type="bibr" target="#b20">[21]</ref> to identify the features most relevant to the model's classifications. SHAP is based on Shapley values, a game-theoretic approach to explanations that computes the contribution of each feature to an individual prediction. 
We used the SHAP library to produce plots which highlight tokens and spans of text based on their contribution to the prediction (Figure <ref type="figure" target="#fig_0">1</ref>). These plots can then be interpreted qualitatively.</p></div>
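<div xmlns="http://www.tei-c.org/ns/1.0"><p>A minimal scikit-learn sketch of the ML approach follows; it approximates the set-up described above and is not the authors' code. scikit-learn's LinearSVC accepts only the 'l1' and 'l2' penalties, so the sketch uses SGDClassifier with hinge loss (a linear SVM fitted by stochastic gradient descent), whose penalty parameter also accepts 'elasticnet' and, in scikit-learn 1.2 and later, None. We assume that the 'weighted' option corresponds to scikit-learn's class_weight='balanced'. The toy titles, the macro-averaged F1 scoring, the 3-fold cross-validation and fitting the vectorizer within each fold (rather than on a shared CAA and ECCO vocabulary) are further assumptions.</p><p><![CDATA[
```python
import re

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline


def preprocess(title: str) -> str:
    """Lowercase a title and replace punctuation with spaces."""
    return re.sub(r"[^\w\s]", " ", title.lower())


def build_search(cv: int = 3) -> GridSearchCV:
    pipeline = Pipeline([
        ("tfidf", TfidfVectorizer(preprocessor=preprocess)),
        # Hinge loss makes SGDClassifier a linear SVM.
        ("clf", SGDClassifier(loss="hinge", random_state=0)),
    ])
    param_grid = {
        # Every contiguous n-gram range over unigrams, bigrams and trigrams.
        "tfidf__ngram_range": [(1, 1), (1, 2), (1, 3), (2, 2), (2, 3), (3, 3)],
        "clf__penalty": ["l1", "l2", "elasticnet", None],
        "clf__class_weight": ["balanced", None],
    }
    return GridSearchCV(pipeline, param_grid, scoring="f1_macro", cv=cv)


# Toy titles, loosely modelled on titles quoted in this paper; labels are illustrative.
TITLES = [
    "The iliad of Homer. Translated by Alexander Pope, Esq.",
    "Histoire de Notre-Dame de Hale, par Juste Lipse, traduit du latin",
    "Oratio funebris in obitum principis habita Lovanii",
    "Oratio de laudibus linguarum habita in collegio trilingui",
    "A sermon preached before the university of Cambridge",
    "Opera omnia in quatuor tomos distributa",
    "A poetical translation of the works of Horace: with the original text",
    "Les dialogues de Vives, traduits de Latin en Francois, des deux langues",
    "Fables of Phaedrus: with a new literal English translation and the Latin text",
    "Homeri Ilias graece et latine cum notis",
    "Lexicon graeco-latinum novum in quo ex primitivis",
    "Virgil's Aeneid with the Latin text on the opposite page",
]
LABELS = ["monolingual"] * 6 + ["multilingual"] * 6

search = build_search()
search.fit(TITLES, LABELS)
print(search.best_params_)
```
]]></p><p>With real data, the grid search would be run on the training split only, and the best pipeline (search.best_estimator_) evaluated once on the held-out 20%.</p></div>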
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Results</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.1.">Quantitative results</head><p>Below we show the most relevant results; for the full set, see the Appendix. Table <ref type="table" target="#tab_3">4</ref> summarises the performance of the models trained on the 'combined' dataset and tested on both the individual and combined datasets. We report class-wise f-scores because the classes are very unevenly distributed, particularly in the CAA, so that accuracy is not a good indicator of performance. Tables <ref type="table" target="#tab_4">5 and 6</ref> give direct comparisons between the models on the multilingual and translation tasks, listing the difference obtained by subtracting the score of the ML model from that of the BERT model (negative numbers mean the BERT model performed worse). Tables 7 to 10 in the Appendix provide the precision, recall and f1 of the ML and BERT models on each task, for each class.</p></div>
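<div xmlns="http://www.tei-c.org/ns/1.0"><p>The following dependency-free Python sketch illustrates why class-wise scores are reported: it computes accuracy together with per-class precision, recall and F1. The label counts are invented for illustration and are not taken from the paper's results.</p><p><![CDATA[
```python
from collections import Counter


def per_class_scores(y_true, y_pred):
    """Return accuracy and {label: (precision, recall, f1)} for every label."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted p, but it was wrong
            fn[t] += 1  # missed an instance of t
    scores = {}
    for label in labels:
        precision = tp[label] / (tp[label] + fp[label]) if tp[label] + fp[label] else 0.0
        recall = tp[label] / (tp[label] + fn[label]) if tp[label] + fn[label] else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores[label] = (precision, recall, f1)
    accuracy = sum(tp.values()) / len(y_true)
    return accuracy, scores


# Invented example: 95 monolingual and 5 multilingual titles,
# and a classifier that labels every title as monolingual.
y_true = ["mono"] * 95 + ["multi"] * 5
y_pred = ["mono"] * 100
accuracy, scores = per_class_scores(y_true, y_pred)
print(accuracy)         # 0.95
print(scores["multi"])  # (0.0, 0.0, 0.0)
```
]]></p><p>A classifier that always predicts the majority class reaches 0.95 accuracy on this invented split while finding none of the multilingual titles, which is exactly the failure that per-class recall exposes.</p></div>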
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.2.">RQ1: Titles can be exploited for tracking multilingualism</head><p>As can be seen from Table <ref type="table" target="#tab_3">4</ref>, BERT and the ML method gave comparable results across both tasks and all datasets. The SetFit method performed noticeably worse in most cases, except when tested on the combined CAA and ECCO dataset. Overall, the results can be considered satisfactory, which leads to the conclusion that titles can be used for this purpose (RQ1); however, the task requires an extensive set of labeled training data.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.3.">RQ2: Comparison of the approaches</head><p>Tables <ref type="table" target="#tab_4">5 and 6</ref> give direct comparisons between the models, listing the difference obtained by subtracting the score of the ML model from that of the BERT model (negative numbers mean the BERT model performed worse). They show that, in most cases, the tf-idf approach performed clearly better at distinguishing multilingual from monolingual works (with the exception of the set trained on the CAA and tested on ECCO). For the BERT model in particular, Table <ref type="table">7</ref> (in Appendix A.1) shows that identifying the 0 class (i.e. multilingual works) is particularly problematic: recall values tend to be rather low, which indicates that the model tends to predict 'monolingual' for most titles.</p><p>For the translation task, there is slightly more variation between the results of the approaches. The ML model has very low recall for the 1 class (translated works) when trained on the CAA and tested on ECCO, meaning that almost all actual translations are missed. This is a significant drawback, since for multilingualism studies this is the class of interest. The BERT model performs reasonably well, although it again struggles with the recall of translated works when trained on the CAA and tested on another dataset. Most notable was the ability to identify translated ECCO documents using the model trained only on the CAA (on both the full test dataset and the smaller 'balanced' set), as well as the other way around. For this task, BERT was able to generalize much better than the ML method when tested on a different dataset from the one on which it was trained.</p><p>The performance of the SetFit method (see the Appendix, Table <ref type="table" target="#tab_0">11</ref>) followed a pattern comparable to that of the BERT models. 
It similarly had low recall and precision for the 0 class (multilingual works), but performed well in most tests on the translation task, with just 40 examples of each class across multiple languages.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.4.">RQ3: Specificity/generality of the training</head><p>In general, the ML and BERT models perform reasonably well when trained on examples from both datasets: a training set built from the combined ECCO and CAA data gives satisfactory results, and the ML method and the fine-tuned BERT model give very similar results.</p><p>Both models perform very well at identifying monolingual/multilingual works when trained and tested on ECCO. Models trained on ECCO fared better in general, although they still underperformed when applied to the CAA test dataset.</p><p>The results from models trained on one dataset and tested on the other are much worse. In particular, models trained on the CAA and tested on ECCO perform very badly in both recall and precision for the multilingual class. Again, there is little difference between the ML and BERT models, though the BERT model performs marginally better. The 'CAA balanced' model, trained on a sample of the CAA containing an equal number of monolingual and multilingual titles, balanced across the various target languages, did not perform significantly better than the CAA model, though it was marginally better and much quicker to train. However, its very small number of records might be a limitation.</p><p>Since for the SetFit method we used a mix of examples from both datasets, RQ3 does not apply to this model.</p></div>
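<div xmlns="http://www.tei-c.org/ns/1.0"><p>The sketch below shows one hypothetical way to draw a small, class-balanced few-shot training set from both datasets, and how a SetFit-style contrastive objective turns it into many labelled sentence pairs (pairs with the same label as positives, pairs with different labels as negatives). The row format, this reading of the sampling rule (up to five titles per source dataset, language group and label, then downsampling to equal class sizes) and the exhaustive pairing are assumptions; the SetFit library has its own pair-sampling strategies, which are not reproduced here.</p><p><![CDATA[
```python
import itertools
import random
from collections import defaultdict

# Hypothetical row format: (title, source dataset, language group, label).
Row = tuple[str, str, str, str]


def few_shot_sample(rows: list[Row], per_group: int = 5, seed: int = 0) -> list[Row]:
    """Take up to `per_group` titles per (source, language group, label),
    then downsample so that every label has the same number of examples."""
    rng = random.Random(seed)
    groups: dict[tuple[str, str, str], list[Row]] = defaultdict(list)
    for row in rows:
        groups[(row[1], row[2], row[3])].append(row)
    by_label: dict[str, list[Row]] = defaultdict(list)
    for key in sorted(groups):
        members = list(groups[key])
        rng.shuffle(members)
        by_label[key[2]].extend(members[:per_group])
    n = min(len(group) for group in by_label.values())
    return [row for label in sorted(by_label) for row in rng.sample(by_label[label], n)]


def contrastive_pairs(sample: list[Row]) -> list[tuple[str, str, float]]:
    """Every pair of titles, scored 1.0 if the labels match and 0.0 otherwise."""
    return [
        (a[0], b[0], 1.0 if a[3] == b[3] else 0.0)
        for a, b in itertools.combinations(sample, 2)
    ]


rows = [
    (f"{source} {lang} title {i}", source, lang, label)
    for source in ("CAA", "ECCO")
    for lang, label in (("lat", "monolingual"), ("fre", "monolingual"), ("lat+fre", "multilingual"))
    for i in range(8)
]
sample = few_shot_sample(rows)
pairs = contrastive_pairs(sample)
print(len(sample), len(pairs))
```
]]></p><p>On this toy input the sketch keeps 20 titles, which already yield 190 training pairs; this multiplication of training signal is one reason contrastive fine-tuning can make use of so few labelled titles.</p></div>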
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.5.">Qualitative results</head><p>To understand qualitatively which parts of the text drove the classification, we used SHAP explanations and examined a range of true positive, true negative, false positive and false negative predictions. Here, we focus on the BERT model trained on the CAA and tested on both the CAA and ECCO for the prediction of multilingual texts (a particularly 'difficult' combination).</p><p>When the models wrongly label a multilingual title as monolingual, the following phenomena generally seem to occur:</p><p>• There is no trace of multilingualism in the title (e.g. the Latin title 'Specimen doctrine traditae ab anno MDCXCI. usque ad annum MDCXCVI. inclusive.' does not mention any parts in a different language).</p><p>• Most of these titles, despite containing hints of multilingualism, are fully in Latin. The wrong prediction might be due to the fact that the CAA contains many monolingual Latin titles, so that a Latin context is considered monolingual even when the record is multilingual. Figure <ref type="figure" target="#fig_1">2</ref> shows an example, with the translated part ('cum latina interpretatione') entirely assigned to monolingual (blue) by the model.</p><p>Table <ref type="table">6</ref>: Comparative results for the translation task, for the bert-base-multilingual-cased approach and tf-idf/SVM. The number reported is the tf-idf result subtracted from the BERT result; numbers below zero mean that the BERT approach performed worse. Acc, r, p, and f1 denote accuracy, recall, precision, and f-score respectively.</p><p>Another recurrent trend, in both false and true predictions, is the role of Greek: the word 'Greek' (or 'Graecae', 'in Graecam linguam') is always used as a predictor of multilingualism, even when the work is monolingual (either in the original language or in translation). Figures <ref type="figure">4 and 3</ref> show two monolingual works whose titles contain the word 'Greek'. 
In both cases, the word 'Greek' strongly pushes the output towards the 'multilingual' class, although the final predictions differ. This might be due to the fact that, in the CAA, Ancient Greek texts usually come with translations or notes in a modern language. Text in the Greek alphabet also seems to be used to identify multilingual texts. This raises the issue of the models' dependence on these dataset-specific features. Furthermore, in some cases the model treats text that we would read as a sign of multilingualism as evidence for a monolingual work, for example phrases such as 'original subjoined', 'notes at the end', or 'on the opposite page'. One example of this can be seen in Figure <ref type="figure" target="#fig_3">5</ref>. This is presumably because these phrases are not found in CAA titles of multilingual works. The 'combined' model does not have this bias: for it, words relating to notes or annotations contribute to a positive prediction of a work as multilingual, as one might expect.</p><p>Words such as 'translated' or 'lexicon', across languages, push the model's output towards identifying a work as multilingual, which is close to what we would expect.</p></div>
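<div xmlns="http://www.tei-c.org/ns/1.0"><p>The token attributions discussed above are approximations of Shapley values. For a handful of tokens they can be computed exactly: the value of token i averages its marginal contribution v(S ∪ {i}) − v(S) over all subsets S of the other tokens, with weight |S|! (n − |S| − 1)! / n!. The Python sketch below does this by brute force for an invented scoring function that rewards the cue words 'latina' and 'interpretatione' and adds a bonus when both are present; it is not the paper's model, and SHAP's explainers approximate these values for models with far more features.</p><p><![CDATA[
```python
from itertools import combinations
from math import factorial


def shapley_values(tokens, value):
    """Exact Shapley value of each token for a set function `value`.

    `value` maps a frozenset of token indices to the model output when only
    those tokens are present. The cost is exponential in len(tokens).
    """
    n = len(tokens)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


TOKENS = ["opera", "cum", "latina", "interpretatione"]


def toy_multilingual_score(present):
    """Invented score: each cue word adds evidence, and together they add a bonus."""
    words = {TOKENS[j] for j in present}
    score = 0.0
    if "latina" in words:
        score += 0.2
    if "interpretatione" in words:
        score += 0.3
    if {"latina", "interpretatione"} <= words:
        score += 0.4
    return score


phi = shapley_values(TOKENS, toy_multilingual_score)
for token, contribution in zip(TOKENS, phi):
    print(f"{token:>16}  {contribution:+.2f}")
```
]]></p><p>Here the joint bonus is split equally between the two cue words (0.4 for 'latina', 0.5 for 'interpretatione'), and the contributions sum to the difference between the full score and the empty score (0.9), the additivity property that SHAP explanations are built on.</p></div>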
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.">Discussion of relevance and possible uses</head><p>Overall, these experiments suggest that this is a difficult problem to solve using machine learning methods. In particular, the approaches do not seem to generalise well, even with multilingual LLMs, which we hoped would recognise different styles of title as long as they were in some way semantically similar. This is perhaps because the way multilingual and translated works are signalled in a title varies, and changes over time and across languages. Despite these reservations, when trained on examples from both datasets, both traditional machine learning and LLM methods performed at a level which we deem usable in real-world applications.</p><p>The multilingual fine-tuned BERT has some advantages over traditional ML approaches in identifying translated works but performs worse when distinguishing multilingual works. This seems to be because the signifiers for translated works are more descriptive and straightforward (e.g. 'translated from' or 'made English by'). Because the model is multilingual, these kinds of phrases tend to be picked up in different languages.</p><p>The few-shot method using SetFit shows some promise in a number of tasks but, from our experiments, it does not seem to be a 'silver bullet' for low-resource metadata enrichment of this kind. However, with a carefully designed and diverse set of examples, it may be possible to build a model that can be trained and used for inference on real-world data. An ideal real-world scenario for metadata enrichment may involve collecting a small number of examples from a specific dataset or collection, fine-tuning a small bespoke model, and applying it only to that collection. 
However, as of yet, from our experiments, it does not seem that the multilingual capabilities of SetFit or SentenceTransformers are enough to get highquality results on this task without at least some annotation of the target dataset.</p></div>
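<div xmlns="http://www.tei-c.org/ns/1.0"><p>Because translation cues such as 'translated from' or 'made English by' are fairly formulaic, a simple keyword baseline can be set against the learned models. The sketch below, which is not part of the reported experiments, flags a title as a translation when it contains one of a handful of cue phrases. Apart from the two English formulas quoted above, the patterns (including the Latin, French and Dutch ones) are assumptions chosen for illustration, not a validated list.</p><eg><![CDATA[
```python
import re
from typing import Optional

# Illustrative cue phrases only: two English formulas mentioned in the text
# plus assumed variants in other languages. A real list would have to be
# built from, and checked against, the target catalogue.
TRANSLATION_CUES = [
    r"translated\s+(?:from|out\s+of|into)",
    r"made\s+english\s+by",
    r"done\s+into\s+english",
    r"latine\s+(?:reddit|vers)\w*",
    r"tradui(?:ts|tes|te|t)",
    r"vertaald|overgezet",
]

# One case-insensitive pattern; \b keeps cues from matching inside longer words.
CUE_PATTERN = re.compile(
    r"\b(?:" + "|".join(TRANSLATION_CUES) + r")\b", re.IGNORECASE)


def matched_cue(title: str) -> Optional[str]:
    """Return the first cue phrase found in the title, or None."""
    match = CUE_PATTERN.search(title)
    return match.group(0) if match else None


def looks_translated(title: str) -> bool:
    """Keyword baseline: flag a title as a translation if any cue occurs."""
    return matched_cue(title) is not None
```
]]></eg><p>Such a list needs separate entries for every language and spelling period, and it cannot recognise paraphrases it was not given, which is one reason to expect learned models to have the advantage on multilingual catalogues.</p></div>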
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7.">Conclusions</head><p>Automatically enriched metadata has significant value for heritage collection catalogue data, potentially helping to increase the accuracy and findability of records. If the purpose is to obtain enriched metadata, our experiments show some promise, and the methods could potentially be operationalised in the future. In fact, traditional ML methods may be sufficient in many cases, particularly for identifying multilingual works, and they have considerable advantages in ease of use and resource consumption. In some cases, methods such as keyword search or regular expressions might also provide acceptable results, though with multilingual datasets machine learning methods should have an advantage.</p><p>Furthermore, we suggest that certain evaluation metrics are more important than others, particularly with library catalogue data, which is likely to be very unevenly distributed with regard to language and classes. This of course depends on the particular task and use case. If the purpose is to improve catalogue metadata, for example, the recall of the multilingual or translated classes may be particularly important: it may be better to accept additional false positives, which can then be checked manually, than to aim for precision and miss some relevant works. If the information is not necessarily intended to be 'fed back' into a catalogue but used for bibliographic data science at scale, it may be more important to focus on the overall f-scores to obtain broadly accurate, if imperfect, results. </p></div>
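<div xmlns="http://www.tei-c.org/ns/1.0"><p>The point about recall on imbalanced catalogue data can be made concrete with per-class scores. The following sketch computes precision, recall and F1 for each class from raw counts and applies them to a toy, hypothetical test set of 90 monolingual (0) and 10 multilingual (1) titles; the labels and predictions are invented for illustration.</p><eg><![CDATA[
```python
from collections import Counter


def per_class_scores(y_true, y_pred, labels=(0, 1)):
    """Precision, recall and F1 for each class label, computed from counts."""
    pairs = Counter(zip(y_true, y_pred))
    scores = {}
    for c in labels:
        tp = pairs[(c, c)]
        fp = sum(n for (t, p), n in pairs.items() if p == c and t != c)
        fn = sum(n for (t, p), n in pairs.items() if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        scores[c] = {"precision": precision, "recall": recall, "f1": f1}
    return scores


def accuracy(y_true, y_pred):
    """Share of predictions that match the gold labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy, hypothetical test set: 90 monolingual titles (0), 10 multilingual (1).
y_true = [0] * 90 + [1] * 10
never_multilingual = [0] * 100              # misses every multilingual title
eager = [0] * 80 + [1] * 10 + [1] * 10      # 10 false positives, no misses

for name, y_pred in [("never", never_multilingual), ("eager", eager)]:
    s = per_class_scores(y_true, y_pred)[1]
    print(name, accuracy(y_true, y_pred), s["recall"], s["precision"], round(s["f1"], 2))
```
]]></eg><p>Both predictors reach the same accuracy, but only the second surfaces every multilingual title for manual checking, at the cost of ten false positives, which is the trade-off described above for catalogue enrichment.</p></div>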
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A.2. Multilingual/Monolingual Task: TFIDF/SVM</head></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Example of a text plot from the Python SHAP library. In this case, parts of the text contributing to the identification of the title as a translation are highlighted in red.</figDesc><graphic coords="10,89.28,384.64,416.72,89.04" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: Example of a text plot from the Python SHAP library. In this case, parts of the text contributing to the identification of the title as multilingual are highlighted in red. The title was labeled as monolingual although it is multilingual.</figDesc><graphic coords="11,89.28,84.17,416.72,66.12" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Figure 3 :Figure 4 :</head><label>34</label><figDesc>Figure 3: Example of a text plot from the Python SHAP library. In this case, parts of the text contributing to the identification of the title as multilingual are highlighted in red. The title was labeled as multilingual although it is monolingual. The word Greek contributes heavily to the multilingual prediction.</figDesc><graphic coords="11,89.28,361.38,416.72,60.49" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head>Figure 5 :</head><label>5</label><figDesc>Figure 5: Example of a SHAP plot showing a work from ECCO predicted as monolingual by the CAA-trained model. Parts of the text which we would intuitively expect to make the work likely to be multilingual in fact contribute, in this case, to the prediction of the instance as monolingual.</figDesc><graphic coords="12,89.28,84.17,416.72,86.94" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1 :</head><label>1</label><figDesc>Most attested language combinations in multilingual works of CAA and ECCO-classics.</figDesc><table><row><cell>languages</cell><cell># CAA</cell><cell>languages</cell><cell># ECCO</cell></row><row><cell>lat|grc</cell><cell>95</cell><cell>lat|eng</cell><cell>876</cell></row><row><cell>fre|lat</cell><cell>32</cell><cell>grc|lat</cell><cell>648</cell></row><row><cell>lat|heb</cell><cell>16</cell><cell>lat|fre</cell><cell>27</cell></row><row><cell>ita|lat</cell><cell>13</cell><cell>grc|lat|eng</cell><cell>31</cell></row><row><cell>dut|fre</cell><cell>12</cell><cell>lat|fre|eng</cell><cell>8</cell></row></table></figure>
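<div xmlns="http://www.tei-c.org/ns/1.0"><p>Table 1 reports combinations of three-letter language codes joined by '|'. A minimal sketch of how such counts could be derived from record-level language fields follows; the input format (one pipe-separated field per record) is an assumption, and the sketch keeps code order, so 'lat|grc' and 'grc|lat' are counted separately. Whether the reported counts merged different orderings is not stated.</p><eg><![CDATA[
```python
from collections import Counter


def top_combinations(language_fields, n=5):
    """Most frequent language combinations among multilingual records.

    Each field is assumed to hold pipe-separated codes such as "lat|grc"
    (hypothetical input format). Single-language records are skipped,
    stray spaces around "|" are removed, and code order is preserved."""
    combos = Counter()
    for field in language_fields:
        codes = [code.strip() for code in field.split("|") if code.strip()]
        if len(codes) > 1:
            combos["|".join(codes)] += 1
    return combos.most_common(n)


records = ["lat|grc", "lat|grc", "ita | lat", "lat", "grc|lat", "eng"]
top = top_combinations(records)
print(top)
```
]]></eg><p>Sorting the codes before joining them would instead merge the two orderings into a single combination.</p></div>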
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2 :</head><label>2</label><figDesc>Most attested language combinations in monolingual translations of CAA and ECCO-classics</figDesc><table><row><cell>source-target languages</cell><cell># CAA</cell><cell>source-target languages</cell><cell># ECCO</cell></row><row><cell>lat-dut</cell><cell>51</cell><cell>grc-eng</cell><cell>1198</cell></row><row><cell>lat-fre</cell><cell>34</cell><cell>lat-eng</cell><cell>926</cell></row><row><cell>fre-dut</cell><cell>11</cell><cell>grc-lat</cell><cell>11</cell></row><row><cell>lat-ger</cell><cell>11</cell><cell>grc-fre</cell><cell>26</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 3 :</head><label>3</label><figDesc>Number of records per class in the four datasets used</figDesc><table><row><cell>dataset</cell><cell>monolingual</cell><cell>multilingual</cell><cell>monolingual ed.</cell><cell>monolingual transl.</cell></row><row><cell>CAA</cell><cell>3466</cell><cell>194</cell><cell>3291</cell><cell>175</cell></row><row><cell>balanced CAA monolingual</cell><cell>200</cell><cell>194</cell><cell>not used</cell><cell>not used</cell></row><row><cell>balanced CAA translation</cell><cell>not used</cell><cell>not used</cell><cell>350</cell><cell>175</cell></row><row><cell>ECCO-classics</cell><cell>550</cell><cell>1765</cell><cell>1156</cell><cell>609</cell></row><row><cell>combined</cell><cell>7020</cell><cell>1877</cell><cell>4513</cell><cell>2507</cell></row></table></figure>
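<div xmlns="http://www.tei-c.org/ns/1.0"><p>The balanced CAA subsets in Table 3 keep every minority-class record and only part of the majority class (for the translation task, 350 monolingual editions for 175 monolingual translations). Downsampling of this kind can be sketched as follows. The helper name and the selection procedure (uniform random sampling with a fixed seed) are assumptions for illustration, not the authors' code; the stand-in records only reproduce the CAA counts.</p><eg><![CDATA[
```python
import random


def downsample(records, label_of, minority_label, ratio=1.0, seed=0):
    """Keep all minority-class records and a random sample of the rest.

    ratio sets how many majority-class records are kept per minority record
    (2.0 gives the 2:1 proportion described in footnote 9).
    Hypothetical helper, not the authors' code."""
    rng = random.Random(seed)
    minority = [r for r in records if label_of(r) == minority_label]
    majority = [r for r in records if label_of(r) != minority_label]
    k = min(len(majority), round(ratio * len(minority)))
    subset = minority + rng.sample(majority, k)
    rng.shuffle(subset)
    return subset


# Stand-in records with the CAA counts from Table 3:
# 3291 monolingual editions ("ed") and 175 monolingual translations ("tr").
caa = [("ed", i) for i in range(3291)] + [("tr", i) for i in range(175)]
balanced = downsample(caa, label_of=lambda r: r[0], minority_label="tr", ratio=2.0)
print(len(balanced), sum(1 for r in balanced if r[0] == "ed"))
```
]]></eg><p>Fixing the seed makes the subset reproducible; a different seed yields a different but equally sized sample of editions.</p></div>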
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head>Table 4 :</head><label>4</label><figDesc>Class-wise f-scores for the fine-tuned BERT, SVM, and SetFit methods using the combined CAA + ECCO datasets.</figDesc><table><row><cell></cell><cell></cell><cell cols="2">Multilingual Task</cell><cell cols="2">Translation Task</cell></row><row><cell>Train</cell><cell>Test</cell><cell>F-score (0)</cell><cell>F-score (1)</cell><cell>F-score (0)</cell><cell>F-score (1)</cell></row><row><cell>ML</cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell></row><row><cell>combined</cell><cell>caa</cell><cell>0.82</cell><cell>0.99</cell><cell>0.99</cell><cell>0.89</cell></row><row><cell>combined</cell><cell>ecco</cell><cell>0.91</cell><cell>0.97</cell><cell>0.98</cell><cell>0.99</cell></row><row><cell>combined</cell><cell>combined</cell><cell>0.75</cell><cell>0.97</cell><cell>0.98</cell><cell>0.96</cell></row><row><cell>BERT</cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell></row><row><cell>combined</cell><cell>caa</cell><cell>0.81</cell><cell>0.99</cell><cell>1.00</cell><cell>0.99</cell></row><row><cell>combined</cell><cell>ecco</cell><cell>0.91</cell><cell>0.97</cell><cell>0.99</cell><cell>0.90</cell></row><row><cell>combined</cell><cell>combined</cell><cell>0.78</cell><cell>0.97</cell><cell>0.98</cell><cell>0.96</cell></row><row><cell>SetFit</cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell></row><row><cell>Few-shot</cell><cell>caa</cell><cell>0.16</cell><cell>0.90</cell><cell>0.98</cell><cell>0.06</cell></row><row><cell>Few-shot</cell><cell>ecco</cell><cell>0.51</cell><cell>0.74</cell><cell>0.59</cell><cell>0.33</cell></row><row><cell>Few-shot</cell><cell>combined</cell><cell>0.42</cell><cell>0.82</cell><cell>0.80</cell><cell>0.23</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_4"><head>Table 5 :</head><label>5</label><figDesc>Comparative results for the monolingual/multilingual task, for the bert-base-multilingual-cased approach and tf-idf/SVM. Each number is the tf-idf/SVM result subtracted from the BERT result, so numbers below zero mean that the BERT approach performed worse. Acc, r, p, and f1 denote accuracy, recall, precision, and f-score respectively; (0) and (1) denote the two classes.</figDesc><table><row><cell>Train</cell><cell>Test</cell><cell>Acc</cell><cell>r (0)</cell><cell>p (0)</cell><cell>f1 (0)</cell><cell>r (1)</cell><cell>p (1)</cell><cell>f1 (1)</cell></row><row><cell>caa</cell><cell>caa</cell><cell>0.00</cell><cell>0.06</cell><cell>0.01</cell><cell>0.04</cell><cell>0.00</cell><cell>0.00</cell><cell>0.41</cell></row><row><cell>caa</cell><cell>ecco</cell><cell>0.00</cell><cell>0.18</cell><cell>-0.52</cell><cell>0.25</cell><cell>-0.06</cell><cell>0.03</cell><cell>-0.01</cell></row><row><cell>caa</cell><cell>combined</cell><cell>-0.01</cell><cell>0.09</cell><cell>-0.19</cell><cell>0.06</cell><cell>-0.01</cell><cell>0.01</cell><cell>0.00</cell></row><row><cell>caa</cell><cell>caa_balanced</cell><cell>-0.14</cell><cell>-0.34</cell><cell>-0.10</cell><cell>-0.24</cell><cell>-0.04</cell><cell>-0.14</cell><cell>-0.10</cell></row><row><cell>ecco</cell><cell>ecco</cell><cell>0.03</cell><cell>0.03</cell><cell>0.06</cell><cell>0.05</cell><cell>0.02</cell><cell>0.01</cell><cell>0.01</cell></row><row><cell>ecco</cell><cell>caa</cell><cell>-0.18</cell><cell>0.56</cell><cell>0.02</cell><cell>0.11</cell><cell>-0.22</cell><cell>0.02</cell><cell>-0.12</cell></row><row><cell>ecco</cell><cell>combined</cell><cell>-0.14</cell><cell>-0.10</cell><cell>-0.50</cell><cell>-0.37</cell><cell>-0.15</cell><cell>-0.01</cell><cell>-0.09</cell></row><row><cell>ecco</cell><cell>caa_balanced</cell><cell>0.07</cell><cell>0.34</cell><cell>-0.62</cell><cell>0.24</cell><cell>-0.38</cell><cell>0.22</cell><cell>0.02</cell></row><row><cell>combined</cell><cell>combined</cell><cell>0.00</cell><cell>0.08</cell><cell>-0.03</cell><cell>0.03</cell><cell>-0.01</cell><cell>0.01</cell><cell>0.00</cell></row><row><cell>combined</cell><cell>caa</cell><cell>0.01</cell><cell>0.06</cell><cell>-0.13</cell><cell>-0.01</cell><cell>-0.01</cell><cell>0.00</cell><cell>0.00</cell></row><row><cell>combined</cell><cell>ecco</cell><cell>0.00</cell><cell>0.00</cell><cell>0.01</cell><cell>0.00</cell><cell>0.00</cell><cell>0.00</cell><cell>0.00</cell></row><row><cell>combined</cell><cell>caa_balanced</cell><cell>0.08</cell><cell>0.20</cell><cell>-0.13</cell><cell>0.05</cell><cell>-0.08</cell><cell>0.22</cell><cell>0.09</cell></row><row><cell>caa_balanced</cell><cell>caa_balanced</cell><cell>-0.10</cell><cell>-0.09</cell><cell>-0.28</cell><cell>-0.17</cell><cell>-0.10</cell><cell>0.09</cell><cell>0.00</cell></row><row><cell>caa_balanced</cell><cell>caa</cell><cell>-0.39</cell><cell>-0.09</cell><cell>-0.26</cell><cell>-0.36</cell><cell>-0.39</cell><cell>-0.01</cell><cell>-0.27</cell></row><row><cell>caa_balanced</cell><cell>ecco</cell><cell>0.01</cell><cell>0.06</cell><cell>0.03</cell><cell>0.05</cell><cell>0.00</cell><cell>0.02</cell><cell>0.00</cell></row></table></figure>
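<div xmlns="http://www.tei-c.org/ns/1.0"><p>The f1 (0) differences in Table 5 for the models trained on the combined data can be reproduced from the multilingual-task columns of Table 4 by subtracting the tf-idf/SVM score from the BERT score (for example, 0.78 minus 0.75 gives 0.03 on the combined test set). The small sketch below performs this cell-wise subtraction; the dictionary layout keyed by (train, test) pairs is an assumption made for illustration.</p><eg><![CDATA[
```python
def score_deltas(bert, tfidf, ndigits=2):
    """Cell-wise difference BERT minus tf-idf/SVM for the shared keys.

    Positive values mean BERT scored higher, negative values lower."""
    return {key: round(bert[key] - tfidf[key], ndigits)
            for key in bert.keys() & tfidf.keys()}


# F-score (0) of the multilingual task for models trained on the combined
# data, copied from Table 4; keys are (train, test) pairs.
bert_f1_0 = {("combined", "caa"): 0.81, ("combined", "ecco"): 0.91,
             ("combined", "combined"): 0.78}
svm_f1_0 = {("combined", "caa"): 0.82, ("combined", "ecco"): 0.91,
            ("combined", "combined"): 0.75}
deltas = score_deltas(bert_f1_0, svm_f1_0)
print(deltas)
```
]]></eg><p>The resulting values (-0.01, 0.00 and 0.03) match the f1 (0) entries of Table 5 for the combined-trained rows.</p></div>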
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_5"><head>Table 8 :</head><label>8</label><figDesc>Performance results for multilingual/monolingual task and TFIDF/SVM</figDesc><table><row><cell>Train</cell><cell>Test</cell><cell>Acc</cell><cell>r (0)</cell><cell>p (0)</cell><cell>f1 (0)</cell><cell>r (1)</cell><cell>p (1)</cell><cell>f1 (1)</cell></row><row><cell>caa</cell><cell>caa</cell><cell>0.96</cell><cell>0.56</cell><cell>0.58</cell><cell>0.57</cell><cell>0.98</cell><cell>0.98</cell><cell>0.57</cell></row><row><cell>caa</cell><cell>ecco</cell><cell>0.77</cell><cell>0.01</cell><cell>1.00</cell><cell>0.02</cell><cell>1.00</cell><cell>0.77</cell><cell>0.87</cell></row><row><cell>caa</cell><cell>combined</cell><cell>0.90</cell><cell>0.26</cell><cell>0.88</cell><cell>0.40</cell><cell>0.99</cell><cell>0.90</cell><cell>0.94</cell></row><row><cell>caa</cell><cell>caa_balanced</cell><cell>0.99</cell><cell>0.98</cell><cell>1.00</cell><cell>0.99</cell><cell>1.00</cell><cell>0.97</cell><cell>0.99</cell></row><row><cell>ecco</cell><cell>ecco</cell><cell>0.90</cell><cell>0.76</cell><cell>0.82</cell><cell>0.79</cell><cell>0.95</cell><cell>0.93</cell><cell>0.94</cell></row><row><cell>ecco</cell><cell>caa</cell><cell>0.93</cell><cell>0.06</cell><cell>0.08</cell><cell>0.07</cell><cell>0.97</cell><cell>0.96</cell><cell>0.97</cell></row><row><cell>ecco</cell><cell>combined</cell><cell>0.96</cell><cell>0.91</cell><cell>0.92</cell><cell>0.92</cell><cell>0.98</cell><cell>0.98</cell><cell>0.98</cell></row><row><cell>ecco</cell><cell>caa_balanced</cell><cell>0.48</cell><cell>0.09</cell><cell>1.00</cell><cell>0.16</cell><cell>1.00</cell><cell>0.45</cell><cell>0.62</cell></row><row><cell>combined</cell><cell>combined</cell><cell>0.94</cell><cell>0.66</cell><cell>0.85</cell><cell>0.75</cell><cell>0.98</cell><cell>0.95</cell><cell>0.97</cell></row><row><cell>combined</cell><cell>caa</cell><cell>0.97</cell><cell>0.72</cell><cell>0.96</cell><cell>0.82</cell><cell>1.00</cell><cell>0.99</cell><cell>0.99</cell></row><row><cell>combined</cell><cell>ecco</cell><cell>0.96</cell><cell>0.90</cell><cell>0.92</cell><cell>0.91</cell><cell>0.98</cell><cell>0.97</cell><cell>0.97</cell></row><row><cell>combined</cell><cell>caa_balanced</cell><cell>0.85</cell><cell>0.73</cell><cell>1.00</cell><cell>0.85</cell><cell>1.00</cell><cell>0.74</cell><cell>0.85</cell></row><row><cell>caa_balanced</cell><cell>caa_balanced</cell><cell>0.85</cell><cell>0.73</cell><cell>0.92</cell><cell>0.81</cell><cell>0.91</cell><cell>0.72</cell><cell>0.81</cell></row><row><cell>caa_balanced</cell><cell>caa</cell><cell>0.92</cell><cell>0.97</cell><cell>0.34</cell><cell>0.50</cell><cell>0.91</cell><cell>1.00</cell><cell>0.95</cell></row><row><cell>caa_balanced</cell><cell>ecco</cell><cell>0.61</cell><cell>0.39</cell><cell>0.26</cell><cell>0.31</cell><cell>0.68</cell><cell>0.79</cell><cell>0.73</cell></row></table><note>A.3. Translation Task: BERT</note></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_6"><head>Table 9 :</head><label>9</label><figDesc>Performance results for translation task and fine-tuned BERT</figDesc><table><row><cell>Train</cell><cell>Test</cell><cell>Acc</cell><cell>r (0)</cell><cell>p (0)</cell><cell>f1 (0)</cell><cell>r (1)</cell><cell>p (1)</cell><cell>f1 (1)</cell></row><row><cell>caa</cell><cell>caa</cell><cell>0.98</cell><cell>0.99</cell><cell>0.99</cell><cell>0.99</cell><cell>0.74</cell><cell>0.72</cell><cell>0.73</cell></row><row><cell>caa</cell><cell>ecco</cell><cell>0.75</cell><cell>0.96</cell><cell>0.59</cell><cell>0.73</cell><cell>0.62</cell><cell>0.97</cell><cell>0.76</cell></row><row><cell>caa</cell><cell>combined</cell><cell>0.87</cell><cell>0.99</cell><cell>0.83</cell><cell>0.90</cell><cell>0.64</cell><cell>0.98</cell><cell>0.77</cell></row><row><cell>caa</cell><cell>caa_balanced</cell><cell>0.97</cell><cell>1.00</cell><cell>0.96</cell><cell>0.98</cell><cell>0.91</cell><cell>1.00</cell><cell>0.95</cell></row><row><cell>ecco</cell><cell>ecco</cell><cell>0.96</cell><cell>0.91</cell><cell>0.97</cell><cell>0.94</cell><cell>0.99</cell><cell>0.95</cell><cell>0.97</cell></row><row><cell>ecco</cell><cell>caa</cell><cell>0.66</cell><cell>0.65</cell><cell>0.99</cell><cell>0.78</cell><cell>0.87</cell><cell>0.10</cell><cell>0.18</cell></row><row><cell>ecco</cell><cell>combined</cell><cell>0.83</cell><cell>0.73</cell><cell>0.99</cell><cell>0.84</cell><cell>0.99</cell><cell>0.68</cell><cell>0.80</cell></row><row><cell>ecco</cell><cell>caa_balanced</cell><cell>0.61</cell><cell>0.47</cell><cell>0.92</cell><cell>0.62</cell><cell>0.91</cell><cell>0.44</cell><cell>0.59</cell></row><row><cell>combined</cell><cell>combined</cell><cell>0.97</cell><cell>0.97</cell><cell>0.99</cell><cell>0.98</cell><cell>0.97</cell><cell>0.95</cell><cell>0.96</cell></row><row><cell>combined</cell><cell>caa</cell><cell>0.99</cell><cell>1.00</cell><cell>1.00</cell><cell>1.00</cell><cell>0.90</cell><cell>0.90</cell><cell>0.90</cell></row><row><cell>combined</cell><cell>ecco</cell><cell>0.99</cell><cell>0.99</cell><cell>0.98</cell><cell>0.99</cell><cell>0.99</cell><cell>1.00</cell><cell>0.99</cell></row><row><cell>combined</cell><cell>caa_balanced</cell><cell>0.96</cell><cell>1.00</cell><cell>0.95</cell><cell>0.97</cell><cell>0.88</cell><cell>1.00</cell><cell>0.94</cell></row><row><cell>caa_balanced</cell><cell>caa_balanced</cell><cell>0.87</cell><cell>0.86</cell><cell>0.94</cell><cell>0.90</cell><cell>0.88</cell><cell>0.74</cell><cell>0.81</cell></row><row><cell>caa_balanced</cell><cell>caa</cell><cell>0.93</cell><cell>0.92</cell><cell>1.00</cell><cell>0.96</cell><cell>0.97</cell><cell>0.37</cell><cell>0.54</cell></row><row><cell>caa_balanced</cell><cell>ecco</cell><cell>0.88</cell><cell>0.87</cell><cell>0.82</cell><cell>0.84</cell><cell>0.89</cell><cell>0.92</cell><cell>0.91</cell></row><row><cell>caa_balanced</cell><cell>combined</cell><cell>0.91</cell><cell>0.90</cell><cell>0.96</cell><cell>0.93</cell><cell>0.93</cell><cell>0.85</cell><cell>0.89</cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_0">The data and the code are available at: https://github.com/mfantoli/CHR2024_multilingualism.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_1">Enriching metadata based on book titles is also of interest to GLAM institutions, as demonstrated by a recent experiment on British Library data, https://living-with-machines.github.io/genre-classification/01_BL_fiction_non_fiction.html</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="5" xml:id="foot_2">https://dial.uclouvain.be/digitization/en/digital-collection/old-academic-collection.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="6" xml:id="foot_3">https://www.gale.com/primary-sources/eighteenth-century-collections-online.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="7" xml:id="foot_4">More information on the identification of classical authors is provided in <ref type="bibr" target="#b8">[9]</ref>.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="8" xml:id="foot_5">We have not counted the exact number of classical works in the CAA but, for example, it contains at least five editions of Homer and more than ten editions of Cicero.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="9" xml:id="foot_6">We kept twice as many monolingual editions as monolingual translations in order to retain a sufficient number of examples.</note>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgments</head><p>We want to express our gratitude to the STUDIUM.AI team, in particular to Violet Soen, whose efforts enabled this research. In addition, we would like to thank the KU Leuven Libraries staff, in particular the metadata and digitization services, for sharing the CAA metadata and the accompanying documentation. Finally, we would like to thank the Computational History group in Helsinki for providing the framework and infrastructure for annotating the ECCO training data.</p></div>
			</div>

			<div type="annex">
<div xmlns="http://www.tei-c.org/ns/1.0" />			</div>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Computer vision and machine learning approaches for metadata enrichment to improve searchability of historical newspaper collections</title>
		<author>
			<persName><forename type="first">D</forename><surname>Ali</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Milleville</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Verstockt</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Van De Weghe</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Chambers</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">M</forename><surname>Birkholz</surname></persName>
		</author>
		<idno type="DOI">10.1108/jd-01-2022-0029</idno>
	</analytic>
	<monogr>
		<title level="j">Journal of Documentation</title>
		<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<monogr>
		<title level="m">Multilingual texts and practices in early modern Europe</title>
				<editor>
			<persName><forename type="first">P</forename><surname>Auger</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">S</forename><surname>Brammall</surname></persName>
		</editor>
		<meeting><address><addrLine>New York, NY</addrLine></address></meeting>
		<imprint>
			<publisher>Routledge</publisher>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<monogr>
		<title level="m" type="main">William Shakspere&apos;s Small Latine and Lesse Greeke</title>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">W</forename><surname>Baldwin</surname></persName>
		</author>
		<imprint>
			<date type="published" when="1944">1944</date>
			<publisher>University of Illinois Press</publisher>
			<pubPlace>Urbana</pubPlace>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Collaborative Translation as a Model for Multilingual Printing in Early Renaissance Editions of Aesop&apos;s Fables</title>
		<author>
			<persName><forename type="first">B</forename><surname>Bistué</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Multilingual texts and practices in early modern Europe</title>
				<editor>
			<persName><forename type="first">P</forename><surname>Auger</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">S</forename><surname>Brammall</surname></persName>
		</editor>
		<meeting><address><addrLine>New York, NY</addrLine></address></meeting>
		<imprint>
			<publisher>Routledge</publisher>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Text-mining metadata: What can titles tell us of the history of modern and contemporary art?</title>
		<author>
			<persName><forename type="first">M</forename><surname>Bowman</surname></persName>
		</author>
		<idno type="DOI">10.22148/001c.74602</idno>
	</analytic>
	<monogr>
		<title level="j">Journal of Cultural Analytics</title>
		<imprint>
			<biblScope unit="volume">8</biblScope>
			<biblScope unit="issue">1</biblScope>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Printers of the Greek Classics and Market Distribution in the Sixteenth Century: The Case of France and the Low Countries</title>
		<author>
			<persName><forename type="first">N</forename><surname>Constantinidou</surname></persName>
		</author>
	</analytic>
	<monogr>
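		<title level="m">Specialist Markets in the Early Modern Book World</title>
		<editor>
			<persName><forename type="first">R</forename><surname>Kirwan</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">S</forename><surname>Mullins</surname></persName>
		</editor>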
		<imprint>
			<biblScope unit="volume">40</biblScope>
			<biblScope unit="page" from="273" to="293" />
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<monogr>
		<title level="m" type="main">BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding</title>
		<author>
			<persName><forename type="first">J</forename><surname>Devlin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Chang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Toutanova</surname></persName>
		</author>
		<idno>arXiv:1810.04805</idno>
		<ptr target="http://arxiv.org/abs/1810.04805" />
		<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding</title>
		<author>
			<persName><forename type="first">J</forename><surname>Devlin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M.-W</forename><surname>Chang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Toutanova</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/N19-1423</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</title>
				<editor>
			<persName><forename type="first">J</forename><surname>Burstein</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">C</forename><surname>Doran</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">T</forename><surname>Solorio</surname></persName>
		</editor>
		<meeting>the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies<address><addrLine>Minneapolis, Minnesota</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="4171" to="4186" />
		</imprint>
	</monogr>
	<note>Association for Computational Linguistics</note>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Quantifying the Presence of Ancient Greek and Latin Classics in Early Modern Britain</title>
		<author>
			<persName><forename type="first">M</forename><surname>Fantoli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Suomela</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Van Hal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Depauw</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Virkki</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Tolonen</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Cultural Analytics</title>
				<imprint/>
	</monogr>
	<note>forthcoming</note>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Topic modelling characterization of Mudejar art based on document titles</title>
		<author>
			<persName><forename type="first">C</forename><surname>Garcia-Zorita</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">R</forename><surname>Pacios</surname></persName>
		</author>
		<idno type="DOI">10.1093/llc/fqx055</idno>
	</analytic>
	<monogr>
		<title level="j">Digital Scholarship in the Humanities</title>
		<imprint>
			<biblScope unit="volume">33</biblScope>
			<biblScope unit="issue">3</biblScope>
			<biblScope unit="page" from="529" to="539" />
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">The Availability of the Classics. Readers, Writers, Translation, Performance</title>
		<author>
			<persName><forename type="first">S</forename><surname>Gillespie</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">The Oxford History of Classical Reception in English Literature</title>
				<imprint>
			<publisher>Oxford University Press</publisher>
			<date type="published" when="2015">2015</date>
			<biblScope unit="volume">2</biblScope>
			<biblScope unit="page" from="57" to="74" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Machine learning in computational literary studies</title>
		<author>
			<persName><forename type="first">H</forename><forename type="middle">O</forename><surname>Hatzel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Stiemer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Biemann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Gius</surname></persName>
		</author>
		<idno type="DOI">10.1515/itit-2023-0041</idno>
	</analytic>
	<monogr>
		<title level="j">it - Information Technology</title>
		<imprint>
			<biblScope unit="volume">65</biblScope>
			<biblScope unit="issue">4-5</biblScope>
			<biblScope unit="page" from="200" to="217" />
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Multilingualism and Translation in the Early Modern Low Countries</title>
		<author>
			<persName><forename type="first">T</forename><surname>Hermans</surname></persName>
		</author>
		<idno type="DOI">10.4324/9781003092445</idno>
		<ptr target="https://www.taylorfrancis.com/books/9781003092445" />
	</analytic>
	<monogr>
		<title level="m">Language Dynamics in the Early Modern Period</title>
				<editor>
			<persName><forename type="first">K</forename><surname>Bennett</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><surname>Cattaneo</surname></persName>
		</editor>
		<meeting><address><addrLine>New York</addrLine></address></meeting>
		<imprint>
			<publisher>Routledge</publisher>
			<date type="published" when="2022">2022</date>
			<biblScope unit="page">20</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Enseignement du grec et livres scolaires dans les anciens Pays-Bas et la Principauté de Liège de 1483 à 1600. Deuxième partie: 1551-1600</title>
		<author>
			<persName><forename type="first">R</forename><surname>Hoven</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Gutenberg-Jahrbuch</title>
		<imprint>
			<biblScope unit="volume">55</biblScope>
			<biblScope unit="page" from="118" to="126" />
			<date type="published" when="1980">1980</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Enseignement du grec et livres scolaires dans les anciens Pays-Bas et la Principauté de Liège de 1483 à 1600. Première partie: 1483-1550</title>
		<author>
			<persName><forename type="first">R</forename><surname>Hoven</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Gutenberg-Jahrbuch</title>
		<imprint>
			<biblScope unit="volume">54</biblScope>
			<biblScope unit="page" from="80" to="86" />
			<date type="published" when="1979">1979</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Automatic Language Identification in Texts: A Survey</title>
		<author>
			<persName><forename type="first">T</forename><surname>Jauhiainen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Lui</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Zampieri</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Baldwin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Lindén</surname></persName>
		</author>
		<idno type="DOI">10.1613/jair.1.11675</idno>
	</analytic>
	<monogr>
		<title level="j">Journal of Artificial Intelligence Research</title>
		<imprint>
			<biblScope unit="volume">65</biblScope>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Printing the Classical Text</title>
		<author>
			<persName><forename type="first">H</forename><surname>Jones</surname></persName>
		</author>
		<ptr target="https://brill.com/display/title/26045" />
	</analytic>
	<monogr>
		<title level="m">Printing the Classical Text</title>
				<imprint>
			<publisher>Brill</publisher>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">A Quantitative Study of History in the English Short-Title Catalogue (ESTC), 1470-1800</title>
		<author>
			<persName><forename type="first">L</forename><surname>Lahti</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Ilomäki</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Tolonen</surname></persName>
		</author>
		<idno type="DOI">10.18352/lq.10112</idno>
	</analytic>
	<monogr>
		<title level="j">LIBER Quarterly: The Journal of the Association of European Research Libraries</title>
		<imprint>
			<biblScope unit="volume">25</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="87" to="116" />
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">Bibliographic Data Science and the History of the Book (c. 1500-1800)</title>
		<author>
			<persName><forename type="first">L</forename><surname>Lahti</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Marjanen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Roivainen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Tolonen</surname></persName>
		</author>
		<idno type="DOI">10.1080/01639374.2018.1543747</idno>
	</analytic>
	<monogr>
		<title level="j">Cataloging &amp; Classification Quarterly</title>
		<imprint>
			<biblScope unit="volume">57</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="5" to="23" />
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<monogr>
		<title level="m" type="main">Translations from the Classics into English from Caxton to Chapman, 1477-1620</title>
		<author>
			<persName><forename type="first">H</forename><forename type="middle">B</forename><surname>Lathrop</surname></persName>
		</author>
		<imprint>
			<date type="published" when="1933">1933</date>
			<publisher>University of Wisconsin Studies in Language and Literature</publisher>
			<biblScope unit="volume">35</biblScope>
			<pubPlace>Madison</pubPlace>
		</imprint>
		<respStmt>
			<orgName>University of Wisconsin</orgName>
		</respStmt>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">A Unified Approach to Interpreting Model Predictions</title>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">M</forename><surname>Lundberg</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S.-I</forename><surname>Lee</surname></persName>
		</author>
		<ptr target="http://papers.nips.cc/paper/7062-a-unified-approach-to-interpreting-model-predictions.pdf" />
	</analytic>
	<monogr>
		<title level="m">Advances in Neural Information Processing Systems 30</title>
				<editor>
			<persName><forename type="first">I</forename><surname>Guyon</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">U</forename><forename type="middle">V</forename><surname>Luxburg</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">S</forename><surname>Bengio</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">H</forename><surname>Wallach</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">R</forename><surname>Fergus</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">S</forename><surname>Vishwanathan</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">R</forename><surname>Garnett</surname></persName>
		</editor>
		<imprint>
			<publisher>Curran Associates, Inc</publisher>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="4765" to="4774" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">Humanism and the Classical Tradition</title>
		<author>
			<persName><forename type="first">P</forename><surname>Mack</surname></persName>
		</author>
		<idno type="DOI">10.1093/oso/9780192886699.003.0001</idno>
	</analytic>
	<monogr>
		<title level="m">The Oxford History of the Renaissance</title>
				<editor>
			<persName><forename type="first">G</forename><surname>Campbell</surname></persName>
		</editor>
		<imprint>
			<publisher>Oxford University Press</publisher>
			<pubPlace>Oxford</pubPlace>
			<date type="published" when="2023">2023</date>
			<biblScope unit="page" from="10" to="47" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<title level="a" type="main">Open Bibliographical Data Workflows and the Multilinguality Challenge</title>
		<author>
			<persName><forename type="first">V</forename><surname>Malínek</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Umerle</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Gray</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Heibi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Király</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Klaes</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Korytkowski</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Lindemann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Moretti</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Panušková</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Péter</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Tolonen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Tomczyńska</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Vimr</surname></persName>
		</author>
		<idno type="DOI">10.5334/johd.190</idno>
	</analytic>
	<monogr>
		<title level="j">Journal of Open Humanities Data</title>
		<imprint>
			<biblScope unit="volume">10</biblScope>
			<biblScope unit="page">27</biblScope>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<monogr>
		<title level="m" type="main">Text classification of column headers with a controlled vocabulary: leveraging LLMs for metadata enrichment</title>
		<author>
			<persName><forename type="first">M</forename><surname>Martorana</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Kuhn</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Stork</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Van Ossenbruggen</surname></persName>
		</author>
		<ptr target="http://arxiv.org/abs/2403.00884" />
		<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b24">
	<analytic>
		<title level="a" type="main">Style, Inc. Reflections on Seven Thousand Titles (British Novels, 1740-1850)</title>
		<author>
			<persName><forename type="first">F</forename><surname>Moretti</surname></persName>
		</author>
		<idno type="DOI">10.1086/606125</idno>
	</analytic>
	<monogr>
		<title level="j">Critical Inquiry</title>
		<imprint>
			<biblScope unit="volume">36</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="134" to="158" />
			<date type="published" when="2009">2009</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b25">
	<analytic>
		<title level="a" type="main">Genre Classification of Books on Spanish</title>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">A</forename><surname>Nolazco-Flores</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">V</forename><surname>Guerrero-Galván</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Del-Valle-Soto</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">P</forename><surname>Garcia-Perera</surname></persName>
		</author>
		<idno type="DOI">10.1109/access.2023.3332997</idno>
	</analytic>
	<monogr>
		<title level="j">IEEE Access</title>
		<imprint>
			<biblScope unit="volume">11</biblScope>
			<biblScope unit="page" from="132878" to="132892" />
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b26">
	<analytic>
		<title level="a" type="main">Multilingual Analysis and Visualization of Bibliographic Metadata and Texts With the AVOBMAT Research Tool</title>
		<author>
			<persName><forename type="first">R</forename><surname>Péter</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Szántó</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Biacsi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Berend</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Bilicki</surname></persName>
		</author>
		<idno type="DOI">10.5334/johd.175</idno>
	</analytic>
	<monogr>
		<title level="j">Journal of Open Humanities Data</title>
		<imprint>
			<biblScope unit="volume">10</biblScope>
			<biblScope unit="page">23</biblScope>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b27">
	<analytic>
		<title level="a" type="main">Explainable Publication Year Prediction of Eighteenth Century Texts with the BERT Model</title>
		<author>
			<persName><forename type="first">I</forename><surname>Rastas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><forename type="middle">C</forename><surname>Ryan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Tiihonen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Qaraei</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Repo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Babbar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Mäkelä</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Tolonen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Ginter</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2022.lchange-1.7</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 3rd Workshop on Computational Approaches to Historical Language Change</title>
				<editor>
			<persName><forename type="first">N</forename><surname>Tahmasebi</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">S</forename><surname>Montariol</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><surname>Kutuzov</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">S</forename><surname>Hengchen</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">H</forename><surname>Dubossarsky</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">L</forename><surname>Borin</surname></persName>
		</editor>
		<meeting>the 3rd Workshop on Computational Approaches to Historical Language Change<address><addrLine>Dublin, Ireland</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2022">2022</date>
			<biblScope unit="page" from="68" to="77" />
		</imprint>
	</monogr>
	<note>Association for Computational Linguistics</note>
</biblStruct>

<biblStruct xml:id="b28">
	<analytic>
		<title level="a" type="main">Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks</title>
		<author>
			<persName><forename type="first">N</forename><surname>Reimers</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Gurevych</surname></persName>
		</author>
		<ptr target="http://arxiv.org/abs/1908.10084" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing</title>
		<meeting>the 2019 Conference on Empirical Methods in Natural Language Processing</meeting>
		<imprint>
			<publisher>Association for Computational Linguistics</publisher>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b29">
	<analytic>
		<title level="a" type="main">The Evolution of Scottish Enlightenment Publishing</title>
		<author>
			<persName><forename type="first">Y</forename><forename type="middle">C</forename><surname>Ryan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Tolonen</surname></persName>
		</author>
		<idno type="DOI">10.1017/s0018246x23000614</idno>
	</analytic>
	<monogr>
		<title level="j">The Historical Journal</title>
		<imprint>
			<biblScope unit="volume">67</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="223" to="255" />
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b30">
	<monogr>
		<title level="m" type="main">Large Language Models for Data Annotation: A Survey</title>
		<author>
			<persName><forename type="first">Z</forename><surname>Tan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Beigi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Jiang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Bhattacharjee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Karami</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Cheng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Liu</surname></persName>
		</author>
		<idno type="DOI">10.48550/arxiv.2402.13446</idno>
		<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b31">
	<monogr>
		<ptr target="https://huggingface.co/docs/transformers/en/tasks/sequence_classification" />
		<title level="m">Text classification</title>
				<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b32">
	<analytic>
		<title level="a" type="main">The Anatomy of Eighteenth Century Collections Online (ECCO)</title>
		<author>
			<persName><forename type="first">M</forename><surname>Tolonen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Mäkelä</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Lahti</surname></persName>
		</author>
		<idno type="DOI">10.1353/ecs.2022.0060</idno>
	</analytic>
	<monogr>
		<title level="j">Eighteenth-Century Studies</title>
		<imprint>
			<biblScope unit="volume">56</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="95" to="123" />
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b33">
	<monogr>
		<title level="m" type="main">SetFit: Efficient Few-Shot Learning Without Prompts</title>
		<author>
			<persName><forename type="first">L</forename><surname>Tunstall</surname></persName>
		</author>
		<ptr target="https://huggingface.co/blog/setfit" />
		<imprint>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b34">
	<monogr>
		<title level="m" type="main">Efficient Few-Shot Learning Without Prompts</title>
		<author>
			<persName><forename type="first">L</forename><surname>Tunstall</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Reimers</surname></persName>
		</author>
		<author>
			<persName><forename type="first">U</forename><forename type="middle">E S</forename><surname>Jo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Bates</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Korat</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Wasserblat</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Pereg</surname></persName>
		</author>
		<idno type="DOI">10.48550/arxiv.2209.11055</idno>
		<imprint>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b35">
	<monogr>
		<title level="m" type="main">Efficient Few-Shot Learning Without Prompts</title>
		<author>
			<persName><forename type="first">L</forename><surname>Tunstall</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Reimers</surname></persName>
		</author>
		<author>
			<persName><forename type="first">U</forename><forename type="middle">E S</forename><surname>Jo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Bates</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Korat</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Wasserblat</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Pereg</surname></persName>
		</author>
		<idno type="DOI">10.48550/arxiv.2209.11055</idno>
		<imprint>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b36">
	<monogr>
		<title level="m" type="main">Attention is All You Need</title>
		<author>
			<persName><forename type="first">A</forename><surname>Vaswani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Shazeer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Parmar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Uszkoreit</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Jones</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">N</forename><surname>Gomez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Kaiser</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Polosukhin</surname></persName>
		</author>
		<ptr target="https://arxiv.org/pdf/1706.03762.pdf" />
		<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b37">
	<monogr>
		<title level="m">Vertalen in de Nederlanden: een cultuurgeschiedenis</title>
				<meeting><address><addrLine>Amsterdam</addrLine></address></meeting>
		<imprint>
			<publisher>Boom</publisher>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b38">
	<monogr>
		<title level="m" type="main">The English Grammar Schools to 1660: Their Curriculum and Practice</title>
		<author>
			<persName><forename type="first">F</forename><surname>Watson</surname></persName>
		</author>
		<imprint>
			<publisher>Frank Cass &amp; Co</publisher>
			<date type="published" when="1968">1968</date>
			<pubPlace>London</pubPlace>
		</imprint>
	</monogr>
	<note>2nd ed</note>
</biblStruct>

<biblStruct xml:id="b39">
	<analytic>
		<title level="a" type="main">What to do with 2.000.000 Historical Press Photos? The Challenges and Opportunities of Applying a Scene Detection Algorithm to a Digitised Press Photo Collection</title>
		<author>
			<persName><forename type="first">M</forename><surname>Wevers</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Vriend</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>De Bruin</surname></persName>
		</author>
		<idno type="DOI">10.18146/tmg.815</idno>
	</analytic>
	<monogr>
		<title level="j">TMG Journal for Media History</title>
		<imprint>
			<biblScope unit="volume">25</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page">1</biblScope>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b40">
	<analytic>
		<title level="a" type="main">The Place of Classics in Education and Publishing</title>
		<author>
			<persName><forename type="first">P</forename><surname>Wilson</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">The Oxford History of Classical Reception in English Literature</title>
				<editor>
			<persName><forename type="first">D</forename><surname>Hopkins</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">C</forename><surname>Martindale</surname></persName>
		</editor>
		<meeting><address><addrLine>Oxford and New York</addrLine></address></meeting>
		<imprint>
			<publisher>Oxford University Press</publisher>
			<date type="published" when="2012">2012</date>
			<biblScope unit="volume">3</biblScope>
			<biblScope unit="page" from="29" to="52" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b41">
	<analytic>
		<title level="a" type="main">Detecting Sequential Genre Change in Eighteenth-Century Texts</title>
		<author>
			<persName><forename type="first">J</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><forename type="middle">C</forename><surname>Ryan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Rastas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Ginter</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Tolonen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Babbar</surname></persName>
		</author>
		<ptr target="https://ceur-ws.org/Vol-3290/#short_paper2630" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Computational Humanities Research Conference</title>
				<editor>
			<persName><forename type="first">F</forename><surname>Karsdorp</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><surname>Lassche</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">K</forename><surname>Nielbo</surname></persName>
		</editor>
		<meeting>the Computational Humanities Research Conference<address><addrLine>Antwerp, Belgium</addrLine></address></meeting>
		<imprint>
			<publisher>CEUR</publisher>
			<date type="published" when="2022">2022</date>
			<biblScope unit="volume">3290</biblScope>
			<biblScope unit="page" from="243" to="255" />
		</imprint>
	</monogr>
	<note>CEUR Workshop Proceedings</note>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
