<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">The birth of French orthography. A computational analysis of French spelling systems in diachrony</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Simon</forename><surname>Gabay</surname></persName>
							<email>simon.gabay@unige.ch</email>
							<affiliation key="aff0">
								<orgName type="institution">Université de Genève</orgName>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Thibault</forename><surname>Clérice</surname></persName>
							<email>thibault.clerice@inria.fr</email>
							<affiliation key="aff1">
								<orgName type="institution">Inria Centre de Recherche de Paris</orgName>
							</affiliation>
						</author>
						<title level="a" type="main">The birth of French orthography. A computational analysis of French spelling systems in diachrony</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">3B67951E1D39D722B055FAB3A12D5550</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T19:48+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Computational linguistics</term>
					<term>History of orthography</term>
					<term>Information extraction</term>
					<term>Corpus building</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>The 17th c. is crucial for the French language, as it sees the creation of a strict orthographic norm that largely persists to this day. Despite its significance, the history of spelling systems remains an overlooked area in French linguistics, for two reasons. On the one hand, spelling change is made up of micro-changes, which require a quantitative approach; on the other hand, no suitable corpus is available, because editors have intervened in almost all the texts already available. In this paper, we therefore propose a new corpus allowing such a study, as well as the extraction and analysis tools necessary for our research. By comparing the text extracted with OCR with a version automatically aligned with contemporary French spelling, we extract the variant zones, categorise these variants, and observe their frequency to study (ortho)graphic change during the 17th century.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>The grapho-phonetic aspects of French during the 17th c. remain, paradoxically, very poorly known, despite the importance of the graphematic question at this period, which saw the appearance of the French orthography <ref type="foot" target="#foot_0">1</ref>. Rather than the actual practice of scriptors, it is the depth of theoretical debates on spelling that has so far attracted most research (e.g. <ref type="bibr" target="#b9">[10]</ref> or <ref type="bibr" target="#b6">[7]</ref>), and the notebooks of Mezeray <ref type="bibr" target="#b0">[1]</ref> or the Remarques of Vaugelas <ref type="bibr" target="#b55">[56,</ref><ref type="bibr" target="#b56">57]</ref> still remain among the main sources used, rather than statistical surveys on vast corpora.</p><p>While the various dialects and other scriptae populating Old and Middle French have been abundantly described (e.g. in <ref type="bibr" target="#b14">[15]</ref>), just like the "orthographie" of the Renaissance (to quote the term used by Baddeley <ref type="bibr" target="#b2">[3]</ref>), the slow imposition of an orthographic norm throughout modern times, although a major phenomenon in the history of a language as prescriptive as French, remains a blind spot in diachronic linguistics. How has the French that we know today supplanted its various modern variations? One of the main technical challenges in carrying out such a study is the need for large amounts of data, in order to guarantee the quantitative reliability of the results. Unfortunately, such corpora of classical French do not exist, for two reasons. On the one hand, as Cl. Vachon <ref type="bibr">[55, p. 32, n. 
31]</ref> bitterly experienced, text editors got into the habit of standardising the language of that era <ref type="bibr" target="#b21">[22,</ref><ref type="bibr" target="#b15">16]</ref>, which makes existing editions particularly difficult, if not impossible, to use for graphematic studies. On the other hand, the few corpora that have been created, like that of Cl. Vachon, but also others like that of the Réseau Corpus Français Préclassique et Classique (RCFC) <ref type="bibr" target="#b1">[2]</ref>, are not, or not in full, available to researchers. Given the ever-increasing quantities of data necessary for computational studies, it is in any case likely that these two corpora, even if freely accessible, would remain insufficient for the most recent approaches proposed in NLP.</p><p>This paper proposes to return to the history of the French vêtement graphique ("graphic clothing") in a computational way. We introduce a two-step approach: first, a unique corpus creation pipeline meticulously extracts spelling information from digital facsimiles. This pipeline includes a layout analysis model to distinguish text from paratext on the page, an OCR model that retains the historical character ‹ſ›, pivotal to written French, and a linguistic normaliser that "translates" historical French into its contemporary counterpart at the sentence level. In the second step, we analyse the created corpus using a comparison algorithm that matches the extracted historical text with its modern equivalent at the character level. This enables us to pinpoint significant variations, categorise these differences, and uncover detailed trends throughout the 17th century. This methodological framework not only enhances our understanding of historical French orthography, but also proposes a new approach for computational linguistic studies of spelling variation.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">State of the art</head><p>Corpus building from OCR has long been a task in digital humanities and corpus linguistics. Initially deemed unsuitable for historical sources in 1993 <ref type="bibr" target="#b43">[44]</ref>, OCR gained credibility in the late 1990s for corpus building, including XML TEI formalisation in commercial projects such as the Patrologia Latina Database, and for Ancient Greek scripts in the 2010s <ref type="bibr" target="#b48">[49]</ref>. Most projects using TEI, such as the First1KGreek project, relied on manual formalisation of the text's logical structure <ref type="bibr" target="#b39">[40]</ref>, as manual work was considered essential for accuracy. The advent of user-friendly OCR and HTR technologies has spurred interest in automatic document formalisation (ADF), primarily focused on facsimile formalisation <ref type="bibr" target="#b51">[52]</ref> and noisy text removal with tools based on vocabularies such as SegmOnto <ref type="bibr" target="#b24">[25]</ref>, which standardises the identification of paratextual zones (running titles, footnotes, etc.). Few projects, however, have utilised font, geometric, and textual features to reconstruct or emulate the original text structure from born-digital PDFs or OCR outputs. PaperXML <ref type="bibr" target="#b50">[51]</ref> demonstrated such transformation but was limited to the ACL Anthology structure. Grobid <ref type="bibr" target="#b49">[50]</ref> and Grobid Dictionaries <ref type="bibr" target="#b34">[35,</ref><ref type="bibr" target="#b33">34]</ref> employed geometric, font, and textual features to produce XML TEI output, though they were specific to scientific papers and dictionaries. In 2022, visual features outperformed linguistic ones in document formalisation, with YOLO models using the SegmOnto controlled vocabulary surpassing LayoutLM models in multilingual settings <ref type="bibr" target="#b40">[41]</ref>. 
Recently, research has started on OCR output formalisation for corpus building with a controlled vocabulary and a training dataset for models <ref type="bibr" target="#b32">[33,</ref><ref type="bibr" target="#b44">45]</ref>. Lastly, the Layout Analysis Dataset with SegmOnto (LADaS) <ref type="bibr" target="#b12">[13]</ref> allowed a much finer granularity in the analysis, and a significant improvement of the entire pipeline for the automatic creation of files encoded in XML-TEI, which goes beyond the facsimile approach and comes closer to reproducing the logical structure of the text.</p><p>Linguistic Normalisation (LN) has a long history, dating back to the 1980s <ref type="bibr" target="#b16">[17]</ref>, but developed as a task derived from Machine Translation (MT) at the beginning of the 2010s, usually to improve downstream tasks in the pipeline such as linguistic annotation <ref type="bibr" target="#b53">[54]</ref>. LN shares important similarities with MT, and therefore relies on the same methods, but with a slightly different objective: to "translate" a source into another state of the language, usually more recent (16th c. German → contemporary German), rather than into another language (Italian → German). Resources existed first for Slovene, German, English, Hungarian, Spanish, Swedish, Portuguese <ref type="bibr" target="#b7">[8]</ref>, but several studies have recently improved both resources <ref type="bibr" target="#b20">[21]</ref> and techniques for historical French, first comparing rule-based, statistical and neural methods <ref type="bibr" target="#b22">[23]</ref>, and then alignment-based and neural MT-approaches <ref type="bibr" target="#b5">[6]</ref>.</p><p>Computational scriptology is based on the notion of scripta, coined by Remacle <ref type="bibr" target="#b47">[48]</ref> and widely used in Romanistics to distinguish a spoken language (the dialect) from a written language (the scripta). 
The first studies on dialectometry date back to the early 1970s, with the pioneering work of Jean Séguy, who invented the term dialectométrie <ref type="bibr" target="#b52">[53]</ref>, on the distance between dialects in vast corpora <ref type="bibr" target="#b47">[48]</ref>. Since then, two main schools, based in Salzburg <ref type="bibr" target="#b29">[30]</ref> and Groningen <ref type="bibr" target="#b42">[43]</ref>, have advanced research on the topic, while relying mainly on geographical data to localise dialects. In parallel to this research, Cl. Vachon has changed the approach, switching to corpus-based research, using historical data to study spelling semi-automatically <ref type="bibr" target="#b54">[55]</ref>, and more recently, J.-B. Camps has shifted the method, using unsupervised stylometry to categorise medieval scriptae <ref type="bibr" target="#b8">[9]</ref>. Regarding modern French, alternative studies have proposed alignment-based approaches to compare the historical source and an automatically normalised version to detect the evolution of spellings <ref type="bibr" target="#b23">[24]</ref> or to categorise documents <ref type="bibr" target="#b27">[28]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Corpus building</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">Data</head><p>For practical reasons, a first corpus of limited size (c. 600 texts) spanning the 17th c. was produced with our pipeline. The data comes from the Gallica digital library and contains only French-language documents. For our experiment, we have selected only plays, which offer medium-sized documents (compared to novels, potentially much longer) and linguistically homogeneous data (spelling can be influenced by the genre: legal documents, for instance, tend to use more "archaic" traits and may involve Latin phrases).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">Method</head><p>Our pipeline allows us to extract data, enrich it and store it in a standard format (cf. fig. <ref type="figure" target="#fig_0">1</ref>). Firstly, we apply a layout analysis model specialised in theatrical data and trained for the occasion, then we use an OCR model prepared for this study which preserves the long s (‹ſ›). Based on the layout analysis, we convert ALTO files to TEI files. Only textual data belonging to the work itself (paragraph, speech, verse, etc.), and not to the structure of the book (running title, page number, quire marks, etc.), is extracted and normalised automatically, before being reintroduced into the TEI file.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Table 1</head><p>Training and evaluation data for Layout Analysis across the datasets.</p><p>Each image represents a single document in the dataset.</p><p>Layout analysis. Based on the results of Najem-Meyer and Romanello <ref type="bibr" target="#b40">[41]</ref> and the initial evaluation of the YOLO region segmenter against Kraken's <ref type="bibr" target="#b35">[36]</ref> with YALTAi <ref type="bibr" target="#b11">[12]</ref>, we proposed to evaluate the ability of YOLOv8 <ref type="bibr" target="#b46">[47]</ref> to detect regions in our 17th c. print corpus.</p><p>For this purpose, we annotated one random image from each digitised version of our corpus, which could include empty pages (e.g., bookbinding, cover) and full pages. This resulted in a corpus of 620 images for training, evaluation, and testing. Our final corpus comprises 32 null pages (without annotations) and a variety of annotations, with a majority of speech-related tags (MainZone:SP, MainZone:SP#Continued), paratextual-related objects (e.g., NumberingZone, RunningTitleZone), a smaller number of logical structuring features such as scene titles (MainZone:Head) and cast lists (MainZone:Entry), as well as a few paragraphs and poetic excerpts, mainly found in incipits or prefaces of the books (MainZone:P), as seen in tab. 1.</p><p>Optical character recognition. Since YOLO is well integrated within YALTAi, which in turn works seamlessly with Kraken, we decided to use the latter to train a new OCR model that includes the long s (‹ſ›). Kraken, unlike other OCR systems, does not integrate a strong language model, which for our purpose helps preserve more spelling variation. 
This new model, derived from CATMuS Print <ref type="bibr" target="#b25">[26]</ref>, uses three datasets for fine-tuning <ref type="bibr" target="#b18">[19,</ref><ref type="bibr" target="#b19">20,</ref><ref type="bibr" target="#b28">29]</ref> and one evaluation dataset <ref type="bibr" target="#b26">[27]</ref> (cf. tab. 2). We evaluate on a test set that includes data spanning three centuries (from the 16th to the 18th) and comprises one page from 10 different documents for each period.</p><p>TEI Document production. Document formalisation follows a logical approach based on the ALTO output produced by Kraken and YALTAi, rather than a neural one. Each region is processed in reading order, with regions not matching MainZone being ignored, except for the "default" region, which handles orphan lines. The default region is placed into a &lt;fw&gt; ("forme work") tag, which is typically excluded from our text export processes. Regions marked as #Continued are logically merged with previous ones. Each line is preceded by a TEI &lt;lb/&gt; (line beginning) tag to facilitate back-to-document correction capabilities. Hyphenation is resolved by removing the hyphen but keeping the &lt;lb/&gt; tag in its place. <ref type="foot" target="#foot_1">2</ref> While machine learning is employed for initial region detection, the formalisation process itself does not involve any learned behaviour. Metadata are systematically integrated in the &lt;teiHeader&gt;, using information automatically retrieved from the catalogue of the French National Library via the ark ID.</p><p>Linguistic normalisation. All documents are processed via a previously trained normaliser <ref type="foot" target="#foot_2">3</ref>. 
Only text contained in &lt;p&gt; and &lt;sp&gt; ("speech") elements is kept for normalisation, because a specific spelling variation occurring in the running title, for instance, would be repeated every two pages and could artificially alter the results of the scriptometric analysis. The text is split into sentences (ending with a full stop, an exclamation mark or a question mark) or subsentences (ending with a colon or a semicolon), all stored in a &lt;seg&gt; ("arbitrary segment") element, with the source text in &lt;orig&gt; ("original form") and the automatically normalised text in &lt;reg&gt; ("regularization"). The normalised version is evaluated against a dictionary of modern French to control the quality of the final product.</p></div>
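<div xmlns="http://www.tei-c.org/ns/1.0"><p>The segmentation step just described can be sketched as follows. This is a minimal illustration in Python and not the code of our pipeline: the names split_segments, build_segs and toy_normaliser are assumptions introduced for the example, and toy_normaliser is only a stand-in for the trained normaliser. The element names (seg, orig, reg) are those given above.</p>
```python
import re
import xml.etree.ElementTree as ET

# Sentences end with . ! ? and subsentences with : ; (as described in the text).
# Each closing mark stays attached to the segment it ends.
SEGMENT = re.compile(r"[^.!?:;]+[.!?:;]*")


def split_segments(text):
    """Split text into sentences and subsentences, dropping empty pieces."""
    pieces = (match.group(0).strip() for match in SEGMENT.finditer(text))
    return [piece for piece in pieces if piece]


def build_segs(text, normalise):
    """Wrap each segment in a seg element with orig and reg children.

    `normalise` is a hypothetical callable standing in for the trained
    normaliser; it is an assumption of this sketch.
    """
    parent = ET.Element("p")
    for piece in split_segments(text):
        seg = ET.SubElement(parent, "seg")
        ET.SubElement(seg, "orig").text = piece
        ET.SubElement(seg, "reg").text = normalise(piece)
    return parent


def toy_normaliser(segment):
    """Toy stand-in only: it replaces the long s and nothing else."""
    return segment.replace("ſ", "s")


if __name__ == "__main__":
    paragraph = build_segs("Il eſtoit là; que faire? Rien.", toy_normaliser)
    print(ET.tostring(paragraph, encoding="unicode"))
```
<p>Keeping the punctuation mark inside the segment means that the original and the normalised text of a segment stay parallel, which is what the later word-level alignment relies on.</p></div>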
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3.">Experimental Setup and Evaluation</head><p>Layout analysis. We evaluate two possible setups: both use fine-tuning with the original YOLOv8L models and an input image size of 960 pixels (higher than the default). One setup uses only the dataset produced in the context of this paper, while the other merges this dataset with the larger LADaS dataset (5,000 images). We train both setups for 100 epochs with otherwise default parameters. Since our study focuses exclusively on the MainZone, which contains the primary text and excludes all paratextual elements (such as decorations, page numbers, and running titles), we have concentrated our evaluation on this specific zone. Overall, when considering all classes, we found that integrating our data with the LADaS corpus improves results (from 0.768 to 0.8). However, for the most critical classes (Sp and Sp-continued), the model trained exclusively with theatrical data produces slightly better outcomes. As previously mentioned, these are the classes essential for our study.</p><p>Text recognition. To fine-tune and adapt the CATMuS Print OCR model to the allographic variation of round s/long s, we modified the classifier codec (--resize new mode) and used a standard learning rate of 0.0001, along with a batch size of 32. This logical approach ensures the model is fine-tuned to the specific typographic variations without relying on any learned behaviour during the formalisation process. We compare this approach to a model without fine-tuning, trained from scratch, with the same architecture (cf. tab. 4), revealing the superiority of the approach with fine-tuning. Most errors are related to poor segmentation of the text (cf. tab. 5), where a space that should be present is missing from the prediction, a classic error for historical prints. 
The prediction errors regarding two types of apostrophes (curved or straight) are of little concern because they do not affect the result from a linguistic point of view and are due to poor data preparation that is easily correctable. The confusion between the round s and the long s is likely attributable to the fine-tuning process and the absence of the long s in the base model. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Linguistic normalisation</head><p>To evaluate the results of the normalisation, we compare the prediction of the normaliser with a dictionary of contemporary French to obtain a Word Accuracy (WAcc). Results are satisfactory (cf. fig. <ref type="figure" target="#fig_3">3</ref>), with a median above 90%. Texts with a WAcc under 80% are removed to avoid using unreliable data.</p></div>
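<div xmlns="http://www.tei-c.org/ns/1.0"><p>As an illustration of this filtering step, the sketch below computes a dictionary-based word accuracy and applies the 80% threshold. It is a simplified Python example under stated assumptions: the tokenisation (a plain \w+ regular expression), the lowercasing and the function names word_accuracy and keep_reliable are choices made for the example, and the three-word lexicon is a toy stand-in for a real dictionary of contemporary French.</p>
```python
import re


def word_accuracy(normalised_text, lexicon):
    """Share of word tokens of the normalised text that appear in the lexicon."""
    tokens = re.findall(r"\w+", normalised_text.lower())
    if not tokens:
        return 0.0
    known = sum(1 for token in tokens if token in lexicon)
    return known / len(tokens)


def keep_reliable(texts, lexicon, threshold=0.80):
    """Keep only the texts whose word accuracy reaches the threshold."""
    return {
        name: text
        for name, text in texts.items()
        if word_accuracy(text, lexicon) >= threshold
    }


if __name__ == "__main__":
    toy_lexicon = {"il", "était", "là"}  # toy data, not a real dictionary
    print(word_accuracy("Il était là", toy_lexicon))
    print(word_accuracy("Il estoit là", toy_lexicon))
```
<p>A form left unnormalised (estoit) is absent from the modern lexicon and lowers the score, which is why this measure can serve as a proxy for normalisation quality without a manually corrected reference.</p></div>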
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.4.">Result dataset</head><p>The final dataset comprises around 80,000 pages from 620 documents. While the number of units is uneven over the years (cf. fig. <ref type="figure" target="#fig_5">5a</ref>), the accumulated number of tokens grows evenly (cf. fig. <ref type="figure" target="#fig_5">5b</ref>). An example of our TEI encoding is presented in fig. <ref type="figure" target="#fig_4">4</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Evaluation of spelling variation</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.">Method</head><p>We use the ABA <ref type="bibr" target="#b45">[46]</ref> tool to precisely identify the portions of words which differ between the original version and the normalised version, and to group similar differences, for example those having the same historical-linguistic origin, or the same type of operations in terms of addition, deletion or modification of characters. Each &lt;orig&gt; and &lt;reg&gt; of the corpus is split into words, the punctuation is removed, and then the original and normalised versions are aligned at the word level with the Needleman-Wunsch <ref type="bibr" target="#b41">[42]</ref> algorithm, scored with the Levenshtein distance <ref type="bibr" target="#b38">[39]</ref> between each pair of words in the same &lt;seg&gt; in the original and normalised version <ref type="foot" target="#foot_3">4</ref>.</p><p>Secondly, for each of the aligned word pairs, the original version and the normalised version are aligned at the character level, still using the Needleman-Wunsch algorithm, but with a specific substitution matrix that allows the alignment not only of identical letters, but also of letters considered close in (pre)classical French and contemporary French (presence/absence of diacritic, ligatures…). For example, while identical letters benefit from a substitution score of 4, letters differing only in accent or cedilla benefit from a score of 2, as do ‹ſ› and ‹s› or ‹s› and ‹ß›. Other pairs of letters benefit from a score of 1, such as ‹u› and ‹v›, ‹s› and ‹z› or even ‹n› and ‹m›. Conversely, a score of -1 is assigned to pairs of distinct letters not subject to such exceptions, as well as to the deletion or insertion of a character. 
</p><p>Table 6: The arrows indicate the previous box on the optimal path to calculate the similarity between two prefixes, one from the word on the first row, the other from the word in the first column.</p><p>On this optimal path, green indicates equality, red indicates substitution, and blue indicates deletion.</p><formula xml:id="formula_0">
     A      p      o      ſ      t      r      e
A    ↘ 4    → 3    → 2    → 1    → 0    → -1   → -2
p    ↓ 3    ↘ 8    → 7    → 6    → 5    → 4    → 3
ô    ↓ 2    ↓ 7    ↘ 10   → 9    → 8    → 7    → 6
t    ↓ 1    ↓ 6    ↓ 9    ↓ 8    ↘ 13   → 12   → 11
r    ↓ 0    ↓ 5    ↓ 8    ↓ 7    ↓ 12   ↘ 17   → 16
e    ↓ -1   ↓ 4    ↓ 7    ↓ 6    ↓ 11   ↓ 16   ↘ 21
</formula><p>This execution of the Needleman-Wunsch algorithm to obtain character-level alignment is illustrated in the matrix in tab. <ref type="table">6</ref>, where each number represents the similarity score of the best alignment found between the prefixes of ‹Apoſtre› and ‹Apôtre› ending at this box. It is preceded by an arrow indicating which box to come from to obtain this best alignment. For example, to obtain the best alignment between ‹Apoſ› and ‹Apô›, we must consider the best alignment between ‹Apo› and ‹Apô› (which has a score of 10) and then insert ‹ſ›, which has a score of -1, giving a total score of 9. If we had preferred to first consider the best alignment between ‹Apoſ› and ‹Ap›, which has a score of 6, and then delete the ô, which has a score of -1, we would have obtained an alignment with a score of 5, therefore lower than optimal. In case of insertion or deletion during this alignment step, we use the ¤ character in order to obtain two words of the same length in both the original and normalised version. 
Thus, at the end of this second alignment step, the word Apoſtre in the original version is matched with apô¤tre in the normalised version to obtain a character-by-character alignment.</p><p>Finally, for each word in the corpus, its original and normalised versions are analysed, character by character, to detect, in the case of different characters at the same position, the normalisation rule that applies, or to signal that no existing rule was identified when appropriate. 72 rules were defined based on the bibliography and the differences observed in the gold FreEM norm parallel corpus <ref type="bibr" target="#b20">[21]</ref>. For example, the rule Ramist letter is detected if an ‹i›, a ‹j›, a ‹u› or a ‹v› in the original word corresponds respectively to a ‹j›, an ‹i›, a ‹v› or a ‹u› in the normalised version.</p></div>
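<div xmlns="http://www.tei-c.org/ns/1.0"><p>The character-level step can be illustrated with the short Python sketch below. It is not the ABA implementation: it is a plain Needleman-Wunsch alignment that uses only the scores quoted above (4, 2, 1 and -1), the three pairs of close letters named in the text, and the ¤ padding character. The function names, the way diacritics are compared (Unicode decomposition) and the absence of case folding are assumptions made for the example, and detect_ramist covers a single rule out of the 72.</p>
```python
import unicodedata

GAP = "¤"        # padding character used for insertions and deletions
GAP_SCORE = -1   # score of an insertion or a deletion

# Only the pairs explicitly named in the text; the full matrix is larger.
CLOSE_2 = {frozenset("ſs"), frozenset("sß")}
CLOSE_1 = {frozenset("uv"), frozenset("sz"), frozenset("nm")}
RAMIST = {("i", "j"), ("j", "i"), ("u", "v"), ("v", "u")}


def strip_marks(char):
    """Remove accents and cedillas by dropping combining marks."""
    decomposed = unicodedata.normalize("NFD", char)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def score(a, b):
    """Substitution score between two characters."""
    if a == b:
        return 4
    pair = frozenset((a, b))
    if pair in CLOSE_2 or strip_marks(a) == strip_marks(b):
        return 2
    if pair in CLOSE_1:
        return 1
    return -1


def align(orig, reg):
    """Return the two padded words and the score of the best alignment."""
    rows, cols = len(orig) + 1, len(reg) + 1
    table = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        table[i][0] = i * GAP_SCORE
    for j in range(1, cols):
        table[0][j] = j * GAP_SCORE
    for i in range(1, rows):
        for j in range(1, cols):
            table[i][j] = max(
                table[i - 1][j - 1] + score(orig[i - 1], reg[j - 1]),
                table[i - 1][j] + GAP_SCORE,
                table[i][j - 1] + GAP_SCORE,
            )
    # Walk back from the last box, preferring the diagonal on ties.
    out_orig, out_reg = [], []
    i, j = len(orig), len(reg)
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and table[i][j]
                == table[i - 1][j - 1] + score(orig[i - 1], reg[j - 1])):
            out_orig.append(orig[i - 1])
            out_reg.append(reg[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and table[i][j] == table[i - 1][j] + GAP_SCORE:
            out_orig.append(orig[i - 1])
            out_reg.append(GAP)
            i -= 1
        else:
            out_orig.append(GAP)
            out_reg.append(reg[j - 1])
            j -= 1
    return "".join(reversed(out_orig)), "".join(reversed(out_reg)), table[-1][-1]


def detect_ramist(aligned_orig, aligned_reg):
    """List the aligned character pairs matching the Ramist letter rule."""
    return [(o, r) for o, r in zip(aligned_orig, aligned_reg) if (o, r) in RAMIST]


if __name__ == "__main__":
    print(align("Apoſtre", "Apôtre"))
    print(detect_ramist(*align("vne", "une")[:2]))
```
<p>On the pair ‹Apoſtre› and ‹Apôtre›, this sketch pads the normalised word at the position of ‹ſ› and reaches the score of 21 shown in the matrix. Because both aligned words have the same length, each rule can then be tested position by position.</p></div>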
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.">Results</head><p>Figure <ref type="figure">6</ref>: Disappearance of ‹gn›.</p><p>Based on the alignments obtained using the Needleman-Wunsch algorithm and the detections of the 72 rules mentioned earlier, our analysis reveals four distinctive patterns of historical spelling changes. The principle underlying this analysis is straightforward: if a normalisation rule is detected less frequently, it indicates that the historical spelling it targets is becoming less prevalent in the corpus. To examine the evolution of a rule throughout the century, we express its number of applications as a percentage within each text. For instance, the etymological spelling ‹gn›, found in the form cognoitre (&lt;lat. cognoscere), is less and less often replaced by ‹nn› (today connaître, eng. "to know"), signifying the slow disappearance of this spelling (cf. fig. <ref type="figure">6</ref>).</p><p>Pattern A: constant rate. Using ABA, it is possible to detect more complex traits of historical graphic systems than the specific use of a single letter (e.g. ‹u› vs ‹v› as a vowel) or a group of letters (‹gn› vs ‹nn›), such as the presence of a diacritical letter to change the sound-value of the letter to which it is added (e.g. vowel + ‹s›). In historical French, the phoneme [e] is thus regularly noted with the grapheme ‹es› where today we use ‹é› (estat vs état, eng. "state"), and the phoneme [ä] is noted ‹as› where we now find ‹â› (pasturage vs pâturage, eng. "pasture"). While counting the occurrences of ‹v› followed by a consonant (‹vne› = [yn]) is enough to identify the historical use of ‹v›, counting the occurrences of ‹es› cannot measure the presence of a diacritical s (‹esponge› → ‹éponge›, eng. "sponge", but ‹espagnol› → ‹espagnol› and not ‹épagnol›, eng. "spanish"): the transition from a complex grapheme (such as a digraph) to a simple grapheme requires an alignment at the character level of the original text and its normalised version, and then the deduction of the spelling change from the difference between the two.  In our corpus, we detect a clear decrease in the use of complex graphemes with a diacritical s, whether the latter is combined with ‹e› (cf. fig. <ref type="figure" target="#fig_7">7a</ref>) or with ‹a› (cf. fig. <ref type="figure" target="#fig_7">7b</ref>). Interestingly, these two new spellings (vowel+accent) spread at a very similar speed (cf. fig. <ref type="figure" target="#fig_9">8a</ref>), recalling Kroch's constant rate hypothesis (cf. fig. <ref type="figure" target="#fig_9">8b</ref>) <ref type="foot" target="#foot_4">5</ref>, of which researchers have already found traces in syntactic <ref type="bibr" target="#b58">[59]</ref> and phonological <ref type="bibr" target="#b17">[18]</ref> change.</p><p>Pattern B: abrupt change. On the basis of such observations, it is possible to go further and date the moment when a break occurs in scribal practice, i.e. when the spelling changes. To do so, we can use binary segmentation (BS) <ref type="bibr" target="#b57">[58,</ref><ref type="bibr" target="#b3">4]</ref>, a forward stepwise algorithm for change-point detection. This method has already been used in diachronic linguistics to study the sudden introduction of new lexical items <ref type="bibr" target="#b36">[37]</ref>. One of the main discoveries of our study is the extremely abrupt nature of certain changes, which take place at very high speed, such as the disappearance between 1668 and 1672 of Ramist letters (cf. fig. <ref type="figure" target="#fig_11">9a</ref>), as proposed by Christophe Plantin in the 16th c. 
<ref type="bibr" target="#b10">[11]</ref> and defended by Pierre Corneille in his foreword au lecteur of 1663 <ref type="bibr" target="#b13">[14]</ref>. A similar phenomenon, although slightly less abrupt, exists for the disappearance of the etymological ‹c› followed by ‹t› (e.g. ‹faict›&lt;factum, today fait, eng. "fact") at the end of the 1630s (cf. fig. <ref type="figure" target="#fig_11">9b</ref>).</p><p>It is indeed faster to compose the word eſtoit with the ligature (e+ſt+o+i+t=5 characters) than without (e+ſ|s+t+o+i+t=6 characters). One could argue that switching to the accented letter also requires only five characters (é+t+o+i+t=5 characters), but while ligatures are plentiful in the printer's type case, accented characters are less so. Our working hypothesis is as follows: as ligatures are largely composed of a long s (‹ſ›), we should obtain a correlation between the use of this s (cf. fig. <ref type="figure" target="#fig_13">10a</ref>) and the acute accent (cf. fig. <ref type="figure" target="#fig_13">10b</ref>). We evaluate the correlation between the evolution of the two phenomena over time, and obtain a Pearson product-moment correlation coefficient of 0.365 with a p-value of 4.88e-20, which indicates a statistically significant positive correlation (cf. fig. <ref type="figure" target="#fig_14">11</ref>).</p><p>Pattern D: innovation. Finally, it is important to note that, in this slow movement of standardisation that we are drawing, innovations also appear. These innovations largely concern diacritics, some of which are exploding in number, like the diaeresis (cf. fig. <ref type="figure" target="#fig_13">10a</ref>): scriptors tend to add them more and more to one of the two vowels in hiatus, especially with the sequence ‹ue› (louër or loüer, today louer, eng. "to rent/to praise"). We also note a great hesitation regarding the notation of nasal vowels (cf. fig. 
<ref type="figure" target="#fig_16">12b</ref>), especially [ã], for which we can use ‹en|m› or ‹an|m› such as aventure vs avanture (today aventure, eng. "adventure").</p></div>
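<div xmlns="http://www.tei-c.org/ns/1.0"><p>The dating of a break (Pattern B) can be illustrated with the following sketch of binary segmentation. It is a minimal Python version under stated assumptions, not the implementation used for our figures: the cost of a segment is taken to be the sum of squared deviations from its mean, the function names are introduced for the example, and the yearly rates in the usage part are invented toy values, not results from our corpus.</p>
```python
def sse(values):
    """Cost of one segment: sum of squared deviations from its mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((value - mean) ** 2 for value in values)


def best_split(values):
    """Index k minimising cost(values[:k]) + cost(values[k:]).

    Needs at least two values.
    """
    candidates = range(1, len(values))
    return min(candidates, key=lambda k: sse(values[:k]) + sse(values[k:]))


def binary_segmentation(values, n_breaks):
    """Greedy (forward stepwise) search for up to n_breaks change points.

    At each step, the split that reduces the total cost the most is added;
    the search stops early when no split brings any reduction.
    """
    breaks = []
    for _ in range(n_breaks):
        bounds = [0] + sorted(breaks) + [len(values)]
        best = None  # (cost reduction, absolute index of the split)
        for start, end in zip(bounds, bounds[1:]):
            segment = values[start:end]
            if 2 > len(segment):
                continue
            k = best_split(segment)
            gain = sse(segment) - sse(segment[:k]) - sse(segment[k:])
            if best is None or gain > best[0]:
                best = (gain, start + k)
        if best is None or 0 >= best[0]:
            break
        breaks.append(best[1])
    return sorted(breaks)


if __name__ == "__main__":
    # Invented toy values: percentage of a rule per year, for illustration only.
    years = [1666, 1667, 1668, 1669, 1670, 1671, 1672, 1673]
    rates = [90, 88, 91, 89, 10, 12, 9, 11]
    for index in binary_segmentation(rates, 1):
        print("break before", years[index])
```
<p>Each returned index is the position of the first observation of the new regime, so it can be mapped back to a year as in the usage part. Asking for more than one break simply repeats the same greedy step on the segments obtained so far.</p></div>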
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Conclusion and further work</head><p>Spelling changes throughout the 17th c., and at the beginning of the 18th century the standardisation process is very advanced (etymological letters, use of diacritical letters, historical use of ‹u› and ‹i›, etc.), as studies on other languages, such as Polish <ref type="bibr" target="#b30">[31]</ref> or English <ref type="bibr" target="#b4">[5]</ref>, have been able to demonstrate. We still observe, however, a certain instability, which mostly concerns minor hesitations (notation of nasal vowels, vowels in hiatus, etc.): although the standardisation process is advanced, it is not yet finished.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Table 7</head><p>Main spelling changes and their dating using change-point detection.</p><p>Among all the changes in spelling, all those observed seem to follow the traditional shape of the S-curve, and no "anomalies" have yet been found, as has been the case elsewhere <ref type="bibr" target="#b30">[31]</ref>: on the contrary, we believe we have found new evidence supporting Kroch's constant rate hypothesis.</p><p>The idea of a slow change, which spreads over a long period <ref type="bibr" target="#b4">[5]</ref>, seems to be confirmed by our analyses (cf. tab. 7). However, the velocity of change varies greatly from one phenomenon to another, with sometimes slow shifts over decades, and sometimes abrupt ruptures whose cause is not entirely clear, and which would be interesting to discover.</p><p>As for the reasons for the change, a lot of work still needs to be done, particularly in trying to find features that could predict the change <ref type="bibr" target="#b31">[32]</ref>. One of them, the identity of the printers, would be interesting to evaluate; unfortunately, the data is not always available, particularly for the 18th century, which will pose a problem for the future of this study. Nevertheless, some indications suggest that it would be important to review the hypothesis that sees printing as a vector of change <ref type="bibr" target="#b4">[5]</ref>: the limitations imposed by the type case of printers could for instance be a hindrance to change.</p><p>A more precise modelling of these changes is therefore on the agenda for our future research, whether it concerns the identification of possible reasons for these changes, their more precise dating (in particular by integrating confidence intervals), or the addition of new data for the 18th century. 
The improvement of all data extraction and enrichment tools has already begun, and should thus allow the creation of an even larger and more precise corpus.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Data production pipeline.</figDesc><graphic coords="4,89.28,84.16,416.72,89.40" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: Three page examples with zone objects.</figDesc><graphic coords="5,95.93,84.17,127.57,187.10" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head>Figure 3 :</head><label>3</label><figDesc>Figure 3: Word error rate for the corpus.</figDesc><graphic coords="7,286.22,440.31,218.69,167.25" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_4"><head>Figure 4 :</head><label>4</label><figDesc>Figure 4: Example of TEI encoding with normalisation.</figDesc><graphic coords="8,99.70,353.54,187.52,187.52" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_5"><head>Figure 5 :</head><label>5</label><figDesc>Figure 5: Description of the OCRised corpus.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_6"><head></head><label></label><figDesc>(a) Substitution of ‹es› by ‹é›. (b) Substitution of ‹as› by ‹â›.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_7"><head>Figure 7 :</head><label>7</label><figDesc>Figure 7: Disappearance of ‹s› as a diacritical letter.</figDesc><graphic coords="10,93.45,403.31,195.86,125.01" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_8"><head></head><label></label><figDesc>(a) Accumulation of occurrences of different spellings: two similar (es→é, as→â) and one different (ct→t). Data are scaled to base 100 to be comparable. (b) Theoretical progression of two similar variants over time, which start at different times, but progress at the same speed, according to the constant rate hypothesis.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_9"><head>Figure 8 :</head><label>8</label><figDesc>Figure 8: The constant rate hypothesis in practice (left) vs in theory (right).</figDesc><graphic coords="11,93.45,167.88,186.06,113.73" type="bitmap" /></figure>
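<div xmlns="http://www.tei-c.org/ns/1.0"><p>The theoretical side of the constant rate hypothesis can be illustrated with a minimal sketch. It assumes that each change follows a logistic S-curve; the rate and the two midpoints below are hypothetical values chosen for illustration, not estimates from our corpus. Two changes that start at different times but progress at the same rate stay a constant distance apart once their shares are expressed as log-odds.</p>

```python
import math

def logistic(year, midpoint, rate):
    """Share of the new variant in a given year under a logistic S-curve."""
    return 1.0 / (1.0 + math.exp(-rate * (year - midpoint)))

def logit(share):
    """Log-odds of a share strictly between 0 and 1."""
    return math.log(share / (1.0 - share))

# Hypothetical parameters: same rate, midpoints twenty years apart.
RATE = 0.15
MIDPOINT_A, MIDPOINT_B = 1650, 1670

for year in range(1640, 1690, 10):
    a = logistic(year, MIDPOINT_A, RATE)
    b = logistic(year, MIDPOINT_B, RATE)
    # In log-odds space the gap between the two curves stays constant.
    gap = logit(a) - logit(b)
    print(year, round(a, 3), round(b, 3), round(gap, 3))
```

<p>With these assumed values the gap equals the rate multiplied by the distance between the midpoints, whatever the year. This is the parallel-curves pattern that the theoretical panel depicts.</p></div>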
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_10"><head></head><label></label><figDesc>(a) Emergence of the contemporary use of Ramist letters. (b) Disappearance of the etymological combination ‹ct›.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_11"><head>Figure 9 :</head><label>9</label><figDesc>Figure 9: Computing the change-point of two changes of spelling.</figDesc><graphic coords="11,310.14,457.12,195.86,136.74" type="bitmap" /></figure>
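<div xmlns="http://www.tei-c.org/ns/1.0"><p>As an illustration of what computing a change-point involves, the following sketch locates a single break in a series by least squares. It is a simplified stand-in, not necessarily the procedure used for our figures: the function names and the yearly shares are hypothetical, and a real analysis may have to handle several breaks.</p>

```python
def sse(values):
    """Sum of squared deviations from the mean of a segment."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def single_change_point(series):
    """Index k that best splits series into series[:k] and series[k:].

    The best split minimises the total within-segment sum of squared errors.
    """
    if not len(series) >= 2:
        raise ValueError("need at least two observations")
    candidates = range(1, len(series))
    return min(candidates, key=lambda k: sse(series[:k]) + sse(series[k:]))

# Hypothetical yearly shares of a new spelling, invented for illustration.
years = list(range(1660, 1670))
shares = [0.05, 0.04, 0.06, 0.05, 0.07, 0.62, 0.60, 0.65, 0.63, 0.66]

k = single_change_point(shares)
print(years[k])  # first year of the second regime: 1665
```

<p>Every candidate split is evaluated, which is affordable for yearly series covering a century or two.</p></div>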
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_12"><head></head><label></label><figDesc>(a) Slow decrease of the long s (‹ſ›). (b) Increase of the acute accent.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_13"><head>Figure 10 :</head><label>10</label><figDesc>Figure 10: Long s vs acute accent.</figDesc><graphic coords="12,93.45,245.18,195.86,125.23" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_14"><head>Figure 11 :</head><label>11</label><figDesc>Figure 11: Correlation ‹ſ›/acute accent. Pattern C: correlation. L. Biedermann-Pasques proposed the available type case as one of the parameters of spelling change: "the typographical use of the ligature has slowed down, in our opinion, the regular replacement of silent s by an accent" [7, p. 92]. It is indeed faster to compose the word eſtoit with the ligature (e+ſt+o+i+t = 5 characters) than without (e+ſ|s+t+o+i+t = 6 characters). One could argue that switching to the accented letter also requires only five characters (é+t+o+i+t = 5 characters), but while ligatures are plentiful in the printer's type case, accented characters are less so. Our working hypothesis is as follows: as ligatures are largely composed of a long s (‹ſ›), we should obtain a correlation between the use of this s (cf. fig. 10a) and the acute accent (cf. fig. 10b). We evaluate the correlation between the evolution of the two phenomena over time, and obtain a Pearson product-moment correlation coefficient of 0.365 with a p-value of 4.88e-20, which indicates a moderate but highly significant correlation (cf. fig. 11).</figDesc><graphic coords="12,310.14,433.57,195.86,120.58" type="bitmap" /></figure>
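<div xmlns="http://www.tei-c.org/ns/1.0"><p>For readers who wish to reproduce this kind of measurement, the sketch below computes a Pearson product-moment correlation coefficient and the associated t statistic using only the Python standard library. The two series are invented for illustration and are not our data; turning the t statistic into a p-value additionally requires the Student t distribution, which a statistics library would provide.</p>

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length series."""
    if len(xs) != len(ys):
        raise ValueError("series must have the same length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def t_statistic(r, n):
    """t statistic for testing r against zero, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

# Hypothetical yearly relative frequencies, invented for illustration only.
long_s_rate = [0.90, 0.88, 0.85, 0.80, 0.78, 0.70]
acute_rate = [0.10, 0.15, 0.14, 0.22, 0.25, 0.31]

r = pearson_r(long_s_rate, acute_rate)
t = t_statistic(r, len(long_s_rate))
print(f"r = {r:.3f}, t = {t:.2f}")
```

<p>The sign of the coefficient depends on how the two phenomena are encoded: in this toy example one series falls while the other rises, so the coefficient is negative.</p></div>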
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_15"><head></head><label></label><figDesc>(a) Increase of the diaeresis. (b) Increase of the confusion ‹en|m›/‹an|m›.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_16"><head>Figure 12 :</head><label>12</label><figDesc>Figure 12: Appearance of new phenomena.</figDesc><graphic coords="13,93.45,84.17,195.86,124.35" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2</head><label>2</label><figDesc>Training and evaluation data for OCR.</figDesc><table><row><cell>Dataset</cell><cell>Century</cell><cell>Language</cell><cell>Books</cell><cell>Lines</cell></row><row><cell>Train/Dev</cell><cell>16</cell><cell>French</cell><cell>7</cell><cell>17817</cell></row><row><cell>Train/Dev</cell><cell>17</cell><cell>French</cell><cell>19</cell><cell>20267</cell></row><row><cell>Train/Dev</cell><cell>16</cell><cell>Latin</cell><cell>12</cell><cell>10648</cell></row><row><cell>Test</cell><cell>16</cell><cell>French</cell><cell>10</cell></row><row><cell>Test</cell><cell>17</cell><cell>French</cell><cell>10</cell></row><row><cell>Test</cell><cell>18</cell><cell>French</cell><cell>10</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 3</head><label>3</label><figDesc>Results of the two YOLO models on modern plays.</figDesc><table><row><cell>Theatrical corpus</cell><cell>Theatrical and LADaS corpus</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head>Table 4</head><label>4</label><figDesc>Character and word error rates for both models.</figDesc><table><row><cell>Models</cell><cell>Characters</cell><cell>Errors</cell><cell>CER</cell><cell>WER</cell></row><row><cell>No fine-tuning</cell><cell>38394</cell><cell>924</cell><cell>2.41</cell><cell>11.06</cell></row><row><cell>Fine-tuning</cell><cell>38394</cell><cell>649</cell><cell>1.69</cell><cell>8.34</cell></row></table></figure>
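<div xmlns="http://www.tei-c.org/ns/1.0"><p>The CER values in Table 4 are consistent with dividing the number of erroneous characters by the total number of characters. The following sketch reproduces that arithmetic, on the assumption that this is how the rate is defined; the helper name error_rate is ours.</p>

```python
def error_rate(errors, total):
    """Error rate as a percentage, rounded to two decimals."""
    return round(100 * errors / total, 2)

# Character and error counts as reported in Table 4.
print(error_rate(924, 38394))  # no fine-tuning: 2.41
print(error_rate(649, 38394))  # fine-tuning: 1.69
```

<p>The WER cannot be checked in the same way from the table alone, since the number of words is not reported there.</p></div>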
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_4"><head>Table 5</head><label>5</label><figDesc>Character and word error rates for both models.</figDesc><table><row><cell>% errors</cell><cell>CER (part)</cell><cell>Errors</cell><cell>Correct</cell><cell>Generated</cell></row><row><cell>8.78% 7.55% 6.62%</cell><cell>0.14% 0.13% 0.11%</cell><cell>57 49 43</cell><cell>SPACE ' s</cell><cell>' ſ</cell></row><row><cell>3.23%</cell><cell>0.05%</cell><cell>21</cell><cell>-</cell><cell>Ø</cell></row><row><cell>2.77%</cell><cell>0.05%</cell><cell>18</cell><cell>ſ</cell><cell>f</cell></row><row><cell>2.62%</cell><cell>0.04%</cell><cell>17</cell><cell>'</cell><cell>'</cell></row><row><cell>2.16%</cell><cell>0.04%</cell><cell>14</cell><cell>Ø</cell><cell>SPACE</cell></row><row><cell>2%</cell><cell>0.03%</cell><cell>13</cell><cell>1</cell><cell>I</cell></row><row><cell>2%</cell><cell>0.03%</cell><cell>13</cell><cell>.</cell><cell>Ø</cell></row><row><cell>1.85%</cell><cell>0.03%</cell><cell>12</cell><cell>◌́</cell><cell>Ø</cell></row><row><cell>1.69%</cell><cell>0.03%</cell><cell>11</cell><cell>,</cell><cell>.</cell></row><row><cell>1.54%</cell><cell>0.03%</cell><cell>10</cell><cell>0</cell><cell>o</cell></row><row><cell>1.54%</cell><cell>0.03%</cell><cell>10</cell><cell>t</cell><cell>r</cell></row><row><cell>1.54%</cell><cell>0.03%</cell><cell>10</cell><cell>◌̂</cell><cell>Ø</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_5"><head></head><label></label><figDesc>.</figDesc><table><row><cell>&lt;sp&gt;</cell></row><row><cell>&lt;ab&gt;</cell></row><row><cell>&lt;seg&gt;</cell></row><row><cell>&lt;orig&gt;SGANARELLE.&lt;/orig&gt;</cell></row><row><cell>&lt;reg&gt;SGANARELLE.&lt;/reg&gt;</cell></row><row><cell>&lt;/seg&gt;</cell></row><row><cell>&lt;seg&gt;</cell></row><row><cell>&lt;orig&gt;Promettez-moy donc, Seigneur Geronimo, de me parler avec toute ſorte de franchiſe.&lt;/orig&gt;</cell></row><row><cell>&lt;reg&gt;Promettez-moi donc, Seigneur Geronimo, de me parler avec toute sorte de franchise.&lt;/reg&gt;</cell></row><row><cell>&lt;/seg&gt;</cell></row><row><cell>&lt;/ab&gt;</cell></row><row><cell>&lt;/sp&gt;</cell></row><row><cell>&lt;sp&gt;</cell></row><row><cell>&lt;ab&gt;</cell></row><row><cell>&lt;seg&gt;</cell></row><row><cell>&lt;orig&gt;GERONIMO.&lt;/orig&gt;</cell></row><row><cell>&lt;reg&gt;GERONIMO.&lt;/reg&gt;</cell></row><row><cell>&lt;/seg&gt;</cell></row><row><cell>&lt;seg&gt;</cell></row><row><cell>&lt;orig&gt;Ie vous le promets.&lt;/orig&gt;</cell></row><row><cell>&lt;reg&gt;Je vous le promets.&lt;/reg&gt;</cell></row><row><cell>&lt;/seg&gt;</cell></row><row><cell>&lt;/ab&gt;</cell></row><row><cell>&lt;/sp&gt;</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_6"><head>Table 6 :</head><label>6</label><figDesc>Prefix similarity matrix for the original and normalised version of ‹Apoſtre›.</figDesc><table /></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">In this article, we distinguish between "spelling systems" and "orthography". Put simply, the former are coherent, competing logics for spelling words (as in medieval manuscripts with dialectal traits), whereas the latter is a strict norm recognised as a standard.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_1">The cases that may pose a problem (e.g. lui-mesme → luimesme, eng. "himself") represent less than 0.1% of the corrected hyphenations.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_2">https://huggingface.co/rbawden/modern_french_normalisation.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_3">This adjustment includes some refinements: for instance, et and &amp; are considered equivalent.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="5" xml:id="foot_4">"When one grammatical option replaces another with which it is in competition across a set of linguistic contexts, the rate of replacement, properly measured, is the same in all of them." <ref type="bibr" target="#b37">[38]</ref></note>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgments</head><p>Thanks (in alphabetical order) to Jean Barré, Alexandre Bartz, Rachel Bawden, Philippe Gambette and Benoît Sagot for their help. Thanks also to our reviewers for their excellent remarks.</p></div>
			</div>


			<div type="availability">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Data and code</head><p>All the data and code are available in our GitHub repository: https://github.com/DEFI-COLaF/TheatreLFSV2.</p></div>
			</div>


			<div type="funding">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Funding</head><p>This paper has been funded by the DEFI Inria COLaF (Corpus et Outils pour les Langues de France) and the FNS-Spark project N°220833.</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<monogr>
		<title level="m" type="main">Cahiers de remarques sur l&apos;orthographe françoise pour estre examinez par chacun de Messieurs de l&apos;Academie, avec des observations de Bossuet, Pellisson, etc</title>
		<author>
			<orgName>Académie française</orgName>
		</author>
		<ptr target="https://books.google.ch/books?id=u5Y5AQAAIAAJ" />
		<imprint>
			<publisher>Jules Gay</publisher>
			<pubPlace>Paris</pubPlace>
			<date type="published" when="1863">1863</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Changement linguistique et périodisation du français (pré)classique: deux études de cas à partir des corpus du RCFC</title>
		<author>
			<persName><forename type="first">A</forename><surname>Amatuzzi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Ayres-Bennett</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Gerstenberg</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Schøsler</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Skupien-Dekens</surname></persName>
		</author>
		<idno type="DOI">10.1017/s0959269520000058</idno>
	</analytic>
	<monogr>
		<title level="j">Journal of French Language Studies</title>
		<imprint>
			<biblScope unit="volume">30</biblScope>
			<biblScope unit="issue">3</biblScope>
			<biblScope unit="page" from="301" to="326" />
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<monogr>
		<title level="m" type="main">L&apos;Orthographe française au temps de la Réforme</title>
		<author>
			<persName><forename type="first">S</forename><surname>Baddeley</surname></persName>
		</author>
		<imprint>
			<date type="published" when="1993">1993</date>
			<publisher>Droz</publisher>
			<pubPlace>Genève</pubPlace>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Estimating Multiple Breaks One at a Time</title>
		<author>
			<persName><forename type="first">J</forename><surname>Bai</surname></persName>
		</author>
		<idno type="DOI">10.1017/s0266466600005831</idno>
	</analytic>
	<monogr>
		<title level="j">Econometric Theory</title>
		<imprint>
			<biblScope unit="volume">13</biblScope>
			<biblScope unit="issue">3</biblScope>
			<biblScope unit="page" from="315" to="352" />
			<date type="published" when="1997">1997</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">&quot;Ill shapen sounds, and false orthography&quot;: A Computational Approach to Early English Orthographic Variation</title>
		<author>
			<persName><forename type="first">A</forename><surname>Basu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Early Modern Studies: the Digital Turn</title>
				<editor>
			<persName><forename type="first">L</forename><surname>Estill</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">D</forename><forename type="middle">K</forename><surname>Jakacki</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">M</forename><surname>Ullyot</surname></persName>
		</editor>
		<meeting><address><addrLine>Toronto</addrLine></address></meeting>
		<imprint>
			<publisher>Iter Press</publisher>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="167" to="200" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Automatic Normalisation of Early Modern French</title>
		<author>
			<persName><forename type="first">R</forename><surname>Bawden</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Poinhos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Kogkitsidou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Gambette</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Sagot</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Gabay</surname></persName>
		</author>
		<ptr target="https://inria.hal.science/hal-03540226" />
	</analytic>
	<monogr>
		<title level="m">LREC 2022 - 13th Language Resources and Evaluation Conference. European Language Resources Association</title>
				<meeting><address><addrLine>Marseille, France</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2022">2022</date>
			<biblScope unit="page" from="3354" to="3366" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<monogr>
		<author>
			<persName><forename type="first">L</forename><surname>Biedermann-Pasques</surname></persName>
		</author>
		<idno type="DOI">10.1515/9783110938593</idno>
		<title level="m">Les Grands Courants orthographiques au XVIIe siècle et la formation de l&apos;orthographe moderne. Impacts matériels, interférences phoniques, théories et pratiques (1606-1736)</title>
		<imprint>
			<publisher>Max Niemeyer Verlag</publisher>
			<pubPlace>Tübingen</pubPlace>
			<date type="published" when="1992">1992</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<monogr>
		<title level="m" type="main">Normalization of Historical Texts with Neural Network Models</title>
		<author>
			<persName><forename type="first">M</forename><surname>Bollmann</surname></persName>
		</author>
		<ptr target="https://www.linguistics.rub.de/forschung/arbeitsberichte/22.pdf" />
		<imprint>
			<date type="published" when="2018">2018</date>
			<pubPlace>Bochum</pubPlace>
		</imprint>
		<respStmt>
			<orgName>Ruhr-Universität Bochum</orgName>
		</respStmt>
	</monogr>
	<note type="report_type">PhD thesis</note>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Manuscripts in Time and Space: Experiments in Scriptometrics on an Old French Corpus</title>
		<author>
			<persName><forename type="first">J.-B</forename><surname>Camps</surname></persName>
		</author>
		<ptr target="https://hal.science/hal-01695899" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Second Workshop on Corpus-Based Research in the Humanities CRH-2</title>
				<editor>
			<persName><forename type="first">A</forename><forename type="middle">U</forename><surname>Frank</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">C</forename><surname>Ivanovic</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">F</forename><surname>Mambrini</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">M</forename><surname>Passarotti</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">C</forename><surname>Sporleder</surname></persName>
		</editor>
		<meeting>the Second Workshop on Corpus-Based Research in the Humanities CRH-2<address><addrLine>Vienna, Austria</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2018">2018</date>
			<biblScope unit="page" from="55" to="64" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<monogr>
		<title level="m" type="main">Histoire de l&apos;orthographe française</title>
		<author>
			<persName><forename type="first">N</forename><surname>Catach</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2001">2001</date>
			<publisher>Honoré Champion</publisher>
			<pubPlace>Paris</pubPlace>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">L&apos;orthographe plantinienne</title>
		<author>
			<persName><forename type="first">N</forename><surname>Catach</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Golfand</surname></persName>
		</author>
		<ptr target="https://www.dbnl.org/tekst/_gul005197301_01/_gul005197301_01_0003.php" />
	</analytic>
	<monogr>
		<title level="j">De Gulden Passer</title>
		<imprint>
			<biblScope unit="volume">50</biblScope>
			<biblScope unit="page" from="19" to="69" />
			<date type="published" when="1973">1973</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">You Actually Look Twice At it (YALTAi): using an object detection approach instead of region segmentation within the Kraken engine</title>
		<author>
			<persName><forename type="first">T</forename><surname>Clérice</surname></persName>
		</author>
		<idno type="DOI">10.46298/jdmdh.9806</idno>
	</analytic>
	<monogr>
		<title level="j">Journal of Data Mining &amp; Digital Humanities</title>
		<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Layout Analysis Dataset with SegmOnto</title>
		<author>
			<persName><forename type="first">T</forename><surname>Clérice</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Janès</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Scheithauer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Bénière</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Romary</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Sagot</surname></persName>
		</author>
		<ptr target="https://inria.hal.science/hal-04513725" />
	</analytic>
	<monogr>
		<title level="m">DH2024 - Annual conference of the Alliance of Digital Humanities Organizations. Alliance of Digital Humanities Organizations (ADHO)</title>
				<meeting><address><addrLine>Washington, D.C., United States</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<monogr>
		<title level="m" type="main">Le Théâtre de P. Corneille</title>
		<author>
			<persName><forename type="first">P</forename><surname>Corneille</surname></persName>
		</author>
		<ptr target="https://gallica.bnf.fr/ark:/12148/bpt6k71442p" />
		<imprint>
			<date type="published" when="1663">1663</date>
			<publisher>G. de Luyne</publisher>
			<pubPlace>Paris</pubPlace>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Dialectes et scriptae à l&apos;époque de l&apos;ancien français</title>
		<author>
			<persName><forename type="first">A</forename><surname>Dees</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Revue de Linguistique Romane</title>
		<imprint>
			<biblScope unit="volume">49</biblScope>
			<biblScope unit="page" from="87" to="117" />
			<date type="published" when="1985">1985</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Les éditions de textes du XVIIe siècle</title>
		<author>
			<persName><forename type="first">F</forename><surname>Duval</surname></persName>
		</author>
		<idno type="DOI">10.1515/9783110302608-017</idno>
	</analytic>
	<monogr>
		<title level="m">Manuel de la philologie de l&apos;édition</title>
				<editor>
			<persName><forename type="first">D</forename><surname>Trotter</surname></persName>
		</editor>
		<meeting><address><addrLine>Berlin; Boston</addrLine></address></meeting>
		<imprint>
			<publisher>De Gruyter</publisher>
			<date type="published" when="2015">2015</date>
			<biblScope unit="page" from="369" to="394" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Automatische Normalisierung - Vorarbeit zur Lemmatisierung eines diplomatischen altisländischen Textes</title>
		<author>
			<persName><forename type="first">H</forename><surname>Fix</surname></persName>
		</author>
		<idno type="DOI">10.1515/9783111438788.92</idno>
	</analytic>
	<monogr>
		<title level="m">Teil 3 Beiträge zum dritten Symposion Tübingen 17</title>
				<meeting><address><addrLine>Berlin/Boston</addrLine></address></meeting>
		<imprint>
			<publisher>Max Niemeyer Verlag</publisher>
			<date type="published" when="1980">1980</date>
			<biblScope unit="page" from="92" to="100" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">Phonological Rule Change: The Constant Rate Effect</title>
		<author>
			<persName><forename type="first">J</forename><surname>Fruehwald</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Gress-Wright</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Wallenberg</surname></persName>
		</author>
		<ptr target="https://www.research.ed.ac.uk/files/14416788/Fruewald_Gress_Wright_Wallenberg_Phonological_Rule_Change.pdf" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 40th Annual Meeting of the North East Linguistic Society</title>
				<meeting>the 40th Annual Meeting of the North East Linguistic Society<address><addrLine>Cambridge, MA</addrLine></address></meeting>
		<imprint>
			<publisher>GLSA Publications</publisher>
			<date type="published" when="2013">2013</date>
			<biblScope unit="page" from="219" to="230" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<monogr>
		<title/>
		<author>
			<persName><forename type="first">S</forename><surname>Gabay</surname></persName>
		</author>
		<idno type="DOI">10.5281/zenodo.11526150</idno>
		<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
	<note type="report_type">Fondue-fr-print</note>
</biblStruct>

<biblStruct xml:id="b19">
	<monogr>
		<title/>
		<author>
			<persName><forename type="first">S</forename><surname>Gabay</surname></persName>
		</author>
		<idno type="DOI">10.5281/zenodo.11526040</idno>
		<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
	<note type="report_type">Fondue-fr-print</note>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">FreEM-corpora/FreEMnorm: FreEM norm Parallel corpus</title>
		<author>
			<persName><forename type="first">S</forename><surname>Gabay</surname></persName>
		</author>
		<idno type="DOI">10.5281/zenodo.5865428</idno>
	</analytic>
	<monogr>
		<title level="j">Version</title>
		<imprint>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="issue">0</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">Pourquoi moderniser l&apos;orthographe? Principes d&apos;ecdotique et littérature du XVIIe siècle</title>
		<author>
			<persName><forename type="first">S</forename><surname>Gabay</surname></persName>
		</author>
		<idno>doi: 99.125005/vox201410027</idno>
	</analytic>
	<monogr>
		<title level="j">Vox Romanica</title>
		<imprint>
			<biblScope unit="volume">73</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="27" to="42" />
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<title level="a" type="main">Traduction automatique pour la normalisation du français du XVIIe siècle</title>
		<author>
			<persName><forename type="first">S</forename><surname>Gabay</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Barrault</surname></persName>
		</author>
		<ptr target="https://aclanthology.org/2020.jeptalnrecital-taln.20" />
	</analytic>
	<monogr>
		<title level="m">Rencontre des Étudiants Chercheurs en Informatique pour le Traitement Automatique des Langues (RÉCITAL</title>
		<title level="s">Traitement Automatique des Langues Naturelles</title>
		<meeting><address><addrLine>Nancy, France</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="volume">2</biblScope>
			<biblScope unit="page" from="213" to="222" />
		</imprint>
	</monogr>
	<note>Actes de la 6e conférence conjointe Journées d&apos;Études sur la Parole (JEP, 33e édition). 22e édition</note>
</biblStruct>

<biblStruct xml:id="b23">
	<analytic>
		<title level="a" type="main">Le changement linguistique au XVIIe s. : nouvelles approches scriptométriques</title>
		<author>
			<persName><forename type="first">S</forename><surname>Gabay</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Bawden</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Gambette</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Poinhos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Kogkitsidou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Sagot</surname></persName>
		</author>
		<idno type="DOI">10.1051/shsconf/202213802006</idno>
	</analytic>
	<monogr>
		<title level="m">CMLF 2022 -8e Congrès Mondial de Linguistique Française</title>
				<meeting><address><addrLine>Orléans, France</addrLine></address></meeting>
		<imprint>
			<publisher>EDP Sciences</publisher>
			<date type="published" when="2022">2022</date>
			<biblScope unit="volume">138</biblScope>
			<biblScope unit="page" from="1" to="14" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b24">
	<analytic>
		<title level="a" type="main">SegmOnto: common vocabulary and practices for analysing the layout of manuscripts (and more)</title>
		<author>
			<persName><forename type="first">S</forename><surname>Gabay</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J.-B</forename><surname>Camps</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Pinche</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Jahan</surname></persName>
		</author>
		<ptr target="https://hal.science/hal-03336528" />
	</analytic>
	<monogr>
		<title level="m">1st International Workshop on Computational Paleography (IWCP@ICDAR 2021)</title>
				<meeting><address><addrLine>Lausanne, Switzerland</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b25">
	<analytic>
		<title level="a" type="main">Reconnaissance des écritures dans les imprimés</title>
		<author>
			<persName><forename type="first">S</forename><surname>Gabay</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Clérice</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Jacsont</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Leblanc</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Jeannot-Tirole</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Solfrini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Dolto</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Goy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">C</forename><surname>Luján</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Zaglio</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Perregaux</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Janès</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Sagot</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Bawden</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Dent</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Nédey</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Chagué</surname></persName>
		</author>
		<ptr target="https://hal.science/hal-04557457" />
	</analytic>
	<monogr>
		<title level="m">Humanistica 2024. Association francophone des humanités numériques</title>
				<meeting><address><addrLine>Meknès, Morocco</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b26">
	<monogr>
		<title level="m" type="main">FONDUE-MLT-PRINT-TEST-longS</title>
		<author>
			<persName><forename type="first">S</forename><surname>Gabay</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Clérice</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Janès</surname></persName>
		</author>
		<idno type="DOI">10.5281/zenodo.11526316</idno>
		<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b27">
	<analytic>
		<title level="a" type="main">Ancien ou moderne ? Pistes computationnelles pour l&apos;analyse graphématique des textes écrits au XVIIe siècle</title>
		<author>
			<persName><forename type="first">S</forename><surname>Gabay</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Gambette</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Bawden</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Sagot</surname></persName>
		</author>
		<idno type="DOI">10.4000/linx.9346</idno>
	</analytic>
	<monogr>
		<title level="j">Linx</title>
		<imprint>
			<biblScope unit="volume">85</biblScope>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b28">
	<monogr>
		<title/>
		<author>
			<persName><forename type="first">S</forename><surname>Gabay</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Jeannot-Tirole</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Goy</surname></persName>
		</author>
		<author>
			<persName><surname>Fondue</surname></persName>
		</author>
		<idno type="DOI">10.5281/zenodo.11526160</idno>
		<imprint>
			<date type="published" when="2024">2024</date>
			<biblScope unit="volume">16</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b29">
	<analytic>
		<title level="a" type="main">Dialektometrie</title>
		<author>
			<persName><forename type="first">H</forename><surname>Goebl</surname></persName>
		</author>
		<idno type="DOI">10.1515/9783110155785</idno>
	</analytic>
	<monogr>
		<title level="m">Quantitative Linguistik/Quantitative Linguistics. Ein internationales Handbuch/An International Handbook</title>
				<meeting><address><addrLine>Berlin; New York</addrLine></address></meeting>
		<imprint>
			<publisher>De Gruyter Mouton</publisher>
			<date type="published" when="2005">2005</date>
			<biblScope unit="page" from="498" to="531" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b30">
	<analytic>
		<title level="a" type="main">Modelling the Dynamics of Language Change: Logistic Regression, Piotrowski&apos;s Law, and a Handful of Examples in Polish</title>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">L</forename><surname>Górski</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Eder</surname></persName>
		</author>
		<idno type="DOI">10.1080/09296174.2022.2151208</idno>
	</analytic>
	<monogr>
		<title level="j">Journal of Quantitative Linguistics</title>
		<imprint>
			<biblScope unit="volume">30</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="125" to="151" />
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b31">
	<analytic>
		<title level="a" type="main">Modeling the Decline in English Passivization</title>
		<author>
			<persName><forename type="first">L</forename><surname>Hou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Smith</surname></persName>
		</author>
		<idno type="DOI">10.7275/r5zc812c</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Society for Computation in Linguistics (SCiL) 2018</title>
				<editor>
			<persName><forename type="first">G</forename><surname>Jarosz</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">B</forename><surname>O'Connor</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">J</forename><surname>Pater</surname></persName>
		</editor>
		<meeting>the Society for Computation in Linguistics (SCiL) 2018</meeting>
		<imprint>
			<date type="published" when="2018">2018</date>
			<biblScope unit="page" from="34" to="43" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b32">
	<analytic>
		<title level="a" type="main">Towards automatic TEI encoding via layout analysis</title>
		<author>
			<persName><forename type="first">J</forename><surname>Janès</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Pinche</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Jahan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Gabay</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Fantastic future 21, 3rd International Conference on Artificial Intelligence for Libraries, Archives and Museums</title>
		<imprint/>
	</monogr>
	<note>AI for Libraries, Archives, and Museums (AI4LAM)</note>
</biblStruct>

<biblStruct xml:id="b33">
	<monogr>
		<title level="m" type="main">Standard-based Lexical Models for Automatically Structured Dictionaries</title>
		<author>
			<persName><forename type="first">M</forename><surname>Khemakhem</surname></persName>
		</author>
		<ptr target="https://theses.hal.science/tel-03274454" />
		<imprint>
			<date type="published" when="2020">2020</date>
			<pubPlace>Paris</pubPlace>
		</imprint>
		<respStmt>
			<orgName>Université de Paris</orgName>
		</respStmt>
	</monogr>
	<note type="report_type">PhD thesis</note>
</biblStruct>

<biblStruct xml:id="b34">
	<analytic>
		<title level="a" type="main">Automatic Extraction of TEI Structures in Digitized Lexical Resources using Conditional Random Fields</title>
		<author>
			<persName><forename type="first">M</forename><surname>Khemakhem</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Foppiano</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Romary</surname></persName>
		</author>
		<ptr target="https://hal.science/hal-01508868" />
	</analytic>
	<monogr>
		<title level="m">Electronic lexicography, eLex 2017</title>
				<meeting><address><addrLine>Leiden, The Netherlands</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b35">
	<analytic>
		<title level="a" type="main">Kraken - an Universal Text Recognizer for the Humanities</title>
		<author>
			<persName><forename type="first">B</forename><surname>Kiessling</surname></persName>
		</author>
		<idno type="DOI">10.34894/z9g2ex</idno>
	</analytic>
	<monogr>
		<title level="m">Digital Humanities Conference 2019 - DH2019</title>
				<meeting><address><addrLine>Utrecht, The Netherlands</addrLine></address></meeting>
		<imprint>
			<publisher>Alliance of Digital Humanities Organizations</publisher>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b36">
	<analytic>
		<title level="a" type="main">Detecting Linguistic Change Based on Word Co-occurrence Patterns</title>
		<author>
			<persName><forename type="first">C</forename><surname>Klaussner</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Vogel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Bhattacharya</surname></persName>
		</author>
		<ptr target="https://ceur-ws.org/Vol-1992/paper_4.pdf" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 4th International Workshop on Computational History</title>
				<meeting>the 4th International Workshop on Computational History<address><addrLine>Singapore</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="14" to="21" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b37">
	<analytic>
		<title level="a" type="main">Reflexes of grammar in patterns of language change</title>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">S</forename><surname>Kroch</surname></persName>
		</author>
		<idno type="DOI">10.1017/s0954394500000168</idno>
	</analytic>
	<monogr>
		<title level="j">Language Variation and Change</title>
		<imprint>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="issue">3</biblScope>
			<biblScope unit="page" from="199" to="244" />
			<date type="published" when="1989">1989</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b38">
	<analytic>
		<title level="a" type="main">Binary codes capable of correcting deletions, insertions, and reversals</title>
		<author>
			<persName><forename type="first">V</forename><surname>Levenshtein</surname></persName>
		</author>
		<ptr target="https://www.mathnet.ru/eng/dan31411" />
	</analytic>
	<monogr>
		<title level="j">Soviet physics doklady</title>
		<imprint>
			<biblScope unit="volume">10</biblScope>
			<biblScope unit="issue">8</biblScope>
			<biblScope unit="page" from="707" to="710" />
			<date type="published" when="1966">1966</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b39">
	<analytic>
		<title level="a" type="main">The Free First Thousand Years of Greek</title>
		<author>
			<persName><forename type="first">L</forename><surname>Muellner</surname></persName>
		</author>
		<idno type="DOI">10.1515/9783110599572-002</idno>
	</analytic>
	<monogr>
		<title level="m">Digital Classical Philology: Ancient Greek and Latin in the Digital Revolution</title>
				<meeting><address><addrLine>Berlin/Boston</addrLine></address></meeting>
		<imprint>
			<publisher>De Gruyter Saur</publisher>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="7" to="17" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b40">
	<analytic>
		<title level="a" type="main">Page Layout Analysis of Text-heavy Historical Documents: a Comparison of Textual and Visual Approaches</title>
		<author>
			<persName><forename type="first">S</forename><surname>Najem-Meyer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Romanello</surname></persName>
		</author>
		<idno type="DOI">10.48550/arXiv.2212.13924</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Computational Humanities Research Conference 2022</title>
				<meeting>the Computational Humanities Research Conference 2022<address><addrLine>Antwerp, Belgium</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2022">2022</date>
			<biblScope unit="page" from="36" to="54" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b41">
	<analytic>
		<title level="a" type="main">A general method applicable to the search for similarities in the amino acid sequence of two proteins</title>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">B</forename><surname>Needleman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">D</forename><surname>Wunsch</surname></persName>
		</author>
		<idno type="DOI">10.1016/0022-2836(70)90057-4</idno>
	</analytic>
	<monogr>
		<title level="j">Journal of Molecular Biology</title>
		<imprint>
			<biblScope unit="volume">48</biblScope>
			<biblScope unit="issue">3</biblScope>
			<biblScope unit="page" from="443" to="453" />
			<date type="published" when="1970">1970</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b42">
	<analytic>
		<title level="a" type="main">Measuring dialect differences</title>
		<author>
			<persName><forename type="first">J</forename><surname>Nerbonne</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Heeringa</surname></persName>
		</author>
		<idno type="DOI">10.1515/9783110220278.550</idno>
	</analytic>
	<monogr>
		<title level="m">Theories and Methods: An International Handbook of Linguistic Variation</title>
				<imprint>
			<publisher>De Gruyter Mouton</publisher>
			<date type="published" when="2010">2010</date>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="550" to="567" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b43">
	<monogr>
		<author>
			<orgName>Netherlands Historical Data Archive</orgName>
		</author>
		<author>
			<orgName>Nijmegen Institute for Cognition &amp; Information</orgName>
		</author>
		<title level="m" type="main">Optical Character Recognition in the Historical Discipline: Proceedings of an International Workshop</title>
				<meeting>an International Workshop<address><addrLine>St. Katharinen</addrLine></address></meeting>
		<imprint>
			<publisher>Halbgraue Reihe zur Historischen Fachinformatik</publisher>
			<date type="published" when="1993">1993</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b44">
	<analytic>
		<title level="a" type="main">Between automatic and manual encoding</title>
		<author>
			<persName><forename type="first">A</forename><surname>Pinche</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Christensen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Gabay</surname></persName>
		</author>
		<idno type="DOI">10.5281/zenodo.7092214</idno>
		<ptr target="https://hal.science/hal-03780302" />
	</analytic>
	<monogr>
		<title level="m">TEI 2022 conference : Text as data</title>
				<meeting><address><addrLine>Newcastle, United Kingdom</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b45">
	<monogr>
		<title level="m" type="main">ABA (Alignment-Based Approach)</title>
		<author>
			<persName><forename type="first">J</forename><surname>Poinhos</surname></persName>
		</author>
		<ptr target="https://github.com/johnseazer/aba" />
		<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
	<note>Version 1</note>
</biblStruct>

<biblStruct xml:id="b46">
	<monogr>
		<title level="m" type="main">Real-Time Flying Object Detection with YOLOv8</title>
		<author>
			<persName><forename type="first">D</forename><surname>Reis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Kupec</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Hong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Daoudi</surname></persName>
		</author>
		<idno type="DOI">10.48550/arxiv.2305.09972</idno>
		<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b47">
	<monogr>
		<author>
			<persName><forename type="first">L</forename><surname>Remacle</surname></persName>
		</author>
		<ptr target="http://books.openedition.org/pulg/338" />
		<title level="m">Le Problème de l&apos;ancien wallon</title>
				<meeting><address><addrLine>Liège</addrLine></address></meeting>
		<imprint>
			<publisher>Presses universitaires de Liège</publisher>
			<date type="published" when="1948">1948</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b48">
	<analytic>
		<title level="a" type="main">Large-Scale Optical Character Recognition of Ancient Greek</title>
		<author>
			<persName><forename type="first">B</forename><surname>Robertson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Boschetti</surname></persName>
		</author>
		<ptr target="https://muse.jhu.edu/article/679181" />
	</analytic>
	<monogr>
		<title level="j">Mouseion: Journal of the Classical Association of Canada</title>
		<imprint>
			<biblScope unit="volume">58</biblScope>
			<biblScope unit="issue">3</biblScope>
			<biblScope unit="page" from="341" to="359" />
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b49">
	<analytic>
		<title level="a" type="main">GROBID - Information Extraction from Scientific Publications</title>
		<author>
			<persName><forename type="first">L</forename><surname>Romary</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Lopez</surname></persName>
		</author>
		<ptr target="https://inria.hal.science/hal-01673305" />
	</analytic>
	<monogr>
		<title level="j">ERCIM News. Scientific Data Sharing and Re-use</title>
		<imprint>
			<biblScope unit="volume">100</biblScope>
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b50">
	<analytic>
		<title level="a" type="main">Combining OCR Outputs for Logical Document Structure Markup. Technical Background to the ACL 2012 Contributed Task</title>
		<author>
			<persName><forename type="first">U</forename><surname>Schäfer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Weitz</surname></persName>
		</author>
		<ptr target="https://aclanthology.org/W12-3212" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the ACL-2012 Special Workshop on Rediscovering 50 Years of Discoveries</title>
				<editor>
			<persName><forename type="first">R</forename><forename type="middle">E</forename><surname>Banchs</surname></persName>
		</editor>
		<meeting>the ACL-2012 Special Workshop on Rediscovering 50 Years of Discoveries<address><addrLine>Jeju Island, Korea</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2012">2012</date>
			<biblScope unit="page" from="104" to="109" />
		</imprint>
	</monogr>
	<note>Association for Computational Linguistics</note>
</biblStruct>

<biblStruct xml:id="b51">
	<monogr>
		<title level="m" type="main">Which TEI representation for the output of automatic transcriptions and their metadata? An illustrated proposition</title>
		<author>
			<persName><forename type="first">H</forename><surname>Scheithauer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Chagué</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Romary</surname></persName>
		</author>
		<ptr target="https://inria.hal.science/hal-04001303" />
		<imprint>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b52">
	<analytic>
		<title level="a" type="main">La dialectométrie dans l&apos;Atlas linguistique de la Gascogne</title>
		<author>
			<persName><forename type="first">J</forename><surname>Séguy</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Revue de linguistique romane</title>
		<imprint>
			<biblScope unit="volume">37</biblScope>
			<biblScope unit="page" from="1" to="24" />
			<date type="published" when="1973">1973</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b53">
	<analytic>
		<title level="a" type="main">The CLIN27 Shared Task: Translating Historical Text to Contemporary Language for Improving Automatic Linguistic Annotation</title>
		<author>
			<persName><forename type="first">E</forename><surname>Tjong Kim Sang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Bollmann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Boschker</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Casacuberta</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Dietz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Dipper</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Domingo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>van der Goot</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>van Koppen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Ljubešić</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Östling</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Petran</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Pettersson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Scherrer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Schraagen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Sevens</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Tiedemann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Vanallemeersch</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Zervanou</surname></persName>
		</author>
		<ptr target="https://clinjournal.org/clinj/article/view/68/61" />
	</analytic>
	<monogr>
		<title level="j">Computational Linguistics in the Netherlands Journal</title>
		<imprint>
			<biblScope unit="volume">7</biblScope>
			<biblScope unit="page" from="53" to="64" />
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b54">
	<monogr>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">H</forename><surname>Vachon</surname></persName>
		</author>
		<title level="m">Le Changement linguistique au XVIe siècle: une étude basée sur des textes littéraires français</title>
				<meeting><address><addrLine>Strasbourg</addrLine></address></meeting>
		<imprint>
			<publisher>Éditions de linguistique et de philologie</publisher>
			<date type="published" when="2010">2010</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b55">
	<monogr>
		<title level="m" type="main">Remarques sur la langue françoise</title>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">F D</forename><surname>Vaugelas</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2009">2009</date>
			<publisher>Droz</publisher>
			<pubPlace>Geneva</pubPlace>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b56">
	<monogr>
		<title level="m" type="main">Remarques sur la langue françoise, utiles à ceux qui veulent bien parler et bien escrire</title>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">F D</forename><surname>Vaugelas</surname></persName>
		</author>
		<imprint>
			<date type="published" when="1647">1647</date>
			<publisher>Vve J. Camusat et P. Le Petit</publisher>
			<pubPlace>Paris</pubPlace>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b57">
	<analytic>
		<title level="a" type="main">Detection of the disorder in multidimensional random-processes</title>
		<author>
			<persName><forename type="first">L</forename><surname>Vostrikova</surname></persName>
		</author>
		<ptr target="http://mi.mathnet.ru/dan44582" />
	</analytic>
	<monogr>
		<title level="j">Doklady Akademii Nauk SSSR</title>
		<imprint>
			<biblScope unit="volume">259</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="270" to="274" />
			<date type="published" when="1981">1981</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b58">
	<analytic>
		<title level="a" type="main">An improved test of the constant rate hypothesis: late Modern American English possessive have</title>
		<author>
			<persName><forename type="first">R</forename><surname>Zimmermann</surname></persName>
		</author>
		<idno type="DOI">10.1515/cllt-2021-0038</idno>
	</analytic>
	<monogr>
		<title level="j">Corpus Linguistics and Linguistic Theory</title>
		<imprint>
			<biblScope unit="volume">19</biblScope>
			<biblScope unit="issue">3</biblScope>
			<biblScope unit="page" from="323" to="352" />
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
