<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Antwerp, Belgium</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Ground-truth Free Evaluation of HTR on Old French and Latin Medieval Literary Manuscripts</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Thibault Clérice</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Centre Jean Mabillon, École nationale des Chartes, &amp; INRIA</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2022</year>
      </pub-date>
      <volume>000</volume>
      <fpage>0</fpage>
      <lpage>0003</lpage>
      <abstract>
        <p>As more and more projects openly release ground truth for handwritten text recognition (HTR), we expect the quality of automatic transcription to improve on unseen data. Getting models robust to scribal and material changes is a necessary step for specific data mining tasks. However, evaluation of HTR results requires ground truth to compare predictions statistically. In the context of modern languages, successful attempts to evaluate quality have been made using lexical features or n-grams. This, however, proves difficult in the context of the spelling variation that both Old French and Latin have, even more so in the context of sometimes heavily abbreviated manuscripts. We propose a new method based on deep learning where we attempt to categorize each line's error rate into four ranges (&lt; 10%, &lt; 25%, &lt; 50%, &lt; 100%) using three different encoders (GRU with Attention, BiLSTM, TextCNN). To train these models, we propose a new dataset engineering approach using early-stopped models, as an alternative to rule-based fake predictions. Our model largely outperforms the n-gram approach. We also provide an example application to qualitatively analyse our classifier, using classification of new predictions on a sample of 1,800 manuscripts ranging from the 9th century to the 15th.</p>
      </abstract>
      <kwd-group>
        <kwd>HTR</kwd>
        <kwd>OCR Quality Evaluation</kwd>
        <kwd>Historical languages</kwd>
        <kwd>Spelling Variation</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>In order to evaluate the consistency of a model on an out-of-domain document such as another manuscript or a new hand, researchers usually have to create new ground-truth transcriptions to which the model predictions are compared. In this context, it seems out of reach to leverage with confidence the amount of data that remains dormant in the open data vaults of libraries such as the Bibliothèque Nationale de France (BnF) for statistical studies, making the number of 50,149 IIIF manifests catalogued by Biblissima's portal [18] promising while leaving a bitter taste of unavailability: it would require the manual transcription of at least a few hundred lines for each manuscript.<sup>3</sup></p>
      <p>To address this, we can approach this issue not as an HTR one but rather as a Natural Language Processing (NLP) task, evaluating the apparent "correctness" of the acquired text rather than its direct relationship with the digital picture of the manuscript. Evaluating new transcriptions without ground truth has been done, but mainly for OCR and non-historical documents. For modern languages, where spelling is fixed and grammar stable, a dictionary approach in combination with some n-gram statistics has provided a solid framework for establishing the probability that a document is well transcribed. However, for languages such as Old French or Medieval Latin, both evolving over the span of a few centuries, the issue is different. For example, Camps, Clérice, Duval, Ing, Kanaoka, and Pinche [3] have catalogued 36 forms of the word cheval (horse) in the largest available Old French corpus. A dictionary approach would already prove to be complex, but to make things worse, the abbreviated nature of medieval texts would require taking into account several abbreviation systems, making it unsustainable.</p>
      <p>
        HTR is most often, in the humanities, not a task in itself but rather a preliminary step for corpus building (such as digital editions) or corpus analysis. In this context, HTR quality can be of primordial importance, depending on the task at hand. While Eder [16] has suggested that good classification in stylometry is still possible for corpora with noise levels as high as 20%, even for the smallest feature sets, Camps, Vidal-Gorène, and Vernet [4] demonstrated that, for HTR, noise leads to accumulating errors throughout its post-processing (word segmentation, abbreviation resolution, lemmatization and POS-tagging), making the post-processed textual features less reliable than original character n-grams. For some other tasks, such as in corpus linguistics (e.g. semantic drift studies), the study of abbreviation systems such as the one performed by Honkapohja and Suomela [2<xref ref-type="bibr" rid="ref2">2</xref>] or even the training of large language models such as MacBERTh [2<xref ref-type="bibr" rid="ref8">8</xref>] might require a higher level of precision.
      </p>
      <p>As such, evaluating the textual quality of an automatic transcription "from afar" is extremely useful, as it provides solid grounds to either exclude documents from analysis or help guide ground-truth creation campaigns in well-funded projects. For cultural heritage institutions, it can also provide a welcome indicator for the documents that could be ingested by a search engine. We can even imagine situations where these institutions transcribe only a sample of each element of their collection, and only fully and automatically transcribe the ones that reach a certain level of quality, thus saving energy and ultimately budget on the computation front.</p>
      <p>
        From a human reader's perspective, Springmann, Fink, and Schulz [39] and Holley [2<xref ref-type="bibr" rid="ref1">1</xref>] have set a limit of a CER below 10% for a good OCR quality. Recently, Cuper [15] has proposed the evaluation of OCR quality for heritage text collections, specifically Dutch newspapers from the 17th century, to distinguish good OCR from bad, using the aforementioned threshold. They provide a tool, QuPipe, which offers binary classification capacities, putting text in either the range [0; 10]% of CER or in the remaining range of "bad" OCR. In 2022 as well, Ströbel, Clematide, Volk, Schwitter, Hodel, and Schoch [40] addressed this issue regarding HTR of cultural heritage documents, specifically from the 16th century. They provide a strong argument for using lexical features and (pseudo-)perplexity scores for HTR quality estimation, with the specific limitation that the texts they studied, 16th-century Latin correspondence, do not provide as much variation as older languages such as historical German. We also note that correspondence may be less abbreviated, and that this dataset spans a very short period. In parallel to these, Clausner, Pletschacher, and Antonacopoulos [8] approached the problem from a global perspective, from segmentation to OCR, and proposed supervised classification methods.
        <sup>3</sup>Five million lines would be required for the mentioned set of manifests of the BnF with only 100 lines per manuscript. As a comparison point, the accumulated number of lines of manuscript datasets, regardless of the script or language, publicly available on the HTR-United catalog [17] is 164,418 at the end of August 2022.
      </p>
      <p>In this paper, we address this issue as a supervised classification task, based on a dataset of around 50,000 lines of ground truth spanning from the 9th through to the 15th century. Following the conclusion of Cuper [15], we augment the number of categories we want to find: we distinguish Good ([0, 10)%), Acceptable ([10, 25)%), Bad ([25, 50)%), and Very Bad (≥ 50%) rates of OCR. This provides a more fine-grained evaluation of the transcription and allows for guided transcription campaigns, by addressing either the low-hanging fruits (Acceptable) or the rotten ones. We evaluate three kinds of basic architectures (GRU with attention, BiLSTM and TextCNN) on line classification using real-life "bad" transcriptions and precomputed CER scores.</p>
      <p>The resulting models have shown promising results, with quality levels such as Very Bad and Good being well recognized. In order to evaluate the models and showcase their usefulness, we also provide an example of a real-life classification application, where 1,800 manuscripts were randomly selected from the BnF and classified by our best model.</p>
      <p>In summary, the contributions of this paper are:
1. a new approach for HTR evaluation of historical languages with variable spellings;
2. a new method to produce ground truth for OCR evaluation that does not rely on
artificially and manually tuned generation;
3. an initial evaluation of the output and a quick glance at the state of HTR for Old French
and Medieval Latin over six centuries.</p>
      <p>The remainder of this paper is organised as follows. We start by addressing the background in Section 2, specifically regarding the specifics of Old French and Medieval Latin and the idea of readability. In Section 3, we describe the HTR datasets we used and their particularities. In Section 4, we describe the architecture of the models, their feature engineering and the process behind the generation of bad predictions. In Section 5, we describe the set-up of our model selection and evaluation. Finally, in Section 6, we analyse the results both on the dataset produced ad hoc (described in Sections 3 and 4), and on completely unseen documents from the BnF, to showcase the capacities of such models.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Background and Related Work</title>
      <p>
        Handwritten Text Recognition, a sibling or sub-task of Optical Character Recognition, aims at recognising text from digitised manuscripts. In the last five years, the digital humanities landscape has seen a surge in HTR engines, as well as transcription interfaces that connect and work well with these engines, from the dominant Transkribus [23] to the open-source pair of eScriptorium [26] and Kraken [<xref ref-type="bibr" rid="ref25">25</xref>]. To be able to recognize text, users have to provide models, which are themselves the result of supervised training on ground truth data (human-provided transcriptions).
      </p>
      <p>
        Printed books have been, over the last few decades, the focus in terms of remediation, from their analogue form to a digitized picture and finally to a machine-readable (and human-searchable) text. With the advances in HTR over the last five years, the focus can now shift or be shared with materials that have, for the most part, remained inaccessible from a digital point of view, except for pictures. Latin manuscripts are present during the whole period of manuscript production in western Europe. Literary Old French manuscripts exist from the 12th century onward, with only a hundred known surviving manuscripts in the 12th century [5]. Over the span of these seven centuries, multiple forms of handwritten scripts have existed, for both French and Latin. As an example, the 2016 ICFHR Competition on the Classification of Medieval Handwritings in Latin Script [<xref ref-type="bibr" rid="ref14">14</xref>] provided ground truth for the classification of 12 main families, of which at least six are represented in our datasets. This diversity makes training models for HTR quite complex but also a reachable goal, as literary manuscripts in particular tend to be more readable and stable between different hands.
      </p>
      <p>Medieval French and Latin present both dialectal and scriptural variation in synchrony on top of diachronic evolution. Old French's syntax varies chronologically and geographically. The spelling is simply variable. While Latin shows some level of variation, it differs from Old French mostly in its higher rate of abbreviation. These observations are limited to the context of the datasets at hand, which are literary works (including scholastic, theological and medical works). The Old French CREMMA Medieval dataset [34] has 0.97% of horizontal tildes and 0.16% of vertical ones, which are markers used in the dataset guidelines to indicate various similar abbreviation diacritics [35]. Using the same guidelines, the CREMMA Medieval Latin dataset shows rates of 5.63% and 1.52% for the same characters. This difference could be due to the nature of the transcribed texts.</p>
      <p>
        The question of abbreviation and the specificity of medieval literary manuscripts has provoked many discussions in terms of how to transcribe documents, from a completely "diplomatic" approach with variants of letters to "semi-diplomatic" approaches. In the last year, three authors have provided guidance or thoughts around guidelines for transcriptions: Pinche [35] focusing on Old French, Schoen and Saretto [38] on Middle English, and Guéville and Wrisley [<xref ref-type="bibr" rid="ref20">20</xref>] on Latin. The CREMMA guidelines have been used by 5 other datasets for a total of 1.15 million characters over fifty manuscripts, which makes them the most diverse and comprehensive ones for HTR of medieval manuscripts in Latin and Old French.
      </p>
      <p>The most traditional metrics for HTR and OCR are Word Error Rate (WER) and Character Error Rate (CER). The first one proves to be complicated to apply in Old French and Medieval Latin, as spaces in medieval manuscripts tend to vary in size or simply be nonexistent from a modern perspective, relying on the knowledge of the reader to separate words—or the ability of NLP models to separate them [9]. The second one works well, with the limitation that spaces are often the first source of mistakes. CER corresponds to the sum of character insertions, removals and replacements over the total number of characters, thus providing a fine-grained metric.</p>
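As a concrete illustration (a minimal sketch, not the authors' implementation; function names are ours), CER can be computed with a standard Levenshtein edit distance:

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of character insertions, removals and
    replacements needed to turn string a into string b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # removal
                cur[j - 1] + 1,            # insertion
                prev[j - 1] + (ca != cb),  # replacement
            ))
        prev = cur
    return prev[-1]

def cer(prediction: str, ground_truth: str) -> float:
    """Character Error Rate: edit operations over ground-truth length."""
    return levenshtein(prediction, ground_truth) / len(ground_truth)
```

For instance, `cer("chevall", "cheval")` is 1/6, since one insertion suffices over a six-character ground truth.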
      <p>As mentioned earlier, in the introduction, both CER and WER require ground truth, as do other metrics currently discussed as alternatives, such as the (pseudo-)perplexity or lexical measures proposed by Ströbel, Clematide, Volk, Schwitter, Hodel, and Schoch [40]. The other approach to evaluating quality without ground truth is to predict a class of CER, such as the work done by Bazzo, Lorentz, Suarez Vargas, and Moreira [1]. These approaches rely on features such as n-grams, word statistics and language classifier outputs, which are difficult to leverage in the present context. In order to train their classifier, Bazzo, Lorentz, Suarez Vargas, and Moreira [1] and Nguyen, Jatowt, Coustaty, and Doucet [32] engineered bad predictions by creating rules to reproduce the most common errors in OCR, such as "rn" becoming "m". These bad predictions are then fed to their model along with the metrics both papers want to predict.</p>
      <p>
        Nguyen, Jatowt, Coustaty, and Doucet [32] provide an innovative approach to the issue of noise in OCR by shifting from a CER/WER problem to a readability one: if the reader "can reod a txt with mispelling" without having to refer back to the picture, at least one of the goals of OCR has been achieved. As simply put by Martinc, Pollak, and Robnik-Šikonja [30], "Readability is concerned with the relation between a given text and the cognitive load of a reader to comprehend it". It is even more important in the context of handwritten documents, where a somewhat bad but readable HTR output can be easier for non-specialists to read than the original. In the field of readability assessment, Martinc, Pollak, and Robnik-Šikonja [30] have shown that supervised models perform adequately, while Nguyen, Jatowt, Coustaty, and Doucet [<xref ref-type="bibr" rid="ref32">32</xref>] have shown that this translates to the OCR issues as well. This has not been applied to any medieval dataset that we know of.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Dataset</title>
      <p>
        To train different models, we reused data from various projects, aligned with the same guidelines used by Pinche [3<xref ref-type="bibr" rid="ref5">5</xref>]. Our experiment was made possible by the open release of many projects' datasets, including one MA thesis and one student project [41, 2]. We used the ground truth of the CREMMA [13] and CREMMALab [<xref ref-type="bibr" rid="ref34">34</xref>] projects, the Rescribe [42] project, and the Gallicorpora [36<xref ref-type="bibr" rid="ref37">, 37</xref>] project, for a total of 42,292 lines (see Table 1). We include one dataset of incunabula, which use graphical shapes similar to literary manuscripts (but with more regularity), while also using an abbreviation system.
      </p>
      <p>The datasets present not only two main languages but also many different levels of digitization quality (including old binarization), different kinds of handwriting families, different abbreviation levels and different genres. For example, while the CREMMA Medieval dataset focuses more on literary texts, specifically hagiographical and chanson de geste texts, the CREMMA Medieval LAT corpus offers theological commentaries and medicinal recipes, each genre having its own specific vocabulary. The dataset in general is skewed towards French and the gothica handwriting family.</p>
      <p>
        The transcription guidelines of Pinche [35] provide simplification rules: allographic approaches are forbidden (different shapes of s such as long s and "modern" s are not differentiated), macrons and general horizontal-line diacritics over the letters such as tildes are represented by horizontal tildes, any "zigzag"<sup>4</sup> or similarly shaped forms are simplified into superscript vertical tildes, etc. This allows for simpler transcriptions and also a limited diversity of characters for the machine to learn, satisfying both the human transcriber in terms of the learning curve of the guidelines, and the HTR engine in terms of complexity. Each corpus was passed through the ChocoMufin software [<xref ref-type="bibr" rid="ref11">11</xref>] using project-specific character translation tables. This software, along with these tables, allows each dataset to be controlled at the character level and adapted to guideline modifications. It also allows project-specific transcription standards to be translated to a more common one, such as Pinche's.
      </p>
      <sec id="sec-3-1">
        <title><sup>4</sup>Official name from the Unicode specifications for the character U+299A.</title>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Proposed Method</title>
      <p>Our goal is to be able to predict a quality class for any HTR output on medieval French and Latin. First, we design a way to generate ground truth for the quality assessment of HTR output. Then, we propose three supervised text-based models, with specific adaptations to handle both languages with a single classifier.</p>
      <sec id="sec-4-1">
        <title>4.1. “Bad Prediction” Ground Truth</title>
        <p>
          In order to train our classification model, we require ground truth material along with a CER class: Good ([0; 10)%), Acceptable ([10; 25)%), Bad ([25; 50)%) and Very Bad (≥ 50%). In order to have real-life errors, and to reproduce the rather difficult-to-predict capacity of a model to confuse certain characters with others in specific settings, we propose a three-step method:
1. We train Kraken [2<xref ref-type="bibr" rid="ref5">5</xref>] models based on the complete dataset, or on a subset. We voluntarily stop some of the training in very early stages, when the CER on the validation dataset remains high. We also keep one "best" model [12] trained on the full dataset.
2. We run each model on our two biggest and most diverse repositories, CREMMA Medieval and CREMMA Medieval LAT. We also run a model trained on modern and contemporary scripts, Manu McFrench [6], to create garbage-level transcriptions.
3. We evaluate each line's CER and store it alongside the line. We also keep the ground truth, whose CER is estimated as 0. We remove short lines (fewer than 15 characters) and duplicated predictions across models for the same line.
        </p>
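The filtering in step 3 can be sketched as follows (a minimal reconstruction under our own data layout; the 15-character threshold and the deduplication rule come from the text, everything else is an assumption):

```python
def filter_predictions(triples, min_chars=15):
    """Keep (line_id, prediction, cer) triples, dropping short lines
    and duplicate predictions of the same line across models."""
    seen = set()
    kept = []
    for line_id, prediction, cer in triples:
        if len(prediction) < min_chars:
            continue  # short lines are removed
        key = (line_id, prediction)
        if key in seen:
            continue  # identical prediction already produced by another model
        seen.add(key)
        kept.append((line_id, prediction, cer))
    return kept
```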
        <p>Regarding the final models for prediction production, we have 16 models, allowing for a maximum of 16 versions of each line, if none of the models predict the same text (see Table 2 for examples):
1. 4 models trained on the same train and validation datasets as best, with a validation CER of 55.9, 28.3, 23 and 20.8% according to Kraken.
2. 5 models trained on the CREMMA Medieval LAT dataset only, from the 1st to the 6th epochs, ranging from 86% to 46% CER.
3. 1 model trained on the Eutyches (Latin, Carolingian of the 9th century) and the Decameron (French, 16th century) datasets with a 98.5% CER on its validation set.
4. 3 models trained on the CREMMA Medieval (Old French) dataset only, fine-tuned from the Manu McFrench model, from 11% CER down to 8.2%.
5. Manu McFrench, the best model and the ground-truth data.</p>
        <p>These provide variable CER on unseen data from the test sets of both CREMMA datasets, but also on training and validation sets, as they did not reach their full capacities during the training phase. After filtering small and repeated predictions, we have access to 322,903 lines of "HTR prediction, CER" couples (see Figure 6 in the appendix). We then translate that into each bin of CER to produce the four established classes. Example predictions of the same line at various quality levels:
u̾ra on de q̃ l vertu ses petis pies sont que vous
Bra on de q̃ l vertuses petis pies sont que vous
Fra on de q̃ l vertuses petis pies sont que vou
Bra on de q̃ l vertuses petis pies sont que uous
Pra on de ql vertuses petis pies sont que dons
ura on de q̃ l vertu ses petis pies font grre op
ura on de q̃ l uertu ses petis pies font re dory
ura on de ql vertu ses petis pies font itce ir
Ard ondegl ratules nus mes sont que ls
a on de at etn le peos pes os e
a om de ał vrtir sot olisͣ pa sosisinos
⁊s cm dec uł vrtr fe pdp̃ pns ots pte</p>
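Mapping a line's CER to the four classes is then a simple binning step (a sketch; the bin edges come from the paper, the function name is ours):

```python
def cer_class(cer: float) -> str:
    """Bin a character error rate (as a fraction) into the
    four quality classes defined in the paper."""
    if cer < 0.10:
        return "Good"        # [0; 10)%
    if cer < 0.25:
        return "Acceptable"  # [10; 25)%
    if cer < 0.50:
        return "Bad"         # [25; 50)%
    return "Very Bad"        # >= 50%
```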
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Model Architecture</title>
        <p>We applied three model architectures, common to many NLP tasks, with an embedding–sentence encoder–linear classifier structure where only the sentence encoder changes from one model to another (see Figure 2). The embedding layer takes into account special tokens (Padding, Unknown char, Start of Line, End of Line) and each character according to the Unicode NFD<sup>5</sup> normalization of the line, in which characters and their diacritics are separated, e.g. [é] becomes [e]+[´]. The linear layer is a simple (Encoding Output Dimension, Class Count) decision layer. Each model uses a cross-entropy loss function<sup>6</sup> and reduces its learning rate at plateau using the validation set's macro-averaged recall metric. Optimization of the model is done through the Ranger optimizer [43].</p>
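The NFD behaviour described above can be checked with Python's standard library:

```python
import unicodedata

# NFD splits a precomposed character into base letter + combining mark:
decomposed = unicodedata.normalize("NFD", "é")
assert [unicodedata.name(c) for c in decomposed] == [
    "LATIN SMALL LETTER E",
    "COMBINING ACUTE ACCENT",
]

# A character tokenizer working on NFD therefore sees [e] + [´] as two
# symbols, keeping the vocabulary small for abbreviation diacritics:
tokens = list(unicodedata.normalize("NFD", "q̃l"))
assert len(tokens) == 3  # q + combining tilde + l
```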
        <p>
          The encoding layer varies between three different forms:
• The first version uses a single BiLSTM network where the sentence encoding is the result of the concatenation of the start-of-line token (BOS) and end-of-line token (EOS) hidden states.
• The second version follows the architecture of sentence-level attention proposed by Yang, Yang, Dyer, He, Smola, and Hovy [4<xref ref-type="bibr" rid="ref4">4</xref>], using a bidirectional GRU. The encoded sentence vector is the sum of products of the hidden state of each token with its attention. Attention is also provided as an output for human interpretation of the results.
• The last one, TextCNN [2<xref ref-type="bibr" rid="ref7">7</xref>], uses the concatenation of the Max Pooling of each n-gram size (2, 3, 4, 5, 6) taken into account by a convolutional neural network.
        </p>
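The sentence-level attention pooling of the second encoder can be sketched in plain Python (purely illustrative: in the model, hidden states and attention scores come from the learned GRU and attention layers, not from the toy values here):

```python
import math

def attention_pool(hidden_states, scores):
    """Weighted sum of per-token hidden states, using softmax(scores)
    as attention weights (a stand-in for the learned attention)."""
    exp = [math.exp(s) for s in scores]
    total = sum(exp)
    weights = [e / total for e in exp]  # sums to 1, interpretable per token
    dim = len(hidden_states[0])
    pooled = [
        sum(w * h[d] for w, h in zip(weights, hidden_states))
        for d in range(dim)
    ]
    return pooled, weights
```

Because the weights sum to 1, they can be inspected per character, which is the "human interpretation" output mentioned above.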
        <sec id="sec-4-2-1">
          <title><sup>5</sup>Normalization Form Canonical Decomposition. <sup>6</sup>Code available at https://github.com/PonteIneptique/nequitia.</title>
          <p>
            As we deal with two different languages, we added another special token, following the work of Martin, Villemonte de La Clergerie, Sagot, and Bordes [29] and Gong, Bhat, and Viswanath [<xref ref-type="bibr" rid="ref19">19</xref>]: for each encoding variation we add one variation of the codec where the first token after the beginning-of-string is a metadata token indicating the language. Thus, a line such as Fra on de q̃l vertuses petis pies sont que vo will be encoded as &lt;BOS&gt;&lt;FRO&gt;Fra on de q̃l vertuses petis pies sont que vo&lt;EOS&gt;.
          </p>
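The language-token encoding can be sketched as follows (the special token names &lt;BOS&gt;, &lt;EOS&gt; and &lt;FRO&gt; come from the text; the function and a &lt;LAT&gt; token for Latin are our own assumptions):

```python
def encode_line(text: str, language: str) -> list[str]:
    """Prefix the character sequence with <BOS> and a language
    metadata token (e.g. FRO for Old French, LAT for Latin)."""
    return ["<BOS>", f"<{language}>"] + list(text) + ["<EOS>"]

# encode_line("Fra on", "FRO")
# -> ['<BOS>', '<FRO>', 'F', 'r', 'a', ' ', 'o', 'n', '<EOS>']
```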
        </sec>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Experimental Setup</title>
      <p>In order to avoid lexical bias and to ensure the strength of our analysis, we propose a 5-Fold-like experiment, where the subsets for train, validation and test are the result of splits across manuscripts. For each K, two French manuscripts and two Latin ones are used for the validation set and the test set, and they differ by at least one manuscript from one K to another, leaving three K completely different (K1, K3, K5; see Table 3). Each test set also contains a Latin manuscript that was not used in any of the HTR model training or validation: Berlin, Hdschr. 25. This manuscript was then used for model evaluation, to have a stable pillar for evaluation. Models are then evaluated using class-specific precision and recall, as well as macro-averaged precision and recall.</p>
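Macro averaging simply takes the unweighted mean of the per-class scores, so that rare classes weigh as much as frequent ones (a stdlib sketch, not the evaluation code used in the paper):

```python
def macro_recall(gold, pred):
    """Unweighted mean of per-class recall over the classes
    present in the gold labels."""
    classes = set(gold)
    recalls = []
    for c in classes:
        true_pos = sum(1 for g, p in zip(gold, pred) if g == c and p == c)
        support = sum(1 for g in gold if g == c)
        recalls.append(true_pos / support)
    return sum(recalls) / len(recalls)
```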
      <p>For our baseline, we use the relative frequency of the 2000 most common n-grams of sizes 3, 4 and 5 as features and feed them to a linear classifier, with cross-entropy loss and the Adam optimizer. We run each model architecture once for each K, resulting in 7 different results with the baseline (presence/absence of language token for the three encoding modules + baseline).</p>
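The baseline features can be sketched as relative frequencies of the corpus-wide most common character n-grams (the vocabulary size of 2000 and the n-gram sizes come from the text; the implementation is our own):

```python
from collections import Counter

def ngrams(text, sizes=(3, 4, 5)):
    """Yield all character n-grams of the given sizes."""
    for n in sizes:
        for i in range(len(text) - n + 1):
            yield text[i:i + n]

def build_vocabulary(corpus, top_k=2000):
    """Most common n-grams of sizes 3, 4 and 5 over the training corpus."""
    counts = Counter(g for line in corpus for g in ngrams(line))
    return [g for g, _ in counts.most_common(top_k)]

def featurize(line, vocabulary):
    """Relative frequency of each vocabulary n-gram in the line,
    fed as a fixed-size vector to the linear classifier."""
    counts = Counter(ngrams(line))
    total = sum(counts.values()) or 1
    return [counts[g] / total for g in vocabulary]
```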
      <p>
        Our whole pipeline uses pandas for data preparation [31], PyTorch [<xref ref-type="bibr" rid="ref33">33</xref>] for model development, and PyTorch Lightning [17] for the training, evaluation and prediction wrapping.
      </p>
    </sec>
    <sec id="sec-6">
      <title>6. Experiments</title>
      <sec id="sec-6-1">
        <title>6.1. Model Classification Results</title>
        <p>The first conclusion we can draw from the experiment is that our models always beat the baseline (see Table 4 and, in the Appendix, more details). No RNN-based architecture clearly beats the other, but TextCNN clearly underperforms. The introduction of the language metadata token helps when detecting Good transcriptions (delta ≈ +7% for attention's median precision, ≤ +1% for the recall) for both RNN-based models. Models without a language marker tend to outperform models with language markers, except for the Very Bad class, where the delta is up to +6% in favour of models without language tokens (using median precision scores).</p>
        <p>Regarding the variability of results, we found that the length of the string had an impact on the prediction, no matter the model architecture. Surprisingly, none of the models withstand long noisy lines: the accuracy of the Very Bad class is inversely correlated with line size. On the contrary, depending on the encoder, some classes benefit from longer strings: Good lines benefit from it with all models except the baseline. TextCNN is the only model to really correlate accuracy on the Bad and Acceptable classes with line length.</p>
        <p>Finally, for all models except the baseline, the most common confusion is always in the "adjacent" class(es) (see Figure 4). For the classes Acceptable and Bad, which have two neighbours, the error rate is evenly split between them: the class Acceptable tends to be confused with either Good or Bad. This shows the model's ability to understand cleanness or noise, but also shows the limit of these classes: for a line with 50 characters, such as "quuãtfatuetlgeist en tes lieu.Derite respiont", 6 mistakes are enough to swing into the Acceptable category (Ground truth: "quãt tel enfant gist en tel lieu . Uerite respon", one space has been removed before the dot).</p>
        <p>Overall, with an accuracy for the Good and Very Bad classes around 50% on these languages, and considering that most of the confusions are between adjacent classes (e.g. Good is confused with Acceptable, Acceptable with Good and Bad, etc.), the solution performs well either at filtering badly read manuscripts, or at keeping only the very good ones. The Acceptable class and the Bad class have stable performance facing variable line length, although the Acceptable class shows the worst classification performance.</p>
      </sec>
      <sec id="sec-6-2">
        <title>6.2. Application on a Real-World Library Dataset</title>
        <p>
          As a real-world application, we wanted to apply one of our best models to an unseen dataset, in the same way that we envision cultural institutions might use the tool. We describe the set-up for this particular experiment below, and then evaluate the results of the classification model with regard to the capacity of the HTR model; we also study some randomly sampled elements.
6.2.1. Set-up
To evaluate on as much unseen data as possible, we crawled the Biblissima IIIF collection portal [1<xref ref-type="bibr" rid="ref8">8</xref>]. We searched individually for each combination of language (French, Latin) and century (9th to 15th), limiting the number of samples retrieved to 500 manuscripts. We then sampled 10 sequential pictures from each manuscript.<sup>7</sup> To avoid empty pages (which tend to be at the start and the back of each book's digitization or IIIF manifest at the BnF), we take either the ten first pictures from the second decile of the manifest, or from the 20th up to the 30th if there are fewer than 100 pictures, or the 10 last if there are fewer than 20 pictures.
        </p>
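The sampling rule above can be sketched as follows (a reconstruction of the heuristic as described; the exact 0-based indexing choices are our own assumptions):

```python
def sample_picture_indices(n_pictures: int) -> list[int]:
    """Pick up to 10 pictures from a IIIF manifest, avoiding the empty
    pages that cluster at the start and end of BnF digitizations."""
    if n_pictures < 20:
        # very short manifest: take the 10 last pictures
        return list(range(max(0, n_pictures - 10), n_pictures))
    if n_pictures < 100:
        # short manifest: pictures 20 up to 30 (0-based indices 19..28)
        return list(range(19, min(29, n_pictures)))
    # otherwise: the ten first pictures of the second decile
    start = n_pictures // 10
    return list(range(start, start + 10))
```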
        <p>Each downloaded sample is then segmented using YALTAi [10] with the included model designed for cultural heritage manuscripts and the base Kraken BLLA segmenter [24]. As YALTAi provides different zones—from the margin to the main body of text—through numbering, we only consider lines that are part of the main bodies of text, thus excluding any marginal or paratext. We then use Kraken to predict a transcription for each line, using the best trained model as described in our first experiment. Next, we feed each line to our best BiLSTM model (K-Fold 1 has the best recall/precision on Good) while keeping the line metadata: language, century, manuscript identifier, and page identifier.</p>
        <p>Finally, we provide three different evaluations of the transcriptions. The first is based strictly on the number of lines predicted in each class (Good, Acceptable, etc.). The second is page-based: we take the most common prediction for all lines. The last one is manuscript-based: we take the most common page prediction, using the previous page-based metric.
6.2.2. Evaluation
Overall, the HTR prediction results produced by our BiLSTM module are in line with the HTR strength on the dataset (see Figure 5). The model performs extremely well on early manuscripts thanks to the presence of two datasets of early manuscripts (Eutyches and Caroline Minuscule). It performs well on Old French except for the 13th century, where Bad predictions are more common. The relative frequency of Very Bad predictions tends to grow as we get closer to the 16th century: from the data we have seen, this could be due to the presence of non-literary manuscripts written in cursive, for which our model has no ground truth.</p>
        <p>If we look at the sampled predictions (Appendix, Table 2), most Good predictions seem
correct or nearly correct. However, we can see that the metadata from Biblissima and the BnF
has some limitations when used automatically, as it can produce problematic results: most
12th-century Acceptable predictions are probably in Latin, which would indicate a multilingual
manuscript or a badly catalogued one. This issue also arises in the crawler for the century,
as some manuscripts were catalogued as French but with a production date that is before the
first known French document: these are most likely multilingual documents, with either a
collection of various leaves from previous manuscripts, or the inclusion of the language used for
marginal notes. Three out of the six Acceptable predictions between the 13th and the 14th century
are definitely readable and understandable, and we cannot but wonder if the lack of spaces
in “q̃ merueilles fu lacitebiengarne mlt” is responsible for its classification as Acceptable rather
than Good. We note that at least one Very Bad prediction in French, “OU EtE L. Cheualier de
Monifort, son Oncle, Gles”, seems rather readable, albeit requiring more corrections than a Good
transcription would. Latin shows the same trend, in being accurate over Good and Acceptable.
7. Note that we are not talking about pages but about pictures: in some cases, most commonly in the case of digitised
microfilms, one picture can contain two pages.</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>7. Conclusion</title>
      <p>The ability to filter, without pre-transcribing samples, automated transcriptions of manuscripts
in Latin, Old French or any other Western historical language might lead to the production of
datasets designed for analysis that relies on better transcriptions, or to guiding cultural heritage
institutions and their partners in the production of new ground truth. Producing HTR ground
truth does indeed require time, skilled transcribers and, last but not least, budget. However,
most current error rate prediction or HTR output analysis models rely on n-gram frequencies
and lexical features: two approaches that are often less viable for languages such as Old French,
which “suffers” from a highly variable spelling system, or for languages like Latin, which are
potentially highly abbreviated, with abbreviations changing even within a single manuscript,
depending on the context, the topic and the scribe.</p>
      <p>In this context, we chose to treat CER range predictions as a sentence-like classification
problem, for which we implemented three basic models, using either a single BiLSTM encoder,
an attention-supported GRU, or a TextCNN encoder. These three tools show stronger results
than an n-gram based baseline. On top of this, we include a language metadata token which
can improve the reliability of the lowest range of CER (between 0 and 10%, the Good class)
while worsening the classification’s reliability for the highest range (over 50%, the Bad class).
For the purpose of training these models, we propose a new way to generate real-life “bad
transcriptions”, using early-stopped HTR models, or models trained on small samples of data:
this provides an alternative to previous rule-based generation of “bad transcription” ground
truths.</p>
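For reference, the four CER ranges used as target classes can be expressed as a trivial binning function. This is only the range-to-label mapping described in the paper (the classifier itself predicts the class from text, without knowing the CER); the function name is ours.

```python
def cer_class(cer: float) -> str:
    """Map a character error rate (0.0-1.0) to the paper's four ranges."""
    if cer < 0.10:
        return "Good"        # 0-10%
    if cer < 0.25:
        return "Acceptable"  # 10-25%
    if cer < 0.50:
        return "Bad"         # 25-50%
    return "Very Bad"        # 50-100%
```

Labelling early-stopped model outputs with this function is what turns "bad transcription" generation into ordinary supervised classification data.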
      <p>We show that on a completely unknown dataset of around 1,800 manuscripts, analysed
with a new HTR model speci昀椀cally trained on medieval Latin and French, the number of
welltranscribed manuscripts predicted is on par with the ground truth for that dataset. The quality
assessment predictions provide quick insights for larger collections, and could be run relatively
o昀琀en by cultural heritage institutions.</p>
      <p>
        In the future, hyper-parameter fine-tuning and other encoders could be used in the
architecture. Specifically, with more correctly transcribed manuscripts, including the abbreviations in
their transcriptions, fine-tuning larger language models could allow the application of
(pseudo-)perplexity ranking such as the one proposed by Ströbel, Clematide, Volk, Schwitter, Hodel, and
Schoch [
        <xref ref-type="bibr" rid="ref40">40</xref>
        ], while allowing for partial noise in the training data. We hope to see such
classification of manuscripts used by ground truth producers in order to enhance the robustness of
openly available HTR models.
      </p>
    </sec>
    <sec id="sec-8">
      <title>Acknowledgments</title>
      <p>I want to thank Jean-Baptiste Camps, Ariane Pinche and Malamatenia Vlachou-Efstathiou
for their constant feedback and replies to some particular questions regarding manuscripts or
HTR data. Many thanks to Ben Nagy for his proof-reading of the pre-print version.</p>
      <p>This work was funded by the Centre Jean Mabillon and the DIM MAP (https://www.dim-map.fr/projets-soutenus/cremmalab/).</p>
      <p>N. White, A. Karaisl, and T. Clérice. Caroline Minuscule by Rescribe. Ed. by A. Chagué
and T. Clérice. 2022. url: https://github.com/rescribe/carolineminuscule-groundtruth</p>
    </sec>
    <sec id="sec-9">
      <title>A. Appendix</title>
      <p>The software has been archived at the following address: https://doi.org/10.5281/zenodo.7233984. A good chunk of the data is available here: https://github.com/PonteIneptique/nenequitia/releases/tag/chr2022-release.</p>
      <p>Manuscripts metadata and the predictions in XML ALTO formats for Section 6 are available
at https://doi.org/10.5281/zenodo.7234399. The same repository also contains the XML data
for training the classifier.
[Table 2: sampled predictions, with columns for language (lat, fro), century (9th to 15th), and predicted class (Good, Acceptable, Bad); the table body could not be recovered from the extraction.]</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1] G. T. Bazzo, G. A. Lorentz, D. Suarez Vargas, and V. P. Moreira. “Assessing the Impact of OCR Errors in Information Retrieval”. In: Advances in Information Retrieval. Ed. by J. M. Jose, E. Yilmaz, J. Magalhães, P. Castells, N. Ferro, M. J. Silva, and F. Martins. Lecture Notes in Computer Science. Cham: Springer International Publishing, 2020, pp. 102-109. doi: 10.1007/978-3-030-45442-5_13.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2] S. Biay, V. Boby, K. Konstantinova, and Z. Cappe. TNAH-2021-DecameronFR. 2022. doi: 10.5281/zenodo.6126376. url: https://github.com/PSL-Chartes-HTR-Students/TNAH-2021-DecameronFR.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3] J.-B. Camps, T. Clérice, F. Duval, L. Ing, N. Kanaoka, and A. Pinche. “Corpus and Models for Lemmatisation and POS-tagging of Old French”. 2022. url: https://halshs.archives-ouvertes.fr/halshs-03353125.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4] J.-B. Camps, C. Vidal-Gorène, and M. Vernet. Handling Heavily Abbreviated Manuscripts: HTR engines vs text normalisation approaches. May 2021. url: https://hal-enc.archives-ouvertes.fr/hal-03279602.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5] M. Careri, C. Ruby, and I. Short. Livres et écritures en français et en occitan au XIIe siècle: catalogue illustré. Viella, 2011. 274 pp.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6] A. Chagué and T. Clérice. HTR-United - Manu McFrench V1 (Manuscripts of Modern and Contemporaneous French). Version 1.0.0. 2022. doi: 10.5281/zenodo.6657809. url: https://doi.org/10.5281/zenodo.6657809.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7] A. Chagué and T. Clérice. HTR-United: Ground Truth Resources for the HTR and OCR of patrimonial documents. 2022. url: https://htr-united.github.io.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8] C. Clausner, S. Pletschacher, and A. Antonacopoulos. “Quality Prediction System for Large-Scale Digitisation Workflows”. In: 2016 12th IAPR Workshop on Document Analysis Systems (DAS). 2016, pp. 138-143. doi: 10.1109/das.2016.82.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9] T. Clérice. “Evaluating Deep Learning Methods for Word Segmentation of Scripta Continua Texts in Old French and Latin”. In: Journal of Data Mining &amp; Digital Humanities 2020 (2020). doi: 10.46298/jdmdh.5581. url: https://jdmdh.episciences.org/6264.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10] T. Clérice. “You Actually Look Twice At it (YALTAi): using an object detection approach instead of region segmentation within the Kraken engine”. 2022. url: https://hal-enc.archives-ouvertes.fr/hal-03723208.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11] T. Clérice and A. Pinche. Choco-Mufin, a tool for controlling characters used in OCR and HTR projects. Comp. software. Version 0.0.4. 2021. doi: 10.5281/zenodo.5356154. url: https://github.com/PonteIneptique/choco-mufin.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12] T. Clérice, A. Pinche, and M. Vlachou-Efstathiou. Generic CREMMA Model for Medieval Manuscripts (Latin and Old French), 8-15th century. Version 1.0.0. 2022. doi: 10.5281/zenodo.7234166. url: https://doi.org/10.5281/zenodo.7234166.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13] T. Clérice, M. Vlachou Efstathiou, and A. Chagué. CREMMA Manuscrits médiévaux latins. Ed. by A. Chagué and T. Clérice. 2022. url: https://github.com/HTR-United/CREMMA-Medieval-LAT.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14] F. Cloppet, V. Eglin, V. C. Kieu, D. Stutzmann, and N. Vincent. “ICFHR2016 Competition on the Classification of Medieval Handwritings in Latin Script”. In: 2016 15th International Conference on Frontiers in Handwriting Recognition (ICFHR). Shenzhen, China: IEEE, Oct. 2016, pp. 590-595. doi: 10.1109/icfhr.2016.0113. url: http://ieeexplore.ieee.org/document/7814129/.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15] M. Cuper. “Examining a Multi Layered Approach for Classification of OCR Quality without Ground Truth”. In: DH Benelux Journal (2022), p. 17.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16] M. Eder. “Mind your corpus: systematic errors in authorship attribution”. In: Literary and Linguistic Computing 28.4 (Dec. 1, 2013), pp. 603-614. doi: 10.1093/llc/fqt039. url: https://doi.org/10.1093/llc/fqt039.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17] W. Falcon and The PyTorch Lightning team. PyTorch Lightning. Comp. software. Version 1.4. 2019. doi: 10.5281/zenodo.3828935. url: https://github.com/Lightning-AI/lightning.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18] E. Frunzeanu, E. MacDonald, and R. Robineau. “Biblissima’s Choices of Tools and Methodology for Interoperability Purposes”. In: CIAN. Revista de historia de las universidades 19.1 (2016), pp. 115-132.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19] H. Gong, S. Bhat, and P. Viswanath. “Enriching Word Embeddings with Temporal and Spatial Information”. In: Proceedings of the 24th Conference on Computational Natural Language Learning. Online: Association for Computational Linguistics, 2020, pp. 1-11. doi: 10.18653/v1/2020.conll-1.1. url: https://aclanthology.org/2020.conll-1.1.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20] E. Guéville and D. J. Wrisley. “Transcribing Medieval Manuscripts for Machine Learning”. In: arXiv preprint arXiv:2207.07726 (2022). url: https://arxiv.org/abs/2207.07726.
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21] R. Holley. “How good can it get? Analysing and improving OCR accuracy in large scale historic newspaper digitisation programs”. In: D-Lib Magazine 15.3/4 (2009).
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22] A. Honkapohja and J. Suomela. “Lexical and function words or language and text type? Abbreviation consistency in an aligned corpus of Latin and Middle English plague tracts”. In: Digital Scholarship in the Humanities 37.3 (2021), pp. 765-787. doi: 10.1093/llc/fqab007. url: https://doi.org/10.1093/llc/fqab007.
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23] P. Kahle, S. Colutto, G. Hackl, and G. Mühlberger. “Transkribus - a service platform for transcription, recognition and retrieval of historical documents”. In: 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR). Vol. 4. IEEE, 2017, pp. 19-24.
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24] B. Kiessling. “A modular region and text line layout analysis system”. In: 2020 17th International Conference on Frontiers in Handwriting Recognition (ICFHR). IEEE, 2020, pp. 313-318.
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25] B. Kiessling. The Kraken OCR system. Comp. software. Version 4.1.2. 2022. url: https://kraken.re.
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26] B. Kiessling, R. Tissot, P. Stokes, and D. S. B. Ezra. “eScriptorium: an open source platform for historical document analysis”. In: 2019 International Conference on Document Analysis and Recognition Workshops (ICDARW). Vol. 2. IEEE, 2019, pp. 19-19.
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27] Y. Kim. “Convolutional Neural Networks for Sentence Classification”. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). Doha, Qatar: Association for Computational Linguistics, 2014, pp. 1746-1751. doi: 10.3115/v1/D14-1181. url: https://aclanthology.org/D14-1181.
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28] E. Manjavacas and L. Fonteyn. “Adapting vs. Pre-training Language Models for Historical Languages”. In: Journal of Data Mining and Digital Humanities NLP4DH (2022). doi: 10.46298/jdmdh.9152. url: https://hal.inria.fr/hal-03592137.
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>L.</given-names>
            <surname>Martin</surname>
          </string-name>
          ,
          <string-name>
            <surname>É. Villemonte de La Clergerie</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          <string-name>
            <surname>Sagot</surname>
          </string-name>
          ,
          <article-title>and</article-title>
          <string-name>
            <given-names>A.</given-names>
            <surname>Bordes</surname>
          </string-name>
          . “
          <article-title>Controllable Sentence Simplification”</article-title>
          .
          <source>In: LREC 2020 - 12th Language Resources and Evaluation Conference</source>
          . Marseille, France,
          <year>2020</year>
          . url: https://hal.inria.fr/hal-02678214.
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>M.</given-names>
            <surname>Martinc</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Pollak</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Robnik-Šikonja</surname>
          </string-name>
          .
          <article-title>“Supervised and Unsupervised Neural Approaches to Text Readability”</article-title>
          .
          In:
          <source>Computational Linguistics 47.1 (Apr. 21</source>
          ,
          <year>2021</year>
          ), pp.
          <fpage>141</fpage>
          -
          <lpage>179</lpage>
          . doi: 10.1162/coli_a_00398. url: https://doi.org/10.1162/coli_a_00398.
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [31]
          <string-name>
            <given-names>W.</given-names>
            <surname>McKinney</surname>
          </string-name>
          et al. “
          <article-title>pandas: a foundational Python library for data analysis and statistics”</article-title>
          . In:
          <source>Python for high performance and scientific computing 14.9</source>
          (
          <year>2011</year>
          ), pp.
          <fpage>1</fpage>
          -
          <lpage>9</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [32]
          <string-name>
            <given-names>H. T. T.</given-names>
            <surname>Nguyen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Jatowt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Coustaty</surname>
          </string-name>
          ,
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Doucet</surname>
          </string-name>
          . “
          <article-title>ReadOCR: A Novel Dataset and Readability Assessment of OCRed Texts”</article-title>
          .
          In:
          <source>International Workshop on Document Analysis Systems</source>
          . Springer.
          <year>2022</year>
          , pp.
          <fpage>479</fpage>
          -
          <lpage>491</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          [33]
          <string-name>
            <given-names>A.</given-names>
            <surname>Paszke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gross</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Massa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Lerer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bradbury</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Chanan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Killeen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Gimelshein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Antiga</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Desmaison</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kopf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>DeVito</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Raison</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Tejani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Chilamkurthy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Steiner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Fang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bai</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Chintala</surname>
          </string-name>
          . “
          <article-title>PyTorch: An Imperative Style, High-Performance Deep Learning Library”</article-title>
          .
          In:
          <source>Advances in Neural Information Processing Systems</source>
          <volume>32</volume>
          . Ed. by
          <string-name>
            <given-names>H.</given-names>
            <surname>Wallach</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Larochelle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Beygelzimer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>d'Alché-Buc</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Fox</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Garnett</surname>
          </string-name>
          . Curran Associates, Inc.,
          <year>2019</year>
          , pp.
          <fpage>8024</fpage>
          -
          <lpage>8035</lpage>
          . url: http://papers.neurips.cc/paper/9015-pytorch-an-imperative-style-high-performance-deep-learning-library.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          [34]
          <string-name>
            <given-names>A.</given-names>
            <surname>Pinche</surname>
          </string-name>
          .
          <source>Cremma Medieval</source>
          .
          <year>2022</year>
          . doi: 10.5281/zenodo.5235185. url: https://github.com/HTR-United/cremma-medieval.
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          [35]
          <string-name>
            <given-names>A.</given-names>
            <surname>Pinche</surname>
          </string-name>
          . “
          <article-title>Guide de transcription pour les manuscrits du Xe au XVe siècle”</article-title>
          .
          <year>2022</year>
          . url: https://hal.archives-ouvertes.fr/hal-03697382.
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          [36]
          <string-name>
            <given-names>A.</given-names>
            <surname>Pinche</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gabay</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Leroy</surname>
          </string-name>
          ,
          and
          <string-name>
            <given-names>K.</given-names>
            <surname>Christensen</surname>
          </string-name>
          .
          <source>Données HTR incunables du 15e siècle</source>
          .
          <year>2022</year>
          . url: https://github.com/Gallicorpora/HTR-incunable-15e-siecle.
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          [37]
          <string-name>
            <given-names>A.</given-names>
            <surname>Pinche</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gabay</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Leroy</surname>
          </string-name>
          ,
          and
          <string-name>
            <given-names>K.</given-names>
            <surname>Christensen</surname>
          </string-name>
          .
          <source>Données HTR manuscrits du 15e siècle</source>
          .
          <year>2022</year>
          . url: https://github.com/Gallicorpora/HTR-MSS-15e-Siecle.
        </mixed-citation>
      </ref>
      <ref id="ref38">
        <mixed-citation>
          [38]
          <string-name>
            <given-names>J.</given-names>
            <surname>Schoen</surname>
          </string-name>
          and
          <string-name>
            <given-names>G. E.</given-names>
            <surname>Saretto</surname>
          </string-name>
          . “
          <article-title>Optical Character Recognition (OCR) and Medieval Manuscripts: Reconsidering Transcriptions in the Digital Age”</article-title>
          . In:
          <source>Digital Philology: A Journal of Medieval Cultures 11.1</source>
          (
          <issue>2022</issue>
          ), pp.
          <fpage>174</fpage>
          -
          <lpage>206</lpage>
          . doi: 10.1353/dph.2022.0010. url: https://muse.jhu.edu/article/853521.
        </mixed-citation>
      </ref>
      <ref id="ref39">
        <mixed-citation>
          [39]
          <string-name>
            <given-names>U.</given-names>
            <surname>Springmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Fink</surname>
          </string-name>
          ,
          and
          <string-name>
            <given-names>K. U.</given-names>
            <surname>Schulz</surname>
          </string-name>
          .
          <article-title>Automatic quality evaluation and (semi-) automatic improvement of OCR models for historical printings</article-title>
          . Oct. 20,
          <year>2016</year>
          . doi: 10.48550/arXiv.1606.05157. url: http://arxiv.org/abs/1606.05157.
        </mixed-citation>
      </ref>
      <ref id="ref40">
        <mixed-citation>
          [40]
          <string-name>
            <given-names>P. B.</given-names>
            <surname>Ströbel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Clematide</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Volk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Schwitter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Hodel</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Schoch</surname>
          </string-name>
          . “
          <article-title>Evaluation of HTR models without Ground Truth Material”</article-title>
          . In: arXiv preprint arXiv:2201.06170 (
          <year>2022</year>
          ). url: http://arxiv.org/abs/2201.06170.
        </mixed-citation>
      </ref>
      <ref id="ref41">
        <mixed-citation>
          [41]
          <string-name>
            <given-names>M.</given-names>
            <surname>Vlachou-Efstathiou</surname>
          </string-name>
          .
          <source>Voss.Lat.O. 41 - Eutyches “de uerbo” glossed</source>
          .
          <year>2022</year>
          . url: https://github.com/malamatenia/Eutyches.
        </mixed-citation>
      </ref>
      <ref id="ref42">
        <mixed-citation>
          [43]
          <string-name>
            <given-names>L.</given-names>
            <surname>Wright</surname>
          </string-name>
          .
          <article-title>New Deep Learning Optimizer, Ranger: Synergistic combination of RAdam + LookAhead for the best of…</article-title>
          . Medium. Sept. 4,
          <year>2019</year>
          . url: https://medium.com/@lessw/new-deep-learning-optimizer-ranger-synergistic-combination-of-radam-lookahead-for-the-best-of-2dc83f79a48d.
        </mixed-citation>
      </ref>
      <ref id="ref43">
        <mixed-citation>
          [44]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Dyer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Smola</surname>
          </string-name>
          , and
          <string-name>
            <given-names>E.</given-names>
            <surname>Hovy</surname>
          </string-name>
          . “
          <article-title>Hierarchical Attention Networks for Document Classification”</article-title>
          . In:
          <source>Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</source>
          . San Diego, California: Association for Computational Linguistics,
          <year>2016</year>
          , pp.
          <fpage>1480</fpage>
          -
          <lpage>1489</lpage>
          .
          doi: 10.18653/v1/N16-1174. url: http://aclweb.org/anthology/N16-1174.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>