<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
<article-title>Does Context Matter? Enhancing Handwritten Text Recognition with Metadata in Historical Manuscripts</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <name>
            <surname>Kiessling</surname>
            <given-names>Benjamin</given-names>
          </name>
        </contrib>
        <aff id="aff1">
          <label>1</label>
          <institution>CHR 2024: Computational Humanities Research Conference</institution>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>École Pratique des Hautes Études, Université PSL</institution>
          ,
          <addr-line>4-14 rue Ferrus, 75014</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Inria</institution>
          ,
          <addr-line>48 rue Barrault, 75013 Paris</addr-line>
        </aff>
      </contrib-group>
      <fpage>427</fpage>
      <lpage>442</lpage>
      <abstract>
        <p>The digitization of historical manuscripts has significantly advanced in recent decades, yet many documents remain as images without machine-readable text. Handwritten Text Recognition (HTR) has emerged as a crucial tool for converting these images into text, facilitating large-scale analysis of historical collections. In 2024, the CATMuS Medieval dataset was released, featuring extensive diachronic coverage and a variety of languages and script types. Previous research indicated that model performance degraded on the best manuscripts over time as more data was incorporated, likely due to overgeneralization. This paper investigates the impact of incorporating contextual metadata in training HTR models using the CATMuS Medieval dataset to mitigate this effect. Our experiments compare the performance of various model architectures, focusing on Conformer models with and without contextual inputs, as well as Conformer models trained with auxiliary classification tasks. Results indicate that Conformer models utilizing semantic contextual tokens (Century, Script, Language) outperform baseline models, particularly on challenging manuscripts. The study underscores the importance of metadata in enhancing model accuracy and robustness across diverse historical texts.</p>
      </abstract>
      <kwd-group>
        <kwd>handwritten text recognition</kwd>
        <kwd>medieval manuscripts</kwd>
        <kwd>metadata</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        The digitization wave of the past two decades has significantly increased online access to historical manuscripts. Despite this progress, a substantial number of these documents are available only as images, lacking machine-readable text. Handwritten Text Recognition (HTR) has emerged as a vital tool for converting these images into text, facilitating the analysis of vast historical collections such as Camps et al.'s work [2]. Consequently, multiple large datasets have emerged in recent years [
        <xref ref-type="bibr" rid="ref1">1, 16, 17, 18</xref>
        ]. However, most of these datasets are mono- or bilingual, with relatively limited geographical, temporal, scribal, and generic diversity. While this does not affect the quality of the datasets per se, it limits the generalization of models derived from them. Specifically, such models may face vocabulary limitations in the case of language or generic unicity (e.g., corpora composed solely of biblical content [7]), and graphical interpretation issues due to the lack of scribal, temporal, or geographical variation.
      </p>
      <p>The Middle Ages, spanning approximately ten centuries, encompass a period of immense
linguistic and cultural diversity. This era witnessed the evolution of numerous languages and
dialects, each with distinct characteristics and scripts. From Old English and Latin to Old High
German and Old French, the linguistic landscape of the medieval period was dynamic and
continually evolving. This diversity poses both challenges and opportunities for HTR, as models
must be capable of handling a wide array of scripts and languages that changed significantly
over time. Addressing these challenges requires datasets that reflect the rich and varied
nature of medieval manuscripts, incorporating a broad spectrum of geographical, temporal, and
scribal variations to enhance the robustness and generalizability of HTR models.</p>
      <p>
        In late 2023 and early 2024, the publication of the CATMuS Kraken model [13] and subsequently the CATMuS Medieval dataset [3] has opened up new opportunities for training and evaluating generic models across a vast diversity of categories and traits. With 200 manuscripts in their initial release in January 2024, and 250 in their 1.5.0 July release, encompassing 10 languages and 6 other metadata fields, these resources provide a robust framework for developing generalizable models that account for these specific features. However, in their initial study, Pinche et al. [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] indicate that the new generalizing models, trained on the comprehensive dataset, exhibited a drop in performance compared to earlier, more language-specific models. This finding seems to contradict the intended benefit of large, intercompatible datasets.<sup>1</sup>
      </p>
      <p>One promising approach to mitigate these issues is the enrichment of handwritten text
datasets with metadata. Metadata provides contextual information that can enhance model
training and improve recognition accuracy. For instance, metadata on the century of
production, language, script, and genre can help models better understand and adapt to the specific
characteristics of the text they are processing.</p>
      <p>This paper explores the potential need for metadata-enriched handwritten text datasets. We hypothesize that incorporating detailed metadata can improve HTR performance, particularly for complex historical texts. By analyzing the performance of current models on metadata-enriched versus non-enriched datasets, we aim to demonstrate the benefits of this approach and propose a framework for its implementation.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Background and Related Work</title>
      <p>Automatic text recognition in general, and in particular the processing of historical typewritten and machine-printed material, has seen a stellar rise in recent years. This advancement has had a profound impact on scholarly work, especially in the field of historical research. The retrodigitization and accurate transcription of most types of historical documents, which were once laborious tasks, can now be accomplished with relative ease and sufficient precision to enable a multitude of novel investigations.</p>
      <p><sup>1</sup> It is important to note, however, that the models were compared using a similar architecture, without any hyperparameter optimization based on the newly acquired diversity of the dataset. This suggests that further optimization and adaptation may be necessary to fully leverage the potential of such diverse datasets.</p>
      <p>Metadata and domain knowledge have long played important roles in the design of automatic text recognition (ATR) systems. In fact, the limitations of early ATR methods, principally utilized for the processing of documents in tightly constrained domains, necessitated incorporating both to restrict the search space and boost accuracy to acceptable levels. Examples of these are systems designed to aid in automatic letter sorting, where the vocabulary is effectively closed, but also general-purpose ATR software such as Tesseract [15] utilizing extensive dictionaries and other means of language modelling.</p>
      <p>
        Unfortunately, traditional techniques to incorporate metadata have strong normalizing tendencies, which are problematic for the recognition of historical documents that often feature diverse language use, orthography, and multilingualism. While modern ATR software with its more powerful text recognition methods dispenses with many of these accuracy-boosting techniques, this is doubly true for software designed for historical document digitization like [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], which in most cases goes to great lengths to eliminate them as far as possible.
      </p>
      <p>
        <bold>Automatic Text Recognition.</bold> The principal paradigms employed in typical Automatic Text Recognition recognizers have been stable for more than a decade, although considerable research has resulted in recognition methods that are significantly more powerful, with higher accuracy, better generalization, and increased ease of training than the basic algorithm proposed in [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. These recognizers are placed at the end of a pipeline of interconnected processes. A rudimentary but fairly standard ATR pipeline will ingest a digital scan of one page image at a time, perform any necessary pre-processing, e.g. rectification, dewarping, or binarization, find individual lines on the page image in a step called layout analysis, and feed the identified lines individually through the text recognizer. In a final step, the recognition results of the individual lines are reassembled into a paginated text by concatenation and serialization into raw text files, or combined with data from the layout analysis to produce a digital facsimile, most frequently in standardized formats like ALTO or PageXML.
      </p>
      <p>
        The most important feature of these ATR systems is that they implement text recognition as a sequence-to-sequence modelling task where the input sequence is typically a line image and the desired output sequence a string of characters. There are multiple ways to construct such a sequence-mapping text recognizer, although the most popular is with Connectionist Temporal Classification (CTC) loss [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], which permits the model to learn without requiring an explicit alignment between input and output. Further, these methods have multiple other advantages, some especially pertinent for historical document retrodigitization: training data creation is typically much faster than with older character-based ATR methods as line-wise annotation is generally more efficient, a lack of explicit character segmentation markedly improves error rates on cursive writing and connected scripts, and the ability of the recognizer to take contextual information into account boosts accuracy on characters that are difficult to recognize in isolation, e.g. in the case of degraded writing.
      </p>
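      <p>The alignment-free property of CTC comes from its decoding step: the recognizer emits one label per input frame, then repeated labels are collapsed and blank symbols removed. A minimal sketch of greedy CTC decoding follows; the blank index and label values are illustrative, not tied to any particular alphabet.</p>

```python
# Greedy CTC decoding sketch: collapse repeated frame labels, drop blanks.
# Convention assumed here: label 0 is the CTC blank symbol.
def ctc_greedy_decode(frame_labels, blank=0):
    out = []
    prev = None
    for lbl in frame_labels:
        # a label is emitted only when it differs from the previous frame
        # and is not the blank, so no explicit alignment is ever needed
        if lbl != prev and lbl != blank:
            out.append(lbl)
        prev = lbl
    return out
```

For example, the frame sequence `[0, 1, 1, 0, 2, 2, 0, 0, 3]` collapses to the three-character output `[1, 2, 3]`; a blank between two identical labels keeps them distinct.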
      <p><bold>Style-aware HTR and other metadata-enriched architectures.</bold> While interventions contributing domain knowledge into ATR systems at a general language level, e.g. with dictionaries or language modelling, are widespread, approaches explicitly leveraging other metadata that might be known about the text to be recognized have rarely been described in the literature. Minor exceptions include a method described in [20], similar to the semantic context token in section 3.2, for the processing of standardized European Accident Statements, achieving a 10% reduction in CER with an architecture concatenating a metadata vector to the encoder features in a standard CNN-LSTM trained with CTC.</p>
      <p>
        [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] describes a metadata-aware handwritten text recognition method, albeit for a very different use case: a few-shot learning algorithm for style-aware HTR based on meta-learning. A base model is first trained from a text recognition training set enriched with writer labels, where each meta-learning task corresponds to writing produced by a single writer. During inference on writing produced by a previously unknown individual scribe, an update of the model weights with a low number of labelled samples results in an adapted model for this particular scribe. This approach boosts accuracy by around 5-7 percentage points in comparison to naive fine-tuning.
      </p>
      <p>Automatic Text Recognition (ATR) datasets for historical, and specifically medieval, manuscripts likely began with Latin script datasets from the IAM Historical Databases, notably the Parzival [5] and St. Gall [4] subsets. These datasets, which remain widely used for benchmarking new ATR engines, are relatively small (1,000 and 4,000 lines respectively), derive from single source documents, and are fundamentally incompatible due to differing annotation guidelines.</p>
      <p>Late 2010s datasets, such as those developed by D. Stutzmann and the company Teklia [17, 18, 19]<sup>2</sup>, have taken a more focused generic approach (e.g., cartularies, books of psalms) and provided a significantly larger number of lines (more than 120,000 for HIMANIS). However, these datasets are limited by their generic and language unicity, and their use of annotation guidelines that resolve abbreviations restricts their reusability in multilingual settings. This is due to genre- or language-specific abbreviations and normalizations, which pose challenges for context-dependent abbreviation resolution [21].</p>
      <p>
        The CATMuS dataset offers an innovative framework to address these limitations, enabling
testing of ATR models across diachronic (8th-16th century), diageneric (from practical
documents to poetry), and multilingual (10 languages) variations. With a consistent annotation
approach, the CATMuS dataset [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] allows for the development and evaluation of single models
capable of handling the rich diversity of medieval manuscripts.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Proposed method</title>
      <p>We propose two basic approaches to evaluate the impact of metadata on recognition performance at different points of a text recognition method and evaluate them against a baseline of an advanced attentional text recognizer based on the Conformer architecture [8] and the default hybrid convolutional and recurrent neural recognizer of the kraken OCR engine. Although our experiments are run on an adaptation of fairly complex Conformer models, the fundamental idea can be employed in almost any type of text recognizer based on neural networks.</p>
      <sec id="sec-3-1">
        <p><sup>2</sup> Their publication date is later than their original availability.</p>
        <sec id="sec-3-1-1">
          <title>3.1. Text Recognition with Transformers</title>
          <p>The baseline system consists of an adapted Conformer, a Transformer-style [22] neural network augmented with convolutional layers, currently the dominant neural network architecture in automatic speech recognition (ASR). While ASR and ATR share many of the same features, e.g. relatively low-dimensional inputs and a prevalent sequence-to-sequence paradigm, there is no reported use of them in the ATR domain as of yet.</p>
          <p>While the fundamental architecture requires no adaptation for text recognition, the size of even very large text recognition datasets is significantly smaller than the corpora of spoken speech typically used in ASR research, which necessitates downscaling the network for reliable convergence (encoder_dim = 144, encoder_layers = 16, num_attention_heads = 4). In addition, we adopt the computationally more efficient depthwise-convolution downsampling schema (conv_channels = 32, subsampling_factor = 4) from [14], which roughly doubles inference speed without accuracy losses.</p>
          <p>Our baseline recognizer consists of this down-sized Conformer encoder followed by a single
fully connected layer as a decoder. Like most text recognition methods it is trained with CTC
loss.</p>
        </sec>
        <sec id="sec-3-1-2">
          <title>3.2. Semantic Context Token</title>
          <p>Our first proposed method explicitly supplies the text recognizer with contextual information about the line to be recognized during training and inference. Given an input image of a line x ∈ ℝ^(w×h×c) with width w, height h, and c channels, and a vector m⃗ ∈ {0, 1}^n containing the encoded metadata, which we will call the semantic context token, we simply expand the token to size h × |m⃗| and concatenate it to the input along the width dimension, resulting in a new input to the network x′ ∈ ℝ^((w+|m⃗|)×h×c). The neural network is then trained as usual with CTC loss.</p>
          <p>The chosen metadata is encoded into the semantic context token m⃗ through a simple multi-hot encoding, suitable for a wide range of tag-type metadata, placing a high value at a particular position in the vector to indicate the presence of a tag. Categorical fields are dealt with through expansion, e.g. a language metadata field with possible values {Castilian, Venetian, Latin} would be converted into a semantic context token with |m⃗| = 3.</p>
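          <p>The multi-hot encoding above can be sketched in a few lines of Python; the language and script vocabularies here are illustrative and not the actual CATMuS category inventories.</p>

```python
# Multi-hot encoding of tag-type metadata into a semantic context token.
# A sketch of the scheme in section 3.2; the vocabularies are hypothetical.
def encode_context(tags, vocabulary):
    # the vocabulary fixes the position of each tag within the token
    index = {tag: i for i, tag in enumerate(vocabulary)}
    token = [0] * len(vocabulary)
    for tag in tags:
        token[index[tag]] = 1  # a high value marks the presence of a tag
    return token

languages = ["Castilian", "Venetian", "Latin"]
scripts = ["Caroline", "Textualis", "Humanistica"]

# class expansion: concatenating both fields yields a token with |m| = 6,
# here for a Latin line written in Caroline script
token = encode_context(["Latin", "Caroline"], languages + scripts)
```

The resulting token would then be expanded to the line-image height and concatenated to the input along the width dimension before being fed to the network.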
          <p>An obvious drawback of this method is that the text recognizer needs to be supplied with the same array of metadata during both training and inference, i.e. it can only effectively recognize unknown text lines when the same metadata used during training is known.</p>
        </sec>
        <sec id="sec-3-1-3">
          <title>3.3. Auxiliary Loss</title>
          <p>In contrast to the first approach which is intended to induce the recognition model to context
switch based on explicitly provided information during inference, our second method relies on
an auxiliary loss during training to aid the network in learning the structure of the input data
without requiring a semantic context token during inference.</p>
          <p>Instead, the network is trained to reconstruct the semantic context token as the output of a side-branch of the text recognition network. This side branch, situated just after the Conformer encoder, consists of a simple adaptive max pooling and fully connected layer and operates on the totality of the encoder features. For a context token m⃗ and a prediction m̂ of the side branch, both of size n, the auxiliary loss ℒ_aux is computed using binary cross-entropy (BCE):</p>
        </sec>
      </sec>
      <sec id="sec-3-2">
        <p>ℒ_aux(m⃗, m̂) = −∑ᵢ₌₁ⁿ [mᵢ ⋅ log m̂ᵢ + (1 − mᵢ) ⋅ log(1 − m̂ᵢ)]  (1)</p>
        <p>The overall training objective thus becomes:</p>
        <p>ℒ = (1 − λ) ⋅ ℒ_CTC + λ ⋅ ℒ_aux  (2)</p>
        <p>where λ is an additional hyperparameter of the training process that determines the proportion between the main CTC and auxiliary BCE loss. In line with common practice and confirmed with preliminary experiments, we chose to put a relatively low weight (λ ∈ {0.1, 0.2, 0.3}) on the auxiliary loss during training.</p>
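        <p>The weighted combination of the two losses can be sketched in plain Python as follows. This is a minimal illustration of the training objective, not the paper's implementation; the clamping epsilon is an assumption added for numerical stability.</p>

```python
# Auxiliary BCE loss over the context token and its convex combination
# with the CTC loss, weighted by the hyperparameter lam (the paper's λ).
import math

def bce(targets, predictions, eps=1e-7):
    # binary cross-entropy summed over the context token positions
    total = 0.0
    for m, p in zip(targets, predictions):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= m * math.log(p) + (1 - m) * math.log(1 - p)
    return total

def combined_loss(ctc_loss, targets, predictions, lam=0.1):
    # (1 - lam) * CTC + lam * auxiliary BCE
    return (1 - lam) * ctc_loss + lam * bce(targets, predictions)
```

With a low λ the CTC term dominates, so the side branch nudges the encoder toward separating metadata-relevant features without destabilizing the main recognition objective.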
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Data</title>
      <p>For the purpose of this paper, we utilized the CATMuS Medieval dataset, adhering to the provided dataset splits, which segment the training, validation, and evaluation sets by document. The training and validation splits were sourced from the 1.0.1 release, while the evaluation split was taken from the 1.5.0 release<sup>3</sup> for testing purposes (see Table 8). This approach allowed us to benefit from the expanded and more varied test set, enhancing the robustness of our evaluation without compromising the integrity of our initial training and validation processes.</p>
      <p><sup>3</sup> We leveraged the release of a larger, more diverse test set for evaluation; however, due to the short time frame (less than five days) between the release of version 1.5.0 and the submission deadline of this paper, we were unable to retrain and redo all experiments. While some documents seem to have undergone metadata correction in between releases, we expect it to have a relatively small impact on our evaluation scores.</p>
      <p>Representing the diversity, or lack thereof, in the CATMuS dataset is challenging due to the various metrics (lines, characters, pages, or documents) and numerous features to consider (genre, language, script, century, etc.). Language can be seen as a super-category, which is then refined by genre if we view genre as primarily limiting vocabulary. In our dataset description, we focus on script (which can serve as a proxy for century) and language, and use lines as the metric of choice. Lines are ultimately the unit used for training (sample and batch size) and offer a compromise between document and character count. However, it is important to note that some documents are heavily represented in terms of lines, while others have much longer lines (particularly in the context of prose vs. poetry), affecting the overall representation.</p>
      <p>CATMuS 1.0.1 and 1.5.0 are heavily uneven across categories. In Table 2, we identify four particularly challenging "couples" in the test set: 156 lines of Castilian in Humanistica script, 273 lines of French in Semihybrida, 736 lines of Navarrese, and 147 lines of Venetian in Textualis script. Each of these scripts has representatives in the training and development sets in other languages, but Venetian has only two documents in CATMuS (1 in train and 1 in test since CATMuS 1.0.0) and Navarrese has only one document overall, and only in the test set. However, the Textualis script, which represents these languages, is the most common script in the training and development sets (see Table 1). We anticipate these test lines to be the most difficult for the model to predict. Latin is the most represented language across scripts, missing representation in only five classes in the test sets. Additionally, two scripts (Personal and Print) and two languages (Catalan and English) are absent from the test set entirely. Caroline and Praegothica scripts are overly represented in the test set in terms of lines, but this metric hides a reality for Caroline in number of documents, as three documents in Latin Caroline are in the test set, but 22 different small documents represent this script in the train and dev split.<sup>4</sup></p>
      <p><sup>4</sup> This is another example of how difficult it is to represent the diversity and over-representation of some categories.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Experiments</title>
      <p>We perform experiments on the latest 2024 version of the CATMuS Medieval dataset. While this dataset is sufficient in size to train a Conformer model from scratch, the models in our experiments were fine-tuned from a base model trained on around 2.5 million text lines in a large number of scripts and languages in order to reduce the time and computational resources expended.</p>
      <sec id="sec-5-1">
        <title>5.1. Implementation Details</title>
        <p>All experiments are performed using the same hyperparameters and identical initial seeds. The model architecture follows section 3.1. Line images are scaled to a fixed height of 96 pixels and padded on both sides with 16 pixels.</p>
        <p>The batch size is set to 32, the maximum supported by our Nvidia A40 GPU under BFloat16 mixed precision.</p>
        <p>Models are trained using the AdamW optimizer [12] for 100 epochs with a cosine learning rate schedule with linear warmup over 35,000 iterations, equivalent to slightly more than 8 epochs on our dataset and batch size. The initial learning rate after warmup is 3 × 10⁻⁴, decaying to 3 × 10⁻⁵ by the end of the schedule. The network is regularized with weight decay (10⁻⁵), dropout (0.1), and augmentation with random blurring, scaling, rotation, and elastic transforms.<sup>5</sup></p>
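        <p>The schedule above can be sketched as follows. The warmup length and the learning-rate endpoints follow the text; the total step count used in the example is an illustrative assumption, not a reported figure.</p>

```python
# Cosine learning-rate schedule with linear warmup, as described in 5.1.
import math

def learning_rate(step, total_steps, warmup=35_000,
                  lr_max=3e-4, lr_min=3e-5):
    if step >= warmup:
        # cosine decay from lr_max down to lr_min after warmup
        progress = (step - warmup) / max(1, total_steps - warmup)
        return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * progress))
    # linear warmup from 0 up to lr_max
    return lr_max * step / warmup
```

At step 35,000 the rate peaks at 3 × 10⁻⁴ and then decays smoothly to 3 × 10⁻⁵ at the end of training.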
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Experimental Setup</title>
        <p>We chose to evaluate our methods on a subset, shown in Table 3, of the line-level metadata provided by the CATMuS dataset. To determine the impact of each metadata field and potential synergistic effects on recognition accuracy, both methods were trained with language, script type, and age fields both separately and jointly. For the auxiliary loss weight an upper limit was determined empirically, below which the values {0.1, 0.2, 0.3} were sampled for evaluation.</p>
        <p>All models are evaluated on character accuracy. For comparison, baseline models were trained with both the default configuration of the Kraken OCR engine (CNN+LSTM recognizer) and the unmodified Conformer architecture.</p>
        <p><sup>5</sup> The source code for all experiments can be found under a liberal Apache 2.0 license at https://github.com/mittagessen/conformer_ocr.git.</p>
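        <p>Character accuracy can be computed from the Levenshtein edit distance between reference and hypothesis transcriptions. The sketch below is a generic reconstruction of that metric, not necessarily the exact implementation used in the paper's code base.</p>

```python
# Character accuracy via Levenshtein edit distance (two-row DP variant).
def levenshtein(a, b):
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def char_accuracy(reference, hypothesis):
    # 1 - CER, normalized by the reference length
    if not reference:
        return 1.0 if not hypothesis else 0.0
    return 1.0 - levenshtein(reference, hypothesis) / len(reference)
```

A single substitution in a four-character reference thus yields an accuracy of 0.75, i.e. a CER of 25%.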
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Results</title>
      <p><bold>General results.</bold> Out of the two proposed architectures, only the Conformer model using contextual input tokens with all context tokens (Century, Script, Language) consistently outperforms the other models. Specifically, this model surpasses the baseline Conformer architecture, which itself outperforms the original Kraken baselines (see Table 4). Models that utilized a single category of features, such as Language or Century, ultimately performed worse than the baseline. The auxiliary loss approach yielded unexpected results: out of the 12 configurations (four types of tasks with three types of loss weights), half did not converge and resulted in character accuracies below 15%. Even worse, the observed unstable training behavior seems to be unrelated to the chosen weight λ, which indicates that optimal hyperparameters must be determined for each new dataset and metadata token.</p>
      <p><bold>Accuracy dispersion across manuscripts.</bold> The Contextual Input model consistently outperforms all other models, with the lowest median CER and the lowest variance. For the most challenging manuscript, it achieves over a 2 percentage point increase in accuracy compared to the Conformer baseline (see Table 5). Additionally, the Contextual Input model, without ablation, exhibits the smallest variance among all models (see Figure 3a). Compared to the baseline (see Figure 3b), the model utilizing the contextual token demonstrates superior accuracy, with a median improvement of 0.64 percentage points. It only underperforms on three manuscripts: Paris, BnF, fr. 6447 (baseline: 97.20%, -0.33); Paris, BnF, lat. 17903 (baseline: 80.25%, -0.32); and Paris, BnF, lat. 130 (baseline: 97.31%, -0.08).</p>
      <p><bold>Ablation study.</bold> To evaluate the impact of the contextual token, we present results with null contextual tokens in Table 7. For models utilizing a single category of contextual input, removing the contextual token results in decreased accuracy, with macro-accuracy dropping by up to 3.2 percentage points for the model using the Century metadata and by as little as 0.88 points for the model using scripts. These findings suggest that the models may be overfitting to the contextual token, as evidenced by the baseline Conformer models outperforming them.</p>
      <p>However, for the model using all contextual inputs (Context Input All), the removal of the context token leads to a smaller reduction in efficiency. Despite being less efficient with null contextual tokens, the model still leverages learned features during decoding, aligning with our expectations for the Auxiliary Loss training architecture. The minimal variation in accuracy between the zeroed-out context and the full context (&lt;0.15 percentage points), while still surpassing the baseline, may indicate that the model has effectively learned to separate features, even without manually provided context.</p>
      <p><bold>Impact of unknown features.</bold> In documents featuring unknown or extremely rare features, such as the Navarrese language (unknown) and the Venetian language (represented by only one training sample), our results not only remain stable but also surpass those of the Conformer model when utilizing all contextual tokens. Particularly noteworthy are manuscripts BnF 65 and BnF ita. 783 (cf. Table 6), where we observe consistently stronger performance. Even in cases with null semantic tokens, we achieve improvements ranging from +0.2 to +0.4 points in accuracy.</p>
    </sec>
    <sec id="sec-7">
      <title>7. Conclusion</title>
      <p>[Figure 3: (a) dispersion of the CER across manuscripts per model for the main models; (b) CER difference between the baseline and the best model (Context. Input All, non-zeroed).]</p>
      <p>In this study, we explored the effectiveness of incorporating contextual metadata into Handwritten Text Recognition (HTR) models to enhance the digitization of medieval manuscripts. Utilizing the CATMuS Medieval dataset, which offers a rich variety of scripts, languages, and centuries, we compared the performance of Conformer models with and without contextual inputs, as well as training these models with auxiliary classification tasks. Our objective was to determine whether adding metadata such as Century, Script, and Language could improve model accuracy and robustness. We tested several configurations, including models with single and multiple contextual tokens, and evaluated them against both the baseline Conformer architecture and the original Kraken baselines. By doing so, we aimed to identify the most effective strategies for leveraging contextual information in HTR tasks.</p>
      <p>Our results showed that the Conformer model using all contextual input tokens (Century, Script, Language) consistently outperformed other configurations, including the baseline models. This model achieved higher accuracy, particularly on the most challenging manuscripts, with an improvement of over 2 percentage points in some cases. Moreover, it exhibited the smallest variance in performance, indicating its robustness across different types of manuscripts. The use of multiple contextual tokens enabled the model to effectively learn and utilize diverse features, leading to better generalization. Interestingly, models with single contextual tokens did not perform as well and often fell short of the baseline, suggesting that a more comprehensive approach to metadata integration is necessary. Additionally, the auxiliary loss approach did not yield the expected improvements and frequently resulted in non-converging models, indicating the complexity of effectively balancing multiple training objectives.</p>
      <p>
        While our approach demonstrated significant improvements, there are several areas for
future exploration. The current approach relies on multi-hot encoding various categories
without embedding these features into a learnable space beforehand. Approaches in natural
language processing, such as [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], could potentially allow the model to approximate relationships
between scripts and languages that are closely related, such as ’Caroline’ and ’Humanistica’
scripts. Secondly, the context token method appends information directly onto the image data
fed into the encoder, a design choice motivated by the very lightweight FFN decoder which
we deemed to be unlikely to effectively make use of the encoder features augmented with the
raw context token. Combining a more powerful decoder, e.g. a pre-trained language model
like in [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], and injecting metadata after the encoder is an avenue of future research. Such an
architecture with a clear separation between the visual and linguistic model would presumably
be beneficial for some types of semantic tokens, in particular language and genre, which we
consider to be of more importance to the latter than the former.
      </p>
    </sec>
    <sec id="sec-8">
      <title>References</title>
      <p>A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin. "Attention is All you Need". In: Advances in Neural Information Processing Systems. Ed. by I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett. Vol. 30. Curran Associates, Inc., 2017. URL: https://proceedings.neurips.cc/paper_files/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A. K.</given-names>
            <surname>Bhunia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ghose</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kumar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. N.</given-names>
            <surname>Chowdhury</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Sain</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Y.-Z.</given-names>
            <surname>Song</surname>
          </string-name>
          . “
          <article-title>MetaHTR: Towards Writer-Adaptive Handwritten Text Recognition”</article-title>
          .
          <source>In: 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)</source>
          .
          <year>2021</year>
          , pp.
          <fpage>15825</fpage>
          -
          <lpage>15834</lpage>
          . doi: 10.1109/cvpr46437.2021.01557.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>J.-B.</given-names>
            <surname>Camps</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Baumard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.-C.</given-names>
            <surname>Langlais</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Morin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Clérice</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Norindr</surname>
          </string-name>
          . “
          <article-title>Make Love or War? Monitoring the Thematic Evolution of Medieval French Narratives”</article-title>
          .
          <source>In: Computational Humanities Research (CHR 2023)</source>
          . CEUR-WS.org,
          <year>2023</year>
          , pp.
          <fpage>734</fpage>
          -
          <lpage>756</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>T.</given-names>
            <surname>Clérice</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Pinche</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Vlachou-Efstathiou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Chagué</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-B.</given-names>
            <surname>Camps</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Gille-Levenson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Brisville-Fertin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Fischer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Gervers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Boutreux</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Manton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gabay</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>O'Connor</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Haverals</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kestemont</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Vandyck</surname>
          </string-name>
          , and
          <string-name>
            <given-names>B.</given-names>
            <surname>Kiessling</surname>
          </string-name>
          . “CATMuS Medieval:
          <article-title>A multilingual large-scale cross-century dataset in Latin script for handwritten text recognition and beyond”</article-title>
          .
          <source>In: 2024 International Conference on Document Analysis and Recognition (ICDAR)</source>
          . Athens, Greece,
          <year>2024</year>
          . url: https://inria.hal.science/hal-04453952.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>A.</given-names>
            <surname>Fischer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Frinken</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Fornés</surname>
          </string-name>
          , and
          <string-name>
            <given-names>H.</given-names>
            <surname>Bunke</surname>
          </string-name>
          . “
          <article-title>Transcription Alignment of Latin Manuscripts using Hidden Markov Models”</article-title>
          .
          <source>In: Proceedings of the 2011 Workshop on Historical Document Imaging and Processing</source>
          .
          <year>2011</year>
          , pp.
          <fpage>29</fpage>
          -
          <lpage>36</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>A.</given-names>
            <surname>Fischer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wuthrich</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Liwicki</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Frinken</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Bunke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Viehhauser</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Stolz</surname>
          </string-name>
          . “
          <article-title>Automatic Transcription of Handwritten Medieval Documents”</article-title>
          .
          <source>In: 2009 15th International Conference on Virtual Systems and Multimedia. IEEE</source>
          .
          <year>2009</year>
          , pp.
          <fpage>137</fpage>
          -
          <lpage>142</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>A.</given-names>
            <surname>Graves</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Fernández</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Gomez</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Schmidhuber</surname>
          </string-name>
          . “
          <article-title>Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks”</article-title>
          .
          <source>In: Proceedings of the 23rd International Conference on Machine Learning. ACM</source>
          .
          <year>2006</year>
          , pp.
          <fpage>369</fpage>
          -
          <lpage>376</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>E.</given-names>
            <surname>Gueville</surname>
          </string-name>
          and
          <string-name>
            <given-names>D. J.</given-names>
            <surname>Wrisley</surname>
          </string-name>
          . “
          <article-title>Transcribing Medieval Manuscripts for Machine Learning”</article-title>
          .
          <year>2023</year>
          . url: https://shs.hal.science/halshs-03725166.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>A.</given-names>
            <surname>Gulati</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Qin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.-C.</given-names>
            <surname>Chiu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Parmar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Han</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Pang</surname>
          </string-name>
          . “Conformer:
          <article-title>Convolution-augmented Transformer for Speech Recognition”</article-title>
          .
          <source>In: Proc. Interspeech</source>
          <year>2020</year>
          .
          <year>2020</year>
          , pp.
          <fpage>5036</fpage>
          -
          <lpage>5040</lpage>
          . doi: 10.21437/Interspeech.2020-3015.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>B.</given-names>
            <surname>Kiessling</surname>
          </string-name>
          . “
          <article-title>Kraken - a Universal Text Recognizer for the Humanities”</article-title>
          .
          <source>In: ADHO, éd., Actes de Digital Humanities Conference</source>
          .
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>J.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. K.</given-names>
            <surname>Amplayo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sung</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Seo</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.-w.</given-names>
            <surname>Hwang</surname>
          </string-name>
          . “
          <article-title>Categorical Metadata Representation for Customized Text Classification”</article-title>
          .
          <source>In: Transactions of the Association for Computational Linguistics</source>
          <volume>7</volume>
          (
          <year>2019</year>
          ), pp.
          <fpage>201</fpage>
          -
          <lpage>215</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>M.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Lv</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Cui</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Lu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Florencio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Li</surname>
          </string-name>
          , and
          <string-name>
            <given-names>F.</given-names>
            <surname>Wei</surname>
          </string-name>
          . “TrOCR:
          <article-title>Transformer-based Optical Character Recognition with Pre-trained Models”</article-title>
          . In:
          <source>Proceedings of the AAAI Conference on Artificial Intelligence</source>
          . Vol.
          <volume>37</volume>
          . 11.
          <year>2023</year>
          , pp.
          <fpage>13094</fpage>
          -
          <lpage>13102</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>I.</given-names>
            <surname>Loshchilov</surname>
          </string-name>
          and
          <string-name>
            <given-names>F.</given-names>
            <surname>Hutter</surname>
          </string-name>
          . “
          <article-title>Decoupled Weight Decay Regularization”</article-title>
          .
          <source>In: International Conference on Learning Representations</source>
          .
          <year>2019</year>
          . url: https://openreview.net/forum?id=Bkg6RiCqY7.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>A.</given-names>
            <surname>Pinche</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Clérice</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Chagué</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-B.</given-names>
            <surname>Camps</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Vlachou-Efstathiou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. Gille</given-names>
            <surname>Levenson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Brisville-Fertin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Boschetti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Fischer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Gervers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Boutreux</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Manton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gabay</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Haverals</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kestemont</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Vandyck</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>O'Connor</surname>
          </string-name>
          . “CATMuS-Medieval:
          <article-title>Consistent Approaches to Transcribing ManuScripts”</article-title>
          .
          <source>In: DH2024. ADHO</source>
          . Washington DC, United States,
          <year>2024</year>
          . url: https://inria.hal.science/hal-04346939.
        </mixed-citation>
      </ref>
      <ref id="ref13b">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>D.</given-names>
            <surname>Rekesh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. Rao</given-names>
            <surname>Koluguri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kriman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Majumdar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Noroozi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Hrinchuk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Puvvada</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kumar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Balam</surname>
          </string-name>
          , and
          <string-name>
            <given-names>B.</given-names>
            <surname>Ginsburg</surname>
          </string-name>
          . “
          <article-title>Fast Conformer with Linearly Scalable Attention for Efficient Speech Recognition”</article-title>
          . In: arXiv e-prints,
          <source>arXiv:2305.05084</source>
          (
          <year>2023</year>
          ). doi: 10.48550/arXiv.2305.05084. arXiv: 2305.05084 [eess.AS].
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>R.</given-names>
            <surname>Smith</surname>
          </string-name>
          . “
          <article-title>An Overview of the Tesseract OCR Engine”</article-title>
          .
          <source>In: Proceedings of the Ninth International Conference on Document Analysis and Recognition - Volume 02</source>
          . ICDAR '07. USA: IEEE Computer Society,
          <year>2007</year>
          , pp.
          <fpage>629</fpage>
          -
          <lpage>633</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>D.</given-names>
            <surname>Stutzmann</surname>
          </string-name>
          . Fontenay Dataset.
          <source>Original Charters From Fontenay before 1213</source>
          .
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>D.</given-names>
            <surname>Stutzmann</surname>
          </string-name>
          . “
          <article-title>Words as graphic and linguistic structures. Word spacing in Psalm 101 Domine exaudi orationem meam (eleventh-fifteenth centuries)”</article-title>
          . In:
          <article-title>Les Mots au Moyen Âge - Words in the Middle Ages</article-title>
          .
          <source>Utrecht Studies in Medieval Literacy 46. Turnhout: Brepols</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>21</fpage>
          -
          <lpage>59</lpage>
          . doi: 10.1484/m.usml-eb.5.120721.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>D.</given-names>
            <surname>Stutzmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. T.</given-names>
            <surname>Aguilar</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Chaffenet</surname>
          </string-name>
          .
          <source>HOME-Alcar: Aligned and Annotated Cartularies</source>
          .
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>D.</given-names>
            <surname>Stutzmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-F.</given-names>
            <surname>Moufflet</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Hamel</surname>
          </string-name>
          . “
          <article-title>La recherche en plein texte dans les sources manuscrites médiévales: enjeux et perspectives du projet HIMANIS pour l'édition électronique”</article-title>
          . In: Médiévales (
          <year>2017</year>
          ), pp.
          <fpage>67</fpage>
          -
          <lpage>96</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>C.</given-names>
            <surname>Tomoiaga</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Feng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Salzmann</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Jayet</surname>
          </string-name>
          . “
          <article-title>Field Typing for Improved Recognition on Heterogeneous Handwritten Forms”</article-title>
          .
          <source>In: 2019 International Conference on Document Analysis and Recognition (ICDAR)</source>
          .
          <source>IEEE Computer Society</source>
          .
          <year>2019</year>
          , pp.
          <fpage>487</fpage>
          -
          <lpage>493</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>S. Torres</given-names>
            <surname>Aguilar</surname>
          </string-name>
          and
          <string-name>
            <given-names>V.</given-names>
            <surname>Jolivet</surname>
          </string-name>
          . “
          <article-title>Handwritten Text Recognition for Documentary Medieval Manuscripts”</article-title>
          .
          <source>In: Journal of Data Mining and Digital Humanities, Historical Documents and automatic text recognition (</source>
          <year>2023</year>
          ). doi: 10.46298/jdmdh.10484.
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          Paris, BnF, lat. 130; Paris, BnF, lat. 8001; Paris, BnF, lat. 7499; Paris, BnF, fr. 1881; Paris, BnF, fr. 604; Paris, BnF, fr. 413; Paris, BnF, lat. 14650; Paris, Bibliothèque inter-universitaire de la Sorbonne, 193; Paris, BnF, lat. 10996; Paris, BnF, esp. 368; Paris, BnF, ita. 481; Florence, Biblioteca Medicea Laurenziana, Laur. Plut. 39.34; Paris, BnF, Smith-Lesouëf 16; Paris, BnF, esp. 36; Paris, BnF, lat. 17903; Montpellier, Bibliothèque universitaire Historique de Médecine, H318; Paris, BnF, Rés. YE-1325; Madrid, BNE, MSS. 3995; Paris, BnF, fr. 2701; Paris, BnF, lat. 14137; Paris, BnF, fr. 574; Paris, BnF, fr. 13496; Paris, BnF, fr. 747; Paris, BnF, fr. 6447; Paris, BnF, fr. 23117; Paris, BnF, NAL 730; Vienna, ÖNB, 12.905; Paris, BnF, esp. 65; Paris, BnF, ita. 783
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>